Challenges Involved in Multimaster Replication

Brian Keating
Database Specialists, Inc.
http://www.dbspecialists.com

Introduction

This document consists of two main sections. The first section provides a basic overview of the concepts of multimaster replication. The second section describes some very specific challenges that have been observed with multimaster replication, as well as solutions for those challenges.

General Overview of Multimaster Replication

Multimaster replication is an Oracle feature that allows data in multiple databases to be kept in sync automatically. For example, in a multimaster replication system, if a row gets inserted into one of the databases in the system, that row will be automatically propagated to all of the other databases in that system. Updates and deletes to the data in any of the databases are propagated in the same way.

A multimaster replication environment is set up by configuring databases to be part of a “replication group”. One of the databases in the group is defined as the “master definition site,” and all of the other databases in the group are classified as “master sites.” The main difference between the two types of sites is that most of the replication administration commands must be invoked from the master definition site.

There are two basic ways that transactions get propagated to remote databases—“synchronously” and “asynchronously”. Synchronous replication applies each transaction to all of the master sites in a group immediately, using Oracle’s two-phase commit functionality to ensure that every database in the group can apply a given transaction. If any of the sites in the group cannot accept the transaction (for example, because the site’s database has crashed or the network connection to it is down), then none of the master sites in the replication group will accept it—the transaction cannot take place at all.

The way asynchronous replication works is that all the transactions that occur on a site are temporarily placed in a buffer, called the “deferred transaction queue,” or deftran queue. Periodically, such as once per minute, all of the transactions in a site’s deftran queue get sent to all of the other sites, by “push” jobs. These jobs get created by calling the “schedule_push” procedure. Finally, the transactions in a deftran queue that have already been sent to other sites must be periodically purged, to prevent the deftran queue from growing too large.
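
To make this concrete, here is a rough sketch of how a push job might be scheduled with the dbms_defer_sys package. The database link name and the one-minute interval below are hypothetical, and the exact parameter list varies somewhat between Oracle releases:

-- Sketch: schedule a push job that sends the local deftran queue to the
-- (hypothetical) master site NYC.WORLD once per minute.
BEGIN
   DBMS_DEFER_SYS.SCHEDULE_PUSH(
      destination => 'NYC.WORLD',
      interval    => 'SYSDATE + 1/(24*60)',   -- once per minute
      next_date   => SYSDATE,
      parallelism => 1);                      -- degree of parallel propagation
END;
/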

The vast majority of customer sites that use multimaster replication use asynchronous replication rather than synchronous. One of the reasons for this is that asynchronous replication has been available for a much longer time; the initial versions of multimaster replication only allowed for asynchronous propagation. The main reason that asynchronous is used, though, is because it has many advantages over synchronous.

First of all, asynchronous replication uses much less network bandwidth and provides higher performance than synchronous replication. The primary reason for this is that it is more efficient to store multiple transactions and then propagate them all as a group, rather than to propagate each transaction separately.

This is particularly important when the sites in question are very far apart geographically (such as having one site in San Francisco and another in New York). Another reason for these bandwidth and performance improvements is that there is much more overhead associated with synchronous replication because each and every transaction requires that separate connections be established to all of the other sites in the replication group. With asynchronous replication, fewer connections need to be established, since transactions are propagated as a group.

Finally, starting with Oracle 8.0, it is possible to use “parallel propagation” (i.e. propagating more than one transaction at a time) with asynchronous replication. This can improve replication performance significantly. Parallel propagation is not applicable to synchronous replication, because the “degree of parallelism” for propagation is defined in the schedule_push procedure (which synchronous replication doesn’t use).

The biggest advantage of asynchronous replication, though, is that it provides for high availability of the replication group. With asynchronous replication, if one of the sites in the replication group crashes, all of the other sites will still be able to accept updates—the transactions that are made on the remaining sites will just “stack up” in those sites’ deftran queues until the down site becomes available.

On the other hand, with synchronous replication, if any one of the sites becomes unavailable (such as because of a database crash or a network failure) then none of the sites will be updatable. This is because with synchronous replication, each and every transaction must be able to be immediately applied to all of the sites in the replication group, and of course if a site is unreachable no transactions will be able to be applied to it. This means that not only does synchronous replication not provide any higher database availability, it can actually provide lower availability than using a single database!

Despite all of the advantages that asynchronous replication provides, it has one major drawback: Asynchronous replication opens up the possibility for data conflicts. This is described in detail in the next section.

Challenges Associated With Multimaster Replication

In this section we will discuss some of the thornier issues of multimaster replication, and some possible solutions. In particular, we will look at conflict resolution, using extremely short push intervals, the delay_seconds parameter, how to monitor a replication system, issues with adding new sites to a replication group, applying DDL to replicated objects, using multiple replication groups, and the “quick” purge method.

Conflict Possibilities in Asynchronous Replication

Because of the store-and-forward nature of asynchronous replication, the possibility exists for inconsistent data changes, or “conflicts,” to occur. There are three basic types of conflicts, and the easiest way to explain how they work is to provide examples.

An “update” conflict occurs when the same row is updated on two (or more) different databases before either one of those updates can be propagated to the other databases. For example, let’s say that in a replication group propagation is scheduled for once per minute, and there are two databases in the group—database A and database B. Let’s also say that one particular row (in both databases) has a value of 10 for one of its columns.

An example of an update conflict would be as follows: At 09:00:04 AM, a user connects to database A and updates that row’s value to 20. At 09:00:23 AM (before the change on database A has been propagated to database B) a user connects to database B and updates the row’s value to 30. When Oracle tries to propagate these changes (at 09:01:00 AM) it will generate an “update conflict” error, because the same row has been updated multiple times—and therefore Oracle will not know which value to use for that row: 20 or 30.

A “uniqueness” conflict occurs when propagation of data would cause two different rows to have the same value for a column, when that column has a unique constraint on it. For example, assume that you are using the replication group described above. Also assume that one of the replicated tables in the database has a column called user_login_name, and that column has a unique constraint on it.

A uniqueness conflict could occur as follows: At 11:00:12 AM, a user called “John Smith” logs in to database A and inserts a row with the user_login_name column set to “JOHNS”. At 11:00:42 AM, a user called “John Stein” logs in to database B and inserts a row with the user_login_name column also set to “JOHNS”. At 11:01:00 AM, when Oracle tries to propagate those rows, it will generate a uniqueness conflict error because two different rows have the same value for a column that has a uniqueness constraint.

A “delete” conflict occurs when one or more rows get deleted from one database, but when Oracle tries to propagate the deletes it cannot find all of the rows in the remote databases. This can either be because those rows have been deleted already or because values in those rows have been updated.

For example, assume that you have the same replication group as described above. Also assume that there is one particular row in a replicated table that has a value of “ABC123” for the column called “ID_value”. A delete conflict could occur as follows: At 02:00:13 PM, a user logs in to database A and deletes that row with a command of the form delete from … where ID_value = ‘ABC123’. At 02:00:41 PM, a user logs in to database B and updates that row so that the ID_value column gets set to a value of “XYZ789”. When Oracle tries to propagate the delete statement, it will generate a delete conflict because the row that the statement tries to delete no longer exists (the ID_value column of that row has been updated).

Because there is a possibility of data conflicts, Oracle has provided “conflict resolution handlers” to try to resolve data conflicts. In general, these conflict resolution handlers work well for resolving update conflicts, but they do not work well with uniqueness conflicts (and there are no handlers provided for delete conflicts). It is also possible for you to write your own conflict resolution handlers.

For update conflicts there is a wide variety of conflict resolution handlers available, some of which are more useful than others. One of the most straightforward of these handlers is called “site priority.” With the site priority handler, you can specify that updates to one master site should take precedence over updates to another master site. In the update conflict example above, this handler would allow you to specify (for example) that database A has a higher site priority than database B. With that handler in place, the update conflict error described above would not be generated: Oracle would know which update to accept, and it would use the value of 20, because database A has a higher priority than database B.
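
As a rough sketch, site priority is set up with dbms_repcat calls along the following lines. The group, schema, table, and column names are hypothetical, a column group is assumed to have already been defined with make_column_group, and this handler also relies on a column in the replicated table (maintained by the application or a trigger) that records the global name of the originating site:

BEGIN
   -- Define a site priority group and rank the two sites; a higher
   -- priority number means a higher priority.
   DBMS_REPCAT.DEFINE_SITE_PRIORITY(
      gname => 'ACCOUNTS_GROUP', name => 'ACCT_SITE_PRI');
   DBMS_REPCAT.ADD_SITE_PRIORITY_SITE(
      gname => 'ACCOUNTS_GROUP', name => 'ACCT_SITE_PRI',
      site  => 'A.WORLD', priority => 2);
   DBMS_REPCAT.ADD_SITE_PRIORITY_SITE(
      gname => 'ACCOUNTS_GROUP', name => 'ACCT_SITE_PRI',
      site  => 'B.WORLD', priority => 1);

   -- Attach the handler to an existing column group on the table.
   DBMS_REPCAT.ADD_UPDATE_RESOLUTION(
      sname                 => 'APP',
      oname                 => 'ACCOUNTS',
      column_group          => 'ACCOUNTS_CG',
      sequence_no           => 1,
      method                => 'SITE PRIORITY',
      parameter_column_name => 'SITE_NAME',   -- column holding the originating site
      priority_group        => 'ACCT_SITE_PRI');
END;
/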

Another, more bizarre update conflict resolution handler is called “average.” As the name implies, this handler takes the average value of the columns that have been updated. In the update conflict example above, this handler would cause Oracle to update the rows in both databases to 25, since that is the average of 20 and 30.

One of the more useful update conflict resolution handlers is called “latest timestamp.” This handler allows you to specify that when an update conflict occurs, Oracle should apply the update that occurred the most recently. In the update conflict above, Oracle would apply the value of 30, since the update to 30 occurred later in time (more recently) than the update to 20.

There are a couple of things to keep in mind with this handler, however. First, in order to use this handler each table that will use it needs to have a “timestamp” column. Additionally, that column needs to be updated with the current time every time an update occurs on a row in that table. (Triggers can be created for this purpose.)
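
A minimal sketch of that setup, using hypothetical schema, table, and column names, might look like the following. The trigger keeps the timestamp column current for locally originated changes, and the dbms_repcat calls group the columns and register the latest timestamp handler:

-- Add a timestamp column and keep it current with a trigger.
ALTER TABLE app.orders ADD (last_update_time DATE);

CREATE OR REPLACE TRIGGER app.orders_set_timestamp
BEFORE INSERT OR UPDATE ON app.orders
FOR EACH ROW
BEGIN
   -- Only stamp locally originated changes; changes applied by the
   -- replication mechanism carry their own timestamp.
   IF NOT DBMS_REPUTIL.FROM_REMOTE THEN
      :NEW.last_update_time := SYSDATE;   -- ideally a common time zone such as GMT
   END IF;
END;
/

BEGIN
   -- Group the replicated columns and register the handler.
   DBMS_REPCAT.MAKE_COLUMN_GROUP(
      sname                => 'APP',
      oname                => 'ORDERS',
      column_group         => 'ORDERS_CG',
      list_of_column_names => 'STATUS,LAST_UPDATE_TIME');
   DBMS_REPCAT.ADD_UPDATE_RESOLUTION(
      sname                 => 'APP',
      oname                 => 'ORDERS',
      column_group          => 'ORDERS_CG',
      sequence_no           => 1,
      method                => 'LATEST TIMESTAMP',
      parameter_column_name => 'LAST_UPDATE_TIME');
END;
/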

Another thing to keep in mind is that if you have databases that reside in different geographic time zones, then all of those databases will need to be configured to use one time zone (such as Greenwich Mean Time). Otherwise, with one database in New York and another in San Francisco, the timestamps in the New York database would always be three hours ahead of the timestamps in the San Francisco database. This would mean that New York’s updates would almost always win a conflict, even if they actually occurred earlier in absolute time.

There are only two conflict resolution methods available for uniqueness conflicts. You can either cause the value that has the uniqueness constraint to be modified in the remote database, or you can simply discard the row entirely, so that it doesn’t get inserted into the remote database at all. Neither of these methods will result in the data being consistent across multiple databases. In other words, even after using one of these methods, there will still be differences in the data in one database as compared to the data in another database.

It would seem that the whole point of multimaster replication is to allow the same set of data to be accessed from multiple databases, so neither one of these methods seems very useful. So, instead of allowing uniqueness conflicts to happen and then trying to resolve them, you may want to set up your application to avoid such conflicts altogether. The way that you avoid these conflicts, though, depends on whether the values in question are automatically generated by the application or entered manually by an end user.

For values that are generated by an application, there are a couple of relatively straightforward ways to avoid uniqueness conflicts. The most straightforward of these methods involves the use of sequences. For example, let’s say that your application automatically generates unique numeric user ID values for new user accounts.

One way of avoiding the possibility of user ID value conflicts would be to cause the application to generate those values from sequences, and to ensure that the sequences in each database will use values that are not used by the sequences in any other database. In the replication group mentioned above, you could create a sequence in database A that begins with the number 1 and increments by 10 each time, and you could create a sequence in database B that starts with 2 and increments by 10 each time.

This would mean that all of the numbers generated by the sequence in database A would end with the number 1, and all of the numbers from the sequence in database B would end with the number 2. So, if the application generates its user ID values from values that are contained in the databases’ sequences, there would be no possibility of the same user ID being generated on both databases.
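
A minimal sketch of this scheme, with hypothetical object names, is shown below; each database generates IDs only from its own local sequence:

-- On database A:
CREATE SEQUENCE user_id_seq START WITH 1 INCREMENT BY 10;   -- 1, 11, 21, ...

-- On database B:
CREATE SEQUENCE user_id_seq START WITH 2 INCREMENT BY 10;   -- 2, 12, 22, ...

-- The application then creates accounts from the local sequence:
INSERT INTO app.user_accounts (user_id, user_login_name)
VALUES (user_id_seq.NEXTVAL, 'JOHNS');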

It is not very easy to avoid uniqueness conflicts on values that are entered by an end user, such as user login names. The primary reason it is so difficult is because user-entered values (by definition) are visible to an end user, so forcing those values to become unique will cause some disruption to users. For example, you could force user login names to become unique by adding on a database-specific string to a user’s login name.

For example, if one of your databases is located in New York, you could add on the string “NY1” to any user login name that is created in the New York database. A database that is located in San Francisco could have “SF1” added to its user login names. So, in the uniqueness conflict example above, let’s say that database A is located in San Francisco, and database B is located in New York. That would mean that when John Smith creates his login name, it would actually become “JOHNSSF1”, and when John Stein creates his login name, it would become “JOHNSNY1”. Of course, the application will then have to tell both of those users that their login names are now JOHNSSF1 and JOHNSNY1, respectively.

That solution works fine for avoiding uniqueness conflicts. However, as you might imagine, many users are not going to like the fact that their login names are not completely under their control. They will probably ask themselves, “Why do I have to have this arbitrary string tacked on to my login name? Why can’t I just use the login name that I typed in?” Of course, causing end users to become annoyed is generally bad for business, so this method is probably not a viable solution in most situations.

Another, more exotic way of ensuring unique login names is to have the complete user login name consist of two separate columns—one of which contains the user-entered data (such as JOHNS), and the other one contains a unique database identifier (such as SF1). These two columns would then be combined into a concatenated unique index.

This method would ensure that the same login name would not be created on two different databases, and would also allow users to use the login name that they chose. However, this method would force the end users to supply their database identifiers as well as their login names when they log in. The application could then determine which of the “JOHNS” accounts is trying to log in. At some client sites, it has been made clear that forcing end users to supply additional information (other than their login names and passwords) is completely unacceptable; end users should not be required to remember additional, arbitrary data, and they should not have to worry about what database their accounts were created on. That kind of information should be transparent to them. Therefore, this method may also not be a viable solution, depending upon an individual client’s requirements.
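
A sketch of that table design, with hypothetical names, might look like this:

CREATE TABLE app.users (
   user_login_name VARCHAR2(30) NOT NULL,   -- user-entered portion, e.g. JOHNS
   db_identifier   VARCHAR2(10) NOT NULL,   -- database-specific portion, e.g. SF1
   CONSTRAINT users_login_uq UNIQUE (user_login_name, db_identifier)
);

-- Both "JOHNS" accounts can now coexist without a uniqueness conflict:
INSERT INTO app.users (user_login_name, db_identifier) VALUES ('JOHNS', 'SF1');
INSERT INTO app.users (user_login_name, db_identifier) VALUES ('JOHNS', 'NY1');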

As mentioned previously, there are no conflict resolution handlers provided for delete conflicts. So if it is possible to run delete statements on multiple databases concurrently in an asynchronous replication group, sooner or later you could end up with inconsistent data. One approach for avoiding delete conflicts involves designing the application so that it doesn’t actually issue delete statements. When a user tries to delete a row, the application will not actually delete that row, but rather “mark” the row for deletion.

This method also requires that the application be designed to filter out rows that are marked for deletion, so that it looks to the end user like those rows have actually been deleted. Another thing that this method requires is that each and every table that uses asynchronous replication must be set up with a new column which will be used to keep track of rows that are marked for deletion. Finally, this method requires creating stored procedures that purge all of the marked rows, and creating database jobs to call those stored procedures periodically on each database in the replication group. Needless to say, avoiding delete conflicts will require major design changes in the application.
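
A minimal sketch of the mark-for-deletion approach, again with hypothetical names, is shown below. The purge procedure disables replication for its own session so that the physical deletes are not themselves propagated; that detail is an assumption on my part, but it is the usual way to keep local housekeeping out of the deftran queue:

-- Each replicated table gets a "marked for deletion" column.
ALTER TABLE app.orders ADD (deleted_flag CHAR(1) DEFAULT 'N');

-- The application marks rows instead of deleting them, and filters
-- marked rows out of its queries:
UPDATE app.orders SET deleted_flag = 'Y' WHERE order_id = 12345;
SELECT * FROM app.orders WHERE deleted_flag = 'N';

-- A job on each site periodically purges the marked rows.
CREATE OR REPLACE PROCEDURE app.purge_marked_rows IS
BEGIN
   DBMS_REPUTIL.REPLICATION_OFF;   -- keep the purge deletes out of the deftran queue
   DELETE FROM app.orders WHERE deleted_flag = 'Y';
   COMMIT;
   DBMS_REPUTIL.REPLICATION_ON;
END;
/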

There is another approach to avoiding all types of conflicts—update, uniqueness and delete. This method involves configuring the application in question so that it only writes to one database at any given time—all of the other databases will only be used for reads. Since only one database is getting updated, there will be no possibility of any type of data conflict.

This approach also requires that the application be designed so that if the “writable” database crashes, then the application will automatically start sending write requests to another database. Otherwise, the one writable database will end up being a single point of failure in the environment. Of course, one significant drawback to this method is that it requires major changes in the design of the application, and that is not something that database administrators usually have control over.

The conclusion to this discussion about data conflicts in asynchronous replication is as follows:

If you plan on using asynchronous replication, your application should be designed from the outset with that in mind because asynchronous replication is not something that can be easily “added on” after the fact.

Extremely Short Push Intervals

Another problem that has been observed with asynchronous replication is that setting push jobs to run at very short intervals can use up an excessive amount of system resources. This is because starting and stopping a push job uses up a fair amount of CPU cycles, and obviously the more often that has to be done, the harder the CPU has to work. When push jobs are configured with extremely short intervals, it seems like this can actually cause the CPU to fall behind. That is, there will not be enough cycles available to start and stop the push jobs at the desired rate, let alone to do the actual pushing of transactions.

In a situation like that, transactions take longer and longer to get propagated to other sites. For example, I’ve observed that when the schedule_push procedure is configured with 1 or 2 second intervals, the propagation latency of transactions increases over time. It starts out at about 5 seconds, then goes to 7 seconds, then to 10, and so on. The solution to this problem is not to go overboard when setting push intervals; an interval of one minute is the shortest interval that I would recommend.

The delay_seconds Parameter

As mentioned earlier, asynchronous replication uses push jobs to propagate transactions to other sites. This job normally ends immediately after it finishes propagating the transactions that are currently in the deftran queue. However, one of the parameters in the schedule_push procedure, delay_seconds, allows you to cause push jobs to remain active for a set amount of time, even after all of the transactions have been propagated. According to several replication documents on Oracle’s web site, this parameter can also be used to cause asynchronous replication to simulate synchronous propagation.

What that suggests is that if you use the delay_seconds parameter, the push job will constantly look for new transactions to propagate for as long as the job remains active. If that were the case, then as soon as a new transaction entered the deftran queue, it would be immediately propagated by the push job. However, that is not how this parameter works. What actually happens with the parameter is that after the push job completes propagating its transactions, it will then go to sleep (not pushing any transactions) for a set period of time. It will then wake up, propagate any new transactions that have entered the queue, and then go back to sleep again.

As long as new transactions keep entering the queue, then this sleep/wake up process will continue indefinitely. If no new transactions enter the queue for the entire amount of time specified in the delay_seconds parameter, then the push job will end. The amount of time that the job will sleep is determined by the value of the delay_seconds parameter—if the value is 60 seconds or more, then the sleep interval is 60 seconds. If the value is less than 60 seconds, then the sleep interval is the same as the value of delay_seconds.

If that were the only problem with the delay_seconds parameter, then it would still be a worthwhile feature of asynchronous replication because as I’ve mentioned above, starting and stopping push jobs uses up a lot of system resources. However, there are other problems associated with delay_seconds. One of them is that before a normal shutdown can occur on a database that is part of a replication group, Oracle has to wait for all of the replication jobs that the database runs (such as push jobs) to complete.

If a push job has delay_seconds set, then Oracle will have to wait for the entire delay_seconds value before that job will end. That’s not a big problem if you have a delay_seconds value of 30 seconds. However, it is a problem if you have it set to the value that several Oracle documents suggest: 500,000 seconds, which is almost 6 days! A shutdown abort will get around this problem, but obviously shutdown aborts should be avoided if possible.

A bigger problem with delay_seconds is that under some circumstances, it seems to cause push jobs to get “stuck” holding JQ (job queue) locks. In particular, this problem seems prevalent when both the interval and the delay_seconds parameters are set to relatively low numbers, such as with the interval set to 10 seconds and the delay_seconds set to 5 seconds. I’m not sure exactly why this happens, but I suspect that it’s because with the parameters set that low Oracle may try to start a push job while a previous one is still active (causing a race condition).

One of the symptoms of this situation can be detected by monitoring the dba_jobs_running view. Under normal circumstances, when a push job has delay_seconds set, you can see the job enter dba_jobs_running at a scheduled push time, remain there for its delay_seconds value, and then (as long as there were no new transactions) be removed from dba_jobs_running. However, when a job is in a “stuck” state, it will remain in the dba_jobs_running view permanently. In fact, the job stays in that view even if you kill the job with dbms_job.remove! (When you do that, most of the values of the job become NULL in dba_jobs_running, but the job still remains there.)
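
One simple way to keep an eye on this is to poll dba_jobs_running from a DBA account and watch whether the push job's entry ever goes away:

SELECT sid, job, this_date, this_sec, failures
FROM   dba_jobs_running;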

Another way that you can detect this situation is to query the v$lock view as SYS with the following statement:

SELECT COUNT(*)
FROM v$lock 
WHERE type = 'JQ';

Run the above statement several times. If it returns a value greater than 0 each time, then an SNP process has (permanently) locked a push job. While a job is locked in that state, not only will you not be able to shut down the database (see above) but you also will not be able to quiesce the replication group.

The only way to get out of a situation like this (other than to do a shutdown abort) is to kill the SNP process at the operating system level. However, it may be hard to determine which SNP process is holding the lock, so you may have to kill all of the instance’s SNP processes. Not only that, but the terminate signal (as well as the hangup and interrupt signals) will not kill those processes. Basically, you will have to send them all a kill signal (which is done by executing the kill -9 command).

You will then have to wait for a minute or so (during which time new SNP processes will be automatically spawned) for the lock to be completely released. Needless to say, this is an extremely kludgey and inelegant workaround. So, my recommendation is to not use the delay_seconds parameter at all. Since it doesn’t work as advertised, it’s more trouble than it’s worth.

Monitoring Utility for Synchronous Replication

When a replication group uses synchronous replication, all of the sites in the group need to be up and running in order for any of the sites to be updatable. This is because with synchronous replication, each transaction needs to be applied immediately to all of the sites in the group, or else the entire transaction will be rolled back. This greatly reduces the availability of the replication group, so much so that most production sites probably would not be able to deploy it.

One possible approach to synchronous replication might be to write an automated utility that periodically monitors the status of all of the sites in a replication group. If the utility determines that any of the sites in the group is not running, it would try to restart that site and then recheck the site’s status. If the site is still down, then the utility would dynamically remove that site from the replication group (with the remove_master_databases procedure). Once the down site is removed from the replication group, all of the other sites in the group will again be updatable.

This utility should be run from two different sites in a replication group—from the master definition site of the group, and from one of the other master sites in the group. When the utility is run from the master definition site, it should monitor the status of all of the master sites in the group. If any one of those master sites is down and cannot be restarted, the utility would remove that master site from the replication group, as mentioned above.

When the utility is run from a master site, it should monitor only the status of the master definition site of the group. If it detects that the master definition site is down and cannot be restarted, the utility should then dynamically relocate the master definition site so that the site running the utility becomes the master definition site of the group. The reason that this relocation must be done is that the remove_master_databases procedure can only be executed from the master definition site of a replication group. After the relocation has been completed, the utility would remove the original master definition site from the replication group.
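
The core of such a utility, as run from a surviving master site, might be sketched as follows. The group and site names are hypothetical, and the exact parameters of these dbms_repcat calls vary somewhat between releases:

BEGIN
   -- Make the local site (B.WORLD) the new master definition site, even
   -- though the old master definition site (A.WORLD) cannot be reached.
   DBMS_REPCAT.RELOCATE_MASTERDEF(
      gname                 => 'ACCOUNTS_GROUP',
      old_masterdef         => 'A.WORLD',
      new_masterdef         => 'B.WORLD',
      notify_masters        => TRUE,
      include_old_masterdef => FALSE);

   -- Now that the local site is the master definition site, drop the
   -- unreachable site from the replication group.
   DBMS_REPCAT.REMOVE_MASTER_DATABASES(
      gname       => 'ACCOUNTS_GROUP',
      master_list => 'A.WORLD');
END;
/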

There are a few important things to keep in mind about removing sites from a synchronous replication group. With synchronous replication, transactions are not stored in deftran queues, so there is no way to keep track of the transactions that occur on each site. Because of that, if a site is removed from a synchronous replication group (even for a short time), then in order to bring that site back into the replication group, all of the data in the group must be reloaded into it. This will ensure that the site has exactly the same data as all of the other sites. There is no way to load in just the transactions that have occurred since the site was taken out of the replication group.

Also, in order for a site to be added to a replication group, activity on the entire replication group must be suspended. That means that in order to bring a removed site back into a replication group, all of the databases in the replication group must be made non-updatable to end users—a complete outage.

In short, once a site is removed from a synchronous replication group, bringing that site back into the group will require both a tremendous amount of work and system down time. These ramifications should be considered before implementing a completely automated utility that can remove sites from a synchronous replication group. The primary advantage of using a utility like this is that it allows you to schedule down time of a replication group, rather than having an unplanned outage.

Adding New Sites to a Replication Group

A new site is added to a replication group with the “add_master_database” procedure. One of the parameters of that procedure, use_existing_objects, allows you to specify whether you want to use objects (tables, indexes, etc.) that already exist in the new site, or whether you want them to be created automatically as part of the process of adding the new site. In order to set that parameter to true, you need to create all of the tables and indexes that are part of the replication group in the new database ahead of time, before you call the add_master_database procedure.

That may seem like a waste of time, since the procedure gives you the ability to create the objects automatically; so you may figure, why not use it? I’ve experienced problems with setting that parameter to false, though. Many times, the objects do not seem to get created in the proper order. For example, indexes sometimes try to get created before the tables that they refer to have been created. Therefore, I recommend pre-creating the replicated objects (and the data in those objects) in the new database before running the add_master_database procedure.

Another parameter in the add_master_database procedure, copy_rows, allows you to specify whether you want Oracle to copy the data from your existing sites into the new site, or whether you want to use data that has already been loaded into that new site. In order to set this parameter to false, you need to pre-load all of the data into the new site manually before calling the add_master_database procedure.

In an environment with an extremely small amount of data, the copy_rows functionality seems to work. However, I have heard several reports that this functionality does not work very well with larger amounts of data; some people have found that it takes much longer to load the data that way than it would to use export and import to copy the data manually. Therefore, it is generally a better idea to pre-load all of the data yourself, rather than to have the add_master_database procedure do it.

Although pre-loading the data into a new site is much faster than having the data loaded as part of the add_master_database procedure, there is one major problem that I’ve experienced with pre-loading data. If there are any foreign keys on any of the replicated tables, and if there are any rows in any of the “child” tables, then when you try to add the new site to the replication group, you may receive referential integrity constraint violations on the child tables. That doesn’t seem to make any sense at all; how can a constraint violation occur when you’ve already pre-loaded all of the data (including all of the child rows) into the new database?

Despite the fact that it doesn’t make sense, though, the error is consistent—every time I’ve tried to add a master site that had child rows, I’ve received those errors. I suspect that this is an outright bug in the add_master_database procedure, but I haven’t been able to find any information about this through Oracle support or MetaLink.

I have come up with a workaround for this problem, although it is not a very elegant solution. After you import all of the data into the new database, you then have to remove replication support (with the drop_master_repobject procedure) from all of the replicated objects in all of the existing master sites (including the master definition site). Since there may be hundreds of replicated objects, I recommend creating a script to do this (the names of all of the replicated objects can be found in the dba_repobject view). Fortunately, the drop_master_repobject procedure only needs to be called from the master definition site, not from the other master sites. In fact, the procedure can only be called from the master definition site.
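
A script along these lines could be used for that step (run from the master definition site; the group name is hypothetical). It saves the list of replicated objects first, since dba_repobject will be empty once replication support has been dropped:

-- Save the current list of replicated objects, then drop replication
-- support from each one.
CREATE TABLE saved_repobjects AS
   SELECT sname, oname, type AS obj_type
   FROM   dba_repobject
   WHERE  gname = 'ACCOUNTS_GROUP';

BEGIN
   FOR obj IN (SELECT sname, oname, obj_type FROM saved_repobjects) LOOP
      DBMS_REPCAT.DROP_MASTER_REPOBJECT(
         sname        => obj.sname,
         oname        => obj.oname,
         type         => obj.obj_type,
         drop_objects => FALSE);   -- keep the underlying objects and their data
   END LOOP;
END;
/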

Once you have loaded all of the data into the new site, and you have removed replication support from all of the objects in all of the existing sites, then you can call the add_master_database procedure to add the new site without receiving any constraint violation errors. Presumably this is because if you remove replication support from the existing sites’ objects before calling the procedure, then the procedure will not try to generate replication support on any of the new site’s objects because the procedure only generates replication support for objects that are currently replicated in the master definition site.

Once the new site gets added to the replication group, you will then have to manually generate replication support on the appropriate objects. The way to do that is to call the create_master_repobject procedure (in the master definition site) for each of those objects. For tables, procedures, and packages, you also need to call the generate_replication_support procedure. Again, there may be hundreds of replicated objects, so I advise writing a script to do this. Finally, after all of the appropriate objects have been configured with replication support, you can then resume activity on the replication group.
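
Continuing the sketch above, replication support could be regenerated from the same saved list of objects once the new site has been added (again from the master definition site):

BEGIN
   FOR obj IN (SELECT sname, oname, obj_type FROM saved_repobjects) LOOP
      DBMS_REPCAT.CREATE_MASTER_REPOBJECT(
         gname               => 'ACCOUNTS_GROUP',
         sname               => obj.sname,
         oname               => obj.oname,
         type                => obj.obj_type,
         use_existing_object => TRUE,
         copy_rows           => FALSE);
      -- Tables, packages, and procedures also need replication support generated.
      IF obj.obj_type IN ('TABLE', 'PACKAGE', 'PROCEDURE') THEN
         DBMS_REPCAT.GENERATE_REPLICATION_SUPPORT(
            sname => obj.sname,
            oname => obj.oname,
            type  => obj.obj_type);
      END IF;
   END LOOP;
END;
/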

Note: this constraint violation problem does not occur if you cause the data to get loaded as part of the add_master_database procedure (by setting copy_rows to true), but of course loading the data through the procedure can take an unacceptably long time.

There is one other thing to keep in mind about adding a new site to an existing replication group. As mentioned, when the add_master_database procedure is invoked, Oracle looks at all of the objects that are currently replicated in the master definition site and it then initiates jobs to generate replication support on each of those objects in the new site. So, if you do not remove replication support from all of the master definition site’s objects first, when you call the add_master_database procedure, a large number of replication support jobs will all get kicked off at once.

It takes some time for all of those jobs to finish. As an example, on one database that I’ve worked with, it took about 20 minutes for all of the jobs to complete replication support for 78 objects. The reason this is important is because you should wait for all of those jobs to complete before you try to resume activity on the replication group. If you try to resume activity while some of those jobs are still running, then the “resume activity” procedure will generate errors; it will state that you need to “regenerate replication support before resuming master activity”.

The way to monitor the progress of the replication support jobs is to query the dba_repcatlog view on the master definition site. When a job is initiated, it gets placed into that view, and when it completes successfully, it gets removed from that view. So, when the number of rows in that view reaches 0, all of the replication support jobs will have completed and you will then be able to resume activity on the replication group.
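
In other words, a query as simple as the following can be polled on the master definition site until it returns zero:

SELECT COUNT(*) FROM dba_repcatlog;

-- Or, to see what each outstanding job is doing:
SELECT status, request, oname
FROM   dba_repcatlog;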

If a replication job generates an error, then it will remain in the dba_repcatlog view indefinitely, and you will have to manually resolve the conditions that caused the error. Once you resolve the error in question, you can then remove that entry from the dba_repcatlog view with the purge_master_log procedure. After doing that, though, you will either have to manually generate replication support for the object in question, or you will have to rerun the add_master_database procedure from the beginning. There is no way to resubmit a failed repcatlog job. The problems described in this section are non-issues if you remove replication support from the existing master sites’ objects before calling the add_master_database procedure—and that is one reason why I recommend removing replication support first.

In conclusion, the steps I recommend for adding a new site to a replication group are as follows:

  1. Suspend activity on the replication group.
  2. Export all of the replicated objects and data on the master definition site.
  3. Import those objects (and data) into the new database.
  4. Remove replication support from all of the replicated objects in all of the existing master sites, by calling the drop_master_repobject procedure from the master definition site.
  5. Configure the new database as a master site in the replication group with the add_master_database procedure. When you call that procedure, set the use_existing_objects parameter to true, and set the copy_rows parameter to false.
  6. Generate replication support on all of the replicated objects by calling the create_master_repobject and generate_replication_support procedures from the master definition site.
  7. Resume activity on the replication group.

DDL Commands on Replicated Objects

Oracle provides two procedures that can be used to execute DDL commands on replicated objects. These are dbms_repcat.alter_master_repobject and dbms_repcat.execute_ddl. The main difference between the two procedures is that in order to call alter_master_repobject, the replication group needs to be quiesced. It does not need to be quiesced to run execute_ddl.

There is one other major difference: execute_ddl allows a DDL command to be run on a subset of the master sites in a group (as opposed to running the command on all of the sites). Why anyone would want to do that is not clear. I have experienced intermittent problems with both of these procedures. In some cases, when I’ve run either of these procedures on a replicated object, I have received arcane “incompatible flavor” error messages. Unfortunately, I have not been able to determine why these errors occur, or even what the errors mean. Also, these errors don’t occur every time; I estimate that they’ve come up less than 5% of the time that I’ve run the procedures.

However, once a particular object reports an error, the error will keep recurring on subsequent calls of the procedure on that object, even if you restore the object from a backup and then re-run the procedure. For example, if you call the alter_master_repobject procedure on table “ABC” and you get the incompatible flavor error message, then if you try to run the procedure on table ABC again, you will get the error repeatedly.
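
For reference, the two procedures are invoked roughly as shown below. The schema, table, group, and site names are hypothetical, and these are sketches rather than complete recipes (for example, after altering a table with alter_master_repobject you also need to regenerate replication support for it):

-- alter_master_repobject: the group must be quiesced first.
BEGIN
   DBMS_REPCAT.ALTER_MASTER_REPOBJECT(
      sname    => 'APP',
      oname    => 'ACCOUNTS',
      type     => 'TABLE',
      ddl_text => 'ALTER TABLE app.accounts ADD (status VARCHAR2(10))');
   DBMS_REPCAT.GENERATE_REPLICATION_SUPPORT(
      sname => 'APP', oname => 'ACCOUNTS', type => 'TABLE');
END;
/

-- execute_ddl: no quiesce required; can target a subset of master sites.
BEGIN
   DBMS_REPCAT.EXECUTE_DDL(
      gname       => 'ACCOUNTS_GROUP',
      master_list => 'A.WORLD,B.WORLD',
      ddl_text    => 'CREATE INDEX app.accounts_status_idx ON app.accounts (status)');
END;
/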

I have come up with a way to make DDL changes on replicated objects that does work consistently, although once again, the solution is not very elegant. The steps to follow to execute a DDL command on a replicated object are as follows:

  1. Suspend activity on the replication group.
  2. Remove replication support from the object in question, by calling the drop_master_repobject procedure from the master definition site.
  3. Manually execute the DDL command on the object in question. Note: since the object is no longer configured with replication support, you will have to execute the DDL command on all of the databases in the replication group.
  4. Regenerate replication support for the object by calling the create_master_repobject and generate_replication_support procedures from the master definition site.
  5. Resume activity on the replication group.

Needless to say, this solution requires much more manual intervention than running the Oracle-provided procedures, but since the solution listed above works every time (and the replication procedures don’t) I recommend you follow the steps above to make DDL changes on replicated objects.

Using Multiple Replication Groups

It is possible to create multiple replication groups within a single database. In fact, it is even possible to have both synchronous and asynchronous groups within the same database. The only restriction in using multiple replication groups within a database is that any given object can only belong to one replication group. In other words, using multiple replication groups allows a database’s objects to be “partitioned” between different replication groups.

One reason why it may be appropriate to use multiple groups is if some of the objects in a database contain very time-sensitive data (such as stock quotes) while other objects contain much less time-sensitive data (such as sports scores). In a situation like this, the time-sensitive data could be placed in a synchronous replication group (or in an asynchronous group with a very short push interval), while the less time-sensitive data could be placed in an asynchronous group with a relatively long push interval.

The advantage of doing this would be that it would result in less overall network traffic; only the time-sensitive data would be propagated very often, with the less time-sensitive data being propagated much less frequently. Another, more legitimate reason for using multiple groups is if a relatively small amount of the values in a database are very “problematic” with regard to data conflicts.

For example, as mentioned previously, it is extremely difficult to resolve or even avoid data conflicts with user-generated unique values. So, if a relatively small percentage of the values in a database have that difficulty, then the objects that contain the problematic values could be placed in a synchronous replication group, while the rest of the objects could be placed in an asynchronous group.

This would have two advantages: It would eliminate the possibility of data conflicts with the problematic values, while all of the other values would have the benefits of asynchronous replication (higher uptime, better performance, etc.). If a large percentage of the values in a database are problematic, then it would be more efficient to just use a single, synchronous replication group for all of the objects.

In spite of the advantages of using multiple replication groups, there are several problems associated with using them as well. As one might expect, using multiple groups results in an increase in complexity; it makes the task of administering a multimaster replication environment even more complex than it already is. A more quantifiable problem with using multiple replication groups is that you have to be careful to avoid dependencies between replication groups. That is, you should not allow foreign keys to span replication groups.

If a parent table resided in an asynchronous group with a long push interval, and a child table resided in a group with a short push interval (or in a synchronous group), then problems could occur. For example, if a single transaction on a local database created a parent record and then created a child record that referenced the new parent record, then the child record would probably get propagated to the remote database before the parent record did. This would obviously cause a referential integrity constraint violation.

Similarly, even if there aren’t any foreign keys that span replication groups, SQL statements that access the database should not access tables that reside in different replication groups. All of the tables that a single SQL statement accesses should reside in the same replication group. Needless to say, preventing dependencies between replication groups requires an enormous amount of coordination between DBAs and developers, and that coordination must be ongoing. In other words, every time there is an application change or a schema change, DBAs and developers must ensure that even after the change, there will not be any dependencies between the replication groups.

The “Quick” Purge Method

As mentioned, if a replication group uses asynchronous replication, transactions that have been propagated must periodically be removed from the deftran queue to prevent it from growing too large. The procedure that is used to schedule these automated purges is called “schedule_purge”. One of the parameters of that procedure, purge_method, allows you to specify the type of purge to use: a “quick” purge, which has a numeric value of 1, or a “precise” purge, which has a value of 2. The quick method supposedly uses less system resources, so one might expect that it is the preferred method to use.

However, that purge method does not seem to remove all of the propagated transactions from the deftran queue. In particular, it always seems to leave the most recent transaction in the queue, even after it has been propagated. That transaction will be removed from the queue the next time the purge job runs, though (as long as more recent transactions have been placed in the queue by that time) so this problem is really more of an annoyance than anything else. At any rate, the “precise” purge method does seem to remove all of the propagated transactions from the deftran queue.
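
Given that behavior, scheduling the purge job with the “precise” method looks roughly like this (the hourly interval is hypothetical; dbms_defer_sys also provides the constants purge_method_quick and purge_method_precise for the values 1 and 2):

BEGIN
   DBMS_DEFER_SYS.SCHEDULE_PURGE(
      next_date    => SYSDATE,
      interval     => 'SYSDATE + 1/24',                      -- once per hour
      purge_method => DBMS_DEFER_SYS.PURGE_METHOD_PRECISE);  -- i.e. the value 2
END;
/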

Conclusion

Multimaster replication is a very powerful and flexible feature. However, with power and flexibility comes complexity. In this paper we have looked at the basics of multimaster replication, some of the pitfalls you are likely to run into, and some suggestions for how to work around these challenges.

Certainly there is a lot more to multimaster replication than we have been able to cover here. Hopefully, though, this paper will give you a good place to start.

Database Specialists, Inc. http://www.dbspecialists.com
© Copyright 2001