|Read more about this book|
(For more resources on Oracle, see here.)
At a high level, the design must include the following generic requirements:
All the above must be factored into the overall system architecture. So let’s take a look at some of the options and the key design issues.
So you have a fast reliable network between your source and target sites. You also have a schema design that is scalable and logically split. You now need to choose the replication architecture; One to One, One to Many, active-active, active-passive, and so on. This consideration may already be answered for you by the sheer fact of what the system has to achieve. Let’s take a look at some configuration options.
Let’s assume a multi-national computer hardware company has an office in London and New York. Data entry clerks are employed at both sites inputting orders into an Order Management System. There is also a procurement department that updates the system inventory with volumes of stock and new products related to a US or European market. European countries are managed by London, and the US States are managed by New York. A requirement exists where the underlying database systems must be kept in synchronisation. Should one of the systems fail, London users can connect to New York and vice-versa allowing business to continue and orders to be taken. Oracle GoldenGate’s active-active architecture provides the best solution to this requirement, ensuring that the database systems on both sides of the pond are kept synchronised in case of failure.
Another feature the active-active configuration has to offer is the ability to load balance operations. Rather than have effectively a DR site in both locations, the European users could be allowed access to New York and London systems and viceversa. Should a site fail, then the DR solution could be quickly implemented.
The active-passive bi-directional configuration replicates data from an active primary database to a full replica database. Sticking with the earlier example, the business would need to decide which site is the primary where all users connect. For example, in the event of a failure in London, the application could be configured to failover to New York.
Depending on the failure scenario, another option is to start up the passive configuration, effectively turning the active-passive configuration into active-active.
The Cascading GoldenGate topology offers a number of “drop-off” points that are intermediate targets being populated from a single source. The question here is “what data do I drop at which site?” Once this question has been answered by the business, it is then a case of configuring filters in Replicat parameter files allowing just the selected data to be replicated. All of the data is passed on to the next target where it is filtered and applied again.
This type of configuration lends itself to a head office system updating its satellite office systems in a round robin fashion. In this case, only the relevant data is replicated at each target site. Another design, is the Hub and Spoke solution, where all target sites are updated simultaneously. This is a typical head office topology, but additional configuration and resources would be required at the source site to ship the data in a timely manner. The CPU, network, and file storage requirements must be sufficient to accommodate and send the data to multiple targets.
A Physical Standby database is a robust Oracle DR solution managed by the Oracle Data Guard product. The Physical Standby database is essentially a mirror copy of its Primary, which lends itself perfectly for failover scenarios. However , it is not easy to replicate data from the Physical Standby database, because it does not generate any of its own redo. That said, it is possible to configure GoldenGate to read the archived standby logs in Archive Log Only (ALO) mode. Despite being potentially slower, it may be prudent to feed a downstream system on the DR site using this mechanism, rather than having two data streams configured from the Primary database. This reduces network bandwidth utilization, as shown in the following diagram:
Reducing network traffic is particularly important when there is considerable distance between the primary and the DR site.
The network should not be taken for granted. It is a fundamental component in data replication and must be considered in the design process. Not only must it be fast, it must be reliable. In the following paragraphs, we look at ways to make our network resilient to faults and subsequent outages, in an effort to maintain zero downtime.
Surviving network outages
Probably one of your biggest fears in a replication environment is network failure. Should the network fail, the source trail will fill as the transactions continue on the source database, ultimately filling the filesystem to 100% utilization, causing the Extract process to abend. Depending on the length of the outage, data in the database’s redologs may be overwritten causing you the additional task of configuring GoldenGate to extract data from the database’s archived logs. This is not ideal as you already have the backlog of data in the trail files to ship to the target site once the network is restored. Therefore, ensure there is sufficient disk space available to accommodate data for the longest network outage during the busiest period.
Disks are relatively cheap nowadays. Providing ample space for your trail files will help to reduce the recovery time from the network outage.
One of the key components in your GoldenGate implementation is the network. Without the ability to transfer data from the source to the target, it is rendered useless. So, you not only need a fast network but one that will always be available. This is where redundant networks come into play, offering speed and reliability.
One method of achieving redundancy is Network Interface Card (NIC) teaming or bonding. Here two or more Ethernet ports can be “coupled” to form a bonded network supporting one IP address. The main goal of NIC teaming is to use two or more Ethernet ports connected to two or more different access network switches thus avoiding a single point of failure. The following diagram illustrates the redundant features of NIC teaming:
Linux (OEL/RHEL 4 and above) supports NIC teaming with no additional software requirements. It is purely a matter of network configuration stored in text files in the /etc/sysconfig/network-scripts directory. The following steps show how to configure a server for NIC teaming:
- First, you need to log on as root user and create a bond0 config file using the vi text editor.
# vi /etc/sysconfig/network-scripts/ifcfg-bond0
- Append the following lines to it, replacing the IP address with your actual IP address, then save file and exit to shell prompt:
- Choose the Ethernet ports you wish to bond, and then open both configurations in turn using the vi text editor, replacing ethn with the respective port number.
# vi /etc/sysconfig/network-scripts/ifcfg-eth2
# vi /etc/sysconfig/network-scripts/ifcfg-eth4
- Modify the configuration as follows:
- Save the files and exit to shell prompt.
- To make sure the bonding module is loaded when the bonding interface (bond0) is brought up, you need to modify the kernel modules configuration file:
# vi /etc/modprobe.conf
- Append the following two lines to the file:
alias bond0 bonding
options bond0 mode=balance-alb miimon=100
- Finally, load the bonding module and restart the network services:
# modprobe bonding
# service network restart
You now have a bonded network that will load balance when both physical networks are available, providing additional bandwidth and enhanced performance. Should one network fail, the available bandwidth will be halved, but the network will still be available.
Non-functional requirements (NFRs)
Irrespective of the functional requirements, the design must also include the nonfunctional requirements (NFR) in order to achieve the overall goal of delivering a robust, high performance, and stable system.
One of the main NFRs is performance. How long does it take to replicate a transaction from the source database to the target? This is known as end-to-end latency that typically has a threshold that must not be breeched in order to satisfy the specified NFR.
GoldenGate refers to latency as lag, which can be measured at different intervals in the replication process. These are:
- Source to Extract: The time taken for a record to be processed by the Extract compared to the commit timestamp on the database
- Replicat to Target: The time taken for the last record to be processed by the Replicat compared to the record creation time in the trail file
A well designed system may encounter spikes in latency but it should never be continuous or growing. Trying to tune GoldenGate when the design is poor is a difficult situation to be in. For the system to perform well you may need to revisit the design.