
In this article, John P Jeffries, author of the book Oracle GoldenGate 12c Implementer’s Guide, introduces Oracle GoldenGate by describing the key components, processes, and considerations required to build and implement a GoldenGate solution. He addresses some of the issues that influence the decision-making process when designing a GoldenGate solution, focusing on the additional configuration options available in Oracle GoldenGate 12c.


12c new features

Oracle has provided some exciting new features in the 12c version of GoldenGate, some of which we have already touched upon. Following the official desupport of Oracle Streams in Oracle Database 12c, Oracle has essentially migrated some of its key features to GoldenGate, its strategic replication product. You will find that GoldenGate now has tighter integration with the Oracle database, enabling enhanced functionality.

Let’s explore some of the new features available in Oracle GoldenGate 12c.

Integrated capture

Integrated capture has been available since Oracle GoldenGate 11gR2 with Oracle Database 11g (11.2.0.3). GoldenGate was originally decoupled from the database; the new architecture provides the option to integrate its Extract process(es) with the Oracle database. This gives GoldenGate access to the database’s data dictionary and undo tablespace, providing replication support for advanced features and data types. Oracle GoldenGate 12c still supports the original Extract configuration, known as Classic Capture.
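As a sketch (the group name, trail, and credential alias are illustrative, and a suitably privileged database user is assumed), an integrated Extract is registered with the database and created from GGSCI as follows:

GGSCI> DBLOGIN USERIDALIAS srcdb DOMAIN admin
GGSCI> REGISTER EXTRACT EOLTP01 DATABASE
GGSCI> ADD EXTRACT EOLTP01, INTEGRATED TRANLOG, BEGIN NOW
GGSCI> ADD EXTTRAIL ./dirdat/aa, EXTRACT EOLTP01

The REGISTER step creates the logmining server inside the database to which the Extract attaches.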

Integrated Replicat

Integrated Replicat is a new feature in Oracle GoldenGate 12c for the delivery of data to Oracle Database 11g (11.2.0.4) or 12c. It improves performance, scalability, and load balancing by leveraging the database parallel apply servers to provide automatic, dependency-aware parallel apply. With Integrated Replicat, there is no need for users to manually split the delivery process into multiple threads and manage multiple parameter files.

GoldenGate now uses a lightweight streaming API to prepare, coordinate, and apply the data to the downstream database. Oracle GoldenGate 12c still supports the original Replicat configuration, known as Classic Delivery.
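As a minimal sketch (the group, trail, and alias names are illustrative), an Integrated Replicat is created with the INTEGRATED option:

GGSCI> DBLOGIN USERIDALIAS tgtdb DOMAIN admin
GGSCI> ADD REPLICAT ROLAP01, INTEGRATED, EXTTRAIL ./dirdat/aa

The corresponding parameter file might then influence the number of apply servers through DBOPTIONS INTEGRATEDPARAMS:

REPLICAT ROLAP01
USERIDALIAS tgtdb DOMAIN admin
DBOPTIONS INTEGRATEDPARAMS (PARALLELISM 4)
MAP SRC.*, TARGET TGT.*;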

Downstream capture

Downstream capture was one of my favorite Oracle Streams features. It allows for a combined in-memory capture and apply process that achieves very low latency, even in heavy data load situations.

Like Streams, GoldenGate builds on this feature by employing a real-time downstream capture process. This method uses Oracle Data Guard’s log transportation mechanism, which writes changed data to standby redo logs. It provides a best-of-both-worlds approach, enabling a real-time mine configuration that falls back to archive log mining when the apply process cannot keep up. In addition, the real-time mine process is re-enabled automatically once the data throughput drops back to a manageable level.
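The following Extract parameters sketch the source-side configuration (the mining database alias is illustrative); the standby redo logs and redo transport to the downstream database must already be in place via the usual LOG_ARCHIVE_DEST_n settings:

EXTRACT EOLTP01
USERIDALIAS srcdb DOMAIN admin
TRANLOGOPTIONS MININGUSERALIAS dwnstrm DOMAIN admin
TRANLOGOPTIONS INTEGRATEDPARAMS (DOWNSTREAM_REAL_TIME_MINE Y)
EXTTRAIL ./dirdat/aa
TABLE SRC.*;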

Installation

One of the major changes in Oracle GoldenGate 12c is the installation method. Like other Oracle products, Oracle GoldenGate 12c is now installed using the Java-based Oracle Universal Installer (OUI) in either the interactive or silent mode. OUI reads the Oracle Inventory on your system to discover existing installations (Oracle Homes), allowing you to install, deinstall, or clone software products.
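For example, a silent installation might look like the following sketch (the staging path and response file values are illustrative):

$ cat /u01/stage/oggcore.rsp
INSTALL_OPTION=ORA12c
SOFTWARE_LOCATION=/u01/app/oracle/product/12.1.2/ogg
START_MANAGER=false

$ cd fbo_ggs_Linux_x64_shiphome/Disk1
$ ./runInstaller -silent -nowait -responseFile /u01/stage/oggcore.rsp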

Upgrading to 12c

Whether you wish to upgrade your current GoldenGate installation from Oracle GoldenGate 11g Release 2 or from an earlier version, the steps are the same. Simply stop all the running GoldenGate processes on your database server, back up the GoldenGate home, and then use OUI to perform the fresh installation. When restarting replication, however, it is important to ensure that the capture process begins from the point at which it was gracefully stopped, to guard against lost synchronization data.
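The graceful stop itself is a matter of a few GGSCI commands (group names are illustrative); INFO ALL should confirm that everything is down before the software is replaced:

GGSCI> STOP REPLICAT ROLAP01
GGSCI> STOP EXTRACT EOLTP01
GGSCI> STOP MGR
GGSCI> INFO ALL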

Multitenant database replication

As the version suggests, Oracle GoldenGate 12c now supports data replication for Oracle Database 12c. Those familiar with the 12c database features will be aware of the multitenant container database (CDB) that provides database consolidation. Each CDB consists of a root container and one or more pluggable databases (PDB). The PDB can contain multiple schemas and objects, just like a conventional database that GoldenGate replicates data to and from.

The GoldenGate Extract process can capture data from multiple PDBs in the source CDB, combining the changed data into a single trail file. Delivery, however, must be split into multiple Replicat process groups, because each Replicat applies changes to a single target PDB.
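For example (schema and PDB names are illustrative), an Extract can qualify source objects with SOURCECATALOG or three-part names, while each Replicat logs in to its own target PDB:

-- Extract: capture from two PDBs into a single trail
SOURCECATALOG PDB1
TABLE SRC.CREDITCARD_ACCOUNTS;
TABLE PDB2.SRC.CREDITCARD_PAYMENTS;

-- Replicat: one process group per target PDB
USERIDALIAS tgtpdb1 DOMAIN admin
MAP SRC.CREDITCARD_ACCOUNTS, TARGET TGT.CREDITCARD_ACCOUNTS;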

Coordinated Delivery

The Coordinated Delivery option applies to the GoldenGate Replicat process when configured in the classic mode. It provides a performance gain by automatically splitting the delivered data from a remote trail file into multiple threads that are then applied to the target database in parallel. GoldenGate manages the coordination across selected events that require ordering, including DDL, primary key updates, event marker interface (EMI), and SQLEXEC. Coordinated Delivery can be used with both Oracle (from version 11.2.0.4) and non-Oracle databases.
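A sketch follows (group, trail, and column names are illustrative): the Replicat is added with the COORDINATED option:

GGSCI> ADD REPLICAT RFIN01, COORDINATED MAXTHREADS 10, EXTTRAIL ./dirdat/ab

Individual MAP statements can then pin tables to threads with the THREAD or THREADRANGE options:

REPLICAT RFIN01
USERIDALIAS tgtdb DOMAIN admin
MAP SRC.CREDITCARD_ACCOUNTS, TARGET TGT.CREDITCARD_ACCOUNTS, THREAD (1);
MAP SRC.CREDITCARD_PAYMENTS, TARGET TGT.CREDITCARD_PAYMENTS, THREADRANGE (2-5, ACCOUNT_ID);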

Event-based processing

In GoldenGate 12c, event-based processing has been enhanced to allow specific events to be captured and acted upon automatically through an EMI. SQLEXEC provides the API to EMI, enabling programmatic execution of tasks following an event. Now it is possible, for example, to detect the start of a batch job or large transaction, trap the SQL statement(s), and ignore the subsequent multiple change records until the end of the source system transaction. The original DML can then be replayed on the target database as one transaction. This is a major step forward in the performance tuning for data replication.

Enhanced security

Recent versions of GoldenGate have included security features such as the encryption of passwords and data. Oracle GoldenGate 12c now supports a credential store, better known as an Oracle wallet, that securely stores an alias associated with a username and password. The alias is then referenced in the GoldenGate parameter files rather than the actual username and password.
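For example (the alias, domain, and credentials are illustrative), the wallet is created and populated from GGSCI:

GGSCI> ADD CREDENTIALSTORE
GGSCI> ALTER CREDENTIALSTORE ADD USER ggadmin@oltp PASSWORD Welcome1 ALIAS srcdb DOMAIN admin
GGSCI> DBLOGIN USERIDALIAS srcdb DOMAIN admin

Parameter files then reference USERIDALIAS srcdb DOMAIN admin instead of an explicit username and password.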

Conflict Detection and Resolution

In earlier versions of GoldenGate, Conflict Detection and Resolution (CDR) was somewhat lightweight and not readily available out of the box. Although the functionality existed in Oracle Streams, the GoldenGate administrator had to programmatically resolve any data conflicts in the replication process using GoldenGate’s built-in tools. In the 12c version, the feature has emerged as an easily configurable option through Extract and Replicat parameters.
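A minimal sketch (the table and timestamp column are illustrative): COMPARECOLS makes before images available for comparison, and RESOLVECONFLICT applies a resolution method, here keeping the row with the most recent timestamp when an update collides with an existing row:

MAP SRC.CREDITCARD_ACCOUNTS, TARGET TGT.CREDITCARD_ACCOUNTS, &
COMPARECOLS (ON UPDATE ALL, ON DELETE ALL), &
RESOLVECONFLICT (UPDATEROWEXISTS, (DEFAULT, USEMAX (LAST_MOD_TS)));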

Dynamic Rollback

Selective back-out of applied transactions is now possible using the Dynamic Rollback feature. The feature operates at the table and record level and supports point-in-time recovery. This potentially eliminates the need for a full database restore following data corruption, erroneous deletions, or perhaps the removal of test data, thus avoiding hours of system downtime.

Streams to GoldenGate migration

Oracle Streams users can now migrate their data replication solution to Oracle GoldenGate 12c using a purpose-built utility. This is a welcome feature, given that Streams is no longer supported in Oracle Database 12c. The Streams2ogg tool auto-generates Oracle GoldenGate configuration files, which greatly simplifies the migration effort.

Performance

Given today’s demand for real-time access to data, high performance is key. Businesses will no longer wait for information to arrive in their decision support systems (DSS) before making decisions, and users expect the latest information to be available in the public cloud. Data has value, and it must be delivered in real time to meet that demand.

So, how long does it take to replicate a transaction from the source database to its target? This is known as end-to-end latency, which typically has a threshold that must not be breached in order to satisfy a predefined Service Level Agreement (SLA).

GoldenGate refers to latency as lag, which can be measured at different intervals in the replication process. They are as follows:

  • Source to Extract: The time taken for a record to be processed by the Extract compared to the commit timestamp on the database
  • Replicat to target: The time taken for the last record to be processed by the Replicat process compared to the record creation time in the trail file

A well-designed system may still encounter spikes in the latency, but it should never be continuous or growing. Peaks are typically caused by load on the source database system, where the latency increases with the number of transactions per second. Lag should be measured as an average over a specified period.
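Lag can be checked interactively from GGSCI at either end of the pipeline, for example (group names are illustrative):

GGSCI> LAG EXTRACT EOLTP01
GGSCI> SEND REPLICAT ROLAP01 GETLAG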

Trying to tune GoldenGate when the design is poor is a difficult situation to be in. For the system to perform well, you may need to revisit the design.

Availability

Another important NFR is availability. Normally quoted as a percentage, it specifies the length of time for which the system must be available. For example, an NFR of 99.9 percent availability equates to 8.76 hours of downtime in a year, which sounds like quite a lot, especially if it were to occur all at once.

Oracle’s maximum availability architecture (MAA) offers enhanced availability through products such as Real Application Clusters (RAC) and Active Data Guard (ADG). However, as we previously described, the network plays a major role in data replication. The NFR relates to the whole system, so you need to be sure your design covers redundancy for all components.

Event-based processing

It is important in any data replication environment to capture and manage events, such as trail records containing specific data or operations, or perhaps the occurrence of a certain error. These are known as Event Markers.

GoldenGate provides a mechanism to perform an action on a given event or condition. These are known as Event Actions and are triggered by Event Records. If you are familiar with Oracle Streams, Event Actions are like rules.

The Event Marker System

GoldenGate’s Event Marker System, also known as the event marker interface (EMI), allows custom DML-driven processing on an event. It comprises an Event Record that triggers a given action. An Event Record can be either a trail record that satisfies a condition evaluated by a WHERE or FILTER clause, or a record written to an event table that enables an action to occur. Typical actions include writing status information, reporting errors, ignoring certain records in a trail, invoking a shell script, and performing an administrative task.

The following Replicat code describes the process of capturing an event and performing an action by logging DELETE operations made against the CREDITCARD_ACCOUNTS table using the EVENTACTIONS parameter:

MAP SRC.CREDITCARD_ACCOUNTS, TARGET TGT.CREDITCARD_ACCOUNTS_DIM;
TABLE SRC.CREDITCARD_ACCOUNTS, &
FILTER (@GETENV ('GGHEADER', 'OPTYPE') = 'DELETE'), &
EVENTACTIONS (LOG INFO);

By default, all logged information is written to the process group report file, the GoldenGate error log, and the system messages file. On Linux, this is the /var/log/messages file.

Note that the TABLE parameter is also used in the Replicat’s parameter file. This is a means of triggering an Event Action to be executed by the Replicat when it encounters an Event Marker.

The following code shows the use of the IGNORE option that prevents certain records from being extracted or replicated, which is particularly useful to filter out system type data. When used with the TRANSACTION option, the whole transaction and not just the Event Record is ignored:

TABLE SRC.CREDITCARD_ACCOUNTS, &
FILTER (@GETENV ('GGHEADER', 'OPTYPE') = 'DELETE'), &
EVENTACTIONS (IGNORE TRANSACTION);

The preceding code extends the previous code by stopping the Event Record itself from being replicated.

Using Event Actions to improve batch performance

All replication technologies typically suffer from the same flaw: the way in which the data is replicated. Consider a table that is populated with a million rows as part of a batch process. This may be a bulk insert operation that Oracle completes on the source database as one transaction. However, Oracle will write each change to its redo logs as Logical Change Records (LCRs). GoldenGate will subsequently mine the logs, write the LCRs to a remote trail, convert each one back to DML, and apply them to the target database, one row at a time. The single source transaction becomes one million transactions, which causes a huge performance overhead. To overcome this issue, we can use Event Actions to:

  • Detect the DML statement (INSERT INTO TABLE SELECT ..)
  • Ignore the data resulting from the SELECT part of the statement
  • Replicate just the DML statement as an Event Record
  • Execute just the DML statement on the target database

The solution requires a statement table on both the source and target databases to trigger the event, and both databases must be perfectly synchronized to avoid data integrity issues. A minimal sketch of the Extract side is shown below.
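In this sketch (the statement table name is illustrative), an insert into the statement table becomes the Event Record; IGNORE TRANS INCLUDEEVENT discards the bulk DML from that transaction while keeping the Event Record itself in the trail, so that a Replicat SQLEXEC can replay the captured statement on the target:

TABLE SRC.BATCH_STMT, &
EVENTACTIONS (IGNORE TRANS INCLUDEEVENT);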

User tokens

User tokens are GoldenGate environment variables that are captured and stored in the trail record for replication. They can be accessed via the @GETENV function. We can use token data in column maps, stored procedures called by SQLEXEC, and, of course, in macros.

Using user tokens to populate a heartbeat table

A vast array of user tokens exist in GoldenGate. Let’s start by looking at a common method of replicating system information to populate a heartbeat table that can be used to monitor performance. We can use the TOKENS option of the Extract TABLE parameter to define a user token and associate it with the GoldenGate environment data.

The following Extract configuration code shows the token declarations for the heartbeat table:

TABLE GGADMIN.GG_HB_OUT, &
TOKENS ( &
EXTGROUP = @GETENV ("GGENVIRONMENT","GROUPNAME"), &
EXTTIME = @DATE ("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV ("JULIANTIMESTAMP")), &
EXTLAG = @GETENV ("LAG","SEC"), &
EXTSTAT_TOTAL = @GETENV ("DELTASTATS","DML")), &
FILTER (@STREQ (EXTGROUP, @GETENV ("GGENVIRONMENT","GROUPNAME")));

For the data pump, the example Extract configuration is shown here:

TABLE GGADMIN.GG_HB_OUT, &
TOKENS ( &
PMPGROUP = @GETENV ("GGENVIRONMENT","GROUPNAME"), &
PMPTIME = @DATE ("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV ("JULIANTIMESTAMP")), &
PMPLAG = @GETENV ("LAG","SEC"));

Also, for the Replicat, the following configuration populates the heartbeat table on the target database with the token data derived from Extract, data pump, and Replicat, containing system details and replication lag:

MAP GGADMIN.GG_HB_OUT_SRC, TARGET GGADMIN.GG_HB_IN_TGT, &
KEYCOLS (DB_NAME, EXTGROUP, PMPGROUP, REPGROUP), &
INSERTMISSINGUPDATES, &
COLMAP (USEDEFAULTS, &
ID = 0, &
SOURCE_COMMIT = @GETENV ("GGHEADER", "COMMITTIMESTAMP"), &
EXTGROUP = @TOKEN ("EXTGROUP"), &
EXTTIME = @TOKEN ("EXTTIME"), &
PMPGROUP = @TOKEN ("PMPGROUP"), &
PMPTIME = @TOKEN ("PMPTIME"), &
REPGROUP = @TOKEN ("REPGROUP"), &
REPTIME = @DATE ("YYYY-MM-DD HH:MI:SS.FFFFFF","JTS",@GETENV ("JULIANTIMESTAMP")), &
EXTLAG = @TOKEN ("EXTLAG"), &
PMPLAG = @TOKEN ("PMPLAG"), &
REPLAG = @GETENV ("LAG","SEC"), &
EXTSTAT_TOTAL = @TOKEN ("EXTSTAT_TOTAL"));

As in the heartbeat table example, the defined user tokens can be called in a MAP statement using the @TOKEN function. The SOURCE_COMMIT and LAG metrics are self-explanatory. However, EXTSTAT_TOTAL, which is derived from DELTASTATS, is particularly useful for measuring the load on the source system when evaluating latency peaks.

For applications, user tokens are useful for auditing data and trapping exceptions within the replicated data stream. Common user tokens are shown in the following code, which replicates the token data to five columns of an audit table:

MAP SRC.AUDIT_LOG, TARGET TGT.AUDIT_LOG, &
COLMAP (USEDEFAULTS, &
OSUSER = @TOKEN ("TKN_OSUSER"), &
DBNAME = @TOKEN ("TKN_DBNAME"), &
HOSTNAME = @TOKEN ("TKN_HOSTNAME"), &
TIMESTAMP = @TOKEN ("TKN_COMMITTIME"), &
BEFOREAFTERINDICATOR = @TOKEN ("TKN_BEFOREAFTERINDICATOR"));
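For the preceding MAP to work, the tokens must first be defined on the Extract side. The following sketch shows one way to do it, using standard @GETENV environment values (the token names match those referenced above):

TABLE SRC.AUDIT_LOG, &
TOKENS ( &
TKN_OSUSER = @GETENV ("GGENVIRONMENT", "OSUSERNAME"), &
TKN_DBNAME = @GETENV ("DBENVIRONMENT", "DBNAME"), &
TKN_HOSTNAME = @GETENV ("GGENVIRONMENT", "HOSTNAME"), &
TKN_COMMITTIME = @GETENV ("GGHEADER", "COMMITTIMESTAMP"), &
TKN_BEFOREAFTERINDICATOR = @GETENV ("GGHEADER", "BEFOREAFTERINDICATOR"));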

The BEFOREAFTERINDICATOR environment variable is particularly useful to provide a status flag in order to check whether the data was from a Before or After image of an UPDATE or DELETE operation. By default, GoldenGate provides After images.

To enable a Before image extraction, the GETUPDATEBEFORES Extract parameter must be used on the source database.

Using logic in the data replication

GoldenGate has a number of functions that enable the administrator to program logic into the Extract and Replicat process configuration. These resemble the generic IF and CASE constructs found in programming languages. In addition, the @COLTEST function enables conditional calculations by testing for one or more column conditions. It is typically used with the @IF function, as shown in the following code:

MAP SRC.CREDITCARD_PAYMENTS, TARGET TGT.CREDITCARD_PAYMENTS_FACT, &
COLMAP (USEDEFAULTS, &
AMOUNT = @IF(@COLTEST(AMOUNT, MISSING, INVALID), 0, AMOUNT));

Here, the @COLTEST function tests the AMOUNT column in the source data to check whether it is MISSING or INVALID. The @IF function returns 0 if @COLTEST returns TRUE and returns the value of AMOUNT if FALSE.

The target AMOUNT column is therefore set to 0 when the equivalent source is found to be missing or invalid; otherwise, a direct mapping occurs.

The @CASE function tests a list of values for a match and then returns a specified value. If no match is found, @CASE will return a default value. There is no limit to the number of cases to test; however, if the list is very large, a database lookup may be more appropriate.

The following code shows the simplicity of the @CASE statement. Here, the country name is returned from the country code:

MAP SRC.CREDITCARD_STATEMENT, TARGET TGT.CREDITCARD_STATEMENT_DIM, &
COLMAP (USEDEFAULTS, &
COUNTRY = @CASE(COUNTRY_CODE, "UK", "United Kingdom", "USA", "United States of America"));

Two other GoldenGate functions perform tests: @EVAL and @VALONEOF. Similar to @CASE, @VALONEOF compares a column or string to a list of values; the difference is that it evaluates more than one value against a single column or string.

When the following code is used with @IF, it returns “EUROPE” when TRUE and “UNKNOWN” when FALSE:

MAP SRC.CREDITCARD_STATEMENT, TARGET TGT.CREDITCARD_STATEMENT_DIM, &
COLMAP (USEDEFAULTS, &
REGION = @IF(@VALONEOF(COUNTRY_CODE, "UK", "E", "D"), "EUROPE", "UNKNOWN"));

The @EVAL function evaluates a list of conditions and returns a specified value. Optionally, if none are satisfied, it returns a default value. There is no limit to the number of evaluations you can list. However, it is best to list the most common evaluations at the beginning to enhance performance.

The following code includes the BEFORE option that compares the before value of the replicated source column to the current value of the target column. Depending on the evaluation, @EVAL will return “PAID MORE”, “PAID LESS”, or “PAID SAME”:

MAP SRC.CREDITCARD_PAYMENTS, TARGET TGT.CREDITCARD_PAYMENTS, &
COLMAP (USEDEFAULTS, &
STATUS = @EVAL(AMOUNT < BEFORE.AMOUNT, "PAID LESS", AMOUNT > BEFORE.AMOUNT, "PAID MORE", AMOUNT = BEFORE.AMOUNT, "PAID SAME"));

The BEFORE option can be used with other GoldenGate functions, including the WHERE and FILTER clauses. However, for the Before image to be written to the trail and to be available, the GETUPDATEBEFORES parameter must be enabled in the source database’s Extract parameter file or the target database’s Replicat parameter file, but not both. The GETUPDATEBEFORES parameter can be set globally for all tables defined in the Extract or individually per table using GETUPDATEBEFORES and IGNOREUPDATEBEFORES, as seen in the following code:

EXTRACT EOLTP01
USERIDALIAS srcdb DOMAIN admin
SOURCECATALOG PDB1
EXTTRAIL ./dirdat/aa
GETAPPLOPS
IGNOREREPLICATES
GETUPDATEBEFORES
TABLE SRC.CHECK_PAYMENTS;
IGNOREUPDATEBEFORES
TABLE SRC.CHECK_PAYMENTS_STATUS;
TABLE SRC.CREDITCARD_ACCOUNTS;
TABLE SRC.CREDITCARD_PAYMENTS;

Tracing processes to find wait events

If you have worked with Oracle software, particularly in the performance tuning space, you will be familiar with tracing. Tracing enables additional information to be gathered from a given process or function to diagnose performance problems or even bugs. One example is SQL trace, which can be enabled at the database session or system level to provide key information such as wait events and parse, fetch, and execute times.

Oracle GoldenGate 12c offers a similar tracing mechanism through the trace and trace2 options of the GGSCI SEND command, which is akin to a session-level SQL trace. In a similar fashion to a database system trace, tracing can also be enabled in the GoldenGate process parameter files, making it permanent until the Extract or Replicat is stopped.

trace provides processing information, whereas trace2 identifies the processes with wait events.
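To make tracing permanent for the life of the process, the equivalent parameter can be placed in the parameter file instead, as in this sketch (group and alias names are illustrative):

REPLICAT ROLAP01
USERIDALIAS tgtdb DOMAIN admin
TRACE2 ./dirrpt/ROLAP01.trc
MAP SRC.*, TARGET TGT.*;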

The following commands show tracing being dynamically enabled for 2 minutes on a running Replicat process:

GGSCI (db12server02) 1> send ROLAP01 trace2 ./dirrpt/ROLAP01.trc

Wait for 2 minutes, then turn tracing off:

GGSCI (db12server02) 2> send ROLAP01 trace2 off
GGSCI (db12server02) 3> exit

To view the contents of the Replicat trace file, we can execute the following command. In the case of a coordinated Replicat, the trace file will contain information from all of its threads:

$ view dirrpt/ROLAP01.trc
statistics between 2015-08-08 Wed HKT 11:55:27 and 2015-08-08 Wed HKT 11:57:28
RPT_PROD_01.LIMIT_TP_RESP : n=2 : op=Insert; total=3; avg=1.5000; max=3msec
RPT_PROD_01.SUP_POOL_SMRY_HIST : n=1 : op=Insert; total=2; avg=2.0000; max=2msec
RPT_PROD_01.EVENTS : n=1 : op=Insert; total=2; avg=2.0000; max=2msec
RPT_PROD_01.DOC_SHIP_DTLS : n=17880 : op=FieldComp; total=22003; avg=1.2306; max=42msec
RPT_PROD_01.BUY_POOL_SMRY_HIST : n=1 : op=Insert; total=2; avg=2.0000; max=2msec
RPT_PROD_01.LIMIT_TP_LOG : n=2 : op=Insert; total=2; avg=1.0000; max=2msec
RPT_PROD_01.POOL_SMRY : n=1 : op=FieldComp; total=2; avg=2.0000; max=2msec
..
==================================
=============summary==============
Delete : n=2; total=2; avg=1.00;
Insert : n=78; total=356; avg=4.56;
FieldComp : n=85728; total=123018; avg=1.43;
total_op_num=85808 : total_op_time=123376 ms : total_avg_time=1.44ms/op
total commit number=1

The trace file provides the following information:

  1. The table name
  2. The operation type (FieldComp is for a compressed field)
  3. The number of operations
  4. The average wait
  5. The maximum wait
  6. Summary

Armed with the preceding information, we can quickly see what operations against which tables are taking the longest time.

Exception handling

Oracle GoldenGate 12c now supports Conflict Detection and Resolution (CDR). Out of the box, however, GoldenGate takes a catch-all approach to exception handling. For example, by default, should any operational failure occur, a Replicat process will ABEND and roll back the transaction to the last known checkpoint. This may not be ideal in a production environment.

The HANDLECOLLISIONS and NOHANDLECOLLISIONS parameters can be used to control whether or not a Replicat process tries to resolve the duplicate record error and the missing record error. The way to determine what error occurred and on which Replicat is to create an exceptions handler.

Exception handling differs from CDR by trapping and reporting Oracle errors suffered by the data replication (DML and DDL). On the other hand, CDR detects and resolves inconsistencies in the replicated data, such as mismatches with before and after images.

Exceptions can always be trapped by the Oracle error they produce. GoldenGate provides an exception handler parameter called REPERROR that allows the Replicat to continue processing data after a predefined error. For example, we can include the following configuration in our Replicat parameter file to treat ORA-00001 “unique constraint (%s.%s) violated” as an exception rather than abending:

REPERROR (DEFAULT, EXCEPTION)
REPERROR (DEFAULT2, ABEND)
REPERROR (-1, EXCEPTION)
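REPERROR is typically paired with an exceptions MAP that inserts the failing record, along with the error details, into an exceptions table. The following sketch assumes an illustrative TGT.EXCEPTIONS table with matching columns; note that the exceptions MAP must follow the regular MAP for the same source table:

MAP SRC.CREDITCARD_ACCOUNTS, TARGET TGT.CREDITCARD_ACCOUNTS;
MAP SRC.CREDITCARD_ACCOUNTS, TARGET TGT.EXCEPTIONS, &
EXCEPTIONSONLY, &
INSERTALLRECORDS, &
COLMAP (USEDEFAULTS, &
ERR_DATE = @DATENOW (), &
OPTYPE = @GETENV ("LASTERR", "OPTYPE"), &
DBERRNUM = @GETENV ("LASTERR", "DBERRNUM"), &
DBERRMSG = @GETENV ("LASTERR", "DBERRMSG"));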

Cloud computing

Cloud computing has grown enormously in recent years. Oracle has named the latest version of its products 12c, the c of course standing for cloud. The architecture of Oracle Database 12c allows a multitenant container database to support multiple pluggable databases, a key enabler of cloud-style database consolidation, rather than the inefficient schema consolidation typical of the previous Oracle database architecture, which is known to cause contention on shared resources under high load. The Oracle 12c architecture supports this consolidation approach through its efficient memory management and dedicated background processes.

Online companies such as Amazon have leveraged the cloud concept by offering a Relational Database Service (RDS), which has become very popular for its speed of readiness, support, and low cost. Cloud environments are often huge, containing hundreds of servers, petabytes of storage, terabytes of memory, and countless CPU cores. The cloud has to support multiple applications in a multi-tiered, shared environment, often through virtualization technologies, where storage and CPUs are typically the driving factors for cost-effective options. Customers choose the hardware footprint that best suits their budget and system requirements, an approach commonly known as Platform as a Service (PaaS).

Cloud computing is an extension to grid computing that offers both public and private clouds.

GoldenGate and Big Data

It is increasingly evident that organizations need to quickly access, analyze, and report on their data across the enterprise in order to stay agile in a competitive market. Data is becoming more of an asset to companies; it adds value to a business, but it may be stored in any number of current and legacy systems, making it difficult to realize its full potential. Until recently, it has been nearly impossible to perform real-time business analysis on this combined data from multiple sources, known as big data. Nowadays, the ability to access all transactional data with low latency is essential. With the introduction of products such as Apache Hadoop, the integration of structured data from an RDBMS with semi-structured and unstructured data offers a common playing field to support business intelligence.

When coupled with Oracle Data Integrator (ODI), GoldenGate for Big Data provides real-time delivery to a suite of Apache products, such as Flume, HDFS, Hive, and HBase, to support big data analytics.

Summary

In this article, we introduced Oracle GoldenGate by describing the key components, processes, and considerations required to build and implement a GoldenGate solution.
