Invoking an external program that requires a password

Many external programs require a password before they can be used. For instance, you may have to establish a VPN connection before you can even connect to your databases.

The example we will use here, because it is easier to demonstrate, is decrypting an encrypted file with an external program.

Getting ready

The external program that we will use in this recipe is AES Crypt, an open source encryption application that can be downloaded at http://www.aescrypt.com/download.html. After downloading the console version of the program for your operating system, unzip the executable and put it in a folder next to the file that you will want to encrypt and decrypt. The steps of this recipe assume that we have copied the executable into the c:\temp directory. If you are using another operating system, or if you decide to put the program in a different folder, you will have to adapt the instructions of this recipe accordingly.

How to do it…

  1. Open a command line console and go to the c:\temp directory. Add a text file of your choice in this directory. Here we are using a file called foo.txt.

  2. Run the following from the command line:

    c:\temp> aescrypt -e -p oracledi foo.txt

    This will generate an encrypted file named foo.txt.aes that is protected with the password oracledi. If you try to edit the content of this new file, you will see that it has indeed been encrypted.

    Rename the original foo.txt file to foo.txt.ori. We will now use ODI to decrypt the file and restore the original content.

  3. In ODI Topology, create a new Data Server in the File technology called AESCRYPT. We will use this data server as a placeholder for the password that we need for this operation. We can also use this to locate the file that we will work on. You can leave the User name empty (we do not need it here), but do set the Password value to oracledi.

    In the JDBC tab, select the ODI File JDBC Driver and use the default JDBC URL:

    jdbc:snps:dbfile.

  4. Create a physical schema under that server. Set both the Schema name and Work Schema name to c:\temp.

    Create a logical schema named FILE_AESCRYPT and map it to this physical schema for the default context.

  5. Switch to the Designer Navigator, and in the project of your choice, create a procedure called Ch11_Password_Decrypt_File.

    Add a new step to this procedure, and call this Decrypt.

    In the Command on Target tab, set the Technology to OS Command (leave all other fields at their default values) and type the following command (this is a single line of code):

    <%=odiRef.getInfo("SRC_SCHEMA")%>aescrypt -d -p <@=odiRef.getInfo("SRC_PASS")@> <%=odiRef.getInfo("SRC_SCHEMA")%>foo.txt.aes

    The technology OS Command can be used to execute any type of command at the OS level. One note of caution though: if the script or program that you execute returns anything other than 0, ODI will consider that the program has failed. Typically, two techniques can be combined to solve this problem:

    Redirect any output or errors to other files. For instance, dir 1>dir.out 2>dir.err, where 1 represents the standard output and 2 represents the error output.

    Use Jython code to execute the command and retrieve the return code in a variable. Then evaluate this variable to see whether the returned value indicates an error or is simply informational, as in the sketch below.
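
    The following is a minimal Jython sketch of this second technique, not code from this recipe: it could go in a procedure step whose target Technology is set to Jython. The command string, path, and password are placeholders; in practice you would build them with odiRef.getInfo as shown above.

    from java.lang import Runtime

    # Hypothetical command; adapt the program, path, and password handling.
    process = Runtime.getRuntime().exec('cmd /c c:\\temp\\aescrypt -d -p oracledi c:\\temp\\foo.txt.aes')
    rc = process.waitFor()

    # Any non-zero code is treated as a failure; relax this test if your
    # program returns purely informational codes.
    if rc != 0:
        raise Exception('External command failed with return code %d' % rc)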

    In the Command on Source tab, set the Technology to File, set the Logical Schema to FILE_AESCRYPT, and leave all other fields at their default values. There will be no code on the source side.

  6. Save the procedure and run it.

  7. Now, if you go to the Operator navigator to look at the details of the code that was generated for this task, you can see that the password is not revealed in the logs.

  8. Back in the console, we can see that the original unencrypted file, foo.txt, was restored along with its original content. You can choose to keep or delete the encrypted file foo.txt.aes.

How it works…

It is very common to have code written on both the source and target commands of a KM (LKMs and multi-technology IKMs) or in a procedure step. One technique that is not necessarily well known is to use the source technology as a placeholder for connectivity information that can be used in the code on the target side. In our example, the technology on the target side is OS Command, which does not give us much flexibility in terms of configuration and connectivity parameters. So, we use an artificial source connection to point to the data server that contains the information we need, and we extract the required parameters with the odiRef.getInfo substitution method. In addition, ODI makes sure that the password that we retrieve and pass to the external tool is never revealed in the operator logs, because it is encapsulated with the <@ ... @> syntax.

There’s more…

The selection of logical schemas in the source and target connections allows us to leverage any of the parameters defined in either connection. If we were to establish a VPN connection, for instance, we could leverage the Host (Data Server) entry to retrieve the name of the remote server, then retrieve the username and password to authenticate against this server. When dealing with external components that require this type of information, think of leveraging Topology to store the connectivity information securely.
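
As a sketch of this idea (not part of the recipe), a Command on Target step using the OS Command technology could authenticate a hypothetical command-line VPN client with credentials stored in Topology. The vpnclient program and its flags are placeholders; SRC_USER_NAME and SRC_PASS are standard odiRef.getInfo parameters, and the <@ ... @> syntax keeps the password out of the logs:

vpnclient connect -u <%=odiRef.getInfo("SRC_USER_NAME")%> -p <@=odiRef.getInfo("SRC_PASS")@>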

The additional benefit is that if you leverage ODI contexts, the same logic will work across all environments, from development to production.

Tuning a standalone ODI agent

An ODI agent orchestrates all ODI executions. As such, it is a central piece of the infrastructure. Properly tuning the agent and understanding what requires tuning will help you better manage your infrastructure.

Getting ready

All we need for this recipe is to have a standalone agent already installed on one of your systems. If you do not have an agent available, you can run the ODI installer and select the ODI Standalone Agent option. For the recipe that follows, we assume that the agent was installed on a Windows machine, in the folder c:\oracledi\products\11.1.1.6. If your agent is installed in a different directory or on a different operating system, keep this in mind as you follow these steps.

How to do it…

  1. Go to the bin directory of the agent:

    c:\oracledi\products\11.1.1.6\oracledi\agent\bin

  2. Edit the file odiparams.bat (you will have to edit the odiparams.sh file on Linux or Unix systems).

  3. In the file, identify the parameters ODI_INIT_HEAP and ODI_MAX_HEAP. You will notice that the default values are 32 and 256 respectively. Double these values to 64 and 512, then save the file (see the sketch after these steps).

  4. If your agent is already running, you will have to stop and restart the agent to take these new values into consideration.
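
As a reference point, the relevant lines in odiparams.bat look roughly like the following before-and-after sketch. The exact syntax, and whether the values carry an m suffix, can vary between ODI versions, so check your own file:

    rem Default values shipped with the agent
    set ODI_INIT_HEAP=32m
    set ODI_MAX_HEAP=256m

    rem Doubled values
    set ODI_INIT_HEAP=64m
    set ODI_MAX_HEAP=512m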

How it works…

The parameters we have modified control how much memory is initially allocated to the ODI agent (ODI_INIT_HEAP) and the maximum amount of memory that the agent can request (ODI_MAX_HEAP). These parameters are very important because they control the amount of memory available to the agent. Most actions performed by the agent run in memory:

  • When data is fetched with a JDBC connection, this data will be transferred through the agent memory space

  • If you use the ODI memory engine, this in-memory database will actually use the memory space of the agent

  • The default behavior of the ODI JDBC driver for XML is to load data into the in-memory database, and as such, it uses more of the agent’s allocated memory

These operations are in addition to the orchestrations that the agent always performs:

  • Reading the scenarios from the repository

  • Completing the scenario code according to the context selected for the execution of that scenario

  • Updating the logs with execution results and statistics

The more operations you ask the agent to run in memory, the more memory you will need to allocate to it.

Keep in mind that if you are running several agents on the same server, each agent will have its own memory space. This can be useful to segregate memory intensive operations, but this will also use more of the available memory on that server.

The same will be true with JEE agents, but in this case, the memory parameters will be part of the configuration of the server itself. Refer to your WebLogic Server documentation for more details.

When we are using JDBC to connect to the source and target servers, we can influence how much memory will be used to transfer data from source to target. If you edit any of the data servers in the Topology navigator, you will see two parameters at the bottom of the window: Array Fetch Size and Batch Update Size.

The JDBC protocol is defined so that records are processed in limited batches to make sure that memory is not exhausted immediately by very large queries. By default, these two parameters have a value of 30 when ODI ships, which means that JDBC will process the records 30 at a time. By increasing the value of these parameters, you can improve performance by retrieving more records each time. Just keep in mind that by doing so, you are using more of the agent's memory.

When changing the values of the Array Fetch Size and Batch Update Size parameters, it is recommended to have the same value for the Array Fetch Size on the source data server and the Batch Update Size on the target data server. Different values can result in buffering at the agent level, which can be counter-productive in terms of performance.

There’s more…

Increasing the agent’s memory parameters will only work as long as there is enough memory available on the server hosting the agent. Before looking into increasing the value of these parameters, we should always try to use less memory. Techniques that do not leverage the memory space of the agent usually have better performance, if only because the agent does not have to handle the data anymore and simply behaves as an orchestrator:

  • Instead of using JDBC, try and use database loading utilities: external tables, sqlldr, and data pump are some examples available on Oracle databases. Similar utilities are available on other databases: ODI ships out of the box with KMs that support most of these utilities.

  • The in-memory engine has one large drawback: it runs in memory, and as such is more limited than actual databases. Let’s be clear: it will perform well only as long as there is enough physical memory available. After that, we are talking about degraded performance, as memory blocks are swapped to disk to leverage virtual memory. You are usually better off using an actual database, and databases today do cache data in memory when it makes sense to do so.

  • If you are handling very large XML files that cannot fit easily in memory, use the driver’s db_props property and point to a .properties file that contains all the necessary parameters to connect to an external database. The benefit of this approach is that it allows you to process many more files in parallel (files processed in parallel all share the same agent memory space), as well as much bigger files. You can look back to Chapter 9, XML and Web Services, where this topic is discussed in detail.

Loading a file containing a LOB

Loading large objects always requires special considerations. Here, we will create a multi-technology IKM (that is, an IKM that connects to a remote source data server) that loads CLOBs and BLOBs using an external table definition.

Getting ready

For this recipe, we will need to create three files on disk:

  • CLOB.TXT: use Notepad and write This is a CLOB in this file. Save it on disk in your c:\temp directory.

  • BLOB.DOC: use a word processor program and create this file. Write this is a BLOB in the file and save it in your c:\temp directory.

  • Use notepad and create the file DATA.TXT with the following record:

    "Sample record with CLOB and BLOB", "CLOB.TXT", "BLOB.DOC"

  • Save this file in your c:\temp directory.

  • Create a table to load the LOBs in your database:

    Create table LOB_SAMPLE(
    Description VARCHAR2(100),
    CLOB_Data CLOB,
    BLOB_Data BLOB
    );

  • You will have to reverse engineer the file DATA.TXT in a file model. Define the file with no header, use the comma as the field separator, and use the following names for the three columns: Description, Clob_File, and Blob_File. Use the double quote character (") for the text delimiter.

  • You will have to reverse engineer the LOB_SAMPLE table.

  • You will need a project where the KM IKM SQL Control Append has been imported.

How to do it…

  1. Make a copy of IKM SQL Control Append and rename the new KM IKM File SQL Append (LOB).

  2. Expand the KM in the tree and remove all the options except for INSERT, COMMIT, and DELETE_TEMPORARY_OBJECTS.

  3. Edit the IKM. In the Definition tab, select the option Multi-Connections. Then, set the source Technology to File and the target Technology to Oracle.

  4. We will simplify the IKM, so delete all the steps except for the following ones:

    • Drop flow table

    • Create flow table I$

    • Insert new rows

    • Commit Transaction

    • Drop flow table

  5. Add a new step, name this step Create Oracle Directory, and copy this code:

    create or replace directory dat_dir AS '<%=snpRef.getSrcTablesList("", "[SCHEMA]", "", "")%>'

    Move this step up to make it the second step in the list after the first Drop flow table.

  6. Edit the two steps named Drop flow table. At the very bottom of the step details, expand the Option entry and select Always Execute: in the original IKM, these steps were conditioned by the FLOW_CONTROL option, which we have removed.

  7. Edit the step Create Flow table I$. At the very bottom of the step details, expand the Option entry and select Always Execute. Then replace the original code with this:

    create table <%=odiRef.getTable("L", "INT_NAME", "A")%>
    (
    <%=snpRef.getColList("", "[COL_NAME]\t[DEST_WRI_DT]", ",\n\t", "", "")%>
    )
    ORGANIZATION EXTERNAL
    (
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY dat_dir
    ACCESS PARAMETERS
    (
    RECORDS DELIMITED BY NEWLINE
    CHARACTERSET "WE8ISO8859P1"
    BADFILE '<%=snpRef.getSrcTablesList("", "[RES_NAME]", "", "")%>.bad'
    LOGFILE '<%=snpRef.getSrcTablesList("", "[RES_NAME]", "", "")%>.log'
    DISCARDFILE '<%=snpRef.getSrcTablesList("", "[RES_NAME]", "", "")%>.dsc'
    SKIP <%=snpRef.getSrcTablesList("", "[FILE_FIRST_ROW]", "", "")%>
    FIELDS TERMINATED BY '<%=snpRef.getSrcTablesList("", "[SFILE_SEP_FIELD]", "", "")%>'
    <% if (snpRef.getSrcTablesList("", "[FILE_ENC_FIELD]", "", "").equals("")) {%>
    <%} else {%>OPTIONALLY ENCLOSED BY '<%=snpRef.getSrcTablesList("", "[FILE_ENC_FIELD]", "", "")%>' AND '<%=snpRef.getSrcTablesList("", "[FILE_ENC_FIELD]", "", "")%>' <%}%>
    MISSING FIELD VALUES ARE NULL
    (
    <%=snpRef.getColList("", " " + "<? if (\u0022[DEST_WRI_DT]\u0022.equals(\u0022CLOB\u0022)) {?> [EXPRESSION] <?} else if (\u0022[DEST_WRI_DT]\u0022.equals(\u0022BLOB\u0022)) {?> [EXPRESSION] <?} else {?>[COL_NAME] <?}?>" + "CHAR([LONGC])", ",\n\t\t\t", "", "")%>
    )
    COLUMN TRANSFORMS (<%=odiRef.getColList("", "[COL_NAME] from LOBFILE ([EXPRESSION]) from (dat_dir) CLOB", ",/n", "", "UD1")%>
    <%=odiRef.getColList(",", "[COL_NAME] from LOBFILE ([EXPRESSION]) from (dat_dir) BLOB", ",/n", "", "UD2")%>)
    )
    LOCATION (<%=snpRef.getSrcTablesList("", "'[RES_NAME]'", "", "")%>)
    )
    PARALLEL 2
    REJECT LIMIT UNLIMITED

  8. Simplify the code of the step Insert new rows to only keep the following:

    insert into <%=odiRef.getTable("L","TARG_NAME","A")%>
    (
    <%=odiRef.getColList("", "[COL_NAME]", ",\n\t", "", "((INS and !TRG) and REW)")%>
    <%=odiRef.getColList(",", "[COL_NAME]", ",\n\t", "", "((INS and TRG) and REW)")%>
    )
    select <%=odiRef.getColList("", "[COL_NAME]", ",\n\t", "", "((INS and !TRG) and REW)")%>
    <%=odiRef.getColList(",", "[EXPRESSION]", ",\n\t", "", "((INS and TRG) and REW)")%>
    from <%=odiRef.getTable("L","INT_NAME","A")%>

  9. Now save the IKM and create a new interface called Load LOBs, using the file DATA.TXT as a source and the table LOB_SAMPLE as a target. In the Overview tab, select Staging Area Different From Target and select the File logical schema that points to the DATA.TXT file in the schema drop-down (no worries, we will not actually create anything in the staging area). Then, map the columns as follows:

    Source columns    Target Columns    Mappings
    Description       Description       DAT.Description
    Clob_File         CLOB_Data         Clob_File
    Blob_File         BLOB_Data         Blob_File

    We have removed the alias name from the mappings of both LOBs, using Clob_File and Blob_File instead of DAT.Clob_File and DAT.Blob_File; otherwise, this would generate invalid code for the external table definition used in the KM.

  10. In the QuickEdit tab, select the option UD1 for the CLOB column, and the option UD2 for the BLOB column.

  11. In the Flow tab, click on the target data server to select the IKM File SQL Append (LOB).

  12. Save and run the interface. You can check for successful execution in the operator navigator. If you run a select statement against the table LOB_SAMPLE, you can confirm that you have just loaded a CLOB and a BLOB.

How it works…

In this recipe, we are taking advantage of multiple elements that are at our disposal when writing KMs.

Multi-technology IKMs can only be used when the staging area is not on the target server, hence their name, since they can connect to a different technology to access the staging area.

By forcing the staging area on the source data server, we eliminate the need for an LKM; source and staging are now on the same server.

As a result, by using this multi-technology IKM, we bypass the creation of both C$ and I$ staging tables. First, since we do not have an LKM, there is no C$ table. Second, since we use an external table to map the file into the database, the data will go straight from the file into the target table, removing the need for an I$ table. One thing to remember with such IKMs though: you must use the source technology as your staging area as we did in step 9 of this recipe. This can be quite counter-intuitive when you are using a flat file as your source, but since we are not creating any staging table, we are safe doing so.

Next, because of the specific nature of CLOBs and BLOBs, we need to use the names of the columns of both the target table and the source table to generate the proper code that will create the external table (this is the reason why we removed the alias name from the mappings in step 9). This allows us to leverage the tags [COL_NAME] and [EXPRESSION] to retrieve the target and source column names respectively, as we did in step 7:

COLUMN TRANSFORMS (<%=odiRef.getColList("", "[COL_NAME] from LOBFILE ([EXPRESSION]) from (dat_dir) CLOB", ",/n", "", "UD1")%>

Finally, we take advantage of the flags available in the QuickEdit view to explicitly mark the CLOB and BLOB columns as UD1 and UD2, so that the proper code from the KM can be applied to these columns specifically.

There’s more…

We have over-simplified the KM to focus on the load of the LOBs, but the techniques used here can be expanded upon. By relying on other techniques described in this book, you can avoid the manipulation of the aliases in the mappings, for instance. This could have been done by using the .replace() function available in Java, as in the sketch below.
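
For example, a hedged sketch of that idea (assuming the source alias is DAT, as in this recipe) would strip the alias directly in the KM code with the standard Java String.replace() method, instead of editing the mappings:

<%=odiRef.getColList("", "[COL_NAME] from LOBFILE ([EXPRESSION]) from (dat_dir) CLOB", ",/n", "", "UD1").replace("DAT.", "")%>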

Likewise, listing the CLOBs and BLOBs in the COLUMN TRANSFORMS section of the external table definition could have been handled with a for loop.

Several KMs have been written by the ODI community. A Google search on ODI and LOB returns too many results for us to list them all here, but they are worth a look if you are interested in this subject. A good starting point is http://java.net/projects/oracledi, where KMs are shared by the ODI developers community.
