
Data transformation with External Tables

One of the main uses of External Tables is their support for the ETL process: they allow the user to load data that is transformed into the target format on the fly, without an intermediate staging table.

Let’s read an External Table named REGION_REVENUE, which holds a COUNTRY column and one revenue amount per month (JAN through DEC).

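A minimal definition for such a table might look as follows (the directory name EXTTABDIR, the file name region_revenue.dat, and the column precisions are assumptions used only for illustration):

create table REGION_REVENUE (
  COUNTRY VARCHAR2(16),
  JAN NUMBER(3),
  FEB NUMBER(3),
  MAR NUMBER(3),
  APR NUMBER(3),
  MAY NUMBER(3),
  JUN NUMBER(3),
  JUL NUMBER(3),
  AUG NUMBER(3),
  SEP NUMBER(3),
  OCT NUMBER(3),
  NOV NUMBER(3),
  DEC NUMBER(3)
)
organization external (
  type ORACLE_LOADER
  default directory EXTTABDIR
  access parameters (
    records delimited by newline
    fields terminated by ','
  )
  location ('region_revenue.dat')
)
reject limit unlimited;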

This data can be loaded in a single command to multiple tables. Let’s create several tables with the same structure:

SQL> desc amount_jan
 Name                            Null?    Type
 ------------------------------- -------- ----------------------------
 REGION                                   VARCHAR2(16)
 AMOUNT                                   NUMBER(3)
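
All twelve monthly tables (AMOUNT_JAN through AMOUNT_DEC) share this structure; one possible way to create them is to define AMOUNT_JAN and clone its structure for the remaining months:

create table AMOUNT_JAN (
  REGION VARCHAR2(16),
  AMOUNT NUMBER(3)
);
create table AMOUNT_FEB as select * from AMOUNT_JAN where 1 = 0;
create table AMOUNT_MAR as select * from AMOUNT_JAN where 1 = 0;
create table AMOUNT_APR as select * from AMOUNT_JAN where 1 = 0;
create table AMOUNT_MAY as select * from AMOUNT_JAN where 1 = 0;
create table AMOUNT_JUN as select * from AMOUNT_JAN where 1 = 0;
create table AMOUNT_JUL as select * from AMOUNT_JAN where 1 = 0;
create table AMOUNT_AUG as select * from AMOUNT_JAN where 1 = 0;
create table AMOUNT_SEP as select * from AMOUNT_JAN where 1 = 0;
create table AMOUNT_OCT as select * from AMOUNT_JAN where 1 = 0;
create table AMOUNT_NOV as select * from AMOUNT_JAN where 1 = 0;
create table AMOUNT_DEC as select * from AMOUNT_JAN where 1 = 0;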

Now issue the command to send the data from the external table to the different tables.

INSERT ALL
INTO AMOUNT_JAN (REGION, AMOUNT) VALUES(COUNTRY, JAN)
INTO AMOUNT_FEB (REGION, AMOUNT) VALUES(COUNTRY, FEB)
INTO AMOUNT_MAR (REGION, AMOUNT) VALUES(COUNTRY, MAR)
INTO AMOUNT_APR (REGION, AMOUNT) VALUES(COUNTRY, APR)
INTO AMOUNT_MAY (REGION, AMOUNT) VALUES(COUNTRY, MAY)
INTO AMOUNT_JUN (REGION, AMOUNT) VALUES(COUNTRY, JUN)
INTO AMOUNT_JUL (REGION, AMOUNT) VALUES(COUNTRY, JUL)
INTO AMOUNT_AUG (REGION, AMOUNT) VALUES(COUNTRY, AUG)
INTO AMOUNT_SEP (REGION, AMOUNT) VALUES(COUNTRY, SEP)
INTO AMOUNT_OCT (REGION, AMOUNT) VALUES(COUNTRY, OCT)
INTO AMOUNT_NOV (REGION, AMOUNT) VALUES(COUNTRY, NOV)
INTO AMOUNT_DEC (REGION, AMOUNT) VALUES(COUNTRY, DEC)
SELECT COUNTRY,
JAN,
FEB,
MAR,
APR,
MAY,
JUN,
JUL,
AUG,
SEP,
OCT,
NOV,
DEC
FROM REGION_REVENUE;

In the next example, we will perform a conditional insert into different tables depending on the value of the amount column. We will first create two tables, one for low amounts and another for high amounts:

SQL> create table low_amount(
2 region varchar2(16),
3 month number(2),
4 amount number(3));
Table created.
SQL> create table high_amount as select * from low_amount;
Table created.

Now we can read the External Table and have the data inserted conditionally into one of two mutually exclusive targets.

INSERT ALL
WHEN ( JAN <= 500 ) THEN
INTO LOW_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '01', JAN )
WHEN ( FEB <= 500 ) THEN
INTO LOW_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '02', FEB )
WHEN ( MAR <= 500 ) THEN
INTO LOW_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '03', MAR )
WHEN ( APR <= 500 ) THEN
INTO LOW_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '04', APR )
WHEN ( MAY <= 500 ) THEN
INTO LOW_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '05', MAY )
WHEN ( JUN <= 500 ) THEN
INTO LOW_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '06', JUN )
WHEN ( JUL <= 500 ) THEN
INTO LOW_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '07', JUL )
WHEN ( AUG <= 500 ) THEN
INTO LOW_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '08', AUG )
WHEN ( SEP <= 500 ) THEN
INTO LOW_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '09', SEP )
WHEN ( OCT <= 500 ) THEN
INTO LOW_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '10', OCT )
WHEN ( NOV <= 500 ) THEN
INTO LOW_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '11', NOV )
WHEN ( DEC <= 500 ) THEN
INTO LOW_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '12', DEC )
WHEN ( JAN > 500 ) THEN
INTO HIGH_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '01', JAN )
WHEN ( FEB > 500 ) THEN
INTO HIGH_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '02', FEB )
WHEN ( MAR > 500 ) THEN
INTO HIGH_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '03', MAR )
WHEN ( APR > 500 ) THEN
INTO HIGH_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '04', APR )
WHEN ( MAY > 500 ) THEN
INTO HIGH_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '05', MAY )
WHEN ( JUN > 500 ) THEN
INTO HIGH_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '06', JUN )
WHEN ( JUL > 500 ) THEN
INTO HIGH_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '07', JUL )
WHEN ( AUG > 500 ) THEN
INTO HIGH_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '08', AUG )
WHEN ( SEP > 500 ) THEN
INTO HIGH_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '09', SEP )
WHEN ( OCT > 500 ) THEN
INTO HIGH_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '10', OCT )
WHEN ( NOV > 500 ) THEN
INTO HIGH_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '11', NOV )
WHEN ( DEC > 500 ) THEN
INTO HIGH_AMOUNT( REGION, MONTH, AMOUNT)
VALUES ( COUNTRY, '12', DEC )
SELECT COUNTRY,
JAN,
FEB,
MAR,
APR,
MAY,
JUN,
JUL,
AUG,
SEP,
OCT,
NOV,
DEC
FROM REGION_REVENUE;
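
A quick optional check (not part of the original example) is to count how the rows were distributed between the two targets:

select 'LOW_AMOUNT' source_table, count(*) row_count from LOW_AMOUNT
union all
select 'HIGH_AMOUNT' source_table, count(*) row_count from HIGH_AMOUNT;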

Extending the alert.log analysis with External Tables

Reading the alert.log from within the database is a useful technique that can help you find any outstanding error messages reported in this file.
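
The table definition below assumes a directory object named BACKGROUND_DUMP_DEST that points to the instance's background dump destination. If it does not exist yet, it can be created first; the path used here is only an example, so look up the actual value of the background_dump_dest initialization parameter:

-- find the actual location of the alert.log
select value from v$parameter where name = 'background_dump_dest';

-- create the directory object (example path, adjust to your environment)
create directory BACKGROUND_DUMP_DEST as '/u01/oracle/admin/beta/bdump';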

create table ALERT_LOG (
text_line varchar2(512)
)
organization external (
type ORACLE_LOADER
default directory BACKGROUND_DUMP_DEST
access parameters(
records delimited by newline
nobadfile
nodiscardfile
nologfile
)
location( 'alert_beta.log')
);

Once the External Table has been created, the alert.log file can be queried just like any other regular table.

SQL> select text_line from alert_log
2 where text_line like 'ORA-%';

TEXT_LINE
-----------------------------------------------------------------
ORA-1109 signalled during: ALTER DATABASE CLOSE NORMAL...
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '/u01/oracle/oradata/beta/redo01.log'
ORA-27037: unable to obtain file status
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/u01/oracle/oradata/beta/redo02.log'
ORA-27037: unable to obtain file status
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/u01/oracle/oradata/beta/redo03.log'
ORA-27037: unable to obtain file status

Querying the alert.log file at this point is useful just to browse its contents and look for basic ORA-% strings. The same result could be achieved through the alert.log link in Enterprise Manager (EM).


The alert.log file can also be browsed through EM, but only in an interactive mode, so you are limited to the preset alerts.


If further automated analysis is required, it is worth building a small alert analysis tool of our own. A global temporary table can hold the contents of the ALERT_LOG table, along with extra LINE_NO and TIMESTAMP columns, so it can be queried in detail in an EM-like manner.

create global temporary table TMP_ALERT_LOG (
LINE_NO NUMBER(6),
TIMESTAMP DATE,
TEXT_LINE VARCHAR2(512)
)
on commit preserve rows;

A bit of PL/SQL is needed to read the ALERT_LOG rows, derive the timestamp of each logical record, and insert the result into TMP_ALERT_LOG, where further queries can be run.

declare
  -- every alert.log line, numbered so the original order is preserved
  cursor alertLogCur is
    select ROWNUM line_no, TEXT_LINE
    from ALERT_LOG;
  currentDate date;
  testDay     varchar2(10);
begin
  currentDate := sysdate;

  for alertLogInst in alertLogCur loop
    -- fetch the row and determine whether it is a date line
    testDay := substr(alertLogInst.text_line, 1, 3);
    if testDay in ('Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat')
    then
      -- a date line sets the timestamp for the logical records that follow it
      currentDate := to_date(alertLogInst.text_line, 'Dy Mon DD HH24:MI:SS YYYY');
    end if;
    insert into TMP_ALERT_LOG
    values(
      alertLogInst.line_no,
      currentDate,
      alertLogInst.text_line
    );
  end loop;
end;
/

Because the contents of the alert.log end up in a global temporary table, more than one DBA can query it at the same time (each session sees only its own rows), and access to the table can be restricted just like any other table. There is no need to purge or maintain the table after the session ends, it can be indexed, and the procedure adds little overhead. Moreover, as this is a temporary object, minimal redo information is generated.

Once the external ALERT_LOG table and the temporary TMP_ALERT_LOG table have been created and populated, it is possible not only to filter by date (as Enterprise Manager does), but to run any query against the alert.log contents.

SELECT TIMESTAMP, TEXT_LINE
FROM TMP_ALERT_LOG
WHERE TIMESTAMP IN (
SELECT TIMESTAMP
FROM TMP_ALERT_LOG
WHERE TEXT_LINE LIKE 'ORA-%'
)
AND TIMESTAMP BETWEEN SYSDATE-30 AND SYSDATE
ORDER BY LINE_NO;


This concept can be taken further to look for specific error messages, analyze specific time frames, and perform drill-down analysis.
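
For instance, a small sketch of such a drill-down (the ORA-00600 error number is only an illustration) could count occurrences per day over the last month:

select trunc(TIMESTAMP) log_day,
       count(*) occurrences
from TMP_ALERT_LOG
where TEXT_LINE like 'ORA-00600%'
and TIMESTAMP between sysdate - 30 and sysdate
group by trunc(TIMESTAMP)
order by log_day;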

This procedure can be extended to read the trace files or any other text file from the database.

Reading the listener.log from the database

One particular extension of the above procedure is reading the listener.log file. This file has a star-delimited field format, which works to our advantage and makes it easy to read by means of the ORACLE_LOADER driver.

The file format is as follows:

21-JUL-2008 00:39:50 * (CONNECT_DATA=(SID=beta)(CID=(PROGRAM=perl)
(HOST=alpha.us.oracle.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)
(HOST=192.168.2.10)(PORT=8392)) * establish * beta * 0
21-JUL-2008 00:39:56 * (CONNECT_DATA=(SID=beta)(CID=(PROGRAM=perl)
(HOST=alpha.us.oracle.com)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)
(HOST=192.168.2.10)(PORT=8398)) * establish * beta * 0
21-JUL-2008 00:40:16 * service_update * beta * 0
21-JUL-2008 00:41:19 * service_update * beta * 0
21-JUL-2008 00:44:43 * ping * 0

The file has a format that can be deduced from the above data sample:

TIMESTAMP * CONNECT DATA [* PROTOCOL INFO] * EVENT [* SID] * RETURN CODE

As you can see, even though this format is structured, the number of fields may vary from line to line, so this must be taken into account at load time.

In order to map this file into the database, the External Table definition must allow for the variable number of fields; rows that cannot be mapped to this structure will simply be rejected, and since this is an External Table there is no additional transactional overhead.

First, let’s create a directory object that points to $ORACLE_HOME/network/log (note that Oracle does not expand environment variables in directory paths, so supply the actual operating system path when running this):

create directory NETWORK_LOG_DIR
as '$ORACLE_HOME/network/log';

Now, let’s create the external table:

create table LISTENER_LOG (
TIMESTAMP date,
CONNECT_DATA varchar2(2048),
PROTOCOL_INFO varchar2(64),
EVENT varchar2(64),
SID varchar2(64),
RETURN_CODE number(5)
)
organization external (
type ORACLE_LOADER
default directory NETWORK_LOG_DIR
access parameters (
records delimited by NEWLINE
nobadfile
nodiscardfile
nologfile
fields terminated by "*" LDRTRIM
reject rows with all null fields
(
"TIMESTAMP" char date_format DATE mask "DD-MON-YYYY HH24:MI:SS ",
"CONNECT_DATA",
"PROTOCOL_INFO",
"EVENT",
"SID",
"RETURN_CODE"
)
)
location ('listener.log')
)
reject limit unlimited;

Only the structure of interest is specified above, so several rows will be rejected; since the file is not fully structured, it also contains some unformatted information. For that reason the bad file and the log file are not meaningful in this context and have been disabled.

Another application of the LISTENER_LOG External Table is usage trend analysis. This query can be issued to detect usage peak hours.

SQL> select to_char(round(TIMESTAMP, 'HH'), 'HH24:MI') HOUR,
2 lpad('#', count(*), '#') CX
3 from listener_log
4 group by round(TIMESTAMP, 'HH')
5 order by 1;

HOUR CX
----- ------------------------------------------------
14:00 ###
15:00 ##########################
16:00 ######################
17:00 #####################
18:00 #####################
19:00 ###############

Reading the listener.log file this way not only allows the DBA to keep track of the listener’s behavior, it also allows a security administrator to easily spot hacking attempts.

Let’s find out who is trying to access the database with sqlplus.exe.

SQL> select timestamp, protocol_info
2 from listener_log
3 where connect_data like '%sqlplus.exe%'
4 /

TIMESTAMP PROTOCOL_INFO
-------------------- --------------------------------------------------------
01-SEP-2008 14:30:37 (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.101)(PORT=3651))
01-SEP-2008 14:31:08 (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.101)(PORT=3666))
01-SEP-2008 14:31:35 (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.2.101)(PORT=3681))

Using External Tables to analyze the listener.log not only provides an in-database version of the file on which periodic and programmatic analysis of listener behavior can be performed; it also makes it possible to determine usage trends and to correlate information with the audit team, so that unauthorized connection programs can be spotted easily and quickly. Further useful applications can be found by reading the listener.log file. Two of its fields must be parsed further to get information out of them; full parsing goes beyond the scope of this article (a small sketch follows the lists below), but the structure that the analysis should consider is detailed next:

Connect String

  1. SID: The Oracle SID of the target database; it is populated if the connection was performed by SID, otherwise it is NULL.
  2. CID: It contains two subfields, PROGRAM and HOST.
  3. SERVER: This field indicates the connection type, either dedicated or shared.
  4. SERVICE_NAME: This field is populated when the connection is performed by a Service instead of SID.
  5. COMMAND: The command issued by the user.
  6. SERVICE: Present only when listener commands are issued.
  7. FAILOVER_MODE: In Real Application Clusters (RAC) environments this field is used if the client performed a connection due to a failover. It shows the failover mode used.

Protocol

  1. PROTOCOL: Indicates the protocol used to perform the connection; this will be TCP most of the time.
  2. HOST: This is the client’s IP Address.
  3. PORT: The port number on the Oracle server used to establish the connection.
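
As an illustration only (not a full parser), the PROGRAM and HOST subfields of CID can be extracted from the CONNECT_DATA field with regular expressions; the pattern below simply takes everything up to the closing parenthesis of each subfield:

select TIMESTAMP,
       replace(regexp_substr(CONNECT_DATA, 'PROGRAM=[^)]*'), 'PROGRAM=') program,
       replace(regexp_substr(CONNECT_DATA, 'HOST=[^)]*'), 'HOST=') host
from LISTENER_LOG
where CONNECT_DATA like '%CID=%';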

Mapping XML files as External Tables

XML has become a de facto information exchange format, which is why Oracle has included the XML Database (XDB) feature since release 9.2.0. However, XDB requires the data to actually be loaded into the database before it can be processed, whereas an External Table allows the user to take a quick look at the contents of the external file prior to any further processing.

In this example an External Table is created over an XML file. The file is read into a CLOB column, and XDB functions can then be issued against the external XML content to extract and view data.

Let’s create the External Table first:

create table EMPLOYEES_XML (xmlFile CLOB)
organization external (
type ORACLE_LOADER
default directory EXTTABDIR
access parameters (
fields (xmllob char terminated by ',')
column transforms (xmlFile from lobfile(xmllob))
)
location('employees.dat')
)
reject limit unlimited;

The employees.dat file contains the name of the XML file to be loaded as an external CLOB; for the purpose of this demo it holds a single entry, the file name employees.xml.

Now the file can be queried from the database as if it were a regular table with a single XML column.

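For instance, assuming the XML document has a structure such as <EMPLOYEES><EMP><ENAME>...</ENAME></EMP></EMPLOYEES> (the element names here are only assumptions for illustration), the employee names could be listed with XDB functions:

select extractvalue(value(e), '/EMP/ENAME') ename
from EMPLOYEES_XML x,
     table(xmlsequence(extract(xmltype(x.xmlFile), '/EMPLOYEES/EMP'))) e;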

Dynamically changing the external reference

When managing External Tables, there should be an easy way to redefine the external source file; it is enough to change the External Table’s LOCATION property by means of an ALTER TABLE command. Let’s create a stored procedure, named change_external_table, that performs this task through dynamically generated DDL. Using a stored program unit is a flexible way to perform the redefinition.

create or replace procedure change_external_table
( p_table_name in varchar2
, p_file_name  in varchar2
) is
begin
  -- redefine the LOCATION property of the External Table
  execute immediate 'alter table '
    || p_table_name
    || ' location ('''
    || p_file_name
    || ''')';
exception
  when others then
    -- re-raise with a user-defined error number
    -- (raise_application_error only accepts -20000 .. -20999)
    raise_application_error(-20001, sqlerrm);
end;
/
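
For example, to point the ALERT_LOG table defined earlier at a different file (the file name alert_gamma.log is just an illustration):

exec change_external_table('ALERT_LOG', 'alert_gamma.log')

select count(*) from ALERT_LOG;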

Oracle 11g External Table enhancements

External Tables work essentially the same way in 10g and 11g. One difference worth noting concerns Data Pump External Tables: in 10g, a single defective row aborts the whole read of the data set, whereas an 11g enhancement prevents the load from aborting, saving reprocessing time.

Summary

Managing data with External Tables is not only a means of mapping external flat files as regular (though limited) tables inside the database; it is also a tool for performing administrative tasks more efficiently, such as programmatically processing database log files like alert.log or listener.log. It can be used to view external XML-formatted files from inside the database without actually loading them, and as a means of unloading data to external storage in order to exchange data among different Oracle versions. This last feature allows the user to build an Oracle data mart that pre-formats and summarizes data from the source so it can be inserted directly into the target data warehouse.
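
As an illustration of the unloading capability mentioned above, a table can be written out to a Data Pump file through the ORACLE_DATAPUMP driver (a minimal sketch; the table, directory, and file names are assumptions):

create table REGION_REVENUE_UNLOAD
organization external (
  type ORACLE_DATAPUMP
  default directory EXTTABDIR
  location ('region_revenue_unload.dmp')
)
as select * from REGION_REVENUE;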
