In this two-part article by Hector R. Madrid, we will learn about External Tables in the Oracle 10g/11g Database. When working in data warehouse environments, the Extraction, Transformation, and Loading (ETL) cycle frequently requires the user to load information from external sources in plain file format, or to perform data transfers among Oracle databases in a proprietary format. This requires the user to create control files to perform the load. As the format of the source data often doesn’t fit the one required by the Data Warehouse, a common practice is to create stage tables that load data into the database, and then several queries that perform the transformation from that point on, taking the data to its final destination.

A better approach would be to perform this transformation ‘on the fly’ at load time. That is what External Tables are for. They are basically external files, managed by means of either the SQL*Loader or the data pump driver, which, from the database perspective, can be seen as if they were regular read-only tables.

This format allows the user to think about the data source as if the data was already loaded into its stage table. This lets the user concentrate on the queries to perform the transformation, thus saving precious time during the load phase.

The basics of External Tables in Oracle 10g/11g

An External Table is basically a file that resides on the server side, as a regular flat file or as a data pump formatted file. The External Table is not a table itself; it is an external file described to the database by an Oracle format and a physical location. This feature first appeared back in Oracle 9i Release 1 and it was intended as a way of enhancing the ETL process by reading an external flat file as if it were a regular Oracle table. On its initial release it was only possible to create read-only External Tables, but, starting with 10g, it is possible to unload data to External Tables too.

Prior to 10g, only the SQL*Loader driver could be used to read an External Table, but from 10g onwards it is also possible to load the table by means of the data pump driver. The kind of driver that will be used to read the External Table is defined at creation time. In the case of ORACLE_LOADER it is the same driver used by SQL*Loader. The flat files are loaded in the same way that a flat file is loaded to the database by means of the SQL*Loader utility, and the creation script can be built from a SQL*Loader control file. In fact, most of the keywords that are valid for data loading are also valid to read an external flat file table.

The main differences between SQL*Loader and External Tables are:

  • When there are several input datafiles SQL*Loader will generate a bad file and a discard file for each datafile.
  • The CONTINUEIF and CONCATENATE keywords are not supported by External Tables.
  • The GRAPHIC, GRAPHIC EXTERNAL, and VARGRAPHIC data types are not supported for External Tables.
  • LONG, nested tables, VARRAY, REF, primary key REF, and SID are not supported.
  • For fields in External Tables the character set, decimal separator, date mask and other locale settings are determined by the database NLS settings.
  • The use of the backslash escape character is allowed for SQL*Loader, but for External Tables it would raise an error; External Tables must use quotation marks instead. For example:
    SQL*Loader
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
    External Tables
    TERMINATED BY ',' ENCLOSED BY "'"

A second driver is available, the ORACLE_DATAPUMP access driver, which uses the Data Pump technology to read the table and unload data to an External Table. This driver allows the user to perform a logical backup that can later be read back to the database without actually loading the data. The ORACLE_DATAPUMP access driver utilizes a proprietary binary format for the external file, so it is not possible to view it as a flat file.

Let’s set up the environment

Let’s create the demonstration user, and prepare its environment to create an External Table. The example that will be developed first refers to the External Table using the ORACLE_LOADER driver.

create user EXTTABDEMO
	identified by ORACLE
	default tablespace USERS;

alter user exttabdemo
	quota unlimited on users;

grant		CREATE SESSION,
			CREATE TABLE,
			CREATE PROCEDURE,
			CREATE MATERIALIZED VIEW,
			ALTER SESSION,
			CREATE VIEW,
			CREATE ANY DIRECTORY
to EXTTABDEMO;

A simple formatted spool from this query will generate the required external table demonstration data. The original source table is the demonstration HR.EMPLOYEES table.

select
   EMPLOYEE_ID ||','||
   DEPARTMENT_ID ||','||
   FIRST_NAME ||','||
   LAST_NAME ||','||
   PHONE_NUMBER ||','||
   HIRE_DATE ||','||
   JOB_ID ||','||
   SALARY ||','||
   COMMISSION_PCT ||','||
   MANAGER_ID ||','||
   EMAIL
from       HR.EMPLOYEES
order by EMPLOYEE_ID;

The above query will produce the following sample data:

[Figure: sample comma-separated rows produced by the query]
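The flat file itself can be produced with a simple SQL*Plus spool. The following is a minimal sketch; the file name and the formatting settings are assumptions, so adjust them to your environment:

```sql
-- Minimal SQL*Plus spool sketch (settings and file name are assumptions):
set heading off
set feedback off
set pagesize 0
set linesize 200
spool employees.txt
-- run the concatenation query shown above at this point
spool off
```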

The External Table directory is defined inside the database by means of a DIRECTORY object. This object is not validated at creation time, so the user must make sure the physical directory exists and the oracle OS user has read/write privileges on it.

$ mkdir $HOME/external_table_dest

SQL> CREATE DIRECTORY EXTTABDIR AS '/home/oracle/external_table_dest';

The above example was developed in a Linux environment; on Windows platforms the paths will need to be changed to correctly reflect how Oracle has been set up.
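Note that if the DIRECTORY object is created by a user other than the one reading the External Table, read and write privileges on it must be granted explicitly. A sketch, assuming the names from this example:

```sql
-- Only needed when EXTTABDIR is owned by another user (e.g. SYS):
GRANT READ, WRITE ON DIRECTORY EXTTABDIR TO exttabdemo;
```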

Now, the first External Table can be created.

A basic External Table

Here is the source code of the External Table creation.

[Figure: annotated External Table creation script, labels (A) through (H)]

The create table command syntax is just like any other regular table creation (A), (B), up to the point where the ORGANIZATION EXTERNAL (C) keyword appears; this is where the actual External Table definition starts. In this case the External Table is accessed by the ORACLE_LOADER driver (D). Next, the external flat file is defined, declaring the Oracle DIRECTORY (E) where the flat file resides. The ACCESS PARAMETERS (F) section specifies how to access the flat file: it declares whether the file has fixed or variable size records, and which other SQL*Loader loading options apply. The LOCATION (H) keyword defines the name of the external data file. It must be pointed out that, as this is an External Table managed by the ORACLE_LOADER driver, the ACCESS PARAMETERS section must be defined; in the case of External Tables based on the ORACLE_DATAPUMP driver this section is not required.
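A sketch of a creation script matching the description above; the column list and the field layout are assumptions based on the sample data generated earlier, not the exact script from the original figure:

```sql
create table ext_employees (            -- (A) table name, (B) column list
   employee_id     number(6),
   department_id   number(4),
   first_name      varchar2(20),
   last_name       varchar2(25),
   phone_number    varchar2(20),
   hire_date       date,
   job_id          varchar2(10),
   salary          number(8,2),
   commission_pct  number(2,2),
   manager_id      number(6),
   email           varchar2(25)
)
organization external (                 -- (C) start of the external definition
   type oracle_loader                   -- (D) access driver
   default directory EXTTABDIR          -- (E) where the flat file resides
   access parameters (                  -- (F) how to read the flat file
      records delimited by newline
      fields terminated by ','
      ( employee_id, department_id,     -- (G) fields listed by name only
        first_name, last_name, phone_number, hire_date,
        job_id, salary, commission_pct, manager_id, email )
   )
   location ('employees.txt')           -- (H) the external data file
);
```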

The columns are defined only by name (G), not by type. This is permitted from the SQL*Loader perspective, and allows for dynamic column definition. This column schema definition is more flexible, but it has a drawback: data formats such as those in DATE columns must match the database date format, otherwise the row will be rejected. There are ways to work around this requirement by defining the date format explicitly. Assuming the date column changes from its original default format mask “DD-MON-RR” to “DD-MM-RR”, then the column definition must change from a simple CHAR column to a DATE column with a format mask.

Original format:

"HIRE_DATE" CHAR(255)

Changed format:

"HIRE_DATE" DATE "DD-MM-RR"

When working with an External Table, the access parameters are not validated at creation time, so if there are malformed rows, or if the access parameters are improperly defined, an error similar to the one below is shown at query time.

ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEFETCH callout
ORA-30653: reject limit reached
ORA-06512: at "SYS.ORACLE_LOADER", line 52

Once the data file is in place and all required OS privileges have been properly validated, the data can be seen from inside the database, just as if it were a regular Oracle table.

[Figure: the External Table queried as a regular table]

This table is read only, so if the user attempts to perform any DML operation against it, it will result in this error:

SQL> delete ext_employees;
delete ext_employees
       *
ERROR at line 1:
ORA-30657: operation not supported on external organized table

As the error message clearly states, this kind of table is only useful for read only operations.

This kind of table doesn’t support most of the operations available for regular tables, such as index creation and statistics gathering; these operations will cause an ORA-30657 error too. The only access method available for External Tables is Full Table Scan, so there is no way to perform a selective data retrieval operation.
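For instance, attempting to create an index on the External Table is rejected with the same error (the index name below is hypothetical):

```sql
-- Index creation is not supported on external organized tables (ORA-30657):
create index ext_employees_idx on ext_employees (employee_id);
```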

External Tables cannot be recovered; they are just metadata definitions stored in the dictionary tables. The actual data resides in external files, and there is no way to protect them with the regular database backup routines, so it is the user’s sole responsibility to provide proper backup and data management procedures. At the database level the only kind of protection an External Table receives is at the metadata level, as it is an object stored as a definition in the database dictionary. As the data resides in the external data file, if it were by any means corrupted, altered, or somehow modified, there would be no way to get back the original data. If the external data file is lost, this may go unnoticed until the next SELECT operation takes place.

The metadata for an External Table is recorded in the {USER | ALL | DBA}_TABLES view, and as this table doesn’t actually require physical database storage, all storage related columns appear as null, as do the columns that relate to statistical information. External Tables are further described in the {USER | ALL | DBA}_EXTERNAL_TABLES view, where information such as the kind of access driver, the reject limit, and the access parameters, amongst others, is recorded.

SQL> DESC USER_EXTERNAL_TABLES
 Name                           Null?    Type
 ------------------------------ -------- --------------
 TABLE_NAME                     NOT NULL VARCHAR2(30)
 TYPE_OWNER                              CHAR(3)
 TYPE_NAME                      NOT NULL VARCHAR2(30)
 DEFAULT_DIRECTORY_OWNER                 CHAR(3)
 DEFAULT_DIRECTORY_NAME         NOT NULL VARCHAR2(30)
 REJECT_LIMIT                            VARCHAR2(40)
 ACCESS_TYPE                             VARCHAR2(7)
 ACCESS_PARAMETERS                       VARCHAR2(4000)
 PROPERTY                                VARCHAR2(10)
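A quick way to see these attributes for the External Tables created so far is to query the view directly; a sketch:

```sql
-- Show the access driver, directory, and reject limit per External Table:
select table_name, type_name, default_directory_name,
       reject_limit, access_type
  from user_external_tables;
```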

This is the first basic External Table, and as previously shown, its creation is pretty simple. It allows external data to be easily accessed from inside the database, allowing the user to see the external data just as if it was already loaded inside a regular stage table.

Creating External Table metadata, the easy way

To further illustrate the tight relationship between SQL*Loader and External Tables, the SQL*Loader tool may be used to generate a script that creates an External Table according to a pre-existing control file.

SQL*Loader has a command line option named EXTERNAL_TABLE, which can hold one of three different values: {NOT_USED | GENERATE_ONLY | EXECUTE}. If nothing is set, it defaults to NOT_USED.

This keyword is used to generate the script to create an External Table, and the options mean:

  • NOT_USED: This is the default option, and it means that no External Tables are to be used in this load.
  • GENERATE_ONLY: If this option is specified, then SQL*Loader will only read the definitions from the control file and generate the required commands, so the user can record them for later execution, or tailor them to fit his/her particular needs.
  • EXECUTE: This not only generates the External Table scripts, but also executes them. If the user requires a sequence, then the EXECUTE option will not only create the table, but it will also create the required sequence, deleting it once the data load is finished. This option performs the data load process against the specified target table by means of an External Table; it creates both the directory and the External Table, and inserts the data using an INSERT AS SELECT statement with the APPEND hint.

Let’s use the GENERATE_ONLY option to generate the External Table creation scripts:

$ sqlldr exttabdemo/oracle employees external_table=GENERATE_ONLY

By default the log file is located in a file whose extension is .log and its name equals that of the control file. By opening it we see, among the whole log processing information, this set of DDL commands:

CREATE TABLE "SYS_SQLLDR_X_EXT_EMPLOYEES"
(
  "EMPLOYEE_ID" NUMBER(6),
  "FIRST_NAME" VARCHAR2(20),
  "LAST_NAME" VARCHAR2(25),
  "EMAIL" VARCHAR2(25),
  "PHONE_NUMBER" VARCHAR2(20),
  "HIRE_DATE" DATE,
  "JOB_ID" VARCHAR2(10),
  "SALARY" NUMBER(8,2),
  "COMMISSION_PCT" NUMBER(2,2),
  "MANAGER_ID" NUMBER(6),
  "DEPARTMENT_ID" NUMBER(4)
)
ORGANIZATION external
(
  TYPE oracle_loader
  DEFAULT DIRECTORY EXTTABDIR
  ACCESS PARAMETERS
(
  RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
  BADFILE 'EXTTABDIR':'employees.bad'
  LOGFILE 'employees.log_xt'
  READSIZE 1048576
  FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"' LDRTRIM
  REJECT ROWS WITH ALL NULL FIELDS
  (
   "EMPLOYEE_ID" CHAR(255)
    TERMINATED BY "," OPTIONALLY ENCLOSED BY '"',
    "FIRST_NAME" CHAR(255)
    TERMINATED BY "," OPTIONALLY ENCLOSED BY '"',
    "LAST_NAME" CHAR(255)
    TERMINATED BY "," OPTIONALLY ENCLOSED BY '"',
    "EMAIL" CHAR(255)
    TERMINATED BY "," OPTIONALLY ENCLOSED BY '"',
    "PHONE_NUMBER" CHAR(255)
     TERMINATED BY "," OPTIONALLY ENCLOSED BY '"',
    "HIRE_DATE" CHAR(255)
     TERMINATED BY "," OPTIONALLY ENCLOSED BY '"',
    "JOB_ID" CHAR(255)
     TERMINATED BY "," OPTIONALLY ENCLOSED BY '"',
    "SALARY" CHAR(255)
     TERMINATED BY "," OPTIONALLY ENCLOSED BY '"',
    "COMMISSION_PCT" CHAR(255)
     TERMINATED BY "," OPTIONALLY ENCLOSED BY '"',
    "MANAGER_ID" CHAR(255)
     TERMINATED BY "," OPTIONALLY ENCLOSED BY '"',
    "DEPARTMENT_ID" CHAR(255)
     TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
  )
)
  location
  (
   'employees.txt'
  )
)

The more complete version is shown; some differences from the basic script are:

  • All the column definitions are set to CHAR(255) with the delimiter character defined for each column
  • If the current working directory is already registered as a regular DIRECTORY at the database level, SQL*Loader utilizes it, otherwise, it creates a new directory definition
  • The script specifies where the bad files and log file are located
  • It specifies that an all-null column record is rejected

In the case of the EXECUTE keyword, the log file shows that the scripts are used not only to create the External Table, but also to execute the INSERT statement with the /*+ append */ hint. The load is performed in direct path mode.

All External Tables generate a log file when accessed. In the case of the ORACLE_LOADER driver, this file is similar to the one generated by SQL*Loader; it has a different format in the case of the ORACLE_DATAPUMP driver. The log file is generated in the same location where the external file resides, and its name has the following format:

<EXTERNAL_TABLE_NAME>_<OraclePID>.log

When an ORACLE_LOADER managed External Table has errors, it dumps the ‘bad’ rows to the *.bad file, just the same as if this was loaded by SQL*Loader.

The ORACLE_DATAPUMP External Table generates a simpler log file: it only contains the time stamp when the External Table was accessed, and it creates a log file for each Oracle process accessing the External Table.

Unloading data to External Tables

The driver used to unload data to an External Table is the ORACLE_DATAPUMP access driver. It dumps the contents of a table to a binary proprietary format file. This way you can exchange data with other 10g and higher databases in a preformatted way that meets the other database’s requirements. Unloading data to an External Table doesn’t make it updatable; the tables are still limited to being read only.

Let’s unload the EMPLOYEES table to an External Table:

create table dp_employees
  organization external(
	type oracle_datapump
	default directory EXTTABDIR
	location ('dp_employees.dmp')
  )
as
  select * from employees;

This creates a table named DP_EMPLOYEES, located at the specified EXTTABDIR directory and with a defined OS file name.

In the next example, a new DP_EMPLOYEES table is created at a different database; this table uses the data already unloaded by the first database. This DP_EMPLOYEES External Table is created on the 11g database side.

create table dp_employees(
   EMPLOYEE_ID      NUMBER(6),
   FIRST_NAME       VARCHAR2(20),
   LAST_NAME        VARCHAR2(25),
   EMAIL            VARCHAR2(25),
   PHONE_NUMBER     VARCHAR2(20),
   HIRE_DATE        DATE,
   JOB_ID           VARCHAR2(10),
   SALARY           NUMBER(8,2),
   COMMISSION_PCT   NUMBER(2,2),
   MANAGER_ID       NUMBER(6),
   DEPARTMENT_ID    NUMBER(4)
)
organization external
(
   type oracle_datapump
   default directory EXTTABDIR
   location ('dp_employees.dmp')
);

This table can read the data unloaded by the first database right away. The second database is a regular 11g database, so this shows the inter-version upward compatibility between a 10g and an 11g database.

SQL> select count(*) from dp_employees;

  COUNT(*)
----------
       107

Inter-version compatibility

In the previous example, a 10g data pump generated an External Table that was transparently read by the 11g release.

Let’s create an 11g data pump External Table named DP_DEPARTMENTS:

create table dp_departments
   organization external(
           type oracle_datapump
           default directory EXTTABDIR
           access parameters
           (
                   version '10.2.0'
           )
           location ('dp_departments.dmp')
   )
as
   select * from departments;

Table created.

SQL> select count(*) from dp_departments;

  COUNT(*)
----------
        27

In the previous example, it is important to point out that the VERSION keyword defines the compatibility format.

access parameters
(
	version '10.2.0'
)

If this clause is not specified then an incompatibility error will be displayed.

SQL> select count(*) from dp_departments;
select count(*) from dp_departments
*
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-39142: incompatible version number 2.1 in dump file
"/home/oracle/external_table_dest/dp_departments.dmp"
ORA-06512: at "SYS.ORACLE_DATAPUMP", line 19

Now let’s use the 10g version to read from it.

SQL> select count(*) from dp_departments;

  COUNT(*)
----------
        27

The VERSION clause is interpreted the same way as the VERSION clause for data pump export; it can take three different values:

  • COMPATIBLE: This states that the version of the metadata corresponds to the database compatibility level.
  • LATEST: This corresponds to the database version.
  • VERSION NUMBER: This refers to a specific Oracle version that the file is compatible with. This value cannot be lower than 9.2.0.
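Using the same access parameters syntax as the dp_departments example, the three settings look as follows (sketches only; which one to pick depends on the oldest database that must read the file):

```sql
access parameters ( version compatible )  -- follows the compatibility level
access parameters ( version latest )      -- follows the database version
access parameters ( version '10.2.0' )    -- targets a specific Oracle version
```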

Summary

As we can see, External Tables can serve not only as improvements to the ETL process, but also as a means to manage database environments, and a means of reducing the complexity level of data management from the user’s point of view. In the next part, we will see how External Tables can be used for data transformation and the enhancements Oracle 11g has brought about in External Tables.
