SQL*Loader is a program that reads data files in many possible formats, parses the data (breaks it into meaningful pieces), and loads the data into database tables.
SQL*Loader uses the following file types:
Log: A mandatory file. If you do not specify a log file, SQL*Loader will try to create one in the current directory with the name of your control file and a .log filename extension. If SQL*Loader cannot create the log file, execution is aborted. The log file contains a summary of the SQL*Loader session, including any errors that were generated.
Control: A mandatory file. This file tells SQL*Loader where the other files are, how to parse and load the data, and which tables to load the data into. The control file can also contain the data itself.
Data: Optional files that hold the data SQL*Loader reads and loads into the database. They are optional because the data can instead be included in the control file.
Bad: Holds the "bad" data records, that is, those that failed validation by either SQL*Loader or the database. Bad files are created only if one or more records fail validation. Just as with the log file, if you do not specify a bad file, SQL*Loader creates one with the name of your datafile and a .bad filename extension.
Discard: Hold data records that did not get loaded because they did not satisfy the record selection criteria in the control file. Discard files are created only if data records were discarded because they did not satisfy the selection criteria.
SQL*Loader provides a robust toolkit to build data-loading programs for your Oracle10g database. It can operate either on the database server or on a client machine.
To invoke the SQL*Loader program, use the command sqlldr followed by one or more commandline parameters. These parameters can be identified positionally on the command line or with a keyword=value pair. You can mix positional and keyword notation provided that all the keyword notation parameters appear after all the positional parameters.
For example, to invoke SQL*Loader, telling it to use the connect string system/password and use the control file regions.ctl, any of the following command lines can be executed:
sqlldr system/password regions.ctl
sqlldr control=regions.ctl userid=system/password
sqlldr system/password control=regions.ctl
The command-line parameters are detailed below. They are listed in positional-notation order (through bindsize) and can be displayed by executing the sqlldr command with no parameters.
|Parameter|Description|
|---|---|
|userid|The database connect string, for example, scott/tiger@prod.|
|control|The name of the control file.|
|log|The name of the log file. The default is the control filename with a .log extension.|
|bad|The name of the bad file. The default is the datafile name with a .bad extension.|
|data|The name of the datafile. The default is the control filename with a .dat extension.|
|discard|The name of the discard file. The default is the datafile name with a .dsc extension.|
|discardmax|The maximum number of discards to allow before failing. The default is all.|
|skip|The number of records to skip before starting to load. The default is none.|
|load|The number of records to load. The default is all.|
|errors|The number of errors to allow before failing. The default is 50.|
|rows|The number of rows in a conventional path bind array or between direct path data saves. The default is 64 rows in conventional path mode and all rows in direct path mode.|
|bindsize|The size of the conventional path bind array in bytes. The default is 256KB.|
|direct|If TRUE, use direct path. The default is FALSE, indicating conventional path.|
|parfile|The name of a file containing additional command-line parameters. There is no default.|
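For example, frequently used parameters can be collected into a parameter file and referenced with parfile. The filename regions.par and the specific parameter values below are made up for illustration; a parameter file simply holds keyword=value pairs, one per line:

```
userid=system/password
control=regions.ctl
log=regions.log
errors=10
direct=true
```

SQL*Loader would then be invoked with sqlldr parfile=regions.par, optionally combined with further command-line parameters.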
Many of the command-line parameters can also appear in the control file. When they appear as both command-line parameters and in the control file, the command-line options take precedence.
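To illustrate the precedence rule, suppose a control file sets parameters in its OPTIONS clause (the SKIP and ERRORS values here are hypothetical):

```
OPTIONS (SKIP=1, ERRORS=10)
LOAD DATA
INFILE 'regions.dat'
INTO TABLE regions
APPEND
FIELDS TERMINATED BY ','
(region_id, region_name)
```

Running sqlldr hr/hr control=regions.ctl skip=0 would load every record: the command-line skip=0 overrides SKIP=1 in the control file, while ERRORS=10 still applies because it was not overridden.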
A control file has two or three main sections. The first contains session-wide information, such as the log filename, the bind size, and whether direct or conventional path loading will be used. The second section contains one or more INTO TABLE blocks, which specify the target tables and columns. The third section, if present, is the actual data. Comments can appear anywhere in the control file (except in the data lines) and should be used liberally; the control file language can be somewhat cryptic, so generous use of comments is encouraged. Comments in a control file start with a double dash and end with a new line. The control file must begin with the line LOAD DATA or CONTINUE LOAD DATA and must include an INTO TABLE clause, together with directions on how to parse the data and which columns to load it into.
The best way to learn how to construct a control file is to look at examples and then use variations of them to build your control file. This section gives you several examples, but is certainly not a comprehensive sampling. Again, the intent is to present you with enough information to get you going.
Example 1 is simple and straightforward: the control file contains both control file commands and the data. The command line is:
sqlldr hr/hr control=regions.ctl
The control file regions.ctl contains the following:
-- control files begin with LOAD DATA
LOAD DATA INFILE *
-- the * tells SQL*Loader the data is inline
INTO TABLE regions TRUNCATE
-- truncate the target table before loading
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
-- how to parse the data
(region_id, region_name)
-- positional mapping of data file fields to table columns
-- lines following BEGINDATA are loaded
-- no comments are allowed after BEGINDATA
BEGINDATA
4,"Middle East and Africa"
The LOAD DATA command tells SQL*Loader that you are beginning a new data load. If you are continuing a data load that was interrupted, specify CONTINUE LOAD DATA instead. The command INFILE * tells SQL*Loader that the data will appear in the control file itself. The table REGIONS is loaded. The keyword TRUNCATE tells SQL*Loader to truncate the table before loading it. Instead of TRUNCATE, you can specify INSERT (the default), which requires the table to be empty at the start of the load; APPEND, which tells SQL*Loader to add the data to any existing data in the table; or REPLACE, which tells SQL*Loader to issue a DELETE to empty out the table before loading. REPLACE differs from TRUNCATE in that DELETE DML triggers fire and the DELETE can be rolled back.
The lines in the control file that follow BEGINDATA contain the data to parse and load. The parsing specification tells SQL*Loader that the data fields are comma-delimited and that text data can be enclosed by double quotation marks; the quotation marks themselves are not loaded as part of the data. The list of columns enclosed in parentheses identifies the table columns that will be loaded with the data fields.
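The optional enclosure matters when a field itself contains the delimiter. The record below is hypothetical, added only to illustrate the parsing behavior:

```
5,"North, Central and South America"
```

Because the second field is enclosed in double quotation marks, the comma inside it is treated as data rather than as a field separator, so the record still parses into exactly two fields.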
Example 2: The same data is loaded into the same table, but now it resides in a standalone file called regions.dat, in the following fixed-position format with pipe separators:
4|Middle East and Africa |
The command line is:
sqlldr hr/hr control=regions.ctl
The content of the control file is:
OPTIONS (DIRECT=TRUE)
-- use direct path loading
LOAD DATA
INFILE '/apps/seed_data/regions.dat'
BADFILE '/apps/seed_data/regions.bad'
DISCARDFILE '/apps/seed_data/regions.dsc'
-- data file spec
INTO TABLE regions APPEND
-- add this data to the existing target table
(region_id POSITION(1) INTEGER EXTERNAL
,region_name POSITION(3:25) NULLIF region_name = BLANKS
) -- how to parse the data
The control file tells SQL*Loader where to find the data file (INFILE) as well as the bad and discard files (BADFILE and DISCARDFILE). The OPTIONS line specifies direct path loading. With fixed-format data, the column specification identifies the starting and ending positions. A numeric datatype can be identified as INTEGER EXTERNAL. The directive NULLIF region_name = BLANKS tells SQL*Loader to set the region_name column to NULL if the data field contains only white space.
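As a sketch of how the POSITION clauses map onto the sample record (character positions counted from 1):

```
-- ruler:     12345678901234567890123456
-- record:    4|Middle East and Africa |
-- region_id    POSITION(1)    reads character 1      -> 4
-- region_name  POSITION(3:25) reads characters 3-25  -> Middle East and Africa
```

Character 2 and character 26 hold the pipe separators, which are simply not covered by any POSITION clause; the trailing space inside positions 3 through 25 is why the NULLIF ... = BLANKS directive is useful for records whose name field is empty.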
You shouldn’t have to know the minutiae of how to tell SQL*Loader precisely how to parse data—the options are far too arcane to expect you to recite them off the top of your head for an exam—but knowing the SQL*Loader capabilities of reading fixed format and variable format data is essential. More important to your job is knowing about direct path loads and unusable indexes, which are discussed in the next section.
With direct path loading, SQL*Loader reads data and passes it to the database via the direct path API. The API formats the data directly into Oracle data blocks in memory and then flushes these blocks, en masse, directly to the datafiles using multiblock I/O, bypassing the buffer cache as well as the redo and undo mechanisms. Direct path loads always write above a table's high-water mark and thus always increase the number of data blocks that the table is actually using.
The important thing to remember about direct path load is that it is fast, but has restrictions, including the following: