|
ETL
ETL is an acronym for Extract, Transform, and Load, and refers to the process by which data is transferred from one storage unit to another. The core concept of the ETL process is that all data maintains its integrity when being moved and is readable on both systems—even if the systems are quite different. Tools and software offering ETL features are widely available on the market.
Extraction
The first step in the ETL procedure is simply to extract data from the original system, copying it into an intermediate unit where the transformation process can proceed. Naturally, if the source and destination systems use identical programs and coding, the entire extraction process is extremely straightforward, and no transformation will be required.
Transformation
Transformation is at the heart of ETL. In this stage, data is converted from the format of the original system to that of the system to which it is intended to be copied. For example, if the original data codes the information entered in a field for gender as “F” and “M,” and the destination system recognizes only values of “Male” and “Female” in these fields, the former values will need to be altered to correspond to the new coding. Similarly, if one system accepts telephone or credit card numbers with hyphens and the other allows only a straight string of numbers, all hyphens will need to be removed from these fields.
Other transformations may involve:
- Merging redundant or related tables
- Converting text to numerical equivalents or vice-versa (for example, “Male” being represented by a 1, and “Female” by a 2)
- Altering the names and values of variables
- Deleting large chunks of incompatible and unnecessary data.
Transformation can be an arduous and lengthy process or may be quite brief, depending on how different one database is from another.
Loading
In the loading stage, the edited information is copied onto the destination unit. This is usually a fairly simple procedure, though in some cases it may require a detailed notation of the changes made to the source data during the transformation process.
ETL Software
ETL software works by analyzing data and noting patterns. For example, a well-designed ETL program will recognize that a particular string of data is an address, even if its format is irregular, and will then enforce syntactical consistency on the string by re-editing its value. Naturally, no software can anticipate the vast number of potential inconsistencies in a given database, and so ETL software makes use of rules and functions defined by human operators, serving primarily to streamline the overall ETL process.
By Matthew Ingalls
|