Extract, transform, load (ETL) is the process of extracting, transforming and loading data in database use, and particularly in data warehousing. It includes the following sub-processes:
- Retrieving data from external data storage or transmission sources
- Transforming the data into a consistent, usable format that meets operational needs, typically with error detection and correction applied along the way
- Transmitting and loading data to the receiving end
The first phase of an ETL process focuses on extracting the data from the source systems. Most data warehousing projects integrate data received from several source systems, and each system may use its own data organization or format. Common data source structures are relational databases and flat files. Sources may also include non-relational database structures such as IBM's Information Management System (IMS), or other data structures such as virtual storage access method (VSAM) or indexed sequential access method (ISAM) files. Data sources can even be external, such as data pulled from the Internet or captured through a scanning system.
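As an illustration of extracting from sources with different formats, the following minimal sketch reads one flat-file source and one relational source into plain Python dictionaries. The table name, column names, and sample rows are hypothetical, and an in-memory SQLite database stands in for a real source system.

```python
import csv
import io
import sqlite3


def extract_from_csv(text):
    """Read rows from a flat file (given here as CSV text) as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_from_sqlite(conn, query):
    """Read rows from a relational source as dictionaries keyed by column name."""
    cursor = conn.execute(query)
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Hypothetical flat-file source: every value arrives as a string.
csv_source = "id,name\n1,Alice\n2,Bob\n"

# Hypothetical relational source with its own column names and types.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, full_name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(3, "Carol"), (4, "Dave")]
)

flat_rows = extract_from_csv(csv_source)
db_rows = extract_from_sqlite(
    conn, "SELECT customer_id, full_name FROM customers"
)
```

The two result lists describe the same kind of entity but differ in field names and value types, which is the mismatch the later phases have to reconcile.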
The transform phase applies a series of rules or functions to the extracted data to derive the data in its final form for loading into the target. Some data sources need very little or even no processing. In other cases, one or more transformations are required to meet the business and technical requirements of the target database.
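One way to picture a series of rules is as an ordered list of small functions applied to each record. The sketch below is only an assumption about how such rules might look: the rule names, the field mapping, and the sample records are all hypothetical. Records that fail a rule are set aside rather than passed on.

```python
def rename_fields(record):
    """Map source field names onto the (hypothetical) target schema."""
    mapping = {"id": "customer_id", "name": "full_name"}
    return {mapping.get(key, key): value for key, value in record.items()}


def cast_id(record):
    """Ensure the key is an integer; raises ValueError on bad input."""
    return {**record, "customer_id": int(record["customer_id"])}


def trim_name(record):
    """Strip stray whitespace from the name field."""
    return {**record, "full_name": record["full_name"].strip()}


RULES = [rename_fields, cast_id, trim_name]


def transform(records, rules=RULES):
    """Apply every rule in order; collect records that fail any rule."""
    clean, rejected = [], []
    for record in records:
        try:
            out = record
            for rule in rules:
                out = rule(out)
            clean.append(out)
        except (KeyError, ValueError) as exc:
            rejected.append((record, str(exc)))
    return clean, rejected


raw = [
    {"id": "1", "name": " Alice "},             # needs renaming, casting, trimming
    {"id": "x", "name": "Bob"},                 # invalid key, will be rejected
    {"customer_id": 3, "full_name": "Carol"},   # already in target form
]
clean, rejected = transform(raw)
```

The third record passes through unchanged, which mirrors the case of a source that needs little or no processing.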
The load stage writes the data to the target, which is typically a data warehouse. Depending on the needs of the application, this process may be very simple or very complicated. Some data warehouses overwrite existing data with cumulative data. Extracted data is normally updated on a periodic basis, for example daily, weekly, or monthly.
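The difference between replacing data and accumulating it can be sketched with two load strategies. This is an illustrative example only: the `customers` table and the function names are hypothetical, and an in-memory SQLite database stands in for the target.

```python
import sqlite3


def load_full_refresh(conn, rows):
    """Discard the current table contents and load the new batch."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM customers")
        conn.executemany(
            "INSERT INTO customers VALUES (:customer_id, :full_name)", rows
        )


def load_incremental(conn, rows):
    """Add new rows and overwrite rows whose key already exists."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO customers "
            "VALUES (:customer_id, :full_name)",
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, full_name TEXT)"
)

# First periodic run: load everything.
load_full_refresh(conn, [
    {"customer_id": 1, "full_name": "Alice"},
    {"customer_id": 2, "full_name": "Bob"},
])

# Later run: one changed row and one new row.
load_incremental(conn, [
    {"customer_id": 2, "full_name": "Robert"},
    {"customer_id": 3, "full_name": "Carol"},
])

result = conn.execute(
    "SELECT customer_id, full_name FROM customers ORDER BY customer_id"
).fetchall()
```

After both runs the table holds three rows, with the second row updated in place. A scheduler would normally call one of these functions on each periodic run.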