Extraction is the process of retrieving relevant information from data sources in a defined pattern for use in a data warehousing environment. It is the first step of the data transformation process. Extraction adds meaning to the data by selecting, from a large collection of data coming from various sources, only the data that fit a given condition or category.
In a data warehousing environment, a large volume of data from various structured and unstructured sources must be processed, transformed and stored before meaningful conclusions and predictions can be derived from it. The data from the primary sources must be imported into the data warehousing system in a systematic manner that makes it easy to perform later operations on the data. This process is called extraction. Extraction adds structure to otherwise unstructured data by following certain rules. The following are some of the techniques used in data extraction:
- Pattern matching
- Table-based approach
- Text analytics
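
As an illustration of the first technique, pattern matching, the sketch below pulls structured records out of free-form text lines using a regular expression. The sample lines, the field layout (date, order id, customer, total) and the function name `extract_orders` are assumptions made up for this example; they do not come from any particular warehousing product.

```python
import re

# Hypothetical raw lines standing in for an unstructured source.
RAW_LINES = [
    "2023-01-15 order=1001 customer=alice total=59.90",
    "malformed line without the expected fields",
    "2023-01-16 order=1002 customer=bob total=120.00",
]

# Assumed record layout: date, order id, customer name, total amount.
ORDER_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) "
    r"order=(?P<order_id>\d+) "
    r"customer=(?P<customer>\w+) "
    r"total=(?P<total>\d+\.\d{2})"
)


def extract_orders(lines):
    """Return one dict per line that matches ORDER_PATTERN; skip the rest."""
    records = []
    for line in lines:
        match = ORDER_PATTERN.fullmatch(line)
        if match is None:
            # The line does not fit the rule, so it is not extracted.
            continue
        row = match.groupdict()
        # Convert the captured strings to typed values for later loading.
        row["order_id"] = int(row["order_id"])
        row["total"] = float(row["total"])
        records.append(row)
    return records


if __name__ == "__main__":
    for record in extract_orders(RAW_LINES):
        print(record)
```

Only the lines that fit the rule become records, which reflects the idea that extraction selects data matching a condition and gives it structure. A real pipeline would typically also log or quarantine the rejected lines instead of silently dropping them.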