The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on standard or low-end commodity hardware. Part of the Apache Hadoop project, HDFS works like a standard distributed file system but provides high data throughput, high fault tolerance and native support for very large data sets, and it serves as the storage layer for MapReduce processing.
Extract, transform, load (ETL) is the process of extracting, transforming and loading data as it moves between databases, and particularly into a data warehouse. It includes the following sub-processes:
Extract: the first phase of an ETL process retrieves data from the storage source. Most data warehousing projects integrate data received from several source systems, and each individual system may employ its own data organization or format. Common data sources are relational databases and flat files, but they may also include non-relational systems such as IBM's Information Management System (IMS), or other structures such as Virtual Storage Access Method (VSAM) or Indexed Sequential Access Method (ISAM) files. Data can even come from external sources, such as the Internet or a scanning system.

Transform: the transform phase applies a series of rules or functions to the extracted data in order to deliver it in its final form for use at the receiving end. Some data sources need very little processing or none at all; in other cases, one or more transformations may be required to meet the business and technical requirements of the target database.

Load: the load phase writes the data to the target, which is typically a data warehouse. Depending on the needs of the application, this process may be very simple or very complicated. Some data warehouses may replace old data with cumulative data, and updates of extracted data are normally performed on a periodic basis.
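The three phases above can be sketched in a few lines of Python. This is a minimal, illustrative pipeline, not a production ETL tool: the CSV source, the field names (`id`, `name`, `amount`), the "drop rows with a missing amount" rule and the SQLite target table are all hypothetical stand-ins for a real source system, business rules and data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source (illustrative data, not from any real system).
RAW_CSV = """id,name,amount
1,alice,10.50
2,bob,
3,carol,7.25
"""

def extract(text):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize names, coerce types, and apply an
    (illustrative) business rule dropping rows with no amount."""
    out = []
    for row in rows:
        if not row["amount"]:
            continue  # skip rows that fail the rule
        out.append({
            "id": int(row["id"]),
            "name": row["name"].title(),
            "amount": float(row["amount"]),
        })
    return out

def load(rows, conn):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO sales VALUES (:id, :name, :amount)", rows
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
```

Here the transform deliberately drops the second row (its amount is missing), so only two rows reach the target table; a real pipeline would make such rules explicit and configurable rather than hard-coded.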