What Does SQL on Hadoop Mean?
SQL on Hadoop is a class of analytical application tools that implement SQL on the Hadoop platform, combining standard SQL-style querying of structured data with the Hadoop data framework. Hadoop is a relatively new platform, as is big data itself, and few professionals are experts in it. SQL on Hadoop simplifies access to the Hadoop framework and makes it easier to integrate with existing enterprise systems.
Techopedia Explains SQL on Hadoop
SQL on Hadoop refers to the various implementations of SQL for the Hadoop platform. MapReduce, Hadoop's framework for distributing jobs across a cluster and combining their results, does not interpret SQL itself, but SQL-style querying is one of its major use cases alongside other processing methods: engines such as Apache Hive translate queries into MapReduce jobs. Because SQL is one of the most widely used languages for querying and manipulating databases, it makes sense to build powerful tools that bring it to Hadoop. As Hadoop gains popularity in enterprise data architecture, SQL support is key to its adoption for both the loosely structured and the structured data stored in Hadoop.
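To make the link between a SQL query and MapReduce more concrete, here is a minimal sketch in plain Python of how an aggregate query could be expressed as map, shuffle and reduce steps. It is only an illustration under simplifying assumptions: the `SALES` records, the function names and the single-process execution are all hypothetical, and real engines generate far more elaborate, distributed plans.

```python
from collections import defaultdict

# Hypothetical sample records: (region, amount). Not from any real dataset.
SALES = [
    ("east", 100),
    ("west", 250),
    ("east", 50),
    ("north", 75),
    ("west", 25),
]

# The query being modeled:
#   SELECT region, SUM(amount) FROM sales GROUP BY region;


def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for region, amount in records:
        yield region, amount


def shuffle_phase(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Apply the aggregate function (SUM) to the values of each key."""
    return {key: sum(values) for key, values in groups.items()}


def run_query(records):
    """Chain the three phases to answer the GROUP BY query."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    print(run_query(SALES))
```

The `GROUP BY` column becomes the key emitted by the map step, and the aggregate function becomes the reduce step. A SQL on Hadoop engine spares the user from writing this plumbing by hand.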
Key drivers of SQL on Hadoop include:
- Leveraging existing SQL skills present in most organizations
- Reusing existing investments in extract, transform, load (ETL), business intelligence (BI) and analytics infrastructure on Hadoop
Some SQL on Hadoop implementations include:
- Apache Spark SQL
- Apache Hive
- Apache Tajo
- Apache Drill
- HP Vertica on MapR
- ODBC Drivers
- Presto
- Shark (the predecessor of Spark SQL)