Hadoop’s utility is starting to go beyond big data processing and analytics as the industry demands more from it. Hadoop is steadily catering to diverse enterprise data architecture requirements while retaining its original strengths, and the list of what it can do is already long. Hadoop can now process huge volumes of transactional workloads, a task that was formerly expected of traditional technologies. More possibilities lie ahead. For example, SQL-based transaction systems can use a Hadoop SQL engine, and Hadoop is expected to add many RDBMS capabilities. In effect, Hadoop is becoming a hybrid that combines data processing and analytics with enterprise architecture capabilities.
What Is Next-Generation Data Architecture?
To put it simply, next-generation data architecture is an evolved form of data architecture. The data models, policies, rules and standards that govern how data is collected, stored, arranged, analyzed or processed, integrated, used and dispensed have all evolved under it.
The main difference between earlier data architecture and next-generation data architecture is the latter’s capability to collect, store and process enormous volumes of data, also known as big data, in real time. The architecture performs all these complex tasks without compromising on privacy, security and data governance standards.
Next-generation data architecture faces many challenges. It is not easy to handle the volume, velocity and variety of big data. Add to that the requirements of optimizing system workload, improving performance, speed and accuracy, and reducing cost. Earlier data architectures did not have to manage such demands.
So, CIOs and information architects want a solution that helps them achieve these goals. Operational Hadoop has been in focus for some time in this context. The following sections discuss how operational Hadoop can address these problems.
Expectations From Hadoop in the Context of Next-Generation Architecture
Companies are under increasing pressure to deliver better results and the effects are trickling down to the expectations placed on the technologies. So, Hadoop is no longer expected to just process data. CIOs and CTOs want more from Hadoop. Given below is a list of expectations from Hadoop. In fact, Hadoop has already been delivering on a few of these expectations.
Faster Performance With the Ability to Do Random Writes and Updates
Hadoop is expected to demonstrate enterprise-grade capabilities in addition to its original data processing capabilities. The constraint is that Hadoop was not originally designed to handle transactional workloads. However, two solutions can drive applications that require high insert and retrieval rates within Hadoop: Apache HBase and MapR-DB.
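To illustrate how random writes and updates can sit on top of storage that only supports appends, here is a toy Python sketch of the log-structured idea that HBase is built around: recent writes go to an in-memory buffer, the buffer is periodically flushed to immutable sorted segments, and reads check the newest data first. The class ToyStore, its flush_threshold parameter and its method names are hypothetical and are not part of the HBase or MapR-DB API.

```python
class ToyStore:
    """Toy log-structured store: in-memory buffer plus immutable flushed segments."""

    def __init__(self, flush_threshold=2):
        self.flush_threshold = flush_threshold  # hypothetical tuning knob
        self.buffer = {}     # newest writes, mutable
        self.segments = []   # flushed snapshots, oldest first, never modified

    def put(self, key, value):
        # An update is just a newer write; older versions stay in old segments.
        self.buffer[key] = value
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Freeze the buffer as a sorted, immutable segment (append-only storage).
        if self.buffer:
            self.segments.append(tuple(sorted(self.buffer.items())))
            self.buffer = {}

    def get(self, key):
        # Newest data wins: check the buffer, then segments from newest to oldest.
        if key in self.buffer:
            return self.buffer[key]
        for segment in reversed(self.segments):
            for k, v in segment:
                if k == key:
                    return v
        return None


store = ToyStore(flush_threshold=2)
store.put("row1", "a")
store.put("row2", "b")    # buffer reaches the threshold and is flushed
store.put("row1", "c")    # the update to row1 lands in the fresh buffer
print(store.get("row1"))  # c
print(store.get("row2"))  # b
```

The old value of row1 is never overwritten in place; it is simply shadowed by the newer write. A real system would also compact old segments and log writes for durability, which this sketch leaves out.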
Working With Transaction Systems
Hadoop is expected to work with SQL-based transaction systems that have create, read, update and delete (CRUD) capabilities. These systems would use a Hadoop SQL engine, and they are also expected to offer full Portable Operating System Interface (POSIX) compliance and the capacity to process high transaction volumes.
Capabilities of Relational Database Management Systems (RDBMSs)
Above all, Hadoop is expected to have the atomicity, consistency, isolation and durability (ACID) capabilities of an RDBMS. Data consistency is critical when Hadoop handles high volumes of data. It needs to be maintained especially when master data is stored and used in a multitenant environment.
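As a reminder of what atomicity and consistency mean in practice, the following minimal sketch uses Python’s built-in sqlite3 module. SQLite is only a stand-in for an RDBMS here and is not a Hadoop component; the accounts table and the transfer function are hypothetical. The point is that a transfer either applies both updates or neither.

```python
import sqlite3

# sqlite3 is only a stand-in RDBMS here; it is not part of Hadoop.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds in one transaction; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # violates the CHECK, rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the second call the credit to bob is applied first and then undone when the debit fails, so the balances stay at their values after the first transfer. This is the behavior that enterprise users expect Hadoop-based transaction systems to match.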
Full Database Capabilities
Hadoop is expected to support features such as backup, fault tolerance, recovery and disaster recovery. For Hadoop to evolve into a system with RDBMS capabilities, it needs to be compatible with existing IT tools.
Hadoop is already moving toward fulfilling these expectations, as several developments show. Hadoop can provide real-time analysis and fast responses based on the resource management support provided by YARN. In addition to being a resource manager, YARN serves as a large-scale, distributed operating system for big data applications. Other projects, including Apache Storm, the distributed in-memory engine Apache Spark, Apache Hive, Apache Drill and MapR-FS (a high-performance HDFS replacement), are working toward full database capabilities such as backup, disaster recovery and fault tolerance. (For more on YARN, see What are the Advantages of the Hadoop 2.0 (YARN) Framework?)
What Values Can Hadoop Add to Next-Generation Data Architecture?
The values operational Hadoop can add to next-generation data architecture can be viewed from two perspectives: one, whether it is fulfilling the expectations described above, and two, whether it is doing anything additional. Given below are the salient values that operational Hadoop can bring.
SQL on Hadoop
Hadoop is integrating SQL standards more and more. SQL has been the standard for data-driven application development for a long time, and it dominates transactional data processing in the enterprise, not to mention reporting and analytics. However, the full power of SQL is not utilized because of vendor-specific RDBMS dialects and inconsistent adoption of ANSI standards across the industry. There are plans in the works to incorporate the latest ANSI standards into Hadoop, which will help Hadoop manage growing volumes of data efficiently. SQL queries can then run against all the data in a cluster, and the SQL-on-Hadoop engine will not place any limit on the number of nodes that can be used. (To learn more about SQL on Hadoop, see How Can SQL on Hadoop Help with Big Data Analysis?)
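One common way for a distributed SQL engine to run a query over data spread across many nodes is to compute partial results on each node and merge them on a coordinator. The toy Python sketch below imitates this for a query such as SELECT region, SUM(amount) ... GROUP BY region. It is not the implementation of any particular SQL-on-Hadoop engine; the data and the function names partial_sum and merge are hypothetical.

```python
from collections import Counter

# Hypothetical data: each inner list stands for the rows stored on one node.
node_partitions = [
    [("east", 10), ("west", 5)],
    [("east", 7), ("north", 3)],
    [("west", 1)],
]


def partial_sum(rows):
    """Runs on one node: SUM(amount) GROUP BY region over local rows only."""
    totals = Counter()
    for region, amount in rows:
        totals[region] += amount
    return totals


def merge(partials):
    """Runs on the coordinator: combine the per-node partial sums."""
    result = Counter()
    for partial in partials:
        result.update(partial)  # Counter.update adds values for matching keys
    return dict(result)


totals = merge(partial_sum(rows) for rows in node_partitions)
print(totals)
```

Because each node only touches its own rows, adding a node means adding one more partition to the list, and the merge step stays the same. That is the sense in which such an engine need not limit the number of nodes.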
Better Structured Data Management
Hadoop can now provide more scalable and manageable data storage within its platform via HDFS, while YARN acts as the data operating system on top of it. This strategy represents a shift in data architecture at a fundamental level. Hadoop can now store various types of data, such as the contents of transaction-oriented databases, graph databases and document databases, and the data can be accessed via YARN applications. There is no need to duplicate the data or move it to other locations.
Improved Performance as an Enterprise Data Architecture
Operational Hadoop is on its way to becoming the core system of enterprise data architecture. As Hadoop moves further into enterprise data architecture, data silos will disappear as the lines between them are erased. Rapid improvement is expected in the form of more efficient file formats, better SQL engine performance, improved file systems and greater robustness, all of which will help fulfill the needs of enterprise applications.
Difference Between Hadoop and Other Technologies
In the past, the main difference between Hadoop and enterprise data technologies was Hadoop’s big data processing, reporting and analytics capabilities. Now, as operational Hadoop becomes more and more a part of enterprise data architecture, the difference between the two is increasingly blurred. So, operational Hadoop is emerging as a superior alternative to existing enterprise data architecture.
Given the expectations and the progress so far, Hadoop is going to remain a focus of the industry for quite some time. But it makes little sense to focus exclusively on Hadoop and ignore other technologies, because those technologies will be making progress on the same parameters and might even overtake Hadoop. It is never good to have a monopoly in the market. Competition may motivate the makers of other technologies to deliver better products, and even plug-ins that help Hadoop improve its performance.