What Is the Open Data Platform and What Is its Relation to Hadoop?
The Open Data Platform is a relatively new industry initiative for handling big data that builds on Apache Hadoop.
The Open Data Platform (ODP) is an industry-level initiative that focuses on strengthening the adoption of the Apache Hadoop ecosystem and enabling big data solutions to flourish within it. It builds on the strengths of the Apache Hadoop framework.
Proponents of the ODP claim that it will bring many benefits to those who embrace it, but not everyone is convinced. There appears to be a lot of confusion about choosing between the ODP and Apache Hadoop, as if they were entirely different technologies or concepts. The ODP is still relatively new, and it will be interesting to see whether the industry embraces it.
What Is the Open Data Platform?
The core components of the ODP include the Hadoop Distributed File System (HDFS), YARN's cluster management technology and the Hadoop management console Ambari. By establishing this core as the ODP kernel, the intent is that applications built on the Hadoop stack can run on the ODP. Additionally, the ODP core combines software components and open-source tests that you can use as a base for building solutions.
With the advent of the Internet of Things (IoT), the most pressing need is not the data itself, be it structured, unstructured or raw. Rather, the need is to enhance communication among the growing network of objects. The Open Data Platform is key to facilitating this, as it leverages the Hadoop ecosystem.
Open data that is freely available can be used and distributed by almost anyone. This is a promising field for resolving existing problems that a society faces. It is not limited to one field of society; it also affects:
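To make the idea of a shared core concrete, here is a minimal Python sketch. It assumes that checking a distribution against the core is simply a matter of confirming that it ships the three components named above. The function names and the example distributions are hypothetical, and versions are left out because the article does not state them. This is not the ODP's actual certification process.

```python
# The three core components named in the article.
# Versions are deliberately omitted because the article does not give them.
ODP_CORE = frozenset({"HDFS", "YARN", "Ambari"})


def missing_core_components(distribution_components):
    """Return the core components a distribution lacks, sorted by name."""
    shipped = {name.strip() for name in distribution_components}
    return sorted(ODP_CORE - shipped)


def is_core_compliant(distribution_components):
    """A distribution passes this simplified check if nothing is missing."""
    return not missing_core_components(distribution_components)


# Hypothetical distributions, for illustration only.
example_full = ["HDFS", "YARN", "Ambari", "Hive"]
example_partial = ["HDFS", "YARN", "Hive"]

if __name__ == "__main__":
    print(is_core_compliant(example_full))
    print(missing_core_components(example_partial))
```

Extra components such as Hive do not affect the result; the check only asks whether the shared core is present, which mirrors the idea of vendors differentiating on top of a common base.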
- Supplier exchanges
- Predictive behavior of buyers
The approach to resolving any problem area can be structured as follows:
- Problem Area: Identify the problem area with its current set of needs and limitations.
- Resolution: Look for a solution using open data and analytics tools.
- Key Players: Identify the parties that are key to the use case, whether participants or beneficiaries.
- Inclusion: Involve all open data players to improve the efficacy of the resolution, and follow an industry-level initiative such as the ODP or the Apache Hadoop ecosystem.
- Business Value: Assess the business proposition it brings to the table, for example how it reduces the costs involved.
The Game Changer: Positives
The big forces backing the ODP initiative are major players including GE, Hortonworks, IBM, Infosys, Pivotal, SAS, AltiScale, Capgemini, CenturyLink, EMC, Teradata, Splunk, Verizon and VMware, as well as a few others. The core objective is to leverage open source and open collaboration to further accelerate Apache Hadoop and take big data to the next level.
The initiative is indeed a game changer, as it addresses the needs of not only the vendors but also the end users. It is closely aligned with the Apache Software Foundation (ASF) process, as it leverages the contributions made to the Apache projects and enhances them further. The ODP has provided an open platform to engage the diverse community as a whole.
In partnering with leading vendors, service providers and users of Apache Hadoop, the ODP's biggest challenge is to reduce fragmentation and gain traction in development across the Apache Hadoop ecosystem.
The intent of the ODP is to work directly with specific Apache projects, keeping in view the Apache Software Foundation guidelines on how to contribute ideas and code. The objective is to enhance compatibility and standardize the way the apps or tools run on any compliant system.
Another interesting aspect is the standardization of the deployment of solutions built on Hadoop or other big data technology.
The main focus areas of the ODP include:
- Developing an open-source ecosystem for big data
- Acting as a catalyst for Hadoop and big data adoption
- Standardizing the Apache Hadoop ecosystem
- Standardizing the deployment mode for applications
- Adopting the best big data and analytical software to support data-driven applications
The following benefits can be gained with the ODP:
- Reduced R&D costs for vendors and solution providers
- Improved interoperability
- Standardized base for future Hadoop distributions
Negative Buzz in the Market: Flip Side
However, other players in the market see the ODP differently. According to these players, the ODP is:
- Redundant with the Apache Software Foundation: The Apache Software Foundation has already produced the Hadoop standard, which lets applications connect, exchange and use information across Hadoop distributions. Hadoop has become the de facto standard across the industry. So the question arises: what value would the ODP provide?
- Lacking participation by Hadoop leaders: Some major Hadoop players, such as MapR, Amazon Web Services and Cloudera, are not participating in this initiative.
- Focused on a non-issue (interoperability and vendor lock-in): According to a Gartner survey, only a few companies feel that interoperability and vendor lock-in are really issues. Furthermore, project and sub-project interoperability is guaranteed by both free and paid distributions. So this is not an area where the ODP should spend its time and effort.
- Unclear on governance: Questions have been raised about the governance model, as equal voting rights are not provided to the leading Hadoop distributions. The governance model has not yet been disclosed.
- Not truly open: With Hortonworks as a partner, the ODP is establishing an open data platform on a single vendor's packaging. This casts some doubt on the "openness" of the Open Data Platform.
A Matter of Choice
The way forward for the ODP is the standardization model. Standardization has its own set of advantages, but choice is what leads to empowerment. It is choice that drives healthy competition, which pushes all those involved to strive for better quality.
So, let us wait and see how the industry embraces the ODP, given the standardized model. Many questions remain unanswered, such as the fee structure, governance model and voting rights. The bigger question is whether the ODP effectively addresses the key customer questions. Only time will tell how this initiative progresses and benefits the community.
One school of thought is inclined toward Apache Hadoop and its flavors, and the other is prepared to develop and embrace the ODP. Broadly speaking, one says the ODP and Hadoop are two distinct concepts, while the other says they complement each other. One says that the ODP is a threat to Apache Hadoop, while the other says it is a big opportunity to leverage Apache Hadoop further. With all sorts of theories in the market, each player perceives the future differently, based on the benefits it promises to bring to their organization.
So, the biggest question here is whether these two concepts will develop separately or merge at some point. Let us watch the crossroads of big data together to discover whether these two giants embrace each other in the big data space. Whatever the case may be, it seems inevitable that these two distinct concepts will overlap and, in the end, benefit the end user at large.