Anything that looks too good to be true usually is. Such might be the case with Apache Hadoop, the much-ballyhooed open-source project that everyone keeps talking about. So what, exactly, is this thing? Good question!
Analyst Mark Madsen of Third Nature nailed it to the wall a while back in a pithy piece on InsideAnalysis.com: "What Hadoop Is. What Hadoop Isn’t." He knows how to design real-world solutions and then actually deploy them, so his advice should not be ignored.
But there’s a deeper current flowing here, and the time is nigh to unearth the roots of this fascinating flora, to see if we can’t gain some perspective on what’s happening at more of a macro level. After all, vendors keep saying it’s a big deal, and there are so many participants.
Employ the Committers!(?)
Three companies currently own the majority of Hadoop’s nascent market: Cloudera, Hortonworks and MapR. In a recent, fairly contentious briefing to the Boulder BI Brain Trust (#BBBT), Jim Walker of Hortonworks made this curious comment:
"You can’t advance the tech if you don’t employ the committers!"
Doesn’t this sound like something Senator Palpatine might say in a Star Wars film?
Sen. Palpatine: "Employ the committers!"
Nearby Minion: "But, but, Sir! Think of the children!"
For the layman out there just trying to get things done, committers are the contributors to an open-source project who have earned the right to commit changes directly to its code base. The Apache Software Foundation has strict protocols by which its projects move forward, which is often a good thing.
That said, Walker’s comment warrants examination. One pointed question (at the risk of conjuring playground days) would be: Is that a promise or a threat? Is he saying that Hortonworks might just take their ball and go home?
Cooperation or Competition?
The interesting, if paradoxical, angle here is that most of the committers on the Hadoop team (some 30 or so in all) reportedly work for Hortonworks and Cloudera, which are competitors. This is a very curious case of competition.
So, what’s the deal? Here’s an educated guess: Hadoop largely owes its fame to a clever plan conceived by a group of Silicon Valley venture capitalists and engineers who are essentially trying to hedge their bets against Oracle.
The general idea is to seed the market with a foundation of code that can be enhanced and bolstered by a rag-tag fleet of developers who will ideally, over time, create all manner of data management tools, including database products. The VCs can invest and cash out some day. But there are some serious challenges in play.
Like all monolithic enterprises, Oracle often finds itself in the crosshairs of many a smaller player. And who wouldn’t want just a slice of their mind-numbing revenue? In the last quarter alone, Oracle booked ~$9 billion. But challenging Big Red and beating them are two very different realities.
The thing about Hadoop, per se, is that it’s not a packaged solution by any means. Rather, it’s a complex collection of modules that enable skilled programmers to apply massively parallel processing algorithms to very specific problems. But there’s no fancy user interface, and the manuals are brutal.
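For a taste of what those programmers are signing up for, here is a minimal sketch of the map, shuffle and reduce pattern that Hadoop’s MapReduce module is built around. It is plain, single-machine Python, not Hadoop’s API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real job would spread each step across a cluster of machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on one machine (illustrative only)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling word_count(["big data big", "data"]) counts "big" twice and "data" twice. Even this toy version shows why the work is bespoke: somebody has to decide what the keys are, what gets emitted and how it gets combined, for every single problem.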
Add to that challenge this critical hurdle: you also need business people who have at least a general understanding of what it can do. Those folks must be able to conjure up ideas of how it can be used, then communicate to the developers, who must subsequently produce, test, implement and support applications.
Orchestrating this dance is how Cloudera and Hortonworks make much of their money. Problem is, most of the solutions created via this method are unique, and typically focus on operational systems as opposed to analytical ones. Translation? Stuff like that doesn’t really lend itself to packaged software products.
Which brings us back to Oracle. Larry Ellison and the boys make their hay selling database tech, hardware, services and (wait for it…) packaged software. Cloudera apparently figured this out, hence their focus on Impala. But Hortonworks?
Their model appears to more closely mimic that of Red Hat, the folks who built a billion-dollar business on top of the Linux operating system. Nearly every major vendor in the enterprise software industry writes for Linux, the OS by which IBM headed Microsoft off at the pass. But Hadoop is no Linux, not by a long shot.
Dr. Geoffrey Malafsky, a former nanotechnologist for the US Navy, now a data scientist with Phasic Systems and the PSIKORS Institute, distills the Hadoop value proposition like this:
- "Hadoop is great for search, very large trend analysis for stochastic results, and likely some very cheap clever parallel processing of things like my ex-wife used to do: quantum mechanical wave function calculation of solid state and chemical reactions. This real science relies on supercomputers and moved somewhat into parallel processing, but it is a hard change of programming approach. Young, smart, energetic graduate students will be the ones to make this happen. I suspect research grants start going in this direction for some high-powered computational applications."
You’ll notice that doesn’t sound anything like data warehousing, business intelligence, data integration or even big data. It sounds like supercomputing. And for some interesting reason, the worlds of high-performance computing and business intelligence never really have collided or coalesced in any meaningful way.
Long Road Ahead for Hortonworks and Cloudera
And here’s the really bad news for Hortonworks and possibly Cloudera. Consider the big vendors like IBM, SAP, Oracle and Teradata. To put this mildly, and to quote Dire Straits: "Them guys ain’t dumb!" Three or more years ago, all of them rolled out serious Hadoop strategies.
Central to these plans are the kinds of things business users expect: graphical user interfaces, drag-and-drop functionality, modeling and discovery tools, workflow, governance, security; in short, all the bits and pieces that make enterprise software usable. And of course, these big vendors have massive installed bases.
To be sure, Cloudera and Hortonworks both have landed good business, but only a tiny fraction of what those major players get each year. Do the math on how much the challengers charge their customers, compared to how much their overhead likely is, and the picture isn’t so rosy. Granted, that’s par for the course with early-stage software concerns, but still…
The Future of Hadoop?
So, might we see the classic wave of acquisitions, like we had back in the aughts, when IBM bought Cognos, Oracle got Hyperion and SAP nabbed BusinessObjects? Perhaps, but the new kids on this block don’t own Hadoop; they just borrow it. And as promising as YARN and Tez might be, the release cycles seem to be lagging behind what the heavy hitters produce.
Just the other day, an industry insider commented that the politics at Apache can be a serious bottleneck. This is not terribly surprising, especially when you consider the dollars involved—there is great motivation for innovators to strike it rich. And has anyone noticed how Chrome seems to have surpassed Firefox in functionality and operability lately? Closed-source, anyone?
One thing’s for sure: this game will play out in some interesting ways. Yes, the mammals (read: small vendors) can often outrun the dinosaurs; but there are still alligators and crocodiles all around the world; and if you stumble upon one unawares, you might just discover how sharp those teeth can be. A few crocs together could even take down an elephant or two.