The challenge of managing and leveraging big data comes from three elements, according to Doug Laney, research vice president at Gartner. Laney first noted more than a decade ago that big data poses such a problem for the enterprise because it introduces hard-to-manage volume, velocity and variety. The problem is, too many IT departments throw everything they have at the issues of data volume and velocity, forgetting to address the fundamental issue of the variety of data.
Back in 2001, Laney wrote that “leading enterprises will increasingly use a centralized data warehouse to define a common business vocabulary that improves internal and external collaboration.” The issue of that vocabulary – and the variability that keeps companies from creating it – remains the least addressed aspect of the big data conundrum today.
Three Vs of Big Data
Numerous businesses have found methods for harnessing increased data volume and velocity. Facebook, for example, can analyze enormous volumes of data. Of course, much of that data is repetitive: the same kinds of items arrive over and over again within the same parameters. That uniformity drove technology innovations such as column-oriented databases, which are now widely used by other companies that face equally sizable stores of similar data items.
In terms of taming velocity, vendors like Splunk help enterprises analyze rapidly created data through log files that capture several thousand events per second. This analysis of high-volume events is targeted at security and performance monitoring use cases. As with the data volume challenge, the velocity challenge has been largely addressed through sophisticated indexing techniques and distributed data analytics that enable processing capacity to scale with increased data velocity.
When it comes to variety, though, too many enterprises still face a big problem in their approach to big data analytics. This problem is driven by three factors. First, growth, acquisitions and technological innovations keep adding new systems, so enterprises are locked into a highly heterogeneous environment, and that heterogeneity only increases with time. Enterprises need to track many types of systems and manage tens of thousands of data types, while the same data is often represented using different nomenclatures and formats.
Second, these systems and data types often report relevant information mixed with information that can be safely filtered out as irrelevant to the problem being addressed. Enterprises need a reliable way to identify the information that matters.
The third dimension to the variety challenge is the constant variability or change in the environment. Systems are upgraded, new systems are introduced, new data types are added and new nomenclature is introduced. This adds another layer to the variety challenge and further strains our ability to tame it.
Addressing the Data Variety Problem
To address the data variety problem, enterprises must start with the IT domain, as it often represents both the worst offenders and the worst victims of the variety problem. The first step is to establish a comprehensive definition, or taxonomy, of all IT elements or assets. This provides a baseline for referring to anything in or about IT and enables enterprises to manage the increasing heterogeneity against a known taxonomy or terminology.
The next step is to identify the numerous ways the same object is represented across different systems of record. Once those representations are mapped to a single term, IT professionals can look across their heterogeneous environment and filter and compress the data into relevant, manageable chunks.
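One way to picture this mapping step is a small lookup table that ties each system-specific name to one canonical taxonomy term. The sketch below is only an illustration under stated assumptions: the system names (cmdb, monitoring, ticketing), the host names, the canonical terms and the helper functions are all hypothetical and do not come from any particular product.

```python
from collections import defaultdict

# Hypothetical alias table: (source system, local name) -> canonical taxonomy term.
# Every entry here is invented for illustration.
ALIASES = {
    ("cmdb", "srv-db-01"): "host:db01",
    ("monitoring", "DB01.corp.example.com"): "host:db01",
    ("ticketing", "Database Server 1"): "host:db01",
    ("cmdb", "srv-web-07"): "host:web07",
}


def canonicalize(system, name):
    """Return the canonical term for a record, or None if it is unmapped."""
    return ALIASES.get((system, name))


def group_by_canonical(records):
    """Group (system, name, payload) records by canonical term.

    Records with no known alias are returned separately so they can be
    reviewed instead of being silently dropped.
    """
    grouped = defaultdict(list)
    unmapped = []
    for system, name, payload in records:
        key = canonicalize(system, name)
        if key is None:
            unmapped.append((system, name, payload))
        else:
            grouped[key].append(payload)
    return dict(grouped), unmapped


# Three systems describe the same database server under different names.
records = [
    ("cmdb", "srv-db-01", {"owner": "dba-team"}),
    ("monitoring", "DB01.corp.example.com", {"cpu": 0.93}),
    ("ticketing", "Database Server 1", {"open_tickets": 2}),
    ("monitoring", "cache-03", {"cpu": 0.10}),
]
grouped, unmapped = group_by_canonical(records)
```

Here the three records that describe the same server collapse under one term, while the unmapped record is set aside for review. In practice the alias table would be far larger and would probably be maintained in a dedicated store rather than in code.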
Finally, IT managers must adopt a process of constant examination of the environment for changes such as new types of elements being introduced or new nomenclature to refer to the same element.
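This kind of ongoing examination can be approximated by comparing the names currently observed in the environment against the set already known to the taxonomy. The sketch below is a minimal illustration; the function name detect_drift and the sample system and host names are assumptions made for the example, not part of any specific tool.

```python
def detect_drift(known_terms, observed):
    """Compare observed (system, name) pairs against the known set.

    Returns the pairs that are new to the environment and the known
    pairs that were not seen in this pass.
    """
    observed = set(observed)
    return {
        "new": sorted(observed - known_terms),
        "missing": sorted(known_terms - observed),
    }


# Hypothetical data: a monitoring tool starts reporting the same host
# under a new name after a migration.
known = {
    ("cmdb", "srv-db-01"),
    ("monitoring", "DB01.corp.example.com"),
}
seen = [
    ("cmdb", "srv-db-01"),
    ("monitoring", "db01.cloud.example.com"),
]
report = detect_drift(known, seen)
```

Entries under "new" are candidates for new element types or new nomenclature for an existing element, and entries under "missing" may indicate retired or renamed systems. Deciding which is which still takes human review.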
With these steps, IT organizations can manage the variety problem and derive deep insights that have historically eluded IT teams. Moreover, managing the variety problem vastly improves their return on investment in tools and techniques that address the more traditional big data problems of volume and velocity.