Gartner agrees with Forrester Research: substantial hype surrounds big data. In a September 2014 report, Gartner debunks five of the biggest big-data myths, and its analysts explain what is commonly misunderstood about big data and how it is handled. So what are big data's biggest myths? Let's have a look.
Myth: Everyone is ahead of us in adopting big data. Gartner says interest in big data is at an all-time high. Despite this, a paltry 13 percent of those polled have working systems. The reason: most companies have yet to figure out how to mine any value from big repositories of data. Here, Gartner’s survey is more optimistic than the Forrester report, which found that only 9 percent of survey participants said they planned to implement big-data technologies over the next year. (Big data has a lot to offer. Learn more in 5 Real-World Problems Big Data Can Solve.)
Myth: We have so much data; we need not worry about every little data flaw. Gartner is worried about a foible we humans have: "We have so much, the little that’s bad won’t matter." Ted Friedman, vice president and distinguished analyst at Gartner, believes that this is the wrong way to look at the situation.
"In reality, although each individual flaw has a much smaller impact on the whole dataset than it did when there was less data, there are more flaws than before because there is more data," Friedman said. "Therefore, the overall impact of poor-quality data on the whole dataset remains the same."
Friedman adds another reason for concern. Big-data capture often includes data from outside the business, where structure and origin may be unknown. This increases the potential for errors.
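Friedman's point that more data means smaller but more numerous flaws can be made concrete with a toy calculation. The sketch below assumes a constant flaw rate (2 percent, an illustrative figure that does not come from Gartner's report) and that every flawed record carries equal weight; `flaw_impact` is a hypothetical helper, not part of any library.

```python
def flaw_impact(total_records: int, flaw_rate: float) -> dict:
    """Compare the weight of one flaw with the weight of all flaws combined."""
    flawed_records = round(total_records * flaw_rate)
    return {
        "flawed_records": flawed_records,
        # Share of the dataset affected by a single flawed record.
        "share_per_flaw": 1 / total_records,
        # Share of the dataset affected by all flawed records together.
        "overall_share": flawed_records / total_records,
    }


ASSUMED_FLAW_RATE = 0.02  # illustrative assumption, not a Gartner figure

small = flaw_impact(1_000, ASSUMED_FLAW_RATE)
large = flaw_impact(1_000_000, ASSUMED_FLAW_RATE)
```

Under these assumptions, each flaw in the larger dataset weighs a thousand times less, but there are a thousand times more of them, so the overall share of flawed data is unchanged.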
Myth: Big data technology will eliminate the need for data integration. There are two key strategies for structuring data for analysis: "schema on write" and "schema on read." Until recently, schema on write was the only method used. Schema on read is the current craze in data management. Schema on write requires data to be put into a structured format before it is stored, whereas schema on read loads data in its raw format. Developers, using platforms that handle unstructured data such as Hadoop, then bend the disparate data into a usable format when it is read. Schema on read has obvious advantages but, as Gartner mentions, data integration has to occur at some point.
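The difference between the two strategies can be sketched in a few lines of Python. This is a minimal illustration, not how Hadoop or any particular database works: plain lists stand in for storage, and `ORDER_SCHEMA`, `write_with_schema`, `write_raw` and `read_with_schema` are hypothetical names invented for the example.

```python
import json

# Hypothetical schema: field name -> type the field is coerced to.
ORDER_SCHEMA = {"order_id": int, "amount": float}


def apply_schema(record: dict) -> dict:
    """Shape one record to ORDER_SCHEMA; raises if a field is missing or malformed."""
    return {field: cast(record[field]) for field, cast in ORDER_SCHEMA.items()}


def write_with_schema(store: list, record: dict) -> None:
    """Schema on write: the record must fit the schema before it is stored."""
    store.append(apply_schema(record))


def write_raw(store: list, raw_line: str) -> None:
    """Schema on read: store the raw text untouched."""
    store.append(raw_line)


def read_with_schema(store: list) -> list:
    """Schema on read: parse and shape the raw text only when it is queried."""
    rows = []
    for raw_line in store:
        try:
            rows.append(apply_schema(json.loads(raw_line)))
        except (ValueError, KeyError, TypeError):
            continue  # unusable record: the integration work surfaces here
    return rows


warehouse = []
write_with_schema(warehouse, {"order_id": "7", "amount": "19.5"})

lake = []
write_raw(lake, '{"order_id": 8, "amount": "4"}')
write_raw(lake, "not json at all")
usable_rows = read_with_schema(lake)
```

Either way, `apply_schema` has to run somewhere. Schema on read only moves that work from load time to query time, which is the sense in which integration still has to occur at some point.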
Myth: Using a data warehouse for advanced analytics is pointless. Spending the time to create a data warehouse seems pointless to many information managers, particularly when newly captured data is different from that in the data warehouse. However, Gartner again warns that even advanced data analytics will use both data warehouses and new data, which means data integrators must:
- Refine new data types to make them suitable for analysis
- Decide which data is relevant, and the level of data quality needed
- Determine how to aggregate the data
- Understand that data refinement can happen in places other than the data warehouse
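The steps in this list can be sketched as a small pipeline. This is only an illustration under stated assumptions: the raw events are invented `customer_id,amount` strings, `known_customers` stands in for data already held in the warehouse, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def refine(raw_event: str) -> Optional[dict]:
    """Refine a new data type: turn a raw 'customer_id,amount' string into a typed record."""
    parts = raw_event.split(",")
    if len(parts) != 2:
        return None
    try:
        return {"customer_id": parts[0].strip(), "amount": float(parts[1])}
    except ValueError:
        return None


def is_relevant(record: dict, known_customers: set) -> bool:
    """Decide relevance and quality: keep warehouse customers with non-negative amounts."""
    return record["customer_id"] in known_customers and record["amount"] >= 0


def aggregate(records: list) -> dict:
    """Determine the aggregation: here, total amount per customer."""
    totals = defaultdict(float)
    for record in records:
        totals[record["customer_id"]] += record["amount"]
    return dict(totals)


def integrate(raw_events: list, known_customers: set) -> dict:
    """Run refinement outside the warehouse, then filter and aggregate."""
    refined = (refine(event) for event in raw_events)
    kept = [r for r in refined if r is not None and is_relevant(r, known_customers)]
    return aggregate(kept)


totals = integrate(
    ["c1, 10.5", "c2,3", "bad", "c1,abc", "c3,5", "c1,-2", "c1,4.5"],
    known_customers={"c1", "c2"},
)
```

Here the refinement step runs in ordinary application code, not inside the warehouse, which matches the last point in the list. The relevance rule and the aggregation are arbitrary choices made for the example.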
Myth: Data lakes will replace the data warehouse. Data lakes are repositories of disparate data, as opposed to data warehouses, where data is in a structured format. Creating a data lake takes little upfront effort (no need to format the data) compared to data warehouses, which is why data lakes are of interest.
Gartner emphasizes that having the data is not the point; being able to manipulate the captured data for informed decision-making is. Moreover, using (somewhat unproven) data lakes to facilitate decision-making is problematic.
"Data warehouses already have the capabilities to support a broad variety of users throughout an organization," Nick Heudecker, research director at Gartner, said. "Information-management leaders don't have to wait for data lakes to catch up." (Learn more about adopting big data in 7 Things You Must Know About Big Data Before Adoption.)