How Big Data's Getting Smaller
How to collect and analyze big data is only one side of the equation; the other is how to understand it.
On October 4, 2012, Mark Zuckerberg announced that Facebook had reached a major milestone: 1 billion active users. To put this into context, he told an interviewer that the only other companies with 1 billion customers were "probably Coca Cola and McDonalds".
This is just one example of the very large numbers businesses now have to deal with, numbers so large that most people really can't get their arms around them, so to speak. What happens is that these numbers become abstractions. They're so big, they're just not real to us.
Add to this the fact that we process and store more and more information every day and we find ourselves almost unable to deal with both the amount of data and the size of the individual values. Google processes about 24 petabytes per day, while the video game "World of Warcraft" uses 1.3 petabytes of storage to maintain its game.
Now those are big numbers. The problem then becomes not only how to deal with such huge amounts of data, but also how to understand them. Thankfully, help is coming from a variety of directions in these areas. (Get some background on how big data is being put to use in this infographic, Humanizing Big Data.)
How Data's Getting Smaller
For the past few years, the father of the World Wide Web, Sir Tim Berners-Lee, has been actively campaigning for open data, which is defined as data that's available to everyone to explore and analyze. In a TED video, Berners-Lee gives examples of how access to data led to the exposure of racism in Ohio and helped provide much-needed healthcare to refugee camps in Haiti. Clearly, these are applications where data has moved from abstraction to actuality.
Perhaps the best-known developer of methods of presenting statistical data in easy-to-comprehend graphics is Hans Rosling. His Gapminder program, software that converts international statistics into moving, interactive graphics, is available for download on all varieties of personal computers. (You can find some great examples of how it's used in this TED talk. The development of Gapminder is discussed at another talk.) Forget pie charts: This software presents statistics not only in a way that makes sense, but in a way that makes an impression. You'll never get goosebumps from textbook statistics, but these graphics pack enough punch to blow your mind.
While Rosling is a professor who's well versed in statistics, David McCandless is a journalist who only recently became interested in designing ways to present data analysis that truly inform. His TED talk presents data visualizations of such diverse subjects as societal concerns about video games, the effectiveness of vitamin supplements, and romantic breakups by season and month. For McCandless, data presents a unique new direction in journalism: a means of exploring a topic and providing insight in a way that was never possible before. (You can check out some truly amazing examples of how this is being applied in the Data Journalism Handbook.)
Chris Jordan takes a different approach. Unlike Rosling and McCandless, Jordan draws on his background as an artist to present information on topics such as deaths from smoking, prison incarcerations, prescription drug addiction and other major issues in a way that is both beautiful and powerful. It's information - or data - as art and, in Jordan's case, some pretty strong political commentary. (You can check out Jordan's work here.)
Jordan, Rosling, and McCandless are just three of the many people attempting to make meaningful use of the big data that now exists in the world, and this group of big data pioneers is growing.
Tools of the Trade
Before we can turn data into something useful, we first have to make sense of it. That takes tools that can cope with the massive expansion of facts and data being generated every year by scientists, academics and businesses. An EMC-sponsored IDC study in 2011 found that the amount of data keeps doubling, each time in less than two years. The study further projected that a colossal 1.8 zettabytes would be created and replicated in 2011.
Now there's a number that's hard to put your arms around! The EMC study tries to put it into context by providing some interesting examples of what 1.8 zettabytes is equivalent to:
- Every person in the United States tweeting three tweets per minute for 26,976 years nonstop
- Every person in the world having more than 215 million high-resolution MRI scans per day
- Over 200 billion HD movies (each two hours in length). It would take one person 47 million years to watch every movie if they watched all day every day.
- The amount of information needed to fill 57.5 billion 32 GB Apple iPads.
With that many iPads we could:
- Create a wall of iPads 4,005 miles long and 61 feet high extending from Anchorage, Alaska, to Miami, Florida.
- Build the Great iPad Wall of China. (It would be twice the average height of the original.)
- Build a 20-foot-high wall around South America
- Cover 86 percent of Mexico City
- Build a mountain 25 times higher than Mt. Fuji
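As a rough sanity check on the iPad comparison, the arithmetic can be sketched in a few lines of Python. This is only an illustration, not the study's own method: it assumes decimal (SI) units, where a zettabyte is 10^21 bytes and a gigabyte is 10^9 bytes, and the helper name `devices_needed` is made up for this example.

```python
# Assumption: decimal (SI) units. The study's own unit convention is not stated here.
ZETTABYTE = 10**21  # bytes
GIGABYTE = 10**9    # bytes


def devices_needed(total_bytes: int, device_bytes: int) -> int:
    """Return how many devices of a given capacity are needed to hold total_bytes.

    Hypothetical helper for illustration; rounds up to a whole device.
    """
    return -(-total_bytes // device_bytes)  # ceiling division on integers


# 1.8 zettabytes, kept as an exact integer to avoid floating-point error.
total = 18 * ZETTABYTE // 10

ipads = devices_needed(total, 32 * GIGABYTE)
print(f"{ipads / 1e9:.2f} billion 32 GB iPads")  # prints "56.25 billion 32 GB iPads"
```

Under these assumptions the result is about 56 billion, in the same range as the 57.5 billion quoted above. The small gap may come from rounding in the 1.8 zettabyte figure or from a different unit convention.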
To make this data useful, that is, to transform it into usable information, we need not only apps and "mashups" (the marrying of services such as Google Earth and New York Times International headlines, or of a NYC Restaurant Guide with NYC Health Dept. Ratings) but also very powerful tools to filter, sort and analyze masses of data and provide the information necessary for decision making, scientific studies and difficult analysis. IBM has developed such tools, which it collectively refers to as Smarter Analytics, for use in conjunction with its big data and cloud services. It bundles software, hardware and consulting services in an attempt to provide the information platform on which to make business and scientific decisions. Hewlett-Packard, Oracle and many other IT companies are also reaching out to clients with products that try to deal effectively with this information glut.
Big Data, Big Potential
To realize the potential of this new data age, we need many more systems and apps. We need IT professionals with 21st-century education and skills. We need applications specialists who really understand the workings and needs of businesses, industry, government agencies, the military, entrepreneurs and researchers. We also need calm and mature analysts who will question the judgments made based on data analysis. It will be easy to be overwhelmed by the powerful computer tools working "magic" on masses of data. Common sense must always prevail or, at the very least, prompt a reworking of the data.
We already know that the potential for big data is boundless, but so is the capacity for error. Therefore, the tools that are built to make sense of all this information may be the key to wrapping our arms around the big data problem.