The Explosion of Data
As we move through our analysis of the impact of technology on everything we do, we must note the one factor that affects industries across the board (marketing, law enforcement, manufacturing, science, education, politics, government, and on and on): data! Never before in history have we had so much data available to use and misuse.
The Beginning of Big Data
It has been written that if we use the variable "X" to indicate all the data developed by humans from the beginning of time until the inauguration of Barack Obama, then "10X" has been developed since. Although it is also estimated that more than 50 percent of that data is redundant or spam, that is still an awful lot of data! There is census data, scientific data, economic data, marketing data, credit data, sports data ... data about every possible thing. And it's constantly increasing. (The increase of data and the importance of crunching it has made data science one of the hottest new careers. Read more in Data Scientists: The New Rock Stars of the Tech World.)
It is the personal data that tends to attract our attention though. We read and perhaps worry about the amount of data that Google, Facebook and other websites have on each of us and how it may be used. Some critics of all this data collection are concerned about government surveillance; others are bothered by the collection of information about our purchasing habits by stores and credit card companies; still others think that medical data collected by doctors, hospitals and insurance companies may somehow be used against them. In short, data is a big deal!
How did this explosion of data occur? Obviously, computers were the original driver. No longer was information simply typed on paper and kept in articles, journals and books. With computers, it was stored and could be modified, refined and built upon. The advent of the Internet added another dimension by allowing data to be transported, collaborated on and made available worldwide.
If You Build It They Will Come: More Storage, More Data
A very important factor in data collection is the steady stream of breakthroughs in storage technology over the past few decades. Storage devices have gotten bigger in capacity, smaller in size and much less expensive. In fewer than 50 years of microcomputer use, storage devices have gone from cassette tape to floppy diskette to low-capacity fixed disks to high-capacity fixed disks, and the cost of storage has continued to decline.
A case in point: In 1980, I purchased my first fixed disk for a microcomputer, a 10 million-byte Corvus. It cost me $5,500. At the time of writing (2012), I have trillion-byte fixed disks. As you can see from the following table, the cost has shrunk geometrically:
|Unit|Capacity (Bytes)|Cost in 1980|Cost Today in 1980 Prices|Actual 2012 Prices|
|10-Million Byte Drive|10,000,000|$5,500| | |
The two-terabyte disk sitting next to my MacBook cost around $200. If prices per megabyte had remained at 1980 levels ($550 per megabyte), it would have cost over $1 billion! In addition, I carry 16 billion bytes (16 GB) on a chain around my neck. The 16 GB USB drive weighs about an ounce, while my 10 MB Corvus drive in 1980 must have weighed 15 pounds; it was heavier than the computers it serviced.
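The price comparison above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the text ($5,500 for 10 million bytes in 1980, about $200 for two terabytes in 2012); the helper name `cost_per_mb` and the use of decimal megabytes (1 MB = 1,000,000 bytes) are my own assumptions for illustration.

```python
# Figures come from the text; decimal units (1 MB = 1,000,000 bytes) are assumed.
BYTES_PER_MB = 1_000_000


def cost_per_mb(price_dollars: float, capacity_bytes: int) -> float:
    """Return the price of one megabyte of storage (hypothetical helper)."""
    return price_dollars / (capacity_bytes / BYTES_PER_MB)


TWO_TB_BYTES = 2_000_000_000_000

corvus_1980 = cost_per_mb(5_500, 10_000_000)   # 10 MB Corvus drive, 1980
drive_2012 = cost_per_mb(200, TWO_TB_BYTES)    # 2 TB drive, 2012

# What the 2 TB drive would cost at the 1980 price per megabyte
cost_at_1980_rate = corvus_1980 * (TWO_TB_BYTES / BYTES_PER_MB)

print(f"1980: ${corvus_1980:,.2f} per MB")
print(f"2012: ${drive_2012:.4f} per MB")
print(f"2 TB at 1980 rates: ${cost_at_1980_rate:,.0f}")
```

Under these assumptions, the 1980 price works out to $550 per megabyte, the 2012 price to a hundredth of a cent per megabyte, and the two-terabyte drive to about $1.1 billion at the 1980 rate.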
It is important that storage capacity keeps increasing because the data universe is now doubling every two years. This doubling puts more demand not only on our storage capacity, but also on our communications channels and, most important, on the software tools that collect, extract, combine, and analyze this data.
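To get a feel for what "doubling every two years" implies, here is a minimal sketch of the compounding involved. It assumes the doubling rate stays constant, which is a simplification of mine and not a forecast; the function name `growth_factor` is hypothetical.

```python
def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Return how many times larger the data universe becomes after `years`,
    assuming it doubles once every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


for years in (2, 10, 20):
    print(f"After {years} years: {growth_factor(years):,.0f} times today's volume")
```

At a constant two-year doubling period, ten years means a 32-fold increase and twenty years a 1,024-fold increase, which is why storage, communications and analysis tools all have to keep pace.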
Data Vs. Information
I have used the terms "data" and "information" rather interchangeably so far, but they are actually quite different. Data is the raw material (numbers, pictures and so on), while information is data that has been shaped into a form that is understandable and useful to human beings. It is the task of the computer scientist to develop tools and algorithms that will constantly make these masses of data more useful. (Learn more about this in Big Data: How It's Captured, Crunched and Used to Make Business Decisions.)
Data and Our Privacy
While we are generally happy with (or at least not troubled by) the fact that our orthopedist’s report goes immediately via computer link to our internist, or that Amazon immediately makes us aware of a new book by a favorite author or a new accessory for our Kindle Fire, most people don't want marketers to know about everything they do or buy online. Most of us don’t really like the idea of surveillance cameras throughout our cities either. Even more of us chafe at the idea of being denied employment, a promotion or a mortgage because of something we posted on social media. Fewer still are comfortable with a government agency collecting and storing information about us. They can’t do that, can they?
No, the government can’t gather such information, but as is pointed out in the very comprehensive 2005 book "No Place to Hide" by Washington Post investigative reporter Robert O’Harrow, private firms are not restricted by law from gathering this information, putting it together and then selling it to government agencies. (To read more, see Don't Look Now, But Online Privacy May Be Gone for Good.)
We each have many digital identifiers: a Social Security number, credit card numbers, home and cellphone numbers, retailer bonus programs, insurance policies, auto registrations and driver’s licenses, etc. And, as facial recognition becomes more refined, these identifiers are increasingly being cross-referenced to build a demographic picture of us.
Two New York Times articles, "The Age of Big Data" by Steve Lohr and Charles Duhigg’s "How Companies Learn Your Secrets," take us inside the world of those trying to find out all about us. Lohr’s piece focuses on big data itself and the vast research efforts by such entities as the United Nations and major U.S. corporations to glean higher-quality information from it. In the corporate world, just a little edge in this area could mean millions of dollars in sales, while government entities hope it can provide either better service or better surveillance.
The Duhigg piece, on the other hand, goes inside one firm, Target, and its mission to find out everything possible about clients and potential clients: not only what they buy or return, but where they live, how much money they make, when they shop and generally what's going on in their lives. New baby? New house? Child away at college? Once you are in Target's system, it will purchase demographic information to supplement what it already has. Duhigg writes, "Target can buy data about your ethnicity, job history, the magazines you read, whether you’ve ever declared bankruptcy or gotten divorced, the year you bought (or lost) your house, where you went to college, what kinds of topics you talk about online, whether you prefer certain brands of coffee, paper towels, cereal or applesauce, your political leanings, reading habits, charitable giving and the number of cars you own." All of this information can be found somewhere, and it is increasingly becoming a commodity that's bought and sold.
What can we do about this?
The obvious first step is to realize what you are doing when you sign up for a new service of any type. How much information do you have to give to obtain the service? Is it worth it?
The next thing to consider is whether your privacy really concerns you. Some see the loss of privacy as inevitable. Computer scientist and science fiction writer David Brin, in his 1999 book "The Transparent Society: Will Technology Force Us To Choose Between Privacy And Freedom?" sees no end to cameras and computer monitoring. Accordingly, he wants us to be able to monitor those monitoring us. By extension, we should also be aware of everyone who has our information and what they may do with it. (Find out more about this in What You Should Know About Your Privacy Online.)
For those who are concerned and wish to do what they can to slow or curtail the collection and use of personal data, one thing we can do is put pressure on the corporations that gather data about us to be totally transparent about who has access to that data.
As we educate ourselves on these issues, we might also consider the following paragraph from Lohr’s "The Age of Big Data" article:
"A report last year by the McKinsey Global Institute, the research arm of the consulting firm, projected that the United States needs 140,000 to 190,000 more workers with 'deep analytical' expertise and 1.5 million more data-literate managers, whether retrained or hired."
Clearly, the current data explosion presents both opportunities for new careers and discoveries, and pitfalls for users who are subject to data collection. But then, that's always the case with technology: Every major advance also brings changes to the world and the way we live.