How Dark Data Can Impact the Big Data World

Dark data is data that never sees the light of day, but this long-ignored data could be of use to organizations.

There are two ways to view the impact of dark data in the world of big data:

  1. As the opportunities hidden in big data
  2. As the risks dark data poses

Almost all companies store dark data for varying lengths of time without analyzing it. In doing so, they lose the insights that the unanalyzed data could have revealed. Storing dark data for long periods also carries legal, financial and reputational risks, as well as the risk of losing competitive advantage. Companies need to make better use of their dark data repositories, not only to improve their business but also to minimize risk.

What Is Dark Data?

Almost every company collects huge volumes of data with the intention of gaining more insight into things such as customer behavior, software development processes, meeting times and productivity, as well as website usability. These insights help companies respond by delivering improved products and services. However, it may be surprising that a large percentage of the data lies unused for long periods of time. Companies just store it without performing any analysis. This category of data is known as dark data, and the size of this category is enormous. IDC estimates that 90% of the total data generated is dark data, which is a significant observation. Gartner defines dark data as:

“[T]he information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing). Similar to dark matter in physics, dark data often comprises most organizations’ universe of information assets. Thus, organizations often retain dark data for compliance purposes only. Storing and securing data typically incurs more expense (and sometimes greater risk) than value.”

What kinds of data are left unanalyzed? The following types of data commonly end up as dark data:

  • Raw survey inputs
  • Customer data
  • Previous employee data
  • Financial statements
  • Email conversations
  • Chat transcripts
  • Call center transcripts
  • Account data

Difference Between Big Data and Dark Data

Dark data is a subset of big data. So, there are two parts of big data collected: analyzed and unanalyzed. The unanalyzed data is dark data. Interestingly, unanalyzed data constitutes the biggest portion of big data.

Reasons Companies Build Up Dark Data Stock

The list of data types given above could potentially provide a lot of value to a company. Still, it is surprising that they lie unattended. There are a number of reasons for this, but the most important seems to be the lack of investment. Given below are a few reasons the dark data stock is building up.


Lack of Investment

Companies seem reluctant to make the investment required to analyze and process dark data. Doing so may require new technology, skilled data analysts and a significant amount of time.

No Priorities Assigned

Companies tend to focus only on data that is required to complete a specific task at hand and ignore the potential of the data to do more things. For example, online credit card applications may be analyzed only for applicant data, financial profile and related information. But companies tend to ignore the fact that they could also analyze website usability data, for example.

Isolated Data Processing Policies

Departments in companies may follow their own data collection, storage and processing policies, with no similarity to or coordination with those of other departments. This can lead to security problems and to the loss of data that is relevant to other departments. Because of these silos, the organization misses out on sound, company-wide data policies.

Technology Limitations

This reason is related to the lack of investment. If data is collected by technologies that do not interact with one another, the organization cannot create a comprehensive data policy. Many organizations with outdated technologies struggle to integrate the data collected from different sources such as call center chat transcripts, website click data and video conference data. Processing and integrating different formats requires appropriate technology.
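To make the integration problem concrete, here is a minimal sketch of mapping two differently shaped sources onto one common record type. The field names ("user", "ts", "message", "uid", "epoch", "url") and the Interaction type are hypothetical, chosen only for illustration; real exports will differ.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Interaction:
    """Common shape that every source is converted into (hypothetical schema)."""
    source: str
    customer_id: str
    timestamp: datetime
    text: str


def from_chat(row: dict) -> Interaction:
    # Assumed chat export: ISO-8601 timestamp with a UTC offset.
    return Interaction("chat", row["user"],
                       datetime.fromisoformat(row["ts"]), row["message"])


def from_clickstream(row: dict) -> Interaction:
    # Assumed click log: Unix epoch seconds, interpreted as UTC.
    return Interaction("web", row["uid"],
                       datetime.fromtimestamp(row["epoch"], tz=timezone.utc),
                       row["url"])


def merge_timeline(chats: list, clicks: list) -> list:
    """Combine both sources into one list ordered by customer, then time."""
    records = [from_chat(r) for r in chats] + [from_clickstream(r) for r in clicks]
    return sorted(records, key=lambda r: (r.customer_id, r.timestamp))


chats = [{"user": "c1", "ts": "2024-01-05T10:00:00+00:00", "message": "help"}]
clicks = [{"uid": "c1", "epoch": 1704445200, "url": "/pricing"}]
timeline = merge_timeline(chats, clicks)
```

Once every source is converted to the same shape, a single customer's page visits and chat messages can be read in order, which is not possible while each system keeps its own format.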

Dark Data Potential

It does not take a genius to understand that if 90% of big data is dark data, it is potentially a land of undiscovered, neglected opportunities. As the reasons above point out, companies fail to utilize dark data not because it offers little value, but because of the companies’ own limitations. Dark data, then, has a lot of potential. Let us try to understand this potential with the help of the manufacturing sector.

According to a Frost & Sullivan study, “the Internet of things, Internet of services, big data and integrated industry will leave a decisive impact across all sections of the manufacturing value chain.” The manufacturing sector can get value from its dark data in the following ways:

Predict Demand and Solve Issues

By analyzing customer clickstream data and product telematics, companies can forecast demand accurately and respond by optimizing the supply of goods. Companies can also isolate and solve issues with the help of dark data generated by sensors and telematics.

Build a Smarter Supply Chain

To know the timing and volume of demand accurately and respond to it appropriately, companies need a smart and robust supply chain. One way to build one is to have granular information about the individual components of the supply chain. Granular information enables companies to achieve quality as well as just-in-time delivery, and much of that granular information sits in dark data.

Improving Product Quality With Customer Feedback

In these changing times, a customer is no longer someone who just consumes the products. In a sense, a customer is a brand ambassador who can promote the product through word of mouth, referrals and social media. It is extremely important for the product management, design and engineering teams to leverage customer feedback and improve product quality. Dark data can help manufacturing companies by providing a 360-degree view of the product and how it is viewed in the market. So what can the company do?

  • Have a well-designed analytics framework that leverages dark data and provides access to the framework to all stakeholders.
  • Reduce unplanned, unforeseen product development downtime with the help of sensor data and telematics which can anticipate malfunctions or product failures.
  • Integrate telematics with social media so that customer feedback can be captured in real time and the data is transmitted to the concerned department.
  • Use data to improve product features in an agile way.
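As a rough illustration of the sensor-data point above, the sketch below flags readings that deviate sharply from the recent average. It is a simple deviation check, not a predictive maintenance model, and the function name, window size, threshold and temperature values are all assumptions made for the example.

```python
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return indexes of readings that sit more than `threshold` standard
    deviations away from the mean of the preceding `window` readings."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip perfectly flat histories, where sigma is zero.
        if sigma > 0 and abs(readings[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


# Hypothetical temperature readings from one machine sensor.
temperatures = [70.1, 70.3, 69.9, 70.2, 70.0, 70.1, 85.0, 70.2]
suspect = flag_anomalies(temperatures)
```

Here the reading of 85.0 at index 6 is flagged because it lies far outside the range of the five readings before it. A production system would need to account for seasonality, sensor drift and the spike's own effect on later windows.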


The potential of dark data is undoubted. But companies also need to keep in mind the risks associated with indefinite storage and poor handling of dark data. Dark data can contain sensitive information, and any inadvertent or deliberate leak could mean trouble. Companies need good data tagging and structuring technologies so that data is identified and categorized. This is necessary even if they do not intend to analyze the data for their business. Otherwise, financial, regulatory and legal troubles, along with a loss of competitive advantage, could soon follow.
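A minimal sketch of what tagging stored text for sensitive content might look like is shown below. The two patterns and the tag names are illustrative assumptions only; they are deliberately simple and would miss many real cases, so they are no substitute for a proper data classification tool.

```python
import re

# Illustrative patterns only: a loose email matcher and a run of
# 13 to 16 digits (optionally separated by spaces or hyphens).
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "card_number": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def tag_record(text: str) -> set:
    """Return the set of sensitivity tags whose pattern appears in the text."""
    return {label for label, pattern in PATTERNS.items() if pattern.search(text)}


tags = tag_record("Contact jane@example.com, card 4111 1111 1111 1111")
```

Records that come back with a non-empty tag set could then be routed to restricted storage or a retention review, even if nobody plans to analyze them.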


Kaushik Pal
Technology writer

Kaushik is a technical architect and software consultant with over 23 years of experience in software analysis, development, architecture, design, testing and training. He has an interest in new technologies and areas of innovation. He focuses on web architecture, web technologies, Java/J2EE, open source software, WebRTC, big data and semantic technologies. He has demonstrated expertise in requirements analysis, architectural design and implementation, technical use cases and software development. His experience has covered various industries such as insurance, banking, airlines, shipping, document management and product development, etc. He has worked on a wide range of technologies ranging from large scale (IBM…