Big data has become an essential part of business decision making. It offers companies and business leaders significant insight, but it also raises many challenges that traditional systems cannot handle. These challenges must be understood in detail before big data is implemented in an organization.
According to the McKinsey Global Institute (MGI): "Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze." The challenges that follow from this need to be addressed properly. The value obtained from analyzing big data can be summarized as:
- Better performance and variability
- Replacing human decisions with automated algorithms
- Segmenting customers
Let's start with big data's strategic challenges. Big data confronts organizations with three major strategic and operational challenges: information strategy, data analytics and enterprise information management (EIM).
Big data helps organizations take advantage of the growth in information sources. Data analysts need to make strategic decisions that help grow the company’s business. This involves three key aspects:
- Strategy — If a company is preparing for a transformation based on information, it becomes the job of the analyst to tell the business which investment would deliver the best return.
- Governance — Managing data itself is a big task regardless of its size. Most organizations rely on their historical data as a reference to help guide their decisions.
- Acquiring the desired talent — Given the increasing demand for big data and data analysis, it can be tough to get a data scientist who can really help the organization to get the desired results.
The entire IT industry is under pressure, as it has to manage a volume of data that grows day by day in order to help improve business. Data analysis can be divided into three categories:
- Predictive analysis — It is the job of the data scientist to use real-time data for predictive analysis across various domains. It is also important during this analysis to leverage new data types, such as emotional, video stream, image and text data.
- Behavioral analysis — Behavioral data is important for improving customer satisfaction. The job of the data scientist is to tap into data sets which are complex in nature to create new business models that help in cost reduction and promote innovation in order to improve customer satisfaction.
- Data interpretation — Data analysts must provide management with new business analysis information and integrate it into product innovation.
Enterprise Information Management (EIM)
Managing data at the enterprise level is certainly a challenge, especially when there are so many sources (including enterprise data centers, employee data, customer data, and commercial and social media data) that must all be integrated and used to their full value. EIM challenges fall into three areas:
- User expectation — It is important to meet user expectations when providing access to big data. Access should be planned properly before it begins.
- Costs — It is important for the data analyst to provide access to big data in a rapid and cost-effective way in order to establish a better decision-making system.
- Tools — It is a data scientist's responsibility to identify the processes, tools and technologies which are required to support the big data analysis of any organization.
As "data" is the key word in big data, one must understand the challenges involved with the data itself in detail. Let's examine the challenges one by one.
- Volume — The larger the volume of data, the higher the risk and difficulty associated with it in terms of its management.
- Variety — Handling and managing different types of data, their formats and sources is a big challenge.
- Velocity — One of the major challenges is handling the flow of information as it is collected.
- Veracity — A data scientist must be prepared to handle data quality and data availability. One should be able to cope with imprecision, uncertainty, missing values, misstatements or untruths, and should be able to answer:
  - How good is the data?
  - How broad is the coverage?
  - How fine is the sampling resolution?
  - How timely are the readings?
  - How well understood are the sampling biases?
  - Is there any usable data available at all?
- Scalability — Scalability is an important factor for any application. According to Shilpa Lawande, VP of Engineering at analytics platform provider Vertica: "techniques like social graph analysis, for instance leveraging the influencers in a social network to create better user experience are hard problems to solve at scale. All of these problems combined create a perfect storm of challenges and opportunities to create faster, cheaper and better solutions for big data analytics than traditional approaches can solve." Scalability must therefore be addressed when implementing a big data strategy.
- Data discovery — Data discovery is a huge challenge. It can be really tough to extract high-quality data from the vast collections of data out there on the Web.
- Quality and relevance — Determining the quality of data sets and their relevance to a particular issue is a big challenge.
- Data comprehensiveness — Data scientists must ensure that all the relevant areas are covered while sampling. If the data isn’t complete, it may not be possible to perform a proper analysis.
- Personally identifiable information — The challenge is extracting enough information to help people without compromising their privacy.
- Data dogmatism — Big data analysis can offer quite remarkable insights, but one must not rely too much on the numbers. Domain experts and common sense must be consulted from time to time.
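Some of the veracity questions above (missing values, coverage, timeliness) can be turned into routine checks. The following is only a minimal sketch in Python. The record layout (`sensor`, `value`, `timestamp`), the function name `profile_readings` and the sample data are all hypothetical and chosen purely for illustration.

```python
from datetime import datetime, timedelta

def profile_readings(records, expected_sensors, now, max_age):
    """Summarize missing values, coverage and timeliness for a batch of readings.

    Assumes each record is a dict with hypothetical keys 'sensor', 'value'
    and 'timestamp', and that expected_sensors is non-empty.
    """
    total = len(records)
    if total == 0:
        # "Is there any usable data available at all?"
        return {"usable": False, "missing_ratio": None,
                "coverage": 0.0, "stale_ratio": None}
    # "How good is the data?" -> share of readings with no value
    missing = sum(1 for r in records if r.get("value") is None)
    # "How broad is the coverage?" -> share of expected sensors that reported a value
    seen = {r["sensor"] for r in records if r.get("value") is not None}
    # "How timely are the readings?" -> share of readings older than max_age
    stale = sum(1 for r in records if now - r["timestamp"] > max_age)
    return {
        "usable": missing < total,
        "missing_ratio": missing / total,
        "coverage": len(seen & set(expected_sensors)) / len(expected_sensors),
        "stale_ratio": stale / total,
    }

# Made-up sample batch: four readings from two of four expected sensors.
now = datetime(2024, 1, 1, 12, 0)
records = [
    {"sensor": "a", "value": 1.5, "timestamp": now - timedelta(minutes=5)},
    {"sensor": "a", "value": None, "timestamp": now - timedelta(minutes=4)},
    {"sensor": "b", "value": 2.0, "timestamp": now - timedelta(hours=3)},
    {"sensor": "b", "value": 2.5, "timestamp": now - timedelta(minutes=1)},
]
report = profile_readings(records, ["a", "b", "c", "d"], now, timedelta(hours=1))
print(report)
```

A report like this does not say whether sampling biases are understood. It only makes the measurable part of data quality visible, so that domain experts can judge the rest.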
Processing data is a major challenge, as it can take significant exploration to finalize the right model which will help to properly analyze the data. The key challenges include:
- Data capturing
- Aligning data from different sources
- Transforming the data into a form suitable for analysis
- Modeling the data with the help of mathematics and/or simulations
- Understanding the output and being able to explain it to end users
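The steps above can be sketched end to end on a toy example. This is only an illustration under stated assumptions: the two CSV snippets, the column names and the helper functions are hypothetical, and a simple least-squares line stands in for whatever model a real analysis would need.

```python
import csv
import io

# Hypothetical raw inputs standing in for two separate source systems.
ORDERS_CSV = "customer_id,orders\n1,2\n2,4\n3,6\n"
SPEND_CSV = "customer_id,spend\n1,25\n2,45\n3,65\n4,90\n"

def capture(text):
    """Capture: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def align(orders, spend):
    """Align: join both sources on customer_id, keeping only matching ids."""
    spend_by_id = {row["customer_id"]: row for row in spend}
    return [
        {**o, **spend_by_id[o["customer_id"]]}
        for o in orders
        if o["customer_id"] in spend_by_id
    ]

def transform(rows):
    """Transform: convert string fields into numeric (orders, spend) pairs."""
    return [(float(r["orders"]), float(r["spend"])) for r in rows]

def fit_line(points):
    """Model: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

points = transform(align(capture(ORDERS_CSV), capture(SPEND_CSV)))
slope, intercept = fit_line(points)

# Explain: state the result in terms an end user can act on.
print(f"Each additional order is associated with about {slope:.2f} more spend "
      f"(baseline {intercept:.2f}).")
```

Customer 4 appears in only one source and is dropped during alignment. This is exactly the kind of decision that should be explained to end users along with the model output.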
One major data management challenge is ensuring security, data privacy, governance and ethical standards. When dealing with customer data, one must abide by its intended usage and the relevant rules. Data must be tracked through its use, transformation and derivation, and its life cycle must be managed. The data must be secured and access to it controlled. At the same time, audits must be carried out at regular intervals to ensure data security, as most data warehouses store personal data, which raises potential legal and ethical concerns.
We have discussed different big data challenges and their impact on business. These challenges occur at all levels of implementation. So before implementing big data in any organization, one must address these challenges and plan for them.