10 Big Data Do’s and Don’ts

Big data is a new and emerging domain for most companies. Making it work takes careful fine-tuning and use of best practices.

Big data is used across multiple business domains as data analytics, artificial intelligence and machine learning become part of the mainstream. Organizations now hold a wealth of data, which can be structured, unstructured or semi-structured, and big data analytics is how they extract real value from it.

Digital channels such as social media, websites and connected sensors have given rise to many new opportunities to collect data about customer behavior. Here are some examples:

  • Clickstream data comes from website interactions such as mouse clicks and webpage scrolling.
  • Social business sites are online communities of customers who are willing to share information about their buying behavior.
  • Sensors provide data about customers’ physical environments, such as temperature, humidity, and traffic patterns.

The insights gained from data analytics can help organizations in their decision-making process. But the real benefit of big data is achieved only if it is managed properly. Organizations can avoid getting lost in the big data space by starting with simple use cases and implementing them quickly to check the output.

The first step before starting any big data initiative is proper planning. An organization must clearly know the purpose of the project. It should also identify what value it wants to extract and how that value will affect business decisions. The most promising area should be chosen as the starting point.

In this article, we explore some of the do’s and don’ts of big data initiatives.

1. Do know the purpose and the starting point

Knowing the purpose of data collection and identifying the starting point are crucial to the success of any big data project. To begin with, the objective should be to identify the most promising use cases for the business. This helps the organization identify the components those use cases require.

After this, the organization should plan how to apply big data techniques to these use cases and extract valuable insights for business growth. The priority of execution should depend on factors such as:

  • Cost of implementation.
  • Anticipated impact on the business.
  • Length of time required to launch.
  • Speed of implementation.
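One way to turn these factors into a priority order is a simple weighted score. The sketch below is a hypothetical illustration, not a standard method: the `UseCase` fields, the 0 to 1 normalization and the weights are all assumptions, and the last two factors (length of time to launch and speed of implementation) are folded into a single `time_to_launch` score.

```python
from dataclasses import dataclass

@dataclass
class UseCase:
    name: str
    cost: float            # normalized implementation cost, 0-1 (lower is better)
    impact: float          # anticipated business impact, 0-1 (higher is better)
    time_to_launch: float  # normalized time/effort to launch, 0-1 (lower is better)

# Hypothetical weights; each organization would tune its own.
WEIGHTS = {"impact": 0.5, "cost": 0.3, "time_to_launch": 0.2}

def priority_score(uc: UseCase) -> float:
    """Weighted score in [0, 1]; cost and time are inverted so higher is always better."""
    return (WEIGHTS["impact"] * uc.impact
            + WEIGHTS["cost"] * (1 - uc.cost)
            + WEIGHTS["time_to_launch"] * (1 - uc.time_to_launch))

def rank(use_cases: list[UseCase]) -> list[UseCase]:
    """Return use cases ordered from most to least promising pilot candidate."""
    return sorted(use_cases, key=priority_score, reverse=True)

candidates = [
    UseCase("churn prediction", cost=0.4, impact=0.8, time_to_launch=0.3),
    UseCase("full data lake migration", cost=0.9, impact=0.9, time_to_launch=0.9),
    UseCase("clickstream dashboard", cost=0.2, impact=0.5, time_to_launch=0.1),
]

for uc in rank(candidates):
    print(f"{uc.name}: {priority_score(uc):.2f}")
```

With these made-up inputs, the churn pilot ranks first and the full data lake migration ranks last despite having the highest impact, because its cost and time to launch drag the score down.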

Organizations should always start with a simple, easy-to-implement application as a pilot project.

2. Do evaluate data licenses properly

Data is the fuel for any big data and analytics project, so it is very important to protect it from misuse. Proper licensing terms and conditions should be in place before granting data access to any vendor or third-party user. At a minimum, the data license should clearly state the following points, although a license agreement will usually cover many other critical parameters as well:

  • Who is going to use the data?
  • What data will be accessible?
  • How will the data be used?
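The three questions above map naturally onto a small access check. The following is a hypothetical sketch, assuming a license can be reduced to a licensee, a set of permitted datasets and a set of permitted purposes; a real agreement carries far more terms, and `DataLicense` and `is_access_allowed` are illustrative names, not part of any library.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataLicense:
    licensee: str             # who is going to use the data
    datasets: frozenset[str]  # what data will be accessible
    purposes: frozenset[str]  # how the data will be used

def is_access_allowed(lic: DataLicense, user: str, dataset: str, purpose: str) -> bool:
    """Grant access only when all three license conditions hold (deny by default)."""
    return user == lic.licensee and dataset in lic.datasets and purpose in lic.purposes

lic = DataLicense(
    licensee="acme-analytics",
    datasets=frozenset({"sales_2023"}),
    purposes=frozenset({"demand_forecasting"}),
)

print(is_access_allowed(lic, "acme-analytics", "sales_2023", "demand_forecasting"))   # True
print(is_access_allowed(lic, "acme-analytics", "customer_pii", "demand_forecasting"))  # False
print(is_access_allowed(lic, "acme-analytics", "sales_2023", "ad_targeting"))         # False
```

Keeping the default as deny means anything the license does not explicitly mention is refused.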

Any failure in licensing can lead to data loss and misuse, which will undeniably harm the business.

3. Do allow data democratization

Data democratization can be defined as a continuous process in which everyone in an organization is able to access the data. People across the organization should be comfortable working with the data and confident in expressing their opinions about it.

Data democratization helps organizations become more agile and make data-informed business decisions. It can be achieved by establishing a proper process. First, the data should be accessible at all layers, irrespective of organizational structure. Second, a single source of truth (referred to as “the Golden Source”) should be established after validating the data. Third, everyone should be allowed to check the data and give their input. Fourth, new ideas can be tested by taking calculated risks. If a new idea succeeds, the organization can move forward; otherwise, it can be treated as a lesson learned.
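The second step, establishing a validated single source of truth, is the most technical part of this process. Below is a minimal sketch, assuming customer records arrive from several hypothetical systems (`crm`, `web`, `billing`), that a record is valid when it has an ID and a plausible email, and that the most recently updated valid record wins. Real master data management tools use far richer matching and survivorship rules.

```python
from datetime import date

def is_valid(rec: dict) -> bool:
    """Hypothetical validation rules: an ID is present and the email looks plausible."""
    return bool(rec.get("customer_id")) and "@" in (rec.get("email") or "")

def build_golden_source(records: list[dict]) -> dict[str, dict]:
    """Keep one validated record per customer: the most recently updated one."""
    golden: dict[str, dict] = {}
    for rec in records:
        if not is_valid(rec):
            continue  # invalid records are excluded from the golden source
        current = golden.get(rec["customer_id"])
        if current is None or rec["updated"] > current["updated"]:
            golden[rec["customer_id"]] = rec
    return golden

records = [
    {"customer_id": "c1", "email": "ann@example.com", "updated": date(2023, 1, 5), "source": "crm"},
    {"customer_id": "c1", "email": "ann.new@example.com", "updated": date(2023, 6, 1), "source": "web"},
    {"customer_id": "c2", "email": "not-an-email", "updated": date(2023, 3, 2), "source": "billing"},
]

golden = build_golden_source(records)
print({cid: rec["source"] for cid, rec in golden.items()})  # {'c1': 'web'}
```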

4. Do build a collaborative culture

In big data, collaboration among different departments and groups in an organization is very important. A big data initiative can succeed only when a collaborative culture is built across all layers of the organization, irrespective of people’s roles and responsibilities.

The management of an organization should have a clear vision for the future and must encourage new ideas. All employees and departments should be free to find opportunities and build proofs of concept to validate them. Office politics and blame should not be allowed to stop the effort. Big data is always a learning process, and successes and failures must be accepted equally as part of it.

5. Do evaluate big data infrastructure

The infrastructure of any big data project is equally important. Data volumes can run into petabytes, all of which must be processed to extract insights. Because of this, both the storage and the processing infrastructure have to be evaluated properly.

Data centers used for storage must be evaluated in terms of cost, management, backup, reliability, security, scalability and many other factors. (Read also: 6 Key Public Cloud Risks.)
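Storage evaluation usually starts with a rough capacity estimate. The sketch below is a back-of-the-envelope calculation, not a sizing tool: it assumes compound annual data growth, a replication factor of 3 (the HDFS default) and 25% free-space headroom, and every input figure is hypothetical.

```python
def required_storage_tb(raw_tb: float, annual_growth: float, years: int,
                        replication: int = 3, headroom: float = 0.25) -> float:
    """Estimate raw disk capacity needed after `years` of growth.

    required = raw * (1 + growth) ** years * replication / (1 - headroom)
    replication: copies kept of each block (3 is the HDFS default).
    headroom: fraction of capacity kept free for temporary files and spikes.
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    future_data_tb = raw_tb * (1 + annual_growth) ** years
    return future_data_tb * replication / (1 - headroom)

# Hypothetical inputs: 100 TB today, growing 40% per year, planned for 2 years.
print(f"{required_storage_tb(100, 0.40, 2):.0f} TB")  # 784 TB
```

Even with these modest made-up inputs, 100 TB of data calls for close to 0.8 PB of raw disk once growth, replication and headroom are included, which is why cost and scalability weigh so heavily in the evaluation.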

Similarly, the processing infrastructure and related technology have to be checked carefully before finalizing any deal. Cloud services are generally very flexible in terms of usage and cost. Established cloud vendors include heavy hitters such as Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform (GCP), but there are many more on the market as well.

6. Don’t get lost in the sea of data

Good data governance is very important for the success of big data projects, and a data collection strategy should be planned before implementation. There is a common tendency to collect every piece of a business’s legacy data, but not all of that data may fit current business scenarios. So it is important to identify the business use cases first and determine where the data will be applied.

Once the data strategy is well defined and directly connected to the target business application, the next step of implementation can be planned. After this, new data can be added to improve the model and its efficiency.
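One lightweight way to keep collection tied to defined use cases is to record which fields each use case needs and collect only those. The mapping and field names below are hypothetical; the point is that adding a new use case, not habit, is what expands what gets collected.

```python
# Hypothetical mapping from approved business use cases to the fields they need.
USE_CASE_FIELDS: dict[str, set[str]] = {
    "churn_prediction": {"customer_id", "tenure_months", "support_tickets"},
    "demand_forecast": {"sku", "date", "units_sold"},
}

def fields_to_collect(active_use_cases: list[str]) -> set[str]:
    """Union of the fields required by the use cases currently in scope."""
    needed: set[str] = set()
    for use_case in active_use_cases:
        needed |= USE_CASE_FIELDS[use_case]
    return needed

def project(record: dict, needed: set[str]) -> dict:
    """Drop every field that no defined use case asks for."""
    return {key: value for key, value in record.items() if key in needed}

legacy_record = {"customer_id": "c1", "tenure_months": 14,
                 "support_tickets": 2, "fax_number": "555-0100"}
needed = fields_to_collect(["churn_prediction"])
print(project(legacy_record, needed))
# {'customer_id': 'c1', 'tenure_months': 14, 'support_tickets': 2}
```

Augmenting the model later then means adding fields to a use case’s entry and collecting them from that point on.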

7. Don’t forget about open source

The usefulness of the technology you are considering should be evaluated based on the size of the project and the organizational budget. Many open source platforms are available for free to run pilot projects, and small and mid-size organizations can explore these solutions to start their big data journey. Either way, the organizational focus should be on output and ROI.

Hadoop is an open source software framework that uses HDFS (the Hadoop Distributed File System) and MapReduce to analyze big data on clusters of commodity hardware, that is, in a distributed computing environment. (Read: How can I use Hadoop to analyze big data?)

The big data movement has matured to the point where Hadoop has become a de facto standard for processing big data. MapReduce is a programming model for distributing data and processing it in parallel across a cluster of computers. It was developed by Google to process large amounts of data efficiently on large clusters of computers.
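The classic way to illustrate the MapReduce model is counting words. The sketch below runs the map, shuffle and reduce phases in a single Python process purely to show the data flow; it is not Hadoop’s API. In a real Hadoop job, map and reduce tasks run in parallel on different nodes, and the framework performs the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines: list[str]) -> dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

lines = ["Big data needs big clusters", "data beats opinions"]
print(word_count(lines))
# {'big': 2, 'data': 2, 'needs': 1, 'clusters': 1, 'beats': 1, 'opinions': 1}
```

Because each map call depends only on its own line and each reduce call only on its own key, both phases can be split across machines, which is what lets the model scale out on a cluster.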

8. Don’t start without proper planning

Starting all your big data projects in one go is a very dangerous trend, as it will likely lead to only partial success or outright failure. Organizations should plan properly before starting their big data initiatives rather than going all in or taking a leap of faith. It is always recommended to start with a simple, small and measurable application.

Once the pilot is successful, the approach can be scaled up to larger applications. The key is to take the time to develop a plan and to select the pilot project carefully.
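“Measurable” is easiest to enforce when the success criteria are written down before the pilot starts. A hypothetical go/no-go check might look like this; the KPI names and targets are assumptions for illustration only.

```python
# Hypothetical KPI targets agreed before the pilot starts.
TARGETS = {"forecast_accuracy": 0.85, "report_latency_hours": 4.0}
# KPIs where a lower measured value is better.
LOWER_IS_BETTER = {"report_latency_hours"}

def meets_target(kpi: str, value: float) -> bool:
    target = TARGETS[kpi]
    return value <= target if kpi in LOWER_IS_BETTER else value >= target

def failed_kpis(measured: dict[str, float]) -> list[str]:
    """Return the KPIs that missed their target; an empty list means go."""
    return [kpi for kpi in TARGETS if not meets_target(kpi, measured[kpi])]

pilot_results = {"forecast_accuracy": 0.88, "report_latency_hours": 6.0}
failures = failed_kpis(pilot_results)
print("scale up" if not failures else f"hold: {failures}")
# hold: ['report_latency_hours']
```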

9. Don’t neglect security

Data security is another important aspect of big data projects. In a typical big data scenario, petabytes of data are pulled from different source systems and then processed. The processed data is the input to the analytical model, and the output of the analytics is valuable insight for the business. Once raw data has been refined and meaningful information has been mined from it, the Confidentiality, Integrity and Availability (CIA) of that information become critical.

When data contains critical business information, it becomes valuable to the organization, so it must be secured against external threats. Data security must be planned as part of the big data implementation life cycle. (Read also: Cloud Security: 5 Common Cyber-Risks.)
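Of the three CIA properties, integrity is the simplest to illustrate in a few lines. The sketch below uses Python’s standard `hmac` and `hashlib` modules to sign an analytics result so that tampering can be detected; the key and the report fields are hypothetical, and confidentiality (encryption) and availability (replication, backups) need separate controls.

```python
import hashlib
import hmac
import json

# Hypothetical key for illustration only; in practice, load it from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"

def sign(insight: dict) -> str:
    """Compute an HMAC-SHA256 tag over a canonical JSON encoding of the result."""
    payload = json.dumps(insight, sort_keys=True).encode("utf-8")
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()

def verify(insight: dict, tag: str) -> bool:
    """Compare tags in constant time to avoid leaking information through timing."""
    return hmac.compare_digest(sign(insight), tag)

report = {"segment": "premium", "churn_rate": 0.07}
tag = sign(report)

print(verify(report, tag))                         # True
print(verify(dict(report, churn_rate=0.02), tag))  # False: the result was altered
```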

10. Don’t focus on isolated business units

In today’s complex business landscape, focusing on a single business unit is not going to help. Organizations should take a top-level view of the business as a whole and think from a global perspective. The best approach is to take small steps at a time while keeping that global view. A holistic focus across business units will have a more positive impact and deliver better ROI.

There is no single path to success in big data implementation. Rather, success comes from a combination of planning, strategy, approach and various other factors.

Each organization has a specific goal to achieve, so the strategy should be planned accordingly, the pilot project must be chosen with care, and the resulting information must be protected and treated properly.


Kaushik Pal
Technology writer

Kaushik is a technical architect and software consultant with over 23 years of experience in software analysis, development, architecture, design, testing and training. He has an interest in new technologies and areas of innovation. He focuses on web architecture, web technologies, Java/J2EE, open source software, WebRTC, big data and semantic technologies. He has demonstrated expertise in requirements analysis, architectural design and implementation, technical use cases and software development. His experience has covered various industries, such as insurance, banking, airlines, shipping, document management and product development. He has worked on a wide range of technologies ranging from large scale (IBM…