
What Technologies Can Counter Big Data Security Threats?


Big data security should be considered seriously, and proper measures must be taken to prevent a potentially disastrous data breach.

Big data is one of the most lucrative opportunities ever presented to businesses. Enormous volumes of varied data offer insights into the consumer, which is pure gold for business. Every day, approximately 2.5 quintillion bytes of data are being created. Ninety percent of the data that exists today has been created in the last two years alone.

Corporations can use this data to provide highly customized products and services to customers. From a marketing perspective, this is mutually beneficial: customers enjoy tailored, better-quality products and services, while corporations increase their revenues and earn customer loyalty. But this rapidly compounding data also needs to be viewed from the perspective of security, because big data is a hugely lucrative opportunity for cybercriminals as well. Corporations, especially bigger ones, maintain gigantic data sets, and breaching even one of them can be highly rewarding for an attacker and a major setback for the organization. The Target data breach of late 2013 reportedly cost the company more than $1.1 billion, and the PlayStation breach of 2011 cost Sony more than $171 million.

Protecting big data is not the same as protecting traditional data, so organizations need to wake up quickly to the need to face big data security threats head-on. A breach in a big data environment can also be a very different experience from a breach of a traditional database. Corporations first need to distinguish between the ways data is protected in traditional and big data environments. Because big data security threats present an entirely different challenge, they need a different approach altogether.

Reasons Big Data Security Threats Should be Viewed Differently

The way big data security is managed needs a paradigm shift because big data is different from traditional data. In a sense, traditional data is easier to protect, both because of its nature and because attackers are currently more focused on big data. Big data is complex and large in volume, so managing its security requires a multi-faceted strategy that can continually evolve. Big data security is still at a nascent stage. Here are a few reasons big data security should be managed differently.

Multiple Data Sources

Big data in an organization typically contains data from different sources. Each data source may have its own access policies and security restrictions, so organizations struggle to maintain a consistent and balanced security policy across all of them. Organizations also have to aggregate data and extract its meaning. For example, big data in an organization may include data sets containing personally identifiable information, research information and regulatory compliance data. What security policy should apply if a data scientist tries to correlate one data set with another? Additionally, because big data environments collect data from multiple sources, they present a bigger target for attackers.

Infrastructure Challenges

Big data environments are typically distributed, and that creates a big challenge. Distributed environments are more complex and vulnerable to attacks compared to a single high-end database server. When big data environments are spread across geographies, there needs to be a single, consistent security and configuration policy, but that is much easier said than done. When there are a large number of servers, there is a possibility that the configurations across servers may not be consistent. This can leave the system vulnerable.
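
As a rough illustration of how inconsistent configurations might be detected, the sketch below fingerprints each server's configuration text and reports the servers that differ from the most common version. The server names, the configuration keys and the "most common version is the baseline" rule are all assumptions made for this example; a real deployment would more likely compare against an approved reference configuration.

```python
import hashlib
from collections import Counter


def fingerprint(config_text: str) -> str:
    """Return a SHA-256 digest of a configuration file's contents."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()


def find_drifted_servers(configs: dict) -> list:
    """Return the servers whose configuration differs from the most common one.

    `configs` maps a (hypothetical) server name to the text of its config file.
    """
    if not configs:
        return []
    digests = {server: fingerprint(text) for server, text in configs.items()}
    # Assumption: the most common fingerprint is treated as the baseline.
    baseline, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(server for server, digest in digests.items() if digest != baseline)


if __name__ == "__main__":
    # Hypothetical cluster spread across three regions.
    cluster = {
        "node-us-1": "encrypt_in_transit=true\nauth=kerberos\n",
        "node-eu-1": "encrypt_in_transit=true\nauth=kerberos\n",
        "node-ap-1": "encrypt_in_transit=false\nauth=kerberos\n",
    }
    print(find_drifted_servers(cluster))  # prints ['node-ap-1']
```

Comparing digests rather than full files keeps the check cheap even when the cluster has a large number of servers.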


Technology Not Secure

Big data tools such as Hadoop and NoSQL databases were not designed with security in mind. For example, many NoSQL databases, unlike traditional databases, did not originally provide role-based access control, which can make unauthorized attempts to access data a bit easier. Hadoop originally did not authenticate its users or servers and did not encrypt the data transmitted between the nodes in a data environment. Obviously, this could turn into a massive security vulnerability. Corporations love NoSQL because it allows new data types to be added on the fly and is viewed as a flexible data analysis tool, but it is not easy to define security policies with either Hadoop or NoSQL.

Big Data Security Strategies

Keep in mind that security strategies for big data must evolve constantly, because the nature and intensity of threats will keep changing, usually for the worse. Still, there are certain basic measures that you can take.

Security for Application Software

As mentioned earlier, big data software tools were not originally designed with security in mind. Therefore, you should use secure versions of open-source software. Examples of more secure open-source technologies include the 0.20.20x releases of Hadoop and Apache Accumulo. You can also obtain application-layer security with the help of technologies like DataStax Enterprise and Cloudera Sentry. Accumulo and Sentry both provide role-based access control features for the NoSQL database.

Tracking and Monitoring Accounts

Organizations must have robust big data account policies. Such policies should, to start with, require users to have strong passwords and change passwords often. Inactive accounts should be deactivated after a specified time period and there should be a specified limit of failed attempts at accessing an account, after which the account will be blocked. It is important to note that attacks may not always come from outside; account monitoring will help reduce the possibility of attacks from inside the organization.
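
The lockout and deactivation rules described above can be sketched roughly as follows. The thresholds (five failed attempts, 90 days of inactivity) and the `Account` structure are hypothetical values chosen for illustration, not recommendations from any particular standard or product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical policy values; choose limits that suit your organization.
MAX_FAILED_ATTEMPTS = 5
INACTIVITY_LIMIT = timedelta(days=90)


@dataclass
class Account:
    username: str
    last_login: datetime
    failed_attempts: int = 0
    locked: bool = False
    active: bool = True


def record_login(account: Account, success: bool, now: datetime) -> None:
    """Update the account after a login attempt, blocking it at the limit."""
    if account.locked or not account.active:
        return  # blocked or deactivated accounts need an administrator
    if success:
        account.failed_attempts = 0
        account.last_login = now
    else:
        account.failed_attempts += 1
        if account.failed_attempts >= MAX_FAILED_ATTEMPTS:
            account.locked = True


def deactivate_if_inactive(account: Account, now: datetime) -> None:
    """Deactivate an account that has not logged in within the limit."""
    if now - account.last_login > INACTIVITY_LIMIT:
        account.active = False


if __name__ == "__main__":
    now = datetime(2024, 1, 1)
    user = Account("analyst1", last_login=now)
    for _ in range(MAX_FAILED_ATTEMPTS):
        record_login(user, success=False, now=now)
    print(user.locked)  # prints True
```

Because the same rules apply to every account, internal users are held to the same limits as anyone attempting access from outside.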

Secure Hardware and Software Configurations

The big data architecture in your organization must feature secure images for all servers. Patches should be uniformly and consistently applied to all servers. Administrative privileges should be given to a limited number of people. To automate system configuration and ensure that all big data servers in the enterprise are uniformly secure, you may use automation frameworks such as Puppet.

Monitor and Analyze Audit Logs

It is extremely important to understand and monitor big data clusters. To do that, you need to implement audit logging technologies. Big data clusters need to be analyzed and the logs need to be carefully and regularly examined.
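
To give a sense of what regular examination of audit logs might look like, here is a minimal sketch that counts denied access attempts per user and flags users who reach a threshold. The log line format (`user=... status=...`) and the threshold of three are assumptions made for this example; real audit logs from a big data cluster will have their own formats.

```python
import re
from collections import Counter

# Hypothetical audit log format:
# 2024-01-01T10:00:00 user=alice action=read resource=/pii status=DENIED
LINE_PATTERN = re.compile(r"user=(?P<user>\S+)\s+.*?status=(?P<status>\S+)")


def denied_counts(lines):
    """Count DENIED entries per user, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = LINE_PATTERN.search(line)
        if match and match.group("status") == "DENIED":
            counts[match.group("user")] += 1
    return counts


def suspicious_users(lines, threshold=3):
    """Return users with at least `threshold` denied attempts, sorted by name."""
    return sorted(user for user, n in denied_counts(lines).items() if n >= threshold)


if __name__ == "__main__":
    log = [
        "2024-01-01T10:00:00 user=alice action=read resource=/pii status=DENIED",
        "2024-01-01T10:00:05 user=alice action=read resource=/pii status=DENIED",
        "2024-01-01T10:00:09 user=alice action=read resource=/pii status=DENIED",
        "2024-01-01T10:01:00 user=bob action=read resource=/sales status=OK",
    ]
    print(suspicious_users(log))  # prints ['alice']
```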

Protect Data

Data needs an all-around protection strategy. First, identify the sensitive data that requires encryption and integrity controls. Then deploy approved encryption software for all hard drives and systems that hold sensitive data. If data is held in the cloud, conduct regular reviews of the security practices followed by the cloud provider. You should also deploy automated tools on all network perimeters to monitor for confidential information, such as specific keywords and personally identifiable information, so that you can identify unauthorized attempts to access data. Finally, have automated scans run periodically on all servers to make sure that no sensitive information is present in clear text.
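
A periodic scan for sensitive information stored in clear text could, in its simplest form, look something like the sketch below. The two patterns (email addresses and US Social Security number-style strings) are illustrative assumptions only; a real scanner would need patterns matched to the data your organization actually holds, and would read files rather than an in-memory string.

```python
import re

# Illustrative patterns only; extend these to match your own sensitive data.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_text(text):
    """Return (line_number, pattern_name) for each line that matches a pattern."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings


if __name__ == "__main__":
    sample = (
        "id=1 name=test\n"
        "contact=jane@example.com\n"
        "ssn=123-45-6789\n"
        "blob=9f2c\n"
    )
    print(scan_text(sample))  # prints [(2, 'email'), (3, 'us_ssn')]
```

Reporting line numbers rather than the matched values avoids copying the sensitive data into the scan report itself.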

Respond to Incidents Quickly and Appropriately

Even the best defense can sometimes be breached, so you must have an incident response policy in place. Incident responses must be documented and should be easily accessible to relevant people. The policy should clearly define incidents and their seriousness levels and specify personnel to deal with each level. The incident response policy should be made available to all employees, and each employee should be responsible for immediately reporting any incident that falls under the purview of the policy. In fact, it is a good idea to formally train all employees on the incident response policy. The policy should be periodically reviewed and updated.
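
One way to picture a policy that defines seriousness levels and assigns personnel to each is the small sketch below. The three levels, the classification rule and the responder roles are all hypothetical; an actual policy would define its own levels and contacts.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical mapping of seriousness levels to the roles that respond.
RESPONDERS = {
    Severity.LOW: ["service-desk"],
    Severity.MEDIUM: ["security-analyst"],
    Severity.HIGH: ["security-lead", "legal", "executive-sponsor"],
}


def classify(records_exposed: int, sensitive: bool) -> Severity:
    """Assign a level using a deliberately simple, illustrative rule."""
    if records_exposed > 0 and sensitive:
        return Severity.HIGH
    if records_exposed > 0:
        return Severity.MEDIUM
    return Severity.LOW


def responders_for(severity: Severity) -> list:
    """Look up who deals with incidents of the given level."""
    return RESPONDERS[severity]


if __name__ == "__main__":
    level = classify(records_exposed=1200, sensitive=True)
    print(level.name, responders_for(level))
```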


Big data security should be considered seriously, and proper measures must be taken to prevent a potentially disastrous data breach. Big data can mean big opportunities, but at the same time the security challenges should be handled with efficient tools and policies. These tools help protect the data as well as the applications, giving you peace of mind.



Kaushik Pal
Technology writer

Kaushik is a technical architect and software consultant with over 23 years of experience in software analysis, development, architecture, design, testing and training. He has an interest in new technologies and areas of innovation. He focuses on web architecture, web technologies, Java/J2EE, open source software, WebRTC, big data and semantic technologies. He has demonstrated expertise in requirements analysis, architectural design and implementation, technical use cases and software development. His experience has covered various industries such as insurance, banking, airlines, shipping, document management and product development, etc. He has worked on a wide range of technologies ranging from large scale (IBM…