What are some of the key issues to consider in a big data storage strategy?


One of the most commonly overlooked issues in big data storage is accessibility for the teams that need the data. Data is regularly stored with no documentation, in places that are hard to reach, or where the relevant teams have no idea that it exists at all. Big data storage should follow an open-first strategy: teams are made aware that the data exists, what it consists of, and how to access it, so that they can use it in their software when they need it.
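
As a rough illustration of the open-first idea, here is a minimal sketch of an in-memory dataset catalog. The names `DatasetEntry` and `Catalog`, the fields, and the example location are all hypothetical and chosen only for illustration; a real organization would more likely use a dedicated data catalog tool. The point is simply that a dataset cannot be registered without a description and a location, and that any team can search for it.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Hypothetical catalog record describing one stored dataset."""
    name: str
    description: str   # what the data consists of
    location: str      # how to reach it (path, table name, URL, ...)
    owner: str         # team to contact with questions
    tags: list[str] = field(default_factory=list)


class Catalog:
    """Hypothetical registry that makes datasets discoverable."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        # Refuse undocumented data: a description and a location are mandatory.
        if not entry.description.strip() or not entry.location.strip():
            raise ValueError("description and location are required")
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> list[DatasetEntry]:
        # Case-insensitive match against name, description, and tags.
        needle = keyword.lower()
        return [
            e for e in self._entries.values()
            if needle in e.name.lower()
            or needle in e.description.lower()
            or any(needle in t.lower() for t in e.tags)
        ]


catalog = Catalog()
catalog.register(DatasetEntry(
    name="web_clickstream",
    description="Raw page-view events from the public website",
    location="lake/raw/clickstream/",  # made-up location for the example
    owner="web-analytics",
    tags=["events", "raw"],
))
matches = catalog.search("page-view")
```

Requiring the description at registration time is one way to keep undocumented data from entering storage in the first place.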

Another critical issue I see is the quality of the data being stored. Data should be in the highest-quality form it can take by the time it reaches its final storage place. Storing low-quality data in a data lake is usually fine, but each stage further down the data pipeline should increase the quality of the data, so that it is stored in its highest-quality form in a system such as a data warehouse or analytics database. This in turn improves the quality of the systems that consume the data from its resting place.
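
To make the staged-quality idea concrete, here is a minimal sketch of a single pipeline step that promotes raw data-lake records into a cleaner form suitable for a warehouse. The record shape (`sensor_id`, `temperature_c`) and the function names `clean_record` and `promote` are assumptions invented for this example, and the validation rules are deliberately simple.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CleanReading:
    """Hypothetical validated record, as it might be loaded into a warehouse."""
    sensor_id: str
    temperature_c: float


def clean_record(raw: dict) -> Optional[CleanReading]:
    """Return a validated record, or None if the raw record cannot be used."""
    sensor_id = str(raw.get("sensor_id") or "").strip().lower()
    if not sensor_id:
        return None
    try:
        temperature = float(raw.get("temperature_c"))
    except (TypeError, ValueError):
        return None  # missing or non-numeric value
    if not math.isfinite(temperature):
        return None  # reject NaN and infinities
    return CleanReading(sensor_id=sensor_id, temperature_c=temperature)


def promote(raw_records: list[dict]) -> list[CleanReading]:
    """Keep records that pass validation, dropping exact duplicates."""
    seen: set[CleanReading] = set()
    cleaned: list[CleanReading] = []
    for raw in raw_records:
        record = clean_record(raw)
        if record is not None and record not in seen:
            seen.add(record)
            cleaned.append(record)
    return cleaned


raw_lake_records = [
    {"sensor_id": " A1 ", "temperature_c": "21.5"},  # messy but repairable
    {"sensor_id": "a1", "temperature_c": 21.5},      # duplicate once normalized
    {"sensor_id": "", "temperature_c": 3},           # missing identifier
    {"sensor_id": "b2", "temperature_c": "n/a"},     # non-numeric value
]
warehouse_rows = promote(raw_lake_records)
```

The raw records stay untouched in the lake; only the normalized, deduplicated rows move on to the next stage.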


Andrew Wolfe

Andrew Wolfe has an M.S. in Computer Science from Georgia Tech University and more than 10 years of experience working in the software industry for well-known companies such as Diebold, Tableau, Explorys and Onshift. After years in the corporate and startup worlds as well as running his own consulting firm, Andrew realized he had to do more to improve software products and practices. From that, Skiplist was born. Skiplist is the opportunity to focus on thoughtful, quality software and change the software consulting industry.