One of the most commonly ignored issues in big data storage is accessibility for the teams that need the data. Data is regularly stored with no documentation, in places that are hard to access, or where the relevant teams do not even know it exists. Big data storage should therefore take an open-first strategy: teams are made aware that a dataset exists, what it contains, and how to access it, so they can use it in their software when they need it.
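One way to sketch that open-first idea is a lightweight data catalog: every stored dataset gets a discoverable record describing what it contains, where it lives, and who owns it. The structure and names below (`CatalogEntry`, `register`, `discover`, the example path) are purely illustrative assumptions, not a real catalog API.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str          # dataset identifier teams search for
    description: str   # what the data consists of
    location: str      # where it lives and how to access it
    owner: str         # who to ask about it
    columns: dict = field(default_factory=dict)  # column -> meaning

CATALOG: dict[str, CatalogEntry] = {}

def register(entry: CatalogEntry) -> None:
    """Publish a dataset so other teams can find it."""
    CATALOG[entry.name] = entry

def discover(keyword: str) -> list[str]:
    """Find datasets whose name or description mentions a keyword."""
    kw = keyword.lower()
    return [e.name for e in CATALOG.values()
            if kw in e.name.lower() or kw in e.description.lower()]

register(CatalogEntry(
    name="orders_daily",
    description="Daily order events, one row per order",
    location="s3://example-lake/orders/daily/",  # hypothetical path
    owner="commerce-team",
    columns={"order_id": "unique order key", "total": "order value in USD"},
))

print(discover("order"))  # → ['orders_daily']
```

Even a simple registry like this beats undocumented storage: the act of registering forces the owning team to write down what the data is and how to reach it.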
Another critical issue I find is the quality of the data being stored. Data should exist in the highest-quality form it can take at its final storage location. Storing low-quality data in a data lake is usually fine, but each stage of the data pipeline should raise the quality of the data, so that it reaches its highest-quality form in a system like a data warehouse or analytics database. This in turn improves the quality of every system that consumes the data at its final destination.
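The stage-by-stage quality increase can be sketched as a refinement step between the lake and the warehouse: raw records land as-is, and a cleaning pass enforces types and drops broken rows before promotion. The field names and validation rules here are illustrative assumptions, not a prescribed schema.

```python
# Raw records as they might land in the lake: untyped, partially broken.
raw_lake = [
    {"order_id": "1", "total": "19.99"},
    {"order_id": "2", "total": "oops"},  # corrupt value: tolerable in the lake
    {"order_id": "",  "total": "5.00"},  # missing key: tolerable in the lake
]

def refine(records):
    """Promote lake records to warehouse quality: typed, validated rows."""
    clean = []
    for r in records:
        if not r["order_id"]:
            continue                   # drop rows missing their key
        try:
            total = float(r["total"])  # enforce a numeric type
        except ValueError:
            continue                   # drop rows that cannot be coerced
        clean.append({"order_id": int(r["order_id"]), "total": total})
    return clean

warehouse = refine(raw_lake)
print(warehouse)  # → [{'order_id': 1, 'total': 19.99}]
```

Each pipeline stage would apply a step like this, so only the rows that survive every check reach the analytics systems at the end.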