Individual data warehouse projects need to be assessed on a case-by-case basis. Generally, though, stretching an existing data warehouse design to better handle big data analytics comes down to a core decision that IT professionals describe as "scaling up" versus "scaling out."
Scaling up generally means adding processing power and memory to a single server so it can handle the larger data sets the business will process. Scaling out, by contrast, means assembling clusters of server hardware and networking them together to corral big data.
Some IT experts suggest that the more common approach with Apache Hadoop and other popular big data tools and platforms is to scale out, clustering hardware to achieve the desired capacity. Others point out that with today's technology a data warehouse can also scale up, following a procurement strategy that adds resources to a single server, such as more processing cores and a larger amount of RAM.
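To make the contrast concrete, here is a minimal, hypothetical sketch of how the two strategies might look from the point of view of a Spark job run against warehouse data. The cluster manager, core counts, and memory sizes are illustrative assumptions, not figures from this article, and the scale-out session assumes a Hadoop/YARN cluster is actually available.

```python
# Illustrative sketch only: contrasts a "scale up" and a "scale out" Spark
# configuration. All sizes and the YARN cluster are assumptions for the example.
from pyspark.sql import SparkSession

# Scaling up: run on one bigger machine and give the job more of its
# local cores and RAM (assumes a 32-core server with ample memory).
scale_up = (
    SparkSession.builder
    .appName("warehouse-analytics-scale-up")
    .master("local[32]")
    .config("spark.driver.memory", "128g")
    .getOrCreate()
)
scale_up.stop()

# Scaling out: point the same job at a cluster manager (YARN here) and
# request many modest executors spread across networked nodes.
scale_out = (
    SparkSession.builder
    .appName("warehouse-analytics-scale-out")
    .master("yarn")  # assumes a Hadoop/YARN cluster is available
    .config("spark.executor.instances", "16")
    .config("spark.executor.cores", "4")
    .config("spark.executor.memory", "8g")
    .getOrCreate()
)
scale_out.stop()
```

The point of the sketch is that the analytic workload stays the same; only where the resources come from changes.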
Whether they scale up or scale out, data warehouses need additional physical hardware to handle the larger data workloads. They also need additional human administration, which means more training for internal teams. A lot of planning needs to go into the project to determine what kind of stress the larger data workloads will place on an existing legacy system in order to outfit it for a new big data ecosystem. One big issue is storage bottlenecks, which require upgrades to storage infrastructure, along with other kinds of performance bottlenecks that may hobble a nascent system if they're not addressed.
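As a planning illustration only, a back-of-the-envelope check like the following Python sketch can flag a storage bottleneck before a project commits to hardware. Every number in it is an assumed placeholder to be replaced with measurements from the actual legacy system.

```python
# Rough planning sketch (all figures are assumptions, not measured values):
# will the existing storage layer absorb a projected jump in daily data volume
# within the nightly load window?
daily_ingest_tb = 2.0            # assumed current daily ingest
growth_factor = 5                # assumed growth once big data sources are added
load_window_hours = 6            # assumed nightly batch window
storage_throughput_mb_s = 400    # assumed sustained write speed of legacy storage

projected_tb = daily_ingest_tb * growth_factor
projected_mb = projected_tb * 1024 * 1024
hours_needed = projected_mb / storage_throughput_mb_s / 3600

print(f"Projected daily ingest: {projected_tb:.1f} TB")
print(f"Time to write at current storage speed: {hours_needed:.1f} h "
      f"(window is {load_window_hours} h)")
if hours_needed > load_window_hours:
    print("Storage is the bottleneck: plan a storage upgrade or scale out the write path.")
else:
    print("Current storage can absorb the projected load within the window.")
```

With the placeholder numbers above, the projected load would take roughly seven hours against a six-hour window, which is exactly the kind of gap this sort of planning exercise is meant to surface early.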