Why the Data Virtualization Market is Growing
Data virtualization is a method of managing data from various sources that can be very beneficial to enterprises.
The data virtualization industry has been growing fast, and experts expect the trend to continue. As data becomes one of the most important business assets, corporations are looking for ways to get the most out of it. As you might have guessed, that is not easy, and there are several hurdles along the way. Businesses have to manage growing volumes of structured and unstructured data from many different sources. As if that were not difficult enough, different relational database management systems (RDBMSs) often do not talk to one another. Add to that the growing demand for business intelligence (BI), analytics and reporting, and the plate for IT departments is already overflowing.
Data virtualization appears to be a solution to these problems because it decouples data from applications and exposes it through a middleware layer. Data virtualization can also provide a unified view of data from disparate sources in the format that BI or business users want. But building that middleware layer is easier said than done: from the IT department's perspective, implementing data virtualization has been a big challenge. Fortunately, firms such as Oracle, Red Hat, IBM and Microsoft have been working on high-quality data virtualization tools.
What Is Data Virtualization?
Data has become increasingly important for making good business decisions. Companies want a comprehensive, unified view of the data they collect from different sources, and that requires data integration. However, managing data has become a bigger and more complex challenge, mainly for the following reasons:
- The volume of data has been growing, especially since the rise of big data.
- Companies now have to deal with both structured and unstructured data. Managing unstructured data puts a lot of strain on company resources.
- Companies use different database management systems (DBMSs), such as Oracle and SQL Server, which were not designed to be compatible with each other.
- Companies are legally required to retain data under data-retention regulations such as the Sarbanes-Oxley Act. This has led to an unprecedented rise in the amount of data they have to store.
- BI or business users now need self-service analytics to make better, more informed decisions and strategies, which requires a unified view of all data. Bringing quality data together into such a view is a huge technical challenge.
According to Noel Yuhanna, an IT analyst with Cambridge, Mass.-based Forrester Research Inc., “Data integration is getting harder all the time, and we believe [one of the causes] of that is that data volumes are continuing to grow, you really need data integration because it represents value to the business, to the consumers and to the partners. They want quality data to be able to make better business decisions.”
Data virtualization can address such problems by decoupling data from the applications that hold it and exposing it through middleware. Because applications and users go through the middleware instead of connecting to each DBMS directly, dependency on individual DBMSs is reduced. Note that data virtualization tools do not copy the actual data into the middleware; they store only mappings to the data's actual location. Data virtualization can also provide a unified view of data collected from different sources, and this capability is likely to grow stronger as vendors release more powerful data virtualization tools.
From the user's perspective, there is no need to know technical details such as where the data is stored or how it is formatted; the user can focus on the data itself.
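To make the mapping idea concrete, here is a minimal Python sketch of a virtualization layer. It assumes two hypothetical in-memory "sources" (`CRM_CUSTOMERS`, `ERP_ORDERS`) standing in for tables in different DBMSs. The names `SourceMapping`, `VirtualLayer` and `customer_totals` are illustrative only and are not the API of any product. The layer stores only mappings (how to fetch rows and how to rename columns), reads from the sources at query time, and combines two logical entities into one unified view.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]

# Hypothetical source systems, stood in for by in-memory lists.
# In practice these could be tables in two different DBMSs.
CRM_CUSTOMERS: List[Row] = [
    {"cust_id": 1, "name": "Acme"},
    {"cust_id": 2, "name": "Globex"},
]
ERP_ORDERS: List[Row] = [
    {"customer": 1, "amount": 250.0},
    {"customer": 1, "amount": 100.0},
    {"customer": 2, "amount": 75.0},
]


@dataclass
class SourceMapping:
    """Metadata only: how to reach a source and how its columns map to logical names."""
    fetch: Callable[[], Iterable[Row]]
    columns: Dict[str, str]  # source column -> logical column


@dataclass
class VirtualLayer:
    mappings: Dict[str, SourceMapping] = field(default_factory=dict)

    def register(self, logical_name: str, mapping: SourceMapping) -> None:
        # Only the mapping is stored; no rows are copied into the layer.
        self.mappings[logical_name] = mapping

    def query(self, logical_name: str) -> List[Row]:
        # Rows are read from the source at query time and renamed to the
        # logical columns, so users never see source-specific names.
        m = self.mappings[logical_name]
        return [{m.columns[k]: v for k, v in row.items() if k in m.columns}
                for row in m.fetch()]


def customer_totals(layer: VirtualLayer) -> List[Row]:
    """A unified view that combines entities living in two different sources."""
    totals: Dict[object, float] = {}
    for order in layer.query("orders"):
        cid = order["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + float(order["amount"])
    return [{"customer_id": c["customer_id"], "name": c["name"],
             "total": totals.get(c["customer_id"], 0.0)}
            for c in layer.query("customers")]


layer = VirtualLayer()
layer.register("customers", SourceMapping(lambda: CRM_CUSTOMERS,
                                          {"cust_id": "customer_id", "name": "name"}))
layer.register("orders", SourceMapping(lambda: ERP_ORDERS,
                                       {"customer": "customer_id", "amount": "amount"}))
print(customer_totals(layer))
# [{'customer_id': 1, 'name': 'Acme', 'total': 350.0}, {'customer_id': 2, 'name': 'Globex', 'total': 75.0}]
```

Because no rows are copied, a new order appended to `ERP_ORDERS` shows up the next time `customer_totals` runs. For brevity this sketch fetches whole tables. A real tool would push filters down to the source databases instead.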
This case study describes how data virtualization solved a business problem faced by Pfizer, Inc., the largest drug manufacturer in the world, which develops, manufactures and markets medicines for both humans and animals.
The Worldwide Pharmaceutical Sciences division at Pfizer determines which drugs will be introduced to the market, which is an extremely important role. However, the division was constrained by technological limitations. In day-to-day operations, different stakeholders needed to view data that resided in multiple applications. These data integration requests were handled by the traditional extract, transform and load (ETL) process, and that is where the problems started.
There were basically two problems:
- The ETL process was slow and inflexible.
- The applications hosting the data did not talk to each other.
On top of that, the division was unable to add new data sources or new applications to host data. As a result, an inherently slow process struggled to deliver a unified view of data collected from different applications, which led to project slowdowns, cost escalation and wasted investments.
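For contrast, the following Python sketch shows the general shape of a batch ETL job. It is a simplified illustration and does not describe Pfizer's actual process. The names `source_orders`, `warehouse`, `extract`, `transform` and `load` are hypothetical stand-ins. Because the data is physically copied into the target, the copy is only as current as the last run, and every new source needs its own extract and transform code.

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical operational source and warehouse target (in-memory stand-ins).
source_orders: List[Row] = [
    {"id": 1, "amount_cents": 25000},
    {"id": 2, "amount_cents": 7500},
]
warehouse: Dict[str, List[Row]] = {}


def extract() -> List[Row]:
    # Copy the rows out of the source as they are at the moment the job runs.
    return [dict(row) for row in source_orders]


def transform(rows: List[Row]) -> List[Row]:
    # Reshape into the reporting format (dollars instead of cents).
    return [{"id": r["id"], "amount": r["amount_cents"] / 100} for r in rows]


def load(rows: List[Row]) -> None:
    # Overwrite the warehouse table with the new snapshot.
    warehouse["orders"] = rows


def run_etl_job() -> None:
    load(transform(extract()))


run_etl_job()
source_orders.append({"id": 3, "amount_cents": 1000})  # new data arrives at the source
print(len(warehouse["orders"]))  # 2: the warehouse copy is stale until the next run
run_etl_job()
print(len(warehouse["orders"]))  # 3
```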
Pfizer selected a data virtualization tool from a vendor and, over time, reaped the following benefits:
- The tool did not have to access the data sources to serve each data integration request. Instead, it stored a view of the data in middleware or a cache, so requests were fulfilled faster.
- Unforeseen events such as server crashes did not become showstoppers, because users could still work with the views of the data held in memory.
- The data virtualization platform supported adding multiple, different data sources, such as cloud-based CRM systems and BI tools.
- Because the views were held in middleware or in memory rather than fetched from the host systems, the platform could offer unified views of the data customized to user preferences.
Implications of the Rise of Data Virtualization
Many believe that the rise of data virtualization could significantly diminish the importance of ETL processes, and evidence from certain industries supports that view. For example, Novartis, The Phone House and Pfizer have already turned to data virtualization. Companies that handle huge data volumes and have legacy data sources are particularly likely to invest in data virtualization. Data virtualization offers clear advantages for delivering unified, real-time views of data, and companies need agile, quick fulfillment of data integration requests, which is extremely difficult with ETL.
However, others believe that it is not quite the end for ETL, which has traditionally been used to load the enterprise data warehouse (EDW). According to Mark Beyer, research vice president for information management at Gartner Inc., “The EDW is not going away – in fact, the enterprise data warehouse itself was always a vision and never a fact, now the vision of the EDW is evolving to include all the information assets in the organization. It’s changing from a repository strategy into an information services platform strategy.”
Data virtualization is clearly on the rise, and ETL's dominance is fading, even if only slightly. However, there are still a number of hurdles on the way to seamless adoption of data virtualization platforms. IT departments are finding it technically difficult to create maps of data from the data sources and place them in the middleware. Creating unified, customized views from several different data sources for different customers is also an extremely challenging technical task. Organizations need to acknowledge these challenges and plan accordingly.
This content is brought to you by our partner, Turbonomic.