Keeping Up With the Data Explosion by Virtualizing Storage

According to an IDC study, the amount of data is growing at 46% per year, while Gartner reports that, beginning in 2015, spending on data center systems will increase by an average of 1.8% per year over the next four years. Taken together, these two reports imply that CTOs and CIOs are expected to store far more data at a lower cost per unit stored. In fact, once inflation is factored in, data storage budgets are shrinking. The situation is further complicated by today's data-driven environment, where users expect instant access to information on demand, from any location, at any time. It is a hard nut to crack, but not an impossible one: storage virtualization can increase throughput, reduce operating costs per terabyte stored, and improve the scalability of IT systems.
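The squeeze implied by those two growth rates is easier to see when they are compounded. The short sketch below assumes both rates apply over the same four-year horizon and ignores inflation; the constant and function names are illustrative only.

```python
# Compound the two growth rates cited above over the same horizon.
# Assumption: both rates apply over the same four years; inflation is ignored.
DATA_GROWTH = 0.46    # IDC: data volume grows 46% per year
SPEND_GROWTH = 0.018  # Gartner: data center spending grows 1.8% per year on average
YEARS = 4


def budget_per_unit_of_data(years: int) -> float:
    """Spending per unit of stored data after `years`, relative to year 0 (= 1.0)."""
    data = (1 + DATA_GROWTH) ** years
    spend = (1 + SPEND_GROWTH) ** years
    return spend / data


if __name__ == "__main__":
    for y in range(YEARS + 1):
        print(f"year {y}: {budget_per_unit_of_data(y):.1%} of the starting budget per unit of data")
```

Under these assumptions, by year four the money available to store each unit of data falls to roughly 24% of its starting level, which is why cost per terabyte, not total spend, is the figure to watch.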

While storage virtualization is not a new technology, it has not been adopted as widely as desktop or server (application) virtualization. This is surprising, since research by IBM indicates that organizations do not fully realize the return on their investments in applications and infrastructure unless storage is virtualized as well. Virtualized storage provides stable, uniform and reliable access to data, even as the underlying hardware changes when storage media are added, removed or fail. This is possible because storage virtualization automates storage management, allowing storage resources to be expanded and updated on the fly.

Virtualization operates as an intermediate layer and the primary interface between servers and storage. Servers see the virtualization layer as a single storage device, while all the individual storage devices see the virtualization layer as their only server. This makes it easy to group storage systems — even devices from different vendors — into tiers of storage.
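To make the "single device" view concrete, here is a minimal sketch of such an intermediate layer, assuming a simple block-mapping design: servers address only virtual block numbers, and the layer records which physical device and block actually hold each one. The classes, device names, vendors and tiers are hypothetical and do not correspond to any product's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PhysicalDevice:
    """One backend device in the pool; vendor and tier are illustrative labels."""
    name: str
    vendor: str
    tier: str
    capacity: int                                   # size in blocks
    data: Dict[int, bytes] = field(default_factory=dict)
    next_free: int = 0                              # simple bump allocator

    def allocate(self) -> int:
        if self.next_free >= self.capacity:
            raise RuntimeError(f"{self.name} is full")
        block = self.next_free
        self.next_free += 1
        return block


class VirtualizationLayer:
    """Presents many physical devices to servers as one virtual volume."""

    def __init__(self, devices: List[PhysicalDevice]) -> None:
        self.devices = {d.name: d for d in devices}
        # virtual block number -> (device name, physical block number)
        self.map: Dict[int, Tuple[str, int]] = {}

    def capacity(self) -> int:
        """Servers see the pooled capacity of every device as one number."""
        return sum(d.capacity for d in self.devices.values())

    def _pick_device(self, tier: Optional[str]) -> PhysicalDevice:
        # Choose the device with the most free blocks, optionally within one tier.
        candidates = [d for d in self.devices.values()
                      if d.next_free < d.capacity and (tier is None or d.tier == tier)]
        if not candidates:
            raise RuntimeError("no free space in the requested tier")
        return max(candidates, key=lambda d: d.capacity - d.next_free)

    def write(self, vblock: int, payload: bytes, tier: Optional[str] = None) -> None:
        if vblock not in self.map:                  # first write: place the block
            dev = self._pick_device(tier)
            self.map[vblock] = (dev.name, dev.allocate())
        name, pblock = self.map[vblock]
        self.devices[name].data[pblock] = payload

    def read(self, vblock: int) -> bytes:
        name, pblock = self.map[vblock]
        return self.devices[name].data[pblock]


# Two arrays from different (hypothetical) vendors, grouped into two tiers.
pool = VirtualizationLayer([
    PhysicalDevice("array-a", vendor="VendorA", tier="fast", capacity=4),
    PhysicalDevice("array-b", vendor="VendorB", tier="archive", capacity=8),
])
pool.write(0, b"orders", tier="fast")
pool.write(1, b"2009 logs", tier="archive")
print(pool.capacity(), pool.read(0))  # 12 b'orders'
```

The server never sees `array-a` or `array-b`; it sees a 12-block volume. Because the mapping table is the only thing that ties virtual blocks to hardware, devices from different vendors can be grouped into tiers simply by labeling them.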

This layer shields servers and applications from changes to the storage environment, letting administrators easily hot-swap a disk or a tape drive. Data-copying services are also managed at the virtualization layer. Services such as data replication, whether for snapshots or disaster recovery, can be handled entirely by the virtualization system, often in the background, from a common management interface. Because data can be moved at will, lightly used or outdated data can easily be migrated to slower, less expensive storage devices.
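The background migration of lightly used data can be sketched as a simple age-based tiering policy. This is an illustration under assumed rules (new data lands on the fast tier; anything idle longer than a threshold is demoted), not how any particular product decides; the class, tier names and threshold are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
from typing import Dict, List


class TieredVolume:
    """Hypothetical virtual volume whose blocks live on a 'fast' or 'slow' tier.

    Servers address blocks by virtual number only; which tier holds a block is
    recorded in `placement` and can change without the server noticing.
    """

    def __init__(self) -> None:
        self.tiers: Dict[str, Dict[int, bytes]] = {"fast": {}, "slow": {}}
        self.placement: Dict[int, str] = {}      # virtual block -> tier name
        self.last_access: Dict[int, float] = {}  # virtual block -> timestamp

    def write(self, vblock: int, payload: bytes, now: float) -> None:
        tier = self.placement.get(vblock, "fast")  # new data lands on fast storage
        self.tiers[tier][vblock] = payload
        self.placement[vblock] = tier
        self.last_access[vblock] = now

    def read(self, vblock: int, now: float) -> bytes:
        self.last_access[vblock] = now
        return self.tiers[self.placement[vblock]][vblock]

    def demote_cold_blocks(self, now: float, max_idle: float) -> List[int]:
        """Background job: move blocks idle longer than `max_idle` to the slow tier."""
        moved = []
        for vblock, tier in list(self.placement.items()):
            if tier == "fast" and now - self.last_access[vblock] > max_idle:
                self.tiers["slow"][vblock] = self.tiers["fast"].pop(vblock)
                self.placement[vblock] = "slow"
                moved.append(vblock)
        return moved


vol = TieredVolume()
vol.write(1, b"today's orders", now=100.0)
vol.write(2, b"last year's logs", now=0.0)
print(vol.demote_cold_blocks(now=100.0, max_idle=50.0))  # [2]
print(vol.read(2, now=101.0))                            # b"last year's logs"
```

The read after migration still uses virtual block `2`, which is the point: the data moved to cheaper storage, but nothing above the virtualization layer had to change.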

How Does Storage Virtualization Work?

Virtualization of storage is simple, at least in theory — it is the aggregation of different storage systems from various vendors into a single networked environment that can then be managed as a unified pool. But as with many tech concepts, implementation is not as easy as the explanation makes it sound. Currently there are three implementation paradigms:

In-fabric or host-based: This is the oldest and most common approach, implemented in FalconStor’s IPStor, NetApp’s V-Series, DataCore’s SANsymphony, StoreAge’s SVM and IBM’s SAN Volume Controller. These products use dedicated appliances, or software running on the virtualization server, to discover and manage storage resources and make them available for IT to allocate. The most popular products are from IBM and NetApp, each with more than 1,000 installations.
Network-based: McData Corp., Cisco Systems, QLogic Corp., Brocade Communications and Maxxan Systems are big players in network-based virtualization. In theory, placing virtualization functions in network components such as switches increases efficiency, because data makes at least one fewer hop than it would if it had to pass through a separate device before being stored.
Storage-device-based: The biggest player here is Hitachi Data Systems, with its TagmaStore network controller. Storage-device-based virtualization embeds virtualization software in storage fabric components (hard disks, RAID controllers or switches), allowing additional devices to be attached downstream. The attached devices are controlled by a storage controller, most commonly a dedicated hardware device that handles storage pooling and metadata management. Depending on the solution, the system can also handle replication and other storage services.

Why Virtualize Your Storage?

Virtualized Storage Reduces Costs of Storing Data

The most common reason to virtualize storage is to reduce costs, and it does this well. According to an analyst at Enterprise Strategy Group (ESG), storage virtualization reduces storage-related costs in several ways: software costs fall by up to 16%, hardware costs by approximately 23% and administration costs by an average of 20%. At Baylor College of Medicine in Houston, Texas, the savings were even larger, according to Mike Layton, director of IT. The savings come from integrating storage across servers, which reduces unused storage capacity while also lowering administration costs.
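Because the analyst figures apply to different slices of the budget, the overall saving depends on how a given organization's storage spending is split. The sketch below weights each reduction by its share of the budget. The 20/50/30 split is purely hypothetical, not from the source, and the software figure is an upper bound ("up to 16%"), so the result is an optimistic estimate.

```python
from typing import Dict

# Reported reductions by cost category, from the figures quoted above.
# Note: the software figure is an upper bound ("up to 16%").
REPORTED_REDUCTIONS = {"software": 0.16, "hardware": 0.23, "administration": 0.20}


def blended_savings(budget_share: Dict[str, float]) -> float:
    """Overall fractional saving for a storage budget split across categories.

    `budget_share` maps category -> fraction of the storage budget; the
    fractions must sum to 1.
    """
    if abs(sum(budget_share.values()) - 1.0) > 1e-9:
        raise ValueError("budget shares must sum to 1")
    return sum(share * REPORTED_REDUCTIONS[cat] for cat, share in budget_share.items())


# Hypothetical split of an annual storage budget (not from the source).
example_split = {"software": 0.20, "hardware": 0.50, "administration": 0.30}
print(f"{blended_savings(example_split):.1%}")  # 20.7%
```

A hardware-heavy budget benefits most, since hardware carries the largest reported reduction.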

Simplify Management of Data

Another advantage is simplified data management. At Baylor College of Medicine, for example, storage management was complex because scores of file servers and ERP data stores were attached to Linux, Unix and Windows machines. The heterogeneous environment made data management inefficient, since every system needed its own backups, redundancy and room for growth. To improve efficiency, Layton and his team chose to virtualize storage. Consolidating onto a single Fibre Channel storage network and HDS arrays reduced complexity drastically, and storage is now managed from a single interface.

Increased Throughput

At Dallas-Fort Worth International Airport, mission-critical data such as plane arrival times, gate information, passenger lists and baggage tracking was stored in two storage area networks using Oracle’s Real Application Clusters (RAC). RAC treated one SAN as the primary target and then replicated the data to the secondary system; however, replication took so long that the two systems were perpetually out of sync. According to John Parrish, associate VP of terminal technology, the synchronization and latency issues have been resolved since the airport virtualized its storage.

Similar issues arise when mirroring heavily used transactional databases. Most database management systems place locks on transactional databases, which can leave mirrors minutes if not hours behind the active database. Storage virtualization tricks the DBMS into thinking that it is writing to and reading from a single database, while the virtualization layer applies each write to both storage systems, allowing for real-time replication.
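The idea can be illustrated with synchronous mirroring at the storage layer: the DBMS writes to one volume, and the layer copies each write to every backend before acknowledging it, so the mirror never lags. This is a minimal sketch under that assumption; the class and backend names are hypothetical, and a real implementation would also have to handle backend failures, partial writes and write ordering, which are omitted here.

```python
from typing import Dict, List


class MirroredVolume:
    """Hypothetical virtualization-layer volume that mirrors every write.

    The DBMS sees one volume. Each write is applied to every backend before it
    is acknowledged, so the mirror never lags the primary (synchronous mirroring).
    """

    def __init__(self, backends: List[str]) -> None:
        if len(backends) < 2:
            raise ValueError("mirroring needs at least two backends")
        self.backends: Dict[str, Dict[int, bytes]] = {name: {} for name in backends}
        self.primary = backends[0]

    def write(self, block: int, payload: bytes) -> None:
        for store in self.backends.values():
            store[block] = payload
        # Returning here is the acknowledgement to the caller (the DBMS):
        # every backend already holds the new data.

    def read(self, block: int) -> bytes:
        return self.backends[self.primary][block]

    def in_sync(self) -> bool:
        stores = list(self.backends.values())
        return all(s == stores[0] for s in stores[1:])


vol = MirroredVolume(["san-primary", "san-secondary"])
vol.write(7, b"gate B12, flight 1234")
print(vol.read(7), vol.in_sync())  # b'gate B12, flight 1234' True
```

The trade-off of this design is that every write waits for the slowest backend, which is why production systems typically pair it with caching or fast interconnects.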

Pitfalls of Storage Virtualization

There is some bias against virtualizing storage, based mostly on the experiences of early adopters, when many solutions were buggy and implementations failed. The technology has since matured, but it still relies on proprietary and mutually incompatible devices. Because switching providers after virtualizing is difficult, organizations should do their due diligence and analyze possible solutions thoroughly, factoring in their long-term needs.

A persistent myth that virtualization hurts performance hinders its adoption at some organizations. However, as the examples above show, virtualization can enhance system throughput. By caching data used by high-performance and real-time applications while routing infrequently used data to slower storage devices, correctly implemented storage virtualization can improve performance compared with non-virtualized storage.
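One way the caching part can work is a least-recently-used (LRU) read cache that the virtualization layer keeps in front of slower devices: frequently read blocks stay in fast memory, and only misses reach the slow device. The sketch below assumes LRU eviction (a common choice, not one the text specifies); the class name and the dictionary standing in for a slow tier are hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class ReadCache:
    """Hypothetical LRU read cache placed in front of slow storage."""

    def __init__(self, backend_read: Callable[[int], bytes], capacity: int) -> None:
        self.backend_read = backend_read
        self.capacity = capacity
        self.cache: OrderedDict = OrderedDict()  # block -> data, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, block: int) -> bytes:
        if block in self.cache:
            self.cache.move_to_end(block)   # mark as most recently used
            self.hits += 1
            return self.cache[block]
        self.misses += 1
        data = self.backend_read(block)     # slow path: go to the device
        self.cache[block] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used block
        return data


slow_disk = {n: f"block-{n}".encode() for n in range(100)}  # stand-in for a slow tier
cache = ReadCache(slow_disk.__getitem__, capacity=2)
for b in [1, 1, 1, 2, 1, 1]:  # a "hot" block read repeatedly
    cache.read(b)
print(cache.hits, cache.misses)  # 4 2
```

Only two of the six reads touched the slow device; with a real workload, the hit rate depends on how concentrated the access pattern is and how large the cache can be.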

Summarizing Storage Virtualization

Storage virtualization, achieved through software or hardware, unifies diverse storage media into a common, centrally managed pool. It can relieve performance bottlenecks in data centers while reducing storage expenses, by redistributing available storage resources more evenly and responsively. In addition, storage virtualization makes storage management easier, since heterogeneous storage media can be managed from a single unified interface. The simpler management lowers administration costs, which can more than compensate for the possible drawbacks of storage virtualization.