5 Things You Need to Know About Data Virtualization
By Robert Eve
Data virtualization is software that enterprises use to reduce analytic data bottlenecks. It maps multiple data sources through a virtual data layer that provides a unified, rationalized view of all their information. That may sound like a mouthful, but the motive is straightforward. Functionally, data virtualization products are used to build, run, and manage IT-curated data services far faster than traditional warehousing and extract, transform, load (ETL) approaches, and with far fewer resources. The process allows an analytic application to access and use data quickly without requiring technical details about the data, such as how it is formatted or where it is physically located.
1. How it works
Companies generally start by installing virtualization middleware that fits with their existing infrastructure to access data from distributed sources, including traditional enterprise, big data, cloud, and IoT. Data engineering staff then use rich data analysis, design, and development tools to create the desired IT-curated data services. Thus, when you run a report or refresh a dashboard, data virtualization’s query engine accesses the data sources in real time, makes all the needed transformations, and then quickly delivers the exact data requested.
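The request-time flow described above can be illustrated with a minimal sketch. It is not how any particular product is implemented. The two sources (an in-memory SQLite database and a CSV string), the table and column names, and the function `customer_totals` are all hypothetical stand-ins chosen for illustration.

```python
import csv
import io
import sqlite3

# Source 1 (hypothetical): a relational database holding customers.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Source 2 (hypothetical): a CSV export of orders, standing in for a file or cloud source.
ORDERS_CSV = "customer_id,amount\n1,100.0\n1,50.0\n2,75.5\n"


def read_orders():
    """Read the order source and convert its text fields to typed values."""
    return [
        {"customer_id": int(row["customer_id"]), "amount": float(row["amount"])}
        for row in csv.DictReader(io.StringIO(ORDERS_CSV))
    ]


def customer_totals():
    """Virtual view: query both sources at request time, then join and aggregate.

    Nothing is copied into a warehouse; the caller never sees where the data
    lives or how it is formatted.
    """
    totals = {}
    for order in read_orders():
        cid = order["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + order["amount"]
    rows = crm.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    return [{"customer": name, "total": totals.get(cid, 0.0)} for cid, name in rows]


for row in customer_totals():
    print(row["customer"], row["total"])
```

A report or dashboard would call `customer_totals()` without knowing that one source is a database and the other a file. That is the sense in which the virtual layer hides format and location.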
2. Known benefits
Data virtualization provides a fast and economical way to integrate data reliably and evolve rapidly when requirements change. Virtualized data can be used across myriad analytics, self-service, business intelligence, and transactional applications to support multiple lines of business, hundreds of projects, and thousands of users. It provides up-to-the-minute data as needed via advanced performance optimization algorithms. And data virtualization’s management, monitoring, security, and governance functions ensure organizations can meet service-level agreements for security, reliability, and scalability.
3. Where it’s used
There are a number of project-scale scenarios where data virtualization is the optimal choice.
- Where speed is required, data virtualization provides IT-grade datasets with rapid time-to-solution (hours instead of weeks).
- Where multiple projects draw on the same data, data virtualization supports them through shared data services built over myriad data sources, saving development time and expense. It improves data quality by sharing frequently used data, while avoiding unnecessary replication that can lead to inconsistencies.
- When business requirements are not firm, data virtualization provides IT with the ability to deliver and iterate a new dataset quickly. This agile approach lets IT service the business quickly, then fine-tune the engineering later (even converting to ETL if appropriate).
- Where up-to-the-minute data is required, data virtualization lets you serve remarkably fresh data, unlike the batch approaches of ETL where you may have to settle for yesterday’s data.
- When data consolidation is inappropriate (data outside the warehouse, data outside the firewall, data too large to integrate physically), data virtualization saves the day by providing access to data where it lies.
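The freshness contrast between a batch copy and a virtual view can be shown with a small sketch. The `source` dictionary, `run_batch_etl`, and `virtual_view` are hypothetical names invented here, and the example assumes a deliberately simplified in-memory "source" rather than a real system.

```python
import copy

# Hypothetical live operational source: current stock per SKU.
source = {"sku-1": 10, "sku-2": 4}


def run_batch_etl(src):
    """Batch approach: copy the data as of the moment the job runs."""
    return copy.deepcopy(src)


def virtual_view(src):
    """Virtual approach: return a query function that reads the source on each call."""
    def query(sku):
        return src[sku]
    return query


warehouse = run_batch_etl(source)  # e.g. last night's load
live = virtual_view(source)

source["sku-1"] = 7  # the source changes after the batch job has run

print(warehouse["sku-1"])  # 10: the copy still holds the value from load time
print(live("sku-1"))       # 7: the virtual view reflects the current source
```

The trade-off is that every call to the virtual view touches the source, which is why query optimization matters in real products.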
4. Where it’s not
Data virtualization is not the answer to every analytics data requirement, and its efficacy can be hindered by the complexity of required data cleansing or transformation in the virtual layer. It’s also not applicable where there’s a need to build history, that is, to retain past states of the data over time. Sometimes data consolidation in a warehouse or mart along with ETL or ELT is a better solution for a particular use case, and sometimes a hybrid is the best bet.
5. What’s the point?
Data virtualization presents a compelling business case. It enables up-to-the-minute business insights that help manage business risk and reduce compliance penalties. Data-dependent projects can be completed faster so business benefits are derived sooner (lower project costs are an additional time-to-solution benefit). Data virtualization improves utilization of existing server and storage investments, thus optimizing existing technology. And with less data replication required, hardware and governance savings are substantial.
As enterprises move into an increasingly digital and data-dependent world, data virtualization can help organizations adapt with fast and flexible access to data from a variety of sources atop standard IT infrastructure. That’s a good thing to know.
Robert Eve is Senior Director at TIBCO Software. With over 30 years of data and analytics experience, his specialties include data virtualization, IT governance, and go-to-market strategy. Robert has an MS in Management from MIT and a BS in Business Administration with Honors from University of California, Berkeley.