By Lakshmi Randall
Data preparation constitutes an iterative process of transforming data into a meaningful form to support analytics and decision making. Enterprises can choose from a variety of technologies and tools, used independently or in combination. Examples include data integration platforms that incorporate ETL or data virtualization, Excel or dedicated tools for data preparation, data discovery, and/or self-service BI.
Data preparation tools differ in the complexity of what they can support. For example, some tools are well suited for lightweight data preparation, such as adding new calculations, measures, or KPIs. Other technologies are better suited for more complex needs, such as blending data from high-volume or highly disparate data sources, applying complex transformations, and building multiple semantics based on the business department and the nature of the analyses.
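To make the contrast concrete, here is a minimal sketch in plain Python. It is an illustration only, not how any particular tool works: the sample records, the field names, and the `add_margin` and `blend` helpers are all hypothetical. The first function shows lightweight preparation (deriving a new measure), and the second shows a simple form of blending two sources on a shared key.

```python
# Hypothetical sample data; all field names are illustrative assumptions.
orders = [
    {"order_id": 1, "customer_id": "C1", "revenue": 200.0, "cost": 150.0},
    {"order_id": 2, "customer_id": "C2", "revenue": 120.0, "cost": 60.0},
]
crm_customers = [
    {"customer_id": "C1", "region": "EMEA"},
    {"customer_id": "C2", "region": "APAC"},
]


def add_margin(rows):
    """Lightweight preparation: derive a new measure from existing fields."""
    return [
        {**r, "margin_pct": round(100 * (r["revenue"] - r["cost"]) / r["revenue"], 1)}
        for r in rows
    ]


def blend(rows, customers):
    """Heavier preparation: enrich one source with attributes from another."""
    region_by_id = {c["customer_id"]: c["region"] for c in customers}
    # Rows without a matching customer get region=None instead of being dropped.
    return [{**r, "region": region_by_id.get(r["customer_id"])} for r in rows]


prepared = blend(add_margin(orders), crm_customers)
```

In practice, blending high-volume or highly disparate sources involves far more than a dictionary lookup (key reconciliation, type alignment, volume handling), which is why it tends to call for more capable technology than a single calculated field does.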
In a survey recently conducted by BARC, 49 percent of companies pointed to enhanced “…performance, agility and flexibility in business departments” as the key business driver for data preparation in their organizations. Excluding traditional ETL technology, the objective of implementing business-user-oriented data preparation is to augment IT with business resources who are empowered to perform more of the upstream data preparation. The increased agility and flexibility realized by business units would ideally contribute to a “tangible business impact and increased competitiveness through analytics”, a separate but seemingly related business driver identified by 47 percent of companies as the key driver for data preparation.
Why Data Preparation Must be a Collaborative Effort between IT and Business
Based on the results of BARC’s survey, relatively few companies have implemented a collaborative approach to data preparation. When asked about their approach, 70 percent reported incorporating data preparation in their mainstream data processing, with only 25 percent indicating that the technology is employed primarily as a collaborative effort between business users and IT. One out of four reported that data preparation is largely concentrated within IT and, predictably, only 9 percent of them have business users executing it under formal control and standardization.
Rather than having business and IT work autonomously, effective collaboration between the two is the best strategy for overcoming potential challenges to realizing the benefits of data preparation methods and tools. The following are typical challenges faced by enterprises when considering a data preparation initiative:
- Lack of expertise: When enterprises assign responsibilities to business users with inadequate skills, the outcome can be inconsistent views of data resulting in inaccurate interpretation and flawed decision making.
- Enforcement of data governance: Execution of data governance (including data quality) and privacy policies is critical to an enterprise, and generally cannot be accomplished with a data preparation tool alone, because IT and business must collaboratively drive the enterprise’s data quality initiatives.
- Provisioning the data: Data provisioning requirements may vary among different business users; depending on their roles, users may require raw data, curated data, or pre-aggregated data.
- Data mashups: In the absence of collaboration between business and IT, data mashups run against transactional systems can cause the enterprise’s bread-and-butter applications to fail.
- IT bottleneck or enabler: The characterization of IT as a bottleneck in every organization is a misconception; IT can be an enabler, allowing business to focus on decision making rather than data preparation.
- Limitations of data prep tools: Data preparation tools may scale for departmental use, but may not satisfy enterprise-wide needs for robust scale, security, and metadata.
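The provisioning challenge above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not a description of any product: the sample records, the role names, and the `ROLE_TO_LEVEL` mapping are hypothetical. It shows the same source served at three levels (raw, curated, pre-aggregated) depending on the user's role.

```python
from collections import defaultdict

# Hypothetical raw records as they might arrive from a source system.
RAW = [
    {"region": "EMEA", "amount": "100.5", "customer": " Acme "},
    {"region": "EMEA", "amount": "49.5", "customer": "Globex"},
    {"region": "APAC", "amount": "n/a", "customer": "Initech"},
]

# Hypothetical mapping of roles to provisioning levels (an assumption).
ROLE_TO_LEVEL = {
    "data_scientist": "raw",
    "analyst": "curated",
    "executive": "aggregated",
}


def curate(rows):
    """Curated level: typed, trimmed rows; unparseable amounts are dropped."""
    curated = []
    for r in rows:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # skip rows whose amount is not numeric
        curated.append(
            {"region": r["region"], "amount": amount, "customer": r["customer"].strip()}
        )
    return curated


def aggregate(rows):
    """Pre-aggregated level: total amount per region."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["region"]] += r["amount"]
    return dict(totals)


def provision(role):
    """Return the data at the level assigned to the given role."""
    level = ROLE_TO_LEVEL[role]
    if level == "raw":
        return RAW
    curated = curate(RAW)
    return curated if level == "curated" else aggregate(curated)
```

Keeping the curation and aggregation rules in one shared place, rather than in each user's own spreadsheet or tool, is one way to avoid the inconsistent views of data described in the first challenge.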
Data Virtualization and Data Preparation
A data virtualization platform enables enterprise-scale data preparation with much-needed governance and metadata. It also supplies data to stand-alone data preparation tools, and to data discovery and BI tools.
To achieve and maintain a competitive edge, an enterprise must find a way to expedite its business users’ access to reliable data, and to provide them with an agile, flexible analytical environment that will support timely decision making. IT departments often lack the necessary resources, flexibility, and efficiency to adequately support these business needs. Data virtualization overcomes these handicaps by enabling the agility and flexibility demanded by business users. Data virtualization is characterized by:
- Data abstraction: Hides data complexity for ease of data access by business users.
- Real-time information: Enables timely decision-making.
- Self-service data services: Supports information discovery and self-service.
- Centralized metadata, security & governance: Simplifies data security, privacy, and audit.
Data virtualization is essential for a successful self-service initiative. It provides a platform for self-service with guardrails, supporting both regular business users and “data cowboys,” with controls on the latter’s actions.
Lakshmi Randall is the Director of Product Marketing at Denodo. She has nearly 20 years of experience in Data Management and analytics domains comprising product marketing, research, and analysis of emerging and disruptive industry trends. You can follow Lakshmi on Twitter @LakshmiLj.