When Will Data Marketplace Vendors Realize Their Potential?
This is part of Solutions Review’s Premium Content Series, a collection of contributed columns written by industry experts in maturing software categories. In this submission, Tamr Head of Corporate Development Matt Holzapfel asks and answers the question: “When will data marketplace vendors realize their potential?”
Data marketplaces are becoming a standard feature of cloud data platforms. It’s not hard to see why. They offer access to thousands of data sources and can make acquiring them as easy as buying toothpaste on Amazon. It should be the dream of every CDO. Instead, these marketplaces can feel overwhelming to browse and underwhelming in the data they have available, and they create yet another data silo once the data is acquired.
Despite these early challenges, there’s reason to be optimistic. Data providers are becoming increasingly receptive to publishing more than just sample datasets for lead generation purposes, and are now willing to deliver their core data products through marketplaces. Further, advances in areas like machine learning, and the maturation of tooling built for big data, are making it easier to turn disparate, disconnected data sources into business insights.
Many of our most successful customers at Tamr, from Fortune 100 brands to top-tier investment managers, use external data to help automate their data operations and power cutting-edge analytical applications. They’ve realized that the context that comes from external data is often required to make their internal data useful. This creates a compelling business case for external data since it has a multiplier effect on the value of internal data. It also explains why people are so excited about the potential value of data marketplaces.
So when will these marketplaces be useful enough to make external data ubiquitous? We’ve talked to 100+ data & analytics leaders who use external data in some form today, and identified four key dependencies for marketplaces.
Pricing and Contract Terms Need to Become Transparent
Two main reasons for cloud computing’s broad adoption are transparent pricing and an easy buying process, enabled by simple contracts. The data marketplaces built into cloud platforms need to be equally transparent and easy to buy from.
This won’t happen overnight – many data providers aggregate data from other providers, leading to an opaque pricing structure and complex terms & conditions. But as we’ve seen with the ‘consumerization’ of enterprise software, customers place a premium on simplicity. The more customer dollars shift towards vendors that offer a simple, transparent buying process, the more it becomes the standard.
Master Data Must Move to the Cloud
External data on its own is not very useful. Sure, it’s nice to know how markets are trending and it can be useful to research individual companies or locations for insights. But as standalone information, it’s hard to know what’s signal versus noise, and almost impossible to get to something actionable.
People rarely analyze external data without any context. An investment analyst knows what kind of companies might be interesting to their team. A supply chain analyst knows what categories of suppliers their products depend on.
For external data to deliver the most value, this context must be broadly understood. This is done by making master data available in the same location as the data marketplace. In other words, information about customers, suppliers, portfolio companies, prospects, leads, and other key entities must live near the external data so the two can be integrated and represented in downstream analytics. This would also make it possible for marketplaces to start recommending data sources based on where overlaps exist between your data and that of external providers.
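As a rough illustration of overlap-based recommendations, the sketch below ranks external sources by the share of your master entities they also contain. It assumes both sides expose a clean shared identifier (here, a company domain); the function name, provider names, and sample data are all hypothetical, and a real marketplace would need entity resolution before a comparison like this is meaningful.

```python
def rank_sources_by_overlap(master_ids, provider_ids_by_source):
    """Rank external sources by the share of master entities they also cover."""
    master = set(master_ids)
    scores = {}
    for source, ids in provider_ids_by_source.items():
        # Fraction of our own entities that appear in this provider's data.
        scores[source] = len(master & set(ids)) / len(master) if master else 0.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical master data and provider catalogs, keyed by company domain.
master_domains = ["acme.com", "globex.com", "initech.com", "umbrella.com"]
catalog = {
    "provider_a": ["acme.com", "globex.com", "hooli.com"],
    "provider_b": ["initech.com"],
    "provider_c": ["acme.com", "globex.com", "initech.com", "stark.com"],
}

ranking = rank_sources_by_overlap(master_domains, catalog)
for source, share in ranking:
    print(f"{source}: covers {share:.0%} of master entities")
```

In this toy data, provider_c covers three of the four master entities and would be recommended first.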
The scalability of cloud warehouses and data lakes means that storing and running computations on all of this data in one place is no longer a challenge, opening a new world of possibilities.
Marketplaces and Data Providers Should Shift from Tables to Attributes
No supply chain professional wakes up and thinks, “I need a table of company financials to understand which suppliers are at risk as interest rates rise.” Instead, they think, “Which of my suppliers have the most debt relative to cash?” Most analytic questions follow a similar structure: an entity (e.g., supplier) plus a small number of attributes about that entity (e.g., debt, cash).
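To make the entity-plus-attributes structure concrete, here is a minimal sketch of the supplier question expressed in code. The `Supplier` type, the supplier names, and the figures are invented for illustration; the point is only that the question needs two attributes per entity, not a whole financials table.

```python
from dataclasses import dataclass


@dataclass
class Supplier:
    name: str
    debt: float  # total debt, in any consistent currency unit
    cash: float  # cash on hand, in the same unit


def rank_by_debt_to_cash(suppliers):
    """Order suppliers from most to least debt relative to cash."""
    def ratio(s):
        # A supplier with no cash is treated as maximally exposed.
        return s.debt / s.cash if s.cash > 0 else float("inf")
    return sorted(suppliers, key=ratio, reverse=True)


# Illustrative figures only.
suppliers = [
    Supplier("Supplier A", debt=120.0, cash=40.0),
    Supplier("Supplier B", debt=50.0, cash=100.0),
    Supplier("Supplier C", debt=90.0, cash=0.0),
]

most_exposed = [s.name for s in rank_by_debt_to_cash(suppliers)]
print(most_exposed)
```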
Currently, data marketplaces are structured to help you find various sources of company financial information. This is a fine enough starting point, but it’s primarily a relic of legacy ways of aggregating and selling data. Data marketplaces need to shift away from this and towards an attribute-centric model that helps users find the best possible source of data for the attribute(s) they care most about.
Anyone who has purchased external data knows the feeling that comes from buying a data source and then later finding out that the provider has poor coverage for the attribute you care about. Data marketplaces are well-positioned to solve this problem because of their deep visibility into the contents of individual sources and should work to help users find the attributes they need.
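One way to picture attribute-level visibility is a coverage check run before purchase: for the entities you care about, what share have a usable value for a given attribute in a given source? The sketch below assumes records are plain dictionaries with an `id` field that already matches your own identifiers; the function name and sample data are hypothetical.

```python
def attribute_coverage(records, entity_ids, attribute):
    """Share of entity_ids that have a non-empty value for attribute in records."""
    wanted = set(entity_ids)
    if not wanted:
        return 0.0
    covered = {
        r["id"]
        for r in records
        # Count a record only if it is one of our entities and the value is populated.
        if r.get("id") in wanted and r.get(attribute) not in (None, "")
    }
    return len(covered) / len(wanted)


# Hypothetical supplier IDs and a hypothetical external source.
my_suppliers = ["s1", "s2", "s3", "s4"]
source_x = [
    {"id": "s1", "debt": 10.0, "cash": 5.0},
    {"id": "s2", "debt": None, "cash": 7.0},
    {"id": "s3", "debt": 3.0, "cash": 4.0},
    {"id": "s9", "debt": 8.0, "cash": 2.0},  # not one of our suppliers
]

debt_coverage = attribute_coverage(source_x, my_suppliers, "debt")
cash_coverage = attribute_coverage(source_x, my_suppliers, "cash")
print(f"debt: {debt_coverage:.0%}, cash: {cash_coverage:.0%}")
```

A source can look comprehensive at the table level while covering only half of your suppliers for the one attribute that matters, which is the situation this kind of check is meant to surface.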
Automated Data Integration is a “Must Have”
One of the biggest mistakes we see companies make with external data is that they wait too long to think about how they are going to integrate all of it back into their master data. People often assume that the data providers will be able to match the data effectively for them, or that their internal data is clean enough to join onto external data assets. It’s rare that either of these is true.
Data providers typically assume there are clean identifiers, such as a domain name or ticker symbol, to join on. When this is not the case, which happens quite often, the buyer must choose between kicking off a large data quality and entity resolution project and living with the fact that only a small portion of its data can be enriched.
It’s worth investing early in the capabilities required to integrate many disparate external data sources. These include automated data cleaning, validation, and entity resolution (also known as ‘data mastering’). Machine learning, as well as advances in data cleaning techniques, has made this process highly accurate and efficient. This can deliver significant ROI by enabling you to integrate 20-50 percent more of your external data with internal data sources.
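The sketch below shows the general shape of the problem, not any particular product's approach: match on a clean identifier when one exists, then fall back to comparing cleaned-up names. It uses simple string similarity from Python's standard `difflib` module as a stand-in for the machine learning models mentioned above. The suffix list, the 0.85 threshold, and the sample records are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher

# Assumed list of legal suffixes to ignore when comparing company names.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "llc", "ltd", "co"}


def normalize(name):
    """Lowercase, strip punctuation, and drop common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def match_records(internal, external, threshold=0.85):
    """Match on domain first, then fall back to fuzzy name similarity."""
    by_domain = {e["domain"]: e for e in external if e.get("domain")}
    matches = {}
    for rec in internal:
        hit = by_domain.get(rec.get("domain"))
        if hit is None:
            best, best_score = None, 0.0
            left = normalize(rec["name"])
            for cand in external:
                right = normalize(cand["name"])
                if not left or not right:
                    continue  # nothing meaningful to compare
                score = SequenceMatcher(None, left, right).ratio()
                if score > best_score:
                    best, best_score = cand, score
            hit = best if best_score >= threshold else None
        matches[rec["name"]] = hit["name"] if hit else None
    return matches


# Hypothetical internal master data and external provider records.
internal = [
    {"name": "Acme Corp.", "domain": "acme.com"},
    {"name": "Globex Corporation", "domain": None},
    {"name": "Initech", "domain": None},
]
external = [
    {"name": "ACME Incorporated", "domain": "acme.com"},
    {"name": "Globex, Inc.", "domain": "globex.com"},
    {"name": "Umbrella LLC", "domain": "umbrella.com"},
]

matches = match_records(internal, external)
match_rate = sum(v is not None for v in matches.values()) / len(matches)
print(matches, f"match rate: {match_rate:.0%}")
```

With an identifier-only join, just one of the three internal records would be enriched here; the name fallback picks up a second. Real data is far messier than this toy example, which is why purpose-built cleaning and entity resolution tooling is worth the early investment.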
The importance of data marketplaces will certainly increase over time. If nothing else, the amount of investment being made here will ensure that innovation continues to happen. Changes in data provider behavior will accelerate the adoption of data marketplaces, but it’s also the responsibility of practitioners to ensure they are in a position to maximize the value of these marketplaces as they evolve.
Those who think ahead to a world where they’re onboarding a new external dataset every month, or more frequently, will be rewarded with new insights that help them race ahead of their competitors relying on a more limited set of information.