In Search Of AI-Ready Data: 6 Steps to Objectively Measuring Data’s AI-Readiness

DataOps.live’s Guy Adams offers commentary on objectively measuring data’s AI readiness. This article originally appeared in Insight Jam, an enterprise IT community that enables human conversation on AI.

Despite hefty enterprise AI investments, many AI projects are failing to move from pilot to production. There are many reasons for these failures, but one of the biggest is the absence of data that is suitable for AI. That is, companies do not have AI-ready data. But what exactly is AI-ready data, and how do you get there?

AI is, at its core, a reflection of data. AI models are trained on source data, and those models can then be used to analyze or make predictions about fresh input data. Compared to other computational approaches, such as building a simulator based on physical laws or writing a collection of “if, then” rules, today’s AI is an entirely data-driven process.

Those who are eager to make their enterprise data-driven should be rejoicing. However, there are a few caveats to having a purely data-driven approach. For starters, it requires high-quality data. Getting the best output from an AI model requires giving it the best input. If you put garbage data in, you will get garbage data out. It really is that simple. The GIGO principle hasn’t changed since an IBM engineer first postulated it at the dawn of the mainframe era 70 years ago, and it isn’t likely to change anytime soon.

What is not simple is judging the state of your data and its readiness to be used for AI, in terms of both training and inference. What is even harder is taking steps to improve that readiness. But failing to account for this single factor can doom you to repeated AI failures.

Since the earliest days of data warehousing and business intelligence, enterprises have drawn links between the quality of data and the quality of analytics output. However, the size and scope of today’s ambitious AI projects, not to mention the potential for bad outcomes when AI models fed with bad data confidently present bad results, demand a total rethink when it comes to developing AI-ready data.

Objectively Measuring Data’s AI-Readiness

The good news is that the pain of AI failure is driving the industry towards a collective agreement about the need for AI-ready data. Not everyone will agree on the details, as companies in different industries and geographies will have different requirements. But most AI practitioners would recognize a pressing need for greater rigor and process-oriented thinking when it comes to achieving AI-readiness with data.

There is no single metric that defines AI-readiness. Rather, there is a range of measures that, when weighed collectively, generate a total score that strongly suggests where an enterprise sits on the spectrum of data AI-readiness. Extensive research has yielded a list of about 200 metrics that cover the most common variables when it comes to measuring the AI-readiness of data.

These metrics span six categories: data quality, data for AI training, data governance, data semantics, data management and operationalization, and data interoperability. With this in mind, a good way to test AI-readiness is to look at each category and ask questions such as:

  • Data Quality: Is your data fresh and timely? Is it accurate? How valid and how consistent is the data? And do you have enough of it?
  • AI Training: Is your data biased? Do you have enough features identified in your data to train a modern model? Do you have enough labeled data? And how accurate are your labels?
  • Data Governance: Is data lineage being tracked? Have you enabled access control on the data? How about an audit trail? Data retention? Data privacy?
  • Data Semantics: Are you collecting enough metadata? Are you tracking your data schemas? Have you integrated a business glossary, and are you making use of it? What about semantic annotation coverage?
  • Management and Operationalization: Have you automated deployment of data products? How extensive is your monitoring and alerting system? Is your data architecture scalable? What about disaster recovery? Have you documented everything?
  • Data Interoperability: Are you adhering to API standards? What about data format compatibility and cross-platform accessibility? Are you locked in to a specific vendor? What is your overall standards compliance level?
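
One way to picture how per-category measurements might roll up into a single readiness score is a weighted average. The sketch below is only an illustration, not any vendor's scoring method: the category names follow the six listed above, but the 0-to-1 scores, the default equal weights, and the function name `readiness_score` are all assumptions made for the example.

```python
# Hypothetical category scores on a 0.0-1.0 scale (illustrative values only).
CATEGORY_SCORES = {
    "data_quality": 0.8,
    "ai_training": 0.5,
    "data_governance": 0.9,
    "data_semantics": 0.4,
    "management_operationalization": 0.6,
    "data_interoperability": 0.7,
}


def readiness_score(scores, weights=None):
    """Return the weighted average of the category scores.

    If no weights are given, every category counts equally.
    """
    if weights is None:
        weights = {name: 1.0 for name in scores}
    missing = set(scores) - set(weights)
    if missing:
        raise ValueError(f"no weight given for: {sorted(missing)}")
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight


overall = readiness_score(CATEGORY_SCORES)
# The lowest-scoring category is a natural place to start improving.
weakest = min(CATEGORY_SCORES, key=CATEGORY_SCORES.get)
print(f"overall readiness: {overall:.2f}, weakest category: {weakest}")
```

In practice each category score would itself be derived from many underlying metrics; the single number per category here simply keeps the roll-up easy to follow.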

Why Achieving AI-Ready Data Is Not One-Size-Fits-All

Each enterprise is different, and each enterprise’s data environment will differ, too. Some enterprises will naturally have a higher maturity level to start out when it comes to AI-ready data, whereas others will be starting off with a big learning curve ahead of them. Older and bigger companies that work in regulated industries, such as finance and healthcare, are likely already tracking some of these data metrics. Governmental organizations will have their own strengths and challenges to work from.

Not every enterprise has the same needs when it comes to developing AI-ready data. Companies in Europe, for example, must meet stricter governance and regulatory requirements when it comes to building AI applications than companies in the United States. The good news is that, with the right automated tool, it’s simple to adjust the AI-readiness weights so that the categories that matter most to you count the most.
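
To illustrate how changing the weights shifts the result, the sketch below scores the same hypothetical data estate under two weight profiles. The profile names, weight values, and scores are invented for the example; they are not taken from any regulation or tool.

```python
def weighted_score(scores, weights):
    """Weighted average of the category scores named in `weights`."""
    total = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total


# Hypothetical scores (0.0-1.0) for three of the six categories.
scores = {
    "data_quality": 0.8,
    "data_governance": 0.4,
    "data_interoperability": 0.7,
}

# Two assumed profiles: one treats all categories equally, the other
# triples the weight on governance, as a heavily regulated firm might.
balanced = {"data_quality": 1, "data_governance": 1, "data_interoperability": 1}
governance_heavy = {"data_quality": 1, "data_governance": 3, "data_interoperability": 1}

balanced_score = weighted_score(scores, balanced)
governance_score = weighted_score(scores, governance_heavy)

print(f"balanced profile:         {balanced_score:.2f}")
print(f"governance-heavy profile: {governance_score:.2f}")
```

The same data looks less ready under the governance-heavy profile because its weakest category now carries more of the total weight, which is the effect an organization facing stricter rules would want the score to show.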

At the end of the day, it’s important to recognize the impact that can come from tracking a wide variety of data metrics over time. Peter Drucker is often credited with saying, “If you can’t measure it, you can’t improve it.” If AI-readiness is important in your organization, you should probably be looking at adopting a more rigorous program to measure and improve the AI-readiness of your data.

In part two of this series, we’ll look at techniques for achieving AI-ready data.
