
Generative AI Data Quality: Expert Insights from IT Leaders

Executive Editor Tim King brings this collection of Generative AI data quality insights from Solutions Review featured contributors. Articles appearing in this space originally appeared on Insight Jam, an enterprise IT community enabling the human conversation on AI.

As GenAI models become increasingly integrated into various sectors, the accuracy, reliability, and integrity of the data they consume directly impact their performance and the trust placed in their outputs. Poor data quality can lead to biased, inaccurate, or even harmful results, undermining the potential benefits of AI and posing significant risks to organizations and consumers.

To navigate these complexities, it is essential to draw on the knowledge and experience of those who are at the forefront of these fields. The GenAI and data management space is constantly advancing, with new methodologies and technologies emerging at a rapid pace. As such, the strategies for maintaining data quality are also evolving, requiring continuous learning and adaptation.

From addressing issues of data bias and representation to discussing the technical nuances of data preprocessing and validation, the experts featured here not only highlight the importance of rigorous data standards but also provide actionable strategies for ensuring that generative models are built on a foundation of high-quality data.

In this article, we have curated insights from a distinguished group of experts in the field, each offering a unique perspective on the challenges and best practices related to data quality in this AI moment.

The thought leaders featured in this article have been instrumental in shaping the discourse around data quality in generative AI. Their work not only addresses the current challenges but also anticipates future developments, providing a roadmap for how to approach data quality in a way that is both forward-thinking and practical.

Generative AI Data Quality: Expert Insights


Tola Capital Vice President Jake Nibley, Partner Akshay Bhushan, and Founder Sinan Ozdemir offer commentary on how fine-tuning and data quality are defining the AI arms race:

“The data generation process cost the team less than $500 using the OpenAI API. In their initial run, fine-tuning the model took three hours on 8 80GB A100s and gave them similar performance results to text-davinci-003 using only $100 worth of cloud computing costs. For less than $1,000, the team created a language model that won 1 more comparison against text-davinci-003 in a blind pairwise comparison evaluation (89 vs. 90). This shows us that it’s more than possible for open source models to catch up quickly; it’s inevitable.

We’re heading into a world where everyone has access to these models, and enterprises and individuals alike will get tons of value from them. It’s up to enterprises to decide not only how they’ll create the next industry-disrupting technology using proprietary or open-source models but also how they’ll tweak them in a way that gives them better results for their specific use case. This kind of innovation isn’t a binary approach: open-source and proprietary models can work harmoniously.”

Read on Solutions Review


Dataddo’s co-founder and CEO Petr Nemeth offers commentary on several solutions for improving data quality for AI initiatives:

“People-focused solutions for data quality, like instituting a comprehensive data governance policy, will continue to remain important, but they need to be supplemented by technological solutions for standardizing and flagging up questionable data as early as possible in the AI lifecycle. This is why organizations that don’t have the appropriate technologies and tools in place are struggling to move AI initiatives into production.

Since humans have been collecting data, organizational solutions like policies and methodologies have been essential for maintaining its quality. And, still, they are essential. However, by themselves, they are decidedly insufficient for AI workloads; they must be implemented alongside the right technologies and tooling.”

Read on Solutions Review
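
To make the idea of "standardizing and flagging up questionable data as early as possible" concrete, here is a minimal Python sketch. It is an illustration only, not a method described by Nemeth or Dataddo: the field names (`email`, `country`), the function names, and the three checks (missing value, malformed email, duplicate email) are all assumptions chosen for the example.

```python
def standardize(record):
    """Trim whitespace and lowercase string values; leave other types as-is."""
    return {
        key: value.strip().lower() if isinstance(value, str) else value
        for key, value in record.items()
    }


def flag_questionable(records, required=("email", "country")):
    """Split records into clean rows and flagged rows with reasons.

    The required fields and the checks below are hypothetical examples,
    not rules taken from the article.
    """
    clean, flagged = [], []
    seen_emails = set()
    for index, raw in enumerate(records):
        record = standardize(raw)
        reasons = []

        # Check 1: required fields must be present and non-empty.
        for field_name in required:
            if not record.get(field_name):
                reasons.append(f"missing {field_name}")

        email = record.get("email")
        if email:
            # Check 2: a deliberately crude format test.
            if "@" not in email:
                reasons.append("malformed email")
            # Check 3: duplicates are detected after standardization.
            if email in seen_emails:
                reasons.append("duplicate email")
            else:
                seen_emails.add(email)

        if reasons:
            flagged.append((index, reasons))
        else:
            clean.append(record)
    return clean, flagged
```

Running checks like these at ingestion, before data reaches training or retrieval pipelines, is one way to surface problems early. Standardizing first matters here: " Ana@Example.com " and "ana@example.com" are only recognized as duplicates once both have been trimmed and lowercased.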


Syniti’s Rex Ahlstrom offers quick commentary on GenAI and data quality, and how to deploy a successful data strategy:

“Organizations must start collecting and documenting data, metadata, procedures, business processes and business rules as part of their data quality programs. These essential elements are necessary for AI models to produce accurate and insightful results. By investing in initiatives to enhance data quality, businesses can build a solid foundation for the application of AI.

The context of the data is important, too. How can you be sure you’re selecting the right data sets and inputs? Your results will be useless if you have high-quality data but the wrong data. To make sure you can make the most of generative AI, you must combine good data curation with high-quality data.”

Read on Solutions Review


Bigeye’s Kyle Kirwan offers insights on the critical importance of data quality in this deep dive resource:

“Data quality is a critical aspect of modern business operations, impacting everything from daily decisions to long-term strategic planning. By understanding the importance of data quality and implementing preventive measures, organizations can ensure that their data is reliable, accurate, and fit for purpose. High-quality data not only supports efficient and effective decision-making but also builds trust and confidence among stakeholders. As the saying goes, “garbage in, garbage out.” Ensuring data quality from the outset can prevent costly errors and pave the way for successful data-driven initiatives.”

Read on Solutions Review


Solutions Review Expert Nicola Askham offers insights on data quality being the secret sauce for AI and Generative AI success:

“We often marvel at the sheer scale of Large Language Models (LLMs). These behemoths owe their ‘largeness’ to the vast volumes of data they are trained on, collected from a myriad of sources. The lifeblood of these models is the quality of this big data. It’s through this data that the models learn the intricate dance of language patterns, enabling them to generate coherent and contextually accurate responses.

As a data executive, I’ve often found myself fascinated by the intricacies of artificial intelligence and its relation to the quality of data. However, it’s important to remember that AI, like any tool, is only as good as the data it’s trained on.”

Read on Solutions Review Thought Leaders
