Enterprises Must Ensure Data Reliability as AI Takes Over
Solutions Review’s Contributed Content Series is a collection of contributed articles written by thought leaders in enterprise technology. In this feature, Acceldata’s Rohit Choudhary offers commentary on why enterprises must ensure data reliability as AI takes over.
Every few years, a disruptive new technology emerges with world-changing potential. The internet took shape in the mid-1990s, mobile innovation took hold around 2008, and cloud computing came to scale in the mid-2010s.
And now? Generative AI is having its moment.
It’s impossible to browse headlines without seeing multiple stories on generative AI: how organizations are using it, what it means for the workforce, and its benefits, limitations, and risks. Like earlier technology tipping points such as the internet and mobile, generative AI is going to change how we work and live.
From a human standpoint, generative AI could bring humanity closer to some level of artificial general intelligence (AGI), with significant impact on every aspect of society. From an enterprise standpoint, generative AI can help organizations become more efficient, innovative, and competitive: more code can be written, more use cases can be solved, more data can be generated, and all of it can be done faster and with fewer resources.
But are we prepared to function in a world that is effectively using our own collective intelligence to perform work we are accustomed to doing? Are we truly ready for the consequences of unleashing the collective power of humanity’s brain in ways that govern how we perceive and interpret the world?
There’s no telling how this will play out over the next decade or so, but amid generative AI’s rapid proliferation, there is one critical thing data leaders need to account for now: data reliability.
Data Reliability & AI
Enterprise leaders have long understood the importance of using reliable data to train their AI and machine learning models, but generative AI takes this a step further. Instead of only making predictions, AI is now creating content: writing blog posts, producing images, and writing code. This profound shift changes the nature of work for almost everyone in a way that was unimaginable just 18 months ago.
Consider that, at its foundation, everything produced by generative AI starts with data; data is what powers generative AI models at scale. Currently, much of this data comes from public corpora. Case in point: 20 percent of all data on the web today has gone into creating OpenAI’s GPT-4.
We’ll also increasingly see private data become an integral part of training generative models in different domains. Bloomberg’s BloombergGPT, for example, uses the organization’s data to do things like “assess whether headlines are bearish or bullish for investors, and even write headlines based on short blurbs.” And legal AI firm Harvey has set out to do something similar for law firms.
To realize the value of generative AI, these models must be given accurate, high-quality training data, which reduces the risk that they hallucinate and provide users with inaccurate information. That data needs to be closely monitored and observed across the data pipelines that supply and integrate it from disparate sources, and the process for doing so is only becoming more complex.
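What “closely monitored” means in practice varies by tool and team. As a minimal sketch only, assume records from several sources arrive as Python dictionaries and that a usable training record needs `source` and non-empty `text` fields; both field names and the `profile_records` helper are hypothetical. The function counts, per source, records that fail three basic checks (missing fields, empty text, exact duplicates) and passes the rest through, the kind of per-source signal a pipeline could surface before data reaches a model.

```python
from collections import Counter, defaultdict
from typing import Iterable

# Hypothetical schema: every training record is assumed to need these fields.
REQUIRED_FIELDS = ("source", "text")


def profile_records(records: Iterable[dict]) -> dict:
    """Split records into clean ones and per-source counts of failed checks."""
    seen_texts = set()
    issues = defaultdict(Counter)  # source name -> Counter of issue types
    clean = []
    for record in records:
        source = record.get("source", "<unknown>")
        # Check 1: required fields are present.
        if any(field not in record for field in REQUIRED_FIELDS):
            issues[source]["missing_field"] += 1
            continue
        # Check 2: the text is not blank.
        text = str(record["text"]).strip()
        if not text:
            issues[source]["empty_text"] += 1
            continue
        # Check 3: the text is not an exact duplicate of an earlier record.
        if text in seen_texts:
            issues[source]["duplicate"] += 1
            continue
        seen_texts.add(text)
        clean.append(record)
    return {"clean": clean, "issues": {s: dict(c) for s, c in issues.items()}}


# Usage with a small, made-up batch from three hypothetical feeds.
batch = [
    {"source": "crm", "text": "Customer renewed contract"},
    {"source": "crm", "text": "Customer renewed contract"},
    {"source": "wiki", "text": "   "},
    {"text": "orphan record"},
]
report = profile_records(batch)
print(len(report["clean"]))  # 1
print(report["issues"])
# {'crm': {'duplicate': 1}, 'wiki': {'empty_text': 1}, '<unknown>': {'missing_field': 1}}
```

Grouping issues by source is a deliberate choice in this sketch: when data is integrated from disparate systems, knowing which feed degraded is usually more actionable than a single global error count.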
The Importance of Data Observability
Organizations need data observability tools to establish and maintain data reliability and avoid any potential negative business outcomes as AI rapidly proliferates.
Gartner defines data observability as “the ability of an organization to have a broad visibility of its data landscape and multilayer data dependencies (like data pipelines, data infrastructure, data applications) at all times with an objective to identify, control, prevent, escalate and remediate data outages rapidly.”
Essentially, data observability tools give companies comprehensive visibility into their data so that they can identify, fix, and prevent issues, making the data stack much more reliable. It’s not hard to imagine why this will become crucial as organizations train their generative AI models, and it ties back to the need to avoid pitfalls like AI hallucinations.
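Commercial observability tools implement these ideas in their own ways. As a rough illustration, assume a hypothetical table whose load history is available as a list of `LoadRun` records. The sketch below applies two checks often discussed under data observability: freshness (how long since the last load) and volume (whether the latest row count falls far outside earlier loads, measured with a z-score). The 24-hour limit, the z-score threshold of 3, and all names here are assumptions for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from statistics import mean, stdev


@dataclass
class LoadRun:
    """One hypothetical pipeline load: when it finished and how many rows it wrote."""
    finished_at: datetime
    row_count: int


def check_freshness(runs: list[LoadRun], now: datetime,
                    max_age: timedelta = timedelta(hours=24)) -> dict:
    """Flag the table as stale if its most recent load is older than max_age."""
    latest = max(run.finished_at for run in runs)
    age = now - latest
    return {"check": "freshness", "ok": age <= max_age,
            "age_hours": round(age.total_seconds() / 3600, 1)}


def check_volume(runs: list[LoadRun], z_threshold: float = 3.0) -> dict:
    """Flag the latest load if its row count deviates sharply from earlier loads."""
    ordered = sorted(runs, key=lambda run: run.finished_at)
    past_counts = [run.row_count for run in ordered[:-1]]
    latest = ordered[-1].row_count
    if len(past_counts) < 2:
        return {"check": "volume", "ok": True, "reason": "not enough history"}
    mu, sigma = mean(past_counts), stdev(past_counts)
    if sigma == 0:
        # Identical past loads: any change at all counts as an anomaly.
        z = 0.0 if latest == mu else float("inf")
    else:
        z = (latest - mu) / sigma
    return {"check": "volume", "ok": abs(z) <= z_threshold, "z_score": round(z, 2)}


# Usage: four normal loads, then a late and unusually small one (made-up data).
now = datetime(2023, 8, 24, 12, 0, tzinfo=timezone.utc)
load_log = [(126, 10_000), (102, 10_200), (78, 9_900), (54, 10_100), (30, 1_200)]
runs = [LoadRun(now - timedelta(hours=h), rows) for h, rows in load_log]

for result in (check_freshness(runs, now), check_volume(runs)):
    status = "OK" if result["ok"] else "ALERT"
    print(status, result)
```

Returning plain result dictionaries rather than raising exceptions keeps the checks composable: a caller can decide whether a failed check should page someone, pause downstream training jobs, or simply be logged, which loosely mirrors the “identify, control, prevent, escalate and remediate” steps in the definition.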
For example, if a non-technical business user asks a chatbot a question and gets back a generated response that looks authoritative, they may take that information at face value and let it guide their decision-making. But if the organization didn’t feed enough high-quality data into that application, some level of unintended corporate hallucination will eventually occur, potentially leading to poor business outcomes down the line. For this reason, data observability must be possible at scale: models need massive amounts of high-quality training data to perform well.
The truth is, generative AI models are only as helpful and accurate as the data they’re supplied with. Training these models on low-quality, incomplete, or inaccurate data could have serious negative consequences, depending on the use case. As organizations explore the promise of generative AI, it’s essential that they consider using data observability tools to increase data reliability.
- Enterprises Must Ensure Data Reliability as AI Takes Over - August 24, 2023