It’s Still Too Early to Trust Your Business to Conversational AI Interfaces

By Louis Landry, Engineering Fellow at Teradata

Solutions Review’s Contributed Content Series is a collection of contributed articles written by thought leaders in enterprise technology. In this feature, Teradata Engineering Fellow Louis Landry offers commentary on why it’s still too early to trust your business to conversational AI interfaces.

When ChatGPT burst into the vernacular late last year, discussions around Large Language Models (LLMs) and Artificial Intelligence (AI) were elevated from IT backroom meetings to dinner party conversations overnight. And just as quickly, finding ways of capitalizing on this new “conversational interface” wave of AI, which now includes solutions like Google Bard and Claude.ai, became an imperative for nearly every enterprise.

But what everyone should be asking first is “Can you trust it?” For now, the answer is “not really … not yet.”

Conversational interfaces that enable natural language queries are built on LLMs, a type of generative AI. Both generative AI and predictive AI rely on algorithms. The difference is that in the predictive AI world, data scientists or AI practitioners compose different methods and tools, often stringing together complex mathematics with a goal of predicting an outcome. The goal of generative AI, in contrast, is generating new content or outputs based on a massive amount of curated data and content serving as context.

A multi-modal path that mixes LLMs with predictive AI is where the industry is heading. The combination of generative AI and predictive AI will achieve the best overall experience and outcomes for users.

Conversational AI & Trust

As we move in that direction, the generative model’s job will be translating the often imprecise and ambiguous language we use into specific tasks and requests of other systems including predictive AI, and then translating the results into a form that is consumable by anyone.

Eventually, it may take on other roles, but there is real danger in people expecting that generative AI will replace predictive AI outright. That’s because LLMs are designed to satisfy prompts rather than predict outcomes. A user may ask an LLM to perform a specific math problem, for example, and the LLM will respond with language it believes satisfies the question. It is not, however, performing math calculations. “What is 2×2?” for instance, would certainly return the result of “4” simply because examples of this math problem are readily found across what we know to be the largest public data sets in the world.

But being an outstanding parrot of trivial math equations is not the same thing as being able to do math. With more complex math equations, where correct responses are not available on the Internet or in a data set accessible by the LLM, the LLM cannot return the correct response. In fact, studies have shown solutions like ChatGPT may get as many as 60% of math-based queries wrong.
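
One way to picture the division of labor described above is to let the language model only extract the expression and hand the arithmetic itself to a deterministic tool. The following is a minimal sketch under that assumption; the function name `calculate` and the set of supported operators are illustrative choices, not part of any particular product.

```python
import ast
import operator

# Map supported syntax nodes to real arithmetic functions.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node):
    """Walk the parsed expression and compute its value."""
    if isinstance(node, ast.Constant):
        # Accept plain numbers only (bool is a subclass of int, so exclude it).
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
    elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def calculate(expression: str):
    """Compute an arithmetic expression instead of recalling an answer from text."""
    return _evaluate(ast.parse(expression, mode="eval").body)
```

Because the result is computed rather than recalled, `calculate("48271 * 69621")` is handled the same way as `calculate("2 * 2")`, whether or not that multiplication ever appeared in a training set. Anything that is not plain arithmetic is rejected with a `ValueError`.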

And that’s the rub.

Many of us might have been told we can trust computers because they do not lie, but we are now in an age in which they may not always tell the truth, either. It is not just math, of course. With so much disinformation and propaganda available through sources like the Internet, LLMs very likely will return inaccurate information alongside the correct information. And even though users are aware of these limitations, LLMs deliver results that feel authoritative and certain. That may not be a big deal when a personal user is asking for a biography of an author who never existed, but it’s a big deal when a health insurance customer asks a chatbot for information about why her healthcare claim was denied.

Inputs and Outputs

So, if accuracy is in question, why should anyone bother with LLMs today? It’s because they give a whole new level of input functionality. They make interacting with AI digestible and approachable for everyday users. Imagine being a traveler who missed a flight … Rather than searching for an airport gate agent, you could jump onto your phone, ask an LLM-enabled AI chatbot for options, and within seconds you could be reviewing flight options, complete with your seating preferences and other details.

Explainability

We are at an enticing inflection point. It is exciting to see how these two worlds are coming together: the more rigid machine learning and AI world, which is essentially math we’ve been doing forever, and LLMs. While the former is good at arriving at the correct numerical answers, it does not connect and resonate with the broad population the way LLMs do. Bridging the gap between these worlds will be a game changer.

There’s still a mountain of work to be done around “explainability,” which will help instill greater trust. Explainability is essentially the concept that a machine learning model and its output can be described in a way that makes sense to a human. If I query a popular LLM, then follow up by asking how it arrived at an answer, it likely will respond that it used Internet resources. The more comforting response would be one in which the LLM supplied citations and specific facts, along with a statement such as, “I arrived at this through advanced math.” We are at the beginning of that journey, and I don’t believe we will be able to fully trust LLMs until we get there.

ModelOps and Governance

There is hope, however. ModelOps, which enables rapid operationalization of models, helps convert developed models into operational assets effectively and is designed to address some of these issues.

ModelOps acts as a governance and operations layer within the model-building process. ModelOps may include a simple user interface that shows when models were trained, using which data, and by whom. The interface might also show who evaluated the model, the results, and who approved and deployed the model. When information like this is captured, there is a detailed audit chain for every model which is documented for governance purposes.
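
The audit chain described above can be pictured as a simple append-only record per model. This is only a sketch: the class names `AuditEvent` and `ModelAuditTrail`, the stage labels, and the rule that a model must be trained, evaluated, and approved (in that order) before deployment are all assumptions made for illustration, not features of any specific ModelOps product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditEvent:
    stage: str     # hypothetical labels: "trained", "evaluated", "approved", "deployed"
    actor: str     # who performed the step
    details: dict  # e.g. which data set was used, or the evaluation results
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class ModelAuditTrail:
    model_name: str
    events: list = field(default_factory=list)

    def record(self, stage: str, actor: str, **details) -> None:
        """Append one step to the trail; existing entries are never edited."""
        self.events.append(AuditEvent(stage, actor, details))

    def is_deployable(self) -> bool:
        """True once the trail shows training, evaluation and approval, in that order."""
        required = ["trained", "evaluated", "approved"]
        position = 0
        for event in self.events:
            if position < len(required) and event.stage == required[position]:
                position += 1
        return position == len(required)
```

A trail built with `record("trained", "alice", dataset="claims_2023_q2")` and similar calls answers the governance questions directly: when the model was trained, on which data, and by whom. The names and data set label here are made up.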

ModelOps is crucial for advancing AI because it enables efficient management, deployment, and monitoring of AI models throughout their lifecycle, ensuring their optimal performance and scalability. It is also important to note that part of monitoring model performance involves checking for drift, which describes the degradation of a model’s performance over time. Drift occurs as the model’s uses and targets change.
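
One common way to watch for drift in a model’s inputs is to compare the data it sees in production against the data it was trained on. The sketch below uses the Population Stability Index (PSI), a technique chosen here for illustration and not one named in this article. The function name, the bin count, and the small floor for empty buckets are assumptions.

```python
import math


def population_stability_index(baseline, current, bins=10):
    """Compare two samples of one numeric feature; larger values mean more drift."""
    low, high = min(baseline), max(baseline)
    # Fall back to a width of 1.0 if every baseline value is identical.
    width = (high - low) / bins or 1.0

    def bucket_shares(values):
        counts = [0] * bins
        for value in values:
            index = int((value - low) / width)
            # Values outside the baseline range land in the first or last bucket.
            counts[min(max(index, 0), bins - 1)] += 1
        # A small floor avoids log(0) for empty buckets.
        return [max(count / len(values), 1e-6) for count in counts]

    expected = bucket_shares(baseline)
    actual = bucket_shares(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))
```

A frequently quoted rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a significant shift worth investigating, though the right thresholds depend on the application.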

Leveraging quality data, managing data, and managing model drift are all essential to building rigor in the data and data flow/pipeline, and ensuring greater trust in the insights and outcomes derived from the AI models.

Garbage In/Garbage Out

Blending predictive AI and LLM-based inputs and chats will certainly open new doors. In the context of a business, that might mean an AI-generated chat session could lead to opportunities for upselling, cross-selling and better relationship building because the systems working in tandem can understand both what the customer is asking for and what the customer is likely to need next based on their past purchase history. But for all of this to work, trust is required.

The trust in any model output is only as good as the integrity of the data set that trains the model. Simply because I can point to “That’s the data set or that’s the information that I use to answer this question,” does not answer the question of whether that data set is valid and accurate.

For example, when a financial institution offers a loan to one candidate over another, the rejected applicant might ask a chatbot, “Why was my loan rejected?” A natural language response such as “People with a credit history like B are generally rejected,” can open many areas of concern because it begs for more information: Where did the data come from? Is it biased? What characteristics were used? What features went into that training set?
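
To make those questions concrete, here is a toy sketch of a decision that can account for itself. The feature names, weights, base score, and approval threshold are all hypothetical and chosen only for illustration. A real credit model would be far more complex and subject to regulation.

```python
# Hypothetical weights for a toy linear credit score; not taken from any real lender.
WEIGHTS = {
    "years_of_credit_history": 4.0,
    "missed_payments_last_2y": -15.0,
    "debt_to_income_pct": -0.8,
}
BASE_SCORE = 50.0
APPROVAL_THRESHOLD = 40.0


def explain_decision(applicant: dict) -> dict:
    """Score an applicant and report which features drove the outcome."""
    contributions = {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}
    score = BASE_SCORE + sum(contributions.values())
    # Sort so the features that lowered the score the most come first.
    ranked = sorted(contributions.items(), key=lambda item: item[1])
    return {
        "approved": score >= APPROVAL_THRESHOLD,
        "score": score,
        "top_negative_factors": [name for name, value in ranked if value < 0][:2],
        "features_used": sorted(WEIGHTS),
    }
```

A chatbot sitting in front of a model like this could tell the applicant which characteristics were used and which ones counted against them. Whether the underlying data is valid or biased is still a separate question that the output alone cannot settle.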

With today’s rush to build AI that works for everyone, these types of problems will only be exacerbated. Taking the time to curate data sets and weed out bias is a necessity, but it is currently a step companies may overlook in their rush to be in the LLM game.

Being Prepared

So, what can companies do to ensure they are as prepared as possible when making their first foray into LLMs?

Start with Clean Data

Using the right data, infrastructure, and tools when building and maintaining LLMs helps build trustworthy models from the outset.

Use ModelOps to Advance AI

ModelOps enables efficient management, deployment, and monitoring of AI models throughout their lifecycle, ensuring their optimal performance and scalability.

AI/ML models put into production degrade over time because dynamic business environments change constantly. During model deployment, data scientists can identify model performance and data quality risks by validating ModelOps analysis with previous model experiments and data environments.

Work Diligently at Keeping Bad Data Out

Governing, labeling, and cataloging data sets often comes down to human practice, so setting up strong governance that ensures the company is working with the best data possible means the business can then use that data to make the best possible decisions while minimizing bias or incorrect outcomes.

Focus on the value of accurate, real-time, on-time insights, with robust data flows and data pipelines that ensure there is no data or model drift. This level of precision – and the ability to do this at scale – is especially critical for applications where “hard-and-fast” rules apply, such as medical applications, financial services, autonomous driving, and more.

Carefully Monitor for and Manage Model & Data Drift

Drift typically results from disaggregation, decentralization, and autonomous teams doing autonomous work.

Upskill All Workers Around Data Management Practices

The firewalls between analytical, data, software, and other engineering functions are melting away, so ensure the teams who will be working with the models are trained properly.

Conversational interfaces powered by GenAI or LLMs can be thought of as the new front-end paradigm for every piece of software you create because they likely will be the way your customers interact with your company moving forward. For those who interact with them, these interfaces essentially become a nerdy friend that is able to make complicated things understandable and approachable. And that’s a good thing.

Still, for now, we should all be cautiously optimistic. LLMs are a fantastic tool but they are not magic. Nor should they be trusted implicitly. They can enhance many processes, but the data management practices behind them need to be clean and clear and well documented. Data rigor equals data trust.

Those who believe ChatGPT and other conversational interfaces will change everything are wrong, but so are those who say they won’t change anything. While these solutions change nothing about the core business (your software is still your software, for example), they can and will change everything about how customers interact with your business. And being certain rigor, governance and trust are part of that equation is the only way forward.

This article was written by Louis Landry on August 24, 2023
