
Generative AI – How to Care For, and Properly Feed, Chatty Robots

Solutions Review’s Contributed Content Series is a collection of contributed articles written by thought leaders in enterprise technology. In this feature, Ontotext’s Doug Kimball offers commentary on how to properly interface with generative AI.

Developments in generative AI (GenAI) have accelerated at what feels like hyper-speed. The technology has captivated our minds, imaginations, and conversations over the last several months with its seemingly magical superpowers. Enterprises worldwide are analyzing GenAI capabilities and seeking ways to leverage them across a variety of use cases to sharpen their competitive edge and incorporate automation and efficiency.

Terms related to GenAI, such as hallucinations and Large Language Models (LLMs), have become lingua franca in nearly every business conversation. As a result, students, business professionals, developers, marketers, and others have begun exploring these “chatty robots” and discovered there is a lot to like, and some things to be concerned about. LLMs in particular have a remarkable capability to comprehend and generate human-like text by learning intricate patterns from vast volumes of training data; under the hood, however, they are just statistical approximations.


Interfacing with Generative AI

So, What Exactly Are Generative AI and LLMs?

Generative AI refers to computational models that are trained on massive amounts of data and generate output in the form of text, images, video, audio, new data, and even code. An LLM, more specifically, is a neural network model built by processing vast quantities of text. By analyzing how each piece of text relates to other text, the model learns to associate new input with similar text it has encountered before.

In simple terms, these models predict which word best follows the previous ones by taking into account the broader context of the words that came before. LLMs can even take tone and style into account: responses can be shaped by incorporating personas, such as asking ChatGPT (powered by an LLM) to explain the concept of data governance in the style of a Taylor Swift lyric.
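To make the idea concrete, here is a toy sketch of next-word scoring in Python. The context, candidate words, and scores are all invented for illustration; a real LLM derives such scores from billions of learned parameters over a much longer context window.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical context and candidate next words.
context = "data governance ensures data is"
candidates = ["trustworthy", "purple", "accurate", "banana"]
# Made-up scores a model might assign each candidate given the context;
# in a real LLM these come from the network's final layer.
logits = [4.2, -1.3, 3.8, -2.0]

for word, p in sorted(zip(candidates, softmax(logits)), key=lambda wp: -wp[1]):
    print(f"P({word!r} | context) = {p:.3f}")
```

The model then samples from (or picks the top of) this distribution, appends the chosen word to the context, and repeats, which is why small biases in the learned scores can snowball into confident-sounding but wrong output.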

Challenges & Limitations

So while innovation in AI technologies may introduce new capabilities and uncover new opportunities, it quickly runs into problems associated with data governance and quality, trust, bias, and ethics. For example, if the training data is of poor quality, the results from AI algorithms will be substandard too. LLMs routinely generate superficial or inaccurate information; they are non-deterministic and unreliable, and they suffer from being trained on stale data. They are also incapable of providing provenance, the pointers to source data sets that would let users know how a result was obtained.

As a result, and even when they sound realistic, LLMs routinely produce bad responses based on outdated training data, exhibit random hallucinations, reproduce bias, and lack real-world context. Because they don’t weigh conflicting or ambiguous information, they often make up an answer based on whatever parameters happen to dominate. As one can imagine, this can produce catastrophic results: buggy pipeline code, bad or suboptimal implementation logic, inappropriate answers, or just plain toxic information. They also run the risk of using trademarked, copyrighted, or otherwise protected data as they scour public sources, and they can be easily exploited and manipulated to ignore previous instructions. Worse, LLMs can be made to execute malicious code, commands, or unintended actions through creative prompting.

Additionally, data is the fulcrum of AI, and the data used to train LLMs must be properly governed and controlled. Otherwise, any LLM deployed in production runs the risk of basing its decisions on poor-quality data, exposing privacy, intellectual property, bias, and ethical issues. Evaluating the trustworthiness of LLMs is notoriously difficult as well, because their outputs have no ground-truth labels. As a result, organizations struggle to identify and benchmark when a model can be trusted.

Building Governance, Security, and Trust with LLMs

Before jumping on the generative AI bandwagon, organizations should do their homework and clearly understand the risks, challenges, and negative consequences of leveraging LLMs. Otherwise they remain a black box: very little is known about how they arrive at their answers, organizations can lose control of private data, GenAI pipelines can be compromised, and applications can be attacked in subtle ways by hackers. To avoid this, enterprises should consider:

  • A comprehensive data strategy: This aligns data and AI initiatives with business objectives. In industries and domains where compliance and regulation are mandatory, generative AI techniques need to be cross-pollinated with complementary capabilities to ensure transparency, and responses must be verified before they are used in production systems. Organizations should establish effective ways to refer to trusted heterogeneous data from disparate sources, so that LLMs and their associated applications are well supported and errors are minimized.
  • A centralized, cross-functional platform team and adoption framework: This should include governance and risk mitigation, guidelines, guardrails, and an organization-wide consensus on how LLMs can be used for business processes. The framework will also ensure inputs and outputs have context and are reliable, trustworthy, and understandable.
  • Creating a governance team: This team will define specific policy guidelines, train data engineers, data scientists, and data quality teams accordingly, and make sure data stewards enforce them. Leveraging the adoption framework, it will help ensure proper data quality, security, and compliance.
  • Building a center of excellence for best practices: This should cover LLM-assisted data management and analytics, and include up-skilling programs and talent management strategies to foster a culture of continuous learning about changing data and architecture patterns.

LLMs with Knowledge Graphs

Knowledge graphs (KGs) are becoming increasingly important to making generative AI initiatives and LLMs more successful. Created by integrating heterogeneous datasets from diverse sources, KGs provide a structured representation of data that models its entities, relationships, and attributes in a graph-like structure. With interlinked descriptions of concepts and entities, KGs provide context that improves comprehension.
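For readers who want to see the shape of the data, here is a minimal sketch of KG triples in Python using the rdflib library (assumed to be installed); the namespace, entities, and relations are invented for the example.

```python
from rdflib import Graph, Literal, Namespace

# Hypothetical namespace and entities, purely for illustration.
EX = Namespace("http://example.org/")
g = Graph()

# Each fact is a (subject, predicate, object) triple.
g.add((EX.AcmeCorp, EX.headquarteredIn, EX.Berlin))
g.add((EX.AcmeCorp, EX.hasSubsidiary, EX.AcmeLabs))
g.add((EX.AcmeLabs, EX.foundedIn, Literal(2015)))

# Interlinked descriptions provide context: everything the graph
# knows about AcmeCorp, one hop out.
for predicate, obj in g.predicate_objects(EX.AcmeCorp):
    print(predicate, obj)
```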

So how do organizations address these needs from a data perspective? They do so by implementing a semantic KG that is based on factual data, can inject enriched context into the prompts given to LLMs, and can steer the LLM engine toward higher accuracy and relevance. Additionally, KGs are very effective at validating the integrity and consistency of LLM responses: a response can be represented as a graph of connected nodes and checked against the organization’s domain-specific KG. This addresses bias, consistency, and the integrity of the data and facts, and ensures regulations and compliance requirements are adhered to.
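As a rough illustration of the prompt-enrichment step, the sketch below retrieves facts about an entity and prepends them to the question. The `query_kg` helper, the facts, and the prompt wording are all hypothetical stand-ins; in a real system the facts would come from a governed KG (for example, via a SPARQL endpoint) and the finished prompt would be sent to whichever LLM the organization uses.

```python
def query_kg(entity: str) -> list[str]:
    """Stand-in for a SPARQL query against a domain-specific KG."""
    facts = {
        "AcmeCorp": [
            "AcmeCorp is headquartered in Berlin.",
            "AcmeCorp acquired AcmeLabs in 2015.",
        ],
    }
    return facts.get(entity, [])

def build_grounded_prompt(question: str, entity: str) -> str:
    """Inject verified KG facts into the prompt to constrain the LLM."""
    context = "\n".join(f"- {fact}" for fact in query_kg(entity))
    return (
        "Answer using ONLY the verified facts below. "
        "If the facts are insufficient, say so.\n"
        f"Facts:\n{context}\n\n"
        f"Question: {question}"
    )

print(build_grounded_prompt("Where is AcmeCorp based?", "AcmeCorp"))
```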

Enhancing or supplementing an LLM with a KG goes a long way toward mitigating incorrect information and improving accuracy. KGs help identify sensitive information, compliance errors, and ethical violations, which minimizes the associated risks. More importantly, a KG-based data model provides transparency into the responses generated by LLMs and allows answers that can be trusted.
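The validation half of this story can be sketched the same way: represent the claims in an LLM’s answer as triples and check them against the domain KG. Here the graph contents and the extracted triples are illustrative assumptions, and the extraction step (pulling triples out of free text) is left as a stub.

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")  # hypothetical namespace

# A tiny stand-in for the organization's domain-specific KG.
kg = Graph()
kg.add((EX.AcmeCorp, EX.headquarteredIn, EX.Berlin))

# Triples an extraction step might pull from the LLM's free-text answer.
extracted = [
    (EX.AcmeCorp, EX.headquarteredIn, EX.Berlin),  # matches the KG
    (EX.AcmeCorp, EX.headquarteredIn, EX.Paris),   # likely hallucination
]

for triple in extracted:
    verdict = "verified" if triple in kg else "unsupported -- flag for review"
    print(triple, "->", verdict)
```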

In a nutshell, generative AI cannot be a standalone tool in the toolbox. There needs to be a strong synergy and collaborative partnership between LLMs and KGs, with a feedback loop of continuous improvement. KGs offer comprehensive context that continually improves the performance of LLMs, and they provide guardrails that prevent hallucinations and keep LLMs from giving inconsistent answers to critical enterprise questions.

Final Thoughts

With its magic and its pitfalls, GenAI has simultaneously opened a box of jewels and Pandora’s box. Almost every organization is asking the same question: “Do we go fast and adopt this technology, or do we leverage it in the right, responsible way, and if the latter, what does that look like?”

Leaders need to balance the adoption of generative AI against the risks involved, and doing so is a true joint effort. Teams need to outline ethical principles and guidelines that factor in the specific risks of each use case, and organizations must balance innovation with risk management by establishing robust governance frameworks.

KGs are proving to be a highly effective tool for navigating the challenges and the complex landscape of risk and governance management. Knowledge graphs establish a clear understanding of, and effective ways to access and use, trusted data from disparate sources to support LLMs and their associated applications. More importantly, they enable organizations to confidently leverage the true power and promise of generative AI and LLMs.
