Understanding How New Knowledge Graphs Work
DataStax’s Davor Bonaci offers insight on understanding how new knowledge graphs work. This article originally appeared on Solutions Review’s Insight Jam, an enterprise IT community enabling the human conversation on AI.
Knowledge graphs help us see how different pieces of information are related and help to organize and extract information from large amounts of content. Traditionally, knowledge graphs have been very selective, focusing on extracting specific entities like people, companies, or places and how they’re connected. However, selecting the details to build graphs is necessarily lossy and can be time-consuming, expensive, and prone to errors.
Imagine you wanted to create a curated map of everything you know about a specific area, one that makes it easy for others to navigate the space without getting lost. Think back to the days of folding paper maps and atlases before Google Maps. To do this with a traditional knowledge graph, you would have to painstakingly identify every important entity, such as people and companies, and every relationship between those entities, such as who works at which company or who manages whom.
You’d also need an expert on hand to help decide what’s important enough to include. In both cases, the map or the graph is ultimately used by end users to analyze and visualize information, so including unnecessary information has negative consequences: cluttered data, slower performance, increased complexity, diluted insights, higher costs, the potential for misinformation, maintenance challenges, and confusion for users.
The process doesn’t stop there. If you decide to change how you’re organizing this information, you have to go back and reprocess everything. This makes traditional knowledge graphs very challenging to manage, especially when you’re dealing with a lot of information.
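To make the traditional approach concrete, here is a minimal sketch of an entity-centric graph stored as (subject, relation, object) triples. The sentence, the names, and the relation labels are hypothetical examples, and this is only an illustration of the general idea, not any particular product's data model. The graph keeps only the facts someone chose to extract, so the source sentence itself is lost.

```python
# Minimal sketch of a traditional, entity-centric knowledge graph.
# The sentence, names, and relation labels are hypothetical examples.

SOURCE_TEXT = (
    "Alice, who joined Acme Corp in 2019 after a decade in logistics, "
    "manages Bob on the supply-chain team."
)

# The graph stores only the (subject, relation, object) triples that someone
# decided were worth extracting; SOURCE_TEXT itself is not kept.
TRIPLES = [
    ("Alice", "WORKS_AT", "Acme Corp"),
    ("Alice", "MANAGES", "Bob"),
]


def facts_about(entity):
    """Return every triple in which `entity` appears as subject or object."""
    return [t for t in TRIPLES if entity in (t[0], t[2])]


def who_manages(person):
    """Return everyone with a MANAGES edge pointing at `person`."""
    return [s for s, rel, o in TRIPLES if rel == "MANAGES" and o == person]


print(who_manages("Bob"))
print(facts_about("Bob"))
```

Questions the extraction didn't anticipate, such as when Alice joined or which team Bob is on, can't be answered from `TRIPLES`. Supporting them means changing what gets extracted and reprocessing the source text.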
A New Approach: Content-Centric Knowledge Graphs
Just as evolving user needs and technology brought about Google Maps, which lets us interact with maps through natural language, filters, modes, and more, knowledge graphs now need a new approach. That need is driven by the demands of generative AI applications and by the strengths and weaknesses of large language models (LLMs). With LLMs, it’s important to provide correct and complete context so the model gives a relevant answer without hallucinating.
Instead of focusing on these selective details from the start, there’s a new approach that’s much easier to handle and retains all of the content you need for use with LLMs. This method focuses on what’s called a “content-centric knowledge graph.” Here’s how it works:
- Chunking Content: First, you break down the content into chunks—these could be paragraphs, sections of a document, images, or tables. These chunks become the building blocks of the graph.
- Preserving Original Content: Unlike the traditional method, where you’d have to decide what’s important upfront, this approach keeps all the original content intact. This means you don’t lose any information, and it’s easier to adapt if your needs change later on.
- Automatic Linking: The graph is built by automatically identifying relationships between chunks based on things like hyperlinks, keywords, or other metadata. For example, if one chunk refers to another chunk, such as through a hyperlink, a link is automatically created between them.
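The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DataStax's implementation: it assumes paragraphs are separated by blank lines, treats a hyperlink as pointing at every chunk from the linked page, and uses a small hypothetical keyword vocabulary (`KEYWORDS`). The URLs and the `Chunk`, `chunk_document`, and `build_edges` names are also hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword vocabulary used for keyword-based links.
KEYWORDS = {"vector", "graph", "llm"}
LINK_RE = re.compile(r"https?://\S+")


@dataclass
class Chunk:
    id: str
    source_url: str
    text: str  # the original content, kept verbatim
    links_to: set = field(default_factory=set)  # hyperlink targets in this chunk
    keywords: set = field(default_factory=set)  # vocabulary terms in this chunk


def chunk_document(url, text):
    """Split a document into paragraph chunks, keeping each paragraph intact."""
    chunks = []
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    for i, para in enumerate(paragraphs):
        words = {w.lower().strip(".,;:()") for w in para.split()}
        links = {u.rstrip(".,;)") for u in LINK_RE.findall(para)}
        chunks.append(Chunk(f"{url}#{i}", url, para, links, words & KEYWORDS))
    return chunks


def build_edges(chunks):
    """Link chunks automatically via hyperlinks and shared keywords."""
    edges = set()
    for a in chunks:
        for b in chunks:
            if a.id == b.id:
                continue
            # Directed edge: chunk `a` links to the page chunk `b` came from.
            if b.source_url in a.links_to:
                edges.add((a.id, b.id, "hyperlink"))
            # Undirected keyword edge: record each pair only once.
            if a.id < b.id:
                for kw in sorted(a.keywords & b.keywords):
                    edges.add((a.id, b.id, f"keyword:{kw}"))
    return edges


DOCS = {
    "https://example.com/a": (
        "Vector search finds similar chunks.\n\n"
        "See https://example.com/b for graph traversal."
    ),
    "https://example.com/b": "Walking the graph adds related context for the LLM.",
}

chunks = [c for url, text in DOCS.items() for c in chunk_document(url, text)]
for edge in sorted(build_edges(chunks)):
    print(edge)
```

Because each `Chunk` keeps its full `text`, changing the linking rules later only means rerunning `build_edges`, not re-extracting content from the source documents.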
Why This Matters
This new method is less work because you don’t need to be an expert to build the graph. It also leverages different technology: starting with vector search lets you retrieve semantically similar chunks without first reducing the content to entities, and walking the graph from those chunks then returns related, relevant information. The approach is also scalable because the backing technology is built for real-time indexing and retrieval, meaning it can handle large amounts of information. When you need to find answers, this graph combines the benefits of traditional knowledge graphs and modern search techniques, retrieving relevant content quickly and leading to better, more accurate results.
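The two-step retrieval described above (vector search first, then a graph walk) can be sketched as follows. The sketch makes two loudly stated assumptions: a toy bag-of-words `embed` function stands in for a real embedding model, and plain in-memory dictionaries stand in for the vector store and graph that a production system would use. The chunk texts, edges, and function names are hypothetical.

```python
import math
from collections import Counter, deque

# Hypothetical chunks and edges, e.g. as produced by hyperlink or keyword linking.
CHUNKS = {
    "c1": "Vector search finds chunks similar to the question.",
    "c2": "Graph traversal follows links between chunks.",
    "c3": "Related chunks give the LLM complete context.",
    "c4": "Paper maps came in folding atlases.",
}
EDGES = {"c1": ["c2"], "c2": ["c3"], "c3": [], "c4": []}


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(w.lower().strip(".,;:?()") for w in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())  # missing words count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, k=1, depth=2):
    """Vector search for k starting chunks, then walk up to `depth` hops."""
    q = embed(question)
    ranked = sorted(CHUNKS, key=lambda cid: cosine(q, embed(CHUNKS[cid])), reverse=True)
    found = ranked[:k]
    # Breadth-first walk from the starting chunks along the graph's edges.
    queue = deque((cid, 0) for cid in found)
    while queue:
        cid, hops = queue.popleft()
        if hops == depth:
            continue
        for neighbor in EDGES.get(cid, []):
            if neighbor not in found:
                found.append(neighbor)
                queue.append((neighbor, hops + 1))
    return [CHUNKS[cid] for cid in found]


for text in retrieve("How does vector search find similar chunks?"):
    print(text)
```

Here, vector search alone picks only `c1`. The walk then adds `c2` and `c3`, which rank low on similarity to the question but are linked context. The unrelated `c4` is never returned.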
Generative AI is changing the way we think about organizing and retrieving information. Content-centric knowledge graphs make it easier to handle large datasets, reduce the need for expert input, and ensure that no information is lost in the process. As this technology continues to evolve, it promises to make searching for and using information for generative AI more intuitive and effective than ever before.