Different AI Approaches for Different Data Types

By Brett Hurt, CEO and Co-founder at data.world

Solutions Review’s Contributed Content Series is a collection of contributed articles written by thought leaders in enterprise tech. In this feature, data.world CEO and Co-founder Brett Hurt offers commentary on the different AI approaches for different data types.

AI has the potential to revolutionize how we manage and interpret data. However, different types of data (structured, unstructured, and semi-structured) pose unique challenges when scaling AI programs. Not all data is created equal, so here’s how to approach AI for each type.

Unstructured Data: Untameable Data

Unstructured data lacks a predefined format or structure, making it one of the most challenging types of data to manage. It comes in formats like audio files, videos, and emails that are inherently complex and cannot easily be queried with traditional query languages like SQL.

To derive meaning from an audio file, one must listen to it and interpret the content. To make sense of an email, one must read it. Machines cannot simply query this type of data to extract insights directly.

Unstructured Data & AI

AI technologies, particularly those utilizing Natural Language Processing (NLP) and machine learning, play a vital role in handling unstructured data. AI can transcribe speech from audio files, identify key phrases, and analyze sentiment in text. Additionally, AI-powered image recognition and video analysis can automatically categorize and tag visual content.
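As a rough illustration of the kind of output such analysis produces, the sketch below turns a free-form email into a small structured record with a sentiment label and a few key terms. It is a deliberately naive stand-in, not how production NLP works: the word lists, the stopword set, and the `analyze_email` function are all hypothetical, and real systems rely on trained models rather than fixed lexicons.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny lexicons; real systems learn these signals from training data.
POSITIVE = {"great", "thanks", "happy", "resolved", "excellent"}
NEGATIVE = {"broken", "delay", "angry", "refund", "failed"}
STOPWORDS = {"the", "a", "an", "and", "is", "to", "my", "i", "was", "it",
             "of", "for", "on", "with", "this", "again"}


def analyze_email(body: str, top_n: int = 3) -> dict:
    """Turn free-form email text into a small structured record."""
    words = re.findall(r"[a-z']+", body.lower())
    # Net sentiment score: count of positive words minus count of negative words.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    # "Key terms" here are simply the most frequent non-stopwords.
    content = [w for w in words if w not in STOPWORDS]
    key_terms = [w for w, _ in Counter(content).most_common(top_n)]
    return {"sentiment": sentiment, "score": score, "key_terms": key_terms}


record = analyze_email("The app failed again. The app keeps crashing and I want a refund.")
```

Exact word matching misses inflections ("delayed" does not match "delay"), negation, and sarcasm, which is why practical systems use trained models, and the training data those models need.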

Despite its potential, using AI with unstructured data presents significant challenges. The main hurdles are the sheer diversity of unstructured formats and the need for extensive training data to teach AI models to interpret and process this information accurately.

Structured Data: The Organized Domain

Structured data is highly organized and easily searchable, thanks to its rigid schemas. It is arranged in rows and columns, nodes and edges, or objects and properties, which makes it efficiently queryable with languages like SQL (or, for graphs, query languages such as SPARQL and Cypher). Relational and graph databases are typical examples, storing entries in consistent, standardized formats.
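To make the contrast with unstructured data concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table and its rows are hypothetical; the point is that because every row shares one schema, a single declarative SQL query answers an aggregate question directly.

```python
import sqlite3

# In-memory database with a hypothetical orders table; the schema fixes each column's name and type.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.5), ("north", 45.5), ("west", 200.0)],
)

# Because the structure is known in advance, aggregation is one query rather than manual interpretation.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total FROM orders GROUP BY region ORDER BY total DESC"
).fetchall()
conn.close()
```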

Structured Data & AI

AI significantly boosts the efficiency and speed of processing structured data. It can automate complex queries, forecast trends based on historical data, and detect anomalies in real time to prevent fraud. AI systems can also improve database performance by learning query patterns and dynamically adjusting resource allocation.
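A minimal sketch of the anomaly-detection idea, assuming a simple statistical baseline rather than a trained model: it learns the mean and standard deviation of historical values and flags incoming values more than `threshold` standard deviations from the mean. The function name, the default threshold of 3, and the sample amounts are illustrative assumptions; production fraud detection typically combines many features with learned models.

```python
from statistics import mean, stdev


def flag_anomalies(history: list[float], incoming: list[float], threshold: float = 3.0) -> list[float]:
    """Return incoming values whose z-score against the historical baseline exceeds the threshold."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; requires at least two historical points
    if sigma == 0:
        # A perfectly constant history: any different value counts as anomalous.
        return [x for x in incoming if x != mu]
    return [x for x in incoming if abs(x - mu) / sigma > threshold]


# Hypothetical transaction amounts: the baseline has mean 50 and sample standard deviation 2.
history = [50, 52, 48, 51, 49, 50, 53, 47]
suspicious = flag_anomalies(history, [51, 49, 500, 44])
```

Scoring new values against a baseline learned from history, rather than against the batch they arrive in, keeps a single large outlier from inflating the standard deviation it is measured against.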

To leverage AI effectively with structured data, it is crucial to give it an understanding of the metadata that makes up the database schema. The optimal schema is an ontology, which combines the data’s structure with its logical meaning. That combination is what makes an ontology the most effective way for AI to interact with structured data.

There are a few ways to teach AI an ontology. Fine-tuning models is one approach, but using Retrieval Augmented Generation (RAG) against a knowledge graph of the ontology is often more effective. This is a key focus area for our AI Context Engine.
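The retrieval step of RAG over an ontology can be sketched roughly as follows. This is not how data.world's AI Context Engine works; it is a toy under stated assumptions: the ontology is a hypothetical list of (subject, predicate, object) triples linking schema elements to business concepts, and relevance is plain word overlap standing in for the embedding or graph-traversal search a real system would use. The sketch stops at assembling the prompt; sending it to an LLM is left out.

```python
import re

# Hypothetical ontology for an orders database, stored as (subject, predicate, object) triples.
# It links physical schema (tables, columns) to business meaning (concepts, definitions).
TRIPLES = [
    ("orders.amount", "represents", "Revenue"),
    ("Revenue", "definition", "Total order value in USD before tax"),
    ("orders.region", "represents", "Sales Region"),
    ("Sales Region", "definition", "Geographic territory assigned to a sales team"),
    ("customers.id", "represents", "Customer"),
    ("orders.customer_id", "references", "customers.id"),
]


def tokens(text: str) -> set[str]:
    """Lowercase alphabetic tokens; splits identifiers like orders.customer_id into parts."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, triples=TRIPLES, k: int = 3) -> list[tuple[str, str, str]]:
    """Rank triples by word overlap with the question (a stand-in for vector or graph search)."""
    q = tokens(question)
    scored = [(len(q & tokens(" ".join(t))), t) for t in triples]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [t for score, t in ranked[:k] if score > 0]


def build_prompt(question: str) -> str:
    """Ground the question in retrieved ontology facts before it would be sent to a model."""
    context = "\n".join(f"{s} {p} {o}" for s, p, o in retrieve(question))
    return f"Use this ontology context to write SQL.\n{context}\nQuestion: {question}"


prompt = build_prompt("What is total revenue by region?")
```

The intent of this design is that the model sees business definitions (such as "before tax") next to the column names they describe, so it can map a word like "revenue" to `orders.amount` without having been fine-tuned on the schema.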

Semi-Structured Data: The Middle Ground

Semi-structured data, such as XML and JSON documents, lacks a strict schema but contains tags or keys that provide some level of hierarchy and organization. While this type of data is machine-readable, it often lacks a schema definition that clearly explains what the fields represent. As a result, semi-structured data is harder for machines to process than fully structured data.

Semi-Structured Data & AI

Many of the strategies used for structured data are also applicable to semi-structured data. The key is to add or infer the missing structure that is only partially present in semi-structured data.
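One way to infer the missing structure is to scan a sample of documents and record, for every key path, which value types appear and whether the key is always present. The sketch below assumes JSON input; `flatten` and `infer_schema` are hypothetical helpers, arrays are treated as leaf values, and the result describes structure only, so the meaning of each field would still need to be documented (for example, in an ontology, as with structured data).

```python
import json
from collections import defaultdict


def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested JSON objects into dotted key paths, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    out = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, path + "."))
        else:
            out[path] = value  # scalars and lists are kept as leaf values
    return out


def infer_schema(records: list[dict]) -> dict:
    """Infer each field's observed types and whether it appears in every record."""
    types = defaultdict(set)
    counts = defaultdict(int)
    for record in records:
        for path, value in flatten(record).items():
            types[path].add(type(value).__name__)
            counts[path] += 1
    return {
        path: {"types": sorted(types[path]), "required": counts[path] == len(records)}
        for path in sorted(types)
    }


# Two hypothetical documents: "tier" is optional and "total" is sometimes an int, sometimes a float.
docs = [json.loads(s) for s in (
    '{"id": 1, "customer": {"name": "Ada", "tier": "gold"}, "total": 99.5}',
    '{"id": 2, "customer": {"name": "Lin"}, "total": 12}',
)]
schema = infer_schema(docs)
```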

The Future of AI, For All Data Types

Looking ahead, AI is poised to become increasingly pivotal in managing various data types. The development of AI models capable of seamlessly handling unstructured, structured, and semi-structured data is particularly exciting. These advancements will allow businesses to derive insights more quickly and accurately, regardless of the data type.

However, generative AI has also caused some disillusionment, and even AI experts criticize its lack of true understanding. Because Large Language Models (LLMs) can be unreliable, AI will not render knowledge and data obsolete; instead, it will broaden access to data for a wider audience. As we better grasp the strengths and limitations of AI and our data systems, we will come to expect more knowledgeable and intelligent access to data.


This article was written by Brett Hurt on May 20, 2024

Brett Hurt

CEO and Co-founder

Brett Hurt is Co-Founder and Chief Executive Officer of data.world. He joined Austin Ventures in 2012 and focuses on early-stage software investing. He holds an MBA in High-Tech Entrepreneurship from the Wharton School at the University of Pennsylvania and a BBA in Management Information Systems from the University of Texas at Austin.
