Unleashing the Power of Generative AI for Data and Analytics – Part 2

- by John Santaferraro, Expert in Data Analytics & BI

To make the best decision when selecting an AI-enabled business intelligence vendor, look for one thing. Look for metadata!

The Crucial Role of Metadata for Generative AI 

In the world of data management, metadata has already been established as the de facto way to govern the use of data and guide users to accurate insight for business decisions. Metadata is data about the data and often includes information like data definitions, ownership, lineage, quality, validation, and usage. More advanced uses include data certification, profiling, automation, and recommendations. The most advanced use cases include the automatic generation of metadata. Metadata can be applied to any type of data asset. Think of metadata as a dictionary or glossary used to ensure that the content being developed stays true to the intended use of the data.

Just as enterprise metadata governance and management are fundamental to enabling safe, scalable, and compliant analytics solutions, they are equally foundational for the use of AI and automation. Given access to metadata, generative AI can govern the use of data and ultimately produce more accurate insights for users who are often unaware of the importance of governance. For example, instead of just returning an insight to a user, it can provide an explanation of that insight or warnings about potential inaccuracies. Or, instead of waiting for users to ask where the insight came from, the explanation can describe the source data used to produce it, including data lineage information such as where the data came from, how it was transformed, and to what degree it was validated or certified.
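To make the idea concrete, here is a minimal Python sketch of how an analytics tool might use metadata to attach lineage details and governance warnings to an insight before it reaches the user. The `DatasetMetadata` record, its fields, and the `explain_insight` function are hypothetical illustrations, not any vendor's schema or API, and the 0.9 quality threshold is an arbitrary assumption.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record for one dataset (not a real vendor schema)."""
    name: str
    owner: str
    source: str  # where the data came from
    transformations: list = field(default_factory=list)  # how the data was transformed
    certified: bool = False  # whether a data steward has certified the dataset
    quality_score: float = 1.0  # share of rows passing validation, 0.0 to 1.0


def explain_insight(insight: str, meta: DatasetMetadata, min_quality: float = 0.9) -> dict:
    """Wrap an insight with lineage details and governance warnings drawn from metadata."""
    warnings = []
    if not meta.certified:
        warnings.append(f"Dataset '{meta.name}' is not certified.")
    if meta.quality_score < min_quality:
        warnings.append(
            f"Only {meta.quality_score:.0%} of rows in '{meta.name}' passed validation."
        )
    return {
        "insight": insight,
        "lineage": {
            "source": meta.source,
            "transformations": meta.transformations,
            "owner": meta.owner,
        },
        "warnings": warnings,
    }


if __name__ == "__main__":
    # Illustrative values only.
    sales = DatasetMetadata(
        name="q3_sales",
        owner="finance-data-team",
        source="ERP orders table",
        transformations=["deduplicated orders", "converted to USD"],
        certified=False,
        quality_score=0.85,
    )
    result = explain_insight("Revenue rose in Q3 compared with Q2.", sales)
    for warning in result["warnings"]:
        print(warning)
```

Because the warnings are derived from recorded metadata rather than generated by the language model, the explanation reflects what the governance layer actually knows about the data.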

The Crucial Role of Semantics for Generative AI

One of the biggest challenges in working with generative AI is the difficulty of getting accurate results that are specific to the business. For this reason, data and analytics vendors that support a semantic layer, or a plain-language description of the business, will greatly increase the output of reliable insight. The accuracy of the business model and the specificity of business process descriptions both translate into a competitive advantage in the use of AI.

In order to produce accurate results, both general AI and generative AI require a logical representation of the business. A business model or semantic layer places guardrails on the use of AI and the resulting insight. While it is important to model the business at a high level, modeling at a finer granularity widens the impact of AI across the entire organization. It is especially important to understand business process interactions and dependencies, as well as the potential value associated with each process. Defining roles and responsibilities within the organization and mapping them to business processes extends the impact of AI even further. Role-based AI enables the fine-tuning of automation based on how people interact with data and technology.
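A semantic layer can act as a guardrail by letting an AI assistant use only business terms that have governed definitions, and only for the roles permitted to use them. The Python sketch below assumes a hypothetical in-memory mapping (`SEMANTIC_LAYER`) and a hypothetical `resolve_metric` function; real semantic layers are far richer, and the metric definitions shown are placeholders.

```python
# Hypothetical semantic layer: governed business terms mapped to definitions and permitted roles.
SEMANTIC_LAYER = {
    "revenue": {
        "sql": "SUM(order_total)",
        "table": "orders",
        "roles": {"finance", "executive"},
    },
    "active customers": {
        "sql": "COUNT(DISTINCT customer_id)",
        "table": "orders",
        "roles": {"finance", "sales", "executive"},
    },
}


class GuardrailViolation(Exception):
    """Raised when a generated request falls outside the governed semantic layer."""


def resolve_metric(term: str, role: str, layer: dict = SEMANTIC_LAYER) -> str:
    """Translate a business term into its governed query, or refuse."""
    entry = layer.get(term.strip().lower())
    if entry is None:
        # The AI may not invent its own definition of an ungoverned term.
        raise GuardrailViolation(f"'{term}' is not defined in the semantic layer")
    if role not in entry["roles"]:
        # Role-based guardrail: the mapping of roles to metrics limits who sees what.
        raise GuardrailViolation(f"role '{role}' may not query '{term}'")
    return f"SELECT {entry['sql']} FROM {entry['table']}"


if __name__ == "__main__":
    print(resolve_metric("Revenue", "finance"))
```

Keeping definitions in the layer, rather than letting the model write its own formula for "revenue", is what ties AI output back to the business model.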

The Crucial Role of Feedback Loops for Generative AI

Because generative AI is young and immature at this point, it makes sense to build checkpoints to ensure that decisions are being made with accurate insight. Leading vendors are already establishing both technical and human feedback loops in their products to continually improve accuracy.

From a technical standpoint, some vendors have limited generative AI's access to the metadata and the semantic layer. Generative AI can deliver data as part of a response, but it cannot alter the data. Since these vendors have already created a highly governed environment for the use of data, this separation reduces the risk of hallucination. Other vendors have implemented a final validation step that feeds the results from the generative AI engine back into the analytics engine, so the results the end user sees come directly from the analytics engine.
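Both safeguards can be sketched together in Python with the standard-library `sqlite3` module. The model only proposes a query, the analytics engine runs it on a connection where SQLite's `query_only` pragma blocks writes, and the number returned to the user comes from the engine rather than from the model's draft. `hypothetical_llm` is a stand-in for a generative AI call, and the `orders` table and its values are invented for illustration; this is not how any particular vendor implements the pattern.

```python
import math
import sqlite3


def hypothetical_llm(question: str) -> dict:
    """Stand-in for a generative AI engine (hypothetical): proposes SQL and a draft figure."""
    return {
        "sql": "SELECT SUM(amount) FROM orders WHERE region = 'EMEA'",
        "claimed_value": 999.0,  # the model's own, possibly hallucinated, number
    }


def answer(question: str, conn: sqlite3.Connection) -> dict:
    """Run the model's proposed query in the analytics engine and return the engine's result."""
    draft = hypothetical_llm(question)
    conn.execute("PRAGMA query_only = ON")  # any INSERT/UPDATE/DELETE now fails
    (engine_value,) = conn.execute(draft["sql"]).fetchone()
    return {
        "value": engine_value,  # what the end user sees comes from the engine
        "sql": draft["sql"],
        "draft_matches_engine": math.isclose(draft["claimed_value"], engine_value),
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("EMEA", 100.0), ("EMEA", 250.0), ("APAC", 75.0)],
    )
    conn.commit()
    result = answer("What were total EMEA sales?", conn)
    print(result["value"], result["draft_matches_engine"])  # prints: 350.0 False
```

The mismatch flag shows where a hallucinated figure would have reached the user had the model's draft been shown directly.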

Along with technical feedback, leading data management vendors are using a combination of human and technical feedback loops to increase generative AI accuracy and reliability. Reinforcement Learning from Human Feedback (RLHF) is a methodology for optimizing language models so that their outputs better match what human experts judge to be good results. In simple terms, the language model behind generative AI has already been trained to a certain extent. Human experts then rank its outputs, and those rankings are used to create a reward model. The reward model then guides reinforcement learning, retraining the language model to produce better results.
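The reward-model step can be illustrated with a small Python sketch. It turns each expert ranking into pairwise preferences and fits a linear reward model by gradient descent on the pairwise logistic (Bradley-Terry) loss, -log(sigmoid(r_better - r_worse)), a loss commonly used for RLHF reward models. Production reward models are neural networks that score the text itself; the two hand-made features per response here (hypothetical "cites a source" and "flagged errors" scores) are assumptions chosen only to keep the example readable, and the reinforcement learning step that then uses the reward model (for example, PPO) is not shown.

```python
import math
from itertools import combinations


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def reward(w: list, x: list) -> float:
    """Linear reward model: dot product of weights and response features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def ranking_to_pairs(ranking: list) -> list:
    """Turn one expert ranking (best first) into (preferred, rejected) pairs."""
    return list(combinations(ranking, 2))


def train_reward_model(features: dict, rankings: list, lr: float = 0.1, epochs: int = 200) -> list:
    """Fit weights by SGD on the pairwise loss -log(sigmoid(r_better - r_worse))."""
    dim = len(next(iter(features.values())))
    w = [0.0] * dim
    pairs = [pair for ranking in rankings for pair in ranking_to_pairs(ranking)]
    for _ in range(epochs):
        for better, worse in pairs:
            xb, xw = features[better], features[worse]
            margin = reward(w, xb) - reward(w, xw)
            # Gradient of the loss is -(1 - sigmoid(margin)) * (xb - xw), so step the other way.
            scale = 1.0 - sigmoid(margin)
            w = [wi + lr * scale * (b - c) for wi, b, c in zip(w, xb, xw)]
    return w


# Hypothetical features per response: [cites a source, number of flagged errors].
FEATURES = {"A": [1.0, 0.0], "B": [1.0, 1.0], "C": [0.0, 2.0]}
# One hypothetical expert ranking, best first.
RANKINGS = [["A", "B", "C"]]

if __name__ == "__main__":
    weights = train_reward_model(FEATURES, RANKINGS)
    for name, x in FEATURES.items():
        print(name, round(reward(weights, x), 3))
```

After training, the learned reward preserves the experts' ordering, which is the signal the reinforcement learning stage would then optimize against.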

The Crucial Role of General AI for Generative AI

The greatest potential for generative AI in data management does not come from tacking generative AI onto existing technology. The point is as obvious as a glance at where the internet is today compared with 30 years ago, when people were simply taking their existing content and posting it online. Deep value is achieved when generative AI is combined with general AI to provide a common infrastructure for every AI-enabled capability.

Having sat through dozens of generative AI demos, I find it instantly obvious when someone has simply bolted ChatGPT or some other generative AI platform onto their existing software. In almost all of these cases, there is much hype and little value; most of these implementations show what other AI-enabled vendors have been doing since 2019 or earlier.

To realize the value of a unified approach to general AI and generative AI, data management vendors must bring the two worlds together. For example, metadata services vendors like Unifi were already using general AI for automated usage analysis, sensitive data discovery, relationship discovery, data tagging, data validation, automated inventory, automated cataloging, search recommendations, and relevant dataset recommendations. When generative AI has access to the results of these general AI capabilities, it can do far more than simply access data or other non-AI-enabled platform functions.
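As an illustration of generative AI consuming general AI results, the Python sketch below uses a simple regular-expression check as a stand-in for a sensitive data discovery model, tags catalog columns, and builds the context that would be handed to a generative AI prompt, withholding sample values from columns tagged as personal data. The tag names, the `tag_columns` and `build_context` functions, and the sample table are hypothetical; real catalogs rely on trained classifiers and much richer metadata.

```python
import re

# Stand-in for a "general AI" sensitive data detector (hypothetical, regex-based).
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def tag_columns(table: dict) -> dict:
    """Return {column: set_of_tags} for a table given as {column: sample_values}."""
    tags = {}
    for column, values in table.items():
        column_tags = set()
        if values and all(EMAIL.match(str(v)) for v in values):
            column_tags.add("pii:email")
        if values and all(isinstance(v, (int, float)) for v in values):
            column_tags.add("numeric")
        tags[column] = column_tags
    return tags


def build_context(table_name: str, table: dict) -> str:
    """Assemble catalog context for a generative AI prompt, withholding sensitive samples."""
    tags = tag_columns(table)
    lines = [f"Table: {table_name}"]
    for column, column_tags in tags.items():
        if any(t.startswith("pii:") for t in column_tags):
            lines.append(f"- {column}: [withheld, tagged {', '.join(sorted(column_tags))}]")
        else:
            lines.append(f"- {column}: tags={sorted(column_tags)}, sample={table[column][:2]}")
    return "\n".join(lines)


if __name__ == "__main__":
    customers = {
        "email": ["a@x.com", "b@y.org"],
        "spend": [10.0, 20.5],
        "segment": ["smb", "enterprise"],
    }
    print(build_context("customers", customers))
```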

For an analysis of what it takes to navigate the complexity of generative AI in the data and analytics space, read the full Ferraro Consulting POV paper, Unleashing the Power of Generative AI for Data and Analytics.

See Part 1.