AI for Data Engineering: The Good, The Bad, and The Ugly

- by Robert Eve, Expert in Data Management

A tour de force among Western genre movies, with its unrivaled cast featuring Clint Eastwood as “the Good,” Lee Van Cleef as “the Bad,” Eli Wallach as “the Ugly,” and Ennio Morricone’s incredible film score, The Good, the Bad, and the Ugly presents a remarkable tale of trials and tribulations as three men journey in search of buried gold.

Ultimately, the Good overcomes these obstacles, ending up with all the gold.  The unfortunate Bad dies at the hands of the Good in the epic gun battle at the movie’s climax. Meanwhile, the Ugly manages to survive, albeit unrewarded.

Although the movie is set some 160 years ago during the American Civil War, its three central characters map neatly onto the good, the bad, and the ugly that organizations face on their journey to AI gold.

Applying the Good, Bad, and Ugly Lens

This tripartite model is especially effective as a lens on AI and data engineering.

  • The Good: Ways to apply AI effectively to data engineering workloads, along with its capabilities, advantages, and benefits.

  • The Bad: What can go wrong, and the negative consequences of AI for data engineering work and workers.

  • The Ugly: Data engineering work that AI cannot yet accomplish and thus must continue to be done “the old-fashioned way.”

I recently hosted an Insight Jam session exploring the Good, the Bad, and the Ugly of AI for data engineering with fellow Insight Jam expert Philip Russom, Keebo CEO and Co-founder Barzan Mozafari, and Tredence VP of Engineering Arnab Sen.

Check it out on YouTube.  Here are my three key takeaways.

Opportunities for AI-driven Data Engineering

Data engineering is a critical business function. Without sound data engineering, organizations are challenged by:

  • Underperforming queries that frustrate users.

  • Poorly optimized warehouses and pipelines that increase costs.

  • Data and metadata anomalies that impact models and lead to incorrect decisions and actions.

  • Inaccurate or incomplete migrations that damage valuable data assets.

  • And more.

As in many areas of data and analytics, AI is revolutionizing data engineering tools and methods. This is an opportunity not to be missed.

Key Takeaway: Look for AI-driven data engineering tools and methods to address your most crucial data and analytics issues.  

Observe and Act vs. Observe and Report

One of the best ways to apply AI is to use large language models to scan massive quantities of data and suggest actions. Here are three examples:

  • Observe a poorly running query and alert the data engineering team to examine it.

  • Observe a spike in a warehouse workload and alert the data engineering team to expand the warehouse size.

  • Observe a “From” to “To” (source-to-target) column mismatch during an on-premises-to-warehouse migration and report these anomalies to the data engineering team for resolution.
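
To make the third example concrete, here is a minimal Python sketch of the "observe" step for a migration check. It is only an illustration under stated assumptions: the Column type, the find_column_mismatches function, and the sample columns are all hypothetical, and the comparison is a plain rule-based diff rather than an LLM. A real tool would read the two schemas from the source system and the warehouse catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical description of one column: its name and data type."""
    name: str
    dtype: str


def find_column_mismatches(source, target):
    """Compare "From" (source) and "To" (target) columns by name.

    Returns a list of human-readable anomalies for the team to review.
    """
    src = {c.name: c.dtype for c in source}
    tgt = {c.name: c.dtype for c in target}
    anomalies = []
    # Columns that did not arrive in the target.
    for name in sorted(src.keys() - tgt.keys()):
        anomalies.append(f"missing in target: {name}")
    # Columns that exist only in the target.
    for name in sorted(tgt.keys() - src.keys()):
        anomalies.append(f"unexpected in target: {name}")
    # Columns present on both sides whose type changed.
    for name in sorted(src.keys() & tgt.keys()):
        if src[name] != tgt[name]:
            anomalies.append(f"type changed: {name} ({src[name]} -> {tgt[name]})")
    return anomalies


# Made-up sample schemas, for illustration only.
source_cols = [Column("order_id", "int"), Column("amount", "decimal"), Column("region", "varchar")]
target_cols = [Column("order_id", "int"), Column("amount", "float")]

report = find_column_mismatches(source_cols, target_cols)
for line in report:
    print(line)
```

With the sample schemas above, the report flags the dropped region column and the changed type of amount. In an "Observe and Report" setup, this list is simply handed to the data engineering team.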

But why stop with a list of AI-driven suggestions? Rather than just reporting issues when observed, AI-driven data engineering tools can also automatically resolve problems at that moment. For example, they can automatically increase or reduce data warehouse sizes to match changing workloads.
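
The difference between the two modes can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the size ladder, the queue-length thresholds, the Warehouse type, and the resize_in_memory callback (a stand-in for whatever resize call a given warehouse platform actually offers). The sizing rule is a simple threshold, not a trained model.

```python
from dataclasses import dataclass

SIZES = ["XS", "S", "M", "L", "XL"]  # hypothetical size ladder


@dataclass
class Warehouse:
    name: str
    size: str


def recommend_size(current, queued_queries, scale_up_at=10, scale_down_at=1):
    """Suggest one step up or down the ladder based on queued queries."""
    i = SIZES.index(current)
    if queued_queries >= scale_up_at and i < len(SIZES) - 1:
        return SIZES[i + 1]
    if queued_queries <= scale_down_at and i > 0:
        return SIZES[i - 1]
    return current


def observe_and_report(wh, queued_queries):
    """Observe and Report: return an alert for a human, change nothing."""
    target = recommend_size(wh.size, queued_queries)
    if target != wh.size:
        return f"ALERT: consider resizing {wh.name} from {wh.size} to {target}"
    return None


def observe_and_act(wh, queued_queries, resize):
    """Observe and Act: apply the change immediately via the resize callback."""
    target = recommend_size(wh.size, queued_queries)
    if target != wh.size:
        resize(wh, target)
    return wh.size


def resize_in_memory(wh, new_size):
    """Hypothetical stand-in for a real platform's resize operation."""
    wh.size = new_size


warehouse = Warehouse("analytics", "M")
alert = observe_and_report(warehouse, queued_queries=25)
size_after_report = warehouse.size  # reporting alone leaves the size unchanged
size_after_act = observe_and_act(warehouse, queued_queries=25, resize=resize_in_memory)
print(alert)
print(size_after_report, "->", size_after_act)
```

Both functions share the same observation logic. The only difference is whether the recommendation ends up in someone's inbox or is applied on the spot, which is where the time savings come from.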

“Observe and Act” tools let the data engineering team catch more problems sooner and respond to them faster, often in real time. Another benefit is less work for already overloaded data engineering teams, freeing them for other high-value activities such as building new data pipelines.

Key Takeaway: Don’t settle for “Observe and Report” AI-driven data engineering tools and methods. Instead, look for “Observe and Act” solutions. 

AI-driven Data Engineering Is Not (Yet) a Panacea

It is still early in the transition from traditional data engineering tools and methods to AI-driven ones, but this evolution is moving fast. Organizations cannot afford to miss this opportunity, so it’s essential to get started.

Trust is also a concern. While current AI-driven tools and methods cannot yet provide 100% reliable results for 100% of the use cases, an 80/20 approach, applying AI where it already works well and handling the rest “the old-fashioned way,” can provide significant business value today. Delivering tangible results is a great way to build trust.

Key Takeaway: While you will undoubtedly face the good, the bad, and the ugly on your journey to AI-driven data engineering, take the trip anyway. You won’t end up with the gold if you don’t.