From Prompt Design to Data Engineering: Why Architecture Matters for Agentic AI

Dremio’s Alex Merced comments on prompt design and data engineering, and on why architecture matters for agentic AI. This article originally appeared in Insight Jam, an enterprise IT community that enables human conversation on AI.

Agentic AI is having a moment. Teams everywhere are wiring up agents that plan tasks, call tools, and act autonomously. Demos look smooth, and product pitches sound sharp. But most of these systems sit on a fragile base. At their core, agents depend on data. If that data is slow, scattered, or stale, the system breaks, no matter how good the model or prompt may be.

An agent isn’t just a chatbot with plugins. It’s a system that sets a goal, breaks it into steps, and makes decisions based on the current state of data. For that to work, three things must happen:

  1. The data must feel unified, even if it lives in different places.
  2. It must follow rules for access, quality, and updates.
  3. It must be ready for low-latency queries at scale.

This is where modern data architectures come in, specifically lakehouses and open table formats like Apache Iceberg. These platforms already solve problems that agents inherit: schema drift, access control, and real-time query demands. That’s why a strong data foundation, not just prompt design, determines how well an agent performs.

Agents Need Engineering, Not Just Prompts (and How Lakehouses Can Help)

Many teams start with the wrong focus. They tune prompts, set up vector search, and wire agents to APIs. It works until real users arrive. Then, messy records, slow queries, and broken schemas start to surface.

If you’ve worked in data engineering, this will sound familiar. The same thing happened with early data lakes. Teams dumped files into cloud storage, but nobody trusted them. It took table formats, metadata catalogs, and reliable engines to bring structure and trust. Iceberg did that by turning object storage into something queryable and safe. Now, it’s time to bring those same patterns to Agentic AI.

After all, agents need context to act. That context lives in tables, documents, emails, and logs. A support agent might combine customer history, product specs, and policy rules in real time. A planning agent might track inventory, supplier delays, and costs. In every case, the system only works if it can pull the right data at the right time.

Dashboards can tolerate delay, but agents can’t. If your agent answers with last week’s prices or an outdated policy, it won’t just be wrong; it might cause harm. That means data must be fresh, structured, and queryable on demand. Agents push harder on things like streaming ingestion, small-batch writes, and fast metadata access.

Beyond Structured Data and Why Building Trust is Critical

Agents also pull from PDFs, chat transcripts, and web pages. Once those are embedded or extracted into fields, they live alongside structured tables. The lakehouse must treat both as first-class data. This means tracking metadata, audit trails, and schema evolution across both. Iceberg makes that possible by bringing ACID guarantees and time travel to the lakehouse, which is essential when an agent’s decision needs to be reproducible.

But agents don’t just read data; they often write it too. A planning agent might adjust forecasts, and a support agent might update ticket statuses. That makes data governance essential. The platform must enforce rules every time the agent reads or writes. If it doesn’t, the system may drift, create risk, or leak sensitive information.

Apache Iceberg helps by enforcing atomic writes, schema checks, and versioned changes. Each write creates a snapshot, and every snapshot is tracked. If an agent makes a mistake, a data engineer can roll back. If something fails downstream, the engineer can pinpoint the exact data state that caused it. This isn’t just helpful, it’s required for safe AI.
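The snapshot-and-rollback pattern described above can be sketched in a few lines. This is a conceptual model in plain Python, not the Iceberg API: every write commits an immutable numbered snapshot, so a bad agent write can be undone by restoring a known-good state.

```python
# Illustrative sketch (not the Iceberg API): every write commits an
# immutable, numbered snapshot, so any earlier state can be restored.
from copy import deepcopy

class SnapshotTable:
    def __init__(self):
        self._snapshots = [{}]           # snapshot 0: empty table

    @property
    def current(self):
        return self._snapshots[-1]

    def write(self, updates):
        """Atomic write: commit a new snapshot, or change nothing at all."""
        new_state = deepcopy(self.current)
        new_state.update(updates)        # schema checks would run here
        self._snapshots.append(new_state)
        return len(self._snapshots) - 1  # the new snapshot id

    def rollback(self, snapshot_id):
        """Restore a previous state by committing it as a new snapshot."""
        self._snapshots.append(deepcopy(self._snapshots[snapshot_id]))

t = SnapshotTable()
good = t.write({"forecast_q3": 1200})
t.write({"forecast_q3": -999})           # a bad agent write lands
t.rollback(good)                         # engineer restores the good state
print(t.current["forecast_q3"])          # 1200
```

Note that rollback itself commits a new snapshot rather than deleting history, so even the recovery is auditable, which mirrors how versioned table formats behave.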

Governance also means controlling what each agent can do. Some agents should read but never write and some should write but only to staging tables. Iceberg supports branching so changes can land in a safe space before they go live. A human can inspect and merge them later. This structure reduces fear and increases trust across the team.
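The staging-branch workflow can be modeled the same way. The sketch below is illustrative Python (the class and method names are invented for this example, not Iceberg's branching API): agent writes land on a branch, and main only changes when a reviewer merges.

```python
# Conceptual write-to-branch-then-merge flow. Names are illustrative,
# not Iceberg's actual branching API.
from copy import deepcopy

class BranchedTable:
    def __init__(self, rows=None):
        self.branches = {"main": rows or []}

    def create_branch(self, name, source="main"):
        self.branches[name] = deepcopy(self.branches[source])

    def append(self, branch, row):
        self.branches[branch].append(row)    # agent writes land on a branch

    def merge(self, branch, into="main"):
        """A human inspects the branch, then promotes it."""
        self.branches[into] = deepcopy(self.branches[branch])

table = BranchedTable(rows=[{"ticket": 1, "status": "open"}])
table.create_branch("agent-staging")
table.append("agent-staging", {"ticket": 2, "status": "closed"})
assert len(table.branches["main"]) == 1      # main untouched until review
table.merge("agent-staging")
assert len(table.branches["main"]) == 2      # reviewer promoted the change
```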

Handling Mixed Workloads

Agents don’t run one at a time. In practice, dozens might run in parallel, some reading fresh data, others replaying old snapshots for audits or testing. That puts pressure on the data platform to support mixed workloads. Columnar formats, hidden partitioning, and metadata pruning help keep queries fast and cheap. But it takes planning.

Apache Iceberg supports features like partition evolution and automatic compaction. These help control file sprawl and keep performance consistent. As small updates land, sometimes thousands a day, Iceberg reorganizes them behind the scenes. It also manages metadata to avoid bloated catalogs or stale snapshots.
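The effect of compaction is easy to see with a toy model. The function below, a simplified sketch rather than Iceberg's actual rewrite procedure, bins many small "files" (sizes in MB) into fewer files near a target size, which is the core idea behind controlling file sprawl from frequent small writes.

```python
# Toy compaction: pack many small files into fewer files near a target
# size. A simplified model of what automatic compaction achieves.
def compact(file_sizes, target=128):
    compacted, current = [], 0
    for size in sorted(file_sizes):
        if current + size > target and current > 0:
            compacted.append(current)    # close the current output file
            current = 0
        current += size
    if current:
        compacted.append(current)
    return compacted

small_files = [4, 8, 2, 16, 4, 100, 30, 6]   # MB, e.g. streaming micro-batches
print(compact(small_files))                   # [70, 100]: 8 files -> 2
```

Fewer, larger files mean fewer metadata entries to track and fewer objects to open per query, which is why compaction keeps both planning and scan times stable as small updates accumulate.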

A well-tuned system feels calm, even under load. Without these controls, query performance can spiral as the number of small files grows, hurting both humans and agents.

Tracking the Past, Replaying Decisions, and What It All Means for Data Engineers

Many agent tasks depend on event history. What changed? Who did what? What data was visible at the time? Iceberg’s time travel lets you answer these questions without extra plumbing. You can query a snapshot from the last hour or last month and get a clean, isolated view of the system state.
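An "as of" lookup is essentially a search over the commit history. The sketch below models it in plain Python (illustrative only; real engines expose this through time-travel query syntax rather than code like this): given a timestamp, find the snapshot that was current at that moment and query it in isolation.

```python
# Sketch of time travel: find the snapshot that was current "as of" a
# given timestamp. Illustrative model, not the Iceberg API.
import bisect

snapshots = [  # (commit_time, table_state), kept in commit order
    (100, {"price": 9.99}),
    (200, {"price": 10.49}),
    (300, {"price": 11.00}),
]

def as_of(ts):
    times = [t for t, _ in snapshots]
    i = bisect.bisect_right(times, ts) - 1   # last commit at or before ts
    if i < 0:
        raise ValueError("no snapshot existed at that time")
    return snapshots[i][1]

assert as_of(250)["price"] == 10.49   # exactly what an agent saw at t=250
assert as_of(300)["price"] == 11.00
```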

This matters for trust. It helps with auditing, debugging, and learning. If an agent took the wrong step, engineers can reconstruct exactly what it saw. If a new model performs better, teams can compare past runs against the same data. This reproducibility is what makes AI manageable at scale.

Some teams worry that stronger data platforms slow them down. The opposite is true. Structure speeds you up. When the lakehouse handles freshness, schemas, and metadata, engineers can focus on what matters: building better prompts, choosing the right tools, and setting up useful agents.

No one wants to spend hours cleaning up nulls, patching broken joins, or rewriting pipelines. When the platform does its job, that work disappears. It also removes surprises. Tables stay consistent, fields behave, and pipelines keep running. Your agents get the context they need without human babysitting.

Why This Matters Now

Agentic AI might feel new, but the problems it exposes are not. Data engineers have already solved many of them. They’ve seen what happens when platforms grow without guardrails. They’ve seen the value of clear contracts, open formats, and shared governance.

That experience now applies to agents, so there is no need to invent a new stack. The warehouse taught us the value of clean schemas, the lakehouse showed us how to combine flexibility with control, and Iceberg taught us how to treat object storage like a database. These lessons now carry into the world of autonomous systems.

The takeaway is simple: prompt design gets you a demo. Data architecture gets you a product.
