The Great Data Escape: AI, Local-First, and the Cloud Exodus

Brian Pontarelli, CEO of FusionAuth, provides in-depth commentary on AI agents, local-first computing, repatriation, and more. This article originally appeared in Insight Jam, an enterprise IT community that enables human conversation on AI.
Data is at the heart of a business’s ability to drive value, meet regulatory requirements, control spending, operate efficiently, and do just about anything else it needs to grow revenue and serve customers. Yet, despite its importance, businesses are losing control of this critical asset.
As companies have increasingly moved to the cloud, they’ve handed over vast amounts of critical data to Software as a Service (SaaS) and cloud providers, often without an easy way to get it back. It is surprisingly hard to quantify what this data is worth to any given organization, or even how it is used. But to understand its importance, all we need to do is look at where the data is going, who owns it, and how that is changing.
Today, three trends threaten the cloud’s data moat, which has quickly built up over the last decade: AI agents, local-first computing, and repatriation. Companies are realizing that to innovate, cut costs, and ensure compliance, they need to regain control of their data.
The Data Moat of the Cloud
The emergence of SaaS solutions has been great for collaboration and convenience but terrible for data ownership. You can use an airline loyalty program to build points and get perks, but try getting your travel data out of their system to do something different with it on your end? Good luck.
The same applies to technical teams using SaaS tools to get their jobs done. You can outsource authentication to a SaaS-only provider to log users into your app, so you don’t have to build authentication yourself. That’s great, but getting your data back is not always so easy. The Thales Group estimates that 60 percent of corporate data is stored in the cloud, double the share in 2015. But once it’s there, can you get it back?
There’s some nuance here. It depends on which kind of ‘cloud’ we are talking about. In the Infrastructure as a Service (IaaS) model, your data is stored in the cloud, and you have a dedicated instance or otherwise ‘own’ that data; it’s yours. Getting your data back is less of a problem here (though it is still a problem; see the section on repatriation).
Once we start getting into the Platform as a Service (PaaS) model and pure-play SaaS, things get trickier. You might be able to get your data back, and maybe you can pull the metrics and logs from whatever application you’ve built on a PaaS like OpenShift. But you’ve lost control over how ‘health’ is measured for your application, or you depend on third-party tools for that data. At the end of the day, you are dependent on the tool you have chosen and how its vendor has decided to prioritize data ownership.
SaaS is the extreme case: you put the data in, and the cloud keeps it, either because of vendor rules (like Cognito’s refusal to export password hashes) or data gravity (you simply have too much data to move effectively). Yes, you can generally export your data from a SaaS solution … but can you? Maybe you can get your data out of your Identity and Access Management (IAM) tool, but most certainly not out of the hyperscalers. When was the last time you were able to export every email you’ve ever written from Gmail? It’s not easy.
AI Agents Encroach on SaaS and PaaS
With ChatGPT, it was always a huge drawback to give away search and conversation data to OpenAI’s servers in exchange for computing power and convenience. One of the biggest areas of growth and promise in AI is the AI agent, which is a far cry from the original ChatGPT SaaS model.
The value of the AI agent is predicated on direct access to data. For many AI agents, the learning and the data are local, focused on solving a local problem: tuning solar panels and wind turbines to maximize efficiency, reducing equipment failures and leaks in chemical processing, or managing robots on an assembly line to optimize for quality. And when the agent works locally, there are no worries about where the data you’re feeding into the LLM is going.
ChatGPT is not an AI agent because it is reactive by definition: it responds to user questions and requests based on patterns learned from massive datasets, and only ever in response to user input, rather than learning or acting on its own. And there is the small detail that it can only interact with you while you’re connected to the internet and inside the app.
With an AI agent, complex problems specific to the situation can be solved, such as a robotic hand learning new motions and tricks on its own; in the chatbot case, an agentic AI could fill in information gaps on its own after assessing how its tools and resources align with the task. Have you ever wished you could give ChatGPT access to your personal files, ask it a question about your taxes, your travel schedule, or anything else you’ve asked before, and have it come back with an informed response? A local AI agent, theoretically, could help with these challenges.
But where does the processing power for an AI agent come from? Doesn’t that require the cloud? The recent releases of DeepSeek’s models from China and Llama, Meta’s LLM, demonstrate how easily AI models can be downloaded and run exclusively on your local machine.
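For a concrete sense of what ‘local’ means here, the sketch below queries a model served on your own machine. It assumes a local model runner such as Ollama is listening on localhost:11434 with a model like llama3 already pulled; under those assumptions, the prompt never leaves the machine.

```python
# Minimal sketch: query a locally running LLM (e.g., Llama served by Ollama)
# so that no prompt or file content leaves your machine.
# Assumes Ollama is installed, running on localhost:11434, and "llama3" is pulled.
import json
import urllib.request

def ask_local_model(prompt: str, model: str = "llama3") -> str:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # The question and any local context you include stay on your machine.
    print(ask_local_model("Summarize the key deadlines in my tax notes."))
```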
AI Agents Lead Into Local-First
Imagine an AI tool designed to organize your emails automatically with smart labels and filters. If your emails live on Gmail or another SaaS platform, that AI tool must rely on integrations provided by the cloud service, limiting its functionality. If the data were local, though, the AI agent could access your email directly, without the layer and burden in between or any of the coaxing of the bigger cloud operators.
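As a rough illustration of that difference, here is a minimal sketch that labels messages in a locally stored mailbox using only the Python standard library, with no provider integration in the way. The file path and keyword rules are illustrative assumptions; a local LLM could stand in for the rule table.

```python
# Minimal sketch: label messages in a local mailbox with no cloud API involved.
# Assumes mail has been exported or synced to a local mbox file (path is hypothetical).
import mailbox
import os

LABEL_RULES = {
    "invoice": "finance",    # illustrative keyword -> label rules;
    "itinerary": "travel",   # a local LLM could replace this lookup
    "meeting": "calendar",
}

def label_message(subject: str) -> str:
    subject = subject.lower()
    for keyword, label in LABEL_RULES.items():
        if keyword in subject:
            return label
    return "other"

path = os.path.expanduser("~/mail/archive.mbox")  # hypothetical local path
for message in mailbox.mbox(path):
    subject = message.get("Subject", "")
    print(f"{label_message(subject):8s}  {subject}")
```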
To be valuable, AI agents must have access to your specific data, whether it lives locally or in the cloud. This is why some agentic AI start-ups, like Vella, are also pushing for a local-first world.
The Local-First Movement Wants its Data Back
Local-first is retro. Twenty years ago, Excel was downloaded onto your laptop; you used it without a network connection, and that was that. The local-first movement aims to recapture that level of control while preserving real-time collaboration. In local-first applications, a sync system takes the place of a traditional backend: your application code reads and writes data to and from a local database, and the app keeps working, and can be updated, offline.
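A minimal sketch of that shape, not any particular product’s design: writes land in a local SQLite database immediately and are queued in an outbox, and a sync step replays them whenever a connection is available. The table layout and the push_to_server callback are assumptions for illustration.

```python
# Minimal local-first sketch: the app reads/writes a local SQLite database and
# queues changes for later sync. Schema and sync callback are illustrative only.
import json
import sqlite3

db = sqlite3.connect("app.db")
db.execute("CREATE TABLE IF NOT EXISTS notes (id TEXT PRIMARY KEY, body TEXT)")
db.execute(
    "CREATE TABLE IF NOT EXISTS outbox "
    "(seq INTEGER PRIMARY KEY AUTOINCREMENT, change TEXT)"
)

def save_note(note_id: str, body: str) -> None:
    # Write locally first: the app stays fully usable offline.
    db.execute("INSERT OR REPLACE INTO notes VALUES (?, ?)", (note_id, body))
    db.execute(
        "INSERT INTO outbox (change) VALUES (?)",
        (json.dumps({"id": note_id, "body": body}),),
    )
    db.commit()

def sync(push_to_server) -> None:
    # When a connection is available, drain the outbox to the sync service.
    pending = db.execute("SELECT seq, change FROM outbox ORDER BY seq").fetchall()
    for seq, change in pending:
        push_to_server(json.loads(change))  # e.g., an HTTP call to a sync endpoint
        db.execute("DELETE FROM outbox WHERE seq = ?", (seq,))
    db.commit()
```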
Local-first gives you the performance, privacy, and ownership benefits of a local app, along with the collaboration benefits of a SaaS app. GitHub, for example, would be truly local-first if it could operate on assets beyond code, such as images and other files, and if it synced in real time.
For local-first to work, there has to be a way to sync data between local applications once they reconnect to a network. Academia has come up with, and continues to improve on, technology that does just this: Conflict-free Replicated Data Types (CRDTs).
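To see why CRDTs matter, here is a tiny sketch of a last-writer-wins register, one of the simplest CRDT designs: each replica keeps a timestamped value, and merging replicas in any order converges to the same state. Production local-first stacks lean on richer CRDT libraries (such as Automerge or Yjs), but the convergence property is the same.

```python
# Minimal sketch of a last-writer-wins (LWW) register CRDT.
# Real local-first stacks use richer CRDTs (lists, maps, rich text), but the
# key property is identical: merging in any order yields the same state.
import time
from dataclasses import dataclass

@dataclass
class LWWRegister:
    value: str = ""
    timestamp: float = 0.0
    replica_id: str = ""

    def set(self, value: str, replica_id: str) -> None:
        self.value, self.timestamp, self.replica_id = value, time.time(), replica_id

    def merge(self, other: "LWWRegister") -> None:
        # The later write wins; the replica id breaks ties deterministically.
        if (other.timestamp, other.replica_id) > (self.timestamp, self.replica_id):
            self.value, self.timestamp, self.replica_id = (
                other.value, other.timestamp, other.replica_id
            )

# Two replicas edit offline, then sync in either order and converge.
laptop, phone = LWWRegister(), LWWRegister()
laptop.set("draft v1", "laptop")
phone.set("draft v2", "phone")
laptop.merge(phone)
phone.merge(laptop)
assert laptop.value == phone.value
```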
It is telling that many of the apps engineers use to develop software still revolve around a primary copy of the code on your local filesystem that is not subordinate to any remote server. Git is a common example, but there are also integrated development environments (like VS Code), build and automation tools (like Jenkins), and runtime environments (like Jupyter). When your data is local, you can control it and work more quickly, without the risk of sync issues with a remote server.
In local-first applications, you own your data. With a SaaS app, when you need to export your data, you rely on the cloud service. Imagine you’ve started a business, and you want to download all of your emails, sort them by various dates and subjects, and export them into a CRM system to understand who might be a potential customer from within your network. Doing that with today’s email systems is nearly impossible because you don’t own your data and are limited by the download functionality and rules of the email provider.
Local-first applications are everything that SaaS applications are not, so long as the syncing issue can be resolved. For many, the reward is worth the work of resolving the outstanding technical challenges to making local-first a broader reality.
How Far Are We On Making Local-First a Reality?
Today, local-first development ranges from real-world implementations that handle certain kinds of data to pure-play prototypes.
‘Almost’ there: Linear’s sync engine
Linear is a project planning and issue tracking tool. The founders made a name for themselves by building a real-time sync engine that handles persistent local data, offline changes, and queued in-flight changes. It doesn’t use a lot of resources because the data is local; only data changes require network requests.
GitHub: very close
GitHub allows developers to manage their code, requests to change that code, and updates to it. GitHub is built on Git, and the Git functionality is truly local-first. However, key functionality, such as pull requests, is not available without a network connection.
Authentication for devs: still needs the network
Developer-focused authentication tools you can download, like FusionAuth, support local development and local testing, even though in production the authentication process requires network calls to the authentication server to be useful. If you had a local web app that wasn’t hosted on the internet, you could still use an authentication server like FusionAuth alongside it.
Prototypes
Ink&Switch, an independent research lab, has built multiple prototypes of local-first apps, from collaborative drawing tools to project management boards. Each prototype uses CRDTs and local-first development. They found that conflicting updates were rare, that the resulting apps were fast with a great user experience, and that CRDTs worked well in practice.
Repatriation – Going After IaaS
So far, we’ve seen AI agents and local-first attacking PaaS and SaaS’s stranglehold on your data. But what about IaaS?
Even IaaS, which is the easiest model to retrieve data from, is not safe from the move to regain data control. In 2020, only 43 percent of CIOs planned to repatriate some of their cloud workloads to on-premises infrastructure. That number jumped to 83 percent in 2024, according to the Barclays CIO Survey.
At first glance, it might appear that the desire to have data locally is a side-effect rather than the primary reason, and that the real driver is the impact of cloud costs on overall margins. Andreessen Horowitz famously stated, “You’re crazy if you don’t start in the cloud; you’re crazy if you stay on it,” when they published an estimate in 2022 of the cost savings full repatriation could unlock, using public data to infer the resulting change in share price for the 50 top public software companies. Their estimate came to roughly $100B of market value lost to reduced margins because of reliance on cloud infrastructure.
But looking deeper at individual cases of repatriation, we see that one of the primary reasons is the added value you can drive from your data when you have it locally. When commenting on GEICO’s massive repatriation efforts, the company’s VP of Infrastructure, Rebecca Weekly, said, “If you spread your data and your methodology across so many different vendors, you are going to spend a lot of time re-collecting that data to actually serve customers.”
She also commented that compliance was more difficult in the cloud: producing or analyzing information at any given time was harder, which limited speed and raised the cost of their compliance efforts.
Regions with greater concern over data privacy, like the EU, have been slower to migrate to the cloud, but we may start seeing an increase in repatriation in these geographies too. A recent study by Citrix found that 25 percent of UK organizations have already repatriated 50 percent or more of their cloud data back on-premises.
The Future of Data Ownership
The erosion of the cloud’s data moat is more than just a technical shift—it’s a rebalancing of power in the digital economy. For years, businesses have relied on SaaS and cloud providers to manage their critical data, often at the cost of control, compliance, and operational flexibility. But the rise of AI agents, local-first architectures, and cloud repatriation is changing that equation. Companies are realizing that true innovation, cost efficiency, and regulatory agility depend on owning and managing their own data rather than being locked into external platforms.
This shift isn’t just about saving money or improving compliance. It’s about redefining digital competitiveness. In the coming years, the winners won’t be those with the deepest cloud system integrations but those who can harness, secure, and optimize their data on their own terms.