The Unanswered Questions: What We Still Don’t Know About AI in Q4 2025
AppsFlyer’s Inna Weiner offers commentary on the questions about AI that remain unanswered at the end of 2025. This article originally appeared in Insight Jam, an enterprise IT community that enables human conversation on AI.
The AI landscape is full of confident predictions. Beneath the surface, several questions facing developers, product leaders, and enterprise builders remain unsettled. In Q4 2025, enterprise AI is defined by these known unknowns: issues we recognize but do not yet have final answers to.
How Much Autonomy Can We Safely Give Agents?
Most deployments still keep a human in the loop, but pressure is growing to let agents move from recommendation to execution. That raises concrete engineering problems: how do we build guardrails for probabilistic systems that are prone to “pleasing the user”? Examples include budget guardrails that prevent overspend and over‑allocation, context awareness that accounts for holidays, breaking news, or market shocks, and safeguards that stop cascading errors when an agent chains multiple actions. The open problem is how to encode limited trust: scoped permissions, rate and budget limits, and stepwise approvals, as in the sketch below.
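As one illustration of what encoding limited trust could look like, here is a minimal Python sketch. The names (AgentAction, GuardrailPolicy) are assumptions for the example, not from any particular framework; the policy checks scope, chain depth, and budget, and escalates state‑changing actions to a human.

# Minimal sketch of "limited trust" guardrails around an agent action.
# All names here are illustrative, not from a specific framework.
from dataclasses import dataclass

@dataclass
class AgentAction:
    name: str            # e.g. "adjust_campaign_budget"
    scope: str           # permission scope the action requires
    cost: float          # estimated spend in dollars
    changes_state: bool  # does this action execute, or only recommend?

@dataclass
class GuardrailPolicy:
    allowed_scopes: set[str]
    daily_budget: float
    spent_today: float = 0.0
    max_chain_depth: int = 3  # stop cascading multi-step errors

    def check(self, action: AgentAction, chain_depth: int = 0) -> str:
        if action.scope not in self.allowed_scopes:
            return "deny: out-of-scope"
        if chain_depth >= self.max_chain_depth:
            return "deny: action chain too deep"
        if self.spent_today + action.cost > self.daily_budget:
            return "deny: budget exceeded"
        if action.changes_state:
            return "escalate: human approval required"
        self.spent_today += action.cost
        return "allow"

policy = GuardrailPolicy(allowed_scopes={"campaigns:read", "campaigns:write"},
                         daily_budget=500.0)
print(policy.check(AgentAction("adjust_campaign_budget", "campaigns:write",
                               120.0, changes_state=True)))
# -> "escalate: human approval required"

The design choice worth noting is that denial and escalation are the defaults; the agent acts autonomously only inside the envelope the policy explicitly grants.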
How Do We Evaluate Systems That Change Continuously?
AI does not stand still. Models drift, input distributions shift, and business logic evolves. Evaluation must be continuous, not episodic. Teams need offline regression suites that grow with every release, golden sets that reflect real data slices, and canary rollouts that catch regressions before they propagate. In production, online output filters, anomaly detection, and policy checks provide a second layer of defense. In some cases, an LLM judge may score responses, but that judge also needs its own calibration and drift monitoring. The open question is what a repeatable evaluation pipeline looks like when every layer is moving.
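One piece of such a pipeline can be sketched simply: an offline regression gate that scores a candidate model against a golden set before any rollout. In this hypothetical Python sketch, score_response is a stand‑in for whatever metric a team trusts, from exact match to a calibrated LLM judge:

# Sketch of an offline regression gate against a golden set.
# score_response() is a placeholder for the team's chosen metric.
from typing import Callable

GoldenExample = tuple[str, str]  # (input, expected output)

def regression_gate(model: Callable[[str], str],
                    golden_set: list[GoldenExample],
                    score_response: Callable[[str, str], float],
                    threshold: float = 0.9) -> bool:
    """Return True if the candidate passes; run before any canary rollout."""
    scores = [score_response(model(prompt), expected)
              for prompt, expected in golden_set]
    mean_score = sum(scores) / len(scores)
    print(f"mean score {mean_score:.3f} over {len(scores)} golden examples")
    return mean_score >= threshold

# Trivial stand-ins show the shape; real models and scorers plug in here.
golden = [("2+2", "4"), ("capital of France", "Paris")]
passes = regression_gate(
    model=lambda p: {"2+2": "4", "capital of France": "Paris"}.get(p, ""),
    golden_set=golden,
    score_response=lambda got, want: float(got == want))

Rerunning the same gate on a schedule, against golden examples refreshed from real data slices, is what turns episodic evaluation into continuous evaluation.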
How Should Access Controls and Consent Frameworks Work in an Agentic World?
Enterprises have identity and privacy controls that work for humans. Agents must inherit those controls without exception. That means least‑privilege scopes tied to specific actions, time‑bound tokens, and consent policies that differentiate recommendation from execution. Requests that change state should require explicit approval paths and auditable logs. Teams also need defenses against prompt‑level social engineering that attempts to bypass policies. The question is how to formalize these controls so they apply consistently, whether the requester is a user, an API client, or an autonomous agent.
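A minimal sketch of such a control, in hypothetical Python, pairs a short‑lived, scoped grant with a consent level that separates recommendation from execution, and logs every decision:

# Sketch: a least-privilege, time-bound grant that separates
# "recommend" from "execute" consent. Names are illustrative.
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class Grant:
    principal: str        # human user or agent identity
    scope: str            # e.g. "billing:read"
    consent: str          # "recommend" or "execute"
    expires_at: float     # epoch seconds; keep lifetimes short

def authorize(grant: Grant, scope: str, wants_execution: bool,
              audit_log: list[dict]) -> bool:
    ok = (grant.scope == scope
          and time.time() < grant.expires_at
          and (grant.consent == "execute" or not wants_execution))
    # Every request is logged whether or not it is allowed.
    audit_log.append({"principal": grant.principal, "scope": scope,
                      "execute": wants_execution, "allowed": ok,
                      "at": time.time()})
    return ok

log: list[dict] = []
g = Grant("agent:forecaster", "billing:read", "recommend",
          expires_at=time.time() + 300)  # 5-minute token
authorize(g, "billing:read", wants_execution=False, audit_log=log)  # True
authorize(g, "billing:read", wants_execution=True, audit_log=log)   # False

Note that the audit entry is written even when the request is denied; blocked execution attempts are often the most revealing records in the log.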
Are Today’s Privacy Regulations Enough, or Do We Need New Technical Frameworks?
Regimes such as GDPR and CCPA define data use, yet they do not anticipate agents that negotiate and act on a user’s behalf. Developers face gray areas: is inference on customer data equivalent to processing? How is training data ring‑fenced from production use? And where does transparency live so that compliance can be proven? Practical answers include strict separation between training and inference pathways, data minimization in prompts and tools, immutable audit logs, and user‑visible disclosures at the interface and API layers. The unresolved piece is how to standardize these practices so they withstand regulatory scrutiny across jurisdictions.
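As one illustration of data minimization plus auditability, the hypothetical sketch below allow‑lists the fields a prompt may contain and hash‑chains the audit log so tampering is detectable. The field names and the chaining scheme are assumptions for the example:

# Sketch: data minimization before customer data reaches a prompt,
# plus an append-only, hash-chained audit record. Names illustrative.
import hashlib, json, time

ALLOWED_FIELDS = {"country", "plan_tier", "last_event"}  # never email, name, ...

def minimize(record: dict) -> dict:
    """Strip everything not explicitly allow-listed into the prompt."""
    return {k: v for k, v in record.items() if k in ALLOWED_FIELDS}

def audit(entry: dict, log: list[str]) -> None:
    """Chain each entry to the previous digest so edits are detectable."""
    prev = log[-1].split("|", 1)[0] if log else "genesis"
    body = json.dumps(entry, sort_keys=True)
    digest = hashlib.sha256(f"{prev}{body}".encode()).hexdigest()
    log.append(f"{digest}|{body}")

customer = {"email": "a@example.com", "country": "DE", "plan_tier": "pro"}
log: list[str] = []
prompt_fields = minimize(customer)  # {'country': 'DE', 'plan_tier': 'pro'}
audit({"event": "inference", "fields": sorted(prompt_fields),
       "at": time.time()}, log)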
How Do We Establish Agent Identity and Authorization, Especially in Agent‑to‑Agent Systems?
As agents initiate tasks with other agents, identity becomes foundational. Systems need a way to prove which principal an agent represents, what scope it has, and whether a downstream agent should honor the request. Useful building blocks include signed requests with verifiable provenance, audience‑bound tokens, attestation of the execution environment, and transaction limits that prevent “runaway” behaviors. Teams also need traceability across the entire chain: who asked for what, when, with which inputs, and under which policy. The question is which protocol surfaces become the standard for provenance and authorization in A2A networks.
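A minimal sketch of these building blocks, in hypothetical Python, signs a request, binds it to an intended audience, and enforces a transaction cap. HMAC over a shared secret keeps the example self‑contained; a real deployment would more likely use asymmetric keys and environment attestation:

# Sketch: a signed agent-to-agent request with audience binding and a
# transaction limit. All names and the shared secret are illustrative.
import hmac, hashlib, json

SECRET = b"shared-demo-secret"   # demo only; not how keys should be managed
MAX_TRANSACTION = 1000.0         # hard cap against runaway behavior

def sign_request(principal: str, audience: str, action: str,
                 amount: float) -> dict:
    body = {"principal": principal, "audience": audience,
            "action": action, "amount": amount}
    payload = json.dumps(body, sort_keys=True).encode()
    body["sig"] = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return body

def verify_request(req: dict, my_identity: str) -> bool:
    body = {k: v for k, v in req.items() if k != "sig"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, req["sig"])   # provenance
            and req["audience"] == my_identity          # audience-bound
            and req["amount"] <= MAX_TRANSACTION)       # runaway cap

req = sign_request("agent:planner", "agent:executor", "book_inventory", 250.0)
print(verify_request(req, my_identity="agent:executor"))  # True
print(verify_request(req, my_identity="agent:other"))     # False

Because the signature covers the full request body, the log of signed requests doubles as the traceability chain: who asked for what, when, and with which inputs.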
Do We Build Generalist Assistants or Specialized Agents?
The vision of an all‑purpose assistant is attractive. Today, the most reliable progress comes from narrow agents with clear interfaces and measurable outcomes. They can be evaluated in isolation, hardened with domain‑specific guardrails, and versioned independently. Composition then becomes an orchestration problem: idempotent actions, retries, circuit breakers, and failure containment so one agent’s error does not ripple through the workflow. The open question is where the boundary lies between useful specialization and maintainable composition.
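As a sketch of that orchestration layer, the hypothetical circuit breaker below wraps a single specialized agent with retries and fail‑fast behavior so its errors stay contained:

# Sketch: retries plus a circuit breaker around one specialized agent,
# so its failures don't ripple through the workflow. Names illustrative.
import time

class CircuitBreaker:
    def __init__(self, max_failures: int = 3, cooldown: float = 30.0):
        self.max_failures, self.cooldown = max_failures, cooldown
        self.failures, self.opened_at = 0, 0.0

    def call(self, fn, *args, retries: int = 2):
        if self.failures >= self.max_failures:
            if time.time() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: failing fast")
            self.failures = 0          # cooldown elapsed; try again
        for attempt in range(retries + 1):
            try:
                result = fn(*args)
                self.failures = 0      # success resets the breaker
                return result
            except Exception:
                if attempt == retries:
                    self.failures += 1
                    self.opened_at = time.time()
                    raise

breaker = CircuitBreaker()
# breaker.call(pricing_agent.quote, order)  # hypothetical specialized agent

Wrapping each narrow agent this way keeps a pricing failure from stalling fulfillment, which is the containment property that makes composition maintainable.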
Where Does This Leave Us?
None of these questions has a definitive answer today. Progress depends on an experimentation mindset: launch, learn, evaluate, and adapt. The teams that advance fastest will not claim certainty. They will build evaluation pipelines that keep up with drift, enforce policy as code for humans and agents alike, and deliver the observability needed to trust systems that are powerful precisely because they evolve.