
Is Agentic AI Shifting Away from Autonomy?

Last updated on December 26th, 2025 at 09:29 am

As vendors walk back bold claims and enterprise deployments hit architectural limits, the gap between “autonomous agents” and “supervised automation” is becoming impossible to ignore


When Salesforce CEO Marc Benioff launched Agentforce in September 2024, he called it “what AI was meant to be” – autonomous agents that would usher in “the third wave of AI” and enable companies to build “limitless digital workforces.” By October 2025, Benioff was declaring the “agentic revolution” and predicting his company wouldn’t need to hire software engineers in 2026 because AI agents would handle the work.

Then, two weeks ago, something changed.

On December 12, Salesforce said it is moving away from consumption-based pricing – the pay-per-conversation model that assumed autonomous digital labor – back toward traditional seat-based pricing. On December 24, trade publication AdExchanger reported (in a roundup section with the lede "Let the great backpedaling on agentic AI begin!") that Salesforce executives were telling customers Agentforce works best using old-school "if, then" programming rather than generative AI, with one SVP admitting: "We all had more trust in the LLM a year ago."

There is evidence of a fundamental reckoning across the industry – in technology, pricing, and deployments – indicating that the probabilistic engines at the heart of large language models may be architecturally incapable of reliable agency. Vendors are pivoting from autonomous agents to supervised automation while trying to attract as little attention as possible.

The Vision That’s Crumbling

The promise was compelling: AI agents that reason, plan, and execute complex multi-step tasks independently. Not chatbots that answer questions, but digital workers that could qualify sales leads, resolve customer inquiries, coordinate supply chains, and handle administrative tasks without human oversight.

Gartner forecast that agentic AI could drive 30% of enterprise application software revenue by 2035 – over $450 billion, up from 2% in 2025. Yet in June it predicted that 40% of agentic AI projects would be cancelled. The signals in its analysis remain contradictory, but the consolidation it identified is clearly underway: investors poured $17.7 billion into AI-related acquisitions in Q4 alone. Major vendors from Salesforce to Microsoft to Google staked their AI strategies on autonomous agents despite analyst misgivings.

The shift from software-as-a-service to “AI-as-a-service” would fundamentally change how enterprises operate and how software companies make money. Instead of paying for seats, companies would pay for outcomes – conversations handled, leads qualified, tasks completed.

But that vision required something no vendor has successfully delivered: AI systems that can reliably maintain state, handle sequential decision-making, and operate autonomously over extended interactions without degrading into incoherence or requiring constant human intervention.

The Salesforce Retreat

Throughout 2025, Salesforce served as the industry’s most visible test case. Benioff wasn’t just selling Agentforce to customers – he deployed it internally, cutting his support staff from 9,000 to 5,000 and claiming AI agents now handled half of customer conversations.

The messaging was aggressive: “No other company comes close to offering this complete AI solution for enterprises.” Agentforce was “the fastest growing product in Salesforce history.” The company would reach “one billion agents” by end of 2025.

Then came the walkback.

The pricing shift tells the story. Consumption-based pricing ($2 per conversation) assumes autonomous digital labor. Seat-based pricing assumes humans are still in the chair, using agents as productivity tools. When Salesforce walked back to seats on December 12, they were admitting what their internal deployments revealed: these systems require constant human supervision.
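The economics of that shift can be made concrete with a back-of-the-envelope calculation. The $2-per-conversation figure is from the article; the seat price below is an invented illustration, not an actual Salesforce number:

```python
# Hypothetical break-even sketch. PER_CONVERSATION comes from the article;
# SEAT_PER_MONTH is an assumed figure for illustration only.
PER_CONVERSATION = 2.00   # consumption pricing, $ per conversation
SEAT_PER_MONTH = 150.00   # assumed seat price, $ per user per month

# Conversations per seat per month at which the two models cost the same.
break_even = SEAT_PER_MONTH / PER_CONVERSATION
print(break_even)  # 75.0
```

Below that volume, consumption pricing is cheaper for the buyer; above it, seats are. A vendor retreating to seats is, implicitly, betting that autonomous conversation volume will stay low.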

Booking.com’s Conservative Truth

While Salesforce was evangelizing autonomous agents, Booking.com was quietly demonstrating what actually works – and what it reveals is instructive.

In a November case study, Booking.com detailed their "agentic AI chatbot" for handling guest communications. The transparency is unusual for production AI deployments, and the architectural choices are telling: the evaluation infrastructure is more complex than the agent itself – manual annotation rounds, LLM-as-a-judge for continuous evaluation, production monitoring via Arize, in-tool user feedback, and controlled experiments.
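The LLM-as-a-judge piece of that infrastructure can be sketched in a few lines. This is a minimal illustration of the general pattern, not Booking.com's actual implementation; `judge_fn` stands in for a real LLM call, and the rubric and threshold are invented:

```python
from typing import Callable

def evaluate(responses: list[str],
             judge_fn: Callable[[str], float],
             threshold: float = 0.8) -> float:
    """Score each production response with a judge model; return the pass rate."""
    scores = [judge_fn(r) for r in responses]
    return sum(s >= threshold for s in scores) / len(scores)

# Stub judge for illustration: a real deployment would prompt a separate
# LLM with a scoring rubric instead of keyword matching.
stub_judge = lambda r: 1.0 if "refund" in r else 0.5

rate = evaluate(["Your refund has been processed.", "I don't know."], stub_judge)
print(rate)  # 0.5
```

The point is structural: the judge, thresholds, and monitoring sit entirely outside the agent – deterministic evaluation wrapped around a probabilistic core.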

Booking.com’s system works because of what it doesn’t attempt. Their architecture succeeds by avoiding extended context, multimodal processing, and autonomous actions – the very features most AI vendors are marketing.

The Scaffolding Pattern

Agent-R1, a framework for training agentic AI released in November 2025, reveals the industry pattern: increasingly sophisticated external scaffolding to work around fundamental architectural constraints.

The framework introduces a Tool/ToolEnv architecture where:

  • Tool = atomic executor (API calls, code execution, database queries)
  • ToolEnv = state manager that handles how outcomes affect agent state and calculates rewards
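The split above can be sketched as follows. The names and signatures are illustrative only – this is not Agent-R1's actual API, just a minimal rendering of the Tool/ToolEnv division of labor the framework describes:

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Tool:
    """Atomic executor: runs one action (API call, query, code) and returns the raw result."""
    name: str
    run: Callable[[str], Any]

@dataclass
class ToolEnv:
    """State manager: applies tool outcomes to agent state and computes rewards."""
    tools: dict[str, Tool]
    state: dict = field(default_factory=dict)

    def step(self, tool_name: str, arg: str) -> float:
        result = self.tools[tool_name].run(arg)
        self.state[tool_name] = result             # outcome updates agent state
        return 1.0 if result is not None else 0.0  # toy reward signal

# Usage: a lookup tool wrapped in an environment.
lookup = Tool("lookup", run=lambda q: {"paris": "France"}.get(q.lower()))
env = ToolEnv(tools={"lookup": lookup})
reward = env.step("lookup", "Paris")
```

Note that all state and reward logic lives in `ToolEnv` – deterministic Python, not the model.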

Agent-R1 is external state management and reward optimization built on top of existing LLM architectures. What it hasn’t changed: how attention degrades over long contexts, how the model maintains internal coherence, the token-level processing mechanisms.

It’s sophisticated scaffolding – like building an excellent suspension system on a car with a fundamentally limited engine. The results validate this: Agent-R1 trained agents substantially outperformed baselines on multi-hop QA (probably 3-5 retrieval steps maximum). But there’s no demonstration of sustained autonomous operation, and all successful implementations have external orchestrators.

Google’s Deep Research follows the same pattern: elaborate external orchestration for task decomposition and state management. Microsoft’s agent taxonomy catalogs failure modes and proposes solutions: “identity management, memory hardening, control flow regulation” – more scaffolding.

The pattern across every major lab is identical: build increasingly complex external systems to manage what the probabilistic core cannot reliably maintain itself. The looming reality looks a lot more like layered systems, orchestration, guardrails, and scoped agents.

The Architecture That Isn’t There

Here’s what none of these systems have: mathematical frameworks predicting where they’ll fail, architectural explanations of why failure occurs at specific thresholds, or recognition that the probabilistic inference engine fundamentally cannot maintain the state required for agency.

Academic research catalogs the symptoms – “hallucination, brittleness, emergent behavior, coordination failure” – and proposes more scaffolding: ReAct loops, RAG architectures, orchestration layers, causal modeling. Industry reports document empirical failures: 0% success on administrative work, 8.3% on financial analysis, 30.4% on software development with clear validation criteria.

But nowhere in the published research is there a framework that explains the fundamental constraint: probabilistic next-token prediction architectures lack the primitives necessary for maintaining state across extended agentic interactions.

Earlier this year PatternPulse.AI (disclosure: author Jen Evans is a partner in both it and b2bnn.com) published Evans’ Law, a mathematical framework that predicts coherence collapse thresholds based on model parameters (L ≈ 1969.8 × M^0.74). The research introduces the “Evans ratio” – the percentage of agentic work actually performed by probabilistic inference versus deterministic orchestration code.
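The formula itself is easy to explore numerically. The article does not specify the units of M (e.g. parameters in billions), so the inputs below are purely illustrative; what matters is the shape of the curve:

```python
# Evans' Law as stated in the article: L ≈ 1969.8 * M**0.74.
def coherence_limit(m: float) -> float:
    """Predicted coherence-collapse threshold L for model scale M (units unspecified)."""
    return 1969.8 * m ** 0.74

# The exponent below 1 means the threshold grows sublinearly:
# a 10x larger model does not buy a 10x longer coherent horizon.
for m in (1, 10, 100):
    print(m, round(coherence_limit(m)))
```

Whatever the units, the sublinear exponent is the substantive claim: scaling the model up yields diminishing returns on how long it stays coherent.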

In current “agentic AI” systems, the Evans ratio is approximately 0-30%. The deterministic scaffolding does all the actual agentic work – state management, multi-step planning, tool coordination. The LLM generates text when prompted, and sometimes analyzes data.
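A toy way to picture the Evans ratio: label each step in an agent's execution trace by which layer actually performed it, then take the LLM's share. The labels and trace here are invented for illustration:

```python
def evans_ratio(steps: list[str]) -> float:
    """Fraction of agentic steps performed by probabilistic inference ("llm")
    rather than deterministic orchestration code ("code")."""
    llm_steps = sum(1 for s in steps if s == "llm")
    return llm_steps / len(steps)

# A trace matching the article's description: scaffolding handles planning,
# state management, and tool coordination; the LLM generates text on demand.
trace = ["code", "code", "llm", "code", "code",
         "code", "code", "llm", "code", "code"]
print(evans_ratio(trace))  # 0.2 – inside the 0-30% range the article cites
```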

This explains why Salesforce is reverting to deterministic programming, why Booking.com built such conservative architecture, and why Agent-R1 needs external state management: the probabilistic core cannot reliably handle agency at production scale.

What Enterprise Buyers Are Actually Getting

The gap between marketing and reality creates serious risks for enterprise deployments – risks that are now manifesting:

Vendors are selling “autonomous agents” while delivering supervised automation tools. The difference matters enormously for ROI calculations, staffing decisions, and infrastructure requirements.

Pricing models reveal the truth. Consumption pricing assumes autonomous digital labor. Seat-based pricing assumes human supervision. When vendors walk back to seats, they’re admitting the agents can’t work independently enough to justify paying for output.

Scaffolding has maintenance costs. External orchestration systems require constant updating, monitoring, and troubleshooting – costs not included in the original “autonomous agent” pitch.

The Gartner forecast assumes agents work. That $450 billion projection for 2035 is built on assumptions of autonomous digital labor. If the industry is quietly pivoting to supervised automation, the entire revenue model changes.

The Questions Buyers Should Ask

When evaluating “agentic AI” solutions, enterprise buyers should demand specifics:

What does the deterministic code do versus the LLM? If vendors can’t clearly articulate this split, they don’t understand their own system’s limitations.

What’s the actual autonomy level? “Autonomous agents” that require human review of every action aren’t autonomous.

What’s the scaffolding maintenance burden? External orchestration, memory management, and monitoring systems require dedicated resources.

What happens at extended interactions? Most demos show 3-5 step interactions. What happens at 50 steps? 500?

What’s the actual success rate in production? “70% user satisfaction” compared to manual processes doesn’t mean the AI is succeeding 70% of the time – it might mean it’s marginally better than a terrible baseline.


For enterprise AI leaders navigating these decisions, the key insight is simple: evaluate solutions based on what the deterministic code does, not what the probabilistic engine promises. The scaffolding is doing the heavy lifting. Make sure you understand – and budget for – what that actually means.

Jennifer Evans – https://www.b2bnn.com
principal, @patternpulseai. author, THE CEO GUIDE TO INDUSTRY AI. former chair @technationCA, founder @b2bnewsnetwork #basicincome activist. Machine learning since 2009.