Tuesday, June 9, 2026

The Day of AI Agents Arrived. Did It Result in Anything Meaningful Except Tokens?

Last updated on May 12th, 2026 at 11:28 am

Medallia handed the keys to its lenders in April. The SaaS index lost $285 billion in February. The next dozen names on the refinance wall tell the same story.

By Jen Evans, founder Pattern Pulse AI, publisher B2BNN, co-founder Tech Reset Canada

On April 22, 2026, Thoma Bravo handed Medallia over to its lenders. Blackstone, KKR, Apollo, and Antares now control a company Thoma Bravo bought for $6.4 billion in 2021. Roughly $5.1 billion in equity is gone.

Medallia is not isolated. It is the most visible mark on a wider trade going wrong: the bet that agentic AI would transform legacy seat-based SaaS fast enough to outrun a peak-vintage debt schedule. That bet is being repriced in two venues at once. The private-market side ran through Medallia in April. The public-market side ran through the SaaSpocalypse in February, when the SaaS index lost $285 billion in market cap over 48 hours. Both rest on the same misread of where agents actually work and where they do not.

How Medallia actually broke

The proximate mechanic is straightforward. Peak-vintage 2021 LBO, $6.4 billion price, financed with cheap debt and a Payment-in-Kind toggle that let the company defer cash interest by piling it onto principal. The PIK relief expired at the end of 2025. By early 2026, annual debt service had climbed to roughly $300 million against $200 million in earnings. When Blackstone, holding $1.5 billion of the debt, declined to extend the PIK window, no refinancing path existed. Median revenue multiples for mature SaaS platforms had dropped from 9x in 2021 to roughly 6x in 2026. Thoma Bravo could not refinance at any level that preserved equity. Lenders took the keys.
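The coverage arithmetic in that paragraph can be checked in a few lines. The sketch below is a simplification. It treats debt service as a single annual number and models the PIK toggle as interest capitalized into principal once a year. The $1,000 million principal and 10 percent rate in the second example are hypothetical, chosen only to show how deferred interest compounds. They are not Medallia’s actual terms.

```python
def interest_coverage(earnings: float, debt_service: float) -> float:
    """Earnings divided by annual debt service; below 1.0 means cash interest cannot be covered."""
    return earnings / debt_service


def pik_principal(principal: float, rate: float, years: int) -> float:
    """Principal after `years` of paying interest in kind (capitalized once a year)."""
    for _ in range(years):
        principal += principal * rate  # deferred interest is added to what is owed
    return principal


# Figures from the article, in USD millions: ~$300M debt service vs ~$200M earnings.
coverage = interest_coverage(earnings=200.0, debt_service=300.0)
print(f"coverage: {coverage:.2f}x")  # coverage: 0.67x

# Hypothetical illustration only: $1,000M on the PIK toggle at 10% for 4 years.
print(f"{pik_principal(1000.0, 0.10, 4):.1f}")  # 1464.1
```

Coverage below 1.0x means earnings cannot meet cash interest. Every PIK year makes the eventual cash bill larger.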

The deeper question is why multiples compressed that hard. Two reasons. Rates rose, which compresses everything. And buyers and lenders started pricing in agentic AI both as a threat to seat-based SaaS revenue and as a transformation that PE was counting on for margin gains that have not arrived. The threat got priced in. The rescue did not.

What PE actually bought

The largest software-focused PE firms hold more than 40 active investments each. Thoma Bravo’s roster includes Coupa, Anaplan, Proofpoint, and the Hornetsecurity acquisition recently stacked on top of Proofpoint’s existing debt. Vista, Silver Lake, Clearlake, Hellman & Friedman, Francisco Partners, and Permira each carry comparable stacks of comparable vintage. Across the category, sponsors have been actively searching for the operating model that makes the agentic transformation work in time. The question is real, and it has recently become urgent inside these firms.

My research over the last eighteen months has documented why the answer they need does not exist yet. Drift signatures across Claude, GPT, and Gemini show the same failure pattern at scale. Evans’ Law states the predictability problem precisely: the longer a model reasons, the greater the likelihood that a response will be incorrect, until the point at which the likelihood that an answer is incorrect exceeds the likelihood that it is correct. This is a structural property of how these systems hold state across reasoning chains. It is not closable through more compute or better prompting. The categories where it fails most dramatically are exactly the categories where seat-based SaaS lives.
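A deliberately crude toy model shows why a threshold like this has to exist. It is my simplification for illustration, not the formal statement of Evans’ Law. Assume each reasoning step independently goes wrong with a fixed probability p, and that any single error spoils the final answer. The probability of a correct answer after n steps is then (1 - p)^n, which falls below one half once n exceeds ln(0.5) / ln(1 - p).

```python
import math


def p_correct(step_error: float, steps: int) -> float:
    """Toy model: probability that a chain of independent steps contains no error."""
    return (1.0 - step_error) ** steps


def crossover_steps(step_error: float) -> int:
    """Smallest chain length at which an incorrect answer is more likely than a correct one."""
    # (1 - p)^n < 0.5  <=>  n > ln(0.5) / ln(1 - p)
    return math.floor(math.log(0.5) / math.log(1.0 - step_error)) + 1


for p in (0.01, 0.05):
    print(p, crossover_steps(p))  # 0.01 -> 69 steps, 0.05 -> 14 steps
```

Under this toy model, a 1 percent per-step error rate crosses the line at 69 steps and a 5 percent rate crosses it at 14. Real reasoning chains are not independent trials, so the numbers are illustrative only. The shape is the point: the threshold arrives sooner as per-step error rises.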

The production data confirms it

Independent surveys from Anaconda, Forrester, a16z, and the MIT Sloan CIO panel converge on 88 percent of agent pilots never reaching production. Of the 12 percent that do, 22 percent show negative ROI within twelve months. The single most-cited production-readiness barrier, named by 70 percent of leaders, is non-deterministic outputs. That is drift restated in procurement language.
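The funnel compounds. Take the survey figures at face value and run a quick calculation per 1,000 pilots: it shows how few reach production without negative first-year ROI. The sketch assumes the 22 percent applies uniformly to every pilot that reaches production. The "not negative" bucket lumps together positive and unmeasured returns.

```python
def pilot_funnel(pilots: int, production_rate: float, negative_roi_share: float) -> dict[str, float]:
    """Split a cohort of pilots into production deployments with and without negative first-year ROI."""
    in_production = pilots * production_rate
    negative_roi = in_production * negative_roi_share
    return {
        "in_production": in_production,
        "negative_roi": negative_roi,
        "not_negative": in_production - negative_roi,
    }


funnel = pilot_funnel(1_000, production_rate=0.12, negative_roi_share=0.22)
for stage, count in funnel.items():
    print(f"{stage}: {count:.1f}")
# in_production: 120.0
# negative_roi: 26.4
# not_negative: 93.6
```

Roughly 94 of every 1,000 pilots, under 10 percent, reach production without negative first-year ROI. Some of those may simply have returns nobody has measured yet.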

The categories with clean production deployments are narrow. Code migration, where compilers and tests provide deterministic feedback. Customer service with humans in the loop. SDR outbound where the volume absorbs the error rate. Document extraction inside bounded schemas. At the other end, legal and compliance work runs at a 61 percent human-in-the-loop rate with an 11-month median payback, which is why those categories were never going to deliver the operating margin lift PE needed.

The working categories share a structural pattern. Bounded task, short feedback loop, error rate absorbed either by verification (compilers, schemas, human review) or by throughput (volume swallowing the miss rate). Coding agents like Claude Code, Cursor, and Replit Agent run at industrial scale because every step gets checked by a compiler or test suite. Support resolution agents like Fin, Breeze, and Cresta work because tickets are small, success is measurable, and the customer signal closes the loop. SDR outbound, document extraction, transcription, translation, and dynamic pricing engines work for the same reasons. The deployed agent economy is real and large. It lives in the categories where the unit of work is small, scaffolded, and verifiable, which is the inverse of where seat-based legacy SaaS placed its bets.
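The verification pattern those working categories share can be sketched as a loop with three parts. It accepts an agent’s output only when an external, deterministic check passes. It retries a bounded number of times. Otherwise it escalates to a human. The names `run_verified_step`, `toy_agent`, and `is_sorted` are hypothetical. In a coding agent the verifier would be a compiler or test suite. Here it is a trivial sortedness check so the example runs on its own.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_verified_step(
    propose: Callable[[int], T],
    verify: Callable[[T], bool],
    max_attempts: int = 3,
) -> Optional[T]:
    """Accept an agent's output only if an external check passes; retry, then escalate."""
    for attempt in range(max_attempts):
        candidate = propose(attempt)
        if verify(candidate):
            return candidate  # the deterministic check closed the loop
    return None  # no verified result: hand the step to a human reviewer


def toy_agent(attempt: int) -> list[int]:
    # Hypothetical stand-in for an agent: wrong on the first try, right on the second.
    return [3, 1, 2] if attempt == 0 else [1, 2, 3]


def is_sorted(xs: list[int]) -> bool:
    return all(a <= b for a, b in zip(xs, xs[1:]))


print(run_verified_step(toy_agent, is_sorted))         # [1, 2, 3]
print(run_verified_step(lambda _: [2, 1], is_sorted))  # None
```

The agent never has to be reliable on its own. The verifier absorbs its misses, and anything it cannot absorb goes to a person.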

That split produces two distinct failure modes for PE-owned legacy SaaS, both lethal. Where agents work, the seat-pricing model collapses because the work no longer needs seats. Modern AI agents resolve 55 to 70 percent of customer support tickets without a human, and top performers clear 86 percent. That is the Zendesk problem. Support agents do hold coherence on bounded tickets. The revenue model built around seats does not survive the resolution rate. Where agents do not hold coherence (procurement, customer experience, analytics, observability, corporate learning), the rescue thesis fails. State must persist across multiple steps, multi-system handoffs lose context, and the operating margin lift PE underwrote does not arrive. That is the Coupa, Medallia, Alteryx, New Relic, Cornerstone problem.
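The seat arithmetic behind the Zendesk problem is simple if we assume, for illustration, that seat count scales linearly with the ticket volume humans still handle. That linear assumption is mine. Real staffing has fixed overhead and escalation work that does not shrink proportionally. The sketch applies the resolution rates cited above to a hypothetical 100-seat support team:

```python
def seats_remaining(current_seats: int, ai_resolution_rate: float) -> int:
    """Seats still needed if human workload, and so seat count, shrinks linearly with AI resolutions."""
    return round(current_seats * (1.0 - ai_resolution_rate))


for rate in (0.55, 0.70, 0.86):
    print(rate, seats_remaining(100, rate))
```

The team shrinks to roughly 45, 30, and 14 seats. Per-seat revenue shrinks with it, even though the work is still getting done.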

Jason Lemkin’s distress list, published this week, names both buckets in one set: Zendesk, Coupa, Medallia, Alteryx, New Relic, Cornerstone, Proofpoint, and a dozen others. These are the workflows where PE concentrated more than $440 billion of capital between 2015 and 2025 across roughly 1,900 software companies. The convergence is the point. One end of the list dies because agents work and seats no longer make sense. The other end dies because agents do not work reliably enough to deliver the margin gains the debt schedule demanded.

The bet, marked to market

Lemkin’s list reads as an exposure map. Pluralsight handed over by Vista in 2024. Medallia by Thoma Bravo in 2026. Qualtrics next, with JPMorgan-led banks pulling a $5.3 billion debt deal in March 2026 after failing to win over investors. Proofpoint with $4.67 billion in total debt against an interest coverage ratio that only worked at zero rates. Cloud Software Group (Citrix and TIBCO) at $16.5 billion of new leverage, the largest software LBO by debt quantum. HealthEdge bought by Bain at 30x EBITDA financed with $2.6 billion at 8.72 percent interest. Coupa with reported workforce cuts of up to 30 percent and an “agent-first re-architecture” running as a race against debt service.
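The HealthEdge figure translates directly into annual cash interest. The sketch assumes a fixed rate and no amortization or fees. The list gives only Proofpoint’s debt total, so the Proofpoint rates below are hypothetical. They show why a structure that covers its interest at near-zero rates can fail at higher ones.

```python
def annual_cash_interest(debt_musd: float, rate: float) -> float:
    """Simple annual interest in USD millions, assuming a fixed rate and no amortization."""
    return debt_musd * rate


# HealthEdge, per the article: $2.6B at 8.72 percent.
print(f"{annual_cash_interest(2_600, 0.0872):.1f}")  # 226.7

# Proofpoint's $4.67B at two hypothetical rates, to show rate sensitivity.
for rate in (0.03, 0.09):
    print(rate, f"{annual_cash_interest(4_670, rate):.1f}")  # 140.1, then 420.3
```

At 8.72 percent, HealthEdge’s $2.6 billion carries roughly $227 million a year in interest. Proofpoint’s $4.67 billion costs about $140 million a year at 3 percent and about $420 million at 9 percent.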

That phrase, “agent-first re-architecture,” is the same bet wearing a different hat across the portfolio. Cut seats with agents, hold the revenue, rebuild margin fast enough to outrun the debt schedule. The bet only works if agents stay coherent in the categories where seats live. The production data above says they do not. The Medallia handover says lenders are reaching the same conclusion.

About 20 to 25 percent of all private credit is now exposed to software. Roughly one-fifth of that has to refinance by 2028, which works out to about 4 to 5 percent of all private credit. The wall is real, named, and dated.

And in public markets

The PE side is the leveraged half of the same story. The public-market half ran in February 2026, when the SaaS index lost $285 billion in market cap over 48 hours. The financial press named it the SaaSpocalypse. The four canonical seat-priced enterprise software stocks (Salesforce, Adobe, ServiceNow, and Atlassian) shed roughly a third to two-thirds of their market cap over the trailing twelve months. Atlassian alone is down 69 percent. The S&P 500 was roughly flat over the same window.

The deceleration was visible long before February. Median public-SaaS revenue growth has fallen every year since the 2021 peak: 33 percent, then 24, 17, 16, 12.2. Sixteen straight quarters of slowing growth. Atlassian reported its first-ever decline in enterprise seat counts and cut 10 percent of its global workforce to “self-fund AI investments.” Workday cut 8.5 percent. Oracle cut between 10,000 and 30,000 jobs. Q1 2026 alone saw roughly 80,000 tech layoffs, about half explicitly AI-attributed.

The incumbents are repricing themselves in public. Salesforce went headless, making MCP available without a Salesforce seat. ServiceNow says 50 percent of new business revenue is now non-seat consumption. Anthropic restructured Claude Enterprise from a flat $40 to $200 per seat to $20 per seat plus usage. HubSpot moved its Customer Agent to $0.50 per resolved conversation. Intercom’s Fin charges per resolution because resolutions are now the unit of value. SAP announced a wholesale shift to AI consumption pricing. The companies whose entire model was the seat are quietly admitting the seat does not work anymore.

The math behind the migration is unforgiving. A team running 1,000 support tickets a month on Zendesk’s public Suite pricing, with the AI Copilot add-on and per-resolution fees, runs $2,000 to $3,000 a month before a single ticket gets resolved. The same workload at $0.75 per AI resolution with no platform fee costs roughly $462. That is the spread the public market is pricing in.
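Working backwards from the $462 figure shows what it assumes. At $0.75 per resolution with no platform fee, $462 corresponds to 616 AI-resolved tickets, about 62 percent of 1,000. That sits inside the 55 to 70 percent resolution range cited earlier. The reading is my inference from the numbers, not a stated assumption of the original comparison.

```python
def implied_resolutions(monthly_cost: float, price_per_resolution: float) -> float:
    """How many billed resolutions a given monthly spend corresponds to."""
    return monthly_cost / price_per_resolution


def per_resolution_cost(tickets: int, ai_resolution_rate: float, price_per_resolution: float) -> float:
    """Monthly cost when only AI-resolved tickets are billed and there is no platform fee."""
    return tickets * ai_resolution_rate * price_per_resolution


print(implied_resolutions(462, 0.75))                    # 616.0
print(f"{per_resolution_cost(1_000, 0.616, 0.75):.2f}")  # 462.00

# How many times more the seat-and-platform bill is than the per-resolution bill.
for platform_cost in (2_000, 3_000):
    print(f"{platform_cost / 462:.1f}x")                 # 4.3x, then 6.5x
```

On these numbers the seat-and-platform bill runs roughly four to six and a half times the per-resolution bill for the same workload.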

What makes this lethal for PE-owned legacy SaaS specifically is that you cannot bolt a new pricing model onto a 2021-vintage capital structure. Public companies can absorb a transition over multiple quarters because their debt schedules are not pulling forward. PE-owned companies cannot. The same migration that costs Salesforce a third of its market cap costs Coupa, Medallia, and Proofpoint the entire equity stake.

The coherence constraint

The defense in PE pitch decks is that tokens are getting cheaper, so unit economics will catch up. That misreads the constraint. Token cost is the wrapper. Coherence is the load. Token economics determine how long an agent can be allowed to reason. Coherence requires reasoning long enough to maintain state across multi-system, multi-step workflows. Stretch the chain to hold coherence and you cross the Evans’ Law predictability threshold. Constrain the chain to control token spend and you undercook the reasoning. The interval where both economics and coherence work is narrow and shrinking as task complexity grows. This is why the 70 percent non-determinism finding is structural and why no vendor has closed it.
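The squeeze described here can be expressed as a window on chain length. The toy assumptions are mine, not the article’s. First, the number of steps needed to hold coherence grows linearly with task complexity. Second, the token budget caps the number of affordable steps. Third, the predictability threshold comes from a crude independent-error model in which each step fails with probability p and any failure spoils the answer. Under that model, the threshold is the longest chain whose probability of being correct is still at least one half. All parameter values are hypothetical.

```python
import math


def crossover_steps(step_error: float) -> int:
    """Toy model: first chain length at which P(correct) = (1 - p)^n drops below one half."""
    return math.floor(math.log(0.5) / math.log(1.0 - step_error)) + 1


def coherence_window(complexity: int, steps_per_unit: int, token_budget: int,
                     tokens_per_step: int, step_error: float) -> int:
    """Slack between the steps coherence needs and the steps allowed; negative means no chain fits."""
    needed = complexity * steps_per_unit          # floor: steps required to hold state
    affordable = token_budget // tokens_per_step  # ceiling set by token spend
    reliable = crossover_steps(step_error) - 1    # ceiling set by the predictability threshold
    return min(affordable, reliable) - needed


for c in (2, 8, 12):
    print(c, coherence_window(c, steps_per_unit=5, token_budget=100_000,
                              tokens_per_step=2_000, step_error=0.01))
```

Take a 1 percent per-step error rate, 2,000 tokens per step, a 100,000-token budget, and 5 steps per unit of complexity. The window is 40 steps wide at complexity 2 and 10 steps at complexity 8. At complexity 12 it is gone.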

The Coupa pitch deck and the Medallia operating model both rest on the assumption that the interval gets wider over time. The opposite is happening in the categories that matter for seat replacement. Models keep improving inside narrow, well-scaffolded environments. The long-horizon, multi-system, stateful workflows that procurement, support, customer experience, and observability platforms run on remain stubbornly resistant to reliable coherence. That is the gap PE sponsors have been hunting an answer to. The answer is not there to be found on the operational timeline the debt schedule demands.

The Agentic Governance Space

The deployed agent landscape can be plotted on three axes: error rate (how often the system gets it wrong), autonomy (how independently it acts), and consequentiality (how much each action matters). The safe operating envelope is bounded by the inequality E_safe ≤ k/(A × C), where A is autonomy, C is consequentiality, and k is a constant. Tolerable error falls as the product of autonomy and consequentiality rises. Above the surface, a system is acting too independently, on actions that matter too much, with an error rate the combination cannot absorb. Below the surface, the system is operating inside its safe envelope. This is the Agentic Governance Space framework I have been developing over the last six months, and it makes the deployment pattern legible.
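The paragraph describes tolerable error as bounded by a constant k divided by the product of autonomy and consequentiality. Read that way, a point can be tested against the envelope directly. The 0-to-1 scores and the value k = 0.01 below are hypothetical placeholders, not calibrated values from the framework.

```python
def max_tolerable_error(autonomy: float, consequentiality: float, k: float) -> float:
    """Upper bound on error rate: k divided by the product of autonomy and consequentiality."""
    if autonomy <= 0 or consequentiality <= 0:
        raise ValueError("autonomy and consequentiality must be positive")
    return k / (autonomy * consequentiality)


def within_envelope(error_rate: float, autonomy: float, consequentiality: float, k: float) -> bool:
    """True if the point sits on or below the constraint surface."""
    return error_rate <= max_tolerable_error(autonomy, consequentiality, k)


K = 0.01  # hypothetical constant, for illustration only
print(within_envelope(0.02, autonomy=0.9, consequentiality=0.2, k=K))  # True
print(within_envelope(0.02, autonomy=0.9, consequentiality=0.9, k=K))  # False
```

At identical autonomy, the same 2 percent error rate is inside the envelope when actions are low-stakes. It is outside it when they are high-stakes.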

Claude Code /loop is the worked example. Loop is Anthropic’s recurring autonomous agent feature, released in March 2026. It runs scheduled coding tasks for up to 72 hours without human intervention, which is meaningful autonomy. The 72-hour cap is a deliberate design choice, not a technical limit. Anthropic’s stated reasoning is that error accumulation in long-running automated processes compounds, that blast radius grows with run length, and that a hard limit forces a natural review point. The frontier lab with the most successful production agent deployments anywhere caps its own working agent at three days. In the constraint space, Loop sits comfortably below the surface: high autonomy, low error rate because compiler verification closes every step, low-to-medium consequentiality because permissions are scoped and runtime is bounded. The structural conditions keeping it safe are verification scaffolding and time-bounding.
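The time-bounding half of that design can be sketched as a runner that refuses to start any task that would push the run past a hard cap, and stops for review instead. This is a hypothetical illustration of the principle, not Anthropic’s implementation. `BoundedRun` and the task durations are invented, and each task is assumed to be individually verified before it counts as complete.

```python
from dataclasses import dataclass, field


@dataclass
class BoundedRun:
    """Hypothetical runner: completes tasks in order until the next one would exceed a hard time cap."""
    cap_hours: float = 72.0
    elapsed_hours: float = 0.0
    completed: list[str] = field(default_factory=list)

    def run(self, tasks: list[tuple[str, float]]) -> str:
        for name, hours in tasks:
            if self.elapsed_hours + hours > self.cap_hours:
                # Hard cap reached: stop and hand the run to a human instead of pressing on.
                return "stopped for review"
            self.elapsed_hours += hours
            self.completed.append(name)
        return "finished"


session = BoundedRun()
status = session.run([("task-1", 30.0), ("task-2", 30.0), ("task-3", 30.0)])
print(status, session.completed, session.elapsed_hours)
# stopped for review ['task-1', 'task-2'] 60.0
```

The cap does not make any single step more accurate. It limits how many steps can accumulate before a person looks at the result.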

Now apply the same frame to what PE underwrote. The Coupa, Medallia, and Proofpoint roadmaps assume agents operating across procurement systems, CX platforms, and email security at high autonomy with high consequentiality. State must persist across multiple systems. Actions touch finances, customer relationships, regulated data. The category sits structurally above the constraint surface, and there is no equivalent of the compiler closing the loop. The 72-hour cap that protects Loop has no analog in the workflows where the seat-pricing model placed its bets. Plotted against the constraint surface, the working agents cluster below it and the agents PE underwrote sit above it. The interactive visualization makes the gap visible at a glance.

The cluster pattern is the analytical payoff. Almost every named production agent sits in a tight band near the constraint surface. The boundary is where the economics work. Operate too far below the surface and the agent is too constrained to deliver business value. Operate above it and errors compound faster than throughput can absorb. Loop is the only point that combines real autonomy with safe operation, and it does so through engineered scaffolding: compiler verification on every step, scoped permissions, the hard 72-hour cap. OpenClaw shows what the same autonomy class looks like without those conditions. Same height on the autonomy axis, no verification loop, the point breaks the surface. The PE bet is that everything in the cluster can be moved to where Loop sits. That assumes the scaffolding can be replicated in procurement, CX, and observability workflows where compilers and hard time caps have no analog. The visualization shows why that assumption keeps failing.

A successful production deployment widely shared this month described a multi-agent product pipeline running design partner extraction, competitive analysis, codebase mapping, and nightly Claude Code PRs. The author called it “agents running our end-to-end product process.” Read the architecture, not the headline. Every agent operates on bounded tasks with structured outputs. Humans own strategy, prioritization, and every approval gate. Tests are first-class artifacts. The nightly Claude Code window is 2-3 hours, scoped and reviewed in the morning. The system works precisely because it respects the constraint surface, not because it overcomes it. The author says so directly: “This isn’t unfettered autonomy. It works because we’re strict about verification and checkpoints.” Working agentic deployments at this moment all share this structure. None of it maps to what PE underwrote.

The deployed agent economy is consumer extraction

The other application of agentic AI currently deployed at industrial scale outside of coding is surveillance pricing. The discourse mostly looks past it.

Forty-plus state bills across 24 US states have been introduced in 2026, outpacing all of 2025. The House Oversight Committee launched a formal investigation on March 5. The FTC testified in April that staff work on surveillance pricing continues. Consumer Reports ran a field test across 400 volunteers and found 74 percent of Instacart items were offered at multiple price points simultaneously, with some shoppers seeing prices 23 percent higher than others for the same item at the same store. The estimated invisible tax on a family of four was $1,200 a year. The FTC’s January 2025 staff findings confirmed the explicit pitch behind the industry: vendors market these tools as a way to boost revenue and margins by two to five percent through extraction.

Pricing engines need only enough inference to set a number. Coherence is not a binding requirement. That is why this is the agentic application that scales while the others stall. The infrastructure exists, the deployments are live, the revenue is real, and the unit economics work because the task is bounded and the feedback signal (did the customer buy at this price) is immediate.
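The immediacy of that feedback signal is what makes the loop easy to automate. A minimal sketch, assuming a simulated buyer and a small fixed menu of prices, is an epsilon-greedy bandit. It offers a price, observes buy or no-buy at once, and shifts toward whichever price has earned the most per offer. This is a generic dynamic-pricing technique chosen for illustration. It is not personalized and is not any vendor’s system. The buyer model, with willingness to pay uniform between 0 and 10, is hypothetical.

```python
import random
from typing import Callable, Sequence


def epsilon_greedy_price(
    prices: Sequence[float],
    buys: Callable[[float, random.Random], bool],
    rounds: int = 20_000,
    epsilon: float = 0.1,
    seed: int = 0,
) -> float:
    """Learn which fixed price earns the most per offer from immediate buy/no-buy feedback."""
    rng = random.Random(seed)
    offers = {p: 0 for p in prices}
    revenue = {p: 0.0 for p in prices}

    def revenue_per_offer(p: float) -> float:
        return revenue[p] / offers[p] if offers[p] else 0.0

    for _ in range(rounds):
        if rng.random() < epsilon:
            price = rng.choice(list(prices))  # explore: try a random price
        else:
            price = max(prices, key=revenue_per_offer)  # exploit: best price so far
        offers[price] += 1
        if buys(price, rng):  # the feedback arrives on the same step
            revenue[price] += price
    return max(prices, key=revenue_per_offer)


def simulated_buyer(price: float, rng: random.Random) -> bool:
    # Hypothetical buyer: willingness to pay drawn uniformly from 0 to 10.
    return rng.uniform(0, 10) >= price


best = epsilon_greedy_price([3, 5, 8], simulated_buyer)
print(best)
```

Under that buyer model, expected revenue per offer is 2.1 at a price of 3, 2.5 at 5, and 1.6 at 8, so the loop should usually settle on 5. Nothing in it needs to hold state across steps beyond a running tally.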

The Canadian political fight

The political response to this in Canada has already started, and most of the AI policy world is looking the wrong direction. Avi Lewis won the federal NDP leadership on March 29 on the first ballot with 56 percent of the vote, the largest margin in NDP history. Two weeks later, on April 13, he stood with his caucus in Ottawa and announced a motion in Parliament to ban surveillance pricing, calling it “a crystal clear example of why we desperately need government guardrails to protect us from the triple threat of Big Tech, AI and corporate monopolies.” The framing he chose for his first major policy push as leader was the AI consumer extraction layer.

The lineage matters. Manitoba’s NDP government became the first jurisdiction in Canada to introduce a surveillance pricing bill. The Ontario NDP under Marit Stiles followed. UFCW Canada is publicly backing the federal motion. The grocery and retail labour movement is in the same room as the AI policy push, which is unusual and significant.

The refusals matter more. Doug Ford, asked directly whether he would ban surveillance pricing in Ontario, said he would not because he believes in free-market capitalism. The Carney Liberal majority government has not moved on the file. The two governments with the actual authority to act are holding the line for the practice while the only deployed-at-scale extraction infrastructure outside of coding agents continues to run on Canadian consumers.

The shape of the next eighteen months

The original framing of this question was whether the day of AI agents had already come and gone. The Medallia handover answers it more precisely. The day arrived. The bet that PE placed at peak 2021 multiples on agents transforming legacy SaaS is being marked to market through forced restructurings. The day arrived for consumer extraction, where coherence is not a binding constraint. The day did not arrive for the workflows where the money was placed.

The bill on that mismatch is being paid in $50 billion-plus of distressed software debt and $285 billion of market cap that vanished in February. On the PE side, the queue is already forming: after Pluralsight and Medallia come Qualtrics, Proofpoint, HealthEdge, Alteryx, Quest, Cornerstone, and Cloud Software Group. Each handover transfers operational control of mission-critical infrastructure from sponsors to lenders, with all the IP-stripping, workforce cuts, and service degradation the playbook implies. Vista’s drop-down transaction on Pluralsight, moving IP out of lenders’ reach, set the template for how sponsors fight on the way down.

The professional skill of this field is the willingness to look one layer deeper before reacting to the launch. Skip the demos. Read the pilot postmortems. Watch the PIK expirations, the lender markdowns, the syndication pulls, the state bills, the parliamentary motions. The shape of the deployment, and the shape of who is willing to defend it, is visible there before it shows up in the keynote.


Disclosure: Thoma Bravo reached out to the author in the fall of 2025 to discuss her published work on agentic AI deployment.


Jennifer Evans
https://www.b2bnn.com
principal, @patternpulseai and cofounder, techresetcanada. AI policy, research and analysis. #basicincome and anti-poverty activist. Machine learning since 2009.