post from @CharleyGaz on X
Google’s quiet rollout of task-specific AI agents across GitHub, BigQuery, Looker, and database migration tooling has been widely described as a productivity upgrade for developers. That description undersells what is actually happening. What Google is shipping is not merely better automation, nor a collection of clever copilots. It is an architectural shift in how agency itself is implemented in AI systems.
These agents are genuinely agentic — not because the models are suddenly more intelligent, but because the intelligence has been relocated into the workflow, externalized from the model in digestible, manageable bits that can’t go rogue.
At first glance, Google’s agents appear straightforward. One agent reviews GitHub pull requests via the Gemini CLI. Another generates and validates SQL in BigQuery. Another handles schema diffs and database migrations. Each agent performs a bounded role, interacts with tools, and produces concrete outputs such as diffs, queries, or execution logs. The key detail is that none of these agents are expected to maintain long internal reasoning chains or persistent internal memory across extended interactions. The architecture accepts its demonstrable limitations and works within (or technically without) them.
Instead, the workflow does the remembering.
In traditional AI systems, agency is assumed to live inside the model. The system must internally preserve intent, context, and significance across many steps. This is precisely where large language models fail at scale. As conversations grow longer or tasks become more complex, significance degrades. Context fragments. The system either drifts or hallucinates. This failure mode is not incidental; it is structural.
Google’s agent architecture does not attempt to fix this limitation. It sidesteps it entirely by designing around it.
Each agent operates only long enough to complete a single bounded transformation. The agent reads the current state of the world (a repository, a schema, a query plan, a test result), performs an action using tools, and writes the result back into the world as an artifact. That artifact becomes the authoritative state for the next agent or the next step in the workflow.
Meaning is no longer preserved internally. It is serialized. This is not model autonomy, but system-level agency enforced through structure.
In this system, significance does not travel through the model. It travels through files, diffs, schemas, logs, and pull requests. Each agent re-derives significance fresh by re-reading the current state, rather than attempting to remember it. This turns the classic AI problem on its head. Instead of asking a model to remember everything, the system ensures the model never has to.
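That read-act-serialize loop can be sketched in miniature. Everything here is illustrative — the file names, the `run_step` helper, and the toy transforms are assumptions, not Google’s actual implementation. The point is only that each short-lived step re-reads state from disk and writes an artifact back, rather than remembering anything:

```python
import json
import tempfile
from pathlib import Path

def run_step(workspace: Path, step_name: str, transform) -> Path:
    """One bounded agent step: read the current state artifact, apply a
    single transformation, and serialize the result back into the world.
    The written artifact becomes authoritative state for the next step."""
    state_file = workspace / "state.json"
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    new_state = transform(state)                  # the agent's one bounded action
    state_file.write_text(json.dumps(new_state))  # meaning is serialized, not remembered
    # An execution log per step: the "epistemic anchor" for review.
    (workspace / f"{step_name}.log").write_text(f"{step_name}: {sorted(new_state)}")
    return state_file

# Two short-lived "agents": neither carries memory across steps;
# each re-derives significance by re-reading the current state.
ws = Path(tempfile.mkdtemp())
run_step(ws, "propose", lambda s: {**s, "sql": "SELECT 1"})
run_step(ws, "validate", lambda s: {**s, "validated": True})

final = json.loads((ws / "state.json").read_text())
print(final)  # {'sql': 'SELECT 1', 'validated': True}
```

The second step knows nothing about the first except what the artifact tells it — which is the whole design.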
This is why the agency is real.
Agency in this system is not defined by autonomy. It is defined by the ability to perceive state, act upon it, and reliably move a system toward an intended outcome. Google’s agents do exactly that, but the continuity of purpose is enforced externally by the workflow. The workflow is a deterministic scaffold that constrains, checkpoints, and re-grounds each decision. And it works: it is far more reliable than asking a model to carry purpose internally.
Pull requests are not just outputs; they are control surfaces. Database schemas are more than data structures; they are memory. Execution logs go beyond telemetry; they are epistemic anchors. Human developers become the editors, reviewers, and governors of these state transitions instead of the primary executors of work.
This is also why hallucinations appear dramatically reduced in these systems. The agents are not “guessing” answers. They are proposing actions, executing them via tools, and reporting the results. When a GitHub agent says a test fails, it is not asserting a belief, it is summarizing an observed outcome. Reality is continuously consulted, and reality wins.
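That distinction — reporting an observed outcome rather than asserting a belief — can be sketched as a tool call that actually executes the code and summarizes the real result. The `report_test_outcome` helper below is a hypothetical stand-in, not any Gemini or GitHub API:

```python
import subprocess
import sys

def report_test_outcome(code: str) -> dict:
    """Execute the code in a subprocess and summarize what actually
    happened. The 'agent' consults reality instead of guessing."""
    proc = subprocess.run([sys.executable, "-c", code],
                          capture_output=True, text=True)
    return {
        "passed": proc.returncode == 0,
        # On failure, report the last line of the real traceback.
        "observed": proc.stderr.strip().splitlines()[-1] if proc.returncode else "ok",
    }

good = report_test_outcome("assert 1 + 1 == 2")
bad = report_test_outcome("assert 1 + 1 == 3")
print(good)  # {'passed': True, 'observed': 'ok'}
print(bad)   # {'passed': False, 'observed': 'AssertionError'}
```

When this agent says a test fails, it is summarizing a return code and a traceback, not producing a plausible-sounding token sequence.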
What makes this shift economically viable now is compute progress, but not in the simplistic sense of faster models. Inference has become cheap enough to keep agents resident. Tool invocation latency is low enough to stay within human attention thresholds. Storage and retrieval of intermediate artifacts is trivial. Together, these factors make persistent, stateful workflows feasible at scale.
The result is a form of agency that does not rely on long-horizon internal cognition. Instead, it relies on checkpointed execution across a shared environment. Intelligence emerges from the coordination between short-lived reasoning bursts and durable external state. It’s still <0 on the Evans Ratio; it’s more scaffolding than AI, but it’s also more enterprise-functional than any other agentic AI we’ve seen.
This architecture quietly concedes something important: large language models cannot be trusted with long-term meaning preservation. Rather than fighting that limitation, Google has redesigned the environment so that meaning never needs to be carried internally for long. The world itself becomes the memory, the context window, and the coherence mechanism.
For enterprises, this matters far beyond developer tooling. It points toward a future where AI systems are deployed not as autonomous thinkers, but as tightly scoped actors embedded in auditable, reviewable workflows. Governance shifts from model behavior to process design. Risk is managed not by hoping models behave, but by constraining what they can do and forcing every step to leave a trace.
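A minimal sketch of that governance model: an allowlist constrains what an agent may do, and every attempted call — permitted or not — leaves a trace. The tool names and audit format are invented for illustration:

```python
from datetime import datetime, timezone

AUDIT_LOG: list[dict] = []
ALLOWED_TOOLS = {"read_schema", "propose_migration"}  # hypothetical tool names

def invoke_tool(agent: str, tool: str, arg: str) -> str:
    """Governance as process design: constrain what the agent can do,
    and force every step to leave an auditable record."""
    if tool not in ALLOWED_TOOLS:
        AUDIT_LOG.append({"agent": agent, "tool": tool, "allowed": False})
        raise PermissionError(f"{agent} may not call {tool}")
    AUDIT_LOG.append({"agent": agent, "tool": tool, "allowed": True,
                      "at": datetime.now(timezone.utc).isoformat()})
    return f"{tool}({arg}) executed"

print(invoke_tool("migration-agent", "read_schema", "orders"))
try:
    invoke_tool("migration-agent", "drop_table", "orders")  # blocked by design
except PermissionError as e:
    print(e)
```

Risk management here is structural: the blocked call is not a model behaving well, it is a call that cannot succeed — and both outcomes are on the record.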
Is it regression? Not exactly. It’s more promise diverted, or delayed, in favour of reliability. This is not a step on the path to artificial general intelligence … or maybe it is. Maybe AGI will be made up of millions of tight functional agents rolling up into a larger contained brain that governs the role and output of each node. The human brain cannot do that, and it could be how agents are deployed in enterprise going forward, maybe at multiple layers. It won’t work for everything, but it will work for a lot.
This is something more pragmatic, more appealing to the enterprise, in many ways, more powerful, and almost certainly more valuable: civil engineering for unreliable “minds”. And it works precisely because the workflow — not the model — is where intelligence is stabilized.
It’s definitely a shift away from the centralized, all-powerful hive mind that has dominated AGI discourse since fall 2022. Will the intelligence that governs our systems end up a hybrid of the two?
Conceptual Diagram: Artifact-Anchored Agentic Workflow