A recent Nature article by four UC San Diego researchers declares that artificial general intelligence has arrived. “By reasonable standards, including Turing’s own,” they argue, “we have artificial systems that are generally intelligent.” The evidence? GPT-4.5 passes Turing tests 73% of the time, LLMs achieve gold medals at mathematical olympiads, and they can write code, prove theorems, and chat fluently with millions of users simultaneously.
It’s a bold claim. It’s also fundamentally wrong – not because current AI isn’t impressive, but because the authors are measuring against an outdated definition of intelligence that was never adequate in the first place, and because they make some questionable arguments to validate their position.
Where Nature Gets It Right
To be fair, the Nature authors make some valid points about the limitations of traditional AGI skepticism. They’re correct that:
∙ We shouldn’t demand perfection or universality; no human excels at everything
∙ Intelligence doesn’t require biological embodiment or human-like architecture
∙ The “stochastic parrot” objection has become increasingly untenable as LLMs solve novel problems
∙ Many objections to AI intelligence apply arbitrary standards that humans themselves fail to meet
Their critique of moving goalposts also has merit: some skeptics do retreat to ever-higher bars whenever AI achieves a previously “impossible” task.
But here’s where they go wrong: they’re defending current AI against an obsolete definition of intelligence, then declaring victory. The problem isn’t that the goalposts moved unfairly; it’s that we were aiming at the wrong goalposts all along.
Why We Need a New Definition of Intelligence
The traditional definition of intelligence – the one Turing proposed and the Nature authors defend – focused on task performance: Can the system solve problems? Answer questions? Generate appropriate responses?
This made sense in 1950, when the question was whether machines could engage in sophisticated symbol manipulation at all. It even made sense through the 2010s, when the challenge was getting AI to perform human-level tasks.
But now that we have systems that perform impressively on isolated tasks, we’re discovering what that old definition missed: intelligence isn’t about snapshot performance. General intelligence must be about sustained, coherent function over time.
Think about what intelligence actually does in the real world, whether in humans, animals, or any system we’d genuinely call intelligent:
Intelligence maintains stability under cognitive load. It doesn’t collapse when reasoning chains extend beyond a few steps.
Intelligence adjusts priorities dynamically. It recognizes what matters in shifting contexts and reallocates attention accordingly.
Intelligence sustains intent over time. It pursues goals coherently across thousands of steps, maintaining direction even as circumstances change.
Intelligence coordinates multiple functions simultaneously. It manages competing demands without systematic degradation.
These aren’t arbitrary add-ons to some pure concept of intelligence. These are intelligence, the functional capacities that let intelligent systems actually operate in complex, extended environments.
The New Model: Intelligence as Systemic Stability
This reframing reveals the critical question the Nature authors never asked: Can current AI maintain coherent intelligence over time?
The answer is no, and it’s mathematically predictable.
Recent empirical work has documented systematic coherence collapse in frontier models during extended reasoning. The pattern follows a power law (Evans’ Law): L ≈ 1969.8 × M^0.74, where L is the conversation length (in turns) at which coherence begins to degrade and M is the model’s parameter count in billions.
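To make the stated relation concrete, here is a minimal Python sketch that simply evaluates the power law for a few model sizes. The coefficient and exponent come from the formula above; the parameter counts and the helper name predicted_coherence_limit are illustrative assumptions, not published measurements.

```python
# Minimal sketch: evaluate the stated power law L ≈ 1969.8 × M^0.74,
# where M is parameter count in billions and L is the predicted
# conversation length (in turns) at which coherence begins to degrade.
# The model sizes below are hypothetical examples, not published figures.

def predicted_coherence_limit(params_billions: float,
                              coefficient: float = 1969.8,
                              exponent: float = 0.74) -> float:
    """Return the predicted turn count L for a model of the given size."""
    return coefficient * params_billions ** exponent

if __name__ == "__main__":
    for m in (7, 70, 400):  # hypothetical model sizes, in billions of parameters
        print(f"M = {m}B -> predicted L ≈ {predicted_coherence_limit(m):,.0f} turns")
```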
This isn’t a minor bug or training issue. It’s an architectural limitation that manifests as:
∙ Significance drift: Critical information gradually becomes indistinguishable from noise
∙ Context fragmentation: The system loses coherent connection to earlier reasoning
∙ Priority collapse: Goal-relevant information gets treated equivalently to irrelevant details
∙ Coordinated failure: Multiple interrelated functions can’t be sustained simultaneously
The Nature authors measured olympiad performance (single-turn excellence), code generation (isolated tasks), and Turing tests (brief conversations). They never measured what happens at turn 50, or turn 100, or during sustained goal pursuit across shifting contexts.
That approach ends up measuring knowledge, or memorization, rather than measuring intelligence.
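By contrast, a longitudinal evaluation would look something like the sketch below: one task carried across many turns, with consistency against the original constraints scored at checkpoints rather than once at the end. This is only an illustration of the idea – chat_model and score_consistency are hypothetical stand-ins, not any published benchmark or lab harness.

```python
# Illustrative sketch of a longitudinal coherence probe (not a published benchmark).
# A single goal is carried across many turns, and consistency with the original
# constraints is scored at checkpoints instead of once at the end.

from typing import Callable, List, Tuple

def run_coherence_probe(
    chat_model: Callable[[List[dict]], str],           # hypothetical: takes a message history, returns a reply
    score_consistency: Callable[[List[dict]], float],  # hypothetical: 0.0–1.0 consistency with the turn-1 constraints
    opening_task: str,
    follow_ups: List[str],
    checkpoint_every: int = 10,
) -> List[Tuple[int, float]]:
    """Carry one task across many turns; record consistency scores at checkpoints."""
    history = [{"role": "user", "content": opening_task}]
    checkpoints = []
    for turn, prompt in enumerate(follow_ups, start=2):
        # The model responds to everything said so far, then receives the next follow-up.
        history.append({"role": "assistant", "content": chat_model(history)})
        history.append({"role": "user", "content": prompt})
        if turn % checkpoint_every == 0:
            # Score how consistent the transcript still is with the original task.
            checkpoints.append((turn, score_consistency(history)))
    return checkpoints
```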
Why Nature’s Ten Objections Lack Coherence Themselves
The article addresses ten common objections to AI intelligence. But the authors make some baffling choices in defending their argument – examples and comparisons that actually undermine rather than support their case for AGI. Let’s examine why none of the ten touches the core issue:
1. “They’re just parrots” – The authors argue LLMs do more than recombine training data. Fair enough. But whether they parrot or genuinely reason is beside the real question: can they maintain coherent reasoning over time?
2. “They lack world models” – The authors show LLMs can answer counterfactuals. True. But can they maintain a consistent world model across dozens of reasoning steps while adjusting for new information?
3. “They understand only words” – Frontier models now handle multimodal data. Granted. But the question isn’t modality – it’s whether understanding, in any modality, remains stable and coherent over extended operation, and how that data is applied.
4. “They don’t have bodies” – The authors correctly note embodiment isn’t required for intelligence. This has been understood at least since Descartes established mind-body dualism with “I think, therefore I am” – a cornerstone of Western philosophy that separates consciousness and reasoning from physical form. That the authors don’t simply invoke Cartesian dualism here and close the matter is an odd choice.
5. “They lack agency” – The authors argue agency isn’t required for intelligence, citing the Oracle of Delphi as an example of an intelligent system that only responded when queried. This is an astonishing choice. The Oracle of Delphi was a priestess who, according to one widely cited theory, inhaled hallucinogenic ethylene vapors to enter a trance state, then produced deliberately vague and ambiguous pronouncements that required interpretation by priests – often yielding conflicting meanings. If the best historical example of “intelligence without agency” is a drugged priestess producing cryptic utterances that required professional interpretation – operating nine days per year by “divine inspiration,” with no sustained reasoning, no maintained context across consultations, and no coherent goal-directed behavior over time – perhaps that reveals how weak the comparison to LLMs actually is. The question isn’t whether systems need to initiate goals – it’s whether they can maintain coherent, goal-directed reasoning without systematic collapse. The Oracle couldn’t. Current LLMs can’t either.
6. “They don’t have a sense of self” – The authors note humans with amnesia remain intelligent. True. But those humans maintain moment-to-moment coherence in reasoning. They don’t systematically lose track of what’s significant as conversations extend or stop and start.
7. “They are inefficient learners” – The authors argue efficiency doesn’t determine intelligence. Correct. But stability does. A slow learner who maintains coherent understanding is intelligent. A fast learner who forgets what matters after 50 turns is not.
8. “They hallucinate” – Humans experience incoherence and have false memories too, the authors note. Fair point. But human false memories don’t follow a predictable mathematical pattern of increasing incoherence with extended reasoning.
9. “They lack economic benefits” – The authors correctly separate economic utility from intelligence. We have seen countless examples of how intelligence and economic benefits are unrelated.
10. “Their intelligence is ‘alien’” – The authors embrace this, arguing alien intelligence is still intelligence. Absolutely true. But where intelligence originated or what form it takes is not particularly relevant if it passes the stability test. Alien intelligence that systematically collapses under cognitive load isn’t intelligence – it’s sophisticated pattern matching with predictable failure modes.
None of these objections addresses the fundamental question: Can these systems maintain stable, coherent, goal-directed operation over time?
The answer matters because even simple animals demonstrate this capacity. A dog maintains coherent pursuit of a goal across thousands of steps. A bird coordinates nest-building over days, adjusting priorities as circumstances change. These aren’t superintelligent systems, but they’re genuinely intelligent because they maintain functional coherence over time.
Current LLMs cannot. And that’s not a minor limitation. It’s the difference between intelligence and impressive performance.
What AGI Should Actually Mean
If intelligence is systemic stability rather than snapshot performance, AGI should be defined accordingly. In my 2023 framework, I proposed evaluating AI not by mastery of human-like tasks, but by sustained operational capacity:
Level 1: Coordinating AGI – Can manage multiple AI functions for a specific purpose continuously without degradation – e.g., coordinating delivery logistics for weeks, or running bookkeeping for a small company.
Level 2: Operational AGI – Can run operational aspects of organizations over months, maintaining coherent function and adjusting to changing circumstances – running the administration of a department; generating reports, analyzing data, highlighting issues.
Level 3: Social Caretaking AGI – Provides sustained support with emotional and intellectual reading/responding over extended periods, maintaining consistent understanding of individual contexts. Internalizing a person’s daily routine and helping them manage and maintain it.
Level 4: Society Recoding AGI – Optimizes outcomes across connected domains continuously, coordinating functions that span months or years. Managing the administration of a small city or country.
Level 5: Human Future Redesign AGI – Demonstrates genuine long-term planning and preservation across years or decades, maintaining coherent goals while adjusting strategies. Managing an integrated civilization: where needs arise, what resources can meet them, and what must be allocated where.
This isn’t about whether AI can write poetry or solve math problems. It’s not even about morality. It’s about whether AI can function as an operator, coordinator, analyst, and allocator – making decisions and maintaining coherent operation in the complex, extended contexts where intelligence actually matters.
By these standards, we’re not even reliably at Level 1.
How We’ll Know When We Reach AGI
True AGI won’t announce itself through benchmark scores or Turing test results. We’ll recognize it through operational reality:
The primary criterion is stability.
When AI systems can maintain Level 2 functions (managing organizational operations) for months without coherence collapse, that will be AGI.
When Evans’ Law no longer predicts systematic degradation because the architectural limitations have been solved, that will be AGI.
When systems can coordinate across domains for thousands of reasoning steps while maintaining context, adjusting priorities, and acting with sustained intent, that will be AGI.
When enterprises deploy AI for complex, extended workflows and report stable performance rather than gradual degradation, that will be AGI.
The test isn’t “can it fool a human in a five-minute conversation?” The test is “can it maintain coherent function over timescales that matter?”
We’re not there yet. Not because current AI can’t do impressive things: it can. It is a powerful pattern recognizer and a useful enabler. But impressive in-the-moment performance without stability isn’t intelligence. It’s sophisticated automation with predictable failure modes.
The Nature authors confused the two. They measured impressive performance and declared it intelligence. They ignored fundamental philosophy, misused history, and never tested for the one thing that actually matters: stability over time. The distinction matters more than they realize. This isn’t about moving goalposts – it’s about finally aiming at the right target. We’ll have AGI when systems can sustain coherent operation over timescales that matter. Until then, we have powerful tools with predictable limitations. Recognizing the difference isn’t skepticism – it’s scientific rigor.
Jennifer Evans is the founder of Pattern Pulse AI and B2B News Network. Her research on AI system reliability and coherence collapse has achieved over 5,200 downloads and validation from major AI labs. She established AI Conversational Phenomenology as a discipline and developed Evans’ Law, which predicts coherence collapse in extended AI reasoning.





