
How LLM Errors Compound until Likelihood of Inaccuracy is Higher than Accuracy (Evans’ Law)

Evans’ Law: Error Likelihood Rises Super-linearly with Prompt+Output Length

Author: Jennifer Evans

Date: November 2025

Abstract

Language models exhibit a consistent performance decay as input and output lengths increase. Evans’ Law defines this relationship:

“Whenever you request an output, the longer the prompt and the longer the response, the higher the likelihood there will be mistakes — until the likelihood of mistakes exceeds the likelihood of accuracy.”

This law captures the practical tipping point — or Evans’ Threshold — where model complexity, token window limits, and internal inconsistency converge to create error dominance.

Background

While prior studies (e.g., Lost in the Middle, Context Rot, Context Length Alone Hurts LLM Performance) documented accuracy degradation in long contexts, none formalized the relationship as a simple law balancing length against reliability. Evans’ Law reframes this behavior as a probabilistic constraint on accuracy in both generation and reasoning.

The principle emphasizes that:

  • Longer prompts increase contextual entropy (model confusion).
  • Longer outputs accumulate local inaccuracies.
  • The combination drives super-linear error growth (illustrated in the sketch below).
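
One way to see why errors compound is to assume a per-token error rate that itself rises with position in the context (an illustrative assumption, not a measured quantity). The probability that a response contains at least one mistake then grows faster than linearly with length and eventually crosses 50%, the point at which inaccuracy becomes more likely than accuracy. A minimal Python sketch, with rates chosen only so that the crossover lands near the 8,000–12,000-token range discussed below:

```python
# Minimal sketch: a per-token error rate that grows with position in the
# context. p0 and growth are illustrative guesses, chosen only so the
# 50% crossover lands near the 8,000-12,000-token range cited below.

def prob_at_least_one_error(n_tokens: int,
                            p0: float = 6e-5,
                            growth: float = 3e-5) -> float:
    """P(a response of n_tokens contains >= 1 error), assuming token i
    fails independently with rate p0 * (1 + growth * i)."""
    p_all_correct = 1.0
    for i in range(n_tokens):
        p_all_correct *= 1.0 - min(1.0, p0 * (1.0 + growth * i))
    return 1.0 - p_all_correct

if __name__ == "__main__":
    for n in (1_000, 4_000, 8_000, 12_000, 16_000):
        p = prob_at_least_one_error(n)
        flag = "  <-- errors more likely than not" if p > 0.5 else ""
        print(f"{n:>6} tokens: P(>=1 error) = {p:.2f}{flag}")
```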

The Evans Curve

A two-dimensional representation shows Accuracy (%) declining as Token Length (Prompt + Output) increases.

The slope steepens beyond Evans’ Threshold (typically around 8,000–12,000 tokens for GPT-class models), depending on temperature, model size, and domain complexity.

(Graph included)

Empirical Indicators – Operational Applications

  1. Prompt Engineering – Break large instructions into smaller, modular tasks.
  2. AI Content Workflows – Generate in 3–4 short passes rather than one long one.
  3. Enterprise AI Systems – Implement “context governors” to cap token depth dynamically (a minimal sketch follows this list).
  4. Agentic AI Frameworks – Use chaining memory between short agents instead of one long reasoning thread.
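
As an illustration of item 3 (and of the multi-pass workflow in item 2), a context governor can be a thin wrapper that keeps each request's prompt-plus-output budget under the estimated threshold and splits oversized jobs into shorter passes. The 10,000-token cap, the characters-per-token heuristic, and the call_llm callable below are assumptions of this sketch, not any particular framework's API:

```python
from typing import Callable, List

TOKEN_CAP = 10_000       # illustrative cap near the observed threshold
CHARS_PER_TOKEN = 4      # rough heuristic for English text

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // CHARS_PER_TOKEN)

def governed_generate(call_llm: Callable[[str, List[str]], str],
                      instructions: str,
                      chunks: List[str],
                      max_output_tokens: int = 1_000) -> List[str]:
    """Send one request per batch of chunks, keeping each request's
    prompt + expected output under TOKEN_CAP. call_llm is a stand-in
    for whatever model client is actually in use."""
    results: List[str] = []
    batch: List[str] = []
    batch_tokens = estimate_tokens(instructions)
    budget = TOKEN_CAP - max_output_tokens
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if batch and batch_tokens + cost > budget:
            results.append(call_llm(instructions, batch))  # flush one short pass
            batch, batch_tokens = [], estimate_tokens(instructions)
        # An oversized single chunk still goes through alone; splitting it
        # further is left to the caller.
        batch.append(chunk)
        batch_tokens += cost
    if batch:
        results.append(call_llm(instructions, batch))
    return results
```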

Evans’ Threshold Estimation Formula (Heuristic)

As a first heuristic, approximate the tipping point as:

T ≈ (M × α) ÷ C

Where:

  • T = token threshold before accuracy inversion (expressed in thousands of tokens, as in the example below)
  • M = model size in billions of parameters
  • α = alignment constant (0.6–0.9 for instruction-tuned models)
  • C = complexity coefficient (1 = simple Q&A, 3 = reasoning, 5 = multi-part synthesis)

Example: a GPT-4-class model (M=175, α=0.8, C=3) gives T ≈ (175 × 0.8) ÷ 3 ≈ 46.7, i.e. roughly 46,000 tokens theoretical, versus ~10,000 observed in practice.
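
The same arithmetic in code, reading T in thousands of tokens as the example implies (the function name and the unit conversion are this sketch's assumptions):

```python
# Original heuristic: T ≈ (M × α) ÷ C, with T read in thousands of tokens.
def evans_threshold_v1(m_billion_params: float, alpha: float, complexity: float) -> float:
    """Estimated tipping point, returned in tokens."""
    return (m_billion_params * alpha) / complexity * 1_000

# The article's GPT-4-class example: M=175, α=0.8, C=3
print(round(evans_threshold_v1(175, 0.8, 3)))  # ~46,700 theoretical vs ~10,000 observed
```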

Update: after running initial experiments, the revised formula is:

Evans’ Law (Revised)

T ≈ (M^1.5 × α × K) ÷ C^β

Where:

  • M = model parameters (billions)
  • α = 0.5–0.9 (alignment quality)
  • K = 50–200 (empirical constant)
  • C = task complexity (1–5)
  • β = 1–2 (complexity scaling)
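
A sketch of the revised formula. The values K=50 and β=2 are taken from the stated ranges purely for illustration; they happen to land near the ~10,000 tokens observed for the GPT-4-class example but are not fitted constants from the article:

```python
# Revised heuristic: T ≈ (M^1.5 × α × K) ÷ C^β, with T in tokens.
# K and beta are illustrative picks from the stated ranges, not values
# reported in the article.
def evans_threshold_v2(m_billion_params: float, alpha: float, complexity: float,
                       k: float = 50.0, beta: float = 2.0) -> float:
    """Estimated tipping point, in tokens, under the revised formula."""
    return (m_billion_params ** 1.5 * alpha * k) / (complexity ** beta)

# Same GPT-4-class example: M=175, α=0.8, C=3
print(round(evans_threshold_v2(175, 0.8, 3)))  # ~10,300 tokens with K=50, β=2
```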

Implications for Future LLMs

“Memory-rich” architectures will blur this limit:

  • Agentic AI distributes reasoning.
  • Retrieval-Augmented Generation (RAG) localizes context.
  • Persistent memory introduces continuity across prompts, reducing the need for long context spans.

Evans’ Law predicts that true intelligence at scale will emerge only when the retention-to-processing ratio increases — when LLMs can “hold” rather than “recreate” their reasoning with every query.

Citations

  1. Liu et al. (2023). Lost in the Middle: How Language Models Use Long Contexts. arXiv:2307.03172.
  2. Chroma Research (2025). Context Rot: How Increasing Input Tokens Impacts LLM Performance. Chroma technical report (an industry benchmark and blog post, not a peer-reviewed paper).
  3. Zhang et al. (2025). Context Length Alone Hurts LLM Performance Despite Perfect Retrieval. arXiv:2510.05381.
  4. Peysakhovich, A., & Lerer, A. (2023). Attention Sorting Combats Recency Bias in Long-Context Language Models. arXiv:2310.01427.
  5. Veseli et al. (2025). Positional Biases Shift as Inputs Approach Context Window Limits. arXiv.
  6. Evans, J. (2025). Prompt Complexity and Error Propagation in Non-Persistent Architectures (concept paper, in development).
