Evans’ Law: Error Likelihood Rises Super-linearly with Prompt+Output Length
Author: Jennifer Evans
Date: November 2025

Abstract
Language models exhibit a consistent performance decay as input and output lengths increase. Evans’ Law defines this relationship:
“Whenever you request an output, the longer the prompt and the longer the response, the higher the likelihood there will be mistakes — until the likelihood of mistakes exceeds the likelihood of accuracy.”
This law captures the practical tipping point — or Evans’ Threshold — where model complexity, token window limits, and internal inconsistency converge to create error dominance.

Background
While prior studies (e.g., Lost in the Middle, Context Rot, Context Length Hurts Performance) documented accuracy degradation in long contexts, none formalized the relationship as a simple law balancing length versus reliability. Evans’ Law reframes this behavior as a probabilistic constraint on accuracy in both generation and reasoning.
The principle emphasizes that:
- Longer prompts increase contextual entropy (model confusion).
- Longer outputs accumulate local inaccuracies.
- The combination drives super-linear error growth (sketched below).
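
The compounding claim can be made concrete with a toy sketch (illustrative assumptions, not measurements): suppose the per-token error rate itself drifts upward as the context grows, so a longer prompt raises the rate while a longer output multiplies the chances for a slip.

```python
# Toy model only: the constants below are illustrative, not empirical.
def p_any_error(total_tokens: int,
                base_rate: float = 5e-5,
                entropy_gain: float = 1e-4) -> float:
    """Probability that a span of total_tokens (prompt + output) contains at least one error."""
    # Hypothetical: per-token error rate rises linearly with context length
    # ("contextual entropy"), so expected errors grow super-linearly in length.
    per_token = base_rate * (1.0 + entropy_gain * total_tokens)
    return 1.0 - (1.0 - per_token) ** total_tokens

for n in (1_000, 4_000, 8_000, 12_000, 20_000):
    print(f"{n:>6} tokens -> P(at least one mistake) = {p_any_error(n):.0%}")
```

With these illustrative constants the curve crosses 50% near 8,000 tokens, which is the kind of tipping point the law describes; the constants were chosen only to make the shape visible.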
 
The Evans Curve
A two-dimensional representation shows Accuracy (%) declining as Token Length (Prompt + Output) increases.
The slope steepens beyond the Evans’ Threshold — typically around 8,000–12,000 tokens for GPT-class models — depending on temperature, model size, and domain complexity.
(Graph included)
Empirical Indicators – Operational Applications
- Prompt Engineering – Break large instructions into smaller, modular tasks.
- AI Content Workflows – Generate in 3–4 short passes rather than one long one.
- Enterprise AI Systems – Implement “context governors” to cap token depth dynamically (see the sketch after this list).
- Agentic AI Frameworks – Chain memory between short agents instead of running one long reasoning thread.
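
As a rough illustration of the last three items, the sketch below splits a long drafting job into short passes, carries only a compact running summary between them, and enforces a per-request token cap (a simple context governor). The call_model parameter is a stand-in for whatever LLM client is in use, and the 4-characters-per-token estimate is an assumption.

```python
from typing import Callable, List

MAX_TOKENS_PER_PASS = 3_000  # illustrative cap, well below the Evans' Threshold range


def approx_tokens(text: str) -> int:
    """Crude token estimate (~4 characters per token) for budget checks."""
    return len(text) // 4


def generate_in_passes(sections: List[str],
                       call_model: Callable[[str], str]) -> List[str]:
    """Draft each section in its own short request, carrying a brief summary forward."""
    summary, drafts = "", []
    for section in sections:
        prompt = (f"Summary of work so far: {summary}\n\n"
                  f"Write the next section: {section}")
        if approx_tokens(prompt) > MAX_TOKENS_PER_PASS:  # the context governor
            raise ValueError("Prompt exceeds the per-pass token budget; split the task further.")
        draft = call_model(prompt)
        drafts.append(draft)
        summary = (summary + " " + draft)[-1_000:]  # keep the carried context short
    return drafts
```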
 
Evans’ Threshold Estimation Formula (Heuristic)
Proposed heuristic: approximate the tipping point as
T ≈ (M × α) ÷ C
Where:
- T = token threshold before accuracy inversion
- M = model size in billions of parameters
- α = alignment constant (0.6–0.9 for instruction-tuned models)
- C = complexity coefficient (1 = simple Q&A, 3 = reasoning, 5 = multi-part synthesis)
 
Example: a GPT-4-class model (M = 175, α = 0.8, C = 3) gives (175 × 0.8) ÷ 3 ≈ 46.7, read in thousands of tokens: roughly 46,000–47,000 theoretical, against ~10,000 observed.
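
A direct transcription of the heuristic, assuming (as the worked example implies) that the formula is read in thousands of tokens:

```python
def evans_threshold(model_b_params: float, alignment: float, complexity: float) -> float:
    """T ≈ (M × α) ÷ C; the ×1,000 reflects the thousands-of-tokens reading implied above."""
    return (model_b_params * alignment) / complexity * 1_000

print(round(evans_threshold(175, 0.8, 3)))  # ~46,667 theoretical vs. ~10,000 observed
```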
Update: after running initial experiments, the revised formula is:
EVANS’ LAW (Revised)
T ≈ (M^1.5 × α × K) ÷ C^β
Where:
- M = model parameters (billions)
- α = alignment quality (0.5–0.9)
- K = empirical constant (50–200)
- C = task complexity (1–5)
- β = complexity scaling exponent (1–2)
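
The revised form, sketched with mid-range defaults for the loosely constrained constants (α, K, β); because those constants span wide ranges, the output should be read as an order-of-magnitude estimate rather than a calibrated prediction.

```python
def evans_threshold_revised(model_b_params: float,
                            alignment: float = 0.7,       # α, 0.5–0.9
                            k: float = 100.0,             # K, 50–200
                            complexity: float = 3.0,      # C, 1–5
                            beta: float = 1.5) -> float:  # β, 1–2
    """T ≈ (M^1.5 × α × K) ÷ C^β."""
    return (model_b_params ** 1.5 * alignment * k) / (complexity ** beta)

print(round(evans_threshold_revised(175)))  # ~31,000 with these mid-range defaults
```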
 
Implications for Future LLMs
“Memory-rich” architectures will blur this limit:
- Agentic AI distributes reasoning across multiple short steps.
- Retrieval-Augmented Generation (RAG) localizes context so each request stays short (see the sketch below).
- Persistent memory introduces continuity across prompts, reducing the need for long context spans.
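
As one concrete reading of the RAG point, the fragment below keeps each request short by retrieving only the few passages most relevant to a query; the word-overlap score is a deliberately naive stand-in for a real embedding similarity.

```python
from typing import List

def top_k_passages(query: str, passages: List[str], k: int = 3) -> List[str]:
    """Rank passages by naive word overlap with the query and keep the top k."""
    query_words = set(query.lower().split())
    return sorted(passages,
                  key=lambda p: len(query_words & set(p.lower().split())),
                  reverse=True)[:k]

# Only the selected passages (plus the question) go into the prompt,
# keeping prompt length well below the threshold range cited above.
```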
 
Evans’ Law predicts that true intelligence at scale will emerge only when the retention-to-processing ratio increases — when LLMs can “hold” rather than “recreate” their reasoning with every query.
Citations
- Liu et al. (2023). Lost in the Middle: How Language Models Use Long Contexts. Stanford CS.
- Chroma Research (2025). Context Rot: How Increasing Input Tokens Impacts LLM Performance. Chroma technical report. (Industry blog post and benchmark that circulated widely; not a peer-reviewed preprint.)
- Zhang et al. (2025). Context Length Alone Hurts LLM Performance Despite Perfect Retrieval. arXiv:2510.05381.
- Peysakhovich, A., & Lerer, A. (2023). Attention Sorting Combats Recency Bias in Long-Context Language Models. arXiv:2310.01427.
- Veseli et al. (2025). Positional Biases Shift as Inputs Approach Context Window Limits. arXiv preprint.
- Evans, J. (2025). Prompt Complexity and Error Propagation in Non-Persistent Architectures. Concept paper, in development.