Evans’ Law: Error Likelihood Rises Super-linearly with Prompt+Output Length
Author: Jennifer Evans
Date: November 2025

Abstract
Language models exhibit a consistent performance decay as input and output lengths increase. Evans’ Law defines this relationship:
“Whenever you request an output, the longer the prompt and the longer the response, the higher the likelihood there will be mistakes — until the likelihood of mistakes exceeds the likelihood of accuracy.”
This law captures the practical tipping point — or Evans’ Threshold — where model complexity, token window limits, and internal inconsistency converge to create error dominance.

Background
While prior studies (e.g., Lost in the Middle, Context Rot, Context Length Hurts Performance) documented accuracy degradation in long contexts, none formalized the relationship as a simple law balancing length versus reliability. Evans’ Law reframes this behavior as a probabilistic constraint on accuracy in both generation and reasoning.
The principle emphasizes that:
- Longer prompts increase contextual entropy (model confusion).
- Longer outputs accumulate local inaccuracies.
- The combination drives super-linear error growth (sketched below).
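
The compounding claim can be made concrete with a toy sketch (illustrative assumptions, not measurements): suppose the per-token error rate itself drifts upward as the context grows, so a longer prompt raises the rate while a longer output multiplies the chances for a slip.

```python
# Toy model only: the constants below are illustrative, not empirical.
def p_any_error(total_tokens: int,
                base_rate: float = 5e-5,
                entropy_gain: float = 1e-4) -> float:
    """Probability that a span of total_tokens (prompt + output) contains at least one error."""
    # Hypothetical: per-token error rate rises linearly with context length
    # ("contextual entropy"), so expected errors grow super-linearly in length.
    per_token = base_rate * (1.0 + entropy_gain * total_tokens)
    return 1.0 - (1.0 - per_token) ** total_tokens

for n in (1_000, 4_000, 8_000, 12_000, 20_000):
    print(f"{n:>6} tokens -> P(at least one mistake) = {p_any_error(n):.0%}")
```

With these illustrative constants the curve crosses 50% near 8,000 tokens, which is the kind of tipping point the law describes; the constants were chosen only to make the shape visible.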
 
The Evans Curve
A two-dimensional representation shows Accuracy (%) declining as Token Length (Prompt + Output) increases.
The slope steepens beyond the Evans’ Threshold — typically around 8,000–12,000 tokens for GPT-class models — depending on temperature, model size, and domain complexity.
(Graph included)
Empirical Indicators – Operational Applications
- Prompt Engineering – Break large instructions into smaller, modular tasks.
- AI Content Workflows – Generate in 3–4 short passes rather than one long one.
- Enterprise AI Systems – Implement “context governors” to cap token depth dynamically (see the sketch after this list).
- Agentic AI Frameworks – Chain memory between short agents instead of running one long reasoning thread.
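
As a rough illustration of the last three items, the sketch below splits a long drafting job into short passes, carries only a compact running summary between them, and enforces a per-request token cap (a simple context governor). The call_model parameter is a stand-in for whatever LLM client is in use, and the 4-characters-per-token estimate is an assumption.

```python
from typing import Callable, List

MAX_TOKENS_PER_PASS = 3_000  # illustrative cap, well below the Evans' Threshold range


def approx_tokens(text: str) -> int:
    """Crude token estimate (~4 characters per token) for budget checks."""
    return len(text) // 4


def generate_in_passes(sections: List[str],
                       call_model: Callable[[str], str]) -> List[str]:
    """Draft each section in its own short request, carrying a brief summary forward."""
    summary, drafts = "", []
    for section in sections:
        prompt = (f"Summary of work so far: {summary}\n\n"
                  f"Write the next section: {section}")
        if approx_tokens(prompt) > MAX_TOKENS_PER_PASS:  # the context governor
            raise ValueError("Prompt exceeds the per-pass token budget; split the task further.")
        draft = call_model(prompt)
        drafts.append(draft)
        summary = (summary + " " + draft)[-1_000:]  # keep the carried context short
    return drafts
```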
 
Evans’ Threshold Estimation Formula (Heuristic)
Proposed heuristic: approximate the tipping point as
T ≈ (M × α) ÷ C
Where:
- T = token threshold before accuracy inversion
- M = model size in billions of parameters
- α = alignment constant (0.6–0.9 for instruction-tuned models)
- C = complexity coefficient (1 = simple Q&A, 3 = reasoning, 5 = multi-part synthesis)
 
Example: a GPT-4-class model (M = 175, α = 0.8, C = 3) gives (175 × 0.8) ÷ 3 ≈ 46.7, read in thousands of tokens: roughly 46,000–47,000 theoretical, against ~10,000 observed.
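
A direct transcription of the heuristic, assuming (as the worked example implies) that the formula is read in thousands of tokens:

```python
def evans_threshold(model_b_params: float, alignment: float, complexity: float) -> float:
    """T ≈ (M × α) ÷ C; the ×1,000 reflects the thousands-of-tokens reading implied above."""
    return (model_b_params * alignment) / complexity * 1_000

print(round(evans_threshold(175, 0.8, 3)))  # ~46,667 theoretical vs. ~10,000 observed
```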
Update: after running initial experiments, the revised formula is:
EVANS’ LAW (Revised)
T ≈ (M^1.5 × α × K) ÷ C^β
Where:
- M = model parameters (billions)
- α = alignment quality (0.5–0.9)
- K = empirical constant (50–200)
- C = task complexity (1–5)
- β = complexity scaling exponent (1–2)
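
The revised form, sketched with mid-range defaults for the loosely constrained constants (α, K, β); because those constants span wide ranges, the output should be read as an order-of-magnitude estimate rather than a calibrated prediction.

```python
def evans_threshold_revised(model_b_params: float,
                            alignment: float = 0.7,       # α, 0.5–0.9
                            k: float = 100.0,             # K, 50–200
                            complexity: float = 3.0,      # C, 1–5
                            beta: float = 1.5) -> float:  # β, 1–2
    """T ≈ (M^1.5 × α × K) ÷ C^β."""
    return (model_b_params ** 1.5 * alignment * k) / (complexity ** beta)

print(round(evans_threshold_revised(175)))  # ~31,000 with these mid-range defaults
```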
 
Implications for Future LLMs
“Memory-rich” architectures will blur this limit:
- Agentic AI distributes reasoning across multiple short steps.
- Retrieval-Augmented Generation (RAG) localizes context so each request stays short (see the sketch below).
- Persistent memory introduces continuity across prompts, reducing the need for long context spans.
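
As one concrete reading of the RAG point, the fragment below keeps each request short by retrieving only the few passages most relevant to a query; the word-overlap score is a deliberately naive stand-in for a real embedding similarity.

```python
from typing import List

def top_k_passages(query: str, passages: List[str], k: int = 3) -> List[str]:
    """Rank passages by naive word overlap with the query and keep the top k."""
    query_words = set(query.lower().split())
    return sorted(passages,
                  key=lambda p: len(query_words & set(p.lower().split())),
                  reverse=True)[:k]

# Only the selected passages (plus the question) go into the prompt,
# keeping prompt length well below the threshold range cited above.
```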
 
Evans’ Law predicts that true intelligence at scale will emerge only when the retention-to-processing ratio increases — when LLMs can “hold” rather than “recreate” their reasoning with every query.
Citations
- Liu et al. (2023). Lost in the Middle: How Language Models Use Long Contexts. Stanford CS.
- Chroma Research (2025). Context Rot: How Increasing Input Tokens Impacts LLM Performance. Chroma technical report. (Industry blog post and benchmark that circulated widely; not a peer-reviewed preprint.)
- Zhang et al. (2025). Context Length Alone Hurts LLM Performance Despite Perfect Retrieval. arXiv:2510.05381.
- Peysakhovich, A., & Lerer, A. (2023). Attention Sorting Combats Recency Bias in Long-Context Language Models. arXiv:2310.01427.
- Veseli et al. (2025). Positional Biases Shift as Inputs Approach Context Window Limits. arXiv preprint.
- Evans, J. (2025). Prompt Complexity and Error Propagation in Non-Persistent Architectures. Concept paper, in development.