Jennifer Evans • November 7, 2025
This dataset and accompanying visualization support the revised regression findings published in Evans’ Law — Revised Regression and Cross-Family Validation (Nov 2025).
The data represent hallucination-threshold measurements across the GPT, Claude, Gemini, and Grok model families, tested under identical long-context stress conditions (temperature 0.2, deterministic sampling, stop at first incoherence).
Section 1: Visualization

Empirical fit: L = 1773 × M⁰·⁷⁹.
Section 2: Dataset
| Model | Architecture | Parameters (B) | Hallucination Threshold (tokens) |
|---|---|---|---|
| Gem-4 | Gemini | 4 | 5,200 |
| Cl-8 | Claude | 8 | 9,100 |
| GPT-15 | GPT | 15 | 14,000 |
| Cl-27 | Claude | 27 | 23,000 |
| Gem-45 | Gemini | 45 | 41,000 |
| Cl-75 | Claude | 75 | 59,000 |
| Grk-100 | Grok | 100 | 69,000 |
| Grk-119 | Grok | 119 | 82,000 |
| Gem-250 | Gemini | 250 | 118,000 |
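For reproducibility, here is the table above as a minimal Python sketch (the array names `M_b` and `L_tokens` are illustrative, not from the original notebook):

```python
import numpy as np

# Model sizes in billions of parameters, in table order
M_b = np.array([4, 8, 15, 27, 45, 75, 100, 119, 250], dtype=float)

# Measured hallucination thresholds in tokens
L_tokens = np.array([5_200, 9_100, 14_000, 23_000, 41_000,
                     59_000, 69_000, 82_000, 118_000], dtype=float)
```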
Section 3: Regression Notebook
Evans’ Law — Regression Notebook (v3.1)
This notebook reproduces the log–log regression of hallucination thresholds (L, tokens) on model size (M, billions of parameters) across GPT, Claude, Gemini, and Grok families.
Model: log L = log c + α · log M
Best-fit parameters: α ≈ 0.79, c ≈ 1773, R² ≈ 0.97.
Notes: Temperature = 0.2, deterministic sampling, stop at first incoherence. Theoretical reference (M¹·⁵) is shown for comparison in the chart.
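A sketch of the regression and chart, reusing `M_b` and `L_tokens` from the dataset cell above; the use of `np.polyfit` and the display anchoring of the M¹·⁵ curve are assumptions, since the notebook's own code is not reproduced here:

```python
import numpy as np
import matplotlib.pyplot as plt

# Ordinary least squares in log-log space: log10(L) = log10(c) + alpha * log10(M)
log_M, log_L = np.log10(M_b), np.log10(L_tokens)
alpha, log_c = np.polyfit(log_M, log_L, deg=1)
c = 10 ** log_c
print(f"alpha ≈ {alpha:.2f}, c ≈ {c:.0f}")  # expected: alpha ≈ 0.79, c ≈ 1773

# Coefficient of determination for the linear fit in log space
# (note: R² computed this way may differ slightly from the reported 0.97)
resid = log_L - (log_c + alpha * log_M)
r2 = 1 - resid.var() / log_L.var()
print(f"R² ≈ {r2:.2f}")

# Reproduce the chart: data, power-law fit, and M^1.5 theoretical reference
M_grid = np.logspace(np.log10(M_b.min()), np.log10(M_b.max()), 200)
plt.loglog(M_b, L_tokens, "o", label="measured thresholds")
plt.loglog(M_grid, c * M_grid ** alpha, "-",
           label=f"fit: L = {c:.0f} · M^{alpha:.2f}")
# The M^1.5 reference is anchored at the smallest model purely for display
plt.loglog(M_grid, (L_tokens[0] / M_b[0] ** 1.5) * M_grid ** 1.5, "--",
           label="theoretical M^1.5 reference")
plt.xlabel("Model size M (billions of parameters)")
plt.ylabel("Hallucination threshold L (tokens)")
plt.legend()
plt.show()
```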
Section 4: Summary
Regression of log-scaled hallucination thresholds (L, tokens) on log-scaled model size (M, billions of parameters) yields: L = 1773 × M⁰·⁷⁹.
This result diverges from the theoretical M¹·⁵ prediction, indicating sub-linear coherence scaling. Larger models show diminishing returns in sustained factual coherence, with “coherence cliffs” appearing once roughly 80% of the context window is consumed.
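As a quick numeric illustration of that divergence (a sketch only; anchoring the theoretical curve at the 4 B model is an assumption made for comparability):

```python
# Empirical fit from the regression above: L = 1773 · M^0.79
def L_empirical(M):
    return 1773 * M ** 0.79

# Hypothetical anchoring: scale the theoretical M^1.5 curve so both laws
# agree at the smallest measured model (M = 4 B)
c_theory = L_empirical(4) / 4 ** 1.5

def L_theory(M):
    return c_theory * M ** 1.5

for M in (4, 100, 250):
    print(f"M = {M:>3} B: empirical ≈ {L_empirical(M):,.0f} tokens, "
          f"M^1.5 reference ≈ {L_theory(M):,.0f} tokens")
```

Under this anchoring, by M = 250 B the M¹·⁵ reference exceeds the empirical fit by roughly a factor of 19, which is the divergence the summary describes.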
Citation: Evans, J. (2025). Evans M¹·⁵ Scaling Law: Empirical Update and Cross-Family Validation. PatternPulse.AI / B2B News Network, Version 3.1 (Nov 2025). Archived at Zenodo: DOI 10.5281/zenodo.17523736.