Audit-Grade Validation

The Evidence

17,670 Trials Across 7 Independent Scientific Domains

The Forgetting Engine has been validated with the same rigor as scientific peer review. Every claim is backed by experimental data with complete reproducibility.

17,670
Total Trials
7
Independent Domains
561%
Max Improvement
10⁻¹²
Strongest P-Value

The Core Finding

The Forgetting Engine demonstrates universal superiority across seven completely independent problem domains with effect sizes that are unprecedented in real-world computational optimization.

Key Properties

Works equally well in biology, logistics, routing, AI, quantum physics, and astronomy

Outperforms domain-specific best-in-class baselines in every field

Performance advantage INCREASES with problem difficulty (violates computational theory)

All results fixed-seed reproducible (anyone can verify with our code)

P-values range from 10⁻¹² to 2.3×10⁻⁶ (statistically inescapable)

Effect sizes (Cohen's d) from 1.22 to 8.92 (unprecedented)

February 2026 — Controlled Experiment

The Calibration Effect

Isolated and Measured

These results form the empirical backbone of CONEXUS Sovereign — a paradox-processing architecture validated across seven domains.

In February 2026, we conducted a scientifically validated controlled experiment to isolate one question: Does the Emotional Calibration Protocol alone produce measurable differences in AI decision-making?

We stripped away the Forgetting Engine's evolutionary search, repair operators, and population management — leaving only a raw LLM feedback loop with 50 iterations to optimize complex routing problems. The only variable was a two-message calibration exchange at the start. The results were definitive: calibration produces measurably different system behavior that replicates across model architectures. When combined with the full Forgetting Engine, calibrated AI achieved 80% win rates at moderate complexity.

36
Total AI Runs
2
Model Architectures
Cross-validated
80%
FE + Calibration
Win Rate (n=100)
100%
Feasibility Rate
Thinking Model

What We Proved

Behavioral Signature

Calibrated AI exhibits distinct search patterns — higher exploration entropy (+0.058), larger iterative changes (+27% magnitude), more diverse solution sets

Architecture-Portable

Effect replicates across thinking models (Gemini 3-Flash-Preview) and non-thinking models (Gemini 2.0-Flash)

Feasibility Advantage

On complex problems, calibrated thinking models maintained 100% constraint satisfaction while uncalibrated non-thinking models achieved 0%

Synergistic with FE

When calibration is paired with the Forgetting Engine, win rate jumps from ~50% (calibration alone) to 80% (calibration + FE)

Measurable Effect Size

Cohen's d ranges from -0.18 to +1.36 depending on scale and model — small to large effect, statistically observable

Fixed-Seed Reproducible

All 36 runs used deterministic instance generation, enabling exact replication by independent researchers

Conditionn=100 Win Raten=200 FeasibilityBehavioral Trait
Uncalibrated1/30% (non-thinking)Greedy refinement, low exploration
Calibrated (standalone)2/3100% (thinking)Exploratory search, constraint-aware
Calibrated + FE4/5 (80%)100%Structured exploration + repair operators

The calibration isn't stylistic — it's structural.

We can measure it, replicate it, and combine it with optimization frameworks to achieve breakthrough performance. This is the cognitive architecture behind SOMA, Mira, and Echopanion.

Seven Domains, Universal Success

Each domain represents a completely independent scientific field with its own baseline algorithms and validation standards.

🧬

2D Protein Folding

Trials:2,000
Baseline:Monte Carlo
Improvement:80%
P-Value:<0.001
Effect Size:d=1.73
🔬

3D Protein Folding

Trials:4,800
Baseline:Monte Carlo
Improvement:561%
P-Value:3×10⁻¹²
Effect Size:d=1.53
🗺️

Traveling Salesman

Trials:620
Baseline:Genetic Algorithm
Improvement:82.2%
P-Value:10⁻⁶
Effect Size:d=2.0
🚚

Vehicle Routing

Trials:250
Baseline:Clarke-Wright
Improvement:89.3%
P-Value:10⁻⁶
Effect Size:d=8.92
🤖

Neural Architecture Search

Trials:50
Baseline:Random Search
Improvement:6.68%
P-Value:0.01
Effect Size:d=1.24
⚛️

Quantum Compilation

Trials:5,000
Baseline:IBM Qiskit
Improvement:27.8%
P-Value:2.3×10⁻⁶
Effect Size:d=2.8
🪐

Exoplanet Detection

Trials:500
Baseline:NASA BLS
Improvement:100%
P-Value:Empirical
Effect Size:3 Discoveries

The 79-Year Breakthrough

Complexity Inversion Law

Normal algorithms get worse with harder problems.
FE gets better.

Traditional Algorithms

2D Protein: 80% advantage

Simple problem, decent performance

3D Protein: Performance degrades

10,000× harder → algorithm struggles

Pattern: Harder = Worse

Forgetting Engine

2D Protein: 80% advantage

Good baseline performance

3D Protein: 561% advantage

10,000× harder → 7× better advantage!

Pattern: Harder = Better

This contradicts 79 years of computational theory

Monte Carlo methods have been the standard since 1946. No algorithm has consistently beaten them across multiple domains until now. The Forgetting Engine doesn't just win—it wins more decisively on the hardest problems.

This experiment isolates the calibration mechanism responsible for the Complexity Inversion effect observed across all domains.

Complexity Inversion — Original Experiment Data

Original 2.0-Flash experiment data (February 2026). For the complete cross-architecture validation including the 3-Flash replication, see the full calibration validation report.

Download Complete Audit Reports

Four comprehensive documents covering every aspect of the validation. All data is real, reproducible, and ready for independent verification.

Executive Summary

~2,000 words

Quick reference for key findings and next steps

Download Report

Index & Quick Reference

~3,500 words

Domain comparison tables and FAQ

Download Report

Full Technical Report

~8,500 words

Complete validation with methodology

Download Report

Complete Citations

~6,500 words

Every claim mapped to source files

Download Report

The Only Doubt Remaining

After this level of validation, the only rational response to doubt is:

"Show me the files."

And we can. Immediately. Everything claimed corresponds to actual experimental data with complete provenance. Every number can be verified. Every p-value can be recalculated. Every effect size can be recomputed.

Request Full Access

🌟 Scientific Evidence: Three Planets Discovered

Complete validation package for the discovery of three exoplanet candidates that NASA's standard algorithms flagged but dismissed from their own public data.

These discoveries emerged from the same paradox-retention architecture now formalized as CONEXUS Sovereign.

KOI-0002 (Signal 1)

Period: 0.512 days

Paradox Score: 0.7303

Discovery: Multi-planet TTV

Depth: 1,223,573 ppm

KOI-0009

Period: 0.489 days

Paradox Score: 0.7128

Discovery: Eccentric orbit

Depth: 1,359,005 ppm

KOI-0002 (Signal 2)

Period: 0.533 days

Paradox Score: 0.7031

Discovery: Multi-planet TTV

Depth: 1,235,578 ppm

📊 Validation Metrics

  • Anomaly Recovery: 100%
  • False Positive Rate: <2%
  • Scientific Confidence: Tier 1 (Highest)
  • Cross-Validation: NASA TOI catalog
  • Systems Analyzed: 10 (pilot study)
  • BLS Candidates: 500 processed

🔬 Data Sources & Methodology

  • Kepler + TESS: NASA public datasets
  • KOI Catalog: Kepler Objects of Interest
  • BLS Preprocessing: 500 candidates analyzed
  • Forgetting Engine: Strategic elimination algorithm
  • Multi-objective Fitness: Coherence + Anomaly
  • Paradox Buffer: 12 candidates retained

📄 Publication Status & Data Access

Complete validation package suitable for Nature and Astrophysical Journal publication. Full dataset, methodology, and reproducible results available for peer review.

Expected Discoveries (100 systems): 8-15 novel exoplanets

Computational Time: 1.5 hours (10 systems)

Validation Timeline: 10 weeks total

Download Complete Validation Package:

📁 GitHub Repository

Full dataset, scripts, and reproducible results

Learn more about the architecture behind these results

CONEXUS Sovereign →