
Recursive Self Improvement in AI

The classic intelligence-explosion loop versus what's actually self-improving in 2025: agents that rewrite their own scaffolding, gated by verifiers, and bottlenecked by compute.

Recursive self-improvement (RSI) is the idea that an AI system could improve its own ability to improve itself, each generation building a better successor. The term covers two very different things in 2025: a 60-year-old theoretical loop that has never been observed, and a set of working systems that improve the code and prompts around a frozen model. The gap between them is where the interesting questions live.

The lineage: from Good to the Gödel machine

The seed idea is I.J. Good’s 1965 “Speculations Concerning the First Ultraintelligent Machine”: an ultraintelligent machine could design even better machines, so “the intelligence of man would be left far behind,” an “intelligence explosion.” Good’s loop is about returns on cognitive investment. If improving cognition raises the rate at which you can improve cognition, the curve goes vertical.

The first formal version is Jürgen Schmidhuber’s Gödel machine: a self-referential program that can rewrite any part of its own code, including the rewrite logic, but only when it can prove the change raises expected reward. It is provably optimal and practically inert, because proving such theorems is intractable. That tension defines the field. The clean theoretical object can’t run; the things that run aren’t clean.

What is actually self-improving in 2025

Three systems mark the current frontier, and they share a structure the theory didn’t predict: the foundation model stays frozen, and what improves is the scaffold around it (the agent’s own code, prompts, and tools).

The verifier is the load-bearing part

The common mechanism across every working system is a cheap, reliable verifier. AlphaEvolve has automated evaluators. The Darwin Gödel Machine has SWE-bench’s test suites. LADDER has a symbolic integration checker. The model proposes; the verifier disposes. Improvement compounds only in the slice of problem space where ground truth is cheap to check.

This is the same boundary that shows up in Model Collapse and the AI Data Crisis: synthetic data helps when a verifier filters the garbage out of the loop (AlphaZero’s game outcomes, a formal proof, a unit test) and degrades the model when it doesn’t. Self-improvement and self-poisoning are the same loop with and without an oracle. Coding and math are improving fast because they come with verifiers attached. Taste, strategy, and open-ended writing don’t, which is why nobody demonstrates RSI there. A January 2026 analysis argues this is structural, not temporary: without symbolic model synthesis, the singularity is not near.
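The claim that self-improvement and self-poisoning are the same loop with and without an oracle can be sketched as a toy simulation. Everything here is an illustrative assumption rather than a model of any real system: proposed changes are slightly harmful on average, the verifier sees the true effect of each change, and the self-assigned grade is pure noise.

```python
import random

def run_loop(steps, use_verifier, seed=0):
    """Toy self-improvement loop. Returns the final true quality."""
    rng = random.Random(seed)
    quality = 0.0  # ground-truth quality, invisible to the proposer
    for _ in range(steps):
        # Assumption: a typical proposed change is slightly harmful on average.
        true_delta = rng.gauss(-0.5, 1.0)
        if use_verifier:
            # Oracle: accept only changes that really help.
            accepted = true_delta > 0
        else:
            # Assumption: the self-assigned grade is pure noise,
            # uncorrelated with the true effect of the change.
            self_grade = rng.gauss(0.0, 1.0)
            accepted = self_grade > 0
        if accepted:
            quality += true_delta
    return quality

with_oracle = run_loop(400, use_verifier=True)
without_oracle = run_loop(400, use_verifier=False)
print(f"with verifier:    {with_oracle:+.1f}")
print(f"without verifier: {without_oracle:+.1f}")
```

With the verifier, quality can only ratchet upward; without it, the loop accepts about half of a stream of mostly harmful changes and drifts down. A self-grade that is partly correlated with the truth would sit somewhere between the two.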

Is it explosive? The compute-versus-labor debate

The live empirical question is whether automating AI research triggers Good’s runaway loop or hits a wall. Two pieces of evidence pull in opposite directions.

METR’s time-horizon work measures the length of task an agent can complete at 50% reliability and finds it doubling roughly every 7 months from 2019 to 2025, with their updated TH1.1 estimate putting the post-2023 doubling near 4.3 months. That is the curve an optimist extrapolates toward AI doing AI R&D.
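The gap between a 7-month and a 4.3-month doubling time is easier to feel as arithmetic. The sketch below assumes clean exponential growth; the 1-hour starting horizon and the 160-hour target (roughly a working month) are hypothetical round numbers chosen for illustration, not METR figures.

```python
import math

def horizon_after(months, start_hours, doubling_months):
    """Task horizon after `months` of steady exponential doubling."""
    return start_hours * 2 ** (months / doubling_months)

def months_to_reach(target_hours, start_hours, doubling_months):
    """Months needed for the horizon to grow from start to target."""
    return doubling_months * math.log2(target_hours / start_hours)

# Hypothetical start (1 hour) and target (160 hours), for illustration only.
for doubling in (7.0, 4.3):
    months = months_to_reach(160, 1, doubling)
    print(f"doubling every {doubling} months: {months:.1f} months to 160 h")
```

The extrapolation is only as good as the assumption that the trend holds, which is exactly what the rest of this section disputes.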

Against that, Whitfill and Wu’s “Will Compute Bottlenecks Prevent an Intelligence Explosion?” estimates the elasticity of substitution between research compute and cognitive labor across OpenAI, DeepMind, Anthropic, and DeepSeek from 2014 to 2024. Their baseline model says compute and labor are substitutes (you can trade more thinking for less hardware). But their frontier-experiments specification, which accounts for the scale of state-of-the-art runs, says they are complements. If they are complements, unlimited cognitive labor with fixed compute hits a hard ceiling: a software-only intelligence explosion would stall on hardware that no amount of cleverness can conjure away. The result depends on which specification you trust, which is the honest state of the debate. The 2025 systems automate research tasks; none has demonstrated the self-sustaining loop, and gains often shrink once inference-time tricks are stripped out, mirroring the human-review bottleneck documented in Writing Code vs Shipping Code - AI Productivity Across Tool Generations.
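Why complements imply a ceiling and substitutes do not can be seen in a constant-elasticity-of-substitution (CES) production function, the textbook form for this kind of question. This is a sketch under that assumption; the paper's exact specification may differ, and the share and `rho` values are arbitrary illustrations. The elasticity of substitution is `1 / (1 - rho)`, so `rho < 0` means complements and `0 < rho < 1` means substitutes.

```python
def ces_output(compute, labor, rho, share=0.5):
    """CES research output; rho != 0, elasticity = 1 / (1 - rho)."""
    return (share * compute**rho + (1 - share) * labor**rho) ** (1 / rho)

def complement_ceiling(compute, rho, share=0.5):
    """Limit of ces_output as labor grows without bound, valid for rho < 0."""
    return share ** (1 / rho) * compute

compute = 1.0  # fixed hardware budget (arbitrary units)
for labor in (1.0, 1e2, 1e4, 1e6):
    complements = ces_output(compute, labor, rho=-1.0)  # elasticity 0.5
    substitutes = ces_output(compute, labor, rho=0.5)   # elasticity 2.0
    print(f"labor={labor:>9.0f}  complements={complements:.4f}  "
          f"substitutes={substitutes:.1f}")

print("ceiling under complements:", complement_ceiling(compute, rho=-1.0))
```

Under complements, a millionfold increase in labor leaves output pinned just below the ceiling set by compute. Under substitutes, output keeps growing with labor alone.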

Try it

Build a minimal Darwin-Gödel loop (a weekend, Python + any LLM API). Write a tiny coding agent (read file, edit file, run tests) as a single script. Give it one job: improve its own edit function so it passes more cases on a held-out set of SWE-bench Lite tasks, or even a handful of LeetCode problems with hidden tests. Each iteration: ask the model to propose a code change to the agent, apply it in a sandbox, run the test suite, and keep the variant in an archive scored by pass rate. Watch for two things the Darwin Gödel Machine paper reports: emergent helper behaviors you didn’t prompt (it adds retry logic, logging, validation), and the moment progress flatlines once the easy verifier-gated wins are exhausted. The flatline is the whole argument made visible.
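The shape of that loop, with the LLM call and the benchmark replaced by stand-ins, might look like the sketch below. Everything in it is hypothetical scaffolding: `propose_change` is a stub that picks from a fixed list and ignores its parent, the "agent" is a one-function source string, `HIDDEN_TESTS` stands in for a real task set, and `exec` is not a sandbox.

```python
import random

# Stand-in for a held-out benchmark: (input, expected output) pairs.
HIDDEN_TESTS = [(0, 0), (1, 1), (2, 4), (3, 9), (-2, 4)]

# The starting agent, stored as source so it can be rewritten.
SEED_AGENT = "def solve(x):\n    return x\n"

def propose_change(parent_source, rng):
    """Stub for the LLM call. A real loop would send parent_source
    (plus failure logs) to a model and get back a rewritten agent."""
    candidates = [
        "def solve(x):\n    return x + x\n",
        "def solve(x):\n    return x * x\n",
        "def solve(x):\n    return abs(x)\n",
        "def solve(x):\n    return x ** 3\n",
    ]
    return rng.choice(candidates)

def pass_rate(source):
    """Verifier: fraction of hidden tests the variant passes.
    NOTE: exec is NOT a sandbox; run real agent code in isolation."""
    namespace = {}
    try:
        exec(source, namespace)
        solve = namespace["solve"]
        passed = sum(1 for x, want in HIDDEN_TESTS if solve(x) == want)
    except Exception:
        return 0.0  # a variant that crashes scores zero
    return passed / len(HIDDEN_TESTS)

def run(iterations=30, seed=0):
    rng = random.Random(seed)
    archive = [(pass_rate(SEED_AGENT), SEED_AGENT)]
    for _ in range(iterations):
        # Pick a parent, favoring high scorers but never excluding any.
        weights = [score + 0.1 for score, _ in archive]
        _, parent = rng.choices(archive, weights=weights, k=1)[0]
        child = propose_change(parent, rng)
        archive.append((pass_rate(child), child))
    return archive

archive = run()
best_score, best_source = max(archive, key=lambda entry: entry[0])
print(f"archive size: {len(archive)}, best pass rate: {best_score:.2f}")
```

Keeping every variant in the archive, rather than only the current best, lets a later iteration branch from a weaker parent. Swapping the stub for a real model call and `pass_rate` for a real test runner is where the weekend goes.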

Strip the verifier (1-2 hours, same setup). Re-run the loop but replace the test suite with the model grading its own output. Self-improvement should stall or reverse almost immediately. That contrast is the verifier dependency from the section above, reproduced on your laptop.
