Jon Moshier / Notes / Marginal Risk Framework draft
Note · From the Notebook

Marginal Risk Framework

A method borrowed from cyber threat modeling for asking not whether an AI model is dangerous, but how much danger it adds over tools that already exist.

Ask whether an open AI model is dangerous and you get an alarming answer: yes, it can explain how to build a weapon. Ask how much danger it adds over a search engine, a textbook, or a closed model you can already rent, and the answer often shrinks toward zero. That gap — what the release adds, not what the model can do — is marginal risk.

Kapoor, Narayanan, Bommasani and colleagues set out the idea in On the Societal Impact of Open Foundation Models (2024), borrowing it from cybersecurity threat modeling.

The six steps

Work through them in order; each step depends on the answers to the ones before it. Stanford HAI lays out the sequence:

  1. Name the threat and the actor. “Anyone with a browser” and “a state bioweapons program” are not the same threat.
  2. Existing risk. How much of this already exists without the model?
  3. Existing defenses. What already blocks or blunts it?
  4. Marginal risk. Given steps 2 and 3, what does the release actually add?
  5. Cost of defending. Can cheap or existing defenses absorb that addition?
  6. What you don’t know. State your assumptions and what would change your mind.
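To make the ordering concrete, here is a minimal Python sketch that treats the six steps as a worksheet. The class name `MarginalRiskAssessment`, its field names, and the `first_gap` helper are hypothetical and do not come from the paper or from Stanford HAI. The sketch only reports which step is still blank. It cannot tell you whether an answer is any good.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class MarginalRiskAssessment:
    """Hypothetical worksheet for the six steps; field names are illustrative."""
    threat_and_actor: Optional[str] = None   # step 1
    existing_risk: Optional[str] = None      # step 2
    existing_defenses: Optional[str] = None  # step 3
    marginal_risk: Optional[str] = None      # step 4
    cost_of_defending: Optional[str] = None  # step 5
    uncertainty: Optional[str] = None        # step 6

    def first_gap(self) -> Optional[int]:
        """Return the 1-based number of the first blank step, or None if all six are filled."""
        # fields() yields the dataclass fields in definition order, i.e. step order.
        for number, f in enumerate(fields(self), start=1):
            value = getattr(self, f.name)
            if value is None or not value.strip():
                return number
        return None


# A claim that stops at "the model can do X" has filled in step 1 at most.
capability_only = MarginalRiskAssessment(
    threat_and_actor="Anyone with a browser drafting phishing emails",
)
print(capability_only.first_gap())  # 2
```

The check fails at step 2, which is exactly where a raw-capability claim skips ahead.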

Jump straight to “the model can do X” and you have done step 1 and skipped the rest.

The baseline decides the answer

Added risk means nothing without a starting point, and the point you pick drives the result. The paper’s biosecurity case shows it. Open models can describe pandemic pathogens accurately — but that information is already online. If the model only echoes what a search returns, its added risk is near zero, however alarming its raw capability sounds. Against nothing, it looks dangerous. Against Google and a library, it may not.
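A minimal sketch of the baseline comparison, assuming risk can be collapsed into a single number (a large simplification the paper does not make). Marginal risk here is simply risk with the release minus risk under the chosen baseline. The function name and the scores are invented for illustration. They are not measurements from the paper.

```python
def marginal_risk(risk_with_release: float, baseline_risk: float) -> float:
    """Added risk: risk once the model is released minus risk under the chosen baseline."""
    return risk_with_release - baseline_risk


# Hypothetical scores on an arbitrary 0-1 scale; not measurements.
risk_with_open_model = 0.6
baselines = {
    "nothing": 0.0,
    "search engine + library": 0.6,  # the model only echoes what a search returns
}

for name, baseline in baselines.items():
    print(f"against {name}: {marginal_risk(risk_with_open_model, baseline):.1f}")
# against nothing: 0.6
# against search engine + library: 0.0
```

The model's capability is identical in both lines. Only the baseline changed.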

The paper’s main finding annoys both sides equally. Across cyberattacks, bioweapons, and disinformation, the evidence was too thin to size the added risk either way. The framework gives shape to an argument almost no one has yet run on real data. This overlaps with the note Open Weight vs Proprietary (vs Open Source): you can’t recall an open release, so a wrong estimate is costly.

Where it gets contested

Step 6 carries the weight. Low risk today doesn’t promise low risk tomorrow, the authors warn, because model capability and public defenses both keep moving. A “web search” baseline holds only while models add nothing search can’t — and that won’t hold forever. The UK AI Safety Institute treats the danger line as sliding with capability, not fixed.

Critics fear the framework lets people relax too soon: every release looks small against the last, even as the floor rises under all of them. Defenders reply that the alternative — judging models on raw capability — would block nearly every open release and forfeit the transparency and competition open models bring.
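The critics’ worry is arithmetic: small steps add up. A minimal sketch with invented integer scores and a hypothetical threshold for “meaningful added risk” shows every release staying under the threshold when judged against its predecessor, while the total rise since the first release is well over it. The scores and the threshold are illustrative only.

```python
# Invented risk scores for six successive open releases; illustrative only.
release_levels = [10, 15, 20, 25, 30, 35]
threshold = 10  # hypothetical line for "meaningful added risk"

# Each release judged against the one before it: the baseline slides up.
step_increments = [curr - prev for prev, curr in zip(release_levels, release_levels[1:])]

# The same releases judged against the first release: a fixed baseline.
cumulative_rise = release_levels[-1] - release_levels[0]

print(step_increments)                                    # [5, 5, 5, 5, 5]
print(all(step < threshold for step in step_increments))  # True
print(cumulative_rise, cumulative_rise > threshold)       # 25 True
```

The defenders’ reply amounts to choosing the fixed baseline of zero instead, which flags every release.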

Try it

Run the six steps on a real threat (1–2 hours, no code). Take one misuse claim from a model’s release notes or a news story, say, “this open model helps write phishing emails.” Write a paragraph per step. Step 3 is the interesting one: list what already defends against it (spam filters; the fact that writing the email was never the hard part). Then size step 4 honestly. Most scary claims collapse at step 2 or 3; the few that survive are the ones worth regulating. Set your write-up beside the original claim and note how much of that claim was raw capability stated with no baseline.
