Openwashing — Jon Moshier

Openwashing is using the word “open” for its reputational halo while withholding the substance: source, data, or the legal freedoms that “open source” classically grants. The term was coined in 2009 by internet-policy researcher Michelle Thorne, modeled on greenwashing (itself from Jay Westerveld’s 1986 essay on hotels that asked guests to reuse towels “for the planet” while doing nothing else). In AI it has gone from fringe complaint to a documented market dynamic.

The mechanism: openness is composite and gradient

The reason openwashing works is that “open” is not one property. In “Rethinking open source generative AI: open-washing and the EU AI Act” (FAccT 2024), Andreas Liesenfeld and Mark Dingemanse surveyed 45 models marketed as open and scored each across 14 dimensions: training data, code, weights, documentation, licensing, access method, and more. Openness, they argue, is composite (many parts) and gradient (degrees), not a binary.

That structure is exploitable. A vendor can max out the one cheap, visible dimension (downloadable weights) and score zero on the expensive ones (training data, reproducible pipeline) while still saying “open” in the headline. The survey’s finding: only a handful of the 45, such as AllenAI’s OLMo and BigScience’s BloomZ, were open in a meaningful sense. Models from Meta, Google, and Microsoft scored high on marketing language and low on actual disclosure. A 2024 Nature piece described calling Llama 3 “open” as a case of openwashing: the label applied to systems better understood as closed. This is the practical face of the distinction drawn in Open Weight vs Proprietary (vs Open Source): open weight is one dimension, not the whole picture.

The regulatory incentive that made it worse

Openwashing in AI is not just vanity. It has a payoff written into law. The EU AI Act grants lighter obligations to “open source” models but does not clearly define the term. So calling a model open can mean fewer transparency requirements and less regulatory and scientific scrutiny. The vaguer the legal definition, the larger the reward for claiming the label, which is exactly why a rigorous, enforceable definition (the kind the OSI’s OSAID attempts) is contested terrain rather than a dry technicality.

Try it

Score a model’s openness yourself (about an hour, no code). Pick a model that calls itself “open source” (Llama, Mistral, or for contrast OLMo). Check it against a short version of the Liesenfeld–Dingemanse axes. Are the weights downloadable? Is the training code public? Is the training data public? Is there a real paper? What does the license actually permit? Can you use it commercially without a cap? Mark each yes/partial/no. What you’re looking for: most “open source” models pass two or three axes and fail the rest, and the axes they fail hardest (training data, unrestricted license) are the ones that matter most for reproducibility and scrutiny.
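
The exercise needs no code, but if you would rather tally the marks than eyeball them, the checklist can be sketched in Python. This is a rough illustration, not the scoring method from the paper: the axis names, the 0 / 0.5 / 1 weighting, and the marks in example_marks are all assumptions made up for the example and do not describe any real model.

```python
from enum import Enum


class Mark(Enum):
    # Assumed weighting for illustration: no = 0, partial = 0.5, yes = 1.
    NO = 0.0
    PARTIAL = 0.5
    YES = 1.0


# The six questions from the exercise, as short (hypothetical) axis names.
AXES = (
    "weights_downloadable",
    "training_code_public",
    "training_data_public",
    "real_paper",
    "license_terms",
    "commercial_use_uncapped",
)


def summarize(marks):
    """Group axes by mark and compute an average openness score in [0, 1]."""
    missing = [axis for axis in AXES if axis not in marks]
    if missing:
        raise ValueError(f"unscored axes: {missing}")
    return {
        "passed": [a for a in AXES if marks[a] is Mark.YES],
        "partial": [a for a in AXES if marks[a] is Mark.PARTIAL],
        "failed": [a for a in AXES if marks[a] is Mark.NO],
        "score": sum(marks[a].value for a in AXES) / len(AXES),
    }


# Placeholder marks for an imaginary model; fill in your own findings.
example_marks = {
    "weights_downloadable": Mark.YES,
    "training_code_public": Mark.PARTIAL,
    "training_data_public": Mark.NO,
    "real_paper": Mark.YES,
    "license_terms": Mark.PARTIAL,
    "commercial_use_uncapped": Mark.NO,
}

if __name__ == "__main__":
    result = summarize(example_marks)
    for key, value in result.items():
        print(f"{key}: {value}")
```

The per-axis lists matter more than the single average: a model can reach a middling score on downloadable weights and a paper alone, which is the pattern the exercise is meant to expose.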

Sources

Rethinking open source generative AI: open-washing and the EU AI Act — Liesenfeld & Dingemanse, FAccT 2024. The 45-model survey and the 14-dimension framework.
Radboud University summary — accessible writeup of the study.
The open secret of open washing — The Register on the Llama 3 case and the term’s history.
