SPACE Metrics

SPACE is a 2021 framework for measuring developer productivity, published in ACM Queue by Nicole Forsgren, Margaret-Anne Storey, Chandra Maddila, Thomas Zimmermann, Brian Houck, and Jenna Butler (Microsoft Research, GitHub, and the University of Victoria). Its contribution was not a metric. It was a stance: productivity is multi-dimensional, and any attempt to capture it in one number is measuring the wrong thing. Where DORA Metrics measures the delivery system, SPACE measures the developers and the work.

The five dimensions

The acronym is the model. Each dimension is deliberately broad, and the framework gives example metrics rather than a fixed set.

S — Satisfaction and well-being. How fulfilled developers are with their work, tools, and team, and whether they are burning out. Example metrics: developer satisfaction surveys, employee net promoter score, retention, satisfaction with tooling and code review.
P — Performance. Outcomes, not output. The quality and impact of what ships. Example metrics: reliability and absence of defects, customer satisfaction, feature adoption. This is outcome-based on purpose; counting commits is not performance.
A — Activity. Counts of actions: commits, pull requests, code reviews completed, builds, deployments, incidents resolved. The framework is blunt that activity is easy to measure and easy to misread, because activity is not the same as value.
C — Communication and collaboration. How work and knowledge flow between people. Example metrics: code-review speed and quality, documentation discoverability, onboarding time for new members, network measures of who works with whom.
E — Efficiency and flow. The ability to do work with minimal interruptions and handoffs. Example metrics: uninterrupted focus time, number of handoffs in the value stream, wait time between steps, perceived ability to stay in flow.

The three rules that make it work

SPACE is less a list than a method, and the method has three load-bearing constraints.

Use metrics from more than one dimension. The framework’s core instruction is to capture several metrics across multiple dimensions, with at least three dimensions recommended. A single dimension, like a single metric, invites gaming. Activity, satisfaction, and performance together triangulate in a way that no one of them can alone.

Include perceptual data, not just system data. This is the sharpest break from DORA and the metrics-dashboard tradition. A large share of what SPACE recommends measuring comes from asking developers, not from instrumenting Git. Some of the most important things (satisfaction, flow, whether collaboration is working) leave no trace in system logs and can only be surveyed. A framework that only counts what tools emit is structurally blind to them.

Do not measure individuals. SPACE states plainly that its metrics are meant for understanding teams and systems, not for ranking or evaluating people. Applied to an individual, activity metrics reward the wrong behavior, and perception surveys have samples too small to trust.
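The three rules can be read as a checklist to run against a proposed metric set before collecting anything. The sketch below is one hypothetical way to encode that check in Python. The names `Metric` and `check_selection`, the `level` labels, and the default of three dimensions are assumptions made for illustration; none of them come from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Dimension(Enum):
    SATISFACTION = "S"
    PERFORMANCE = "P"
    ACTIVITY = "A"
    COMMUNICATION = "C"
    EFFICIENCY = "E"


@dataclass(frozen=True)
class Metric:
    name: str
    dimension: Dimension
    perceptual: bool  # True if the data comes from asking developers
    level: str        # assumed labels: "team", "system", or "individual"


def check_selection(metrics, min_dimensions=3):
    """Return a list of rule violations; an empty list means the set passes."""
    problems = []

    # Rule 1: span several dimensions (the threshold is an adjustable assumption).
    dimensions = {m.dimension for m in metrics}
    if len(dimensions) < min_dimensions:
        problems.append(
            f"only {len(dimensions)} dimension(s) covered, want at least {min_dimensions}"
        )

    # Rule 2: at least one metric must be perceptual (survey-based).
    if not any(m.perceptual for m in metrics):
        problems.append("no perceptual (survey) metric included")

    # Rule 3: nothing is measured per person.
    for m in metrics:
        if m.level == "individual":
            problems.append(f"{m.name} is measured per individual")

    return problems
```

A set made of team-level pull request cycle time, a team pulse survey, and a system-level defect count passes; a lone per-person commit count fails all three checks. The check says nothing about whether the chosen metrics are good ones, only whether the set has the shape the rules ask for.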

The five myths it was written to kill

The paper is organized around debunking five common beliefs, and the list is the most useful thing in it:

Productivity is all about developer activity. (More activity is not more value.)
Productivity is only about individual performance. (It is a team and system property.)
One productivity metric can tell us everything. (No single number captures it.)
Productivity measures are useful only for managers. (Developers use them to improve their own work.)
Productivity is only about engineering systems and tools. (Satisfaction and collaboration matter as much as pipelines.)

Where it falls short

SPACE’s strength, its refusal to prescribe, is also its practical weakness. There is no standard implementation, so two organizations “using SPACE” may share nothing but the acronym, and benchmarking across companies is close to impossible. The perceptual half depends on surveys, which carry ongoing cost and decay fast: run the survey monthly and, by Larridin’s analysis, response rates fall below 30% within two quarters, at which point the data is a self-selected sample. And the Activity dimension has aged badly in the AI era. It counts commits and PRs with no mechanism to attribute authorship, so when an AI generates a large share of the code, Activity rewards volume regardless of origin. This is part of why DX Core 4 later tried to package SPACE’s multi-dimensional insight into a fixed, comparable scorecard, a case its vendor (DX) has commercial reason to press.
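The survey-decay problem is easy to watch for if each round records how many people were invited and how many answered. This is a minimal sketch under that assumption. The function names and the tuple layout are hypothetical, and the 30% default simply reuses the figure quoted above rather than any established standard.

```python
def response_rate(responses, invited):
    """Fraction of invited developers who answered a survey round."""
    if invited <= 0:
        raise ValueError("invited must be positive")
    return responses / invited


def flag_low_response(rounds, threshold=0.30):
    """Return labels of rounds whose response rate fell below the threshold.

    rounds is a list of (label, responses, invited) tuples.
    """
    return [
        label
        for label, responses, invited in rounds
        if response_rate(responses, invited) < threshold
    ]
```

For example, `flag_low_response([("Jan", 18, 20), ("Apr", 5, 20)])` returns `["Apr"]`. A flagged round is a prompt to treat that round's scores as a self-selected sample, not as the team's view.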

There is a deeper move worth naming. SPACE tells you to measure across dimensions but not how to weigh them or what a good reading looks like, so in a strict sense it cannot be wrong: any bad outcome can be explained away as having picked the wrong metrics. It hands the hardest decision, the tradeoff between dimensions, back to the reader and calls that flexibility. The “Try it” experiment below quietly resolves one such tradeoff (a satisfaction drop overriding a cycle-time gain) that the framework itself never authorizes.

Try it

Run a minimal SPACE reading on one team for one sprint (a few hours to set up, one sprint to collect). Pick one metric from each of three different dimensions so the triangulation is real: pull request cycle time from a Git query (Efficiency), a three-question pulse survey on satisfaction and flow (Satisfaction), and a simple count of production incidents or defect escapes (Performance). Put the three side by side at sprint’s end, next to the previous sprint’s numbers if you have them. Watch for the contradiction SPACE exists to surface: a sprint where cycle time improved but the satisfaction score dropped is a team going faster by grinding people down, which a single-dimension dashboard would have reported as an unambiguous win.
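Once the numbers are collected, the side-by-side comparison can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `SprintReading` type, the 1-to-5 survey scale, the use of median cycle time and mean survey score, and the comparison against a previous sprint as a baseline. The incident check is an extra comparison added in the same spirit as the one described above.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass(frozen=True)
class SprintReading:
    pr_cycle_hours: list  # Efficiency: hours from PR open to merge
    survey_scores: list   # Satisfaction: pulse answers, assumed 1-5 scale
    incidents: int        # Performance: production incidents or defect escapes

    def summary(self):
        return {
            "median_cycle_hours": median(self.pr_cycle_hours),
            "mean_satisfaction": mean(self.survey_scores),
            "incidents": self.incidents,
        }


def contradictions(previous, current):
    """List cases where a cycle-time gain coincides with a loss elsewhere."""
    prev, cur = previous.summary(), current.summary()
    notes = []
    faster = cur["median_cycle_hours"] < prev["median_cycle_hours"]
    if faster and cur["mean_satisfaction"] < prev["mean_satisfaction"]:
        notes.append("cycle time improved but satisfaction dropped")
    if faster and cur["incidents"] > prev["incidents"]:
        notes.append("cycle time improved but incidents rose")
    return notes
```

The function only surfaces the contradiction. It does not decide which dimension should win, which is the tradeoff the framework leaves to the reader.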

Sources

Forsgren, Storey, Maddila, Zimmermann, Houck, Butler, “The SPACE of Developer Productivity,” ACM Queue (2021) — the original paper: the five dimensions, the example metrics, the three usage rules, and the five myths.
Communications of the ACM version of the same paper — the wider-circulation reprint.
DX, “The SPACE of Developer Productivity: there’s more to it than you think” — practitioner walkthrough of applying the dimensions.
Larridin, “SPACE Framework Explained (2026): What It Measures and Where It Falls Short” — the survey-fatigue and AI-attribution critiques.