Jon Moshier / Notes / SPACE Metrics budding
Note · From the Notebook

SPACE Metrics

The 2021 framework that reframed developer productivity as five dimensions instead of one number, insisting on perception data and refusing individual measurement.

SPACE Metrics

SPACE is a 2021 framework for measuring developer productivity, published in ACM Queue by Nicole Forsgren, Margaret-Anne Storey, Chandra Maddila, Thomas Zimmermann, Brian Houck, and Jenna Butler (Microsoft Research, GitHub, and the University of Victoria). Its contribution was not a metric. It was a stance: productivity is multi-dimensional, and any attempt to capture it in one number is measuring the wrong thing. Where DORA Metrics measures the delivery system, SPACE measures the developers and the work.

The five dimensions

The acronym is the model. Each dimension is deliberately broad, and the framework gives example metrics rather than a fixed set.

The three rules that make it work

SPACE is less a list than a method, and the method has three load-bearing constraints.

Use metrics from more than one dimension. The framework’s core instruction is to pick two or three metrics spanning at least two or three different dimensions. A single dimension, like a single metric, invites gaming. Activity plus satisfaction plus performance triangulate in a way that any one alone cannot.

Include perceptual data, not just system data. This is the sharpest break from DORA and the metrics-dashboard tradition. A large share of what SPACE recommends measuring comes from asking developers, not from instrumenting Git. Some of the most important things (satisfaction, flow, whether collaboration is working) leave no trace in system logs and can only be surveyed. A framework that only counts what tools emit is structurally blind to them.

Do not measure individuals. SPACE states plainly that these are meant for teams and systems, not for ranking or evaluating people. Applied to an individual, activity metrics reward the wrong behavior and perception surveys have samples too small to trust.

The five myths it was written to kill

The paper is organized around debunking five common beliefs, and the list is the most useful thing in it:

  1. Productivity is all about developer activity. (More activity is not more value.)
  2. Productivity is only about individual performance. (It is a team and system property.)
  3. One productivity metric can tell us everything. (No single number captures it.)
  4. Productivity measures are useful only for managers. (Developers use them to improve their own work.)
  5. Productivity is only about engineering systems and tools. (Satisfaction and collaboration matter as much as pipelines.)

Where it falls short

SPACE’s strength, its refusal to prescribe, is also its practical weakness. There is no standard implementation, so two organizations “using SPACE” may share nothing but the acronym, and benchmarking across companies is close to impossible. The perception half depends on surveys, which carry ongoing cost and decay fast: run the survey monthly and, by one analysis, response rates fall below 30% within two quarters, at which point the data is a self-selected sample. And the Activity dimension has aged badly in the AI era. It counts commits and PRs with no mechanism to attribute authorship, so when an AI generates a large share of the code, Activity rewards volume regardless of origin. This is part of why DX Core 4 later tried to package SPACE’s multi-dimensional insight into a fixed, comparable scorecard, a case its vendor (DX) has commercial reason to press.

There is a deeper move worth naming. SPACE tells you to measure across dimensions but not how to weigh them or what a good reading looks like, so in a strict sense it cannot be wrong: any bad outcome can be explained away as having picked the wrong metrics. It hands the hardest decision, the tradeoff between dimensions, back to the reader and calls that flexibility. The “Try it” experiment below quietly resolves one such tradeoff (a satisfaction drop overriding a cycle-time gain) that the framework itself never authorizes.

Try it

Run a minimal SPACE reading on one team for one sprint (a few hours to set up, one sprint to collect). Pick one metric from three different dimensions so the triangulation is real: pull request cycle time from a Git query (Efficiency), a three-question pulse survey on satisfaction and flow (Satisfaction), and a simple count of production incidents or defect escapes (Performance). Put the three side by side at sprint’s end. Watch for the contradiction SPACE exists to surface: a sprint where cycle time improved but the satisfaction score dropped is a team going faster by grinding people down, which a single-dimension dashboard would have reported as an unambiguous win.

See also

Sources

← All notes Read recent essays →