DX Core 4

DX Core 4 is a 2024 measurement framework from DX (the developer-productivity company, led by Abi Noda with CTO Laura Tacho) that tries to end the DORA-versus-SPACE-versus-DevEx confusion by folding all three into one system of four dimensions. Its bet: teams do not need to choose a framework; they need one dashboard where speed cannot hide the cost it imposes.

The four dimensions and their primary metrics

Each dimension carries one primary metric plus a few secondaries. The primaries carry the design:

Speed — primary metric diffs per engineer (merged pull/merge requests per engineer). Secondaries borrow DORA’s lead time and deployment frequency and add perceived rate of delivery, the rate at which developers feel they are shipping valuable work.
Effectiveness — primary metric the Developer Experience Index (DXI), an aggregate score built from 14 standardized survey items covering things like code quality, focus time, and CI/CD. Secondaries include ease of delivery and time to a developer’s tenth PR.
Quality — primary metric change failure rate (straight from DORA). Secondaries include failed-deployment recovery time, perceived software quality, and operational health.
Impact — the “are we building the right things” dimension. Primary metric percentage of engineering time spent on new capabilities (versus maintenance, support, and toil). Secondaries include revenue or initiative outcomes per engineer.

The framework is deliberate about “one primary metric per dimension.” A dashboard with four numbers gets looked at; a dashboard with forty gets ignored.
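
The list above can be written down as a small data structure, which makes the "four numbers, not forty" rule easy to hold onto. This is a minimal sketch: the `Dimension` class, the `CORE4` constant, and the `headline_metrics` helper are hypothetical names chosen for illustration, not part of any DX tooling, and the metric names are copied from the list above.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Dimension:
    """One Core 4 dimension: a single primary metric plus secondaries."""
    name: str
    primary: str
    secondaries: Tuple[str, ...]


# Hypothetical encoding of the four dimensions as described in the text.
CORE4 = (
    Dimension("Speed", "diffs per engineer",
              ("lead time", "deployment frequency",
               "perceived rate of delivery")),
    Dimension("Effectiveness", "Developer Experience Index (DXI)",
              ("ease of delivery", "time to tenth PR")),
    Dimension("Quality", "change failure rate",
              ("failed-deployment recovery time",
               "perceived software quality", "operational health")),
    Dimension("Impact", "percentage of time on new capabilities",
              ("revenue or initiative outcomes per engineer",)),
)


def headline_metrics(dimensions=CORE4) -> Dict[str, str]:
    """Return the dashboard view: exactly one primary metric per dimension."""
    return {d.name: d.primary for d in dimensions}


for name, metric in headline_metrics().items():
    print(f"{name}: {metric}")
```

Keeping the secondaries in the structure but out of `headline_metrics` mirrors the framework's choice: the detail is available for diagnosis, while the default view stays at four numbers.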

Counterbalancing is the actual idea

The stated design principle: “the four dimensions are designed to hold each other in tension.” Speed and output metrics used alone invite fear and gaming, so DX Core 4 gives the DXI and quality data equal standing beside them on the dashboard. A team cannot post a great diffs-per-engineer number and hide a cratering DXI or a climbing change failure rate, because all three sit on the same dashboard. This is the same instinct behind DORA’s throughput-versus-stability pairing (DORA Metrics) and SPACE’s insistence on multiple dimensions (SPACE Metrics), generalized into a single scorecard.

The honest limit: co-presence exposes a trade, it does not prevent one. The dashboard only protects a reader who actually looks at all four numbers. A manager who fixates on diffs and ignores the other three gets no more safety from Core 4 than from a single-metric dashboard; the framework supplies visibility, not enforcement.
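
One way to make sure all four numbers actually get read together is to compare two snapshots mechanically and flag any period where speed rose while another dimension got worse. The sketch below assumes a scorecard stored as a plain dictionary; the key names, the `find_tensions` function, and the sample values are all hypothetical, and the check only reports a trade, in keeping with the framework's visibility-not-enforcement limit.

```python
def find_tensions(previous, current):
    """Return warnings for dimensions that degraded while speed increased.

    Both arguments are dicts with the hypothetical keys used below.
    An empty list means no speed-versus-cost trade was detected.
    """
    warnings = []
    # Only look for hidden costs when the speed metric actually went up.
    if current["diffs_per_engineer"] <= previous["diffs_per_engineer"]:
        return warnings
    if current["dxi"] < previous["dxi"]:
        warnings.append("speed up, DXI down")
    if current["change_failure_rate"] > previous["change_failure_rate"]:
        warnings.append("speed up, change failure rate up")
    if current["pct_new_capabilities"] < previous["pct_new_capabilities"]:
        warnings.append("speed up, share of time on new capabilities down")
    return warnings


# Illustrative numbers only, not benchmarks from DX.
last_month = {"diffs_per_engineer": 3.0, "dxi": 70,
              "change_failure_rate": 0.05, "pct_new_capabilities": 0.60}
this_month = {"diffs_per_engineer": 4.2, "dxi": 64,
              "change_failure_rate": 0.09, "pct_new_capabilities": 0.60}

for warning in find_tensions(last_month, this_month):
    print(warning)
```

The function deliberately returns descriptions rather than a pass/fail verdict: deciding whether a trade was worth it stays with the people reading the scorecard.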

Diffs per engineer is the controversial choice, and DX knows it. It is a throughput count, and throughput counts are exactly what Goodhart’s law wrecks. DX’s own guidance is that it works only as a team-level signal read alongside the other three dimensions, never as an individual target. The moment “diffs per engineer” lands in a performance review, it stops measuring flow and starts measuring PR-splitting.

The DXI: turning perception into a number

The most distinctive piece is the Developer Experience Index. It is a single number aggregated from Likert-scale survey responses, built on data from over 40,000 developers across 800 organizations. DX reports that moving the DXI up one point corresponds to saving roughly 13 minutes per developer per week, about 10 hours a year per developer. Treat that as a vendor-derived correlation, not a controlled causal estimate: DX sells developer-experience measurement, and the relationship is drawn from their own survey population. The useful part is the move it makes, converting the friction developers feel into a tracked quantity that sits next to the throughput numbers instead of being dismissed as soft.
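
The arithmetic behind that claim is easy to reproduce. The sketch below takes DX's reported 13 minutes per developer per week per DXI point at face value and scales it linearly, which is itself an assumption. The `working_weeks=46` default is also an assumption of this sketch, chosen because it lands near the quoted 10 hours; the source does not say how the annual figure was derived, and a full 52 weeks gives a little over 11 hours.

```python
# DX's reported figure; a vendor-derived correlation, not a causal estimate.
MINUTES_PER_DXI_POINT_PER_WEEK = 13


def hours_per_year(dxi_points, developers=1, working_weeks=46):
    """Estimate hours per year associated with a DXI change.

    Assumes the relationship scales linearly with points and headcount,
    and that a year has `working_weeks` working weeks (an assumption).
    """
    minutes = (MINUTES_PER_DXI_POINT_PER_WEEK
               * dxi_points * developers * working_weeks)
    return minutes / 60


print(round(hours_per_year(1), 1))                    # one point, one developer
print(round(hours_per_year(1, working_weeks=52), 1))  # same, full calendar year
print(hours_per_year(3, developers=200))              # hypothetical 200-person org
```

Any figure this produces inherits the caveat above: it is a projection from a survey correlation, useful for sizing a conversation rather than for claiming measured savings.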

Why it caught on for AI measurement

DX Core 4 became a common backbone for measuring AI’s effect on engineering, and the reason is structural. AI pushes hardest on the speed dimension: it makes diffs cheap and plentiful. Read speed alone and every AI rollout looks like a triumph. The other three dimensions are exactly the counterweights that keep that honest. Effectiveness (DXI) catches whether developers are actually experiencing less friction or just producing more to review. Quality (change failure rate) catches whether the cheap code is breaking production. Impact catches whether any of the extra output reached a user. See SDLC Delivery Metrics for AI-Assisted Engineering for the underlying pattern: AI floods the top of the pipeline, so the honest metrics live downstream.

Try it

Assemble a one-page Core 4 scorecard for one team (a weekend's work: a spreadsheet plus a short survey). Pull the three system metrics you already have from Git and CI: diffs per engineer, change failure rate, and lead time. For the Effectiveness dimension, send a short pulse survey with a handful of the DXI-style Likert items (rate agreement, 1 to 5, on statements about code quality, focus time, and build/deploy friction) and average them. For the Impact dimension, estimate the share of the team's time that went to new capabilities rather than maintenance, support, and toil; a rough split is enough to start. Lay all four dimensions on one page and track them monthly. Watch for the tension the framework is built to expose: a month where diffs per engineer rose but the survey score fell is a team being pushed to output at the cost of experience, which is the exact failure mode a speed-only dashboard would have called a success.
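
The exercise above can be sketched in a few lines once the raw records are exported. Everything here is illustrative: the record shapes, field names (`committed_at`, `deployed_at`, `failed`), the `build_scorecard` function, and the sample data are hypothetical, the averaged survey score is a stand-in for the DXI rather than the DXI itself, and the Impact figure is passed in as a manual estimate.

```python
from datetime import datetime
from statistics import mean, median


def build_scorecard(merged_prs, deployments, survey_responses,
                    headcount, pct_new_capabilities):
    """Build a one-period, team-level scorecard from exported records."""
    if headcount <= 0 or not merged_prs or not deployments or not survey_responses:
        raise ValueError("need a positive headcount and non-empty inputs")
    # Lead time per change, in hours, from commit to deployment.
    lead_times = [
        (pr["deployed_at"] - pr["committed_at"]).total_seconds() / 3600
        for pr in merged_prs
    ]
    failed = sum(1 for d in deployments if d["failed"])
    return {
        "diffs_per_engineer": len(merged_prs) / headcount,
        "median_lead_time_hours": median(lead_times),
        "change_failure_rate": failed / len(deployments),
        # Average of all 1-to-5 Likert answers across all respondents.
        "survey_score": mean(a for resp in survey_responses for a in resp),
        "pct_new_capabilities": pct_new_capabilities,
    }


# Hypothetical month of data for a two-person team.
prs = [
    {"committed_at": datetime(2025, 3, 3, 9), "deployed_at": datetime(2025, 3, 3, 15)},
    {"committed_at": datetime(2025, 3, 4, 9), "deployed_at": datetime(2025, 3, 5, 9)},
    {"committed_at": datetime(2025, 3, 6, 10), "deployed_at": datetime(2025, 3, 6, 22)},
    {"committed_at": datetime(2025, 3, 10, 9), "deployed_at": datetime(2025, 3, 10, 11)},
]
deploys = [{"failed": False}, {"failed": False}, {"failed": False}, {"failed": True}]
survey = [[4, 3, 5], [2, 3, 4]]  # one list of Likert answers per respondent

card = build_scorecard(prs, deploys, survey, headcount=2, pct_new_capabilities=0.6)
for metric, value in card.items():
    print(f"{metric}: {value}")
```

Dividing by team headcount rather than grouping by author keeps diffs per engineer a team-level number, which is how the framework says it should be read.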

Sources

DX, “Measuring developer productivity with the DX Core 4” — the framework definition, the four dimensions, primary and secondary metrics, and the DXI benchmarks.
Abi Noda, “Introducing the DX Core 4” — the original announcement and the counterbalancing design rationale.
LeadDev, “How DX Core 4 aims to unify developer productivity frameworks” — independent coverage of how it folds DORA, SPACE, and DevEx together, with the debate over diffs per engineer.