Jon Moshier / Notes / Theory of Constraints budding

Theory of Constraints

Goldratt's claim that every system's output is set by a single binding constraint, the five-step method for exploiting it, and the accounting and scheduling machinery that follow from taking that claim seriously.

The Theory of Constraints starts from one structural claim: in any system aimed at a goal, throughput is governed by a single most-constrained stage. A well-run line is deliberately unbalanced so that one stage stays the constraint and every other stage carries enough spare capacity to keep it fed. The claim holds most cleanly in a repetitive, ordered system whose constraint sits still long enough to exploit. Eliyahu Goldratt introduced it in the 1984 business novel The Goal, set in a failing plant whose manager saves it by finding the two operations choking output: a bottlenecked machine and a heat-treat furnace. The framework is what you get when you follow that claim to its conclusions in scheduling, accounting, and project management, and discover that most of them contradict standard practice.

The five focusing steps

The core procedure, which Goldratt called the Process of Ongoing Improvement (POOGI):

  1. Identify the constraint. The stage where demand exceeds capacity, where work piles up in front and starves behind.
  2. Exploit it. Wring maximum output from the constraint as it exists now, before spending a cent. Stop letting it sit idle at lunch, stop feeding it defective parts it will only reject, stop routing trivial work through it.
  3. Subordinate everything else to the decision in step 2. Every non-constraint runs at the pace the constraint can absorb, not at its own maximum.
  4. Elevate the constraint. Now spend: add a shift, buy a second machine, hire. Raise its capacity.
  5. Repeat. Elevation moves the constraint somewhere else. Go back to step 1, and do not let inertia become the new constraint.
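
Step 1 can be sketched in a few lines. This is a minimal illustration, assuming a serial line where each station has one known, steady capacity; the station names, capacities, and demand figure are made up for the example.

```python
# Hypothetical line: station name -> capacity in units per hour (made-up numbers).
capacities = {"cut": 12, "weld": 9, "heat_treat": 5, "paint": 8}
demand_per_hour = 7  # assumed market demand, also made up


def find_constraint(capacities):
    """Return (name, capacity) of the lowest-capacity station in a serial line."""
    name = min(capacities, key=capacities.get)
    return name, capacities[name]


name, cap = find_constraint(capacities)

# The line cannot ship faster than its slowest stage, and cannot sell more than demand.
system_throughput = min(cap, demand_per_hour)

print(f"constraint: {name} at {cap}/hour, system throughput: {system_throughput}/hour")
```

Here demand (7) exceeds the capacity of heat_treat (5), so the constraint is internal. If demand were 4, no station would be the constraint; the market would be.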

Step three is the one that breaks intuition. Subordinating means deliberately running non-constraint stations below their capacity. A manager measured on local utilization will resist this: an idle machine looks like waste. Goldratt’s point is that a non-constraint running flat out cannot raise system throughput, because the constraint downstream can only pass so much. All the extra output does is accumulate as inventory in front of the bottleneck. Local efficiency and global throughput are not the same objective, and optimizing the first usually degrades the second. This is a constraint-level version of the Systems Thinking result that the intuitive intervention, pushing every part to work harder, makes the whole behave worse.
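
The arithmetic behind subordination is easy to check. A minimal sketch, assuming a two-station line with steady hourly rates; `run_shift` and the numbers are hypothetical, chosen only to show the effect.

```python
def run_shift(upstream_rate, constraint_rate, hours):
    """Two-station line. Returns (units shipped, WIP left in front of the constraint)."""
    queue = 0    # inventory waiting in front of the constraint
    shipped = 0
    for _ in range(hours):
        queue += upstream_rate               # upstream output arrives
        done = min(queue, constraint_rate)   # the constraint passes only so much
        queue -= done
        shipped += done
    return shipped, queue


flat_out = run_shift(upstream_rate=10, constraint_rate=6, hours=8)
subordinated = run_shift(upstream_rate=6, constraint_rate=6, hours=8)

print("flat out:     shipped, WIP =", flat_out)
print("subordinated: shipped, WIP =", subordinated)
```

Both runs ship 48 units. Running the upstream station flat out adds nothing to throughput and leaves 32 units of inventory in front of the bottleneck; pacing it to the constraint leaves none.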

Drum-buffer-rope

The scheduling method that operationalizes subordination has three parts. The drum is the constraint, and its rhythm sets the pace for the whole line, because no plant can ship faster than its slowest necessary step. The buffer is a small amount of time-based inventory placed just ahead of the constraint, so that an upstream hiccup never starves it; a starved constraint is throughput lost forever, since the bottleneck can never make up the gap. The rope ties raw-material release to the drum’s consumption: work enters the front of the line only as fast as the constraint pulls it through, which caps total work-in-progress without a separate limit at every station.

Drum-Buffer-Rope is a close cousin of the Work-in-Progress Limits used in Kanban, and of CONWIP (constant work-in-progress) release. All three refuse to let a fast front end flood a slow back end. The QA Handoff Workflows case is drum-buffer-rope in miniature: a single QA analyst is the drum, and capping the “Ready for QA” column is the rope that stops developers from opening new tickets faster than QA can clear them. Without the rope, the queue in front of the constraint does not just grow; it amplifies, through the same feedback-and-delay dynamic as the Bullwhip Effect.
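
The release rule can be sketched as a small simulation. This is one simple reading of the rope, assuming a single upstream station with uneven output, a fixed-rate drum, and a buffer counted in units rather than time; `simulate`, `UPSTREAM_CAPS`, and all the numbers are hypothetical.

```python
UPSTREAM_CAPS = [8, 2, 8, 8, 1, 8, 8, 8]  # made-up upstream capacity per period


def simulate(upstream_caps, drum_rate, buffer_target, use_rope):
    """Return (units shipped, WIP left in the line) for one upstream station plus a drum."""
    raw_wip = 0              # released material waiting at the upstream station
    buffer = buffer_target   # assumption: the buffer starts full
    shipped = 0
    for cap in upstream_caps:
        if use_rope:
            # Rope: release only enough to top total WIP back up to the buffer target.
            release = max(0, buffer_target - (raw_wip + buffer))
        else:
            # Push: release whatever keeps the upstream station busy.
            release = cap
        raw_wip += release
        moved = min(cap, raw_wip)        # upstream works on what it has
        raw_wip -= moved
        buffer += moved
        done = min(drum_rate, buffer)    # the drum sets the pace
        buffer -= done
        shipped += done
    return shipped, raw_wip + buffer


push = simulate(UPSTREAM_CAPS, drum_rate=4, buffer_target=8, use_rope=False)
rope = simulate(UPSTREAM_CAPS, drum_rate=4, buffer_target=8, use_rope=True)
print("push:", push)
print("rope:", rope)
```

With these numbers both policies ship 32 units, because the buffer absorbs the two bad upstream periods and the drum never starves. Push ends with 27 units of WIP in the line; the rope ends with 4.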

Throughput accounting

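The accounting follows from the same premise and lands somewhere hostile to standard cost accounting. Goldratt reduces the whole business to three numbers:

  1. Throughput. The rate at which the system generates money through sales: revenue minus totally variable costs such as materials.
  2. Inventory. The money tied up in things the system intends to sell.
  3. Operating expense. The money the system spends turning inventory into throughput.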

The decision rule is: maximize throughput first, then reduce inventory and operating expense. This matters because traditional cost accounting rewards the wrong behavior. Under standard costing, fixed overhead is absorbed into the value of inventory. A manager who overproduces to keep every machine busy books that overhead as an asset sitting in the warehouse, and posts a healthy paper profit, while real cash drains into goods nobody ordered. Throughput Accounting refuses to count unsold inventory as gain, so it kills the incentive to overproduce for local efficiency. Product-mix and make-or-buy decisions are ranked by throughput per unit of constraint time, not by unit cost, and the two rankings routinely disagree.
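
A small product-mix example shows how the two rankings can disagree. It is a sketch under stated assumptions: two hypothetical products, one constrained resource with 2,400 minutes a week, and throughput taken as price minus materials. All figures are invented for illustration.

```python
# Hypothetical products; only the constrained resource's minutes are counted.
products = {
    "P": {"price": 90, "materials": 45, "constraint_min": 15, "demand": 100},
    "Q": {"price": 100, "materials": 40, "constraint_min": 30, "demand": 50},
}
CONSTRAINT_MINUTES = 2400  # weekly capacity of the constraint (assumed)


def unit_throughput(p):
    """Throughput per unit sold: price minus totally variable cost."""
    return p["price"] - p["materials"]


def weekly_throughput(order):
    """Fill demand in the given priority order until constraint time runs out."""
    minutes_left = CONSTRAINT_MINUTES
    total = 0
    for name in order:
        p = products[name]
        units = min(p["demand"], minutes_left // p["constraint_min"])
        minutes_left -= units * p["constraint_min"]
        total += units * unit_throughput(p)
    return total


by_unit = sorted(products, key=lambda n: unit_throughput(products[n]), reverse=True)
by_constraint = sorted(
    products,
    key=lambda n: unit_throughput(products[n]) / products[n]["constraint_min"],
    reverse=True,
)

print(by_unit, weekly_throughput(by_unit))
print(by_constraint, weekly_throughput(by_constraint))
```

Q earns more per unit (60 against 45), so a per-unit ranking makes Q first and yields 5,700 a week. P earns more per constraint minute (3.0 against 2.0), so the constraint-time ranking makes P first and yields 6,300 from the same 2,400 minutes.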

Critical chain: the theory applied to projects

Goldratt turned the same logic on project management in Critical Chain (1997). The constraint in a project is the longest chain of dependent steps once you also account for resource contention, not just task sequence. The pathology he targets is safety time wasted rather than saved. Individual task estimates are padded to roughly 80–90% confidence, then the padding evaporates through two mechanisms: Parkinson’s Law, work expanding to fill the time allotted, and student syndrome, starting each task at the last possible moment so the buffer is gone before any real problem hits.

Critical Chain Project Management strips the padding out of individual tasks, estimating each at roughly 50% confidence, and aggregates the removed safety into shared buffers: a project buffer at the end of the critical chain, feeding buffers where non-critical paths join it, and resource buffers that warn when a needed resource must be free. Because task-level fluctuations are partly independent, an aggregated buffer can be smaller than the sum of the individual pads it replaces and still protect the deadline, the same statistical pooling that lets an insurer hold less reserve than the sum of individual worst cases. Progress is tracked by buffer consumption rather than by task-level milestones.
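
The pooling effect can be put in numbers. The sketch below uses the root-sum-of-squares heuristic, one common way to size an aggregated buffer, and assumes task overruns are independent; where overruns are correlated, the pooled buffer would need to be larger. The task estimates are hypothetical.

```python
import math

# Hypothetical tasks on one chain: (aggressive ~50% estimate, padded ~90% estimate), in days.
tasks = [(5, 9), (8, 14), (3, 6), (10, 18)]

# Safety removed from each task when it is re-estimated at ~50% confidence.
pads = [safe - aggressive for aggressive, safe in tasks]

sum_of_pads = sum(pads)                               # protection if every task keeps its own pad
pooled_buffer = math.sqrt(sum(p * p for p in pads))   # root-sum-of-squares pooled buffer

print(f"individual pads: {pads}, total {sum_of_pads} days")
print(f"pooled project buffer: {pooled_buffer:.1f} days")
```

The individual pads total 21 days; the pooled buffer comes to about 11.2 days. The gap is the safety that padding spreads across tasks where it is mostly consumed without being needed.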

Evidence and where it is thin

The reported results are large. A survey of TOC applications cites cases like airline maintenance turnaround cut by 50% with a 20% productivity gain, and software teams shipping releases roughly 25% faster with less overtime. The honest caveat: most of this evidence is self-reported case studies from practitioners and consultancies, not controlled trials, and success stories survive selection where failures go unpublished. The core insight is also not wholly original. Bottleneck management and Goldratt’s earlier OPT scheduling software predate the branded framework, and critics note TOC repackages ideas from queuing theory and lean.

Where does it fit less well? TOC’s sharpest edge is a physical or capacity constraint on a repetitive line. When the binding constraint is a policy, a market, or a measurement, the thinking processes Goldratt added later (current-reality trees, evaporating clouds) are meant to apply, but they are qualitative and far less validated than the operational tools. In a high-variability, low-repetition environment the “single constraint” can move faster than you can subordinate to it. Deciding whether a problem even has a stable, locatable constraint is itself a Cynefin Framework judgment: drum-buffer-rope assumes a system ordered enough that cause and effect hold still long enough to exploit.

Try it

Goldratt’s dice game (1 hour, physical or 20 lines of Python). Model a balanced line: five stations in a row, each with an average capacity of 3.5 units per turn, simulated by rolling one die. Tokens flow left to right; a station can process only the tokens that have reached it, and only up to its roll. Feed the first station unlimited raw material. Naively the line should output 3.5 units per turn, since every station averages 3.5. Run 20 turns and it will not: throughput lands well below 3.5, and work-in-progress piles up in front of whichever station is having a bad streak. What you are looking for is the interaction of statistical fluctuation and dependency. A station that rolls high cannot make up for an upstream station that rolled low, because it never received the tokens, while a low roll propagates downstream immediately. Now give one station a fixed low capacity to make it the clear constraint, place a buffer of a few tokens in front of it, and release material only as it consumes: that is drum-buffer-rope. Throughput settles at the constraint’s rate and stays there, and total WIP stays bounded instead of drifting upward. That divergence between the two runs is the entire theory made visible on a tabletop.
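
A Python version of the game, as a rough sketch rather than a canonical implementation. It assumes tokens passed along in a turn are available to the next station in that same turn, and it models the rope as releasing one new token for every token the drum consumes. `run_line` and its parameters are hypothetical names.

```python
import random


def run_line(turns=20, stations=5, seed=0, drum=None, drum_rate=2, buffer=6):
    """Simulate the dice game. Returns (throughput per turn, WIP left in the line).

    drum=None: balanced push line, every station rolls a die, raw material is unlimited.
    drum=k:    station k has fixed capacity drum_rate, `buffer` tokens start in front
               of it, and material is released only to replace what the drum consumes.
    """
    rng = random.Random(seed)
    queue = [0] * stations      # queue[i]: tokens waiting in front of station i
    roped = drum is not None
    if roped:
        queue[drum] = buffer
    shipped = 0
    for _ in range(turns):
        for i in range(stations):
            capacity = drum_rate if i == drum else rng.randint(1, 6)
            if i == 0 and not roped:
                moved = capacity                 # unlimited raw material
            else:
                moved = min(capacity, queue[i])  # cannot process what never arrived
                queue[i] -= moved
            if roped and i == drum:
                queue[0] += moved                # the rope: release what the drum consumed
            if i + 1 < stations:
                queue[i + 1] += moved
            else:
                shipped += moved
    return shipped / turns, sum(queue)


for seed in range(5):
    print("balanced:", run_line(seed=seed), " drum-buffer-rope:", run_line(seed=seed, drum=2))
```

Try different seeds, and raise turns to see the trend more clearly. In the balanced runs, look at how far throughput falls short of 3.5 and how much WIP is left in the line. In the roped runs, throughput cannot exceed drum_rate, and the WIP upstream of and at the drum always equals the starting buffer.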
