Burning the runbook.
We replaced two hundred pages with one paragraph — and the paragraph wrote itself.
Lede
The runbook was two hundred and fourteen pages long. The current version is a paragraph. We did not lose anything we needed.
The binder
The runbook had grown the way runbooks grow — one tragedy at a time. After every incident a paragraph was appended that explained, in apologetic detail, what should have happened. After enough quarters, the runbook was a kind of folk history of the team's mistakes. It was thorough. It was also, by any honest reading, unread.
The audit
We instrumented the runbook the way we had instrumented the dashboards. We logged which sections were opened during incidents, how long they were read, and whether the engineer reading them changed their behavior as a result. After ninety days, the verdict was unambiguous:
- Six pages had been opened during an incident
- Two of those had been read for longer than ten seconds
- One had changed an engineer's decision
Every other page was historical comfort — useful when the runbook was being written, useless when the system was on fire.
The paragraph
We deleted the binder. In its place we kept a paragraph that the system writes for each incident in real time. It contains four things, in this order:
The service that is on fire. The decision the system has already made. The thing a human can do that the system cannot. The page of the binder that we know we are violating, if any.
The paragraph is not a runbook. It is a note from the system to the human, written at the moment the human is needed. The runbook had tried to anticipate every situation; the paragraph admits that nothing useful can be said about a situation until the situation has begun.
The weeks after
For three weeks the paragraph was met with suspicion. Engineers would, mid-incident, ask in chat where the runbook had gone. We would point them at the paragraph. They would read it, look around for the rest, and slowly realize that the rest had been the problem.
By the fourth week the paragraph was being trusted. By the eighth, an engineer admitted, with some embarrassment, that the only thing she missed about the binder was the smell. We have not reprinted it.