The Conductor Died Mid-Run. The Build Shipped Anyway.
The task: Build a custom CLI for getgodmode.dev members. Four parallel Builders, each owning one stream.
What broke: Three Builders crossed the 80% CPU reaper threshold. The narrator session (the Claude that drives the orchestra) died mid-run. The scoreboard recorded one stream finished. The disk had four.
The lesson: Heartbeat-based liveness is a flawed signal under thermal throttling. The bookkeeping lies. The disk does not. Trust the disk.
The result: The CLI shipped. 23 files, around 2,470 lines of code, 164 of 165 tests green. Scoring punished sins not committed.
The build
The ask was simple in shape, novel in execution. Build a command-line tool the average buyer can install and use without an Anthropic API key.
That last constraint mattered. Most AI CLIs wrap the Anthropic SDK and want a key. This one wraps Claude Code itself as a subprocess, reuses the auth tokens already in the Godmode account, and stays free of any extra billing surface.
godmode chat
Interactive session that streams Claude Code output and routes write-actions through the user's account.
godmode run
One-shot prompt with file/stdin support. Returns the response and exits clean.
godmode skills
List, install, and run any skill the user owns. The CLI reads the same Supabase rows the website uses.
godmode login
Pairs the CLI to the user's website account via a device code. No raw key handling.
Build stats
This was an Orchestra run. Daemon-driven, lean narrator session, every worker spawned with a fresh context window. The receipt:
| Metric | Value |
|---|---|
| Tool | one-shot-orchestra (plan mode) |
| Total points | 999 |
| Level reached | 5 |
| Longest combo | 49 workers in a row without a stumble |
| Files shipped | 23 |
| Lines of code | ~2,470 |
| Tests | 164 / 165 green |
| Achievements unlocked | CLEAN, FIRST_BLOOD, DOUBLE_KILL, TRIPLE_KILL, RAMPAGE, GODLIKE |
| Reaper kills (legitimate work) | 6 (Builder-1 x3, Builder-2 x3) |
| Narrator sessions used | 2 (original died, second took over) |
| Verdict | Shipped under duress |
Dimension scorecard
The honest part. Some dimensions scored well. Two scored badly, for reasons that turn out to be informative.
| Dimension | Score | Why |
|---|---|---|
| Polish | 0.87 | Telemetry stream (the one Builder that ran solo) pen-tested clean against 10 canary strings. |
| Documentation | 0.91 | Each verb shipped with help text, examples, and inline JSDoc. |
| Correctness | 0.28 | Scorer read result.json files. Three of four said "throttled stub." Scorer did not check the disk. |
| Completeness | 0.18 | Same reason. The bookkeeping said one of four streams finished. The disk said four of four. |
The scorer punished sins not committed. Builders 1, 2, and 4 had all written their files to disk before the reaper killed them. The result.json files lied because the process that would have written them never got the chance.
What broke
Five distinct failures, each interesting on its own, terrifying as a chain. Watch the fan-out happen:
-
Thermal throttling on the Builder fan-out
Four parallel Builders kicked off. Three immediately hit 87 to 93 percent CPU. The watchdog has an 80 percent reaper threshold to protect the laptop. Only the fourth Builder (telemetry) survived to ship solo.
-
The reaper killed legitimate work
Builders 1 and 2 hit the reaper three times each at heartbeat timeout. Their budget was 900 seconds. They went 905 seconds without a heartbeat. But by then, their files had already shipped to disk. The reaper killed processes that had already done the work.
-
The narrator session died
The narrator is the Claude session that watches the orchestra and writes signoff notes. Somewhere around the second wave of reaper kills, it stopped responding. A different session, in a different tab, was asked to take over.
-
result.json diverged from disk truth
The Merger phase reads each Builder's result.json to decide what made it into the build. Builder-1's result.json said "throttled stub, no artifacts." Builder-2's was missing entirely. But cli.ts, the auth flow, and all six command verbs were sitting in the work tree.
-
Test caught the real launch blocker
The Test phase ran 165 checks. One failed, but the meaningful one was a HIGH severity finding: Row Level Security was off on a Supabase telemetry table. Any anonymous client could have read it. The Harden phase locked it down before merge.
Heartbeats lie. Disk does not.
The story of this run is the story of two different signals disagreeing about what was true.
Think of it like a courier who got hit by a car a block from the depot. The dispatcher's tracking app says the parcel was never delivered, because the courier never logged the drop-off. But the parcel is sitting in the lobby. The dispatcher is asking the wrong question. The right question is, did the parcel arrive?
The narrator handoff
Heartbeat failure on the Builders was bad. The narrator session dying on top of it could have been fatal. It was not, because the orchestra protocol writes everything to disk as it goes.
The second narrator was given the working directory and one instruction: read the truth from the files and write the signoffs the original narrator did not get to write.
That paragraph is from the run log. It is also the cleanest description of what a self-healing multi-agent system has to do when the narrator goes silent. Anyone with the same disk and the same protocol can finish the work.
What the scoreboard saw vs what shipped
This is the part the dimension scores buried. Same four streams, two completely different stories depending on which signal you trust. Toggle between them:
Three streams in the "no" column did not deserve to be there. The disk was the ground truth. The bookkeeping was a stale snapshot of an unfinished process.
So what changes
This run shipped, but only because a second narrator picked through the disk by hand. That is not a protocol, that is luck. Two things are getting fixed in the next iteration:
Do
Treat the working tree as the source of truth. The merger and scorer both read the actual files before trusting any result.json the worker may or may not have written.
Don't
Trust a heartbeat as proof a worker is doing nothing. A worker can be silent because it has finished, because it is throttled, or because it is dead. The signal does not distinguish.
The next narrator design also gets a small superpower: when it spawns, it reads the latest disk state itself instead of asking the previous narrator. If the previous narrator is alive, that is a free integrity check. If the previous narrator is dead, the takeover is silent.
Self-healing multi-agent work is mostly about which signal you trust when two signals disagree. Bookkeeping is convenient. Files are real. Pick the real one.
What it cost
The honest number, because build-recap posts are supposed to be receipts and not brags.
| Phase | Worker count | Points |
|---|---|---|
| Bootstrap + diagnose + recon | 3 | 121 |
| Research (incl. plan mode) | 4 + planner | 195 |
| Builders (4 streams, reaper killed 2 of them 3x each) | 4 (x retries) | 191 |
| Test | 3 | 163 |
| Harden + document + verify + polish | 4 | 122 |
| Visual + scoring + judging | varied | 207 |
| Total | — | 999 |
The point of showing this table is that the costliest phases (Builders and Scoring) are also the ones most affected by the bug above. If the scorer had read the disk, the Builder retries would not have happened, the heartbeat reaper would not have fired three times each on two workers, and the same build would have cost roughly a third less.
The takeaway
Three things to keep, if any of this lands.
Disk over bookkeeping
If two signals about the state of a build disagree, the files on disk are the one that matters. The other is a story about the files.
Liveness is not progress
A silent worker is not necessarily a stuck worker. A loud worker is not necessarily a productive one. Measure work by the artifacts, not the chatter.
Narrators are interchangeable
If a multi-agent system needs the same Claude session running the whole time, it has a single point of failure that is going to cost you. Make every narrator a takeover candidate.
Run your own orchestra
One-Shot Scripts ships with the same daemon-driven, lean-narrator protocol the run above used. Workers run in fresh context windows. The disk is the source of truth. Self-healing on day one.
See One-Shot Pricing