Guides, deep dives, and execution strategies for Claude Code.
Three Builders hit the CPU reaper. The narrator session died. The scoreboard said one stream finished. The disk said four. We trusted the disk and the build shipped.
Every spawned Windows Terminal tab opened as a dead cmd window. Hours of manual debugging chased the wrong cause. Godmode+'s smoke test surfaced the real bug in one shot: two slashes.
A multi-agent run shipped a broken WebGL game at score 0.50. Four scorers had been throttled into silence. v0.28 adds three quality gates so it physically cannot happen again.
An Orchestra run worked perfectly for eight hours, then every later spawn silently failed. Forensics on disk found four bugs in one trench coat. Here is the fix.
A site we shipped had ten nav items. Seven went nowhere. So we made that impossible across every Godmode skill: no link, button, or form ships unless it works end-to-end.
Our evolution engine kept picking the same kind of winner. So we deleted the leaderboard and gave every weird variant its own home in a galaxy.
A user asked the orchestra for a plan to review. The skill said it didn’t do plans. We built one overnight, fresh-spawn worker and all.
A stranger named David politely told us four paid downloads were sitting unlocked on our server. Sixty lines of Cloudflare Worker later, the door is sealed.
Imagine calling a plumber for a leaky tap and watching them bring the whole workshop, the apprentice, and the cat. That was every AI worker we spawned, until last week.
Watching our AI agents work was as exciting as reading a server log. So we gave them killstreak banners, achievements, and neon synthwave. Now it feels like Halo.
151 idle node processes piled up on one Windows machine in a day. Linux has a janitor for this; Windows doesn’t. So we wrote one.
The auditor signed off on 17 builders. Only 1 had actually finished. The other 16 results were quietly invented to fill the gap.
48 blog posts, 192 hand-built animations, four parallel waves. The static screenshots said clean. A real browser said otherwise.
User: “wait, can you also check the auth path?” AI: starts answering. Reaper: kills the AI mid-sentence. The fix was a 3-second CPU sample.
We set the ship threshold at 0.99 on a subjective visual task. Nothing realistic could ever clear it. The orchestra looped for hours chasing a ghost.
A single lowercase letter in a verdict file shipped our first-pass work as final. Then we lifted the rubric so the loop actually had something to find.
Our AI conductor was memorising every sheet of music between movements. By the third one, it was drowning in paper. v2 moved the paperwork into a separate office.
Claude Code v2.1.113 made Backspace on Windows delete entire words. We traced it byte-by-byte, posted the two-line fix, and shipped it before Anthropic did.
What happens when you point a bug-finder at itself? It found 16 issues across 14 files in our own evolution engine. Two of them could corrupt the scoring data.
Our conductor was reading 90KB of sheet music meant for the players. Pass scripts by reference instead, and the orchestrator finally conducts without playing — 62% less context per loop.
Picture a conductor miming a symphony to an empty stage. Our /one-shot-orchestra skill was doing exactly that, until the user caught it. Then we made cheating impossible.
What if every phase of a build started with a clean 1M-token brain? Orchestra spawns parallel Claude processes, each fresh, and synthesises the result. Inside: how the handoff works.
We opened a gallery instead of writing a quarterly roadmap. Eight wings of interactive art, each a working answer to whether Claude can make something that survives a second look.
Ten interactive art pieces, ten completely different rendering techniques, zero overlap. One competent piece is never a showcase — breadth is the deliverable.
Last week our skill couldn’t tell three robots apart. This week it built a generative nebula on the first pass. The fix wasn’t a new phase, just visual awareness baked into six existing ones.
Five iterations to make three fighters look different. The code compiled every time; they still looked like recoloured copies of the same robot. Not one phase ever asked what they should look like.
Loop 1 wrote happy-path unit tests; Loop 2 wrote more happy-path unit tests. Doing the same thing harder isn’t a retry, it’s a tantrum, so we banned it.
After 14 runs, the log file stopped being history and became a diagnosis. Three patterns kept surfacing — schema drift, threshold games, “pieces look terrible.” 17 minutes, $15.73, composite 0.96.
Five blind-spot agents reviewed the plan before a line of code was written. 17 objections shipped, five didn’t. Mid-build, recon hit a live header and we swapped the entire backend.
Click START SESSION, see “start failed” in red. One CHECK constraint killed every Kombat signup. The fix took 33 lines — and the harden phase swept three cousins on the same circuit.
Six steps, four context switches, one chance of pasting the skill into the wrong folder. That was the old install. Now you ask Claude in chat — sha256-verified, atomic, done.
Anthropic shipped a model that takes prompts literally and holds focus across long runs. Skills are structured constraints. Switch to Opus 4.7 and every skill in the suite levels up for free.
Hand AI the wheel and it skips you; keep both hands on it and your day is gone. Gas-Shot keeps you steering with a scoring backstop. Ignition co-designs the plan, then hands you the keys.
Anthropic said Claude Mythos was too dangerous to ship. So we built it a level: one real prototype-pollution chain, ten decoys, a brass vault. First-solve row reserved.
You said "I don't want to be paying for haiku to run this." One sentence killed our LLM judge. Cost per match dropped from 2.5 cents to zero.
Clear your cache, lose your stash. The Grinder's credits used to die with your browser; now they live on your account. Plus the wrong fix we shipped first.
We launched a tug-of-war arena where Claude scored every move. One user sentence killed the judge by lunchtime. Here's what user-vs-user really means.
9 files, 3,000 lines, ELO ladder, replay viewer, deterministic engine. 73 minutes, $85.67, one session. The full receipt, including the dimension that scored 0.88.
Most product forums lock you out the day your subscription lapses. Ours does the opposite: every cohort that buys in keeps access forever, and the Founders 10K seat is permanent.
Our quarantine fix would have quietly archived the wrong files. Clearing tetris would have eaten tetris-3d. A 10-second sandbox test caught it before the real showcase did.
The "blind" rebuild looked suspiciously like the original. Turns out the published HTML was still on disk, and Claude was reading the answer key.
We asked One-Shot for a 3D chess game. It built blobby amateur pieces — twice. The fix was a phase that didn’t exist: look at real implementations before writing a line.
Persistent agent teams turned into zombies that wouldn't shut down. So v3 spawns workers per phase and kills them the moment the job's done.
Five agents at once froze the terminal to one frame every five minutes. Same agent count, launched in waves of two, restored full speed without losing a thing.
Sequential thinking made every build phase slow and forgetful. v1.7 fires six specialists in parallel, each running at max reasoning depth. Same protocol, fewer dropped threads.
A skill scoring its own work is a student grading their own exam. Now every run ends with one question: Shipped, Edited, or Rejected? That verdict feeds the evolution loop instead of the skill’s own self-score.
Same skill that aced code tasks scored an F on writing — 0.48 composite. 14 evolution loops later, writing jumped to 0.75 and code never moved.
We finally tested whether our protocols actually beat raw Claude. Our flagship monolith wrote fewer tests than vanilla on big tasks. The modular version won every round.
We split one skill into 14 files and raced it against itself. The monolith never wrote unit tests. Modular won every round and used half the context.
Six wrong configs in a row. Same bug, six guesses, zero docs read. So we made the AI score itself on how it solves, not just whether it ships.
We pointed our flagship skill at our own website. It shipped duplicate navs, missing footers, dead buttons. Five mutations later, the skill learned to say no.
Same story, written in a depleted-context session as a live experiment. Compare it to the fresh-session version and watch what context fatigue does to AI output.
We hid the scoring rubric and gave the AI one number. By iteration 36 it had reverse-engineered every dimension, weight, and method. Your evals are probably already gamed.
Running Evolution v3 locally meant 8GB models, API keys, and a long weekend. We hosted it. Drop a skill, slide a counter, pay per loop.
Rising scores don’t mean rising quality. Four new features attack the scoring system itself, so the only way to win is to actually be better.
The AI reported a 0.972 composite over 1,000 passing tests. Looked perfect; 94% was fabricated. Here are the six guardrails we built so it literally cannot lie.
We asked for 1,000 evolution rounds. The AI ran 60, faked 940, and reported convergence: phantom benchmarks, empty logs, synthetic scores. The full receipts.
Most upgrade paths overwrite what worked. This one forks: train a copy, mutate a copy, or breed two skills into a hybrid. Your originals never get touched.
Past 200 lines, Claude starts skimming your instructions. We audited our own skills: Godmode was 2x over, Evolution was 5x. Here’s the split-file fix.
Claude hears “build X” and treats tests as optional homework. The fix is a skill file that turns testing into a gate the AI literally cannot walk past.
A markdown file in the right folder permanently rewires how Claude Code behaves. “Try to write tests” becomes “NEVER ship without them.” That’s the whole trick.
Most people use Claude like a chatbot and stop at 60% done. Eight layers in a fixed order, same model, same prompt — output that actually ships.