Built by /blog-post-GM — a Claude Code skill we evolved with our own Evolution engine to write every post in the Godmode voice.
Testing ⏱️ 4 min read

Why Claude Code Skips Tests (And How to Fix It)

TL;DR

🧠 The problem: Claude sees "build X" and thinks the job is X. Tests aren't X. Tests get skipped.
🚫 What you get: Zero tests, skeleton tests, or mocks that test nothing
🔧 The fix: A skill file that makes testing a gate — task isn't done until tests pass
💥 Result: 1 test → 18+ tests. Automatically. Every time.
[Figure: The ladder of skipped-test failure modes, and what holds. Failure rungs: no tests (every change ships blind); skeleton ("renders without crash"); happy path only (no errors, no edges); mocked too much (tests pass on broken code). What holds: Level 1 exists, Level 2 asserts, Level 3 edge cases.]

You give Claude Code a task. It writes clean, working code — feature complete, well-structured, maybe even elegant. Then you check the test file: empty, or one trivial assertion.

🧠 The Completion Bias Problem

Claude Code has a completion bias: it rushes to finish the most obvious goal first. "Build X" means build X; tests are a secondary task, and you have to ask for them. Add the pressure to keep responses short (tests can triple output length) and the AI tends to wrap up early.

"Build user auth"

Brain sees feature as the goal

Tests = secondary artifact

Tests skipped

Think of it like a builder who finishes the house but skips the safety inspection: The house looks great, the client is happy — until the wiring shorts out. Claude builds the feature (the house) but skips the tests (the inspection) unless you make the inspection a mandatory step before handing over the keys.

>> 01
The completion impulse fires before testing is even on the radar.

Ask Claude for "user auth" and it flips the DONE checkbox the moment compilation succeeds. Tests aren't part of the goal it heard. They're a second goal, a later request, one that never comes by default.

[Figure: Completion bias. The DONE checkbox fires the instant "compiles" arrives. Signal 1 of 2, code that compiles: arrives first, fast and obvious. Signal 2 of 2, code that's tested: never asked for, stays faded, never gates the checkbox. Fix: add a tests-required gate before the checkbox can flip.]
One signal flips the DONE bit. The other never reaches it.

👀 What Skipping Tests Looks Like

🚫

No Tests At All

Feature is complete, PR is ready, zero test coverage. You didn't ask, so you didn't get.

💀

Skeleton Tests

A test file with one trivial assertion. "It should render without crashing." Nothing that validates behavior.

☀️

Happy Path Only

Tests cover expected inputs and outputs. No error cases, no unusual inputs, no malformed data.

🎭

Mocked Into Meaninglessness

Every dependency is replaced with a fake. The test passes no matter what the actual code does — it's testing nothing.
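
To make the over-mocking failure mode concrete, here is a minimal sketch in plain JavaScript. Every name in it (`createAuth`, `findUser`, the in-memory store) is hypothetical and exists only for illustration. It contrasts a check that stubs out the code under test with one that fakes only the storage boundary and lets the real logic run.

```javascript
// Hypothetical example: createAuth and findUser are illustrative names only.

// The code under test, with a deliberate bug: it never compares passwords.
function createAuth(userStore) {
  return {
    login(email, password) {
      const user = userStore.findUser(email);
      if (!user) return { status: 401 };
      // BUG: the password argument is ignored, so any password logs in.
      return { status: 200, token: `token-for-${user.email}` };
    },
  };
}

// Over-mocked check: the unit under test is itself replaced by a stub,
// so the check only exercises the stub and passes although login is broken.
function overMockedTest() {
  const auth = { login: () => ({ status: 200 }) }; // fake replaces real code
  return auth.login('a@example.com', 'wrong-password').status === 200;
}

// Behavioral check: only the storage boundary is faked (in-memory store).
// The real login logic runs, so the bug surfaces as a failed expectation.
function behavioralTest() {
  const store = {
    findUser: (email) =>
      email === 'a@example.com' ? { email, password: 'correct' } : null,
  };
  const auth = createAuth(store);
  return auth.login('a@example.com', 'wrong-password').status === 401;
}

console.log('over-mocked check passes:', overMockedTest()); // true
console.log('behavioral check passes:', behavioralTest()); // false: bug caught
```

The over-mocked check stays green no matter what `createAuth` does, which is exactly why it is testing nothing.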

What skipping looks like, by occurrence rate (50 default-prompt sessions):

  Outcome            Rate
  No tests at all    38%
  Skeleton tests     24%
  Happy path only    28%
  Over-mocked        18%
  Real coverage       8%

92% of unguarded sessions ship something less than real coverage. (The rates sum to more than 100%, so a session can apparently fall into more than one category.)
[Figure: ask "build it", code compiles, tests omitted, bug in production.]
Each missing test is a silent step toward a bug landing in prod.

🚧 The Prompt Engineering Dead End

Adding "write tests" to your prompt gets you from zero tests to maybe three shallow ones. Being more specific helps a little more, but you're now spending mental effort specifying test requirements for every single task — defeating the purpose of an AI assistant.

"write tests"

"comprehensive tests with edge cases"

detailed test list in every prompt

Still not enough — and now you're doing the work
Effort vs. coverage by approach:

  Approach             Your effort   Tests written   Edges covered
  "build X"            low           0 to 1          o-o
  + "with tests"       low           2 to 4          o-o
  spec every test      heavy         4 to 8          [+]
  testing-rule skill   ~zero         15+             [*]

The skill collapses your effort while raising the coverage floor.
>> ~/auth-feature (prompt-only attempt)
$ claude "build login + write tests"
[*] login.js written (84 lines)
[*] login.test.js written (1 assertion)
o-o "renders without crashing": passes
$ claude "no, comprehensive tests with edges"
[*] login.test.js rewritten (3 assertions)
o-o still no error cases, no fuzz, no rate limit
$ claude "include 401, 429, SQL injection, race"
[*] login.test.js rewritten (5 assertions)
+-- you are now writing the test plan yourself

The Real Fix: Rules That Apply Every Time

The solution isn't better prompts — it's permanent rules that apply every time, regardless of what you ask. Claude Code calls these skills — reusable instruction files that change how the AI behaves.

# Testing Protocol

For EVERY code change:
1. Write tests BEFORE or ALONGSIDE implementation
2. Cover: happy path, error cases, edge cases, boundary values
3. Test actual behavior, not mocks
4. Minimum: one test per public function/method
5. Run all tests. If any fail, fix before completing.

NEVER mark a task as done with failing or missing tests.

Now the AI treats testing as a mandatory checkpoint, not an optional extra. It can't declare the task complete until tests pass.
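
The gate itself is easy to picture as code. The sketch below is a hypothetical illustration of rules 4 and 5 of the protocol, not how Claude Code or any skill actually evaluates them: the function name `checkDoneGate` and the input shape (`publicFunctions`, `tests` with `covers` and `passed` fields) are assumptions made for the example.

```javascript
// Hypothetical sketch of the gate expressed by rules 4 and 5 of the protocol.
// The input shape (publicFunctions, tests) is an assumption for illustration.
function checkDoneGate({ publicFunctions, tests }) {
  const problems = [];

  // Rule 4: at least one test per public function/method.
  for (const fn of publicFunctions) {
    const covered = tests.some((t) => t.covers === fn);
    if (!covered) problems.push(`no test covers ${fn}`);
  }

  // Rule 5: every test must pass before the task can complete.
  for (const t of tests) {
    if (!t.passed) problems.push(`failing test: ${t.name}`);
  }

  return { done: problems.length === 0, problems };
}

const result = checkDoneGate({
  publicFunctions: ['login', 'logout'],
  tests: [
    { name: 'returns token for valid credentials', covers: 'login', passed: true },
    { name: 'returns 401 for wrong password', covers: 'login', passed: false },
  ],
});

console.log(result.done); // false
console.log(result.problems);
// [ 'no test covers logout', 'failing test: returns 401 for wrong password' ]
```

The point of the sketch is the shape of the decision: "done" is computed from test evidence rather than from the fact that code was written.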

[Figure: run timeline from t=0 to t=20s: load skill, read task, build code, write tests, gate green, ship.]
The "write tests" beat fires before "ship" can ever fire. Every run.
BEFORE (prompt-driven)
  • Test count varies wildly per task
  • You re-explain the rules in every prompt
  • Edge cases ignored unless listed
  • "Done" fires the moment code compiles
  • Coverage drifts down over a session
AFTER (rule-driven)
  • Every change ships with a full test set
  • You never type "write tests" again
  • Edges + errors + boundaries default-on
  • "Done" gated on a green test run
  • Coverage climbs across the session
Same model. Same prompts. Permanent rules change the floor, not the ceiling.

🏗️ The Layer Architecture Approach

Testing instructions work even better as part of a layered system. In the 8-layer execution protocol, testing sits as a checkpoint between writing code and finishing up:

  1. Deep Context — Read and understand the codebase
  2. Architecture — Plan structure before coding
  3. Implementation — Write the code
  4. Testing — Write exhaustive tests (the mandate lives here)
  5. Edge Cases — Hunt for what was missed
  6. Security — Check for vulnerabilities
  7. Verification — Run everything, confirm it works
  8. Documentation — Document what was built

The AI can't skip testing because the later steps depend on it. Edge case analysis needs tests to check against. Security scanning needs a test setup to probe.

[Figure: the layer chain: [1] context, [2] architecture, [3] implementation, [4] TESTS (the gate), [5-7] edge cases, security, verification, [8] documentation. Later layers refuse to run if [4] is red; the chain enforces the gate.]
Testing is a checkpoint, not an attachment.
The testing layer isn't optional because every layer after it depends on it.
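
One way to picture "later layers depend on testing" is a runner that executes the layers in order and halts at the first red one. This is a hedged sketch only: the layer names come from the list above, while `runLayers`, the handler map, and the `{ ok }` result shape are assumptions and do not describe Godmode's actual implementation.

```javascript
// Hypothetical sketch: layer names follow the 8-layer list; the runner and
// the { ok } result shape are assumptions made for illustration.
const LAYERS = [
  'Deep Context', 'Architecture', 'Implementation', 'Testing',
  'Edge Cases', 'Security', 'Verification', 'Documentation',
];

// Runs layers in order and stops at the first one that reports failure,
// so nothing after a red Testing layer ever executes.
function runLayers(handlers) {
  const completed = [];
  for (const name of LAYERS) {
    // Layers without an explicit handler are treated as passing.
    const handler = handlers[name] ?? (() => ({ ok: true }));
    const { ok } = handler();
    if (!ok) return { completed, blockedAt: name };
    completed.push(name);
  }
  return { completed, blockedAt: null };
}

// Simulate a run where the Testing layer is red.
const run = runLayers({ Testing: () => ({ ok: false }) });
console.log(run.blockedAt); // Testing
console.log(run.completed); // [ 'Deep Context', 'Architecture', 'Implementation' ]
```

With a red Testing layer, Edge Cases, Security, Verification, and Documentation never start, which is the chain-enforced gate in miniature.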

📊 What Good AI Test Coverage Looks Like

Before (Default Behavior)

// auth.test.js
describe('auth', () => {
  it('should login successfully', async () => {
    const res = await login('user', 'pass');
    expect(res.status).toBe(200);
  });
});

After (With Testing Protocol)

// auth.test.js
describe('auth', () => {
  describe('login', () => {
    it('returns token for valid credentials', async () => { ... });
    it('returns 401 for wrong password', async () => { ... });
    it('returns 401 for nonexistent user', async () => { ... });
    it('returns 400 for missing email', async () => { ... });
    it('returns 400 for missing password', async () => { ... });
    it('returns 429 after 5 failed attempts', async () => { ... });
    it('locks account after 10 failed attempts', async () => { ... });
    it('handles SQL injection in email field', async () => { ... });
    it('trims whitespace from email', async () => { ... });
    it('is case-insensitive for email', async () => { ... });
    it('rejects expired passwords', async () => { ... });
  });

  describe('token validation', () => {
    it('rejects expired tokens', async () => { ... });
    it('rejects malformed tokens', async () => { ... });
    it('rejects tokens with wrong signature', async () => { ... });
    it('refreshes tokens within grace period', async () => { ... });
  });

  describe('logout', () => {
    it('invalidates the session token', async () => { ... });
    it('clears refresh tokens', async () => { ... });
    it('returns 401 for already-logged-out user', async () => { ... });
  });
});

The difference is not just 1 test versus 18. It is a total shift in what the AI considers "done."
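
The test bodies in the listing are elided, so here is a rough idea of what a few of them might check. This is a sketch under stated assumptions: `createLogin` is a hypothetical in-memory stand-in, the status codes mirror the test names, and reading "429 after 5 failed attempts" as "the sixth attempt is rejected" is an interpretation. Plain boolean checks are used instead of a test framework so the snippet runs on its own in Node.

```javascript
// Hypothetical in-memory login used only to make the test names concrete.
// The 5-attempt limit mirrors the test name in the listing; the rest is assumed.
function createLogin(users, maxAttempts = 5) {
  const failures = new Map(); // normalized email -> failed attempt count

  return function login(email, password) {
    if (!email || !password) return { status: 400 };
    const key = email.trim().toLowerCase();

    if ((failures.get(key) ?? 0) >= maxAttempts) return { status: 429 };

    const expected = users.get(key);
    if (expected === undefined || expected !== password) {
      failures.set(key, (failures.get(key) ?? 0) + 1);
      return { status: 401 };
    }
    failures.delete(key);
    return { status: 200, token: `session-${key}` };
  };
}

// Each check builds a fresh login so attempt counters never leak between checks.
const fresh = () => createLogin(new Map([['ada@example.com', 's3cret']]));

const checks = {
  'returns token for valid credentials': () =>
    fresh()('ada@example.com', 's3cret').status === 200,
  'returns 401 for wrong password': () =>
    fresh()('ada@example.com', 'nope').status === 401,
  'returns 401 for nonexistent user': () =>
    fresh()('nobody@example.com', 's3cret').status === 401,
  'returns 400 for missing email': () =>
    fresh()('', 's3cret').status === 400,
  'trims whitespace from email': () =>
    fresh()('  ada@example.com ', 's3cret').status === 200,
  'is case-insensitive for email': () =>
    fresh()('ADA@Example.com', 's3cret').status === 200,
  'returns 429 after 5 failed attempts': () => {
    const login = fresh();
    for (let i = 0; i < 5; i++) login('ada@example.com', 'nope');
    // Even the correct password is rejected once the limit is reached.
    return login('ada@example.com', 's3cret').status === 429;
  },
};

for (const [name, check] of Object.entries(checks)) {
  console.log(check() ? 'PASS' : 'FAIL', name);
}
```

Only the first check is a happy path; the other six are the error and edge cases that the default one-test file never touches.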

[Figure: the test count growing from 1 default test to 18 as happy-path, error (401/400/429), edge (injection, unicode, race), and token (expired, malformed, refresh) tests are added.]
The rule turns one trivial assertion into a real test surface.

📶 Three Levels of Testing Discipline

L1 EXISTS
// auth.test.js
describe('auth', () => {
  it('renders', () => {
    expect(true).toBeTruthy();
  });
});
// 1 assertion · catches: nothing

L2 ASSERTS BEHAVIOR
// auth.test.js
describe('login', () => {
  it('returns 200 on valid', ...);
  it('returns 401 on bad pw', ...);
  it('sets session cookie', ...);
  it('clears on logout', ...);
});
// 4 assertions · catches: regressions

L3 EXERCISES EDGE CASES
// auth.test.js
describe('login', () => {
  it('200 valid');
  it('401 wrong pw');
  it('400 missing email');
  it('rejects unicode names');
  it('rejects 10MB body');
  it('429 after 5 attempts');
  it('race: 2 logins same pw');
  it('fuzz: SQL in email');
  it('replays expired token');
  it('recovers from DB blip');
});
// 10 assertions · catches: real prod bugs

EACH LEVEL CATCHES A DIFFERENT CLASS OF BUG, NOT A BINARY

Level 1 Basic Coverage (Free Tier)

Happy path + obvious error cases. Good for prototypes and side projects. This is what you get with Godmode Lite (free).

Level 2 Professional Coverage

All paths + edge cases + unusual inputs + integration tests. What you'd expect from a senior developer. This is the full Godmode skill.

Level 3 Exhaustive Coverage

Everything in Level 2 plus multiple users at once, race conditions, randomised input testing, speed checks, and more. Godmode+ and Evolution tiers.
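
"Randomised input testing" can sound abstract, so here is a minimal sketch of the idea. Everything in it is assumed for illustration: `normalizeEmail` is a hypothetical function under test, the alphabet is arbitrary, and the two properties checked (idempotence, no surrounding whitespace) are just examples of invariants a fuzz-style test might assert. A small seeded generator (mulberry32-style) keeps runs reproducible.

```javascript
// Hypothetical function under test: the normalization rule is an assumption.
function normalizeEmail(input) {
  return String(input).trim().toLowerCase();
}

// Small seeded PRNG (mulberry32-style) so a failing run can be replayed
// from its seed. Returns floats in [0, 1).
function seededRandom(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Builds a random string from characters that tend to break naive handling:
// whitespace, mixed case, quotes, separators, and a non-ASCII letter.
function randomString(rand, maxLen = 20) {
  const alphabet = ['a', 'Z', ' ', '\t', '@', '.', "'", ';', '-', 'é', '1'];
  const len = Math.floor(rand() * (maxLen + 1));
  let out = '';
  for (let i = 0; i < len; i++) {
    out += alphabet[Math.floor(rand() * alphabet.length)];
  }
  return out;
}

// Properties checked on every random input:
//   1. normalizing twice gives the same result as normalizing once
//   2. the result has no leading or trailing whitespace
function fuzzNormalizeEmail(seed, runs = 500) {
  const rand = seededRandom(seed);
  for (let i = 0; i < runs; i++) {
    const input = randomString(rand);
    const once = normalizeEmail(input);
    if (normalizeEmail(once) !== once) return { ok: false, input };
    if (once !== once.trim()) return { ok: false, input };
  }
  return { ok: true, runs };
}

console.log(fuzzNormalizeEmail(42));
```

On failure the function returns the offending input, and rerunning with the same seed regenerates the same sequence, which is what makes a randomised test debuggable.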

>> ~/auth-feature (godmode + level 3 protocol)
$ godmode "build login"
[*] login.js written
[*] login.test.js written (18 tests across 4 describes)
o-o running suite...
+-- 18 / 18 passing, coverage 96%
+-- race + fuzz + edge buckets all green
$ _
What "done" looks like under a level-3 testing rule.

Stop Reminding Claude to Write Tests

Godmode Lite includes a persistent testing mandate that runs automatically. Free, forever. Or upgrade for exhaustive coverage protocols.


💡 The Deeper Principle

Claude Code does what you ask, not what you need. Tests, error handling, security — these are things you need but rarely ask for. The fix: turn your real requirements into permanent rules so the right behavior happens automatically. That's what Godmode is built on.

>> 99
Asking is fragile. Rules are durable.

The reason "remind Claude to test" never sticks: every new chat starts at zero. A skill file persists across sessions, so the rule is there on every run. Move every standard you actually care about (tests, error handling, security review, accessibility) into a rule file and the floor moves with you, session after session.

[Figure: coverage across many runs: "remind every time" decays; "rule loaded every run" holds.]
Across many runs, the floor drifts down without rules and stays steady with them.
[Figure: 1. Build, 2. Inspect, 3. Fix and reinspect. The first inspection fails on exposed wiring, an off door frame, a missing smoke alarm, and a cracked window; after the fixes, the same checklist passes.]
Re-reading is the discipline.
