2026·05·06 // verification

Two-stage review, and why the order matters.

Check spec compliance first, code quality second. Reverse the order and you polish the wrong thing.

An agent hands me a diff at midnight. The variable names are good. The functions are the right size. The tests pass. I almost ship it. Then I read the brief again and notice the diff solves a slightly different problem than the one I asked about. Beautiful work. Wrong target.

That’s the trap, and it’s the most expensive class of mistake I make with agents. The code looks clean, so my brain rewards it, so I skim the rest, so I miss that the diff touched the wrong file or skipped the case I called out in the third paragraph. Code quality is loud. Spec drift is quiet. If you read in the order your brain prefers, you’ll catch the loud thing and ship the quiet one.

So the rule is fixed and unromantic. Spec compliance first, code quality second, always in that order. First pass, I’m not allowed to think about style, naming, or cleverness. I’m only asking four questions. Did it touch the files I asked it to touch, and only those? Did it cover the cases I called out by name? Did it leave alone the parts I told it not to touch? Does the change, run end to start, do the thing the brief says it should do? If any answer is no, the diff goes back before I read another line.

Second pass, only after the first one passes, I read for craft. Naming, shape, duplication, error handling, the small smells that turn into big ones in six months. This is the pass where I’m allowed to be picky, because by now I know I’m being picky about the right diff. If I find something ugly here, it’s a real finding worth fixing, with no louder failure hiding underneath it.

The artifact is a two-line review note in the ledger entry for the task. Line one. Spec pass, with the four questions answered yes or no. Line two. Quality pass, with whatever I want to say. If line one isn’t clean, line two doesn’t get written that round. The agent doesn’t get to defend the diff with elegance arguments while a spec failure sits underneath it. That rule has saved me more weekends than any test suite.

The whole thing costs about two extra minutes per review. Two minutes to re-read the brief before the diff, two minutes I used to skip because I was tired and the code looked nice and I wanted to go to bed. The trade is laughable in hindsight. Two minutes against a rollback, an apology, a Monday spent unwinding a Friday-night merge. I’ll take the two minutes.

The why under the why is about what agents are good at versus what they’re good for. They’re very good at producing clean code. They’re only sometimes good at producing the right code, and the gap between those two is invisible if you read in the wrong order. The two-stage review is a small, honest accounting of that gap. It assumes the agent built something beautiful and asks, before anything else, whether it’s the beautiful thing I asked for.

If you only have time for one pass, run the spec one. Ugly code can ship and get cleaned up next week. The wrong feature, no matter how elegant, gets rolled back, and the rollback always lands on a day you didn’t plan for.

Spec first. Code second. Always in that order.