What is spec-driven development?

Spec-driven development is a way of building software in which the acceptance criteria (the product’s definition of done) are written before the code. Whether a task is done is decided by executing the code against those criteria, not by a passing build.

The name points at the order of operations: the spec comes first, the code second, and verification by execution last. The build passing is not the finish line. The criteria executing as written is.

Why it exists

A passing build tells you the code agrees with itself. It says nothing about whether the code does what you asked. Measured across 120 runs and four models, a raw one-line request shipped a genuine correctness bug in 40% of cases while the build stayed green. Handing the agent the acceptance criteria up front dropped that to 0%.

Figure: genuine-bug rate per model, raw request versus the same request with acceptance criteria up front. Every model drops to zero.

The model wasn’t the variable. The spec was.

How it works

1. Write the acceptance criteria for the task, in plain language, before any code. Each criterion is a checkable statement about behavior: what the code must do, on what input, with what result.
2. Implement the change (you, or a coding agent).
3. Gate on execution. Re-apply the change to a clean checkout, run the code, and assert every criterion. If one fails, the task isn’t done, no matter how green the build is.
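
The three steps above can be sketched as a small gate. This is a minimal illustration under stated assumptions, not any tool’s actual implementation: `Criterion`, `gate`, and the `slugify` function under test are hypothetical names invented for the example, and the clean-checkout step is left out to keep it short.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Criterion:
    """One checkable statement: what the code must do, on what input, with what result."""
    description: str
    check: Callable[[], bool]


def slugify(title: str) -> str:
    """Hypothetical implementation under test (step 2: written by you or an agent)."""
    return "-".join(title.lower().split())


# Step 1: acceptance criteria, written before the implementation.
CRITERIA: List[Criterion] = [
    Criterion("lowercases the title", lambda: slugify("Hello") == "hello"),
    Criterion("joins words with single hyphens", lambda: slugify("a  b") == "a-b"),
    Criterion("strips punctuation", lambda: slugify("Hi, there!") == "hi-there"),
]


def gate(criteria: List[Criterion]) -> Tuple[bool, List[str]]:
    """Step 3: execute every criterion; the task is done only if all of them pass."""
    failures: List[str] = []
    for criterion in criteria:
        try:
            passed = criterion.check()
        except Exception:
            # A criterion that crashes counts as a failure, not as a skip.
            passed = False
        if not passed:
            failures.append(criterion.description)
    return (not failures, failures)


done, failed = gate(CRITERIA)
print("done" if done else f"not done: {failed}")
```

Here `slugify` imports and runs without error, which is roughly what a green build tells you, yet the gate reports the task as not done because the punctuation criterion fails when executed.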

The criteria travel with the change and outlive the session, which is what keeps a system coherent instead of locally correct but globally incoherent.

Where it comes from (and why it isn’t TDD)

Spec-driven development is not TDD, and conflating the two is the fastest way to misread it. TDD is a developer rhythm (red, green, refactor) at the unit level: the developer writes a failing unit test to drive the design of a function. The author is the developer, the artifact is a unit test, the purpose is design.

Spec-driven development sits one layer up, and comes from a different lineage: behavior-driven development (BDD), acceptance-test-driven development, and Gojko Adzic’s Specification by Example. The artifact is the acceptance criteria of a user story, the product’s definition of done, ideally written in an executable form like Gherkin’s Given/When/Then. It fuses product and code: the criteria the product defines become the gate the code has to pass. The user story carries the intent, the acceptance criteria make it checkable, and the gate runs them.
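
To make “executable form” concrete, here is a sketch of a single Given/When/Then criterion written as a plain Python check. Everything in it is invented for illustration: the `Cart` class, the `SAVE10` coupon, and the 10% rule are assumptions, and a real project would more likely hand a Gherkin file to a BDD runner than hand-roll a function like this.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Cart:
    """Hypothetical domain object; prices are integer cents to avoid float rounding."""
    items: List[int] = field(default_factory=list)
    coupon: Optional[str] = None

    def add(self, price_cents: int) -> None:
        self.items.append(price_cents)

    def apply_coupon(self, code: str) -> None:
        self.coupon = code

    def total(self) -> int:
        subtotal = sum(self.items)
        # Assumed business rule for the example: SAVE10 takes 10% off the subtotal.
        if self.coupon == "SAVE10":
            return subtotal * 90 // 100
        return subtotal


def criterion_coupon_reduces_total() -> None:
    # Given a cart containing one item priced 50.00
    cart = Cart()
    cart.add(5000)
    # When the shopper applies the coupon SAVE10
    cart.apply_coupon("SAVE10")
    # Then the total is 45.00
    assert cart.total() == 4500, f"expected 4500 cents, got {cart.total()}"


criterion_coupon_reduces_total()
print("criterion passed")
```

The comments carry the product’s wording, and the assertion is what the gate executes. The criterion speaks about user-visible behavior (a total on a cart), not about the internals of any one function, which is the difference from a TDD-style unit test.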

What’s new in the AI era is who writes what. When an agent writes the code, the human’s job moves up to the specification, and the gate has to enforce it on the real diff, on every run, because the agent is non-deterministic and you didn’t watch every line. It isn’t a developer testing their own design. It’s the product’s intent, made executable, deciding whether the agent’s output counts as done.

Spec-driven vs. spec-gated

There’s a distinction worth naming, because it’s where most teams fail. Spec-driven describes the order: you write the criteria before the code. Spec-gated describes the power: nothing is done until it passes the criteria, executed, on every run.

A repository full of PRDs and acceptance criteria in Markdown that nobody runs is spec-driven in name only. The documents exist. They decide nothing. What changed the numbers in the benchmark was not writing the criteria but gating on them: re-running the code against them on every change, because the agent is non-deterministic and you did not watch it work.

So the sharper name for what actually works in the AI era is spec-gated development: a change isn’t done until it passes the spec, executed, every run. The acceptance criteria aren’t documentation. They’re the gate.
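
One way to picture the difference is a rule that treats an unexecuted criterion the same as a failed one. The sketch below is an assumption about how such a rule could look, not a description of any particular tool; `Status`, `is_done`, `exit_code`, and the criterion text are hypothetical names.

```python
from enum import Enum
from typing import Dict


class Status(Enum):
    NOT_RUN = "not run"
    PASSED = "passed"
    FAILED = "failed"


def is_done(results: Dict[str, Status]) -> bool:
    """A change is done only if at least one criterion exists and all were executed and passed."""
    return bool(results) and all(s is Status.PASSED for s in results.values())


def exit_code(results: Dict[str, Status]) -> int:
    """0 lets the change through, 1 blocks it (the usual process-exit convention)."""
    return 0 if is_done(results) else 1


# Criteria that exist as documents but were never executed: spec-driven in name only.
documented_only = {"rejects an empty email address": Status.NOT_RUN}
# The same criterion, executed on this run.
executed = {"rejects an empty email address": Status.PASSED}

print(exit_code(documented_only))  # 1
print(exit_code(executed))         # 0
```

Passing the result to `sys.exit` in a CI step would let the criteria, rather than the build status alone, decide whether the change goes through.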

Spec-gated: the acceptance criteria, executed on every run, are the gate that decides whether a change is done.

Spec-driven development vs. vibe coding

Vibe coding is the opposite: prompt, eyeball the result, ship if it looks right. It works until it doesn’t, and on a non-deterministic system you can’t tell which run you got without executing. Spec-driven development replaces “looks right” with “ran and passed.”

Where it pays off

The harder the task, the more it matters: even the strongest frontier model, run raw, ships a real bug on a complex feature one time in three, non-deterministically. Writing the criteria once and gating on execution is what makes the result trustworthy, and it lets a cheaper model match an expensive one.

Doing it by hand on every task is the tedious part, and it’s what PaellaDoc automates.

Where to go deeper

Spec-driven development is one practice inside a broader shift in how software gets built with AI agents. The pieces below go one level down from this definition.

Start with the artifact itself: why the spec is the contract between intent and implementation, and why that contract should be portable across agents and editors instead of trapped in one tool. If the ceremony feels heavy, there is a lightweight version that keeps the contract and cuts the process, and a way to add specs to an existing codebase without stopping the world.

A spec is only useful while it stays true. That is the problem of spec drift, the slow divergence between the contract and the code, and the case for living specs that update as the system changes. Before an agent builds, the cheapest gate is reviewing the spec rather than the diff; once a tool like Spec Kit has written one, the harder question is what spec-driven needs next.

FAQ

Is spec-driven development the same as TDD? No. TDD is a unit-level rhythm for a developer to design code. Spec-driven development is the BDD / acceptance-test-driven lineage: the product’s acceptance criteria, expressed executably (Gherkin’s Given/When/Then), gate whether the work is done. Different layer, different author, different purpose.

Does it slow you down? Writing criteria costs minutes. Shipping a green-but-wrong feature costs hours or days downstream. On the measured tasks it took the genuine-bug rate to zero.

Do I need a tool? No. You can do it by hand. A tool helps when you want the criteria and the execution gate applied to every task without having to remember.