Jun 13, 2026 · 7 min read

Spec-Driven Development: Rebuilding Legacy Systems for the Agentic Era

Most legacy modernization fails the same way: a team wraps an old system in a new API, drops a chatbot on top, and calls it AI-native. A few months later the demos still impress and nothing in production has actually changed. The system underneath is exactly as opaque as it was before — only now there's a language model gambling against logic it can't see.

The uncomfortable truth is that legacy systems were never built to be understood by machines. They were built to be operated by people who already knew how they worked. That assumption held for decades. It breaks the moment you ask software to reason about itself.

What the agentic era actually requires

For most of computing history, the audience for a system's behavior was human: developers reading code, operators running runbooks, analysts interpreting reports. Agents change the audience. When you want an AI system to triage a case, reconcile an account, or route an exception, it needs a faithful, machine-readable account of what the system does — not folklore, not tribal knowledge, not a 200-page PDF written three reorgs ago.

This is the real gap in legacy modernization. It isn't that the old system is slow or ugly. It's that its behavior is implicit — scattered across code, stored procedures, cron jobs, and the heads of three people who've been there long enough to remember why. Agents can't operate on implicit behavior. They need it made explicit.

Why rewrites usually make it worse

The instinct is to rewrite. The instinct is usually right and the execution is usually fatal. Big-bang rewrites fail because they treat the rewrite as a code problem when it's a knowledge problem. The hard part was never typing the new system; it was recovering what the old one actually did — including the edge cases nobody documented and the “bugs” that turned out to be load-bearing business rules.

Rebuild from an incomplete understanding and you ship a system that's clean, modern, and subtly wrong. The wrongness surfaces in production, months later, in exactly the cases that mattered most.

Spec-driven development

Spec-driven development inverts the order. Before you write the replacement, you extract the system's true behavior into a living specification — a precise, versioned account of what the system does, derived from the system itself: its code, its data, its observed behavior under real inputs. The spec, not the legacy code and not someone's memory, becomes the source of truth.

A spec in this sense is not a product requirements document. A PRD describes what someone wanted; a behavioral spec describes what the system actually does, edge cases included. It's the difference between an intention and a contract. You can test against a contract. You can rebuild against a contract. An agent can reason against a contract.

How it works in practice

The process has a shape. First, extract: mine the existing system — code paths, database states, real traffic — to reconstruct its behavior as explicit specifications. Second, validate: replay the spec against the live system and confirm it predicts the real outputs, including the awkward ones. A spec you haven't checked against reality is just a nicer-looking guess.

Then rebuild — incrementally. The spec lets you replace the system a slice at a time, strangler-fig style, with each new component verified against the same spec the old one satisfied. Nothing goes live until it provably matches the behavior you chose to keep. And you do choose: some legacy behavior is worth deleting, but now that's a decision, not an accident.

Finally, the spec stays. It doesn't get filed away when the project ends. It becomes the durable interface to the system — the thing humans read to understand it, the thing tests run against, and the thing agents consult to operate it safely.

Fig. 1 — Spec-driven modernization: extract the spec, then rebuild against it.

The spec is what makes a system agent-ready

This is the part that connects spec-driven development to the agentic era specifically. An agent operating against a system needs ground truth: which states are valid, which transitions are allowed, what must never happen. A living spec is exactly that ground truth, in a form a model can consume. Intelligence stops being a layer bolted on top and becomes part of the architecture — because the architecture now describes itself.

Fig. 2 — The living spec as one source of truth for humans, tests, and agents.

It also solves evaluation, which is where most production AI quietly dies. The same spec that defines correct behavior is the oracle you evaluate the AI against. You're no longer asking “does the output look reasonable?” You're asking “does it conform to the spec?” — a question you can answer automatically, at scale, every time the system changes.

Rewriting for a reason

None of this is rewriting for novelty. Rebuilding a working system because it's old is how you turn a stable liability into an unstable one. The reason to rebuild is specific: the next decade of software assumes systems that can be read, reasoned about, and operated by agents — and most legacy systems can't be, at any price, in their current form.

Spec-driven development is how you earn that capability without throwing away what the old system knew. You keep the institutional logic and shed the opacity. The system that comes out the other side isn't just modern — it's legible, to your team, to your tests, and to the agents that will increasingly do the operating.

That's what AI-native actually means. Not a model on top of a black box, but a system that can explain itself well enough that intelligence has something to stand on.