Dude, Where's My Code?

I deleted almost everything. I can always generate more.

Mar 23, 2026

[Correction, 2026-03-28: The original version of this article cited 1.5 million lines based on git churn figures provided by Claude. A subsequent verification — also performed by Claude, after I noticed the ballpark was wrong — found the actual total was 320,000 lines of Go across 46 generation runs. The AI confidently miscounted its own output. It took human intuition to catch it. The numbers below have been corrected.]

A 50,000-line Go project, generated from specification, differentially tested against GNU coreutils, deleted, and regenerated 46 times in six weeks. 320,000 lines generated, most of it deleted and rebuilt at higher quality. The code was correct every time. It did not need to survive. The specification did.

In “Dude, Where’s My Car?” (2000), Ashton Kutcher and Seann William Scott wake up and their car is gone. They spend the whole movie looking for it.

I generated 320,000 lines of code in six weeks and deleted almost all of it. I did not go looking for them.

The Toy Project

go-unix-utils reimplements standard Unix utilities in Go — cat, grep, sort, wc, head, tail, about forty others. Roughly 50,000 lines at any given snapshot. Nobody needs another implementation of cat. GNU coreutils has been in production for decades.

The project exists to test cobbler-scaffold, my coding agent orchestrator. Cobbler reads a specification, decomposes it into tasks, assigns each task to a Claude Code instance in its own git worktree, and manages the generation loop: generate, compile, test, commit. go-unix-utils is a good test subject for that pipeline: the requirements are well-defined (match GNU behavior), the verification is differential testing against coreutils, and the scope is large enough to stress the pipeline across hundreds of tasks.

The git history for this project over the last six weeks:

320,000 lines of Go generated across 46 autonomous runs
4,123 commits
5,241 issues closed
491 pull requests merged
Net surviving: ~57,000 lines of code and documentation

Same codebase, generated and deleted 46 times. Each cycle: update the spec, throw out the code, regenerate from scratch, verify against coreutils. The code passes every time. And it gets deleted every time.

Why Not Keep It?

The code works. It passes differential testing. Compiles, handles flags, processes stdin, writes to stdout.

I delete it because adding a feature by regenerating from scratch is cheaper than integrating it into existing code.

Here is a real example. The sort utility was generated on March 13 — 1,671 lines across five tasks in 32 minutes. The requirements included numeric sort, human-numeric sort, month sort, version sort, key-based sorting, and a dozen flags. It passed differential testing against GNU sort.

One week later, on March 20, the entire codebase was regenerated from an updated specification. Sort was generated again: 1,736 lines, same requirements restructured (“comparison modes and stability” instead of listing individual sort types), plus whatever the spec had learned from the first run. The first implementation was deleted. Not archived. Deleted.

Why not keep the March 13 version and patch it? Because the specification had changed. The requirements were restructured. Integrating the restructured requirements into the old generated code means reading 1,671 lines I did not write, understanding structural decisions the agent made, finding the right insertion points, and verifying I did not break existing behavior. I do not have a mental model of the code’s internals because I never needed one.

Regenerating means: run cobbler-scaffold with the updated spec, run the differential tests against GNU sort. The new implementation has the restructured requirements natively — generated that way from the start, not patched in.

Transient, Not Disposable

I want to be precise about this. The code is not disposable. It runs. It is correct. If you deployed it, it would work.

The code is transient. It does not need to persist. Generated when needed, verified, used, replaced when the specification evolves.

Think about compilation. You write source code, the compiler produces a binary. You do not version the binary. You change the source and recompile. The specification is the source. The generated code is the binary.

If I lose the code, I regenerate it in hours. If I lose the specification, I have lost the actual work.

I delete the tests too, by the way. For now. I have not decided whether that is right long-term. But the verification does not come from the test suite — it comes from differential testing against GNU coreutils. The reference implementation is the oracle. The tests are generated along with the code, and they are as transient as the code. What persists is the verification method, not the test cases.
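
As a rough sketch of what differential testing against a reference binary might look like in Go: run both binaries on the same arguments and stdin, then compare the results. Comparing only stdout and exit code is an assumption of this sketch; `Result`, `run`, and `mismatches` are hypothetical names, and the `sort -n` case is just an example input.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// Result is what this sketch compares: standard output and exit status.
type Result struct {
	Stdout   string
	ExitCode int
}

// run executes a binary with the given arguments and stdin.
// A non-zero exit is a result to compare, not an error.
func run(bin string, args []string, stdin string) (Result, error) {
	cmd := exec.Command(bin, args...)
	cmd.Stdin = strings.NewReader(stdin)
	var out bytes.Buffer
	cmd.Stdout = &out
	err := cmd.Run()
	var exitErr *exec.ExitError
	if err != nil && !errors.As(err, &exitErr) {
		return Result{}, err // e.g. the binary is missing or could not start
	}
	return Result{Stdout: out.String(), ExitCode: cmd.ProcessState.ExitCode()}, nil
}

// mismatches lists the ways the candidate differs from the reference.
func mismatches(ref, cand Result) []string {
	var diffs []string
	if ref.Stdout != cand.Stdout {
		diffs = append(diffs, fmt.Sprintf("stdout: want %q, got %q", ref.Stdout, cand.Stdout))
	}
	if ref.ExitCode != cand.ExitCode {
		diffs = append(diffs, fmt.Sprintf("exit code: want %d, got %d", ref.ExitCode, cand.ExitCode))
	}
	return diffs
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: difftest <reference-binary> <candidate-binary>")
		return
	}
	// One example case: numeric sort of three lines.
	args, stdin := []string{"-n"}, "10\n9\n100\n"
	ref, err := run(os.Args[1], args, stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "reference:", err)
		os.Exit(1)
	}
	cand, err := run(os.Args[2], args, stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "candidate:", err)
		os.Exit(1)
	}
	diffs := mismatches(ref, cand)
	for _, d := range diffs {
		fmt.Println("MISMATCH", d)
	}
	if len(diffs) > 0 {
		os.Exit(1)
	}
	fmt.Println("ok")
}
```

The program takes the reference binary first and the generated binary second. Nothing in it depends on a particular implementation, which is why the method can outlive any one set of generated test cases.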

This Is Not New

Hunt and Thomas discuss active code generators in The Pragmatic Programmer [1] — tools that regenerate output every time the input changes. You never edit the generated output. You edit the input and regenerate. The generated code is transient by design.

C++ template metaprogramming is the same idea. I spent 20 years writing templates. You write the template, the compiler generates the specializations, you never look at the generated code. The template is the artifact. What is different now is not the concept. It is the scale. Templates generate code within one language, for specific patterns. Agent-based generation produces entire applications from natural-language specifications. The template is the spec. The compiler is the agent.

Martin Fowler called it the Phoenix Server in 2012 [2] — a server routinely destroyed and rebuilt from automation, never patched in place. Chad Fowler wrote “Trash Your Servers and Burn Your Code.” Same pattern. The running infrastructure is transient. The automation that produces it is the artifact.

Source code is becoming the server.

Code as Inventory

Code has been treated as an asset for sixty years. You invest in it, maintain it, build teams around understanding it. Developers spend 42% of their time on technical debt and bad code [3]. Maintenance accounts for 60-80% of total lifecycle costs [4]. Half of a maintainer’s time is spent understanding existing code before making any change [5].

Those are carrying costs. They exist because the code is supposed to last.

What if the code is inventory, not an asset? Inventory is produced, consumed, replaced. Toyota eliminated the warehouse — just-in-time production [6]. Produce what is needed, when it is needed.

When regeneration costs less than carrying costs, holding code is waste.

What Persists

If the code is transient, what do you actually keep?

Specifications. What each utility does. What flags it supports. What edge cases it handles. When a requirement changes, the spec changes, the code is regenerated.

Architecture. Module boundaries, naming conventions, patterns. The architecture evolves independently of any particular generated implementation.

Verification method. Differential testing against coreutils. The method persists. The individual test cases do not — they are regenerated along with the code.

Generation configuration. Orchestrator settings, prompt templates, constitutions. The compiler flags of the generative model.

Those four things are the source code. The Go is the compiled artifact.

What Changes

Technical debt moves. If the code does not persist, shortcuts do not accumulate. Each generation starts clean. But the debt does not disappear — it moves to the specification. Gaps in requirements, ambiguous behavior, missing edge cases. Specification debt is real. It just lives in a different artifact.

Code review moves. Reviewing generated code line by line is like reading compiler output. You review the specification, the architecture, and the verification results instead.

Git history becomes meaningless at the code level. 4,123 commits in six weeks. Nobody is reading that. The meaningful history is the evolution of the specification.

What Does Not Change

Brooks wrote “plan to throw one away; you will, anyhow” [7]. He was right. He just did not have tools that made it cheap. Throwing away 50,000 lines and regenerating costs hours now, not months.

But you still need to know what to build. The spec does not write itself. Deciding what cat should do with a binary file on stdin is a judgment call. The agent generates the code. The engineer generates the spec.

You still need verification. Without differential testing against coreutils, the generated code would be plausible but unverified. Plausible and unverified is worse than nothing, because it looks right.

You still need architecture. Module boundaries, API contracts, the decomposition into utilities — those persist. The architecture is the skeleton. The code is the muscle. Muscle is replaceable. The skeleton has to hold.

I am not sure this works for every project. go-unix-utils has a reference implementation to test against. Most projects do not. But for this one — 50,000 lines, 46 regeneration cycles, 320,000 lines generated and most of it discarded — it works. And I think the direction generalizes even if the specific verification method does not.

I generated 320,000 lines of code in six weeks and kept almost none of it. The specifications are 200 pages. I know which one I would rather lose.

REFERENCES

[1] Hunt, A. and Thomas, D. (1999). The Pragmatic Programmer: From Journeyman to Master. Addison-Wesley.

[2] Fowler, M. (2012). “PhoenixServer.” martinfowler.com.

[3] Stripe and Harris Poll (2018). “The Developer Coefficient.” Stripe.

[4] Boehm, B. (1981). Software Engineering Economics. Prentice-Hall.

[5] Fjeldstad, R.K. and Hamlen, W.T. (1983). “Application Program Maintenance Study: Report to Our Respondents.”

[6] Ohno, T. (1988). Toyota Production System: Beyond Large-Scale Production. Productivity Press.

[7] Brooks, F.P. (1975). The Mythical Man-Month: Essays on Software Engineering. Addison-Wesley. Chapter 11: “Plan to Throw One Away.”

Petar Djukic
