The Harness Was Already the Methodology

Google's new SDLC playbook names the discipline: an agent is a model plus a harness — roughly ten percent model, ninety percent harness, gated on evals rather than demos. But the ninety percent is not scaffolding around intelligence. It is methodology made executable — where intent, verification, and human judgment are encoded. The name arrived this spring. The harness has been running in production for twelve months.

Michael Shatny · June 25, 2026 · 7 min read

The Name Has a Document Now

In May 2026, Google released a playbook on AI-driven software development — The New SDLC With Vibe Coding, by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, as part of Kaggle's five-day AI Agents course. It is the cleanest packaging yet of where the field is converging. Three claims sit at its center.

AI coding is a spectrum, not a switch — casual prompting on one end, fully engineered systems with specs, evals, and CI gates on the other, and you choose the point that fits the job. An agent is a model plus a harness, and the rough split is ten percent model, ninety percent harness. And the rule that decides whether any of it is trustworthy: set the bar at the eval, not the demo.

The sentence underneath all three is the one that matters. The harness matters more than the model. A discipline that had been practiced without a name now has one: harness engineering.

What the Ninety Percent Actually Is

The temptation is to read the harness as scaffolding — plumbing wrapped around the part that does the thinking. That reading is backwards.

The harness is the rule files that say what the system is for. The tools and MCP servers that say what it is allowed to touch. The sandboxes that say where it may act. The orchestration that says how work moves between steps. The observability that says what gets recorded. The evals that say what standard it is held to. Every one of those is a place where intent is encoded.

A model is a capability. A harness is a commitment — to what the system is for, what it may do, how it is checked, and who decides. The model is the interchangeable part. Swap it out and the system still knows its purpose, its boundaries, and its gates. The harness is the part that carries the meaning.
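A minimal sketch of that distinction, under stated assumptions: the Harness class, its field names, and the two toy model functions below are hypothetical illustrations, not the playbook's API or any shipped system. The point is only that purpose, boundaries, and gates live in the harness, so either model can be dropped in without the system losing them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption for illustration: a "model" is any callable from prompt to text.
Model = Callable[[str], str]


@dataclass
class Harness:
    purpose: str                        # rule file: what the system is for
    allowed_tools: List[str]            # tools: what it is allowed to touch
    evals: List[Callable[[str], bool]]  # evals: the standard it is held to
    log: List[Dict[str, str]] = field(default_factory=list)  # observability

    def run(self, model: Model, task: str) -> str:
        prompt = (
            f"{self.purpose}\n"
            f"Allowed tools: {', '.join(self.allowed_tools)}\n"
            f"Task: {task}"
        )
        output = model(prompt)
        passed = all(check(output) for check in self.evals)
        # Every run is recorded, whether or not it passes the gate.
        self.log.append({"task": task, "output": output, "passed": str(passed)})
        if not passed:
            raise ValueError("output rejected by eval gate")
        return output


# Two interchangeable stand-in models (hypothetical).
def model_a(prompt: str) -> str:
    return "SUMMARY: release notes drafted"


def model_b(prompt: str) -> str:
    return "SUMMARY: notes drafted by a different model"


harness = Harness(
    purpose="Draft release notes. Never invent version numbers.",
    allowed_tools=["read_changelog"],
    evals=[lambda out: out.startswith("SUMMARY:")],
)

# Swap the model; the purpose, the boundaries, and the gate stay the same.
for model in (model_a, model_b):
    print(harness.run(model, "summarize this week's changes"))
```

Nothing in the loop changes when the model does. Everything that defines acceptable behavior is declared once, in the harness.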

Which is another way of saying: the ninety percent is not infrastructure around the methodology. It is the methodology — made executable.

Two Roads to the Same Claim

The playbook arrives at the harness from the model side. As frontier models converge in raw capability, the differentiator moves outward, to everything that surrounds them. Start with the model, and you discover the harness.

Methodology-as-Infrastructure and Intent-as-Infrastructure (DOI 10.5281/zenodo.20681523) arrived at the same place from the opposite direction. Start with intent. Bound it into a vocabulary that encodes the what. Let an intelligent compiler generate the how. Record the result in sovereign artifacts, and design the human into the moment where judgment belongs. Start with the methodology, and you build the harness.

Dimension	Harness engineering	Intent-as-Infrastructure
Starts from	The model is commoditized	Intent should be declared
The model is	Ten percent of the system	The how, generated on demand
The value is in	The harness around it	The vocabulary and the artifacts
The human is	The reviewer at the gate	A first-class architectural primitive
Trust comes from	Evals, not demos	Sovereign, auditable artifacts

The two columns are the same claim, observed from two sides — and they were written down at almost the same moment. Methodology-as-Infrastructure was published in April 2026; Google's playbook followed in May. Neither cited the other. This is convergence, not priority: the field and the methodology arrived at the same place in the same season, from opposite directions. The value is not in the model. It is in the structure that surrounds it. Harness engineering is Intent-as-Infrastructure, seen from the model side of the room.

A Harness, in Production, for Twelve Months

None of the following was built to be “a harness.” Each piece solved an immediate problem — context lost between agents, a number that needed verifying, a page that had to prove it was not altered after compilation. Only when the playbook named the discipline did the shape become obvious. The pieces were already a harness. The name was not there yet.

Harness component	What was shipped
Instructions / rule files	phoenix-runtime mission brief — what each agent is for
Tools / MCP servers	CAL runtime; wake, chirp, and cal-workflow MCP servers
Orchestration	phoenix-runtime — seven agents, context inherited at every gate
Eval gates	phoenix gate --approve; the A-06 Validator; the brief audit
Observability / provenance	RECALL — every artifact embeds its source and its author
Memory	Wake — temporal memory that survives between sessions
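The provenance row can be illustrated with a small sketch. This is not RECALL's implementation, which the article does not describe; the seal and verify functions, the field names, and the sample values are all assumptions. It shows one common way an artifact can carry its source and author and still reveal whether it was altered after it was produced.

```python
import hashlib
import json


def _digest(record: dict) -> str:
    # Hash a canonical JSON encoding so key order cannot change the result.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def seal(content: str, source: str, author: str) -> dict:
    """Wrap content with its provenance and a digest covering all three fields."""
    record = {"content": content, "source": source, "author": author}
    return {**record, "sha256": _digest(record)}


def verify(artifact: dict) -> bool:
    """Recompute the digest and compare it with the embedded one."""
    record = {key: artifact[key] for key in ("content", "source", "author")}
    return _digest(record) == artifact["sha256"]


# Hypothetical sample values.
page = seal("Compiled page body", source="brief-2026-06.md", author="agent-03")
tampered = {**page, "content": "Edited after compilation"}

print(verify(page))      # True
print(verify(tampered))  # False
```

A bare digest only detects accidental or careless edits, since anyone who can change the content can also recompute the hash. A production system would more likely sign the digest or store it somewhere the editor cannot reach.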

This is the recurring shape of the work: the pipeline that was already a traffic system, the protocol that was discovered rather than designed. The structure gets built to solve real problems, one at a time. The pattern is there before the word for it is. Harness engineering is the latest word to arrive late.

Set the Bar at the Eval, Not the Demo

Of the three claims, this one carries the most weight, because it is where a harness earns trust. A demo shows that a system can succeed once. A passing eval suite, with an explicit rubric, shows that it succeeds reliably across the cases the suite covers. You gate a shared-workflow agent on eval coverage the way you gate a service on test coverage.
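The difference between a demo and an eval gate can be sketched in a few lines. The names here (pass_rate, gate, toy_agent) and the 0.9 threshold are assumptions chosen for illustration, not figures from the playbook. The toy agent succeeds on the single case a demo would show and still fails the gate.

```python
from typing import Callable, List, Tuple

# Each case pairs an input with a rubric check on the output.
Case = Tuple[str, Callable[[str], bool]]


def pass_rate(agent: Callable[[str], str], cases: List[Case]) -> float:
    """Fraction of cases whose output satisfies its rubric check."""
    passed = sum(1 for prompt, check in cases if check(agent(prompt)))
    return passed / len(cases)


def gate(agent: Callable[[str], str], cases: List[Case], threshold: float = 0.9) -> bool:
    """Admit the agent only if it clears the threshold across the whole suite."""
    return pass_rate(agent, cases) >= threshold


def toy_agent(prompt: str) -> str:
    # Hypothetical agent: handles short inputs, silently fails on longer ones.
    return prompt.upper() if len(prompt) < 10 else ""


cases: List[Case] = [
    ("ship it", lambda out: out == "SHIP IT"),
    ("hello", lambda out: out == "HELLO"),
    ("a much longer request", lambda out: out == "A MUCH LONGER REQUEST"),
    ("ok", lambda out: out == "OK"),
]

# The "demo" is one hand-picked case; the gate looks at all of them.
demo_ok = cases[0][1](toy_agent(cases[0][0]))
print(demo_ok, pass_rate(toy_agent, cases), gate(toy_agent, cases))
# prints: True 0.75 False
```

The threshold plays the same role as a coverage floor in CI: the number itself is a team decision, but once set, it is what decides admission.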

A small example from this week. A generated brief carried a fabricated citation: a confident, plausible number attached to a source that did not contain it. The demo would have shipped it; it rendered perfectly. The eval gate, an audit that maps every field and checks every claim against its source, caught it before it was published. The model produced the error. The harness refused to let it leave.
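An audit of this kind might look roughly like the sketch below. It is not the actual brief audit, whose internals the article does not give. The Claim type, the audit function, and the sample data are hypothetical, and a literal substring match stands in for whatever comparison a real audit would use.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Claim:
    name: str       # which field of the brief the claim fills
    value: str      # the figure or phrase being asserted
    source_id: str  # the source the brief cites for it


def audit(claims: List[Claim], sources: Dict[str, str]) -> List[Claim]:
    """Return every claim whose value does not appear in its cited source."""
    failures = []
    for claim in claims:
        text = sources.get(claim.source_id, "")  # a missing source fails too
        if claim.value not in text:
            failures.append(claim)
    return failures


# Hypothetical data: one supported claim, one plausible fabrication.
sources = {"report-a": "Adoption rose to 41 percent over the year."}
claims = [
    Claim("adoption", "41 percent", "report-a"),
    Claim("retention", "87 percent", "report-a"),
]

failures = audit(claims, sources)
for claim in failures:
    print(f"unsupported: {claim.name} = {claim.value!r} (cited {claim.source_id})")
# prints: unsupported: retention = '87 percent' (cited report-a)
```

The fabricated figure is well formed and would render without complaint. It fails only because the check looks at the source rather than at the output.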

That is the eval gate doing the one thing only it can do. It is also the moment the human is designed in. The Synthesis Gate (DOI 10.5281/zenodo.20684283) named the architectural point where ambiguity enters and interpreted meaning comes out — where judgment belongs to a human and not a script. Seen from the verification side, the eval gate is the same node. The model generates. The harness decides what is allowed to leave. The human decides at the gate that the harness was built to create.
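The three roles in that last sequence can be sketched as a single release function. This is an illustration under assumptions, not phoenix-runtime's gate: the release function, the check names, and the reviewer stand-in are hypothetical. Automated checks run first, and the final decision is a separate step that belongs to a person.

```python
from typing import Callable, List, Tuple

# Each check pairs a name with an automated test on the draft.
Check = Tuple[str, Callable[[str], bool]]


def release(draft: str, checks: List[Check], approve: Callable[[str], bool]) -> str:
    """Run automated checks first, then hand the final decision to a person."""
    failed = [name for name, check in checks if not check(draft)]
    if failed:
        return "blocked: " + ", ".join(failed)
    if not approve(draft):
        return "held by reviewer"
    return "released"


checks: List[Check] = [
    ("non_empty", lambda d: bool(d.strip())),
    ("has_source", lambda d: "Source:" in d),
]


def reviewer(draft: str) -> bool:
    # Stand-in for a real person; in practice this would ask a human to decide.
    return "TODO" not in draft


print(release("Findings. Source: report-a", checks, reviewer))      # released
print(release("Findings with no citation", checks, reviewer))       # blocked: has_source
print(release("TODO tidy up. Source: report-a", checks, reviewer))  # held by reviewer
```

Keeping approve as a separate argument is the design choice that matters: the scripted checks can narrow what reaches the reviewer, but they cannot substitute for the decision.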

The Name Does Not Validate the Work. It Makes It Legible.

Twelve months of building. The name for the discipline arrived this spring. The work did not get better when Google named it: phoenix-runtime gated the same way yesterday, RECALL embedded the same provenance, CAL ran the same deterministic checks. What changed is not the work. It is the legibility of the work.

Naming is not a reward for being early. It is infrastructure — the cheapest, highest-leverage infrastructure there is. A named thing can be searched for, cited, taught, compared, and recognized. An unnamed thing, however well built, has to be re-explained from scratch every time. Grace Hopper did not just build the first compiler. She named it, and the name is what let the rest of the industry pick it up.

The harness was always the methodology. It is now a discipline the whole field has a word for. The work is the same as it was last week. Today it is findable — and that is the only thing that had been missing.

Source playbook — The New SDLC With Vibe Coding (Addy Osmani, Shubham Saboo, Sokratis Kartakis; Google / Kaggle, May 2026), summarized at addyosmani.com/blog/new-sdlc-vibe-coding. Preceding paradigm: Intent-as-Infrastructure doi.org/10.5281/zenodo.20681523; the human-judgment node: the Synthesis Gate doi.org/10.5281/zenodo.20684283.

