Before the Agent Runs

AI-first delivery is fast. Dangerously fast. The instinct is to add governance after the artifacts exist — as review, as audit, as remediation. Governance-Driven Design proposes the opposite: formalize intent before the machine runs. A pre-execution discipline that completes what TDD, BDD, and DDD started — moving the definition of correctness upstream to the governance layer.

Michael Shatny·June 26, 2026·9 min read

A Migration Running for Twenty Years

A slow migration has been under way in software development. It was not tied to any one release or announced by any team, but it is visible in what three disciplines have each done: move the definition of correctness upstream.

Test-Driven Development moved correctness upstream from debugging to implementation. Behaviour-Driven Development moved it upstream from implementation to behaviour specification. Domain-Driven Design moved it upstream from feature specifications to the domain model itself. Each discipline said the same thing in a different register: the later you discover that an assumption was wrong, the more it costs.

Discipline	Year	Where correctness is defined	What it governs
TDD	2003	At implementation	The function
BDD	2006	At behaviour specification	The feature
DDD	2003	At the domain model	The bounded context
GDD	2026	At the governance layer	The assumption set

GDD is not a replacement for TDD, BDD, or DDD. It is their upstream completion — the discipline that defines correctness at the layer that precedes all three. A governed constraint set is what exists before the domain model is drawn, before the behaviour specification is written, before the first test is named. GDD says: before any of that runs, there are assumptions. Formalize them. Stress them. Surface what you cannot resolve. Those unresolved items are not failure. They are the most important output the pre-build session produces.

A note on DDD: Domain-Driven Design addresses domain model integrity and bounded context separation — a different layer from GDD. DDD asks how to represent the domain accurately once it is understood. GDD asks whether the domain is understood accurately enough to represent at all. They are complementary, not competing. The tension DDD practitioners recognize in “getting the domain model wrong” is exactly the problem GDD operates on, one layer upstream.

The Governance-Last Problem

AI-first delivery is fast. Agents generate code, schemas, database migrations, API contracts, and compliance logic at a pace that outstrips the capacity of any review process to catch assumptions before they become behavior.

The instinct is governance-last: produce the artifacts, then review them. Catch problems in code review. Fix them in testing. Audit after deployment if necessary. This is the pattern software development has run on for decades, and it works reasonably well when artifacts are small, the domain is well-understood, and the cost of an incorrect assumption is manageable.

None of those conditions holds in AI-first delivery. Agents generate large artifacts quickly. The domain is often only partially understood, which is frequently why an AI agent is being used rather than a developer who could hold the context manually. And the cost of an incorrect assumption in compliance logic, lending decisions, or multi-agent procurement workflows is not manageable: it is a regulatory notification, a production rollback, or a financial exposure that nobody saw coming because it was never in the spec.

Governance-last is not wrong. It is late. GDD proposes governance-first: formalize the assumptions before the agent runs, stress-test them until the ones that cannot be resolved surface cleanly, and make the human decisions that need to be made before the machine begins. The cycle that does this is the Iterative Constraint Refinement cycle.

The ICR Cycle

The Iterative Constraint Refinement (ICR) cycle runs six steps. Its exit condition is not “all constraints resolved.” It is “all residue explicitly acknowledged with an owner.” The cycle is designed to surface what it cannot resolve — not to eliminate uncertainty, but to name it and assign it before execution begins.

FORMALIZE

State every assumption the system depends on. Not requirements — assumptions. Things believed to be true, each written as a numbered statement. The discipline is writing down what everyone in the room is already assuming but has not said out loud.

STRESS

For each assumption: what would break it? What domain knowledge, edge case, regulatory constraint, or technical condition would make this assumption false? The stress test does not need to prove the assumption wrong. It needs to enumerate the conditions under which it could be.
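One way to record the output of FORMALIZE and STRESS is sketched below. The `Assumption` record, its `breakers` field, and the `unstressed` helper are hypothetical names chosen for illustration, not part of any published GDD artifact. The two statements are borrowed from the loan example later in this article, and the breaking condition attached to the first is invented for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Assumption:
    """One numbered statement from FORMALIZE (hypothetical structure)."""
    number: int
    statement: str
    # Conditions enumerated during STRESS that would make the statement false.
    breakers: list[str] = field(default_factory=list)


def unstressed(assumptions: list[Assumption]) -> list[int]:
    """Return the numbers of assumptions with no recorded breaking condition."""
    return [a.number for a in assumptions if not a.breakers]


assumptions = [
    Assumption(
        1,
        "A loan is approved when DTI ratio is below 43%.",
        # Illustrative breaker only; not taken from a real stress session.
        breakers=["The threshold differs by product or applicant type"],
    ),
    Assumption(3, "Employment is verified by a flag in the APPLICANT table."),
]

print(unstressed(assumptions))  # [3]
```

An assumption with an empty `breakers` list is not necessarily safe. In this sketch it only means nobody has tried to break it yet.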

CHECK

Run the assumption set against itself. Do any assumptions conflict? Is there a statement that is true only if another is false? Conflicts are the highest-signal output of the CHECK step — they mean the team was holding two incompatible beliefs simultaneously.

SURFACE

Collect what the cycle cannot resolve. Items that depend on regulatory expertise, stakeholder decisions, domain knowledge not in the room, or data sources not yet accessible. This is the unresolvable residue — the most valuable output of the cycle.

GATE

Record the human decisions that must be made before execution begins. Each residue item has an owner and a deadline. The cycle does not proceed, and the agent does not run, until gate decisions have been recorded.

CONVERGE

Update the governed constraint set with gate decisions. Mark resolved items [RESOLVED]. Close the cycle record. The output is a constraint set the agent can execute against — not ambiguity-free, but one where every remaining ambiguity is named and owned.
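The exit condition can be expressed as a small check. This is a minimal sketch, assuming a hypothetical `ResidueItem` record; the field names are illustrative and the dates are placeholders. It treats an item as acknowledged once it has both an owner and a deadline, following the GATE step, and it deliberately does not require the item to be resolved.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ResidueItem:
    """An unresolved item collected in SURFACE (hypothetical structure)."""
    question: str
    owner: Optional[str] = None
    deadline: Optional[date] = None


def unacknowledged(items):
    """Items that block cycle exit: no named owner or no deadline."""
    return [i for i in items if not i.owner or i.deadline is None]


def cycle_can_exit(items):
    """True when every residue item is acknowledged, resolved or not."""
    return not unacknowledged(items)


residue = [
    ResidueItem(
        "Is the 43% DTI threshold regulatory or policy?",
        owner="Chief Compliance Officer",
        deadline=date(2026, 7, 1),  # placeholder date
    ),
    ResidueItem("Are reason codes current?"),
]
print(cycle_can_exit(residue))  # False: the second item has no owner yet

residue[1].owner = "Operations + Compliance"
residue[1].deadline = date(2026, 7, 8)  # placeholder date
print(cycle_can_exit(residue))  # True: still open, but named and owned
```

The check passes with open questions remaining, which is the point: the cycle closes on ownership, not on certainty.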

What the Statuses Mean

Every constraint in the governed constraint set carries a status. Each status is a directive — a specific instruction to the agent executing against the set:

Status	Directive	Why it matters
[RESOLVED]	Implement exactly as stated. Do not reinterpret for elegance.	This constraint exists because a human decided it. Reinterpreting it silently is the failure mode GDD exists to prevent.
[ASSUMED]	Implement as stated, flag the assumption in code.	The assumption may be correct — it has not been verified. The flag is the trace that makes the assumption visible in review.
[UNKNOWN]	Do not implement logic that depends on this item. Surface it.	Implementing an unknown assumption is the same as resolving it — silently, without authority.
[CONFLICT]	Do not implement either side. Surface both sides for resolution.	Implementing one side of a conflict is worse than stopping. It buries the conflict in the codebase.

The statuses make the governed constraint set operational across two contexts: a human preparing to build, and an AI agent executing the build. A CLAUDE.md template can point an agent at the .gdd/ folder so it reads the constraint set before generating anything. The statuses become runtime directives: proceed, flag, stop, surface.
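The status table reads naturally as a lookup from status to agent behaviour. The sketch below is one possible encoding, not the author's implementation: `Status`, `Directive`, `may_generate`, and `must_surface` are hypothetical names, and the three boolean fields are an interpretation of the directive column.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    RESOLVED = "[RESOLVED]"
    ASSUMED = "[ASSUMED]"
    UNKNOWN = "[UNKNOWN]"
    CONFLICT = "[CONFLICT]"


@dataclass(frozen=True)
class Directive:
    implement: bool     # may the agent generate logic that depends on the item?
    flag_in_code: bool  # must the assumption be flagged in the generated code?
    surface: bool       # must the item be raised to a human?


# One row per status, mirroring the directive column of the table.
DIRECTIVES = {
    Status.RESOLVED: Directive(implement=True, flag_in_code=False, surface=False),
    Status.ASSUMED: Directive(implement=True, flag_in_code=True, surface=False),
    Status.UNKNOWN: Directive(implement=False, flag_in_code=False, surface=True),
    Status.CONFLICT: Directive(implement=False, flag_in_code=False, surface=True),
}


def may_generate(statuses):
    """True only if every constraint the output depends on allows implementation."""
    return all(DIRECTIVES[s].implement for s in statuses)


def must_surface(statuses):
    """Statuses among the dependencies that require a human before proceeding."""
    return [s for s in statuses if DIRECTIVES[s].surface]


deps = [Status.RESOLVED, Status.ASSUMED]
print(may_generate(deps))                      # True
print(may_generate(deps + [Status.CONFLICT]))  # False
```

A single blocking dependency is enough to stop generation, which matches the table's rule that implementing one side of a conflict is worse than stopping.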

What One Example Revealed

Illustrative example. The domain details are accurate; the engagement is constructed to be representative.

A bank preparing to run a Phoenix legacy modernization pipeline against a 40-year-old COBOL loan origination system. 180,000 lines. The original authors have retired. The instinct: start Phoenix immediately. Extract the business logic. The code is the source of truth.

GDD says: not yet.

Three of the eight formalized assumptions produce immediate conflicts when the CHECK step runs.

FORMALIZE — four key statements

[1]  We believe a loan is approved when DTI ratio is below 43%.
[3]  We believe employment is verified by a flag in the APPLICANT table.
[4]  We believe the approval path handles self-employed applicants.
[6]  We believe the 43% DTI threshold is a fixed regulatory requirement.

CHECK — conflicts surface

[CONFLICT A]  Statement [6] assumes a fixed regulatory requirement.
              Statement [1] uses 43% as if it is policy-adjustable.
              These cannot both be true: the governance implications differ.

[CONFLICT B]  Statement [3] assumes the flag covers all applicants.
              Statement [4] assumes self-employed are handled.
              If self-employed have no flag path, [4] cannot be true.
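A direct contradiction such as Conflict A can be found mechanically if each statement declares what it asserts about a shared topic. The sketch below assumes that encoding; the `claims` structure, the topic names, and `find_conflicts` are hypothetical. A conditional conflict such as Conflict B does not fit this shape and would still need a human reading.

```python
from itertools import combinations


def find_conflicts(claims):
    """Return (statement_a, statement_b, topic) for every topic two statements
    assert differently.

    claims: {statement_number: {topic: asserted_value}}
    """
    conflicts = []
    for (a, claims_a), (b, claims_b) in combinations(sorted(claims.items()), 2):
        # Only topics both statements speak to can contradict each other.
        for topic in sorted(claims_a.keys() & claims_b.keys()):
            if claims_a[topic] != claims_b[topic]:
                conflicts.append((a, b, topic))
    return conflicts


# Hypothetical encoding of statements [1], [3], and [6] from the example.
claims = {
    1: {"dti_threshold_origin": "policy-adjustable"},
    3: {"employment_verification": "flag in APPLICANT table"},
    6: {"dti_threshold_origin": "fixed regulatory requirement"},
}

print(find_conflicts(claims))  # [(1, 6, 'dti_threshold_origin')]
```

The check is only as good as the encoding: statement [1] never says "policy-adjustable" in words, so someone has to notice the implied claim and write it down before the comparison can catch it.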

SURFACE — unresolvable residue

ITEM 1: Is the 43% DTI threshold regulatory or policy?
  Cannot resolve from: COBOL code (encodes the number, not its origin)
  Decision required: Chief Compliance Officer
  Consequence if wrong: regulatory breach vs. unnecessary constraint

ITEM 2: What is the self-employed applicant path?
  Cannot resolve from: COBOL code (branch reaches a subroutine
                       referencing a 1987 product that no longer exists)
  Decision required: Head of Loan Origination + Compliance

ITEM 3: Are reason codes current?
  Cannot resolve from: COBOL code (hardcoded integers, not mapped)
  Decision required: Operations + Compliance

The GDD cycle took four hours across two working sessions with five people. Three gate decisions were made before Phoenix ran a single agent pass. The Chief Compliance Officer (CCO) confirmed the 43% threshold was regulatory: immutable, not policy-adjustable. The Head of Loan Origination scoped self-employed applicants out of the initial migration. Operations took ownership of mapping the reason codes to current regulations.

None of those decisions were visible in the COBOL code. The code encoded the behavior. The GDD cycle encoded the intent behind the behavior — and surfaced the items where that intent was either missing or contradictory. The alternative — discovering the self-employed branch ambiguity in production — would have triggered a regulatory notification and a forensic investigation of how many applicants were incorrectly processed during the migration window.

The Most Valuable Output

A GDD cycle produces something that looks, at first glance, like a failure list. Conflicts found. Items that could not be resolved. Decisions that need to be made by people who are not in the room. It is tempting to interpret this as evidence that the team did not know enough to proceed.

The opposite is true. The constraints you could not resolve are precisely the constraints that were most likely to produce production incidents, regulatory notifications, or silent logic failures. The COBOL self-employed branch was not a bug in the usual sense. It was a 40-year-old decision encoded in a subroutine referencing a product that no longer existed. There was no test for it. There was no documentation of it. There was no Phoenix agent smart enough to know it should not be replicated.

The unresolvable residue is not a symptom of insufficient preparation. It is the evidence that the preparation was real. An assumption set that produces no residue either contains assumptions that are genuinely and completely resolved — which is rare in any nontrivial domain — or it contains assumptions that were never stressed hard enough to surface what they depended on.

The exit condition for the ICR cycle is not “all constraints resolved.” It is “all residue explicitly acknowledged with an owner.” That formulation is precise. An acknowledged unresolved item with a named owner and a deadline is a governed risk. An unacknowledged assumption is an ungoverned one. The difference, in production, can be a regulatory breach.

Making Agents Governance-Aware

GDD produces a governed constraint set. Making an AI agent use it is a one-step operation: drop a CLAUDE.md file in the project root and point it at the .gdd/ folder.

The template instructs the agent to read three files before generating anything:

CLAUDE.md — governance contract (excerpt)

Before writing any implementation code, any schema,
any configuration, or any test — read the GDD layer:

  1. Read .gdd/CONSTRAINTS.md   — the governed constraint set
  2. Read .gdd/RESIDUE.md       — open items requiring human decision
  3. Read .gdd/GATES.md         — human decisions already made

→ Does any [UNKNOWN] or [CONFLICT] item in RESIDUE.md
  affect what I am about to generate?

  If YES  — surface it to the human before proceeding.
            Do not generate code that depends on an unresolved item.

  If NO   — proceed, and note which constraints govern your output.

The .gdd/ folder holds three files and one directory: CONSTRAINTS.md (the primary artifact), RESIDUE.md (open items), GATES.md (recorded gate decisions), and a CYCLES/ directory for append-only ICR cycle history. The agent reads the constraint set on every pass. A [RESOLVED] item is implemented exactly as stated. An [UNKNOWN] item causes the agent to stop and surface the question rather than resolve it silently. A [CONFLICT] item stops both sides of the implementation until the human resolves it.
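The same check can be run outside the agent, for example as a pre-flight script. The sketch below assumes each item sits on one line with its status tag inline; the actual layout of CONSTRAINTS.md and RESIDUE.md is not specified here, so the pattern and the sample text are illustrative. `blocking_lines` is a hypothetical helper.

```python
import re

# Status tags as they appear in this article; the file layout is an assumption.
STATUS_TAG = re.compile(r"\[(RESOLVED|ASSUMED|UNKNOWN|CONFLICT)\]")
BLOCKING = {"UNKNOWN", "CONFLICT"}


def blocking_lines(markdown_text):
    """Return (line_number, line) pairs that carry a blocking status tag."""
    hits = []
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        tags = set(STATUS_TAG.findall(line))
        if tags & BLOCKING:
            hits.append((number, line.strip()))
    return hits


# Illustrative content only, loosely based on the loan example.
sample = """\
[ASSUMED] Employment is verified by a flag in the APPLICANT table.
[CONFLICT] 43% DTI threshold: regulatory or policy?
[RESOLVED] Self-employed applicants are out of scope for the initial migration.
"""

for number, line in blocking_lines(sample):
    print(f"line {number}: {line}")
```

In practice the text would come from reading the files under .gdd/, and a non-empty result would stop the run before any agent pass.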

This step — making the agent governance-aware via CLAUDE.md — is separable from the ICR cycle itself. A team that has not run a full ICR cycle can still initialize a minimal constraint set and build with governance visibility from day one. The cycle can run alongside early implementation as assumptions surface, rather than only before a major migration or greenfield build.
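Initializing a minimal .gdd/ folder could look like the following. The file and directory names come from the layout described above; the `init_gdd` helper and the placeholder heading written into each file are assumptions made for this sketch, not a documented GDD tool.

```python
import tempfile
from pathlib import Path

GDD_FILES = ("CONSTRAINTS.md", "RESIDUE.md", "GATES.md")


def init_gdd(project_root):
    """Create .gdd/ with its three files and the CYCLES/ directory.

    Existing files are left untouched so the call is safe to repeat.
    """
    gdd = Path(project_root) / ".gdd"
    (gdd / "CYCLES").mkdir(parents=True, exist_ok=True)
    for name in GDD_FILES:
        path = gdd / name
        if not path.exists():
            # Placeholder heading; the real file format is not specified here.
            path.write_text(f"# {Path(name).stem}\n", encoding="utf-8")
    return gdd


# Demonstrate against a throwaway directory rather than a real project.
with tempfile.TemporaryDirectory() as tmp:
    created = init_gdd(tmp)
    print(sorted(p.name for p in created.iterdir()))
    # ['CONSTRAINTS.md', 'CYCLES', 'GATES.md', 'RESIDUE.md']
```

Because existing files are never overwritten, the helper can be rerun as the constraint set grows without losing recorded decisions.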

Where GDD Sits

The Semantic Intent ecosystem runs on a chain that begins with governed intent and ends with auditable execution history. GDD is the upstream source:

GDD → Governed Constraint Set

→ SI contracts → EMBER (.sil artifacts)

→ CAL / REACH / OCTO → execution

→ Synthesis Gate → human judgment at execution time

→ TRACE → audit record

→ CROC → organizational corpus

The Synthesis Gate (DOI 10.5281/zenodo.20684283) is the downstream complement to GDD. GDD surfaces residue before the machine runs. The Synthesis Gate handles residue that emerges during execution — ambiguity the agent encounters that requires human judgment at the moment it arises. Together they describe a complete governance model for AI-first delivery: formalize what is known before build, surface what is unknown during build. Neither eliminates ambiguity. Both ensure it is named and owned.

Intent-as-Infrastructure (DOI 10.5281/zenodo.20681523) named the human as a first-class architectural primitive — not a safety valve, but a designed-in endpoint. GDD operationalizes that claim at the pre-execution layer. The human decisions that need to be made before any agent runs are not a bottleneck to route around. They are the governance acts that make the machine's execution meaningful.

The formal paper — Governance-Driven Design (GDD): Formalizing Intent Before the Machine Runs — is published at doi.org/10.5281/zenodo.20938778. Source and full examples: github.com/semanticintent/governance-driven-design. Documentation: gdd.semanticintent.dev.

