RECALL: The Institutional Memory Problem
An AI model re-enters your pipeline fresh on every run. No persistent mental model. No accumulated context. The Pipeline Manifest is the architectural answer — one machine-readable read that encodes everything the pipeline needs.
Every Session Starts From Zero
A human author reads the documentation once. They build a mental model. They remember which fields are required, which elements accept groups, how the theme layer works, what the COMMENT clauses are for. The next time they open a .rcl file, that mental model is still there.
An AI author does not have this. Every run starts from zero. There is no accumulated context, no residual understanding, no memory of what the pipeline looked like last week. The model enters the session cold and must reconstruct the full picture from whatever it is given at the start of the prompt.
For a simple language this is manageable. Paste the schema, paste the brief, generate. But the RECALL pipeline is not a simple language plus some data. It has four distinct schema layers, each serving a different audience, each encoding a different kind of contract. A Language Schema that defines what elements exist. A Component Manifest that defines what plugin components look like. A Common Record Description that aligns what the AI fills in with what the compiler receives. And a Compositor Contract that governs how intent gets expanded into structure.
Four layers. Four documents. Four places an AI orchestrator must look before it can operate with any confidence.
That coordination cost is not acceptable at scale. Something has to carry the full pipeline contract in a single read.
Four Layers That Were Discovered, Not Designed
The four schema layers were not planned. They were discovered through use, each one solving a problem the previous one exposed.
The Language Schema came first. recall schema --json was built so that AI compositors could query the live element registry directly — the same data the compiler uses to validate every program it compiles. Not documentation that might drift. The compiler's own vocabulary, made queryable.
The Component Manifest came next. recall scaffold needed to read plugin component definitions — field shapes, PIC types, COMMENT annotations, group structures — from a machine-readable manifest that any plugin could ship. The author stops guessing. The manifest tells them exactly what fields a component expects and how they should be typed.
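That lookup is simple to sketch. The function below builds an empty brief stub from a component's manifest entry; the index.json shape used here ({"components": {NAME: {"fields": ...}}}) is invented for illustration and may differ from the published @stratiqx/recall-components format.

```python
def scaffold_stub(index: dict, component: str) -> dict:
    """Build an empty brief stub from a component's manifest entry.

    `index` is a parsed index.json; the shape assumed here
    ({"components": {NAME: {"fields": {FIELD: {"pic": ...}}}}})
    is an illustration, not the published format.
    """
    fields = index["components"][component]["fields"]
    return {name: "" for name in fields}  # one slot per declared field

manifest = {
    "components": {
        "CAL-CASE-STUDY": {
            "fields": {
                "SECTION-TITLE": {"pic": "X(60)"},
                "CASE-DATE-ISO": {"pic": "X(10)"},
            }
        }
    }
}
print(scaffold_stub(manifest, "CAL-CASE-STUDY"))
# {'SECTION-TITLE': '', 'CASE-DATE-ISO': ''}
```

The point of the stub is that the field set comes from the manifest, never from the author's memory of it.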
The Common Record Description came from a failure mode. The pipeline passes data through three expression points: the MCP tool's inputSchema tells the AI what to fill in. The brief JSON stores what was filled in. The DATA DIVISION declares what the compiler receives. When all three agree, the pipeline works. When they diverge — field name mismatch, PIC length exceeded, group row count wrong — the compiler truncates silently. The output is wrong. Nobody is told.
The Compositor Contract was the first AI-native layer. Every other layer had a 1959 precedent. The Compositor Contract — the JSON payload between recall expand and an AI compositor — is purely an AI-era invention. Schema version, intent string, DATA DIVISION symbols, COMMENT clauses, palette, component registry. The formal specification of what the AI receives and what it is expected to produce.
Four layers. Each named. Each with a distinct audience. And no single document that unified them.
The Common Record Description: Where Silent Failures Live
Of the four layers, the Common Record Description is the most consequential for pipeline reliability — and the least visible until something goes wrong.
The failure mode it addresses is specific: a value that fits in the brief JSON but exceeds the PIC X length declared in the DATA DIVISION. The AI fills in SECTION-TITLE with 82 characters. The DATA DIVISION declares PIC X(60). The compiler truncates at 60. The rendered page has a cut-off heading. No error. No warning. No signal anywhere in the pipeline that anything went wrong.
This is COBOL's implicit behaviour failure, reproduced in a modern AI-first publishing pipeline. The language trusts the data to fit. The data does not fit. The output is silently wrong.
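The failure is easy to reproduce in isolation. A minimal Python sketch of COBOL-style fixed-width assignment (the function name is invented for illustration):

```python
def pic_x_store(value: str, width: int) -> str:
    """COBOL-style PIC X(width) assignment: truncate silently on
    overflow, pad with trailing spaces on underflow."""
    return value[:width].ljust(width)

title = "T" * 82                  # the AI fills in 82 characters
stored = pic_x_store(title, 60)   # the DATA DIVISION declares PIC X(60)
assert len(stored) == 60          # 22 characters gone: no error, no warning
```

The assert passes, which is exactly the problem: the overflow is absorbed, not reported.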
recall crd is the answer. Run it before the compile step and it validates four agreements:
$ recall crd uc-230.rcl --against uc-230.json
CRD — Common Record Description
RCL: src/uc-230.rcl
Brief: briefs/uc-230.json
✗ [CRD-001] Brief field "CASE-DATE-ISO" has no matching DATA DIVISION declaration
→ Add the field to the DATA DIVISION or remove it from the brief
⚠ [CRD-003] "SECTION-2-BODY" value is 94 chars — exceeds PIC X(80), will truncate by 14
→ Increase PIC X to X(94) or shorten the value
1 error, 1 warning

CRD-001: a field in the brief has no DATA DIVISION declaration — the compiler will ignore it silently.
CRD-002: a DATA DIVISION field has no brief value — it will render empty.
CRD-003: a value exceeds its PIC X limit — it will truncate.
CRD-004: a brief array length differs from the DATA group row count — the group will render incorrectly.
Four diagnostic codes. Four ways the three layers can diverge. All caught before the compile step runs. The AI filled in the brief; the CRD validator is what makes that fill-in trustworthy.
This is the trust layer. Without it, the pipeline runs. With it, the pipeline is reliable.
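The four checks are mechanical enough to sketch. Below is a hypothetical Python re-implementation of the diagnostics; the declaration shapes ({"pic": max_chars} for scalar PIC X fields, {"rows": n} for repeating groups) are assumptions for illustration, not the validator's internals.

```python
def crd_check(brief: dict, decls: dict) -> list[str]:
    """Emit CRD-001..004 diagnostics for a brief against DATA DIVISION
    declarations. decls maps field name to {"pic": max_chars} for scalar
    PIC X fields or {"rows": n} for repeating groups (shapes assumed)."""
    out = []
    for name, value in brief.items():
        decl = decls.get(name)
        if decl is None:  # CRD-001: orphan brief field
            out.append(f'[CRD-001] "{name}" has no DATA DIVISION declaration')
        elif "pic" in decl and len(str(value)) > decl["pic"]:  # CRD-003: overflow
            over = len(str(value)) - decl["pic"]
            out.append(f'[CRD-003] "{name}" exceeds PIC X({decl["pic"]}), will truncate by {over}')
        elif "rows" in decl and len(value) != decl["rows"]:  # CRD-004: row-count mismatch
            out.append(f'[CRD-004] "{name}" has {len(value)} rows, group declares {decl["rows"]}')
    for name in decls:
        if name not in brief:  # CRD-002: declared field will render empty
            out.append(f'[CRD-002] "{name}" absent from brief, will render empty')
    return out

# Mirrors the sample run above: a 94-char value against PIC X(80)
for d in crd_check({"SECTION-2-BODY": "x" * 94},
                   {"SECTION-2-BODY": {"pic": 80}, "CASE-DATE-ISO": {"pic": 10}}):
    print(d)
```

Every branch maps to one diagnostic code, and every diagnostic fires before the compiler gets a chance to truncate.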
Hopper's Catalogue, 1959
Three of the four layers have 1959 COBOL lineage. The Language Schema descends from COBOL's element vocabulary — the formal definition of what the language can express. The Component Manifest descends from COBOL's COPY books — shared definitions distributed across programs and resolved at compile time. The Common Record Description descends from COBOL's record layouts — the agreed field set that travels across every system that touches the same data.
Grace Hopper, whose FLOW-MATIC compiler shaped COBOL's design, had a name for a document that catalogued what existed and what contract each entry honoured. She called it the Program Library Directory — the unified index of programs, their interfaces, and their dependencies, readable by anyone who needed to understand what the system contained.
The Pipeline Manifest is that document for the RECALL pipeline. Hopper's name, applied to a 2026 problem she couldn't have anticipated: how do you give an AI orchestrator the institutional memory of a pipeline in a single read?
The answer looks like this:
$ recall manifest --json
{
"schema": "recall-manifest/1.0",
"generated": "2026-04-09T14:15:45.245Z",
"philosophy": "Structured publishing language. Source is the artifact. AI authors, compiler renders, human reviews.",
"layers": {
"language": {
"command": "recall schema --json",
"purpose": "All valid RECALL elements, PIC types, divisions, and clauses",
"data": { ... }
},
"components": {
"package": "@stratiqx/recall-components",
"manifest": "@stratiqx/recall-components/components/index.json",
"purpose": "Field definitions and group shapes for available plugin components",
"components": ["CAL-CASE-STUDY"]
},
"crd": {
"document": "docs/COMMON-RECORD-DESCRIPTION.md",
"command": "recall crd <file.rcl> --against <brief.json>",
"purpose": "Field agreement across MCP inputSchema, brief JSON, and DATA DIVISION",
"checks": ["CRD-001", "CRD-002", "CRD-003", "CRD-004"]
},
"compositor": {
"document": "docs/COMPOSITOR-CONTRACT.md",
"command": "recall expand <file.rcl>",
"purpose": "WITH INTENT expansion protocol between recall expand and an AI compositor"
}
},
"methodology": {
"authoring": "AI assembles brief against Common Record Description",
"rendering": "RECALL compiler + plugin renderers produce self-contained HTML",
"validation": "inputSchema descriptions enforce field discipline at authoring time",
"provenance": "brief JSON persisted alongside HTML — source always recoverable"
}
}

The language schema is always inlined — the full element registry, PIC types, and division definitions embedded directly in the payload. The component list populates when the plugin package is resolvable. The CRD and compositor layers carry their commands and document pointers. The methodology block encodes the pipeline's philosophy in machine-readable form — not as prose the AI must parse, but as structured key-value pairs it can reason from directly.
One read. Full institutional memory.
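What that first read looks like from an orchestrator's side, sketched in Python. extract_contract works on any parsed manifest payload; load_pipeline_contract assumes a recall binary on PATH, and the key names follow the payload above.

```python
import json
import subprocess

def extract_contract(manifest: dict) -> dict:
    """Pull the working set an orchestrator needs from a parsed manifest."""
    layers = manifest["layers"]
    return {
        "elements":   layers["language"]["data"],          # inlined element registry
        "components": layers["components"]["components"],  # e.g. ["CAL-CASE-STUDY"]
        "crd_checks": layers["crd"]["checks"],             # ["CRD-001", ..., "CRD-004"]
        "philosophy": manifest["philosophy"],              # the edge-case reference
    }

def load_pipeline_contract() -> dict:
    """One read: `recall manifest --json`, parsed into the working set."""
    run = subprocess.run(["recall", "manifest", "--json"],
                         capture_output=True, text=True, check=True)
    return extract_contract(json.loads(run.stdout))
```

One subprocess call, one parse, and the orchestrator holds the same vocabulary the compiler enforces.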
What the Manifest Encodes That Documentation Cannot
Documentation sites answer the question a human thinks to ask. The Pipeline Manifest answers every question an AI orchestrator needs answered before it can operate — whether or not the orchestrator knows it needs to ask.
A human reading documentation absorbs context gradually, builds a model over time, and fills gaps from experience. An AI orchestrator that reads documentation is doing something structurally different: it is pattern-matching against prose and extracting structured understanding from unstructured text. That extraction is lossy. Nuance is lost. Edge cases are missed. The mental model built from documentation is an approximation.
The manifest is not documentation. It is the pipeline's schema made self-describing. The element vocabulary is not described in the manifest — it is present in it, the same data structure the compiler uses internally. The component names are not summarised — they are enumerated. The CRD diagnostic codes are not explained — they are listed as the exact strings the validator emits. Nothing to infer. Nothing to approximate.
There is also the philosophy block. Every publishing pipeline has an implicit philosophy — assumptions about who authors, how artifacts are produced, what happens when something goes wrong. Those assumptions usually live in README files, onboarding documents, and team conventions. They are invisible to an AI orchestrator entering the pipeline for the first time.
The manifest makes the philosophy explicit and machine-readable: source is the artifact, AI authors, compiler renders, human reviews, brief JSON persisted for provenance. Not as marketing language — as structured values an orchestrator can read and reason from. When the orchestrator faces a judgment call at the edge, the philosophy block is the reference.
The Autonomy Trajectory
The pipeline did not start here. It started with a human author writing HTML directly. Then a human author assembling a brief and reviewing compiled output. Then the MCP tool generating the brief, with the human reviewing the HTML and deploying. Each step reduced the number of human touches per published artifact.
The trajectory is visible when you lay it out:
Phase        Human role                       Pipeline state
────────────────────────────────────────────────────────────
Early        Writes HTML directly             No pipeline
Formalised   Assembles brief, reviews HTML    MCP tool + RECALL
Current      Reviews output, deploys          Full pipeline, brief persisted
Near-term    Approves cluster selection       Model proposes, pipeline renders
Target       Editorial gate only              Pipeline runs autonomously

The target state requires the pipeline to make judgment calls without a human in the loop — which fields to populate, which component to use, how long a section should be, whether a value fits its PIC X constraint before the compile step runs. These are not decisions that can be made from the brief JSON alone. They require the full pipeline contract.
The Pipeline Manifest is the precondition for the target state. Without it, autonomous operation is fast but not principled — the orchestrator is moving quickly through a pipeline it only partially understands. With it, autonomous operation is principled: the orchestrator has read the same schema the compiler enforces, knows the same diagnostic codes the validator emits, and carries the philosophy that governs edge cases. It can be trusted with more decisions because it has more of the contract.
The number of human touches per published artifact is not just an efficiency metric. It is the measure of how completely the pipeline's institutional memory has been made legible to the machines that run it.
The Architecture Is Complete
The four schema layers are now implemented and shipped:
recall schema --json     Language Schema       v0.x
recall scaffold --list   Component Manifest    v0.8
recall crd --against     Common Record Desc.   v1.0.6
recall manifest --json   Pipeline Manifest     v1.0.7

Each command queries a different layer. Each layer speaks to a different audience. All four are available in a single read via recall manifest --json.
Three of the four layers have Grace Hopper's DNA — element vocabulary, COPY books, record layouts. One is purely AI-era — the Compositor Contract has no 1959 precedent. That they connect without friction is what legitimises the architecture. The discipline that made COBOL durable across 60 years is the same discipline that makes the RECALL pipeline reliable when AI authors are operating autonomously within it.
The problem was never that AI models lack capability. The problem was that pipelines lacked the architecture to make the full contract readable. The compiler knows the rules. The validator knows the constraints. The manifest knows the philosophy. Now the AI can know all three — from a single command, before the first line of source is written.
Every session still starts from zero. But zero is no longer empty.