
Wake Intelligence: The Fourth Question

The first three layers ask why, how, and what. Layer 4 turns the question inward: how well am I actually predicting? Per-project weight tuning, cross-project causality, and semantic search — what shipped in v3.3.0 and why the hardest part was keeping three numbers well-behaved.

Michael Shatny · 9 min read

The Promise Left Open

The previous post about Wake Intelligence ended with a deliberate gap. The prediction formula — 40% temporal, 30% causal, 30% frequency — was described as “calibrated starting points, not empirically tuned coefficients.” The weights were educated guesses. The scaffolding for learning was in place. And then: “The learning itself is the next step.”

That was v3.1. This is v3.3. The step has been taken.

But the gap was intentional in a second sense. The previous post also argued against opacity: “A learned layer would likely produce better recall. It would also produce an opaque result — a number without a traceable cause.” So the question wasn't just when to add learning. It was how to add learning without sacrificing the one property that makes Wake Intelligence useful for AI agents in the first place: every output has an observable reason.

The answer turned out to be algebraic rather than neural. No embeddings in the prediction path. No gradient descent. A small feedback loop operating on three numbers — and the most interesting algorithmic challenge was in keeping those three numbers well-behaved when any one of them hit a boundary.

The Fourth Question

The first three layers ask time-anchored questions:

Layer 1 (Past, WHY): Why was this context created? What caused it?
Layer 2 (Present, HOW): How relevant is this right now? What tier is it in?
Layer 3 (Future, WHAT): What will be needed next? What is the prediction score?

Layer 4 asks a different kind of question — one directed inward rather than at the data: how well am I actually predicting?

The distinction matters. Layers 1–3 are about understanding contexts. Layer 4 is about understanding the understanding — a feedback loop on the prediction mechanism itself. Not “what is the score for this context?” but “are the weights I used to produce that score working for this project?”

Different projects have different access patterns. A codebase with long-running threads across months rewards causal weight. A fast-moving sprint rewards temporal weight. A reference library rewards frequency weight. Hardcoding 40/30/30 means the system is calibrated for an average project that may not exist. Layer 4 lets each project discover its own calibration.

How the Learning Works

The mechanism is deliberately narrow. It doesn't learn representations. It learns three numbers.

Every time a context is accessed, the system records a prediction outcome: the three component scores at the moment of access (temporal, causal, frequency), the composite score, and the fact that the context was actually accessed — confirming the prediction. These accumulate in a prediction_outcomes table, one row per access event.

// On every load_context or search_context call:
recordPredictionOutcome({
  contextId,
  project,
  predictedScore,          // composite at time of access
  temporalComponent,       // 0.0–1.0, how high was temporal?
  causalComponent,         // 0.0–1.0, how high was causal?
  frequencyComponent,      // 0.0–1.0, how high was frequency?
  actuallyAccessed: true,  // confirmed hit
})

Once a project accumulates 20 outcomes, the tuning runs. The algorithm is mean-normalisation: average each component across the recorded outcomes, then normalise so the three averages sum to 1.0. The dimension that most consistently scores high in the moments before access gets a higher weight going forward.

// Simplified tuning:
const avgT = mean(outcomes.map(o => o.temporalComponent))
const avgC = mean(outcomes.map(o => o.causalComponent))
const avgF = mean(outcomes.map(o => o.frequencyComponent))
const total = avgT + avgC + avgF

newWeights = {
  temporal:  avgT / total,
  causal:    avgC / total,
  frequency: avgF / total,
}
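Put together as a self-contained sketch (the `Outcome` shape and the `mean` helper are illustrative, not the shipped types):

```typescript
interface Outcome {
  temporalComponent: number
  causalComponent: number
  frequencyComponent: number
}

const mean = (xs: number[]): number =>
  xs.reduce((a, b) => a + b, 0) / xs.length

// Mean-normalisation: each new weight is that component's average
// share of the total signal across the recorded outcomes.
function tuneWeights(outcomes: Outcome[]) {
  const avgT = mean(outcomes.map(o => o.temporalComponent))
  const avgC = mean(outcomes.map(o => o.causalComponent))
  const avgF = mean(outcomes.map(o => o.frequencyComponent))
  const total = avgT + avgC + avgF
  return {
    temporal: avgT / total,
    causal: avgC / total,
    frequency: avgF / total,
  }
}
```

A project whose outcomes average 0.7 temporal signal against 0.15 for each of the others ends up weighted 70/15/15 before clamping.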

The result is a per-project project_weights row that the Propagation Engine reads before scoring. The prediction formula stays identical — the weights fed into it change.

The Clamp Problem

The most interesting implementation detail wasn't the learning — it was keeping the weights stable after learning.

Without constraints, tuning can collapse. If a project is purely temporal by nature, the temporal component will dominate the averages, and after several tuning cycles the temporal weight could approach 1.0 while the others approach 0. That makes the causal and frequency components effectively invisible — even on the rare occasions they carry signal.

The solution is a clamp: each weight is bounded between 0.1 and 0.6. No dimension can disappear below 10%, and no dimension can dominate beyond 60%.

const MIN_WEIGHT = 0.1
const MAX_WEIGHT = 0.6

// After normalisation, clamp each weight:
t = clamp(raw.temporal, MIN_WEIGHT, MAX_WEIGHT)
c = clamp(raw.causal,   MIN_WEIGHT, MAX_WEIGHT)
f = clamp(raw.frequency, MIN_WEIGHT, MAX_WEIGHT)
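The `clamp` these snippets assume is the usual min/max composition, shown here for completeness:

```typescript
// Bound x to the closed interval [lo, hi].
const clamp = (x: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, x))
```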

But clamping creates a new problem: the three values no longer sum to 1.0. A simple re-normalisation step — divide each by their sum — looks correct but breaks when multiple weights are pinned at their boundaries simultaneously. The re-normalised value for a weight that was clamped at MAX would be pushed above MAX.

The fix is redistribution: calculate the deficit (how far the clamped values are from 1.0), then distribute it only among the weights that have room to absorb it — those not already pinned at a boundary.

function redistribute(t: number, c: number, f: number) {
  const deficit = 1.0 - (t + c + f)
  if (Math.abs(deficit) < 1e-9) return [t, c, f]

  // Only distribute to weights that aren't already pinned
  const notPinned = (w: number) =>
    deficit > 0 ? w < MAX_WEIGHT - 1e-9 : w > MIN_WEIGHT + 1e-9
  const count = [t, c, f].filter(notPinned).length
  if (count === 0) return [t, c, f]

  const share = deficit / count
  return [
    notPinned(t) ? clamp(t + share, MIN_WEIGHT, MAX_WEIGHT) : t,
    notPinned(c) ? clamp(c + share, MIN_WEIGHT, MAX_WEIGHT) : c,
    notPinned(f) ? clamp(f + share, MIN_WEIGHT, MAX_WEIGHT) : f,
  ]
}

The invariant holds: the three weights always sum to exactly 1.0, no dimension exceeds its boundary, and the result is explainable. The system can report which weights are learned vs. defaulting, what the sample size was, and when tuning last ran.
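A standalone check of that invariant, with `clamp` and `redistribute` reproduced from the snippets above so the sketch runs on its own:

```typescript
const MIN_WEIGHT = 0.1
const MAX_WEIGHT = 0.6
const clamp = (x: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, x))

function redistribute(t: number, c: number, f: number): [number, number, number] {
  const deficit = 1.0 - (t + c + f)
  if (Math.abs(deficit) < 1e-9) return [t, c, f]
  // Only distribute to weights that aren't already pinned at a boundary.
  const notPinned = (w: number) =>
    deficit > 0 ? w < MAX_WEIGHT - 1e-9 : w > MIN_WEIGHT + 1e-9
  const count = [t, c, f].filter(notPinned).length
  if (count === 0) return [t, c, f]
  const share = deficit / count
  return [
    notPinned(t) ? clamp(t + share, MIN_WEIGHT, MAX_WEIGHT) : t,
    notPinned(c) ? clamp(c + share, MIN_WEIGHT, MAX_WEIGHT) : c,
    notPinned(f) ? clamp(f + share, MIN_WEIGHT, MAX_WEIGHT) : f,
  ]
}

// Collapse case: raw tuning wanted 0.8/0.1/0.1; clamping pins temporal
// at 0.6, leaving a 0.2 deficit for the other two to absorb equally.
const [t, c, f] = redistribute(0.6, 0.1, 0.1)  // → [0.6, 0.2, 0.2]
```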

Cross-Project Causality

The same release extended Layer 1 — the Causality Engine — across project boundaries.

The original design scoped dependency detection to a single project. When you saved a context for project-a, it would look back one hour for other contexts in project-a to auto-detect dependencies. This works when work is contained. It breaks when work spans multiple repositories, codebases, or workstreams — which it usually does.

Two additions resolve this. First, save_context now accepts a crossProject: true flag. When set, the temporal dependency scan queries all projects instead of just the save target — the system looks back across everything you were working on in the last hour, not just the current context's home project.

Second, a new tool: get_cross_project_dependents. Given a snapshot ID, it walks the full downstream dependency graph — not just direct children, but all transitive descendants across any number of hops and projects. The traversal is a standard BFS with a visited set to guard against cycles.

// What did this context eventually cause?
get_cross_project_dependents({ snapshotId: "abc-123" })

// Returns: all contexts in any project where
// causedBy traces back (directly or transitively) to abc-123
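The traversal itself can be sketched in a few lines (`childrenOf` is an illustrative adjacency map; the real implementation walks the dependency table):

```typescript
// Collect all transitive descendants of a snapshot, across projects.
// A visited set guards against dependency cycles.
function crossProjectDependents(
  root: string,
  childrenOf: Map<string, string[]>,
): string[] {
  const visited = new Set<string>([root])
  const queue: string[] = [root]
  const result: string[] = []
  while (queue.length > 0) {
    const id = queue.shift()!
    for (const child of childrenOf.get(id) ?? []) {
      if (visited.has(child)) continue  // cycle or already reached
      visited.add(child)
      result.push(child)
      queue.push(child)
    }
  }
  return result
}
```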

This answers a question that single-project causality couldn't: if a key architectural decision was made in one project, which contexts across all your other projects were downstream of it?

Semantic Search

The search path also changed in this release. The previous implementation was keyword-only: a SQL LIKE query on summary and tags. Adequate for exact matches, blind to meaning.

search_context now routes through Cloudflare Vectorize first. When a context is saved, its summary is embedded asynchronously — fire-and-forget, no latency impact on the save response — using @cf/baai/bge-base-en-v1.5 (768-dimensional vectors). The Vectorize index handles similarity queries at search time.
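The fire-and-forget shape can be sketched with injected dependencies; `embed`, `upsert`, and `waitUntil` here mirror the Workers AI, Vectorize, and `ExecutionContext.waitUntil` roles but are stand-ins, not the real bindings:

```typescript
// Save returns immediately; the embedding is scheduled, not awaited.
function saveContext(
  summary: string,
  embed: (text: string) => Promise<number[]>,
  upsert: (vector: number[]) => Promise<void>,
  waitUntil: (work: Promise<unknown>) => void,
): { saved: boolean } {
  // Keep the embedding off the critical path: the save response
  // goes out now, the vector lands in the index when it lands.
  waitUntil(embed(summary).then(upsert))
  return { saved: true }
}
```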

The fallback is preserved. If Vectorize returns no matches — either because the index is empty, the query has no close neighbours, or embeddings haven't propagated yet — the system falls through to the keyword search. Degraded capability, not a failure.
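The routing amounts to a try-semantic-then-keyword cascade; a minimal sketch, with `vectorSearch` and `keywordSearch` standing in for the Vectorize query and the SQL LIKE path:

```typescript
interface Hit { id: string; summary: string }

async function searchContext(
  query: string,
  vectorSearch: (q: string) => Promise<Hit[]>,
  keywordSearch: (q: string) => Promise<Hit[]>,
): Promise<Hit[]> {
  const semantic = await vectorSearch(query)
  if (semantic.length > 0) return semantic
  // Empty index, no close neighbours, or embeddings not yet
  // propagated: degrade to the keyword path instead of failing.
  return keywordSearch(query)
}
```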

The determinism trade-off is worth noting. Semantic search produces results that can't be fully explained — two contexts may be similar because of meaning the embedding captured but the text doesn't surface. This is intentional and scoped: the search path uses embeddings, while the prediction scoring and causality logic remain purely algebraic. The explainability guarantee holds where it matters most.

Where It Stands

v3.3.0 completes the 4-layer architecture. The brain now has a feedback loop.

Layer 1, Causality Engine (25 tests): causal chains, cross-project BFS, dependency detection
Layer 2, Memory Manager (10 tests): 4-tier classification, LRU, prune
Layer 3, Propagation Engine (42 tests): prediction scoring, stale refresh
Layer 4, Meta-Learning (20 tests): weight tuning, clamp, redistribute invariant
Infrastructure (53 tests): D1, Vectorize, CloudflareAI adapters and fallbacks
Application + Presentation (22 tests): MCP protocol, tool dispatch, HTTP routing

221 tests total. 15 MCP tools. The system deploys to Cloudflare Workers, runs the cron refresh every six hours, and starts learning project-specific weights from the first access event.

The architecture doc and full API reference are at wake.semanticintent.dev. The npm package is @semanticintent/semantic-wake-intelligence-mcp.

Michael Shatny is a software developer, methodology engineer, and founding contributor to .netTiers (2005–2010), one of the earliest schema-driven code generation frameworks for .NET. His work spans 28 years of the same architectural pattern: structured input, generated output, auditable artifacts. Wake Intelligence is the temporal layer of that instinct, applied to the most important shift in software development in decades.

ORCID: 0009-0006-2011-3258