Wake Intelligence: The Three Questions Most Context Stores Never Ask
Most MCP context servers answer one question: what did we save? Wake Intelligence answers three: why was it created, how relevant is it right now, and what will be needed next. A 3-layer temporal intelligence brain for AI agents, built on Cloudflare Workers.
The Question Behind the Question
A cormorant diving beneath the surface leaves a wake — the visible ripple of where it's been. Wake Intelligence is named for that trail. It tracks the ripple effects of decisions through time.
Every AI agent working with persistent context eventually faces the same moment: it surfaces a snapshot from three weeks ago and doesn't know what to do with it.
Not “is this data correct” — that's a different problem. The questions are subtler. How relevant is this right now? Was this an active working context or something resolved and closed out? Was it the root cause of everything that came after, or a leaf node that depended on something else? Should the agent pre-load this before the next operation, or wait to be asked?
Most context stores answer none of these. They store and retrieve. Relevance is implicit in the timestamp. Why a context was created is somewhere in the content, if it was written down at all. What comes next is the agent's problem.
Wake Intelligence was built around three explicit questions that operate at different timescales. That structure is what makes it a brain rather than a store.
Past — Why This Exists
The first layer, the Causality Engine, tracks why a context was created — not just what it contains.
When a context is saved, it can carry an action type (decision, file_edit, research, tool_use), a causal chain linking it to prior contexts, and the strength of those relationships. Dependency detection can happen automatically: contexts created within the same hour are likely causally related, and the engine links them without requiring explicit input.
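The same-hour heuristic can be sketched in a few lines. This is an illustrative reconstruction, not the actual Wake Intelligence API: the `Context` shape, `detectDependencies`, and the `causedBy` field are assumed names.

```typescript
// Hypothetical sketch of same-hour dependency detection.
// All names here are illustrative, not the real Wake Intelligence types.

type ActionType = "decision" | "file_edit" | "research" | "tool_use";

interface Context {
  id: string;
  actionType: ActionType;
  createdAt: number;   // Unix epoch, milliseconds
  causedBy: string[];  // IDs of prior contexts in the causal chain
}

const ONE_HOUR_MS = 60 * 60 * 1000;

// Link a new context to every prior context created within the same hour,
// without requiring the caller to declare the relationship explicitly.
function detectDependencies(newCtx: Context, prior: Context[]): Context {
  const related = prior.filter(
    (p) => Math.abs(newCtx.createdAt - p.createdAt) <= ONE_HOUR_MS
  );
  return { ...newCtx, causedBy: related.map((p) => p.id) };
}
```

A context created thirty minutes after another gets linked automatically; one created five hours later does not.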
The value isn't in retrieval. It's in reconstruction.
An agent returning to a project after weeks away can call build_causal_chain and trace a decision backwards through time — what led to the current architecture, why certain constraints were introduced, which early contexts are still load-bearing. The question “why is this built this way?” has an answer that was captured at the moment the work happened, not inferred after the fact from code comments and git history.
This is the layer that turns context from a snapshot into institutional memory.
Present — How Relevant Is This Right Now
The second layer, the Memory Manager, answers a question that changes continuously: how relevant is this context at this moment?
The answer is observable and deterministic. Contexts move through four tiers based on time since last access:
ACTIVE → RECENT → ARCHIVED → EXPIRED

These aren't scores. They're facts about time, recalculated as contexts age. The same snapshot that was ACTIVE during an active sprint becomes ARCHIVED without anyone touching it — classification updates automatically as time passes.
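The classification is a pure function of elapsed time. A minimal sketch, with threshold values assumed for illustration (the real boundaries live in the Memory Manager's configuration):

```typescript
// Sketch of tier classification by time since last access.
// The 1-day / 7-day / 30-day thresholds are assumptions, not the
// actual Wake Intelligence values.

type Tier = "ACTIVE" | "RECENT" | "ARCHIVED" | "EXPIRED";

const HOUR = 60 * 60 * 1000;
const DAY = 24 * HOUR;

function classifyTier(lastAccessedAt: number, now: number): Tier {
  const age = now - lastAccessedAt;
  if (age < DAY) return "ACTIVE";         // touched in the last day
  if (age < 7 * DAY) return "RECENT";     // within the last week
  if (age < 30 * DAY) return "ARCHIVED";  // within the last month
  return "EXPIRED";                       // pruning candidate
}
```

Because the function takes `now` as an argument, the same stored row classifies differently as time passes, with no write required.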
This matters for two operations. Search results rank by tier alongside text similarity: an ACTIVE context is almost certainly more relevant than an ARCHIVED one with identical keywords. And cleanup has a principled basis: EXPIRED contexts are pruning candidates, not just the oldest entries in the database.
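Tier-aware ranking can be as simple as weighting text similarity by tier. The weights below are assumptions for illustration; the article only states that tier ranks alongside similarity:

```typescript
// Sketch of tier-aware search ranking. Tier weights are illustrative
// assumptions, not the actual Wake Intelligence coefficients.

type Tier = "ACTIVE" | "RECENT" | "ARCHIVED" | "EXPIRED";

const TIER_WEIGHT: Record<Tier, number> = {
  ACTIVE: 1.0,
  RECENT: 0.75,
  ARCHIVED: 0.5,
  EXPIRED: 0.25,
};

// Combine a text-similarity score in [0, 1] with the context's tier.
function rankScore(textSimilarity: number, tier: Tier): number {
  return textSimilarity * TIER_WEIGHT[tier];
}
```

Two contexts with identical keyword matches then sort by freshness: the ACTIVE one wins.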
Access tracking fires and forgets — load operations update tiers without adding latency to responses. The housekeeping happens behind the response.
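The pattern looks roughly like this. On Cloudflare Workers the deferral mechanism would be `ExecutionContext.waitUntil`; the sketch below stands in a local `waitUntil` so the shape is visible, and `updateTier` is a hypothetical name:

```typescript
// Sketch of fire-and-forget access tracking: respond first, do tier
// bookkeeping behind the response. `waitUntil` stands in for Cloudflare's
// ExecutionContext.waitUntil; all names are illustrative.

const background: Promise<void>[] = [];
const waitUntil = (p: Promise<void>): void => { background.push(p); };

const tiers = new Map<string, string>();

async function updateTier(id: string): Promise<void> {
  // Recompute the tier from the new access timestamp.
  tiers.set(id, "ACTIVE");
}

function handleLoad(id: string): { id: string } {
  // Kick off the tier update without awaiting it — the caller gets its
  // response immediately; housekeeping completes behind it.
  waitUntil(updateTier(id));
  return { id };
}
```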
Future — What Will Be Needed Next
The third layer, the Propagation Engine, computes a prediction score for every context — an estimate of how likely it is to be accessed in the next operation.
The score is composite:
predictionScore =
    0.4 × temporalScore    // Recency — exponential decay, 24h half-life
  + 0.3 × causalStrength   // Graph position — roots score higher than leaves
  + 0.3 × frequencyScore   // Access count — logarithmic scale

Temporal score uses exponential decay: a context accessed an hour ago scores near 1.0; one last touched a week ago scores significantly lower. Causal score reflects position in the causality graph — roots that others depend on matter more than leaf nodes. Frequency score is logarithmic access count, with diminishing returns past a threshold.
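The weights and the 24-hour half-life come from the formula above; the exact decay and cap functions below are assumptions sketched for illustration:

```typescript
// Sketch of the composite prediction score. The 0.4/0.3/0.3 weights and
// 24h half-life are from the article; function shapes and the cap on the
// frequency term are assumptions.

const HALF_LIFE_MS = 24 * 60 * 60 * 1000;

// Recency: exponential decay with a 24-hour half-life.
function temporalScore(msSinceAccess: number): number {
  return Math.pow(0.5, msSinceAccess / HALF_LIFE_MS);
}

// Frequency: logarithmic access count with diminishing returns,
// capped at 1.0 (the cap point is an assumed threshold).
function frequencyScore(accessCount: number): number {
  return Math.min(1, Math.log10(accessCount + 1));
}

// causalStrength in [0, 1]: roots that others depend on score higher.
function predictionScore(
  msSinceAccess: number,
  causalStrength: number,
  accessCount: number
): number {
  return (
    0.4 * temporalScore(msSinceAccess) +
    0.3 * causalStrength +
    0.3 * frequencyScore(accessCount)
  );
}
```

A just-accessed causal root with heavy usage scores near 1.0; a week-old leaf touched once scores close to the causal floor alone.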
Every prediction carries a reason alongside the score: recently_accessed, high_access_frequency, causal_chain_root, active_memory_tier. Not a number alone — an explanation. This matters for debugging, and for understanding why specific contexts are surfaced instead of others.
A scheduled cron job refreshes stale predictions every six hours across all projects. The engine doesn't wait to be asked — it maintains a continuously updated view of which contexts have the highest probability of being needed next.
The weights (40/30/30) are calibrated starting points, not empirically tuned coefficients. That limitation is acknowledged in the architecture. A future Layer 4 would track prediction accuracy per project and shift weights toward whatever dimension best predicts actual access in that domain. Some projects are temporal by nature. Others are pattern-driven. The scaffolding for learning is already in the architecture. The learning itself is the next step.
The Architecture Behind It
Wake Intelligence is built on hexagonal architecture. The domain logic has zero infrastructure dependencies. The composition root wires all three layers together in 74 lines — down from 483 in the previous monolithic version.
That ~85% reduction isn't cosmetic: it's the difference between code that can be tested in isolation and code that can only be run as a whole. The 163-test suite covers every architectural layer independently, with suites of 111, 10, 30, and 12 tests across the layers.

Each layer is independently testable because each has exactly one responsibility and depends on interfaces, not implementations. Swapping D1 for a different database doesn't touch the domain logic. Replacing Workers AI with another provider doesn't require touching the prediction algorithms.
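The ports-and-adapters split behind that claim looks roughly like this. Every name here is hypothetical, sketched to show the shape, not the actual Wake Intelligence interfaces:

```typescript
// Illustrative hexagonal split: the domain depends on an interface (port),
// and D1 would be one adapter among others. All names are hypothetical.

interface ContextRepository {
  save(id: string, content: string): Promise<void>;
  load(id: string): Promise<string | undefined>;
}

// Domain service: pure logic, zero infrastructure imports.
class ContextService {
  constructor(private readonly repo: ContextRepository) {}

  async store(id: string, content: string): Promise<void> {
    await this.repo.save(id, content);
  }
}

// In-memory adapter — enough for isolated tests. A D1 adapter would
// implement the same interface without the domain code changing.
class InMemoryRepository implements ContextRepository {
  private data = new Map<string, string>();
  async save(id: string, content: string) { this.data.set(id, content); }
  async load(id: string) { return this.data.get(id); }
}
```

Tests construct the service with the in-memory adapter; production wires in the database adapter at the composition root.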
It deploys to Cloudflare Workers with D1 as the database. Every operation that can fail has a fallback that still produces a useful result.
What This Is and What It Isn't
Wake Intelligence is not a vector database. It doesn't use embeddings or semantic similarity. Every algorithm is deterministic: the same inputs always produce the same outputs, and every output has an observable reason attached.
This is a deliberate choice, not a gap. Deterministic algorithms are debuggable. You can trace exactly why a context received a high prediction score. You can audit why a context was pruned. You can explain to an AI agent why it's being given this set of contexts and not another.
A learned layer would likely produce better recall. It would also produce an opaque result — a number without a traceable cause. For a system designed to help AI agents reason about their own history, opacity in the infrastructure would be a contradiction in terms.
The learning path exists and is scoped. For now, the machine explains itself.
The Implementation
Wake Intelligence is open source, MIT licensed, and published to npm. The documentation site includes a full architecture walkthrough, API reference, and interactive playground.