Under the hood: how Contexo distills a session into versioned memory — without ever calling an LLM

A deep dive into Contexo's engine: how the CLI and server stay LLM-free by handing distillation to the agent already in the loop (the push handshake), and how section-aware diffs make a page's history legible in a way git diff can't.

Most “memory for your AI” tools are a vector database with an API key. Contexo took a different bet: the tooling should be dumb on purpose. The CLI and the server don’t call an LLM at all. The smart part — turning a messy coding session into a clean, distilled page — is handed back to the agent that’s already in the loop and already has the context.

This post goes deep on the two mechanisms that make that work: the distiller handshake (how a session becomes a page) and section-aware diffs (how you review that page as it changes).

The capture buffer: a bounded summary of the session

It starts with a hook. ctx hooks install wires Contexo into Claude Code’s Stop hook, which fires after every assistant turn:

~/payments-svc main
$ ctx hooks install
✓ installed Contexo Stop hook in .claude/settings.json

On each turn, the hook calls ctx capture turn, which appends one line to a JSONL buffer at .contexo/raw/sessions/_pending/<session-id>.jsonl. Each line is a TurnRecord:

type TurnRecord struct {
    Timestamp string         `json:"ts,omitempty"`
    Turn      int            `json:"turn"`
    User      string         `json:"user,omitempty"`
    Assistant string         `json:"assistant,omitempty"`
    Tools     []string       `json:"tools,omitempty"`
    Truncated *TruncationTag `json:"truncated,omitempty"`
}
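To make the "one line per turn" idea concrete, here is a minimal sketch of appending a TurnRecord to a JSONL file. It is not Contexo's actual capture code: encodeLine, appendTurn, the fields of TruncationTag, and the demo path are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// TruncationTag is a hypothetical shape; the post does not show its fields.
type TruncationTag struct {
	Reason string `json:"reason"`
}

// TurnRecord mirrors the struct shown above.
type TurnRecord struct {
	Timestamp string         `json:"ts,omitempty"`
	Turn      int            `json:"turn"`
	User      string         `json:"user,omitempty"`
	Assistant string         `json:"assistant,omitempty"`
	Tools     []string       `json:"tools,omitempty"`
	Truncated *TruncationTag `json:"truncated,omitempty"`
}

// encodeLine renders one record as a single JSONL line (JSON plus newline).
func encodeLine(rec TurnRecord) (string, error) {
	b, err := json.Marshal(rec)
	if err != nil {
		return "", err
	}
	return string(b) + "\n", nil
}

// appendTurn appends one encoded record to the buffer file, creating the
// file and its parent directory if they do not exist yet.
func appendTurn(path string, rec TurnRecord) error {
	line, err := encodeLine(rec)
	if err != nil {
		return err
	}
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return err
	}
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.WriteString(line)
	return err
}

func main() {
	dir, err := os.MkdirTemp("", "contexo-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	// "demo-session" stands in for a real session id.
	path := filepath.Join(dir, "_pending", "demo-session.jsonl")
	for i, msg := range []string{"add retry to webhook handler", "why 3 attempts?"} {
		rec := TurnRecord{Turn: i + 1, User: msg, Tools: []string{"Edit"}}
		if err := appendTurn(path, rec); err != nil {
			panic(err)
		}
	}
	data, _ := os.ReadFile(path)
	fmt.Print(string(data))
}
```

Because every optional field is tagged omitempty, a short turn serializes to a short line, which is part of what keeps the buffer cheap to append to on every turn.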

Crucially, this is a summary stream, not a transcript. The buffer is bounded by construction:

const (
    MaxUserBytes         = 2 * 1024 // user text truncated to 2KB
    MaxAssistantBytes    = 4 * 1024 // assistant text truncated to 4KB
    MaxTurns             = 500      // hard cap per session
    DropOldestOnOverflow = 100      // and when we hit it, drop the oldest 100
)

When a session runs long, the oldest 100 turns are dropped and a marker line records that it happened (reason: "buffer_cap"), so the distiller later knows the trail isn’t complete. The buffer is throwaway scratch — it exists only to be distilled, then archived.
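The bounding rules can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not the real buffer code: the Turn type, truncate, and appendBounded are hypothetical names, and where the marker sits and whether it counts toward the cap are guesses.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

const (
	MaxUserBytes         = 2 * 1024
	MaxAssistantBytes    = 4 * 1024
	MaxTurns             = 500
	DropOldestOnOverflow = 100
)

// Turn is a simplified stand-in for TurnRecord. A non-empty Marker means
// the entry is a marker line rather than a real turn.
type Turn struct {
	Turn      int
	User      string
	Assistant string
	Truncated bool
	Marker    string
}

// truncate cuts s to at most max bytes, backing up so that a multi-byte
// UTF-8 character is never split in half.
func truncate(s string, max int) (string, bool) {
	if len(s) <= max {
		return s, false
	}
	cut := max
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut], true
}

// appendBounded truncates the turn's text, and if the buffer is already at
// MaxTurns it drops the oldest DropOldestOnOverflow entries and leaves a
// "buffer_cap" marker at the head before appending.
func appendBounded(buf []Turn, t Turn) []Turn {
	var cutUser, cutAssistant bool
	t.User, cutUser = truncate(t.User, MaxUserBytes)
	t.Assistant, cutAssistant = truncate(t.Assistant, MaxAssistantBytes)
	t.Truncated = cutUser || cutAssistant

	if len(buf) >= MaxTurns {
		kept := buf[DropOldestOnOverflow:]
		buf = append([]Turn{{Marker: "buffer_cap"}}, kept...)
	}
	return append(buf, t)
}

func main() {
	var buf []Turn
	for i := 1; i <= MaxTurns+1; i++ {
		buf = appendBounded(buf, Turn{Turn: i, User: "msg"})
	}
	fmt.Println("entries:", len(buf))
	fmt.Println("head marker:", buf[0].Marker)
	fmt.Println("oldest surviving turn:", buf[1].Turn)
}
```

Dropping in batches of 100 rather than one turn at a time means the overflow path runs rarely, and the marker tells the distiller that the start of the trail is missing.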

The push handshake: the agent is the distiller

Here’s the part that surprises people. When you push, Contexo doesn’t just commit your files. If the batch contains a knowledge page (a concept or analysis) and there’s a session buffer less than 6 hours old, the ctx_push MCP tool pauses and returns a directive instead of committing:

// Distill handshake (Phase 1).
if !distillDone && !noDistill && os.Getenv("CONTEXO_DISTILL_DISABLE") != "1" {
    if directive, ok := s.buildDistillDirective(filtered); ok {
        return textResult(directive)   // <PUSH_PAUSED …>
    }
}
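The excerpt above hides the interesting condition inside buildDistillDirective. Here is a rough sketch of what that gate might check, based only on the rules stated in this post (a concept or analysis page in the batch, plus a buffer under 6 hours old). Page, SessionBuffer, PushOptions, and shouldPause are hypothetical names, and how the real code measures buffer age is not shown here.

```go
package main

import (
	"fmt"
	"time"
)

// Page is a minimal stand-in for a page in the push batch.
type Page struct {
	Slug string
	Type string // concept | entity | source | analysis
}

// SessionBuffer describes the pending capture buffer; a nil pointer means
// no buffer exists.
type SessionBuffer struct {
	Age time.Duration
}

// PushOptions collects the three switches from the excerpt above.
type PushOptions struct {
	DistillDone   bool // second call of the handshake
	NoDistill     bool // per-push opt-out
	DisabledByEnv bool // CONTEXO_DISTILL_DISABLE=1
}

const maxBufferAge = 6 * time.Hour

// hasKnowledgePage reports whether the batch contains a concept or analysis page.
func hasKnowledgePage(batch []Page) bool {
	for _, p := range batch {
		if p.Type == "concept" || p.Type == "analysis" {
			return true
		}
	}
	return false
}

// shouldPause reports whether the push should return a directive instead
// of committing.
func shouldPause(batch []Page, buf *SessionBuffer, opts PushOptions) bool {
	if opts.DistillDone || opts.NoDistill || opts.DisabledByEnv {
		return false
	}
	if buf == nil || buf.Age >= maxBufferAge {
		return false
	}
	return hasKnowledgePage(batch)
}

// hours is a small helper for building durations in examples.
func hours(n int) time.Duration { return time.Duration(n) * time.Hour }

func main() {
	batch := []Page{{Slug: "payments-webhooks", Type: "concept"}}
	fresh := &SessionBuffer{Age: hours(2)}

	fmt.Println("first call pauses:", shouldPause(batch, fresh, PushOptions{}))
	fmt.Println("second call pauses:", shouldPause(batch, fresh, PushOptions{DistillDone: true}))
}
```

The second call passes distill_done, so the same gate that paused the first push lets the second one through.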

The directive (<PUSH_PAUSED reason=distill_required>) is a set of instructions addressed to the agent. It carries the session buffer inline and asks the agent to write a source page first — following a fixed template:

TEMPLATE (drop sections that genuinely don't apply, keep them in this order):
  ## Decision
  ## Why this approach
  ## Rejected alternatives
  ## Path of inquiry
  ## Dead-ends
  ## Open questions
  ## Sources

IMPORTANT: redact any API keys, tokens, passwords, or PII you encounter.

The agent — which still has the whole session in its context window — writes that page with ctx_write_page(type: "source", …), then re-invokes the push with distill_done: true and the source_slug it just created. That second call is atomic: it links the new source into every concept/analysis page’s sources: frontmatter, archives the buffer, and commits the whole batch in one go.
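The linking step of that second call is easy to picture. The sketch below shows one plausible way to add a source slug to every concept and analysis page in a batch without creating duplicates. linkSource and the slugs are illustrative assumptions, and the archive and commit steps are left out.

```go
package main

import "fmt"

// Page is a minimal stand-in for a page and its sources: frontmatter list.
type Page struct {
	Slug    string
	Type    string
	Sources []string
}

// contains reports whether list already holds v.
func contains(list []string, v string) bool {
	for _, item := range list {
		if item == v {
			return true
		}
	}
	return false
}

// linkSource adds sourceSlug to the Sources of every concept and analysis
// page in the batch and returns how many pages were changed. Pages that
// already reference the slug are left alone, so re-running it is harmless.
func linkSource(batch []Page, sourceSlug string) int {
	linked := 0
	for i := range batch {
		p := &batch[i]
		if p.Type != "concept" && p.Type != "analysis" {
			continue
		}
		if !contains(p.Sources, sourceSlug) {
			p.Sources = append(p.Sources, sourceSlug)
			linked++
		}
	}
	return linked
}

func main() {
	batch := []Page{
		{Slug: "payments-webhooks", Type: "concept"},
		{Slug: "stripe", Type: "entity"},
		{Slug: "retry-tradeoffs", Type: "analysis", Sources: []string{"older-session"}},
	}
	n := linkSource(batch, "webhook-retry-session")
	fmt.Println("pages linked:", n)
	for _, p := range batch {
		fmt.Println(p.Slug, p.Sources)
	}
}
```

Making the link idempotent matters because a push can be retried, and a retry should not leave the same source listed twice.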

It’s the inversion of the usual design. Instead of the tool calling a model to summarize a transcript it doesn’t understand, the model that generated the work writes the summary, in its own turn, with full context. The buffer truncation stops being a quality problem — the agent isn’t relying on the buffer to remember; it’s relying on its own context, and the buffer is just the prompt that says “now write it down.”

(You can opt out per-push with no_distill, or globally with CONTEXO_DISTILL_DISABLE=1.)

A page is a commit

What lands is a real git commit. Each Contexo repo is an on-disk git repository; every push is a commit with author attribution. The unit of versioning is the page — markdown with typed YAML frontmatter:

type PageFrontmatter struct {
    Schema           string    // "ctx.page.v1"
    Slug             string
    Type             PageType  // concept | entity | source | analysis
    Author           string
    Agent            string    // which agent wrote it
    Created          time.Time
    Updated          time.Time
    ParentSHA        string
    Sources          []string  // ← distiller links source pages in here
    Related          []string
    Tags             []string
    ReasoningSummary string    // one-line distillation
}

That Sources list is the thread that ties a concept back to the reasoning trail it came from. The distiller doesn’t just write prose — it wires the graph.

Diffing context isn’t diffing code

Now the read side. Once your context is versioned, you want to review how it changed — but git diff is the wrong tool for prose. Reflow a paragraph and it’s a wall of red and green. Reorder two sections and it looks like everything changed. Add a tag to a frontmatter list and the diff says the whole line changed.

So Contexo doesn’t diff lines. It parses each version into {frontmatter, preamble, ## sections} and diffs structurally:

type SectionDiff struct {
    FromSHA     string
    ToSHA       string
    Frontmatter FrontmatterDiff   // per-field
    Preamble    *SectionChange
    Sections    []SectionChange   // per ## heading
}

type SectionChange struct {
    Heading      string
    OldHeading   string   // set only on a rename
    Status       string   // unchanged | added | removed | modified | renamed
    From, To     string
    IntroducedBy *Commit  // per-section git-blame, when requested
}
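Here is a minimal sketch of the parse-then-match idea behind those structs, assuming frontmatter has already been stripped. It is not the code in internal/diff/section.go: splitSections, diffSections, and the Change type are hypothetical, rename detection is omitted, and the parser does not special-case "## " lines inside fenced code.

```go
package main

import (
	"fmt"
	"strings"
)

// Section is one "## " heading and the text under it.
type Section struct {
	Heading string
	Body    string
}

// Change is a trimmed-down SectionChange: a heading plus its status.
type Change struct {
	Heading string
	Status  string // unchanged | added | removed | modified
}

// splitSections splits markdown into a preamble and its "## " sections.
func splitSections(md string) (string, []Section) {
	var pre []string
	var sections []Section
	for _, line := range strings.Split(md, "\n") {
		if strings.HasPrefix(line, "## ") {
			sections = append(sections, Section{Heading: strings.TrimSpace(line[3:])})
			continue
		}
		if len(sections) == 0 {
			pre = append(pre, line)
		} else {
			sections[len(sections)-1].Body += line + "\n"
		}
	}
	return strings.TrimSpace(strings.Join(pre, "\n")), sections
}

// diffSections matches sections by heading, so reordering alone is not a change.
func diffSections(from, to []Section) []Change {
	oldBody := map[string]string{}
	for _, s := range from {
		oldBody[s.Heading] = s.Body
	}
	seen := map[string]bool{}
	var out []Change
	for _, s := range to {
		seen[s.Heading] = true
		body, ok := oldBody[s.Heading]
		switch {
		case !ok:
			out = append(out, Change{s.Heading, "added"})
		case strings.TrimSpace(body) == strings.TrimSpace(s.Body):
			out = append(out, Change{s.Heading, "unchanged"})
		default:
			out = append(out, Change{s.Heading, "modified"})
		}
	}
	for _, s := range from {
		if !seen[s.Heading] {
			out = append(out, Change{s.Heading, "removed"})
		}
	}
	return out
}

func main() {
	_, from := splitSections("intro\n## A\none\n## B\ntwo\n")
	_, to := splitSections("intro\n## B\ntwo\n## A\none changed\n## C\nnew\n")
	for _, c := range diffSections(from, to) {
		fmt.Printf("%-10s %s\n", c.Status, c.Heading)
	}
}
```

In the demo, sections A and B swap places, yet B is reported as unchanged. That is the property a line-based diff cannot give you.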

Three things fall out of that structure:

  • Frontmatter is diffed field by field. Scalars show old→new. List fields (tags, sources, related) use set semantics — you see “added chompchat,” not “this line changed.” (That’s the top screenshot below: reasoning_summary rewritten, three related links added.)
  • Sections are matched, not lined up. A section is added, removed, modified, or renamed. A rename is detected when an unmatched (removed, added) pair of sections shares at least 70% of its body text, so changing a heading doesn’t read as “deleted one section, invented another.”
  • Blame is per section. Ask for it and each section carries IntroducedBy: the earliest commit in which that heading appeared.
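Two of those rules are small enough to sketch. The code below shows set-semantics diffing for list fields and a rename detector using the 70% threshold. The similarity measure (shared non-empty lines divided by the line count of the longer body) is an assumption made for illustration, since the post does not say how overlap is computed. diffSet, similarity, and detectRenames are hypothetical names, and list inputs are assumed to be duplicate-free.

```go
package main

import (
	"fmt"
	"strings"
)

// diffSet compares two lists as sets and reports which members were added
// and which were removed, ignoring order.
func diffSet(from, to []string) (added, removed []string) {
	inFrom, inTo := map[string]bool{}, map[string]bool{}
	for _, v := range from {
		inFrom[v] = true
	}
	for _, v := range to {
		inTo[v] = true
	}
	for _, v := range to {
		if !inFrom[v] {
			added = append(added, v)
		}
	}
	for _, v := range from {
		if !inTo[v] {
			removed = append(removed, v)
		}
	}
	return added, removed
}

// Section is one heading and its body text.
type Section struct {
	Heading string
	Body    string
}

// nonEmptyLines returns the trimmed, non-blank lines of s.
func nonEmptyLines(s string) []string {
	var out []string
	for _, l := range strings.Split(s, "\n") {
		if l = strings.TrimSpace(l); l != "" {
			out = append(out, l)
		}
	}
	return out
}

// similarity is an assumed measure: shared lines over the longer body's line count.
func similarity(a, b string) float64 {
	la, lb := nonEmptyLines(a), nonEmptyLines(b)
	if len(la) == 0 || len(lb) == 0 {
		return 0
	}
	counts := map[string]int{}
	for _, l := range la {
		counts[l]++
	}
	shared := 0
	for _, l := range lb {
		if counts[l] > 0 {
			counts[l]--
			shared++
		}
	}
	longer := len(la)
	if len(lb) > longer {
		longer = len(lb)
	}
	return float64(shared) / float64(longer)
}

const renameThreshold = 0.70

// detectRenames pairs each removed section with the most similar unused
// added section at or above the threshold. It returns old heading -> new heading.
func detectRenames(removed, added []Section) map[string]string {
	renames := map[string]string{}
	used := map[int]bool{}
	for _, r := range removed {
		best, bestScore := -1, renameThreshold
		for i, a := range added {
			if used[i] {
				continue
			}
			if s := similarity(r.Body, a.Body); s >= bestScore {
				best, bestScore = i, s
			}
		}
		if best >= 0 {
			used[best] = true
			renames[r.Heading] = added[best].Heading
		}
	}
	return renames
}

func main() {
	added, removed := diffSet([]string{"billing", "webhooks"}, []string{"webhooks", "billing", "chompchat"})
	fmt.Println("added:", added, "removed:", removed)

	body := "exponential backoff\njitter 20%\ndead-letter after failure\n"
	old := []Section{{Heading: "Retries", Body: "max 3 attempts\n" + body}}
	neu := []Section{{Heading: "Retry policy", Body: "max 5 attempts\n" + body}}
	fmt.Println("renames:", detectRenames(old, neu))
}
```

In the demo, the two bodies share three of four lines, which clears the threshold, so the pair is reported as a rename rather than as one removal and one addition.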

One SectionDiff value is produced once and rendered three ways: by the CLI (ctx diff), over MCP (ctx_diff / ctx_evolution, so your agent can inspect history), and in the dashboard. Here’s that structure rendered for a real page from our own Contexo:

[Screenshot: Contexo diff view of a page's frontmatter. The created and updated timestamps changed, reasoning_summary was rewritten from a terse note to a fuller description, and a new related list was added.]
Frontmatter, field by field: reasoning_summary sharpened, three related links added (set semantics).

[Screenshot: Contexo diff view showing two newly added markdown sections, a TL;DR (+6 lines) and a single-price vs variants section (+18 lines), with their content highlighted.]
Whole sections added, with per-section line counts.

[Screenshot: Contexo diff view showing a removed section (−14 lines) with the old content it replaced.]
And a section removed (−14 lines): you see exactly what context went away.

~/payments-svc main
$ ctx diff payments-webhooks --blame
section “Retry policy” — modified · introduced by 3f9a1c2 (dana, 5d ago)
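Per-section blame can be approximated by walking a page's history from oldest to newest and stopping at the first version that contains the heading. The sketch below assumes the history is already loaded into memory. Version, hasHeading, and introducedBy are hypothetical names, every commit except 3f9a1c2 is invented for the demo, and how the real tool reads history from git is not shown in this post.

```go
package main

import (
	"fmt"
	"strings"
)

// Commit is a minimal stand-in for the Commit type referenced above.
type Commit struct {
	SHA    string
	Author string
}

// Version is one committed state of a page.
type Version struct {
	Commit   Commit
	Markdown string
}

// hasHeading reports whether md contains the given "## " heading.
func hasHeading(md, heading string) bool {
	for _, line := range strings.Split(md, "\n") {
		if strings.TrimSpace(line) == "## "+heading {
			return true
		}
	}
	return false
}

// introducedBy returns the earliest commit whose version of the page
// contains the heading, or nil if it never appears. history must be
// ordered oldest first. Renamed headings are not followed in this sketch.
func introducedBy(history []Version, heading string) *Commit {
	for i := range history {
		if hasHeading(history[i].Markdown, heading) {
			return &history[i].Commit
		}
	}
	return nil
}

func main() {
	history := []Version{
		{Commit{"a1b2c3d", "sam"}, "## Decision\nuse webhooks\n"},
		{Commit{"3f9a1c2", "dana"}, "## Decision\nuse webhooks\n## Retry policy\n3 attempts\n"},
		{Commit{"9e8d7c6", "sam"}, "## Decision\nuse webhooks\n## Retry policy\n5 attempts\n"},
	}
	if c := introducedBy(history, "Retry policy"); c != nil {
		fmt.Printf("introduced by %s (%s)\n", c.SHA, c.Author)
	}
}
```

Note that the answer is the commit that introduced the heading, not the one that last touched the section, which is why the later edit by a different author does not change the result.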

How the two halves fit together

They’re the same idea from two ends. The distiller writes pages — real commits, with parent_sha and a sources: graph. The differ reads any two commits of a page and produces a structure you can actually review. Drift detection and the conflict-merge directive ride the same git substrate: they’re all just comparisons of where your context was versus where it is.

And none of it needs the tool to be smart. The git layer is dumb and durable; the diff is deterministic; the one genuinely intelligent step — distillation — is delegated to the agent that’s already doing the thinking.

That’s the whole design, and it’s open source. If you want to read the real thing, the CLI, server, and MCP layer are on GitHub. Start with internal/capture, internal/mcp/tools.go, and internal/diff/section.go, which together cover most of what this post is about.

ctx init

Point it at a project and watch your next session turn into a page you can diff. Free to start at contexo.live.

FAQ

Why doesn't the CLI just call an LLM to distill the session itself?

Three reasons: the tool carries no API keys or inference cost, the CLI stays deterministic and can be self-hosted anywhere, and the distiller is always the best model you already use, namely the agent in your editor, which has the full session in context.

Are my session buffer or secrets sent to the server?

The buffer is a local file under .contexo/raw/sessions/_pending/. What gets committed is the distilled source page the agent writes — and the handshake explicitly instructs the agent to redact API keys, tokens, passwords, and PII while writing it.

Can I turn the distill handshake off?

Yes — pass no_distill on the push, or set CONTEXO_DISTILL_DISABLE=1. The handshake only fires when a knowledge page is in the batch and a session buffer less than 6 hours old exists.

Is the diff just git diff under the hood?

No. Pages are stored in git, but the diff is a custom structured pass over two versions — it understands YAML frontmatter fields and markdown sections, not lines.