
feat(doctor): dangling topic_key reference detection in engram doctor #1

@Basparin

Status: PROPOSAL INCUBATOR — internal Basparin fork

This issue lives in Basparin/engram (fork) as a proposal incubator under the fork-as-staging protocol (Basparin/symbiosis/protocols/fork-as-staging.md). It will be iterated here until mature and then presented upstream to Gentleman-Programming/engram.

This is the first issue to use the protocol, so it serves both as a proposal in its own right and as a test of the pattern.


Context

engram lets users save memories under a topic_key (a stable string identifier) with upsert semantics. Documents such as CLAUDE.md, project docs, and other engram observations often reference a topic_key by name (e.g., "see engram topic workstation/pregunta-2-candidates"). When the referenced topic_key has no observations, the reader has no way to tell that the reference is dangling.

Empirical evidence

In the Basparin/Thronglets project, CLAUDE.md contains:

Open work: Workstation Pregunta 2 (Question 2: non-delegable ops without validation); details in engram topic roadmap/current and workstation/pregunta-2-candidates.

mem_search returns 0 hits for both topic_keys. A previous session summary (engram observation #967, 2026-04-26) noted: "Details in engram topics roadmap/current and workstation/pregunta-2-candidates (I couldn't open them from the current scope)". This confirms the references were already dangling at least 5 days before this issue was filed.

The reference was promoted into a permanent project document (CLAUDE.md) and persists, while the underlying topic_keys were never written. The dangling state is invisible to consumers of CLAUDE.md unless they manually verify each topic_key against engram.

Problem

Engram has no first-class mechanism to detect dangling topic_key references. Documents end up confidently citing topic_keys that have never been observed. A search returns 0 hits, but the reader cannot distinguish between (a) a wrong query, (b) the wrong scope or project, and (c) a topic that was never written.

Proposed approach (initial — open for iteration)

Add a diagnostic to engram doctor (introduced in v1.15.0) that:

  1. Accepts an input file path or text blob.
  2. Extracts candidate topic_key references using a configurable pattern. Default pattern: tokens matching [a-z0-9-]+/[a-z0-9-]+(/[a-z0-9-]+)*, optionally surrounded by backticks.
  3. For each candidate, queries the local engram store: does any observation have this exact topic_key?
  4. Reports, per candidate: MATCH (with observation count), DANGLING (zero observations), or AMBIGUOUS (matches multiple keys that differ only in case or separators).
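A minimal Go sketch of steps 2 to 4, assuming the local store can be read into a map from topic_key to observation count. The `store` map, `ExtractCandidates`, `Classify`, and the `normalize` folding rule are hypothetical names for illustration, not engram's API. The `\b` anchors are an added assumption that keeps the default pattern from matching fragments such as `rogramming/engram` inside `Gentleman-Programming/engram`; surrounding backticks need no special handling because they are never part of the match. AMBIGUOUS is interpreted here as "more than one stored key, or a single non-identical key, shares the candidate's case- and separator-folded form".

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// Status is the per-candidate classification from the proposal.
type Status string

const (
	Match     Status = "MATCH"
	Dangling  Status = "DANGLING"
	Ambiguous Status = "AMBIGUOUS"
)

// Result holds one classified candidate.
type Result struct {
	Key      string
	Status   Status
	Count    int      // observations under the exact key (MATCH only)
	Variants []string // stored keys sharing the folded form (AMBIGUOUS only)
}

// topicKeyRe approximates the proposed default pattern
// [a-z0-9-]+/[a-z0-9-]+(/[a-z0-9-]+)*; the \b anchors are an assumption.
var topicKeyRe = regexp.MustCompile(`\b[a-z0-9-]+(?:/[a-z0-9-]+)+\b`)

// ExtractCandidates returns unique candidate topic_keys in first-seen order.
func ExtractCandidates(text string) []string {
	seen := map[string]bool{}
	var out []string
	for _, m := range topicKeyRe.FindAllString(text, -1) {
		if !seen[m] {
			seen[m] = true
			out = append(out, m)
		}
	}
	return out
}

// normalize folds case and treats "_" as "-" (hypothetical variant rule).
func normalize(k string) string {
	return strings.ReplaceAll(strings.ToLower(k), "_", "-")
}

// Classify checks each candidate against a hypothetical snapshot of the
// store (topic_key -> observation count).
func Classify(candidates []string, store map[string]int) []Result {
	byNorm := map[string][]string{}
	for k, n := range store {
		if n > 0 {
			byNorm[normalize(k)] = append(byNorm[normalize(k)], k)
		}
	}
	var results []Result
	for _, c := range candidates {
		variants := byNorm[normalize(c)]
		sort.Strings(variants) // deterministic report order
		switch {
		case len(variants) == 0:
			results = append(results, Result{Key: c, Status: Dangling})
		case len(variants) == 1 && variants[0] == c:
			results = append(results, Result{Key: c, Status: Match, Count: store[c]})
		default:
			results = append(results, Result{Key: c, Status: Ambiguous, Variants: variants})
		}
	}
	return results
}

func main() {
	// Hypothetical document and store contents, for illustration only.
	doc := "see `project/alpha`, notes/todo and ideas/later."
	store := map[string]int{"project/alpha": 3, "notes/todo": 2, "Notes/Todo": 1}
	for _, r := range Classify(ExtractCandidates(doc), store) {
		fmt.Println(r.Status, r.Key, r.Count, r.Variants)
	}
}
```

Note that the default pattern still produces false positives on ordinary paths: `Basparin/symbiosis/protocols/fork-as-staging.md` yields the candidate `symbiosis/protocols/fork-as-staging`, which bears on open question 1.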

Example invocation:

engram doctor topic-keys --file CLAUDE.md
engram doctor topic-keys --stdin < some-doc.md
engram doctor topic-keys --recursive ./docs

Optional report modes:

  • --strict: exit with a non-zero code if any DANGLING candidate is found (for CI use).
  • --repair-suggestions: for each DANGLING candidate, run mem_compare (v1.15.0) against existing topic_keys and suggest the closest matches (typo detection).
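A sketch of both report modes. `--repair-suggestions` is meant to call mem_compare, whose interface is not described in this issue, so this version swaps in plain Levenshtein edit distance over the known topic_keys as a stand-in. `Suggest`, `ExitCode`, `levenshtein`, and the `maxDist` threshold are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein returns the edit distance between a and b, byte-wise, which is
// adequate for the ASCII-only default topic_key pattern.
func levenshtein(a, b string) int {
	prev := make([]int, len(b)+1)
	cur := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		cur[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev, cur = cur, prev // prev now holds the row just computed
	}
	return prev[len(b)]
}

// Suggest returns up to n known keys within maxDist edits of a dangling key,
// closest first, ties broken alphabetically.
func Suggest(dangling string, known []string, maxDist, n int) []string {
	type cand struct {
		key  string
		dist int
	}
	var cs []cand
	for _, k := range known {
		if d := levenshtein(dangling, k); d <= maxDist {
			cs = append(cs, cand{k, d})
		}
	}
	sort.Slice(cs, func(i, j int) bool {
		if cs[i].dist != cs[j].dist {
			return cs[i].dist < cs[j].dist
		}
		return cs[i].key < cs[j].key
	})
	var out []string
	for i := 0; i < len(cs) && i < n; i++ {
		out = append(out, cs[i].key)
	}
	return out
}

// ExitCode mirrors the proposed --strict semantics: non-zero only when
// --strict is set and at least one DANGLING candidate was found.
func ExitCode(strict bool, danglingCount int) int {
	if strict && danglingCount > 0 {
		return 1
	}
	return 0
}

func main() {
	// Hypothetical store keys and a typo'd reference.
	known := []string{"roadmap/current", "roadmap/next", "notes/todo"}
	dangling := []string{"roadmap/curent"}
	for _, d := range dangling {
		fmt.Printf("DANGLING %s; did you mean %v?\n", d, Suggest(d, known, 2, 3))
	}
	fmt.Println("exit code with --strict:", ExitCode(true, len(dangling)))
}
```

A small `maxDist` keeps suggestions to likely typos; a semantic comparison such as mem_compare could surface renamed topics that edit distance would miss.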

Acceptance criteria (initial — refine before upstream)

  • engram doctor topic-keys subcommand exists and accepts --file, --stdin, --recursive.
  • For each candidate, classification is one of {MATCH, DANGLING, AMBIGUOUS}.
  • --strict flag returns non-zero exit code when DANGLING present.
  • At least one integration test covers each classification path.
  • Documentation update for engram doctor describes the subcommand.

Open questions (iterate via comments)

  1. Should the topic_key extraction pattern be configurable (regex via flag) or fixed?
  2. Should this scan ALL observations (any project, any scope), or default to current project + scope only?
  3. Should --repair-suggestions use mem_compare (v1.15.0)? Is it exposed only as an MCP tool, or also via the CLI?
  4. Should there be a positive mode that validates an explicit list of topic_keys (instead of extracting them from a blob)?
  5. How do we handle topic_keys that legitimately have no observations yet (planned future topics)? An allowlist? A topic_key: frontmatter convention?

Upstream presentation criteria (when mature)

This issue is ready to present to Gentleman-Programming/engram when:

  • Open questions above are resolved.
  • A working prototype exists in a proposal/dangling-topic-key-detection branch in this fork.
  • At least one integration test passes.
  • Concrete CLI ergonomics validated (subcommand structure, flag names).
  • Acceptance criteria refined from initial draft to final.

Until then, iteration happens here.

Related

  • engram v1.15.0 release notes: introduces engram doctor and engram conflicts (this proposal extends doctor).
  • Gentleman-Programming/engram#233 (Basparin's open issue): semantic search layer. Orthogonal but complementary: semantic search would help with "did you mean" suggestions, while doctor topic-keys is an exact-match diagnostic.
  • Likely future Basparin/engram issues: a mem_list_topic_keys browse tool and a pre-write fuzzy similarity check for topic_keys.

Engram trace

  • basparin/fork-as-staging-protocol (2026-05-01) — protocol decision
  • This issue is the first artifact under that protocol.
