feat(doctor): dangling topic_key reference detection in engram doctor

## Status: PROPOSAL INCUBATOR — internal Basparin fork

This issue lives in `Basparin/engram` (fork) as a **proposal incubator** per the fork-as-staging protocol ([Basparin/symbiosis/protocols/fork-as-staging.md](https://github.com/Basparin/symbiosis/blob/main/protocols/fork-as-staging.md)). It is iterated here until mature, then presented to upstream `Gentleman-Programming/engram`.

This is the first issue using this protocol. It serves as both content (proposal) and pattern test.

---

## Context

`engram` lets users save memories with a `topic_key` (a stable string identifier) for upsert semantics. Documents such as CLAUDE.md, project docs, or other engram observations frequently **reference** a topic_key by name (e.g., "see engram topic `workstation/pregunta-2-candidates`"). When the referenced topic_key has no observations, the reader has no way to tell that the reference is dangling.

## Empirical evidence

In the `Basparin/Thronglets` project, `CLAUDE.md` contains this reference (translated from Spanish):

> **Open work**: Workstation Pregunta 2 (non-delegable ops without validation); details in engram topic `roadmap/current` and `workstation/pregunta-2-candidates`.

A `mem_search` for each of the two topic_keys returns 0 hits. A previous session summary (engram observation #967, 2026-04-26) noted (translated from Spanish): *"Details in engram topics `roadmap/current` and `workstation/pregunta-2-candidates` (I could not open them from the current scope)"*. This confirms the references were already dangling at least 5 days before this issue.

The reference was promoted into a permanent project document (CLAUDE.md) and persists, while the underlying topic_keys were never written. The dangling state is invisible to consumers of CLAUDE.md unless they manually verify each topic_key against engram.

## Problem

Engram has no first-class mechanism to detect dangling topic_key references. Documents end up with confidently asserted citations to topic_keys that have never been observed. A search returns 0 hits, but the reader cannot distinguish between three causes: (a) the query is wrong, (b) the scope or project is wrong, or (c) the topic was never written.

## Proposed approach (initial — open for iteration)

Add a diagnostic in `engram doctor` (introduced v1.15.0) that:

1. Accepts an input file path or text blob.
2. Extracts candidate topic_key references using a configurable pattern. Default pattern: tokens matching `[a-z0-9-]+/[a-z0-9-]+(/[a-z0-9-]+)*`, optionally surrounded by backticks.
3. For each candidate, queries the local engram store: does any observation have this exact `topic_key`?
4. Reports per candidate: `MATCH` (with observation count), `DANGLING` (zero observations), or `AMBIGUOUS` (matches multiple keys with case/separator variants).
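Steps 2 to 4 can be sketched as follows. This is a minimal Python illustration of the logic only, not engram's implementation: the store is modeled as an in-memory dict of `topic_key` to observation count, and `extract_candidates`, `normalize`, and `classify` are hypothetical names. The variant rule (lowercase, `_` folded to `-`) is one possible reading of "case/separator variants" and is an assumption. The default pattern will also match ordinary relative paths such as `docs/setup`, so false positives are expected.

```python
import re

# Default pattern from step 2, with optional surrounding backticks.
TOPIC_KEY_RE = re.compile(r"`?([a-z0-9-]+/[a-z0-9-]+(?:/[a-z0-9-]+)*)`?")


def extract_candidates(text):
    """Return candidate topic_keys in order of first appearance, without duplicates."""
    seen = {}
    for match in TOPIC_KEY_RE.finditer(text):
        seen.setdefault(match.group(1), None)
    return list(seen)


def normalize(key):
    """Fold case and separator variants (assumed rule: lowercase, '_' becomes '-')."""
    return key.lower().replace("_", "-")


def classify(candidate, store):
    """Classify one candidate against `store` (dict: topic_key -> observation count).

    Returns (status, n): n is the observation count for MATCH, the number of
    variant keys for AMBIGUOUS, and 0 for DANGLING.
    """
    # Stored keys with at least one observation that collapse to the same normalized form.
    variants = [
        key for key, count in store.items()
        if count > 0 and normalize(key) == normalize(candidate)
    ]
    if len(variants) > 1:
        return ("AMBIGUOUS", len(variants))
    count = store.get(candidate, 0)
    if count > 0:
        return ("MATCH", count)
    # A single non-exact variant still counts as DANGLING: no observation
    # carries this exact topic_key.
    return ("DANGLING", 0)
```

Whether a single near-miss variant should be reported as DANGLING or AMBIGUOUS is a design choice the sketch leaves open; it follows the wording of step 4 literally.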

Example invocation:
```
engram doctor topic-keys --file CLAUDE.md
engram doctor topic-keys --stdin < some-doc.md
engram doctor topic-keys --recursive ./docs
```

Optional report modes:
- `--strict`: exit code != 0 if any DANGLING found (CI use).
- `--repair-suggestions`: for each DANGLING, run `mem_compare` (v1.15.0) against existing topic_keys and suggest closest matches — typo detection.
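The two report modes can be illustrated with a short Python sketch. It does not call `mem_compare`; as a stand-in it uses the standard library's `difflib.get_close_matches`, which ranks strings by character-level similarity. The function names, the `0.8` cutoff, and the specific exit value `1` are assumptions (the proposal only requires a non-zero exit code).

```python
import difflib


def repair_suggestions(dangling_key, existing_keys, limit=3):
    """Return up to `limit` existing topic_keys that closely resemble the dangling one.

    Stand-in for mem_compare: plain string similarity, useful for typo detection only.
    """
    return difflib.get_close_matches(
        dangling_key, sorted(existing_keys), n=limit, cutoff=0.8
    )


def exit_code(results, strict):
    """Return 1 under --strict when any candidate is DANGLING, otherwise 0.

    `results` maps each candidate topic_key to its classification string.
    """
    if strict and any(status == "DANGLING" for status in results.values()):
        return 1
    return 0
```

In a CI job, the value returned by `exit_code` would be passed to `sys.exit`, so a dangling reference fails the build only when `--strict` is set.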

## Acceptance criteria (initial — refine before upstream)

- [ ] `engram doctor topic-keys` subcommand exists and accepts `--file`, `--stdin`, `--recursive`.
- [ ] For each candidate, classification is one of {MATCH, DANGLING, AMBIGUOUS}.
- [ ] `--strict` flag returns non-zero exit code when DANGLING present.
- [ ] At least one integration test covers each classification path.
- [ ] Documentation update for `engram doctor` describes the subcommand.

## Open questions (iterate via comments)

1. Should the topic_key extraction pattern be configurable (regex via flag) or fixed?
2. Should this scan ALL observations (any project, any scope), or default to current project + scope only?
3. Should `--repair-suggestions` use `mem_compare` (v1.15.0)? Is it exposed as an MCP tool only, or also through the CLI?
4. Should there be a positive mode that validates an explicit list of topic_keys (instead of extracting them from a blob)?
5. How do we handle topic_keys that legitimately have no observations yet (planned future topics)? An allowlist? A `topic_key:` frontmatter convention?
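To make the allowlist option in question 5 concrete, here is a hedged Python sketch. Everything in it is hypothetical: the file format (one topic_key per line, `#` comments), the function names, and the `PLANNED` status, which is not part of the proposed {MATCH, DANGLING, AMBIGUOUS} set.

```python
def parse_allowlist(text):
    """Parse one topic_key per line; blank lines and '#' comments are ignored."""
    keys = set()
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()
        if entry:
            keys.add(entry)
    return keys


def apply_allowlist(results, allowlist):
    """Relabel DANGLING keys found in the allowlist as PLANNED (hypothetical status)."""
    return {
        key: "PLANNED" if status == "DANGLING" and key in allowlist else status
        for key, status in results.items()
    }
```

With this shape, `--strict` would keep failing on unexpected dangling keys while planned topics pass, at the cost of one more file to maintain.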

## Upstream presentation criteria (when mature)

This issue is ready to present to `Gentleman-Programming/engram` when:

- Open questions above are resolved.
- A working prototype exists in a `proposal/dangling-topic-key-detection` branch in this fork.
- At least one integration test passes.
- Concrete CLI ergonomics are validated (subcommand structure, flag names).
- Acceptance criteria refined from initial draft to final.

Until then, iteration happens here.

## Related

- `engram` v1.15.0 release notes: introduces `engram doctor` and `engram conflicts` (this proposal extends `doctor`).
- `Gentleman-Programming/engram#233` (Basparin's open issue): semantic search layer — orthogonal but complementary (semantic search would help "did you mean" suggestions; doctor topic-keys is exact-match diagnostic).
- Likely future Basparin/engram issues: a `mem_list_topic_keys` browse tool and a pre-write fuzzy similarity check for topic_keys.

## Engram trace

- `basparin/fork-as-staging-protocol` (2026-05-01) — protocol decision
- This issue is the first artifact under that protocol.

