V0.1.3/m2/chat history capture by Vedansi18 · Pull Request #5 · hi0001234d/nexpath

Vedansi18 · 2026-05-15T07:05:43Z

Adds the chat-history capture layer of the VS Code extension —
modules M2 (watcher), M3 (per-version extractors), and M4
(schema fingerprint) per dev plan §3 M2 §2.2. The watcher monitors
Cursor's state.vscdb (and the Windsurf chat-history dir), debounces
fs events, fingerprints the schema, dispatches rows to the appropriate
extractor, and emits normalised ChatHistoryEvents — without touching
any Layer C file. Also ships a dev-only dump-cursor-state.ts helper
(Option C) for capturing verified fixtures from real Cursor installs.
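
As a rough illustration of the debounce-then-dedupe part of that flow, here is a minimal sketch. It is not the code in `chat-history-watcher.ts`: the `Scheduler` interface and the `debounce` and `dedupe` helpers are hypothetical names, and the injectable scheduler only mirrors the PR's stated preference for injectable dependencies.

```typescript
// Hypothetical sketch; none of these names are taken from the PR's source.
type TimerHandle = unknown;

interface Scheduler {
  set(fn: () => void, ms: number): TimerHandle;
  clear(handle: TimerHandle): void;
}

// Default scheduler backed by the real timers.
const realScheduler: Scheduler = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

/** Collapse a burst of calls into a single call after `delayMs` of quiet. */
function debounce(
  fn: () => void,
  delayMs: number,
  scheduler: Scheduler = realScheduler,
): () => void {
  let pending: TimerHandle | undefined;
  return () => {
    // A new event cancels the timer armed by the previous one.
    if (pending !== undefined) scheduler.clear(pending);
    pending = scheduler.set(() => {
      pending = undefined;
      fn();
    }, delayMs);
  };
}

/** Return only rows whose signature is new, recording each new signature in `seen`. */
function dedupe<T>(
  rows: T[],
  signatureOf: (row: T) => string,
  seen: Set<string>,
): T[] {
  const fresh: T[] = [];
  for (const row of rows) {
    const signature = signatureOf(row);
    if (seen.has(signature)) continue;
    seen.add(signature);
    fresh.push(row);
  }
  return fresh;
}
```

Passing the scheduler in means a test can fire the timer by hand instead of waiting 250 ms, which is presumably why the real watcher takes `watchFn` and `nowFn` as parameters.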

Stacked on M2 Branch 1 (B1: v0.1.3/m2/extension-skeleton,
commit 879ed5e). This branch (B2) needs B1's sub-package skeleton, so
this PR's diff is meaningful only against B1, not against sub-7. See
"PR strategy" below.

Modules covered (per dev plan §3 M2 §2.2)

| Module | File(s) | What it does |
| --- | --- | --- |
| M2: Watcher | `src/chat-history-watcher.ts` + `chat-history-types.ts` | `fs.watch` on each target with a 250 ms debounce; reads `ItemTable` via injectable `readItemTableFn` (sql.js by default); dedupes against `seenSignatures`; emits `{ prompt, rawSessionId, capturedAt, sourcePath, extractorId }`. All deps (`watchFn`, `readFileFn`, `readItemTableFn`, `nowFn`) are injectable for testing. |
| M3: Extractors | `src/extractors/cursor-v2024-q4.ts`, `cursor-v2025-q1.ts`, `cursor-v2025-q2.ts`, `windsurf.ts` | Per-version row decoders implementing the `ChatHistoryExtractor` contract. Each Cursor extractor handles the `role` / `type` and `content` / `text` field variants. Windsurf is a deliberate placeholder; real JSON-file decoding lands in B4 alongside `windsurfAdapter`. |
| M4: Fingerprint | `src/extractors/index.ts` | `pickExtractor(observedKeys)` prefix-matches each extractor's `fingerprintKeys` against the observed `ItemTable` keys and picks the extractor with the highest match count, with ties broken by registry order (newest first). Returns a `FingerprintResult` (kind: `'known'` or `'unknown'`). The watcher surfaces the `'unknown'` payload via `onSchemaUnknown` for the eventual "schema unknown" toast (wiring in B3/B4). |
| Bonus: Dump helper | `scripts/dump-cursor-state.ts` + `src/cursor-state-dump-helpers.ts` | Dev-only CLI for capturing verified state.vscdb fixtures. Discovers global + per-workspace DBs, dumps `ItemTable` (filtered) + `cursorDiskKV`, with optional `--redact` for sensitive content. Helpers live in `src/` so they're typechecked + unit-tested. |
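
The M4 selection rule can be sketched as follows. This is an illustration under stated assumptions, not the contents of `src/extractors/index.ts`: the `ExtractorSketch` and `FingerprintResultSketch` types are hypothetical, the registry is passed explicitly (the real `pickExtractor` takes only `observedKeys`), "match count" is assumed to mean the number of fingerprint prefixes that match at least one observed key, and the `cursor-v2024-q4` fingerprint key below is a made-up placeholder.

```typescript
// Hypothetical shapes; the real contract lives in the PR's source.
interface ExtractorSketch {
  id: string;
  fingerprintKeys: string[]; // key prefixes expected in ItemTable
}

type FingerprintResultSketch =
  | { kind: "known"; extractorId: string; matchCount: number }
  | { kind: "unknown"; observedKeys: string[] };

function pickExtractorSketch(
  registry: ExtractorSketch[], // assumed to be ordered newest first
  observedKeys: string[],
): FingerprintResultSketch {
  let best: ExtractorSketch | undefined;
  let bestCount = 0;
  for (const extractor of registry) {
    // Count the fingerprint prefixes that match at least one observed key.
    const count = extractor.fingerprintKeys.filter((prefix) =>
      observedKeys.some((key) => key.startsWith(prefix)),
    ).length;
    // Strict ">" keeps the earlier (newer) extractor when counts tie.
    if (count > bestCount) {
      best = extractor;
      bestCount = count;
    }
  }
  return best !== undefined
    ? { kind: "known", extractorId: best.id, matchCount: bestCount }
    : { kind: "unknown", observedKeys };
}

// Example registry. The q1 and q2 prefixes come from this PR's findings;
// the q4 prefix is a placeholder invented for this sketch.
const exampleRegistry: ExtractorSketch[] = [
  { id: "cursor-v2025-q2", fingerprintKeys: ["cursorAIChatService.chatHistory."] },
  { id: "cursor-v2025-q1", fingerprintKeys: ["composer.composerData"] },
  { id: "cursor-v2024-q4", fingerprintKeys: ["hypothetical.legacyChatKey"] },
];
```

With this registry, observing `composer.composerData` selects `cursor-v2025-q1`, and a key set that matches no prefix yields the `'unknown'` result that the watcher would hand to `onSchemaUnknown`.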

Real-machine inspection findings (Cursor 3.4.20 / Linux)

This branch's commit 3794bc3 includes real-data refinements after
running the dump script against a live Cursor 3.4.20 install:

| Finding | Impact |
| --- | --- |
| WAL mode: main `.vscdb` is 4 KB; live writes go to `.vscdb-wal`. sql.js cannot read WAL siblings; better-sqlite3 can. | Dump script switched to better-sqlite3 (dev-dep). **Production watcher still uses sql.js**; flagged for B4 to choose between (a) switching to better-sqlite3 in the .vsix, or (b) implementing a copy + checkpoint shim. |
| Workspace DB holds the data, not the global one. Plan-era assumption was global-only. | Dump script now scans `User/globalStorage/state.vscdb` AND every `User/workspaceStorage/*/state.vscdb`. Watcher target list will need both paths (B4 wiring). |
| `cursor-v2025-q1` key was wrong: community docs said `composerData.composerData`; real key is `composer.composerData`. | Fixed in `cursor-v2025-q1.ts` + tests. Value shape (`allComposers / conversation` array) is not what's at this key on 3.4.20; JSDoc flagged. |
| `cursor-v2025-q2` prefix `cursorAIChatService.chatHistory.` NOT observed on Cursor 3.4.20. | Extractor still ships (in case older versions use it); JSDoc strengthened. |
| Three redacted fixtures committed for regression testing. | `test-fixtures/state-vscdb-samples/cursor-3-4-20-initial-{global,workspace-*}.json`. |
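
The first two findings suggest what target discovery has to do: look at both the global and the per-workspace databases, and notice when a WAL sibling is present. The sketch below illustrates that with Node's `fs` and `path` only. The function names are hypothetical (they are not the helpers in `cursor-state-dump-helpers.ts`), and the default Linux location in `assumedLinuxUserDir` is an assumption that should be checked on the target machine.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

/** Assumed default Cursor `User` directory on Linux; verify before relying on it. */
function assumedLinuxUserDir(): string {
  return path.join(os.homedir(), ".config", "Cursor", "User");
}

/** List the global state.vscdb (if present) plus every per-workspace state.vscdb. */
function findStateDbs(userDir: string): string[] {
  const found: string[] = [];

  const globalDb = path.join(userDir, "globalStorage", "state.vscdb");
  if (fs.existsSync(globalDb)) found.push(globalDb);

  const workspaceRoot = path.join(userDir, "workspaceStorage");
  if (fs.existsSync(workspaceRoot)) {
    const entries = fs
      .readdirSync(workspaceRoot, { withFileTypes: true })
      .filter((entry) => entry.isDirectory())
      .map((entry) => entry.name)
      .sort(); // stable order for reproducible dumps
    for (const name of entries) {
      const db = path.join(workspaceRoot, name, "state.vscdb");
      if (fs.existsSync(db)) found.push(db);
    }
  }
  return found;
}

/** True when SQLite's write-ahead log file sits next to the database. */
function hasWalSibling(dbPath: string): boolean {
  return fs.existsSync(dbPath + "-wal");
}
```

A caller could run `findStateDbs(assumedLinuxUserDir())` and treat any path for which `hasWalSibling` returns true as one that a reader without WAL support may see stale data from, which is the situation the B4 decision above has to resolve.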

Files

| Path | Type | Tests |
| --- | --- | --- |
| `src/ext-vscode/src/chat-history-types.ts` | shared types | exempt |
| `src/ext-vscode/src/chat-history-watcher.ts` | M2 watcher | 11 |
| `src/ext-vscode/src/chat-history-watcher.test.ts` | tests | n/a |
| `src/ext-vscode/src/extractors/cursor-v2024-q4.ts` | M3 (pre-Composer) | 9 |
| `src/ext-vscode/src/extractors/cursor-v2025-q1.ts` | M3 (Composer era) | 10 |
| `src/ext-vscode/src/extractors/cursor-v2025-q2.ts` | M3 (current schema, unverified) | 11 |
| `src/ext-vscode/src/extractors/windsurf.ts` | M3 placeholder | 4 |
| `src/ext-vscode/src/extractors/index.ts` | M4 pickExtractor + registry | 12 |
| `src/ext-vscode/src/cursor-state-dump-helpers.ts` | dump-script helpers | 28 |
| `src/ext-vscode/scripts/dump-cursor-state.ts` | dev-only CLI entry | (helpers tested) |
| `src/ext-vscode/test-fixtures/state-vscdb-samples/*.json` | 3 redacted fixtures | n/a |
| `src/ext-vscode/package.json` | +`sql.js` runtime dep, +`better-sqlite3` + `tsx` devDeps | n/a |
| `src/ext-vscode/esbuild.config.mjs` | +`sql.js` marked external | n/a |

85 new unit tests added by this branch (sub-package total: 110 across 10 files).