fix(sync-gbrain): fall through to filesystem walk when source has 0 code pages #1575

Open
kkroo wants to merge 1 commit into garrytan:main from kkroo:omar/sync-gbrain-fall-through-when-no-code-pages

@kkroo commented May 18, 2026

Summary

/sync-gbrain --full is currently a silent no-op for sources that have never been initially synced. The orchestrator at bin/gstack-gbrain-sync.ts:404-407 routes --full to gbrain reindex-code, but reindex-code only re-chunks existing `type='code'` pages — it does NOT walk the filesystem.

For a fresh source (zero code pages), reindex-code exits 0 with `codePages: 0` and the orchestrator reports success despite indexing nothing.

Repro

gbrain sources add my-repo --path ~/my-repo --federated
cd ~/my-repo
/sync-gbrain --full
# ✅ Reports "OK" in ~10s
# 🚨 But page_count = 0

Side-by-side dry-run on the same source ID proves the discrepancy:

gbrain sync --strategy code --source <id> --dry-run  → 14,799 files
gbrain reindex-code --source <id> --dry-run          → 0 code page(s)

This bug bit hard on a real fresh install of a 14,799-file Go monorepo: the sync appeared to complete in ~10 seconds with no errors, but nothing was indexed. It was confusing because every other signal looked successful.

Fix

Probe the source's existing code-page count via `reindex-code --dry-run --json` before choosing the subcommand. If the probe reports zero pages (or fails for any reason), fall through to `sync --strategy code`, which does the filesystem walk via `performFullSync`.

Preserves the v0.21.0 reindex-code backfill semantics for sources that already have code pages.

Diff shape

  • 26 insertions, 2 deletions, single file (`bin/gstack-gbrain-sync.ts`)
  • New behavior is gated behind `args.mode === "full"` (incremental path unchanged)
  • Probe failure / unparseable JSON falls through to sync (safe default)
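The branching rule described above can be sketched as a small pure function. This is an illustration, not the code in the diff: `chooseFullModeSubcommand` is a hypothetical name, and the only thing assumed about the probe output is that it is a JSON object carrying a numeric `codePages` field, as in the `codePages: 0` result mentioned in the summary.

```typescript
// Hypothetical helper for illustration; not the PR's actual implementation.
type FullModeSubcommand = "reindex-code" | "sync";

/**
 * Decide which gbrain subcommand --full should run.
 * `probeStdout` is the raw stdout of `reindex-code --dry-run --json`,
 * or null when the probe itself failed (non-zero exit, timeout, ...).
 * Assumption: the JSON is an object with a numeric `codePages` field.
 */
function chooseFullModeSubcommand(probeStdout: string | null): FullModeSubcommand {
  // Probe failure: take the safe default and walk the filesystem.
  if (probeStdout === null) return "sync";
  try {
    const parsed: unknown = JSON.parse(probeStdout);
    if (typeof parsed === "object" && parsed !== null) {
      const codePages = (parsed as { codePages?: unknown }).codePages;
      // Only keep the reindex-code backfill path when pages already exist.
      if (typeof codePages === "number" && codePages > 0) return "reindex-code";
    }
  } catch {
    // Unparseable JSON is treated the same as a failed probe.
  }
  return "sync";
}
```

Every uncertain outcome resolves to `sync`, so the only way to reach `reindex-code` is a positive, well-formed page count.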

Test plan

  • Repro confirmed against gbrain 0.33.1.1 with a fresh source
  • Fix verified on three Go repos: NOP (198 pages), magma (3,544 pages), trafficcontrol-mcst-billing (14,799 files, in flight)
  • reindex-code path still works on sources with existing code pages (probe returns codePages > 0)

🤖 Generated with Claude Code

…ode pages

`/sync-gbrain --full` was a silent no-op on sources that had never been
initially synced. The orchestrator's --full path routed to
`gbrain reindex-code`, which only re-chunks existing `type='code'`
pages — it does NOT walk the filesystem (see reindex-code.ts docstring:
"Explicit backfill for v0.19.0 → v0.21.0 brains... walk every page
where type='code'"). For a source with zero code pages, reindex-code
exits 0 with `codePages: 0` and the orchestrator reports success
despite indexing nothing.

Side-by-side proof on the same source ID:

    gbrain sync --strategy code --source <id> --dry-run  -> 14,799 files
    gbrain reindex-code --source <id> --dry-run          -> 0 code page(s)

Fix: probe the source's existing code page count via
`reindex-code --dry-run --json` before deciding the subcommand. If
zero pages (or probe fails), fall through to `sync --strategy code`
which does the filesystem walk via `performFullSync`. Preserves the
v0.21.0 backfill semantics for sources that already have code pages.

Repro:
1. Fresh source: `gbrain sources add my-repo --path ~/my-repo --federated`
2. `/sync-gbrain --full` (or invoke orchestrator directly)
3. Before this fix: completes in ~10s, page_count=0
4. After this fix: walks the filesystem as expected

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jbetala7
Contributor

Blocking correctness issue: this adds a direct `spawnSync("gbrain", ...)` call in `bin/gstack-gbrain-sync.ts`.

Current origin/main has `lib/gbrain-exec.ts` plus `test/gbrain-exec-invariant.test.ts` specifically to prevent direct gbrain spawns in this file, because direct spawns bypass `buildGbrainEnv()` and can read a project `.env.local` `DATABASE_URL` instead of gbrain's own config. This new probe should use `spawnGbrain(["reindex-code", "--source", sourceId, "--dry-run", "--json"], { baseEnv: gbrainEnv, timeout: 30_000 })` or `execGbrainJson(...)` so it preserves the `DATABASE_URL` seeding invariant and passes the static invariant test.
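A rough sketch of what routing the probe through the wrapper might look like. To keep the example self-contained, the wrapper is passed in as a parameter. The `SpawnGbrainFn` type, its `{ status, stdout }` result shape, and the `probeCodePageCount` name are all assumptions made for illustration; the real signature in `lib/gbrain-exec.ts` may differ.

```typescript
// Assumed shape of the repo's spawnGbrain wrapper (illustrative only).
type GbrainSpawnResult = { status: number | null; stdout: string };
type GbrainEnv = Record<string, string | undefined>;
type SpawnGbrainFn = (
  args: string[],
  opts: { baseEnv: GbrainEnv; timeout: number },
) => GbrainSpawnResult;

/**
 * Hypothetical probe: returns the existing code-page count for a source,
 * or null if the probe failed or its output could not be interpreted.
 * Assumes the JSON output carries a numeric `codePages` field.
 */
function probeCodePageCount(
  spawnGbrain: SpawnGbrainFn,
  sourceId: string,
  gbrainEnv: GbrainEnv,
): number | null {
  let result: GbrainSpawnResult;
  try {
    // Go through the wrapper so the env-seeding invariant is preserved.
    result = spawnGbrain(
      ["reindex-code", "--source", sourceId, "--dry-run", "--json"],
      { baseEnv: gbrainEnv, timeout: 30_000 },
    );
  } catch {
    return null; // spawn error or timeout surfaced as an exception
  }
  if (result.status !== 0) return null;
  try {
    const parsed = JSON.parse(result.stdout) as { codePages?: unknown };
    return typeof parsed.codePages === "number" ? parsed.codePages : null;
  } catch {
    return null; // unparseable or non-object JSON
  }
}
```

A caller would treat `null` and `0` the same way and fall through to `sync --strategy code`.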

@cfeddersen

Hit this same bug today on a fresh source after a PGLite re-init — --full reported OK code synced (page_count=0) in seconds, but nothing was indexed and gbrain search came back empty. Took me a minute to realize the orchestrator was successful and the source was empty at the same time.

One alternative worth weighing: instead of probing `reindex-code --dry-run --json` and branching on page count, just always chain `sync --strategy code` and then `reindex-code` in `--full` mode, unconditionally.

Trade-off:

  • Your approach preserves the pure-reindex semantic for non-empty sources and only falls through when needed.
  • Always-chain is simpler logic (no probe call, no branch, no dry-run round-trip), but runs sync even on already-populated sources. In practice that's idempotent + mtime-fast: ~1s on a populated source vs the ~10s probe-then-branch in your repro.

PoC at cfeddersen/gstack@fix/full-mode-no-op-on-fresh-source if useful for comparison — it extracts buildCodeSyncCommands as a small exported helper for unit testing the ordering, then loops in runCodeImport with the same per-command failure handling.

Either approach fixes the bug. Genuine taste call on whether --full should mean "always rebuild the lot" or "smart about what to re-do." Posting in case the simpler-logic version is more in line with how /sync-gbrain is meant to be reasoned about.
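For comparison, the always-chain ordering could be sketched as below. This is not the PoC's code: only the helper name `buildCodeSyncCommands` comes from the comment above, while its signature, the `SyncMode` type, and the assumption that incremental mode runs just `sync --strategy code` are illustrative guesses. Any further flags the real commands need are left out.

```typescript
// Illustrative sketch; signature and SyncMode are assumptions, not the PoC's API.
type SyncMode = "incremental" | "full";

/**
 * Build the ordered list of gbrain argument vectors for a code sync.
 * In full mode the filesystem walk runs first, so a fresh source has
 * pages for reindex-code to re-chunk by the time it runs.
 */
function buildCodeSyncCommands(mode: SyncMode, sourceId: string): string[][] {
  const sync = ["sync", "--strategy", "code", "--source", sourceId];
  // Assumption: incremental mode only needs the sync step.
  if (mode !== "full") return [sync];
  return [sync, ["reindex-code", "--source", sourceId]];
}
```

Because the function is pure, the ordering can be unit tested without spawning gbrain at all.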
