fix(gbrain-sync): --full produces an empty code index on first run of a new repo #1584

Open
jetsetterfl wants to merge 1 commit into garrytan:main from jetsetterfl:fix/gbrain-sync-full-empty-index

jetsetterfl commented May 18, 2026

Problem

The first /sync-gbrain --full on a new repo produces a code index with
0 pages while the stage reports OK. Semantic code search
(gbrain code-def / code-refs / search) then silently returns nothing
for that repo.

gstack-gbrain-sync.ts:runCodeImport selects the code-stage command by
mode:

const syncArgs = args.mode === "full"
  ? ["reindex-code", "--source", sourceId, "--yes"]      // re-embeds EXISTING pages
  : ["sync", "--strategy", "code", "--source", sourceId]; // the walk that CREATES pages

gbrain reindex-code only re-embeds pages that already exist; it never
walks the filesystem. On a source registered moments earlier (0 pages),
the --full branch runs reindex-code, gbrain prints "No code pages to
reindex", finishes in about a second, and the index stays empty. The
page-creating walk (sync --strategy code) only runs on the incremental
path.

This also contradicts the skill's own documented contract: the --full
help text says "First-run; full walk + reindex", but the code only
reindexes and never walks.

Fix

--full runs the file-walk sync first (creating/refreshing pages), then
reindex-code for the full re-embed. Incremental keeps the walk only.
This matches the documented "full walk + reindex" contract and is correct
for both freshly-registered and already-populated sources.
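
The ordering described above can be sketched as a small planning function. This is a minimal illustration, not the actual patch: `planCodeStage` and `SyncMode` are hypothetical names, and only the gbrain argument lists are taken from the snippet quoted in the Problem section.

```typescript
// Hypothetical type; the real args.mode type in gstack-gbrain-sync.ts may differ.
type SyncMode = "full" | "incremental";

// Returns the gbrain argument lists for the code stage, in execution order.
function planCodeStage(mode: SyncMode, sourceId: string): string[][] {
  // The file walk that creates or refreshes pages; runs in every mode.
  const walk = ["sync", "--strategy", "code", "--source", sourceId];
  // The full re-embed of existing pages; runs only for --full, after the walk.
  const reindex = ["reindex-code", "--source", sourceId, "--yes"];
  return mode === "full" ? [walk, reindex] : [walk];
}
```

Returning an ordered list rather than a single command makes the "walk first, then reindex" contract explicit and easy to assert on.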

Verification

Reproduced and fixed end-to-end: with a freshly-registered source,
--full on the unpatched code finished in about a second with 0 pages;
with the fix it runs the real walk and the source fully populates
(hundreds of pages, multi-minute walk as expected).

Notes

  • Companion PR: #1583, fix(gbrain-local-status): classifier falsely reports broken-db inside repos with their own DATABASE_URL (a classifier false-broken-db bug found during the
    same investigation; it blocks this skill from even reaching the
    orchestrator inside web-app repos).
  • Secondary observation: the code stage reported OK with page_count=0.
    Worth considering a WARN/fail in the verdict block when a code sync
    completes with 0 pages on a non-empty repo, so this class of silent
    emptiness can't recur unnoticed. Not included here to keep the PR
    focused.
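
The secondary observation could look roughly like the check below. It is only a sketch of the idea and not part of this PR: `codeStageVerdict`, `StageVerdict`, and the `repoFileCount` input are hypothetical, and how the orchestrator would decide that a repo is non-empty is an assumption.

```typescript
// Hypothetical verdict values; the real verdict block may use other labels.
type StageVerdict = "OK" | "WARN";

// Flags a code sync that finished with zero pages even though the repo has files.
function codeStageVerdict(pageCount: number, repoFileCount: number): StageVerdict {
  const silentlyEmpty = pageCount === 0 && repoFileCount > 0;
  return silentlyEmpty ? "WARN" : "OK";
}
```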

… on a fresh source

runCodeImport selected the code-stage command by mode: --full ran only
`gbrain reindex-code`, incremental ran `gbrain sync --strategy code`.
reindex-code re-embeds pages that already exist and never walks the
filesystem, so the first --full on a freshly-registered source found no
pages ("No code pages to reindex"), finished in ~1s, and left the code
index empty while the stage still reported OK. Semantic code
search then silently returned nothing for that repo.

Always run the page-creating `sync --strategy code` walk first, then run
reindex-code when mode is --full. This honors the documented "full walk +
reindex" contract for both freshly-registered and already-populated
sources. Verified end-to-end: a fresh source that previously stayed at 0
pages now fully populates under --full.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

jbetala7 (Contributor) commented

The behavior change looks right, but this needs a regression test before it is safe to keep. The previous bug was that the --full branch selected reindex-code without first running sync --strategy code. A fake gbrain in test/gstack-gbrain-sync.test.ts can assert that --full calls sync --strategy code ... and then reindex-code ..., while incremental mode still calls only sync --strategy code. That would catch the empty-first-index regression without needing a real multi-minute gbrain walk.
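
A call-order check along these lines might look like the sketch below. It is self-contained for illustration: `runCodeStage` is a hypothetical stand-in for the patched code stage with an injected runner, and `recordCalls` plays the role of the fake gbrain. The real test in test/gstack-gbrain-sync.test.ts would need to hook into however the orchestrator actually spawns gbrain, which is not shown in this PR.

```typescript
type Mode = "full" | "incremental";
type Runner = (args: string[]) => void;

// Hypothetical stand-in for the patched stage: always walk, reindex only for --full.
function runCodeStage(mode: Mode, sourceId: string, run: Runner): void {
  run(["sync", "--strategy", "code", "--source", sourceId]);
  if (mode === "full") {
    run(["reindex-code", "--source", sourceId, "--yes"]);
  }
}

// Fake gbrain: records the subcommand of each invocation instead of spawning a process.
function recordCalls(mode: Mode): string[] {
  const calls: string[] = [];
  runCodeStage(mode, "test-source", (args) => {
    calls.push(args[0]);
  });
  return calls;
}

const fullCalls = recordCalls("full");
const incrementalCalls = recordCalls("incremental");
```

Asserting that `fullCalls` is `sync` followed by `reindex-code`, and that `incrementalCalls` contains only `sync`, would fail on the pre-fix behavior, where --full invoked reindex-code alone.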
