
fix(gbrain-local-status): classifier falsely reports broken-db inside repos with their own DATABASE_URL #1583

Open
jetsetterfl wants to merge 1 commit into garrytan:main from jetsetterfl:fix/gbrain-classifier-env-poisoning

@jetsetterfl commented May 18, 2026

Problem

/sync-gbrain hard-stops at the Step 1.5 pre-flight with
gbrain_local_status: "broken-db" for any user whose gbrain engine is
healthy but who runs the skill from inside a repo that defines its own
DATABASE_URL in .env (common for web-app repos).

Root cause: gbrain is a Bun binary, and Bun autoloads .env from the
current directory. When lib/gbrain-local-status.ts:freshClassify probes
with gbrain sources list --json, it passes env: env ?? process.env.
Inside such a repo, process.env.DATABASE_URL is the project's app DB, not
gbrain's own DB from ~/.gbrain/config.json. gbrain connects to the wrong
database, sources list fails with "Cannot connect to database", and the
classifier reports broken-db.

It is a pure false negative: the configured gbrain engine is healthy
(gbrain doctor passes, direct DB connection works), but the user is told
to debug their database when nothing is wrong with it.

Second-order amplifier: the 60s status cache key
({home, path_hash, gbrain_bin, version, config_mtime}) does not include
cwd or the effective DATABASE_URL, so one poisoned probe propagates the
false broken-db to clean directories for up to a minute.
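To make the amplification concrete, here is a minimal TypeScript sketch of a 60s TTL cache keyed only on the five fields above. It assumes the key is a plain serialization of those fields and that each result is stored with a timestamp; `StatusKeyInputs`, `statusCacheKey`, and `cachedStatus` are hypothetical names, not gstack's actual code. Because cwd and DATABASE_URL never enter the key, a broken-db recorded inside an app repo is served back to a clean directory 30 seconds later:

```typescript
type Status = "ok" | "broken-db";

// Hypothetical shape of the status-cache key inputs listed above.
// Note: no cwd and no DATABASE_URL field.
interface StatusKeyInputs {
  home: string;
  path_hash: string;
  gbrain_bin: string;
  version: string;
  config_mtime: number;
}

// Assumed key construction: a serialization of the five fields only.
function statusCacheKey(k: StatusKeyInputs): string {
  return JSON.stringify([k.home, k.path_hash, k.gbrain_bin, k.version, k.config_mtime]);
}

const TTL_MS = 60_000;
const cache = new Map<string, { status: Status; at: number }>();

// Return a cached status if it is younger than the TTL, otherwise run the probe.
function cachedStatus(k: StatusKeyInputs, now: number, probe: () => Status): Status {
  const key = statusCacheKey(k);
  const hit = cache.get(key);
  if (hit && now - hit.at < TTL_MS) return hit.status;
  const status = probe();
  cache.set(key, { status, at: now });
  return status;
}

const inputs: StatusKeyInputs = {
  home: "/home/alice",
  path_hash: "abc123",
  gbrain_bin: "/usr/local/bin/gbrain",
  version: "0.0.0",
  config_mtime: 1747526400000,
};

// t=0s: probe runs inside an app repo whose DATABASE_URL is the app DB, so it fails.
const first = cachedStatus(inputs, 0, () => "broken-db");
// t=30s: from a clean directory the probe would succeed, but the key is identical.
const second = cachedStatus(inputs, 30_000, () => "ok");

console.log(first, second); // broken-db broken-db
```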

Fix

Route the probe env through buildGbrainEnv from lib/gbrain-exec.ts, the
exact helper the sync orchestrator (gstack-gbrain-sync.ts) already uses to
seed DATABASE_URL from ~/.gbrain/config.json. The classifier skipped it,
unlike the orchestrator. This also makes the probe cwd-independent, so the
cache can no longer propagate a poisoned result. The existing
GSTACK_RESPECT_ENV_DATABASE_URL=1 escape hatch is unaffected.
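A hedged before/after sketch of the env the probe hands to gbrain. `probeEnvBefore` models the old `env ?? process.env` expression; `probeEnvAfter` models the behavior this PR attributes to buildGbrainEnv (seed DATABASE_URL from ~/.gbrain/config.json, honor GSTACK_RESPECT_ENV_DATABASE_URL=1). Both functions, the `GbrainConfig` shape, and the `database_url` field name are assumptions for illustration; the real buildGbrainEnv signature and config schema may differ.

```typescript
// Hypothetical stand-ins; the real helper is buildGbrainEnv in lib/gbrain-exec.ts.
type Env = Record<string, string | undefined>;

interface GbrainConfig {
  // Assumed field name for the DB URL stored in ~/.gbrain/config.json.
  database_url?: string;
}

// Before: the probe inherits whatever DATABASE_URL the caller's process has,
// which inside an app repo is the app's DB (loaded from that repo's .env).
function probeEnvBefore(processEnv: Env, env?: Env): Env {
  return env ?? processEnv;
}

// After (sketch): copy the base env, then seed DATABASE_URL from gbrain's own
// config unless the user explicitly opted out with GSTACK_RESPECT_ENV_DATABASE_URL=1.
function probeEnvAfter(processEnv: Env, config: GbrainConfig, env?: Env): Env {
  const base: Env = { ...(env ?? processEnv) };
  const respectEnv = base.GSTACK_RESPECT_ENV_DATABASE_URL === "1";
  if (!respectEnv && config.database_url) {
    base.DATABASE_URL = config.database_url;
  }
  return base;
}

// Simulated process.env inside a web-app repo whose .env was autoloaded.
const appRepoEnv: Env = { DATABASE_URL: "postgres://localhost/webapp" };
const gbrainConfig: GbrainConfig = { database_url: "postgres://localhost/gbrain" };

console.log(probeEnvBefore(appRepoEnv).DATABASE_URL); // postgres://localhost/webapp
console.log(probeEnvAfter(appRepoEnv, gbrainConfig).DATABASE_URL); // postgres://localhost/gbrain
console.log(
  probeEnvAfter({ ...appRepoEnv, GSTACK_RESPECT_ENV_DATABASE_URL: "1" }, gbrainConfig).DATABASE_URL,
); // postgres://localhost/webapp
```

Copying into a fresh object rather than mutating the caller's env keeps the classifier from writing the seeded URL back into process.env.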

Verification

  • Before: running the probe from inside a repo whose .env sets a
    different DATABASE_URL → broken-db, even though the gbrain engine is healthy.
  • After: same directory → ok, both cached and with the cache bypassed.
  • A directory with no .env is unchanged: ok.

Notes

  • Companion PR: fix(gbrain-sync): --full produces an empty code index on first run of a new repo #1584 (a separate --full first-run empty-index bug
    found during the same investigation).
  • Deeper root cause worth a follow-up in garrytan/gbrain: because gbrain
    is a Bun binary, any gbrain invocation from inside an app repo inherits
    that repo's .env DATABASE_URL. Every downstream tool has to
    defensively re-seed it. A first-class fix (prefer ~/.gbrain/config.json
    over an inherited DATABASE_URL, or a GBRAIN_DATABASE_URL that always
    wins) would remove the need for these guards. Filed as a suggestion, not
    blocking this PR.

Optional follow-up (not in this PR)

Add the effective DATABASE_URL to the status-cache key as
defense-in-depth, so any future env leak can't be cached across
directories. Kept out to keep this PR focused.
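A possible shape for that follow-up, sketched in TypeScript under the same assumption that the key is a serialization of the existing five fields. `statusCacheKeyV2` and `urlFingerprint` are hypothetical. The sketch fingerprints the effective DATABASE_URL (the one actually passed to the probe, not the raw process.env) with SHA-256 so that credentials embedded in the URL never appear verbatim in the key:

```typescript
import { createHash } from "node:crypto";

type Env = Record<string, string | undefined>;

// Hypothetical shape of the existing key inputs.
interface StatusKeyInputs {
  home: string;
  path_hash: string;
  gbrain_bin: string;
  version: string;
  config_mtime: number;
}

// Short, non-reversible fingerprint of the DB URL; an unset URL hashes as "".
function urlFingerprint(url: string | undefined): string {
  return createHash("sha256").update(url ?? "").digest("hex").slice(0, 16);
}

// Existing five fields plus the fingerprint of the env the probe will really use.
function statusCacheKeyV2(k: StatusKeyInputs, probeEnv: Env): string {
  return JSON.stringify([
    k.home,
    k.path_hash,
    k.gbrain_bin,
    k.version,
    k.config_mtime,
    urlFingerprint(probeEnv.DATABASE_URL),
  ]);
}

const inputs: StatusKeyInputs = {
  home: "/home/alice",
  path_hash: "abc123",
  gbrain_bin: "/usr/local/bin/gbrain",
  version: "0.0.0",
  config_mtime: 1747526400000,
};

const keyApp = statusCacheKeyV2(inputs, { DATABASE_URL: "postgres://localhost/webapp" });
const keyGbrain = statusCacheKeyV2(inputs, { DATABASE_URL: "postgres://localhost/gbrain" });
console.log(keyApp === keyGbrain); // false
```

With the fix in this PR the probe env is already seeded, so the fingerprint is normally constant; it only changes the key if some future code path leaks a different URL into the probe.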

…fier isn't poisoned by a project .env

freshClassify probed `gbrain sources list` with raw process.env. Because
gbrain is a Bun binary, Bun autoloads a project's .env from cwd, so running
the preflight inside any repo that defines its own DATABASE_URL (common for
web-app repos) made the probe connect to the wrong database. `sources list`
then failed and the classifier reported `broken-db`, hard-blocking
/sync-gbrain even though the configured gbrain engine was healthy.

Route the probe env through buildGbrainEnv (the same helper the sync
orchestrator already uses) so DATABASE_URL is always seeded from
~/.gbrain/config.json. This also makes the probe result cwd-independent, so
the 60s status cache can no longer propagate a poisoned negative to clean
directories. The existing GSTACK_RESPECT_ENV_DATABASE_URL=1 escape hatch
still works.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jbetala7 (Contributor)

Could you add a focused regression for the poisoned .env case? The production fix now routes freshClassify() through buildGbrainEnv, but there is no test proving a project DATABASE_URL is overridden by ~/.gbrain/config.json for the gbrain sources list --json probe. A fake gbrain shim plus temp HOME/GBRAIN_HOME, mirroring the buildGbrainEnv tests, would pin the false broken-db regression and also prove GSTACK_RESPECT_ENV_DATABASE_URL=1 still opts out.
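One way to shape that regression, sketched under stated assumptions: `seedFromGbrainHome` is a hypothetical stand-in for buildGbrainEnv that reads `database_url` (assumed field name) from `$GBRAIN_HOME/config.json`, falling back to `$HOME/.gbrain` (assumed layout). It pins the env-seeding half of the bug with a temp GBRAIN_HOME and a poisoned DATABASE_URL, and checks the GSTACK_RESPECT_ENV_DATABASE_URL=1 opt-out. The fake gbrain shim that would assert on the env a spawned `gbrain sources list --json` actually receives is left out of this sketch.

```typescript
import { existsSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type Env = Record<string, string | undefined>;

// Hypothetical minimal env builder modeled on the behavior this PR describes.
function seedFromGbrainHome(base: Env): Env {
  const env: Env = { ...base };
  if (env.GSTACK_RESPECT_ENV_DATABASE_URL === "1") return env; // explicit opt-out
  const home = env.GBRAIN_HOME ?? join(env.HOME ?? "", ".gbrain");
  const configPath = join(home, "config.json");
  if (!existsSync(configPath)) return env;
  const config = JSON.parse(readFileSync(configPath, "utf8")) as { database_url?: string };
  if (config.database_url) env.DATABASE_URL = config.database_url;
  return env;
}

// Fixture: a temp GBRAIN_HOME with its own config, plus a "poisoned" env that
// mimics running inside an app repo whose .env set DATABASE_URL.
const gbrainHome = mkdtempSync(join(tmpdir(), "gbrain-home-"));
writeFileSync(
  join(gbrainHome, "config.json"),
  JSON.stringify({ database_url: "postgres://localhost/gbrain" }),
);
const poisoned: Env = { GBRAIN_HOME: gbrainHome, DATABASE_URL: "postgres://localhost/webapp" };

const seeded = seedFromGbrainHome(poisoned);
const optedOut = seedFromGbrainHome({ ...poisoned, GSTACK_RESPECT_ENV_DATABASE_URL: "1" });

console.log(seeded.DATABASE_URL); // postgres://localhost/gbrain
console.log(optedOut.DATABASE_URL); // postgres://localhost/webapp

// Clean up the temp GBRAIN_HOME.
rmSync(gbrainHome, { recursive: true, force: true });
```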
