Skip to content

v1.41.1.0 fix wave: 7 HIGH bugs from external audit + regression tests (PR #1169 follow-up)#1592

Open
garrytan wants to merge 13 commits into
mainfrom
fix/audit-critical-high-bugs
Open

v1.41.1.0 fix wave: 7 HIGH bugs from external audit + regression tests (PR #1169 follow-up)#1592
garrytan wants to merge 13 commits into
mainfrom
fix/audit-critical-high-bugs

Conversation

@garrytan
Copy link
Copy Markdown
Owner

@garrytan garrytan commented May 18, 2026

Summary

Ships the external audit wave originally proposed by @RagavRida in #1169, rebased on current main (v1.40.0.0 → v1.41.1.0), with regression tests added for every bug and one race in the contributor's cleanup path repaired before merge.

Credit: Audit + initial fixes by @RagavRida. We dropped one commit whose bug was already fixed independently since v1.6.4.0, added regression test coverage for every remaining bug, and fixed one race in the original cleanup path that the new tests caught.

Bugs landed (7)

File Severity Bug Status
scripts/build-app.sh (sed) HIGH Chromium rebrand interpolated $APP_NAME straight into sed; /, &, \ would break or inject Fixed + tested
scripts/build-app.sh (DMG_TMP) HIGH mktemp -d unchecked; failure → cp -a "$APP_DIR" "/" Fixed + tested
bin/gstack-telemetry-sync HIGH Predictable $$ fallback on mktemp failure (symlink/race) Fixed + tested
supabase/verify-rls.sh HIGH Same predictable-path fallback; RLS leak responses in the body Fixed + tested
browse/src/security-classifier.ts HIGH downloadFile leaked FD + half-written tmp on error paths Fixed + tested + cleanup race repaired
browse/src/meta-commands.ts HIGH parsePdfFromFile ran JSON.parse on user-supplied content with no guard Fixed + tested
bin/gstack-global-discover.ts HIGH 8KB read cap silently dropped Claude Code projects with long headers Fixed + tested

Dropped from the original PR: #1's CRITICAL disconnect ReferenceError on domains was independently fixed since v1.6.4.0. Commit dropped during rebase.

Test Coverage

51 new test assertions across 5 files, all gate-tier (deterministic, sub-second, no LLM, no network):

BUG → TEST FILE                                                       ASSERTIONS
[2,3] test/regression-pr1169-build-app-sed.test.ts                    7
[4,5] test/regression-pr1169-mktemp-fallbacks.test.ts                 7
[6]   browse/test/security-classifier-download-cleanup.test.ts        4 (multi-path)
[7]   browse/test/regression-pr1169-pdf-from-file-invalid-json.test.ts 8
[8]   test/global-discover.test.ts (new describe block)                6

Tests: 51 pass, 0 fail. Full `bun test` exits 0.

Bug #1 has no regression test — the static-grep approach would be tautological per codex review, and a behavioral CLI test is out of PR scope.

Pre-Landing Review

Eng review (CLEAR), codex consult (16 findings: 13 accepted into the test plan, 1 rejected with code citation — extractCwdFromJsonl and scanCodex are different functions, not the 128KB conflict codex claimed; 2 deferred as out-of-scope). Plan file: ~/.claude/plans/system-instruction-you-are-working-elegant-ocean.md.

Fix-the-fix

The contributor's downloadFile cleanup called writer.destroy() then fs.unlinkSync(tmp), but Node's createWriteStream lazy-opens the FD and flushes buffered writes during destroy. The naive sequence hit ENOENT on the unlink, then the writer's async destroy re-created the file on the next tick — leaving the half-written tmp behind exactly as the original fix tried to prevent. The new sequence awaits the writer's 'close' event before unlinking. Caught by browse/test/security-classifier-download-cleanup.test.ts.

Helper exports for tests

Three internal helpers were exported with // exported for tests intent so unit tests can import them: downloadFile (browse/src/security-classifier.ts:138), parsePdfFromFile (browse/src/meta-commands.ts:139), extractCwdFromJsonl (bin/gstack-global-discover.ts:276). Matches the existing normalizeRemoteUrl export pattern in the same global-discover file.

NOT in scope

  • Symlink/race-resistance tests for the mktemp paths. The fixes use atomic mktemp; a concurrent-attacker test would need separate setup.
  • Sed-escape for literal newlines in $APP_NAME. The runtime test covers &, /, \; newlines via env var are out of scope.
  • Re-running the original external audit. Track separately.

Test plan

  • bun test test/regression-pr1169-*.test.ts test/global-discover.test.ts browse/test/regression-pr1169-pdf-from-file-invalid-json.test.ts browse/test/security-classifier-download-cleanup.test.ts — 51 pass / 0 fail
  • Full bun test suite — exit 0
  • bun run test:evals gate tier on CI (no prompt-related files changed; expected untouched)

Credit

Original audit + fixes by @RagavRida in #1169. Re-opened from base repo so eval/E2E CI gets secrets (fork PRs don't inherit them); same commits + new tests + one cleanup race repair on top.

🤖 Generated with Claude Code

RagavRida and others added 12 commits May 17, 2026 19:52
build-app.sh injects \$APP_NAME directly into the replacement half of
sed's s/// when patching Chromium's localized InfoPlist.strings. If
\$APP_NAME ever carries '/', '&', or '\\' — the command either breaks
or starts interpreting input as sed syntax. The trailing '|| true'
would then silently hide the failure and ship a DMG that still says
'Google Chrome for Testing' in the menu bar.

Escape replacement metachars before substitution. No change for the
default name 'GStack Browser'.
The DMG creation step sets DMG_TMP from 'mktemp -d' with no error check.
If mktemp fails (tmpfs full, permissions, TMPDIR misconfigured), DMG_TMP
is empty and the very next line — 'cp -a "\$APP_DIR" "\$DMG_TMP/"' —
expands to 'cp -a "<app>" "/"', which copies the bundle into the root of
the filesystem.

Refuse to continue unless mktemp produced a real directory. Defensive
second check catches the (rare) case where mktemp succeeds but returns
something that isn't a directory we can cp into.
gstack-telemetry-sync tried 'mktemp /tmp/gstack-sync-XXXXXX' and on
failure fell back to '/tmp/gstack-sync-$$'. $$ is the PID — predictable
and reusable, so on shared hosts another user can pre-create or symlink
the path and either steal the response body or clobber an unrelated
file when curl writes through it.

Drop the fallback. If mktemp cannot produce a unique file we just skip
this sync cycle — the events stay on disk and the next run picks them
up. Also install an EXIT trap so the response file is cleaned up on
unexpected exit, not just on the happy path.
Same shape as gstack-telemetry-sync: on mktemp failure the script fell
back to '/tmp/verify-rls-$$-$TOTAL', which is fully predictable from the
PID and a per-check counter. On a shared box another user can pre-create
or symlink the path and either capture the HTTP response body (which may
leak what the RLS tests revealed) or corrupt an unrelated file that curl
writes through.

Make mktemp strict. On failure return from the check function; the caller
tallies a FAIL and the run moves on.
downloadFile() opens an fs.WriteStream to '<dest>.tmp.<pid>' and drives
it from a fetch body reader, but if reader.read() or writer.write()
throws mid-download the writer is never closed. That leaks an FD per
failed attempt and leaves the half-written tmp on disk. A later retry
can land in renameSync(tmp, dest) with a truncated TestSavantAI /
DeBERTa ONNX file — which then loads but produces garbage classifier
verdicts until the user manually nukes the models cache.

Wrap the download loop in try/catch. On failure, destroy() the writer
and unlink the tmp before rethrowing, so the next attempt starts from a
clean slate.
parsePdfFromFile() runs JSON.parse on user-supplied file contents with
no try/catch. A malformed payload surfaces as an uncaught SyntaxError
from the 'pdf' command handler and the user sees an opaque stack trace
instead of "this file isn't valid JSON". Worse, the same call path is
used by make-pdf when header/footer HTML would overflow Windows'
CreateProcess argv cap, so a corrupt payload file there can take down
the make-pdf run.

Wrap JSON.parse. Re-throw with a message that names the offending file
and echoes the parser's own explanation. Also reject top-level non-
objects (null, array, primitive) since the rest of the function treats
json as an object — catching that here produces a clear error instead
of a TypeError further down.
extractCwdFromJsonl() reads the first 8KB of each JSONL session file and
runs JSON.parse on every newline-split line. When a session record
happens to straddle the 8KB cap, the last line ends in a truncated JSON
fragment, JSON.parse throws, the catch block 'continue's silently, and
if that was the only line carrying 'cwd' the whole project gets dropped
from the discovery output without a warning.

Two independent hardening steps:
  1. Raise the read cap to 64KB. Session headers observed in Claude
     Code / Codex / Gemini transcripts fit comfortably; this just moves
     the cliff out of the normal range.
  2. Drop the final segment after splitting on '\\n'. If the read hit
     the cap mid-line, that segment is guaranteed incomplete; if the
     file ended inside the buffer, the split produces an empty final
     segment and dropping it is a no-op.

Together these make the parser robust regardless of how verbose the
leading records are.
These three internal helpers are now imported by regression tests
landing in the next commits (PR #1169 follow-up). Pattern matches the
existing normalizeRemoteUrl export in gstack-global-discover.ts which
test/global-discover.test.ts already imports side-effect-free.

No change to runtime behavior; gstack has no public package entrypoint
that would re-export these, so the in-repo surface is unchanged for
callers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…error

The earlier downloadFile() error-path cleanup hit a race: Node's
createWriteStream lazily opens the FD and flushes buffered writes during
destroy(), so a naive `fs.unlinkSync(tmp)` immediately after `writer.destroy()`
hits ENOENT (file not yet on disk), then the writer's destroy finishes on the
next tick and creates the file fresh — leaving the half-written tmp behind
exactly as the original fix tried to prevent.

The new sequence awaits the writer's 'close' event before unlinking, so the FD
is fully torn down and no subsequent flush can re-create the path.

Caught by browse/test/security-classifier-download-cleanup.test.ts in the
next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mFile guard

Covers PR #1169 bugs #6 and #7:

- security-classifier-download-cleanup.test.ts pins downloadFile error-path
  cleanup against three failure shapes: reader rejects mid-stream, non-2xx
  response, missing body. Asserts the dest file is not created and no
  <dest>.tmp.* siblings remain (glob-matched, not exact path — codex push:
  if the fix later switches to mkdtempSync, the assertion still holds).
  Includes a happy-path case so the cleanup isn't fighting a correct download.

- regression-pr1169-pdf-from-file-invalid-json.test.ts pins parsePdfFromFile
  to throw a helpful error for: invalid JSON, empty file, top-level array,
  top-level number, top-level string, top-level null, top-level boolean.
  Codex push: JSON.parse accepts primitives too, so Array.isArray + typeof
  guard must be tested separately from the JSON.parse try/catch.

Both files use mkdtempSync(process.cwd()/...) for fixture isolation since
SAFE_DIRECTORIES allows TEMP_DIR or cwd; cwd is universal across CI hosts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #1169 bug #8: the 8KB read cap landed mid-line on Claude Code session
headers, JSON.parse threw on the truncated tail, the catch silently
continued, and the project disappeared from /gstack discovery output.

Six new cases under describe("extractCwdFromJsonl 64KB cap"):

- happy path: small JSONL with obj.cwd returns it
- 12KB first line with obj.cwd: returns cwd (the bug case)
- 80KB single line overflowing 64KB: returns null without crashing
- complete line followed by partial second line: trailing-partial-drop
  must not poison the result; returns first line's cwd
- missing file: returns null (file read error swallowed)
- malformed first line + valid second line within cap: skips bad,
  returns second's cwd

Tests use the exported extractCwdFromJsonl (added in earlier export
commit) and live in a separate describe block from the existing
"4KB / 128KB buffer" tests, which exercise the unrelated scanCodex
meta.payload.cwd path at L338 — different function, different bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two new test files pinning the four shell-script invariants from the
external audit:

regression-pr1169-build-app-sed.test.ts — bugs #2 + #3
- Runtime isolation: extracts the sed-escape sequence from build-app.sh
  and runs it against hostile $APP_NAME values ("Foo/Bar&Baz", "Cool\App",
  "A/B\C&D"). Asserts the literal hostile name round-trips through a real
  `sed s///` invocation, locking the metachar safety end-to-end.
- Static check: the rebrand block must contain both the escape line AND
  the sed line referencing $APP_NAME_SED_ESCAPED; bare $APP_NAME
  interpolation directly into the s/// replacement is rejected.
- Static check: DMG_TMP=$(mktemp -d) is followed by an explicit `|| { ... exit }`
  failure handler AND a `[ -z "$DMG_TMP" ] || [ ! -d "$DMG_TMP" ]` validation
  AND the cp -a appears AFTER both guards.
- Runtime fake-bin: extracts the guard shape, runs with a fake mktemp that
  exits 1, asserts the script exits non-zero before any cp block can reach.

regression-pr1169-mktemp-fallbacks.test.ts — bugs #4 + #5
- Per codex pushback, the invariant is "no `mktemp ... || echo <path>`
  fallback shape" — not just "no $$ token." That's a stronger invariant
  that catches future swaps to $RANDOM or hardcoded paths.
- For each of bin/gstack-telemetry-sync and supabase/verify-rls.sh:
  - no echo-based fallback after mktemp
  - no $$ inside any /tmp path literal
  - mktemp failure path explicitly exits / returns non-zero
  - telemetry-sync also pins the `trap rm -f $RESP_FILE EXIT` cleanup
    so success paths don't leak the tmp on normal exit.

All seven new test files are gate-tier (deterministic, sub-second, no LLM,
no network). Runtime shell tests use fake-bin PATH stubs in temp dirs;
no $HOME mutation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 18, 2026

E2E Evals: ✅ PASS

6/6 tests passed | $.97 total cost | 12 parallel runners

Suite Result Status Cost
e2e-browse 2/2 $0.14
e2e-deploy 2/2 $0.3
e2e-qa-workflow 1/1 $0.51
llm-judge 1/1 $0.02

12x ubicloud-standard-8 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@garrytan garrytan changed the title fix wave: 7 HIGH bugs from external audit + regression tests (PR #1169 follow-up) v1.41.1.0 fix wave: 7 HIGH bugs from external audit + regression tests (PR #1169 follow-up) May 18, 2026
garrytan added a commit that referenced this pull request May 19, 2026
…ith #1574

The version-gate workflow flagged a collision: PR #1574
(garrytan/colombo-v3) already claims v1.41.0.0, and #1592
(fix/audit-critical-high-bugs) claims v1.41.1.0. Per CLAUDE.md's
workspace-aware ship rule, queue-advancing past a claimed version
within the same bump level is permitted — MINOR work landing on top
of a queued MINOR still reads as MINOR relative to main.

Util's suggested next slot is v1.42.0.0; taking it. CHANGELOG entry
header bumped + dated 2026-05-19; entry body unchanged (same wave
content, same credit list).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants