v1.41.1.0 fix wave: 7 HIGH bugs from external audit + regression tests (PR #1169 follow-up) by garrytan · Pull Request #1592 · garrytan/gstack

garrytan · 2026-05-18T23:27:30Z

Summary

Ships the external audit wave originally proposed by @RagavRida in #1169, rebased on current main (v1.40.0.0 → v1.41.1.0), with regression tests added for every bug and one race in the contributor's cleanup path repaired before merge.

Credit: Audit + initial fixes by @RagavRida. We dropped one commit whose bug was already fixed independently since v1.6.4.0, added regression test coverage for every remaining bug, and fixed one race in the original cleanup path that the new tests caught.

Bugs landed (7)

File	Severity	Bug	Status
`scripts/build-app.sh` (sed)	HIGH	Chromium rebrand interpolated `$APP_NAME` straight into sed; `/`, `&`, `\` would break or inject	Fixed + tested
`scripts/build-app.sh` (DMG_TMP)	HIGH	`mktemp -d` unchecked; failure → `cp -a "$APP_DIR" "/"`	Fixed + tested
`bin/gstack-telemetry-sync`	HIGH	Predictable `$$` fallback on mktemp failure (symlink/race)	Fixed + tested
`supabase/verify-rls.sh`	HIGH	Same predictable-path fallback; RLS leak responses in the body	Fixed + tested
`browse/src/security-classifier.ts`	HIGH	`downloadFile` leaked FD + half-written tmp on error paths	Fixed + tested + cleanup race repaired
`browse/src/meta-commands.ts`	HIGH	`parsePdfFromFile` ran `JSON.parse` on user-supplied content with no guard	Fixed + tested
`bin/gstack-global-discover.ts`	HIGH	8KB read cap silently dropped Claude Code projects with long headers	Fixed + tested

Dropped from the original PR: #1's CRITICAL disconnect ReferenceError on domains was independently fixed since v1.6.4.0. Commit dropped during rebase.

Test Coverage

51 new test assertions across 5 files, all gate-tier (deterministic, sub-second, no LLM, no network):

BUG → TEST FILE                                                       ASSERTIONS
[2,3] test/regression-pr1169-build-app-sed.test.ts                    7
[4,5] test/regression-pr1169-mktemp-fallbacks.test.ts                 7
[6]   browse/test/security-classifier-download-cleanup.test.ts        4 (multi-path)
[7]   browse/test/regression-pr1169-pdf-from-file-invalid-json.test.ts 8
[8]   test/global-discover.test.ts (new describe block)                6

Tests: 51 pass, 0 fail. Full `bun test` exits 0.

Bug #1 has no regression test — the static-grep approach would be tautological per codex review, and a behavioral CLI test is out of PR scope.

Pre-Landing Review

Eng review (CLEAR), codex consult (16 findings: 13 accepted into the test plan, 1 rejected with code citation — extractCwdFromJsonl and scanCodex are different functions, not the 128KB conflict codex claimed; 2 deferred as out-of-scope). Plan file: ~/.claude/plans/system-instruction-you-are-working-elegant-ocean.md.

Fix-the-fix

The contributor's downloadFile cleanup called writer.destroy() then fs.unlinkSync(tmp), but Node's createWriteStream lazy-opens the FD and flushes buffered writes during destroy. The naive sequence hit ENOENT on the unlink, then the writer's async destroy re-created the file on the next tick — leaving the half-written tmp behind exactly as the original fix tried to prevent. The new sequence awaits the writer's 'close' event before unlinking. Caught by browse/test/security-classifier-download-cleanup.test.ts.

Helper exports for tests

Three internal helpers were exported with // exported for tests intent so unit tests can import them: downloadFile (browse/src/security-classifier.ts:138), parsePdfFromFile (browse/src/meta-commands.ts:139), extractCwdFromJsonl (bin/gstack-global-discover.ts:276). Matches the existing normalizeRemoteUrl export pattern in the same global-discover file.

NOT in scope

Symlink/race-resistance tests for the mktemp paths. The fixes use atomic mktemp; a concurrent-attacker test would need separate setup.
Sed-escape for literal newlines in $APP_NAME. The runtime test covers &, /, \; newlines via env var are out of scope.
Re-running the original external audit. Track separately.

Test plan

bun test test/regression-pr1169-*.test.ts test/global-discover.test.ts browse/test/regression-pr1169-pdf-from-file-invalid-json.test.ts browse/test/security-classifier-download-cleanup.test.ts — 51 pass / 0 fail
Full bun test suite — exit 0
bun run test:evals gate tier on CI (no prompt-related files changed; expected untouched)

Credit

Original audit + fixes by @RagavRida in #1169. Re-opened from base repo so eval/E2E CI gets secrets (fork PRs don't inherit them); same commits + new tests + one cleanup race repair on top.

🤖 Generated with Claude Code

build-app.sh injects \$APP_NAME directly into the replacement half of sed's s/// when patching Chromium's localized InfoPlist.strings. If \$APP_NAME ever carries '/', '&', or '\\' — the command either breaks or starts interpreting input as sed syntax. The trailing '|| true' would then silently hide the failure and ship a DMG that still says 'Google Chrome for Testing' in the menu bar. Escape replacement metachars before substitution. No change for the default name 'GStack Browser'.

The DMG creation step sets DMG_TMP from 'mktemp -d' with no error check. If mktemp fails (tmpfs full, permissions, TMPDIR misconfigured), DMG_TMP is empty and the very next line — 'cp -a "\$APP_DIR" "\$DMG_TMP/"' — expands to 'cp -a "<app>" "/"', which copies the bundle into the root of the filesystem. Refuse to continue unless mktemp produced a real directory. Defensive second check catches the (rare) case where mktemp succeeds but returns something that isn't a directory we can cp into.

gstack-telemetry-sync tried 'mktemp /tmp/gstack-sync-XXXXXX' and on failure fell back to '/tmp/gstack-sync-$$'. $$ is the PID — predictable and reusable, so on shared hosts another user can pre-create or symlink the path and either steal the response body or clobber an unrelated file when curl writes through it. Drop the fallback. If mktemp cannot produce a unique file we just skip this sync cycle — the events stay on disk and the next run picks them up. Also install an EXIT trap so the response file is cleaned up on unexpected exit, not just on the happy path.

Same shape as gstack-telemetry-sync: on mktemp failure the script fell back to '/tmp/verify-rls-$$-$TOTAL', which is fully predictable from the PID and a per-check counter. On a shared box another user can pre-create or symlink the path and either capture the HTTP response body (which may leak what the RLS tests revealed) or corrupt an unrelated file that curl writes through. Make mktemp strict. On failure return from the check function; the caller tallies a FAIL and the run moves on.

downloadFile() opens an fs.WriteStream to '<dest>.tmp.<pid>' and drives it from a fetch body reader, but if reader.read() or writer.write() throws mid-download the writer is never closed. That leaks an FD per failed attempt and leaves the half-written tmp on disk. A later retry can land in renameSync(tmp, dest) with a truncated TestSavantAI / DeBERTa ONNX file — which then loads but produces garbage classifier verdicts until the user manually nukes the models cache. Wrap the download loop in try/catch. On failure, destroy() the writer and unlink the tmp before rethrowing, so the next attempt starts from a clean slate.

parsePdfFromFile() runs JSON.parse on user-supplied file contents with no try/catch. A malformed payload surfaces as an uncaught SyntaxError from the 'pdf' command handler and the user sees an opaque stack trace instead of "this file isn't valid JSON". Worse, the same call path is used by make-pdf when header/footer HTML would overflow Windows' CreateProcess argv cap, so a corrupt payload file there can take down the make-pdf run. Wrap JSON.parse. Re-throw with a message that names the offending file and echoes the parser's own explanation. Also reject top-level non- objects (null, array, primitive) since the rest of the function treats json as an object — catching that here produces a clear error instead of a TypeError further down.

extractCwdFromJsonl() reads the first 8KB of each JSONL session file and runs JSON.parse on every newline-split line. When a session record happens to straddle the 8KB cap, the last line ends in a truncated JSON fragment, JSON.parse throws, the catch block 'continue's silently, and if that was the only line carrying 'cwd' the whole project gets dropped from the discovery output without a warning. Two independent hardening steps: 1. Raise the read cap to 64KB. Session headers observed in Claude Code / Codex / Gemini transcripts fit comfortably; this just moves the cliff out of the normal range. 2. Drop the final segment after splitting on '\\n'. If the read hit the cap mid-line, that segment is guaranteed incomplete; if the file ended inside the buffer, the split produces an empty final segment and dropping it is a no-op. Together these make the parser robust regardless of how verbose the leading records are.

These three internal helpers are now imported by regression tests landing in the next commits (PR #1169 follow-up). Pattern matches the existing normalizeRemoteUrl export in gstack-global-discover.ts which test/global-discover.test.ts already imports side-effect-free. No change to runtime behavior; gstack has no public package entrypoint that would re-export these, so the in-repo surface is unchanged for callers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…error The earlier downloadFile() error-path cleanup hit a race: Node's createWriteStream lazily opens the FD and flushes buffered writes during destroy(), so a naive `fs.unlinkSync(tmp)` immediately after `writer.destroy()` hits ENOENT (file not yet on disk), then the writer's destroy finishes on the next tick and creates the file fresh — leaving the half-written tmp behind exactly as the original fix tried to prevent. The new sequence awaits the writer's 'close' event before unlinking, so the FD is fully torn down and no subsequent flush can re-create the path. Caught by browse/test/security-classifier-download-cleanup.test.ts in the next commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…mFile guard Covers PR #1169 bugs #6 and #7: - security-classifier-download-cleanup.test.ts pins downloadFile error-path cleanup against three failure shapes: reader rejects mid-stream, non-2xx response, missing body. Asserts the dest file is not created and no <dest>.tmp.* siblings remain (glob-matched, not exact path — codex push: if the fix later switches to mkdtempSync, the assertion still holds). Includes a happy-path case so the cleanup isn't fighting a correct download. - regression-pr1169-pdf-from-file-invalid-json.test.ts pins parsePdfFromFile to throw a helpful error for: invalid JSON, empty file, top-level array, top-level number, top-level string, top-level null, top-level boolean. Codex push: JSON.parse accepts primitives too, so Array.isArray + typeof guard must be tested separately from the JSON.parse try/catch. Both files use mkdtempSync(process.cwd()/...) for fixture isolation since SAFE_DIRECTORIES allows TEMP_DIR or cwd; cwd is universal across CI hosts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR #1169 bug #8: the 8KB read cap landed mid-line on Claude Code session headers, JSON.parse threw on the truncated tail, the catch silently continued, and the project disappeared from /gstack discovery output. Six new cases under describe("extractCwdFromJsonl 64KB cap"): - happy path: small JSONL with obj.cwd returns it - 12KB first line with obj.cwd: returns cwd (the bug case) - 80KB single line overflowing 64KB: returns null without crashing - complete line followed by partial second line: trailing-partial-drop must not poison the result; returns first line's cwd - missing file: returns null (file read error swallowed) - malformed first line + valid second line within cap: skips bad, returns second's cwd Tests use the exported extractCwdFromJsonl (added in earlier export commit) and live in a separate describe block from the existing "4KB / 128KB buffer" tests, which exercise the unrelated scanCodex meta.payload.cwd path at L338 — different function, different bug. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two new test files pinning the four shell-script invariants from the external audit: regression-pr1169-build-app-sed.test.ts — bugs #2 + #3 - Runtime isolation: extracts the sed-escape sequence from build-app.sh and runs it against hostile $APP_NAME values ("Foo/Bar&Baz", "Cool\App", "A/B\C&D"). Asserts the literal hostile name round-trips through a real `sed s///` invocation, locking the metachar safety end-to-end. - Static check: the rebrand block must contain both the escape line AND the sed line referencing $APP_NAME_SED_ESCAPED; bare $APP_NAME interpolation directly into the s/// replacement is rejected. - Static check: DMG_TMP=$(mktemp -d) is followed by an explicit `|| { ... exit }` failure handler AND a `[ -z "$DMG_TMP" ] || [ ! -d "$DMG_TMP" ]` validation AND the cp -a appears AFTER both guards. - Runtime fake-bin: extracts the guard shape, runs with a fake mktemp that exits 1, asserts the script exits non-zero before any cp block can reach. regression-pr1169-mktemp-fallbacks.test.ts — bugs #4 + #5 - Per codex pushback, the invariant is "no `mktemp ... || echo <path>` fallback shape" — not just "no $$ token." That's a stronger invariant that catches future swaps to $RANDOM or hardcoded paths. - For each of bin/gstack-telemetry-sync and supabase/verify-rls.sh: - no echo-based fallback after mktemp - no $$ inside any /tmp path literal - mktemp failure path explicitly exits / returns non-zero - telemetry-sync also pins the `trap rm -f $RESP_FILE EXIT` cleanup so success paths don't leak the tmp on normal exit. All seven new test files are gate-tier (deterministic, sub-second, no LLM, no network). Runtime shell tests use fake-bin PATH stubs in temp dirs; no $HOME mutation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-18T23:31:44Z

E2E Evals: ✅ PASS

6/6 tests passed | $.97 total cost | 12 parallel runners

Suite	Result	Status	Cost
e2e-browse	2/2	✅	$0.14
e2e-deploy	2/2	✅	$0.3
e2e-qa-workflow	1/1	✅	$0.51
llm-judge	1/1	✅	$0.02

12x ubicloud-standard-8 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ith #1574 The version-gate workflow flagged a collision: PR #1574 (garrytan/colombo-v3) already claims v1.41.0.0, and #1592 (fix/audit-critical-high-bugs) claims v1.41.1.0. Per CLAUDE.md's workspace-aware ship rule, queue-advancing past a claimed version within the same bump level is permitted — MINOR work landing on top of a queued MINOR still reads as MINOR relative to main. Util's suggested next slot is v1.42.0.0; taking it. CHANGELOG entry header bumped + dated 2026-05-19; entry body unchanged (same wave content, same credit list). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

RagavRida and others added 12 commits May 17, 2026 19:52

chore: bump version and changelog (v1.41.1.0)

4761d67

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

garrytan changed the title ~~fix wave: 7 HIGH bugs from external audit + regression tests (PR #1169 follow-up)~~ v1.41.1.0 fix wave: 7 HIGH bugs from external audit + regression tests (PR #1169 follow-up) May 18, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v1.41.1.0 fix wave: 7 HIGH bugs from external audit + regression tests (PR #1169 follow-up)#1592

v1.41.1.0 fix wave: 7 HIGH bugs from external audit + regression tests (PR #1169 follow-up)#1592
garrytan wants to merge 13 commits into
mainfrom
fix/audit-critical-high-bugs

garrytan commented May 18, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 18, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

garrytan commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Bugs landed (7)

Test Coverage

Pre-Landing Review

Fix-the-fix

Helper exports for tests

NOT in scope

Test plan

Credit

Uh oh!

github-actions Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Evals: ✅ PASS

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

garrytan commented May 18, 2026 •

edited

Loading

github-actions Bot commented May 18, 2026 •

edited

Loading