Enhance/package metadata hardening#21
Merged
Merged
Conversation
…data Four defense-in-depth boundaries around inputs derived from attacker-controlled package metadata. None is currently exploitable — they're forward-looking guards that make the trust boundaries explicit so future refactors can't regress silently. 1. npmLifecycleScript now single-quotes the cd target via a new shQuote() helper. The previous double-quoted form still expanded $(...), backticks, and \, so a package.json key like "foo$(id)" could break out of the cd argument and run on the sandbox shell. Single quotes are literal — only ' is special and is escaped via the standard '\'' idiom. 2. depfile.Parse rejects names outside the registry-allowed character sets at parse time (npm: lowercase + optional @scope/, PyPI: PEP 508 ASCII). Hard error so a malformed name surfaces clearly instead of producing a degenerate shell command or KOJUTO_SCAN_PKGS env var. PyPI's ';' is not exercised in tests — the parser already strips PEP 508 environment markers before validation runs. 3. sitecustomize.py's _w() strips \n/\r/\0 from every emission. The Go parser splits on \n, so an attacker-controlled co_filename or compile() filename arg containing \n could smuggle a fake KOJUTO: line onto the wire. _t() already escapes the snippet field; this covers the filename field (concatenated raw from _origin()) plus any future tag. 4. The _FRAMES=8 cap on _origin()'s stack walk is removed. A malicious package burying its own frame under >8 stdlib wrappers would have escaped the "+" user-origin marker, letting the parser drop the resulting dynamic_exec event as benign. The walk is now unbounded; the common-case fast path (early return on the first user frame) is unchanged, so practical cost is the same. Tests: - TestParse_RejectsInvalidNpmName, TestParse_AcceptsScopedNpmName - TestParse_RejectsInvalidPyPIName - TestShQuote, TestNpmLifecycleScript_QuotesShellMetachars - TestNpmLifecycleScript_ScopedPackage updated for single-quote form Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The GitHub Action wrapper was missing wiring for three CLI flags that already work on the binary itself: `--config`, `--quiet`, and `--no-color`. Workflows that wanted these had to drop down to a custom shell step. - `config` — passes through to `--config` so an alternative `kojuto.yml` path can be used (CI matrices with per-language configs, or a vendored config kept under a non-default path). - `quiet` — suppresses phase progress lines so CI logs only contain the final verdict block. Pairs well with PR-comment workflows that post the report as a comment. - `no-color` — disables ANSI escapes; mirrors the `NO_COLOR` environment variable spec, useful for log aggregators that don't strip escapes. `runtime` description is also tightened to drop the now-obsolete "empty string also resolves to auto" parenthetical (the auto path is the documented default since #14) and the YAML default is set to `auto` explicitly so the rendered docs match the CLI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…th count The two unreleased fixes that shipped in #18 (a5cab0e) were missing from CHANGELOG.md, and the README's credential-access table still quoted the older "~50 sensitive paths" figure. This commit catches the docs up to the merged code; no behavior change. - CHANGELOG: add Severity-tiered verdict and Caller-aware audit hook entries under [Unreleased] / Added. - README: bump the sensitive-path count from ~50 to ~60 to reflect the current entry list. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…l scans
The dockerExecRoot heredoc pattern used to write probe scripts and
honeypot credentials had a stray-terminator weakness:
sh -c "cat > <path> << 'KOJUTO_EOF'\n<body>KOJUTO_EOF"
If <body> contained a line consisting only of `KOJUTO_EOF`, the shell
terminated the heredoc early and parsed the rest of the body as
commands — running as root inside the sandbox via `docker exec
--user=root`. <body> includes the package name interpolated into a
JS/Python `'%s'` literal in WriteProbeScripts*; a name carrying
"\n" + "KOJUTO_EOF" + "\n" + "<cmd>" produced exactly the lines needed
to terminate and execute <cmd>.
Reachability:
- `kojuto scan <pkg>` gated by downloaderValidate (regex blocks \n)
- `kojuto scan -f <file>` gated by depfile validators (71d6d0a)
- `kojuto scan --local <path>` was NOT gated — pkg derived from
filepath.Base(localPath) carried any byte the filesystem allowed,
so a malware sample distributed with a crafted filename would have
achieved root in the sandbox during WriteProbeScripts. The most
damaging payload would `rm /usr/local/lib/python3.12/site-packages/
sitecustomize.py`, neutralising kojuto's own audit hook before the
import phase actually ran the malicious code — defeating detection
with no surface visibility.
Two-layer fix:
1. dockerWriteFile() helper replaces all 9 heredoc sites (5 honeypot
credentials + 4 probe scripts). Content is delivered via stdin to
`cat`, so the body never appears on the shell command line. Path
is single-quoted as defense-in-depth though all current callers
pass constants.
2. runLocalScan validates the derived pkg with downloaderValidate
before any sandbox interaction. Newlines and shell metacharacters
surface as a clear error rather than silently producing a
degenerate container command.
Tests:
- TestDockerWriteFile_StructureNoHeredoc pins that the new helper
emits `docker exec -i ... sh -c "cat > '<path>'"` with no `<<` or
`KOJUTO_EOF` token anywhere in the args.
- TestDockerWriteFile_QuotesPath confirms shQuote is applied to the
path slot for future callers.
- TestRunLocalScan_RejectsUnsafeFilename creates a `.tgz` whose
basename contains "\nKOJUTO_EOF\n" and asserts the scan errors
out before reaching Docker.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
getHostUsername read USER/USERNAME/LOGNAME and returned the raw value; getHostHostname routed os.Hostname through sanitizeDockerArg before returning. The two are used in symmetric slots — s.mountPoint = "/home/" + hostUser + "/projects" hostname = sanitizeDockerArg(hostHostname) — and mountPoint then flows into a `-v <packageDir>:<mountPoint>:ro` docker volume spec and a `pip install --no-index --no-deps --no-build-isolation <mountPoint>/*` shell glob. A USER value containing `:` would break the volume separator, and one containing shell metacharacters would derail the glob expansion. Env vars are normally trusted, so this is not a concrete vulnerability in any threat model kojuto currently supports — but the inconsistency between the two helpers was load-bearing only because nobody set USER to anything weird. Mirroring the hostname sanitizer removes the implicit trust dependency at zero cost. Side change: sanitizeDockerArg now takes an explicit fallback string instead of hardcoding "localhost". The hostname caller passes "localhost"; the new username caller passes "user". Removes the hostname-specific bias from a function that is otherwise a generic charset filter, and lets the test parameterise both fallback paths. Tests: TestGetHostUsername_Sanitizes pins the new sanitization with the same metacharacter cases the hostname test already covered, plus non-ASCII fallback. TestSanitizeDockerArg expanded to exercise parameterised fallback and the volume-spec / glob metacharacters. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`--strict` defaults to false on the CLI but true in the GitHub Action. Local interactive runs therefore silently honor `sensitive_paths.exclude` in the cwd's kojuto.yml. An untrusted repo (or extracted tarball) can plant a kojuto.yml that shrinks the detection surface without the user noticing — a "clean" verdict from such a run is not actually clean for the excluded categories. `--strict` already exists as the right answer for that threat, but the existing UX only surfaces exclusion activity when --strict is ON (printing the paths it's overriding). The lenient path was silent. Now preRunLoadConfig emits a stderr warning whenever exclusions are honored, listing the count and the actual paths and pointing at --strict. The honoring behavior is unchanged — this is purely a visibility fix so an attacker-planted exclude appears in CI logs and interactive output before the verdict is trusted. Test: TestPreRunLoadConfig_LenientWarnsOnExclude captures stderr across a temp config with two excluded paths and asserts the warning contains the count, both paths, and the --strict pointer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A malicious package could embed ANSI escape sequences (or any C0 control byte) into source compiled via `compile()` or `exec()`. The audit hook captured the snippet via repr() / str() and emitted it on the wire; the Go side put it in `code_snippet` and json.Marshal encoded it as `�...`. Structurally the JSON was fine, but a common downstream consumer pattern is `jq -r .events[].code_snippet`, which decodes the JSON string literally — handing a real ESC byte to the user's terminal. The attacker could then clear the screen and paint a fake "verdict: clean" line in green, hiding the real suspicious verdict. Same vector exists for `comm` (a malicious process can `prctl(PR_SET_ NAME, "evil\x1b[31m")` with control bytes) and any future attacker- reachable string field. Two-layer fix: 1. sitecustomize.py's _t() now escapes every C0 control byte and DEL (0x00-0x1f, 0x7f) to a printable `\xNN` form, with the conventional \n / \r / \t shorthand for the common cases. The wire-side _w() already strips line terminators for framing safety; this extends the same idea to the snippet field's content for raw-consumer safety. Closes the gap at the source. 2. report.WriteJSON now walks every event and runs each string field through sanitizeControl before encoding. Reflection picks up every string field of types.SyscallEvent so future fields are covered automatically. kojuto-generated strings (Syscall, Category, Phase, etc.) never contain control bytes, so the pass is a no-op for them. Tests: - TestSanitizeControl exercises the helper across plain ASCII, UTF-8 multibyte, individual control bytes, and a mixed-byte vector. - TestWriteJSON_StripsAnsiFromAttackerFields plants control bytes in CodeSnippet, Comm, Cmdline, and FilePath; round-trips through WriteJSON + json.Unmarshal; asserts no decoded field contains a raw byte below 0x20 or 0x7f, and that the on-the-wire JSON contains a printable backslash-escaped form for the ESC byte. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gosec G306 flagged three output writers using 0o644 (or os.Create's default 0o666 then umask-clipped). All three carry data that shouldn't leak to other accounts on a multi-user host: - openOutput / -o report.json: events embed attacker-supplied code snippets, dst addrs, file paths, and the full dependency tree of the scanned package — anything a malicious package touches at install/import time. Switched os.Create to os.OpenFile with an explicit 0o600 so the report is owner-only by default; users who want to share can chmod after the fact. - writePinnedPyPI / writePinnedNpm: --pin output captures the resolved manifest from the last clean scan. Internal/private registry deps and version metadata can be load-bearing for inferring the user's stack, so default 0o600 too. The other gosec G306 hits (sandbox/staging package.json files, seccomp.json, the in-flight tarball copy) all stay at world-readable intentionally — the docker daemon and the in-container `dev` user need to read them, and the tests would silently miss every install- phase event without the wider mode. Those sites are out of scope here. Tests: TestOpenOutput_File and a new TestOutputFiles_OwnerOnly syscall.Umask(0) the test process and stat the resulting files, asserting Mode().Perm() == 0o600. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Broad re-audit (post-PR-#18 sweep) surfaced three concrete bypasses where a malicious package could escape kojuto's monitoring without tripping any current detector. All three are now closed; each ships with a regression test pinning the security invariant. 1. strace stderr scanner used bufio.Scanner with the default 64 KiB line cap and never checked scanner.Err(). The docker-exec stderr is a multiplex of strace trace lines AND the tracee's own writes to fd 2, so a single `process.stderr.write("A".repeat(70000))` in a preinstall script tripped ErrTooLong, the for-loop exited, and every subsequent rename / openat / connect / execve recorded by strace was silently dropped — yielding a CLEAN verdict on a package that did anything observable after the long write. Both call sites (container_strace.parseStraceOutput and fallback.StartWithPID) now `Buffer()` to a 16 MiB cap and treat any post-loop scanner.Err() as `dropped++`. The dropped counter already converts the verdict to inconclusive in cmd/root, so the bypass is closed without changing the verdict logic. 2. straceRenameRe / straceRenameatRe used `"([^"]+)"` to capture filenames, which fails on strace's C-style backslash-escaped quotes. A package that renames `/tmp/x"y` (a legal Unix filename containing `"`) over `/usr/local/bin/python3` produced an strace line `rename("/tmp/x\"y", "/usr/local/bin/python3") = 0` that the regex did not match — no event reached the analyzer, the binary hijack category never fired, and a subsequent execve of the hijacked path was filtered as benign by `isBenignExec`. The capture is now `((?:\\.|[^"\\])+)` for both `rename` and `renameat[2]` paths. 3. The eBPF vfs_rename probe writes `d_name` (a qstr basename) into evt.path / evt.path2, so eBPF mode delivers a rename event with `DstPath = "python3"` and no parent directory. `isBenignRename` computed `destDir = path.Dir("python3") + "/" = "./"` which never matched the allowed dirs (e.g. `/usr/local/bin/`) — the function fell through to `return true` and treated every rename as benign. Result: in `--probe-method=ebpf`, NO rename was ever flagged, regardless of its target. `isBenignRename` now fail-safes when DstPath has no leading "/": a basename matching a tracked binary stays suspicious. Strace mode emits absolute paths and is unaffected. Tests: - TestParseStraceOutput_LongLineFlipsToInconclusive feeds a straceMaxLine+1024-byte single line via a chunkReader and asserts cs.dropped > 0 after parseStraceOutput returns. - TestParseStraceLine_RenameWithEscapedQuotes exercises rename and renameat lines whose src/dst contain `\"`, asserting the event is produced with the expected SrcPath/DstPath. - TestAnalyze_RenameTrustedBinary gains three basename-only sub-cases covering python3/sh as suspicious and an unknown name as clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The hardening commits (71d6d0a..64788b2) collected nine linter violations the local go test pass didn't surface. Local CI parity run with golangci-lint v2.1.6 — the version pinned in .github/ workflows/ci.yml — turned them up; this commit clears them all. - gocritic regexpSimplify in depfile.go: drop the unnecessary `\/` escape inside a non-character-class regex. - gofmt in strace_parse_extra_test.go and sandbox.go: trailing blank line / shQuote comment alignment. - govet shadow in cmd/root.go and cmd/root_test.go: rename the inner err binding so it doesn't shadow an outer one. - intrange in report.go (×2) and report_test.go (×1): switch bounded for-loops to `for i := range N` (Go 1.22+). - misspell in report_test.go: serialised → serialized. Re-running `golangci-lint run` reports `0 issues.`. Cross-compile across all 5 GOOS/GOARCH matrix entries, `go test -race ./...`, `go mod verify`, and `govulncheck ./...` all pass locally too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
CI on windows-latest failed to compile cmd/root_test.go because syscall.Umask is undefined on Windows. The umask call exists in TestOpenOutput_File and TestOutputFiles_OwnerOnly so the asserted mode is the requested 0o600 rather than the user's umask-clipped result — Windows has no umask and Mode().Perm() doesn't reflect Unix-style bits there, so the assertion is meaningless on Windows even if it built. Move the perm-check tests to a new cmd/root_unix_test.go behind `//go:build !windows`. TestOpenOutput_File stays in root_test.go in its original cross-platform shape (verifies the file is created and is not os.Stdout); the mode assertion now lives in the new TestOpenOutput_FileMode in the unix file. TestOutputFiles_OwnerOnly moves wholesale. Verified locally: - GOOS=windows GOARCH=amd64 go test -c ./cmd/ now compiles - GOOS=linux GOARCH=amd64 go test -c ./cmd/ still compiles - go test -race ./cmd/ green at 72.6% coverage - golangci-lint v2.1.6 reports 0 issues Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Security hardening sweep against attacker-controlled inputs that flow through the sandbox lifecycle. Two-phase audit: (1) defense-in-depth around package metadata identified by an initial review of the merged PR #18 surface, then (2) a broader re-audit across the full codebase that turned up three concrete detection bypasses missed by the narrower scope. All findings closed with regression tests pinning each invariant; ten commits, no breaking changes for legitimate use.
Changes
Injection / RCE paths closed
cat << 'KOJUTO_EOF' … KOJUTO_EOFcould be terminated early by a\nKOJUTO_EOF\n<cmd>payload smuggled through a malicious--localfilename, executing<cmd>as root inside the sandbox and letting the package neutralisesitecustomize.pybefore its own code ran. Replaced all 9 heredoc sites with a newdockerWriteFilehelper that delivers content viadocker exec -i … cat > <path>stdin. (131bb3f)--localpackage-name validation.runLocalScannow passes the derived name throughdownloaderValidate, blocking shell metacharacters and newlines at the boundary instead of relying on downstream sanitisation. (131bb3f)npmLifecycleScript. NewshQuotehelper wraps thecdtarget so apackage.jsonkey likefoo$(id)cannot break out. (71d6d0a)parseRequirementsTxtandparsePackageJSONreject names outside the npm/PyPI registry-allowed character sets (npmNameRe,pypiNameRe) at parse time. (71d6d0a)Detection-bypass paths closed
bufio.NewScanner(stderr)used the default 64 KiB cap and never checkedscanner.Err(). A package writing >64 KiB to the shared docker-exec stderr without a newline trippedErrTooLong, exited the parse loop, and silently dropped every subsequent strace event — yielding a CLEAN verdict. Now buffers at 16 MiB and converts anyscanner.Err()todropped++, flipping the verdict to inconclusive. BothparseStraceOutputandStraceFallback. (64788b2)\"escape."([^"]+)"failed on filenames containing literal"(escaped by strace as\"), suppressing the entire rename event. A package could rename/tmp/x"yover/usr/local/bin/python3invisibly. Capture is now((?:\\.|[^"\\])+). (64788b2)d_namegives basename only, so eBPF mode deliveredDstPath = "python3"andpath.Dir("python3")+"/" = "./"never matched any allowed dir — every rename in eBPF mode was treated as benign.isBenignRenamenow fail-safes whenDstPathlacks a leading/. (64788b2)sitecustomize.py's_w()strips\n/\r/\0so an attacker-controlledco_filenamecannot smuggle a fake follow-upKOJUTO:line onto the wire. (71d6d0a)_FRAMES = 8let a malicious package bury its own frame under stdlib wrappers and escape the+user-origin marker. The walk is now unbounded; the common-case fast path (early return on first user frame) is unchanged. (71d6d0a)Defense-in-depth / visibility
kojuto.ymlwarning.--strictdefaults to false on the CLI; an attacker-plantedkojuto.ymlin cwd could silently shrink the detection surface.preRunLoadConfignow emits a stderr warning whenever exclusions are honored, listing the count and paths and pointing at--strict. (9fc198f)code_snippet,comm, andcmdlinecould carry ANSI escape sequences from a malicious package;jq -r .events[].code_snippetwould feed real ESC bytes to a terminal. Now sanitised in two places:sitecustomize.py:_t()escapes0x00-0x1f/0x7fat source;report.WriteJSONsanitises every string field of every event via reflection at output. (f623391)getHostUsernamesanitisation. Mirrors the existinggetHostHostname → sanitizeDockerArgpattern.sanitizeDockerArgnow takes an explicit fallback string so each caller picks its own. (ba9449b)-o) and--pinmanifests now write at0o600instead of0o644/0o666. (09f223f)Action / docs / tooling
config,quiet,no-colorexposed to match CLI flags.runtimedescription tightened to drop the now-obsolete empty-string fallback note. (7269ecf)[Unreleased] / Addedgains the severity-tiered verdict and caller-aware audit hook entries; README sensitive-paths count corrected (50 → 60). (0264378)golangci-lint v2.1.6findings the hardening series introduced (gofmt, govet shadow, intrange, gocritic, misspell). (e71d423)Test Plan
make test)go test -race -coverprofile coverage.txt -covermode atomic ./...)golangci-lint v2.1.6reports0 issues.linux/amd64,linux/arm64,darwin/amd64,darwin/arm64,windows/amd64go mod verifyandgovulncheck ./...clean (0 vulnerabilities reachable from kojuto code)TestParse_RejectsInvalidNpmName,TestParse_RejectsInvalidPyPIName,TestParse_AcceptsScopedNpmNameTestShQuote,TestNpmLifecycleScript_QuotesShellMetachars,TestNpmLifecycleScript_ScopedPackageTestDockerWriteFile_StructureNoHeredoc,TestDockerWriteFile_QuotesPathTestRunLocalScan_RejectsUnsafeFilenameTestSanitizeDockerArg,TestGetHostUsername_SanitizesTestPreRunLoadConfig_LenientWarnsOnExcludeTestSanitizeControl,TestWriteJSON_StripsAnsiFromAttackerFieldsTestOpenOutput_File,TestOutputFiles_OwnerOnlyTestParseStraceOutput_LongLineFlipsToInconclusiveTestParseStraceLine_RenameWithEscapedQuotesTestAnalyze_RenameTrustedBinary(basename-only sub-cases)Related Issues