RalianENG · May 13, 2026 · May 12, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -8,6 +8,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 
 ### Added
+- **`CategoryUnknownBinary` (severity LOW)** — execve events that do not match a positively-defined attack signature are recorded under this new category for forensic chain visibility without flipping the verdict. Covers the residual default branch of `classifyExecve` AND `sh -c` invocations whose contents fail `isShellCmdBenign`. The rationale block above `classifyExecve` documents why a positive allowlist of legitimate-install binaries is intractable (build toolchains span every ecosystem) and why the harm-firing syscall layer (network/credential/persistence/RWX/audit hook) is the actual detection point. `TestAnalyze_ShellCMultiLayer` documents the mapping from each previously-caught `sh -c` attack pattern to its downstream harm rule
+- **V8 JIT mprotect/mmap filter (path-aware, clone-aware)** — analyzer pre-pass builds a streaming PID→comm map from execve and clone events, then skips simultaneous-RWX `mprotect`/`mmap` events whose PID resolves to a known JIT interpreter (`node`/`nodejs`/`deno`/`bun` plus the `npm`/`npx`/`yarn`/`pnpm` shebang wrappers that run as node under binfmt_script) launched from a trusted directory (`/usr/bin/`, `/usr/local/bin/`, `/bin/`). The filter requires (a) a prior execve for the PID, (b) the basename is in `jitInterpreters`, and (c) the launch path is in `jitInterpreterTrustedDirs` — an attacker who plants a binary named `node` under `/install/<pkg>/bin/` does NOT get the pass. Eliminates the per-package `memory_execution` false positive that every Node-driven scan previously produced. The long-standing TODO comment in `strace_parse.go` documenting this issue is now retired
+- **Clone-attributed thread comm propagation** — `EventClone` (a new event type) is parsed from `clone`/`clone3`/`vfork` strace lines and used to copy the parent's execve comm to the child PID when the child never executes its own execve (V8 worker threads, `posix_spawn` helpers, fork-without-exec). Without this pass, every V8 worker thread emitting mprotect RWX leaked past the JIT filter. The propagation runs as a streaming pre-pass, so a child that later does its own execve still gets correct attribution (clone propagation never overwrites an existing entry)
+- **Main-target PID aliasing** — strace prints the main traced process's syscalls without a `[pid X]` prefix (extracting as `PID=0`) until ambiguity forces a switch, after which the same process appears as `[pid X]` with its real kernel PID. The `collectPIDComm` streaming pass now propagates `m[0]` to any non-zero PID that emits a non-clone event without having appeared as a clone child — that PID is the disambiguated main target. Without this aliasing, import-phase node (which runs as the main strace target) had two PIDs in our event stream for the same process, and worker threads cloned from the disambiguated PID had no parent attribution
 - **Audit hooks for dynamic code execution detection** — Python PEP 578 hook (`sitecustomize.py`) intercepts `compile`/`exec`/`import`/`ctypes.dlopen`; Node.js `--require` hook (`kojuto-require.js`) intercepts `eval`/`Function`/`vm.runInNewContext`/`vm.runInThisContext`/`vm.Script`. New `dynamic_code_execution` category and `EventDynamicExec` event type
 - **Severity-tiered verdict** — `types.CategorySeverity` classifies each detection category as HIGH (one event raises the verdict to SUSPICIOUS), MEDIUM (two-or-more raise it), or LOW (never raises the verdict alone). `dynamic_code_execution` is LOW, `dns_tunneling` and `evasion` are MEDIUM, all other categories stay HIGH. Unmapped categories fail closed (treated as HIGH). LOW events still appear in `report.events` for forensics — verdict reflects severity, not raw event count. Stops legitimate Python compat libraries (`six`, `attrs`, `future`) from flipping to SUSPICIOUS on benign internal `compile`/`exec` calls
 - **Caller-aware audit hook** — `sitecustomize.py` now walks the Python call stack and reports the actual `.py` file invoking `compile`/`exec`, not the user-controllable `filename` argument (which `six` deliberately sets to `<string>`). When the deepest frame lives inside the scanned package's `site-packages` directory the hook prefixes the wire payload with `+` so the analyzer bypasses path-based benign filtering. Sandbox passes the scan list via `KOJUTO_SCAN_PKGS` so the hook knows which paths count as "user code"
@@ -17,6 +21,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **GitHub Action inputs** — `config`, `quiet`, and `no-color` exposed as Action inputs to match CLI flags (`--config`, `--quiet`, `--no-color`)
 
 ### Changed
+- **Probe install launch uses a staged script file, not `sh -c <inline>`** — `Sandbox.InstallCommand` / `InstallAllCommand` now write the install script to `/var/cache/kojuto/install.sh` (a dedicated tmpfs) via `dockerWriteFile` and return `["sh", "/var/cache/kojuto/install.sh"]`. The outer probe shell is filtered as benign by `isBenignExec` (sh from `/bin/` matches `benignPaths`) instead of tripping the `sh -c` branch of `classifyExecve`. Attackers cannot mimic this shape because npm/yarn/pnpm always spawn lifecycle hooks as `sh -c <package script>`; the file-path form is reserved for kojuto's own launch path. Signature change: both methods now take `context.Context` and return `(cmd, error)`
+- **Unrecognized execve AND `sh -c` content demoted to LOW severity** — both `classifyExecve`'s default branch and its `sh -c` branch now assign `CategoryUnknownBinary` instead of `CategoryCodeExecution`. The event still appears in the report for chain visibility, but the verdict is decided by the syscall-level rules that observe the binary's actual behavior. Surfaced by clean-corpus measurement: native-module packages (argon2, bcrypt, sharp, etc.) all fire `sh -c "cross-env FOO=bar node-gyp-build"` in their preinstall hook, and the negative-space first-token filter in `isShellCmdBenign` cannot keep up with the legitimate set of node-ecosystem build tools. Each attack pattern previously caught by cmdline content has a dedicated harm-firing rule downstream: curl/wget → `c2_communication` on connect; cp/mv to /usr/local/bin/* → `binary_hijacking` on openat (parser emits openat specifically for system-binary write targets); cat ~/.ssh/* → `credential_access` on openat; bind/listen → `backdoor`. A detailed design rationale lives above `classifyExecve` documenting the dynamic/static defense split and the mapping from each historical sh -c case to its downstream rule
+- **strace tracing extended with `clone`/`clone3` and `--quiet=attach`** — clone variants are now in the strace trace list so the analyzer's PID→comm propagation pass can attribute child syscalls correctly. `--quiet=attach` suppresses the `strace: Process N attached` informational line, which strace otherwise prints INLINE inside the originating clone() trace, splitting the trace across two output lines and breaking single-line regex parsers
 - Go version requirement lowered from 1.25.0 to 1.24.0 (stable release); `golang.org/x/sys` downgraded from v0.43.0 to v0.41.0
 - `--runtime` flag default changed from `""` to `"auto"`
 - `evasion-test` package updated: `b2_eval_exec` and `b3_function_constructor` promoted from `[BYPASS]` to `[DETECT]` (now `a10`/`a11`); new `b9_audit_hook_disable`, `b10_eval_via_import`, `c6_detect_audit_hook` evasion tests
@@ -26,6 +33,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Documentation aligned with implementation: README/SPECIFICATION/SECURITY now describe the actual `--network=none` sandbox (previously claimed an isolated bridge); Python audit hook list corrected (`compile`/`exec`/`import` — `eval` is a Node.js-only event); SPECIFICATION test-data section rewritten around `probe-alpha`/`probe-npm`/`evasion-test` (the obsolete `axios-demo` entry was stale)
 
 ### Fixed
+- **`extractPID` returns 0 for space-padded child PIDs** — strace left-pads small PIDs with spaces to right-align the column (`[pid    12]`, `[pid     1]`), and `strconv.ParseUint` rejects leading whitespace. Every container PID was parsed as 0, silently disabling all downstream PID-aware analysis (V8 JIT correlation, process-tree reconstruction). `strings.TrimSpace` before the parse fixes it
+- **Package-manager caches no longer trip the persistence backstop** — `NPM_CONFIG_CACHE=/var/cache/kojuto/npm` and `PIP_CACHE_DIR=/var/cache/kojuto/pip` pin both caches to a dedicated tmpfs (`--tmpfs=/var/cache/kojuto:nosuid,mode=1777,size=200m`) outside the sandbox's `HOME=/home/dev`. Without these, npm's `_logs/`/`_cacache/` and pip's wheel cache wrote under `/home/dev/.npm/` and `/home/dev/.cache/pip/`, both correctly flagged by the "/home is illegitimate" structural backstop. Redirecting at the sandbox layer is preferable to relaxing the analyzer rule — the detection guarantee stays strict, and legitimate cache I/O goes to a path the analyzer never inspects (avoids the "set up a benign-looking path under HOME and smuggle payload" bypass that a carve-out would have enabled)
+- **`/tmp/` added to `suspiciousExecDirs`** — execve from `/tmp/` is now positively classified as `code_execution` HIGH, matching the documented behavior in README. Previously this only worked by accident via the catch-all default branch of `classifyExecve`; with the demotion to `CategoryUnknownBinary` LOW, the basename-spoofing detection (e.g. `/tmp/python3`) needed an explicit positive rule
 - **`/.dockerenv` masking actually works now** — the post-start `rm -f /.dockerenv` had been silently failing on every scan since `--read-only` rootfs was introduced (the rootfs is, by design, not writable). `/.dockerenv` is now masked at container creation time by bind-mounting an empty regular file from the host over the path. Sandbox-aware payloads that read `/.dockerenv` see empty content; gVisor (`--runtime=runsc`) is still recommended to also defeat path-existence checks
 - **Sandbox preparation no longer fails silently** — `plantHoneypotFiles`, `restoreLocalBin`, `WriteProbeScripts`, and `WriteProbeScriptsMulti` now return errors instead of swallowing the result of every `docker exec`. A swallowed honeypot-write or probe-script-write failure used to leave the container partially prepared, and any sandbox-aware payload that detected the gap and stayed dormant would surface as `clean`. The errors propagate through `Start` / `StartPaused` and abort the scan
 - 21 linter errors: gofmt (15 files), importShadow (2), ifElseChain (1), godot (1), intrange (1), staticcheck De Morgan (1)
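
Note on the `extractPID` entry above: the failure mode is easy to reproduce in isolation. The sketch below is a hypothetical reconstruction, not the project's parser. The function body, its return type, and the sample strace lines are assumptions made for illustration. Only the `strings.TrimSpace` before `strconv.ParseUint` step comes from the changelog.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// extractPID is an illustrative stand-in for the real function.
// It returns the PID from a leading "[pid N]" prefix, or 0 when the
// line has no such prefix or the field does not parse.
func extractPID(line string) uint64 {
	const prefix = "[pid"
	if !strings.HasPrefix(line, prefix) {
		return 0
	}
	end := strings.IndexByte(line, ']')
	if end < 0 {
		return 0
	}
	// strace pads the number with spaces; ParseUint rejects that padding,
	// so the field is trimmed before parsing.
	field := strings.TrimSpace(line[len(prefix):end])
	pid, err := strconv.ParseUint(field, 10, 64)
	if err != nil {
		return 0
	}
	return pid
}

func main() {
	fmt.Println(extractPID(`[pid    12] execve("/usr/bin/node", ...) = 0`)) // 12
	fmt.Println(extractPID(`execve("/bin/sh", ...) = 0`))                  // 0
}
```

Without the `TrimSpace` call, the first line would also yield 0, which matches the silent failure the changelog describes.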

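Note on the V8 JIT filter and clone propagation entries: the three conditions (prior execve, basename in `jitInterpreters`, launch path in `jitInterpreterTrustedDirs`) can be sketched as below. This is a simplified illustration under stated assumptions, not the analyzer's code. The `event` struct, `collectPIDPath`, and `isTrustedJIT` are hypothetical names, the directory check is assumed to be an exact parent-directory match, and main-target PID aliasing is left out.

```go
package main

import (
	"fmt"
	"path"
)

// Names taken from the changelog; contents follow its description.
var jitInterpreters = map[string]bool{
	"node": true, "nodejs": true, "deno": true, "bun": true,
	"npm": true, "npx": true, "yarn": true, "pnpm": true,
}
var jitInterpreterTrustedDirs = []string{"/usr/bin/", "/usr/local/bin/", "/bin/"}

// event is a hypothetical, reduced view of a parsed strace line.
type event struct {
	kind  string // "execve" or "clone"
	pid   int
	path  string // executable path, for execve
	child int    // child PID, for clone
}

// collectPIDPath streams over events and records each PID's launch path.
// A clone copies the parent's entry to the child unless the child already
// has one, so propagation never overwrites an existing entry.
func collectPIDPath(events []event) map[int]string {
	m := make(map[int]string)
	for _, e := range events {
		switch e.kind {
		case "execve":
			m[e.pid] = e.path
		case "clone":
			if _, seen := m[e.child]; seen {
				continue
			}
			if p, ok := m[e.pid]; ok {
				m[e.child] = p
			}
		}
	}
	return m
}

// isTrustedJIT reports whether execPath names a known JIT interpreter
// located directly in a trusted directory (assumed exact match).
func isTrustedJIT(execPath string) bool {
	if !jitInterpreters[path.Base(execPath)] {
		return false
	}
	dir := path.Dir(execPath) + "/"
	for _, trusted := range jitInterpreterTrustedDirs {
		if dir == trusted {
			return true
		}
	}
	return false
}

func main() {
	m := collectPIDPath([]event{
		{kind: "execve", pid: 10, path: "/usr/bin/node"},
		{kind: "clone", pid: 10, child: 11},
		{kind: "execve", pid: 20, path: "/install/pkg/bin/node"},
	})
	for _, pid := range []int{10, 11, 20, 99} {
		fmt.Printf("pid=%d skip_rwx=%v\n", pid, isTrustedJIT(m[pid]))
	}
}
```

In this sketch PID 11 inherits the trusted path from its parent, while the planted `node` under `/install/` and the never-seen PID 99 both stay subject to the RWX rule.
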
diff --git a/README.md b/README.md
@@ -237,12 +237,13 @@ This approach detects environment-aware and delayed-execution supply chain attac
 
 ## Detection Benchmarks
 
-Validated against 300 randomly sampled malicious packages from [Datadog's malicious-software-packages-dataset](https://github.com/DataDog/malicious-software-packages-dataset) (seed=42, reproducible) and 70 known-clean packages.
+True-positive rate validated against 300 randomly sampled malicious packages from [Datadog's malicious-software-packages-dataset](https://github.com/DataDog/malicious-software-packages-dataset) (seed=42, reproducible). False-positive rate re-measured (2026-05) against 100 popular non-corporate PyPI packages + 100 popular non-corporate npm packages.
 
 | Metric | Result |
 |--------|--------|
 | True Positive Rate | **100%** (61/61 installable malicious packages detected) |
-| False Positive Rate | **0%** (0/70 clean packages flagged) |
+| False Positive Rate — PyPI | **0%** (0/84 clean PyPI packages flagged; 16 install-resolution failures excluded as a scan-infrastructure issue) |
+| False Positive Rate — npm | **0%** within in-scope detection categories (0/89 across `code_execution`/`memory_execution`/`persistence`/`binary_hijacking`/`credential_access`/`backdoor`/`anti_forensics`); 2/91 across the entire taxonomy from a documented `c2_communication` issue (install-time DNS lookups by `bull`/`bullmq` register as outbound intent — see Known Limitations) |
 | Batch screening speed | **50 PyPI packages in 98s** (single sandbox) |
 
 Of the 300 malicious samples, 238 failed to install (dependencies already removed from PyPI) and 1 timed out — expected for archived malware. All 61 that installed successfully were detected.
@@ -254,8 +255,8 @@ Of the 300 malicious samples, 238 failed to install (dependencies already remove
 | C2 communication (`c2_communication`) | `aiogram-types-v3` → `147.45.124.42:80` | `connect`/`sendto` to non-loopback IPs |
 | Data exfiltration (`data_exfiltration`) | DNS resolution of Discord/Telegram/Pastebin | `sendto` port 53 resolving known exfil services |
 | Credential access (`credential_access`) | `axios-attack-demo` → `.ssh/id_rsa`, `.aws/credentials`, `.solana/id.json` | `openat` on ~60 sensitive paths (SSH, cloud, crypto wallets, browser data) |
-| Code execution (`code_execution`) | `advpruebitaa` → `type nul > prueba11.txt`, `/tmp/ld.py` | `execve` with inline `-c`/`-e` flags or from `/tmp`, `/dev/shm` |
-| Memory execution (`memory_execution`) | `ctypes.mmap(RWX)` shellcode injection | `mmap`/`mprotect` with simultaneous PROT_WRITE+PROT_EXEC |
+| Code execution (`code_execution`) | `advpruebitaa` → `/tmp/ld.py` | `execve` with inline `-c`/`-e` flags, or from `/tmp`/`/dev/shm`/`/proc/self/fd`. `sh -c` content and unrecognized binaries are recorded as `unknown_binary` (LOW) — their harm fires via the dedicated rules below |
+| Memory execution (`memory_execution`) | `ctypes.mmap(RWX)` shellcode injection | `mmap`/`mprotect` with simultaneous PROT_WRITE+PROT_EXEC, attributed by PID to filter V8/JIT noise from `node`/`npm`/`npx`/`yarn`/`pnpm`/`deno`/`bun` running from trusted system directories |
 | Binary hijacking (`binary_hijacking`) | `rename /tmp/evil /usr/local/bin/python3` | `rename` targeting trusted system binaries |
 | Backdoor (`backdoor`) | `bind` + `listen` + `accept` on attacker-controlled port | Server socket operations during install |
 | Persistence (`persistence`) | Write to `.bashrc`, `.config/systemd/user/`, any `/home/` path | `openat` with write flags to shell startup files or user home directory |
@@ -278,7 +279,7 @@ sensitive_paths:
 
 ### False positive verification
 
-50 popular PyPI packages (flask, django, requests, cryptography, pydantic, etc.) and 20 npm packages (lodash, express, axios, etc.) scanned with zero false positives.
+100 popular non-corporate PyPI packages and 100 popular non-corporate npm packages scanned. PyPI: 84 returned a verdict, all clean (0 in-scope FP); 16 hit a pre-existing pip dep-resolution failure unrelated to detection logic. npm: 91 returned a verdict, 89 clean within in-scope categories and 2 flagged by the documented `bull`/`bullmq` install-time DNS lookup issue (Known Limitations); 9 errored on native build failures unrelated to detection logic.
 
 ## Known Limitations
 

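Note on `unknown_binary` (LOW) in the README table: the severity tiers described in the changelog can be sketched as follows. This is a minimal illustration, not `types.CategorySeverity` itself. The type and function names, the lowercase verdict strings, and the choice to count MEDIUM events across categories are assumptions. The tier assignments and the fail-closed default come from the changelog.

```go
package main

import "fmt"

type severity int

const (
	low severity = iota
	medium
	high
)

// Tier assignments as listed in the changelog. Any category missing
// from this map fails closed and is treated as HIGH.
var categorySeverity = map[string]severity{
	"unknown_binary":         low,
	"dynamic_code_execution": low,
	"dns_tunneling":          medium,
	"evasion":                medium,
}

func severityOf(category string) severity {
	if s, ok := categorySeverity[category]; ok {
		return s
	}
	return high
}

// verdict takes one category string per recorded event.
// Assumption: MEDIUM events are counted together, not per category.
func verdict(categories []string) string {
	mediumCount := 0
	for _, c := range categories {
		switch severityOf(c) {
		case high:
			return "suspicious" // a single HIGH event decides
		case medium:
			mediumCount++
		}
		// LOW events stay in the report but never move the verdict here.
	}
	if mediumCount >= 2 {
		return "suspicious"
	}
	return "clean"
}

func main() {
	fmt.Println(verdict([]string{"unknown_binary", "unknown_binary"})) // clean
	fmt.Println(verdict([]string{"unknown_binary", "c2_communication"})) // suspicious
}
```

Under this scheme an `sh -c` build hook contributes only a LOW event, and the verdict changes only if a harm rule such as `c2_communication` also fires.
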
diff --git a/cmd/root.go b/cmd/root.go
@@ -316,6 +316,19 @@ func scanSinglePackage(pkg, version, ecosystem string) (*pinnedDep, error) {
 	return &pinnedDep{Name: pkg, Version: resolvedVersion}, nil
 }
 
+// benchLog emits a single stderr line with event count and heap stats when
+// KOJUTO_BENCH=1. Used by bench/ harness to chart analyzer load and memory
+// ceiling across install/import/analyze phases. No-op outside bench mode.
+func benchLog(phase string, eventCount int) {
+	if os.Getenv("KOJUTO_BENCH") != "1" {
+		return
+	}
+	var ms runtime.MemStats
+	runtime.ReadMemStats(&ms)
+	fmt.Fprintf(os.Stderr, "BENCH phase=%s events=%d heap_mb=%d sys_mb=%d\n",
+		phase, eventCount, ms.HeapAlloc/(1024*1024), ms.Sys/(1024*1024))
+}
+
 func runBatchScan(_ []string) error {
 	deps, ecosystem, err := depfileParse(flagFile)
 	if err != nil {
@@ -421,7 +434,11 @@ func runBatchScreening(deps []depfile.Dep, ecosystem string) (string, error) {
 	// Install all packages at once with strace.
 	installPhase := startPhase("install", fmt.Sprintf("%d packages", len(pkgNames)))
 	cp := probe.NewContainerStrace()
-	installOut, installErr := cp.StartAndInstall(ctx, sb.ContainerID(), sb.InstallAllCommand(pkgNames))
+	installCmd, installCmdErr := sb.InstallAllCommand(ctx, pkgNames)
+	if installCmdErr != nil {
+		return "", fmt.Errorf("staging install command: %w", installCmdErr)
+	}
+	installOut, installErr := cp.StartAndInstall(ctx, sb.ContainerID(), installCmd)
 	if installErr != nil {
 		fmt.Fprintf(os.Stderr, "[!] Install output:\n%s\n", string(installOut))
 		return "", fmt.Errorf("batch install failed: %w", installErr)
@@ -432,6 +449,7 @@ func runBatchScreening(deps []depfile.Dep, ecosystem string) (string, error) {
 	for evt := range cp.Events() {
 		events = append(events, evt)
 	}
+	benchLog("install_drain", len(events))
 
 	// Import all packages under simulated OS identities (3 scripts total).
 	if err := sb.WriteProbeScriptsMulti(ctx, pkgNames); err != nil {
@@ -453,9 +471,11 @@ func runBatchScreening(deps []depfile.Dep, ecosystem string) (string, error) {
 			events = append(events, evt)
 		}
 		importPhase.end()
+		benchLog("import_drain_"+osNames[i], len(events))
 	}
 
 	verdict, filtered := analyzer.Analyze(events)
+	benchLog("analyze_done", len(filtered))
 	phaseInfo("screening", fmt.Sprintf("verdict=%s (%d events)", verdict, len(filtered)))
 
 	return verdict, nil
@@ -1034,7 +1054,11 @@ func runContainerStraceProbe(ctx context.Context, sb *sandbox.Sandbox, _ string)
 	cp := probe.NewContainerStrace()
 	installPhase := startPhase("install", "")
 
-	installOut, err := cp.StartAndInstall(ctx, sb.ContainerID(), sb.InstallCommand())
+	installCmd, err := sb.InstallCommand(ctx)
+	if err != nil {
+		return nil, fmt.Errorf("staging install command: %w", err)
+	}
+	installOut, err := cp.StartAndInstall(ctx, sb.ContainerID(), installCmd)
 	if err != nil {
 		fmt.Fprintf(os.Stderr, "[!] Install output:\n%s\n", string(installOut))