perf(sandbox): parallelize npm lifecycle hooks (5x faster batch installs) #25
Merged
Sequential `;`-joined subshells were the dominant wall-time cost
for batch npm scans: ~1 second per `npm run --silent --if-present`
no-op invocation due to npm CLI startup overhead alone, multiplied
by 3 hooks (preinstall/install/postinstall) per package. For a
100-package corpus that adds up to ~5 minutes spent waiting on
no-op skips before any real work happens.
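The ~5 minute figure follows directly from the numbers above. A back-of-envelope check, using only the approximate values stated in this message:

```shell
# Sequential no-op overhead, from the figures quoted above (all approximate).
packages=100
hooks_per_package=3      # preinstall, install, postinstall
seconds_per_noop=1       # rough npm CLI startup cost per no-op hook
total_seconds=$((packages * hooks_per_package * seconds_per_noop))
echo "sequential no-op overhead: ${total_seconds}s (~$((total_seconds / 60)) min)"
```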
Replace the sequential form with bounded-parallel xargs dispatch:
Discovery path (pkgs=nil):
find ... -print0 | xargs -0 -P 4 -I{} sh -c 'cd ... && hooks'
Named-pkgs path (pkgs=["lodash","express",...]):
printf '%s\n' '/install/.../lodash' '/install/.../express' \
| xargs -P 4 -I{} sh -c 'cd "{}" && hooks'
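A runnable sketch of the named-pkgs form, for illustration only. It assumes a throwaway directory tree created with mktemp and uses an `echo` as a stand-in for the real hook chain; the directory names are hypothetical and the elided `/install/...` paths are not reproduced here.

```shell
# Throwaway tree standing in for installed packages (hypothetical names).
root=$(mktemp -d)
mkdir -p "$root/lodash" "$root/express" "$root/@scope/pkg"

# One path per line; xargs runs at most 4 per-package shells at a time.
# The echo is a placeholder for the real lifecycle hooks.
out=$(printf '%s\n' "$root/lodash" "$root/express" "$root/@scope/pkg" \
  | xargs -P 4 -I{} sh -c 'cd "{}" && echo "hooks ran in $(basename "$PWD")"' \
  | sort)   # completion order is not deterministic under -P, so sort for display

echo "$out"
rm -rf "$root"
```

Because the workers finish in arbitrary order, anything that consumes their output should not rely on ordering.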
Parallelism bound (npmLifecycleParallelism = 4):
Each hook chain spawns ~3-5 processes (sh + npm + node + helpers
+ occasional native-build subprocesses). At N=4 the peak concurrent
process count stays well within the container's --pids-limit=256
and matches audit.py's worker count for consistency. CPU/memory
ceilings (--cpus, --memory) absorb the parallel load.
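A rough process-budget check for the bound above. It takes the upper end of the ~3-5 estimate and ignores processes outside the hook chains (xargs itself, the parent shell, anything else in the container), so treat it as an approximation rather than a measured peak:

```shell
parallelism=4          # npmLifecycleParallelism
procs_per_chain=5      # upper end of the ~3-5 estimate above
pids_limit=256         # container --pids-limit
peak=$((parallelism * procs_per_chain))
headroom=$((pids_limit - peak))
echo "estimated peak: ${peak} processes, headroom: ${headroom}"
```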
Portability:
- `find -print0 | xargs -0` does not depend on the shell: find and
xargs are external utilities, so the pipeline works under dash,
and both GNU findutils and busybox support it. The previous
comment that motivated avoiding `read -d ""` does not apply
(that was about the `read` builtin specifically, not pipe-based
separation).
- The named-pkgs path uses `\n` separation instead of `\0` because
`\0` handling in dash's printf builtin is not portable; npm
package names cannot contain newlines per the npm registry
name spec, so `\n` is safe.
- `xargs -I{}` substitutes one input line as one argument,
preserving spaces; quoting is preserved by the single-quoted
command body.
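A small sketch of the NUL-separated discovery form, showing that a directory name containing a space reaches the per-package shell intact. The `-name package.json` expression and the directory layout are assumptions made for this example; they are not the elided `find ...` expression used by the PR.

```shell
# Hypothetical layout: two package directories, one with a space in its name.
root=$(mktemp -d)
mkdir -p "$root/plain" "$root/with space"
touch "$root/plain/package.json" "$root/with space/package.json"

# NUL-separated paths in, one substitution per path, at most 4 in parallel.
# basename "$PWD" is a stand-in for the real hook chain.
found=$(find "$root" -name package.json -print0 \
  | xargs -0 -P 4 -I{} sh -c 'cd "$(dirname "{}")" && basename "$PWD"' \
  | sort)

echo "$found"
rm -rf "$root"
```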
Measured (10-pkg light corpus, post-FP-fix kojuto binary):
install phase: 43s -> 19s (-56%)
import phase × 3: ~same (per-OS-identity, single command each)
total real time: 2:05 -> 1:56
events captured: 19 -> 111 (parallel fork/exec produces more
concurrent PIDs; verdict CLEAN maintained, with clone tracking,
the V8 JIT filter, and the library_hijack rule all correctly
attributing parallel events).
Tests:
- TestNpmLifecycleScript_ParallelDispatch pins the xargs -P N
invocation structure for both paths and guards against
regression to the sequential form.
- Existing TestNpmLifecycleScript_ScopedPackage and
TestNpmLifecycleScript_QuotesShellMetachars still pass — single-
quoting of cd targets is preserved as defense-in-depth against
attacker-controllable package names slipping past depfile
validation.
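For readers unfamiliar with the single-quoting defense mentioned above, here is a minimal shell sketch of the idea. `shquote` is a hypothetical helper written for this example; it is not the project's implementation, which this message does not show.

```shell
# shquote: wrap a string in single quotes, rewriting each embedded ' as '\''
# so the result is always parsed by sh as exactly one literal word.
shquote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# A hostile-looking name stays inert once quoted.
printf 'cd %s\n' "$(shquote "it's; echo pwned")"
```

The quoted form can be placed on a generated command line without the embedded `;` or `'` being interpreted by the shell.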
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Summary
Sequential `;`-joined subshells were the dominant wall-time cost for batch npm scans: ~1 second per `npm run --silent --if-present` no-op invocation due to npm CLI startup overhead alone, multiplied by 3 hooks (preinstall/install/postinstall) per package. For a 100-package corpus that adds up to ~5 minutes of pure overhead before any real work happens.

Replace with bounded-parallel xargs dispatch (`-P 4`). Peak concurrent process count stays within `--pids-limit=256`; CPU/memory caps absorb the parallel load.

Measured (10-pkg light corpus on post-FP-fix kojuto binary): install phase 43s → 19s (-56%). The win compounds for larger corpora.

Changes
- `npmLifecycleScript` rewritten to pipe package paths into `xargs -P 4 -I{}`.
  - Discovery path (`pkgs=nil`): `find -print0 | xargs -0 -P 4 ...`
  - Named-pkgs path: `printf '%s\n' <quoted paths> | xargs -P 4 ...`
- `npmLifecycleParallelism = 4` constant; matches audit.py worker count.

Test Plan
- `TestNpmLifecycleScript_ParallelDispatch` pins the xargs -P invocation structure (both paths) and regression-guards against the sequential form.
- `TestNpmLifecycleScript_ScopedPackage`, `TestNpmLifecycleScript_QuotesShellMetachars`: single-quote preservation unchanged.
- `go vet`, `golangci-lint run`: 0 new issues.

Related Issues
50 PyPI packages in 98s) is unchanged. This PR targets npm; PyPI batch already uses a single `pip install` invocation that parallelizes internally.