Merged
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -9,9 +9,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added
- **Audit hooks for dynamic code execution detection** — Python PEP 578 hook (`sitecustomize.py`) intercepts `compile`/`exec`/`import`/`ctypes.dlopen`; Node.js `--require` hook (`kojuto-require.js`) intercepts `eval`/`Function`/`vm.runInNewContext`/`vm.runInThisContext`/`vm.Script`. New `dynamic_code_execution` category and `EventDynamicExec` event type
- **Severity-tiered verdict** — `types.CategorySeverity` classifies each detection category as HIGH (one event raises the verdict to SUSPICIOUS), MEDIUM (two-or-more raise it), or LOW (never raises the verdict alone). `dynamic_code_execution` is LOW, `dns_tunneling` and `evasion` are MEDIUM, all other categories stay HIGH. Unmapped categories fail closed (treated as HIGH). LOW events still appear in `report.events` for forensics — verdict reflects severity, not raw event count. Stops legitimate Python compat libraries (`six`, `attrs`, `future`) from flipping to SUSPICIOUS on benign internal `compile`/`exec` calls
- **Caller-aware audit hook** — `sitecustomize.py` now walks the Python call stack and reports the actual `.py` file invoking `compile`/`exec`, not the user-controllable `filename` argument (which `six` deliberately sets to `<string>`). When the deepest frame lives inside the scanned package's `site-packages` directory the hook prefixes the wire payload with `+` so the analyzer bypasses path-based benign filtering. Sandbox passes the scan list via `KOJUTO_SCAN_PKGS` so the hook knows which paths count as "user code"
- **System binary write detection** — `openat` with write flags to trusted system binaries (`python3`, `node`, `pip`, `sh`, etc.) in `/usr/local/bin/` or `/usr/bin/` classified as `binary_hijacking`, preventing benignPaths bypass via tmpfs overwrite
- **gVisor auto-detection** — `--runtime` default changed from empty (runc) to `auto`: probes `docker info` for runsc availability and uses gVisor if registered, falls back to runc otherwise
- **npm test package** — `testdata/probe-npm-0.1.0/` with lifecycle hook payloads (preinstall/postinstall) and require-time import phase covering 31 TTPs across 9 attack categories including audit hook validation
- **GitHub Action inputs** — `config`, `quiet`, and `no-color` exposed as Action inputs to match CLI flags (`--config`, `--quiet`, `--no-color`)

### Changed
- Go version requirement lowered from 1.25.0 to 1.24.0 (stable release); `golang.org/x/sys` downgraded from v0.43.0 to v0.41.0
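
The "Severity-tiered verdict" entry above can be read as a small decision rule. The following is a minimal sketch of that rule, not the project's implementation: the `Severity` type, the `categorySeverity` map, and the `isSuspicious` helper are hypothetical names, and the sketch assumes MEDIUM events are counted across categories (the changelog does not say whether the two-or-more threshold is per category or global).

```go
package main

import "fmt"

// Severity is a hypothetical stand-in for the tiers the changelog describes.
type Severity int

const (
	Low Severity = iota
	Medium
	High
)

// categorySeverity lists only categories the changelog names explicitly.
// The real table behind types.CategorySeverity may have a different shape.
var categorySeverity = map[string]Severity{
	"dynamic_code_execution": Low,
	"dns_tunneling":          Medium,
	"evasion":                Medium,
	"c2_communication":       High,
}

// severityOf fails closed: a category missing from the map is treated as High.
func severityOf(category string) Severity {
	if s, ok := categorySeverity[category]; ok {
		return s
	}
	return High
}

// isSuspicious takes one category name per observed event. One High event
// raises the verdict, two or more Medium events raise it (assumed to be
// counted across categories), and Low events never raise it on their own.
func isSuspicious(eventCategories []string) bool {
	medium := 0
	for _, c := range eventCategories {
		switch severityOf(c) {
		case High:
			return true
		case Medium:
			medium++
		}
	}
	return medium >= 2
}

func main() {
	fmt.Println(isSuspicious([]string{"dynamic_code_execution", "dynamic_code_execution"}))
	fmt.Println(isSuspicious([]string{"evasion", "dns_tunneling"}))
	fmt.Println(isSuspicious([]string{"brand_new_category"}))
}
```

Keeping the unmapped case in `severityOf` rather than in the caller means a newly added category cannot silently default to LOW.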
2 changes: 1 addition & 1 deletion README.md
@@ -253,7 +253,7 @@ Of the 300 malicious samples, 238 failed to install (dependencies already remove
|----------|----------|-----------------|
| C2 communication (`c2_communication`) | `aiogram-types-v3` → `147.45.124.42:80` | `connect`/`sendto` to non-loopback IPs |
| Data exfiltration (`data_exfiltration`) | DNS resolution of Discord/Telegram/Pastebin | `sendto` port 53 resolving known exfil services |
-| Credential access (`credential_access`) | `axios-attack-demo` → `.ssh/id_rsa`, `.aws/credentials`, `.solana/id.json` | `openat` on ~50 sensitive paths (SSH, cloud, crypto wallets, browser data) |
| Credential access (`credential_access`) | `axios-attack-demo` → `.ssh/id_rsa`, `.aws/credentials`, `.solana/id.json` | `openat` on ~60 sensitive paths (SSH, cloud, crypto wallets, browser data) |
| Code execution (`code_execution`) | `advpruebitaa` → `type nul > prueba11.txt`, `/tmp/ld.py` | `execve` with inline `-c`/`-e` flags or from `/tmp`, `/dev/shm` |
| Memory execution (`memory_execution`) | `ctypes.mmap(RWX)` shellcode injection | `mmap`/`mprotect` with simultaneous PROT_WRITE+PROT_EXEC |
| Binary hijacking (`binary_hijacking`) | `rename /tmp/evil /usr/local/bin/python3` | `rename` targeting trusted system binaries |
28 changes: 26 additions & 2 deletions action.yml
@@ -29,10 +29,14 @@ inputs:
description: 'Scan a local package file (.whl, .tgz) or directory'
required: false
default: ''
-runtime:
-description: 'Container runtime: auto (default, use gVisor if available), runsc (force gVisor), runc (force Docker default). Empty string also resolves to auto on the CLI.'
config:
description: 'Path to kojuto.yml config file (default: kojuto.yml in current directory)'
required: false
default: ''
runtime:
description: 'Container runtime: auto (use gVisor if available), runsc (force gVisor), runc (force Docker default)'
required: false
default: 'auto'
probe-method:
description: 'Probe method: auto, ebpf, strace, strace-container'
required: false
@@ -45,6 +49,14 @@ inputs:
description: 'Ignore sensitive_paths.exclude from config (recommended for CI)'
required: false
default: 'true'
quiet:
description: 'Suppress phase progress output; emit only the final verdict block'
required: false
default: 'false'
no-color:
description: 'Disable colored output (NO_COLOR env is also respected)'
required: false
default: 'false'

outputs:
verdict:
@@ -77,10 +89,13 @@ runs:
KOJUTO_FILE: ${{ inputs.file }}
KOJUTO_PIN: ${{ inputs.pin }}
KOJUTO_LOCAL: ${{ inputs.local }}
KOJUTO_CONFIG: ${{ inputs.config }}
KOJUTO_RUNTIME: ${{ inputs.runtime }}
KOJUTO_PROBE_METHOD: ${{ inputs.probe-method }}
KOJUTO_TIMEOUT: ${{ inputs.timeout }}
KOJUTO_STRICT: ${{ inputs.strict }}
KOJUTO_QUIET: ${{ inputs.quiet }}
KOJUTO_NO_COLOR: ${{ inputs['no-color'] }}
run: |
ARGS=()

@@ -111,9 +126,18 @@ runs:
if [ -n "$KOJUTO_RUNTIME" ]; then
ARGS+=("--runtime" "$KOJUTO_RUNTIME")
fi
if [ -n "$KOJUTO_CONFIG" ]; then
ARGS+=("--config" "$KOJUTO_CONFIG")
fi
if [ "$KOJUTO_STRICT" = "true" ]; then
ARGS+=("--strict")
fi
if [ "$KOJUTO_QUIET" = "true" ]; then
ARGS+=("--quiet")
fi
if [ "$KOJUTO_NO_COLOR" = "true" ]; then
ARGS+=("--no-color")
fi

sudo "${{ github.action_path }}/kojuto" scan "${ARGS[@]}" \
-o /tmp/kojuto-report.json
49 changes: 42 additions & 7 deletions cmd/root.go
@@ -165,10 +165,21 @@ func preRunLoadConfig(_ *cobra.Command, _ []string) error {
return fmt.Errorf("loading config %s: %w", cfgPath, err)
}

-if flagStrict && len(cfg.SensitivePaths.Exclude) > 0 {
-fmt.Fprintf(os.Stderr, "warning: --strict ignoring %d excluded path(s) from config: %v\n",
-len(cfg.SensitivePaths.Exclude), cfg.SensitivePaths.Exclude)
-cfg.SensitivePaths.Exclude = nil
if len(cfg.SensitivePaths.Exclude) > 0 {
if flagStrict {
fmt.Fprintf(os.Stderr, "warning: --strict ignoring %d excluded path(s) from config: %v\n",
len(cfg.SensitivePaths.Exclude), cfg.SensitivePaths.Exclude)
cfg.SensitivePaths.Exclude = nil
} else {
// Without --strict, an attacker-planted or otherwise
// untrusted kojuto.yml in the cwd silently shrinks the
// detection surface. Surface the exclusions on stderr so
// the user can see them in CI logs and interactive output
// before trusting a "clean" verdict.
fmt.Fprintf(os.Stderr, "warning: %s excludes %d sensitive path(s) from monitoring: %v\n",
cfgPath, len(cfg.SensitivePaths.Exclude), cfg.SensitivePaths.Exclude)
fmt.Fprintln(os.Stderr, " pass --strict to ignore these exclusions")
}
}

paths := config.MergeSensitivePaths(cfg)
@@ -570,7 +581,12 @@ func writePinnedPyPI(path string, deps []pinnedDep) error {
fmt.Fprintf(&b, "%s\n", dep.Name)
}
}
-return os.WriteFile(path, []byte(b.String()), 0o644)
// 0o600 — pinned files can carry private package names (internal
// registry deps) and embedded version metadata; default to user-
// only readable so a multi-user host doesn't leak the manifest to
// other accounts. Caller can chmod after the fact if they want to
// share.
return os.WriteFile(path, []byte(b.String()), 0o600)
}

func writePinnedNpm(path string, deps []pinnedDep) error {
@@ -595,7 +611,9 @@ func writePinnedNpm(path string, deps []pinnedDep) error {
}
jsonBytes = append(jsonBytes, '\n')

-return os.WriteFile(path, jsonBytes, 0o644)
// 0o600 for the same reason as writePinnedPyPI — keep the pinned
// manifest from leaking to other users on a shared host.
return os.WriteFile(path, jsonBytes, 0o600)
}

// runLocalScan scans a local package file (.whl, .tgz) or directory.
@@ -646,6 +664,17 @@ func runLocalScan(_ []string) error {
pkg = detectPackageName(filepath.Base(localPath))
}

// Validate the derived package name with the same regex as registry
// scans. Local mode is the only path where pkg is built from a user-
// supplied filesystem path, so the name can carry any byte the OS
// allows — including newlines and shell metacharacters that downstream
// sandbox helpers used to interpolate into shell heredocs and Python/
// JS string literals. The dockerWriteFile refactor closed the heredoc
// path, but defending at the boundary keeps every future use site safe.
if validateErr := downloaderValidate(pkg, ""); validateErr != nil {
return fmt.Errorf("local package name derived from %q is unsafe: %w", localPath, validateErr)
}

// Auto-detect ecosystem from file extension, but only if -e was not explicitly set.
// .tgz is always npm; .tar.gz is ambiguous (could be PyPI sdist), so only
// override to npm for .tgz, not .tar.gz.
@@ -1115,7 +1144,13 @@ func openOutput() (*os.File, error) {
return os.Stdout, nil
}

-f, err := os.Create(flagOutput)
// 0o600 — the report can carry attacker-supplied code snippets,
// inferred file paths, and any internal package names from the
// scanned dependency tree. os.Create defaults to 0o666 (then
// umask-clipped, typically 0o644), which would leak the report
// to other users on a multi-user host. Owner-only by default;
// users who explicitly want to share can chmod after the fact.
f, err := os.OpenFile(flagOutput, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600)
if err != nil {
return nil, fmt.Errorf("creating output file: %w", err)
}
34 changes: 34 additions & 0 deletions cmd/root_mock_test.go
@@ -443,6 +443,40 @@ func TestRunLocalScan_NpmAutoDetect(t *testing.T) {
}
}

// TestRunLocalScan_RejectsUnsafeFilename pins the security invariant that
// a filename whose derived package name contains shell metacharacters or
// newlines is rejected at the boundary, before any sandbox interaction.
// Without this gate, a malicious .tgz/.whl filename containing a newline
// followed by "KOJUTO_EOF" used to terminate dockerWriteFile's predecessor
// heredoc and execute arbitrary commands as root inside the container.
// The dockerWriteFile refactor closed the heredoc; this test guards the
// belt-and-braces input check.
func TestRunLocalScan_RejectsUnsafeFilename(t *testing.T) {
saveAndRestoreFlags(t)

dir := t.TempDir()
// "\n" + "KOJUTO_EOF" payload baked into a .tgz filename. macOS APFS
// and Linux ext4/xfs all permit '\n' in filenames; if the underlying
// FS rejects it, skip rather than fail.
badName := "evil\nKOJUTO_EOF\nrm-1.0.0.tgz"
badFile := filepath.Join(dir, badName)
if err := os.WriteFile(badFile, []byte("x"), 0o644); err != nil {
t.Skipf("filesystem refuses newlines in filenames: %v", err)
}

flagLocal = badFile
flagEcosystem = types.EcosystemNpm
flagTimeout = 5 * time.Second

err := runLocalScan(nil)
if err == nil {
t.Fatal("expected validation error for unsafe local filename")
}
if !strings.Contains(err.Error(), "unsafe") {
t.Errorf("expected 'unsafe' in error, got: %v", err)
}
}
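
The boundary check this test guards can be sketched as an allowlist match on the derived name. The regex below is an assumption for illustration only: the diff does not show what `downloaderValidate` actually accepts, and `validateLocalName` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// safeName is an assumed allowlist: letters, digits, and a handful of
// punctuation characters common in PyPI and npm names. Newlines, spaces,
// and shell metacharacters fall outside the class and are rejected.
var safeName = regexp.MustCompile(`^[A-Za-z0-9@][A-Za-z0-9@._/-]*$`)

// validateLocalName is a hypothetical stand-in for the real validator.
func validateLocalName(name string) error {
	if !safeName.MatchString(name) {
		return fmt.Errorf("package name %q is unsafe", name)
	}
	return nil
}

func main() {
	names := []string{"lodash", "@scope/pkg", "evil\nKOJUTO_EOF\nrm", "pkg; rm -rf /"}
	for _, n := range names {
		fmt.Printf("%q -> %v\n", n, validateLocalName(n))
	}
}
```

An allowlist is used rather than a denylist of known-bad characters so that any byte not explicitly permitted is rejected by default.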

// --- prepareLocalNpm tests ---

func TestPrepareLocalNpm_NoTgz(t *testing.T) {
54 changes: 53 additions & 1 deletion cmd/root_test.go
@@ -349,7 +349,10 @@ func TestOpenOutput_File(t *testing.T) {
t.Error("expected a file, not os.Stdout")
}

-// Verify the file was actually created.
// Verify the file was actually created. The owner-only (0o600)
// permission contract is exercised by TestOpenOutput_FileMode and
// TestOutputFiles_OwnerOnly in root_unix_test.go — those tests
// rely on syscall.Umask, which is not defined on Windows.
if _, err := os.Stat(flagOutput); err != nil {
t.Errorf("output file was not created: %v", err)
}
@@ -634,6 +637,55 @@ func TestPreRunLoadConfig_StrictIgnoresExclude(t *testing.T) {
}
}

// TestPreRunLoadConfig_LenientWarnsOnExclude pins the visibility
// guarantee added for surface C: when --strict is OFF and the loaded
// kojuto.yml carries `exclude` entries, a stderr warning surfaces the
// reduction in coverage so a CI log or interactive run cannot silently
// trust a "clean" verdict produced under a tampered config. Without
// this, an attacker who plants kojuto.yml in the cwd of a target repo
// can shrink the detection surface invisibly.
func TestPreRunLoadConfig_LenientWarnsOnExclude(t *testing.T) {
origConfig := flagConfig
origStrict := flagStrict
defer func() { flagConfig = origConfig; flagStrict = origStrict }()

dir := t.TempDir()
cfgPath := filepath.Join(dir, "kojuto.yml")
if err := os.WriteFile(cfgPath, []byte("sensitive_paths:\n exclude:\n - \"/.ssh/\"\n - \"/.aws/\"\n"), 0o644); err != nil {
t.Fatal(err)
}

flagConfig = cfgPath
flagStrict = false

oldStderr := os.Stderr
r, w, err := os.Pipe()
if err != nil {
t.Fatal(err)
}
os.Stderr = w
t.Cleanup(func() { os.Stderr = oldStderr })

runErr := preRunLoadConfig(nil, nil)
w.Close()

data, _ := io.ReadAll(r)
got := string(data)

if runErr != nil {
t.Fatalf("preRunLoadConfig returned error: %v", runErr)
}
if !strings.Contains(got, "excludes 2 sensitive path(s)") {
t.Errorf("stderr missing exclusion-count warning, got:\n%s", got)
}
if !strings.Contains(got, "/.ssh/") || !strings.Contains(got, "/.aws/") {
t.Errorf("stderr missing the actual excluded paths, got:\n%s", got)
}
if !strings.Contains(got, "--strict") {
t.Errorf("stderr missing pointer to --strict flag, got:\n%s", got)
}
}

func TestPreRunLoadConfig_InvalidConfig(t *testing.T) {
origConfig := flagConfig
origStrict := flagStrict
80 changes: 80 additions & 0 deletions cmd/root_unix_test.go
@@ -0,0 +1,80 @@
//go:build !windows

package cmd

import (
"os"
"path/filepath"
"syscall"
"testing"
)

// TestOpenOutput_FileMode pins the 0o600 contract for the main scan
// report file. Reports embed attacker-supplied code snippets, file
// paths, and the dependency tree of the scanned package; on a multi-
// user host that data must not be readable by other accounts. The
// test runs under syscall.Umask(0) so the assertion measures the
// requested mode rather than the user's umask-clipped result.
//
// Windows has no umask and no Unix-style permission bits, and Go's
// os.FileInfo.Mode().Perm() reports a synthetic 0o666/0o444 there, so
// this test lives in a !windows build-tagged file.
func TestOpenOutput_FileMode(t *testing.T) {
oldUmask := syscall.Umask(0)
defer syscall.Umask(oldUmask)

original := flagOutput
defer func() { flagOutput = original }()

dir := t.TempDir()
flagOutput = filepath.Join(dir, "test-output.json")

f, err := openOutput()
if err != nil {
t.Fatalf("openOutput() error: %v", err)
}
defer f.Close()

info, err := os.Stat(flagOutput)
if err != nil {
t.Fatalf("output file was not created: %v", err)
}
if mode := info.Mode().Perm(); mode != 0o600 {
t.Errorf("output file mode = %o, want 0o600", mode)
}
}

// TestOutputFiles_OwnerOnly pins the 0o600 contract for both pinned-
// manifest writers — they share the same threat model as the main
// report (multi-user host, package metadata leakage).
func TestOutputFiles_OwnerOnly(t *testing.T) {
oldUmask := syscall.Umask(0)
defer syscall.Umask(oldUmask)

dir := t.TempDir()
deps := []pinnedDep{{Name: "lodash", Version: "4.17.21"}}

pyPath := filepath.Join(dir, "requirements.txt")
if err := writePinnedPyPI(pyPath, deps); err != nil {
t.Fatalf("writePinnedPyPI: %v", err)
}
pyInfo, err := os.Stat(pyPath)
if err != nil {
t.Fatal(err)
}
if mode := pyInfo.Mode().Perm(); mode != 0o600 {
t.Errorf("pinned requirements.txt mode = %o, want 0o600", mode)
}

npmPath := filepath.Join(dir, "package.json")
if writeErr := writePinnedNpm(npmPath, deps); writeErr != nil {
t.Fatalf("writePinnedNpm: %v", writeErr)
}
npmInfo, err := os.Stat(npmPath)
if err != nil {
t.Fatal(err)
}
if mode := npmInfo.Mode().Perm(); mode != 0o600 {
t.Errorf("pinned package.json mode = %o, want 0o600", mode)
}
}
26 changes: 20 additions & 6 deletions internal/analyzer/analyzer.go
@@ -942,13 +942,27 @@ var fileOpCommands = map[string]bool{
// Renames to other destinations (e.g. pip installing a new CLI script) are benign.
func isBenignRename(evt *types.SyscallEvent) bool {
destBase := path.Base(evt.DstPath)
-destDir := path.Dir(evt.DstPath) + "/"
allowedDirs, ok := benignPaths[destBase]
if !ok {
return true
}

-if allowedDirs, ok := benignPaths[destBase]; ok {
-for _, d := range allowedDirs {
-if destDir == d {
-return false
-}
// Basename-only DstPath (no leading "/") means the probe captured
// only the dentry name and not the parent directory — the eBPF
// probe takes this shape because vfs_rename's renamedata gives us
// d_name (a qstr basename) rather than a full path. Without the
// parent we cannot prove the rename targets a trusted directory,
// so fail-safe: a rename whose basename matches a tracked binary
// (python3, node, sh, ...) stays suspicious. Strace mode emits
// absolute paths and falls through to the directory check below.
if !strings.HasPrefix(evt.DstPath, "/") {
return false
}

destDir := path.Dir(evt.DstPath) + "/"
for _, d := range allowedDirs {
if destDir == d {
return false
}
}
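
The control flow of the revised check can be sketched in isolation. This is a hedged reconstruction, not the project's code: `trustedBinaryDirs` and its entries are hypothetical stand-ins for `benignPaths`, the function takes a plain string instead of a `*types.SyscallEvent`, and the final `return true` is assumed because the hunk ends before the function does.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// trustedBinaryDirs is a hypothetical stand-in for benignPaths: it maps a
// tracked binary name to the directories where that binary is trusted.
var trustedBinaryDirs = map[string][]string{
	"python3": {"/usr/local/bin/", "/usr/bin/"},
	"node":    {"/usr/local/bin/", "/usr/bin/"},
}

// isBenignRenameDst reports whether a rename destination looks harmless.
func isBenignRenameDst(dst string) bool {
	dirs, ok := trustedBinaryDirs[path.Base(dst)]
	if !ok {
		return true // not a tracked binary name
	}
	// Basename-only destination: the parent directory is unknown, so the
	// rename cannot be cleared and stays suspicious.
	if !strings.HasPrefix(dst, "/") {
		return false
	}
	dir := path.Dir(dst) + "/"
	for _, d := range dirs {
		if dir == d {
			return false // overwrites a trusted system binary
		}
	}
	return true // assumed: tracked name, but outside the trusted directories
}

func main() {
	for _, dst := range []string{
		"/usr/local/bin/python3",
		"python3",
		"/home/user/venv/bin/python3",
		"/usr/local/bin/mytool",
	} {
		fmt.Printf("%-30s benign=%v\n", dst, isBenignRenameDst(dst))
	}
}
```

The early return for untracked names keeps the common case cheap, and the basename-only branch trades a few possible false positives in eBPF mode for not missing a hijack when the parent directory is unavailable.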
