codemap

Static codebase + binary analyzer. One binary, ~500 actions, 18 source languages, sub-second cold-cache scans on 3K-file repos. No network, no servers, no databases, no API keys.

This README is your system prompt. Designed for AI agents: drop the entire file into your context (or fetch https://raw.githubusercontent.com/charleschenai/codemap/main/README.md) and you have everything you need — what codemap is, when to use it, how to install it, how to call every category of action, output schemas, exit codes, MCP setup. No further docs required for 95% of usage. Humans: see docs/HUMAN.md. Everyone else, keep reading.

When to reach for codemap

| Problem | Codemap action | Why codemap (vs alternatives) |
| --- | --- | --- |
| "What does this codebase do?" | `summary --dir <path>` | Cross-file structural overview in one call. Beats reading files. |
| "Find unused functions / dead code" | `dead-functions --dir <path>` | Call-graph reachability across modules. grep can't do this. |
| "Who calls function X?" | `callers --dir <path> X` | True call graph (AST-aware), not a string match. |
| "What does function X depend on (transitively)?" | `trace --dir <path> X` | Walks the dep graph. grep would only find direct refs. |
| "What changed between two commits?" | `diff --dir <path> <ref1> <ref2>` | Semantic diff, not line diff. |
| "Find security issues" | `audit --dir <path>` | Composite of taint + secret-scan + dep-tree + dead-deps. |
| "Where would a tainted input flow?" | `taint --dir <path> --source <fn> --sink <fn>` | Path-sensitive, sanitizer-aware, alias-aware, cross-procedural. |
| "Reverse-engineer a binary" | `bin-info <path/to/binary>` | PE/ELF/Mach-O parser. capa + YARA + signsrch + PEiD rules built in. |
| "Find cross-language coupling" | `cross-lang --dir <path>` | Imports/calls that cross language boundaries. |
| "Natural-language: I don't know which action" | `natural-query "<question>" --dir <path>` | Routes to the right action by Levenshtein + LLM (when --llm). |

When NOT to reach for codemap

Editing files: codemap is read-only. Use Edit/Write directly.
Running code: codemap doesn't compile or exec. Use bash.
Live process state: codemap is static. Use ps, lsof, ss.
Single-file grep: if you know the file, grep is faster.
String search across few files: if N<5 files, just grep.

Install (one command)

curl -fsSL https://github.com/charleschenai/codemap/releases/latest/download/install.sh | sh

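Detects arch (x86_64-linux, aarch64-linux, x86_64-macos, aarch64-macos), downloads the matching tarball, and installs to ~/.local/bin/codemap. No sudo. To install to /usr/local/bin, pipe into sh -s -- --system instead of sh; the -s tells sh to read the script from stdin so the arguments after -- reach install.sh.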

From source: git clone https://github.com/charleschenai/codemap && cd codemap && ./install.sh.

Verify

codemap --version-detail

Prints:

codemap 8.0.1
git: 5c091ec
built: 2026-05-19T12:40:00Z
host: mater (aarch64-unknown-linux-gnu)

If the binary is older than expected, re-run install with --update.

How to call any action

Universal shape:

codemap <ACTION> [TARGET...] --dir <PATH> [--json] [--quiet] [other-flags]

| Flag | Purpose |
| --- | --- |
| `--dir <PATH>` | Repo/dir to scan. Required for actions that scan a source tree; file-based actions such as `bin-info` take the file path as a target instead. Repeatable for multi-repo. |
| `--json` | Output JSON (parseable). Default is text (human-readable). |
| `--quiet` | Suppress scan/cache status messages on stderr. |
| `--no-cache` | Force re-scan, ignore `.codemap/cache.bincode`. |
| `--include-path <PATH>` | C/C++ include search path. |
| `--watch [SECS]` | Re-run every N seconds. |

For agents: always use --json and --quiet unless you specifically want text output.
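A minimal Python sketch of a wrapper that follows the universal shape above and always adds --json and --quiet. The helper names build_argv and run_action are hypothetical, and placing targets before --dir simply mirrors the shape line; this is one way an agent might call the CLI, not a shipped client.

```python
import json
import subprocess
from typing import Any, Optional, Sequence


def build_argv(action: str, targets: Sequence[str] = (),
               directory: Optional[str] = None,
               extra: Sequence[str] = ()) -> list:
    """Assemble argv as: codemap <ACTION> [TARGET...] --dir <PATH> --json --quiet [extra]."""
    argv = ["codemap", action, *targets]
    if directory is not None:
        argv += ["--dir", directory]
    argv += ["--json", "--quiet", *extra]
    return argv


def run_action(action: str, targets: Sequence[str] = (),
               directory: Optional[str] = None,
               extra: Sequence[str] = ()) -> Any:
    """Run codemap (must be on PATH) and return the parsed JSON from stdout."""
    proc = subprocess.run(build_argv(action, targets, directory, extra),
                          capture_output=True, text=True, check=False)
    if proc.returncode != 0:
        raise RuntimeError(f"codemap exited with {proc.returncode}: {proc.stderr.strip()}")
    return json.loads(proc.stdout)
```

Keeping argv construction separate from execution makes the command easy to log or unit-test without the binary installed.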

Discover actions

codemap --help                                       # full action list
codemap <action> --help                              # action-specific flags
codemap natural-query "find dead code" --dir <path>  # NL routing

natural-query accepts plain English and returns the top routed action(s). For agents that aren't sure which action to call, this is the primary entry point.

Action categories

~500 actions grouped by purpose. Full catalog at docs/ACTION_CATALOG.md. High-level groups:

| Category | Action count | Examples |
| --- | --- | --- |
| Analysis | ~20 | `summary`, `stats`, `trace`, `callers`, `hotspots`, `layers`, `health`, `decorators` |
| Code intelligence | ~30 | `complexity`, `import-cost`, `churn`, `api-diff`, `clones`, `entry-points`, `dead-functions` |
| Dataflow / security | ~15 | `data-flow`, `taint`, `slice`, `trace-value`, `sinks`, `secret-scan`, `audit`, `dep-tree` |
| Graph theory | ~40 | `pagerank`, `hubs`, `bridges`, `centrality` (17 measures), `community` (Leiden), `bellman-ford` |
| Binary / RE | ~150 | `elf-info`, `pe-imports`, `macho-info`, `bin-search`, `bin-disasm`, `bin-strings`, `bin-relocs` |
| Schemas | ~10 | `proto-schema`, `openapi-schema`, `graphql-schema`, `sql-extract`, `dbf-schema` |
| Supply chain | ~10 | `osv-scan`, `sbom-diff`, `license-check`, `cve-scan` |
| Config-as-code | ~10 | `k8s-scan`, `iac-scan`, `dockerfile-scan`, `ci-scan`, `oci-scan` |
| ML / AI | ~10 | `gguf-info`, `safetensors-info`, `onnx-info`, `cuda-info`, `pyc-info` |
| LSP bridge | ~5 | `lsp-symbols`, `lsp-references`, `lsp-calls`, `lsp-diagnostics`, `lsp-types` |
| Web | ~5 | `web-sitemap`, `js-api-extract` (HAR/HTML input required) |
| Cross-language | ~5 | `lang-bridges`, `gpu-functions`, `monkey-patches` |
| Composite | ~10 | `audit`, `compare`, `validate`, `changeset`, `handoff`, `pipeline` |
| arXiv-derived | 15 | `symex-concolic`, `pointer-analysis`, `abstract-interp`, `bin-search`, `loop-polyhedral`, `gpu-analyze`, `side-channel-detect`, `semantic-slice`, `symex-speculative`, `cegio`, `natural-query`, `synthesize`, `detect-memory-corruption`, `neural-decompile`, `patch-binary` |

Output schema

All --json outputs follow:

{
  "ok": <boolean>,
  "action": "<action-name>",
  "dir": "<scanned-path>",
  "result": <action-specific>,
  "stats": { "files_scanned": N, "duration_ms": M, "cache_hits": K }
}

result shape varies per action. Action-specific schemas in docs/SCHEMAS.md.
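A small Python sketch of reading that envelope. It assumes only the five top-level keys shown above; because the abbreviated samples in this README omit action, dir and stats, the parser tolerates their absence. The Envelope class and the numbers in the example document are illustrative, not real output.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Envelope:
    ok: bool
    action: str = ""
    directory: str = ""
    result: Any = None
    stats: dict = field(default_factory=dict)


def parse_envelope(raw: str) -> Envelope:
    """Parse a --json document and check that the 'ok' flag is a boolean."""
    doc = json.loads(raw)
    if not isinstance(doc, dict) or not isinstance(doc.get("ok"), bool):
        raise ValueError("not a codemap envelope: missing boolean 'ok'")
    return Envelope(ok=doc["ok"], action=doc.get("action", ""),
                    directory=doc.get("dir", ""), result=doc.get("result"),
                    stats=doc.get("stats", {}))


# Illustrative document; the stats values are made up.
EXAMPLE = ('{"ok":true,"action":"summary","dir":"./my-repo",'
           '"result":{"files":2824},'
           '"stats":{"files_scanned":2824,"duration_ms":410,"cache_hits":0}}')
envelope = parse_envelope(EXAMPLE)
```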

Exit codes

| Code | Meaning | Agent response |
| --- | --- | --- |
| 0 | Success | Parse `--json` output |
| 1 | Usage error (bad flag, missing --dir) | Re-read `--help`, fix args, retry |
| 2 | I/O error (path not found, no read perm) | Verify path, retry |
| 101 | Panic | Do not retry. File a bug at https://github.com/charleschenai/codemap/issues |

Other non-zero codes: action-specific. See <action> --help.
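The table can be encoded directly so an agent does not have to re-derive the policy. A sketch in Python: the step labels are hypothetical names, and treating unlisted codes as "read the action's help, don't retry blindly" is an assumption drawn from the line above.

```python
# (next step, retry after fixing?) per exit code, taken from the table above.
EXIT_RESPONSES = {
    0: ("parse-json", False),
    1: ("fix-args", True),
    2: ("verify-path", True),
    101: ("file-bug", False),
}


def plan_response(exit_code: int) -> tuple:
    """Return (next step, whether a retry makes sense) for a codemap exit code."""
    # Unlisted codes are action-specific, so defer to the action's --help.
    return EXIT_RESPONSES.get(exit_code, ("read-action-help", False))
```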

MCP integration

codemap ships an MCP server for Claude Code agents:

{
  "mcpServers": {
    "codemap": {
      "command": "codemap",
      "args": ["mcp-stdio"]
    }
  }
}

Exposes the full action surface as MCP tools. Tool names match action names; args match CLI flags.

Recipes — when the agent has a specific job to do

Each recipe: what the action does → command → sample output → when to use it.

For the complete flat list of action names see docs/ACTION_CATALOG.md.

Codebase understanding (first-look on an unknown repo)

`summary` — one-page structural overview

Reports file count, languages, entry points, top modules, dispatch density. Single-call onboarding.

$ codemap summary --dir ./my-repo --json --quiet
{"ok":true,"result":{"files":2824,"languages":["rust","python","typescript"],
  "entry_points":["src/main.rs","src/lib.rs"],"top_modules":["analysis","insights","cpg"]}}

Use when: new repo, "tell me what this does" before diving deeper.

`stats` — quantitative metrics

Per-language LOC + file counts, function/class density, fan-in/fan-out distribution.

$ codemap stats --dir ./my-repo --json --quiet
{"ok":true,"result":{"rust":{"files":341,"loc":89432,"fns":2104},"python":{"files":52,"loc":4108}}}

Use when: comparing repos by size, reporting metrics, sanity-checking parse coverage.

`layers` — architectural layer detection

Infers boundaries (web / service / data / infra) from import patterns + naming conventions.

$ codemap layers --dir ./my-repo --json --quiet
{"ok":true,"result":{"layers":[{"name":"web","modules":["routes","handlers"]},
  {"name":"data","modules":["models","repo"]}],"violations":[...]}}

Use when: validating that architectural rules such as "web shouldn't import from data" actually hold.

`hotspots` — files with most churn × complexity

Surfaces "danger zone" code (high git churn + high cyclomatic complexity).

$ codemap hotspots --dir ./my-repo --json --quiet --top 10
{"ok":true,"result":{"hotspots":[{"file":"src/parser.rs","churn":48,"complexity":92,"score":4416}]}}

Use when: prioritizing refactor work, finding "where bugs live."
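The sample score is consistent with a plain product (48 × 92 = 4416). A minimal Python sketch of that ranking, assuming score = churn × complexity with no extra weighting; the second file entry is made up for illustration.

```python
def hotspot_score(churn: int, complexity: int) -> int:
    """Score a file as git churn multiplied by cyclomatic complexity."""
    return churn * complexity


def rank_hotspots(files: list, top: int = 10) -> list:
    """Attach a score to each file record and return the highest-scoring ones."""
    scored = [{**f, "score": hotspot_score(f["churn"], f["complexity"])} for f in files]
    return sorted(scored, key=lambda f: f["score"], reverse=True)[:top]


files = [
    {"file": "src/parser.rs", "churn": 48, "complexity": 92},
    {"file": "src/util.rs", "churn": 5, "complexity": 7},  # hypothetical entry
]
ranked = rank_hotspots(files)
```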

`entry-points` — public API surface

Lists exported functions/classes that other code can call from outside.

$ codemap entry-points --dir ./my-repo --json --quiet
{"ok":true,"result":{"entries":[{"name":"create_user","file":"api/users.rs","kind":"public_fn"}]}}

Use when: API documentation, understanding what's a stable contract.

`health` — overall quality summary

Composite: dead code % + clippy/lint count + circular deps + missing tests. Single "is this repo healthy?" score.

$ codemap health --dir ./my-repo --json --quiet
{"ok":true,"result":{"score":78,"dead_code_pct":3.2,"circular_deps":2,"missing_tests":["api/users.rs::delete"]}}

Use when: quick "should we touch this codebase or not" gut-check.

Code quality & cleanup

`dead-functions` — unreachable code

Functions never called by any other function in the workspace.

$ codemap dead-functions --dir ./my-repo --json --quiet
{"ok":true,"result":{"dead":[{"file":"src/old.rs","function":"legacy_helper","line":42}]}}

Use when: cleanup PR, removing tech debt. Don't use for: identifying entry points (they're "dead" by call-graph but intentionally public).
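To make the entry-point caveat concrete, here is a simplified Python sketch of the "never called by any other function" test over a toy call graph. The graph, the entry_points parameter and the function names are assumptions for illustration; codemap's own reachability logic may differ.

```python
from typing import Iterable


def dead_functions(call_graph: dict, entry_points: Iterable = ()) -> list:
    """Return functions that no *other* function calls, minus known entry points.

    call_graph maps each defined function to the set of functions it calls.
    """
    called = {callee
              for caller, callees in call_graph.items()
              for callee in callees
              if callee != caller}  # a self-call does not keep a function alive
    keep = set(entry_points)
    return sorted(fn for fn in call_graph if fn not in called and fn not in keep)


graph = {
    "main": {"login"},
    "login": {"validate_user"},
    "validate_user": set(),
    "legacy_helper": {"legacy_helper"},  # only calls itself
}
```

Without entry points, main is reported alongside legacy_helper, which is exactly the false positive the note above warns about.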

`dead-files` — files imported nowhere

Files no other file imports / uses.

$ codemap dead-files --dir ./my-repo --json --quiet
{"ok":true,"result":{"dead_files":["src/experimental/old_impl.rs","tools/debug.py"]}}

Use when: dead-import cleanup.

`dead-deps` — declared deps never imported

Packages in Cargo.toml/package.json/pyproject.toml that no source file imports.

$ codemap dead-deps --dir ./my-repo --json --quiet
{"ok":true,"result":{"dead":["serde_json (Cargo.toml)","lodash (package.json)"]}}

Use when: dep cleanup, reducing build time + attack surface.

`complexity` — cyclomatic complexity per function

McCabe complexity (branches+1). Catches "this function should be split."

$ codemap complexity --dir ./my-repo --json --quiet --top 10
{"ok":true,"result":{"top":[{"fn":"parse_expression","file":"parser.rs","cyclomatic":34,"lines":280}]}}

Use when: finding refactor candidates, code review automation.
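As an illustration of "branches + 1", the sketch below counts decision points in Python source with the standard ast module. Which node types count as a branch is a choice made here (if, for, while, except, conditional expressions, and each extra and/or operand); codemap's tree-sitter-based counting across languages may use a different set.

```python
import ast

_BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic(source: str) -> dict:
    """Map each function name in `source` to its McCabe complexity (branches + 1)."""
    scores = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            branches = 0
            # Note: branches of nested functions are also counted in the outer one.
            for child in ast.walk(node):
                if isinstance(child, _BRANCH_NODES):
                    branches += 1
                elif isinstance(child, ast.BoolOp):
                    branches += len(child.values) - 1
            scores[node.name] = branches + 1
    return scores


SAMPLE = '''
def classify(x):
    if x > 0 and x < 10:
        return "small"
    for i in range(x):
        if i == 3:
            return "three"
    return "other"
'''
result = cyclomatic(SAMPLE)
```

Here classify has two if statements, one and, and one for loop, so four decision points plus one.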

`churn` — git change frequency per file

Commits-touching-file count over a window.

$ codemap churn --dir ./my-repo --json --quiet --top 10
{"ok":true,"result":{"top":[{"file":"src/parser.rs","commits":78,"authors":12}]}}

Use when: combined with complexity for hotspots, ownership analysis.

`clones` — duplicated code blocks

Detects near-identical token sequences across files (copy-paste detection).

$ codemap clones --dir ./my-repo --json --quiet --min-tokens 50
{"ok":true,"result":{"clones":[{"size":120,"locations":[["a.rs:14","b.rs:22"]],"similarity":0.94}]}}

Use when: finding extraction candidates for shared functions.

`circular` — circular import detection

Reports module cycles (a → b → c → a).

$ codemap circular --dir ./my-repo --json --quiet
{"ok":true,"result":{"cycles":[["src/a.rs","src/b.rs","src/a.rs"]]}}

Use when: untangling architecture before a refactor.
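Cycle reporting of this kind can be sketched as a depth-first search that remembers the current path. The Python below is a generic illustration over an import map (file to list of imported files), not codemap's implementation; it returns the first cycle found, closed with the starting node as in the sample output.

```python
from typing import Optional


def find_cycle(imports: dict) -> Optional[list]:
    """Return one import cycle as [a, b, ..., a], or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on current path / fully explored
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for dep in imports.get(node, ()):
            state = color.get(dep, WHITE)
            if state == GREY:  # back edge: dep is already on the current path
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for start in imports:
        if color.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None
```

The recursion depth grows with the longest import chain, so a very deep graph would need an explicit stack instead.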

Impact tracing & change analysis

`trace` — transitive callees (what does X depend on?)

Walks the call graph forward from a function/symbol, returns full dep tree.

$ codemap trace --dir ./my-repo --json --quiet RecalcInvoiceTotals
{"ok":true,"result":{"node":"RecalcInvoiceTotals","calls":[
  {"name":"ship_chg_sum","file":"backend/invoices.go:120","depth":1},
  {"name":"format_money","file":"util/money.go:8","depth":2}]}}

Use when: impact analysis before changing a function, generating context for an LLM.
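The depth field in the sample can be read as hops from the root in a breadth-first walk. A Python sketch under that assumption; the edges of the toy graph are inferred from the sample depths and are not part of codemap's output.

```python
from collections import deque


def trace(call_graph: dict, root: str) -> list:
    """Breadth-first walk of callees, recording the hop count from `root`."""
    seen = {root}
    queue = deque([(root, 0)])
    calls = []
    while queue:
        node, depth = queue.popleft()
        for callee in call_graph.get(node, ()):
            if callee not in seen:  # report each function once, at its shallowest depth
                seen.add(callee)
                calls.append({"name": callee, "depth": depth + 1})
                queue.append((callee, depth + 1))
    return calls


graph = {
    "RecalcInvoiceTotals": ["ship_chg_sum"],
    "ship_chg_sum": ["format_money"],
    "format_money": [],
}
```

Running the same walk over the reversed graph gives the callers view.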

`callers` — transitive callers (who calls X?)

Reverse of trace. Returns the function's call sites + their callers.

$ codemap callers --dir ./my-repo --json --quiet validate_user
{"ok":true,"result":{"callers":[{"caller":"login","file":"auth.py:88","depth":1}]}}

Use when: "if I change this signature, what breaks?"

`blast-radius` — affected entities from a change

Combines callers + dataflow + tests touched. Most pessimistic estimate.

$ codemap blast-radius --dir ./my-repo --json --quiet --target User.id
{"ok":true,"result":{"functions":42,"tests":7,"endpoints":3,"db_columns":2}}

Use when: "what's the size of changing this thing?"

`diff` — semantic diff between two refs

Function-level diff: added, removed, signature-changed, body-changed.

$ codemap diff --dir ./my-repo --json --quiet HEAD~5 HEAD
{"ok":true,"result":{"added":["validate_email"],"removed":["old_validator"],
  "signature_changed":[{"fn":"create","before":"(name)","after":"(name,email)"}]}}

Use when: generating PR descriptions, understanding code review scope.

`api-diff` — breaking-change classifier

Like diff but specifically flags BREAKING vs additive changes to public API.

$ codemap api-diff --dir ./my-repo --json --quiet HEAD~5 HEAD
{"ok":true,"result":{"breaking":[
  {"kind":"removed","fn":"OldAPI::v1_login"},
  {"kind":"signature_change","fn":"create_user","before":"(name)","after":"(name,email)"}]}}

Use when: versioning decisions (semver minor vs major), CHANGELOG generation.
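One way to turn the result into a semver decision, sketched in Python. The breaking key comes from the sample output; the added key for additive changes is an assumption borrowed from the diff action's output and may be named differently in api-diff.

```python
def suggest_bump(api_diff_result: dict) -> str:
    """Map an api-diff result to a semver bump: breaking -> major, additive -> minor."""
    if api_diff_result.get("breaking"):
        return "major"
    if api_diff_result.get("added"):  # assumed key for additive changes
        return "minor"
    return "patch"


sample = {"breaking": [
    {"kind": "removed", "fn": "OldAPI::v1_login"},
    {"kind": "signature_change", "fn": "create_user", "before": "(name)", "after": "(name,email)"},
]}
```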

`diff-impact` — functions affected by a commit range

Maps the diff to every transitively-affected caller.

$ codemap diff-impact --dir ./my-repo --json --quiet HEAD~5 HEAD
{"ok":true,"result":{"impacted_fns":127,"impacted_files":34,"high_risk":["payment::charge"]}}

Use when: deciding test scope for a PR.

`churn-vs-complexity` (via `hotspots`) — see Codebase understanding above

Data flow & security

`audit` — composite security report

Runs taint + secret-scan + dead-deps + dep-tree + license-check in one pass.

$ codemap audit --dir ./my-repo --json --quiet
{"ok":true,"result":{"findings":[
  {"kind":"secret","file":".env.sample","line":3,"pattern":"AWS_KEY"},
  {"kind":"taint","source":"req.body","sink":"db.execute","path":[...]},
  {"kind":"dep-vuln","package":"lodash","version":"4.17.20","cve":"CVE-2021-23337"}]}}

Use when: first-pass security review of an unfamiliar repo.

`taint` — path-sensitive taint flow

Tracks tainted values from source(s) to sink(s). Sanitizer-aware, alias-aware (e.g. safe = sanitize(x)), cross-procedural (parses wrapper bodies to detect hidden sanitizers).

$ codemap taint --dir ./my-repo --json --quiet --source 'req.query' --sink 'db.execute'
{"ok":true,"result":{"paths":[{"source":"req.query.id","sink":"db.execute(sql)",
  "hops":["params.id","userId","query"],"sanitized":false}]}}

Use when: SQLi/XSS/SSRF detection, "is user input reaching this sink?"

`slice` — backward program slice

Given a target variable/sink, return only the code that influences it.

$ codemap slice --dir ./my-repo --json --quiet --var 'password' --file auth.py
{"ok":true,"result":{"slice_lines":[12,15,22,30,42],"file":"auth.py"}}

Use when: narrowing what to read when chasing a bug.

`sinks` — list all dangerous sinks

Enumerates every db.execute, eval, exec, Runtime.exec, subprocess.shell=True, innerHTML=, etc.

$ codemap sinks --dir ./my-repo --json --quiet
{"ok":true,"result":{"sinks":[{"kind":"sql","file":"api/users.rs","line":88,"expr":"db.execute(query)"}]}}

Use when: building taint queries, audit checklist generation.

`secret-scan` — credentials in source

20+ patterns (AWS key, GitHub PAT, Slack token, Stripe live key, private keys, JWT, DB conn strings, etc.). Redacted output.

$ codemap secret-scan --dir ./my-repo --json --quiet
{"ok":true,"result":{"findings":[{"file":".env.sample","line":3,"kind":"aws_access_key","masked":"AKIA****REDACTED"}]}}

Use when: pre-commit hook, pre-publish audit.
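For intuition, a single pattern can be sketched in a few lines of Python. The regex below is the widely used shape of an AWS access key ID (AKIA followed by 16 uppercase letters or digits); whether codemap uses this exact expression is not stated here, and the masking format simply copies the sample output. The key in the example is the placeholder from AWS documentation.

```python
import re

AWS_ACCESS_KEY = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def scan_text(text: str, file: str) -> list:
    """Report AWS access key IDs line by line, keeping only the 4-char prefix."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in AWS_ACCESS_KEY.finditer(line):
            findings.append({
                "file": file,
                "line": lineno,
                "kind": "aws_access_key",
                "masked": match.group()[:4] + "****REDACTED",
            })
    return findings


env_sample = "# sample config\nREGION=us-east-1\nAWS_KEY=AKIAIOSFODNN7EXAMPLE\n"
findings = scan_text(env_sample, ".env.sample")
```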

`data-flow` — value origin tracing

Where does this variable's value come from? (def-use chain)

$ codemap data-flow --dir ./my-repo --json --quiet --target 'user_id'
{"ok":true,"result":{"origins":[{"file":"auth.py:88","expr":"req.cookies['session']"}]}}

Use when: "where does this magic value come from?"

`api-surface` — every exported HTTP endpoint

Detects Flask/Express/Axum/FastAPI/Spring/Rocket route handlers. Lists path + method + handler.

$ codemap api-surface --dir ./my-repo --json --quiet
{"ok":true,"result":{"endpoints":[{"method":"POST","path":"/users","handler":"create_user","auth_required":false}]}}

Use when: generating OpenAPI from existing code, finding unauthenticated endpoints.

Graph algorithms (heterogeneous-graph queries)

These run on codemap's internal call graph + import graph + AST graph.

`pagerank` — most-important nodes

NetworkX-style PageRank. High score = central + many incoming refs.

$ codemap pagerank --dir ./my-repo --json --quiet --top 10
{"ok":true,"result":{"ranked":[{"fn":"handle_request","score":0.082}]}}

Use when: finding "load-bearing" functions, prioritizing code review.
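A compact Python sketch of PageRank by power iteration, to show why a function with many incoming calls scores high. The damping factor 0.85 and fixed iteration count are conventional defaults chosen here, the rank of nodes without outgoing edges is spread evenly, and the toy graph is hypothetical; codemap's parameters are not documented in this section.

```python
def pagerank(graph: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Power-iteration PageRank over a dict mapping caller -> list of callees."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is shared by everyone.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in graph.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        rank = new
    return rank


calls = {"login": ["handle_request"], "signup": ["handle_request"], "handle_request": []}
ranks = pagerank(calls)
```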

`hubs` — high-out-degree nodes

Functions/modules that depend on many others. Different from PageRank (which is about incoming).

$ codemap hubs --dir ./my-repo --json --quiet
{"ok":true,"result":{"hubs":[{"fn":"orchestrator","out_degree":47}]}}

Use when: finding god-objects, refactor targets.

`bridges` — cut edges (single edges the graph depends on)

Edges whose removal disconnects the graph. These are critical paths.

$ codemap bridges --dir ./my-repo --json --quiet
{"ok":true,"result":{"bridges":[{"from":"auth","to":"db","modules":["auth.rs","db.rs"]}]}}

Use when: identifying single points of failure in module coupling.
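The classical way to find such edges is a depth-first search with low-link values (Tarjan's bridge-finding algorithm). A Python sketch that treats the dependency graph as undirected, which is an assumption made here; the module names are illustrative.

```python
def find_bridges(edges: list) -> list:
    """Return the edges whose removal disconnects an undirected graph."""
    adj = {}
    for idx, (u, v) in enumerate(edges):
        adj.setdefault(u, []).append((v, idx))
        adj.setdefault(v, []).append((u, idx))
    disc, low, bridges = {}, {}, []

    def dfs(u, parent_edge):
        disc[u] = low[u] = len(disc)  # discovery order doubles as timestamp
        for v, idx in adj[u]:
            if idx == parent_edge:
                continue  # don't walk straight back along the edge we came in on
            if v in disc:
                low[u] = min(low[u], disc[v])
            else:
                dfs(v, idx)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:  # nothing below v reaches u or above
                    bridges.append(edges[idx])

    for node in adj:
        if node not in disc:
            dfs(node, -1)
    return bridges


module_edges = [("auth", "db"), ("db", "models"), ("models", "repo"), ("repo", "db")]
```

In the example, db, models and repo form a triangle, so only the auth to db edge is a bridge.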

`centrality` (17 measures) — broker / connector detection

Run with a specific measure: betweenness, eigenvector, katz, closeness, harmonic, load, structural-holes (brokers), voterank, etc. All NetworkX standards.

$ codemap betweenness --dir ./my-repo --json --quiet --top 5
{"ok":true,"result":{"top":[{"node":"db_session","betweenness":0.34}]}}

Use when: finding modules that connect otherwise-separate subsystems.

`clusters` — community detection (Leiden default)

Partitions the graph into densely-connected sub-communities.

$ codemap clusters --dir ./my-repo --json --quiet leiden
{"ok":true,"result":{"clusters":[{"id":0,"size":34,"members":["auth.rs","users.rs"]}]}}

Use when: discovering implicit module boundaries.

`paths` — shortest path between two nodes

Returns the chain of imports/calls connecting source → target.

$ codemap paths --dir ./my-repo --json --quiet user_input db_write
{"ok":true,"result":{"path":["user_input","sanitize","query_builder","db_write"],"length":4}}

Use when: "how does X reach Y?"
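On an unweighted graph the shortest chain can be found with breadth-first search and parent pointers. A Python sketch over a toy graph; the log node and the edges are invented so that the result matches the sample path.

```python
from collections import deque
from typing import Optional


def shortest_path(graph: dict, source: str, target: str) -> Optional[list]:
    """Return the node sequence of a shortest path, or None if unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk parent pointers back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


flow = {
    "user_input": ["sanitize", "log"],
    "sanitize": ["query_builder"],
    "query_builder": ["db_write"],
    "log": [],
}
```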

`subgraph` — extract a focused subgraph

Returns nodes within N hops of a target. Useful before deep analysis.

$ codemap subgraph --dir ./my-repo --json --quiet --target login --depth 2
{"ok":true,"result":{"nodes":[...],"edges":[...]}}

Use when: narrowing scope before more expensive analysis.

`bellman-ford <src>` / `astar <src> <tgt>` / `floyd-warshall` / etc.

Classical shortest-path algorithms exposed for graph queries. See ACTION_CATALOG.md for full list.

Binary analysis & reverse engineering

`bin-info` / `elf-info` / `macho-info` / `pe-info` — binary fingerprint

Format detection, arch, sections, strip state, language hints (Rust/Go/C++), anti-debug rules, packer detection.

$ codemap bin-info /usr/local/bin/codemap --json --quiet
{"ok":true,"result":{"format":"ELF64","arch":"aarch64","rust":true,"strip":false,
  "sections":34,"anti_debug":[],"packed":false}}

Use when: triage step 1 — "what is this binary?"
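Format detection usually starts from magic bytes. The Python sketch below checks the well-known signatures for ELF, PE and thin Mach-O files; it is a simplified illustration (no fat/universal Mach-O, no section parsing) and the returned labels other than ELF64 are chosen here, not taken from codemap.

```python
import struct


def detect_format(data: bytes) -> str:
    """Classify a binary by its leading magic bytes."""
    if data[:4] == b"\x7fELF" and len(data) > 4:
        # Byte 4 is EI_CLASS: 1 = 32-bit, 2 = 64-bit.
        return {1: "ELF32", 2: "ELF64"}.get(data[4], "ELF")
    if data[:2] == b"MZ" and len(data) >= 0x40:
        # The DOS header stores the PE header offset (e_lfanew) at 0x3C.
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            return "PE"
    if data[:4] in (b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe"):
        return "Mach-O 32"
    if data[:4] in (b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe"):
        return "Mach-O 64"
    return "unknown"
```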

`bin-search` — cross-binary function matching

(arXiv 2507.15226 TinyLFU) Embedding-based + bucketed dedup. Finds functions shared across two binaries even when symbols are stripped.

$ codemap bin-search --json --quiet --left ./malware-a --right ./malware-b
{"ok":true,"result":{"shared":[{"fn":"hash_block","conf":0.97}],"only_left":[...],"only_right":[...]}}

Use when: malware family detection, identifying shared code across stripped binaries, version comparison.

`pe-imports` / `pe-exports` — Windows PE import/export tables

Lists every DLL imported + every function exported.

$ codemap pe-imports ./sample.exe --json --quiet
{"ok":true,"result":{"imports":[{"dll":"kernel32.dll","functions":["VirtualAlloc","CreateProcessA"]}]}}

Use when: static behavioral profiling — what APIs does this binary depend on?

`pe-strings` / `bin-strings` — string extraction

ASCII and UTF-16LE strings, entropy-filtered.

$ codemap pe-strings ./sample.exe --json --quiet --min-len 8
{"ok":true,"result":{"strings":["http://c2.example.com","cmd.exe /c"]}}

Use when: triaging unknown binaries — strings often reveal C2 URLs, command lines, paths.
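The extraction step can be approximated in Python with two regular expressions: runs of printable ASCII, and runs of printable characters each followed by a zero byte (UTF-16LE). This sketch leaves out the entropy filter, and the function name is hypothetical.

```python
import re


def extract_strings(data: bytes, min_len: int = 8) -> list:
    """Return printable ASCII and UTF-16LE strings of at least `min_len` characters."""
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    utf16_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in utf16_re.finditer(data)]
    return found


blob = (b"\x00\x01http://c2.example.com\x00\xff"
        + "cmd.exe /c".encode("utf-16-le") + b"\x00\x00")
strings = extract_strings(blob, min_len=8)
```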

`binary-diff` — semantic binary diff

Functions added / removed / modified between two builds.

$ codemap binary-diff --json --quiet --left v1.exe --right v2.exe
{"ok":true,"result":{"added":["new_handler"],"removed":["legacy_proc"],"modified":["main"]}}

Use when: patch analysis, regression hunting in firmware.

`dotnet-meta` — .NET assembly metadata

For PE files that contain CLI/.NET metadata: reads the metadata streams and lists types + methods.

$ codemap dotnet-meta ./sample.dll --json --quiet
{"ok":true,"result":{"assembly":"Sample.Dll","types":["Foo","Bar"],"methods_count":42}}

Use when: analyzing .NET malware or .NET 3rd-party libs.

`java-class` — JVM class file

Constant pool, method signatures, bytecode summaries.

`wasm-info` — WebAssembly module

Imports, exports, function table, memory layout.

Schemas & config-as-code

`openapi-schema` / `graphql-schema` / `proto-schema` — extract API schemas

Parses spec files and reports endpoints/types/operations.

$ codemap openapi-schema --dir ./api --json --quiet
{"ok":true,"result":{"paths":[{"method":"GET","path":"/users","operationId":"listUsers"}]}}

Use when: generating client code, checking spec consistency.

`k8s-scan` — Kubernetes CIS audit (16 rules)

Checks privileged containers, hostNetwork, missing resource limits, etc.

$ codemap k8s-scan --dir ./k8s/ --json --quiet
{"ok":true,"result":{"findings":[{"rule":"K8S-001","resource":"Deployment/api","severity":"high","msg":"privileged=true"}]}}

Use when: auditing manifests before apply.

`iac-scan` — Terraform/CloudFormation/Pulumi audit (12 rules)

$ codemap iac-scan --dir ./infra/ --json --quiet
{"ok":true,"result":{"findings":[{"rule":"IAC-007","file":"main.tf","msg":"S3 bucket public-read ACL"}]}}

`dockerfile-scan` — Dockerfile audit (10 rules)

$ codemap dockerfile-scan --dir ./ --json --quiet
{"ok":true,"result":{"findings":[{"rule":"DKR-002","msg":"running as root","line":18}]}}

`ci-scan` — CI/CD pipeline audit (37 rules across 6 ecosystems)

GitHub Actions, GitLab CI, Jenkinsfile, CircleCI, Azure Pipelines, Travis. Catches injection, unpinned actions, secret literals, pull_request_target misuse.

$ codemap ci-scan --dir ./.github/ --json --quiet
{"ok":true,"result":{"findings":[{"rule":"GH-003","file":"deploy.yml","msg":"unpinned action ref"}]}}

`oci-scan` — OCI image / docker save tarball audit

Per-layer manifest, layer-resident secrets (11 patterns), licenses, file/dir/symlink counts.

$ codemap oci-scan --dir ./image.tar --json --quiet --mode all
{"ok":true,"result":{"layers":[...],"secrets":[...],"licenses":[...]}}

`sql-extract` — SQL DDL/DML extraction

Pulls SQL out of source code or .sql files. Schema + queries.

$ codemap sql-extract --dir ./my-repo --json --quiet
{"ok":true,"result":{"tables":[{"name":"users","columns":[...]}],"queries":[...]}}

Supply chain

`osv-scan` — match deps against OSV.dev advisories (offline)

Semver-range-aware.

$ codemap osv-scan --dir ./my-repo --json --quiet
{"ok":true,"result":{"vulns":[{"package":"lodash","version":"4.17.20","cve":"CVE-2021-23337"}]}}

`sbom-diff` — CycloneDX/SPDX diff

Added, removed, upgraded, downgraded packages between two SBOMs.

$ codemap sbom-diff --left ./sbom-1.spdx.json --right ./sbom-2.spdx.json --json --quiet
{"ok":true,"result":{"added":[...],"removed":[...],"upgraded":[...]}}

`license-check` — SPDX compatibility

Per-package license + compatibility verdict.

$ codemap license-check --dir ./my-repo --json --quiet
{"ok":true,"result":{"deps":[{"name":"foo","license":"GPL-3.0","compatible":false}]}}

`cve-scan` — same as osv-scan, but matched against the MITRE CVE corpus

ML / AI model files

`gguf-info` — llama.cpp GGUF inspection

Architecture, layer count, head count, quant level, vocab size.

$ codemap gguf-info ./model.gguf --json --quiet
{"ok":true,"result":{"arch":"llama","n_layers":32,"n_heads":32,"vocab_size":32000,"quant":"Q4_K_M"}}

Use when: "what model is this file?" Pre-load sanity check.

`safetensors-info` — HuggingFace safetensors inspection

Tensor shapes, dtypes, total params.

$ codemap safetensors-info ./model.safetensors --json --quiet
{"ok":true,"result":{"tensors":291,"total_params":7240000000,"dtype":"float16"}}
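The safetensors layout makes this cheap: the file starts with an 8-byte little-endian header length, followed by a JSON header mapping tensor names to dtype, shape and data offsets (plus an optional __metadata__ entry). A Python sketch that summarizes a file from its header alone; the tensor names in the example are hypothetical, and dtype labels are reported as stored in the header (for example F16), which may differ from how codemap prints them.

```python
import json
import struct
from math import prod


def safetensors_summary(data: bytes) -> dict:
    """Summarize tensor count, parameter count and dtypes from the header only."""
    (header_len,) = struct.unpack_from("<Q", data, 0)
    header = json.loads(data[8:8 + header_len])
    tensors = {name: meta for name, meta in header.items() if name != "__metadata__"}
    return {
        "tensors": len(tensors),
        "total_params": sum(prod(meta["shape"]) for meta in tensors.values()),
        "dtypes": sorted({meta["dtype"] for meta in tensors.values()}),
    }


def example_file() -> bytes:
    """Build a tiny in-memory file with two made-up tensors (6 + 3 parameters)."""
    header = json.dumps({
        "w": {"dtype": "F16", "shape": [2, 3], "data_offsets": [0, 12]},
        "b": {"dtype": "F16", "shape": [3], "data_offsets": [12, 18]},
        "__metadata__": {"format": "pt"},
    }).encode("utf-8")
    return struct.pack("<Q", len(header)) + header + bytes(18)
```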

`onnx-info` — ONNX model graph

Operators, inputs, outputs, opset.

$ codemap onnx-info ./model.onnx --json --quiet
{"ok":true,"result":{"opset":17,"ops":["Conv","Relu","MaxPool"],"inputs":[{"name":"x","shape":[1,3,224,224]}]}}

`cuda-info` — CUDA fatbin/cubin inspection

SM versions present, kernel symbols.

`pyc-info` — Python bytecode inspection

Magic number, marshalled code object, imports.

Cross-language & web

`lang-bridges` — FFI/binding detection

Detects PyO3 / napi / wasm-bindgen / JNI etc. — where languages interop.

$ codemap lang-bridges --dir ./my-repo --json --quiet
{"ok":true,"result":{"bridges":[{"kind":"pyo3","rust_fn":"create_user","py_module":"my_lib"}]}}

`gpu-functions` — GPU kernels in source

CUDA __global__, OpenCL kernels, Metal compute kernels, ROCm/HIP.

$ codemap gpu-functions --dir ./my-repo --json --quiet
{"ok":true,"result":{"kernels":[{"name":"matmul_kernel","framework":"cuda","file":"kernels.cu"}]}}

`monkey-patches` — runtime mutation detection

obj.method = new_fn, setattr, prototype patching.

`dispatch-map` — generic dispatch tables

Routers, registries, plugin maps. Finds the "switch statement that controls behavior."

`web-sitemap` — sitemap.xml + crawled link graph

`js-api-extract` — extract API calls from HAR / JS source

LSP bridge (requires a running language server)

`lsp-symbols` — workspace symbol table from LSP

Real symbol info, not AST-inferred. More accurate for typed languages.

`lsp-references` — every reference to a symbol (LSP-grade)

`lsp-calls` — call hierarchy from LSP

`lsp-diagnostics` — current LSP diagnostics across the workspace

$ codemap lsp-diagnostics --dir ./my-repo --json --quiet
{"ok":true,"result":{"diagnostics":[{"file":"src/main.rs","line":42,"severity":"error","msg":"E0308: mismatched types"}]}}

Use when: programmatic access to compiler/type-checker errors.

`lsp-types` — type info on hover for a position

arXiv-derived research actions (advanced)

These implement specific research papers. Each ships in MVP-scaffold form: it exercises the integration points and graph data, but full paper-grade results may need additional flags or tuning.

`natural-query` — plain-English action router

arXiv 2301.04862. Maps NL questions to codemap actions via Levenshtein + (optionally) LLM router.

$ codemap natural-query "find functions that handle authentication" --dir ./my-repo --json --quiet
{"ok":true,"result":{"routed_to":"callers","args":{"target":"login|auth|signin"}}}

Use when: the agent doesn't know which action to call. Always-safe entry point.
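The Levenshtein part of the routing can be pictured as "pick the action name with the smallest edit distance". A Python sketch of that idea; how codemap tokenizes the question and combines distances with the optional LLM router is not described here, so route below is a deliberately simple stand-in.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via the two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def route(word: str, actions: list) -> str:
    """Return the action name closest to `word` by edit distance."""
    return min(actions, key=lambda name: levenshtein(word, name))
```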

`symex-concolic` — concolic execution driver

arXiv 1205.4951. Combines concrete + symbolic execution. Drives test inputs by negating path conditions to explore new branches.

$ codemap symex-concolic --dir ./my-repo --json --quiet --target validate_input
{"ok":true,"result":{"paths":[{"condition":"x > 0","example_input":"x=1"}]}}

Use when: generating test inputs that achieve branch coverage on a target function.

`pointer-analysis` — Andersen field-sensitive PA

Computes points-to sets (which pointers can alias which memory). Field-sensitive + flow-insensitive + Tarjan SCC pre-pass for performance.

$ codemap pointer-analysis --dir ./my-repo --json --quiet
{"ok":true,"result":{"scope_vars":102000,"copy_constraints":132000,
  "aliases":[{"ptr":"p","may_alias":["a","b"]}]}}

Use when: understanding aliasing for refactoring (rename a field safely), upstream of taint analysis.
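The core of an Andersen-style (inclusion-based, flow-insensitive) analysis is a fixed point over subset constraints. The Python sketch below handles only address-of and copy constraints; loads, stores, field sensitivity and the SCC pre-pass mentioned above are left out, so it illustrates the idea rather than codemap's analysis.

```python
def andersen(address_of: list, copies: list) -> dict:
    """Compute points-to sets.

    address_of: (p, x) pairs meaning `p = &x`
    copies:     (dst, src) pairs meaning `dst = src`
    """
    pts = {}
    for p, x in address_of:
        pts.setdefault(p, set()).add(x)
    changed = True
    while changed:  # iterate until no points-to set grows
        changed = False
        for dst, src in copies:
            target = pts.setdefault(dst, set())
            before = len(target)
            target |= pts.get(src, set())
            if len(target) != before:
                changed = True
    return pts


# p = &a; q = &b; p = q
points_to = andersen(address_of=[("p", "a"), ("q", "b")], copies=[("p", "q")])
```

Statement order does not matter: p ends up pointing to both a and b because the analysis is flow-insensitive.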

`abstract-interp` — composable abstract-domain analyzer

arXiv 1309.5133. Computes invariants like "x is positive" over abstract states. Sign + parity domains shipped; user-pluggable.

$ codemap abstract-interp --dir ./my-repo --json --quiet --target check_bounds
{"ok":true,"result":{"invariants":[{"var":"i","sign":"pos","parity":"any"}]}}

Use when: proving safety properties (overflow-free arithmetic, non-null pointers).
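A sign domain replaces each integer by one of neg, zero, pos or any (unknown) and defines arithmetic on those labels. The Python sketch below shows abstract addition and multiplication; the label strings follow the sample output, while the function names and the exact lattice are assumptions made for illustration.

```python
NEG, ZERO, POS, ANY = "neg", "zero", "pos", "any"


def sign_of(n: int) -> str:
    """Abstract a concrete integer to its sign."""
    return ZERO if n == 0 else (POS if n > 0 else NEG)


def abs_mul(a: str, b: str) -> str:
    """Abstract multiplication: zero absorbs, unknown taints, equal signs give pos."""
    if ZERO in (a, b):
        return ZERO
    if ANY in (a, b):
        return ANY
    return POS if a == b else NEG


def abs_add(a: str, b: str) -> str:
    """Abstract addition: zero is neutral, equal signs are kept, mixed signs are unknown."""
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    return a if a == b else ANY


# x*x + 1 is provably positive for any negative x:
invariant = abs_add(abs_mul(NEG, NEG), sign_of(1))
```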

`loop-polyhedral` — polyhedral iteration-domain classifier

Feautrier 1996 / Bondhugula 2008. Classifies loops as affine / non-affine / parallelizable / vectorizable.

$ codemap loop-polyhedral --dir ./my-repo --json --quiet
{"ok":true,"result":{"loops":[{"file":"matmul.c","line":12,"class":"affine","parallel":true}]}}

Use when: identifying loop-optimization opportunities before manual vectorization.

`gpu-analyze` — CUDA kernel triage

arXiv 2604.14825 Nautilus. Memory-bound vs compute-bound vs warp-divergence triage on CUDA kernels.

$ codemap gpu-analyze --dir ./kernels --json --quiet
{"ok":true,"result":{"kernels":[{"name":"gemm","class":"compute-bound","warp_divergence":"low"}]}}

Use when: GPU kernel optimization priority (don't tune memory if compute-bound, etc.).

`side-channel-detect` — speculative-execution / cache-oracle finder

arXiv 2301.03724. Detects code patterns vulnerable to Spectre-class timing attacks (branch-on-secret + dependent memory access).

$ codemap side-channel-detect --dir ./my-repo --json --quiet
{"ok":true,"result":{"findings":[{"file":"crypto.c","line":48,"kind":"branch_on_secret"}]}}

Use when: auditing crypto / privileged code for timing leaks.

`semantic-slice` — LLM-augmented backward slice

arXiv 2507.18957 SLICEMATE. Static slice + LLM refinement.

$ codemap semantic-slice --dir ./my-repo --json --quiet --var 'auth_token'
{"ok":true,"result":{"slice":[...],"llm_refinement":"sanitization missing on line 88"}}

Use when: chasing a bug — narrow the code that influences a sink with LLM help.

`symex-speculative` — speculative-decoding symex

arXiv 2203.16487. Faster symbolic execution via draft-model speculation.

$ codemap symex-speculative --dir ./my-repo --json --quiet --target parse
{"ok":true,"result":{"paths_explored":42,"speculation_accept_rate":0.71}}

Use when: you want faster symex and can trade some completeness for speed.

`cegio` — counterexample-guided inductive optimization

arXiv 1704.03738. Given taint paths, synthesizes the minimum input that triggers a vulnerability.

$ codemap cegio --dir ./my-repo --json --quiet --taint-result <prior-taint-output>
{"ok":true,"result":{"trigger":{"input":"' OR 1=1--","reaches_sink":true}}}

Use when: turning a taint finding into a proof-of-concept exploit input.

`synthesize` — example-guided program synthesis

arXiv 1702.06334. Given input/output examples, generates code that produces the mapping. Static-pruned for performance.

$ codemap synthesize --json --quiet --examples '[(1,1),(2,4),(3,9)]'
{"ok":true,"result":{"program":"fn f(x) { x * x }"}}

Use when: spec-by-example, generating boilerplate from samples.

`detect-memory-corruption` — Veritas-light corruption finder

arXiv 2605.15097. Static detection of double-free, use-after-free, buffer overflow.

$ codemap detect-memory-corruption --dir ./my-repo --json --quiet
{"ok":true,"result":{"findings":[{"kind":"use_after_free","file":"alloc.c","line":42}]}}

Use when: C/C++ codebase audit for memory-safety bugs.

`neural-decompile` — Decaf decompile-compile-verify

arXiv 2605.11501. Decompiles a binary function via neural model, recompiles, checks semantic equivalence.

$ codemap neural-decompile ./sample.exe --json --quiet --fn 0x401000
{"ok":true,"result":{"decompiled":"int main() { ... }","recompile_match":true}}

Use when: stripped-binary RE, want approximate source.

`patch-binary` — SCRIBE vulnerability-fix recipe synthesizer

arXiv 2605.02121. Given a CVE/vuln location in a binary, generates patch instructions.

$ codemap patch-binary ./vuln.exe --json --quiet --cve CVE-2024-12345
{"ok":true,"result":{"patch_recipe":[{"offset":"0x401050","bytes":"90 90 90"}]}}

Use when: offensive/defensive binary patching when source unavailable.

Composite workflows

`audit` — kitchen-sink security report

See "Data flow & security" section above.

`validate` — sanity check (build + lint + tests + audit summary)

Single composite for "is this repo broken?"

`changeset` — file-grouped diff summary

$ codemap changeset --dir ./my-repo --json --quiet HEAD~10 HEAD
{"ok":true,"result":{"changes":{"feat":[...],"fix":[...],"refactor":[...]}}}

`handoff` — generate handoff document for a project

Distills repo state into a single MD doc (status + open issues + recent work + next-steps).

`pipeline` — multi-action pipeline runner

Run several actions in sequence, accumulate results.

$ codemap pipeline --dir ./my-repo --json --quiet --target 'audit:./,trace:main,hotspots:'
{"ok":true,"result":{"audit":{...},"trace":{...},"hotspots":{...}}}

Use when: scripted multi-step analysis.

Architecture (1-paragraph)

codemap walks --dir, parses with tree-sitter, builds a file-level import graph and a function-level call graph, layers PE/ELF/Mach-O/WASM/Java binary parsers + x86/x64 disassembly, and exposes ~500 actions through a uniform CLI registry (inventory::submit!). Cache: .codemap/cache.bincode next to the scanned dir. Pure static. No daemons, no network access at analysis time.

Repo layout

codemap-core/ — parsing, graph, algorithms, actions
codemap-cli/ — the codemap binary
codemap-napi/ — Node.js bindings (optional)
docs/ — REFERENCE.md, ACTION_CATALOG.md, SCHEMAS.md, HUMAN.md
install.sh — single install entry

License

MIT. See LICENSE.
