codemap

Static codebase + binary analyzer. One binary, ~500 actions, 18 source languages, sub-second cold-cache on 3K-file repos. No network, no servers, no databases, no API keys.

This README is your system prompt. Designed for AI agents: drop the entire file into your context (or fetch https://raw.githubusercontent.com/charleschenai/codemap/main/README.md) and you have everything you need — what codemap is, when to use it, how to install it, how to call every category of action, output schemas, exit codes, MCP setup. No further docs required for 95% of usage. Humans: see docs/HUMAN.md. Everyone else, keep reading.


When to reach for codemap

| Problem | Codemap action | Why codemap (vs alternatives) |
| --- | --- | --- |
| "What does this codebase do?" | summary --dir <path> | Cross-file structural overview in one call. Beats reading files. |
| "Find unused functions / dead code" | dead-functions --dir <path> | Call-graph reachability across modules. grep can't do this. |
| "Who calls function X?" | callers --dir <path> X | True call graph (AST-aware), not a string match. |
| "What does function X depend on (transitively)?" | trace --dir <path> X | Walks the dep graph. grep would only find direct refs. |
| "What changed between two commits?" | diff --dir <path> <ref1> <ref2> | Semantic diff, not line diff. |
| "Find security issues" | audit --dir <path> | Composite of taint + secret-scan + dep-tree + dead-deps. |
| "Where would a tainted input flow?" | taint --dir <path> --source <fn> --sink <fn> | Path-sensitive, sanitizer-aware, alias-aware, cross-procedural. |
| "Reverse-engineer a binary" | bin-info <path/to/binary> | PE/ELF/Mach-O parser. capa + YARA + signsrch + PEiD rules built in. |
| "Find cross-language coupling" | cross-lang --dir <path> | Imports/calls that cross language boundaries. |
| "Natural-language: I don't know which action" | natural-query "<question>" --dir <path> | Routes to the right action by Levenshtein + LLM (when --llm). |

When NOT to reach for codemap

  • Editing files: codemap is read-only. Use Edit/Write directly.
  • Running code: codemap doesn't compile or exec. Use bash.
  • Live process state: codemap is static. Use ps, lsof, ss.
  • Single-file grep: if you know the file, grep is faster.
  • String search across few files: if N<5 files, just grep.

Install (one command)

curl -fsSL https://github.com/charleschenai/codemap/releases/latest/download/install.sh | sh

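Detects arch (x86_64-linux, aarch64-linux, x86_64-macos, aarch64-macos), downloads the matching tarball, and installs to ~/.local/bin/codemap. No sudo. To install to /usr/local/bin instead, pass --system to the script; with the piped one-liner that means ending the command with | sh -s -- --system (a bare | sh -- --system would make sh treat --system as a script file to run).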

From source: git clone https://github.com/charleschenai/codemap && cd codemap && ./install.sh.

Verify

codemap --version-detail

Prints:

codemap 8.0.1
git: 5c091ec
built: 2026-05-19T12:40:00Z
host: mater (aarch64-unknown-linux-gnu)

If the binary is older than expected, re-run install with --update.


How to call any action

Universal shape:

codemap <ACTION> [TARGET...] --dir <PATH> [--json] [--quiet] [other-flags]

| Flag | Purpose |
| --- | --- |
| --dir <PATH> | Required. Repo/dir to scan. Repeatable for multi-repo. |
| --json | Output JSON (parseable). Default is text (human-readable). |
| --quiet | Suppress scan/cache status messages on stderr. |
| --no-cache | Force re-scan, ignore .codemap/cache.bincode. |
| --include-path <PATH> | C/C++ include search path. |
| --watch [SECS] | Re-run every N seconds. |

For agents: always use --json and --quiet unless you specifically want text output.
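
For an agent wrapper, the universal shape above can be assembled programmatically. The following Python sketch builds the argument list; the helper name build_argv is hypothetical, and it only uses flags listed in the table above.

```python
from typing import List, Sequence


def build_argv(action: str, targets: Sequence[str] = (),
               dirs: Sequence[str] = (), extra: Sequence[str] = ()) -> List[str]:
    """Assemble: codemap <ACTION> [TARGET...] --dir <PATH> --json --quiet [flags]."""
    argv = ["codemap", action, *targets]
    for d in dirs:  # --dir is repeatable for multi-repo scans
        argv += ["--dir", d]
    argv += ["--json", "--quiet", *extra]
    return argv


argv = build_argv("callers", ["validate_user"], ["./my-repo"])
print(" ".join(argv))
```

The resulting list can be handed to subprocess.run(argv, capture_output=True, text=True); passing a list rather than a shell string avoids quoting problems with targets that contain spaces.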

Discover actions

codemap --help                                       # full action list
codemap <action> --help                              # action-specific flags
codemap natural-query "find dead code" --dir <path>  # NL routing

natural-query accepts plain English and returns the top routed action(s). For agents that aren't sure which action to call, this is the primary entry point.


Action categories

~500 actions grouped by purpose. Full catalog at docs/ACTION_CATALOG.md. High-level groups:

| Category | Action count | Examples |
| --- | --- | --- |
| Analysis | ~20 | summary, stats, trace, callers, hotspots, layers, health, decorators |
| Code intelligence | ~30 | complexity, import-cost, churn, api-diff, clones, entry-points, dead-functions |
| Dataflow / security | ~15 | data-flow, taint, slice, trace-value, sinks, secret-scan, audit, dep-tree |
| Graph theory | ~40 | pagerank, hubs, bridges, centrality (17 measures), community (Leiden), bellman-ford |
| Binary / RE | ~150 | elf-info, pe-imports, macho-info, bin-search, bin-disasm, bin-strings, bin-relocs |
| Schemas | ~10 | proto-schema, openapi-schema, graphql-schema, sql-extract, dbf-schema |
| Supply chain | ~10 | osv-scan, sbom-diff, license-check, cve-scan |
| Config-as-code | ~10 | k8s-scan, iac-scan, dockerfile-scan, ci-scan, oci-scan |
| ML / AI | ~10 | gguf-info, safetensors-info, onnx-info, cuda-info, pyc-info |
| LSP bridge | ~5 | lsp-symbols, lsp-references, lsp-calls, lsp-diagnostics, lsp-types |
| Web | ~5 | web-sitemap, js-api-extract (HAR/HTML input required) |
| Cross-language | ~5 | lang-bridges, gpu-functions, monkey-patches |
| Composite | ~10 | audit, compare, validate, changeset, handoff, pipeline |
| arXiv-derived | 15 | symex-concolic, pointer-analysis, abstract-interp, bin-search, loop-polyhedral, gpu-analyze, side-channel-detect, semantic-slice, symex-speculative, cegio, natural-query, synthesize, detect-memory-corruption, neural-decompile, patch-binary |

Output schema

All --json outputs follow:

{
  "ok": <boolean>,
  "action": "<action-name>",
  "dir": "<scanned-path>",
  "result": <action-specific>,
  "stats": { "files_scanned": N, "duration_ms": M, "cache_hits": K }
}

result shape varies per action. Action-specific schemas in docs/SCHEMAS.md.
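
A minimal sketch of unwrapping this envelope in Python. The function name parse_envelope is hypothetical. Because the sample outputs later in this README show only ok and result, the sketch treats action, dir, and stats as optional.

```python
import json
from typing import Any, Dict


def parse_envelope(stdout: str) -> Any:
    """Return the action-specific result, raising if the envelope reports failure."""
    envelope: Dict[str, Any] = json.loads(stdout)
    if not isinstance(envelope, dict) or "ok" not in envelope:
        raise ValueError("not a codemap JSON envelope")
    if not envelope["ok"]:
        raise RuntimeError(f"action {envelope.get('action', '?')} reported ok=false")
    return envelope.get("result")


sample = '{"ok":true,"result":{"files":2824,"languages":["rust","python","typescript"]}}'
result = parse_envelope(sample)
print(result["files"], result["languages"])
```

Checking ok before touching result keeps action-specific parsing code free of error handling.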

Exit codes

| Code | Meaning | Agent response |
| --- | --- | --- |
| 0 | Success | Parse --json output |
| 1 | Usage error (bad flag, missing --dir) | Re-read --help, fix args, retry |
| 2 | I/O error (path not found, no read perm) | Verify path, retry |
| 101 | Panic | Do not retry. File a bug at https://github.com/charleschenai/codemap/issues |

Other non-zero codes: action-specific. See <action> --help.
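
The exit-code table can be turned into a small dispatch helper. This is a sketch only; the enum and function names are hypothetical and the mapping is copied from the table above.

```python
from enum import Enum


class Response(Enum):
    PARSE_OUTPUT = "parse --json output"
    FIX_ARGS = "re-read --help, fix args, retry"
    VERIFY_PATH = "verify path, retry"
    REPORT_BUG = "do not retry; file a bug"
    CHECK_ACTION_HELP = "action-specific; see <action> --help"


_BY_CODE = {
    0: Response.PARSE_OUTPUT,
    1: Response.FIX_ARGS,
    2: Response.VERIFY_PATH,
    101: Response.REPORT_BUG,
}


def respond_to_exit(code: int) -> Response:
    """Map a codemap exit code to the documented agent response."""
    return _BY_CODE.get(code, Response.CHECK_ACTION_HELP)


def may_retry(code: int) -> bool:
    """Only usage and I/O errors are documented as retryable after a fix."""
    return code in (1, 2)


print(respond_to_exit(101).value)
```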


MCP integration

codemap ships an MCP server for Claude Code agents:

{
  "mcpServers": {
    "codemap": {
      "command": "codemap",
      "args": ["mcp-stdio"]
    }
  }
}

Exposes the full action surface as MCP tools. Tool names match action names; args match CLI flags.


Recipes — when the agent has a specific job to do

Each recipe: what the action does, the command, sample output, and when to use it.

For the complete flat list of action names see docs/ACTION_CATALOG.md.


Codebase understanding (first-look on an unknown repo)

summary — one-page structural overview

Reports file count, languages, entry points, top modules, dispatch density. Single-call onboarding.

$ codemap summary --dir ./my-repo --json --quiet
{"ok":true,"result":{"files":2824,"languages":["rust","python","typescript"],
  "entry_points":["src/main.rs","src/lib.rs"],"top_modules":["analysis","insights","cpg"]}}

Use when: new repo, "tell me what this does" before diving deeper.

stats — quantitative metrics

Per-language LOC + file counts, function/class density, fan-in/fan-out distribution.

$ codemap stats --dir ./my-repo --json --quiet
{"ok":true,"result":{"rust":{"files":341,"loc":89432,"fns":2104},"python":{"files":52,"loc":4108}}}

Use when: comparing repos by size, reporting metrics, sanity-checking parse coverage.

layers — architectural layer detection

Infers boundaries (web / service / data / infra) from import patterns + naming conventions.

$ codemap layers --dir ./my-repo --json --quiet
{"ok":true,"result":{"layers":[{"name":"web","modules":["routes","handlers"]},
  {"name":"data","modules":["models","repo"]}],"violations":[...]}}

Use when: validating that "web shouldn't import from data" type architectural rules hold.

hotspots — files with most churn × complexity

Surfaces "danger zone" code (high git churn + high cyclomatic complexity).

$ codemap hotspots --dir ./my-repo --json --quiet --top 10
{"ok":true,"result":{"hotspots":[{"file":"src/parser.rs","churn":48,"complexity":92,"score":4416}]}}

Use when: prioritizing refactor work, finding "where bugs live."
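
The sample output above is consistent with score = churn × complexity (48 × 92 = 4416). A short sketch of that ranking, assuming this is the formula; the second file entry is hypothetical and added only to show the ordering.

```python
from typing import Dict, List


def hotspot_score(churn: int, complexity: int) -> int:
    """Assumed scoring rule: product of git churn and cyclomatic complexity."""
    return churn * complexity


def rank_hotspots(files: List[Dict]) -> List[Dict]:
    scored = [dict(f, score=hotspot_score(f["churn"], f["complexity"])) for f in files]
    return sorted(scored, key=lambda f: f["score"], reverse=True)


files = [
    {"file": "src/util.rs", "churn": 5, "complexity": 10},      # hypothetical entry
    {"file": "src/parser.rs", "churn": 48, "complexity": 92},   # from the sample output
]
ranked = rank_hotspots(files)
print(ranked[0]["file"], ranked[0]["score"])
```

A product rewards files that are both frequently changed and complex; a file that is high on only one axis scores low.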

entry-points — public API surface

Lists exported functions/classes that other code can call from outside.

$ codemap entry-points --dir ./my-repo --json --quiet
{"ok":true,"result":{"entries":[{"name":"create_user","file":"api/users.rs","kind":"public_fn"}]}}

Use when: API documentation, understanding what's a stable contract.

health — overall quality summary

Composite: dead code % + clippy/lint count + circular deps + missing tests. Single "is this repo healthy?" score.

$ codemap health --dir ./my-repo --json --quiet
{"ok":true,"result":{"score":78,"dead_code_pct":3.2,"circular_deps":2,"missing_tests":["api/users.rs::delete"]}}

Use when: quick "should we touch this codebase or not" gut-check.


Code quality & cleanup

dead-functions — unreachable code

Functions never called by any other function in the workspace.

$ codemap dead-functions --dir ./my-repo --json --quiet
{"ok":true,"result":{"dead":[{"file":"src/old.rs","function":"legacy_helper","line":42}]}}

Use when: cleanup PR, removing tech debt. Don't use for: identifying entry points (they're "dead" by call-graph but intentionally public).
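
Because public entry points can look dead to a call graph, one cautious workflow is to subtract the entry-points result from the dead-functions result before deleting anything. The sketch below assumes the field names shown in the two sample outputs (function, name, file); the helper name and the second dead entry are hypothetical, and codemap may already apply a similar filter internally.

```python
from typing import Dict, List


def removable_functions(dead_result: Dict, entry_result: Dict) -> List[Dict]:
    """Drop dead-function findings that are also listed as public entry points."""
    public = {(e["file"], e["name"]) for e in entry_result["entries"]}
    return [d for d in dead_result["dead"] if (d["file"], d["function"]) not in public]


dead_result = {"dead": [
    {"file": "src/old.rs", "function": "legacy_helper", "line": 42},
    {"file": "api/users.rs", "function": "create_user", "line": 10},  # hypothetical
]}
entry_result = {"entries": [
    {"name": "create_user", "file": "api/users.rs", "kind": "public_fn"},
]}
candidates = removable_functions(dead_result, entry_result)
print([c["function"] for c in candidates])
```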

dead-files — files imported nowhere

Files no other file imports / uses.

$ codemap dead-files --dir ./my-repo --json --quiet
{"ok":true,"result":{"dead_files":["src/experimental/old_impl.rs","tools/debug.py"]}}

Use when: dead-import cleanup.

dead-deps — declared deps never imported

Packages in Cargo.toml/package.json/pyproject.toml that no source file imports.

$ codemap dead-deps --dir ./my-repo --json --quiet
{"ok":true,"result":{"dead":["serde_json (Cargo.toml)","lodash (package.json)"]}}

Use when: dep cleanup, reducing build time + attack surface.

complexity — cyclomatic complexity per function

McCabe complexity (branches+1). Catches "this function should be split."

$ codemap complexity --dir ./my-repo --json --quiet --top 10
{"ok":true,"result":{"top":[{"fn":"parse_expression","file":"parser.rs","cyclomatic":34,"lines":280}]}}

Use when: finding refactor candidates, code review automation.
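
To make the "branches + 1" rule concrete, here is a small illustration that counts decision points in Python source using the standard ast module. It is not codemap's implementation (the architecture note says codemap parses with tree-sitter), and the set of node types counted is a simplifying assumption.

```python
import ast

# Node types treated as one decision point each (a simplified choice).
_DECISIONS = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic(source: str) -> int:
    """McCabe complexity of a snippet: decision points + 1."""
    count = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISIONS):
            count += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' adds two extra branches
            count += len(node.values) - 1
    return count


SRC = '''
def classify(n):
    if n < 0 and n % 2:
        return "neg-odd"
    for d in range(2, n):
        if n % d == 0:
            return "composite"
    return "other"
'''
print(cyclomatic(SRC))
```

The snippet has two if statements, one for loop, and one and operator, so the count is 4 decision points plus 1.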

churn — git change frequency per file

Commits-touching-file count over a window.

$ codemap churn --dir ./my-repo --json --quiet --top 10
{"ok":true,"result":{"top":[{"file":"src/parser.rs","commits":78,"authors":12}]}}

Use when: combined with complexity for hotspots, ownership analysis.

clones — duplicated code blocks

Detects near-identical token sequences across files (copy-paste detection).

$ codemap clones --dir ./my-repo --json --quiet --min-tokens 50
{"ok":true,"result":{"clones":[{"size":120,"locations":[["a.rs:14","b.rs:22"]],"similarity":0.94}]}}

Use when: finding extraction candidates for shared functions.

circular — circular import detection

Reports module cycles (a → b → c → a).

$ codemap circular --dir ./my-repo --json --quiet
{"ok":true,"result":{"cycles":[["src/a.rs","src/b.rs","src/a.rs"]]}}

Use when: untangling architecture before a refactor.
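
For intuition, cycle detection on an import graph can be sketched with a depth-first search that tracks the nodes currently on the stack. This is a generic illustration, not codemap's code; the graph literal is hypothetical and the output format mimics the sample above (the cycle repeats its first node at the end).

```python
from typing import Dict, List, Optional

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, fully explored


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one import cycle as a node list, or None if the graph is acyclic."""
    color: Dict[str, int] = {}
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        stack.append(node)
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:  # back edge: nxt is already on the stack
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


imports = {"src/a.rs": ["src/b.rs"], "src/b.rs": ["src/a.rs"], "src/c.rs": []}
print(find_cycle(imports))
```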


Impact tracing & change analysis

trace — transitive callees (what does X depend on?)

Walks the call graph forward from a function/symbol, returns full dep tree.

$ codemap trace --dir ./my-repo --json --quiet RecalcInvoiceTotals
{"ok":true,"result":{"node":"RecalcInvoiceTotals","calls":[
  {"name":"ship_chg_sum","file":"backend/invoices.go:120","depth":1},
  {"name":"format_money","file":"util/money.go:8","depth":2}]}}

Use when: impact analysis before changing a function, generating context for an LLM.
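
Conceptually, a forward walk like this is a breadth-first traversal of the call graph that records the depth at which each callee is first reached. The sketch below is a generic illustration using the names from the sample output; the call_graph dictionary is a hypothetical stand-in for codemap's internal graph.

```python
from collections import deque
from typing import Dict, List


def trace(call_graph: Dict[str, List[str]], root: str) -> List[Dict]:
    """Breadth-first walk returning each transitive callee with its depth."""
    seen = {root}
    queue = deque([(root, 0)])
    out: List[Dict] = []
    while queue:
        node, depth = queue.popleft()
        for callee in call_graph.get(node, []):
            if callee not in seen:  # guards against recursion and diamonds
                seen.add(callee)
                out.append({"name": callee, "depth": depth + 1})
                queue.append((callee, depth + 1))
    return out


call_graph = {
    "RecalcInvoiceTotals": ["ship_chg_sum"],
    "ship_chg_sum": ["format_money"],
    "format_money": [],
}
print(trace(call_graph, "RecalcInvoiceTotals"))
```

Running the same walk over the reversed graph gives transitive callers instead of callees.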

callers — transitive callers (who calls X?)

Reverse of trace. Returns the function's call sites + their callers.

$ codemap callers --dir ./my-repo --json --quiet validate_user
{"ok":true,"result":{"callers":[{"caller":"login","file":"auth.py:88","depth":1}]}}

Use when: "if I change this signature, what breaks?"

blast-radius — affected entities from a change

Combines callers + dataflow + tests touched. Most pessimistic estimate.

$ codemap blast-radius --dir ./my-repo --json --quiet --target User.id
{"ok":true,"result":{"functions":42,"tests":7,"endpoints":3,"db_columns":2}}

Use when: "what's the size of changing this thing?"

diff — semantic diff between two refs

Function-level diff: added, removed, signature-changed, body-changed.

$ codemap diff --dir ./my-repo --json --quiet HEAD~5 HEAD
{"ok":true,"result":{"added":["validate_email"],"removed":["old_validator"],
  "signature_changed":[{"fn":"create","before":"(name)","after":"(name,email)"}]}}

Use when: generating PR descriptions, understanding code review scope.

api-diff — breaking-change classifier

Like diff but specifically flags BREAKING vs additive changes to public API.

$ codemap api-diff --dir ./my-repo --json --quiet HEAD~5 HEAD
{"ok":true,"result":{"breaking":[
  {"kind":"removed","fn":"OldAPI::v1_login"},
  {"kind":"signature_change","fn":"create_user","before":"(name)","after":"(name,email)"}]}}

Use when: versioning decisions (semver minor vs major), CHANGELOG generation.

diff-impact — functions affected by a commit range

Maps the diff to every transitively-affected caller.

$ codemap diff-impact --dir ./my-repo --json --quiet HEAD~5 HEAD
{"ok":true,"result":{"impacted_fns":127,"impacted_files":34,"high_risk":["payment::charge"]}}

Use when: deciding test scope for a PR.

churn-vs-complexity (via hotspots) — see Codebase understanding above


Data flow & security

audit — composite security report

Runs taint + secret-scan + dead-deps + dep-tree + license-check in one pass.

$ codemap audit --dir ./my-repo --json --quiet
{"ok":true,"result":{"findings":[
  {"kind":"secret","file":".env.sample","line":3,"pattern":"AWS_KEY"},
  {"kind":"taint","source":"req.body","sink":"db.execute","path":[...]},
  {"kind":"dep-vuln","package":"lodash","version":"4.17.20","cve":"CVE-2021-23337"}]}}

Use when: first-pass security review of an unfamiliar repo.

taint — path-sensitive taint flow

Tracks tainted values from source(s) to sink(s). Sanitizer-aware, alias-aware (e.g. safe = sanitize(x)), cross-procedural (parses wrapper bodies to detect hidden sanitizers).

$ codemap taint --dir ./my-repo --json --quiet --source 'req.query' --sink 'db.execute'
{"ok":true,"result":{"paths":[{"source":"req.query.id","sink":"db.execute(sql)",
  "hops":["params.id","userId","query"],"sanitized":false}]}}

Use when: SQLi/XSS/SSRF detection, "is user input reaching this sink?"
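
To illustrate what "sanitizer-aware" and "alias-aware" mean, here is a toy taint propagation over straight-line assignments. It is far simpler than the analysis described above (no branches, no procedures), and every name in it is hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple

# One statement: target = func(arg); func=None means a plain copy (alias).
Stmt = Tuple[str, Optional[str], str]


def reaches_sink(stmts: Iterable[Stmt], sources: Set[str],
                 sanitizers: Set[str], sink_arg: str) -> bool:
    """True if a tainted value flows unsanitized into the sink argument."""
    tainted = set(sources)
    for target, func, arg in stmts:
        if arg in tainted and func not in sanitizers:
            tainted.add(target)      # taint flows through copies and unknown calls
        else:
            tainted.discard(target)  # overwritten with a clean or sanitized value
    return sink_arg in tainted


vulnerable = [("params.id", None, "req.query.id"),
              ("userId", None, "params.id"),
              ("query", None, "userId")]
fixed = [("userId", None, "req.query.id"),
         ("safe", "sanitize", "userId"),   # safe = sanitize(userId)
         ("query", None, "safe")]

print(reaches_sink(vulnerable, {"req.query.id"}, {"sanitize"}, "query"))
print(reaches_sink(fixed, {"req.query.id"}, {"sanitize"}, "query"))
```

The first chain mirrors the hops in the sample output and is flagged; the second passes through the sanitizer alias and is not.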

slice — backward program slice

Given a target variable/sink, return only the code that influences it.

$ codemap slice --dir ./my-repo --json --quiet --var 'password' --file auth.py
{"ok":true,"result":{"slice_lines":[12,15,22,30,42],"file":"auth.py"}}

Use when: narrowing what to read when chasing a bug.

sinks — list all dangerous sinks

Enumerates every db.execute, eval, exec, Runtime.exec, subprocess.shell=True, innerHTML=, etc.

$ codemap sinks --dir ./my-repo --json --quiet
{"ok":true,"result":{"sinks":[{"kind":"sql","file":"api/users.rs","line":88,"expr":"db.execute(query)"}]}}

Use when: building taint queries, audit checklist generation.

secret-scan — credentials in source

20+ patterns (AWS key, GitHub PAT, Slack token, Stripe live key, private keys, JWT, DB conn strings, etc.). Redacted output.

$ codemap secret-scan --dir ./my-repo --json --quiet
{"ok":true,"result":{"findings":[{"file":".env.sample","line":3,"kind":"aws_access_key","masked":"AKIA****REDACTED"}]}}

Use when: pre-commit hook, pre-publish audit.

data-flow — value origin tracing

Where does this variable's value come from? (def-use chain)

$ codemap data-flow --dir ./my-repo --json --quiet --target 'user_id'
{"ok":true,"result":{"origins":[{"file":"auth.py:88","expr":"req.cookies['session']"}]}}

Use when: "where does this magic value come from?"

api-surface — every exported HTTP endpoint

Detects Flask/Express/Axum/FastAPI/Spring/Rocket route handlers. Lists path + method + handler.

$ codemap api-surface --dir ./my-repo --json --quiet
{"ok":true,"result":{"endpoints":[{"method":"POST","path":"/users","handler":"create_user","auth_required":false}]}}

Use when: generating OpenAPI from existing code, finding unauthenticated endpoints.


Graph algorithms (heterogeneous-graph queries)

These run on codemap's internal call graph + import graph + AST graph.

pagerank — most-important nodes

NetworkX-style PageRank. High score = central + many incoming refs.

$ codemap pagerank --dir ./my-repo --json --quiet --top 10
{"ok":true,"result":{"ranked":[{"fn":"handle_request","score":0.082}]}}

Use when: finding "load-bearing" functions, prioritizing code review.
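
For readers unfamiliar with PageRank, the following is a plain power-iteration sketch over a call graph. It is a textbook formulation, not codemap's implementation; the damping factor 0.85 and the example graph are assumptions.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power iteration; rank of dangling nodes is spread uniformly."""
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            outs = graph.get(v, [])
            for target in outs:
                new[target] += damping * rank[v] / len(outs)
        rank = new
    return rank


# Hypothetical call graph: edges point from caller to callee.
calls = {"main": ["handle_request"], "cli": ["handle_request"],
         "tests": ["handle_request", "cli"]}
ranks = pagerank(calls)
print(max(ranks, key=ranks.get))
```

Here handle_request ranks highest because three callers point at it, matching the "many incoming refs" reading above.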

hubs — high-out-degree nodes

Functions/modules that depend on many others. Different from PageRank (which is about incoming).

$ codemap hubs --dir ./my-repo --json --quiet
{"ok":true,"result":{"hubs":[{"fn":"orchestrator","out_degree":47}]}}

Use when: finding god-objects, refactor targets.

bridges — single-edge cut points

Edges whose removal disconnects the graph. These are critical paths.

$ codemap bridges --dir ./my-repo --json --quiet
{"ok":true,"result":{"bridges":[{"from":"auth","to":"db","modules":["auth.rs","db.rs"]}]}}

Use when: identifying single points of failure in module coupling.
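
A bridge can be found with the classic low-link depth-first search. The sketch below treats the module graph as undirected and simple (no parallel edges), which is an assumption; the edge list is hypothetical apart from the auth and db names taken from the sample.

```python
from typing import Dict, List, Optional, Tuple


def find_bridges(edges: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return edges whose removal disconnects the (undirected, simple) graph."""
    adj: Dict[str, List[str]] = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    disc: Dict[str, int] = {}
    low: Dict[str, int] = {}
    bridges: List[Tuple[str, str]] = []

    def dfs(u: str, parent: Optional[str]) -> None:
        disc[u] = low[u] = len(disc)  # discovery index
        for v in adj[u]:
            if v == parent:
                continue
            if v in disc:
                low[u] = min(low[u], disc[v])   # back edge
            else:
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:            # nothing below v reaches u or above
                    bridges.append((u, v))

    for node in adj:
        if node not in disc:
            dfs(node, None)
    return bridges


edges = [("auth", "db"), ("db", "cache"), ("cache", "models"), ("models", "db")]
print(find_bridges(edges))
```

The db, cache, and models modules form a triangle, so only the auth to db edge is a bridge.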

centrality (17 measures) — broker / connector detection

Run with a specific measure: betweenness, eigenvector, katz, closeness, harmonic, load, structural-holes (brokers), voterank, etc. All follow the standard NetworkX definitions.

$ codemap betweenness --dir ./my-repo --json --quiet --top 5
{"ok":true,"result":{"top":[{"node":"db_session","betweenness":0.34}]}}

Use when: finding modules that connect otherwise-separate subsystems.

clusters — community detection (Leiden default)

Partitions the graph into densely-connected sub-communities.

$ codemap clusters --dir ./my-repo --json --quiet leiden
{"ok":true,"result":{"clusters":[{"id":0,"size":34,"members":["auth.rs","users.rs"]}]}}

Use when: discovering implicit module boundaries.

paths — shortest path between two nodes

Returns the chain of imports/calls connecting source → target.

$ codemap paths --dir ./my-repo --json --quiet user_input db_write
{"ok":true,"result":{"path":["user_input","sanitize","query_builder","db_write"],"length":4}}

Use when: "how does X reach Y?"

subgraph — extract a focused subgraph

Returns nodes within N hops of a target. Useful before deep analysis.

$ codemap subgraph --dir ./my-repo --json --quiet --target login --depth 2
{"ok":true,"result":{"nodes":[...],"edges":[...]}}

Use when: narrowing scope before more expensive analysis.

bellman-ford <src> / astar <src> <tgt> / floyd-warshall / etc.

Classical shortest-path algorithms exposed for graph queries. See ACTION_CATALOG.md for full list.


Binary analysis & reverse engineering

bin-info / elf-info / macho-info / pe-info — binary fingerprint

Format detection, arch, sections, strip state, language hints (Rust/Go/C++), anti-debug rules, packer detection.

$ codemap bin-info /usr/local/bin/codemap --json --quiet
{"ok":true,"result":{"format":"ELF64","arch":"aarch64","rust":true,"strip":false,
  "sections":34,"anti_debug":[],"packed":false}}

Use when: triage step 1 — "what is this binary?"
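
Format detection of this kind usually starts from the file's magic bytes. The sketch below checks the well-known signatures for ELF, PE, Mach-O, and WebAssembly; it is an illustration only, the function name is hypothetical, and a real parser would go on to validate the headers (for PE, the PE\0\0 signature at the offset stored at 0x3c).

```python
def detect_format(header: bytes) -> str:
    """Classify a binary from its first bytes."""
    if header[:4] == b"\x7fELF":
        # EI_CLASS (offset 4): 1 = 32-bit, 2 = 64-bit
        return {1: "ELF32", 2: "ELF64"}.get(header[4] if len(header) > 4 else 0, "ELF")
    if header[:2] == b"MZ":
        return "PE"  # DOS header; full PE check omitted in this sketch
    if header[:4] in (b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe"):
        return "Mach-O32"
    if header[:4] in (b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe"):
        return "Mach-O64"
    if header[:4] == b"\x00asm":
        return "WASM"
    return "unknown"


print(detect_format(b"\x7fELF\x02\x01\x01\x00"))
```

To use it on a file, read the first few bytes with open(path, "rb").read(8) and pass them in.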

bin-search — cross-binary function matching

(arXiv 2507.15226 TinyLFU) Embedding-based + bucketed dedup. Finds functions shared across two binaries even when symbols are stripped.

$ codemap bin-search --json --quiet --left ./malware-a --right ./malware-b
{"ok":true,"result":{"shared":[{"fn":"hash_block","conf":0.97}],"only_left":[...],"only_right":[...]}}

Use when: malware family detection, identifying shared code across stripped binaries, version comparison.

pe-imports / pe-exports — Windows PE import/export tables

Lists every DLL imported + every function exported.

$ codemap pe-imports ./sample.exe --json --quiet
{"ok":true,"result":{"imports":[{"dll":"kernel32.dll","functions":["VirtualAlloc","CreateProcessA"]}]}}

Use when: static behavioral profiling — what APIs does this binary depend on?

pe-strings / bin-strings — string extraction

ASCII and UTF-16LE strings, entropy-filtered.

$ codemap pe-strings ./sample.exe --json --quiet --min-len 8
{"ok":true,"result":{"strings":["http://c2.example.com","cmd.exe /c"]}}

Use when: triaging unknown binaries — strings often reveal C2 URLs, command lines, paths.
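
String extraction in the style of the classic strings utility can be sketched with two regular expressions plus a Shannon-entropy helper. This is a generic illustration; how codemap applies its entropy filter (threshold and direction) is not stated here, so the helper only computes the value.

```python
import math
import re
from collections import Counter
from typing import List


def ascii_strings(data: bytes, min_len: int = 8) -> List[str]:
    """Runs of printable ASCII at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def utf16le_strings(data: bytes, min_len: int = 8) -> List[str]:
    """Runs of printable ASCII encoded as UTF-16LE (each char followed by 0x00)."""
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return [m.group().decode("utf-16-le") for m in pattern.finditer(data)]


def shannon_entropy(text: str) -> float:
    """Bits per character; 0.0 for a single repeated character."""
    counts = Counter(text)
    return sum((c / len(text)) * math.log2(len(text) / c) for c in counts.values())


blob = (b"\x00\x01http://c2.example.com\x00\xffab\x00cmd.exe /c\xff"
        + "C:\\Windows\\Temp".encode("utf-16-le"))
print(ascii_strings(blob))
print(utf16le_strings(blob))
```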

binary-diff — semantic binary diff

Functions added / removed / modified between two builds.

$ codemap binary-diff --json --quiet --left v1.exe --right v2.exe
{"ok":true,"result":{"added":["new_handler"],"removed":["legacy_proc"],"modified":["main"]}}

Use when: patch analysis, regression hunting in firmware.

dotnet-meta — .NET assembly metadata

For PE files that contain CLI/.NET metadata: reads the metadata streams and lists types and methods.

$ codemap dotnet-meta ./sample.dll --json --quiet
{"ok":true,"result":{"assembly":"Sample.Dll","types":["Foo","Bar"],"methods_count":42}}

Use when: analyzing .NET malware or .NET 3rd-party libs.

java-class — JVM class file

Constant pool, method signatures, bytecode summaries.

wasm-info — WebAssembly module

Imports, exports, function table, memory layout.


Schemas & config-as-code

openapi-schema / graphql-schema / proto-schema — extract API schemas

Parses spec files and reports endpoints/types/operations.

$ codemap openapi-schema --dir ./api --json --quiet
{"ok":true,"result":{"paths":[{"method":"GET","path":"/users","operationId":"listUsers"}]}}

Use when: generating client code, checking spec consistency.

k8s-scan — Kubernetes CIS audit (16 rules)

Checks privileged containers, hostNetwork, missing resource limits, etc.

$ codemap k8s-scan --dir ./k8s/ --json --quiet
{"ok":true,"result":{"findings":[{"rule":"K8S-001","resource":"Deployment/api","severity":"high","msg":"privileged=true"}]}}

Use when: auditing manifests before apply.
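
The three checks named above can be illustrated on an already-parsed pod spec. This sketch works on a plain dictionary (no YAML loading), does not reproduce codemap's rule IDs or its full rule set, and uses a hypothetical manifest.

```python
from typing import Dict, List


def check_pod_spec(spec: Dict) -> List[str]:
    """Flag hostNetwork, privileged containers, and missing resource limits."""
    findings: List[str] = []
    if spec.get("hostNetwork"):
        findings.append("hostNetwork=true")
    for container in spec.get("containers", []):
        name = container.get("name", "?")
        if container.get("securityContext", {}).get("privileged"):
            findings.append(f"{name}: privileged=true")
        if "limits" not in container.get("resources", {}):
            findings.append(f"{name}: missing resource limits")
    return findings


pod_spec = {  # hypothetical manifest fragment (spec.template.spec of a Deployment)
    "hostNetwork": True,
    "containers": [
        {"name": "api", "securityContext": {"privileged": True}, "resources": {}},
        {"name": "sidecar", "resources": {"limits": {"cpu": "100m"}}},
    ],
}
print(check_pod_spec(pod_spec))
```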

iac-scan — Terraform/CloudFormation/Pulumi audit (12 rules)

$ codemap iac-scan --dir ./infra/ --json --quiet
{"ok":true,"result":{"findings":[{"rule":"IAC-007","file":"main.tf","msg":"S3 bucket public-read ACL"}]}}

dockerfile-scan — Dockerfile audit (10 rules)

$ codemap dockerfile-scan --dir ./ --json --quiet
{"ok":true,"result":{"findings":[{"rule":"DKR-002","msg":"running as root","line":18}]}}

ci-scan — CI/CD pipeline audit (37 rules across 6 ecosystems)

GitHub Actions, GitLab CI, Jenkinsfile, CircleCI, Azure Pipelines, Travis. Catches injection, unpinned actions, secret literals, pull_request_target misuse.

$ codemap ci-scan --dir ./.github/ --json --quiet
{"ok":true,"result":{"findings":[{"rule":"GH-003","file":"deploy.yml","msg":"unpinned action ref"}]}}

oci-scan — OCI image / docker save tarball audit

Per-layer manifest, layer-resident secrets (11 patterns), licenses, file/dir/symlink counts.

$ codemap oci-scan --dir ./image.tar --json --quiet --mode all
{"ok":true,"result":{"layers":[...],"secrets":[...],"licenses":[...]}}

sql-extract — SQL DDL/DML extraction

Pulls SQL out of source code or .sql files. Schema + queries.

$ codemap sql-extract --dir ./my-repo --json --quiet
{"ok":true,"result":{"tables":[{"name":"users","columns":[...]}],"queries":[...]}}

Supply chain

osv-scan — match deps against OSV.dev advisories (offline)

Semver-range-aware.

$ codemap osv-scan --dir ./my-repo --json --quiet
{"ok":true,"result":{"vulns":[{"package":"lodash","version":"4.17.20","cve":"CVE-2021-23337"}]}}
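
"Semver-range-aware" means versions are compared numerically, component by component, against an advisory's introduced/fixed bounds rather than matched as strings. A minimal sketch follows; the helper names are hypothetical, pre-release tags are not handled, and the range used in the example is illustrative rather than quoted from any advisory.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """'4.17.20' -> (4, 17, 20); pre-release suffixes are out of scope here."""
    return tuple(int(part) for part in version.split("."))


def is_affected(version: str, introduced: str, fixed: str) -> bool:
    """True when introduced <= version < fixed."""
    return parse_version(introduced) <= parse_version(version) < parse_version(fixed)


# Illustrative range only.
print(is_affected("4.17.20", "0.0.0", "4.17.21"))
print(is_affected("4.17.21", "0.0.0", "4.17.21"))
```

Tuple comparison matters: as strings, "4.9.0" sorts after "4.17.20", while numerically it is the older release.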

sbom-diff — CycloneDX/SPDX diff

Added, removed, upgraded, downgraded packages between two SBOMs.

$ codemap sbom-diff --left ./sbom-1.spdx.json --right ./sbom-2.spdx.json --json --quiet
{"ok":true,"result":{"added":[...],"removed":[...],"upgraded":[...]}}

license-check — SPDX compatibility

Per-package license + compatibility verdict.

$ codemap license-check --dir ./my-repo --json --quiet
{"ok":true,"result":{"deps":[{"name":"foo","license":"GPL-3.0","compatible":false}]}}

cve-scan — same as osv-scan but specifically against MITRE CVE corpus


ML / AI model files

gguf-info — llama.cpp GGUF inspection

Architecture, layer count, head count, quant level, vocab size.

$ codemap gguf-info ./model.gguf --json --quiet
{"ok":true,"result":{"arch":"llama","n_layers":32,"n_heads":32,"vocab_size":32000,"quant":"Q4_K_M"}}

Use when: "what model is this file?" Pre-load sanity check.
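
As background, a GGUF file begins with a small fixed header: the 4-byte magic GGUF, a little-endian uint32 version, then uint64 tensor and metadata key/value counts (version 2 and later). The sketch below reads only that header; fields such as architecture and quantization live in the metadata that follows and are not parsed here. The function name is hypothetical.

```python
import struct
from typing import Dict


def read_gguf_header(data: bytes) -> Dict[str, int]:
    """Parse the fixed GGUF header (little-endian, version >= 2)."""
    if data[:4] != b"GGUF":
        raise ValueError("not a GGUF file")
    (version,) = struct.unpack_from("<I", data, 4)
    if version < 2:
        raise ValueError("version 1 uses 32-bit counts; not handled in this sketch")
    tensor_count, kv_count = struct.unpack_from("<QQ", data, 8)
    return {"version": version, "tensor_count": tensor_count,
            "metadata_kv_count": kv_count}


# Synthetic header for demonstration.
demo = struct.pack("<4sIQQ", b"GGUF", 3, 291, 24)
print(read_gguf_header(demo))
```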

safetensors-info — HuggingFace safetensors inspection

Tensor shapes, dtypes, total params.

$ codemap safetensors-info ./model.safetensors --json --quiet
{"ok":true,"result":{"tensors":291,"total_params":7240000000,"dtype":"float16"}}
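
The safetensors layout makes this kind of summary cheap: the file starts with an 8-byte little-endian header length, followed by a JSON header that maps tensor names to dtype, shape, and data offsets (plus an optional __metadata__ entry). A sketch that computes a similar summary from those bytes, with a hypothetical function name and a synthetic two-tensor file:

```python
import json
import struct
from math import prod
from typing import Dict


def safetensors_summary(data: bytes) -> Dict:
    """Count tensors and parameters from the JSON header alone."""
    (header_len,) = struct.unpack_from("<Q", data, 0)
    header = json.loads(data[8:8 + header_len])
    tensors = {k: v for k, v in header.items() if k != "__metadata__"}
    return {
        "tensors": len(tensors),
        "total_params": sum(prod(t["shape"]) for t in tensors.values()),
        "dtypes": sorted({t["dtype"] for t in tensors.values()}),
    }


# Synthetic file: header only, followed by 18 bytes of zeroed tensor data.
header = json.dumps({
    "__metadata__": {"format": "pt"},
    "w": {"dtype": "F16", "shape": [2, 3], "data_offsets": [0, 12]},
    "b": {"dtype": "F16", "shape": [3], "data_offsets": [12, 18]},
}).encode()
blob = struct.pack("<Q", len(header)) + header + bytes(18)
print(safetensors_summary(blob))
```

No tensor data is read, so the summary stays fast even for multi-gigabyte files.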

onnx-info — ONNX model graph

Operators, inputs, outputs, opset.

$ codemap onnx-info ./model.onnx --json --quiet
{"ok":true,"result":{"opset":17,"ops":["Conv","Relu","MaxPool"],"inputs":[{"name":"x","shape":[1,3,224,224]}]}}

cuda-info — CUDA fatbin/cubin inspection

SM versions present, kernel symbols.

pyc-info — Python bytecode inspection

Magic number, marshalled code object, imports.


Cross-language & web

lang-bridges — FFI/binding detection

Detects PyO3 / napi / wasm-bindgen / JNI etc. — where languages interop.

$ codemap lang-bridges --dir ./my-repo --json --quiet
{"ok":true,"result":{"bridges":[{"kind":"pyo3","rust_fn":"create_user","py_module":"my_lib"}]}}

gpu-functions — GPU kernels in source

CUDA __global__, OpenCL kernels, Metal compute kernels, ROCm/HIP.

$ codemap gpu-functions --dir ./my-repo --json --quiet
{"ok":true,"result":{"kernels":[{"name":"matmul_kernel","framework":"cuda","file":"kernels.cu"}]}}

monkey-patches — runtime mutation detection

obj.method = new_fn, setattr, prototype patching.

dispatch-map — generic dispatch tables

Routers, registries, plugin maps. Finds the "switch statement that controls behavior."

web-sitemap — sitemap.xml + crawled link graph

js-api-extract — extract API calls from HAR / JS source


LSP bridge (requires a running language server)

lsp-symbols — workspace symbol table from LSP

Real symbol info, not AST-inferred. More accurate for typed languages.

lsp-references — every reference to a symbol (LSP-grade)

lsp-calls — call hierarchy from LSP

lsp-diagnostics — current LSP diagnostics across the workspace

$ codemap lsp-diagnostics --dir ./my-repo --json --quiet
{"ok":true,"result":{"diagnostics":[{"file":"src/main.rs","line":42,"severity":"error","msg":"E0308: mismatched types"}]}}

Use when: programmatic access to compiler/type-checker errors.

lsp-types — type info on hover for a position


arXiv-derived research actions (advanced)

These implement specific research papers. Each currently ships as an MVP scaffold: it exercises the integration points and graph data, but full paper-grade results may need additional flags or tuning.

natural-query — plain-English action router

arXiv 2301.04862. Maps NL questions to codemap actions via Levenshtein + (optionally) LLM router.

$ codemap natural-query "find functions that handle authentication" --dir ./my-repo --json --quiet
{"ok":true,"result":{"routed_to":"callers","args":{"target":"login|auth|signin"}}}

Use when: the agent doesn't know which action to call. Always-safe entry point.
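
The Levenshtein half of the routing can be pictured as a nearest-name lookup over action names. The sketch below implements the standard edit-distance recurrence; how codemap tokenizes the question and combines scores is not described here, so the route helper and its short action list are purely illustrative.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance using a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


ACTIONS: List[str] = ["summary", "dead-functions", "callers", "trace", "diff", "audit", "taint"]


def route(token: str) -> str:
    """Pick the action name closest to a (possibly misspelled) token."""
    return min(ACTIONS, key=lambda name: levenshtein(token, name))


print(route("calers"), route("tant"))
```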

symex-concolic — concolic execution driver

arXiv 1205.4951. Combines concrete + symbolic execution. Drives test inputs by negating path conditions to explore new branches.

$ codemap symex-concolic --dir ./my-repo --json --quiet --target validate_input
{"ok":true,"result":{"paths":[{"condition":"x > 0","example_input":"x=1"}]}}

Use when: generating test inputs that achieve branch coverage on a target function.

pointer-analysis — Andersen field-sensitive PA

Computes points-to sets (which pointers can alias which memory). Field-sensitive + flow-insensitive + Tarjan SCC pre-pass for performance.

$ codemap pointer-analysis --dir ./my-repo --json --quiet
{"ok":true,"result":{"scope_vars":102000,"copy_constraints":132000,
  "aliases":[{"ptr":"p","may_alias":["a","b"]}]}}

Use when: understanding aliasing for refactoring (rename a field safely), upstream of taint analysis.

abstract-interp — composable abstract-domain analyzer

arXiv 1309.5133. Computes invariants like "x is positive" over abstract states. Sign + parity domains shipped; user-pluggable.

$ codemap abstract-interp --dir ./my-repo --json --quiet --target check_bounds
{"ok":true,"result":{"invariants":[{"var":"i","sign":"pos","parity":"any"}]}}

Use when: proving safety properties (overflow-free arithmetic, non-null pointers).
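
The sign domain mentioned above can be shown in a few lines: each variable is abstracted to neg, zero, pos, or any, and arithmetic is evaluated on those abstract values. This is a textbook sketch with hypothetical names, not codemap's analyzer.

```python
NEG, ZERO, POS, ANY = "neg", "zero", "pos", "any"


def abstract_mul(a: str, b: str) -> str:
    """Sign of a product."""
    if ZERO in (a, b):
        return ZERO            # anything times zero is zero
    if ANY in (a, b):
        return ANY
    return POS if a == b else NEG


def abstract_add(a: str, b: str) -> str:
    """Sign of a sum; mixed signs lose precision."""
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    return a if a == b else ANY


# Invariant for x * x + 1 when x is known to be positive.
x = POS
print(abstract_add(abstract_mul(x, x), POS))
```

The domain is sound but imprecise: pos + neg evaluates to any, because the sign of the result depends on magnitudes the abstraction does not track.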

loop-polyhedral — polyhedral iteration-domain classifier

Feautrier 1996 / Bondhugula 2008. Classifies loops as affine / non-affine / parallelizable / vectorizable.

$ codemap loop-polyhedral --dir ./my-repo --json --quiet
{"ok":true,"result":{"loops":[{"file":"matmul.c","line":12,"class":"affine","parallel":true}]}}

Use when: identifying loop-optimization opportunities before manual vectorization.

gpu-analyze — CUDA kernel triage

arXiv 2604.14825 Nautilus. Memory-bound vs compute-bound vs warp-divergence triage on CUDA kernels.

$ codemap gpu-analyze --dir ./kernels --json --quiet
{"ok":true,"result":{"kernels":[{"name":"gemm","class":"compute-bound","warp_divergence":"low"}]}}

Use when: GPU kernel optimization priority (don't tune memory if compute-bound, etc.).

side-channel-detect — speculative-execution / cache-oracle finder

arXiv 2301.03724. Detects code patterns vulnerable to Spectre-class timing attacks (branch-on-secret + dependent memory access).

$ codemap side-channel-detect --dir ./my-repo --json --quiet
{"ok":true,"result":{"findings":[{"file":"crypto.c","line":48,"kind":"branch_on_secret"}]}}

Use when: auditing crypto / privileged code for timing leaks.

semantic-slice — LLM-augmented backward slice

arXiv 2507.18957 SLICEMATE. Static slice + LLM refinement.

$ codemap semantic-slice --dir ./my-repo --json --quiet --var 'auth_token'
{"ok":true,"result":{"slice":[...],"llm_refinement":"sanitization missing on line 88"}}

Use when: chasing a bug — narrow the code that influences a sink with LLM help.

symex-speculative — speculative-decoding symex

arXiv 2203.16487. Faster symbolic execution via draft-model speculation.

$ codemap symex-speculative --dir ./my-repo --json --quiet --target parse
{"ok":true,"result":{"paths_explored":42,"speculation_accept_rate":0.71}}

Use when: faster symex when willing to trade some completeness for speed.

cegio — counterexample-guided inductive optimization

arXiv 1704.03738. Given taint paths, synthesizes the minimum input that triggers a vulnerability.

$ codemap cegio --dir ./my-repo --json --quiet --taint-result <prior-taint-output>
{"ok":true,"result":{"trigger":{"input":"' OR 1=1--","reaches_sink":true}}}

Use when: turning a taint finding into a proof-of-concept exploit input.

synthesize — example-guided program synthesis

arXiv 1702.06334. Given input/output examples, generates code that produces the mapping. Static-pruned for performance.

$ codemap synthesize --json --quiet --examples '[(1,1),(2,4),(3,9)]'
{"ok":true,"result":{"program":"fn f(x) { x * x }"}}

Use when: spec-by-example, generating boilerplate from samples.

detect-memory-corruption — Veritas-light corruption finder

arXiv 2605.15097. Static detection of double-free, use-after-free, buffer overflow.

$ codemap detect-memory-corruption --dir ./my-repo --json --quiet
{"ok":true,"result":{"findings":[{"kind":"use_after_free","file":"alloc.c","line":42}]}}

Use when: C/C++ codebase audit for memory-safety bugs.

neural-decompile — Decaf decompile-compile-verify

arXiv 2605.11501. Decompiles a binary function via neural model, recompiles, checks semantic equivalence.

$ codemap neural-decompile ./sample.exe --json --quiet --fn 0x401000
{"ok":true,"result":{"decompiled":"int main() { ... }","recompile_match":true}}

Use when: stripped-binary RE, want approximate source.

patch-binary — SCRIBE vulnerability-fix recipe synthesizer

arXiv 2605.02121. Given a CVE/vuln location in a binary, generates patch instructions.

$ codemap patch-binary ./vuln.exe --json --quiet --cve CVE-2024-12345
{"ok":true,"result":{"patch_recipe":[{"offset":"0x401050","bytes":"90 90 90"}]}}

Use when: offensive/defensive binary patching when source unavailable.


Composite workflows

audit — kitchen-sink security report

See "Data flow & security" section above.

validate — sanity check (build + lint + tests + audit summary)

Single composite for "is this repo broken?"

changeset — file-grouped diff summary

$ codemap changeset --dir ./my-repo --json --quiet HEAD~10 HEAD
{"ok":true,"result":{"changes":{"feat":[...],"fix":[...],"refactor":[...]}}}

handoff — generate handoff document for a project

Distills repo state into a single MD doc (status + open issues + recent work + next-steps).

pipeline — multi-action pipeline runner

Run several actions in sequence, accumulate results.

$ codemap pipeline --dir ./my-repo --json --quiet --target 'audit:./,trace:main,hotspots:'
{"ok":true,"result":{"audit":{...},"trace":{...},"hotspots":{...}}}

Use when: scripted multi-step analysis.
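
The --target value in the example reads as a comma-separated list of action:argument pairs, with an empty argument allowed. A sketch of splitting such a spec, assuming that reading is correct and that arguments never contain commas; the helper name is hypothetical.

```python
from typing import List, Optional, Tuple


def parse_pipeline(spec: str) -> List[Tuple[str, Optional[str]]]:
    """Split 'action:arg,action:arg' into (action, arg-or-None) steps."""
    steps: List[Tuple[str, Optional[str]]] = []
    for item in spec.split(","):
        action, _, arg = item.partition(":")  # split on the first colon only
        steps.append((action, arg or None))
    return steps


print(parse_pipeline("audit:./,trace:main,hotspots:"))
```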


Architecture (1-paragraph)

codemap walks --dir, parses with tree-sitter, builds a file-level import graph and a function-level call graph, layers PE/ELF/Mach-O/WASM/Java binary parsers + x86/x64 disassembly, and exposes ~500 actions through a uniform CLI registry (inventory::submit!). Cache: .codemap/cache.bincode next to the scanned dir. Pure static. No daemons, no network access at analysis time.

Repo layout

  • codemap-core/ — parsing, graph, algorithms, actions
  • codemap-cli/ — the codemap binary
  • codemap-napi/ — Node.js bindings (optional)
  • docs/ — REFERENCE.md, ACTION_CATALOG.md, SCHEMAS.md, HUMAN.md
  • install.sh — single install entry

License

MIT. See LICENSE.
