sckwokyboom/Call-Graph-For-Java


jcg

jcg builds static Java call graphs, aggregates them across hierarchy levels, runs clustering, and serves an interactive graph UI.

ui_example.png

MVP status

This implementation provides:

  • CLI: analyze, serve, cluster, export
  • Stage-based pipeline with manifests and cache reuse
  • Build-system detection (gradle, maven, fallback)
  • Fast static call graph extraction from source (approximate)
  • Aggregated graphs: method/class/package/module
  • Clustering algorithm names: leiden, louvain, infomap, lpa (with deterministic fallback)
  • UI + REST API for exploration and reclustering
  • LLM retrieval artifacts (context_index.jsonl, cluster summaries, entrypoints)
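The aggregation step in the list above can be pictured as collapsing method-level edges into weighted class-level edges. The following is a minimal sketch of that idea, not jcg's actual implementation: the `pkg.Class#method` identifier format and the function names are assumptions made for illustration.

```python
from collections import Counter


def class_of(method_id: str) -> str:
    """Return the class part of a 'pkg.Class#method' id (hypothetical id format)."""
    return method_id.split("#", 1)[0]


def aggregate_to_classes(method_edges):
    """Collapse (caller, callee) method edges into weighted class-level edges."""
    weights = Counter()
    for caller, callee in method_edges:
        src, dst = class_of(caller), class_of(callee)
        if src != dst:  # this sketch drops calls that stay inside one class
            weights[(src, dst)] += 1
    return dict(weights)


edges = [
    ("app.Service#run", "app.Repo#load"),
    ("app.Service#stop", "app.Repo#close"),
    ("app.Service#run", "app.Service#validate"),
]
print(aggregate_to_classes(edges))  # {('app.Service', 'app.Repo'): 2}
```

The same pattern would apply one level up (class to package) by swapping the key function.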

Install

pip install -e .

or run directly:

PYTHONPATH=src python -m jcg.cli --help

Quickstart

Analyze a local project and serve the UI:

jcg analyze --path /path/to/java-repo --mode class --serve --port 8765 --open

Analyze a remote repo URL:

jcg analyze --repo https://github.com/org/project --mode package --cluster-algos leiden,louvain

Serve an existing analysis:

jcg serve --input out/<project_fingerprint>

Recluster an existing graph:

jcg cluster --input out/<project_fingerprint> --level class --algo leiden,louvain --resolution 1.2 --seed 7

Export graphs:

jcg export --input out/<project_fingerprint> --format graphml --level all

Security & Data Egress

jcg is local-first, but some modes require network access.

What stays local:

  • Static extraction, aggregation, clustering, and artifact generation run locally.
  • Analysis outputs are written to local disk under out/<project_fingerprint>/.
  • The built-in API/UI server binds to 127.0.0.1 (localhost) only.
  • This package has no telemetry/analytics dependencies in pyproject.toml.

Where network access can happen:

  • jcg analyze --repo ... runs git clone/fetch/pull against a remote repo.
  • Gradle/Maven build steps may access remote artifact repositories and execute project build logic/plugins.
  • The web UI currently loads Cytoscape from CDN: https://unpkg.com/cytoscape@3.30.2/dist/cytoscape.min.js.

Recommended hardening for internal/company projects:

  • Prefer local source input: jcg analyze --path /path/to/repo.
  • Avoid executing project build scripts when not required: --build-system none.
  • Run in a restricted network environment (egress firewall, sandbox/container) when handling sensitive code.
  • Use Maven/Gradle offline modes and pre-populated local caches if you need build-assisted analysis.
  • Vendor UI assets locally (replace CDN script with a local file) if a fully offline UI is required.

Threat model note:

  • jcg itself does not implement explicit upload of analysis artifacts.
  • If you analyze untrusted repos with build execution enabled, treat Gradle/Maven build scripts/plugins as arbitrary code from a security perspective.

Output layout

Each run writes:

out/<project_fingerprint>/
  run_manifest.json
  analysis_index.json
  stage-01-acquire/
  stage-02-build/
  stage-03-extract/
  stage-04-aggregate/
  stage-05-cluster/
  exports/

Important files:

  • stage-03-extract/method_graph.json
  • stage-04-aggregate/class_graph.json
  • stage-04-aggregate/package_graph.json
  • stage-04-aggregate/module_graph.json
  • stage-05-cluster/clusters/<level>_<algo>_res...json
  • stage-05-cluster/node2cluster/<level>_<algo>_res...json
  • stage-05-cluster/cluster_graph/<level>_<algo>_res...json
  • exports/context_index.jsonl
  • exports/cluster_summaries.json
  • exports/entrypoints.json
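For scripting against this layout, a small helper can enumerate the stage directories and locate cluster result files. This is a sketch under stated assumptions: the function names are hypothetical, and the part of the file name after `res` is matched with a wildcard because the exact suffix is not spelled out here.

```python
from pathlib import Path


def list_stage_dirs(run_dir):
    """Return the names of stage-* directories inside a run directory, sorted."""
    return sorted(p.name for p in Path(run_dir).glob("stage-*") if p.is_dir())


def find_cluster_files(run_dir, level, algo):
    """Find cluster result files for one level/algorithm pair.

    The suffix after 'res' is left to the glob; its real format is not assumed.
    """
    cluster_dir = Path(run_dir) / "stage-05-cluster" / "clusters"
    return sorted(cluster_dir.glob(f"{level}_{algo}_res*.json"))
```

For example, `find_cluster_files("out/<project_fingerprint>", "class", "louvain")` would return every class-level Louvain result stored for that run, one per resolution that was computed.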

Build detection and degraded modes

  • --build-system auto detects Gradle/Maven by project files.
  • If the build fails, jcg falls back to compiling with javac.
  • If compilation is not possible, extraction still runs from source roots in degraded mode.
  • run_manifest.json records quality as full, partial, or degraded.
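A downstream script may want to check the recorded quality before trusting a graph. The sketch below assumes the value is stored under a top-level `quality` key in `run_manifest.json`; the key location and the helper names are assumptions, so check an actual manifest before relying on them.

```python
import json
from pathlib import Path


def read_quality(run_dir):
    """Read the run quality from run_manifest.json.

    ASSUMPTION: the value lives under a top-level "quality" key.
    Returns "unknown" if the key is absent.
    """
    manifest_path = Path(run_dir) / "run_manifest.json"
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    return manifest.get("quality", "unknown")


def needs_review(quality):
    """Flag anything other than a full-quality run for manual review."""
    return quality != "full"
```

A `partial` or `degraded` run can still be useful for exploration, but edges may be missing, so treating it as needing review is a conservative default.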

API endpoints

  • GET /api/meta
  • GET /api/graphs?level=class&min_weight=2
  • GET /api/clusters?level=class&algo=louvain&resolution=1.0&seed=42
  • POST /api/cluster/recompute
  • GET /api/node/<id>?level=class
  • GET /api/cluster/<id>?level=class&algo=louvain&resolution=1.0&seed=42
  • GET /api/export/context?cluster_id=c0001
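The GET endpoints above can be called from a script while `jcg serve` is running. This is a minimal sketch using only the standard library; it assumes the server listens on port 8765 (the port used in the Quickstart) and that responses are JSON, neither of which is guaranteed for every setup.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8765"  # ASSUMPTION: port taken from the Quickstart example


def api_url(path, **params):
    """Build a full URL for an API path with optional query parameters."""
    query = urlencode(params)
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"


def get_json(path, **params):
    """GET an endpoint and decode the body as JSON (JSON response is assumed)."""
    with urlopen(api_url(path, **params), timeout=10) as resp:
        return json.load(resp)


print(api_url("/api/graphs", level="class", min_weight=2))
# http://127.0.0.1:8765/api/graphs?level=class&min_weight=2
```

With the server up, `get_json("/api/meta")` would fetch the metadata endpoint. The request body for `POST /api/cluster/recompute` is not documented here, so it is left out of the sketch.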

FAQ: call-graph limitations

  • Reflection, dynamic proxies, AOP weaving, and runtime codegen are not fully captured.
  • The default extractor is a fast static approximation from source; it is not a sound whole-program analysis.
  • Framework lifecycle callbacks may be under-approximated.

Recommended settings for large repos

  • Start with --mode package or --mode module.
  • Use filtering (--exclude-packages) to remove utility-heavy namespaces.
  • Raise the minimum edge weight in the UI before expanding details.
  • Recluster at aggregated levels first (package, then class).
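To see why raising the minimum edge weight helps on large graphs, consider the effect of a threshold on a weighted edge list. The sketch below is illustrative only: the `(source, target, weight)` tuple shape and the inclusive comparison are assumptions, not a description of how the UI or the `min_weight` API parameter is implemented.

```python
def filter_by_weight(edges, min_weight):
    """Keep edges whose weight is at least min_weight (inclusive threshold assumed)."""
    return [(src, dst, w) for src, dst, w in edges if w >= min_weight]


edges = [("a", "b", 1), ("a", "c", 3), ("b", "c", 2)]
print(filter_by_weight(edges, 2))  # [('a', 'c', 3), ('b', 'c', 2)]
```

Low-weight edges are often one-off calls into shared utilities, so dropping them first tends to leave the structurally important dependencies visible.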

Tests

Run:

PYTHONPATH=src python -m unittest discover -s tests -p 'test_*.py'
