Large Language Models are revolutionizing code completion and vulnerability detection, but they share a critical bottleneck: context limits and noise. You cannot feed an entire repository into an LLM, and naive text-retrieval (RAG) often pulls in irrelevant code that confuses the model.
CodeGraphene solves this through Semantic Compression. It parses codebases into mathematical Code Property Graphs (CPGs), slices out only the structurally and semantically relevant code, and serializes it into LLM-ready prompts.
- Semantic Compression: Reduce LLM prompt sizes while retaining critical structural context.
- Variable Granularity: Analyze code at the LINE, METHOD, or FILE level natively.
- Highly Modular Architecture: Hot-swap Parsers, Trimmers, and Serializers to run ablation studies in minutes.
- Parser Agnostic: Currently powered by Joern, with an architecture built to easily support Tree-sitter or other AST extractors in the future.
CodeGraphene is built around a simple, 3-stage pipeline: Parse -> Trim -> Serialize.
from codegraphene import GraphPipeline, NodeGranularity
from codegraphene.parsers.joern import JoernParser
from codegraphene.trimmers.khop import KHopTrimmer
from codegraphene.serializers.text import CodeReconstructionSerializer
# 1. Configure the modular pipeline
pipeline = GraphPipeline(
    parser=JoernParser(granularity=NodeGranularity.LINE),
    trimmer=KHopTrimmer(hops=1),  # Extract a 1-hop structural neighborhood
    serializer=CodeReconstructionSerializer(granularity=NodeGranularity.LINE),
)
# 2. Extract highly-compressed context for a specific target
# (e.g., finding the context around line 30 in a target file)
llm_prompt = pipeline.run(file_path="examples/sample_code.py", target=30)
print(llm_prompt)

The best way to understand CodeGraphene is to see it in action. We have provided a suite of interactive Jupyter Notebooks in the /examples directory to walk you through the framework.
00_quickstart_pipeline.ipynb
The "Hello World" of CodeGraphene. Learn how the Parser, Trimmer, and Serializer fit together to compress a single Python file.

01_granularities.ipynb

Discover how swapping the NodeGranularity config (LINE, METHOD, FILE) changes the shape of the graph and the resulting LLM prompt.

03_exploring_raw_cpgs.ipynb
Dive under the hood to analyze the raw Code Property Graph output generated by Joern before CodeGraphene filters it.
CodeGraphene relies on Joern for CPG extraction. Because Joern is highly optimized for Unix environments, we strongly recommend running CodeGraphene on Linux or Windows Subsystem for Linux (WSL 2).
Ensure you have Java (JDK 11 or 17) and unzip installed, then run the official Joern installer:
curl -L "https://github.com/joernio/joern/releases/latest/download/joern-install.sh" | sh

Make sure the directory containing the joern executable is accessible in your system PATH (usually ~/bin).
Clone the repository and install it in editable mode:
git clone https://github.com/stg-tud/CodeGraphene.git
cd CodeGraphene
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

CodeGraphene enforces strict decoupling to allow researchers to easily experiment with graph-augmented LLM strategies:
- Parsers: Ingest raw code and convert it into a standardized CodeGraph (a NetworkX MultiDiGraph). They handle granularity collapsing (e.g., merging AST tokens into LINE or METHOD nodes).
- Trimmers: Accept a massive graph and a target node, returning a minimal, context-rich subgraph.
- Serializers: Convert the optimized subgraph into a format the LLM can understand, such as reconstructed sequential code text.
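To make the Trimmer contract concrete, the k-hop neighborhood extraction that a trimmer like KHopTrimmer performs can be sketched as a breadth-first expansion. The function and toy graph below are hypothetical illustrations: they operate on a plain adjacency dict rather than CodeGraphene's actual NetworkX MultiDiGraph, and do not reflect the library's real internals.

```python
from collections import deque


def k_hop_subgraph(adjacency, target, hops):
    """Return all nodes reachable from `target` within `hops` edges.

    `adjacency` maps each node to the nodes its outgoing edges point to.
    A real CPG trimmer would also carry edge labels (AST/CFG/DFG) along.
    """
    seen = {target}
    frontier = deque([(target, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # Do not expand past the hop budget
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


# Toy graph: nodes are line numbers, edges are flow relations (made up)
graph = {10: [11, 30], 11: [12], 30: [31], 31: [32]}
print(sorted(k_hop_subgraph(graph, target=30, hops=1)))  # [30, 31]
```

A 1-hop budget keeps only the immediate structural neighbors of the target, which is how the subgraph stays small enough to serialize into a compact LLM prompt.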
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.