PtrTrans is a C-to-Rust translation framework that leverages Knowledge Graph (KG) construction and SVF-based pointer analysis to generate safe, idiomatic Rust code from C projects. Unlike naive LLM-based translation, PtrTrans integrates static analysis results (pointer ownership, mutability, nullability, and aliasing information) into the translation prompts, enabling the LLM to produce Rust code that respects Rust's ownership and borrowing rules.

Repository layout:

    PtrTrans-C2Rust/
    ├── README.md
    ├── dataset/
    │   ├── crown_dataset/  # Source C projects from Crown benchmark
    │   │   ├── avl/  # AVL tree implementation
    │   │   ├── binn/  # Binary serialization library
    │   │   ├── bst/  # Binary search tree
    │   │   ├── buffer/  # Buffer management
    │   │   ├── bzip2/  # bzip2 compression
    │   │   ├── genann/  # Neural network library
    │   │   ├── heman/  # Heightmap utilities
    │   │   ├── ht/  # Hash table
    │   │   ├── json_h/  # JSON parser
    │   │   ├── libtree/  # Tree data structure
    │   │   ├── libzahl/  # Big integer library
    │   │   ├── lil/  # Scripting language
    │   │   ├── lodepng/  # PNG encoder/decoder
    │   │   ├── quadtree/  # Quadtree spatial index
    │   │   ├── rgba/  # RGBA image processing
    │   │   └── urlparser/  # URL parser
    │   ├── parsed_projects/  # Pre-parsed project metadata (entities, relationships, call graphs)
    │   ├── Trans_C-Rust-KG/  # Translation results with full KG + pointer analysis (PtrTrans)
    │   ├── Trans_not_PA/  # Ablation: translation without pointer analysis
    │   ├── Trans_not_PU/  # Ablation: translation without pointer-usage context
    │   └── Trans_not_RA/  # Ablation: translation without Rust-oriented annotation
    └── script/
        ├── main.py  # Main entry point for the translation pipeline
        ├── generator.py  # LLM API wrapper (OpenAI GPT-4 / local models via HuggingFace)
        ├── translator.py  # Prompt construction and response extraction for translation
        ├── handcraftPrompt.py  # All prompt templates for translation, error fixing, etc.
        ├── KG_construction.py  # Knowledge Graph construction pipeline
        ├── slicer.py  # Code slicing and call graph extraction via Tree-sitter + LSP
        ├── SA/  # Static Analysis module (SVF-based)
        │   └── backup/
        │       ├── PA_func.cpp  # SVF pointer analysis for function parameters
        │       ├── PA_struct.cpp  # SVF pointer analysis for struct fields
        │       └── run.sh  # Build & run script for SVF analysis
        └── utils/
            ├── c_parser.py  # C code parsing utilities (Tree-sitter)
            ├── rust_parser.py  # Rust code parsing utilities (Tree-sitter)
            ├── doxygen_extractor.py  # Doxygen XML parser for call graph extraction
            ├── macro_expand.py  # Macro expansion via Clang preprocessor
            ├── header_extractor.py  # Header file content extraction
            ├── extract_cf.py  # Control flow extraction
            ├── git_manage.py  # Git state management for rollback on failure
            ├── misc_utils.py  # Miscellaneous utilities
            └── Doxyfile  # Doxygen configuration template

System dependencies:

| Dependency | Version | Purpose |
|---|---|---|
| LLVM/Clang | 14.0.6 | C-to-LLVM-IR compilation, macro expansion |
| SVF | 2.9 | Static Value-Flow Analysis for pointer analysis |
| Doxygen | ≥ 1.9 | Call graph extraction from C source code |
| jsoncpp | (system package) | JSON output from SVF analysis programs |
| z3 | (system package) | SMT solver required by SVF |
| GCC | ≥ 9.0 | Compilation verification |
| pkg-config | (system package) | Build configuration for jsoncpp |

Python packages:

| Package | Version | Purpose |
|---|---|---|
| Python | ≥ 3.9 | Runtime |
| tree-sitter | 0.20.1 | C and Rust source code parsing |
| openai | (legacy API, v0.x) | OpenAI GPT API access |
| tiktoken | ≥ 0.5 | Token counting for GPT models |
| transformers | ≥ 4.30 | Local LLM support (LLaMA, etc.) |
| torch | ≥ 2.0 | PyTorch backend for local models |
| langchain | ≥ 0.1 | Prompt template formatting |
| tqdm | ≥ 4.60 | Progress bars |
| monitors4codegen | (multilspy) | Language Server Protocol client for code navigation |

Ubuntu/Debian:

    # LLVM 14
    sudo apt-get install clang-14 llvm-14 llvm-14-dev llvm-14-tools
    # Doxygen
    sudo apt-get install doxygen
    # jsoncpp and z3
    sudo apt-get install libjsoncpp-dev libz3-dev
    # pkg-config
    sudo apt-get install pkg-config

SVF must be built from source and placed under dependencyLib/SVF-SVF-2.9:

    # Download SVF 2.9
    wget https://github.com/SVF-tools/SVF/archive/refs/tags/SVF-2.9.tar.gz
    tar xzf SVF-2.9.tar.gz
    # Create the target directory first so that mv does not fail
    mkdir -p dependencyLib
    mv SVF-SVF-2.9 dependencyLib/
    # Build SVF
    cd dependencyLib/SVF-SVF-2.9
    ./build.sh

After building, ensure the following files exist:

- dependencyLib/SVF-SVF-2.9/Release-build/svf/libSvfCore.a
- dependencyLib/SVF-SVF-2.9/Release-build/svf-llvm/libSvfLLVM.a
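The check above can be automated with a few lines of Python. This is a minimal sketch, not part of PtrTrans: the helper name `missing_svf_artifacts` is hypothetical, and it assumes the script is run from the directory that contains `dependencyLib/`.

```python
from pathlib import Path

# Static libraries the SVF build is expected to produce (paths as listed above).
SVF_ARTIFACTS = [
    "dependencyLib/SVF-SVF-2.9/Release-build/svf/libSvfCore.a",
    "dependencyLib/SVF-SVF-2.9/Release-build/svf-llvm/libSvfLLVM.a",
]


def missing_svf_artifacts(root="."):
    """Return the expected SVF libraries that are not present under `root`."""
    return [rel for rel in SVF_ARTIFACTS if not (Path(root) / rel).is_file()]


if __name__ == "__main__":
    missing = missing_svf_artifacts()
    if missing:
        print("SVF build looks incomplete; missing:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Both SVF libraries were found.")
```

If either library is reported missing, rerunning `./build.sh` and reading its output is the first thing to try.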

Install the Python packages (the `openai` pin keeps the legacy v0.x API that the scripts use):

    pip install tree-sitter==0.20.1 "openai<1.0" tiktoken transformers torch langchain tqdm
    pip install monitors4codegen

Tree-sitter language parsers need to be pre-built and placed under dependencyLib/:

    # Build C parser
    git clone https://github.com/tree-sitter/tree-sitter-c.git
    cd tree-sitter-c
    # Build the shared library (c_parser.so) and place it in dependencyLib/

Edit script/generator.py and set your OpenAI API key:

    OPENAI_API_KEY = "your-api-key-here"
    openai.api_base = "https://api.openai.com/v1"  # or your proxy endpoint

Run the pipeline from the script directory:

    cd script
    # Run PtrTrans (full pipeline with KG + pointer analysis)
    python main.py --translate_mode Trans_PA --model_name gpt-4o-2024-11-20
    # Run LLM-only baseline (no program analysis)
    python main.py --translate_mode LLM_only --model_name gpt-4o-2024-11-20
    # Ablation: without pointer-usage context
    python main.py --translate_mode Trans_not_PU --model_name gpt-4o-2024-11-20
    # Ablation: without Rust-oriented annotation
    python main.py --translate_mode Trans_not_RA --model_name gpt-4o-2024-11-20

| Mode | Description |
|---|---|
| `Trans_PA` | Full PtrTrans: KG construction + SVF pointer analysis + Rust annotation |
| `LLM_only` | Baseline: LLM translation without any program analysis |
| `Trans_not_PU` | Ablation: no pointer-usage context in prompts |
| `Trans_not_RA` | Ablation: no Rust-oriented annotation in prompts |
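To make the difference between the modes concrete, the sketch below shows one way a mode could gate which analysis results reach the prompt. Everything in it is an assumption for illustration: the `MODE_SECTIONS` mapping is inferred from the descriptions above, and `PointerFacts`, `build_prompt`, the prompt wording, and the fact-to-type mapping (`Box<T>`, `&mut T`, `&T`, `Option<...>`) are hypothetical rather than taken from `translator.py` or `handcraftPrompt.py`.

```python
from dataclasses import dataclass

# Which optional prompt sections each mode enables. Inferred from the mode
# descriptions; not read from the PtrTrans source.
MODE_SECTIONS = {
    "Trans_PA": {"pointer_usage": True, "rust_annotation": True},
    "Trans_not_PU": {"pointer_usage": False, "rust_annotation": True},
    "Trans_not_RA": {"pointer_usage": True, "rust_annotation": False},
    "LLM_only": {"pointer_usage": False, "rust_annotation": False},
}


@dataclass
class PointerFacts:
    """Analysis results for one pointer parameter (hypothetical schema)."""
    name: str
    owning: bool
    mutable: bool
    nullable: bool

    def usage_line(self) -> str:
        """Describe how the C code uses the pointer."""
        kind = "owning" if self.owning else "borrowed"
        mut = "mutable" if self.mutable else "immutable"
        null = "may be NULL" if self.nullable else "never NULL"
        return f"{self.name}: {kind}, {mut}, {null}"

    def rust_hint(self) -> str:
        """Suggest a candidate Rust type for the pointer."""
        if self.owning:
            ty = "Box<T>"
        elif self.mutable:
            ty = "&mut T"
        else:
            ty = "&T"
        if self.nullable:
            ty = f"Option<{ty}>"
        return f"{self.name}: consider {ty}"


def build_prompt(mode: str, c_code: str, facts: list) -> str:
    """Assemble a translation prompt with the sections the mode allows."""
    sections = MODE_SECTIONS[mode]
    parts = ["Translate the following C function to safe Rust.", c_code]
    if sections["pointer_usage"] and facts:
        parts.append("Pointer usage:\n" + "\n".join(f.usage_line() for f in facts))
    if sections["rust_annotation"] and facts:
        parts.append("Suggested Rust types:\n" + "\n".join(f.rust_hint() for f in facts))
    return "\n\n".join(parts)


if __name__ == "__main__":
    facts = [PointerFacts("node", owning=False, mutable=True, nullable=True)]
    print(build_prompt("Trans_PA", "void touch(struct node *node);", facts))
```

Under these assumptions, `LLM_only` yields only the instruction and the C code, while each ablation mode drops exactly one of the two analysis sections.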

Command-line arguments of main.py:

| Argument | Default | Description |
|---|---|---|
| `--model_name` | `gpt-4o-2024-11-20` | LLM model name (GPT-4, GPT-3.5, or local model) |
| `--model_path` | `""` | Path to local model weights (for non-GPT models) |
| `--root_dir` | `../Code_Package` | Root directory of the project |
| `--translate_mode` | `Trans_not_RA` | Translation mode (see table above) |
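For reference, the documented arguments and defaults could be declared with `argparse` roughly as follows. This is a sketch under the assumption that `main.py` uses `argparse`; the function name `build_arg_parser` and the help strings are illustrative, and the real script may define additional options.

```python
import argparse


def build_arg_parser():
    """Declare the documented command-line arguments with their defaults."""
    parser = argparse.ArgumentParser(description="C-to-Rust translation pipeline (sketch)")
    parser.add_argument("--model_name", default="gpt-4o-2024-11-20",
                        help="LLM model name (GPT-4, GPT-3.5, or local model)")
    parser.add_argument("--model_path", default="",
                        help="Path to local model weights (for non-GPT models)")
    parser.add_argument("--root_dir", default="../Code_Package",
                        help="Root directory of the project")
    # No `choices=` restriction here: the full set of accepted modes is not
    # confirmed by this document.
    parser.add_argument("--translate_mode", default="Trans_not_RA",
                        help="Translation mode, e.g. Trans_PA or LLM_only")
    return parser


if __name__ == "__main__":
    args = build_arg_parser().parse_args()
    print(args.translate_mode, args.model_name)
```

Note that the documented default mode is `Trans_not_RA`, so the full pipeline has to be requested explicitly with `--translate_mode Trans_PA`.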

The translation pipeline consists of the following stages:

- Macro Expansion (`KG_construction.py` → `macro_expand.py`)
  - Expands C macros using the Clang preprocessor
  - Tags system vs. local code origins
- Knowledge Graph Construction (`KG_construction.py` → `doxygen_extractor.py`)
  - Extracts entities (functions, structs, enums, variables) via Doxygen
  - Builds call graph relationships
  - Performs topological sort for translation ordering
- SVF Pointer Analysis (`SA/backup/run.sh` → `PA_func.cpp`, `PA_struct.cpp`)
  - Compiles C source to LLVM IR via `clang-14`
  - Links all IR files via `llvm-link-14`
  - Runs SVF-based analysis for:
    - Ownership inference (Owning vs. Borrowed)
    - Mutability analysis (Mutable vs. Immutable)
    - Nullability detection
    - Alias analysis between function parameters
    - Struct field usage patterns
- LLM-Based Translation (`translator.py` → `generator.py`)
  - Translates code units in topological order (callees before callers)
  - Injects pointer analysis results into translation prompts
  - Handles `free` operations (memory deallocation → Rust ownership)
- Compilation Verification & Repair (`main.py`)
  - Verifies translated Rust code compiles (`cargo build`)
  - Iterative error-fixing loop (up to 5 attempts)
  - Git-based rollback on persistent failures
  - Stub generation as fallback
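The ordering and repair stages above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in `main.py`: the function names, the callables passed in (`translate`, `compiles`, `fix`, `make_stub`), and the exact retry accounting are hypothetical, Git rollback is omitted, and only direct recursion is handled (mutual recursion would raise `graphlib.CycleError`).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

MAX_REPAIR_ATTEMPTS = 5  # "up to 5 attempts" in the stage list


def translation_order(call_graph):
    """Order functions so that every callee comes before its callers.

    `call_graph` maps each function name to the set of functions it calls.
    Self-edges (direct recursion) are dropped so they do not count as cycles.
    """
    acyclic = {fn: {c for c in callees if c != fn}
               for fn, callees in call_graph.items()}
    return list(TopologicalSorter(acyclic).static_order())


def translate_with_repair(c_code, translate, compiles, fix, make_stub):
    """Translate one unit, retry on compile errors, fall back to a stub.

    `compiles(rust)` returns `(ok, errors)`; `fix(rust, errors)` returns a
    revised translation. Both stand in for LLM and `cargo build` calls.
    """
    rust = translate(c_code)
    for _ in range(MAX_REPAIR_ATTEMPTS):
        ok, errors = compiles(rust)
        if ok:
            return rust
        rust = fix(rust, errors)
    ok, _ = compiles(rust)  # check the result of the last repair
    return rust if ok else make_stub(c_code)


if __name__ == "__main__":
    graph = {
        "main": {"insert", "find"},
        "insert": {"rotate", "insert"},  # insert calls itself
        "find": set(),
        "rotate": set(),
    }
    print(translation_order(graph))
```

Translating callees first means that, by the time a caller is translated, the Rust signatures of everything it calls are already fixed and can be included in its prompt.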

The benchmark uses 16 C projects from the Crown dataset, covering diverse domains:

| Project | Domain |
|---|---|
| avl | AVL tree data structure |
| binn | Binary serialization |
| bst | Binary search tree |
| buffer | Buffer management |
| bzip2 | Data compression |
| genann | Neural network |
| heman | Heightmap processing |
| ht | Hash table |
| json_h | JSON parsing |
| libtree | Tree data structure |
| libzahl | Big integer arithmetic |
| lil | Scripting language interpreter |
| lodepng | PNG image codec |
| quadtree | Spatial index |
| rgba | Image color processing |
| urlparser | URL parsing |