LLVM-based compiler project for inserting and optimizing array bounds checks in C programs.
This project implements an LLVM IR transformation pipeline that detects array accesses, inserts runtime lower-bound and upper-bound checks, and then removes redundant checks using compiler optimization techniques inspired by Gupta and Wolfe.
Full project report: report.pdf
Array bounds checking improves memory safety by ensuring that array accesses stay within valid bounds. However, inserting checks before every array access can introduce significant runtime overhead, especially in loop-heavy programs.
This project studies that tradeoff by implementing:
- automatic array bounds-check insertion in LLVM IR
- Gupta-style redundant bounds-check elimination
- Wolfe-style implication-aware optimization using PRE-style reasoning
- benchmark evaluation using runtime, code size, and number of checks removed
The project was built as part of Georgia Tech CS 6241: Compiler Design.
The compiler pipeline consists of four LLVM passes:
ArrayAccessDetectionBoundCheckInsertionBoundCheckOptimizationValueMetadataRemoval
The high-level optimized pass pipeline is:
mem2reg → access-det → check-ins → check-opt → valuemd-rem
For the unoptimized bounds-checking baseline, the optimization pass is skipped:
mem2reg → access-det → check-ins → valuemd-rem
LLVM lowers C array accesses into pointer arithmetic and memory operations. For example:
int a[10];
a[5] = 1;is represented in LLVM IR using alloca, getelementptr, and store instructions:
%3 = alloca [10 x i32], align 16
%12 = getelementptr inbounds [10 x i32], ptr %3, i64 0, i64 5
store i32 1, ptr %12, align 4The ArrayAccessDetection pass identifies potential array accesses and attaches metadata to the corresponding getelementptr instructions. The BoundCheckInsertion pass then uses this metadata to insert two runtime checks before each detected memory operation:
- a lower-bound check
- an upper-bound check
Each inserted check records metadata such as:
- check kind
- array kind
- access ID
- source line
- load/store information
The main optimization logic is implemented in BoundCheckOptimization.cpp.
The optimization mode is selected using the BC_METHOD environment variable.
Gupta mode implements redundancy elimination using dataflow analysis and symbolic reasoning over checks.
It performs:
- local redundancy elimination inside each basic block
- canonical affine check representation, such as
Base + Offset - MCFG-based dataflow analysis
- very-busy check computation
- available check computation
- global check modification
- redundant check elimination
- loop propagation followed by a second elimination pass
Run with:
BC_METHOD=gupta ./install/run_pass.sh <benchmark>Wolfe mode implements an implication-aware optimization strategy inspired by Wolfe’s method for array bounds-check optimization.
It performs:
- ScalarEvolution-based check canonicalization
- grouping checks into families
- construction of a Wolfe-style Check Implication Graph
- anticipatable check analysis
- available check analysis
- check strengthening
- loop-entry check hoisting into loop preheaders
- final redundancy elimination
Run with:
BC_METHOD=wolfe ./install/run_pass.sh <benchmark>From the root of the repository:
mkdir build
cmake \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++ \
-DCMAKE_INSTALL_PREFIX=./install \
-DCMAKE_BUILD_TYPE=Debug \
-B build \
-S . \
-G Ninja
cd build
ninja installAfter the build completes, the installed scripts, passes, and benchmarks will be available under:
install/
To run the bounds-checking pipeline on a benchmark:
./install/run_pass.sh <benchmark>To select a specific optimization mode:
BC_METHOD=gupta ./install/run_pass.sh <benchmark>or:
BC_METHOD=wolfe ./install/run_pass.sh <benchmark>The script generates transformed LLVM bitcode files and corresponding human-readable LLVM IR files.
The project includes both microbenchmarks and larger array-intensive benchmarks.
check_elimination
check_modification
global_1d_array
malloc_1d_array
static_1d_array
bfs
dither
is
jacobi-1d
The microbenchmarks are useful for testing correctness of individual optimization cases. The larger benchmarks are used to evaluate performance impact, code-size reduction, and bounds-check removal on more realistic programs.
After building the repository, benchmarks are installed under:
install/benchmark
The transformed bitcode can be linked with the runtime bounds-checking functions to produce executables. For large benchmarks, input details are available in the corresponding RUN.md files under:
benchmark/large_benchmark/<BENCHMARK_NAME>/RUN.md
The project compares four variants:
- original program
- unoptimized bounds-checked program
- Gupta-optimized bounds-checked program
- Wolfe-optimized bounds-checked program
The evaluation measures:
- median runtime
- LLVM bitcode size
- number of bounds checks before optimization
- number of bounds checks after optimization
- percentage of bounds checks removed
Runtime was measured using hyperfine.
For microbenchmarks:
hyperfine --shell=none --warmup 100 --runs 1000 ./benchmarkFor larger benchmarks:
hyperfine --shell=none --warmup 10 --runs 100 ./benchmarkGupta optimization was the strongest overall optimization mode in the final evaluation.
Across the benchmark suite:
- Gupta removed 199 of 260 checks, or 76.5%
- Wolfe removed 104 of 260 checks, or 40.0%
- Gupta reduced code size by 5.89%
- Wolfe reduced code size by 3.32%
- Gupta was faster on 6 of 9 benchmarks
The largest improvements appeared in loop-heavy benchmarks such as jacobi-1d, is, and dither.
In jacobi-1d, Gupta removed all inserted bounds checks and improved runtime by roughly 88% relative to the unoptimized bounds-checking baseline.
The cost of bounds checking depends heavily on where checks are inserted. In small microbenchmarks, runtime differences were often close to measurement noise. In larger loop-heavy programs, redundant checks appeared on hot paths, so eliminating them produced much larger performance improvements.
The results also show that static check removal does not always translate directly into runtime improvement. Some benchmarks had fewer checks and smaller code size but little speedup, likely because the removed checks were not on the dominant execution path.
The current Wolfe implementation is more conservative than the Gupta implementation and does not yet capture all profitable loop-hoisting and elimination opportunities.
Spill-code statistics were not available in the evaluation setup, so performance regressions are discussed only in terms of observed runtime, check counts, and code-size changes.
Useful LLVM references:
- LLVM Language Reference Manual
- GetElementPtr Instruction
- The Often Misunderstood GEP Instruction
- LLVM Programmer’s Manual
Useful LLVM instructions for this project include:
getelementptrallocaloadstoreinvoke
- Rajiv Gupta, “Optimizing array bound checks using flow analysis”
- Michael Wolfe, bounds-check optimization using PRE-style reasoning
hyperfine
- Sai Hemanth Bheemreddy
- Rishi Khare