Implementation of classical CS algorithms in Lean 4.
There are some very simple unit tests (for sanity check only, as in the future formal proofs will be added). To run them, just run
lake exe testWe compared this Lean 4 implementation against a standard C++ implementation.
Test Environment:
- Input Text: "War and Peace" (approx. 3.2 MB, ASCII filtered)
- Patterns: 1,000 substrings of length 4 generated from the text.
- Metric: Total time and peak memory usage to build the automaton and scan the text.
| Implementation | Matches | Time (s) | Peak Memory | Relative Speed | Relative Memory |
|---|---|---|---|---|---|
| C++ (std::unordered_map) | 496,029 | 0.079s | 9.4 MB | 1.0x (Baseline) | 1.0x |
| Lean 4 (HashMap) | 496,029 | 0.234s | 172.2 MB | 2.96x | 18.2x |
Analysis:
- Speed: Lean performs exceptionally well, trailing optimized C++ by only ~3x. This demonstrates the efficiency of Lean's compiler and reference-counting optimizations for in-place updates.
- Memory: The higher memory usage in Lean is expected due to the overhead of immutable persistent data structures (HAMTs), boxed integers, and the garbage collection runtime, compared to C++'s flat memory layout.
- For the C++ code
g++ -O3 bench.cpp -o bench.out && /usr/bin/time -l ./bench.out rm -rf bench.out - For the lean code
lake build bench && /usr/bin/time -l ./.lake/build/bin/bench