High-precision performance measurement tools for Rust applications requiring nanosecond-level timing accuracy.
Add to your Cargo.toml:
[dependencies]
hft-benchmarks = { path = "../path/to/hft-benchmarks" }

Simple benchmark:
use hft_benchmarks::*;
fn main() {
quick_calibrate_tsc_frequency();
SimpleBench::new("my_function")
.bench(1000, || my_expensive_function())
.report();
}

Output:
my_function: 1000 samples, mean=245ns, p50=230ns, p95=310ns, p99=450ns, p99.9=890ns, std_dev=45.2ns
use hft_benchmarks::*;
// One-time setup (do this once at program start)
calibrate_tsc_frequency();
// Time a single operation
let (result, elapsed_ns) = time_function(|| {
expensive_computation()
});
println!("Operation took {}ns", elapsed_ns);

// Collect multiple measurements for statistical analysis
let mut results = BenchmarkResults::new("algorithm_comparison".to_string());
for _ in 0..1000 {
let timer = PrecisionTimer::start();
your_algorithm();
results.record(timer.stop());
}
let analysis = results.analyze();
println!("{}", analysis.summary());
// Check if performance meets requirements
if analysis.meets_target(100) { // P99 < 100ns
println!("✓ Performance target met");
} else {
println!("✗ Too slow: P99 = {}ns", analysis.p99);
}

use hft_benchmarks::*;
fn main() {
quick_calibrate_tsc_frequency();
// Benchmark old implementation
let old_perf = SimpleBench::new("old_algorithm")
.bench(5000, || old_implementation())
.analyze();
// Benchmark new implementation
let new_perf = SimpleBench::new("new_algorithm")
.bench(5000, || new_implementation())
.analyze();
// Calculate improvement
let speedup = old_perf.mean as f64 / new_perf.mean as f64;
println!("New implementation is {:.1}x faster", speedup);
println!("Old: {}ns P99, New: {}ns P99", old_perf.p99, new_perf.p99);
}

use hft_benchmarks::*;
fn main() {
quick_calibrate_tsc_frequency();
// Run built-in allocation benchmarks
benchmark_allocations(); // Test different allocation sizes
benchmark_object_pools(); // Compare pool vs direct allocation
benchmark_aligned_allocations(); // Test memory alignment impact
}

Example output:
Benchmarking memory allocations (10000 iterations per size)...
allocation_64B: 10000 samples, mean=89ns, p50=70ns, p95=120ns, p99=180ns
allocation_1024B: 10000 samples, mean=145ns, p50=130ns, p95=200ns, p99=280ns
Pool allocation: pool_allocation: 10000 samples, mean=65ns, p50=60ns, p95=85ns, p99=110ns
Direct allocation: direct_allocation: 10000 samples, mean=140ns, p50=130ns, p95=180ns, p99=220ns
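
A gap like the one between the pool and direct figures above is usually attributed to reusing buffers instead of calling the allocator each time. The following is a minimal sketch of that idea, assuming a single-threaded free-list pool. `BufferPool` is a hypothetical type for illustration and is not necessarily the pool that `benchmark_object_pools()` measures.

```rust
/// Single-threaded pool that recycles fixed-size byte buffers.
/// Hypothetical type for illustration; not part of hft_benchmarks.
struct BufferPool {
    free: Vec<Box<[u8]>>,
    size: usize,
}

impl BufferPool {
    /// Pre-allocates `count` buffers of `size` bytes each.
    fn new(count: usize, size: usize) -> Self {
        let free = (0..count)
            .map(|_| vec![0u8; size].into_boxed_slice())
            .collect();
        Self { free, size }
    }

    /// Reuses a pooled buffer if one is free, otherwise falls back to the allocator.
    fn acquire(&mut self) -> Box<[u8]> {
        self.free
            .pop()
            .unwrap_or_else(|| vec![0u8; self.size].into_boxed_slice())
    }

    /// Returns a buffer to the pool for later reuse.
    fn release(&mut self, buf: Box<[u8]>) {
        self.free.push(buf);
    }

    /// Number of buffers currently ready for reuse.
    fn available(&self) -> usize {
        self.free.len()
    }
}

fn main() {
    let mut pool = BufferPool::new(2, 1024);
    let buf = pool.acquire(); // served from the pre-allocated free list
    println!("available after acquire: {}", pool.available());
    pool.release(buf);
    println!("available after release: {}", pool.available());
}
```

Timing `acquire`/`release` against a plain `vec![0u8; 1024]` with `SimpleBench` would be one way to reproduce a comparison of this kind on your own hardware.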
// Required once at program startup for accurate timing
calibrate_tsc_frequency(); // 1000ms calibration (most accurate)
quick_calibrate_tsc_frequency(); // 100ms calibration (faster, less accurate)

Fluent API for quick benchmarking:
use hft_benchmarks::SimpleBench;
SimpleBench::new("operation_name")
.bench(iterations, || your_function())
.report(); // Print results
// Or get analysis object
let analysis = SimpleBench::new("operation_name")
.bench(iterations, || your_function())
.analyze();

For custom measurement logic:
use hft_benchmarks::{PrecisionTimer, time_function};
// Time a single operation
let timer = PrecisionTimer::start();
expensive_operation();
let elapsed_ns = timer.stop();
// Time function with return value
let (result, elapsed_ns) = time_function(|| {
compute_something()
});

use hft_benchmarks::BenchmarkResults;
let mut results = BenchmarkResults::new("test_name".to_string());
// Collect measurements
for _ in 0..1000 {
let elapsed = time_operation();
results.record(elapsed);
}
// Analyze results
let analysis = results.analyze();
println!("Mean: {}ns, P99: {}ns", analysis.mean, analysis.p99);
// Check performance target
if analysis.meets_target(500) { // P99 < 500ns
println!("Performance target met!");
}

The benchmark results show the statistical distribution of timing measurements:
function_name: 1000 samples, mean=245ns, p50=230ns, p95=310ns, p99=450ns, p99.9=890ns, std_dev=45.2ns
- mean: Average execution time
- p50 (median): 50% of operations complete faster than this
- p95: 95% of operations complete faster than this
- p99: 99% of operations complete faster than this (critical for tail latency)
- p99.9: 99.9% of operations complete faster than this
- std_dev: Standard deviation (consistency indicator)
In performance-critical systems:
- Mean can hide outliers that hurt user experience
- P99 shows worst-case performance for 99% of operations
- P99.9 reveals extreme outliers that can cause system issues
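Example: A function with a median of 100ns but a P99 of 10ms will cause problems despite good typical performance. (The mean cannot stay near 100ns in that case: if 1% of calls take 10ms, the mean is at least 100μs.)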
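
To make the percentile definitions concrete, here is a minimal sketch of how such statistics could be computed from raw nanosecond samples, using a workload with a fast typical case and a slow tail. It uses the nearest-rank method. `percentile` and `mean` are hypothetical helpers for illustration, not functions from `hft_benchmarks`, and the library's own calculation may differ.

```rust
/// Nearest-rank percentile over an already sorted slice of nanosecond samples.
/// Hypothetical helper for illustration; not part of hft_benchmarks.
fn percentile(sorted: &[u64], p: f64) -> u64 {
    assert!(!sorted.is_empty(), "need at least one sample");
    assert!(p > 0.0 && p <= 100.0, "p must be in (0, 100]");
    // Nearest rank: the smallest value with at least p% of samples at or below it.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank.max(1) - 1]
}

/// Arithmetic mean of the samples, in nanoseconds.
fn mean(samples: &[u64]) -> f64 {
    samples.iter().sum::<u64>() as f64 / samples.len() as f64
}

fn main() {
    // 98 fast samples at 100 ns plus two slow 10 ms outliers.
    let mut samples = vec![100u64; 98];
    samples.extend([10_000_000, 10_000_000]);
    samples.sort_unstable();

    println!("mean = {:.0} ns", mean(&samples));
    println!("p50  = {} ns", percentile(&samples, 50.0));
    println!("p99  = {} ns", percentile(&samples, 99.0));
}
```

With this data the median stays at 100 ns while P99 lands on the 10 ms outlier, which is the kind of tail that a single summary number hides.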
Run the benchmark test suite:
# From project root
cd /path/to/hft-framework/Code
cargo test --package hft-benchmarks -- --nocapture
# Or from benchmark crate directory
cd crates/hft-benchmarks
cargo test --lib -- --nocapture

Run example benchmarks:
cargo run --example simple_benchmark_example

Always calibrate before benchmarking:
// At program start
quick_calibrate_tsc_frequency(); // For development/testing
// OR
calibrate_tsc_frequency(); // For production measurements

Use appropriate sample sizes:
// Quick development check
SimpleBench::new("dev_test").bench(100, || function()).report();
// Production validation
SimpleBench::new("prod_test").bench(10000, || function()).report();

Account for cache and branch-predictor warm-up (Rust is compiled ahead of time, so there is no JIT to warm up):
// Warm up
for _ in 0..1000 { function(); }
// Then benchmark
SimpleBench::new("warmed_up").bench(5000, || function()).report();

- Run on isolated CPU cores for consistent results
- Disable CPU scaling for accurate measurements
- Minimize background processes during benchmarking
- Use release mode builds (`cargo run --release`)
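
The practices above (warm up first, then measure, and keep the optimizer from deleting the work) can be sketched with only the standard library. This is an illustrative harness built on `std::time::Instant`, not the TSC-based timer this crate provides. `measure` and `work` are hypothetical names.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Runs `f` `warmup` times unmeasured, then returns `samples` timings in nanoseconds.
/// Hypothetical helper built on Instant, not the crate's TSC-based timer.
fn measure<T>(warmup: usize, samples: usize, mut f: impl FnMut() -> T) -> Vec<u64> {
    // Warm caches and branch predictors before recording anything.
    for _ in 0..warmup {
        black_box(f());
    }
    let mut timings = Vec::with_capacity(samples);
    for _ in 0..samples {
        let start = Instant::now();
        // black_box keeps the result "used" so the call is not optimized away.
        black_box(f());
        timings.push(start.elapsed().as_nanos() as u64);
    }
    timings
}

/// Stand-in workload for illustration: sum of squares below `n`.
fn work(n: u64) -> u64 {
    (0..n).fold(0u64, |acc, x| acc.wrapping_add(x * x))
}

fn main() {
    // black_box on the input stops the compiler from constant-folding work(64).
    let mut timings = measure(1_000, 5_000, || work(black_box(64)));
    timings.sort_unstable();
    println!("median = {} ns", timings[timings.len() / 2]);
}
```

Build with `--release` when trying this; a debug build mostly measures unoptimized code.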
use hft_benchmarks::*;
fn main() {
quick_calibrate_tsc_frequency();
SimpleBench::new("new_feature")
.bench(1000, || my_new_function())
.report();
}

use hft_benchmarks::*;
fn compare_algorithms() {
quick_calibrate_tsc_frequency();
println!("=== Algorithm Comparison ===");
let results_a = SimpleBench::new("algorithm_a")
.bench(5000, || algorithm_a())
.analyze();
let results_b = SimpleBench::new("algorithm_b")
.bench(5000, || algorithm_b())
.analyze();
println!("Algorithm A: {}ns P99", results_a.p99);
println!("Algorithm B: {}ns P99", results_b.p99);
if results_b.p99 < results_a.p99 {
let improvement = (results_a.p99 as f64 / results_b.p99 as f64 - 1.0) * 100.0;
println!("Algorithm B is {:.1}% faster (P99)", improvement);
}
}

use hft_benchmarks::*;
fn validate_performance() {
calibrate_tsc_frequency(); // Full calibration for accuracy
let analysis = SimpleBench::new("critical_path")
.bench(10000, || critical_trading_function())
.analyze();
// Ensure P99 latency meets requirements
const MAX_P99_NS: u64 = 500;
assert!(
analysis.meets_target(MAX_P99_NS),
"Performance regression: P99 = {}ns (max allowed: {}ns)",
analysis.p99,
MAX_P99_NS
);
println!("✓ Performance validation passed");
println!(" Mean: {}ns, P99: {}ns, P99.9: {}ns",
analysis.mean, analysis.p99, analysis.p999);
}

use hft_benchmarks::*;
fn optimize_memory_usage() {
quick_calibrate_tsc_frequency();
println!("=== Memory Allocation Comparison ===");
// Test stack allocation
SimpleBench::new("stack_alloc")
.bench(10000, || {
let data = [0u64; 64]; // Stack allocated
std::hint::black_box(data);
})
.report();
// Test heap allocation
SimpleBench::new("heap_alloc")
.bench(10000, || {
let data = vec![0u64; 64]; // Heap allocated
std::hint::black_box(data);
})
.report();
// Use built-in memory benchmarks
benchmark_object_pools();
}
## Running Complete Benchmark Suite
### Memory Allocation Analysis
```bash
cargo run --example simple_benchmark_example
```

Output:
=== Vector Allocation Benchmark ===
vec_allocation: 1000 samples, mean=185ns, p50=170ns, p95=220ns, p99=992ns
=== Implementation Comparison ===
Old: 90ns P99, New: 50ns P99
Improvement: 166.7% faster
use hft_benchmarks::*;
fn main() {
calibrate_tsc_frequency();
// Benchmark your trading algorithm
SimpleBench::new("order_processing")
.bench(10000, || process_market_order())
.report();
// Memory-intensive operations
benchmark_allocations();
benchmark_object_pools();
}

This library uses CPU timestamp counters (TSC) for nanosecond-precision timing:
- TSC-based timing: Direct CPU cycle counting via the `_rdtsc()` intrinsic
- Memory barriers: Prevent instruction reordering that could affect measurements
- Calibrated conversion: Converts CPU cycles to nanoseconds based on measured frequency
- Minimal overhead: ~35ns measurement overhead
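
The calibrated conversion in the list above amounts to simple arithmetic: count ticks over a known wall-clock interval to estimate the counter frequency, then scale tick deltas to nanoseconds. The sketch below shows that arithmetic under stated assumptions. `estimate_frequency_hz` and `cycles_to_ns` are hypothetical names, and the counter is passed in as a closure so the example runs on any platform. On x86_64 one could pass a closure wrapping `core::arch::x86_64::_rdtsc` instead. The library's actual calibration routine may differ.

```rust
use std::time::{Duration, Instant};

/// Estimates counter frequency (ticks per second) by sampling `read_counter`
/// across a wall-clock interval. Hypothetical helper, not the crate's API.
fn estimate_frequency_hz(mut read_counter: impl FnMut() -> u64, interval: Duration) -> u64 {
    let wall_start = Instant::now();
    let ticks_start = read_counter();
    // Busy-wait so the counter keeps running on this core for the whole interval.
    while wall_start.elapsed() < interval {
        std::hint::spin_loop();
    }
    let ticks = read_counter().wrapping_sub(ticks_start);
    let elapsed_ns = wall_start.elapsed().as_nanos().max(1);
    // ticks per second = ticks * 1e9 / elapsed nanoseconds
    (ticks as u128 * 1_000_000_000 / elapsed_ns) as u64
}

/// Converts a tick delta to nanoseconds; u128 avoids overflow in the multiply.
fn cycles_to_ns(cycles: u64, frequency_hz: u64) -> u64 {
    (cycles as u128 * 1_000_000_000 / frequency_hz as u128) as u64
}

fn main() {
    // Stand-in counter: nanoseconds since `origin`, so it ticks at about 1 GHz.
    let origin = Instant::now();
    let counter = move || origin.elapsed().as_nanos() as u64;

    let hz = estimate_frequency_hz(counter, Duration::from_millis(100));
    println!("estimated frequency: {} Hz", hz);
    // At 3 GHz, 105 cycles correspond to 35 ns.
    println!("105 cycles at 3 GHz = {} ns", cycles_to_ns(105, 3_000_000_000));
}
```

A longer interval reduces the relative error of the frequency estimate, which is consistent with the 1000ms calibration being described as more accurate than the 100ms one.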
The benchmark tools themselves have minimal impact:
PrecisionTimer overhead: ~35ns
Function call overhead: ~37ns
Statistical calculation: <1μs for 10k samples
Memory allocation test: ~100-500ns per iteration
- x86_64 CPU with a stable (invariant) TSC (most modern processors); the TSC is not available on aarch64
- Linux, macOS, or Windows
- Rust 1.70+
- CPU frequency scaling can affect accuracy (disable for best results)
- System load impacts measurement consistency
- Compiler optimizations may eliminate benchmarked code (use `std::hint::black_box`)
- First-run variance due to cold caches and untrained branch predictors
Use alongside other profiling tools for comprehensive analysis:
- perf for hardware counter analysis
- valgrind for memory profiling
- flamegraph for call stack visualization
- criterion for statistical benchmarking
This library excels at microbenchmarks and latency-critical code paths where nanosecond precision matters.