Skip to content

MnO2/zw-fast-quantile

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

80 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

zw-fast-quantile

CI Crates.io docs.rs

A Rust implementation of the Zhang-Wang fast approximate quantile algorithm, as described in An Efficient Quantile Computation Technique for Approximate Query Processing (SSDBM 2007). This is the original algorithm that inspired the quantile techniques used in TensorFlow Boosted Trees. For more context, see this blog post.

Features

  • Approximate quantile computation with tunable error bound (epsilon)
  • Two variants: fixed-size (known stream length) and unbounded (unknown stream length)
  • Generic over any Clone + Ord element type
  • Sub-linear memory usage — space grows logarithmically with stream size
  • Optional serde support for serialization/deserialization
  • Query caching for efficient repeated lookups

Installation

Add this to your Cargo.toml:

[dependencies]
zw-fast-quantile = "1.0"

Usage

FixedSizeEpsilonSummary

Use when the total number of elements is known ahead of time. This allows tighter memory allocation.

use zw_fast_quantile::FixedSizeEpsilonSummary;

let epsilon = 0.01;  // 1% error tolerance
let n = 10000;       // expected stream size
let mut summary = FixedSizeEpsilonSummary::new(n, epsilon).unwrap();

for value in 1..=n {
    summary.update(value);
}

// Query the median (rank 0.5)
let median = summary.query(0.5).unwrap();

// Query any quantile from 0.0 (min) to 1.0 (max)
let p99 = summary.query(0.99).unwrap();

UnboundEpsilonSummary

Use when the stream size is not known in advance. It dynamically manages internal summaries as the stream grows.

use zw_fast_quantile::UnboundEpsilonSummary;

let epsilon = 0.01;
let mut summary = UnboundEpsilonSummary::new(epsilon).unwrap();

// Feed values from any iterator or stream
for value in 1..=10000 {
    summary.update(value);
}

let median = summary.query(0.5).unwrap();
let count = summary.size();  // number of elements inserted

Error Handling

Both constructors and query() return Result<_, QuantileError>:

use zw_fast_quantile::{FixedSizeEpsilonSummary, QuantileError};

// Invalid parameters
assert!(FixedSizeEpsilonSummary::<usize>::new(0, 0.1).is_err());     // n must be > 0
assert!(FixedSizeEpsilonSummary::<usize>::new(10, -0.1).is_err());   // epsilon must be positive
assert!(FixedSizeEpsilonSummary::<usize>::new(10, f64::NAN).is_err()); // epsilon must be finite

// Query errors
let s = FixedSizeEpsilonSummary::<usize>::new(10, 0.1).unwrap();
assert!(s.query(0.5).is_err());  // EmptySummary — no values inserted

Choosing Epsilon

The epsilon parameter controls the trade-off between accuracy and memory:

Epsilon Max Error Relative Memory
0.1 10% Lowest
0.01 1% Moderate
0.001 0.1% Higher

A query for rank r on a stream of n elements will return an element whose true rank is within r ± epsilon (i.e., the result is within epsilon * n positions of the exact answer).

Benchmarks

Benchmarked against GK01 from the quantiles crate. Inserting 5000 values with epsilon = 0.01:

Update (insertion) performance — ZW is ~2.6x faster:

zw unbound quantile update
                        time:   [60.780 us 60.855 us 60.936 us]
gk quantile update      time:   [156.84 us 157.02 us 157.24 us]

Query performance (10 quantiles) — ZW is ~1.5x faster:

zw unbound quantile query
                        time:   [229.62 ns 230.16 ns 230.77 ns]
gk quantile query       time:   [350.21 ns 350.48 ns 350.76 ns]

Run benchmarks yourself with:

cargo bench

Optional Features

  • serde — Enable serialization/deserialization support via serde. Cache fields are skipped during serialization and lazily rebuilt on the next query.

    [dependencies]
    zw-fast-quantile = { version = "1.0", features = ["serde"] }

Minimum Supported Rust Version

The MSRV is 1.65.

Documentation

Related Projects

License

Licensed under Apache-2.0.

About

Zhang Wang Fast Approximate Quantiles Algorithm in Rust

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Sponsor this project

Packages

 
 
 

Contributors

Languages