Measurement primitives for perp microstructure research. Reference implementations of the metrics we use at Xylem Group, published as a runnable companion to our research notes. If you read a method described in prose, you can run it on your own data.
Three primitives — markout, vpin, if_stress — each in kos/<name>.py with a worked notebook in examples/ and tests in tests/. Plus a real-data tour: HL silver-layer parquet, ad-hoc Binance archive fetch, and a cross-venue BTC notebook that's the methodology piece behind our applied research repo, paros.
Hyperliquid BTC vs Binance USDT-M BTCUSDT, same 24h window. Mid path overlay and HL−BN basis in bps. Generated by examples/cross_venue_btc.ipynb.
Mean basis −0.65 bps, σ 6.95 bps over 24h: what you would expect from a tightly arbitraged cross-venue BTC pair. The one −107 bps excursion is likely a stale HL minute-bar mid during a fast move rather than a real dislocation. The notebook also runs a 500-fill markout sweep on each venue and overlays the IQR bands.
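The basis figures above come from two mid series aligned on a common time grid. The following is a minimal sketch of that calculation, not the notebook's code: the function names `basis_bps` and `basis_summary` are hypothetical, and quoting the basis relative to the Binance mid is an assumed convention.

```python
import numpy as np

def basis_bps(hl_mid, bn_mid):
    """HL minus BN basis in basis points, relative to the BN mid (assumed convention)."""
    hl = np.asarray(hl_mid, dtype=float)
    bn = np.asarray(bn_mid, dtype=float)
    if hl.shape != bn.shape:
        raise ValueError("mid series must already be aligned on the same time grid")
    return 1e4 * (hl - bn) / bn

def basis_summary(basis):
    """Mean, sample standard deviation, and extremes, ignoring non-finite samples."""
    b = np.asarray(basis, dtype=float)
    b = b[np.isfinite(b)]
    return {
        "mean": float(b.mean()),
        "std": float(b.std(ddof=1)),
        "min": float(b.min()),
        "max": float(b.max()),
    }
```

A single large negative sample in `min` next to a small `std` is the pattern to check against stale bars before reading it as a real dislocation.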
How much does the mid move against you after a fill? Computed in basis points across log-spaced horizons from sub-millisecond to one minute. Three normalizations: by half-spread, by realized volatility, by an impact prior. The foundation for any signal-toxicity measurement.
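As a rough illustration of the markout definition, the sketch below computes signed markouts for a batch of fills over a grid of horizons. It is not the `kos/markout.py` API: the names `mid_at` and `markout_bps` are hypothetical, the mid is assumed piecewise-constant between quote updates, and the sign convention (negative means the mid moved against the fill) is an assumption.

```python
import numpy as np

def mid_at(mid_ts, mid_px, query_ts):
    """Last observed mid at or before each query time (piecewise-constant mid)."""
    idx = np.searchsorted(mid_ts, query_ts, side="right") - 1
    # Queries before the first quote fall back to the first mid;
    # queries past the last quote use the last mid.
    idx = np.clip(idx, 0, len(mid_px) - 1)
    return mid_px[idx]

def markout_bps(fill_ts, fill_side, mid_ts, mid_px, horizons):
    """Signed markout in bps, shape (n_fills, n_horizons).

    fill_side is +1 for a buy fill and -1 for a sell fill, so a negative
    value means the mid moved against the fill (assumed convention).
    """
    fill_ts = np.asarray(fill_ts, dtype=float)
    fill_side = np.asarray(fill_side, dtype=float)
    mid_ts = np.asarray(mid_ts, dtype=float)
    mid_px = np.asarray(mid_px, dtype=float)
    horizons = np.asarray(horizons, dtype=float)

    m0 = mid_at(mid_ts, mid_px, fill_ts)[:, None]                       # mid at fill time
    mh = mid_at(mid_ts, mid_px, fill_ts[:, None] + horizons[None, :])   # mid at fill time + h
    return 1e4 * fill_side[:, None] * (mh - m0) / m0

# Log-spaced horizons in seconds, from sub-millisecond to one minute.
horizons = np.geomspace(5e-4, 60.0, 12)
```

Normalizing by half-spread would then be a division of each row by that fill's half-spread in bps; the volatility and impact-prior normalizations need inputs this sketch does not model.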
Easley/López de Prado/O'Hara's classic toxicity proxy. Bulk volume classification with a normal CDF, fixed-volume buckets, rolling toxicity score. Plus notes on what doesn't translate cleanly to crypto perps.
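The three ingredients named above (bulk volume classification, fixed-volume buckets, rolling score) can be sketched in a few lines. This is an illustrative simplification, not the `kos/vpin.py` implementation: the function name `vpin` and its parameters are hypothetical, the input is assumed to be pre-aggregated bars, and the standard deviation of price changes is taken over the whole sample rather than a rolling window.

```python
import math
import numpy as np

def _norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def vpin(prices, volumes, bucket_volume, window):
    """Rolling VPIN from bar closes and bar volumes.

    prices[i] is the close of bar i and volumes[i] its traded volume;
    the first bar only seeds the price change and its volume is ignored.
    """
    prices = np.asarray(prices, dtype=float)
    volumes = np.asarray(volumes, dtype=float)
    dp = np.diff(prices)
    vol = volumes[1:]
    sigma = dp.std(ddof=1)
    # Bulk volume classification: buy fraction of each bar is Phi(dp / sigma).
    if sigma > 0:
        buy_frac = np.array([_norm_cdf(x / sigma) for x in dp])
    else:
        buy_frac = np.full(len(dp), 0.5)

    # Fill fixed-volume buckets; a bar that straddles a boundary is split pro rata.
    imbalances = []
    buy = sell = filled = 0.0
    for v, f in zip(vol, buy_frac):
        remaining = v
        while remaining > 0:
            take = min(remaining, bucket_volume - filled)
            buy += take * f
            sell += take * (1.0 - f)
            filled += take
            remaining -= take
            if filled >= bucket_volume * (1.0 - 1e-12):
                imbalances.append(abs(buy - sell) / bucket_volume)
                buy = sell = filled = 0.0

    imb = np.array(imbalances)
    if len(imb) < window:
        return np.array([])
    # Rolling mean of per-bucket order imbalance over the last `window` buckets.
    return np.convolve(imb, np.ones(window) / window, mode="valid")
```

Any trailing volume that does not complete a bucket is dropped, so the output has one value per completed bucket once `window` buckets are available.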
Given a population of leveraged longs and a price shock, walks the liquidation cascade: surplus to the IF, depletion timing, ADL queue. Sweep if_initial to find the smallest IF that survives a given shock with target probability.
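To make the cascade walk concrete, here is a deliberately simplified sketch. It is not the `kos/if_stress.py` API: the names `walk_cascade` and `smallest_surviving_if`, the isolated-margin liquidation price formula, the flat `mmr` and `slippage` parameters, and the rule that all residual equity flows to the IF are assumptions made for illustration, and real venues differ on each of them.

```python
import numpy as np

def walk_cascade(entry, size, leverage, shock_price, if_initial,
                 mmr=0.005, slippage=0.002):
    """Liquidate longs in the order the falling price reaches them.

    Assumes isolated margin, liquidation at entry * (1 - 1/leverage + mmr),
    execution at the liquidation price less proportional slippage, and that
    residual equity (positive or negative) is settled against the IF.
    """
    entry, size, leverage = (np.asarray(x, dtype=float) for x in (entry, size, leverage))
    liq_price = entry * (1.0 - 1.0 / leverage + mmr)
    hit = np.flatnonzero(liq_price >= shock_price)
    order = hit[np.argsort(-liq_price[hit])]  # highest liquidation price goes first

    fund = float(if_initial)
    for step, i in enumerate(order):
        exec_px = liq_price[i] * (1.0 - slippage)
        margin = entry[i] * size[i] / leverage[i]
        equity = margin + size[i] * (exec_px - entry[i])  # surplus if > 0, deficit if < 0
        if fund + equity < 0:
            # The IF cannot absorb this deficit: this and all later positions go to ADL.
            return {"survived": False, "if_final": 0.0,
                    "depleted_at": step, "adl_queue": order[step:].tolist()}
        fund += equity
    return {"survived": True, "if_final": fund, "depleted_at": None, "adl_queue": []}

def smallest_surviving_if(candidates, slippage_draws, target_prob, **book):
    """Smallest candidate IF whose survival rate across slippage draws meets target_prob."""
    for if0 in sorted(candidates):
        wins = sum(walk_cascade(if_initial=if0, slippage=s, **book)["survived"]
                   for s in slippage_draws)
        if wins / len(slippage_draws) >= target_prob:
            return if0
    return None
```

Here `book` carries `entry`, `size`, `leverage`, and `shock_price`; the randomness comes only from the slippage draws, which is one of several places a fuller model would add uncertainty.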

    uv sync                        # installs kos + dev deps
    uv run jupyter lab examples/   # opens the notebooks

Or render the README plots from a clean checkout:

    uv run python scripts/render_plots.py

The cross-venue notebook fetches one day of Binance BTCUSDT aggTrades on first run (~80 MB, cached under ~/.cache/binance_archive/). HL data is read from /Users/andnasnd/hl-data; adjust DEFAULT_ROOT in examples/data_hl.py for your mount.
This is a measurement library, not a trading system. Honest about what it can and can't do today:
- HL L2 snapshot is partial. The local market_data parquet only covers 0G, 2Z, AAVE; this is an artifact of how the snapshot was taken, not a pipeline limit. BTC/ETH sub-second microstructure is gated on a re-snapshot with a broader coin allowlist.
- HL VPIN tape needs a parser. Trade fills live inside replica_cmds signed-action response bundles and aren't yet exposed as a flat tape. Cross-venue VPIN currently leans on Binance aggTrades (which carries is_buyer_maker → aggressor side).
- Binance fetch is ad-hoc. examples/binance_archive_fetch.py pulls directly from data.binance.vision. Production data flows through our sigflow pipeline; the adapter shim in examples/data_sigflow.py is the swap point.
- Synthetic demos are illustrative, not representative. The IF and VPIN demos use synthetic tapes that show the metric responding to a known regime change. Real-data demos use HL silver-layer parquet and the Binance public archive.
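On the Binance aggTrades point in the list above: the aggressor side follows directly from is_buyer_maker, because a trade whose buyer was the resting maker was initiated by a seller. A minimal sketch of that mapping, with hypothetical helper names `aggressor_sign` and `signed_volume`:

```python
import numpy as np

def aggressor_sign(is_buyer_maker):
    """+1 for buyer-initiated trades, -1 for seller-initiated trades.

    is_buyer_maker == True means the buyer was the passive side,
    so the aggressor was the seller.
    """
    maker = np.asarray(is_buyer_maker, dtype=bool)
    return np.where(maker, -1, 1)

def signed_volume(qty, is_buyer_maker):
    """Trade quantity signed by aggressor side."""
    return np.asarray(qty, dtype=float) * aggressor_sign(is_buyer_maker)
```

With a true aggressor flag available, buy and sell volume per bucket can be summed directly instead of being inferred through bulk volume classification.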
CC BY-NC 4.0 — read, share, cite, run on your own data. No commercial redistribution. See LICENSE.
If you use these primitives in published work, please cite the repo. See CITATION.cff for the canonical record.



