Releases: NVIDIA/kvpress
v0.5.3
v0.5.2
- Relax dependencies #200 by @SimJeg
- Add daily health check for HuggingFace leaderboard space by @maxjeblick
- Reset DecodingPress state when exiting context manager #192 by @cluster2600
- Fix evaluation README uv sync command #188 by @shpark1104
- Fix BlockPress docstring by @SimJeg
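The DecodingPress reset in #192 concerns state that would otherwise survive from one use of the press to the next. The sketch below illustrates the general pattern only: `ToyDecodingPress`, `step_count`, `on_decode_step`, and `reset` are hypothetical names for illustration, not the kvpress API, and the real fix may be implemented differently.

```python
from contextlib import contextmanager


class ToyDecodingPress:
    """Hypothetical stand-in; NOT the kvpress DecodingPress class."""

    def __init__(self):
        # State that accumulates while tokens are being decoded.
        self.step_count = 0

    def on_decode_step(self):
        self.step_count += 1

    def reset(self):
        self.step_count = 0

    @contextmanager
    def __call__(self, model):
        try:
            yield self
        finally:
            # Reset on exit, even if generation raised, so the next
            # `with press(model):` block starts from a clean state.
            self.reset()


press = ToyDecodingPress()
with press(model=None):  # no real model is needed for this toy example
    press.on_decode_step()
    press.on_decode_step()
    steps_inside = press.step_count
steps_after = press.step_count
```

Placing the reset in a `finally` clause means a failed generation cannot leak counters into the next call.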
v0.5.1
v0.5.0
v0.4.3
v0.4.2
v0.4.1
✨ New Features
- KVzapPress - a fast approximation of KVzip for prefill and decoding compression (https://arxiv.org/abs/2601.07891). Comes with KVzap training and evaluation utilities (#171)
- ThresholdPress - adaptive compression using score thresholds instead of fixed compression ratios (#171)
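To make the difference between a fixed compression ratio and a score threshold concrete, here is a minimal sketch in plain Python. The helper names `keep_by_ratio` and `keep_by_threshold` and the score values are illustrative assumptions, not the ThresholdPress implementation.

```python
from typing import List


def keep_by_ratio(scores: List[float], compression_ratio: float) -> List[int]:
    """Keep the top (1 - compression_ratio) fraction of positions by score."""
    n_kept = int(len(scores) * (1 - compression_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_kept])


def keep_by_threshold(scores: List[float], threshold: float) -> List[int]:
    """Keep every position whose score reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Hypothetical per-position importance scores for two different inputs.
peaked = [0.9, 0.1, 0.05, 0.02, 0.03, 0.01]  # one position dominates
flat = [0.9, 0.8, 0.7, 0.6, 0.55, 0.1]       # most positions matter

# A fixed ratio keeps the same number of positions for both inputs.
ratio_peaked = keep_by_ratio(peaked, compression_ratio=0.5)
ratio_flat = keep_by_ratio(flat, compression_ratio=0.5)

# A threshold adapts: few positions kept for `peaked`, many for `flat`.
thr_peaked = keep_by_threshold(peaked, threshold=0.5)
thr_flat = keep_by_threshold(flat, threshold=0.5)
```

With a threshold, the effective compression ratio becomes an outcome of the score distribution rather than an input.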
📈 Improvements
- Update KVzipPress with improvements and evaluation registry support (#172)
- Rename compress-question to query-aware in evaluation config (#168)
- Refactor ObservedAttentionPress for cleaner implementation (#166)
- Add leaderboard generation script (#171)
🐛 Bug Fixes
- Fix empty context handling in pipeline (#165)
v0.4.0
🚀 Release v0.4.0
✨ New Features
- CURPress - Value-Guided KV Compression for LLMs via Approximated CUR Decomposition (#150)
- CompactorPress - Compactor: Calibrated Query-Agnostic KV Cache Compression with Approximate Leverage Scores (#143)
- Decoding Press Functionality - Support for KV cache compression during the decoding phase (#139)
- AIME25 & Math500 Benchmarks - New evaluation datasets for mathematical reasoning tasks (#142)
- post_init_from_model Hook - Add model-specific initialization support in BasePress (#163)
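The post_init_from_model hook lets a press read model-specific settings before it is used. The following is a sketch of that idea with toy classes. `ToyBasePress`, `ToyHeadAwarePress`, and the `num_key_value_heads` lookup are assumptions for illustration, and the point at which kvpress actually invokes the hook is not stated in these notes.

```python
from contextlib import contextmanager
from types import SimpleNamespace


class ToyBasePress:
    """Hypothetical stand-in; NOT the kvpress BasePress class."""

    def post_init_from_model(self, model):
        """Hook: override to read model-specific settings. No-op by default."""

    @contextmanager
    def __call__(self, model):
        # Assumption: the hook runs once, when the press is attached to a model.
        self.post_init_from_model(model)
        yield self


class ToyHeadAwarePress(ToyBasePress):
    def __init__(self):
        self.num_heads = None  # unknown until a model is supplied

    def post_init_from_model(self, model):
        # Pull a value that differs from model to model.
        self.num_heads = model.config.num_key_value_heads


# A fake "model" that carries only the config field this example reads.
toy_model = SimpleNamespace(config=SimpleNamespace(num_key_value_heads=8))

press = ToyHeadAwarePress()
with press(toy_model):
    heads_seen = press.num_heads
```

The benefit of a hook over constructor arguments is that the same press object can be created without a model in hand and configured lazily.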
📈 Improvements
- Moved tests to GPU for faster CI execution (#132)
- Improved needle-in-haystack test coverage (#133)
- Updated README and documentation for clarity (#162)
- Enhanced docstrings throughout the codebase (#159)
- Updated decoding notebook with latest examples (#156)
- Code cleanup: moved utilities, cleaned imports (#160)
🐛 Bug Fixes
- Fixed LongBench-v2 benchmark evaluation (#161)
- Fixed kvzip press access to past_key_values
- Fixed ComposedPress behavior (#148)
- Fixed import issues (#144)
📦 Installation
pip install kvpress==0.4.0
📚 Full Changelog
v0.3.0
What's Changed
- refactor: optimized covariance transform in ExpectedAttentionPress by @neuralsorcerer in #111
- fix ruler integration tests by @maxjeblick in #113
- fix typo by @neuralsorcerer in #116
- Add needle in haystack test by @alessiodevoto in #121
- fix masked_key_indices by @maxjeblick in #122
- Add copy-pr-bot settings by @maxjeblick in #123
- Add Github runner by @maxjeblick in #124
- evaluation README.md command error and logging error #127 by @wzp-0815 in #128
- add gpu runner by @maxjeblick in #125
- Upgrade expected attention with support for more models by @alessiodevoto in #126
- Add Expected Attention with Stats by @alessiodevoto in #120
⚠️ Transformers compatibility by @maxjeblick in #115. This is a breaking change: the KV caching machinery changed in HF transformers, and KVPress was adjusted accordingly.
New Contributors
- @neuralsorcerer made their first contribution in #111
- @wzp-0815 made their first contribution in #128
Full Changelog: v0.2.10...v0.3.0