[NeurIPS 2025 Spotlight] Official implementation of our KV cache compression algorithm BalanceKV in our NeurIPS 2025 Spotlight paper "Streaming Attention Approximation via Discrepancy Theory" https://arxiv.org/abs/2502.07861

ksheth96/BalanceKV

Streaming Attention Approximation via Discrepancy Theory

This repository contains code for evaluating BalanceKV, the algorithm proposed in our NeurIPS 2025 Spotlight paper (https://arxiv.org/abs/2502.07861), on the LongBench and Needle-in-a-Haystack (NIAH) benchmarks.

Files and Directories

  • src/balanced_walk.py: Implements the balanced walk algorithm.
  • src/llama_forward.py: Contains functions for running inference with the Llama model and for patching in our custom KV cache.
  • src/metrics_longbench.py: Defines various metrics used for LongBench evaluation.
  • src/single_layer_approx.ipynb: Notebook with the ablation experiments on single-layer attention approximation.
  • run_longbench.py: Main script for running the LongBench evaluation.
  • run_needle_in_haystack.py: Main script for running the NIAH evaluation.
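
For readers unfamiliar with the idea behind a balanced walk, the following is a minimal sketch of a generic self-balancing walk from the discrepancy literature. It is not the code in src/balanced_walk.py: the function name, the threshold parameter c, and the clamping of the inner product are assumptions made for illustration only. Each incoming vector gets a sign in {+1, -1}, chosen at random with a bias that pushes the running signed sum back toward zero.

```python
import random


def self_balancing_walk(vectors, c=30.0, seed=0):
    """Assign a sign in {+1, -1} to each vector, one at a time,
    so that the running signed sum tends to stay small.

    Hypothetical illustration; `c` is an assumed threshold parameter.
    """
    rng = random.Random(seed)
    w = [0.0] * len(vectors[0])  # running signed sum of the vectors seen so far
    signs = []
    for v in vectors:
        dot = sum(wi * vi for wi, vi in zip(w, v))
        # Simplification (assumption): clamp the inner product so the
        # probability below always lies in [0, 1].
        dot = max(-c, min(c, dot))
        # The more v aligns with the running sum, the likelier it gets -1.
        p_plus = 0.5 - dot / (2.0 * c)
        s = 1 if rng.random() < p_plus else -1
        signs.append(s)
        w = [wi + s * vi for wi, vi in zip(w, v)]
    return signs, w


signs, w = self_balancing_walk([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(signs, w)
```

One way to read the output: the vectors that receive sign +1 form a subset of roughly half the input whose sum is close to half the total sum, which is the kind of balanced subset a cache compression scheme could keep. How the repository actually selects and weights tokens is defined in src/balanced_walk.py and the paper.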

Requirements

Install the requirements with:

pip install -r requirements.txt

Usage

To run the LongBench evaluation, use the following command (the default model is "meta-llama/Llama-3.1-8B-Instruct"):

python run_longbench.py --kv_type weightedbw --datasets qasper --e

To run the NIAH evaluation, use the following command (the default model is "meta-llama/Llama-3.1-8B-Instruct"):

python run_needle_in_haystack.py --kv_type "weightedbw" --haystack_dir "<CurrentPath>/data/PaulGrahamEssays"
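
The <CurrentPath> placeholder above has to be replaced with an absolute path. As a small convenience sketch, the command can be assembled in Python instead. The helper name build_niah_command is hypothetical, and the sketch assumes that <CurrentPath> refers to the repository root and that the essays live under data/PaulGrahamEssays there; only the --kv_type and --haystack_dir flags come from the command above.

```python
import shlex
import sys
from pathlib import Path


def build_niah_command(repo_root, kv_type="weightedbw"):
    """Return the NIAH command as an argument list with an absolute haystack path.

    Hypothetical helper; assumes the essays are in <repo_root>/data/PaulGrahamEssays.
    """
    haystack_dir = Path(repo_root).resolve() / "data" / "PaulGrahamEssays"
    return [
        sys.executable,
        "run_needle_in_haystack.py",
        "--kv_type", kv_type,
        "--haystack_dir", str(haystack_dir),
    ]


# Print the command for inspection; pass the list to subprocess.run to execute it.
print(shlex.join(build_niah_command(".")))
```

Run from the repository root, this prints the same command as above with the placeholder expanded to the current directory.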

TODO

Add the multimodal VLM evaluations (planned for an upcoming commit).