Streaming Attention Approximation via Discrepancy Theory

This repository contains code for evaluating BalanceKV, the algorithm proposed in our NeurIPS 2025 spotlight paper (https://arxiv.org/abs/2502.07861), on the LongBench and Needle-in-a-Haystack (NIAH) benchmarks.

src/balanced_walk.py: Implements the balanced walk algorithm.
src/llama_forward.py: Contains functions for running inference with the Llama model and patching in our custom KV cache.
src/metrics_longbench.py: Defines various metrics used for LongBench evaluation.
src/single_layer_approx.ipynb: Notebook with the ablation experiments on single-layer attention approximation.
run_longbench.py: Main script for running the LongBench evaluation.
run_needle_in_haystack.py: Main script for running the NIAH evaluation.
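
For orientation, the idea behind a balanced walk can be sketched in a few lines. The snippet below is a generic self-balancing walk for vector balancing, not the code in src/balanced_walk.py. The function name self_balancing_walk, the threshold c, the seed argument, and the clipping of the sign probability are all assumptions made for illustration. Each vector receives a random sign, biased so that the running signed sum stays small.

```python
import numpy as np


def self_balancing_walk(vectors, c=30.0, seed=0):
    """Assign a sign in {+1, -1} to each row so the signed sum stays small.

    Hypothetical helper for illustration; `c` is an assumed threshold.
    """
    rng = np.random.default_rng(seed)
    vectors = np.asarray(vectors, dtype=float)
    w = np.zeros(vectors.shape[1])            # running signed sum
    signs = np.empty(len(vectors), dtype=int)
    for i, v in enumerate(vectors):
        # Bias the coin against the direction in which w already points.
        p_plus = 0.5 - np.dot(w, v) / (2.0 * c)
        # Clip to [0, 1] instead of aborting; this is a simplification.
        p_plus = min(1.0, max(0.0, p_plus))
        signs[i] = 1 if rng.random() < p_plus else -1
        w += signs[i] * v
    return signs, w


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 16))
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm rows
    signs, w = self_balancing_walk(X)
    print("kept rows:", int((signs == 1).sum()), "norm of signed sum:", np.linalg.norm(w))
```

The returned w is the difference between the sum of the rows signed +1 and the sum of the rows signed -1, so a small w means the two groups have nearly equal sums.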

Requirements

Install requirements via

pip install -r requirements.txt

To run the LongBench evaluation, use the following command (the default model is "meta-llama/Llama-3.1-8B-Instruct"):

python run_longbench.py --kv_type weightedbw --datasets qasper --e

To run the NIAH evaluation, use the following command (the default model is "meta-llama/Llama-3.1-8B-Instruct"):

python run_needle_in_haystack.py --kv_type "weightedbw" --haystack_dir "<CurrentPath>/data/PaulGrahamEssays"
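
The <CurrentPath> placeholder stands for the absolute path of the repository checkout. As a convenience, the small sketch below fills it in from the current working directory and prints the resulting command. The helper name build_niah_command and its arguments are hypothetical and not part of this repository; only the flags shown in the command above are used.

```python
import os
import shlex


def build_niah_command(repo_root=None, kv_type="weightedbw"):
    """Return the NIAH command as an argument list with an absolute haystack path.

    Hypothetical helper; `repo_root` defaults to the current working directory.
    """
    root = os.path.abspath(repo_root or os.getcwd())
    haystack_dir = os.path.join(root, "data", "PaulGrahamEssays")
    return [
        "python", "run_needle_in_haystack.py",
        "--kv_type", kv_type,
        "--haystack_dir", haystack_dir,
    ]


if __name__ == "__main__":
    # Print a shell-quoted command that can be pasted into a terminal.
    print(shlex.join(build_niah_command()))
```

Run it from the repository root so that the working directory matches the location of data/PaulGrahamEssays.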

Multimodal VLM evaluations will be added in an upcoming commit.
