This repository contains code for evaluating BalanceKV, the algorithm proposed in our NeurIPS 2025 spotlight paper (https://arxiv.org/abs/2502.07861), on the LongBench and Needle-in-a-Haystack (NIAH) benchmarks.
- src/balanced_walk.py: Implements the balanced walk algorithm.
- src/llama_forward.py: Contains functions for inference using the Llama model and patching our custom KV cache.
- src/metrics_longbench.py: Defines various metrics used for LongBench evaluation.
- src/single_layer_approx.ipynb: Notebook containing ablation study experiments regarding single layer attention approximation.
- run_longbench.py: Main script for running the LongBench evaluation.
- run_needle_in_haystack.py: Main script for running the NIAH evaluation.
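For orientation, here is a minimal sketch of a generic self-balancing walk from discrepancy theory, assuming that is what "balanced walk" refers to. It is not the code in src/balanced_walk.py: the function name, the `c` threshold parameter, and the clamping step are illustrative assumptions, and how BalanceKV turns the resulting signs into a compressed KV cache is described in the paper, not here.

```python
import random


def self_balancing_walk(vectors, c, rng=None):
    """Assign a sign in {+1, -1} to each vector so the signed running sum stays small.

    `vectors` is a list of equal-length lists of floats (ideally with norm <= 1).
    `c` is a positive threshold (hypothetical parameter name); larger values
    make each choice closer to a fair coin flip.
    """
    rng = rng or random.Random()
    w = [0.0] * len(vectors[0])  # running signed sum of the vectors seen so far
    signs = []
    for v in vectors:
        dot = sum(wi * vi for wi, vi in zip(w, v))
        # Assumption: clamp so the probability stays in [0, 1]. Textbook
        # analyses instead treat |dot| > c as a failure event.
        dot = max(-c, min(c, dot))
        # Prefer the sign that pushes the running sum back toward zero.
        p_plus = 0.5 - dot / (2.0 * c)
        sign = 1 if rng.random() < p_plus else -1
        signs.append(sign)
        w = [wi + sign * vi for wi, vi in zip(w, v)]
    return signs, w
```

As a sanity check, feeding two identical unit vectors with `c=1.0` always yields opposite signs, because the second step sees an inner product of magnitude `c` and so picks the cancelling sign with probability 1.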
Install requirements via
pip install -r requirements.txt
To run the LongBench evaluation, use the following command (the default model is "meta-llama/Llama-3.1-8B-Instruct"):
python run_longbench.py --kv_type weightedbw --datasets qasper --e
To run the NIAH evaluation, use the following command (the default model is "meta-llama/Llama-3.1-8B-Instruct"):
python run_needle_in_haystack.py --kv_type "weightedbw" --haystack_dir "<CurrentPath>/data/PaulGrahamEssays"
Multimodal VLM evaluations will be added in an upcoming commit.
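To evaluate several LongBench datasets in a row, the LongBench command above could be wrapped in a small driver. The sketch below is a hypothetical helper, not part of the repository. It uses only the `--kv_type` and `--datasets` flags that appear in full in that command, and it assumes the script is invoked once per dataset from the repository root.

```python
import subprocess
import sys


def longbench_command(dataset, kv_type="weightedbw", python=sys.executable):
    """Build the argument list for one LongBench run (does not execute it)."""
    return [python, "run_longbench.py", "--kv_type", kv_type, "--datasets", dataset]


def run_all(datasets, kv_type="weightedbw"):
    """Run the evaluation once per dataset, stopping at the first failure."""
    for name in datasets:
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(longbench_command(name, kv_type), check=True)
```

Calling `run_all(["qasper"])` would launch the same run as the single command, minus any flags not listed here.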