Latest News 🔥
- [2025/05/28]: We released our paper on arXiv:2505.21889 🥳
We use conda to set up the environment. Please install conda before executing the following instructions.
source scripts/setup.sh
After setting up the environment, execute the remaining instructions inside the conda env euro_par_artifact.
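If the setup script does not activate the environment for you, a minimal sketch of the activation step (the env name euro_par_artifact is taken from the note above):

conda activate euro_par_artifact
conda env list   # verify that euro_par_artifact is the active environment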
Please make sure the models (llama, llama-enhance, deepseek, deepseek-enhance) are downloaded under the models directory, as sketched below.
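A sketch of the expected layout (the exact subdirectory names are an assumption; adjust them to match how the launch scripts locate each model):

models/
├── llama/
├── llama-enhance/
├── deepseek/
└── deepseek-enhance/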
export MODEL=llama # choose from [llama, llama-enhance, deepseek, deepseek-enhance]
./scripts/launch_server.sh
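Before benchmarking, you can check that the server is reachable. A hedged sketch assuming the script starts a vLLM OpenAI-compatible server on port 8000 (use whatever host and port launch_server.sh actually configures):

curl http://localhost:8000/v1/models   # lists the model currently served by vLLM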
export MODEL=llama # choose from [llama, llama-enhance, deepseek, deepseek-enhance]
./scripts/launch_prefix_server.sh
Launch the vLLM server before running the following commands.
python benchmark/async_benchmark_humaneval.py --model llama
python benchmark/async_benchmark_humaneval.py --model llama --use-EFIM
python benchmark/async_benchmark_cceval.py --model llama
python benchmark/async_benchmark_cceval.py --model llama --use-EFIM
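To capture the four runs for the currently served model in one go, a minimal sketch (the log file names are an assumption; tee only mirrors stdout to a file):

for bench in humaneval cceval; do
  python benchmark/async_benchmark_${bench}.py --model llama | tee ${bench}_baseline.log
  python benchmark/async_benchmark_${bench}.py --model llama --use-EFIM | tee ${bench}_efim.log
done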
Launch the vLLM server before running the following commands.
python benchmark/async_benchmark_inference_speed.py --model llama --num-round 5 --num-user 16
python benchmark/async_benchmark_inference_speed.py --model llama-enhance --num-round 5 --num-user 16 --use-EFIM
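To see how the measured speed scales with concurrency, a sketch that sweeps --num-user for one model (the sweep values are an assumption; the flags are taken from the commands above):

for users in 1 4 16 64; do
  python benchmark/async_benchmark_inference_speed.py --model llama --num-round 5 --num-user ${users}
done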
Q1: How do I resolve the AssertionError raised at assert completions[idx].success?
A1: One possible fix is to increase the allowed number of open files, e.g. via ulimit -n 65535.
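A minimal sketch: check the current limit and raise it in the same shell before launching the server and the benchmark scripts (the new limit only applies to processes started from that shell):

ulimit -n         # print the current soft limit on open files
ulimit -n 65535   # raise the limit for this shell session and its children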
@misc{guo2025efimefficientservingllms,
title={EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse},
author={Tianyu Guo and Hande Dong and Yichong Leng and Feng Liu and Cheater Lin and Nong Xiao and Xianwei Zhang},
year={2025},
eprint={2505.21889},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.21889},
}