CalibQuant: 1-Bit KV Cache Quantization via Calibration for Multimodal LLMs

Code for the paper "CalibQuant: 1-Bit KV Cache Quantization via Calibration for Multimodal LLMs"

Authors: Insu Han, Zeliang Zhang, Zhiyuan Wang, Yifan Zhu, Susan Liang, Jiani Liu, Haiting Lin, Mingjie Zhao, Chenliang Xu, Kun Wan, Wentian Zhao

This repository provides a guide for setting up and running InternVL with CalibQuant for efficient inference.

Installation

  1. Install the required packages (e.g., InternVL, Triton):
pip install internvl triton==3.2.0
  2. Download the InternVL2.5-26B or InternVL2.5-8B model from HuggingFace.
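The download step can be scripted with the huggingface_hub client. This is a minimal sketch; the repo id "OpenGVLab/InternVL2_5-8B" is an assumption here — substitute the 26B variant (or whichever size you chose) as needed.

```python
# Minimal sketch of the model download step using huggingface_hub.
# The repo id is assumed -- replace it with the model size you want.
from huggingface_hub import snapshot_download

# Downloads the full model snapshot and returns the local cache path.
local_dir = snapshot_download(repo_id="OpenGVLab/InternVL2_5-8B")
print(local_dir)
```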

Modify Parameters

  1. Change the batch size (line 104 in infer.py).
  2. Set the bit number (line 13 in calibquant.py).

Run Inference

python infer.py
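The bit number set in calibquant.py controls the precision of the quantized KV cache. As a rough illustration of what low-bit KV cache quantization does — a generic per-channel uniform quantizer, not the paper's calibrated method, which further adjusts the dequantization parameters — a sketch might look like:

```python
# Generic per-channel uniform KV quantization sketch (NOT the paper's
# calibrated method; function and variable names are illustrative only).
import numpy as np

def quantize_kv(x, n_bits=1):
    """Quantize a KV tensor of shape (seq_len, head_dim) per channel."""
    levels = 2 ** n_bits - 1
    xmin = x.min(axis=0, keepdims=True)
    xmax = x.max(axis=0, keepdims=True)
    scale = (xmax - xmin) / levels
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero channels
    q = np.clip(np.round((x - xmin) / scale), 0, levels).astype(np.uint8)
    return q, scale, xmin

def dequantize_kv(q, scale, xmin):
    """Reconstruct an approximate float tensor from the quantized cache."""
    return q.astype(np.float32) * scale + xmin
```

At n_bits=1 every entry collapses to its channel's minimum or maximum, which is why the calibration step in the paper matters for preserving attention quality.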

Notes

  • Ensure all dependencies are installed before running the script.
  • Adjust parameters for optimal performance on your hardware.
  • If you encounter issues, refer to the official documentation or repository.

Citation

@article{han2025calibquant,
  title={CalibQuant: 1-Bit KV Cache Quantization via Calibration for Multimodal LLMs},
  author={Han, Insu and Zhang, Zeliang and Zhu, Yifan and Liang, Susan and Wang, Zhiyuan and Liu, Jiani and Lin, Haiting and Zhao, Mingjie and Xu, Chenliang and Wan, Kun and Zhao, Wentian},
  journal={arXiv preprint arXiv:2502.14882},
  year={2025}
}
