SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs

This repository contains the PyTorch implementation of SparseMM (ICCV 2025).

Project Page | arXiv Paper

Introducing SparseMM

We investigate how MLLMs process visual inputs by analyzing their attention mechanisms and reveal a surprising sparsity phenomenon: only a small subset (less than about 5%) of attention heads in the LLM actively contributes to visual understanding. We term these Visual Heads. To identify them efficiently, we design a training-free framework that quantifies head-level visual relevance through targeted response analysis.
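
The snippet below is a minimal sketch of the idea behind head-level visual-relevance scoring, assuming a simple "attention mass on visual tokens" metric; the function and tensor names are hypothetical and this is not the repository's exact scoring code.

import torch

def visual_head_scores(attn, visual_token_mask):
    # attn: (layers, heads, queries, keys) attention weights gathered while the
    # model responds to a query about a visual concept.
    # visual_token_mask: (keys,) bool mask that is True for image tokens.
    mass_on_visual = attn[..., visual_token_mask].sum(dim=-1)  # mass on image tokens per query
    return mass_on_visual.mean(dim=-1)  # average over query positions -> (layers, heads)

# Toy example with random attention: rank heads and keep the top ~5% as visual heads.
L, H, Q, K = 4, 8, 16, 128
attn = torch.softmax(torch.randn(L, H, Q, K), dim=-1)
mask = torch.zeros(K, dtype=torch.bool)
mask[:64] = True  # pretend the first 64 key positions are image tokens
scores = visual_head_scores(attn, mask)
top_k = max(1, int(0.05 * scores.numel()))
top = torch.topk(scores.flatten(), top_k).indices
print("visual heads (layer, head):", [(i // H, i % H) for i in top.tolist()])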

Building on this discovery, we introduce SparseMM, a KV-Cache optimization strategy that allocates asymmetric computation budgets to heads in the LLM based on their visual scores, leveraging the sparsity of visual heads to accelerate MLLM inference. Unlike prior KV-Cache acceleration methods that ignore the particularity of visual information, SparseMM prioritizes retaining visual semantics during decoding.
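
Below is a minimal sketch of what score-based, asymmetric KV-Cache budgeting can look like, assuming a simple proportional policy with a small per-head floor; the function name and policy details are illustrative, not SparseMM's exact implementation.

import torch

def allocate_kv_budget(scores, total_budget, min_per_head=4):
    # scores: (heads,) visual scores for one layer's heads.
    # Every head gets a small uniform floor of cache slots; the remaining
    # slots are split in proportion to the visual scores.
    num_heads = scores.numel()
    floor = torch.full((num_heads,), min_per_head, dtype=torch.long)
    remaining = total_budget - int(floor.sum())
    weights = scores / scores.sum()
    extra = torch.floor(weights * remaining).long()
    return floor + extra  # per-head KV-cache budget, summing to <= total_budget

scores = torch.tensor([0.02, 0.01, 0.55, 0.03, 0.30, 0.02, 0.05, 0.02])
print(allocate_kv_budget(scores, total_budget=256))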

Main idea of Visual Head

(figure: Visual_Head.png)

SparseMM for MLLM Acceleration

(figure: SparseMM.png)

Main Results

Results on Multi-modal Benchmarks

(figure: main_result.png)

Efficiency Evaluation for SparseMM

(figure: efficiency.png)

Visualization of Visual Head

(figure: viz.png)

Get Started

Install

  1. Clone this repository:
git clone https://github.com/CR400AF-A/SparseMM.git
cd SparseMM
  2. Initialize your environment:
conda create -n sparsemm python=3.10 -y
conda activate sparsemm
  3. Install packages

Compile the CUDA code for Flatten Cache Storage. If you encounter a CUDA compile error, check your GPU virtual architecture and GPU features (i.e., your GPU's compute capability), then change the corresponding compile flags in csrc/build.py; an illustrative example is given after the commands below.

pip install packaging torch==2.5.1
pip uninstall ninja && pip cache purge && pip install ninja --no-cache-dir
cd csrc && make
cd ..
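
For reference, nvcc architecture flags typically look like the following; the variable name here is hypothetical, so check csrc/build.py for the actual flag definition and match the values to your GPU.

# Illustrative nvcc architecture flags (the variable name is hypothetical;
# edit whatever flag list csrc/build.py actually defines). Match the values to
# your GPU's compute capability, e.g. sm_80 (A100), sm_89 (RTX 4090), sm_90 (H100).
cuda_arch_flags = [
    "-gencode", "arch=compute_80,code=sm_80",
]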

Install other packages

pip install -e .
pip install flash-attn==2.4.1 --no-build-isolation # currently only FlashAttention is supported
pip install qwen-vl-utils
  4. Install lmms-eval for evaluation:
cd lmms-eval
pip install -e .
cd ..

Chase Visual Head

  1. Download the datasets:
huggingface-cli download --repo-type dataset --resume-download nnethercott/synthdog-en-detection --local-dir /path/to/datasets/synthdog-en-detection

huggingface-cli download --repo-type dataset --resume-download detection-datasets/coco --local-dir /path/to/datasets/coco
  2. Process the datasets:
python3 scripts/chase_visual_head/process_data.py
python3 scripts/chase_visual_head/process_data_coco.py
  3. Chase visual heads:
bash scripts/chase_visual_head/llava.sh
bash scripts/chase_visual_head/llava_coco.sh
bash scripts/chase_visual_head/qwen.sh

Eval

bash scripts/eval/llava.sh
bash scripts/eval/mistral.sh
bash scripts/eval/qwen.sh

Viz

bash scripts/others/viz.sh

Speed and Memory

bash scripts/others/speed_and_memory.sh

Citation

If you find this repository useful, please consider citing:

@article{wang2025sparsemm,
  title={SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs},
  author={Wang, Jiahui and Liu, Zuyan and Rao, Yongming and Lu, Jiwen},
  journal={arXiv preprint arXiv:2506.05344},
  year={2025}
}

Acknowledgement

  • Our codebase is built on AdaKV and PyramidKV.

  • Thanks to the lmms-eval team for building such a useful evaluation system!
