Reference

All of the code is built on the official LLaVA-1.5 code. You only need to change a few parts.

Part1

LLaVA/llava/model/multimodal_projector/builder.py: lines 290-309 decide whether to use the AGV-PR or the C-Abstractor. Lines 241-243 select one of three options: (1) use the AGV-PR, (2) use the original PR, (3) skip the PR.

Part2

LLaVA/llava/model/multimodal_encoder/clip_encoder.py: lines 40-58 implement our proposed anchor selector. Lines 60-73 implement the pooling strategy.
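To make the difference between the two options concrete, here is a minimal, dependency-free sketch. It is not the code in clip_encoder.py. It assumes that anchors are picked by ranking patch tokens by the attention they receive from the [CLS] token, and that pooling means non-overlapping average pooling over the token sequence. Both are assumptions, and the function names `select_anchor_indices` and `average_pool_tokens` are hypothetical.

```python
def select_anchor_indices(cls_attention, num_anchors):
    """Pick the patch tokens with the highest [CLS] attention (assumed criterion).

    cls_attention: one score per patch token.
    Returns the chosen indices in their original spatial order.
    """
    ranked = sorted(range(len(cls_attention)),
                    key=lambda i: cls_attention[i], reverse=True)
    return sorted(ranked[:num_anchors])


def average_pool_tokens(tokens, window):
    """Average non-overlapping groups of `window` consecutive tokens (assumed pooling)."""
    pooled = []
    for start in range(0, len(tokens), window):
        group = tokens[start:start + window]
        dim = len(group[0])
        pooled.append([sum(tok[d] for tok in group) / len(group)
                       for d in range(dim)])
    return pooled


# Toy data: 5 attention scores and 4 two-dimensional tokens.
scores = [0.05, 0.40, 0.10, 0.30, 0.15]
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

print(select_anchor_indices(scores, 2))   # [1, 3]
print(average_pool_tokens(tokens, 2))     # [[2.0, 3.0], [6.0, 7.0]]
```

The selector keeps existing tokens unchanged, while pooling produces new averaged tokens. Either way the visual sequence gets shorter.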

Part3

LLaVA/llava/model/llava_arch.py: line 142 passes the selected anchors to the cross-attention module.
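As a rough illustration of what the cross-attention module does with the anchors, the sketch below treats them as queries that aggregate information from all visual tokens. It is a simplified, single-head version with no learned projections. The query role of the anchors and the function name `cross_attend` are assumptions, not taken from llava_arch.py.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(anchors, tokens):
    """Scaled dot-product cross-attention (sketch).

    anchors: query vectors (assumed role of the selected anchors).
    tokens:  all visual tokens, used as both keys and values.
    Returns one aggregated vector per anchor.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for query in anchors:
        logits = [sum(q * k for q, k in zip(query, key)) * scale
                  for key in tokens]
        weights = softmax(logits)
        outputs.append([sum(w * tok[d] for w, tok in zip(weights, tokens))
                        for d in range(dim)])
    return outputs


# Toy data: 3 visual tokens, 2 of which were selected as anchors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
anchors = [tokens[0], tokens[2]]
aggregated = cross_attend(anchors, tokens)
print(len(aggregated), len(aggregated[0]))  # 2 2
```

The output has one vector per anchor, so the number of anchors sets the number of visual tokens that reach the language model.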

Reference

LLaVA

Citation

@article{liu2024visual,
  title={Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model},
  author={Liu, Haogeng and You, Quanzeng and Han, Xiaotian and Liu, Yongfei and Huang, Huaibo and He, Ran and Yang, Hongxia},
  journal={arXiv preprint arXiv:2405.17815},
  year={2024}
}

liuhaogeng/Anchor-Former

Folders and files

Latest commit

History

Repository files navigation

Part1

Part2

Part3

Reference

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages