This repository provides the implementation of Layer Contrastive Decoding (LayerCD), along with evaluation scripts for the POPE and MME benchmarks.
- Clone the Repository

```bash
git clone [email protected]:maifoundations/LayerCD.git
cd LayerCD
```

- Configure Environment
Set up the environment according to the requirements of the model you want to use with LayerCD (e.g., LLaVA, Cambrian, Molmo). Please refer to the documentation of your chosen model for installation instructions.
- Benchmarks
If you plan to use the POPE benchmark:
- Download the POPE image dataset.
- Update the `IMAGE_BASE` path in `util/constant.py`.
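For reference, a minimal sketch of what this constant might look like (the actual layout of `util/constant.py` may differ; the path below is a placeholder):

```python
# util/constant.py (illustrative; adjust to your local setup)
# Root directory containing the downloaded POPE images.
IMAGE_BASE = "/path/to/pope/images"
```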
- Model Weights
- Update the `MODEL_ZOO` dictionary in `util/constant.py` with the paths to your model checkpoints.
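A hedged sketch of what such a dictionary might look like (keys and checkpoint paths below are placeholders; the keys should match the `--model_type` values passed to `eval.py`):

```python
# util/constant.py (illustrative; keys and checkpoint paths are placeholders)
MODEL_ZOO = {
    "LLaVA": "/path/to/llava-v1.5-7b",
    "Cambrian": "/path/to/cambrian-8b",
    "Molmo": "/path/to/molmo-7b",
}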
- Using Custom Models
- To apply LayerCD to your own model, check the function `evolve_cd_sampling` in `util/cd_utils.py`.
- Modify the image feature extraction logic to match your model's visual encoder; see the sketch below.
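As a starting point, here is a hypothetical sketch of how a custom model's visual encoder could expose features from different layers for contrasting. It assumes a Hugging Face CLIP-style vision tower; the model name, layer index, and function name are illustrative, not the repository's actual API:

```python
# Hypothetical adapter: not the repository's actual API.
# Shows one way to pull shallow- and deep-layer visual features
# from a CLIP-style vision tower for layer-contrastive decoding.
import torch
from transformers import CLIPImageProcessor, CLIPVisionModel

def extract_layer_features(image, tower="openai/clip-vit-large-patch14", shallow_layer=2):
    """`image` is a PIL image; returns (shallow, deep) patch features."""
    processor = CLIPImageProcessor.from_pretrained(tower)
    encoder = CLIPVisionModel.from_pretrained(tower)
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**inputs, output_hidden_states=True)
    # hidden_states[0] is the patch embedding; later entries are transformer layers.
    shallow = out.hidden_states[shallow_layer]   # early-layer visual features
    deep = out.hidden_states[-1]                 # final-layer visual features
    return shallow, deep
```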
Example: running evaluation on POPE:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python eval.py \
    --dataset=POPE \                # Dataset: POPE or MME
    [--POPE_sampling_type=coco] \   # POPE sampling set (required for POPE)
    [--POPE_type=popular] \         # POPE data type (required for POPE)
    --batch_size=8 \                # Inference batch size
    --model_type=Molmo \            # Model type: LLaVA, Cambrian, Molmo, or custom
    --seed=$seed                    # Random seed
```

After evaluation, compute the final results with:
```bash
python util/compute_results.py --dataset=POPE   # Dataset: POPE or MME
```

If you find this work useful, please consider citing:
```bibtex
@misc{tong2025mitigatinghallucinationmultimodalllms,
      title={Mitigating Hallucination in Multimodal LLMs with Layer Contrastive Decoding},
      author={Bingkui Tong and Jiaer Xia and Kaiyang Zhou},
      year={2025},
      eprint={2509.25177},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.25177},
}
```