Mitigating Hallucination in Multimodal LLMs with Layer Contrastive Decoding

This repository provides the implementation of Layer Contrastive Decoding (LayerCD), along with evaluation scripts on the POPE dataset.
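The core idea can be sketched in a few lines: decode by contrasting the next-token logits the model produces with deep (final-layer) visual features against the logits it produces with shallow (early-layer) features. The function name and the weight `alpha` below are illustrative, not the repository's API; see `util/cd_utils.py` for the actual implementation.

```python
import torch

def layer_contrastive_logits(logits_deep: torch.Tensor,
                             logits_shallow: torch.Tensor,
                             alpha: float = 1.0) -> torch.Tensor:
    """Contrastive-decoding-style combination (illustrative):
    amplify tokens supported by deep visual features and penalize
    tokens that shallow features alone would already predict."""
    return (1 + alpha) * logits_deep - alpha * logits_shallow

# Toy example: token 0 is strongly supported only by deep features,
# so the contrastive distribution shifts probability toward it.
deep = torch.tensor([2.0, 0.5])
shallow = torch.tensor([0.5, 0.5])
probs = torch.softmax(layer_contrastive_logits(deep, shallow), dim=-1)
```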


⚙️ Environment Setup

  1. Clone the Repository
git clone git@github.com:maifoundations/LayerCD.git
cd LayerCD
  2. Configure the Environment

Set up the environment according to the requirements of the model you want to use with LayerCD (e.g., LLaVA, Cambrian, Molmo). Please refer to the documentation of your chosen model for installation instructions.

  3. Benchmarks

If you plan to use the POPE benchmark:

  • Download the POPE image dataset.
  • Update the IMAGE_BASE path in util/constant.py.
  4. Model Weights
  • Update the MODEL_ZOO dictionary in util/constant.py with the paths to your model checkpoints.
  5. Using Custom Models
  • To apply LayerCD to your own model, check the function evolve_cd_sampling in util/cd_utils.py.
  • Modify the image feature extraction logic to match your model’s visual encoder.
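Putting steps 3 and 4 together, the two constants in `util/constant.py` might look like the following. All paths and dictionary values here are placeholders for your local setup, not values shipped with the repository.

```python
# util/constant.py (illustrative values; substitute your own paths)

# Root directory containing the POPE benchmark images.
IMAGE_BASE = "/data/datasets/coco/val2014"

# Map from --model_type name to the local checkpoint path.
MODEL_ZOO = {
    "LLaVA":    "/checkpoints/llava-v1.5-7b",
    "Cambrian": "/checkpoints/cambrian-8b",
    "Molmo":    "/checkpoints/molmo-7b-d",
}
```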

🚀 Running Evaluation

Example: running evaluation on POPE:

CUDA_VISIBLE_DEVICES=0,1,2,3 \
python eval.py \
  --dataset=POPE \
  --POPE_sampling_type=coco \
  --POPE_type=popular \
  --batch_size=8 \
  --model_type=Molmo \
  --seed=$seed

Flags:

  • --dataset: POPE or MME
  • --POPE_sampling_type: POPE sampling set (required when --dataset=POPE)
  • --POPE_type: POPE data type (required when --dataset=POPE)
  • --batch_size: inference batch size
  • --model_type: LLaVA, Cambrian, Molmo, or a custom model
  • --seed: random seed

📊 Computing Results

After evaluation, compute the final results with:

python util/compute_results.py --dataset=POPE   # Dataset: POPE or MME
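For POPE, the reported numbers are the standard binary-classification metrics over the model's yes/no answers. As a reference, that computation can be sketched independently of the repository's script (the function name is illustrative):

```python
def pope_metrics(preds, labels):
    """Accuracy, precision, recall, and F1 for POPE-style yes/no answers.
    `preds` and `labels` are equal-length lists of 'yes'/'no' strings."""
    tp = sum(p == "yes" and l == "yes" for p, l in zip(preds, labels))
    fp = sum(p == "yes" and l == "no" for p, l in zip(preds, labels))
    fn = sum(p == "no" and l == "yes" for p, l in zip(preds, labels))
    tn = sum(p == "no" and l == "no" for p, l in zip(preds, labels))
    acc = (tp + tn) / len(labels)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1

# One false positive out of four answers:
# acc=0.75, precision=0.5, recall=1.0
acc, prec, rec, f1 = pope_metrics(["yes", "no", "yes", "no"],
                                  ["yes", "no", "no", "no"])
```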

📌 Citation

If you find this work useful, please consider citing:

@misc{tong2025mitigatinghallucinationmultimodalllms,
      title={Mitigating Hallucination in Multimodal LLMs with Layer Contrastive Decoding}, 
      author={Bingkui Tong and Jiaer Xia and Kaiyang Zhou},
      year={2025},
      eprint={2509.25177},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.25177}, 
}
