# Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models
This is the official repository for Decoupling Contrastive Decoding (DCD), a training-free method for robust hallucination mitigation in Multimodal Large Language Models (MLLMs).
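At a high level, contrastive decoding adjusts next-token logits by contrasting forward passes that are steered toward faithful versus hallucination-prone behavior (here, via the positive and negative projectors described below). The snippet that follows is only a minimal, generic sketch of such a decoupled logit combination; the function name, the `alpha`/`beta` weights, and the way the three passes are obtained are illustrative assumptions, not the repository's exact formulation.

```python
import torch

def decoupled_contrastive_logits(base_logits: torch.Tensor,
                                 positive_logits: torch.Tensor,
                                 negative_logits: torch.Tensor,
                                 alpha: float = 1.0,
                                 beta: float = 1.0) -> torch.Tensor:
    """Generic decoupled contrastive combination of next-token logits.

    base_logits:     logits from the unmodified forward pass
    positive_logits: logits from a pass routed through the positive projector
    negative_logits: logits from a pass routed through the negative projector
    alpha, beta:     independent weights for amplifying the positive contrast
                     and suppressing the negative one (hence "decoupled")
    """
    return (base_logits
            + alpha * (positive_logits - base_logits)
            - beta * (negative_logits - base_logits))
```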
- Create and activate conda environment:

```bash
conda create -n dcd python=3.9
conda activate dcd
```

- Install dependencies:

```bash
cd DCD
pip install -r requirements.txt
```

Note: For LLaVA-1.5, use `transformers==4.31.0`. For Qwen2.5-VL, use `transformers==4.51.1`.
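Since the two backbones pin different `transformers` releases, a quick check like the following (a small sketch, not part of the repository) can confirm the active environment matches the model you intend to run; the expected versions are taken from the note above.

```python
# Sanity-check the installed transformers version against the pinned
# versions noted above (4.31.0 for LLaVA-1.5, 4.51.1 for Qwen2.5-VL).
from importlib.metadata import version

EXPECTED = {"LLaVA-1.5": "4.31.0", "Qwen2.5-VL": "4.51.1"}
installed = version("transformers")
print(f"transformers {installed} is installed")
for model, pinned in EXPECTED.items():
    status = "matches" if installed == pinned else "does NOT match"
    print(f"  {model}: expects {pinned} -> {status}")
```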
- Download base models:
  - LLaVA-1.5-7B: Download from LLaVA
  - Qwen2.5-VL-3B: Available on HuggingFace
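If you prefer to script the downloads, a minimal sketch with `huggingface_hub` is shown below; the repo IDs are the public Hugging Face releases of these checkpoints and the local directories are illustrative, so adjust both to whatever paths the evaluation and training scripts expect.

```python
# Fetch the base checkpoints from the Hugging Face Hub.
# Local directories are placeholders; point them wherever your scripts expect.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="liuhaotian/llava-v1.5-7b",
                  local_dir="checkpoints/llava-v1.5-7b")
snapshot_download(repo_id="Qwen/Qwen2.5-VL-3B-Instruct",
                  local_dir="checkpoints/Qwen2.5-VL-3B-Instruct")
```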
- Download DCD checkpoints:
  - Place trained positive/negative projectors in the `./DCD_ckpt/` directory
  - Structure:

```
DCD_ckpt/
├── negative/
│   ├── llava_rlaifv_mm_projector.bin
│   ├── rlaifv_3B_negative.pth
│   └── ...
└── positive/
    ├── llava_rlaifv_mm_projector.bin
    ├── rlaifv_3B_positive.pth
    └── ...
```
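To verify the checkpoints landed in the right place, a small inspection sketch is given below; it only assumes the projector `.bin` files are plain PyTorch state dicts (as LLaVA-style `mm_projector.bin` files usually are) and makes no assumption about the key names inside.

```python
# Inspect the downloaded projector weights: print each tensor name and shape.
import torch

for path in ["DCD_ckpt/positive/llava_rlaifv_mm_projector.bin",
             "DCD_ckpt/negative/llava_rlaifv_mm_projector.bin"]:
    state = torch.load(path, map_location="cpu")
    print(path)
    for name, tensor in state.items():
        print(f"  {name}: {tuple(tensor.shape)}")
```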
- POPE Evaluation:

```bash
# Download COCO validation images
mkdir -p eval/data/coco
# Place images in eval/data/coco/val2014/
# POPE annotations are in eval/data/POPE/coco/
```
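As a quick consistency check between the annotations and the images, the sketch below walks one POPE split and verifies every referenced image exists. It assumes the standard POPE jsonl format (`image`, `text`, `label` fields) and uses a split file name from the original POPE release, so adjust the path to whatever is actually shipped under `eval/data/POPE/coco/`.

```python
# Verify that every image referenced by a POPE split is present locally.
import json, os

ann_file = "eval/data/POPE/coco/coco_pope_adversarial.json"  # example split name
image_dir = "eval/data/coco/val2014"

with open(ann_file) as f:
    for line in f:
        item = json.loads(line)                      # one question per line
        image_path = os.path.join(image_dir, item["image"])
        assert os.path.exists(image_path), f"missing image: {image_path}"
        # item["text"] is the yes/no question, item["label"] the ground truth
```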
- Training Data:
  - Take the RLAIF-V dataset as an example:

```bash
cd train/data
python rlaiv_convert.py
```
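For intuition only, the sketch below shows the kind of conversion such a step typically performs: splitting RLAIF-V preference pairs into separate positive (chosen) and negative (rejected) response sets. It is not the repository's `rlaiv_convert.py`; the dataset field names (`question`, `chosen`, `rejected`) and the output file names are assumptions.

```python
# Hypothetical illustration of converting RLAIF-V preference pairs into
# positive/negative training records (field names assumed, not verified).
import json
from datasets import load_dataset

ds = load_dataset("openbmb/RLAIF-V-Dataset", split="train")

positive, negative = [], []
for ex in ds:
    # Image handling is omitted for brevity.
    positive.append({"question": ex["question"], "answer": ex["chosen"]})
    negative.append({"question": ex["question"], "answer": ex["rejected"]})

with open("rlaifv_positive.json", "w") as f:   # output names are placeholders
    json.dump(positive, f)
with open("rlaifv_negative.json", "w") as f:
    json.dump(negative, f)
```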
- Run POPE evaluation with LLaVA-1.5:

```bash
# Configure paths in eval/scripts/eval_llava_1_5_pope.sh
# Then run:
bash eval/scripts/eval_llava_1_5_pope.sh
```

- Run POPE evaluation with Qwen2.5-VL:

```bash
# Configure paths in eval/scripts/eval_qwen_2_5_vl_pope.sh
# Then run:
bash eval/scripts/eval_qwen_2_5_vl_pope.sh
```
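If you want to recompute the POPE metrics yourself from a model's answers, the sketch below shows the standard accuracy/precision/recall/F1 computation over yes/no responses. It assumes an answers file with one JSON object per line containing an `answer` field, which may differ from what the evaluation scripts actually write.

```python
# Standard POPE-style metrics over yes/no predictions.
import json

def pope_metrics(answer_file: str, label_file: str) -> dict:
    preds = ["yes" in json.loads(l)["answer"].lower() for l in open(answer_file)]
    labels = [json.loads(l)["label"] == "yes" for l in open(label_file)]
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum((not p) and y for p, y in zip(preds, labels))
    tn = sum((not p) and (not y) for p, y in zip(preds, labels))
    accuracy = (tp + tn) / max(len(labels), 1)
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-8)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```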
- Train the negative projector:

```bash
bash train/scripts/negative_train.sh
```

- Train the positive projector:

```bash
bash train/scripts/positive_train.sh
```

This project builds upon several excellent open-source projects: