[NeurIPS'2025] Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models

HKUST-LongGroup/DCD

Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models

This is the official repository for Decoupling Contrastive Decoding (DCD), a training-free method for robust hallucination mitigation in Multimodal Large Language Models (MLLMs).


🚀 Quick Start

Environment Setup

  1. Create and activate conda environment:
conda create -n dcd python=3.9
conda activate dcd
  2. Install dependencies:
cd DCD
pip install -r requirements.txt

Note: For LLaVA-1.5, use transformers==4.31.0. For Qwen2.5-VL, use transformers==4.51.1.
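
The version pins in the note above can be checked before running any script. The following is a minimal sketch, not part of this repository: the names `PINNED` and `check_transformers` are hypothetical, and the check only reads installed package metadata through the Python standard library.

```python
from importlib import metadata

# Version pins taken from the note above; the dictionary keys are illustrative labels.
PINNED = {
    "llava-1.5": "4.31.0",
    "qwen2.5-vl": "4.51.1",
}


def check_transformers(model_family, installed=None):
    """Return (ok, message) for the transformers pin of `model_family`.

    `installed` may be passed explicitly; otherwise it is looked up
    in the active environment.
    """
    expected = PINNED[model_family.lower()]
    if installed is None:
        try:
            installed = metadata.version("transformers")
        except metadata.PackageNotFoundError:
            return False, f"transformers is not installed; expected {expected}"
    if installed == expected:
        return True, f"transformers=={installed} matches the pin for {model_family}"
    return False, f"found transformers=={installed}, expected {expected} for {model_family}"


if __name__ == "__main__":
    for family in PINNED:
        ok, message = check_transformers(family)
        print("OK  " if ok else "FAIL", message)
```

Because the two models pin different versions, a separate conda environment per model is one way to keep both usable side by side.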

Model Preparation

  1. Download base models:

  2. Download DCD checkpoints:

    • Place the trained positive and negative projectors in the ./DCD_ckpt/ directory
    • Expected structure:
      DCD_ckpt/
      ├── negative/
      │   ├── llava_rlaifv_mm_projector.bin
      │   ├── rlaifv_3B_negative.pth
      │   └── ...
      └── positive/
          ├── llava_rlaifv_mm_projector.bin
          ├── rlaifv_3B_positive.pth
          └── ...
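
The layout above can be verified with a short script before launching evaluation. This is a sketch under stated assumptions: the helper `missing_checkpoints` and the `EXPECTED` mapping are hypothetical and not part of this repository, and only the file names shown in the tree are checked (the `...` entries are left out because they are not named).

```python
from pathlib import Path

# File names copied from the tree above; the "..." entries are not listed
# because the README does not name them.
EXPECTED = {
    "negative": ["llava_rlaifv_mm_projector.bin", "rlaifv_3B_negative.pth"],
    "positive": ["llava_rlaifv_mm_projector.bin", "rlaifv_3B_positive.pth"],
}


def missing_checkpoints(root="./DCD_ckpt", expected=EXPECTED):
    """Return the relative paths under `root` that do not exist as files."""
    root = Path(root)
    missing = []
    for subdir, names in expected.items():
        for name in names:
            if not (root / subdir / name).is_file():
                missing.append(f"{subdir}/{name}")
    return missing


if __name__ == "__main__":
    gaps = missing_checkpoints()
    print("all listed checkpoints found" if not gaps else f"missing: {gaps}")
```

If only one model is being evaluated, pass a smaller `expected` mapping that lists just the files that model needs.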
      

Dataset Preparation

  1. POPE Evaluation:

    # Download COCO validation images
    mkdir -p eval/data/coco
    # Place images in eval/data/coco/val2014/
    
    # POPE annotations are in eval/data/POPE/coco/
  2. Training Data:

    • Using the RLAIF-V dataset as an example:
    cd train/data
    python rlaiv_convert.py
    

🔧 Usage

Evaluation on POPE

Evaluate LLaVA-1.5 with DCD

# Configure paths in eval/scripts/eval_llava_1_5_pope.sh
# Then run:
bash eval/scripts/eval_llava_1_5_pope.sh

Evaluate Qwen2.5-VL with DCD

# Configure paths in eval/scripts/eval_qwen_2_5_vl_pope.sh
# Then run:
bash eval/scripts/eval_qwen_2_5_vl_pope.sh
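
POPE asks yes/no questions about object presence, and results are commonly summarized as accuracy, precision, recall, F1, and the ratio of "yes" answers. The sketch below computes these from two lists of answers. The function name `pope_metrics` and the plain-list input format are assumptions for illustration; they do not describe the output format of the evaluation scripts above.

```python
def pope_metrics(predictions, labels):
    """Compute yes/no metrics, treating "yes" as the positive class."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and of equal length")
    # Normalize answers to booleans: True means the answer is "yes".
    pred = [p.strip().lower() == "yes" for p in predictions]
    gold = [l.strip().lower() == "yes" for l in labels]

    tp = sum(p and g for p, g in zip(pred, gold))
    fp = sum(p and not g for p, g in zip(pred, gold))
    fn = sum(g and not p for p, g in zip(pred, gold))
    tn = sum(not p and not g for p, g in zip(pred, gold))

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(gold),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "yes_ratio": sum(pred) / len(pred),
    }
```

A high "yes" ratio combined with low precision usually indicates that the model affirms objects that are not in the image, which is the hallucination pattern POPE is designed to expose.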

Training DCD Projectors

Train Negative Projector (Qwen2.5-VL-3B)

bash train/scripts/negative_train.sh

Train Positive Projector (Qwen2.5-VL-3B)

bash train/scripts/positive_train.sh

Acknowledgments

This project builds upon several excellent open-source projects:

  • VCD: Visual Contrastive Decoding
  • LLaVA: Visual instruction tuning framework
  • MS-Swift: Model training framework
