# Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models
This is the official repository for Decoupling Contrastive Decoding (DCD), a training-free method for robust hallucination mitigation in Multimodal Large Language Models (MLLMs).
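At a high level, contrastive decoding adjusts next-token logits by contrasting forward passes that are steered toward faithful versus hallucination-prone behavior (here, via the positive and negative projectors described below). The snippet that follows is only a minimal, generic sketch of such a decoupled logit combination; the function name, the `alpha`/`beta` weights, and the way the three passes are obtained are illustrative assumptions, not the repository's exact formulation.

```python
import torch

def decoupled_contrastive_logits(base_logits: torch.Tensor,
                                 positive_logits: torch.Tensor,
                                 negative_logits: torch.Tensor,
                                 alpha: float = 1.0,
                                 beta: float = 1.0) -> torch.Tensor:
    """Generic decoupled contrastive combination of next-token logits.

    base_logits:     logits from the unmodified forward pass
    positive_logits: logits from a pass routed through the positive projector
    negative_logits: logits from a pass routed through the negative projector
    alpha, beta:     independent weights for amplifying the positive contrast
                     and suppressing the negative one (hence "decoupled")
    """
    return (base_logits
            + alpha * (positive_logits - base_logits)
            - beta * (negative_logits - base_logits))
```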
- Create and activate conda environment:

```bash
conda create -n dcd python=3.9
conda activate dcd
```

- Install dependencies:

```bash
cd DCD
pip install -r requirements.txt
```

Note: For LLaVA-1.5, use `transformers==4.31.0`. For Qwen2.5-VL, use `transformers==4.51.1`.
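Since the two backbones pin different `transformers` releases, a quick check like the following (a small sketch, not part of the repository) can confirm the active environment matches the model you intend to run; the expected versions are taken from the note above.

```python
# Sanity-check the installed transformers version against the pinned
# versions noted above (4.31.0 for LLaVA-1.5, 4.51.1 for Qwen2.5-VL).
from importlib.metadata import version

EXPECTED = {"LLaVA-1.5": "4.31.0", "Qwen2.5-VL": "4.51.1"}
installed = version("transformers")
print(f"transformers {installed} is installed")
for model, pinned in EXPECTED.items():
    status = "matches" if installed == pinned else "does NOT match"
    print(f"  {model}: expects {pinned} -> {status}")
```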
- Download base models:
  - LLaVA-1.5-7B: Download from LLaVA
  - Qwen2.5-VL-3B: Available on HuggingFace
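If you prefer to script the downloads, a minimal sketch with `huggingface_hub` is shown below; the repo IDs are the public Hugging Face releases of these checkpoints and the local directories are illustrative, so adjust both to whatever paths the evaluation and training scripts expect.

```python
# Fetch the base checkpoints from the Hugging Face Hub.
# Local directories are placeholders; point them wherever your scripts expect.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="liuhaotian/llava-v1.5-7b",
                  local_dir="checkpoints/llava-v1.5-7b")
snapshot_download(repo_id="Qwen/Qwen2.5-VL-3B-Instruct",
                  local_dir="checkpoints/Qwen2.5-VL-3B-Instruct")
```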
- Download DCD checkpoints:
  - Place trained positive/negative projectors in the `./DCD_ckpt/` directory
  - Structure:

```
DCD_ckpt/
├── negative/
│   ├── llava_rlaifv_mm_projector.bin
│   ├── rlaifv_3B_negative.pth
│   └── ...
└── positive/
    ├── llava_rlaifv_mm_projector.bin
    ├── rlaifv_3B_positive.pth
    └── ...
```
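To verify the checkpoints landed in the right place, a small inspection sketch is given below; it only assumes the projector `.bin` files are plain PyTorch state dicts (as LLaVA-style `mm_projector.bin` files usually are) and makes no assumption about the key names inside.

```python
# Inspect the downloaded projector weights: print each tensor name and shape.
import torch

for path in ["DCD_ckpt/positive/llava_rlaifv_mm_projector.bin",
             "DCD_ckpt/negative/llava_rlaifv_mm_projector.bin"]:
    state = torch.load(path, map_location="cpu")
    print(path)
    for name, tensor in state.items():
        print(f"  {name}: {tuple(tensor.shape)}")
```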
- POPE Evaluation:

```bash
# Download COCO validation images
mkdir -p eval/data/coco
# Place images in eval/data/coco/val2014/
# POPE annotations are in eval/data/POPE/coco/
```
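As a quick consistency check between the annotations and the images, the sketch below walks one POPE split and verifies every referenced image exists. It assumes the standard POPE jsonl format (`image`, `text`, `label` fields) and uses a split file name from the original POPE release, so adjust the path to whatever is actually shipped under `eval/data/POPE/coco/`.

```python
# Verify that every image referenced by a POPE split is present locally.
import json, os

ann_file = "eval/data/POPE/coco/coco_pope_adversarial.json"  # example split name
image_dir = "eval/data/coco/val2014"

with open(ann_file) as f:
    for line in f:
        item = json.loads(line)                      # one question per line
        image_path = os.path.join(image_dir, item["image"])
        assert os.path.exists(image_path), f"missing image: {image_path}"
        # item["text"] is the yes/no question, item["label"] the ground truth
```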
- Training Data:
  - Take the RLAIF-V dataset as an example:

```bash
cd train/data
python rlaiv_convert.py
```
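For intuition only, the sketch below shows the kind of conversion such a step typically performs: splitting RLAIF-V preference pairs into separate positive (chosen) and negative (rejected) response sets. It is not the repository's `rlaiv_convert.py`; the dataset field names (`question`, `chosen`, `rejected`) and the output file names are assumptions.

```python
# Hypothetical illustration of converting RLAIF-V preference pairs into
# positive/negative training records (field names assumed, not verified).
import json
from datasets import load_dataset

ds = load_dataset("openbmb/RLAIF-V-Dataset", split="train")

positive, negative = [], []
for ex in ds:
    # Image handling is omitted for brevity.
    positive.append({"question": ex["question"], "answer": ex["chosen"]})
    negative.append({"question": ex["question"], "answer": ex["rejected"]})

with open("rlaifv_positive.json", "w") as f:   # output names are placeholders
    json.dump(positive, f)
with open("rlaifv_negative.json", "w") as f:
    json.dump(negative, f)
```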
- Run POPE evaluation with LLaVA-1.5:

```bash
# Configure paths in eval/scripts/eval_llava_1_5_pope.sh
# Then run:
bash eval/scripts/eval_llava_1_5_pope.sh
```

- Run POPE evaluation with Qwen2.5-VL:

```bash
# Configure paths in eval/scripts/eval_qwen_2_5_vl_pope.sh
# Then run:
bash eval/scripts/eval_qwen_2_5_vl_pope.sh
```
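If you want to recompute the POPE metrics yourself from a model's answers, the sketch below shows the standard accuracy/precision/recall/F1 computation over yes/no responses. It assumes an answers file with one JSON object per line containing an `answer` field, which may differ from what the evaluation scripts actually write.

```python
# Standard POPE-style metrics over yes/no predictions.
import json

def pope_metrics(answer_file: str, label_file: str) -> dict:
    preds = ["yes" in json.loads(l)["answer"].lower() for l in open(answer_file)]
    labels = [json.loads(l)["label"] == "yes" for l in open(label_file)]
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum((not p) and y for p, y in zip(preds, labels))
    tn = sum((not p) and (not y) for p, y in zip(preds, labels))
    accuracy = (tp + tn) / max(len(labels), 1)
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-8)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```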
- Train the negative projector:

```bash
bash train/scripts/negative_train.sh
```

- Train the positive projector:

```bash
bash train/scripts/positive_train.sh
```

This project builds upon several excellent open-source projects: