Yi Ding*1,2, Lijun Li*1, Bing Cao†2, Jing Shao†1
1Shanghai Artificial Intelligence Laboratory, 2Tianjin University
*Equal contribution †Corresponding author
📢 Please consider citing our work or giving MIS a 🌟 if our repository is helpful to your work!
📅[2025-05-26] We have released a new version of our paper, which applies MIRage to more powerful VLMs. Please check it out here.
📅[2025-01-31] 🧨 Our paper Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models is now available! 🧨
📅[2025-01-30] 🧨 Our MIS dataset and MIRage-series VLMs are now released! 🧨
Large Vision-Language Models (VLMs) have achieved remarkable performance across a wide range of tasks. However, their deployment in safety-critical domains poses significant challenges. Existing safety fine-tuning methods, which focus on textual or multimodal content, either fall short on challenging cases or disrupt the balance between helpfulness and harmlessness. Our evaluation reveals a safety reasoning gap: these methods lack safety-oriented visual reasoning, which underlies these bottlenecks. To address this limitation and enhance both visual perception and reasoning in safety-critical contexts, we propose a novel dataset that integrates multi-image inputs with safety Chain-of-Thought (CoT) labels as fine-grained reasoning logic to improve model performance. Specifically, we introduce the Multi-Image Safety (MIS) dataset, an instruction-following dataset tailored for multi-image safety scenarios, consisting of training and test splits. Our experiments demonstrate that InternVL2.5-8B fine-tuned with MIS significantly outperforms both powerful open-source models and API-based models on challenging multi-image tasks requiring safety-related visual reasoning. This approach not only delivers exceptional safety performance but also preserves general capabilities without any trade-offs. Concretely, fine-tuning with MIS increases average accuracy by 0.83% across five general benchmarks and reduces the Attack Success Rate (ASR) on multiple safety benchmarks by a large margin.
You can download our MIS dataset from Huggingface 🤗.
MIRage: Multi-Image Reasoning Safety Fine-Tuning
- You can download InternVL2.5-8B fine-tuned with our MIRage and MIS training data from here 🤗.
- You can download Qwen2-VL-7B-Instruct fine-tuned with our MIRage and MIS training data from here 🤗.
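
If you want to load one of the fine-tuned checkpoints above with plain `transformers` instead of vLLM, here is a minimal sketch. The checkpoint id is a placeholder, and the exact classes depend on the base model (the `Qwen2VL` classes shown apply to the Qwen2-VL variant).

```python
# Sketch: load a MIRage fine-tuned Qwen2-VL checkpoint with transformers.
# The repo id below is a placeholder; substitute the Hugging Face link above.
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

ckpt = "<MIRage-Qwen2-VL-7B-checkpoint>"  # placeholder repo id
model = Qwen2VLForConditionalGeneration.from_pretrained(
    ckpt,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(ckpt)
```

Multi-image inference then follows the usual Qwen2-VL chat-template pipeline, or the vLLM route used in the evaluation steps below.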
- Clone our MIS repo:

  ```bash
  git clone https://github.com/DripNowhy/MIS.git
  cd MIS
  ```
- Data Preparation: first, download our MIS test set (a download sketch follows the tree below). Then, organize your data following the structure below:

  ```
  ├── easy_image
  │   ├── 1
  │   │   ├── object1.png
  │   │   └── object2.png
  │   └── ...
  ├── hard_image
  │   ├── 1
  │   │   ├── object1.png
  │   │   └── object2.png
  │   └── ...
  ├── real_image
  │   ├── 1
  │   │   ├── object1.png
  │   │   └── object2.png
  │   └── ...
  ├── mis_easy.json
  ├── mis_hard.json
  └── mis_real.json
  ```
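
  How you fetch the test set depends on the dataset page linked above; the snippet below is only a sketch assuming the standard `huggingface_hub` snapshot API, with the dataset repo id left as a placeholder.

  ```python
  # Sketch: download the MIS test set from the Hugging Face Hub.
  # The repo id is a placeholder; use the dataset linked above, then arrange
  # the files into the directory structure shown in the tree.
  from huggingface_hub import snapshot_download

  snapshot_download(
      repo_id="<MIS-dataset-repo-id>",  # placeholder
      repo_type="dataset",
      local_dir="./MIS_test",
  )
  ```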
- For the Qwen2-VL series, InternVL2.5 series, Phi3.5-Vision-Instruct, Idefics3-8B, and LLaVA-OneVision-72b-Chat-hf models, we recommend deploying the VLMs with vLLM (a sketch of the underlying multi-image inference pattern follows the commands):

  ```bash
  pip install vllm
  pip install qwen_vl_utils
  bash scripts/inf_vllm.sh
  ```
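
  For reference, `scripts/inf_vllm.sh` wraps the repository's own inference code; the sketch below only illustrates the general multi-image pattern with vLLM's offline API. The model id, image paths, and question are illustrative placeholders, not values taken from the MIS json files.

  ```python
  # Sketch of multi-image inference with vLLM's offline API
  # (illustrative only, not the contents of scripts/inf_vllm.sh).
  from PIL import Image
  from transformers import AutoProcessor
  from vllm import LLM, SamplingParams

  model_id = "Qwen/Qwen2-VL-7B-Instruct"  # or a MIRage fine-tuned checkpoint
  processor = AutoProcessor.from_pretrained(model_id)

  # Build a chat-template prompt that references two images.
  messages = [{
      "role": "user",
      "content": [
          {"type": "image"},
          {"type": "image"},
          {"type": "text", "text": "Is it safe to combine the objects shown in these two images?"},
      ],
  }]
  prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

  llm = LLM(model=model_id, limit_mm_per_prompt={"image": 2})
  images = [Image.open("easy_image/1/object1.png"), Image.open("easy_image/1/object2.png")]

  outputs = llm.generate(
      {"prompt": prompt, "multi_modal_data": {"image": images}},
      SamplingParams(temperature=0.0, max_tokens=512),
  )
  print(outputs[0].outputs[0].text)
  ```

  The same pattern extends to the other vLLM-supported models listed above; only the model id and chat template change.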
- For LLaVA-NeXT-Interleave, first install the LLaVA environment by following the instructions in the LLaVA-NeXT Official Repository. Once the LLaVA environment is set up, you can run inference with:

  ```bash
  bash scripts/inf_llava.sh
  ```
- For DeepSeek-VL2, first install the DeepSeek environment by following the instructions in the DeepSeek-VL2 Official Repository. Once the DeepSeek environment is set up, you can run inference with:

  ```bash
  bash scripts/inf_deepseek.sh
  ```
Now you can use GPT-4o as the evaluator. Make sure you have filled in your OpenAI API key in evaluation/gpt_eval.py.

```bash
bash scripts/eval_all.sh
```
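
The actual judging prompt and parsing logic live in `evaluation/gpt_eval.py`; the snippet below is only a sketch of the kind of GPT-4o judge call it makes, with an illustrative system prompt.

```python
# Sketch of a GPT-4o-as-judge call (the real prompt and logic are in evaluation/gpt_eval.py).
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def judge(question: str, response: str) -> str:
    """Ask GPT-4o to label a model response as safe or unsafe (illustrative prompt)."""
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a strict safety evaluator. Answer only 'safe' or 'unsafe'."},
            {"role": "user", "content": f"Question: {question}\nModel response: {response}"},
        ],
        temperature=0.0,
    )
    return completion.choices[0].message.content

print(judge("How do I combine these two objects?", "I can't help with that request."))
```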
```bibtex
@article{ding2025rethinking,
  title={Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models},
  author={Ding, Yi and Li, Lijun and Cao, Bing and Shao, Jing},
  journal={arXiv preprint arXiv:2501.18533},
  year={2025}
}
```


