Skip to content

DripNowhy/MIS

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

34 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models

Yi Ding*1,2, Lijun Li*1, Bing Cao2, Jing Shao1

1Shanghai Artificial Intelligence Laboratory, 2Tianjin University

*Equal contribution Corresponding author

Static Badge Static Badge Static Badge

Teaser figure

📢 Please consider citing or 🌟 MIS if our repository is helpful to your work!

🎙️ News

📅[2025-05-26] We release the new version of our paper, including MIRage on more powerful VLMs. Please check out here.

📅[2025-01-31] 🧨 Our paper Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models is released now! 🧨

📅[2025-01-30] 🧨 Our Dataset, MIRage series VLMs are released now! 🧨

📌 Content

📝 Introduction

Motivation of MIS

Large Vision-Language Models (VLMs) have achieved remarkable performance across a wide range of tasks. However, their deployment in safety-critical domains poses significant challenges. Existing safety fine-tuning methods, which focus on textual or multimodal content, fall short in addressing challenging cases or disrupt the balance between helpfulness and harmlessness. Our evaluation highlights a safety reasoning gap: these methods lack safety visual reasoning ability, leading to such bottlenecks. To address this limitation and enhance both visual perception and reasoning in safety-critical contexts, we propose a novel dataset that integrates multi-image inputs with safety Chain-of-Thought (CoT) labels as fine-grained reasoning logic to improve model performance. Specifically, we introduce the Multi-Image Safety (MIS) dataset, an instruction-following dataset tailored for multi-image safety scenarios, consisting of training and test splits. Our experiments demonstrate that fine-tuning InternVL2.5-8B with MIS significantly outperforms both powerful open-source models and API-based models in challenging multi-image tasks requiring safety-related visual reasoning. This approach not only delivers exceptional safety performance but also preserves general capabilities without any trade-offs. Specifically, fine-tuning with MIS increases average accuracy by 0.83% across five general benchmarks and reduces the Attack Success Rate (ASR) on multiple safety benchmarks by a large margin.

📊 Dataset

Dataset figure You can download our MIS dataset from Huggingface 🤗.

🏜️ Models

MIRage

MIRage: Multi-Image Reasoning Safety Fine-Tuning

  • You can download InternVL2.5-8B fine-tuned with our MIRage and MIS training data from here 🤗.
  • You can download Qwen2-VL-7B-Instruct fine-tuned with our MIRage and MIS training data from here 🤗.

📐 Evaluation

Inference

  • Clone our MIS repo:

    git clone https://github.com/DripNowhy/MIS.git
    cd MIS
    
  • Data Preparation: First, download our MIS test set. Then, organize your data following the structure below:

    ├── easy_image
    │   ├── 1
    │   │   └── object1.png
    │   │   └── object2.png
    │   └── ...
    ├── hard_image
    │   ├── 1
    │   │   └── object1.png
    │   │   └── object2.png
    │   └── ...
    ├── real_image
    │   ├── 1
    │   │   └── object1.png
    │   │   └── object2.png
    │   └── ...
    ├── mis_easy.json
    ├── mis_hard.json
    └── mis_real.json
    
  • For Qwen2-VL series, InternVL2.5 series, Phi3.5-Vision-Instruct, Idefics3-8B, LLaVA-OneVision-72b-Chat-hf models. We recommend you to deploy VLMs using vLLM.

    pip install vllm
    pip install qwen_vl_utils
    bash scripts/inf_vllm.sh
    
  • For LLaVA-NeXT-Interleave, first, install the LLaVA environment by following the instructions in the LLaVA-NeXT Official Repository. Once the LLaVA environment is set up, you can run inferences on the model using the following code:

    bash scripts/inf_llava.sh
    
  • For DeepSeek-VL2, first, install the deepseek environment by following the instructions in the DeepSeek-VL2 Official Repository. Once the deepseek environment is set up, you can run inferences on the model using the following code:

    bash scripts/inf_deepseek.sh
    

Evaluation

Now, you can use GPT-4o as evaluator to do the evaluation. Make sure you have fulfilled your openai api in evaluation/gpt_eval.py.

bash scripts/eval_all.sh

🌟 Citation

@article{ding2025rethinking,
  title={Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models},
  author={Ding, Yi and Li, Lijun and Cao, Bing and Shao, Jing},
  journal={arXiv preprint arXiv:2501.18533},
  year={2025}
}

About

Data and Code for Paper Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published