
[ICLR 2025] PyTorch Implementation of "ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time"


ETA Logo (generated by DALL·E)

📰 News

  • 2025.7.20: We released the implementation of ETA on the InternLM-XComposer2.5 model.
  • 2025.1.22: Excited to share that ETA has been accepted to ICLR 2025!
  • 2024.10.10: We released the paper and code for ETA. If you find it helpful, we would appreciate your citation!

This paper focuses on inference-time safety alignment of Vision Language Models (VLMs). It decomposes the alignment process into two phases: i) Evaluating input visual contents and output responses to establish robust safety awareness in multimodal settings, and ii) Aligning unsafe behaviors at both shallow and deep levels by conditioning the VLM’s generative distribution on an interference prefix and performing a sentence-level best-of-N search for the most harmless and helpful generation paths.

ETA Framework
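
For intuition, here is a minimal, hypothetical sketch of the evaluate-then-align loop described above. All helper callables, the interference-prefix wording, and the default values are illustrative assumptions, not the repository's actual API.

    # Hypothetical sketch of ETA's two phases; the helper callables are
    # illustrative placeholders, NOT this repository's actual API.

    def eta_generate(
        vlm_respond,          # (image, prompt) -> full draft response (str)
        vlm_next_sentence,    # (image, prompt, prefix) -> next sentence, "" when done
        image_is_unsafe,      # (image) -> bool: pre-generation check of visual input
        response_is_unsafe,   # (text) -> bool: post-generation check of the response
        utility_score,        # (text) -> float: harmless-and-helpful reward
        image,
        prompt,
        n_candidates=5,       # best-of-N width (assumed value)
        max_sentences=16,     # cap on search depth (assumed value)
    ):
        # Phase 1 (Evaluating): assess both the visual input and a draft response.
        draft = vlm_respond(image, prompt)
        if not image_is_unsafe(image) and not response_is_unsafe(draft):
            return draft  # judged safe: keep the original behavior untouched

        # Phase 2 (Aligning), shallow level: condition generation on an
        # interference prefix (the exact wording here is an assumption).
        response = "As an AI assistant, "

        # Phase 2 (Aligning), deep level: sentence-level best-of-N search,
        # keeping the candidate continuation with the highest reward.
        for _ in range(max_sentences):
            candidates = [
                vlm_next_sentence(image, prompt, prefix=response)
                for _ in range(n_candidates)
            ]
            best = max(candidates, key=utility_score)
            if not best:  # empty string signals generation has finished
                break
            response += best
        return response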

⚙ Installation

  • Clone this repository and navigate to the ETA folder.

    git clone https://github.com/DripNowhy/ETA/
    cd ETA
    
  • Install Environment (For LLaVA model)

    conda create -n eta python=3.10 -y
    conda activate eta
    pip install -r requirements.txt
    
  • Install Environment (For InternLM-XComposer model). We recommend using transformers==4.46.2 for loading the InternLM-XComposer model.

    conda create -n eta python=3.10 -y
    conda activate eta
    pip install -r requirements.txt
    pip install transformers==4.46.2
    

    Before evaluating ETA on the InternLM-XComposer model, please replace the modeling_internlm_xcomposer2.py file in your Hugging Face cache with the modeling_internlm_xcomposer2.py provided in this repository.
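
    If you are unsure where the cached file lives, a short helper like the following can locate it. This is a sketch assuming the default Hugging Face cache layout under ~/.cache/huggingface/hub; the models--internlm--* pattern is an assumption and may need adjusting to the exact checkpoint you downloaded.

    # Sketch: locate the cached modeling_internlm_xcomposer2.py to replace.
    # Assumes the default Hugging Face cache location; the repo-name
    # pattern below is an assumption, adjust it to your checkpoint.
    import glob
    import os

    cache_root = os.path.expanduser("~/.cache/huggingface/hub")
    pattern = os.path.join(
        cache_root, "models--internlm--*", "snapshots", "*",
        "modeling_internlm_xcomposer2.py",
    )
    for path in glob.glob(pattern):
        print(path)  # overwrite each match with the file provided in this repo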

✨ Demo

  • Use eta_quick_use.py to generate a response:

    python eta_quick_use.py --gpu_id 0 --qs "your question here" --image_path "your image path here"

🖨️ Evaluation

  • Evaluations on safety benchmarks

    You can evaluate on "SPA-VL", "MM-SafetyBench", "FigStep", and "Cross-modality Attack" using the script below.

    bash scripts/eta_safetybench.sh --save_dir "" --gpu_id 0 --dataset ""
    
  • Evaluations on general ability

    You can evaluate comprehensive benchmarks and general VQA tasks using the scripts provided by LLaVA.

    CUDA_VISIBLE_DEVICES=0 bash scripts/eta_mmbench.sh
    
    CUDA_VISIBLE_DEVICES=0 bash scripts/eta_textvqa.sh
    

📄 Citation

Please consider citing ETA if our repository is helpful to your work!

@article{ding2024eta,
  title={ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time},
  author={Ding, Yi and Li, Bolian and Zhang, Ruqi},
  journal={arXiv preprint arXiv:2410.06625},
  year={2024}
}
