Skip to content

Researchtopic/Code-Jailbreak-AudioBench

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

163 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models

This repository hosts the implementation of dataset generation pipeline and evaluation code for our paper “Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models”.


Figure 1 –– Pipeline: harmful prompts → TTS audio → Audio-Editing Toolbox → Benchmark.

✨ Highlights

  • Audio Editing Toolbox (AET) ––– seven audio editings (Emphasis · Speed · Intonation · Tone · Background Noise · Celebrity Accent · Emotion) implemented in Python under Editing/.

Figure 2 –– Examples of injecting different audio hidden semantics.
  • Jailbreak-AudioBench Dataset ––– 4,700 base audios × 20 editing types = 94,800 audio samples covering explicit and implicit jailbreak tasks. The dataset also includes an equal number of defended versions of these audio samples to explore defense strategies against audio editing jailbreaks.
  • Plug-and-play evaluation for various Large Audio Language Models (LALMs) with automatic safety judgement via Llama Guard 3.
  • Query-based Audio Editing Jailbreak Method combining different audio editing types, achieve higher Attack Success Rate (ASR) on State-of-the-Art LALMs.

Figure 3 –– ASR performance of Query-based Audio Editing Jailbreak.

🔧 Installation Guide

# Clone repository
git clone https://github.com/Researchtopic/Code-Jailbreak-AudioBench
cd Code-Jailbreak-AudioBench

# Create conda environment
conda create -n audio_editing_jailbreak python=3.10
conda activate audio_editing_jailbreak
pip install -r requirements.txt

# Install dependencies
sudo apt-get update
sudo apt-get install ffmpeg
sudo apt-get install sox libsox-fmt-all

🗂️ Directory Layout

├── Editing/                        # dataset generation & defense scripts
│   ├── generate_original_dataset.py
│   ├── generate_speed_dataset.py   
│   ├── generate_tone_dataset.py    
│   ├── generate_intonation_dataset.py
│   ├── generate_emotion_dataset.py 
│   ├── generate_noise_dataset.py   
│   ├── generate_white_noise_dataset.py
│   ├── generate_accent_dataset.py  
│   ├── convert_sampling_rate.py    
│   ├── create_subdataset.py        
│   └── combine_defense_prompt.py   
├── Inference/                      # model inference code
│   ├── BLSP.py                     
│   ├── VITA1.5.py                  
│   ├── gpt4o.py                    
│   ├── qwen2_audio.py              
│   ├── salmonn_13b.py              
│   └── speechgpt.py                
├── Figs/                           # paper figures & visualisations
└── README.md

🏗️ Dataset Generation

# 1️⃣ text → base audios
python Editing/generate_original_dataset.py

# 2️⃣ resample (optional)
python Editing/convert_sampling_rate.py

# 3️⃣ example edit: Tone +4 semitones
python Editing/generate_tone_dataset.py

# 4️⃣ prepend defense prompt
python Editing/combine_defense_prompt.py

🏃‍♂️ Evaluation

# 1️⃣ example evaluation: MiniCPM-o-2.6
python Inference/minicpm-o-2.6.py

# 2️⃣ use Llama Guard 3 to judge whether the jailbreak is successful
python Inference/analysis/llama3_guard.py

🔍 Code and Paper Correspondence

This codebase implements the complete experimental pipeline described in the paper:

  1. Audio Editing Toolbox (Section 2) - Implemented in Editing/, supporting seven different types of audio editing operations.
  2. Dataset Creation (Section 3) - The complete Jailbreak-AudioBench dataset is constructed using the tools in Editing/.
  3. Model Evaluation (Section 3) - Evaluation of all involved LALM models is implemented in Inference/.
  4. Query-based Audio Editing Jailbreak Attack (Section 4.1) - Implements the Query-based Audio Editing Jailbreak method by combining audio edits.
  5. Defense Method (Section 4.2) - Evaluates basic defense capabilities by prepending a defense prompt.

📦 Pre-trained Models

This project uses the following third-party models:

📜 Citation

If you use Jailbreak-AudioBench in the research, please cite our paper:

@misc{cheng2025jailbreakaudiobenchindepthevaluationanalysis,
      title={Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models}, 
      author={Hao Cheng, Erjia Xiao, Jing Shao, Yichi Wang, Le Yang, Chao Shen, Philip Torr, Jindong Gu, Renjing Xu},
      year={2025},
      eprint={2501.13772},
      archivePrefix={arXiv},
      url={https://arxiv.org/abs/2501.13772}, 
}

📄 Licence

The code in this repository is released under the MIT License.
Jailbreak prompts originate from public datasets (AdvBench, MM-SafetyBench, RedTeam-2K, SafeBench) and comply with their respective licences.

About

Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 100.0%