Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models

This repository hosts the implementation of dataset generation pipeline and evaluation code for our paper “Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models”.

_{Figure 1 –– Pipeline: harmful prompts → TTS audio → Audio-Editing Toolbox → Benchmark.}

✨ Highlights

Audio Editing Toolbox (AET) ––– seven audio editings (Emphasis · Speed · Intonation · Tone · Background Noise · Celebrity Accent · Emotion) implemented in Python under Editing/.

_{Figure 2 –– Examples of injecting different audio hidden semantics.}

Jailbreak-AudioBench Dataset ––– 4,700 base audios × 20 editing types = 94,800 audio samples covering explicit and implicit jailbreak tasks. The dataset also includes an equal number of defended versions of these audio samples to explore defense strategies against audio editing jailbreaks.
Plug-and-play evaluation for various Large Audio Language Models (LALMs) with automatic safety judgement via Llama Guard 3.
Query-based Audio Editing Jailbreak Method combining different audio editing types, achieve higher Attack Success Rate (ASR) on State-of-the-Art LALMs.

_{Figure 3 –– ASR performance of Query-based Audio Editing Jailbreak.}

🔧 Installation Guide

# Clone repository
git clone https://github.com/Researchtopic/Code-Jailbreak-AudioBench
cd Code-Jailbreak-AudioBench

# Create conda environment
conda create -n audio_editing_jailbreak python=3.10
conda activate audio_editing_jailbreak
pip install -r requirements.txt

# Install dependencies
sudo apt-get update
sudo apt-get install ffmpeg
sudo apt-get install sox libsox-fmt-all

🗂️ Directory Layout

├── Editing/                        # dataset generation & defense scripts
│   ├── generate_original_dataset.py
│   ├── generate_speed_dataset.py   
│   ├── generate_tone_dataset.py    
│   ├── generate_intonation_dataset.py
│   ├── generate_emotion_dataset.py 
│   ├── generate_noise_dataset.py   
│   ├── generate_white_noise_dataset.py
│   ├── generate_accent_dataset.py  
│   ├── convert_sampling_rate.py    
│   ├── create_subdataset.py        
│   └── combine_defense_prompt.py   
├── Inference/                      # model inference code
│   ├── BLSP.py                     
│   ├── VITA1.5.py                  
│   ├── gpt4o.py                    
│   ├── qwen2_audio.py              
│   ├── salmonn_13b.py              
│   └── speechgpt.py                
├── Figs/                           # paper figures & visualisations
└── README.md

🏗️ Dataset Generation

# 1️⃣ text → base audios
python Editing/generate_original_dataset.py

# 2️⃣ resample (optional)
python Editing/convert_sampling_rate.py

# 3️⃣ example edit: Tone +4 semitones
python Editing/generate_tone_dataset.py

# 4️⃣ prepend defense prompt
python Editing/combine_defense_prompt.py

🏃‍♂️ Evaluation

# 1️⃣ example evaluation: MiniCPM-o-2.6
python Inference/minicpm-o-2.6.py

# 2️⃣ use Llama Guard 3 to judge whether the jailbreak is successful
python Inference/analysis/llama3_guard.py

🔍 Code and Paper Correspondence

This codebase implements the complete experimental pipeline described in the paper:

Audio Editing Toolbox (Section 2) - Implemented in Editing/, supporting seven different types of audio editing operations.
Dataset Creation (Section 3) - The complete Jailbreak-AudioBench dataset is constructed using the tools in Editing/.
Model Evaluation (Section 3) - Evaluation of all involved LALM models is implemented in Inference/.
Query-based Audio Editing Jailbreak Attack (Section 4.1) - Implements the Query-based Audio Editing Jailbreak method by combining audio edits.
Defense Method (Section 4.2) - Evaluates basic defense capabilities by prepending a defense prompt.

📦 Pre-trained Models

This project uses the following third-party models:

BLSP – Github
SpeechGPT – Github
Qwen2-Audio-7B-Instruct – HuggingFace model card
SALMONN-7B – HuggingFace model card
SALMONN-13B – HuggingFace model card
VITA-1.5 – HuggingFace model card
R1-AQA – HuggingFace model card
MiniCPM-o-2.6 – HuggingFace model card

📜 Citation

If you use Jailbreak-AudioBench in the research, please cite our paper:

@misc{cheng2025jailbreakaudiobenchindepthevaluationanalysis,
      title={Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models}, 
      author={Hao Cheng, Erjia Xiao, Jing Shao, Yichi Wang, Le Yang, Chao Shen, Philip Torr, Jindong Gu, Renjing Xu},
      year={2025},
      eprint={2501.13772},
      archivePrefix={arXiv},
      url={https://arxiv.org/abs/2501.13772}, 
}

📄 Licence

The code in this repository is released under the MIT License.
Jailbreak prompts originate from public datasets (AdvBench, MM-SafetyBench, RedTeam-2K, SafeBench) and comply with their respective licences.

Name		Name	Last commit message	Last commit date
Latest commit History 163 Commits
Editing		Editing
Figs		Figs
Inference		Inference
Text		Text
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models

✨ Highlights

🔧 Installation Guide

🗂️ Directory Layout

🏗️ Dataset Generation

🏃‍♂️ Evaluation

🔍 Code and Paper Correspondence

📦 Pre-trained Models

📜 Citation

📄 Licence

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models

✨ Highlights

🔧 Installation Guide

🗂️ Directory Layout

🏗️ Dataset Generation

🏃‍♂️ Evaluation

🔍 Code and Paper Correspondence

📦 Pre-trained Models

📜 Citation

📄 Licence

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages