Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models
This repository hosts the implementation of dataset generation pipeline and evaluation code for our paper “Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models”.
- Audio Editing Toolbox (AET): seven audio edit types (Emphasis · Speed · Intonation · Tone · Background Noise · Celebrity Accent · Emotion) implemented in Python under `Editing/`.
- Jailbreak-AudioBench Dataset: 4,700 base audio clips × 20 editing types = 94,800 audio samples covering explicit and implicit jailbreak tasks. The dataset also includes an equal number of defended versions of these audio samples, for exploring defense strategies against audio-editing jailbreaks.
- Plug-and-play evaluation of various Large Audio Language Models (LALMs), with automatic safety judgement via Llama Guard 3.
- Query-based Audio Editing Jailbreak Method: combines different audio editing types, achieving a higher Attack Success Rate (ASR) on state-of-the-art LALMs.
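To make the edit types concrete, the Speed edit can be approximated with plain resampling. The sketch below is illustrative only: `speed_edit` is a hypothetical helper, and the real toolbox (`Editing/generate_speed_dataset.py`) presumably drives sox/ffmpeg rather than numpy.

```python
import numpy as np

# Hypothetical helper; the actual Speed edit likely shells out to sox/ffmpeg,
# so this numpy resampler is a minimal stand-in for illustration.
def speed_edit(samples: np.ndarray, factor: float) -> np.ndarray:
    """Resample a mono waveform so it plays `factor` times faster."""
    n_out = int(len(samples) / factor)
    # Map each output index to a (fractional) position in the input signal
    positions = np.linspace(0, len(samples) - 1, n_out)
    return np.interp(positions, np.arange(len(samples)), samples)

sr = 16000
wave = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s of 440 Hz
fast = speed_edit(wave, 2.0)  # same content, half the samples
```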
```bash
# Clone repository
git clone https://github.com/Researchtopic/Code-Jailbreak-AudioBench
cd Code-Jailbreak-AudioBench

# Create conda environment
conda create -n audio_editing_jailbreak python=3.10
conda activate audio_editing_jailbreak
pip install -r requirements.txt

# Install system dependencies
sudo apt-get update
sudo apt-get install ffmpeg
sudo apt-get install sox libsox-fmt-all
```

```
├── Editing/                # dataset generation & defense scripts
│   ├── generate_original_dataset.py
│   ├── generate_speed_dataset.py
│   ├── generate_tone_dataset.py
│   ├── generate_intonation_dataset.py
│   ├── generate_emotion_dataset.py
│   ├── generate_noise_dataset.py
│   ├── generate_white_noise_dataset.py
│   ├── generate_accent_dataset.py
│   ├── convert_sampling_rate.py
│   ├── create_subdataset.py
│   └── combine_defense_prompt.py
├── Inference/              # model inference code
│   ├── BLSP.py
│   ├── VITA1.5.py
│   ├── gpt4o.py
│   ├── qwen2_audio.py
│   ├── salmonn_13b.py
│   └── speechgpt.py
├── Figs/                   # paper figures & visualisations
└── README.md
```
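Since the editing scripts depend on ffmpeg and sox being on the PATH, a quick sanity check before generating data can save a failed run. This small helper is not part of the repository; it is a suggested convenience.

```python
import shutil

def check_audio_tools(tools=("ffmpeg", "sox")):
    """Report which required command-line tools are available on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}

print(check_audio_tools())  # e.g. {'ffmpeg': True, 'sox': True}
```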
```bash
# 1️⃣ text → base audios
python Editing/generate_original_dataset.py

# 2️⃣ resample (optional)
python Editing/convert_sampling_rate.py

# 3️⃣ example edit: Tone +4 semitones
python Editing/generate_tone_dataset.py

# 4️⃣ prepend defense prompt
python Editing/combine_defense_prompt.py
```
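Step 4 builds the defended dataset by putting a spoken defense prompt in front of each query audio. A plausible core of that operation, sketched with numpy (the function name, gap length, and sample rate are assumptions; see `Editing/combine_defense_prompt.py` for the actual implementation):

```python
import numpy as np

def prepend_defense_prompt(defense, query, sr=16000, gap_s=0.5):
    """Concatenate defense-prompt audio, a short silence, and the query audio.

    `defense` and `query` are mono waveforms at the same sample rate `sr`;
    `gap_s` seconds of silence separate the two utterances.
    """
    query = np.asarray(query)
    gap = np.zeros(int(sr * gap_s), dtype=query.dtype)
    return np.concatenate([np.asarray(defense), gap, query])
```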
```bash
# 1️⃣ example evaluation: MiniCPM-o-2.6
python Inference/minicpm-o-2.6.py

# 2️⃣ use Llama Guard 3 to judge whether the jailbreak is successful
python Inference/analysis/llama3_guard.py
```

This codebase implements the complete experimental pipeline described in the paper:
- Audio Editing Toolbox (Section 2): implemented in `Editing/`, supporting seven different types of audio editing operations.
- Dataset Creation (Section 3): the complete Jailbreak-AudioBench dataset is constructed using the tools in `Editing/`.
- Model Evaluation (Section 3): evaluation of all involved LALMs is implemented in `Inference/`.
- Query-based Audio Editing Jailbreak Attack (Section 4.1): implements the Query-based Audio Editing Jailbreak method by combining audio edits.
- Defense Method (Section 4.2): evaluates basic defense capabilities by prepending a defense prompt.
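Once Llama Guard 3 has labelled each model response, the Attack Success Rate reduces to a simple ratio. A minimal sketch, assuming the judge emits `"safe"`/`"unsafe"` label strings (the exact label format used by `Inference/analysis/llama3_guard.py` may differ):

```python
def attack_success_rate(judgements):
    """ASR = fraction of responses the safety judge labels 'unsafe'."""
    if not judgements:
        return 0.0
    return sum(j == "unsafe" for j in judgements) / len(judgements)

# Two of four responses flagged unsafe → ASR of 0.5
print(attack_success_rate(["unsafe", "safe", "unsafe", "safe"]))
```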
This project uses the following third-party models:
- BLSP – GitHub
- SpeechGPT – GitHub
- Qwen2-Audio-7B-Instruct – HuggingFace model card
- SALMONN-7B – HuggingFace model card
- SALMONN-13B – HuggingFace model card
- VITA-1.5 – HuggingFace model card
- R1-AQA – HuggingFace model card
- MiniCPM-o-2.6 – HuggingFace model card
If you use Jailbreak-AudioBench in your research, please cite our paper:
```bibtex
@misc{cheng2025jailbreakaudiobenchindepthevaluationanalysis,
  title={Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models},
  author={Hao Cheng and Erjia Xiao and Jing Shao and Yichi Wang and Le Yang and Chao Shen and Philip Torr and Jindong Gu and Renjing Xu},
  year={2025},
  eprint={2501.13772},
  archivePrefix={arXiv},
  url={https://arxiv.org/abs/2501.13772},
}
```

The code in this repository is released under the MIT License.
Jailbreak prompts originate from public datasets (AdvBench, MM-SafetyBench, RedTeam-2K, SafeBench) and comply with their respective licences.


