Zhongrui Gui1, Junyu Xie1, Tengda Han1, Weidi Xie2, Andrew Zisserman1
1 Visual Geometry Group (VGG), University of Oxford
2 School of Artificial Intelligence (SAI), Shanghai Jiao Tong University
In this work, we propose to automatically construct an audio-visual character bank that enables audio-visual recognition of animated characters. We further leverage the recognition results for downstream tasks, including Audio Description (AD) generation and character-aware subtitling. The main components of the work are listed below.
- See here for constructing the Audio-Visual Character Bank.
- See here for Audio-Visual Recognition for Animated Characters.
- See here for Application on Downstream Tasks.
- Videos can be downloaded here.
- All annotations and the corresponding meta-information can be found here.
- Evaluation scripts for Character Box mIoU, Character Name AP, and Audio Recognition AP can be found here. For CRITIC and CIDEr, please refer to the original AutoAD repository.
- The visual character recognition results can be downloaded here.
- The audio character recognition results can be downloaded here.
- The AD predictions (by Qwen2-VL w/ LLaMA3 or VideoLLaMA2 w/ LLaMA3) can be downloaded here.
- The character-aware subtitling results can be downloaded here.
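For readers unfamiliar with the metrics named in the evaluation item above, the following is a minimal sketch of how a box mIoU and an average precision (AP) score are commonly computed. It is not the repository's evaluation script: the function names, the `(x1, y1, x2, y2)` box format, the one-to-one pairing of predicted and ground-truth boxes, and the non-interpolated AP definition are all assumptions made for illustration. Please use the provided scripts for reported numbers.

```python
from typing import List, Sequence, Tuple

# Assumed box format (not confirmed by the repository): (x1, y1, x2, y2).
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Box], gts: Sequence[Box]) -> float:
    """Mean IoU over boxes that are assumed to be already paired one-to-one."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        return 0.0
    return sum(box_iou(p, g) for p, g in zip(preds, gts)) / len(preds)


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP: mean of precision@k taken at each positive item."""
    # Rank items by descending confidence score.
    order: List[int] = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0
```

Both functions take plain Python sequences, so the downloaded prediction files would first need to be parsed into boxes, scores, and binary labels in whatever format the annotations actually use.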
The environment builds mainly on DINOv2 and SAM2. To set up the required dependencies, please follow the instructions below:
conda env create -f conda.yaml
conda activate animated_ad
cd ..
git clone https://github.com/facebookresearch/sam2.git && cd sam2
pip install -e .
This environment is set up for automatic construction of the character bank and for visual character recognition.
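After the installation steps above, a quick sanity check can confirm that the key packages are importable inside the activated environment. The sketch below is illustrative only: the module names `torch` and `sam2` are assumptions (PyTorch is required by both DINOv2 and SAM2, and the editable install is assumed to expose a top-level `sam2` package), and `missing_modules` is a hypothetical helper, not part of this repository.

```python
import importlib.util
from typing import List, Sequence

# Assumed top-level module names; adjust to match what conda.yaml installs.
REQUIRED_MODULES = ("torch", "sam2")


def missing_modules(names: Sequence[str] = REQUIRED_MODULES) -> List[str]:
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

The check only looks up each module without importing it, so it runs quickly and does not need a GPU.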
If you find this repository helpful, please consider citing our work! 😊
@article{gui2025character,
title={Character-Centric Understanding of Animated Movies},
author={Gui, Zhongrui and Xie, Junyu and Han, Tengda and Xie, Weidi and Zisserman, Andrew},
journal={arXiv preprint arXiv:2509.12204},
year={2025}
}
AutoAD-Zero: https://github.com/Jyxarthur/AutoAD-Zero
Qwen2-VL: https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct
LLaMA3: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
