The VoxBlink2 Dataset

The VoxBlink2 dataset is a large-scale speaker recognition dataset with 100K+ speakers collected from the YouTube platform. This repository provides guidelines for building the corpus, along with related resources to reproduce the results in our article. For more details, please see the citation below. If you find this repository helpful to your research, don't forget to give us a star🌟.

Resources

Let's start by obtaining the resource files and decompressing the tar files.

tar -zxvf spk_info.tar.gz
tar -zxvf vb2_meta.tar.gz 
tar -zxvf asr_res.tar.gz

File structure

The file structure is summarized as follows: 
|---- data               
|     |---- ossi    # [Folder] evaluation protocols for open-set speaker identification
|     |---- test_vox # [Folder] evaluation protocols for speaker verification
|     |---- spk2videos	# [spk,video1,video2,...]
|---- ckpt #checkpoints for evaluation
|     |---- ecapatdnn # [Folder]
|     |---- resnet34 # [Folder]
|     |---- resnet100 # [Folder]
|     |---- resnet293 # [Folder]
|     |---- face_model # [Folder]
|---- spk_info             # video tags of each speaker:
|     |---- id000000	
|     |---- id000001	
|     |---- ...
|---- asr_res            # ASR annotations by Whisper:
|     |---- id000000	
|     |---- id000001	
|     |---- ...
|---- meta		# timestamps for video/audio cropping
|     |---- id000000	# spkid
|           |---- DwgYRqnQZHM	#videoid
|                 |---- 00000.txt	#uttid
|                 |---- ...
|           |---- ... 
|     |---- ...	
|---- face_id            # face_identification modules
|     |---- api.py # corresponding inference functions
|     |---- arcface.py # corresponding model definitions
|     |---- README.md 
|     |---- test.py # Test
|---- ossi            # open-set speaker identification (OSSI) evaluation
|     |---- eval.py # recipe for evaluating open-set speaker identification
|     |---- utils.py 
|     |---- example.npy # e.g., ResNet34-based embeddings for evaluating OSSI 
|---- audio_cropper.py	# extract audio-only segments by timestamps from downloaded audios
|---- video_cropper.py	# extract audio-visual segments by timestamps from downloaded videos
|---- downloader.py	# script for downloading videos
|---- LICENSE		# license
|---- README.md	
|---- requirement.txt			

Download

The following procedures show how to construct your own copy of VoxBlink2.

Pre-requisites

  1. Install ffmpeg:
sudo apt-get update && sudo apt-get upgrade
sudo apt-get install ffmpeg
  2. Install the Python libraries:
pip install -r requirements.txt
  3. Download videos

We provide two alternatives for downloading: full video or audio-only segments. We also use multi-threading to speed up the download process.

  • For Audio-Visual
python downloader.py --base_dir ${BASE_DIR} --num_workers 4 --mode video
  • For Audio-Only
python downloader.py --base_dir ${BASE_DIR} --num_workers 4 --mode audio
  4. Crop Audio/Videos
  • For Audio-Visual
python video_cropper.py --save_dir_audio ${SAVE_PATH_AUDIO} --save_dir_video ${SAVE_PATH_VIDEO} --timestamp_path meta --video_root=${BASE_DIR} --num_workers 4
  • For Audio-Only
python audio_cropper.py --save_dir ${SAVE_PATH_AUDIO} --timestamp_path meta --audio_root=${BASE_DIR} --num_workers 4
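Internally, a multi-threaded downloader along these lines is plausible: build one yt-dlp command per video ID and fan the downloads out over a worker pool. This is a sketch, not the official downloader.py — the actual tool and flags it uses may differ.

```python
# Hypothetical sketch of a multi-threaded video/audio downloader.
# Assumes yt-dlp is installed; the official script's flags may differ.
from concurrent.futures import ThreadPoolExecutor
import subprocess

def build_download_cmd(video_id, base_dir, mode="video"):
    """Build a yt-dlp command for one YouTube video ID."""
    url = f"https://www.youtube.com/watch?v={video_id}"
    out_template = f"{base_dir}/%(id)s.%(ext)s"
    cmd = ["yt-dlp", "-o", out_template]
    if mode == "audio":
        cmd += ["-x", "--audio-format", "wav"]  # extract audio only
    return cmd + [url]

def download_all(video_ids, base_dir, mode="video", num_workers=4):
    """Download every video concurrently with a thread pool."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for vid in video_ids:
            pool.submit(subprocess.run, build_download_cmd(vid, base_dir, mode), check=False)
```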

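The cropping step cuts each downloaded file into utterance-level segments using the timestamps under meta/. As a rough sketch (assuming each timestamp file yields a start/end time in seconds — the official cropper scripts may handle frames and re-encoding differently), a single ffmpeg crop command could look like:

```python
# Hypothetical sketch: build an ffmpeg command that cuts one segment
# [start, end] (in seconds) out of a source file without re-encoding.
def build_crop_cmd(src, dst, start, end):
    """Return an ffmpeg argv list cropping src to [start, end] into dst."""
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-ss", f"{start:.2f}",  # segment start (seconds)
        "-to", f"{end:.2f}",    # segment end (seconds)
        "-c", "copy",           # stream copy, no re-encode
        dst,
    ]
```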
FID Evaluation

We provide simple scripts for our face identification model, which was used in curating VoxBlink2. For more details, please look at the face_id folder.

SV Evaluation

We provide simple scripts for ASV model evaluation; just execute run_eval.sh in the asv folder. For more details, please look at asv.
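Speaker verification results are typically reported as an equal error rate (EER) over trial scores. As a minimal, self-contained sketch (the official run_eval.sh may compute metrics differently), the EER can be estimated from scores and target/nontarget labels like this:

```python
# Minimal EER sketch: sweep thresholds over the trial scores and return
# the error rate where false-accept and false-reject rates cross.
import numpy as np

def compute_eer(scores, labels):
    """EER from trial scores and binary labels (1 = target, 0 = nontarget)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_gap, best_eer = float("inf"), 1.0
    for t in np.sort(np.unique(scores)):
        far = np.mean(scores[labels == 0] >= t)  # false-accept rate
        frr = np.mean(scores[labels == 1] < t)   # false-reject rate
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return float(best_eer)
```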

Open-Set Speaker Identification Evaluation

We provide simple scripts for evaluating our proposed task, Open-Set Speaker Identification (OSSI); just execute run_eval_ossi.sh in the ossi folder. For more details, please look at ossi.
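The open-set decision rule can be summarized as: score a probe embedding against an enrolled gallery and reject as unknown below a threshold. The sketch below is an illustration of that idea with cosine similarity, not the official eval.py, and the threshold value is arbitrary:

```python
# Hypothetical open-set identification rule: pick the enrolled speaker
# with the highest cosine similarity, or return None ("unknown") if the
# best score falls below the acceptance threshold.
import numpy as np

def identify(probe, gallery, threshold=0.5):
    """gallery: {spk_id: embedding}; returns spk_id or None for unknown."""
    probe = probe / np.linalg.norm(probe)
    best_spk, best_score = None, -2.0  # cosine similarity is in [-1, 1]
    for spk, emb in gallery.items():
        score = float(probe @ (emb / np.linalg.norm(emb)))
        if score > best_score:
            best_spk, best_score = spk, score
    return best_spk if best_score >= threshold else None
```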

License

The dataset is licensed under the CC BY-NC-SA 4.0 license. This means that you can share and adapt the dataset for non-commercial purposes as long as you provide appropriate attribution and distribute your contributions under the same license. Detailed terms can be found here.

Important Note: Our released dataset only contains annotation data, including the YouTube links, timestamps, and speaker labels. We do not release audio or visual data, and it is the user's responsibility to decide whether and how to download the video data and whether their intended purpose with the downloaded data is legal in their country. For YouTube users with concerns regarding their videos' inclusion in our dataset, please contact us via e-mail: [email protected] or [email protected].

Citation

Please cite the paper below if you make use of the dataset:

@misc{lin2024voxblink2100kspeakerrecognition,
      title={VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark}, 
      author={Yuke Lin and Ming Cheng and Fulin Zhang and Yingying Gao and Shilei Zhang and Ming Li},
      year={2024},
      eprint={2407.11510},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2407.11510}, 
}
