The VoxBlink2 Dataset

The VoxBlink2 dataset is a large-scale speaker recognition dataset with 100K+ speakers collected from the YouTube platform. This repository provides guidelines for building the corpus, along with related resources to reproduce the results in our article. For more details, please see the citation below. If you find this repository helpful to your research, don't forget to give us a star🌟.

Resources

Let's start by obtaining the resource files and decompressing the tar files.

tar -zxvf spk_info.tar.gz
tar -zxvf vb2_meta.tar.gz 
tar -zxvf asr_res.tar.gz

File structure

The file structure is summarized as follows: 
|---- data               
|     |---- ossi    # [Folder] evaluation protocols for open-set speaker identification
|     |---- test_vox # [Folder] evaluation protocols for speaker verification
|     |---- spk2videos	# [spk,video1,video2,...]
|---- ckpt #checkpoints for evaluation
|     |---- ecapatdnn # [Folder]
|     |---- resnet34 # [Folder]
|     |---- resnet100 # [Folder]
|     |---- resnet293 # [Folder]
|     |---- face_model # [Folder]
|---- spk_info             # video tags of each speaker:
|     |---- id000000	
|     |---- id000001	
|     |---- ...
|---- asr_res            # ASR annotations by Whisper:
|     |---- id000000	
|     |---- id000001	
|     |---- ...
|---- meta		# timestamps for video/audio cropping
|     |---- id000000	# spkid
|           |---- DwgYRqnQZHM	#videoid
|                 |---- 00000.txt	#uttid
|                 |---- ...
|           |---- ... 
|     |---- ...	
|---- face_id            # face_identification modules
|     |---- api.py # corresponding inference functions
|     |---- arcface.py # corresponding model definitions
|     |---- README.md 
|     |---- test.py # Test
|---- ossi            # open-set speaker identification (OSSI) evaluation
|     |---- eval.py # recipe for evaluating open-set speaker identification
|     |---- utils.py 
|     |---- example.npy # e.g., ResNet34-based embeddings for evaluating OSSI 
|---- audio_cropper.py	# extract audio-only segments by timestamps from downloaded audios
|---- video_cropper.py	# extract audio-visual segments by timestamps from downloaded videos
|---- downloader.py	# script for downloading videos
|---- LICENSE		# license
|---- README.md	
|---- requirement.txt			

Download

The following procedures show how to construct your own copy of VoxBlink2.

Pre-requisites

  1. Install ffmpeg:
sudo apt-get update && sudo apt-get upgrade
sudo apt-get install ffmpeg
  2. Install the Python libraries:
pip install -r requirements.txt
  3. Download videos

We provide two alternatives for downloading: full video or audio-only segments. We also use multi-threading to speed up the download process.

  • For Audio-Visual
python downloader.py --base_dir ${BASE_DIR} --num_workers 4 --mode video
  • For Audio-Only
python downloader.py --base_dir ${BASE_DIR} --num_workers 4 --mode audio
  4. Crop Audio/Videos
  • For Audio-Visual
python video_cropper.py --save_dir_audio ${SAVE_PATH_AUDIO} --save_dir_video ${SAVE_PATH_VIDEO} --timestamp_path meta --video_root=${BASE_DIR} --num_workers 4
  • For Audio-Only
python audio_cropper.py --save_dir ${SAVE_PATH_AUDIO} --timestamp_path meta --audio_root=${BASE_DIR} --num_workers 4
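Internally, a multi-threaded downloader along these lines is plausible: build one yt-dlp command per video ID and fan the downloads out over a worker pool. This is a sketch, not the official downloader.py — the actual tool and flags it uses may differ.

```python
# Hypothetical sketch of a multi-threaded video/audio downloader.
# Assumes yt-dlp is installed; the official script's flags may differ.
from concurrent.futures import ThreadPoolExecutor
import subprocess

def build_download_cmd(video_id, base_dir, mode="video"):
    """Build a yt-dlp command for one YouTube video ID."""
    url = f"https://www.youtube.com/watch?v={video_id}"
    out_template = f"{base_dir}/%(id)s.%(ext)s"
    cmd = ["yt-dlp", "-o", out_template]
    if mode == "audio":
        cmd += ["-x", "--audio-format", "wav"]  # extract audio only
    return cmd + [url]

def download_all(video_ids, base_dir, mode="video", num_workers=4):
    """Download every video concurrently with a thread pool."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for vid in video_ids:
            pool.submit(subprocess.run, build_download_cmd(vid, base_dir, mode), check=False)
```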

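The cropping step cuts each downloaded file into utterance-level segments using the timestamps under meta/. As a rough sketch (assuming each timestamp file yields a start/end time in seconds — the official cropper scripts may handle frames and re-encoding differently), a single ffmpeg crop command could look like:

```python
# Hypothetical sketch: build an ffmpeg command that cuts one segment
# [start, end] (in seconds) out of a source file without re-encoding.
def build_crop_cmd(src, dst, start, end):
    """Return an ffmpeg argv list cropping src to [start, end] into dst."""
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-ss", f"{start:.2f}",  # segment start (seconds)
        "-to", f"{end:.2f}",    # segment end (seconds)
        "-c", "copy",           # stream copy, no re-encode
        dst,
    ]
```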
FID Evaluation

We provide simple scripts for our face identification model, which was used in curating VoxBlink2. For more details, please look at the face_id folder.

SV Evaluation

We provide simple scripts for ASV model evaluation; just execute run_eval.sh in the asv folder. For more details, please look at asv.
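Speaker verification results are typically reported as an equal error rate (EER) over trial scores. As a minimal, self-contained sketch (the official run_eval.sh may compute metrics differently), the EER can be estimated from scores and target/nontarget labels like this:

```python
# Minimal EER sketch: sweep thresholds over the trial scores and return
# the error rate where false-accept and false-reject rates cross.
import numpy as np

def compute_eer(scores, labels):
    """EER from trial scores and binary labels (1 = target, 0 = nontarget)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_gap, best_eer = float("inf"), 1.0
    for t in np.sort(np.unique(scores)):
        far = np.mean(scores[labels == 0] >= t)  # false-accept rate
        frr = np.mean(scores[labels == 1] < t)   # false-reject rate
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return float(best_eer)
```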

Open-Set Speaker Identification Evaluation

We provide simple scripts for evaluating our proposed task, Open-Set Speaker Identification (OSSI); just execute run_eval_ossi.sh in the ossi folder. For more details, please look at ossi.
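The open-set decision rule can be summarized as: score a probe embedding against an enrolled gallery and reject as unknown below a threshold. The sketch below is an illustration of that idea with cosine similarity, not the official eval.py, and the threshold value is arbitrary:

```python
# Hypothetical open-set identification rule: pick the enrolled speaker
# with the highest cosine similarity, or return None ("unknown") if the
# best score falls below the acceptance threshold.
import numpy as np

def identify(probe, gallery, threshold=0.5):
    """gallery: {spk_id: embedding}; returns spk_id or None for unknown."""
    probe = probe / np.linalg.norm(probe)
    best_spk, best_score = None, -2.0  # cosine similarity is in [-1, 1]
    for spk, emb in gallery.items():
        score = float(probe @ (emb / np.linalg.norm(emb)))
        if score > best_score:
            best_spk, best_score = spk, score
    return best_spk if best_score >= threshold else None
```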

License

The dataset is licensed under the CC BY-NC-SA 4.0 license. This means that you can share and adapt the dataset for non-commercial purposes as long as you provide appropriate attribution and distribute your contributions under the same license. Detailed terms can be found here.

Important Note: Our released dataset only contains annotation data, including the YouTube links, timestamps, and speaker labels. We do not release audio or visual data, and it is the user's responsibility to decide whether and how to download the video data and whether their intended purpose with the downloaded data is legal in their country. For YouTube users with concerns regarding their videos' inclusion in our dataset, please contact us via e-mail: [email protected] or [email protected].

Citation

Please cite the paper below if you make use of the dataset:

@misc{lin2024voxblink2100kspeakerrecognition,
      title={VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark}, 
      author={Yuke Lin and Ming Cheng and Fulin Zhang and Yingying Gao and Shilei Zhang and Ming Li},
      year={2024},
      eprint={2407.11510},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2407.11510}, 
}
