egs2 (Examples of ESPnet2)

How to use?

See: https://espnet.github.io/espnet/espnet2_tutorial.html#recipes-using-espnet2

Overview of example information

Directory name Corpus name Task Language URL Note
accentdb A Database of Non-Native English Accents Accent Recognition ENG https://accentdb.org/
accented_french_openslr57 African Accented French Corpus ASR FRA https://www.openslr.org/57/
acesinger ACESinger Singing Corpus SVS CMN WIP
aesrc2020 Accented English Speech Recognition Challenge 2020 ASR ENG https://arxiv.org/abs/2102.10233
aidatatang_200zh Aidatatang_200zh A free Chinese Mandarin speech corpus ASR CMN http://www.openslr.org/resources/62
aishell AISHELL-ASR0009-OS1 Open Source Mandarin Speech Corpus ASR CMN http://www.aishelltech.com/kysjcp
aishell2 AISHELL-2 Open Source Mandarin Speech Corpus ASR CMN https://www.aishelltech.com/aishell_2
aishell3 AISHELL3 Mandarin multi-speaker text-to-speech TTS CMN https://www.openslr.org/93/
aishell4 AISHELL4 Open Source Mandarin Speech Corpus in Conference Scenario ASR/SE CMN https://www.openslr.org/111/
allsstar_eng ALLSSTAR: L1 and L2 Scripted and Spontaneous Transcripts And Recording (scripted English) ASR ENG https://speechbox.linguistics.northwestern.edu/allsstar
ameboshi Ameboshi Ciphyer's singing voice database SVS JPN https://parapluie2c56m.wixsite.com/mysite
americasnlp22 The Second AmericasNLP Competition ASR BZD, GUG, GVC, QWE, TAV http://turing.iimas.unam.mx/americasnlp/st.html
ami The AMI Meeting Corpus ASR ENG http://groups.inf.ed.ac.uk/ami/corpus/
an4 CMU AN4 database ASR/TTS ENG http://www.speech.cs.cmu.edu/databases/an4/
aphasiabank AphasiaBank database (English) ASR ENG https://aphasia.talkbank.org/
arabic_sc Database for Arabic Speech Commands Recognition SLU ARA https://github.com/ltkbenamer/AR_Speech_Database
asvspoof The 3rd Automatic Speaker Verification Spoofing and Countermeasures Challenge database Fake Speech Detection ENG https://datashare.ed.ac.uk/handle/10283/3336
audioset-20k AudioSet Corpus Multi-label classification ENG https://research.google.com/audioset/
audioset AudioSet Corpus Codec ENG https://research.google.com/audioset/
babel IARPA Babel corpus ASR ~20 languages https://www.iarpa.gov/index.php/research-programs/babel
bibletts Bible TTS corpus TTS 6 Sub-Saharan Africa languages https://masakhane-io.github.io/bibleTTS/
bn_openslr53 Large Bengali ASR training dataset ASR BEN https://openslr.org/53/
bur_openslr80 Burmese ASR training dataset ASR BUR https://openslr.org/80/
catslu CATSLU-MAPS SLU CMN https://sites.google.com/view/catslu/home
catslu_entity CATSLU SLU/Entity Classifi. CMN https://sites.google.com/view/catslu/home
clotho_v2 Clotho v2.1 dataset for audio captioning AAC ENG https://zenodo.org/records/4783391
chime1 The 1st CHiME Speech Separation and Recognition Challenge ASR/Multichannel ASR ENG https://spandh.dcs.shef.ac.uk/chime_challenge/chime2011/
chime2 The 2nd CHiME Speech Separation and Recognition Challenge ASR/Multichannel ASR ENG https://spandh.dcs.shef.ac.uk/chime_challenge/chime2013/
chime4 The 4th CHiME Speech Separation and Recognition Challenge ASR/Multichannel ASR ENG http://spandh.dcs.shef.ac.uk/chime_challenge/chime2016/
chime6 The 6th CHiME Speech Separation and Recognition Challenge ASR ENG https://chimechallenge.github.io/chime6/
clarity21 The First Clarity Enhancement Challenge CEC1 SE ENG https://claritychallenge.github.io/clarity_CEC1_doc/
cmu_arctic CMU ARCTIC TTS ENG http://www.festvox.org/cmu_arctic/
cmu_indic CMU INDIC TTS 7 languages http://festvox.org/cmu_indic/
cnceleb CN-Celeb SPK CMN https://openslr.elda.org/resources/82/
commonvoice The Mozilla Common Voice ASR 13 languages https://voice.mozilla.org/datasets
conferencingspeech21 Far-field Multi-channel Speech Enhancement Challenge for Video Conferencing (ConferencingSpeech 2021) SE ENG, CMN https://tea-lab.qq.com/conferencingspeech-2021
coraal Corpus of Regional African American Language ASR ENG https://oraal.uoregon.edu/coraal
covost2 Multilingual speech-to-text translation corpus from Common Voice ST lang pairs from 22 https://github.com/facebookresearch/covost
csj Corpus of Spontaneous Japanese ASR JPN https://pj.ninjal.ac.jp/corpus_center/csj/en/
csmsc Chinese Standard Mandarin Speech Corpus TTS CMN https://www.data-baker.com/open_source.html
css10 CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages TTS 10 languages https://github.com/Kyubyong/css10
dcase22_task1 DCASE Task1 2022 Dataset SLU ENG https://dcase.community/challenge2022/task-low-complexity-acoustic-scene-classification
dirha_wsj Distant-speech Interaction for Robust Home Applications Multichannel ASR ENG https://dirha.fbk.eu/, https://github.com/SHINE-FBK/DIRHA_English_wsj
dns_ins20 Deep Noise Suppression Challenge – INTERSPEECH 2020 SE 7 languages + singing https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-interspeech-2020/
dns_icassp21 Deep Noise Suppression Challenge – ICASSP 2021 SE 11 languages + singing https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-icassp-2021/
dns_icassp22 Deep Noise Suppression Challenge – ICASSP 2022 SE 11 languages + singing https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-icassp-2022/
dns_ins20 Deep Noise Suppression Challenge – INTERSPEECH 2020 SE 11 languages + singing https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-interspeech-2021/
dns_ins21 Deep Noise Suppression Challenge – INTERSPEECH 2021 SE 11 languages + singing https://www.microsoft.com/en-us/research/academic-program/deep-noise-suppression-challenge-interspeech-2021/
dsing Automatic Lyric Transcription from Karaoke Vocal Tracks (From DAMP Sing300x30x2) ASR (ALT) ENG singing https://github.com/groadabike/Kaldi-Dsing-task
easycom An Augmented Reality Dataset to Support Algorithms for Easy Communication in Noisy Environments ASR ENG https://github.com/facebookresearch/EasyComDataset
edacc THE EDINBURGH INTERNATIONAL ACCENTS OF ENGLISH CORPUS ASR ENG https://groups.inf.ed.ac.uk/edacc/index.html#contribute-section
emilia Speech Dataset for Large-Scale Speech Generation TTS 6 languages https://huggingface.co/datasets/amphion/Emilia-Dataset
esc50 Dataset for Environmental Sound Classification Audio Classification https://github.com/karolpiczak/ESC-50
fisher_callhome_spanish Fisher and CALLHOME Spanish--English Speech Translation ASR/ST SPA->ENG https://catalog.ldc.upenn.edu/LDC2014T23
fleurs Few-shot Learning Evaluation of Universal Representations of Speech ASR/Multilingual 102 languages https://huggingface.co/datasets/google/fleurs
freesound Speech Command & Freesound for VAD English https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/speech_classification/datasets.html#speech-command-freesound-for-vad
fsc Fluent Speech Commands Dataset SLU ENG https://fluent.ai/fluent-speech-commands-a-dataset-for-spoken-language-understanding-research/
fsc_challenge Fluent Speech Commands Dataset MASE Eval Challenge splits SLU ENG https://github.com/maseEval/mase
fsc_unseen Fluent Speech Commands Dataset MASE Eval Unseen splits SLU ENG https://github.com/maseEval/mase
Genshin Genshin dataset: dubbing audio from the video game Genshin Impact, widely used in the GSV community. Contains 100h+ of high-quality, emotional audio. TTS ENG/CHN/JPN/KOR https://github.com/AI-Hobbyist/Genshin_Datasets
gigaspeech GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio ASR ENG https://github.com/SpeechColab/GigaSpeech
googlei18n_lowresource Googlei18n crowdsource project TTS ENG https://github.com/mirumee/google-i18n-address (most in openslr as separate entries)
grabo Grabo dataset SLU ENG + NLD https://www.esat.kuleuven.be/psi/spraak/downloads/
gramvaani GramVaani ASR Challenge 2022 ASR HI https://sites.google.com/view/gramvaaniasrchallenge/dataset
harpervalley HarperValleyBank: A Domain-Specific Spoken Dialog Corpus SLU ENG https://github.com/cricketclub/gridspace-stanford-harper-valley
Hi-Fi TTS a multi-speaker English dataset for training text-to-speech models TTS ENG https://www.openslr.org/109/
hkust HKUST/MTS: A very large scale Mandarin telephone speech corpus ASR CMN https://catalog.ldc.upenn.edu/LDC2005S15
how2 How2: A Large-scale Dataset for Multimodal Language Understanding ASR/MT/ST ENG->POR https://github.com/srvk/how2-dataset
how2_2000h How2_2000h fbank features ASR/SUM ENG->POR https://arxiv.org/pdf/2110.06263.pdf
hub4_spanish 1997 Spanish Broadcast News Speech ASR SPA https://catalog.ldc.upenn.edu/LDC98S74
hui_acg HUI-audio-corpus-german TTS DEU https://opendata.iisys.de/datasets.html#hui-audio-corpus-german
iam IAM Handwriting Database 3.0 OCR ENG https://fki.tic.heia-fr.ch/databases/iam-handwriting-database
iban Iban language text and speech corpora for ASR ASR IBA https://www.openslr.org/24/
iemocap IEMOCAP database: The Interactive Emotional Dyadic Motion Capture database SLU ENG https://sail.usc.edu/iemocap/
indic_speech IndicSpeech: Text-to-Speech Corpus for Indian Languages TTS 3 Indic languages http://cvit.iiit.ac.in/research/projects/cvit-projects/text-to-speech-dataset-for-indian-languages
interspeech2024_dsu_challenge Interspeech2024 speech processing using discrete speech unit challenge (ASR track) ASR/Multilingual ASR 145 languages https://www.wavlab.org/activities/2024/Interspeech2024-Discrete-Speech-Unit-Challenge/
itako Itako Singing voice synthesis corpus SVS JPN https://zunko.jp/itadev/login.php
iwslt14 IWSLT14 MT shared task MT DEU->ENG http://dl.fbaipublicfiles.com/fairseq/data/iwslt14/de-en.tgz
iwslt21_low_resource ALFFA, IARPA Babel, Gamayun, IWSLT 2021 ASR SWA http://www.openslr.org/25/ https://catalog.ldc.upenn.edu/LDC2017S05 https://gamayun.translatorswb.org/data/ https://iwslt.org/2021/low-resource
iwslt22_dialect IWSLT2022 dialectal speech translation shared task ASR/ST ARA->Tunisian ARA https://github.com/kevinduh/iwslt22-dialect.git
iwslt22_low_resource IWSLT2022 Low-resource speech translation track task ST Tamasheq->French https://github.com/mzboito/IWSLT2022_Tamasheq_data.git
iwslt24_indic IWSLT2024 Indic speech translation track ST ENG -> HIN, BEN, TAM https://iwslt.org/2024/indic
jdcinal Japanese Dialogue Corpus of Information Navigation and Attentive Listening Annotated with Extended ISO-24617-2 Dialogue Act Tags SLU JPN http://www.lrec-conf.org/proceedings/lrec2018/pdf/464.pdf http://tts.speech.cs.cmu.edu/awb/infomation_navigation_and_attentive_listening_0.2.zip
jkac J-KAC: Japanese Kamishibai and audiobook corpus TTS JPN https://sites.google.com/site/shinnosuketakamichi/research-topics/j-kac_corpus
jmd JMD: Japanese multi-dialect corpus for speech synthesis TTS JPN https://sites.google.com/site/shinnosuketakamichi/research-topics/jmd_corpus
jsss JSSS: Japanese speech corpus for summarization and simplification TTS JPN https://sites.google.com/site/shinnosuketakamichi/research-topics/jsss_corpus
jsut Japanese speech corpus of Saruwatari-lab., University of Tokyo ASR/TTS JPN https://sites.google.com/site/shinnosuketakamichi/publication/jsut
jsut_song JSUT-song corpus SVS JPN https://sites.google.com/site/shinnosuketakamichi/publication/jsut-song
jtubespeech Japanese YouTube Speech corpus ASR/TTS JPN
jv_openslr35 Javanese ASR JAV http://www.openslr.org/35
jvs JVS (Japanese versatile speech) corpus TTS JPN https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_corpus
kathbath Kathbath dataset ASR 12 Indian languages https://ai4bharat.iitm.ac.in/indic-superb
kising KiSing-v2 Corpus (ACESinger augmented) SVS CMN WIP
kosp2e Kosp2e: Korean Speech to English Translation Corpus ASR KOR https://github.com/warnikchow/kosp2e
ksponspeech KsponSpeech (Korean spontaneous speech) corpus ASR KOR https://aihub.or.kr/aidata/105
ksc Kazakh speech corpus ASR KAZ
kss Korean single speaker corpus TTS KOR https://www.kaggle.com/bryanpark/korean-single-speaker-speech-dataset
l3das22 L3DAS22: Machine Learning for 3D Audio Signal Processing - ICASSP 2022 SE ENG https://www.l3das.com/icassp2022/
laborotv LaboroTVSpeech (A large-scale Japanese speech corpus on TV recordings) ASR JPN https://laboro.ai/column/eg-laboro-tv-corpus-jp
libriheavy_medium Libriheavy medium subset ASR ENG https://github.com/k2-fsa/libriheavy
libriheavy_small Libriheavy small subset ASR ENG https://github.com/k2-fsa/libriheavy
librilight_limited Librilight-limited subset ASR ENG https://dl.fbaipublicfiles.com/librilight/data/librispeech_finetuning.tgz
librimix LibriMix: An Open-Source Dataset for Generalizable Speech Separation SE/DIAR ENG https://github.com/JorisCos/LibriMix
librispeech LibriSpeech ASR corpus ASR ENG http://www.openslr.org/12
librispeech_100 LibriSpeech ASR corpus 100h subset ASR ENG http://www.openslr.org/12
libritts LibriTTS corpus TTS ENG http://www.openslr.org/60
libritts_r LibriTTS-R corpus TTS ENG http://www.openslr.org/141
ljspeech The LJ Speech Dataset TTS ENG https://keithito.com/LJ-Speech-Dataset/
lrs2 The Oxford-BBC Lip Reading Sentences 2 (LRS2) Dataset Lipreading/ASR ENG https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html
lrs3 The Oxford-BBC Lip Reading Sentences 3 (LRS3) Dataset ASR ENG https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs3.html
lt_slurp_spatialized Spatialized Libri-Trans and Spatialized SLURP (LT-S and SLURP-S), Enhancement for Translation and Understanding Dataset SE/ST/SLU ENG
lt_speech_commands Lithuanian Speech Commands dataset LIT https://github.com/kolesov93/lt_speech_commands
m4singer Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus SVS CMN https://drive.google.com/file/d/1xC37E59EWRRFFLdG3aJkVqwtLDgtFNqW/view?usp=share_link
magicdata MAGICDATA Mandarin Chinese Read Speech Corpus ASR CMN https://www.openslr.org/68/
makerere Makerere Radio Speech Corpus ASR LUG https://zenodo.org/records/5855017
media MEDIA speech database for French SLU/Entity Classifi. FRA https://catalogue.elra.info/en-us/repository/browse/ELRA-S0272/
mediaspeech MediaSpeech: Multilanguage ASR Benchmark and Dataset ASR FRA https://www.openslr.org/108/
meld MELD: Multimodal EmotionLines Dataset SLU ENG https://affective-meld.github.io/
microsoft_speech Microsoft Speech Corpus (Indian languages) ASR 3 languages https://msropendata.com/datasets/7230b4b1-912d-400e-be58-f84e0512985e
mini_an4 Mini version of CMU AN4 database for the integration test ASR/TTS/SE ENG http://www.speech.cs.cmu.edu/databases/an4/
mini_librispeech Mini version of Librispeech corpus DIAR ENG https://openslr.org/31/
misp2021 Multimodal Information Based Speech Processing (MISP) Challenge 2021 ASR/AVSR CMN https://mispchallenge.github.io/
ml_openslr63 Crowdsourced high-quality Malayalam multi-speaker speech data ASR MAL https://openslr.org/63/
mls MLS (A large multilingual corpus derived from LibriVox audiobooks) ASR 8 languages http://www.openslr.org/94/
mr_openslr64 OpenSLR Marathi Corpus ASR MAR http://www.openslr.org/64/
ms_indic_is18 Microsoft Speech Corpus (Indian languages) ASR 3 langs: TEL TAM GUJ https://msropendata.com/datasets/7230b4b1-912d-400e-be58-f84e0512985e
ml_superb Multilingual SUPERB benchmark ASR 145 languages https://multilingual.superbbenchmark.org
ml_superb2 Multilingual SUPERB 2.0 Interspeech 2024 Challenge ASR 154 languages https://multilingual.superbbenchmark.org
mucs21_subtask1 MUltilingual and Code-Switching ASR Challenges for Low Resource Indian Languages ASR 6 Indian languages https://navana-tech.github.io/MUCS2021/challenge_details.html
mucs21_subtask2 MUltilingual and Code-Switching ASR Challenges for Low Resource Indian Languages ASR 2 codeswitching data https://navana-tech.github.io/MUCS2021/challenge_details.html
musdb18 Music source separation corpus and codec corpus ENH/Codec ENG https://sigsep.github.io/datasets/musdb.html
must_c https://ict.fbk.eu/must-c/ ASR/MT/ST ENG->14langs https://ict.fbk.eu/must-c/
must_c_v2 https://ict.fbk.eu/must-c/ ASR/MT/ST ENG->DEU https://ict.fbk.eu/must-c/
mustard MUStARD: Multimodal Sarcasm Detection Dataset SLU ENG https://github.com/soujanyaporia/MUStARD/
mustard_plus_plus A Multimodal Corpus for Emotion Recognition in Sarcasm SLU ENG https://github.com/cfiltnlp/MUStARD_Plus_Plus/
myst My Science Tutor (MyST) Children's Conversational Speech Corpus ASR ENG https://catalog.ldc.upenn.edu/LDC2021S05
nit_song070 The NITech Japanese speech database SVS JPN http://hts.sp.nitech.ac.jp/archives/2.3/HTS-demo_NIT-SONG070-F001.tar.bz2
nsc National Speech Corpus ASR ENG-SG https://www.imda.gov.sg/programme-listing/digital-services-lab/national-speech-corpus
ofuton_p_utagoe_db Ofuton_p_utagoe Singing voice synthesis corpus SVS JPN https://sites.google.com/view/oftn-utagoedb/%E3%83%9B%E3%83%BC%E3%83%A0
ogi_kids_speech speech from 1100 children between Kindergarten and Grade 10 ASR ENG https://catalog.ldc.upenn.edu/LDC2007S18
oniku_kurumi_utagoe_db Oniku Singing voice synthesis corpus SVS JPN http://onikuru.info/db-download/
open_li110 Corpus combination with 110 languages Multilingual ASR 100+ languages
open_li52 Corpus combination with 52 languages (CommonVoice + VoxForge) Multilingual ASR 52 languages
opencpop Opencpop: Mandarin singing voice synthesis corpus SVS CMN https://wenet.org.cn/opencpop/
pjs Phoneme-balanced Japanese Singing-voice corpus SVS JPN https://sites.google.com/site/shinnosuketakamichi/research-topics/pjs_corpus
polyphone_swiss_french Swiss French Polyphone corpus ASR FRA http://catalog.elra.info/en-us/repository/browse/ELRA-S0030_02
portmedia_dom PortMedia French corpus SLU/Entity Classifi. FRA https://catalogue.elra.info/en-us/repository/browse/ELRA-S0371/
portmedia_lang PortMedia Italian corpus SLU/Entity Classifi. ITA https://catalogue.elra.info/en-us/repository/browse/ELRA-S0371/
powsm IPAPack++, speech corpus with 17k hours of normalized phone transcriptions S2T 86 languages https://huggingface.co/anyspeech
primewords_chinese Primewords Chinese Corpus Set 1 ASR CMN https://www.openslr.org/47/
puebla_nahuatl Highland Puebla Nahuatl corpus (endangered language in central Mexico) ASR/ST HPN https://www.openslr.org/92/
qasr_tts TTS character based system using semi-supervised data selection TTS ARA https://arabicspeech.org/qasr_tts
rats RATS Speaker Identification SPK 5 languages https://catalog.ldc.upenn.edu/LDC2021S08
reasonspeech ReazonSpeech: Japanese Corpus collected from TV Programs ASR JPN https://research.reazon.jp/projects/ReazonSpeech/
reverb REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge ASR ENG https://reverb2014.dereverberation.com/
ru_open_stt Russian Open Speech To Text (STT/ASR) Dataset ASR RUS https://github.com/snakers4/open_stt
ruslan RUSLAN: Russian Spoken Language Corpus For Speech Synthesis TTS RUS https://ruslan-corpus.github.io/
sdsv21 SdSV 2021: Short-duration Speaker Verification (SdSV) Challenge 2021 SPK 10+ Languages https://sdsvc.github.io/
seame SEAME: a Mandarin-English Code-switching Speech Corpus in South-East Asia ASR ENG + CMN https://catalog.ldc.upenn.edu/LDC2015S04
sinhala Sinhala speech recognition corpus ASR SIN https://drive.google.com/file/d/17_e0JhMW4_FPxfh93foplnxb4OQp8zh3/view?usp=sharing
siwis SIWIS: Spoken Interaction with Interpretation in Switzerland TTS FRA https://datashare.ed.ac.uk/handle/10283/2353
slue-voxceleb SLUE: Spoken Language Understanding Evaluation SLU ENG https://github.com/asappresearch/slue-toolkit
slue-voxpopuli SLUE: Spoken Language Understanding Evaluation SLU ENG https://github.com/asappresearch/slue-toolkit
slurp SLURP: A Spoken Language Understanding Resource Package SLU ENG https://github.com/pswietojanski/slurp
slurp_entity SLURP: A Spoken Language Understanding Resource Package SLU/Entity Classifi. ENG https://github.com/pswietojanski/slurp
slurp_spatialized Spatialized SLURP (SLURP-S), Noisy Reverberant Spoken Language Understanding Dataset SLU ENG
sms_wsj SMS-WSJ: A database for in-depth analysis of multi-channel source separation algorithms SE ENG https://github.com/fgnt/sms_wsj
snips SNIPS: A dataset for spoken language understanding SLU ENG https://github.com/sonos/spoken-language-understanding-research-datasets
speechcommands Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition SLU ENG https://www.tensorflow.org/datasets/catalog/speech_commands
spgispeech SPGISpeech 5k corpus ASR ENG https://datasets.kensho.com/datasets/scribe
spring_speech SPRING-INX: Data for Indian Languages ASR ENG https://asr.iitm.ac.in/dataset
stop STOP: Spoken Task Oriented Parsing SLU ENG https://facebookresearch.github.io/spoken_task_oriented_parsing/
su_openslr36 Sundanese ASR SUN http://www.openslr.org/36
swbd Switchboard Corpus for 2-channel Conversational Telephone Speech (300h) ASR ENG https://catalog.ldc.upenn.edu/LDC97S62
swbd_da NXT Switchboard Annotations SLU ENG https://catalog.ldc.upenn.edu/LDC2009T26
swbd_sentiment Speech Sentiment Annotations SLU ENG https://catalog.ldc.upenn.edu/LDC2020T14
talromur Talromur: A large Icelandic TTS corpus TTS ISL https://repository.clarin.is/repository/xmlui/handle/20.500.12537/104, https://aclanthology.org/2021.nodalida-main.50.pdf
talromur2 Talromur 2: Icelandic multi-speaker TTS corpus TTS ISL https://repository.clarin.is/repository/xmlui/handle/20.500.12537/167
tedlium2 TED-LIUM corpus release 2 ASR ENG https://www.openslr.org/19/, http://www.lrec-conf.org/proceedings/lrec2014/pdf/1104_Paper.pdf
tedlium3 TED-LIUM corpus release 3 ASR ENG https://www.openslr.org/51/
tedx_spanish_openslr67 TEDx Spanish Corpus ASR SPA https://www.openslr.org/67/
thchs30 A Free Chinese Speech Corpus Released by CSLT@Tsinghua University ASR/TTS CMN https://www.openslr.org/18/
timit TIMIT Acoustic-Phonetic Continuous Speech Corpus ASR/UASR ENG https://catalog.ldc.upenn.edu/LDC93S1
totonac Highland Totonac corpus (endangered language in central Mexico) ASR TOS http://www.openslr.org/107/
tsukuyomi つくよみちゃんコーパス (Tsukuyomi-chan Corpus) TTS JPN https://tyc.rei-yumesaki.net/material/corpus
universal_se_v1 Combination of Multi-condition English Corpora (vctk_noisy, dns_ins20, chime4, reverb, whamr) SE ENG
urgent2024 Multi-domain simulated speech enhancement data for the URGENT 2024 Challenge SE ENG https://urgent-challenge.github.io/urgent2024/data/
vctk English Multi-speaker Corpus for CSTR Voice Cloning Toolkit ASR/TTS ENG http://www.udialogue.org/download/cstr-vctk-corpus.html
vctk_reverb Reverberant speech database (48kHz) SE ENG https://datashare.ed.ac.uk/handle/10283/2826
vctk_noisyreverb Noisy reverberant speech database (48kHz) SE ENG https://datashare.ed.ac.uk/handle/10283/2826
vivos VIVOS (Vietnamese corpus for ASR) ASR VIE https://doi.org/10.5281/zenodo.7068130
voices VOiCES ASR/SPK ENG https://iqtlabs.github.io/voices/
voxceleb VoxCeleb SPK 10+ languages https://mm.kaist.ac.kr/datasets/voxceleb/
voxforge VoxForge ASR 7 languages http://www.voxforge.org/
voxlingua107 VoxLingua107 LID 107 languages https://cs.taltech.ee/staff/tanel.alumae/data/voxlingua107/
wenetspeech WenetSpeech: A 10000+ Hours Multi-domain Chinese Corpus for Speech Recognition ASR CMN https://wenet-e2e.github.io/WenetSpeech/
wham The WSJ0 Hipster Ambient Mixtures (WHAM!) dataset SE ENG https://wham.whisper.ai/
whamr WHAMR!: Noisy and Reverberant Single-Channel Speech Separation SE ENG https://wham.whisper.ai/
wsj CSR-I (WSJ0) Complete, CSR-II (WSJ1) Complete ASR ENG https://catalog.ldc.upenn.edu/LDC93S6A, https://catalog.ldc.upenn.edu/LDC94S13A
wsj0_2mix MERL WSJ0-mix multi-speaker dataset ASR/SE ENG http://www.merl.com/demos/deep-clustering
wsj0_2mix_spatialized MERL WSJ0-mix multi-speaker dataset (Spatialized version) ASR/Multichannel ASR/SE ENG http://www.merl.com/demos/deep-clustering
wsj_kinect Kinect WSJ: Multichannel, Reverberated and Noisy Extension to the WSJ0-2mix dataset SE ENG https://github.com/sunits/Reverberated_WSJ_2MIX
yesno The "yesno" corpus ASR HEB http://www.openslr.org/1
yoloxochitl_mixtec Yoloxochitl-Mixtec corpus (endangered language in central Mexico) ASR XTY http://www.openslr.org/89
zeroth_korean Zeroth-Korean ASR KOR http://www.openslr.org/40
zh_openslr38 ST-CMDS-20170001_1, Free ST Chinese Mandarin Corpus ASR CMN http://www.openslr.org/38
gtsinger A Global Multi-Technique Singing Corpus with Realistic Music Scores SVS 9 languages https://huggingface.co/datasets/GTSinger/GTSinger
galaxy A Large-scale Open-Domain Dataset for Multimodal Learning ASR/AVSR ENG https://github.com/wyh2000/GALAXY