bytedance/SALMONN

SALMONN for speech quality assessment

Data preparation

Please download the following datasets first.

Speech quality assessment datasets:

BVCC: How do Voices from Past Speech Synthesis Challenges Compare Today
NISQA: A Deep CNN-Self-Attention Model for Multidimensional Speech Quality Prediction with Crowdsourced Datasets
SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis Creators
(Note: resample all audio to 16 kHz, since SALMONN expects input at a 16 kHz sampling rate.)
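
Before resampling, it can help to find out which files are not already at 16 kHz. The following is a minimal sketch using only the Python standard library; it is not part of this repository. The function names (`wav_sample_rate`, `files_needing_resample`, `write_silence`) are hypothetical, and it assumes the audio is stored as PCM `.wav` files, which the `wave` module can read. The resampling itself is left to the tool of your choice.

```python
import wave
from pathlib import Path

TARGET_RATE = 16000  # SALMONN's input sampling rate, per the note above


def wav_sample_rate(path):
    """Return the sampling rate (Hz) stored in a PCM WAV file header."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getframerate()


def files_needing_resample(root, target_rate=TARGET_RATE):
    """Map each .wav file under `root` whose rate differs from `target_rate` to its rate."""
    mismatched = {}
    for path in sorted(Path(root).rglob("*.wav")):
        rate = wav_sample_rate(path)
        if rate != target_rate:
            mismatched[path] = rate
    return mismatched


def write_silence(path, rate, seconds=0.1):
    """Write a short silent mono 16-bit WAV file (helper for trying the sketch out)."""
    with wave.open(str(path), "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(rate)
        wav_file.writeframes(b"\x00\x00" * int(rate * seconds))
```

Calling `files_needing_resample("path/to/dataset")` (the path is a placeholder) returns a dictionary of files that still need conversion, keyed by path with the current sampling rate as the value. An empty dictionary means every `.wav` file found is already at 16 kHz.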

Speaker similarity dataset:

VoxSim: A perceptual voice similarity dataset

Then run the data processing scripts to generate annotations. You can also refer to our annotations.

Finetuned SALMONN-7B checkpoint

Our finetuned SALMONN-7B checkpoint can be downloaded here.

Run and inference

Follow the instructions in the salmonn branch.

License & CODE_OF_CONDUCT

Please refer to the salmonn branch for more details.

✨ Citation

If you find this work useful, please cite our papers.

@inproceedings{wang2024enabling,
  title={Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation},
  author={Wang, Siyin and Yu, Wenyi and Yang, Yudong and Tang, Changli and Li, Yixuan and Zhuang, Jimin and Chen, Xianzhao and Tian, Xiaohai and Zhang, Jun and Sun, Guangzhi and others},
  booktitle={Proc. ICASSP},
  address={Hyderabad},
  year={2025}
}

@inproceedings{wang2025qualispeech,
  title={QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions},
  author={Wang, Siyin and Yu, Wenyi and Chen, Xianzhao and Tian, Xiaohai and Zhang, Jun and Sun, Guangzhi and others},
  booktitle={Proc. ACL},
  address={Vienna},
  year={2025}
}