- Please download pretrained audio encoders from PANNs or HTSAT. We have also uploaded the audio encoders we used here. Put them under `pretrained_models/audio_encoders`.
- You can configure training settings in the YAML files under the `settings` directory.
- Our dataloader reads JSON files, in which the `audio` key gives the path to the audio clip on your computer or server.
- Run `pretrain.py` for pretraining, and `train.py` for finetuning or training from scratch.
- To evaluate audio captions, please set up the COCO caption evaluation tools yourself.
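
Because the dataloader resolves each clip through the `audio` key, it can help to check a JSON file for missing paths before starting a run. The following is a minimal sketch, not the repository's own tooling. It assumes the JSON file is a flat list of entries; the `caption` key, the file names, and both helper functions are hypothetical and used only for illustration. Only the `audio` key comes from the description above, so adapt the layout to whatever the dataloader actually expects.

```python
import json
import os
import tempfile


def write_manifest(entries, json_path):
    """Write a list of entry dicts to a JSON file (hypothetical helper)."""
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2)


def find_missing_audio(json_path):
    """Return every `audio` path in the JSON file that does not exist on disk.

    Assumes the file holds a flat list of dicts, each with an `audio` key.
    """
    with open(json_path, "r", encoding="utf-8") as f:
        entries = json.load(f)
    return [e["audio"] for e in entries if not os.path.isfile(e["audio"])]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Empty placeholder file standing in for a real audio clip.
        clip = os.path.join(tmp, "clip_0001.wav")
        open(clip, "wb").close()

        entries = [
            # "caption" is an assumed key, not taken from the README.
            {"audio": clip, "caption": "a dog barks"},
            {"audio": os.path.join(tmp, "missing.wav"), "caption": "rain falls"},
        ]
        manifest = os.path.join(tmp, "train.json")
        write_manifest(entries, manifest)

        # Lists the paths that could not be found (here, the second entry).
        print(find_missing_audio(manifest))
```

Using absolute paths in the `audio` field avoids surprises when the training scripts are launched from a different working directory.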

We provide pretrained audio captioning models for reproducing our results.
They can be downloaded from Google Drive.