
Code for Audio Captioning

Please download the pretrained audio encoders from PANNs or HTSAT. We have also uploaded the audio encoders we used here.

Place them in the pretrained_models/audio_encoders directory.
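Before training, it can help to confirm that the encoder checkpoints are where the scripts expect them. The following is a minimal sketch, not part of this repository. It assumes the checkpoints use common PyTorch suffixes (.pt, .pth, .ckpt), because the actual file names of the released encoders are not given here. The function name find_encoder_checkpoints is hypothetical.

```python
from pathlib import Path

# Assumption: encoder checkpoints use common PyTorch file suffixes; the real
# file names of the released PANNs/HTSAT encoders are not specified here.
CHECKPOINT_SUFFIXES = {".pt", ".pth", ".ckpt"}


def find_encoder_checkpoints(root="pretrained_models/audio_encoders"):
    """Return sorted names of checkpoint files located directly under root.

    Hypothetical helper: raises FileNotFoundError if the directory is missing.
    """
    root_dir = Path(root)
    if not root_dir.is_dir():
        raise FileNotFoundError(f"Expected encoder directory at {root_dir}")
    # Only regular files with a checkpoint-like suffix are reported.
    return sorted(
        p.name
        for p in root_dir.iterdir()
        if p.is_file() and p.suffix in CHECKPOINT_SUFFIXES
    )
```

Called from the repository root as find_encoder_checkpoints(), it lists whatever checkpoint files were placed in the default directory. An empty list suggests the downloads ended up somewhere else.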

Training settings are configured in the YAML files under the settings directory.
Our dataloader reads JSON files, in which the audio key gives the path to each audio clip on your computer or server.
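The only documented requirement for the JSON files is that the audio key points to an audio clip on disk. The sketch below writes and reads caption files under an assumed layout: a top-level "data" list whose entries hold "audio" and "caption" fields. Only the "audio" key comes from this README. The "data" and "caption" keys, along with both helper functions, are hypothetical and should be checked against the repository's actual dataloader before use.

```python
import json
from pathlib import Path

# Hypothetical layout; only the "audio" key is documented:
# {"data": [{"audio": "/path/to/clip.wav", "caption": "A dog barks."}, ...]}


def write_caption_json(pairs, json_path):
    """Write (audio_path, caption) pairs in the assumed layout."""
    content = {"data": [{"audio": str(a), "caption": c} for a, c in pairs]}
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(content, f, indent=2)


def load_caption_entries(json_path):
    """Read (audio_path, caption) pairs back from the assumed layout."""
    with open(json_path, "r", encoding="utf-8") as f:
        content = json.load(f)
    entries = []
    for item in content["data"]:
        # "audio" is the path of the clip on your computer or server.
        entries.append((Path(item["audio"]), item["caption"]))
    return entries
```

If your files use different key names, adjust the lookups accordingly. What matters is that each audio value resolves to a readable file path on the machine that runs training.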
Run pretrain.py for pretraining, and train.py for finetuning or training from scratch.
To evaluate the generated captions, you need to set up the COCO caption evaluation tools yourself.
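The COCO caption tools score a dictionary of candidate captions against a dictionary of reference captions. The following is a minimal sketch of preparing those two inputs. It assumes the pycocoevalcap package (a pip-installable port of the COCO caption code) and uses hypothetical clip IDs as keys. The function to_coco_eval_format is illustrative only and is not part of this repository.

```python
def to_coco_eval_format(predictions, references):
    """Build (gts, res) dicts shaped for pycocoevalcap scorers (assumed tool).

    predictions: {clip_id: predicted caption string}
    references:  {clip_id: list of reference caption strings}
    Returns gts = {clip_id: [ref, ...]} and res = {clip_id: [prediction]}.
    """
    missing = set(predictions) - set(references)
    if missing:
        raise KeyError(f"No references for clips: {sorted(missing)}")
    # The scorers require identical key sets and exactly one candidate per key,
    # so both dicts are keyed by the predicted clips only.
    gts = {cid: list(references[cid]) for cid in predictions}
    res = {cid: [predictions[cid]] for cid in predictions}
    return gts, res
```

With pycocoevalcap installed, captions are usually tokenized first with PTBTokenizer, which needs Java and expects each caption wrapped as {"caption": ...}. The tokenized dicts then go to a scorer such as Bleu(4).compute_score(gts, res).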

We provide pretrained audio captioning models to reproduce our results.

Pretrained models can be downloaded from Google Drive.