Tag audio at a higher resolution

Thank you for your great work and for sharing it!

Do you have any recommendations for using your models to label audio at a finer time resolution, say 1 second or less, or even at the mel-frame level?

I've tried applying your models to short windows, but below 5 seconds the results deteriorate a lot (for 1-second windows they seem to fail completely). I guess this is because the AudioSet training samples are ~10 seconds long.
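For concreteness, here is a minimal sketch of the windowing experiment, written so the window length and the hop can be varied independently (for example, keeping ~10-second windows but moving them in 1-second steps). Everything in it is an assumption for illustration: `sliding_window_tags`, `toy_predict`, and the parameter names are hypothetical, and `toy_predict` is only a stand-in for a real clip-level model call, not this repository's API.

```python
import numpy as np

def toy_predict(chunk):
    """Hypothetical stand-in for a clip-level tagger: returns 2 fake 'class scores'."""
    return np.array([float(np.mean(chunk ** 2)), float(np.max(np.abs(chunk)))])

def sliding_window_tags(audio, sr, predict_fn, win_s=10.0, hop_s=1.0):
    """Score overlapping windows and return (center_times_in_seconds, scores).

    audio: 1-D float array; sr: sample rate in Hz.
    predict_fn: any callable mapping a fixed-length chunk to a score vector.
    """
    win = int(round(win_s * sr))
    hop = int(round(hop_s * sr))
    # Zero-pad by half a window on each side so that every window is full
    # length and window i is centered on sample i * hop of the original audio.
    padded = np.pad(audio, (win // 2, win - win // 2))
    n_steps = max(1, int(np.ceil(len(audio) / hop)))
    times, scores = [], []
    for i in range(n_steps):
        start = i * hop
        chunk = padded[start:start + win]
        times.append(start / sr)  # window center, in original-audio seconds
        scores.append(predict_fn(chunk))
    return np.array(times), np.stack(scores)

# Usage: 5 seconds of silence at a toy sample rate, 2 s windows, 1 s hop.
times, scores = sliding_window_tags(np.zeros(500), sr=100,
                                    predict_fn=toy_predict,
                                    win_s=2.0, hop_s=1.0)
```

With a small hop and a long window, each score is still computed from the full window, so this gives finer time stamps rather than genuinely finer predictions.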

I've also tried modifying the model to obtain frame-level predictions, but all the models seem to use the "mlp" head. Would getting rid of the adaptive pooling require a full retrain?
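To make the pooling question concrete, here is a toy NumPy sketch under loud assumptions: the feature map is taken to be a `(frames, channels)` array, the head is a made-up linear, ReLU, linear stack with random weights, and none of the names or sizes come from the actual models. It only illustrates that a head applied after time pooling can mechanically be applied per frame as well, while the per-frame outputs are not the values the pooled head produces; it does not settle whether a retrain is needed.

```python
import numpy as np

rng = np.random.default_rng(0)
T, C, H, K = 8, 16, 32, 5  # frames, channels, hidden units, classes (all made up)

feats = rng.standard_normal((T, C))  # stand-in for the features before pooling
W1, b1 = rng.standard_normal((C, H)), np.zeros(H)  # random, untrained weights
W2, b2 = rng.standard_normal((H, K)), np.zeros(K)

def head(x):
    """Toy two-layer head (assumed shape): linear -> ReLU -> linear."""
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

# Clip level: average over time first, then classify once.
clip_logits = head(feats.mean(axis=0))   # shape (K,)

# Frame level: skip the time pooling and classify every frame.
frame_logits = head(feats)               # shape (T, K)

# Because of the nonlinearity, averaging the per-frame logits does not
# reproduce the clip-level logits in general.
matches_clip = np.allclose(frame_logits.mean(axis=0), clip_logits)
```

The head sees individual frames here instead of the time-averaged vector it would have been trained on, which is why the per-frame scores may be poorly calibrated without some fine-tuning.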

Thank you in advance!

Tag audio at a higher resolution #3
