Closed
Description
Thank you for your great work and for sharing it!
Do you have any recommendations for using your models to label audio at a higher temporal resolution, say 1 second or less, or even at the mel-frame level?
I've tried applying your models to short windows, but below 5 seconds the results deteriorate considerably (at 1 second the models seem to fail completely). I suspect this is because the AudioSet training clips are about 10 seconds long.
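One workaround that keeps the model's input close to the ~10 s clips it was trained on is to slide a full-length window with a short hop and, for each hop-sized segment, average the scores of every window that overlaps it. Below is a minimal plain-Python sketch under stated assumptions: `predict` is a hypothetical callback standing in for whichever model is used, and the 10 s window and 1 s hop are illustrative defaults, not values taken from the repository.

```python
from typing import Callable, List, Sequence


def sliding_window_scores(
    num_samples: int,
    sample_rate: int,
    predict: Callable[[int, int], Sequence[float]],
    window_s: float = 10.0,  # assumption: roughly the AudioSet clip length
    hop_s: float = 1.0,      # assumption: the desired output resolution
) -> List[List[float]]:
    """Return one score vector per hop-sized segment.

    `predict(start, end)` is a hypothetical callback that runs the tagging
    model on samples[start:end] and returns one score per class.
    Requires window_s >= hop_s so that consecutive windows leave no gaps.
    """
    win = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    num_segments = -(-num_samples // hop)  # ceiling division
    counts = [0] * num_segments
    sums = None

    # Window start positions spaced by `hop`; add a final window that ends
    # exactly at the end of the signal so the tail is always covered.
    starts = list(range(0, max(num_samples - win, 0) + 1, hop))
    if starts[-1] + win < num_samples:
        starts.append(num_samples - win)

    for start in starts:
        end = min(start + win, num_samples)  # clip shorter-than-window input
        scores = list(predict(start, end))
        if sums is None:
            sums = [[0.0] * len(scores) for _ in range(num_segments)]
        # Accumulate this window's scores into every segment it overlaps.
        for k in range(start // hop, (end - 1) // hop + 1):
            counts[k] += 1
            for c, s in enumerate(scores):
                sums[k][c] += s

    # Average the accumulated scores per segment.
    return [[s / counts[k] for s in row] for k, row in enumerate(sums)]
```

The effective resolution is the hop size, but each segment's score is smoothed over the full window length, so onsets and offsets are only coarsely localized; this trades temporal sharpness for inputs that match the training distribution.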
I've also tried modifying the models to obtain frame-level predictions, but they all seem to use the "mlp" head. Would getting rid of the adaptive pooling require full retraining?
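A retraining-free experiment, assuming the backbone's feature map before the adaptive pooling can be extracted, is to average over the frequency axis only (keeping T time frames of D features) and run the existing head on each frame. For a purely linear head after average pooling, the mean of the per-frame logits equals the clip-level logits exactly; with a nonlinear MLP head that identity no longer holds, so per-frame outputs are only an approximation. The plain-Python sketch below illustrates both cases; `linear`, `mlp`, `mean_pool`, and `frame_level_logits` are hypothetical helpers, not functions from the repository.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Head = Callable[[Sequence[float]], Vector]


def linear(W: Sequence[Sequence[float]], b: Sequence[float]) -> Head:
    """Hypothetical linear layer: x -> W @ x + b (W is classes x dim)."""
    def f(x: Sequence[float]) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + bc for row, bc in zip(W, b)]
    return f


def mlp(W1, b1, W2, b2) -> Head:
    """Hypothetical two-layer MLP head with a ReLU in between."""
    l1, l2 = linear(W1, b1), linear(W2, b2)
    return lambda x: l2([max(0.0, v) for v in l1(x)])


def mean_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Average over the time axis, i.e. what global pooling does to T frames."""
    T = len(frames)
    return [sum(col) / T for col in zip(*frames)]


def frame_level_logits(frames: Sequence[Sequence[float]], head: Head) -> List[Vector]:
    """Apply the clip-level head to every time frame instead of the pooled vector."""
    return [head(x) for x in frames]


frames = [[1.0, -1.0], [-1.0, 1.0]]  # toy feature map: T=2 frames, D=2

# Linear head: averaging per-frame logits reproduces the clip logits.
lin = linear([[1.0, 2.0]], [0.5])
print(mean_pool(frame_level_logits(frames, lin)), lin(mean_pool(frames)))

# MLP head: the ReLU breaks the identity, so the two results differ.
head = mlp([[1.0, 0.0]], [0.0], [[1.0]], [0.0])
print(mean_pool(frame_level_logits(frames, head)), head(mean_pool(frames)))
```

Even when the approximation is acceptable, the achievable time resolution is bounded by the backbone's temporal downsampling, so mel-frame-level output would likely still need upsampling or fine-tuning.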
Thank you in advance!