Tag audio at a higher resolution #3

@adbrebs

Thank you for your great work and sharing it!

Do you have any recommendations for using your models to label audio at a higher temporal resolution, say 1 second or less? Or even at the mel-frame level?

I've tried applying your models to short windows, but with windows shorter than 5 seconds the results deteriorate a lot (at 1 second they seem to fail completely). I guess this is because the AudioSet training samples are ~10 seconds long.
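
To make the short-window setup concrete, here is a minimal sketch of that kind of windowed tagging. It is only an illustration: `tag_windows`, `tag_clip`, and the dummy tagger are hypothetical names, not part of this repository's API, and the sample rate is an assumption.

```python
import numpy as np

def tag_windows(waveform, sample_rate, window_s, hop_s, tag_clip):
    """Apply a clip-level tagger to consecutive windows of a mono waveform.

    tag_clip is a hypothetical callable that maps a 1-D array of samples
    to a 1-D array of class scores. Returns (start_times_s, scores) with
    scores of shape (num_windows, num_classes).
    """
    win = int(round(window_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    if win <= 0 or hop <= 0:
        raise ValueError("window_s and hop_s must be positive")
    # Only full windows are kept; a clip shorter than one window
    # is tagged once, as a whole.
    last_start = max(len(waveform) - win, 0)
    starts = list(range(0, last_start + 1, hop))
    times = np.array(starts) / sample_rate
    scores = np.stack([np.asarray(tag_clip(waveform[s:s + win])) for s in starts])
    return times, scores

# Placeholder tagger (assumption): one "class" equal to the mean amplitude.
def dummy_tagger(samples):
    return np.array([np.mean(np.abs(samples))])

sr = 16000  # assumed sample rate
audio = np.zeros(10 * sr, dtype=np.float32)
times_1s, scores_1s = tag_windows(audio, sr, window_s=1.0, hop_s=1.0, tag_clip=dummy_tagger)
times_5s, scores_5s = tag_windows(audio, sr, window_s=5.0, hop_s=1.0, tag_clip=dummy_tagger)
```

With `hop_s` smaller than `window_s`, the windows overlap, so the output can stay at a 1-second hop while each prediction still sees a longer context.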

I've also tried modifying the model to obtain frame-level predictions, but the models all seem to use the "mlp" head. Would getting rid of the adaptive pooling require a full retrain?
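
To illustrate what I mean by frame-level predictions, here is a rough sketch that contrasts pooling before the head with applying the same head to every frame. Everything in it is an assumption for illustration: the pooling is taken to be a mean over time, the head is a generic two-layer MLP with random weights, and none of the names come from this repository. Since the head would have been trained on pooled embeddings only, the per-frame scores may well be unreliable without retraining.

```python
import numpy as np

def mlp_head(x, w1, b1, w2, b2):
    """Generic two-layer MLP (ReLU, then sigmoid); x has shape (..., D)."""
    hidden = np.maximum(x @ w1 + b1, 0.0)
    logits = hidden @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logits))

def clip_level(features, params):
    """Pool over time first (assumed mean pooling), then apply the head."""
    pooled = features.mean(axis=0)      # (T, D) -> (D,)
    return mlp_head(pooled, *params)    # (C,)

def frame_level(features, params):
    """Skip the pooling and apply the same head to each frame."""
    return mlp_head(features, *params)  # (T, C)

rng = np.random.default_rng(0)
T, D, H, C = 100, 16, 8, 5  # frames, embedding dim, hidden dim, classes
params = (
    rng.normal(size=(D, H)), np.zeros(H),
    rng.normal(size=(H, C)), np.zeros(C),
)
features = rng.normal(size=(T, D))      # stand-in for encoder outputs

clip_scores = clip_level(features, params)
frame_scores = frame_level(features, params)
```

The two paths agree only when all frames are identical; in general the head is nonlinear, so averaging the per-frame scores does not reproduce the clip-level output.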

Thank you in advance!
