Audio Feature Extraction

This folder provides scripts for extracting MERT-based audio features, the representation used by CLaMP 3’s audio encoder. The features are generated with the MERT-v1-95M model: audio is split into non-overlapping 5-second segments, and the model's outputs are averaged across all layers and time steps to produce a single feature vector per segment.
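The segment-then-average pooling described above can be illustrated with a minimal NumPy sketch. It is not the code in extract_mert.py. Here `encode` is a hypothetical stand-in for running MERT with all hidden states returned, assumed to yield an array of shape (num_layers, num_frames, dim), and a shorter trailing segment is assumed to be kept rather than dropped or padded. The dummy encoder exists only so the example runs without downloading the model.

```python
import numpy as np

SEGMENT_SECONDS = 5  # segment length stated in this README


def segment_waveform(waveform, sample_rate, segment_seconds=SEGMENT_SECONDS):
    """Split a 1-D waveform into consecutive, non-overlapping segments.

    Assumption: a shorter trailing segment is kept as-is.
    """
    seg_len = int(segment_seconds * sample_rate)
    return [waveform[i:i + seg_len] for i in range(0, len(waveform), seg_len)]


def pooled_segment_features(waveform, sample_rate, encode):
    """Return one feature vector per segment, shape (num_segments, dim).

    `encode` is a hypothetical callable mapping a segment to hidden states
    of shape (num_layers, num_frames, dim).
    """
    features = []
    for segment in segment_waveform(waveform, sample_rate):
        hidden = np.asarray(encode(segment))       # (layers, frames, dim)
        features.append(hidden.mean(axis=(0, 1)))  # average over layers and time
    return np.stack(features)


def dummy_encode(segment):
    """Fake encoder: 13 layers, 768 dims, every value equal to the segment mean."""
    frames = max(1, len(segment) // 320)  # arbitrary dummy frame count
    return np.full((13, frames, 768), float(np.mean(segment)))


# 12 seconds of audio: 5 s of zeros, then 7 s of ones -> three segments.
sr = 24000
wave = np.concatenate([np.zeros(5 * sr), np.ones(5 * sr), np.ones(2 * sr)])
feats = pooled_segment_features(wave, sr, dummy_encode)
print(feats.shape)  # (3, 768)
```

Averaging over both the layer and time axes collapses each segment to one dim-sized vector regardless of how many frames it produced, so a shorter final segment still yields a feature of the same size.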

1. Download the MERT Model

Download the MERT-v1-95M model (m-a-p/MERT-v1-95M) from Hugging Face.
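One way to fetch the model files ahead of time is the `snapshot_download` function from the `huggingface_hub` package, sketched below. The wrapper name `download_mert` is hypothetical; the package import is placed inside the function so that defining it does not require `huggingface_hub` to be installed.

```python
def download_mert(local_dir=None, repo_id="m-a-p/MERT-v1-95M"):
    """Fetch a snapshot of the MERT model repository and return its local path.

    With local_dir=None the files go to the standard Hugging Face cache.
    """
    # Imported lazily: only needed when the download actually runs.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

For example, `download_mert("./MERT-v1-95M")` would place the files in that folder, which could then be passed as `--model_path`, assuming extract_mert.py accepts a local directory there. Per the MERT model card, loading the model with transformers requires `trust_remote_code=True`, e.g. `AutoModel.from_pretrained(path, trust_remote_code=True)`.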

2. Extract MERT Features

The extract_mert.py script extracts MERT features from audio files.

  • Execution:
    Run the script using the following command:
    python extract_mert.py --input_path <input_path> --output_path <output_path> --model_path m-a-p/MERT-v1-95M --mean_features
  • Input: Audio files (.mp3, .wav).
  • Output: extracted MERT features, saved as NumPy arrays (.npy).
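For large collections it can help to preview which outputs a run will produce. The sketch below is hypothetical: `plan_outputs` assumes the output directory mirrors the input tree and that each .mp3 or .wav file maps to a same-named .npy file, which may not match how extract_mert.py actually names its outputs. It uses only the standard library.

```python
import tempfile
from pathlib import Path

AUDIO_EXTENSIONS = {".mp3", ".wav"}  # input formats listed in this README


def plan_outputs(input_path, output_path):
    """Map each audio file under input_path to a .npy path under output_path.

    Hypothetical layout: relative directories are mirrored and the audio
    extension is replaced with .npy.
    """
    input_root, output_root = Path(input_path), Path(output_path)
    plan = []
    for audio in sorted(input_root.rglob("*")):
        if audio.is_file() and audio.suffix.lower() in AUDIO_EXTENSIONS:
            relative = audio.relative_to(input_root).with_suffix(".npy")
            plan.append((audio, output_root / relative))
    return plan


# Usage on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "audio")
    (src / "album").mkdir(parents=True)
    for name in ["album/a.mp3", "b.WAV", "notes.txt"]:
        (src / name).touch()
    pairs = plan_outputs(src, Path(tmp, "features"))
    relative_outputs = [out.relative_to(tmp).as_posix() for _, out in pairs]

print(relative_outputs)
```

Matching the extension case-insensitively picks up files such as b.WAV, while non-audio files like notes.txt are skipped.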