Audio Feature Extraction

This folder provides scripts for extracting MERT-based audio features, the representation used by CLaMP 3’s audio encoder. The features are generated with the MERT-v1-95M model: each audio file is split into non-overlapping 5-second segments, and for each segment the model's outputs are averaged across all layers and time steps to produce a single feature vector.
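The segment-and-average step described above can be sketched with NumPy alone. This is only an illustration, not the code in extract_mert.py: the function names, the 24 kHz sample rate, and the (13 layers, frames, 768 dims) shape of the stand-in model output are assumptions, and random numbers take the place of real MERT outputs.

```python
import numpy as np

SEGMENT_SECONDS = 5  # segment length stated in this README


def split_into_segments(waveform, sample_rate, segment_seconds=SEGMENT_SECONDS):
    """Split a mono waveform into consecutive, non-overlapping chunks."""
    seg_len = int(sample_rate * segment_seconds)
    return [waveform[start:start + seg_len]
            for start in range(0, len(waveform), seg_len)]


def pool_hidden_states(hidden_states):
    """Average a (layers, time, dim) array over layers and time steps."""
    return hidden_states.mean(axis=(0, 1))


# Toy input: 12 seconds of silence at an assumed 24 kHz sample rate.
sample_rate = 24000
waveform = np.zeros(sample_rate * 12, dtype=np.float32)
segments = split_into_segments(waveform, sample_rate)

# Stand-in for the model: random arrays with an assumed
# (layers=13, frames=10, dim=768) shape, one per segment.
rng = np.random.default_rng(0)
features = np.stack([
    pool_hidden_states(rng.standard_normal((13, 10, 768)))
    for _ in segments
])

print(len(segments), features.shape)
```

In this sketch the final segment is simply shorter than 5 seconds; how the actual script treats a trailing partial segment is not stated here.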

1. Download the MERT Model

Download the MERT-v1-95M model from Hugging Face (model ID: m-a-p/MERT-v1-95M).

2. extract_mert.py

Step 1: Extracts MERT features from audio files.

Execution:
Run the script using the following command:

python extract_mert.py --input_path <input_path> --output_path <output_path> --model_path m-a-p/MERT-v1-95M --mean_features

Input: Audio files (.mp3, .wav).
Output: MERT-extracted features (.npy).
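
Once extraction has finished, the .npy outputs can be inspected with NumPy. The helper below is a hypothetical convenience, not part of this folder: load_features and the demo file name are made up for illustration, and the demo writes a placeholder array to a temporary directory instead of reading real features.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_features(output_dir):
    """Return {file stem: array} for every .npy file under output_dir."""
    return {p.stem: np.load(p)
            for p in sorted(Path(output_dir).rglob("*.npy"))}


# Demo with a placeholder file; real features come from extract_mert.py.
with tempfile.TemporaryDirectory() as tmp:
    np.save(Path(tmp) / "example_track.npy",
            np.zeros((3, 768), dtype=np.float32))
    loaded = load_features(tmp)
    for name, array in loaded.items():
        print(name, array.shape, array.dtype)
```

The array shape used in the demo is an assumption; print the shape of your own files to see what the script actually wrote.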

