CLaMP 3 Codebase

CLaMP 3 is designed to enhance music information retrieval (MIR) across various musical modalities and languages. By leveraging contrastive learning, it aligns sheet music, audio, and multilingual text into a shared representation space, achieving top performance on MIR tasks. This codebase covers configuration, training, and feature extraction scripts for CLaMP 3.
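To make the idea of contrastive alignment concrete, here is a minimal sketch of a symmetric, InfoNCE-style loss over paired text and music embeddings. This is a generic formulation written for illustration, not necessarily the exact objective used in the training scripts; the function names and the temperature value of 0.07 are assumptions.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cross_entropy_row(logits, target_index):
    """Negative log-softmax of the target entry, computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target_index]


def contrastive_loss(text_feats, music_feats, temperature=0.07):
    """Symmetric contrastive loss; row i of each input forms a matched pair.

    The temperature is a hypothetical default chosen for this sketch.
    """
    text = [l2_normalize(v) for v in text_feats]
    music = [l2_normalize(v) for v in music_feats]
    n = len(text)
    # Scaled cosine similarity between every text and every music item.
    logits = [[dot(t, m) / temperature for m in music] for t in text]
    # Text-to-music direction: each row should peak on its own index.
    loss_t2m = sum(cross_entropy_row(logits[i], i) for i in range(n)) / n
    # Music-to-text direction: same check over the columns.
    loss_m2t = sum(
        cross_entropy_row([logits[j][i] for j in range(n)], i) for i in range(n)
    ) / n
    return (loss_t2m + loss_m2t) / 2
```

The loss is small when each text embedding is closest to its own music embedding and large when the pairs are mismatched, which is what pulls the modalities into a shared space.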

Download Pre-trained Weights:

Note: M3 is the symbolic music encoder. If you don't need to retrain CLaMP 3 or fine-tune M3, you can skip the M3 weights.

Repository Structure

The code/ folder contains the following scripts:

1. config.py

This script holds the training hyperparameters and file paths used by the training scripts.

Key Points:

Default Configuration: Set to the SAAS version, which is optimal for audio data.
Switching Variants: For better performance with symbolic music, switch to the C2 variant by modifying line 66 in config.py (change saas to c2).

2. Training Scripts

a. train_clamp3_audio.py & train_clamp3_symbolic.py

These scripts manage the training of CLaMP 3 based on the modality:

Audio Training: Use train_clamp3_audio.py for MERT-extracted audio features (.npy).
Symbolic Music Training: Use train_clamp3_symbolic.py for symbolic music data (.abc, .mtf).

Core Components:

Text Encoder: Based on XLM-R-base for cross-lingual processing (up to 128 tokens).
Symbolic Music Encoder (M3): Processes ABC and MIDI patches (up to 512 ABC bars or MIDI messages).
Audio Encoder: A transformer trained on MERT features (up to 640 seconds of audio).

Each encoder generates a global semantic feature via average pooling. Training and evaluation data paths are defined in config.py using TRAIN_JSONL and EVAL_JSONL.
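The pooling step can be sketched as follows. This is a minimal pure-Python illustration, assuming that padded positions are excluded through a 0/1 mask; the function name and the masking detail are assumptions, and the real encoders operate on tensors rather than lists.

```python
def masked_average_pool(hidden_states, mask):
    """Average the per-position vectors whose mask entry is truthy.

    hidden_states: list of equal-length vectors, one per token or patch.
    mask: list of 0/1 flags, where 0 marks padding (an assumption here).
    """
    kept = [vec for vec, keep in zip(hidden_states, mask) if keep]
    if not kept:
        raise ValueError("mask selects no positions")
    dim = len(kept[0])
    # Mean over the kept positions, one output value per feature dimension.
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]
```

For example, pooling three positions where the last one is padding averages only the first two vectors, giving one fixed-size vector regardless of sequence length.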

Training Commands:

For Symbolic Music:

python -m torch.distributed.launch --nproc_per_node=<number_of_GPUs> --use_env train_clamp3_symbolic.py

For Audio:

python -m torch.distributed.launch --nproc_per_node=<number_of_GPUs> --use_env train_clamp3_audio.py
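If you launch training from another script, the two commands above can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of the codebase; it only rebuilds the argument list shown above for a given modality and GPU count.

```python
# Maps each modality to the training script named in this README.
TRAIN_SCRIPTS = {
    "symbolic": "train_clamp3_symbolic.py",
    "audio": "train_clamp3_audio.py",
}


def build_launch_command(modality, num_gpus):
    """Return the launch command as an argument list (hypothetical helper)."""
    if modality not in TRAIN_SCRIPTS:
        raise ValueError(f"unknown modality: {modality!r}")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "python",
        "-m",
        "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}",
        "--use_env",
        TRAIN_SCRIPTS[modality],
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) from inside the code/ folder, which avoids shell quoting issues.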

b. train_m3.py

This script trains the M3 model, which encodes interleaved ABC and MTF files.

Key Points:

Specify the training and evaluation directories in the TRAIN_FOLDERS and EVAL_FOLDERS variables.
Note: Most users do not need to retrain M3; the pre-trained model is typically sufficient.

Training Command:

python -m torch.distributed.launch --nproc_per_node=<number_of_GPUs> --use_env train_m3.py

3. extract_clamp3.py

This script uses the pre-trained CLaMP 3 model to extract representations from multiple modalities:

Text (.txt)
Sheet Music (.abc)
MIDI (.mtf)
Pre-extracted Audio Features (.npy)

Preprocessing Guidelines:

Text Files: Processed directly.
Sheet Music (.abc): Convert to interleaved ABC notation using scripts in preprocessing/abc/.
MIDI (.mtf): Process with batch_midi2mtf.py.
Audio (.npy): Features extracted with extract_mert.py.
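Before running extraction, it can help to check which files in a folder already match one of the four supported extensions. The sketch below is a hypothetical pre-flight check, not part of the codebase; it groups file names by the modality implied by their extension and flags everything else as unsupported.

```python
from pathlib import Path

# Extensions and modalities as listed in this README.
MODALITY_BY_SUFFIX = {
    ".txt": "text",
    ".abc": "sheet music",
    ".mtf": "midi",
    ".npy": "audio features",
}


def classify_inputs(filenames):
    """Group file names by modality; unknown extensions go to 'unsupported'."""
    groups = {}
    for name in filenames:
        suffix = Path(name).suffix.lower()
        modality = MODALITY_BY_SUFFIX.get(suffix, "unsupported")
        groups.setdefault(modality, []).append(name)
    return groups
```

Files that land in the "unsupported" group (raw audio or raw MIDI, for instance) still need the preprocessing steps listed above before they can be used.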

Feature Extraction Options:

Global Semantic Vectors: Via average pooling and a linear layer for classification/retrieval tasks.
Temporal Features: Retain hidden states from the last layer if needed.

Usage:

accelerate launch extract_clamp3.py --epoch <epoch> <input_dir> <output_dir> --get_global

--epoch <epoch>: (Optional) Specify the checkpoint epoch.
<input_dir>: Directory containing the input files.
<output_dir>: Destination folder for the output .npy features.
--get_global: (Required for retrieval!) Extracts a global semantic vector for each input.
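Once global vectors have been extracted, retrieval reduces to ranking candidates by similarity to a query. The sketch below assumes cosine similarity as the ranking score, a common choice that this README does not itself specify, and uses plain lists; in practice the vectors would be loaded from the output .npy files, for example with numpy.load. The function names are illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(query, candidates):
    """Return candidate names sorted from most to least similar to the query.

    candidates: dict mapping a name to its global feature vector.
    """
    scored = [(cosine_similarity(query, vec), name) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]
```

This only works when every input is represented by a single vector of the same size, which is why --get_global matters for retrieval.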

4. extract_m3.py

This script extracts representations from sheet music and MIDI data using the pre-trained M3 model.

Key Points:

Processes interleaved ABC notation and MTF formats.
Saves the extracted features as .npy files.
Retains only temporal information (each feature corresponds to a patch such as a bar or MIDI message).
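To anticipate where features will be written, the sketch below maps input files to output paths. It assumes that each output keeps the input's base name with a .npy extension; that naming rule is an assumption for illustration and should be checked against extract_m3.py. The function name is hypothetical.

```python
from pathlib import Path

# Input extensions accepted by extract_m3.py, as listed in this README.
M3_SUFFIXES = {".abc", ".mtf"}


def plan_outputs(input_files, output_dir):
    """Map each .abc/.mtf input to an assumed output path in output_dir."""
    plan = {}
    for name in input_files:
        src = Path(name)
        if src.suffix.lower() in M3_SUFFIXES:
            # Assumed naming: same base name, .npy extension.
            target = Path(output_dir) / src.with_suffix(".npy").name
            plan[name] = target.as_posix()
    return plan
```

Inputs with other extensions are simply left out of the plan, which makes it easy to spot files that would be ignored.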

Usage:

accelerate launch extract_m3.py <input_dir> <output_dir>

<input_dir>: Directory containing input files (.abc or .mtf).
<output_dir>: Destination folder for the extracted features.

5. utils.py

This script contains the shared classes and functions, including model definitions and training utilities, used across the CLaMP 3 codebase.
