Understanding the Emergence of Multimodal Representation Alignment

This is the official codebase for the paper Understanding the Emergence of Multimodal Representation Alignment.

Installation

After cloning the repository, initialize and update the submodule.

git submodule init; git submodule update

The repo has been tested with Python 3.10.15 and PyTorch 2.5.0. A new environment can be created via:

conda create -n align python=3.10.15
conda activate align
conda install pytorch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
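As a quick sanity check after installation, the sketch below compares the active environment against the versions listed above. It is not part of the repo. The helper names `installed_versions` and `report` are hypothetical, and it assumes the PyTorch package is registered under the distribution name `torch`, which may not hold for every install method.

```python
import platform
from importlib import metadata

# Versions this README says the repo was tested with.
TESTED = {"python": "3.10.15", "torch": "2.5.0"}


def installed_versions():
    """Return the running Python version and the installed torch version (or None)."""
    try:
        # Assumption: the package metadata is registered under the name "torch".
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        torch_version = None
    return {"python": platform.python_version(), "torch": torch_version}


def report(found, tested=TESTED):
    """Build one human-readable line per component, flagging mismatches."""
    lines = []
    for name, want in tested.items():
        have = found.get(name)
        # Local build tags such as "2.5.0+cu121" still count as a match.
        ok = have is not None and have.split("+")[0] == want
        status = "ok" if ok else "differs"
        lines.append(f"{name}: found {have}, tested with {want} [{status}]")
    return lines


if __name__ == "__main__":
    print("\n".join(report(installed_versions())))
```

A "differs" line does not necessarily mean the code will fail; it only indicates that the environment is not the one the repo was tested with.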

We additionally provide experiment hyperparameters in the hyp folder. To use them, move hyp to experiments/hyp.

Synthetic Dataset

Download our synthetic dataset here and move the data folder under synthetic/data. You can also generate synthetic datasets by running the following command:

python synthetic/generate_prob_data.py --template-path synthetic/configs/rus_prob_template.yaml

To train the unimodal models and compute alignment using GPUs 0-3, run the following from the root directory. See scripts/tune_rus_prob_optuna.sh for more details.

bash ./scripts/tune_rus_prob_optuna.sh "0,1,2,3" scripts/align_configs/rus_prob.yaml mi=1.159 50 300 2

The experiments can be analyzed with the following script.

python ./scripts/analyze_rus.py

MultiBench Datasets

Use the following links to download the processed affect datasets from MultiBench (sarcasm, humor, mosi, mosei), then move the downloaded datasets to the datasets folder. For example, the path to MOSEI should be datasets/mosei/mosei_senti_data.pkl. Our AVMNIST dataset can be downloaded here.
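Before launching experiments, it can help to confirm that a downloaded file sits at the expected path and unpickles cleanly. The following is a minimal sketch, not part of the repo: `describe_pickle` is a hypothetical helper, and it makes no assumption about what the MOSEI file contains beyond it being a pickle. Only the MOSEI path comes from this README; paths for the other datasets are not stated here.

```python
import pickle
from pathlib import Path

# Path taken from the MOSEI example in this README.
MOSEI_PATH = Path("datasets/mosei/mosei_senti_data.pkl")


def describe_pickle(path):
    """Describe a pickle's top-level object, or return None if the file is missing."""
    path = Path(path)
    if not path.is_file():
        return None
    with path.open("rb") as f:
        obj = pickle.load(f)  # only unpickle files from a source you trust
    if isinstance(obj, dict):
        return f"dict with keys {sorted(map(str, obj))}"
    return type(obj).__name__


if __name__ == "__main__":
    summary = describe_pickle(MOSEI_PATH)
    print(f"{MOSEI_PATH}: {summary or 'not found'}")
```

Run it from the repository root so that the relative path resolves correctly.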

An example workflow follows. First, run the experiments; each bash command below runs the same experiment with a different seed. See ./scripts/tune_real.sh for more details on the arguments.

bash ./scripts/tune_real.sh 0 scripts/align_configs/sarcasm_norm.yaml 50 0 1 "classification" "classification" 2 &&
bash ./scripts/tune_real.sh 0 scripts/align_configs/sarcasm_norm.yaml 50 0 1 "classification" "classification" 22 && 
bash ./scripts/tune_real.sh 0 scripts/align_configs/sarcasm_norm.yaml 50 0 1 "classification" "classification" 42
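The three commands above differ only in their last argument, so the multi-seed run can also be driven from a short script. This is a sketch under the assumption that the final positional argument is the seed, as the commands suggest; `seed_commands` and `run_all` are hypothetical helpers, not part of the repo. By default it only prints the commands.

```python
import subprocess

# Arguments copied from the README commands; only the final seed changes.
BASE_ARGS = [
    "bash", "./scripts/tune_real.sh", "0",
    "scripts/align_configs/sarcasm_norm.yaml",
    "50", "0", "1", "classification", "classification",
]


def seed_commands(seeds):
    """Return one argument list per seed, with the seed as the last argument."""
    return [BASE_ARGS + [str(seed)] for seed in seeds]


def run_all(seeds, dry_run=True):
    """Run the commands in order, stopping at the first failure (like `&&`)."""
    for cmd in seed_commands(seeds):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if the script exits non-zero.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all([2, 22, 42])  # pass dry_run=False to actually launch the runs
```

Keeping the dry run as the default makes it easy to inspect the exact commands before committing GPU time.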

After running experiments with multiple seeds, compute the alignment/performance correlation.

python ./scripts/evaluate_multiseed.py --exp-config scripts/align_configs/sarcasm_norm.yaml --modalities 0 1

Vision-Language

See platonic-rep-unique for instructions on setting up and running vision-language experiments.

Additional Experiments

Experiments for other datasets can be run as follows.

bash ./scripts/tune_real.sh 0 scripts/align_configs/mosei_norm.yaml 50 0 1 "regression" "posneg-classification" 2

bash ./scripts/tune_real.sh 0 scripts/align_configs/mosi_norm.yaml 50 0 1 "regression" "posneg-classification" 2

bash ./scripts/tune_real.sh 0 scripts/align_configs/humor_norm.yaml 50 0 1 "classification" "classification" 2

bash ./scripts/tune_real.sh 0 scripts/align_configs/avmnist_mfcc_seq.yaml 100 0 1 "classification" "classification" 2
