Cross-Modal Generalization: Learning in Low Resource Modalities via Meta-Alignment

PyTorch implementation of cross-modal generalization (CROMA) across language, visual, and audio modalities.

Correspondence to:

Paper

Cross-Modal Generalization: Learning in Low Resource Modalities via Meta-Alignment
Paul Pu Liang, Peter Wu, Liu Ziyin, Louis-Philippe Morency, and Ruslan Salakhutdinov
ACM Multimedia 2021, NeurIPS 2020 Workshop on Meta Learning

If you find this repository useful, please cite our paper:

@inproceedings{liang2021cross,
  title={Cross-Modal Generalization: Learning in Low Resource Modalities via Meta-Alignment},
  author={Liang, Paul Pu and Wu, Peter and Ziyin, Liu and Morency, Louis-Philippe and Salakhutdinov, Ruslan},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  pages={2680--2689},
  year={2021}
}

Installation

First check that the requirements are satisfied:
Python 3.6
torch 1.2.0
numpy 1.18.1
sklearn 0.20.0
matplotlib 3.1.2
gensim 3.8.0
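
A quick way to confirm the environment, assuming the packages above are importable under their usual names (this check is not part of the repository):

import sys
import torch
import numpy
import sklearn
import matplotlib
import gensim

# Print the interpreter and package versions to compare against the list above.
print("python", sys.version.split()[0])
print("torch", torch.__version__)
print("numpy", numpy.__version__)
print("sklearn", sklearn.__version__)
print("matplotlib", matplotlib.__version__)
print("gensim", gensim.__version__)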

The next step is to clone the repository:

git clone https://github.com/peter-yh-wu/xmodal.git

Background

The natural world is abundant with concepts expressed via visual, acoustic, tactile, and linguistic modalities. Much of the existing progress in multimodal learning, however, focuses primarily on problems where the same set of modalities is present at train and test time, which makes learning in low-resource modalities particularly difficult. In this work, we propose algorithms for cross-modal generalization: a learning paradigm to train a model that can (1) quickly perform new tasks in a target modality (i.e., meta-learning) and (2) do so while being trained on a different source modality. We study a key research question: how can we ensure generalization across modalities despite using separate encoders for different source and target modalities? Our solution is based on meta-alignment, a novel method to align representation spaces using strongly and weakly paired cross-modal data while ensuring quick generalization to new tasks across different modalities. Our results demonstrate strong performance even when the new target modality has only a few (1-10) labeled samples and in the presence of noisy labels, a scenario particularly prevalent in low-resource modalities.
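
As a rough illustration of the alignment idea only (not the repository's actual implementation), the sketch below aligns two hypothetical modality encoders with a triplet margin loss, consistent with the -l tri --margin 0.5 flags in the commands further down; the encoder architectures, feature dimensions, and data are placeholders.

import torch
import torch.nn as nn

# Hypothetical encoders mapping each modality into a shared embedding space.
text_encoder = nn.Sequential(nn.Linear(300, 128), nn.ReLU(), nn.Linear(128, 64))
image_encoder = nn.Sequential(nn.Linear(2048, 128), nn.ReLU(), nn.Linear(128, 64))

# A batch of paired (text, image) features standing in for real data.
text_feats = torch.randn(8, 300)
image_feats = torch.randn(8, 2048)

anchors = text_encoder(text_feats)                       # source-modality embeddings
positives = image_encoder(image_feats)                   # paired target-modality embeddings
negatives = torch.roll(positives, shifts=1, dims=0)      # shift by one so each anchor gets a mismatched negative

# Pull paired embeddings together and push mismatched ones apart.
align_loss = nn.TripletMarginLoss(margin=0.5)(anchors, positives, negatives)
align_loss.backward()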

Reproducing Results

  • cd src

  • ./download_recipe.sh

  • python3 preprocess_recipe.py

  • python3 mk_eval_datasets.py -n 3 --seed 0 --train-shots 5 --eval-tasks 8

  • python3 main.py --seed 0 --iseed 0 --cuda 0 --classes 3 --train-shots 5 --batch 8 --meta-lr 1e-4 --lr-clf 1e-4 --eval-tasks 8 --iterations 9 --test-iterations 9 -l tri --margin 0.5

  • classes and iseed in main.py should match n and seed in mk_eval_datasets.py, respectively; a small wrapper that keeps these flags in sync is sketched after this list.

  • To run ablation studies, add the respective tag to main.py, e.g. --reptile1 for Oracle.
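
A minimal wrapper, assuming it is run from src/ after the download and preprocessing steps, that keeps the paired flags in sync across the two scripts; the seed list and hyperparameters below are only examples taken from the commands above.

import subprocess

N_CLASSES = 3     # passed as -n to mk_eval_datasets.py and --classes to main.py
TRAIN_SHOTS = 5
EVAL_TASKS = 8

for seed in [0, 1, 2]:
    # Build the evaluation datasets for this seed.
    subprocess.run(
        ["python3", "mk_eval_datasets.py", "-n", str(N_CLASSES), "--seed", str(seed),
         "--train-shots", str(TRAIN_SHOTS), "--eval-tasks", str(EVAL_TASKS)],
        check=True)
    # Train and evaluate with --classes/--iseed matching -n/--seed above.
    subprocess.run(
        ["python3", "main.py", "--seed", str(seed), "--iseed", str(seed), "--cuda", "0",
         "--classes", str(N_CLASSES), "--train-shots", str(TRAIN_SHOTS), "--batch", "8",
         "--meta-lr", "1e-4", "--lr-clf", "1e-4", "--eval-tasks", str(EVAL_TASKS),
         "--iterations", "9", "--test-iterations", "9", "-l", "tri", "--margin", "0.5"],
        check=True)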
