🔥🔥🔥 Our arXiv version is now available. Please check it out! 🔥🔥🔥
Figure 1: Overview of our MRT approach. The representation editors ψ ∈ {ψV, ψc, ψP, ψS} are the only tunable parameters, while the entire backbone model remains frozen. During fine-tuning, we jointly edit the visual representations in the vision encoder, the representations at the cross-modality layer, and the prefix and suffix of the text-oriented portion of the multimodal representations in the LLM. These editors efficiently and effectively optimize the model's representations during multimodal instruction tuning.
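To make the idea concrete, below is a minimal, hypothetical sketch of what a representation editor could look like and how editors ψV, ψc, ψP, ψS might be wired into a frozen backbone. The class and function names (`RepresentationEditor`, `apply_editors`) and the low-rank residual form are illustrative assumptions, not the repo's actual implementation.

```python
import torch
import torch.nn as nn


class RepresentationEditor(nn.Module):
    """Hypothetical low-rank editor: h -> h + up(down(h)) + bias.

    Only the editor parameters are trainable; the backbone stays frozen.
    """

    def __init__(self, hidden_dim: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(hidden_dim, rank, bias=False)
        self.up = nn.Linear(rank, hidden_dim, bias=False)
        self.bias = nn.Parameter(torch.zeros(hidden_dim))
        nn.init.zeros_(self.up.weight)  # start as an identity edit

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return hidden + self.up(self.down(hidden)) + self.bias


def apply_editors(vision_feats, text_feats, editors, prefix_len):
    """Sketch of editing visual, cross-modality, and prefix/suffix text tokens."""
    v = editors["psi_V"](vision_feats)               # edit vision-encoder features
    v = editors["psi_c"](v)                          # edit at the cross-modality layer
    prefix = editors["psi_P"](text_feats[:, :prefix_len])
    suffix = editors["psi_S"](text_feats[:, prefix_len:])
    return torch.cat([v, prefix, suffix], dim=1)
```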
git clone https://github.com/comeandcode/MRT.git
cd MRT
conda create -n mrt python=3.9 -y
conda activate mrt
pip install packaging
pip install -e . --no-cache-dir
pip install numpy==1.26.4
pip install ninja
pip install transformers==4.31.0
pip install torch==2.0.1
pip install flash-attn==2.6.3 --no-build-isolation --no-cache-dir
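After installation, a quick sanity check (an illustrative snippet, not part of the repo) can confirm that the pinned versions resolved correctly and that flash-attn imports:

```python
# Environment sanity check for the pinned versions above (illustrative).
import torch, transformers, flash_attn

print("torch:", torch.__version__)                 # expect 2.0.1
print("transformers:", transformers.__version__)   # expect 4.31.0
print("flash-attn:", flash_attn.__version__)       # expect 2.6.3
print("CUDA available:", torch.cuda.is_available())
```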
The weights for stage-1 LLaVA are liuhaotian/llava-pretrain-vicuna-7b-v1.3 and lmsys/vicuna-7b-v1.3; please download them before running MRT.
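For example, both checkpoints can be fetched with huggingface_hub (a hedged sketch; the local directories below are placeholders to adapt to your setup):

```python
# Illustrative download of the stage-1 LLaVA weights and the Vicuna base model.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="liuhaotian/llava-pretrain-vicuna-7b-v1.3",
                  local_dir="./checkpoints/llava-pretrain-vicuna-7b-v1.3")
snapshot_download(repo_id="lmsys/vicuna-7b-v1.3",
                  local_dir="./checkpoints/vicuna-7b-v1.3")
```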
# Train
sh train.sh
# Eval
sh eval.sh
Copyright 2025 Re-Imagining Multimodal Instruction Tuning: A Representation View Contributors
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
