[CVPR 2025 Official Code] Parameter Efficient Mamba Tuning via Projector-targeted Diagonal-centric Linear Transformation

Authors: Seokil Ham, Hee-Seon Kim, Sangmin Woo, Changick Kim

Paper: https://arxiv.org/abs/2411.15224

Installation

Requirements:

  • Linux
  • NVIDIA GPU
  • PyTorch 1.12+
  • CUDA 11.6+
conda create -n ProDiaL python=3.9
conda activate ProDiaL
pip install -r requirements.txt

Usage

1. Set Hyperparameters in ProDiaL

CUDA_VISIBLE_DEVICES=0 python train.py \
    --model_path="state-spaces/mamba-130m" \
    --tokenizer_path="EleutherAI/gpt-neox-20b" \
    --instruction_datasets="[hellaswag]" \
    --output_dir="outputs" \
    --random_seed=42 \
    --sequence_max_length=512 \
    --save_steps=1 \
    --batch_size=4 \
    --cache_dir="huggingface" \
    --num_epochs=21 \
    --weight_decay=0.01 \
    --learning_rate=1e-4 \
    --dropout_rate=0.1 \
    --logging_steps=100 \
    --config_path="configs/130m" \
    --r_b1=768 \
    --r_b2=1536 \
    --off_diagonal_rank=16
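
For intuition, the sketch below (not the authors' implementation) illustrates the diagonal-centric linear transformation named in the title: a frozen projector weight is reparameterized by a learned diagonal plus a low-rank off-diagonal term, whose rank would correspond to the --off_diagonal_rank flag above. The module name, shapes, and initializations are assumptions for illustration only.

# Minimal sketch of a diagonal-centric linear transformation, assuming the
# form W' = (diag(d) + A @ B) @ W; NOT the repository's actual implementation.
import torch
import torch.nn as nn

class DiagonalCentricTransform(nn.Module):
    def __init__(self, dim: int, off_diagonal_rank: int = 16):
        super().__init__()
        # Diagonal part, initialized to identity so training starts at W' = W.
        self.diag = nn.Parameter(torch.ones(dim))
        # Low-rank off-diagonal part; B starts at zero so A @ B = 0 initially.
        self.A = nn.Parameter(torch.randn(dim, off_diagonal_rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(off_diagonal_rank, dim))

    def forward(self, weight: torch.Tensor) -> torch.Tensor:
        # weight: frozen projector matrix of shape (dim, in_features).
        return self.diag.unsqueeze(1) * weight + self.A @ (self.B @ weight)

# Hypothetical usage on a frozen projector (mamba-130m: d_model=768, d_inner=1536).
frozen_w = torch.randn(1536, 768)   # stays frozen
transform = DiagonalCentricTransform(dim=1536, off_diagonal_rank=16)
tuned_w = transform(frozen_w)       # only the transform's parameters are trained

Only the diagonal and the two rank-r factors would be trainable under this parameterization, which is what makes the tuning parameter-efficient.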

2. Run the bash file

bash train_hellaswag.sh

Evaluations

conda create -n eval_ProDiaL python=3.9
conda activate eval_ProDiaL
pip install lm-eval==0.4.2
pip install causal-conv1d==1.5.0.post8
pip install mamba-ssm==1.2.0.post1

Run evaluation with the following command (more documentation is available in the lm-evaluation-harness repo):

python evals/lm_harness_eval.py --model mamba_ssm --tasks hellaswag --device cuda --batch_size 256 --seed 42 --model_args pretrained=state-spaces/mamba-130m

To evaluate multiple checkpoints at once, run the provided script (a sketch of what it automates follows below):

bash eval_hellaswag.sh
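
As an illustration, a loop like the following could automate the per-checkpoint evaluation; the checkpoint directory layout (outputs/checkpoint-*) is an assumption for illustration, not necessarily the repo's actual structure.

# Hypothetical multi-checkpoint evaluation loop; adjust paths to your setup.
import subprocess
from pathlib import Path

for ckpt in sorted(Path("outputs").glob("checkpoint-*")):
    subprocess.run([
        "python", "evals/lm_harness_eval.py",
        "--model", "mamba_ssm",
        "--tasks", "hellaswag",
        "--device", "cuda",
        "--batch_size", "256",
        "--seed", "42",
        "--model_args", f"pretrained={ckpt}",
    ], check=True)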

Citation

@article{ham2024parameter,
  title={Parameter Efficient Mamba Tuning via Projector-targeted Diagonal-centric Linear Transformation},
  author={Ham, Seokil and Kim, Hee-Seon and Woo, Sangmin and Kim, Changick},
  journal={arXiv preprint arXiv:2411.15224},
  year={2024}
}

Reference

This codebase was partially adapted from other open-source repositories; we thank the authors for open-sourcing their work.
