Authors: Seokil Ham, Hee-Seon Kim, Sangmin Woo, Changick Kim
Paper: https://arxiv.org/abs/2411.15224
Requirements:
- Linux
- NVIDIA GPU
- PyTorch 1.12+
- CUDA 11.6+
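
Once PyTorch is set up (either as a prerequisite or via the installation below), a quick way to confirm the requirements above are met is the following generic PyTorch check (not part of this repository):

```python
import torch

# Verify the PyTorch / CUDA prerequisites listed above.
print("PyTorch:", torch.__version__)            # should be 1.12 or newer
print("CUDA available:", torch.cuda.is_available())
print("CUDA build:", torch.version.cuda)        # should be 11.6 or newer
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```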
Installation:
conda create -n ProDiaL python=3.9
pip install -r requirements.txt

Training:
CUDA_VISIBLE_DEVICES=0 python train.py \
--model_path="state-spaces/mamba-130m" \
--tokenizer_path="EleutherAI/gpt-neox-20b" \
--instruction_datasets="[hellaswag]" \
--output_dir="outputs" \
--random_seed=42 \
--sequence_max_length=512 \
--save_steps=1 \
--batch_size=4 \
--cache_dir="huggingface" \
--num_epochs=21 \
--weight_decay=0.01 \
--learning_rate=1e-4 \
--dropout_rate=0.1 \
--logging_steps=100 \
--config_path="configs/130m" \
--r_b1=768 \
--r_b2=1536 \
--off_diagonal_rank=16

Alternatively, run the provided training script:
bash train_hellaswag.sh
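
For intuition on the --r_b1, --r_b2, and --off_diagonal_rank flags above: ProDiaL keeps the pretrained Mamba projectors frozen and learns a diagonal-centric linear transformation over them. The sketch below only illustrates that general idea (a learnable diagonal plus a low-rank off-diagonal term wrapped around a frozen linear layer); the class name, initialization, and placement are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DiagonalCentricAdapter(nn.Module):
    """Illustrative sketch only: wraps a frozen projector (nn.Linear) with a
    learnable transformation T = diag(d) + U @ V^T applied to its output,
    i.e. an effective weight T @ W. The exact ProDiaL formulation and where
    it is attached are defined in the paper and the repo's configs."""

    def __init__(self, projector: nn.Linear, off_diagonal_rank: int = 16):
        super().__init__()
        self.projector = projector
        for p in self.projector.parameters():    # pretrained projector stays frozen
            p.requires_grad = False
        d_out = projector.out_features
        # Diagonal part, initialized to the identity transformation.
        self.diag = nn.Parameter(torch.ones(d_out))
        # Low-rank off-diagonal part; U starts at zero so training begins
        # from the pretrained behaviour.
        self.u = nn.Parameter(torch.zeros(d_out, off_diagonal_rank))
        self.v = nn.Parameter(torch.randn(d_out, off_diagonal_rank) * 0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.projector(x)                    # frozen pretrained projection
        # Apply T = diag(d) + U V^T to the projector output.
        return y * self.diag + (y @ self.v) @ self.u.t()
```

Only the diagonal and the two rank-16 factors are trained, which is where the parameter efficiency comes from. With mamba-130m's hidden sizes, --r_b1=768 and --r_b2=1536 plausibly correspond to the two projector dimensions (d_model and the expanded inner dimension); consult the paper and configs/130m for their exact meaning.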
Evaluation:
conda create -n eval_ProDiaL python=3.9
pip install lm-eval==0.4.2
pip install causal-conv1d==1.5.0.post8
pip install mamba-ssm==1.2.0.post1

Run evaluation with (more documentation at the lm-evaluation-harness repo):
python evals/lm_harness_eval.py --model mamba_ssm --tasks hellaswag --device cuda --batch_size 256 --seed 42 --model_args pretrained=state-spaces/mamba-130m
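
The same evaluation can also be driven from Python through lm-eval's programmatic interface. This is a generic lm-eval 0.4.x sketch that mirrors the command above for the base pretrained model; it is not a substitute for evals/lm_harness_eval.py, which handles the fine-tuned ProDiaL checkpoints.

```python
import lm_eval

# Generic lm-eval 0.4.x usage; mirrors the CLI call above.
# "mamba_ssm" is the harness's built-in Mamba wrapper (requires mamba-ssm).
results = lm_eval.simple_evaluate(
    model="mamba_ssm",
    model_args="pretrained=state-spaces/mamba-130m",
    tasks=["hellaswag"],
    batch_size=256,
    device="cuda",
)
print(results["results"]["hellaswag"])  # acc / acc_norm for HellaSwag
```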
For evaluating multiple checkpoints at once, run:
bash eval_hellaswag.sh

Citation:
@article{ham2024parameter,
title={Parameter Efficient Mamba Tuning via Projector-targeted Diagonal-centric Linear Transformation},
author={Ham, Seokil and Kim, Hee-Seon and Woo, Sangmin and Kim, Changick},
journal={arXiv preprint arXiv:2411.15224},
year={2024}
}
This codebase was partially adapted from the following repositories:
- state-spaces/mamba (https://github.com/state-spaces/mamba), Apache 2.0 License
- Llama-Instruction-Tuning (https://github.com/sangHa0411/Llama-Instruction-Tuning), MIT License
We thank the authors for open-sourcing their work.