Published at the Thirteenth International Conference on Learning Representations (ICLR 2025)
HiRA is a parameter-efficient fine-tuning (PEFT) method for large language models (LLMs) that leverages the Hadamard product to achieve high-rank adaptation while keeping the number of trainable parameters low. This enhances the expressive power of model updates and improves performance on tasks such as commonsense reasoning, dialogue generation, and mathematical reasoning.
```
.
├── README.md
├── customized_trainer
│   └── customized_trainer.py        # Customized trainer implementation
├── data_file                        # Raw datasets for various tasks
│   └── ...                          # (Subfolders include convai2, llm_adapt, etc.)
├── dataset
│   ├── dataset_hg.py                # Data loader for heterogeneous datasets
│   ├── dataset_hg_combined.py       # Combined dataset handling
│   └── format_inputs.py             # Functions for formatting model inputs
├── env.yml                          # Environment configuration for Conda
├── eval_commonsense.py              # Evaluation script for commonsense reasoning
├── hira                             # HiRA core modules and tuners
│   ├── peft_model.py                # Implementation of HiRA and related PEFT models
│   ├── mapping.py                   # Mapping utilities for adapting models
│   ├── import_utils.py              # Helper functions for model import and setup
│   └── tuners                       # Various PEFT tuners (e.g., lora, prefix, p_tuning)
│       └── ...                      # Tuner implementations (lora.py, prefix_tuning.py, etc.)
├── models
│   └── get_models.py                # Functions to load pre-trained models and HiRA variants
├── paper
│   └── HiRA Parameter-Efficient Hadamard High-Rank Adaptation for Large Language Models.pdf
└── train_hira.py                    # Main entry script for training and evaluation
```
HiRA reformulates the update step for LLMs as follows:
∆W = W₀ ⊙ (A · B)
where:
- W₀ is the frozen pre-trained weight matrix,
- A and B are low-rank matrices, and
- ⊙ denotes the elementwise (Hadamard) product.
This formulation enables HiRA to achieve a high effective rank in the update while keeping computational costs and the number of trainable parameters similar to methods like LoRA. Extensive experiments demonstrate that HiRA outperforms traditional PEFT techniques on multiple benchmarks.
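To make the formulation concrete, below is a minimal PyTorch sketch of a HiRA-adapted linear layer. It is illustrative only (the class name and initialization choices are assumptions); the repository's actual implementation lives under `hira/` and its tuners.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HiRALinearSketch(nn.Module):
    """Minimal, illustrative sketch of a HiRA-adapted linear layer."""

    def __init__(self, base: nn.Linear, r: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # W0 (and bias) stay frozen
            p.requires_grad_(False)
        out_features, in_features = base.weight.shape
        # Only the low-rank factors A and B are trainable.
        self.A = nn.Parameter(torch.randn(out_features, r) * 0.01)
        self.B = nn.Parameter(torch.zeros(r, in_features))  # zero init => ΔW = 0 at start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ΔW = W0 ⊙ (A · B): the elementwise product with W0 lets the update
        # reach a rank well above r while training only A and B.
        delta_w = self.base.weight * (self.A @ self.B)
        return self.base(x) + F.linear(x, delta_w)

# Example: wrap a 4096x4096 projection with rank r = 32; only A and B
# (about 2 * 4096 * 32 parameters) receive gradients.
layer = HiRALinearSketch(nn.Linear(4096, 4096), r=32)
```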
1. Clone the Repository:

   ```bash
   git clone https://github.com/hqsiswiliam/hira.git
   cd hira
   ```

2. Set Up the Environment:

   We recommend using Conda with the provided `env.yml` file:

   ```bash
   conda env create -f env.yml
   conda activate hira
   ```

3. Install Dependencies:

   Ensure that required packages such as `torch`, `transformers`, `numpy`, `tqdm`, `python-dotenv`, and `jsonlines` are installed. The Conda environment should handle these dependencies.
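After activating the environment, an optional quick check (just an illustrative snippet) can confirm that the core dependencies resolve:

```python
# Optional sanity check for the core dependencies (illustrative only).
import torch
import transformers

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers", transformers.__version__)
```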
For training on a single GPU, use the following command:
```bash
python train_hira.py \
  --peft_type=hira \
  --model=meta-llama/Meta-Llama-3-8B \
  --r_ab=32 \
  --enable_grad_ckpt --epoch=3 --lr=1e-3 --batch=16 \
  --dataset=common_170k --seed=36 \
  --warmup=100 --eval_strategy=steps --eval_steps=80 \
  --output_folder=results_hira --target_modules=q_proj,k_proj,v_proj,up_proj,down_proj
```

This command fine-tunes the Meta-Llama-3-8B model on the `common_170k` dataset using HiRA. Key options include:
- `--peft_type=hira`: Selects the HiRA adaptation method.
- `--model=meta-llama/Meta-Llama-3-8B`: Specifies the pre-trained model.
- `--r_ab=32`: Sets the rank of the low-rank matrices A and B in the Hadamard update.
- `--enable_grad_ckpt`: Enables gradient checkpointing.
- `--epoch`, `--lr`, `--batch`, `--seed`, `--warmup`, `--eval_strategy`, `--eval_steps`: Configure training hyperparameters.
- `--output_folder`: Directory to store training outputs.
- `--target_modules`: Specifies which modules to adapt (illustrated in the sketch below).
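For intuition, `--target_modules` takes a comma-separated list that is matched against module names inside the transformer. The snippet below is a hypothetical illustration of this kind of name matching, not the repository's actual selection logic:

```python
# Hypothetical illustration of matching a --target_modules value against
# module names; the real selection logic lives in the hira/ tuners.
target_modules = "q_proj,k_proj,v_proj,up_proj,down_proj".split(",")

def should_adapt(module_name: str) -> bool:
    """Return True if the module name ends with one of the targeted suffixes."""
    return any(module_name.endswith(t) for t in target_modules)

print(should_adapt("model.layers.0.self_attn.q_proj"))  # True
print(should_adapt("model.layers.0.self_attn.o_proj"))  # False
```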
For multi-node or multi-GPU distributed training using DeepSpeed, you can use a script similar to the following:
```bash
deepspeed --master_port=29500 --num_gpus=4 --num_nodes=2 --hostfile=hostfile train_hira.py \
  --peft_type=hira \
  --model=meta-llama/Meta-Llama-3-8B \
  --r_ab=32 \
  --enable_grad_ckpt --epoch=3 --lr=1e-3 --batch=16 \
  --dataset=common_170k --ds_config=ds_configs/ds_config_fp16_z3.json --seed=36 \
  --warmup=100 --eval_strategy=steps --eval_steps=80 \
  --output_folder=results_hira --target_modules=q_proj,k_proj,v_proj,up_proj,down_proj
```

For single-node, multi-GPU training using DeepSpeed, you can use a command similar to the following:
```bash
deepspeed --master_port=29500 --num_gpus=4 --num_nodes=1 train_hira.py \
  --peft_type=hira \
  --model=meta-llama/Meta-Llama-3-8B \
  --r_ab=32 \
  --enable_grad_ckpt --epoch=3 --lr=1e-3 --batch=16 \
  --dataset=common_170k --ds_config=ds_configs/ds_config_fp16_z3.json --seed=36 \
  --warmup=100 --eval_strategy=steps --eval_steps=80 \
  --output_folder=results_hira --target_modules=q_proj,k_proj,v_proj,up_proj,down_proj
```

This example uses DeepSpeed to train on a single node with 4 GPUs.
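The multi-node command above passes `--hostfile=hostfile`; DeepSpeed expects an MPI-style hostfile listing each node and its GPU slots. A minimal placeholder example (the hostnames and slot counts are assumptions, so adjust them to your cluster):

```
worker-1 slots=4
worker-2 slots=4
```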
You can evaluate your trained models using the provided run_eval_decoding.sh script. This script runs evaluation on multiple datasets with a beam size of 4. To use the script, supply the checkpoint path as its first argument:
```bash
bash run_eval_decoding.sh /path/to/your/checkpoint
```

The script executes evaluation commands for datasets such as boolq, piqa, siqa, hellas, winog, arce, arcc, and obqa.
Content of run_eval_decoding.sh:
```bash
CUDA_VISIBLE_DEVICES=0 python train_hira.py --dataset=boolq --batch=16 --output_folder=temp --ckpt=$1 --beam_size=4
CUDA_VISIBLE_DEVICES=0 python train_hira.py --dataset=piqa --batch=16 --output_folder=temp --ckpt=$1 --beam_size=4
CUDA_VISIBLE_DEVICES=0 python train_hira.py --dataset=siqa --batch=16 --output_folder=temp --ckpt=$1 --beam_size=4
CUDA_VISIBLE_DEVICES=0 python train_hira.py --dataset=hellas --batch=16 --output_folder=temp --ckpt=$1 --beam_size=4
CUDA_VISIBLE_DEVICES=0 python train_hira.py --dataset=winog --batch=16 --output_folder=temp --ckpt=$1 --beam_size=4
CUDA_VISIBLE_DEVICES=0 python train_hira.py --dataset=arce --batch=16 --output_folder=temp --ckpt=$1 --beam_size=4
CUDA_VISIBLE_DEVICES=0 python train_hira.py --dataset=arcc --batch=16 --output_folder=temp --ckpt=$1 --beam_size=4
CUDA_VISIBLE_DEVICES=0 python train_hira.py --dataset=obqa --batch=16 --output_folder=temp --ckpt=$1 --beam_size=4
```

Each command evaluates the model on a specific dataset using a beam size of 4.
- Data Handling:
  - The `dataset` folder includes modules for loading and preprocessing data for various tasks.
  - Data files are organized under `data_file` by task (e.g., `convai2`, `boolq`, etc.).
- Model & Tuners:
  - The core implementations of HiRA and other PEFT methods are located in the `hira` directory.
  - The file `models/get_models.py` contains functions to load pre-trained models and integrate HiRA-based adaptations.
- Training & Evaluation:
  - The main script, `train_hira.py`, orchestrates training, evaluation, and checkpointing.
  - Customized training routines and gradient checkpointing are implemented in `customized_trainer/customized_trainer.py`.
- Download the complete commonsense datasets from here and download the commonsense 170k fine-tuning dataset from here.
- Then put the downloaded files into the corresponding folders (e.g., `data_file/llm_adapt`).
- We also provide a Google Drive download link for ease of data downloading.
- Please read and accept the DATA_LICENSE before you download.
For a detailed explanation of the methodology, theoretical analysis, and experimental results, please refer to our paper:
HiRA Parameter-Efficient Hadamard High-Rank Adaptation for Large Language Models.pdf
Key Contributions:
- High-Rank Updates: Achieved through the Hadamard product, enabling more expressive updates while keeping the number of trainable parameters on par with low-rank methods such as LoRA.
- Efficient Adaptation: HiRA seamlessly merges with pre-trained weights, introducing no extra inference overhead.
- Extensive Evaluation: Superior performance is demonstrated on benchmarks spanning commonsense reasoning, dialogue generation, and mathematical reasoning.
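To make the no-extra-inference-overhead point concrete, the sketch below folds a HiRA update into the frozen weight so inference reduces to a single matrix multiply. It is an illustrative example, not the repository's merge routine:

```python
import torch

# Fold a HiRA update into the frozen weight (illustrative sketch).
torch.manual_seed(0)
out_f, in_f, r = 8, 8, 2
W0 = torch.randn(out_f, in_f)                       # frozen pre-trained weight
A, B = torch.randn(out_f, r), torch.randn(r, in_f)  # trained low-rank factors

delta_W = W0 * (A @ B)          # ΔW = W0 ⊙ (A · B)
W_merged = W0 + delta_W         # equivalently W0 ⊙ (1 + A · B)

x = torch.randn(3, in_f)
# After merging, a single matmul reproduces base + adapted outputs exactly.
assert torch.allclose(x @ W_merged.T, x @ W0.T + x @ delta_W.T, atol=1e-5)
```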
If you use this work in your research, please cite:
@inproceedings{
huang2025hira,
title={Hi{RA}: Parameter-Efficient Hadamard High-Rank Adaptation for Large Language Models},
author={Qiushi Huang and Tom Ko and Zhan Zhuang and Lilian Tang and Yu Zhang},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=TwJrTz9cRS}
}