046211-LoRA-Compression

@oamsalemd, @idob8 - Winter 2024

Project documentation

Topics

Introduction
- Compression
- LoRA
Project goal
Method
Experiments and results
Conclusions
Future work
How to run
Ethics Statement

Introduction

Compression

Compressing pre-trained neural networks reduces memory usage, speeds up inference, and enables deployment on resource-constrained devices. It improves model efficiency, real-time performance, and energy consumption, making deep learning models more practical for diverse computing environments. We tested two model compression methods that can potentially reduce memory and compute cost, and measured their effect on the pre-trained model.

Data type quantization - in this method we use a more compact data type to store the model weights. This technique can potentially save memory (capacity and bandwidth).

Sparsity - in this method we use "sparse" weight matrices: in every block, only 1 cell is allowed to have a non-zero value. This technique can potentially save memory (capacity and bandwidth) and also reduce the number of effective multiplication instructions.

On the other hand, both methods can reduce the accuracy of the model and might require retraining it.
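The two compression methods described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the project's implementation: the 1-bit quantizer keeps the sign of each weight plus one shared scale (the mean absolute value), and the sparsifier keeps the largest-magnitude entry of every 4X4 block. Both choices, and the function names `quantize_int1` and `sparsify_blocks`, are assumptions made for this example.

```python
import numpy as np

def quantize_int1(w: np.ndarray) -> np.ndarray:
    """Keep one bit per weight (its sign) plus a single shared scale.

    The scale (mean absolute value) is an assumed choice for this sketch.
    """
    scale = np.abs(w).mean()
    return np.where(w >= 0, scale, -scale)

def sparsify_blocks(w: np.ndarray, block: int = 4) -> np.ndarray:
    """Keep only the largest-magnitude entry of every block x block tile."""
    rows, cols = w.shape
    assert rows % block == 0 and cols % block == 0
    # Reshape so that every tile becomes one row of block*block values.
    tiles = (w.reshape(rows // block, block, cols // block, block)
               .transpose(0, 2, 1, 3)
               .reshape(-1, block * block))
    keep = np.abs(tiles).argmax(axis=1)
    mask = np.zeros_like(tiles, dtype=bool)
    mask[np.arange(tiles.shape[0]), keep] = True
    sparse_tiles = np.where(mask, tiles, 0.0)
    # Undo the tiling to restore the original layout.
    return (sparse_tiles.reshape(rows // block, cols // block, block, block)
                        .transpose(0, 2, 1, 3)
                        .reshape(rows, cols))

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 12))          # toy weight matrix
w_q = quantize_int1(w)
w_s = sparsify_blocks(w)

print(np.unique(np.abs(w_q)).size)            # distinct magnitudes after quantization
print(np.count_nonzero(w_s), w.size // 16)    # non-zeros kept vs. number of 4x4 blocks
```

In both cases the compressed matrix differs from the original, which is the error the LoRA matrices are later trained to compensate for.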

LoRA

LoRA (Low Rank Adaptation) is a technique for efficiently fine-tuning pre-trained models. The basic idea is to train only a low-rank matrix that is added to the pre-trained weight matrix.^[1] Previous work has shown the benefits of LoRA in transfer learning for pre-trained LLMs.^[1]

Given a 'Linear' layer W of in_dimXout_dim, we choose a low rank r s.t. r < min(in_dim, out_dim).
We freeze the W matrix, so it remains intact while re-training the model.
The matrices A (of in_dimXr) and B (of rXout_dim) are initialized.
We set the new activation to be h=x@(W+a*A@B) for the input x (of 1Xin_dim) and a scaling factor a.
During training, only the A and B matrices are learned.
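The steps above can be sketched in NumPy as follows. The dimensions, the rank, the factor `a`, and the helper name `lora_forward` are illustrative assumptions; the project itself is built on a PyTorch model, so this shows only the arithmetic.

```python
import numpy as np

rng = np.random.default_rng(0)
in_dim, out_dim, r, a = 512, 1000, 8, 0.5      # illustrative values

W = rng.normal(size=(in_dim, out_dim))         # frozen pre-trained weights
A = rng.normal(scale=0.01, size=(in_dim, r))   # trainable
B = np.zeros((r, out_dim))                     # trainable, zero at the start

def lora_forward(x, W, A, B, a):
    # Same result as x @ (W + a * A @ B), but without materializing
    # the full in_dim x out_dim update matrix.
    return x @ W + a * (x @ A) @ B

x = rng.normal(size=(1, in_dim))
h = lora_forward(x, W, A, B, a)

print(h.shape)
print(np.allclose(h, x @ W))       # with B = 0 the layer starts at the frozen output
print(A.size + B.size, W.size)     # trainable vs. frozen parameter counts
```

Because one of the two matrices starts at zero, the adapted layer initially behaves exactly like the frozen one, and only r*(in_dim+out_dim) values need gradients.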

Project goal

Our objective is to combine model compression with LoRA in pre-trained models, in order to reduce model size with minimal damage to model accuracy and minimal retraining. We test the method's efficacy on image classification tasks.

Method

We used ‘resnet18’ pre-trained on ImageNet1K^[2]
- For training we used only a small subset of the original dataset (50,000 images out of 1,281,167)
The compression methods we tested were:
- Data type quantization to int1.
- Sparsity with block size of 4X4.
The compression was implemented only on the FC layer of the model.
Given:

compression ratio was calculated as follows:

Memory and instructions compression ratio for Sparse 4X4 method:

Memory compression ratio for INT1 quantization method:
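As a rough check, the compression ratios reported in the results can be reproduced with the sketch below. It assumes the resnet18 FC layer shape of 512X1000, 32-bit storage for the uncompressed weights and for the LoRA matrices, and it ignores the bias and any index storage for the sparse format. The function names are hypothetical, and these formulas are a reconstruction that matches the reported numbers, not necessarily the exact expressions used by the authors.

```python
IN_DIM, OUT_DIM = 512, 1000   # resnet18 FC layer shape
FP_BITS = 32                  # assumed precision of uncompressed and LoRA weights

def lora_params(r):
    """Number of values in A (IN_DIM x r) and B (r x OUT_DIM)."""
    return r * (IN_DIM + OUT_DIM)

def sparse_ratio(r, block=4):
    """Dense weight count over (kept sparse weights + LoRA weights)."""
    dense = IN_DIM * OUT_DIM
    kept = dense // (block * block)      # one non-zero per block
    return dense / (kept + lora_params(r))

def int1_ratio(r):
    """Dense bits over (1 bit per weight + full-precision LoRA weights)."""
    dense_bits = IN_DIM * OUT_DIM * FP_BITS
    compressed_bits = IN_DIM * OUT_DIM * 1 + lora_params(r) * FP_BITS
    return dense_bits / compressed_bits

print(round(sparse_ratio(128), 2))   # 2.27
print(round(int1_ratio(128), 2))     # 2.44
print(round(int1_ratio(2), 2))       # 26.91
```

The sketch also makes the trade-off visible: the LoRA parameters grow linearly with the rank, so large ranks give back most of the memory saved by compression.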

We tested appending a LoRA layer with ranks: [2, 4, 8, 16, 32, 64, 128].
- We tested 2 initialization methods. The first was the initialization suggested in the original LoRA paper: A is initialized as N(0,\sigma^2) and B=0. The second was an SVD decomposition of the difference from the original matrix.
All model parameters except the LoRA parameters were frozen. The LoRA parameters were trained for 10 epochs and the best epoch was chosen (in terms of accuracy on the validation set).
Hyper-parameters were chosen for each rank separately using Optuna:
- Optimizer, learning rate, batch size, "alpha" factor (LoRA)
Finally, we evaluated the accuracy on a test set for each LoRA rank and for each initialization method.
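One way to read the SVD-based initialization is sketched below: take the difference between the original and the compressed weight matrix, and initialize A and B from its truncated SVD so that A@B is the best rank-r approximation of that difference. This interpretation, the factor a=1, the split of the singular values between A and B, and the helper name `svd_init` are assumptions made for this example.

```python
import numpy as np

def svd_init(w_orig, w_comp, r):
    """Return A (in_dim x r) and B (r x out_dim) such that A @ B is the
    best rank-r approximation of the compression error w_orig - w_comp."""
    u, s, vt = np.linalg.svd(w_orig - w_comp, full_matrices=False)
    root = np.sqrt(s[:r])
    A = u[:, :r] * root            # scale the first r left singular vectors
    B = root[:, None] * vt[:r]     # scale the first r right singular vectors
    return A, B

rng = np.random.default_rng(0)
w_orig = rng.normal(size=(32, 48))
# Stand-in for a compressed layer: sign of each weight times a shared scale.
w_comp = np.sign(w_orig) * np.abs(w_orig).mean()

err_before = np.linalg.norm(w_orig - w_comp)
errors = {}
for r in (2, 8, 32):
    A, B = svd_init(w_orig, w_comp, r)
    errors[r] = np.linalg.norm(w_orig - (w_comp + A @ B))
    print(r, err_before, errors[r])
```

With this initialization the adapted layer starts closer to the original weights than the compressed layer alone, whereas the paper-suggested initialization starts exactly at the compressed layer.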

Experiments and results

We expected the graph of test accuracy versus LoRA rank to be monotonically ascending. One potential explanation for the instability of the results is that the choice of training hyper-parameters has a big effect on the model’s test accuracy. Even though increasing the LoRA rank increases the number of parameters in the model, we could not always find training hyper-parameters that let the larger model reach better accuracy on the task.

For Sparse 4X4 compression, we can see that increasing the LoRA rank generally improves the model accuracy on the test set. For a LoRA rank of 128 with SVD initialization, the experiment showed just a 1.74% accuracy drop, with a ×2.27 compression ratio.

For INT1 quantization, small LoRA ranks showed a significant improvement over the quantized-only model’s test accuracy. Unlike Sparse 4X4 compression, we could not see an improvement in the model’s accuracy for larger LoRA ranks. The smallest accuracy drop was for a LoRA rank of 128 with the paper-suggested initialization: the experiment showed a 5.01% accuracy drop, with a ×2.44 memory compression ratio. The best trade-off was for a LoRA rank of 2 with the paper-suggested initialization: the experiment showed a 5.98% accuracy drop, with a ×26.91 memory compression ratio.

SVD decomposition initialization showed better and more stable results for Sparse 4X4 compression. For INT1 quantization, this initialization method did not improve the results compared to the paper-suggested initialization.

Conclusions

Increasing the LoRA rank generally gives better accuracy, yet does not match the original model’s accuracy.
Training the LoRA parameters requires little computational effort.
Every tested combination of LoRA rank and compression method results in memory compression, while the sparsity method also results in computation reduction.
LoRA training is unstable and very sensitive to changes in hyper-parameters.
Using initialization with SVD decomposition could provide better results.

Future work

We believe that our project shows potential for further research into the benefits of combining model compression methods with LoRA. Possible directions for such research:

Test the method’s performance for ‘Linear’-rich models (e.g. Transformers, MLP-based, …)
Explore more compression hyper-parameters (e.g. int8, sparse 3X3, ...)
Explore more initialization methods for the LoRA matrices
Apply the method to the DoRA^[3] variation and examine the results

How to run

Environment settings

Clone to a new directory: git clone <URL> <DEST_DIR>
cd /path/to/DEST_DIR
pip install -r requirements.txt
Download ImageNet subset from: https://www.kaggle.com/datasets/tusonggao/imagenet-validation-dataset/code
Move the images directory to: DEST_DIR/../archive/imagenet_validation

Execution commands

python train_evaluate/train_model.py --init {paper_init,svd_init} [ > log.txt]

--init: determines the LoRA matrices initialization method (default: paper_init)
Recommended: redirect the output to a log.txt file
Results will appear in DEST_DIR/results

Description:

Initializes the resnet18 model, pre-trained on ImageNet
Loads the pre-downloaded ImageNet dataset and splits it into train/val/test subsets
For each compression method (sparse, int1):

Compresses the model
Initializes the LoRA matrices appended to the FC layer(s)
Sweeps LoRA rank values, and uses Optuna to find the best training hyper-parameters for each rank
Outputs the results to a dedicated directory

Results directory contains:

evaluation.csv: summary of the test-subset accuracy for each LoRA rank
acc_quant=COMP_TYPE_r=RANK.png: accuracy per epoch (train, validation), for COMP_TYPE (sparse, int1), for RANK (LoRA rank)
loss_quant=COMP_TYPE=r_RANK.png: loss per epoch (train, validation), for COMP_TYPE (sparse, int1), for RANK (LoRA rank)
quant=COMP_TYPE_r=RANK_eval_acc=ACCUR.ckpt: model post-training parameters, for COMP_TYPE (sparse, int1), for RANK (LoRA rank), with test accuracy of ACCUR
quant=COMP_TYPE_r=RANK_optimization_history.html: Optuna trials summary for COMP_TYPE (sparse, int1) and RANK (LoRA rank) hyper-parameter tuning
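When collecting results across runs, the checkpoint naming scheme above can be parsed with a small helper. This is a hypothetical convenience sketch, not part of the repository: the function name `parse_ckpt_name`, the assumption that ACCUR is written as a plain decimal number, and the example file name are all illustrative.

```python
import re

# Mirrors the documented pattern quant=COMP_TYPE_r=RANK_eval_acc=ACCUR.ckpt.
# Assumes ACCUR is a plain decimal number such as 0.5 or 68.
CKPT_PATTERN = re.compile(
    r"quant=(?P<comp_type>sparse|int1)"
    r"_r=(?P<rank>\d+)"
    r"_eval_acc=(?P<acc>\d+(?:\.\d+)?)\.ckpt$"
)

def parse_ckpt_name(name):
    """Return compression type, LoRA rank and test accuracy, or None if no match."""
    m = CKPT_PATTERN.search(name)
    if m is None:
        return None
    return {
        "comp_type": m["comp_type"],
        "rank": int(m["rank"]),
        "acc": float(m["acc"]),
    }

# Hypothetical file name, used only to exercise the parser.
print(parse_ckpt_name("quant=sparse_r=16_eval_acc=0.5.ckpt"))
```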

Ethics Statement

Stakeholders

End-users, deep learning researchers, technology companies, and regulatory bodies.

Implications

End-users can benefit from faster and more efficient image classification models, improving user experience. However, there may be concerns about privacy if sensitive information is processed. Deep learning researchers can advance the field with innovative techniques, but they must ensure fairness and transparency in model development and deployment. Technology companies can enhance product performance and reduce resource consumption, yet they need to address potential biases and ensure responsible AI practices. Regulatory bodies play a crucial role in establishing guidelines and standards to protect user rights, promote fairness, and mitigate risks associated with AI technologies.

Ethical Considerations

Prioritizing user privacy and data protection through robust security measures and transparent data handling practices. Mitigating biases in data and algorithms to ensure fairness and equity in classification outcomes. Providing clear explanations and documentation on the use of quantization and LoRA techniques to enhance model transparency and interpretability. Engaging in ongoing dialogue with stakeholders and regulatory bodies to address ethical concerns, promote responsible AI practices, and uphold societal values in AI development and deployment.

^[1] Hu, Edward J., et al. "LoRA: Low-Rank Adaptation of Large Language Models." arXiv preprint arXiv:2106.09685 (2021).

^[2] https://huggingface.co/timm/resnet18.tv_in1k

^[3] Liu, Shih-Yang, et al. "DoRA: Weight-Decomposed Low-Rank Adaptation." arXiv preprint arXiv:2402.09353 (2024).
