This repository contains the official implementation of ELiCiT, described in the paper *ELiCiT: Effective and Lightweight Lossy Compression of Tensors* by Jihoon Ko, Taehyung Kwon, Jinhong Jung, and Kijung Shin (ICDM 2024).
ELiCiT (Effective and Lightweight Lossy Compression of Tensors) is an algorithm for lossy compression of tensors. ELiCiT has the following advantages:
- Compact and Accurate: It consistently achieves a better trade-off between compressed size and approximation error than all considered competitors. Specifically, ELiCiT compresses tensors to sizes 1.51-5.05x smaller than competitors while achieving similar fitness. It also achieves 5-48% better fitness than competitors with similar output sizes.
- Fast: While producing outputs of similar size with better fitness, ELiCiT is 11.8-96.0x faster than deep learning-based methods.
- Applicable: It is successfully applied to matrix completion and neural network compression, providing a better trade-off between model size and application performance than state-of-the-art competitors for these applications.
Note: Refer to supplementary_material.pdf for the paper appendix.
The following Python packages are required for all parts of the code:
- pytorch ≥ 1.8.1
- torch-scatter ≥ 2.0.8
For matrix completion, you additionally need to install the following package:
- optuna ≥ 3.1.1
For neural-network compression, you additionally need to install the following packages:
- transformers==4.30.2
- datasets==2.13.1
- evaluate==0.4.0
- accelerate==0.20.3
- seqeval==1.2.2
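If you want to sanity-check your environment before running the code, a minimal version check such as the following can help; we assume here that the importable module names match the package names listed above (e.g., pytorch is imported as torch and torch-scatter as torch_scatter).

```python
# Prints the installed version of each dependency, or reports it as missing.
import importlib

PACKAGES = [
    "torch", "torch_scatter",                                          # common requirements
    "optuna",                                                          # matrix completion
    "transformers", "datasets", "evaluate", "accelerate", "seqeval",   # NN compression
]

for name in PACKAGES:
    try:
        module = importlib.import_module(name)
        print(f"{name}: {getattr(module, '__version__', 'unknown version')}")
    except ImportError:
        print(f"{name}: not installed")
```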
To train and evaluate a qELiCiT model, run this command:

`python train.py --input-path [filename] --output-path [filename] --num-features [num-features] --lr1 [lr1] --lr2 [lr2] --gpus [gpu-ids] --seed [seed]`

For example, you can compress the tensor stored in tensor_compression/uber.npy with qELiCiT using the following command:

`python train.py --input-path uber.npy --output-path uber_40.elicit --num-features 40 --lr1 1e-3 --lr2 1e-2 --gpus 0 1 2 3 --seed 0`

To measure the fitness of the compressed output, run this command:

`python evaluate.py --original-path [original-tensor-path] --compressed-path [compressed-output-path] --gpus [gpu-ids]`

We provide a compressed output of the Uber dataset at tensor_compression/example_uber_40.elicit. You can measure its fitness using the following command:

`python evaluate.py --original-path uber.npy --compressed-path example_uber_40.elicit --gpus 0,1,2,3`
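As a reference for interpreting the output of evaluate.py, we understand fitness in the usual sense of 1 - (relative Frobenius error) between the original tensor and its reconstruction; the snippet below is only a sketch of that metric, not the evaluation script itself.

```python
# Fitness = 1 - ||X - X_hat||_F / ||X||_F, computed over the flattened tensors.
import numpy as np

def fitness(original: np.ndarray, reconstructed: np.ndarray) -> float:
    return 1.0 - np.linalg.norm(original - reconstructed) / np.linalg.norm(original)

# Tiny synthetic check: a perfect reconstruction has fitness 1.0.
X = np.random.rand(4, 5, 6)
print(fitness(X, X))         # 1.0
print(fitness(X, X + 0.01))  # slightly below 1.0
```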
To find the optimal hyperparameter settings of qELiCiT++, run this command:

`python search.py --dataset-name [ml-10m|ml-1m|ml-100k|douban|flixster] --budget [4|8|16|32] --gpu [gpu-id]`

For example, you can find the optimal hyperparameter settings of qELiCiT++ with budget 32*(#rows + #columns) on the douban dataset using the following command:

`python search.py --dataset-name douban --budget 32 --gpu 0`

To evaluate the performance of qELiCiT++, run this command:

`python evaluate.py --dataset-name [ml-10m|ml-1m|ml-100k|douban|flixster] --budget [4|8|16|32] --lamb1 [lamb1] --lamb2 [lamb2] --lamb3 [lamb3] --lr1 [lr1] --lr2 [lr2] --gpu [gpu-id]`
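Following the douban example above, we read the --budget value b as allotting b * (#rows + #columns) compressed parameters to qELiCiT++; the helper below just spells out that arithmetic, and the matrix dimensions in the usage line are hypothetical.

```python
# Budget arithmetic as described above: b * (#rows + #columns).
def parameter_budget(b: int, num_rows: int, num_cols: int) -> int:
    return b * (num_rows + num_cols)

# Hypothetical rating-matrix dimensions, for illustration only.
print(parameter_budget(32, 3000, 3000))  # 192000
```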
To fine-tune the BERT_base model on the GLUE subtasks, run this command:

`python finetune_bert.py --task [cola|mnli|mrpc|qnli|qqp|sst2|stsb] --lr [2e-5|3e-5|5e-5] --weight-decay [0|0.01] --num-epochs [2|3|4] --seed [0|1000|2000|3000|4000] --gpu [gpu-id]`

To compress the fine-tuned model using TFW-qELiCiT, run this command:

`python tfw_qelicit.py --task [cola|mnli|mrpc|qnli|qqp|sst2|stsb] --lr [2e-5|3e-5|5e-5] --weight-decay [0|0.01] --num-epochs [2|3|4] --seed [0|1000|2000|3000|4000] --gpu [gpu-id]`

To fine-tune the compressed model, run this command:

`python finetune_ours.py --task [cola|mnli|mrpc|qnli|qqp|sst2|stsb] --target [target-checkpoint-path] --lr [2e-5|3e-5|5e-5] --weight-decay [0|0.01] --num-epochs [2|3|4] --seed [0|1000|2000|3000|4000] --gpu [gpu-id]`
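To put the size of the compressed checkpoints in context, it can be useful to look at the parameter count of the uncompressed fine-tuned model; the snippet below is a generic sketch using the pinned transformers version, and the model name and label count are placeholders rather than values taken from the provided scripts.

```python
# Counts the parameters of a BERT_base sequence classifier and estimates
# its float32 size; "bert-base-uncased" and num_labels=2 are placeholders.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.1f}M parameters "
      f"(~{num_params * 4 / 2**20:.0f} MiB in float32)")
```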
To evaluate the performance of cELiCiT on summarizing tensor streams, run one of the following commands:

`python train_graph.py --input-path example_data --reg 0.1 --seed [0|1000|2000] --gpu [gpu-id] # for 2-order tensor streams`

or

`python train_tensor.py --input-path example_data --reg 0.1 --seed [0|1000|2000] --gpu [gpu-id] # for 3-order tensor streams`

We used 8 real-world datasets, which are listed below. All datasets we considered are available at the TensorCodec repository.
| Order | Name | Shape | #Entries | Brief description | Source |
|---|---|---|---|---|---|
| 4 | Absorb | 192 x 288 x 30 x 120 | 199.1M | Climate | Climate Data at the National Center for Atmospheric Research |
| | NYC | 265 x 265 x 28 x 35 | 68.8M | Traffic volume | New York City Government |
| 3 | Action | 100 x 570 x 567 | 32.3M | Video features | Multivariate LSTM-FCNs |
| | Activity | 337 x 570 x 320 | 61.5M | Video features | Multivariate LSTM-FCNs |
| | Airquality | 5600 x 362 x 6 | 12.2M | Climate | Air Korea |
| | PEMS | 963 x 144 x 440 | 61.0M | Traffic volume | The UEA & UCR Time Series Classification Repository |
| | Stock | 1317 x 88 x 916 | 106.2M | Stock | Zoom-Tucker |
| | Uber | 183 x 24 x 1140 | 5.0M | Traffic volume | FROSTT |
We used 5 real-world matrices containing ratings of movies provided by users. For the douban and flixster datasets, we provide the resources in the matrix_completion/data/ directory. The ML-100K, ML-1M, and ML-10M datasets are available at https://grouplens.org/datasets/movielens/, and we provide matrix_completion/data/split.py for splitting the ratings into train/val/test sets.
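For reference, the snippet below sketches the kind of per-rating split that split.py performs; the actual script may differ in file format, output format, and split ratios (the ratings.dat path, "::" separator, and 80/10/10 ratio here are assumptions based on the MovieLens ML-1M/ML-10M format).

```python
# Randomly splits MovieLens-style "user::item::rating::timestamp" lines
# into train/val/test sets. Ratios and output format are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

rows = []
with open("ratings.dat") as f:
    for line in f:
        user, item, rating, _ = line.strip().split("::")
        rows.append((int(user), int(item), float(rating)))
ratings = np.array(rows)

perm = rng.permutation(len(ratings))
n_train = int(0.8 * len(ratings))
n_val = int(0.1 * len(ratings))

np.save("train.npy", ratings[perm[:n_train]])
np.save("val.npy", ratings[perm[n_train:n_train + n_val]])
np.save("test.npy", ratings[perm[n_train + n_val:]])
```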
We used 7 downstream tasks from the GLUE benchmark for evaluating our method and its competitors. The datasets for the tasks will be automatically downloaded when you run the provided code for training and evaluation.
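For example, with the pinned datasets and evaluate versions, a GLUE subtask and its metric can also be fetched manually as follows; MRPC is used here only as an example, and this is independent of the provided training scripts.

```python
# Downloads the MRPC subtask of GLUE and loads its metric; the other
# subtasks (cola, mnli, qnli, qqp, sst2, stsb) work the same way.
from datasets import load_dataset
import evaluate

glue_mrpc = load_dataset("glue", "mrpc")   # train / validation / test splits
metric = evaluate.load("glue", "mrpc")     # accuracy and F1 for MRPC

print(glue_mrpc)
print(glue_mrpc["train"][0])
```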
Using the Lkml and Enron datasets referenced in the MayFly framework, we sampled streams of four different lengths for each dataset using the source code provided in the linked framework. Detailed statistics of the sampled streams can be found in the paper.
If you use this code as part of any published research, please cite our ICDM 2024 paper:
@inproceedings{ko2024elicit,
title={ELiCiT: Effective and Lightweight Lossy Compression of Tensors},
author={Ko, Jihoon and Kwon, Taehyung and Jung, Jinhong and Shin, Kijung},
booktitle={2024 IEEE International Conference on Data Mining (ICDM)},
year={2024},
}