Compact Decomposition of Irregular Tensors for Data Compression: From Sparse to Dense to High-Order Tensors
This repository is the official implementation of Compact Decomposition of Irregular Tensors for Data Compression: From Sparse to Dense to High-Order Tensors, Taehyung Kwon, Jihoon Ko, Jinhong Jung, Jun-Gi Jang, and Kijung Shin, KDD 2024.
Please see `requirements.txt`:
numpy==1.21.6
scipy==1.7.3
torch==1.13.1
tqdm==4.51.0
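For example, the packages can be installed with pip:

pip install -r requirements.txt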
Please download the datasets listed in the table at the bottom of this README, and check the following input formats for details.
- A sparse irregular tensor should be given as a pickle file (`.pickle`) containing a dictionary, where `'idx'` stores the indices of the non-zero entries and `'val'` stores their values.
- A dense irregular tensor should be given as a numpy file (`.npy`) containing an array in which each entry is a slice of the irregular tensor (see the inspection sketch below).
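As a sanity check, the inputs can be inspected with a short script like the following. This is a minimal sketch: the paths are taken from the example commands in this README, and the inspection only assumes the formats described above.

```python
import pickle
import numpy as np

# Inspect a sparse input tensor ('idx' holds indices of non-zero entries,
# 'val' holds their values, as described above).
with open('../data/23-Irregular-Tensor/cms.pickle', 'rb') as f:
    sparse_tensor = pickle.load(f)
print(type(sparse_tensor['idx']), type(sparse_tensor['val']))

# Inspect a dense input tensor; each entry is one slice of the irregular tensor.
dense_slices = np.load('../input/23-Irregular-Tensor/usstock.npy', allow_pickle=True)
print(len(dense_slices), np.asarray(dense_slices[0]).shape)
```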
The training processes of Light-IT and Light-IT++ are implemented in main.py.
Arguments:
- `action`: `train_cp` when running only Light-IT, `train` when running both Light-IT and Light-IT++.
- `-tp`, `--tensor_path`: file path of the input irregular tensor. The file should be a pickle file (`.pickle`) for sparse tensors or a numpy file (`.npy`) for dense tensors.
- `-op`, `--output_path`: output path for saving the parameters and fitness.
- `-r`, `--rank`: rank of the model.
- `-d`, `--is_dense`: `True` when the input tensor is dense, `False` when it is sparse.
- `-e`, `--epoch`: number of epochs for Light-IT.
- `-lr`, `--lr`: learning rate for Light-IT.
- `-s`, `--seed`: random seed of the execution.
- `-de`, `--device`: GPU id for execution.
- `-b`, `--batch`: batch size for computation in Light-IT.
- `-bnz`, `--batch_nz`: batch size for computing the loss (corresponding to the non-zero entries) in Light-IT. Please reduce the batch sizes (`-b`, `-bnz`) when out-of-memory errors occur on the GPU (see the example after the run commands below)!
- `-ea`, `--epoch_als`: number of epochs for Light-IT++.
# Run Light-IT only
python 23-Irregular-Tensor/main.py train_cp -tp ../input/23-Irregular-Tensor/usstock.npy -op output/usstock_r4_s0_lr0.01 -r 4 -d True -de 0 -lr 0.01 -e 500 -s 0
# Run Light-IT and Light-IT++
python 23-Irregular-Tensor/main.py train -tp ../input/23-Irregular-Tensor/usstock.npy -op output/usstock_r4_s0_lr0.01 -r 4 -d True -de 0 -lr 0.01 -e 500 -s 0
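If out-of-memory errors occur on the GPU, the same command can be run with smaller batch sizes via `-b` and `-bnz`; the values below are placeholders, not tuned recommendations.

# Run Light-IT and Light-IT++ with reduced batch sizes (placeholder values)
python 23-Irregular-Tensor/main.py train -tp ../input/23-Irregular-Tensor/usstock.npy -op output/usstock_r4_s0_lr0.01 -r 4 -d True -de 0 -lr 0.01 -e 500 -s 0 -b 1024 -bnz 4096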
The commands above produce the following output files:
- `usstock_r4_s0_lr0.01.txt`: the running time and fitness
- `usstock_r4_s0_lr0.01_cp.pt`: the parameters of Light-IT
- `usstock_r4_s0_lr0.01.pt`: the parameters of Light-IT++
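The `.pt` files are PyTorch checkpoints. Below is a minimal sketch for inspecting a saved result; it assumes the file can be loaded directly with `torch.load`, and the stored keys and tensor shapes depend on `main.py`.

```python
import torch

# Hypothetical inspection of a saved Light-IT++ checkpoint.
checkpoint = torch.load('output/usstock_r4_s0_lr0.01.pt', map_location='cpu')

# The exact contents depend on main.py; print whatever is stored.
if isinstance(checkpoint, dict):
    for key, value in checkpoint.items():
        shape = getattr(value, 'shape', None)
        print(key, shape if shape is not None else type(value))
else:
    print(type(checkpoint))
```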
Computing the compressed sizes of Light-IT and Light-IT++ is implemented in huffman.py.
Arguments:
- `-tp`, `-r`, `-d`, `-de`, `-bz`, `-bnz`: same as when running Light-IT and Light-IT++.
- `-rp`, `--result_path`: path to the `.pt` file.
- `-cp`, `--is_cp`: `True` when using the output of Light-IT, `False` when using the output of Light-IT++.
python huffman.py -tp ../data/23-Irregular-Tensor/cms.pickle -rp results/cms-lr0.01-rank5.pt -cp False -r 5 -de 0 -d False
| Name | N_max | N_avg | Size (except the 1st mode) | Order | Density | Source | Download Link |
|---|---|---|---|---|---|---|---|
| CMS | 175 | 35.4 | 284 x 91,586 | 3 | 0.00501 | US government | Link |
| MIMIC-III | 280 | 12.3 | 1,000 x 37,163 | 3 | 0.00733 | MIMIC-III Clinical Database | Link |
| Korea-stock | 5,270 | 3696.5 | 88 x 1,000 | 3 | 0.998 | DPar2 | Link |
| US-stock | 7,883 | 3912.6 | 88 x 1,000 | 3 | 1 | DPar2 | Link |
| Enron | 554 | 80.6 | 1,000 x 1,000 x 939 | 4 | 0.0000693 | FROSTT | Link |
| Delicious | 312 | 16.4 | 1,000 x 1,000 x 31,311 | 4 | 0.00000397 | FROSTT | Link |