This is the official code for the following paper:
MJ Muckley, A El-Nouby, K Ullrich, H Jégou, J Verbeek. Improving Statistical Fidelity for Neural Image Compression with Implicit Local Likelihood Models. In ICML, 2023.
The steps below assume conda. If you elect not to use conda, you should be
able to adapt them to your own install procedure.
Create a new environment with Python 3.10:
```bash
conda create --name illm python=3.10
```

Activate the environment:
```bash
conda activate illm
```

Follow the instructions to install PyTorch (the code is tested with 2.0.0).
Install project requirements by running the following in this illm/
directory:
```bash
pip install -r requirements.txt
```

Weights are released under the CC-BY-NC 4.0 license, available in the repository root.
This repository is configured to use torch.hub. An example of loading the
ILLM model targeting 0.14 bpp is shown below:
```python
import torch

model = torch.hub.load("facebookresearch/NeuralCompression", "msillm_quality_3")
```
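Once loaded, the model is a standard `torch.nn.Module`. The snippet below is a minimal sketch of end-to-end compression; it assumes the hub models follow the CompressAI-style convention of an `update()` call to build the entropy-coder tables before `compress()`/`decompress()`, and that spatial dimensions are multiples of 16 for the HiFiC autoencoder. Check the `torch.hub.help` output below if your version differs:

```python
import torch

model = torch.hub.load("facebookresearch/NeuralCompression", "msillm_quality_3")
model = model.eval()

# Assumption: CompressAI-style interface, where update() precomputes the
# CDF tables needed by the range coder before compress()/decompress().
model.update()

# Dummy image batch in [0, 1]; H and W kept at multiples of 16 (assumed).
x = torch.rand(1, 3, 256, 256)

with torch.no_grad():
    compressed = model.compress(x)        # entropy-coded representation
    x_hat = model.decompress(compressed)  # reconstruction in [0, 1]
```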
To list all available models, you can use:

```python
torch.hub.list("facebookresearch/NeuralCompression")
```

To get information on a model and its target bitrate, you can use:
```python
print(torch.hub.help("facebookresearch/NeuralCompression", "msillm_quality_3"))
```

We release the following models:
| Target bitrate | MS-ILLM | No-GAN |
|---|---|---|
| 0.002 bpp | `msillm_quality_vlo1` | |
| 0.004 bpp | `msillm_quality_vlo2` | |
| 0.035 bpp | `msillm_quality_1` | `noganms_quality_1` |
| 0.07 bpp | `msillm_quality_2` | `noganms_quality_2` |
| 0.14 bpp | `msillm_quality_3` | `noganms_quality_3` |
| 0.3 bpp | `msillm_quality_4` | `noganms_quality_4` |
| 0.45 bpp | `msillm_quality_5` | `noganms_quality_5` |
| 0.9 bpp | `msillm_quality_6` | `noganms_quality_6` |
The No-GAN models are released for researchers seeking to fine-tune them with their own methods.
The eval_folder_example.py script provides an example of reproducing the
numbers from the paper. Simply run:

```bash
python eval_folder_example.py $PATH_TO_CLIC2020
```

It will run on the folder you provide and print the metrics for the 0.14 bpp target model.
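For a rough idea of what the script computes, the sketch below evaluates mean PSNR over a folder of PNGs with the 0.14 bpp model. It is a simplification under the same API assumptions as the torch.hub example above; the real script also reports bitrate and FID, and its padding and metric details may differ:

```python
import sys
from pathlib import Path

import torch
import torch.nn.functional as F
from torchvision.io import read_image

model = torch.hub.load("facebookresearch/NeuralCompression", "msillm_quality_3")
model = model.eval()
model.update()  # assumed CompressAI-style entropy-coder setup

psnrs = []
for path in sorted(Path(sys.argv[1]).glob("*.png")):
    x = read_image(str(path)).unsqueeze(0).float() / 255.0
    h, w = x.shape[-2:]
    # Reflect-pad H and W up to multiples of 16 for the HiFiC autoencoder (assumption).
    x_pad = F.pad(x, (0, -w % 16, 0, -h % 16), mode="reflect")
    with torch.no_grad():
        x_hat = model.decompress(model.compress(x_pad))
    x_hat = x_hat[..., :h, :w].clamp(0.0, 1.0)
    psnrs.append(-10.0 * torch.log10(F.mse_loss(x_hat, x)).item())

print(f"mean PSNR: {sum(psnrs) / len(psnrs):.2f} dB")
```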
This code makes heavy use of Hydra. Before running the code, it is
recommended to familiarize yourself with Hydra usage and how to configure
jobs. After this, you will be able to follow the examples below and tailor
them to your own uses by modifying the configs.
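For example, Hydra can print the fully composed config for a job without launching it (a standard Hydra flag, not specific to this repo):

```bash
# Print the composed config, including command-line overrides, then exit:
python train.py model=hific_autoencoder --cfg job
```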
First, you'll need to pretrain an autoencoder without the GAN. Launch a 2-GPU job for pretraining the autoencoder at a 0.14 bpp target bitrate:
```bash
python train.py \
    experiment_name=pretrain0.14bpp \
    data.open_images_root=$PATH_TO_OPENIMAGES \
    data.batch_size=8 \
    distortion_loss=mse_lpips \
    distortion_loss.mse_param=150.0 \
    distortion_loss.lpips_param=1.0 \
    distortion_loss.backbone=alex \
    optimizer.model_opt.lr=0.0003 \
    rate_target=target-0.14 \
    model=hific_autoencoder
```

Note: If you're in a SLURM cluster environment and want to train all rates,
you can use the multirun feature of Hydra, as sketched below. An example SLURM
config is in `conf/mode/submitit_multi_node.yaml`.
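A multirun sweep might look like the following. The `mode=submitit_multi_node` override and the sweep values are illustrative; check them against your configs before launching:

```bash
python train.py --multirun \
    mode=submitit_multi_node \
    experiment_name=pretrain \
    data.open_images_root=$PATH_TO_OPENIMAGES \
    data.batch_size=8 \
    rate_target=target-0.07,target-0.14,target-0.3 \
    model=hific_autoencoder
```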
For the second training stage (with GAN), you'll need to update the path to
your pretrained HiFiC autoencoder by swapping out the `path` field in
`conf/pretrained_autoencoder/example.yaml`, and you'll need to put the path to
your pretrained VQ-VAE in `conf/pretrained_latent_autoencoder/example.yaml`.
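For reference, a minimal version of such a file might look like the following sketch; the checkpoint location is a placeholder, and any fields beyond `path` in the real example files are omitted:

```yaml
# conf/pretrained_autoencoder/example.yaml (sketch; path is a placeholder)
path: /checkpoints/pretrain0.14bpp/last.ckpt
```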
Alternatively, you can use the pretrained autoencoder and VQ-VAE from
torch.hub with the `20221108_vqvae_xcit_p8_ch64_cb1024_h8` and
`nogan_target0.14bpp` configs:
```bash
python train.py \
    experiment_name=finetune \
    data.open_images_root=$PATH_TO_OPENIMAGES \
    data.batch_size=16 \
    trainer.max_steps=100000 \
    optimizer=model_adamw_disc_const \
    model=hific_autoencoder \
    model.freeze_encoder=true \
    model.freeze_bottleneck=true \
    distortion_loss=mse_lpips \
    distortion_loss.mse_param=150.0 \
    distortion_loss.lpips_param=1.0 \
    distortion_loss.backbone=alex \
    +discriminator=condunet_ch1025_factor8_context220 \
    +lightning_module.generator_weight=0.008 \
    +latent_projector=vqvae_xcit_p8_ch64_cb1024_h8 \
    +pretrained_autoencoder=nogan_target0.14bpp \
    +pretrained_latent_autoencoder=20221108_vqvae_xcit_p8_ch64_cb1024_h8
```

This should give a model with about a 2.3 validation FID on OpenImages V6.
By default the project uses Weights & Biases for logging. If you don't have
wandb installed, the trainer should fall back to training without it (note:
images will not be logged), but there may be some issues you'll have to debug.
Fortunately, all of the logging code is confined to train.py, so you shouldn't
have to modify anything outside of that file.
If you're using wandb, you'll need to ensure that you're logged into your
account or you've set up the anonymous logging mode.
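For example (standard wandb CLI and environment-variable usage, not specific to this repo):

```bash
wandb login    # authenticate with your account, or run anonymously:
WANDB_ANONYMOUS=allow python train.py experiment_name=pretrain0.14bpp
```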
The paper was written with the HiFiC architecture, and this is used in all experiments. The nice thing about the HiFiC architecture is that it's very stable for discriminator fine-tuning. For newer architectures, such as ELIC, discriminator fine-tuning can be more difficult. It is possible to get good results, but it requires more hyperparameter tuning, and you may need to use different weights or learning rates for different parts of the R-D curve.
If you find recipes that work particularly well, feel free to post them as a Discussion (or go ahead and publish and toss a cite :) ).
If you use this code, please cite:

```bibtex
@inproceedings{muckley2023improving,
    author = {Muckley, Matthew J. and El-Nouby, Alaaeldin and Ullrich, Karen and Jégou, Hervé and Verbeek, Jakob},
    title = {Improving Statistical Fidelity for Neural Image Compression with Implicit Local Likelihood Models},
    booktitle = {International Conference on Machine Learning},
    year = {2023},
}
```