This repository contains the official implementation of CoordGAN, introduced in the following paper: CoordGAN: Self-Supervised Dense Correspondences Emerge from GANs (CVPR 2022). The code is developed with PyTorch 1.8.0 and Python 3.7.
CoordGAN: Self-Supervised Dense Correspondences Emerge from GANs
Jiteng Mu, Shalini De Mello, Zhiding Yu, Nuno Vasconcelos, Xiaolong Wang, Jan Kautz, Sifei Liu
CVPR, 2022
Project Page / ArXiv / Video
Please follow the individual instructions below to download the datasets and put the data in the data directory.
| Dataset | Description |
|---|---|
| CelebAMask-HQ | Please follow the CelebAMask-HQ instructions to download the CelebAMask-HQ dataset. |
| Stanford Cars | Please follow the Stanford Cars instructions to download the Stanford Cars training subset. |
| AFHQ-cat | Please follow the AFHQ instructions to download the AFHQ cat training subset. |
| DatasetGAN | DatasetGAN annotated images are used for semantic label propagation evaluation. |
Please follow the individual instructions below to download the pretrained checkpoints.
| Checkpoints | Description |
|---|---|
| IR-SE50 Model | Follow the pixel2style2pixel repo to download the pretrained IR-SE50 model taken from TreB1eN. This is required for the ArcFace score evaluation. |
The project is developed with the packages listed in requirements.txt. Please install them by running
pip install -r requirements.txt
We design a structure-texture disentangled GAN such that dense correspondence can be extracted explicitly from the structural component. The key idea is to represent the image structure in a coordinate space shared by all images: the structure of each generated image is represented as a warped coordinate frame, transformed from a shared canonical 2D coordinate frame.
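For intuition, here is a minimal PyTorch sketch of the warped-coordinate idea. The grid construction and the small MLP warper below are illustrative assumptions, not the exact CoordGAN architecture:

```python
# Minimal sketch of the warped-coordinate idea (illustrative, not the
# exact CoordGAN architecture).
import torch
import torch.nn as nn

def canonical_grid(h, w):
    """Shared canonical 2D coordinate frame covering [-1, 1] x [-1, 1]."""
    ys = torch.linspace(-1.0, 1.0, h).view(h, 1).expand(h, w)
    xs = torch.linspace(-1.0, 1.0, w).view(1, w).expand(h, w)
    return torch.stack([xs, ys], dim=-1)  # (h, w, 2)

class CoordWarper(nn.Module):
    """Warps the canonical frame into an image-specific coordinate frame,
    conditioned on a per-image structure latent code (hypothetical MLP)."""
    def __init__(self, latent_dim=128, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, grid, w_struct):
        # grid: (h, w, 2); w_struct: (latent_dim,)
        h, w, _ = grid.shape
        cond = w_struct.view(1, 1, -1).expand(h, w, -1)
        return self.mlp(torch.cat([grid, cond], dim=-1))  # (h, w, 2)

# All images share the same canonical grid, so pixels mapped to nearby
# canonical coordinates are in correspondence across images.
grid = canonical_grid(128, 128)
warped = CoordWarper()(grid, torch.randn(128))
```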
CoordGAN generates images at a resolution of 128x128. Please run the following command to train CoordGAN,
python -m torch.distributed.launch --nproc_per_node=8 --master_port=1234 train.py --output_dir CHECKPOINT_128 --config configs/celebA_128.yaml
where --output_dir specifies the output directory and --config specifies the config file (configs/celebA_128.yaml, configs/stanfordcar_128.yaml, or configs/afhqcat_128.yaml).
For CelebAMask-HQ, we first train CoordGAN with an output size of 128x128 and then append two upsampling layers to generate high-resolution images (512x512). Starting from the previous checkpoint, CoordGAN can be further trained with the following command,
python -m torch.distributed.launch --nproc_per_node=8 --master_port=1234 train.py --output_dir celebA_512 --config configs/celebA_512.yaml --ckpt CHECKPOINT_128 --reinit_discriminator True --fix_struc_grad True
where --output_dir specifies the output directory, --config specifies the config file, and --ckpt specifies the 128x128 checkpoint obtained above.
Please run the following command to evaluate the FID score,
python eval_fid.py --ckpt CHECKPOINT --dataset DATASET --size SIZE
where --ckpt specifies the CoordGAN checkpoint, --dataset specifies the category (currently celebA, stanfordcar, or afhq-cat), and --size specifies the resolution (currently 128 or 512). Each iteration synthesizes a pair of images plus a pair of images with swapped texture codes. The input folder contains the resized real images, the samples folder contains the synthesized images, and the samples_swap folder contains the texture-swapped images.
After generating the images, the FID score can be obtained with torch_fidelity. Please install the package and then run,
fidelity --gpu 0 --fid --input1 GENERATED_IMAGES --input2 REAL_IMAGES
where GENERATED_IMAGES should be replaced with the path of the samples folder produced by the previous command and REAL_IMAGES with the path of the input folder.
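The metric can also be computed from Python through the torch_fidelity API; the paths below are placeholders for the samples and input folders described above:

```python
import torch_fidelity

# FID between generated and real image folders (paths are placeholders).
metrics = torch_fidelity.calculate_metrics(
    input1='GENERATED_IMAGES',  # the samples folder from eval_fid.py
    input2='REAL_IMAGES',       # the input folder of resized real images
    cuda=True,
    fid=True,
)
print(metrics['frechet_inception_distance'])
```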
CoordGAN can be equipped with an encoder to enable the extraction of dense correspondence from real images. Please run the following command to train an encoder,
python -m torch.distributed.launch --nproc_per_node=8 --master_port=1234 train-encoder.py --output_dir celebA_enc --config configs/celebA_enc.yaml --ckpt GAN_CHECKPOINT
where --output_dir specifies the output directory, --config specifies the config file (configs/celebA_enc.yaml or configs/stanfordcar_enc.yaml), and --ckpt specifies the GAN checkpoint trained in the first stage.
We quantitatively demonstrate the quality of the extracted dense correspondence on the task of semantic label propagation. Given one reference image with semantic labels, its correspondence map is first inferred with the trained encoder. A correspondence map is then inferred for a query image, and the labels of the reference image are propagated to the query image (see the sketch after the command below). This can be done by running,
python eval_corr.py --ckpt ENC_CHECKPOINT --segdataset SEGDATASET
where --ckpt specifies the encoder checkpoint ENC_CHECKPOINT obtained above and --segdataset specifies the category (currently datasetgan-face-34, datasetgan-car-20, or celebA-7).
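Conceptually, the propagation step amounts to nearest-neighbor matching in the shared canonical coordinate space. The snippet below is a minimal sketch of this idea, not the exact implementation in eval_corr.py:

```python
import torch

def propagate_labels(coord_ref, coord_query, labels_ref):
    """Transfer per-pixel labels from a reference image to a query image.

    coord_ref, coord_query: (H, W, 2) coordinate maps predicted by the encoder.
    labels_ref: (H, W) integer semantic labels of the reference image.
    """
    H, W, _ = coord_query.shape
    ref = coord_ref.reshape(-1, 2)      # (H*W, 2)
    query = coord_query.reshape(-1, 2)  # (H*W, 2)
    # For each query pixel, pick the reference pixel whose canonical
    # coordinate is closest. Pairwise distances cost O((H*W)^2) memory,
    # so chunk the query rows for high-resolution maps.
    nn_idx = torch.cdist(query, ref).argmin(dim=1)
    return labels_ref.reshape(-1)[nn_idx].view(H, W)
```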
If you find our work useful, please cite
@InProceedings{mu2022coordgan,
author = {Mu, Jiteng and De Mello, Shalini and Yu, Zhiding and Vasconcelos, Nuno and Wang, Xiaolong and Kautz, Jan and Liu, Sifei},
title = {CoordGAN: Self-Supervised Dense Correspondences Emerge from GANs},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022}
}
The code is heavily based on the StyleGAN2 PyTorch implementation, CIPS, and Swapping Autoencoder.
The Nvidia-licensed CUDA kernels (fused_bias_act_kernel.cu, upfirdn2d_kernel.cu) are for non-commercial use only.


