Releases: Genera1Z/VQ-VFM-OCL
dataset-voc
This is the Pascal VOC dataset in LMDB format, ready to use off the shelf in this repo.
Pascal VOC 2012 is used for training and Pascal VOC 2007 for validation.
dataset-movi_d
This is the MOVi-D dataset in LMDB format, ready to use off the shelf in this repo.
dataset-coco
This is the Microsoft COCO dataset in LMDB format, ready to use off the shelf in this repo.
dataset-clevrtex
This is the ClevrTex dataset in LMDB format, ready to use off the shelf in this repo.
ClevrTex-Full is used for training and ClevrTex-OOD for validation.
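The split conventions stated in the dataset notes above can be collected into a small lookup table. This is only an illustrative sketch: the names `DATASET_SPLITS` and `split_source` are hypothetical and do not come from this repo, and the COCO and MOVi-D entries are left empty because their notes do not state a split.

```python
# Train/validation sources as stated in the dataset release notes.
# None means the release note does not state which split is used.
# (Hypothetical structure for illustration; not part of the repo.)
DATASET_SPLITS = {
    "voc": {"train": "Pascal VOC 2012", "val": "Pascal VOC 2007"},
    "clevrtex": {"train": "ClevrTex-Full", "val": "ClevrTex-OOD"},
    "coco": None,
    "movi_d": None,
}


def split_source(dataset, split):
    """Return the source subset used for `split` ("train" or "val")."""
    entry = DATASET_SPLITS[dataset]
    if entry is None:
        raise LookupError(f"split for {dataset!r} is not stated in the release notes")
    return entry[split]


if __name__ == "__main__":
    print(split_source("voc", "train"))
    print(split_source("clevrtex", "val"))
```

Keeping the unstated entries as `None` avoids guessing at splits that the release notes do not document.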
vqdino_tfd
These are model checkpoints for our VVO, with DINO2 as the encoder and a Transformer decoder.
This is the counterpart to the SLATE and STEVE baselines.
Models are trained on the ClevrTex, COCO, VOC and MOVi-D datasets, with random seeds 42, 43 and 44.
Input resolution is 256x256 (224x224).
vqdino_mlp
These are model checkpoints for our VVO, with DINO2 as the encoder and a spatially broadcast MLP decoder.
This is the counterpart to the DINOSAUR baseline.
Models are trained on the ClevrTex, COCO and VOC datasets, with random seeds 42, 43 and 44.
Input resolution is 256x256 (224x224).
vqdino_dfz
These are model checkpoints for our VVO, with DINO2 as the encoder and a conditional UNet diffusion model as the decoder.
This is the counterpart to the SlotDiffusion baseline.
Models are trained on the ClevrTex, COCO and VOC datasets, with random seeds 42, 43 and 44.
Input resolution is 256x256 (224x224).
slotdiffusion
These are model checkpoints for the SlotDiffusion baseline, with DINO2 as the encoder.
This is the counterpart to our VQDINO-Dfz.
Models are trained on the ClevrTex, COCO and VOC datasets, with random seeds 42, 43 and 44.
Input resolution is 256x256 (224x224).
slatesteve
These are model checkpoints for the SLATE and STEVE baselines, with DINO2 as the encoder.
This is the counterpart to our VQDINO-Tfd.
Models are trained on the ClevrTex, COCO, VOC and MOVi-D datasets, with random seeds 42, 43 and 44.
Input resolution is 256x256 (224x224).
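As a quick sanity check when downloading checkpoints, the dataset and seed combinations described in the model releases above can be enumerated. The sketch below is an assumption-laden illustration: `RELEASE_DATASETS` and `expected_runs` are hypothetical names, the mapping is copied from the release descriptions, and the actual number and naming of files in each release may differ.

```python
from itertools import product

# Datasets per model release, copied from the release descriptions.
# (Hypothetical structure for illustration; not part of the repo.)
RELEASE_DATASETS = {
    "vqdino_tfd": ["ClevrTex", "COCO", "VOC", "MOVi-D"],
    "vqdino_mlp": ["ClevrTex", "COCO", "VOC"],
    "vqdino_dfz": ["ClevrTex", "COCO", "VOC"],
    "slotdiffusion": ["ClevrTex", "COCO", "VOC"],
    "slatesteve": ["ClevrTex", "COCO", "VOC", "MOVi-D"],
}

# Random seeds listed in every model release.
SEEDS = (42, 43, 44)


def expected_runs(release):
    """Return every (dataset, seed) pair described for a release."""
    return list(product(RELEASE_DATASETS[release], SEEDS))


if __name__ == "__main__":
    for name in RELEASE_DATASETS:
        print(name, len(expected_runs(name)))
```

Under these assumptions, the releases that include MOVi-D describe 12 dataset and seed combinations each, and the others describe 9.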