Releases: Genera1Z/VQ-VFM-OCL

dataset-voc

25 May 09:03
a2f46f0

The Pascal VOC dataset in LMDB format, ready to use with this repo without further conversion.
Pascal VOC 2012 is used for training and Pascal VOC 2007 for validation.

dataset-movi_d

25 May 17:25
5d051ea

The MOVi-D dataset in LMDB format, ready to use with this repo without further conversion.

dataset-coco

25 May 17:25
5d051ea

The Microsoft COCO dataset in LMDB format, ready to use with this repo without further conversion.

dataset-clevrtex

25 May 09:14
5d051ea

The ClevrTex dataset in LMDB format, ready to use with this repo without further conversion.
ClevrTex-Full is used for training and ClevrTex-OOD for validation.

vqdino_tfd

24 May 17:13

Model checkpoints for our VVO, with DINO2 as the encoder and a Transformer decoder.
This is the counterpart to the baselines SLATE and STEVE.
The models are trained on ClevrTex, COCO, VOC and MOVi-D, with random seeds 42, 43 and 44.
Input resolution is 256x256 (224x224).

vqdino_mlp

24 May 17:17

Model checkpoints for our VVO, with DINO2 as the encoder and a spatial-broadcast MLP decoder.
This is the counterpart to the baseline DINOSAUR.
The models are trained on ClevrTex, COCO and VOC, with random seeds 42, 43 and 44.
Input resolution is 256x256 (224x224).

vqdino_dfz

24 May 17:27

Model checkpoints for our VVO, with DINO2 as the encoder and a conditional UNet diffusion model as the decoder.
This is the counterpart to the baseline SlotDiffusion.
The models are trained on ClevrTex, COCO and VOC, with random seeds 42, 43 and 44.
Input resolution is 256x256 (224x224).

slotdiffusion

24 May 17:40

Model checkpoints for the baseline SlotDiffusion, with DINO2 as the encoder.
This is the counterpart to our VQDINO-Dfz.
The models are trained on ClevrTex, COCO and VOC, with random seeds 42, 43 and 44.
Input resolution is 256x256 (224x224).

slatesteve

24 May 17:31

Model checkpoints for the baselines SLATE and STEVE, with DINO2 as the encoder.
This is the counterpart to our VQDINO-Tfd.
The models are trained on ClevrTex, COCO, VOC and MOVi-D, with random seeds 42, 43 and 44.
Input resolution is 256x256 (224x224).

dinosaur

24 May 17:35

Model checkpoints for the baseline DINOSAUR, with DINO2 as the encoder.
This is the counterpart to our VQDINO-Mlp.
The models are trained on ClevrTex, COCO and VOC, with random seeds 42, 43 and 44.
Input resolution is 256x256 (224x224).
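
The six checkpoint releases above form a small grid of model family, dataset and seed. The sketch below restates that grid in Python so the method/baseline pairs and the dataset coverage can be checked at a glance. It is only a summary of the notes, not code from this repo: the names `RELEASES` and `runs` are hypothetical, the mapping of VQDINO-Tfd/Mlp/Dfz onto the release tags is inferred from the tag names, and it assumes every listed dataset was trained with every listed seed.

```python
from itertools import product

# Random seeds listed in every checkpoint release note.
SEEDS = (42, 43, 44)

# Datasets shared by all checkpoint releases.
COMMON = ("ClevrTex", "COCO", "VOC")

# Hypothetical summary table: release tag -> (datasets, counterpart release tag).
# Only the Transformer-decoder pair additionally lists MOVi-D.
RELEASES = {
    "vqdino_tfd": (COMMON + ("MOVi-D",), "slatesteve"),
    "vqdino_mlp": (COMMON, "dinosaur"),
    "vqdino_dfz": (COMMON, "slotdiffusion"),
    "slatesteve": (COMMON + ("MOVi-D",), "vqdino_tfd"),
    "dinosaur": (COMMON, "vqdino_mlp"),
    "slotdiffusion": (COMMON, "vqdino_dfz"),
}


def runs(tag):
    """Return the (dataset, seed) pairs implied by one release note.

    Assumes a full dataset-by-seed grid, which the notes suggest
    but do not state explicitly.
    """
    datasets, _counterpart = RELEASES[tag]
    return list(product(datasets, SEEDS))


if __name__ == "__main__":
    for tag, (_datasets, counterpart) in RELEASES.items():
        print(f"{tag}: {len(runs(tag))} runs, counterpart {counterpart}")
```

Under the full-grid assumption, the two releases that include MOVi-D cover 12 dataset/seed pairs each and the other four cover 9 each.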