Releases · Genera1Z/VQ-VFM-OCL

This is the Pascal VOC dataset in LMDB format, ready to use with this repo.
Pascal VOC 2012 is used for training and Pascal VOC 2007 for validation.

This is the MOVi-D dataset in LMDB format, ready to use with this repo.

This is the Microsoft COCO dataset in LMDB format, ready to use with this repo.

This is the ClevrTex dataset in LMDB format, ready to use with this repo.
ClevrTex-Full is used for training and ClevrTex-OOD for validation.
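
To sanity-check a downloaded LMDB dataset before training, something like the sketch below may help. It only reports the number of entries and the first few keys with their value sizes. It relies on the third-party `lmdb` Python package; the function name `peek_lmdb` and the path argument are illustrative assumptions, and the key names and value encoding used by this repo are not documented here, so the values are left undecoded.

```python
import sys
from pathlib import Path


def peek_lmdb(path, limit=3):
    """Return (entry_count, [(key, value_size_in_bytes), ...]) for an LMDB.

    Hypothetical helper, not part of the repo. It makes no assumption about
    how keys are named or how values are serialized.
    """
    import lmdb  # third-party package: pip install lmdb

    path = Path(path)
    # An LMDB is either a directory (data.mdb + lock.mdb) or a single file.
    env = lmdb.open(
        str(path),
        subdir=path.is_dir(),
        readonly=True,
        lock=False,
        readahead=False,
    )
    try:
        with env.begin() as txn:
            count = txn.stat()["entries"]
            head = []
            for key, value in txn.cursor():
                if len(head) >= limit:
                    break
                head.append((key, len(value)))
    finally:
        env.close()
    return count, head


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python peek_lmdb.py <path to the downloaded LMDB>
    n_entries, first_records = peek_lmdb(sys.argv[1])
    print(f"{n_entries} entries")
    for key, size in first_records:
        print(key, size, "bytes")
```

If the entry count is zero or the open call fails, the download is probably incomplete or the path points at the wrong level of the directory tree.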

These are model checkpoints for our VVO, with DINO2 as the encoder and a Transformer decoder.
They are the counterpart to the SLATE and STEVE baselines.
Models are trained on the ClevrTex, COCO, VOC and MOVi-D datasets, with random seeds 42, 43 and 44.
Input resolution is 256x256 (224x224).

These are model checkpoints for our VVO, with DINO2 as the encoder and a spatial-broadcast MLP decoder.
They are the counterpart to the DINOSAUR baseline.
Models are trained on the ClevrTex, COCO and VOC datasets, with random seeds 42, 43 and 44.
Input resolution is 256x256 (224x224).

These are model checkpoints for our VVO, with DINO2 as the encoder and a conditional UNet diffusion model as the decoder.
They are the counterpart to the SlotDiffusion baseline.
Models are trained on the ClevrTex, COCO and VOC datasets, with random seeds 42, 43 and 44.
Input resolution is 256x256 (224x224).

These are model checkpoints for the SlotDiffusion baseline, with DINO2 as the encoder.
They are the counterpart to our VQDINO-Dfz.
Models are trained on the ClevrTex, COCO and VOC datasets, with random seeds 42, 43 and 44.
Input resolution is 256x256 (224x224).

These are model checkpoints for the SLATE and STEVE baselines, with DINO2 as the encoder.
They are the counterpart to our VQDINO-Tfd.
Models are trained on the ClevrTex, COCO, VOC and MOVi-D datasets, with random seeds 42, 43 and 44.
Input resolution is 256x256 (224x224).

These are model checkpoints for the DINOSAUR baseline, with DINO2 as the encoder.
They are the counterpart to our VQDINO-Mlp.
Models are trained on the ClevrTex, COCO and VOC datasets, with random seeds 42, 43 and 44.
Input resolution is 256x256 (224x224).
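
The dataset and seed combinations described in the checkpoint releases can be enumerated with a short sketch like the one below, for example to script downloads or evaluation loops. The mapping from release tag to description is inferred from the tag names and the order of the releases, and it assumes one training run per dataset/seed pair; the names `RELEASE_DATASETS` and `training_runs` are illustrative and not part of the repo.

```python
from itertools import product

SEEDS = (42, 43, 44)
IMAGE_DATASETS = ("ClevrTex", "COCO", "VOC")
VIDEO_DATASETS = ("MOVi-D",)

# Release tag -> datasets named in its description (inferred mapping).
RELEASE_DATASETS = {
    "vqdino_tfd": IMAGE_DATASETS + VIDEO_DATASETS,
    "vqdino_mlp": IMAGE_DATASETS,
    "vqdino_dfz": IMAGE_DATASETS,
    "slotdiffusion": IMAGE_DATASETS,
    "slatesteve": IMAGE_DATASETS + VIDEO_DATASETS,
    "dinosaur": IMAGE_DATASETS,
}


def training_runs(release):
    """List the (dataset, seed) pairs described for a release tag."""
    return list(product(RELEASE_DATASETS[release], SEEDS))


if __name__ == "__main__":
    for tag in RELEASE_DATASETS:
        runs = training_runs(tag)
        print(f"{tag}: {len(runs)} runs")
```

Under these assumptions, the two releases that include MOVi-D cover 12 dataset/seed pairs each and the remaining four cover 9 each.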

dataset-voc

dataset-movi_d

dataset-coco

dataset-clevrtex

vqdino_tfd

vqdino_mlp

vqdino_dfz

slotdiffusion

slatesteve

dinosaur
