Please check our brand new OCL works:
- RandSF.Q: significantly surpasses state-of-the-art video OCL, e.g., SlotContrast, by up to 10 points!
- SmoothSA: improves the state of the art even further, e.g., SPOT / DIAS (images) and SlotContrast / RandSF.Q (videos), with minimal modifications!
Unlike popular solutions based on dense feature maps, Object-Centric Learning (OCL) represents visual scenes as sub-symbolic object-level feature vectors, termed slots, which are highly versatile for tasks involving visual modalities. OCL typically aggregates object superpixels into slots by iteratively applying competitive cross attention, known as Slot Attention, with the slots as the query. However, once initialized, these slots are reused naively, causing redundant slots to compete with informative ones for representing objects. This often results in objects being erroneously segmented into parts. Additionally, mainstream methods derive supervision signals solely from decoding slots into the input's reconstruction, overlooking potential supervision based on internal information. To address these issues, we propose Slot Attention with re-Initialization and self-Distillation (DIAS).
Official implementation of the ACM MM 2025 paper "Slot Attention with Re-Initialization and Self-Distillation". Please note that the slot pruning and re-initialization features are not included.
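For readers new to OCL, the competitive cross attention described above can be sketched in a few lines of NumPy. This is a simplified illustration of plain Slot Attention, not the DIAS model in this repo: the learned query/key/value projections, normalization layers and recurrent slot update of the usual formulation are omitted, and the function name, shapes and iteration count are assumptions chosen for the example.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def slot_attention(features, slots, num_iters=3, eps=1e-8):
    """Simplified Slot Attention (hypothetical helper, not the repo's API).

    features: (n, d) array of superpixel features (keys and values).
    slots:    (k, d) array of initial slots (queries).
    Returns the refined slots (k, d) and the attention map (k, n).
    """
    d = features.shape[1]
    attn = None
    for _ in range(num_iters):
        # Scaled dot-product similarity between every slot and every input.
        logits = slots @ features.T / np.sqrt(d)  # (k, n)
        # Softmax over the slot axis: slots compete for each input feature.
        attn = softmax(logits, axis=0)
        # Normalize over inputs so each slot takes a weighted mean of features.
        weights = attn / (attn.sum(axis=1, keepdims=True) + eps)
        # Simplification: replace slots directly instead of a learned update.
        slots = weights @ features  # (k, d)
    return slots, attn


rng = np.random.default_rng(0)
features = rng.normal(size=(16, 8))  # 16 superpixels, 8-dim features
init_slots = rng.normal(size=(4, 8))  # 4 slots
slots, attn = slot_attention(features, init_slots)
```

Because the softmax runs over the slot axis, every input feature distributes exactly one unit of attention across the slots, which is why a redundant slot can take part of an object away from an informative one.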
Object Discovery Performance
Detailed results are in acc-v3.xlsx. (Encoder backbone: DINO2-S/14, at resolution 256x256/224.)
| | ari | arifg | mbo | miou |
|---|---|---|---|---|
| dias_r-clevrtex | 80.9±0.3 | 79.1±0.3 | 63.3±0.1 | 61.9±0.0 |
| dias_r-coco | 25.6±0.1 | 41.2±0.3 | 31.7±0.1 | 30.2±0.1 |
| dias_r-voc | 30.9±0.5 | 33.5±0.7 | 43.4±0.5 | 42.4±0.5 |
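As a rough guide to reading the table, the sketch below computes a mean-best-overlap score for one image, assuming that the `mbo` column stands for mean best overlap as commonly used in object discovery. The function name, the background handling via `ignore_label`, and the toy label maps are assumptions for illustration; the repo's own metric code under `learn/` may differ in details.

```python
import numpy as np


def mean_best_overlap(pred, true, ignore_label=0):
    """Mean best overlap for one image (hypothetical helper).

    pred, true: integer label maps of the same shape.
    Ground-truth pixels labeled `ignore_label` are treated as background
    and do not form an object (an assumption of this sketch).
    """
    pred_ids = np.unique(pred)
    best_ious = []
    for t in np.unique(true):
        if t == ignore_label:
            continue
        t_mask = true == t
        best = 0.0
        for p in pred_ids:
            p_mask = pred == p
            inter = np.logical_and(t_mask, p_mask).sum()
            union = np.logical_or(t_mask, p_mask).sum()
            # union > 0 because t_mask is never empty here.
            best = max(best, inter / union)
        best_ious.append(best)
    return float(np.mean(best_ious)) if best_ious else 0.0


# Toy example: two ground-truth objects, two predicted segments.
true = np.array([[1, 1, 2, 2],
                 [1, 1, 2, 2]])
pred = np.array([[0, 0, 1, 1],
                 [0, 0, 0, 1]])
score = mean_best_overlap(pred, true)
print(round(score, 3))  # → 0.775
```

Here object 1 is best matched with IoU 4/5 and object 2 with IoU 3/4, so the score is their mean.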
For my implementation of baseline methods and their model checkpoints, please visit my repo VQ-VFM-OCL.
- config-dias/ # *** configs for our DIAS ***
- object_centric_bench/
  - datum/ # dataset loading and preprocessing
  - model/ # model building
    - ...
    - dias.py # *** for our DIAS model building ***
    - ...
  - learn/ # metrics, optimizers and callbacks
- train.py
- eval.py
- requirements.txt
- archive-dias/ # our DIAS models and logs
Datasets ClevrTex, COCO and VOC, which are converted into LMDB format and can be used off-the-shelf, are available as releases.
- dataset-clevrtex: converted dataset ClevrTex.
- dataset-coco: converted dataset COCO.
- dataset-voc: converted dataset VOC.
The checkpoints and training logs (@ random seeds 42, 43 and 44) for all models in the table above are available as releases.
- archive-dias: model checkpoints and train/val logs of DIAS trained on datasets CLEVRTEX, Microsoft COCO and Pascal VOC.
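To repeat a run for each of the three seeds mentioned above, the training command from the Train step of this README can be generated per seed. The sketch below only builds and prints the commands, using the flags shown in this README. The helper name is hypothetical, and whether `train.py` keeps runs with different seeds apart inside one `--save_dir` is not stated here, so check that before launching them back to back.

```python
import shlex

SEEDS = (42, 43, 44)  # seeds listed for the released checkpoints


def build_train_command(seed, cfg_file, data_dir, save_dir="save"):
    """Return the train.py invocation as an argument list (hypothetical helper)."""
    return [
        "python", "train.py",
        "--seed", str(seed),
        "--cfg_file", cfg_file,
        "--data_dir", data_dir,
        "--save_dir", save_dir,
    ]


commands = [
    build_train_command(seed, "config-dias/dias_r-coco.py", "path/to/your/dataset")
    for seed in SEEDS
]

for cmd in commands:
    # Printed for inspection; nothing is executed here.
    print(shlex.join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` to launch the run.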
Take DIAS on COCO as an example.
(1) Environment
To set up the environment, run:
# python 3.11
pip install -r requirements.txt
(2) Dataset
To prepare the dataset, download the converted datasets and unzip them to path/to/your/dataset/. Alternatively, convert them yourself following the XxxDataset.convert_dataset() docs.
(3) Train
To train the model, run:
python train.py \
--seed 42 \
--cfg_file config-dias/dias_r-coco.py \
--data_dir path/to/your/dataset \
--save_dir save
(4) Evaluate
To evaluate the model, run:
python eval.py \
--cfg_file config-dias/dias_r-coco.py \
--data_dir path/to/your/dataset \
--ckpt_file archive-dias/dias_r-coco/best.pth \
--is_viz True \
--is_img True
# object discovery accuracy values will be printed in the terminal
# object discovery visualization will be saved to ./dias_r-coco/
If you have any issues on this repo or cool ideas on OCL, please do not hesitate to contact me!
- page: https://genera1z.github.io
- email: [email protected], [email protected]
If you are applying OCL (not limited to this repo) to tasks like visual question answering, visual prediction/reasoning, world modeling and reinforcement learning, let us collaborate!
If you find this repo useful, please cite our work.
@article{zhao2025dias,
title={{Slot Attention with Re-Initialization and Self-Distillation}},
author={Zhao, Rongzhen and Zhao, Yi and Kannala, Juho and Pajarinen, Joni},
journal={ACM MM},
year={2025}
}