`DIAS` Slot Attention with Re-Initialization and Self-Distillation

⚗️ (2026/01/06) Update !!!

Please check our brand new OCL works:

RandSF.Q: significantly surpasses state-of-the-art video OCL, e.g., SlotContrast, by up to 10 points!
SmoothSA: improves the state of the art even further, e.g., SPOT / DIAS (images) and SlotContrast / RandSF.Q (videos), with minimal modifications!

Unlike popular solutions based on dense feature maps, Object-Centric Learning (OCL) represents visual scenes as sub-symbolic object-level feature vectors, termed slots, which are highly versatile for tasks involving visual modalities. OCL typically aggregates object superpixels into slots by iteratively applying competitive cross attention, known as Slot Attention, with the slots as the query. However, once initialized, these slots are reused naively, causing redundant slots to compete with informative ones for representing objects. This often results in objects being erroneously segmented into parts. Additionally, mainstream methods derive supervision signals solely from decoding slots into the input's reconstruction, overlooking potential supervision based on internal information. To address these issues, we propose Slot Attention with re-Initialization and self-Distillation (DIAS): (i) we reduce redundancy in the aggregated slots and re-initialize an extra aggregation to update the remaining slots; (ii) we drive the poor attention map at the first aggregation iteration to approximate the good one at the last iteration, which enables self-distillation. Experiments demonstrate that DIAS achieves state-of-the-art results on OCL tasks such as object discovery and recognition, while also improving advanced visual prediction and reasoning.
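To make the two ideas above more concrete, here is a minimal toy sketch in plain Python. It is not the code in `dias.py`: it drops the learned projections, GRU and MLP of real Slot Attention, it omits redundancy reduction and re-initialization entirely, and the function names (`attention`, `aggregate`, `self_distill_loss`) as well as the cross-entropy form of the distillation loss are assumptions made for illustration only.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def attention(slots, feats):
    """Competitive attention: for each feature, softmax over the SLOT axis.

    Returns attn[i][k], the share of feature i claimed by slot k.
    """
    scale = 1.0 / math.sqrt(len(feats[0]))
    return [
        softmax([scale * sum(s * f for s, f in zip(slot, feat)) for slot in slots])
        for feat in feats
    ]


def aggregate(slots, feats, n_iters=3):
    """Refine slots iteratively; return final slots and all attention maps.

    Simplification (assumption): each slot is replaced by the attention-weighted
    mean of the features, with no learned update.
    """
    maps = []
    dim = len(feats[0])
    for _ in range(n_iters):
        attn = attention(slots, feats)
        maps.append(attn)
        new_slots = []
        for k in range(len(slots)):
            weights = [row[k] for row in attn]
            total = sum(weights) + 1e-8
            new_slots.append(
                [sum(w * f[d] for w, f in zip(weights, feats)) / total for d in range(dim)]
            )
        slots = new_slots
    return slots, maps


def self_distill_loss(first_map, last_map, eps=1e-8):
    """Cross-entropy pulling the first-iteration map toward the last one.

    The last-iteration map acts as a fixed teacher (hypothetical loss form).
    """
    return -sum(
        t * math.log(s + eps)
        for student, teacher in zip(first_map, last_map)
        for s, t in zip(student, teacher)
    ) / len(first_map)


# Toy data: 6 feature vectors ("superpixels") and 3 slots, all of dimension 4.
random.seed(0)
feats = [[random.gauss(0, 1) for _ in range(4)] for _ in range(6)]
slots = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]

final_slots, maps = aggregate(slots, feats, n_iters=3)
loss = self_distill_loss(maps[0], maps[-1])
print(f"self-distillation loss: {loss:.4f}")
```

The softmax runs over slots rather than over features, which is what makes the slots compete for each input feature.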

🎉 Accepted to ACM MM 2025 as a Poster

Official implementation of the ACM MM 2025 paper "Slot Attention with Re-Initialization and Self-Distillation". Please note that the slot pruning feature, along with re-initialization, is not included.

🏆 Performance

Object Discovery Performance

Full results are detailed in acc-v3.xlsx. (Encoding uses the DINO2-S/14 backbone at resolution 256x256/224.)

	ari	arifg	mbo	miou
dias_r-clevrtex	80.9±0.3	79.1±0.3	63.3±0.1	61.9±0.0
dias_r-coco	25.6±0.1	41.2±0.3	31.7±0.1	30.2±0.1
dias_r-voc	30.9±0.5	33.5±0.7	43.4±0.5	42.4±0.5
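For readers unfamiliar with the metrics, the sketch below shows how an Adjusted Rand Index can be computed from per-pixel segment ids. It assumes that `ari` is the standard ARI and that `arifg` is the same score restricted to foreground pixels; the function name `adjusted_rand_index`, the `ignore_label` argument and the toy labels are hypothetical, and the repo's own metric code under `object_centric_bench/learn/` may differ. The table values appear to be percentages, i.e. the score times 100.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_ids, pred_ids, ignore_label=None):
    """ARI between two flat label sequences.

    Pixels whose ground-truth id equals `ignore_label` are skipped, which gives
    a foreground-only variant when `ignore_label` is the background id.
    """
    pairs = [(t, p) for t, p in zip(true_ids, pred_ids) if t != ignore_label]
    n = len(pairs)
    if n < 2:
        return 1.0  # degenerate case: nothing to disagree on

    joint = Counter(pairs)
    true_counts = Counter(t for t, _ in pairs)
    pred_counts = Counter(p for _, p in pairs)

    index = sum(comb(c, 2) for c in joint.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0
    return (index - expected) / (max_index - expected)


# Toy example: id 0 is background, ids 1 and 2 are objects.
true_ids = [0, 0, 0, 1, 1, 2, 2]
pred_ids = [3, 4, 5, 1, 1, 2, 2]  # objects are right, background is split up

ari_all = adjusted_rand_index(true_ids, pred_ids)
ari_fg = adjusted_rand_index(true_ids, pred_ids, ignore_label=0)
print(f"ARI: {ari_all:.3f}, foreground ARI: {ari_fg:.3f}")
```

In this toy case the foreground score is perfect while the full score is penalized for splitting the background, which is why the two columns can diverge.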

For my implementation of baseline methods and their model checkpoints, please visit my repo VQ-VFM-OCL.

🌟 Highlights

⭐⭐⭐ Please check GitHub repo VQ-VFM-OCL. ⭐⭐⭐

🧭 Repo Structure

Source code.

- config-dias/          # *** configs for our DIAS ***
- object_centric_bench/
  - datum/              # dataset loading and preprocessing
  - model/              # model building
    - ...
    - dias.py           # *** for our DIAS model building ***
    - ...
  - learn/              # metrics, optimizers and callbacks
- train.py
- eval.py
- requirements.txt

Releases.

- archive-dias/      # our DIAS models and logs

🚀 Converted Datasets

Datasets ClevrTex, COCO and VOC, which are converted into LMDB format and can be used off-the-shelf, are available as releases.

dataset-clevrtex: converted dataset ClevrTex.
dataset-coco: converted dataset COCO.
dataset-voc: converted dataset VOC.

🧠 Model Checkpoints & Training Logs

The checkpoints and training logs (@ random seeds 42, 43 and 44) for all models in the table above are available as releases.

archive-dias: model checkpoints and train/val logs of DIAS trained on the ClevrTex, Microsoft COCO and Pascal VOC datasets.

🔥 How to Use

Take DIAS on COCO as an example.

(1) Environment

To set up the environment, run:

# python 3.11
pip install -r requirements.txt

(2) Dataset

To prepare the dataset, download the Converted Datasets and unzip them to path/to/your/dataset/. Alternatively, convert them yourself according to the XxxDataset.convert_dataset() docs.

(3) Train

To train the model, run:

python train.py \
    --seed 42 \
    --cfg_file config-dias/dias_r-coco.py \
    --data_dir path/to/your/dataset \
    --save_dir save
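If you want to reproduce all three seeds (42, 43 and 44), a small helper like the one below can build the commands. It only uses the flags shown above; the helper name `build_train_command` is hypothetical, and whether runs with different seeds can safely share one `--save_dir` is not something this sketch checks.

```python
import shlex


def build_train_command(
    seed,
    cfg_file="config-dias/dias_r-coco.py",
    data_dir="path/to/your/dataset",
    save_dir="save",
):
    """Return the train.py invocation for one seed as an argument list."""
    return [
        "python", "train.py",
        "--seed", str(seed),
        "--cfg_file", cfg_file,
        "--data_dir", data_dir,
        "--save_dir", save_dir,
    ]


commands = [build_train_command(seed) for seed in (42, 43, 44)]
for cmd in commands:
    # Print the shell-ready command; to launch it instead, use
    # subprocess.run(cmd, check=True) from the standard library.
    print(shlex.join(cmd))
```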

(4) Evaluate

To evaluate the model, run:

python eval.py \
    --cfg_file config-dias/dias_r-coco.py \
    --data_dir path/to/your/dataset \
    --ckpt_file archive-dias/dias_r-coco/best.pth \
    --is_viz True \
    --is_img True
# object discovery accuracy values will be printed in the terminal
# object discovery visualization will be saved to ./dias_r-coco/
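After evaluating each seed, the printed values can be summarized in the same "mean±spread" form used in the performance table. The sketch below assumes the spread is the sample standard deviation, which the table does not state; the helper name `summarize` is hypothetical and the numbers are placeholders, not results from this repo.

```python
from statistics import mean, stdev


def summarize(values, digits=1):
    """Format values as 'mean±std' (sample standard deviation, an assumption)."""
    return f"{mean(values):.{digits}f}±{stdev(values):.{digits}f}"


# Placeholder metric values for seeds 42, 43 and 44 (NOT real results).
per_seed_metric = [10.0, 12.0, 14.0]
summary = summarize(per_seed_metric)
print(summary)
```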

🤗 Contact & Support

If you have any issues with this repo or cool ideas on OCL, please do not hesitate to contact me!

If you are applying OCL (not limited to this repo) to tasks like visual question answering, visual prediction/reasoning, world modeling and reinforcement learning, let us collaborate!

📚 Citation

If you find this repo useful, please cite our work.

@article{zhao2025dias,
  title={{Slot Attention with Re-Initialization and Self-Distillation}},
  author={Zhao, Rongzhen and Zhao, Yi and Kannala, Juho and Pajarinen, Joni},
  journal={ACM MM},
  year={2025}
}
