Genera1Z/DIAS

DIAS: Slot Attention with Re-Initialization and Self-Distillation



⚗️ (2026/01/06) Update !!!

Please check our brand new OCL works:

  • RandSF.Q: significantly surpasses state-of-the-art video OCL, e.g., SlotContrast, by up to 10 points!
  • SmoothSA: improves the state of the art even further, e.g., SPOT / DIAS (images) and SlotContrast / RandSF.Q (videos), with minimal modifications!




Unlike popular solutions based on dense feature maps, Object-Centric Learning (OCL) represents visual scenes as sub-symbolic object-level feature vectors, termed slots, which are highly versatile for tasks involving visual modalities. OCL typically aggregates object superpixels into slots by iteratively applying competitive cross attention, known as Slot Attention, with the slots as the query. However, once initialized, these slots are reused naively, causing redundant slots to compete with informative ones for representing objects. This often results in objects being erroneously segmented into parts. Additionally, mainstream methods derive supervision signals solely from decoding slots into a reconstruction of the input, overlooking potential supervision based on internal information. To address these issues, we propose Slot Attention with re-Initialization and self-Distillation (DIAS): (i) we reduce redundancy in the aggregated slots and re-initialize an extra aggregation to update the remaining slots; (ii) we drive the bad attention map at the first aggregation iteration to approximate the good one at the last iteration, enabling self-distillation. Experiments demonstrate that DIAS achieves state-of-the-art performance on OCL tasks like object discovery and recognition, while also improving advanced visual prediction and reasoning.
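
To make the two mechanisms above concrete, here is a minimal pure-Python sketch of one competitive aggregation step and of a first-to-last attention distillation loss. It is an illustration under stated assumptions, not the code in object_centric_bench/model/dias.py: learned projections, normalization layers and the recurrent slot update are omitted, the cross-entropy form of the loss is an assumption, and the names slot_attention_step and self_distill_loss are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def slot_attention_step(slots, feats):
    """One simplified aggregation step (hypothetical helper).

    slots: K lists of D floats (queries); feats: N lists of D floats.
    Returns (new_slots, attn), where attn[k][n] is slot k's share of feature n.
    """
    dim = len(slots[0])
    scale = 1.0 / math.sqrt(dim)
    # logits[n][k]: scaled dot-product similarity of feature n and slot k
    logits = [[scale * sum(f[d] * s[d] for d in range(dim)) for s in slots]
              for f in feats]
    # Competition: softmax over the slot axis, so slots compete for each feature
    per_feat = [softmax(row) for row in logits]
    attn = [[per_feat[n][k] for n in range(len(feats))]
            for k in range(len(slots))]
    new_slots = []
    for k in range(len(slots)):
        mass = sum(attn[k]) + 1e-8
        # Each slot becomes the attention-weighted mean of the features
        new_slots.append([
            sum(attn[k][n] * feats[n][d] for n in range(len(feats))) / mass
            for d in range(dim)
        ])
    return new_slots, attn


def self_distill_loss(attn_first, attn_last):
    """Cross-entropy pulling the first-iteration attention toward the
    last-iteration attention, which is treated as a fixed target (assumed form)."""
    num_feats = len(attn_first[0])
    loss = 0.0
    for n in range(num_feats):
        for k in range(len(attn_first)):
            loss -= attn_last[k][n] * math.log(attn_first[k][n] + 1e-8)
    return loss / num_feats


random.seed(0)
feats = [[random.gauss(0, 1) for _ in range(4)] for _ in range(6)]  # N=6, D=4
slots = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]  # K=3
attn_first = None
for it in range(3):
    slots, attn = slot_attention_step(slots, feats)
    if it == 0:
        attn_first = attn
loss = self_distill_loss(attn_first, attn)
```

In a real training setup the target attention would be detached from the gradient graph so that only the first-iteration attention is pushed toward it; this sketch only computes the loss value.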

🎉 Accepted to ACM MM 2025 as a Poster

Official implementation of the ACM MM 2025 paper "Slot Attention with Re-Initialization and Self-Distillation". Please note that the slot pruning feature, along with re-initialization, is not included.

🏆 Performance

Object Discovery Performance

Detailed results are in acc-v3.xlsx. (Encoder backbone: DINO2-S/14, at resolution 256x256/224.)

model             ari        arifg      mbo        miou
dias_r-clevrtex   80.9±0.3   79.1±0.3   63.3±0.1   61.9±0.0
dias_r-coco       25.6±0.1   41.2±0.3   31.7±0.1   30.2±0.1
dias_r-voc        30.9±0.5   33.5±0.7   43.4±0.5   42.4±0.5

For my implementation of baseline methods and their model checkpoints, please visit my repo VQ-VFM-OCL.

🌟 Highlights

⭐⭐⭐ Please check GitHub repo VQ-VFM-OCL. ⭐⭐⭐

🧭 Repo Structure

Source code.

- config-dias/          # *** configs for our DIAS ***
- object_centric_bench/
  - datum/              # dataset loading and preprocessing
  - model/              # model building
    - ...
    - dias.py           # *** for our DIAS model building ***
    - ...
  - learn/              # metrics, optimizers and callbacks
- train.py
- eval.py
- requirements.txt

Releases.

- archive-dias/      # our DIAS models and logs

🚀 Converted Datasets

Datasets ClevrTex, COCO and VOC, which are converted into LMDB format and can be used off-the-shelf, are available as releases.

🧠 Model Checkpoints & Training Logs

The checkpoints and training logs (@ random seeds 42, 43 and 44) for all models in the table above are available as releases.

  • archive-dias: model checkpoints and train/val logs of DIAS trained on the ClevrTex, Microsoft COCO and Pascal VOC datasets.
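
Since the results are reported over three random seeds, a small sketch of how per-seed metric values could be collapsed into a "mean±std" string may be useful when reading the logs. This is an assumption-laden illustration: the helper name summarize is hypothetical, the input values are made up for the example, and the use of the sample standard deviation (rather than the population one) is a guess, not something stated in this repo.

```python
import statistics


def summarize(values, ndigits=1):
    """Format per-seed metric values as 'mean±std' (hypothetical helper)."""
    mean = statistics.mean(values)
    # Assumption: sample standard deviation; the repo may use a different convention
    std = statistics.stdev(values)
    return f"{mean:.{ndigits}f}±{std:.{ndigits}f}"


# Made-up values for seeds 42, 43 and 44, for illustration only
# (NOT taken from the released logs)
per_seed = {42: 10.0, 43: 11.0, 44: 12.0}
summary = summarize(list(per_seed.values()))
print(summary)  # prints 11.0±1.0
```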

🔥 How to Use

Take DIAS on COCO as an example.

(1) Environment

To set up the environment, run:

# python 3.11
pip install -r requirements.txt

(2) Dataset

To prepare the dataset, download the Converted Datasets and unzip them to path/to/your/dataset/. Alternatively, convert them yourself by following the docs of XxxDataset.convert_dataset().

(3) Train

To train the model, run:

python train.py \
    --seed 42 \
    --cfg_file config-dias/dias_r-coco.py \
    --data_dir path/to/your/dataset \
    --save_dir save
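
To reproduce all three seeds used for the reported results (42, 43 and 44), the command above can be repeated once per seed. Below is a minimal sketch of a wrapper that does this; the helper names build_train_cmd and train_all are hypothetical, only the flags shown in the command above are used, and whether runs with different seeds can safely share one --save_dir is an assumption to verify against train.py.

```python
import subprocess
import sys


def build_train_cmd(seed,
                    cfg_file="config-dias/dias_r-coco.py",
                    data_dir="path/to/your/dataset",
                    save_dir="save"):
    """Build the train.py command line for one seed (hypothetical helper)."""
    return [
        sys.executable, "train.py",
        "--seed", str(seed),
        "--cfg_file", cfg_file,
        "--data_dir", data_dir,
        "--save_dir", save_dir,
    ]


def train_all(seeds=(42, 43, 44), dry_run=True):
    """Run, or with dry_run=True just print, one training command per seed."""
    cmds = [build_train_cmd(seed) for seed in seeds]
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing run
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    # Prints the three commands; switch to dry_run=False to launch training
    train_all(dry_run=True)
```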

(4) Evaluate

To evaluate the model, run:

python eval.py \
    --cfg_file config-dias/dias_r-coco.py \
    --data_dir path/to/your/dataset \
    --ckpt_file archive-dias/dias_r-coco/best.pth \
    --is_viz True \
    --is_img True
# object discovery accuracy values will be printed in the terminal
# object discovery visualization will be saved to ./dias_r-coco/

🤗 Contact & Support

If you have any issues with this repo or cool ideas about OCL, please do not hesitate to contact me!

If you are applying OCL (not limited to this repo) to tasks like visual question answering, visual prediction/reasoning, world modeling and reinforcement learning, let us collaborate!

📚 Citation

If you find this repo useful, please cite our work.

@article{zhao2025dias,
  title={{Slot Attention with Re-Initialization and Self-Distillation}},
  author={Zhao, Rongzhen and Zhao, Yi and Kannala, Juho and Pajarinen, Joni},
  journal={ACM MM},
  year={2025}
}
