Slot Attention (SA) and its variants lie at the heart of mainstream Object-Centric Learning (OCL). Objects in an image can be aggregated into respective slot vectors by \textit{iteratively} refining cold-start query vectors, typically three times, via SA on image features. For video, such aggregation is \textit{recurrently} shared across frames, with queries cold-started on the first frame and transitioned from the previous frame's slots on non-first frames. However, the cold-start queries lack sample-specific cues and thus hinder precise aggregation on the image or the video's first frame; also, non-first frames' queries are already sample-specific and thus require transforms different from the first frame's aggregation. We address these issues for the first time with our \textit{SmoothSA}: (1) to smooth SA iterations on the image or the video's first frame, we \textit{preheat} the cold-start queries with rich information from the input features, via a tiny module self-distilled inside OCL; (2) to smooth SA recurrences across all video frames, we \textit{differentiate} the homogeneous transforms on the first and non-first frames, by using full and single iterations respectively. Comprehensive experiments on object discovery, object recognition and downstream benchmarks validate our method's effectiveness. Further analyses intuitively illuminate how our method smooths SA iterations and recurrences.
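The two ideas above can be sketched in a toy NumPy snippet. This is an illustration only, not the paper's implementation: `preheat_queries`, the mean pooling, the projection `W`, and all shapes are made up for the example; the paper's preheat module is a tiny self-distilled network.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def slot_attention_step(slots, feats, eps=1e-8):
    # attention normalized over slots, so slots compete for features
    attn = softmax(feats @ slots.T, axis=1)            # (N, K)
    attn = attn / (attn.sum(axis=0, keepdims=True) + eps)
    return attn.T @ feats                              # weighted-mean update, (K, D)

def preheat_queries(feats, W, K):
    # toy "preheat": derive K sample-specific queries from pooled input features
    pooled = feats.mean(axis=0)                        # (D,)
    return np.tanh(pooled @ W).reshape(K, -1)          # (K, D)

rng = np.random.default_rng(0)
N, D, K = 16, 8, 3
feats = rng.normal(size=(N, D))
W = rng.normal(size=(D, K * D)) * 0.1

# first frame: preheated (not cold-start) queries, full iterations (typically 3)
slots = preheat_queries(feats, W, K)
for _ in range(3):
    slots = slot_attention_step(slots, feats)

# non-first frames: slots carried over are already sample-specific,
# so a single iteration is used instead of the full count
slots = slot_attention_step(slots, feats)
print(slots.shape)  # (3, 8)
```

The point of the sketch is the asymmetry: the first frame runs the full iteration count from preheated queries, while later frames reuse the previous slots and run one iteration.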
Official source code, model checkpoints and training logs for paper "Smoothing Slot Attention Iterations and Recurrences".
Object discovery accuracy (input resolution 256×256 (224×224); DINO2 ViT-S/14 is used for encoding): numbers are detailed in acc-v3.xlsx.
Object discovery visualization:
Object recognition accuracy: Numbers are detailed in acc-recogn-v3.xlsx.
⭐⭐⭐ Please check GitHub repo VQ-VFM-OCL. ⭐⭐⭐
- config-smoothsa/ # *** configs for our SmoothSA ***
- config-spot/ # configs for baseline SPOT
- object_centric_bench/
- datum/ # dataset loading and preprocessing
- model/ # model building
- ...
- smoothsa.py # *** for our SmoothSA model building ***
- randsfq.py # *** for our SmoothSA model building ***
- ...
- learn/ # metrics, optimizers and callbacks
- train.py
- eval.py
- requirements.txt
- archive-smoothsa/ # *** our SmoothSA model checkpoints and training logs ***
- archive-spot/ # baseline model checkpoints and training logs
- archive-recogn/ # object recognition models based on SmoothSA and SPOT

Datasets ClevrTex, COCO, VOC, MOVi-C, MOVi-D and YTVIS, which are converted into LMDB format and can be used off-the-shelf, are available as below.
- dataset-clevrtex: converted dataset ClevrTex.
- dataset-coco: converted dataset COCO.
- dataset-voc: converted dataset VOC.
- dataset-movi_c: converted dataset MOVi-C.
- dataset-movi_d: converted dataset MOVi-D.
- dataset-ytvis: converted dataset YTVIS, the high-quality version.
The checkpoints and training logs (@ random seeds 42, 43 and 44) for all models are available as releases. All backbones are unified as DINO2-S/14.
- archive-smoothsa: Our SmoothSA trained on datasets ClevrTex, COCO, VOC, MOVi-C/D and YTVIS.
- Model checkpoints and training logs of our own method.
- archive-spot: SPOT on ClevrTex, COCO and VOC.
- My implementation of paper SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers, CVPR 2024.
- For other image OCL baselines, SLATE, DINOSAUR, SlotDiffusion and DIAS, please check repos VQ-VFM-OCL and DIAS.
- For other video OCL baselines, VideoSAUR, SlotContrast and RandSF.Q, please check repo RandSF.Q.
- archive-recogn: Object recognition models based on our SmoothSA and baseline SPOT, trained on datasets COCO and YTVIS.
- Slots extracted by SmoothSA or SPOT are matched with ground-truth object segmentations under a matching threshold, and the matched slots are used to train category classification and bounding box regression.
- For other object recognition baselines, RandSF.Q and SlotContrast, please check repo RandSF.Q.
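The matching step above can be sketched roughly as follows. This is a hypothetical toy, not the repo's code: the repo only says slots are matched "by some threshold", and here that is illustrated with mask IoU; `match_slots` and its greedy pairing are assumptions for the example.

```python
import numpy as np

def iou(a, b):
    # a, b: boolean masks of the same shape
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0

def match_slots(slot_masks, gt_masks, thresh=0.5):
    # pair each ground-truth mask with its best-IoU slot, if above the threshold;
    # matched slots would then supervise classification / bbox regression heads
    matches = {}
    for g, gt in enumerate(gt_masks):
        ious = [iou(sm, gt) for sm in slot_masks]
        best = int(np.argmax(ious))
        if ious[best] >= thresh:
            matches[best] = g  # slot index -> ground-truth object index
    return matches

# toy 4x4 masks: slot 0 overlaps GT 0 exactly, slot 1 overlaps nothing
slot_masks = [np.zeros((4, 4), bool), np.zeros((4, 4), bool)]
slot_masks[0][:2, :2] = True
gt_masks = [np.zeros((4, 4), bool)]
gt_masks[0][:2, :2] = True
print(match_slots(slot_masks, gt_masks))  # {0: 0}
```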
Take SmoothSA on COCO as an example.
(1) Environment
To set up the environment, run:
# python 3.11
pip install -r requirements.txt

(2) Dataset
To prepare the dataset, download Converted Datasets and unzip them to path/to/your/dataset/. Or convert them yourself following the XxxDataset.convert_dataset() docs.
(3) Train
To train the model, run:
python train.py \
--seed 42 \
--cfg_file config-smoothsa/smoothsa_r-coco.py \
--data_dir path/to/your/dataset \
--save_dir save

(4) Evaluate
To evaluate the model, run:
python eval.py \
--cfg_file config-smoothsa/smoothsa_r-coco.py \
--data_dir path/to/your/dataset \
--ckpt_file archive-smoothsa/smoothsa_r-coco/best.pth \
--is_viz True \
--is_img True
# object discovery accuracy values will be printed in the terminal
# object discovery visualization will be saved to ./smoothsa_r-coco/

If you have any issues on this repo or cool ideas on OCL, please do not hesitate to contact me!
- page: https://genera1z.github.io
- email: [email protected], [email protected]
If you are applying OCL (not limited to this repo) to tasks like visual question answering, visual prediction/reasoning, world modeling and reinforcement learning, let us collaborate!
My further research works on OCL can be found in my repos or my academic page.
If you find this repo useful, please cite our work.
@article{zhao2025smoothsa,
title={{Smoothing Slot Attention Iterations and Recurrences}},
author={Zhao, Rongzhen and Yang, Wenyan and Kannala, Juho and Pajarinen, Joni},
journal={arXiv:2508.05417},
year={2025}
}




