[CVPR 2025] MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Anticipation
We propose a novel MANTA (MAmba for ANTicipation) network for stochastic long-term dense anticipation. Our model enables effective long-term temporal modelling even for very long sequences while maintaining linear complexity in sequence length.
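For intuition, the sketch below shows how a stack of Mamba blocks (from the `mamba_ssm` package installed in the environment setup below) processes a long feature sequence at a cost linear in its length. This is a toy illustration of the underlying building block, not the MANTA architecture itself; all names and sizes are hypothetical:

```python
import torch
from mamba_ssm import Mamba


class MambaBackbone(torch.nn.Module):
    """Toy temporal backbone: a pre-norm residual stack of Mamba blocks."""

    def __init__(self, dim: int = 256, depth: int = 4):
        super().__init__()
        self.blocks = torch.nn.ModuleList(Mamba(d_model=dim) for _ in range(depth))
        self.norms = torch.nn.ModuleList(torch.nn.LayerNorm(dim) for _ in range(depth))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, length, dim); runtime grows linearly with `length`,
        # unlike the quadratic cost of self-attention.
        for norm, block in zip(self.norms, self.blocks):
            x = x + block(norm(x))
        return x


# Mamba's fused kernels require a CUDA device.
model = MambaBackbone().cuda()
features = torch.randn(2, 4096, 256, device="cuda")  # a very long sequence
out = model(features)  # (2, 4096, 256)
```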
Here is the overview of our proposed model:
If you find this code or our model useful, please cite our paper:
```bibtex
@inproceedings{zatsarynna2025manta,
author = {Olga Zatsarynna and Emad Bahrami and Yazan Abu Farha and Gianpiero Francesca and Juergen Gall},
title = {MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Action Anticipation},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2025},
}
```

To create the conda environment, run the following commands:

```bash
# create and activate the conda environment
conda env create --name manta --file docker/env.yml
source activate manta
# install mamba
cd docker/VideoMamba
# causal conv
cd causal-conv1d
python setup.py develop
cd ..
# mamba
cd mamba
python setup.py develop
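# optional: verify the builds import correctly (assumes the packages
# install as `mamba_ssm` and `causal_conv1d`)
python -c "import mamba_ssm, causal_conv1d; print('Mamba build OK')"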
cd ..
```

The features and annotations of the Breakfast dataset can be downloaded from link 1 or link 2.
Follow the instructions at Assembly101-Download-Scripts to download the TSM features.
We converted the .lmdb features to numpy arrays for faster loading; a sketch of such a conversion is shown below.
The coarse annotations can be downloaded from assembly101-annotations.
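For reference, here is a minimal sketch of the lmdb-to-numpy conversion mentioned above. It assumes each lmdb value holds a raw float32 feature matrix keyed by video name with a known feature dimension (`feat_dim`); the actual key/value layout of the Assembly101 features may differ:

```python
import lmdb
import numpy as np
from pathlib import Path


def lmdb_to_numpy(lmdb_path: str, out_dir: str, feat_dim: int = 2048) -> None:
    """Dump every record of an lmdb database to a separate .npy file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    env = lmdb.open(lmdb_path, readonly=True, lock=False)
    with env.begin() as txn:
        for key, value in txn.cursor():
            # assumed layout: raw float32 bytes of shape (num_frames, feat_dim)
            feats = np.frombuffer(value, dtype=np.float32).reshape(-1, feat_dim)
            np.save(out / f"{key.decode()}.npy", feats)
    env.close()
```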
To train MANTA, run:

```bash
bash scripts/prob/train_<dataset>_prob.sh
```
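For example, assuming the scripts follow this naming pattern, training on the Breakfast dataset would be:

```bash
bash scripts/prob/train_breakfast_prob.sh
```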
To evaluate MANTA, run:

```bash
bash scripts/prob/predict_<dataset>_prob.sh
```

This will print the evaluation results of the final model and save the final predictions
into the ./diff_results directory.
With the final predictions saved, you can re-run the evaluation faster using the following script:

```bash
bash scripts/prob/predict_precomputed_<dataset>_prob.sh
```

Make sure to update the paths (features and annotations) in the above scripts to match your system. To change the training and evaluation splits (for the Breakfast dataset), as well as the values of other hyper-parameters, modify the scripts accordingly.
Our code makes use of the following repositories: VideoMamba, VideoMambaSuite, and LTC. We sincerely thank the authors for their codebases!
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.