# MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining
Official PyTorch implementation of MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining
- Accepted by CVPR 2025
- arXiv: https://arxiv.org/pdf/2410.00871
| Model | Pretrained Checkpoint |
|---|---|
| MAP-Tiny | map_tiny_patch16_224 |
| MAP-Small | map_small_patch16_224 |
| MAP-Base | map_base_patch16_224 |
| MAP-Large | map_large_patch16_224 |
| MAP-Huge | map_huge_patch16_224 |
*We also provide a 600-epoch MAP-Huge checkpoint.
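As a quick reference, here is a minimal sketch of loading one of the listed checkpoints for inference. This is not the repo's official loading code: the checkpoint file name and the top-level `model` key are assumptions (following the common MAE-style checkpoint layout), and the model name must first be registered with timm by importing this repo's model file.

```python
# Hypothetical usage sketch, not the repo's official loading code.
# Assumes this repo's model file has been imported so that
# "map_tiny_patch16_224" is registered with timm, and that the checkpoint
# stores weights under a top-level "model" key (MAE-style layout).
import torch
from timm.models import create_model

model = create_model("map_tiny_patch16_224")
ckpt = torch.load("map_tiny_patch16_224.pth", map_location="cpu")  # assumed file name
model.load_state_dict(ckpt.get("model", ckpt), strict=False)
model.eval()

with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))  # one 224x224 RGB image
print(logits.shape)
```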
```bash
# torch>=2.0, cuda>=11.8
pip install timm==0.4.12 mlflow==2.9.1
pip install causal-conv1d==1.1.0
pip install mamba-ssm==1.1.1
```
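After installing, a quick sanity check can confirm that the Mamba CUDA kernels built correctly. This is a minimal sketch assuming a CUDA-capable GPU; `Mamba` is the core SSM block shipped with the `mamba-ssm` package, and the shapes below are illustrative rather than tied to any MAP configuration.

```python
# Minimal environment sanity check (assumes a CUDA-capable GPU);
# the shapes below are illustrative, not tied to any MAP config.
import torch
from mamba_ssm import Mamba  # core SSM block from the mamba-ssm package

block = Mamba(d_model=192).to("cuda")
x = torch.randn(2, 196, 192, device="cuda")  # (batch, tokens, dim)
print(block(x).shape)  # expected: torch.Size([2, 196, 192])
```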
```bash
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
    --model mambar_tiny_patch16_224 --batch 1024 \
    --lr 5e-4 --weight-decay 0.05 \
    --data-path in1k --output_dir ./output \
    --epochs 1600 --input-size 224
```

*For downstream fine-tuning, we strictly follow the recipes of other baselines to ensure fair comparisons: for the Mamba-based framework, please refer to Mamba-r, and for the Transformer-based framework, please refer to MAE.
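For reference, below is a hedged sketch of how pretrained MAP weights might be loaded before fine-tuning, following the MAE/Mamba-r convention of dropping the pretraining head so the classifier is re-initialized. The checkpoint path, the `model` key, and the `head.` prefix are assumptions, not guaranteed by this repo.

```python
# Hypothetical fine-tuning setup sketch; the checkpoint layout and key names
# ("model", "head.") follow the MAE/Mamba-r convention and are assumptions.
import torch
from timm.models import create_model

model = create_model("mambar_tiny_patch16_224", num_classes=1000)
ckpt = torch.load("map_tiny_patch16_224.pth", map_location="cpu")  # assumed path
state_dict = ckpt.get("model", ckpt)
# Drop the pretraining head so the classifier is trained from scratch.
state_dict = {k: v for k, v in state_dict.items() if not k.startswith("head.")}
msg = model.load_state_dict(state_dict, strict=False)
print(msg.missing_keys)  # typically just the freshly initialized head
```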
The code for 3D tasks is mostly derived from Mamba3D, so you should first follow the Mamba3D instructions to set up the environment and run its training code.
We have open-sourced the key files for the MAP model implementation and pretraining; you can copy them directly into the Mamba3D/models directory to enable the MAP framework and its pretraining.
- Release MAP applications in point cloud video/robotics
- Release the VideoMAP code
- Release the VideoMAP paper
- Release the weights of MAP-Large and MAP-Huge
- Release the model and pretrain code for the 3D Task
- Release MAP training code and weights
- Release the MAP paper
We thank the authors of MAE, Mamba-r, MambaVision, ARM, and Mamba3D for their contributions to the field and for inspiring this project.
```bibtex
@article{liu2024map,
  title={MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining},
  author={Liu, Yunze and Yi, Li},
  journal={arXiv preprint arXiv:2410.00871},
  year={2024}
}
```

If you have any questions or would like to discuss a potential collaboration, please feel free to contact me at [email protected].
