# MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining
Official PyTorch implementation of MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining
- Accepted by CVPR 2025
- arXiv: https://arxiv.org/pdf/2410.00871
| Model | Pretrained Checkpoint |
|---|---|
| MAP-Tiny | map_tiny_patch16_224 |
| MAP-Small | map_small_patch16_224 |
| MAP-Base | map_base_patch16_224 |
| MAP-Large | map_large_patch16_224 |
| MAP-Huge | map_huge_patch16_224 |
*We also provide a 600-epoch MAP-Huge checkpoint.
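As a quick reference, here is a minimal sketch of loading one of the listed checkpoints for inference. This is not the repo's official loading code: the checkpoint file name and the top-level `model` key are assumptions (following the common MAE-style checkpoint layout), and the model name must first be registered with timm by importing this repo's model file.

```python
# Hypothetical usage sketch, not the repo's official loading code.
# Assumes this repo's model file has been imported so that
# "map_tiny_patch16_224" is registered with timm, and that the checkpoint
# stores weights under a top-level "model" key (MAE-style layout).
import torch
from timm.models import create_model

model = create_model("map_tiny_patch16_224")
ckpt = torch.load("map_tiny_patch16_224.pth", map_location="cpu")  # assumed file name
model.load_state_dict(ckpt.get("model", ckpt), strict=False)
model.eval()

with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))  # one 224x224 RGB image
print(logits.shape)
```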
```bash
# torch>=2.0, cuda>=11.8
pip install timm==0.4.12 mlflow==2.9.1
pip install causal-conv1d==1.1.0
pip install mamba-ssm==1.1.1
```
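After installing, a quick sanity check can confirm that the Mamba CUDA kernels built correctly. This is a minimal sketch assuming a CUDA-capable GPU; `Mamba` is the core SSM block shipped with the `mamba-ssm` package, and the shapes below are illustrative rather than tied to any MAP configuration.

```python
# Minimal environment sanity check (assumes a CUDA-capable GPU);
# the shapes below are illustrative, not tied to any MAP config.
import torch
from mamba_ssm import Mamba  # core SSM block from the mamba-ssm package

block = Mamba(d_model=192).to("cuda")
x = torch.randn(2, 196, 192, device="cuda")  # (batch, tokens, dim)
print(block(x).shape)  # expected: torch.Size([2, 196, 192])
```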
```bash
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
    --model mambar_tiny_patch16_224 --batch 1024 \
    --lr 5e-4 --weight-decay 0.05 \
    --data-path in1k --output_dir ./output \
    --epochs 1600 --input-size 224
```

*For downstream fine-tuning, we strictly follow the recipes of other baselines to ensure fair comparisons: for the Mamba-based framework, please refer to Mamba-r, and for the Transformer-based framework, please refer to MAE.
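For reference, below is a hedged sketch of how pretrained MAP weights might be loaded before fine-tuning, following the MAE/Mamba-r convention of dropping the pretraining head so the classifier is re-initialized. The checkpoint path, the `model` key, and the `head.` prefix are assumptions, not guaranteed by this repo.

```python
# Hypothetical fine-tuning setup sketch; the checkpoint layout and key names
# ("model", "head.") follow the MAE/Mamba-r convention and are assumptions.
import torch
from timm.models import create_model

model = create_model("mambar_tiny_patch16_224", num_classes=1000)
ckpt = torch.load("map_tiny_patch16_224.pth", map_location="cpu")  # assumed path
state_dict = ckpt.get("model", ckpt)
# Drop the pretraining head so the classifier is trained from scratch.
state_dict = {k: v for k, v in state_dict.items() if not k.startswith("head.")}
msg = model.load_state_dict(state_dict, strict=False)
print(msg.missing_keys)  # typically just the freshly initialized head
```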
The code for 3D tasks is mostly derived from Mamba3D, so you should first follow the Mamba3D instructions to set up the environment and run its training code.
We have open-sourced the key files for the MAP model implementation and pretraining; you can copy them directly into the Mamba3D/models directory to enable the MAP framework and its pretraining.
- Release MAP applications in point cloud video/robotics
- Release the VideoMAP code
- Release the VideoMAP paper
- Release the weights of MAP-Large and MAP-Huge
- Release the model and pretrain code for the 3D Task
- Release MAP training code and weights
- Release the MAP paper
We thank the authors of MAE, Mamba-r, MambaVision, ARM, and Mamba3D for their contributions to the field and for inspiring this project.
```bibtex
@article{liu2024map,
  title={MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining},
  author={Liu, Yunze and Yi, Li},
  journal={arXiv preprint arXiv:2410.00871},
  year={2024}
}
```

If you have any questions or would like to discuss a potential collaboration, please feel free to contact me at [email protected].
