MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining

Official PyTorch implementation of MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining


Models

Model      Pretrained Checkpoint
MAP-Tiny   map_tiny_patch16_224
MAP-Small  map_small_patch16_224
MAP-Base   map_base_patch16_224
MAP-Large  map_large_patch16_224
MAP-Huge   map_huge_patch16_224

*We also provide a 600-epoch MAP-Huge checkpoint.
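
For reference, a minimal loading sketch is below. It assumes the repo registers the model names above with timm (as DeiT-style codebases do) and that each checkpoint is a .pth file storing weights under a "model" key, as in MAE; adjust to the repo's actual constructor and checkpoint layout.

# Minimal loading sketch. The timm model name and the checkpoint file
# name/keys are assumptions, not the repo's confirmed API.
import torch
import timm

model = timm.create_model("map_base_patch16_224", pretrained=False)
state = torch.load("map_base_patch16_224.pth", map_location="cpu")
missing, unexpected = model.load_state_dict(state.get("model", state), strict=False)
print("missing:", missing, "unexpected:", unexpected)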

Environments

# torch>=2.0, cuda>=11.8
pip install timm==0.4.12 mlflow==2.9.1
pip install causal-conv1d==1.1.0
pip install mamba-ssm==1.1.1
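
After installing, a quick sanity check (a minimal sketch: the Mamba block API below is the one exposed by mamba-ssm 1.x, and it requires a CUDA device):

# Sanity-check that mamba-ssm and causal-conv1d built correctly (needs a CUDA GPU).
import torch
from mamba_ssm import Mamba

block = Mamba(d_model=192, d_state=16, d_conv=4, expand=2).cuda()
x = torch.randn(2, 196, 192, device="cuda")  # (batch, tokens, channels), e.g. 14x14 patches
print(block(x).shape)  # expected: torch.Size([2, 196, 192])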

Training

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
    --model mambar_tiny_patch16_224 --batch 1024 \
    --lr 5e-4 --weight-decay 0.05 \
    --data-path in1k --output_dir ./output \
    --epochs 1600 --input-size 224
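
*Note: with torch>=2.0, python -m torch.distributed.launch is deprecated in favor of torchrun; launching with torchrun --nproc_per_node=8 main.py ... is equivalent (torchrun always sets the environment variables that --use_env requests).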

*For downstream fine-tuning, we strictly follow the recipes of prior baselines to ensure fair comparisons: for the Mamba-based framework, please refer to Mamba-r, and for the Transformer-based framework, please refer to MAE.
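
For readers new to the objective, the sketch below illustrates the general shape of masked autoregressive pretraining: mask most patch tokens, encode the visible ones, and predict the masked patches with a decoder. It is illustrative only; every name is made up, and the actual MAP design (mask layout, decoding order, loss) is specified in the paper.

# Illustrative-only sketch of a masked autoregressive objective; NOT the repo's code.
import torch
import torch.nn.functional as F

def masked_autoregressive_loss(encoder, decoder, patches, mask_ratio=0.75):
    # patches: (B, N, D) patch embeddings for a batch of images.
    B, N, D = patches.shape
    num_keep = int(N * (1 - mask_ratio))
    perm = torch.rand(B, N, device=patches.device).argsort(dim=1)  # random mask per image
    keep_idx, mask_idx = perm[:, :num_keep], perm[:, num_keep:]
    visible = torch.gather(patches, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
    latent = encoder(visible)          # encode only the visible tokens
    pred = decoder(latent, mask_idx)   # predict masked tokens (causally, in the AR case)
    target = torch.gather(patches, 1, mask_idx.unsqueeze(-1).expand(-1, -1, D))
    return F.mse_loss(pred, target)    # regress the masked patch content

# Trivial stand-ins, just to show the call shape:
enc = lambda v: v
dec = lambda z, idx: z.new_zeros(z.shape[0], idx.shape[1], z.shape[-1])
print(masked_autoregressive_loss(enc, dec, torch.randn(2, 196, 192)))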

PointMAP: MAP for 3D Tasks

The code for 3D tasks is mostly derived from Mamba3D, so you should first follow the Mamba3D instructions to set up the environment and run its training code.

We have open-sourced the key files for the MAP model implementation and pretraining. You can copy them directly into the Mamba3D/models directory to enable the MAP framework and MAP pretraining.

TODO

  • Release MAP applications in point cloud video/robotics
  • Release the VideoMAP code
  • Release the VideoMAP paper
  • Release the weights of MAP-Large and MAP-Huge
  • Release the model and pretrain code for the 3D Task
  • Release MAP training code and weights
  • Release the MAP paper

Acknowledgements

We thank the authors of MAE, Mamba-r, MambaVision, ARM, and Mamba3D for their contributions to the field and for inspiring this project.

Citation

@article{liu2024map,
  title={MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining},
  author={Liu, Yunze and Yi, Li},
  journal={arXiv preprint arXiv:2410.00871},
  year={2024}
}

Contact

If you have any questions, or are interested in discussion or potential collaboration, please feel free to contact me at [email protected].
