Junsheng Zhou<sup>1,2*</sup>, Jinsheng Wang<sup>1*</sup>, Baorui Ma<sup>1*</sup>, Yu-Shen Liu<sup>2</sup>, Tiejun Huang<sup>1,3</sup>, Xinlong Wang<sup>1</sup>
<sup>1</sup>BAAI, <sup>2</sup>THU, <sup>3</sup>PKU
* Equal Contribution
ICLR 2024 (Spotlight)
We present Uni3D, a unified and scalable 3D pretraining framework for large-scale 3D representation learning, and explore its limits at the scale of one billion parameters. Uni3D uses a 2D-initialized ViT, pretrained end-to-end, to align 3D point-cloud features with image-text aligned features. Thanks to this simple architecture and pretext task, Uni3D can leverage abundant 2D pretrained models as initialization and image-text aligned models as targets, unlocking the great potential of 2D models and scaling-up strategies for the 3D world. We efficiently scale Uni3D up to one billion parameters and set new records on a broad range of 3D tasks.
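To make the pretext task concrete, here is a minimal, unofficial sketch of the idea: point patches are embedded into tokens, processed by transformer blocks taken from a 2D-pretrained ViT, and pooled into a feature that is projected into the CLIP embedding space. `PointPatchEmbed`, the naive chunk-based grouping, and all dimensions are illustrative assumptions, not the released architecture.

```python
import torch
import torch.nn as nn
import timm


class PointPatchEmbed(nn.Module):
    """Illustrative point-patch tokenizer: fixed-size chunks of xyz+rgb points
    are embedded and max-pooled into ViT tokens (a naive stand-in for the
    real point grouping)."""

    def __init__(self, group_size=32, embed_dim=768):
        super().__init__()
        self.group_size = group_size
        self.mlp = nn.Sequential(nn.Linear(6, 128), nn.GELU(),
                                 nn.Linear(128, embed_dim))

    def forward(self, pts):                            # pts: (B, N, 6)
        B, N, C = pts.shape
        groups = pts.view(B, N // self.group_size, self.group_size, C)
        return self.mlp(groups).max(dim=2).values      # (B, N/group, embed_dim)


class Uni3DSketch(nn.Module):
    """2D-initialized ViT blocks on point tokens, projected into CLIP space."""

    def __init__(self, clip_dim=512):
        super().__init__()
        # pretrained=True would pull the 2D initialization from timm.
        vit = timm.create_model("vit_base_patch16_224", pretrained=False)
        self.patch_embed = PointPatchEmbed(embed_dim=vit.embed_dim)
        self.blocks, self.norm = vit.blocks, vit.norm
        self.proj = nn.Linear(vit.embed_dim, clip_dim)  # into CLIP space

    def forward(self, pts):
        x = self.patch_embed(pts)
        for blk in self.blocks:
            x = blk(x)
        return self.proj(self.norm(x).mean(dim=1))     # (B, clip_dim)


feats = Uni3DSketch()(torch.randn(2, 1024, 6))         # 2 clouds, 1024 points each
print(feats.shape)                                     # torch.Size([2, 512])
```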
We are committed to open-sourcing Uni3D-related materials, including:
- An extension of Uni3D to a 3D metric (Uni3D-score) for enhanced semantic coherence in text-to-3D tasks; see GeoDream for details.
- Model weights ranging from 6M to 1B parameters.
- Evaluation code
- Evaluation data
- Pretraining code
- Pretraining data
We hope to foster the growth of our community through open-sourcing and promoting collaboration👬. Let's step towards multimodal intelligence together🍻.
Clone this repository and install the required packages:
```bash
git clone https://github.com/baaivision/Uni3D.git
cd Uni3D
conda create -n uni3d python=3.8
conda activate uni3d
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt

# install the pointnet2 extensions from https://github.com/erikwijmans/Pointnet2_PyTorch
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"
```
Core packages (a quick version check follows below):
- PyTorch 2.0.1
- open-clip-torch 2.20.0
- timm 0.9.7
- DeepSpeed 0.10.3
- Open3D 0.17.0
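The snippet below (a suggestion; the names are the PyPI distribution names) prints the installed versions so you can compare them against the pins above:

```python
from importlib.metadata import version  # stdlib since Python 3.8

# Print installed versions of the core packages listed above.
for pkg in ("torch", "open-clip-torch", "timm", "deepspeed", "open3d"):
    print(pkg, version(pkg))
```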
Zero-shot classification accuracy (%), reported as Top1 (Top5):

| Model | Training Data | Objaverse-LVIS | ModelNet40 | ScanObjectNN |
|---|---|---|---|---|
| Uni3D-B | Ensembled w/o LVIS | 45.9 (74.8) | 86.1 (98.7) | 61.7 (89.5) |
| Uni3D-B | Ensembled | 51.7 (80.8) | 86.3 (97.9) | 63.8 (90.2) |
| Uni3D-L | Ensembled w/o LVIS | 46.2 (74.7) | 86.6 (97.8) | 58.4 (90.1) |
| Uni3D-L | Ensembled | 53.1 (81.5) | 86.3 (98.3) | 58.2 (89.4) |
| Uni3D-g | Ensembled w/o LVIS | 47.2 (76.1) | 86.8 (98.4) | 66.5 (90.1) |
| Uni3D-g | Ensembled | 53.5 (82.0) | 87.3 (99.2) | 63.9 (91.7) |
| Uni3D-g 🔥 | Ensembled | 55.3 (82.9) | 88.2 (99.3) | 65.3 (92.7) |
We evaluate the zero-shot 3D classification performance on three datasets: Objaverse-LVIS, ModelNet40 and ScanObjectNN.
- Please refer to DATASETS.md for evaluation dataset preparation.
- [Recommended 🤗] Download the CLIP model and put it in the `/path/to/clip_model` folder.
- Download the model zoo weights and put them in the `/path/to/checkpoints` folder.
- Run `bash scripts/inference.sh [scale]` to evaluate the model on the above datasets, e.g., `bash scripts/inference.sh giant`; a conceptual sketch of the zero-shot scoring follows below.
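Concretely, zero-shot classification embeds the category names with a CLIP text encoder and picks the category whose text feature has the highest cosine similarity with the Uni3D point-cloud feature. The sketch below is illustrative only: it uses a small OpenCLIP model and a random stand-in for the point-cloud feature, whereas the official pipeline uses the EVA-CLIP weights and the pretrained point encoder loaded by `scripts/inference.sh`.

```python
import torch
import open_clip

# Small OpenCLIP text tower for illustration only; the official pipeline
# pairs Uni3D with the EVA-CLIP weights from the model zoo.
model, _, _ = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

categories = ["chair", "airplane", "guitar"]
tokens = tokenizer([f"a 3D model of a {c}" for c in categories])
with torch.no_grad():
    text_feat = model.encode_text(tokens)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

# Random stand-in for the Uni3D point-cloud feature (hypothetical here); the
# real feature comes from the pretrained point encoder.
pc_feat = torch.randn(1, text_feat.shape[-1])
pc_feat = pc_feat / pc_feat.norm(dim=-1, keepdim=True)

logits = 100.0 * pc_feat @ text_feat.T          # scaled cosine similarity
print("predicted:", categories[logits.argmax(dim=-1).item()])
```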
- Please refer to DATASETS.md for pre-training dataset preparation.
- [Recommended 🤗] Download the CLIP model and put it in the `/path/to/clip_model` folder.
- [Recommended 🤗] Download the initialization model and put it in the `/path/to/init_model` folder.
- Run `bash scripts/pretrain.sh` to pre-train the model on the ensembled datasets; a sketch of the contrastive objective follows below.
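Conceptually, the pretext task is a CLIP-style contrastive loss that pulls each object's 3D feature toward its paired image and text features from a frozen image-text model. The sketch below uses random placeholder tensors and an assumed temperature of 0.07; it illustrates the objective rather than the exact training code:

```python
import torch
import torch.nn.functional as F


def clip_style_loss(a, b, temperature=0.07):
    """Symmetric InfoNCE between two feature sets of shape (B, D)."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.T / temperature
    labels = torch.arange(a.shape[0])  # matched pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.T, labels))


B, D = 8, 512
pc_feat = torch.randn(B, D, requires_grad=True)   # from the Uni3D point encoder
img_feat = torch.randn(B, D)                      # frozen CLIP image features
txt_feat = torch.randn(B, D)                      # frozen CLIP text features

loss = clip_style_loss(pc_feat, img_feat) + clip_style_loss(pc_feat, txt_feat)
loss.backward()
print("loss:", loss.item())
```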
Uni3D is built using the awesome EVA, OpenCLIP, timm, DeepSpeed, ULIP and OpenShape.
This work is supported by the National Science and Technology Major Project of New Generation Artificial Intelligence (No. 2022ZD0116314).
If you find Uni3D useful in your research, please consider citing:

```bibtex
@inproceedings{zhou2023uni3d,
  title={Uni3D: Exploring Unified 3D Representation at Scale},
  author={Zhou, Junsheng and Wang, Jinsheng and Ma, Baorui and Liu, Yu-Shen and Huang, Tiejun and Wang, Xinlong},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2024}
}
```