baaivision/Uni3D

We present Uni3D, a unified and scalable 3D pretraining framework for large-scale 3D representation learning, and explore its limits at the scale of one billion parameters. Uni3D uses a 2D-initialized ViT, pretrained end-to-end, to align 3D point cloud features with image-text aligned features. Thanks to this simple architecture and pretext task, Uni3D can use abundant 2D pretrained models as initialization and image-text aligned models as the target, bringing the potential of 2D models and scaling-up strategies to the 3D world. We efficiently scale Uni3D up to one billion parameters and set new records on a broad range of 3D tasks.
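The overview does not spell out the training objective, so the following is only a minimal sketch of one common way to "align" two feature spaces: a symmetric, CLIP-style contrastive loss. The function names, the temperature value, and the pure-Python list representation are all assumptions made for illustration and are not taken from the Uni3D code.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def mean_cross_entropy(logits, targets):
    """Average softmax cross-entropy over rows, computed with a stable log-sum-exp."""
    total = 0.0
    for row, target in zip(logits, targets):
        row_max = max(row)
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[target]
    return total / len(logits)

def contrastive_alignment_loss(point_feats, target_feats, temperature=0.07):
    """Symmetric contrastive loss between paired point-cloud and target features.

    point_feats[i] and target_feats[i] are assumed to describe the same object.
    The temperature of 0.07 is a common CLIP-style choice, assumed here.
    """
    points = [l2_normalize(v) for v in point_feats]
    targets = [l2_normalize(v) for v in target_feats]
    # logits[i][j] = scaled cosine similarity between point i and target j
    logits = [
        [sum(a * b for a, b in zip(p, t)) / temperature for t in targets]
        for p in points
    ]
    logits_transposed = [list(col) for col in zip(*logits)]
    labels = list(range(len(points)))  # the matching pair sits on the diagonal
    return 0.5 * (
        mean_cross_entropy(logits, labels)
        + mean_cross_entropy(logits_transposed, labels)
    )

# Matching pairs give a loss near zero; swapped pairs give a large loss.
aligned = contrastive_alignment_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_alignment_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

In a real setup the target features would come from a frozen image-text model and the point features from the 3D encoder being trained; the toy vectors above only show how the loss behaves.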

Schedule

We are committed to open-sourcing Uni3D related materials, including:

  • An extension of Uni3D to a 3D metric (Uni3D-score) for enhanced semantic coherence in text-to-3D tasks. For details, see GeoDream.
  • Model weights ranging from 6M to 1B parameters
  • Evaluation code
  • Evaluation data
  • Pretraining code
  • Pretraining data

We hope to foster the growth of our community through open-sourcing and promoting collaboration👬. Let's step towards multimodal intelligence together🍻.

Installation

Clone this repository and install the required packages:

git clone https://github.com/baaivision/Uni3D.git
cd Uni3D

conda create -n uni3d python=3.8
conda activate uni3d
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

pip install -r requirements.txt

# install pointnet2 extensions from https://github.com/erikwijmans/Pointnet2_PyTorch
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"
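After running the commands above, a quick import check can confirm that the environment is usable. This is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the import name `pointnet2_ops` is assumed from the `egg=` name in the pip command.

```python
import importlib.util

def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

# Import names assumed from the install commands above.
required = ["torch", "torchvision", "torchaudio", "pointnet2_ops"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

The check only looks up modules without importing them, so it does not verify that the CUDA extensions were compiled correctly.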

Core packages:

Model Zoo

| Model | Training Data | Objaverse-LVIS Top1 (Top5) | ModelNet40 Top1 (Top5) | ScanObjectNN Top1 (Top5) |
| --- | --- | --- | --- | --- |
| Uni3d-B | Ensembled w/o LVIS | 45.9 (74.8) | 86.1 (98.7) | 61.7 (89.5) |
| Uni3d-B | Ensembled | 51.7 (80.8) | 86.3 (97.9) | 63.8 (90.2) |
| Uni3d-L | Ensembled w/o LVIS | 46.2 (74.7) | 86.6 (97.8) | 58.4 (90.1) |
| Uni3d-L | Ensembled | 53.1 (81.5) | 86.3 (98.3) | 58.2 (89.4) |
| Uni3d-g | Ensembled w/o LVIS | 47.2 (76.1) | 86.8 (98.4) | 66.5 (90.1) |
| Uni3d-g | Ensembled | 53.5 (82.0) | 87.3 (99.2) | 63.9 (91.7) |
| Uni3d-g 🔥 | Ensembled | 55.3 (82.9) | 88.2 (99.3) | 65.3 (92.7) |
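The Top1 (Top5) figures in the table are top-k accuracies in percent. As a reminder of what that metric measures, here is a minimal sketch with toy scores; the function name and the data are made up for illustration and this is not the repository's evaluation code.

```python
def topk_accuracy(scores, labels, k=1):
    """Percentage of samples whose true label is among the k highest-scoring classes.

    scores: one list of per-class scores for each sample
    labels: the true class index for each sample
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)

# Toy example: 3 samples, 3 classes.
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
toy_labels = [1, 1, 0]
print(round(topk_accuracy(toy_scores, toy_labels, k=1), 1))
print(round(topk_accuracy(toy_scores, toy_labels, k=2), 1))
```

Top-5 accuracy is always at least as high as Top-1 on the same predictions, which matches the pattern in every row of the table.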

Evaluation of Zero-shot 3D classification

We evaluate the zero-shot 3D classification performance on three datasets: Objaverse-LVIS, ModelNet40 and ScanObjectNN.

  1. Please refer to DATASETS.md for evaluation dataset preparation.
  2. [Recommended 🤗] Download the CLIP model and put it in the /path/to/clip_model folder.
  3. Download model zoo weights and put them in /path/to/checkpoints folder.
  4. Run bash scripts/inference.sh [scale] to evaluate the model on the above datasets, e.g., bash scripts/inference.sh giant.
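The steps above run the repository's own scripts. For readers unfamiliar with the idea, zero-shot classification with CLIP-style models is typically done by comparing an object embedding against text embeddings of the class names and picking the closest one. The sketch below illustrates that general idea with hand-written toy vectors; the function names and embeddings are hypothetical, and nothing here reflects the actual logic of `scripts/inference.sh`.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(point_embedding, class_text_embeddings):
    """Pick the class whose text embedding is most similar to the point-cloud embedding.

    class_text_embeddings: dict mapping class name -> embedding vector
    Returns (best_class_name, per-class similarity scores).
    """
    scores = {
        name: cosine_similarity(point_embedding, emb)
        for name, emb in class_text_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Toy embeddings; in practice these would come from a text encoder and a 3D encoder.
toy_text_embeddings = {
    "chair": [1.0, 0.0, 0.0],
    "table": [0.0, 1.0, 0.0],
    "lamp": [0.0, 0.0, 1.0],
}
predicted, similarities = zero_shot_classify([0.9, 0.2, 0.1], toy_text_embeddings)
print(predicted)
```

No classifier is trained on the target dataset in this scheme: adding a new class only requires a text embedding for its name.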

Pre-training

  1. Please refer to DATASETS.md for pre-train dataset preparation.
  2. [Recommended 🤗] Download the CLIP model and put it in the /path/to/clip_model folder.
  3. [Recommended 🤗] Download the initialization model and put it in /path/to/init_model folder.
  4. Run bash scripts/pretrain.sh to pre-train the model on ensemble datasets.

Visualization

Open-world Understanding

[Figure: scene]

One-shot Part Segmentation

[Figure: partseg]

Point Cloud Painting

[Figure: editing]

Cross-modal Retrieval

[Figure: retrieval_text]

[Figure: retrieval]

Acknowledgement

Uni3D is built using the awesome EVA, OpenCLIP, timm, DeepSpeed, ULIP and OpenShape.

This work is supported by the National Science and Technology Major Project on New Generation Artificial Intelligence (No. 2022ZD0116314).

Citation

@inproceedings{zhou2023uni3d,
  title={Uni3d: Exploring unified 3d representation at scale},
  author={Zhou, Junsheng and Wang, Jinsheng and Ma, Baorui and Liu, Yu-Shen and Huang, Tiejun and Wang, Xinlong},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2024}
}

About

[ICLR'24 Spotlight] Uni3D: 3D Visual Representation from BAAI
