
TreeSBA

This repository is the official implementation of our ECCV 2024 paper:

TreeSBA: Tree-Transformer for Self-Supervised Sequential Brick Assembly
Mengqi Guo, Chen Li, Yuyang Zhao, Gim Hee Lee

Project Website | arXiv

Abstract

Inferring step-wise actions to assemble 3D objects with primitive bricks from images is a challenging task due to complex constraints and the vast number of possible combinations. Recent studies have demonstrated promising results on sequential LEGO brick assembly through the utilization of LEGO-Graph modeling to predict sequential actions. However, existing approaches are class-specific and require significant computational and 3D annotation resources. In this work, we first propose a computationally efficient breadth-first search (BFS) LEGO-Tree structure to model the sequential assembly actions by considering connections between consecutive layers. Based on the LEGO-Tree structure, we then design a class-agnostic tree-transformer framework to predict the sequential assembly actions from the input multi-view images. A major challenge of the sequential brick assembly task is that the step-wise action labels are costly and tedious to obtain in practice. We mitigate this problem by leveraging synthetic-to-real transfer learning. Specifically, our model is first pre-trained on synthetic data with full supervision from the available action labels. We then circumvent the requirement for action labels in the real data by proposing an action-to-silhouette projection that replaces action labels with input image silhouettes for self-supervision. Without any annotation on the real data, our model outperforms existing methods with 3D supervision by 7.8% and 11.3% in mIoU on the MNIST and ModelNet Construction datasets, respectively.
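
For intuition, here is a minimal NumPy sketch of the action-to-silhouette idea: an assembly rasterized into a voxel grid is projected into binary silhouettes, which can then serve as the self-supervision target in place of action labels. The function names, brick dimensions, and grid size below are illustrative only, not the repository's API.

import numpy as np

def voxelize_assembly(brick_positions, grid=(32, 32, 32), brick=(2, 1, 4)):
    # Rasterize a list of (x, y, z) brick origins into a boolean occupancy
    # grid. Brick extents here are illustrative; the real datasets define
    # their own brick types and grid resolution.
    vox = np.zeros(grid, dtype=bool)
    bx, by, bz = brick
    for x, y, z in brick_positions:
        vox[x:x + bx, y:y + by, z:z + bz] = True
    return vox

def silhouettes(vox):
    # Project the occupancy grid along each axis to get binary silhouettes.
    # Comparing these against the input image silhouettes is the idea
    # behind the self-supervision described above.
    return [vox.max(axis=a) for a in range(3)]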

1. Installation

Clone the repository.

git clone git@github.com:dreamguo/TreeSBA.git

Create and activate the conda environment.

conda env create -f environment.yml
conda activate TreeSBA
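
A quick sanity check that the environment sees your GPU (assuming PyTorch is included in environment.yml, which the training commands below suggest):

import torch  # assumption: PyTorch ships with the conda environment

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))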

2. Usage

Download the RAD, MNIST-C, and ModelNet-C data used in the paper here and put them under the ./dataset folder.

Organize the ./dataset folder as follows:

├── dataset
│   ├── graph_dat
│   │   ├── random13to18.dat   # RAD-S dataset
│   │   ├── random15to50.dat   # RAD dataset
│   │   ├── random.dat         # RAD-1k dataset
│   │   └── ...
│   ├── tree_actions
│   │   ├── random13to18       # RAD-S dataset
│   │   └── ...
│   ├── tree_dep_actions
│   │   ├── random13to18       # RAD-S dataset
│   │   └── ...
│   ├── voxel
│   │   ├── mnist_all          # MNIST-C dataset
│   │   ├── modelnet_all       # ModelNet-C40 dataset
│   │   └── ...
│   └── voxel_img
│       ├── random13to18       # RAD-S dataset
│       └── ...
├── ...
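
A quick, optional way to verify the layout is a small script like the following (the helper is hypothetical, not part of the repository):

from pathlib import Path

# Hypothetical helper: verify the expected sub-folders exist before training.
EXPECTED = ["graph_dat", "tree_actions", "tree_dep_actions", "voxel", "voxel_img"]

def check_dataset(root="dataset"):
    missing = [d for d in EXPECTED if not (Path(root) / d).is_dir()]
    if missing:
        raise FileNotFoundError(f"missing dataset folders: {missing}")
    print("dataset layout looks good")

check_dataset()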

Training.

Pre-train on the RAD dataset.

CUDA_VISIBLE_DEVICES=0 python run.py --config configs/RAD.txt

Pre-train on the RAD-S dataset.

CUDA_VISIBLE_DEVICES=0 python run.py --config configs/RAD-S.txt

Fine-tune on the MNIST-C dataset.

CUDA_VISIBLE_DEVICES=0 python run.py --config configs/MNIST-C.txt

Fine-tune on the ModelNet-C3 dataset.

CUDA_VISIBLE_DEVICES=0 python run.py --config configs/ModelNet-C.txt
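
run.py's internals are not documented here; for orientation, a minimal sketch of an entry point accepting the flags used in this README might look like this (only the flag names --config, --inference, --save_obj, and --load_model_path come from the commands in this README; everything else is an assumption):

import argparse

# Hypothetical sketch, not the repository's actual argument parser.
parser = argparse.ArgumentParser()
parser.add_argument("--config", type=str, required=True)
parser.add_argument("--inference", type=int, default=0)
parser.add_argument("--save_obj", type=int, default=0)
parser.add_argument("--load_model_path", type=str, default=None)
args = parser.parse_args()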

We also provide pre-trained checkpoints here; please put them under the ./pretrained_model folder.

Test.

Test on the MNIST-C dataset.

CUDA_VISIBLE_DEVICES=0 python run.py --config configs/MNIST-C.txt --inference 1 --save_obj 1 --load_model_path pretrained_model/mnist_all.pt

Test on the ModelNet-C3 dataset.

CUDA_VISIBLE_DEVICES=0 python run.py --config configs/ModelNet-C.txt --inference 1 --save_obj 1 --load_model_path pretrained_model/modelnet_all3.pt

Test on the ModelNet-C40 dataset.

CUDA_VISIBLE_DEVICES=0 python run.py --config configs/ModelNet-C.txt --inference 1 --save_obj 1 --load_model_path pretrained_model/modelnet_all40.pt
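
Results are reported as mIoU over voxelized assemblies. For reference, a self-contained NumPy version of that metric (written here independently of the repository's evaluation code) is:

import numpy as np

def voxel_iou(pred, gt):
    # IoU between two binary occupancy grids.
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both empty: treat as a perfect match
    return np.logical_and(pred, gt).sum() / union

def mean_iou(pairs):
    # mIoU over an iterable of (predicted, ground-truth) voxel pairs.
    return float(np.mean([voxel_iou(p, g) for p, g in pairs]))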

3. Building the dataset from scratch

python prepare_dataset/prepareRAD.py

Our code for generating the LEGO dataset is based on Combinatorial-3D-Shape-Generation.
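
As a rough illustration of that style of data generation, here is a toy sketch (ours, not theirs) that grows a random assembly brick by brick, accepting a brick only if it is collision-free and supported:

import random
import numpy as np

def random_assembly(n_bricks=15, grid=(32, 32, 32), brick=(2, 1, 4)):
    # Toy generator: each accepted brick must rest on the ground or overlap
    # a brick in the layer below. The real generator enforces the full LEGO
    # connectivity constraints; this is a simplified stand-in.
    vox = np.zeros(grid, dtype=bool)
    bx, by, bz = brick
    placed = 0
    while placed < n_bricks:
        x = random.randrange(grid[0] - bx + 1)
        y = random.randrange(grid[1] - by + 1)  # y is "up" in this sketch
        z = random.randrange(grid[2] - bz + 1)
        if vox[x:x + bx, y:y + by, z:z + bz].any():
            continue  # collision with an existing brick
        supported = y == 0 or vox[x:x + bx, y - 1, z:z + bz].any()
        if supported:
            vox[x:x + bx, y:y + by, z:z + bz] = True
            placed += 1
    return vox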

4. Citation

If you make use of our work, please cite our paper:

@inproceedings{Guo2024TreeSBA,
  author    = {Guo, Mengqi and Li, Chen and Zhao, Yuyang and Lee, Gim Hee},
  title     = {TreeSBA: Tree-Transformer for Self-Supervised Sequential Brick Assembly},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2024},
}

5. Acknowledgements

This work is based on GenerativeLEGO and Combinatorial-3D-Shape-Generation. If you use this code in your research, please also acknowledge their work.
