
PolyDiffuse: Polygonal Shape Reconstruction via
Guided Set Diffusion Models

Jiacheng Chen, Ruizhi Deng, Yasutaka Furukawa

Simon Fraser University

arXiv preprint (arXiv:2306.01461), Project page

This repository provides the official implementation of the NeurIPS 2023 paper PolyDiffuse: Polygonal Shape Reconstruction via Guided Set Diffusion Models. This branch contains the code for the HD mapping task; the code for the floorplan reconstruction task is in the main branch.

The implementation of PolyDiffuse for the HD mapping task builds on the open-source works EDM and MapTR. The overall training and sampling framework follows EDM, while the folder projects is borrowed and adapted from MapTR (for the data pipeline, the denoising network architecture, and the evaluation). We thank the authors for releasing their source code.

Introduction video

polydiffuse-intro-video.mp4

Preparation

The denoising network and the data pipeline mostly follow MapTR, so both the environment installation and the dataset download largely follow the original MapTR repo.

Installation

After trying the installation instructions provided by the MapTR repo on various machines, we found the following three-step installation process to be the smoothest:

(1). Create a conda environment and activate it:

conda create -n polydiffuse-maptr python=3.8 -y
conda activate polydiffuse-maptr

(2). Install the requirements via pip while ignoring the dependencies:

pip install -r requirements.txt --no-deps

(3). Compile mmdet3d and GKT as in the original MapTR repo:

cd ./mmdetection3d
python setup.py develop
cd ..

cd ./projects/mmdet3d_plugin/maptr/modules/ops/geometric_kernel_attn
python setup.py build install

Data

Since we follow the RGB-input setting of the original MapTR paper, we only need a subset of the nuScenes dataset (i.e., the RGB captures from the six surrounding cameras and the map annotations). To simplify the data preparation and save disk space, we provide a zipped file with all the necessary data (~40GB) via this Dropbox folder link. Please download it into ./data, run cat nuscenes_maptr_processed.zip.part* > nuscenes_maptr_processed.zip to merge the small splits, and unzip. Alternatively, you can follow the guidelines provided by MapTR to download the entire dataset and run the preprocessing.
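Putting the steps together, the preparation looks roughly like the following sketch (assuming the .part* files from the Dropbox folder have already been downloaded into ./data):

mkdir -p data && cd data
# merge the downloaded splits into a single archive, then extract it
cat nuscenes_maptr_processed.zip.part* > nuscenes_maptr_processed.zip
unzip nuscenes_maptr_processed.zip
cd ..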

We run the official MapTR model to serve as one of our proposal generators. The saved results are provided via this link. Please download the file and put it under ./init_results. Alternatively, you can run MapTR yourself to generate the outputs.

Model checkpoints

Please download the following two sets of checkpoints: (1). our pretrained models, which should be put into ./training-runs and unzipped; (2). the pretrained MapTR and ResNet weights (needed for denoising training), which should be unzipped as ./ckpts.
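A minimal placement sketch (the archive names below are placeholders; substitute the actual file names of the downloaded checkpoints):

mkdir -p training-runs ckpts
# (1) our pretrained models go under ./training-runs
unzip pretrained_models.zip -d ./training-runs    # placeholder archive name
# (2) pretrained MapTR and ResNet weights go under ./ckpts
unzip maptr_resnet_ckpts.zip -d ./ckpts           # placeholder archive name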

The final file structure should be as follows:

data
├── can_bus
├── nuscenes
│   ├── samples
│   ├── maps
│   └── ...

ckpts   # checkpoints for initializing the denoising network
├── resnet50-19c8e357.pth
└── maptr_tiny_r50_110e.pth

init_results
└── maptr_test.json  # MapTR test results (serving as initial proposals)

training-runs
└── nuscenes_pretrained_ckpts
    ├── guide/...    # checkpoints of the guidance network
    └── denoise/...  # checkpoints of the denoising network

Testing

The testing pipeline consists of sampling, visualization, and quantitative evaluation.

Sampling & visualization

First, run the sampling for all the test examples by:

CUDA_VISIBLE_DEVICES=0 bash scripts/sample.sh

The default setting uses MapTR results as the initial proposals; set --proposal_type=rough_annot to instead initialize from mimicked rough annotations. Set --viz_results=True to visualize the predictions, and set --viz_gif=True to also render a per-step GIF animation for each test sample.
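As an illustration only (the entry-point name below is a placeholder; the actual command and its remaining arguments live inside scripts/sample.sh), these flags would be combined like this:

# illustrative sketch; in practice, edit the corresponding arguments inside scripts/sample.sh
# --proposal_type=rough_annot : initialize from mimicked rough annotations instead of MapTR results
# --viz_results=True          : save visualizations of the predictions
# --viz_gif=True              : also render a per-step GIF animation for each test sample
python sample.py --proposal_type=rough_annot --viz_results=True --viz_gif=True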

Note that the default parameters in the script assume the use of the pretrained checkpoints. If you re-train the models, remember to update the parameters accordingly.

Evaluation

Evaluate the results via:

bash scripts/eval_map.sh

The argument --results_path should point to the output of the sampling script. The argument --consider_angle is turned on to take angle-level correctness into account, as discussed in our paper. Without this argument, the evaluation is the same as the original mAP with the Chamfer-only matching criterion.
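A hypothetical sketch of the two arguments (the entry-point name and the results path are placeholders; in practice, edit them inside scripts/eval_map.sh):

# illustrative sketch; in practice, edit the corresponding arguments inside scripts/eval_map.sh
# --results_path   : path to the output of the sampling script (placeholder below)
# --consider_angle : enable the angle-level matching criterion; omit it for Chamfer-only mAP
python eval_map.py --results_path=<path/to/sampling/results> --consider_angle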

Training

The training of PolyDiffuse consists of two separate stages: 1) guidance training and 2) denoising training.

Guidance training

Train the guidance network by:

bash scripts/train_guide.sh

The training of the guidance network takes around an hour on a single NVIDIA RTX A5000 GPU.

Denoising training

Then train the denoising network by:

bash scripts/train.sh

Note that the path to the guidance network trained in the first stage needs to be set properly via the argument --guide_ckpt. On our machine, the training takes around 45 hours with 8 NVIDIA RTX A5000 GPUs, or around 65 hours with 4 GPUs.
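For example (illustrative only; the entry-point name and the checkpoint path are placeholders, and the remaining training arguments already present in scripts/train.sh are omitted):

# point --guide_ckpt at the guidance-network checkpoint produced by scripts/train_guide.sh
# (the entry-point name and the path below are placeholders)
python train.py --guide_ckpt=<path/to/guidance/checkpoint>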

Acknowledgements

This research is partially supported by NSERC Discovery Grants with Accelerator Supplements and DND/NSERC Discovery Grant Supplement, NSERC Alliance Grants, and John R. Evans Leaders Fund (JELF). We thank the Digital Research Alliance of Canada and BC DRI Group for providing computational resources.

Citation

If you find PolyDiffuse helpful in your work, please consider starring 🌟 the repo and citing it by:

@article{Chen2023PolyDiffuse,
  title={PolyDiffuse: Polygonal Shape Reconstruction via Guided Set Diffusion Models},
  author={Jiacheng Chen and Ruizhi Deng and Yasutaka Furukawa},
  journal={ArXiv},
  year={2023},
  volume={abs/2306.01461}
}
