Jiacheng Chen , Ruizhi Deng , Yasutaka Furukawa
Simon Fraser University
ArXiv preprint (arXiv 2306.01461), Project page
This repository provides the official implementation of the paper PolyDiffuse: Polygonal Shape Reconstruction via Guided Set Diffusion Models. This branch contains the code for the HD mapping task; the code for the floorplan reconstruction task is in the main branch.
The implementation of PolyDiffuse for the HD mapping task builds on the open-source works EDM and MapTR. The overall training and sampling framework follows EDM, while the projects folder is borrowed and adapted from MapTR (for the data pipeline, the denoising network architecture, and evaluation). We thank the authors for releasing their source code.
The denoising network and the data pipeline mostly follow MapTR, so for both the environment installation and the dataset downloads, please refer to the original MapTR repo.
After trying the installation instructions provided by the MapTR repo on various machines, we found the following three-step installation process to be the smoothest:
(1). Create a conda environment and activate it:
conda create -n polydiffuse-maptr python=3.8 -y
conda activate polydiffuse-maptr

(2). Install the requirements via pip while ignoring the dependencies:

pip install -r requirements.txt --no-dependencies

(3). Compile mmdet3d and GKT as in the original MapTR:
cd ./mmdetection3d
python setup.py develop
cd ..
cd ./projects/mmdet3d_plugin/maptr/modules/ops/geometric_kernel_attn
python setup.py build install

Since we follow the RGB input setting of the original MapTR paper, we only need a subset of the nuScenes dataset (i.e., the RGB captures from the six surrounding cameras and the map annotations). To simplify data preparation and save disk space, we provide a zipped file with all necessary data (~40GB) via this Dropbox folder link. Please download it into ./data, run cat nuscenes_maptr_processed.zip.part* > nuscenes_maptr_processed.zip to merge the small splits, and unzip. Alternatively, you can follow the guidelines provided by MapTR to download the entire dataset and run the preprocessing yourself.
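For reference, the data-preparation steps above could look roughly like the following (this assumes the downloaded .part files already sit in ./data; adjust paths as needed):

# Merge the split archives and extract the prepared nuScenes subset
cd ./data
cat nuscenes_maptr_processed.zip.part* > nuscenes_maptr_processed.zip
unzip nuscenes_maptr_processed.zip
cd ..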
We run the official MapTR model to serve as one of our proposal generators. The saved results are provided at this link. Please download the file and put it under ./init_results. Alternatively, you can run MapTR yourself to produce the outputs.
Please download the following two checkpoints:
(1). Our pretrained models: put the file into ./training-runs and unzip it.
(2). Pretrained MapTR and ResNet weights (needed for the denoising training): unzip the file as ./ckpts, as sketched below.
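As a rough sketch of where the downloads go (the archive names below are placeholders, not the actual file names):

# Initial MapTR proposals
mkdir -p init_results
mv maptr_test.json init_results/

# Our pretrained guidance/denoising checkpoints (placeholder archive name)
mkdir -p training-runs
unzip <pretrained_models>.zip -d training-runs/

# Pretrained MapTR and ResNet weights (placeholder archive name)
unzip <ckpts>.zip   # should produce ./ckpts/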
The final file structure should be as follows:
data
├── can_bus
└── nuscenes
    ├── samples
    ├── maps
    └── ...
ckpts  # checkpoints for initializing the denoising network
├── resnet50-19c8e357.pth
└── maptr_tiny_r50_110e.pth
init_results
└── maptr_test.json  # MapTR test results (serving as initial proposals)
training-runs
└── nuscenes_pretrained_ckpts
    ├── guide/...    # checkpoints of the guidance network
    └── denoise/...  # checkpoints of the denoising network
The testing pipeline consists of sampling, visualization, and quantitative evaluation.
First, run the sampling for all the test examples by:
CUDA_VISIBLE_DEVICES=0 bash scripts/sample.sh
The default setting uses MapTR results as the initial proposals; set the argument --proposal_type=rough_annot to initialize with mimicked rough annotations instead. Set --viz_results=True to visualize the predictions, and set --viz_gif=True to additionally render a per-step GIF animation for each test sample.
Note that the default parameters in the script assume the use of the provided pretrained checkpoints. If you re-train the models, remember to adjust the arguments accordingly.
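For example, assuming scripts/sample.sh forwards extra flags to the underlying sampling entry point (otherwise, edit the corresponding variables inside the script), the variants above could be invoked roughly as:

# Default: initialize from MapTR proposals
CUDA_VISIBLE_DEVICES=0 bash scripts/sample.sh

# Initialize from mimicked rough annotations and save visualizations
CUDA_VISIBLE_DEVICES=0 bash scripts/sample.sh --proposal_type=rough_annot --viz_results=True

# Additionally render a per-step GIF animation for each test sample
CUDA_VISIBLE_DEVICES=0 bash scripts/sample.sh --proposal_type=rough_annot --viz_results=True --viz_gif=True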
Evaluate the results via:
bash scripts/eval_map.sh
The argument --results_path should point to the output of the sampling script. The argument --consider_angle turns on the angle-level correctness check discussed in our paper; without this argument, the evaluation reduces to the original mAP with the Chamfer-only matching criterion.
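A hypothetical invocation (the exact output path and flag syntax depend on the sampling script and on eval_map.sh; check the scripts on your setup) might look like:

# <path/to/sampling/outputs> is whatever scripts/sample.sh produced
bash scripts/eval_map.sh --results_path <path/to/sampling/outputs> --consider_angle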
The training of PolyDiffuse consists of two separate stages: 1) guidance training and 2) denoising training.
Train the guidance network by:
bash scripts/train_guide.sh
The training of the guidance network takes around an hour on a single NVIDIA RTX A5000 GPU.
Then train the denoising network by:
bash scripts/train.sh
Note that the path to the guidance network trained in the first stage needs to be set properly with the argument --guide_ckpt. On our machines, the training takes around 45 hours with 8 NVIDIA RTX A5000 GPUs, or around 65 hours with 4.
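As a sketch, assuming scripts/train.sh accepts the flag on the command line (otherwise, edit the corresponding variable inside the script):

# <path/to/guide_ckpt> is the checkpoint produced by scripts/train_guide.sh
bash scripts/train.sh --guide_ckpt <path/to/guide_ckpt>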
This research is partially supported by NSERC Discovery Grants with Accelerator Supplements and DND/NSERC Discovery Grant Supplement, NSERC Alliance Grants, and John R. Evans Leaders Fund (JELF). We thank the Digital Research Alliance of Canada and BC DRI Group for providing computational resources.
If you find PolyDiffuse helpful in your work, please consider starring 🌟 the repo and citing it:
@article{Chen2023PolyDiffuse,
title={PolyDiffuse: Polygonal Shape Reconstruction via Guided Set Diffusion Models},
author={Jiacheng Chen and Ruizhi Deng and Yasutaka Furukawa},
journal={ArXiv},
year={2023},
volume={abs/2306.01461}
}