[arXiv] | [PDF]
NEWS: 🔥 3D-GRES is accepted at ACM MM 2024 (Oral)! 🔥
Changli Wu, Yihang Liu, Jiayi Ji, Yiwei Ma, Haowei Wang, Gen Luo, Henghui Ding, Xiaoshuai Sun, Rongrong Ji

3D Referring Expression Segmentation (3D-RES) is dedicated to segmenting a specific instance within a 3D space based on a natural language description. However, current approaches are limited to segmenting a single target, restricting the versatility of the task. To overcome this limitation, we introduce Generalized 3D Referring Expression Segmentation (3D-GRES), which extends the capability to segment any number of instances based on natural language instructions. In addressing this broader task, we propose the Multi-Query Decoupled Interaction Network (MDIN), designed to break down multi-object segmentation tasks into simpler, individual segmentations. MDIN comprises two fundamental components: Text-driven Sparse Queries (TSQ) and Multi-object Decoupling Optimization (MDO). TSQ generates sparse point cloud features distributed over key targets as the initialization for queries. Meanwhile, MDO is tasked with assigning each target in multi-object scenarios to different queries while maintaining their semantic consistency. To adapt to this new task, we build a new dataset, namely Multi3DRes. Our comprehensive evaluations on this dataset demonstrate substantial enhancements over existing models, thus charting a new path for intricate multi-object 3D scene comprehension.
Requirements
- Python 3.7 or higher
- PyTorch 1.12
- CUDA 11.3 or higher
The following installation assumes python=3.8, pytorch=1.12.1, and cuda=11.3.
Create a conda virtual environment
conda create -n 3d-gres python=3.8
conda activate 3d-gres
Clone the repository
git clone https://github.com/sosppxo/MDIN.git
Install the dependencies
Install PyTorch 1.12.1
pip install spconv-cu113
conda install pytorch-scatter -c pyg
# or: pip install https://data.pyg.org/whl/torch-1.12.0%2Bcu113/torch_scatter-2.0.9-cp38-cp38-linux_x86_64.whl
pip install -r requirements.txt
Install segmentator from this repo (we wrap the segmentator in ScanNet).
Setup: install mdin and pointgroup_ops.
sudo apt-get install libsparsehash-dev
python setup.py develop
cd gres_model/lib/
python setup.py develop
Compile pointnet++
cd pointnet2
python setup.py install --user
cd ..
Download the ScanNet v2 dataset.
Put the downloaded scans folder as follows.
MDIN
├── data
│   ├── scannetv2
│   │   ├── scans
Split and preprocess point cloud data
cd data/scannetv2
bash prepare_data.sh
The script splits the data into train/val folders and preprocesses it. After running the script, the ScanNet dataset structure should look like below.
MDIN
├── data
│   ├── scannetv2
│   │   ├── scans
│   │   ├── train
│   │   └── val
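The train/val split can be pictured with a minimal Python sketch. The function below, and the idea of routing scenes by membership in the official ScanNet split lists, are illustrative assumptions, not the actual logic of prepare_data.sh:

```python
def split_scenes(scene_ids, train_ids, val_ids):
    """Route each scene directory to 'train' or 'val' by split membership."""
    train_ids, val_ids = set(train_ids), set(val_ids)
    assignment = {}
    for scene in scene_ids:
        if scene in train_ids:
            assignment[scene] = "train"
        elif scene in val_ids:
            assignment[scene] = "val"
        # scenes in neither split list are simply skipped
    return assignment

print(split_scenes(
    ["scene0000_00", "scene0011_00"],
    train_ids=["scene0000_00"],
    val_ids=["scene0011_00"],
))  # {'scene0000_00': 'train', 'scene0011_00': 'val'}
```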
Download ScanRefer annotations following the instructions.
In the original ScanRefer annotations, all ann_id within each scene were individually assigned based on the corresponding object_id, resulting in duplicate ann_id. We have modified the ScanRefer annotations, and the revised annotation data, where each ann_id within a scene is unique, can be accessed here.
Put the downloaded ScanRefer folder as follows.
MDIN
├── data
│   ├── ScanRefer
│   │   ├── ScanRefer_filtered_train_new.json
│   │   └── ScanRefer_filtered_val_new.json
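The ann_id renumbering described above can be sketched in a few lines of Python. The field names (scene_id, object_id, ann_id) follow ScanRefer's JSON schema, but this is an illustrative sketch, not the script that produced the *_new.json files:

```python
def reassign_ann_ids(annotations):
    """Renumber annotations so ann_id is unique within each scene.

    In the original ScanRefer files, ann_id restarts per object_id,
    so (scene_id, ann_id) alone does not identify an annotation.
    """
    counters = {}
    fixed = []
    for ann in annotations:
        scene = ann["scene_id"]
        new_id = counters.get(scene, 0)
        counters[scene] = new_id + 1
        fixed.append({**ann, "ann_id": new_id})
    return fixed

anns = [
    {"scene_id": "scene0000_00", "object_id": 3, "ann_id": 0},
    {"scene_id": "scene0000_00", "object_id": 7, "ann_id": 0},  # duplicate ann_id
]
print([a["ann_id"] for a in reassign_ann_ids(anns)])  # [0, 1]
```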
Download the Multi3DRefer annotations.
Put the downloaded Multi3DRefer folder as follows.
MDIN
├── data
│   ├── Multi3DRefer
│   │   ├── multi3drefer_train.json
│   │   └── multi3drefer_val.json
There are some typos in the original annotations; please correct them according to Issue #6 to prevent syntax-parsing errors.
Alternatively, download the modified Multi3DRefer (New).
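A Multi3DRefer record maps one description to zero, one, or several target objects, which is what makes the 3D-GRES setting "generalized". Assuming each record carries an object_ids list (a guess at the schema based on the task description, not a documented field), bucketing the annotations by target count might look like:

```python
def bucket_by_target_count(records):
    """Count zero-, single-, and multi-target referring expressions."""
    buckets = {"zero": 0, "single": 0, "multi": 0}
    for rec in records:
        n = len(rec["object_ids"])
        buckets["zero" if n == 0 else "single" if n == 1 else "multi"] += 1
    return buckets

records = [
    {"description": "the chairs by the window", "object_ids": [4, 5]},
    {"description": "the red couch", "object_ids": [2]},
    {"description": "the blue piano", "object_ids": []},  # nothing matches
]
print(bucket_by_target_count(records))  # {'zero': 1, 'single': 1, 'multi': 1}
```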
Download the ReferIt3D annotations and convert the .csv files into .json files consistent with the Multi3DRefer format.
Put the downloaded ReferIt3D folder as follows.
MDIN
├── data
│   ├── ReferIt3D
│   │   ├── sr3d_train.json
│   │   ├── sr3d_val.json
│   │   ├── nr3d_train.json
│   │   └── nr3d_val.json
Alternatively, download the modified ReferIt3D (.json).
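One way to do the .csv-to-.json conversion with only the standard library is sketched below. The column names (scan_id, target_id, utterance) match the public ReferIt3D CSVs and the output fields mimic the Multi3DRefer files above, but treat both schemas as assumptions rather than a definitive converter:

```python
import csv
import io

def referit3d_csv_to_records(csv_text):
    """Convert ReferIt3D-style CSV rows into Multi3DRefer-style dicts."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append({
            "scene_id": row["scan_id"],
            "object_ids": [int(row["target_id"])],  # one target per ReferIt3D row
            "description": row["utterance"],
        })
    return records

sample = "scan_id,target_id,utterance\nscene0011_00,7,the chair near the door\n"
print(referit3d_csv_to_records(sample))
```

The resulting list can be written out with `json.dump` to produce files like sr3d_train.json in the layout shown above.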
Download the SPFormer pretrained model (we only use the Sparse 3D U-Net backbone for training).
Move the pretrained model to backbones.
mkdir backbones
mv ${Download_PATH}/sp_unet_backbone.pth backbones/
Download the pretrained models and move them to checkpoints.
| Benchmark | Task | mIoU | Acc@0.25 | Acc@0.5 | Model |
|---|---|---|---|---|---|
| Multi3DRes | 3D-GRES | 47.5 | 66.9 | 44.7 | Model |
| ScanRefer | 3D-RES | 48.3 | 58.0 | 53.1 | Model |
| Nr3D | 3D-RES | 38.6 | 48.4 | 42.2 | Model |
| Sr3D | 3D-RES | 46.4 | 56.6 | 51.3 | Model |
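The table's metrics are the standard ones for referring segmentation: mean IoU over all samples, plus the fraction of samples whose IoU clears a threshold (0.25 and 0.5). A minimal sketch of how such numbers are derived from per-sample IoU scores (illustrative, not the repo's evaluation code):

```python
def summarize_ious(ious, thresholds=(0.25, 0.5)):
    """Return (mIoU, {threshold: accuracy}) from per-sample IoU scores."""
    if not ious:
        raise ValueError("need at least one IoU value")
    miou = sum(ious) / len(ious)
    # accuracy at t = fraction of samples with IoU >= t
    accs = {t: sum(iou >= t for iou in ious) / len(ious) for t in thresholds}
    return miou, accs

miou, accs = summarize_ious([0.2, 0.3, 0.6])
print(round(miou, 2), {t: round(a, 2) for t, a in accs.items()})
# 0.37 {0.25: 0.67, 0.5: 0.33}
```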
Training
For 3D-GRES:
bash scripts/train_3dgres.sh
For 3D-RES:
bash scripts/train_3dres.sh
Evaluation
For 3D-GRES:
bash scripts/test_3dgres.sh
For 3D-RES:
bash scripts/test_3dres.sh
If you find this work useful in your research, please cite:
@misc{wu20243dgresgeneralized3dreferring,
title={3D-GRES: Generalized 3D Referring Expression Segmentation},
author={Changli Wu and Yihang Liu and Jiayi Ji and Yiwei Ma and Haowei Wang and Gen Luo and Henghui Ding and Xiaoshuai Sun and Rongrong Ji},
year={2024},
eprint={2407.20664},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2407.20664},
}
Sincere thanks to the ReLA, M3DRef-CLIP, EDA, SceneGraphParser, SoftGroup, SSTNet, and SPFormer repos. This repo is built upon them.
