Ayca Takmaz1,2,†, Alexandros Delitzas1, Robert W. Sumner1, Francis Engelmann1,2,3,*, Johanna Wald2,*, Federico Tombari2
1ETH Zurich, 2Google, 3Stanford
†work done as an intern at Google Zurich
*equal supervision
Search3D is an approach that builds a hierarchical open-vocabulary 3D scene representation, enabling search for entities at varying levels of granularity: fine-grained object parts, entire objects, or regions described by attributes such as materials.
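To make "search at varying levels of granularity" concrete, here is a minimal, self-contained sketch of how a text query can be matched against per-object and per-part features with cosine similarity. All names, shapes, and the random stand-in features are illustrative only and not the repository's API; in practice the features come from an open-vocabulary image-text model such as SigLIP.

import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for precomputed open-vocabulary features: one row per object,
# one row per object part, plus a mapping from parts to their parent object.
object_feats = rng.standard_normal((10, 768)).astype(np.float32)
part_feats = rng.standard_normal((40, 768)).astype(np.float32)
part_to_object = rng.integers(0, 10, size=40)

def cosine_scores(query, feats):
    """Cosine similarity between a single query vector and a feature matrix."""
    query = query / np.linalg.norm(query)
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return feats @ query

# The query feature would come from the text encoder matching the scene features
# (e.g. a SigLIP text tower) applied to a prompt such as "wooden chair leg";
# here a random vector stands in for it.
query_feat = rng.standard_normal(768).astype(np.float32)

best_object = int(np.argmax(cosine_scores(query_feat, object_feats)))
best_part = int(np.argmax(cosine_scores(query_feat, part_feats)))
print("object-level match:", best_object)
print("part-level match:", best_part, "on object", int(part_to_object[best_part]))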
Below, we outline the steps for setting up the environment and installing the necessary packages.
conda create -n search3d python=3.10
conda activate search3d
pip install -e .  # install the current repository in editable mode
pip install numpy==1.26
pip install --upgrade "jax[cuda12_pip]==0.4.26" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
pip install -r search3d/mask_feature_computation/big_vision_siglip/big_vision/requirements.txt

# you can verify that the installed jax and tensorflow can indeed access the GPUs in Python with the following test:
from jax.lib import xla_bridge
print(xla_bridge.get_backend().platform)
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))pip install numpy==1.26 torch==1.12.1 torchvision==0.13.1 -f https://download.pytorch.org/whl/cu113/torch_stable.html
pip install trimesh open3d imageio open-clip-torch
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
pip install git+https://github.com/cocodataset/panopticapi.git
pip install opencv-python transformers hydra-core omegaconf kornia
cd search3d/dense_feature_computation/semantic_sam/Mask2Former/mask2former/modeling/pixel_decoder/ops
sh make.sh

Step 5. Installing the packages required for Segmentator (geometric oversegmentation with graph cut)
pip install numba
cd search3d/object_and_part_computation/segmentator/csrc
mkdir build && cd build
export CUDA_BIN_PATH=/usr/local/cuda-11.7
cmake .. \
-DCMAKE_PREFIX_PATH=`python -c 'import torch;print(torch.utils.cmake_prefix_path)'` \
-DPYTHON_INCLUDE_DIR=$(python -c "from distutils.sysconfig import get_python_inc; print(get_python_inc())") \
-DPYTHON_LIBRARY=$(python -c "import distutils.sysconfig as sysconfig; print(sysconfig.get_config_var('LIBDIR'))") \
-DCMAKE_INSTALL_PREFIX=`python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())'`
make && make install

TBD

You can download all necessary checkpoints for the underlying submodules (SigLIP, SemanticSAM, etc.) from this Google Drive folder. Once you have downloaded and unpacked the checkpoints into a folder, you can link that directory to the resources folder in this repository as follows:
mkdir resources
ln -s /path/to/folder/with/the/downloaded/checkpoints resources

TBD

Search3D consists of several components that we run in order to compute object masks, object features, segments, and segment features. At the moment, the merging of the segments and the hierarchical search are performed directly in our evaluation scripts; we plan to integrate them into this codebase in the near future. Below, we outline how to compute the masks and features for the MultiScan dataset.
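As a mental model for what these steps produce, here is a purely illustrative sketch of how such a hierarchical scene representation can be organized: each object carries a point mask and an open-vocabulary feature, and owns a set of geometric segments with their own masks and features. The layout below is an assumption for illustration only, not the actual export format of the scripts.

import numpy as np

num_points, feat_dim = 100_000, 768  # illustrative scene size and feature dimension

# Hypothetical in-memory layout of a hierarchical scene representation:
# objects hold a point mask and a feature vector, and nest their segments.
scene = {
    "objects": [
        {
            "mask": np.zeros(num_points, dtype=bool),          # points belonging to the object
            "feature": np.zeros(feat_dim, dtype=np.float32),   # object-level open-vocabulary feature
            "segments": [
                {
                    "mask": np.zeros(num_points, dtype=bool),         # points of one geometric segment
                    "feature": np.zeros(feat_dim, dtype=np.float32),  # segment-level feature
                },
            ],
        },
    ],
}

print(len(scene["objects"]), "objects;", len(scene["objects"][0]["segments"]), "segments in the first object")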
# first set-up the environment (see previous section)
# run the following script that computes and extracts all object masks for all scenes in MultiScan
# please don't forget to set the dataset directory and output directory in this script!
bash run_search3d_multiscan_obj_masks.sh

# run the following script that reads all object masks extracted in the previous step and computes object features for all scenes in MultiScan
# please don't forget to set the dataset directory and output directory in this script!
bash run_search3d_multiscan_obj_features.sh

# run the following script that reads all object masks extracted in the first step, computes segments constrained to these object instances, and exports the hierarchical scene representation.
# then, it computes segment features for all scenes in MultiScan (see the second section in the following script)
# please don't forget to set the dataset directory and output directory in this script!
bash run_search3d_multiscan_obj_features.sh

If you find our work useful, please cite:

@article{takmaz2025search3d,
title={{Search3D: Hierarchical Open-Vocabulary 3D Segmentation}},
author={Takmaz, Ayca and Delitzas, Alexandros and Sumner, Robert W. and Engelmann, Francis and Wald, Johanna and Tombari, Federico},
journal={IEEE Robotics and Automation Letters (RA-L)},
year={2025}
}