Official implementation of "OpenCity3D: What do Vision-Language Models know about Urban Environments?" @ WACV2025


OpenCity3D🏙️: 3D Urban Scene Understanding with Vision-Language Models

Valentin Bieri¹, Marco Zamboni¹, Nicolas S. Blumer¹,², Qingxuan Chen¹,²,
Francis Engelmann¹,³

¹ETH Zürich     ²University of Zurich     ³Stanford University

WACV 2025


[Teaser figure]

OpenCity3D is a zero-shot approach for open-vocabulary 3D urban scene understanding.

BibTex

@inproceedings{opencity3d2025,
    title = {OpenCity3D: 3D Urban Scene Understanding with Vision-Language Models},
    author = {Bieri, Valentin and Zamboni, Marco and Blumer, Nicolas S. and Chen, Qingxuan and Engelmann, Francis},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    year = {2025},
    organization = {IEEE}
}

Setup Environment

First, clone this repository by running:

git clone https://github.com/opencity3d/opencity3d.git

Then prepare the Conda environment:

# Create environment and install dependencies:
conda env create --file environment.yml

# Activate environment
conda activate opencity

Pipeline

Dataset Generation

  • Obtain the mesh in Blender using the Blosm add-on. Following the guide, you can select a location and paste its coordinates into Blender.

  • Export the mesh as a .glb file and open it in MeshLab. Save the object file (.obj) together with the texture images.

  • Create a data folder and, inside it, a folder named after your scene. To obtain the RGB and depth images, run generate_dataset.py as described below.

Your dataset should have the following structure:

<data>
|---scene_name
|   |---scene.glb
|   |---scene.obj
|   |---texture_0.jpg
|   |---texture_1.jpg
|   |---...

First, optionally adjust the input file and output paths in generate_dataset.py:

'''
Example of how to generate RGB and depth images in generate_dataset.py
'''

......
......

# Change here for your own scene generation
file = "/path/to/your/data/scene_name/scene.obj"
output_path = "/path/to/your/data/scene-output-v1/"

......
......

Then run `generate_dataset.py`:

cd dataset_generation
python generate_dataset.py

This produces the rendered RGB and depth images, completing the dataset generation:

<data>
|---scene_name
|   |---scene.glb
|   |---scene.obj
|   |---texture_0.jpg
|   |---texture_1.jpg
|   |---...
|---scene-output-v1
|   |---color
|   |   |---0.jpg
|   |   |---1.jpg
|   |   |---...
|   |---depth
|   |   |---0.npy
|   |   |---1.npy
|   |   |---...
|   |---intrinsic
|   |   |---intrinsic_color.txt
|   |   |---projection_matrix.txt
|   |---pose
|   |   |---0.txt
|   |   |---1.txt
|   |   |---...

Feature Extraction and Projection

  • Step 1: Generate image features. Run the following commands:
cd .. # Back to project root folder
cd preprocessing
python preprocess.py --dataset_path $path-of-scene-output-v1 --model siglip --mode highlight

To run the baseline (OpenScene/LangSplat + CLIP), run preprocess_level0.py instead of preprocess.py.

ATTENTION! This step takes a long time. Afterwards, you will find the generated features under /scene-output-v1/ in a folder named language_features_highlight (without the highlight suffix when running the baseline).
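For intuition about what Step 1 produces: with --model siglip, preprocess.py extracts SigLIP embeddings for the rendered images (and their masked/highlighted crops). The following is a minimal sketch of a single SigLIP image embedding with Hugging Face transformers; the checkpoint name and the crop/highlight handling are assumptions, not necessarily what the script uses.

import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

ckpt = "google/siglip-base-patch16-224"   # hypothetical checkpoint
model = AutoModel.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

image = Image.open("scene-output-v1/color/0.jpg")   # one rendered RGB frame
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    feats = model.get_image_features(**inputs)       # (1, D) image embedding
feats = feats / feats.norm(dim=-1, keepdim=True)     # unit-normalize for cosine similarity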

  • Step 2: Project the features onto the scene and generate the point cloud.

    • Adjust the paths to the scene mesh and the generated language features in convert_to_point_cloud.py:
    '''
    Example of how to set the paths in convert_to_point_cloud.py
    '''
    
    ......
    ......
    
    if __name__ == "__main__":
        if True:
            base_path = "/path/to/your/data/scene-output-v1/"
            obj_path = "/path/to/your/data/scene_name/scene.obj"
            full_embeddings_mode = False # True if you are doing baseline
            
        convert_to_pcd(obj_path = obj_path, #"scene_example_downsampled.ply",
                        images_path= base_path + "color",
                        depth_path = base_path + "depth",
                        feat_path = base_path + "language_features",
                        mask_path = base_path + "language_features",
                        full_embedding_path = base_path + "full_image_embeddings",
                        poses_path = base_path + "pose",
                        intrinsics_path = base_path + "intrinsic/projection_matrix.txt",
                        output_path = "semantic_point_cloud.ply",
                        full_embeddings_mode = full_embeddings_mode)
    
    • Run the code:
    cd ..
    python convert_to_point_cloud.py
    

    You will find the generated features and point cloud file (point_features_highlight.npy and generated_point_cloud.ply) under the /eval folder.
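Conceptually, this step back-projects each depth pixel into world coordinates using the per-frame intrinsics and pose, and aggregates the associated image features onto the scene points. Below is a rough numpy sketch of the back-projection; the coordinate conventions are assumptions and may not match convert_to_point_cloud.py exactly.

import numpy as np

def backproject(depth, K, cam_to_world):
    # Lift a depth map (H, W) into world-space points (H*W, 3).
    # K: 3x3 pinhole intrinsics, cam_to_world: 4x4 pose.
    # Assumes the camera looks along +z with no lens distortion.
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1).reshape(-1, 4)
    return (cam_to_world @ pts_cam.T).T[:, :3]

# Illustrative usage with the files produced during dataset generation:
# depth = np.load("scene-output-v1/depth/0.npy")
# K = np.loadtxt("scene-output-v1/intrinsic/intrinsic_color.txt")[:3, :3]
# pose = np.loadtxt("scene-output-v1/pose/0.txt")
# points = backproject(depth, K, pose)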

Example

We provide the processing results for the Rotterdam scene, containing the generated point cloud and the extracted highlight features. You can download them via the link and explore them with visualize_pcd_features.ipynb under the sandbox folder.

How to use it:

  • Create a folder named embedded_point_clouds under /data/, then create a scene_name sub-folder under /embedded_point_clouds/.

  • Put the generated feature and point cloud files into it, then change the following configuration in the notebook:

tag = "scene_name" # name of the sub-folder
model_type = "siglip"
crop_type = "highlight"  # or "full"
  • Run the uncommented notebook cells to download the SigLIP model and tokenizer.
    • E.g., run this block to visualize the similarity heatmap for a query:
queries = ["tree"] # Set the query here
query_embed = encode_text(queries, model, tokenizer)
sim = features @ query_embed 
sim = sim.max(axis=1)
# sim = np.exp(sim)
# sim = np.exp(sim) / (np.exp(sim) + np.exp(max_canonical_sim))
print(sim.shape)

for i, query in enumerate(queries):
    visualize(pcd, sim[:,i], query)
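encode_text is defined in the notebook; a rough equivalent with Hugging Face SigLIP could look like the sketch below. The checkpoint name, the max-length padding, and the transposed output shape (chosen so that features @ query_embed works as in the snippet above) are assumptions.

import torch
from transformers import AutoModel, AutoTokenizer

ckpt = "google/siglip-base-patch16-224"   # hypothetical checkpoint
model = AutoModel.from_pretrained(ckpt)
tokenizer = AutoTokenizer.from_pretrained(ckpt)

def encode_text(queries, model, tokenizer):
    # SigLIP text encoders expect fixed-length ("max_length") padding.
    inputs = tokenizer(queries, padding="max_length", return_tensors="pt")
    with torch.no_grad():
        emb = model.get_text_features(**inputs)    # (num_queries, D)
    emb = emb / emb.norm(dim=-1, keepdim=True)     # unit-normalize
    return emb.numpy().T                           # (D, num_queries), so features @ query_embed works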

TODO list:

  • Update README
  • Release the arXiv camera-ready version
  • Release the code for the embedding training
  • Release the preprocessed dataset and the pretrained embeddings
  • Release the code for the visualization cookbook
  • Release the code for the experiment tasks
