
[ICCV 2025] Learning 3D Object Spatial Relationships from Pre-trained 2D Diffusion Models

Pipeline

This is the official code for the paper "Learning 3D Object Spatial Relationships from Pre-trained 2D Diffusion Models".

Installation

System Requirements

This code has been tested in the following settings, but is expected to work on other systems as well.

  • Ubuntu 20.04
  • CUDA 11.8
  • NVIDIA RTX A6000

Conda Environment

conda create -n oor python=3.8
conda activate oor
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
git clone https://github.com/facebookresearch/pytorch3d.git
cd pytorch3d
git checkout -f v0.7.2
pip install -e .
pip install transformers==4.38.2 opencv-python==4.2.0.32 scipy==1.4.1 numpy==1.23.5 tensorboardX==2.5.1 pyrender==0.1.45 torchdiffeq matplotlib wandb trimesh[easy]

Training Dataset and Pre-trained Model Preparation

An unexpected accident caused the loss of all our data and code. We were able to recover the training and inference code, but not the data-generation part. As a substitute, we release a rule-based dataset.

The rule-based dataset covers the following 9 OORs (object category pair and relationship):

  • Object category pair: (desk, monitor) | Relationship: on
  • Object category pair: (desk, keyboard) | Relationship: on
  • Object category pair: (desk, mouse) | Relationship: on
  • Object category pair: (desk, teacup) | Relationship: on
  • Object category pair: (desk, teapot) | Relationship: on
  • Object category pair: (monitor, keyboard) | Relationship: in front of
  • Object category pair: (mouse, keyboard) | Relationship: next to
  • Object category pair: (teacup, teapot) | Relationship: around
  • Object category pair: (teacup, teapot) | Relationship: pour

Each OOR pair dataset contains 1000 OOR samples: 500 are generated with one object as the base and the other as the target, and the remaining 500 are generated with the base and target swapped.

Each OOR pair dataset is saved as a pickle file. See data/oor_pickle/ directory.

The object meshes used to create the training dataset are located in the data/CAD directory. They were all collected from Sketchfab.

We also provide a model pre-trained on the rule-based dataset. See results/ckpts/OOR/ckpt_epoch20000.pth

Training

If you want to train your model from scratch, run:

bash scripts/train_score_r_t_s.sh

If you don't change any arguments, checkpoints will be saved in results/ckpts/OOR.

Inference

Generate Pairwise OOR

Set text_prompt, base_object and target_object in scripts/inference_r_t_s.sh. Then,

bash scripts/inference_r_t_s.sh

The inference results will be saved as follows: results/inference/pairwise_oor/{input_text_prompt}/base-{base_object}_target-{target_object}/inference.pkl
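As an illustration, the output path can be reconstructed in Python. This assumes the scripts replace spaces in the prompt with underscores, as the visualization example later in this README suggests:

```python
# Hypothetical reconstruction of the pairwise-OOR output path.
text_prompt = "A monitor is on a desk"
base_object, target_object = "desk", "monitor"

out_path = (
    "results/inference/pairwise_oor/"
    f"{text_prompt.replace(' ', '_')}/"
    f"base-{base_object}_target-{target_object}/inference.pkl"
)
print(out_path)
```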

Generate Masked Pairwise OOR

This is similar to inpainting in image diffusion models, so additional information about the mask is required. The mask structure should be as follows:

{'target_R': None or (3,3) shape array, 'target_t': None or (3,) shape array, 'target_s': None or (3,) shape array, 'base_s': None or (3,) shape array}

If a field is not None, it is treated as masked: its value is kept fixed throughout the inference process. The values should be a relative pose and scale defined in the base object's instance canonical space; for example, the longest component of base_s should be 1. Please refer to the method formulation section of the paper for details, and see data/mask_info_pickle/desk_teacup_on.pkl for an example.
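As a minimal sketch, a mask-info pickle could be built like this. The values below are illustrative, not taken from the released example file: the target's relative translation and the base scale are fixed (masked), while the target's rotation and scale are left for the model to generate.

```python
import pickle

import numpy as np

# Hypothetical mask values; only the key names and shapes follow the
# structure described above.
mask_info = {
    "target_R": None,                       # generated by the model
    "target_t": np.array([0.0, 0.1, 0.0]),  # (3,) relative translation, kept fixed
    "target_s": None,                       # generated by the model
    "base_s": np.array([1.0, 0.6, 0.5]),    # (3,) base scale; longest component is 1
}

with open("my_mask_info.pkl", "wb") as f:
    pickle.dump(mask_info, f)
```

The resulting file can then be passed via mask_info_path in the inference script.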

Now, set text_prompt, base_object, target_object, and mask_info_path in scripts/inference_masked_r_t_s.sh. Then,

bash scripts/inference_masked_r_t_s.sh

The inference results will be saved as follows: results/inference/masked_pairwise_oor/{input_text_prompt}/base-{base_object}_target-{target_object}/{mask_info_name}.pkl

Generate Multi-object OOR

Multi-object OOR generation requires information about the scene. The structure is as follows:

{'prompt_list': ['A monitor is on a desk', ...], 'base_list': [('desk', 0), ...], 'target_list': [('monitor', 1), ...]}
  • len(prompt_list) == len(base_list) == len(target_list)
  • When drawing a graph with (base, target) pairs, it must be a DAG with a single starting node (global base).
  • The obj id of the global base must be 0.

See data/multi_info_pickle/desk_monitor_keyboard_mouse.pkl for an example.
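The constraints above can be sketched and sanity-checked in Python. The prompts and object ids below are illustrative, not taken from the example file; the check verifies that each object has at most one parent and that everything is reachable from the global base, which together make the graph a tree (hence a DAG) rooted at id 0.

```python
import pickle

# Hypothetical scene-info dict mirroring the structure above.
scene_info = {
    "prompt_list": [
        "A monitor is on a desk",
        "A keyboard is on a desk",
        "A mouse is next to a keyboard",
    ],
    "base_list": [("desk", 0), ("desk", 0), ("keyboard", 2)],
    "target_list": [("monitor", 1), ("keyboard", 2), ("mouse", 3)],
}

def check_scene_info(info):
    """Sanity-check the constraints: equal list lengths and a graph that is
    a tree (hence a DAG) with the global base, id 0, as the single root."""
    assert len(info["prompt_list"]) == len(info["base_list"]) == len(info["target_list"])
    edges = [(b[1], t[1]) for b, t in zip(info["base_list"], info["target_list"])]
    targets = [t for _, t in edges]
    assert len(targets) == len(set(targets)), "each object has at most one parent"
    # every placed object must be reachable from the global base (id 0)
    children = {}
    for b, t in edges:
        children.setdefault(b, []).append(t)
    reachable, frontier = {0}, [0]
    while frontier:
        for child in children.get(frontier.pop(), []):
            if child not in reachable:
                reachable.add(child)
                frontier.append(child)
    assert reachable == {0} | set(targets), "graph must be rooted at object id 0"

check_scene_info(scene_info)
with open("my_scene_info.pkl", "wb") as f:
    pickle.dump(scene_info, f)
```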

Now, set scene_info_path in scripts/inference_multi_r_t_s.sh. Then,

bash scripts/inference_multi_r_t_s.sh

The inference results will be saved as follows: results/inference/multi_oor/{scene_info_name}/inference.pkl

Adding Objects to the Existing Scene

We need information about the existing scene and the OORs we want to add. The structure is as follows:

{'prompt_list': ['A teapot on a desk', ...], 'base_list': [('desk', 0), ...], 'target_list': [('teapot', 3), ...], 'existing_scene_info': {('desk', 0): {"R": ..., "t": ..., "s": ...}, ...}}
  • len(prompt_list) == len(base_list) == len(target_list)
  • When drawing a graph with (base, target) pairs, it must be a DAG with a single starting node (global base).
  • The obj id of the global base must be 0.
  • Objects added to a scene cannot be ancestors of nodes in the existing scene graph, and must be descendants of at least one existing node (more specifically, an added object may not be the base of an edge whose target is an existing object).

See data/add_scene_info_pickle/desk_teacupx2_teapotx2.pkl for an example.
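A hypothetical add-to-scene request following the structure above could be built like this. The poses are placeholders (in practice they would come from a previously generated scene); ids 0–2 are assumed to already exist, and id 3 is the teapot being added.

```python
import pickle

import numpy as np

def identity_pose():
    # Placeholder pose: identity rotation, zero translation, unit scale.
    return {"R": np.eye(3), "t": np.zeros(3), "s": np.ones(3)}

add_info = {
    "prompt_list": ["A teapot on a desk"],
    "base_list": [("desk", 0)],
    "target_list": [("teapot", 3)],
    "existing_scene_info": {
        ("desk", 0): identity_pose(),
        ("monitor", 1): identity_pose(),
        ("teacup", 2): identity_pose(),
    },
}

# Check the last constraint above: an added object may never be the base
# of an edge whose target is an existing object.
existing_ids = {obj_id for _, obj_id in add_info["existing_scene_info"]}
for (_, base_id), (_, target_id) in zip(add_info["base_list"], add_info["target_list"]):
    if base_id not in existing_ids:
        assert target_id not in existing_ids

with open("my_add_scene_info.pkl", "wb") as f:
    pickle.dump(add_info, f)
```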

Now, set scene_info_path in scripts/inference_add_to_scene.sh. Then,

bash scripts/inference_add_to_scene.sh

The inference results will be saved as follows: results/inference/add_multi_oor/{scene_info_name}/inference.pkl

Changing OORs in the Existing Scene

As described in the paper, the OOR change procedure is deterministic. Therefore, we remove the batch size from the inputs and instead load multiple scene infos from the input pickle file, using the number of scenes as the batch size. The structure is as follows:

{'prompt_list': ['A teapot pours into a teacup.', 'A teapot pours into a teacup.'], 'base_list': [('teacup', 1), ('teacup', 2)], 'target_list': [('teapot', 3), ('teapot', 4)], 'existing_scene_info_list': [{('desk', 0): {"R": ..., "t": ..., "s": ...}, ...}, ...]}
  • len(prompt_list) == len(base_list) == len(target_list)
  • When drawing a graph with (base, target) pairs, it must be a DAG (a single starting node is not required).
  • The obj id of the global base must be 0.
  • For each OOR to be changed, every object pair (base object, target object) must already exist within existing_scene_info, and the global base cannot be the target object.
  • Assume that the scale of objects (both base and target) is fixed.

See data/change_scene_info_pickle/desk_teacupx2_teapotx2.pkl for an example.
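A hypothetical change request following the structure above could look like this. It is a sketch under stated assumptions: the poses are placeholders (in practice they would come from a previously generated scene), and one scene entry is assumed per OOR change, so the batch size equals len(existing_scene_info_list).

```python
import pickle

import numpy as np

def placeholder_pose():
    # Placeholder pose: identity rotation, zero translation, unit scale.
    return {"R": np.eye(3), "t": np.zeros(3), "s": np.ones(3)}

def make_scene():
    return {
        ("desk", 0): placeholder_pose(),
        ("teacup", 1): placeholder_pose(),
        ("teapot", 3): placeholder_pose(),
    }

change_info = {
    "prompt_list": ["A teapot pours into a teacup."] * 2,
    "base_list": [("teacup", 1)] * 2,
    "target_list": [("teapot", 3)] * 2,
    "existing_scene_info_list": [make_scene(), make_scene()],
}

# Each (base, target) pair must already exist in its scene, and the global
# base (id 0) can never be the target.
for base, target, scene in zip(change_info["base_list"],
                               change_info["target_list"],
                               change_info["existing_scene_info_list"]):
    assert base in scene and target in scene
    assert target[1] != 0

with open("my_change_scene_info.pkl", "wb") as f:
    pickle.dump(change_info, f)
```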

Now, set scene_info_path in scripts/inference_change_oor.sh. Then,

bash scripts/inference_change_oor.sh

The inference results will be saved as follows: results/inference/change_multi_oor/{scene_info_name}/inference.pkl

Visualization

Run:

python3 visualize_oor.py --pickle_path {pickle path}

# For example,
# 1) dataset
python3 visualize_oor.py --pickle_path data/oor_pickle/desk_monitor_on.pkl
# 2) inference
python3 visualize_oor.py --pickle_path results/inference/pairwise_oor/A_monitor_is_on_a_desk/base-desk_target-monitor/inference.pkl

We provide a very naive pyrender visualization; for higher-quality results, we recommend using Blender.

TODOs

  • Integrate the inconsistency and collision terms into "Changing OORs in the Existing Scene"

Citation

@inproceedings{oor,
      title={Learning 3D Object Spatial Relationships from Pre-trained 2D Diffusion Models}, 
      author={Baik, Sangwon and Kim, Hyeonwoo and Joo, Hanbyul},
      booktitle={ICCV},
      year={2025}
}
