TransDiff: Diffusion-Based Method for Manipulating Transparent Objects Using a Single RGB-D Image

ICRA 2025

Haoxiao Wang¹, Kaichen Zhou¹*, Binrui Gu¹, Zhiyuan Feng², Weijie Wang³, Peilin Sun³, Yicheng Xiao⁴, Jianhua Zhang¹, Hao Dong¹*

¹Peking University, ²Tsinghua University, ³Zhejiang University, ⁴Southeast University

*equal contributions, *corresponding author

Installation • Dataset • Training • Testing • Results • Citation

TransDiff is a diffusion-based method for depth estimation of transparent objects. Guided by RGB cues such as edges and surface normals, the model gradually refines depth through a denoising process. Despite the challenges of reflection and refraction, TransDiff produces accurate, material-agnostic depth maps and outperforms existing methods on both synthetic and real-world datasets.
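As a rough illustration of what refining depth "through a denoising process" can look like, the sketch below runs a deterministic DDIM-style sampling loop on a toy 1-D depth vector. It is not the TransDiff implementation: the linear noise schedule, the function names, and the `oracle_eps` stand-in for the learned network are all assumptions made for illustration. In a real model, a network conditioned on the RGB-D input would predict the noise instead.

```python
import math
import random

def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def ddim_step(x_t, eps, a_t, a_prev):
    """One deterministic DDIM update (eta = 0) from noise level a_t to a_prev."""
    out = []
    for x, e in zip(x_t, eps):
        # Estimate the clean signal implied by the current sample and noise.
        x0_pred = (x - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t)
        # Re-noise that estimate to the next (lower) noise level.
        out.append(math.sqrt(a_prev) * x0_pred + math.sqrt(1.0 - a_prev) * e)
    return out

def refine_depth(true_depth, num_steps=50, seed=0):
    """Toy sampling loop: start from random values and denoise step by step."""
    rng = random.Random(seed)
    alpha_bars = make_alpha_bars(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in true_depth]

    def oracle_eps(x_t, a_t):
        # Hypothetical stand-in for a trained noise predictor: it returns the
        # exact noise, which is only possible here because the toy example
        # already knows the true depth.
        return [(xi - math.sqrt(a_t) * d) / math.sqrt(1.0 - a_t)
                for xi, d in zip(x_t, true_depth)]

    for t in reversed(range(num_steps)):
        a_t = alpha_bars[t]
        a_prev = alpha_bars[t - 1] if t > 0 else 1.0  # final step is noise-free
        x = ddim_step(x, oracle_eps(x, a_t), a_t, a_prev)
    return x
```

Because the oracle predicts the noise perfectly, the loop recovers the toy depth values; with a learned predictor the result would only approximate them.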

Dataset

Download the ClearGrasp dataset from the ClearGrasp project page. This dataset contains RGB-D images of transparent objects for depth completion and manipulation tasks.

Dataset Structure

data/
├── cleargrasp/
│   ├── train/
│   │   ├── rgb/
│   │   ├── depth/
│   │   ├── mask/
│   │   └── init_depth/
│   └── test/
│       ├── rgb/
│       ├── depth/
│       ├── mask/
│       └── init_depth/
└── data_json/
    └── cleargrasp_train_0_1.json
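Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch that only checks for the directories and the split file listed in the tree; the function name `missing_dataset_paths` is hypothetical and not part of the repository.

```python
from pathlib import Path

SPLITS = ("train", "test")
SUBDIRS = ("rgb", "depth", "mask", "init_depth")

def missing_dataset_paths(root="data"):
    """Return the expected paths from the tree above that do not exist under `root`."""
    root = Path(root)
    expected = [root / "cleargrasp" / split / sub
                for split in SPLITS for sub in SUBDIRS]
    expected.append(root / "data_json" / "cleargrasp_train_0_1.json")
    return [str(p) for p in expected if not p.exists()]

if __name__ == "__main__":
    missing = missing_dataset_paths("data")
    print("layout OK" if not missing else "missing:\n" + "\n".join(missing))
```

The check looks only at path existence; it does not validate image contents or the JSON split format.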

Installation

Prerequisites

Our released implementation is tested on:

Ubuntu 20.04 / Ubuntu 22.04
Python 3.10.x
NVIDIA CUDA 12.4
8x NVIDIA RTX 4090 / 8x NVIDIA A100 GPUs

Environment

conda create -n transdiff python=3.10
conda activate transdiff
pip install -r requirements.txt

Training

Quick Start Script

Use the provided script to train on the ClearGrasp dataset:

cd src
chmod +x run_cleargrasp.sh
./run_cleargrasp.sh

Script contents (replace DATA_PATH and DATA_JSON_PATH with your dataset root and split JSON file):

#!/bin/bash
python main.py \
    --dir_data DATA_PATH \
    --data_name CLEARGRASP \
    --split_json DATA_JSON_PATH \
    --patch_height 144 --patch_width 256 \
    --gpus 0,1,2,3,4,5,6,7 \
    --loss "1.0*L1+1.0*L2+1.0*DDIM" \
    --epochs 30 \
    --batch_size 64 \
    --max_depth 1.5 \
    --save CLEARGRASP_results \
    --model_name Transdiff_Diffusion \
    --backbone_module swin \
    --backbone_name swin_large_naive_l4w722422k \
    --head_specify DDIMDepthEstimate_Swin_ADDHAHI
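The --loss argument above is a weighted sum written as a string. The sketch below shows one plausible way such a string could be read, assuming each term has the form weight*NAME and terms are joined by "+". How main.py actually parses it is not shown in this README, so `parse_loss_spec` and `total_loss` are hypothetical helpers.

```python
def parse_loss_spec(spec):
    """Split a spec like '1.0*L1+1.0*L2' into [(weight, name), ...]."""
    terms = []
    for term in spec.split("+"):
        weight, sep, name = term.strip().partition("*")
        if not sep or not name.strip():
            raise ValueError(f"expected 'weight*NAME', got {term!r}")
        terms.append((float(weight), name.strip()))
    return terms

def total_loss(spec, loss_values):
    """Weighted sum of already computed per-term loss values."""
    return sum(w * loss_values[name] for w, name in parse_loss_spec(spec))

if __name__ == "__main__":
    # Each of the three terms in the training script carries weight 1.0.
    print(parse_loss_spec("1.0*L1+1.0*L2+1.0*DDIM"))
```

Under this reading, changing a weight (for example 0.5*L2) would rescale that term's contribution without touching the others.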

Testing

Inference on Test Set

python main.py \
    --dir_data DATA_PATH \
    --data_name CLEARGRASP \
    --split_json DATA_JSON_PATH \
    --patch_height 144 --patch_width 256 \
    --gpus 0 \
    --max_depth 1.5 \
    --batch_size 1 \
    --test_only \
    --pretrain path/to/trained/model.pt \
    --save test_results \
    --save_image \
    --model_name Transdiff_Diffusion \
    --backbone_module swin \
    --backbone_name swin_large_naive_l4w722422k \
    --head_specify DDIMDepthEstimate_Swin_ADDHAHI
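Once predictions are saved, depth completion is commonly scored with RMSE, MAE, relative error, and threshold accuracies. The sketch below computes these on flat sequences of depth values in metres. It is an assumption-laden illustration, not the repository's evaluation code: the function name `depth_metrics`, the clipping of predictions to max_depth, the treatment of the mask as "pixels to evaluate", and the thresholds 1.05 / 1.10 / 1.25 are all choices made here.

```python
import math

def depth_metrics(pred, gt, mask, max_depth=1.5, thresholds=(1.05, 1.10, 1.25)):
    """RMSE, MAE, REL and delta accuracies over masked pixels with valid ground truth."""
    # Keep masked pixels whose ground truth is valid; clip predictions (assumed).
    pairs = [(min(max(p, 0.0), max_depth), g)
             for p, g, m in zip(pred, gt, mask)
             if m and 0.0 < g <= max_depth]
    if not pairs:
        raise ValueError("no valid pixels to evaluate")
    n = len(pairs)
    metrics = {
        "rmse": math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n),
        "mae": sum(abs(p - g) for p, g in pairs) / n,
        "rel": sum(abs(p - g) / g for p, g in pairs) / n,
    }
    for th in thresholds:
        # A pixel counts as a hit when max(p/g, g/p) is below the threshold.
        hits = sum(1 for p, g in pairs if p > 0 and max(p / g, g / p) < th)
        metrics[f"delta_{th:.2f}"] = hits / n
    return metrics
```

For image-shaped arrays, flatten prediction, ground truth, and mask in the same order before calling the function.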

Common Installation Issues

1. MMDetection Installation:

# If mmdet installation fails, try:
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0"
mim install mmdet

2. OpenEXR Installation:

# On Ubuntu/Debian:
sudo apt-get install libopenexr-dev

# On CentOS/RHEL:
sudo yum install OpenEXR-devel

# Using conda:
conda install -c conda-forge openexr-python
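After applying the fixes above, a quick import check can confirm that the packages are visible in the active environment. This is a minimal sketch using only the standard library; the import names in `REQUIRED_MODULES` are assumed from the package names mentioned above, and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Import names assumed for the packages mentioned above.
REQUIRED_MODULES = ("mmengine", "mmcv", "mmdet", "OpenEXR")

def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be located in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_modules()
    print("all modules found" if not missing else "missing: " + ", ".join(missing))
```

The check only locates the modules; it does not import them, so version or CUDA mismatches would still surface later at import time.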

Citation

If you find this work useful in your research, please cite:

@article{wang2025transdiff,
  title={{TransDiff}: Diffusion-based method for manipulating transparent objects using a single {RGB-D} image},
  author={Wang, Haoxiao and Zhou, Kaichen and Gu, Binrui and Feng, Zhiyuan and Wang, Weijie and Sun, Peilin and Xiao, Yicheng and Zhang, Jianhua and Dong, Hao},
  journal={arXiv preprint arXiv:2503.12779},
  year={2025}
}
