[GCPR 2025 Best Paper Award]

CoProU-VO: Combining Projected Uncertainty for End-to-End Unsupervised Monocular Visual Odometry

¹ Computer Vision Group at Technical University of Munich (TUM)
² DeepScenario
³ Munich Center for Machine Learning (MCML)

* Shared first authorship  † Corresponding author

Project Page arXiv

Contribution

We present Combined Projected Uncertainty Visual Odometry (CoProU-VO), a novel visual odometry approach that robustly handles regions violating the static-scene assumption within an unsupervised visual odometry framework.
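To make the idea concrete, here is a minimal sketch (not the official implementation) of the core mechanism: the reference frame's predicted per-pixel uncertainty is projected into the target view and combined with the target frame's own uncertainty, and the combined map down-weights the photometric loss on unreliable (e.g., dynamic or occluded) pixels. The combination rule and loss form below are common choices assumed for illustration; the paper's exact formulation may differ.

import torch

def combined_projected_uncertainty(sigma_tgt, sigma_ref_projected):
    # Assumption: treat the two per-pixel uncertainties as variances of
    # independent noise, so they add. sigma_ref_projected is the reference
    # frame's uncertainty already warped into the target view.
    return sigma_tgt + sigma_ref_projected

def uncertainty_weighted_photometric_loss(img_tgt, img_ref_warped, sigma):
    # Heteroscedastic weighting: residual / sigma + log(sigma), so pixels
    # with high combined uncertainty (moving objects, occlusions)
    # contribute little to the loss while log(sigma) discourages
    # predicting large uncertainty everywhere.
    residual = (img_tgt - img_ref_warped).abs().mean(dim=1, keepdim=True)
    return (residual / sigma + torch.log(sigma)).mean()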

Uncertainty Visualization

Figure: Gray areas in the images indicate invalid regions excluded from loss calculation. Photometric residual brightness represents error magnitude, while projection brightness reflects uncertainty. Dynamic objects may appear distorted due to the static scene assumption. Our method robustly masks high-uncertainty regions, distinguishes parked cars (e.g., green box) from moving cars (e.g., red boxes), and detects occluded parts of parked vehicles (e.g., yellow box).

Preparation

Environment

conda create -n coprou python=3.9
conda activate coprou

# Install PyTorch 2.7.0 (with matching torchvision and torchaudio) built for CUDA 11.8
# ⚠️ Make sure to install the version that matches your local CUDA version.
# You can find other compatible versions at https://pytorch.org/get-started/previous-versions/
pip install torch==2.7.0+cu118 torchvision==0.22.0+cu118 torchaudio==2.7.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118

# We use xFormers==0.0.30. Make sure to install a version compatible with your installed PyTorch version.
pip install xformers==0.0.30 --extra-index-url https://download.pytorch.org/whl/cu118

# Install other required Python packages
pip install -r requirements.txt
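As an optional sanity check (not part of the repo), you can confirm the pinned versions installed correctly and that CUDA is visible:

python -c "import torch, torchvision, xformers; print(torch.__version__, torchvision.__version__, xformers.__version__, torch.cuda.is_available())"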

Datasets and Preprocessing

We trained and evaluated our model on two datasets: KITTI Odometry and nuScenes.

Please download the datasets from their official websites and organize them under the \storage directory as follows:

\storage
  \KITTI_odometry
    \dataset
      \sequences
        ...
  \nuScenes
    \maps
    \samples
    \sweeps
    \v1.0-trainval
    ...
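Before running the preprocessing scripts, you can optionally confirm the layout with a few lines of Python (paths as in the tree above):

from pathlib import Path

# Check that the expected dataset roots exist before preprocessing.
for p in [Path("storage/KITTI_odometry/dataset/sequences"),
          Path("storage/nuScenes/v1.0-trainval")]:
    print(p, "OK" if p.is_dir() else "MISSING")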

Please use the following commands to preprocess the datasets.

KITTI Odometry

python data/prepare_train_data.py storage/KITTI_odometry/dataset \
    --dataset-format 'kitti_odom' \
    --dump-root storage/kitti_vo_256/ \
    --width 832 --height 256 \
    --num-threads 4

nuScenes

python data/nusc.py --config data/nuscenes_config/local_nusc.yaml

Processed data will be saved under the \storage folder.

Checkpoints

Create the \checkpoints folder,

mkdir -p checkpoints

and put the following checkpoints inside it.

CoProU-VO

Download CoProU-VO checkpoints.

Pre-trained ViTs

Download the Depth-Anything-V2-Small and DINOv2 ViT-S/14 (distilled) checkpoints:

# Download Depth-Anything-V2-Small checkpoint
wget -O checkpoints/depth_anything_v2_vits.pth "https://huggingface.co/depth-anything/Depth-Anything-V2-Small/resolve/main/depth_anything_v2_vits.pth?download=true"

# Download ViT-S/14 distilled (DINOv2) checkpoint
wget -O checkpoints/dinov2_vits14_pretrain.pth "https://dl.fbaipublicfiles.com/dinov2/dinov2_vits14/dinov2_vits14_pretrain.pth"
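After downloading, a quick optional check that the files deserialize (both are plain PyTorch state dicts):

import torch

# Load each checkpoint on CPU and report how many entries it holds.
for name in ("depth_anything_v2_vits.pth", "dinov2_vits14_pretrain.pth"):
    state = torch.load(f"checkpoints/{name}", map_location="cpu")
    print(name, "->", len(state), "entries")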

Inference and Visualization

Once the dataset and checkpoints are prepared, inference on two consecutive images can be performed with the following example commands:

KITTI

python intermediate_visualization.py \
  --pretrained-dispnet checkpoints/dispnet_checkpoint_kitti.pth.tar \
  --pretrained-posenet checkpoints/exp_pose_checkpoint_kitti.pth.tar \
  --img-height 256 \
  --img-width 832 \
  --dataset kitti \
  --tgt-img storage/kitti_vo_256/05_2/002354.jpg \
  --ref-img storage/kitti_vo_256/05_2/002353.jpg

nuScenes

python intermediate_visualization.py \
  --pretrained-dispnet checkpoints/dispnet_checkpoint_nusc.pth.tar \
  --pretrained-posenet checkpoints/exp_pose_checkpoint_nusc.pth.tar \
  --img-height 256 \
  --img-width 416 \
  --dataset nuscenes \
  --tgt-img storage/nuscenes_416_256/scene-0685_0/n008-2018-08-28-16-16-48-0400__CAM_FRONT__1535488216112404.jpg \
  --ref-img storage/nuscenes_416_256/scene-0685_0/n008-2018-08-28-16-16-48-0400__CAM_FRONT__1535488216262404.jpg

Outputs, including depths, uncertainties, and the synthesized image, will be saved under \visualization.
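To visualize an entire sequence instead of a single pair, the script can be looped over consecutive frames. A hypothetical wrapper (reusing the KITTI flags above; adjust the sequence path to your setup):

import subprocess
from pathlib import Path

frames = sorted(Path("storage/kitti_vo_256/05_2").glob("*.jpg"))
# Run the single-pair visualization for every consecutive (ref, tgt) pair.
for ref, tgt in zip(frames, frames[1:]):
    subprocess.run([
        "python", "intermediate_visualization.py",
        "--pretrained-dispnet", "checkpoints/dispnet_checkpoint_kitti.pth.tar",
        "--pretrained-posenet", "checkpoints/exp_pose_checkpoint_kitti.pth.tar",
        "--img-height", "256", "--img-width", "832",
        "--dataset", "kitti",
        "--tgt-img", str(tgt),
        "--ref-img", str(ref),
    ], check=True)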

Training

Training on KITTI:

torchrun --nproc_per_node=2 --master-port=29755 lightning_train.py storage/kitti_vo_256 --dataset kitti \
--encoder vits --dan \
--epochs 75 -b12 -s0.1 -c0.6 --sequence-length 3 \
--with-ssim 1 --with-mask 0 --with-auto-mask 1 --with-pretrain 1 \
--name kitti --lr 5e-4 

Training on nuScenes:

torchrun --nproc_per_node=4 --master-port=29755 lightning_train.py storage/nuscenes_416_256 --dataset nuscenes \
--encoder vits --dan \
--epochs 25 -b8 -s0.1 -c0.6 --skip-frames 2 --sequence-length 3 \
--with-ssim 1 --with-mask 0 --with-auto-mask 1 --with-pretrain 1 \
--name nusc --lr 5e-4 

TensorBoard logs and checkpoints are saved under \checkpoints.

Evaluation

Evaluation of our provided checkpoints

On KITTI:

python test_vo.py --pretrained-posenet checkpoints/exp_pose_checkpoint_kitti.pth.tar --img-height 256 --img-width 832 --dataset-dir storage/KITTI_odometry/dataset/sequences/ --sequence 09 --output-dir eval_result/kitti/

python kitti_eval/eval_odom.py --result=eval_result/kitti/ --align='7dof'

On nuScenes:

# eval
python test_vo_nusc.py --pretrained-posenet checkpoints/exp_pose_checkpoint_nusc.pth.tar --img-height 256 --img-width 416 --dataset-dir storage/nuscenes_416_256/ --output-dir eval_result/nusc

python nusc_eval/eval_odom.py --result=eval_result/nusc/checkpoints/exp_pose_checkpoint_nusc/ --align='7dof'
# test
python test_vo_nusc.py --test --pretrained-posenet checkpoints/exp_pose_checkpoint_nusc.pth.tar --img-height 256 --img-width 416 --dataset-dir storage/nuscenes_416_256/ --output-dir eval_result/nusc

python nusc_eval/eval_odom.py --test --result=eval_result/nusc/checkpoints/exp_pose_checkpoint_nusc/ --align='7dof'

Evaluation of your trained checkpoints

On KITTI:

python test_vo.py --pretrained-model <path to the checkpoint auto-saved by the training script> --img-height 256 --img-width 832 --dataset-dir storage/KITTI_odometry/dataset/sequences/ --sequence 09 --output-dir eval_result/kitti/

python kitti_eval/eval_odom.py --result=eval_result/kitti/ --align='7dof'

On nuScenes:

# eval
python test_vo_nusc.py --pretrained-model <path to the checkpoint auto-saved by the training script> --img-height 256 --img-width 416 --dataset-dir storage/nuscenes_416_256/ --output-dir eval_result/nusc

python nusc_eval/eval_odom.py --result=eval_result/nusc/checkpoints/<name of your checkpoint>/ --align='7dof'
# test
python test_vo_nusc.py --test --pretrained-model <path to the checkpoint auto-saved by the training script> --img-height 256 --img-width 416 --dataset-dir storage/nuscenes_416_256/ --output-dir eval_result/nusc

python nusc_eval/eval_odom.py --test --result=eval_result/nusc/checkpoints/<name of your checkpoint>/ --align='7dof'

Visual Odometry Results on the KITTI Odometry Dataset

CoProU-VO results, trained on sequences 00 and 02-08:

Metric                Seq. 09   Seq. 10
ATE (m)                  9.84     11.28
t_err (%)                4.56      7.76
r_err (degree/100m)      2.02      3.58
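For reference, ATE here is the root-mean-square translational error of the estimated trajectory after alignment; a minimal sketch of the metric (assuming camera positions are already 7-DoF aligned, as with --align='7dof' above):

import numpy as np

def ate_rmse(positions_est, positions_gt):
    # positions_est, positions_gt: (N, 3) aligned camera positions.
    # ATE is the RMSE of the per-frame position error.
    err = np.linalg.norm(positions_est - positions_gt, axis=1)
    return float(np.sqrt((err ** 2).mean()))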

Acknowledgement

We appreciate the contributions of the projects our work builds on, among them Depth-Anything-V2 and DINOv2, whose pre-trained checkpoints we use.

License

This project is licensed under the GNU General Public License v3.0.
See the LICENSE file for more details.

If you find our work useful in your research, please consider citing our paper:

@InProceedings{xie2025gcpr,
  title={CoProU-VO: Combining Projected Uncertainty for End-to-End Unsupervised Monocular Visual Odometry},
  author={Xie, Jingchao and Dhaouadi, Oussema and Chen, Weirong and Meier, Johannes and Kaiser, Jacques and Cremers, Daniel},
  booktitle={DAGM German Conference on Pattern Recognition},
  year={2025}
}
