Jingchao Xie*1,3, Oussema Dhaouadi*1,2,3†, Weirong Chen1,3, Johannes Meier1,2,3, Jacques Kaiser2, Daniel Cremers1,3
1 Computer Vision Group at Technical University of Munich (TUM)
2 DeepScenario
3 Munich Center for Machine Learning (MCML)
* Shared first authorship † Corresponding author
We present Combined Projected Uncertainty Visual Odometry (CoProU-VO), a novel approach that robustly handles regions violating the static scene assumption within an unsupervised visual odometry framework.
Figure: Gray areas in the images indicate invalid regions excluded from loss calculation. Photometric residual brightness represents error magnitude, while projection brightness reflects uncertainty. Dynamic objects may appear distorted due to the static scene assumption. Our method robustly masks high-uncertainty regions, distinguishes parked cars (e.g., green box) from moving cars (e.g., red boxes), and detects occluded parts of parked vehicles (e.g., yellow box).
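To make the masking idea concrete, below is a minimal PyTorch sketch (not the repository's implementation) of how a photometric residual could be down-weighted by combining the target-frame uncertainty with the reference-frame uncertainty projected into the target frame. The function name, the max-based combination, and the threshold are illustrative assumptions rather than the paper's exact formulation.

import torch

def combined_uncertainty_loss(residual, sigma_tgt, sigma_ref_proj, threshold=0.5):
    # residual:       (B, 1, H, W) photometric error between target and warped reference
    # sigma_tgt:      (B, 1, H, W) per-pixel uncertainty of the target frame
    # sigma_ref_proj: (B, 1, H, W) reference uncertainty projected into the target frame
    # Combine the two uncertainty maps; an element-wise maximum is one simple choice.
    sigma = torch.maximum(sigma_tgt, sigma_ref_proj)
    # Pixels whose combined uncertainty exceeds the threshold (e.g. dynamic objects,
    # occlusions) are excluded; the rest are weighted by their confidence.
    weight = (sigma < threshold).float() * (1.0 - sigma)
    return (weight * residual).sum() / weight.sum().clamp(min=1e-6)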
conda create -n coprou python=3.9
conda activate coprou
# Install PyTorch, torchvision, and torchaudio (PyTorch 2.7.0 with CUDA 11.8 support)
# ⚠️ Make sure to install the version that matches your local CUDA version.
# You can find other compatible versions at https://pytorch.org/get-started/previous-versions/
pip install torch==2.7.0+cu118 torchvision==0.22.0+cu118 torchaudio==2.7.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
# We use xFormers==0.0.30. Make sure to install a version compatible with your installed PyTorch version.
pip install xformers==0.0.30 --extra-index-url https://download.pytorch.org/whl/cu118
# Install other required Python packages
pip install -r requirements.txt
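Optionally, a quick sanity check (not part of the repository) confirms that the installed versions match the ones above and that CUDA is visible:

# Verify the installed versions and CUDA visibility.
import torch, torchvision, xformers

print("torch:", torch.__version__)              # expected: 2.7.0+cu118
print("torchvision:", torchvision.__version__)  # expected: 0.22.0+cu118
print("xformers:", xformers.__version__)        # expected: 0.0.30
print("CUDA available:", torch.cuda.is_available())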
We trained and evaluated our model on two datasets: KITTI Odometry and nuScenes. Please download them from the official sources and organize them under the storage directory as follows:
storage/
    KITTI_odometry/
        dataset/
            sequences/
                ...
    nuScenes/
        maps/
        samples/
        sweeps/
        v1.0-trainval/
        ...

Please use the following commands to preprocess the datasets.
python data/prepare_train_data.py storage/KITTI_odometry/dataset \
--dataset-format 'kitti_odom' \
--dump-root storage/kitti_vo_256/ \
--width 832 --height 256 \
--num-threads 4

python data/nusc.py --config data/nuscenes_config/local_nusc.yaml

Processed data will be saved under the storage folder.
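As an optional check (folder names taken from the commands used later in this README), you can confirm that the preprocessed outputs were created:

# Confirm the preprocessed output folders exist.
from pathlib import Path

for folder in ["storage/kitti_vo_256", "storage/nuscenes_416_256"]:
    print(folder, "->", "found" if Path(folder).is_dir() else "missing")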
Create a checkpoints folder and place the following checkpoints inside it:

mkdir -p checkpoints
Download CoProU-VO checkpoints.
Download the Depth-Anything-V2-Small and ViT-S/14 distilled (DINOv2) checkpoints:
# Download Depth-Anything-V2-Small checkpoint
wget -O checkpoints/depth_anything_v2_vits.pth "https://huggingface.co/depth-anything/Depth-Anything-V2-Small/resolve/main/depth_anything_v2_vits.pth?download=true"
# Download ViT-S/14 distilled (DINOv2) checkpoint
wget -O checkpoints/dinov2_vits14_pretrain.pth "https://dl.fbaipublicfiles.com/dinov2/dinov2_vits14/dinov2_vits14_pretrain.pth"
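Optionally, verify the downloaded files before running inference (a small illustrative check, not part of the repository):

# Report the size of each downloaded backbone checkpoint, or flag it as missing.
from pathlib import Path

for name in ["depth_anything_v2_vits.pth", "dinov2_vits14_pretrain.pth"]:
    p = Path("checkpoints") / name
    print(p, "->", f"{p.stat().st_size / 1e6:.1f} MB" if p.exists() else "MISSING")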
Once the datasets and checkpoints are prepared, inference on two consecutive images can be performed with the following example commands (KITTI and nuScenes, respectively):
python intermediate_visualization.py \
--pretrained-dispnet checkpoints/dispnet_checkpoint_kitti.pth.tar \
--pretrained-posenet checkpoints/exp_pose_checkpoint_kitti.pth.tar \
--img-height 256 \
--img-width 832 \
--dataset kitti \
--tgt-img storage/kitti_vo_256/05_2/002354.jpg \
--ref-img storage/kitti_vo_256/05_2/002353.jpg
python intermediate_visualization.py \
--pretrained-dispnet checkpoints/dispnet_checkpoint_nusc.pth.tar \
--pretrained-posenet checkpoints/exp_pose_checkpoint_nusc.pth.tar \
--img-height 256 \
--img-width 416 \
--dataset nuscenes \
--tgt-img storage/nuscenes_416_256/scene-0685_0/n008-2018-08-28-16-16-48-0400__CAM_FRONT__1535488216112404.jpg \
--ref-img storage/nuscenes_416_256/scene-0685_0/n008-2018-08-28-16-16-48-0400__CAM_FRONT__1535488216262404.jpg
Outputs, including depth maps, uncertainty maps, and the synthesized image, will be saved under the visualization folder.
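The exact output file names are not listed here, so a simple way to browse the results is to glob the visualization folder (assuming standard image formats):

# Display all image files written to visualization/.
import glob
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

for path in sorted(glob.glob("visualization/*.png") + glob.glob("visualization/*.jpg")):
    plt.figure()
    plt.title(path)
    plt.imshow(mpimg.imread(path))
    plt.axis("off")
plt.show()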
To train on KITTI:

torchrun --nproc_per_node=2 --master-port=29755 lightning_train.py storage/kitti_vo_256 --dataset kitti \
--encoder vits --dan \
--epochs 75 -b12 -s0.1 -c0.6 --sequence-length 3 \
--with-ssim 1 --with-mask 0 --with-auto-mask 1 --with-pretrain 1 \
--name kitti --lr 5e-4

To train on nuScenes:

torchrun --nproc_per_node=4 --master-port=29755 lightning_train.py storage/nuscenes_416_256 --dataset nuscenes \
--encoder vits --dan \
--epochs 25 -b8 -s0.1 -c0.6 --skip-frames 2 --sequence-length 3 \
--with-ssim 1 --with-mask 0 --with-auto-mask 1 --with-pretrain 1 \
--name nusc --lr 5e-4

To evaluate with the provided checkpoints, on KITTI:
python test_vo.py --pretrained-posenet checkpoints/exp_pose_checkpoint_kitti.pth.tar --img-height 256 --img-width 832 --dataset-dir storage/KITTI_odometry/dataset/sequences/ --sequence 09 --output-dir eval_result/kitti/
python kitti_eval/eval_odom.py --result=eval_result/kitti/ --align='7dof'

On nuScenes:
# eval
python test_vo_nusc.py --pretrained-posenet checkpoints/exp_pose_checkpoint_nusc.pth.tar --img-height 256 --img-width 416 --dataset-dir storage/nuscenes_416_256/ --output-dir eval_result/nusc
python nusc_eval/eval_odom.py --result=eval_result/nusc/checkpoints/exp_pose_checkpoint_nusc/ --align='7dof'

# test
python test_vo_nusc.py --test --pretrained-posenet checkpoints/exp_pose_checkpoint_nusc.pth.tar --img-height 256 --img-width 416 --dataset-dir storage/nuscenes_416_256/ --output-dir eval_result/nusc
python nusc_eval/eval_odom.py --test --result=eval_result/nusc/checkpoints/exp_pose_checkpoint_nusc/ --align='7dof'

To evaluate checkpoints auto-saved by your own training runs, on KITTI:
python test_vo.py --pretrained-model <path to the checkpoints auto-saved by training script> --img-height 256 --img-width 832 --dataset-dir storage/KITTI_odometry/dataset/sequences/ --sequence 09 --output-dir eval_result/kitti/
python kitti_eval/eval_odom.py --result=eval_result/kitti/ --align='7dof'

On nuScenes:
# eval
python test_vo_nusc.py --pretrained-model <path to the checkpoints auto-saved by training script> --img-height 256 --img-width 416 --dataset-dir storage/nuscenes_416_256/ --output-dir eval_result/nusc
python nusc_eval/eval_odom.py --result=eval_result/nusc/checkpoints/<name of your checkpoint> --align='7dof'

# test
python test_vo_nusc.py --test --pretrained-model <path to the checkpoints auto-saved by training script> --img-height 256 --img-width 416 --dataset-dir storage/nuscenes_416_256/ --output-dir eval_result/nusc
python nusc_eval/eval_odom.py --test --result=eval_result/nusc/checkpoints/<name of your checkpoint> --align='7dof'

Results on KITTI:

| Metric | Seq. 09 | Seq. 10 |
|---|---|---|
| ATE (m) | 9.84 | 11.28 |
| t_err (%) | 4.56 | 7.76 |
| r_err (degree/100m) | 2.02 | 3.58 |
We appreciate the contributions of the following projects, which have greatly supported our work:
- SfMLearner-Pytorch - A pioneering framework for end-to-end monocular visual odometry.
- SC-Depth - Our baseline.
- Kitti-Odom-Eval-Python - Python implementation for KITTI odometry evaluation.
- RoGS - Preprocessing code for the nuScenes dataset.
- DepthAnything-v2 and DINOv2 - Providing Vision Transformer backbone features.
This project is licensed under the GNU General Public License v3.0.
See the LICENSE file for more details.
@InProceedings{xie2025gcpr,
title={CoProU-VO: Combining Projected Uncertainty for End-to-End Unsupervised Monocular Visual Odometry},
author={Xie, Jingchao and Dhaouadi, Oussema and Chen, Weirong and Meier, Johannes and Kaiser, Jacques and Cremers, Daniel},
booktitle={DAGM German Conference on Pattern Recognition},
year={2025}
}
