Frano Rajič1 ·
Haofei Xu1 ·
Marko Mihajlovic1 ·
Siyuan Li1 ·
Irem Demir1 ·
Emircan Gündoğdu1 ·
Lei Ke2 ·
Sergey Prokudin1,3 ·
Marc Pollefeys1,4 ·
Siyu Tang1
1ETH Zürich
2Carnegie Mellon University
3Balgrist University Hospital
4Microsoft
MVTracker is the first data-driven multi-view 3D point tracker for tracking arbitrary 3D points across multiple cameras. It fuses multi-view features into a unified 3D feature point cloud, within which it leverages kNN-based correlation to capture spatiotemporal relationships across views. A transformer then iteratively refines the point tracks, handling occlusions and adapting to varying camera setups without per-sequence optimization.
- August 28, 2025: Public release.
This repo was validated on Python 3.10.12, PyTorch 2.3.0 (CUDA 12.1), cuDNN 8903, and gcc 11.3.0. If you want a fresh minimal environment that runs the Hub demo and demo.py, set it up as follows:

```bash
conda create -n 3dpt python=3.10.12 -y
conda activate 3dpt
conda install pytorch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 pytorch-cuda=12.1 -c pytorch -c nvidia -y
pip install -r https://raw.githubusercontent.com/ethz-vlg/mvtracker/refs/heads/main/requirements.txt
# Optional, speeds up the model
pip install --upgrade --no-build-isolation flash-attn==2.5.8  # Speeds up attention
pip install "git+https://github.com/ethz-vlg/pointcept.git@2082918#subdirectory=libs/pointops"  # Speeds up kNN search; may require gcc 11.3.0: conda install -c conda-forge gcc_linux-64=11.3.0 gxx_linux-64=11.3.0 gcc=11.3.0 gxx=11.3.0
```
With the minimal dependencies in place, you can try MVTracker directly via PyTorch Hub:

```python
import torch
import numpy as np
from huggingface_hub import hf_hub_download
device = "cuda" if torch.cuda.is_available() else "cpu"
mvtracker = torch.hub.load("ethz-vlg/mvtracker", "mvtracker", pretrained=True, device=device)
# Example input from demo sample (downloaded automatically)
sample = np.load(hf_hub_download("ethz-vlg/mvtracker", "data_sample.npz"))
rgbs = torch.from_numpy(sample["rgbs"]).float()
depths = torch.from_numpy(sample["depths"]).float()
intrs = torch.from_numpy(sample["intrs"]).float()
extrs = torch.from_numpy(sample["extrs"]).float()
query_points = torch.from_numpy(sample["query_points"]).float()
with torch.no_grad():
    results = mvtracker(
        rgbs=rgbs[None].to(device) / 255.0,
        depths=depths[None].to(device),
        intrs=intrs[None].to(device),
        extrs=extrs[None].to(device),
        query_points_3d=query_points[None].to(device),
    )
pred_tracks = results["traj_e"].cpu()  # [T, N, 3]
pred_vis = results["vis_e"].cpu()  # [T, N]
print(pred_tracks.shape, pred_vis.shape)
```
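As a quick follow-up, you can store the predictions and compute a few simple statistics. This is just an illustrative sketch that only assumes the [T, N, 3] / [T, N] output shapes printed above:

```python
# Save the predicted tracks and visibilities, then report simple statistics.
# Assumes `pred_tracks` [T, N, 3] and `pred_vis` [T, N] from the snippet above.
np.savez("mvtracker_predictions.npz",
         tracks=pred_tracks.numpy(),
         visibility=pred_vis.numpy())

# Total path length per point, summed over consecutive frames.
per_point_path_length = (pred_tracks[1:] - pred_tracks[:-1]).norm(dim=-1).sum(dim=0)  # [N]
print(f"Mean visibility: {pred_vis.float().mean():.2f}")
print(f"Longest track path length: {per_point_path_length.max():.3f} scene units")
```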
Alternatively, you can run our interactive demo:

```bash
python demo.py --rerun save --lightweight
```

By default this saves a lightweight .rrd recording (e.g., mvtracker_demo.rrd) that you can open in any Rerun viewer. The simplest option is to drag and drop the file into the online viewer. For the best experience, you can also install Rerun locally (pip install rerun-sdk==0.21.0; rerun). Results can be explored interactively in the viewer with WASD/QE navigation, mouse rotation and zoom, and timeline playback controls.
[Interactive viewer on a cluster or with GUI support - click to expand]
If you are working on a cluster, you can stream results directly to your laptop by forwarding a port (ssh -R 9876:localhost:9876 user@cluster) and then running the demo in streaming mode (python demo.py --rerun stream), which sends live data into your local Rerun instance. If you are running the demo locally with GUI support, you can automatically spawn a Rerun window (python demo.py --rerun spawn).
You can use a pretrained model directly via PyTorch Hub (see Quick Start above), or clone this repo if you want to run our demo, evaluation, or training. We recommend using PyTorch with CUDA for best performance. CPU-only runs are possible but very slow.
```bash
git clone https://github.com/ethz-vlg/mvtracker.git
cd mvtracker
```

To extend the conda environment from the Quick Start to support training and evaluation, install the full requirements by running pip install -r requirements.full.txt. Baselines based on SpatialTracker V1 also require cupy:
```bash
pip install tensorflow==2.12.1 tensorflow-datasets tensorflow-graphics tensorboard
pip install cupy-cuda12x==12.2.0
python -m cupyx.tools.install_library --cuda 12.x --library cutensor
python -m cupyx.tools.install_library --cuda 12.x --library nccl
python -m cupyx.tools.install_library --cuda 12.x --library cudnn
```

To benchmark multi-view 3D point tracking, we provide preprocessed versions of three datasets:
- MV-Kubric: a synthetic training dataset adapted from single-view Kubric into a multi-view setting.
- Panoptic Studio: evaluation benchmark with real-world activities such as basketball, juggling, and toy play (10 sequences).
- DexYCB: evaluation benchmark with real-world hand–object interactions (10 sequences).
[Downloading our preprocessed datasets - click to expand]
You can download and extract them as follows (~72 GB after extraction):

```bash
# MV-Kubric (simulated + DUSt3R depths)
wget https://huggingface.co/datasets/ethz-vlg/mv3dpt-datasets/resolve/main/kubric-multiview--test.tar.gz -P datasets/
wget https://huggingface.co/datasets/ethz-vlg/mv3dpt-datasets/resolve/main/kubric-multiview--test--dust3r-depth.tar.gz -P datasets/
tar -xvzf datasets/kubric-multiview--test.tar.gz -C datasets/
tar -xvzf datasets/kubric-multiview--test--dust3r-depth.tar.gz -C datasets/
rm datasets/kubric-multiview*.tar.gz
# Panoptic Studio (optimization-based depth from Dynamic3DGS)
wget https://huggingface.co/datasets/ethz-vlg/mv3dpt-datasets/resolve/main/panoptic-multiview.tar.gz -P datasets/
tar -xvzf datasets/panoptic-multiview.tar.gz -C datasets/
rm datasets/panoptic-multiview.tar.gz
# DexYCB (Kinect + DUSt3R depths)
wget https://huggingface.co/datasets/ethz-vlg/mv3dpt-datasets/resolve/main/dex-ycb-multiview.tar.gz -P datasets/
wget https://huggingface.co/datasets/ethz-vlg/mv3dpt-datasets/resolve/main/dex-ycb-multiview--dust3r-depth.tar.gz -P datasets/
tar -xvzf datasets/dex-ycb-multiview.tar.gz -C datasets/
tar -xvzf datasets/dex-ycb-multiview--dust3r-depth.tar.gz -C datasets/
rm datasets/dex-ycb-multiview*.tar.gz
# $ du -sch datasets/*
# 31G kubric-multiview
# 13G panoptic-multiview
# 29G dex-ycb-multiview
# 72G total
```

[Regenerating datasets from scratch - click to expand]
If you wish to regenerate datasets from scratch, we provide scripts with docstrings that explain usage and list the commands we used. For licensing and usage terms, please refer to the original datasets.
- MV-Kubric data for training and testing can be generated with ethz-vlg/kubric.
- DexYCB can be downloaded and labels regenerated using scripts/dex_ycb_to_neus_format.py; note that we have created labels for 10 sequences, but DexYCB is much larger and more labels could be produced if needed.
- Panoptic Studio can be downloaded and labels regenerated using scripts/panoptic_studio_preprocessing.py.
- DUSt3R depths can be produced for any dataset with scripts/estimate_depth_with_duster.py.
- For unlabeled datasets used only in qualitative experiments, we provide the following preprocessing scripts: 4D-Dress, Hi4D, EgoExo4D, and SelfCap.
For quick testing, we also release a small demo sample (~200 MB):
```bash
python demo.py --random_query_points
```

Our generic loader GenericSceneDataset supports adding new datasets. It can compute depths on the fly with DUSt3R, VGGT, MonoFusion, or MoGe-2, and can also estimate camera poses with VGGT.
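If you plan to plug in your own captures, a quick way to see the expected array layout is to inspect the released demo sample (the same keys as in the Quick Start above):

```python
# Print the keys, shapes, and dtypes of the demo sample to see the expected input layout.
import numpy as np
from huggingface_hub import hf_hub_download

sample = np.load(hf_hub_download("ethz-vlg/mvtracker", "data_sample.npz"))
for key in sample.files:
    print(key, sample[key].shape, sample[key].dtype)
```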
Evaluation is driven by Hydra configs. See mvtracker/cli/eval.py and configs/eval.yaml for details.
To evaluate MVTracker with our best model, first download the checkpoint from Hugging Face:
```bash
wget https://huggingface.co/ethz-vlg/mvtracker/resolve/main/mvtracker_200000_june2025.pth -P checkpoints/
```

Then run:
```bash
python -m mvtracker.cli.eval \
    experiment_path=logs/mvtracker \
    model=mvtracker \
    datasets.eval.names=[kubric-multiview-v3-views0123] \
    restore_ckpt_path=checkpoints/mvtracker_200000_june2025.pth
# Expected result:
# {
# "eval_kubric-multiview-v3-views0123/model__ate_visible__dynamic-static-mean": 5.07,
# "eval_kubric-multiview-v3-views0123/model__average_jaccard__dynamic-static-mean": 81.42,
# "eval_kubric-multiview-v3-views0123/model__average_pts_within_thresh__dynamic-static-mean": 90.00
# }
```

To evaluate a baseline, e.g. CoTracker3-Online (auto-downloaded checkpoint), run:

```bash
python -m mvtracker.cli.eval experiment_path=logs/cotracker3-online model=cotracker3_online
# Expected result:
# {
# "eval_panoptic-multiview-views1_7_14_20/model__average_jaccard__any": 74.56
# }
```

For more baselines and dataset setups (e.g. varying camera counts, camera subsets, etc.), see scripts/slurm/eval.sh for the commands used in our experiments.
[Details on evaluation parameters - click to expand]
The evaluation datasets are specified with datasets.eval.names. Each name is parsed by the dataset from_name() factory (see e.g. DexYCBMultiViewDataset.from_name), which supports modifiers such as -views, -duster, -novelviews, -removehand, -2dpt, or -cached. This makes it easy to select subsets of cameras, enable different depth sources, or ensure deterministic track sampling. The main labeled benchmarks are:
- Kubric (synthetic), e.g. kubric-multiview-v3-views0123
- Panoptic Studio (real), e.g. panoptic-multiview-views1_7_14_20
- DexYCB (real), e.g. dex-ycb-multiview-views0123
For reproducibility of our main results, we also provide cached variants of each benchmark, which freeze the track selection exactly as used in our paper. Without -cached, track sampling is still seeded and reproducible within a fixed environment, but the cached variants guarantee identical tracks across environments. The following cached variants are included in the released datasets (an example command using one of them follows the list):
- kubric-multiview-v3-views0123-cached
- kubric-multiview-v3-duster0123-cached
- panoptic-multiview-views1_7_14_20-cached
- panoptic-multiview-views27_16_14_8-cached
- panoptic-multiview-views1_4_7_11-cached
- dex-ycb-multiview-views0123-cached
- dex-ycb-multiview-duster0123-cached
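For example, to evaluate MVTracker on a cached benchmark (a sketch that reuses the eval command from above; adjust experiment_path and checkpoint path to your setup):

```bash
python -m mvtracker.cli.eval \
    experiment_path=logs/mvtracker \
    model=mvtracker \
    datasets.eval.names=[kubric-multiview-v3-views0123-cached] \
    restore_ckpt_path=checkpoints/mvtracker_200000_june2025.pth
```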
To run a small overfitting test that fits into 24 GB GPU RAM:
```bash
python -m mvtracker.cli.train +experiment=mvtracker_overfit_mini
```

For a full-scale MVTracker on an 80 GB GPU:

```bash
python -m mvtracker.cli.train +experiment=mvtracker_overfit
```

[Scene normalization - click to expand]
Performance depends strongly on scene normalization. MVTracker was trained on Kubric with randomized but bounded scales and camera setups. At test time, scenes with very different scales, rotations, or translations must be aligned to this distribution. Our generic loader provides an automatic normalization that assumes the ground plane is parallel to the XY plane. This automatic normalization worked reasonably well for 4D-Dress, Hi4D, EgoExo4D, and SelfCap. For Panoptic and DexYCB, we applied manual similarity transforms, which are encoded in the respective dataloaders. Robust, general-purpose normalization remains an open challenge.
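For illustration, a manual similarity transform can be applied consistently to the 3D inputs before tracking. The sketch below is not the repo's normalization code; it assumes world-to-camera extrinsics of shape [..., 3, 4], metric depths, and 3D world points (e.g. the xyz coordinates of the query points), which should be double-checked against your data:

```python
# A minimal sketch of a manual similarity transform (scale s, rotation R_s,
# translation t_s) applied consistently to 3D points, extrinsics, and depths.
# Assumes world-to-camera extrinsics [..., 3, 4]; verify against your data.
import torch

def apply_similarity_transform(points_xyz, extrs, depths, s, R_s, t_s):
    # World points: x' = s * R_s @ x + t_s
    points_xyz = s * points_xyz @ R_s.T + t_s
    # Cameras move with the scene: R' = R @ R_s^T, t' = s * t - R' @ t_s
    R, t = extrs[..., :3], extrs[..., 3]
    R_new = R @ R_s.T
    t_new = s * t - R_new @ t_s
    extrs = torch.cat([R_new, t_new[..., None]], dim=-1)
    # Distances along the camera ray scale by s, so the depth maps do too.
    depths = s * depths
    return points_xyz, extrs, depths
```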
[Challenges and future directions - click to expand]
The central challenge in multi-view 3D point tracking is 4D reconstruction: obtaining depth maps that are accurate, temporally consistent, and available in real time, especially under sparse-view setups. MVTracker performs well when sensor depth and camera calibration are provided, but in settings where both must be estimated, errors in reconstruction quickly make tracking unreliable. While learned motion priors help tolerate moderate noise, they cannot replace a robust reconstruction backbone. We believe progress will hinge on methods that jointly solve depth estimation and tracking for mutual refinement, or large-scale foundation models for 4D reconstruction and tracking that fully leverage data and compute. We hope the community will direct future efforts toward this goal.
Our code builds upon and was inspired by many prior works, including SpaTracker, CoTracker, and DUSt3R. We thank the authors for releasing their code and pretrained models. We are also grateful to the maintainers of Rerun for their helpful visualization toolkit.
If you find our repository useful, please consider giving it a star ⭐ and citing our work:
```bibtex
@inproceedings{rajic2025mvtracker,
  title     = {Multi-View 3D Point Tracking},
  author    = {Raji{\v{c}}, Frano and Xu, Haofei and Mihajlovic, Marko and Li, Siyuan and Demir, Irem and G{\"u}ndo{\u{g}}du, Emircan and Ke, Lei and Prokudin, Sergey and Pollefeys, Marc and Tang, Siyu},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2025}
}
```


