Project Page | Paper | BibTeX
This is the official implementation of PointSt3R, a variant of MASt3R fine-tuned to better handle dynamic point tracking.

Our work was conducted on ARM architecture, so the environment was created in the following way:
```bash
conda create -n pointst3r python=3.10
conda activate pointst3r
conda install pytorch-gpu -c conda-forge
conda install --no-deps conda-forge::torchvision
conda install imageio
conda install mediapy
pip install pillow-heif
pip install pyrender
pip install kapture
pip install numpy-quaternion
pip install boto3
pip install tensorflow
pip install wandb
pip install tensorboard
pip install prettytable
pip install scikit-image
pip install scikit-learn
pip install pypng
```
If you are not using ARM, it may be easier to follow the installation instructions in the original MASt3R repo.
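As a quick sanity check (ours, not part of the repo), you can confirm the environment imports cleanly and sees an accelerator:

```python
import torch
import torchvision

# Minimal post-install check: both packages import, and an accelerator is
# visible (CUDA on most clusters; adjust the check for your machine).
print("torch", torch.__version__, "| torchvision", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
```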
You can download the PointSt3R models (with and without visibility), trained with 95% dynamic correspondences per batch, from the following OneDrive link.
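Once downloaded, a quick way to confirm a checkpoint is intact; we assume here the usual DUSt3R/MASt3R training-checkpoint layout (a plain PyTorch dict), so inspect the keys rather than relying on any particular one:

```python
import torch

# Load on CPU just to verify the file is readable. weights_only=False is
# needed on newer PyTorch versions because training checkpoints typically
# pickle more than bare tensors.
ckpt = torch.load("checkpoints/PointSt3R_95.pth", map_location="cpu", weights_only=False)
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))
```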
Links for downloading the evaluation datasets used in the paper are as follows:
- TAP-Vid-DAVIS, TAP-Vid-RGB-Stacking & RoboTAP
- EgoPoints
- PointOdyssey Static/Dynamic Split
- Pstudio Minival
The evaluations for DAVIS, RoboTAP and RGB-Stacking can all be run with the following script:
```bash
python3 pointst3r_tapnet_eval.py --checkpoint=checkpoints/PointSt3R_95.pth --input_yres=384 --dataset_location=/your/path/that/contains/data --dataset_name=[davis, robo or rgb] --split=[0, 1, 2, 3, 4 or None]
```
Note that a split number can be defined when evaluating RoboTAP.
The EgoPoints evaluation can be run with:

```bash
python3 pointst3r_ego_points_eval.py --checkpoint=checkpoints/PointSt3R_95.pth --input_yres=384 --eval_folder=/your/path/that/contains/ego_points
```
The PointOdyssey static/dynamic evaluation can be run with:

```bash
python3 pointst3r_po_stat_dyn_eval.py --checkpoint=checkpoints/PointSt3R_95.pth --input_yres=384 --dataset_location=/your/path/to/pointodyssey_v2/test --annots_location=/your/path/to/pointodyssey_v2/static_dynamic_test
```
For the Pstudio minival 3D evaluation, first generate the 3D tracks using the following command (--split chooses which 10 of the 50 minival files to generate, as the evaluation takes a long time):
```bash
python3 pointst3r_pstudio_3d_eval.py --checkpoint=checkpoints/PointSt3R_95.pth --input_yres=288 --save_folder=pstudio_minival_results_PointSt3R_95 --dataset_location=/your/path/to/tapvid3d_datasets --split=[0, 1, 2, 3 or 4]
```
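If you want all 50 files, a hypothetical convenience wrapper that simply loops over the five splits back to back could look like this:

```python
import subprocess

# Run each of the five minival splits sequentially; every split generates
# tracks for 10 of the 50 files.
for split in range(5):
    subprocess.run(
        ["python3", "pointst3r_pstudio_3d_eval.py",
         "--checkpoint=checkpoints/PointSt3R_95.pth",
         "--input_yres=288",
         "--save_folder=pstudio_minival_results_PointSt3R_95",
         "--dataset_location=/your/path/to/tapvid3d_datasets",
         f"--split={split}"],
        check=True,
    )
```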
Then use this script to evaluate the tracks. For our evaluation, we edit the script to use the following options:
- Scaling = median
- Thresholds = [0.1, 0.3, 0.5, 1.0]
- Use Fixed Thresholds = True
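For intuition only (this is not the evaluation script itself), one common reading of these options: median scaling aligns predicted tracks to ground truth by a ratio of median magnitudes, and accuracy is then the fraction of points within each fixed distance threshold:

```python
import numpy as np

def median_scale(pred_tracks, gt_tracks):
    # Align overall scale via the ratio of median point magnitudes; shown
    # purely to illustrate what "median" scaling can mean here.
    pred = np.linalg.norm(pred_tracks.reshape(-1, 3), axis=-1)
    gt = np.linalg.norm(gt_tracks.reshape(-1, 3), axis=-1)
    return np.median(gt) / np.median(pred)

def fixed_threshold_accuracy(pred_tracks, gt_tracks, thresholds=(0.1, 0.3, 0.5, 1.0)):
    # Fraction of 3D points whose error falls under each fixed threshold.
    err = np.linalg.norm(pred_tracks - gt_tracks, axis=-1)
    return {t: float((err < t).mean()) for t in thresholds}
```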
Download the baseline checkpoint MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric (available from the official MASt3R repository) and place it in the checkpoints/ folder.
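For convenience, a small download sketch; the URL below is the one published in the official MASt3R repository at the time of writing, so check there if it has moved:

```python
import urllib.request
from pathlib import Path

# Fetch the MASt3R baseline checkpoint into checkpoints/, matching the path
# the training commands below expect.
URL = ("https://download.europe.naverlabs.com/ComputerVision/MASt3R/"
       "MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth")
Path("checkpoints").mkdir(exist_ok=True)
urllib.request.urlretrieve(URL, "checkpoints/" + URL.rsplit("/", 1)[-1])
```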
You will need to download the following datasets:
| Dataset | Download Link | Post-Processing |
|---|---|---|
| PointOdyssey | MonSt3R | N/A |
| Kubric | CoTracker3 Kubric dataset | N/A |
| DynamicReplica | DynamicStereo | format_dr_annots.py |
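Both training commands below sample n_corres=8192 correspondences per image pair with dyn_ratio=0.95, i.e. 95% dynamic correspondences per batch. As a rough, hypothetical illustration of what that ratio controls (this is not the repo's actual sampler):

```python
import numpy as np

def sample_correspondences(dynamic_idx, static_idx, n_corres=8192, dyn_ratio=0.95, seed=0):
    # Take ~dyn_ratio of the correspondence budget from dynamic points and the
    # remainder from static ones, capped by how many of each are available.
    rng = np.random.default_rng(seed)
    n_dyn = min(int(round(n_corres * dyn_ratio)), len(dynamic_idx))
    n_stat = min(n_corres - n_dyn, len(static_idx))
    return np.concatenate([
        rng.choice(dynamic_idx, size=n_dyn, replace=False),
        rng.choice(static_idx, size=n_stat, replace=False),
    ])
```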
To train PointSt3R without visibility, run the following:
```bash
torchrun --nproc_per_node=4 --master_port=29350 train.py \
--train_dataset "3_334 @ PointOdysseyDUSt3R(2, 16, [10,30,50,70,90,110,130,150,170], 2, False, True, 'linear_1_2', 0, False, False, False, split='train', ROOT='/your/path/to/pointodyssey_v2', aug_crop='auto', aug_monocular=0.005, aug_rot90='diff', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], n_corres=8192, nneg=0.5, use_soft_negs=False, dyn_ratio=0.95, transform=ColorJitter) + 3_333 @ CotrackerKubricDUSt3R(2, 16, [10,20,30,40,50,60,70,80,90], 2, False, True, 'linear_1_2', 0, False, split='train', ROOT='/your/path/to/CoTracker3_Kubric', aug_crop='auto', aug_monocular=0.005, aug_rot90='diff', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], n_corres=8192, nneg=0.5, use_soft_negs=False, dyn_ratio=0.95, transform=ColorJitter) + 3_333 @ DynamicReplicaDUSt3R(2, 16, [10,30,50,70,90,110,130,150,170], 2, False, True, 'linear_1_2', 0, split='train', ROOT='/your/path/to/dynamic_stereo/dynamic_replica_data/train', aug_crop='auto', aug_monocular=0.005, aug_rot90='diff', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], n_corres=8192, nneg=0.5, use_soft_negs=False, dyn_ratio=0.95, transform=ColorJitter)" \
--test_dataset "1000 @ PointOdysseyDUSt3R(2, 16, [1,2,3,4,5,6,7,8,9], 2, False, True, None, 0, False, False, False, split='test', ROOT='/your/path/to/pointodyssey_v2', resolution=[(512, 384)], n_corres=1024, use_soft_negs=False, seed=777)" \
--model "AsymmetricMASt3R(pos_embed='RoPE100', patch_embed_cls='ManyAR_PatchEmbed', img_size=(512, 512), head_type='catmlp+dpt', output_mode='pts3d+desc24', depth_mode=('exp', -inf, inf), conf_mode=('exp', 1, inf), enc_embed_dim=1024, enc_depth=24, enc_num_heads=16, dec_embed_dim=768, dec_depth=12, dec_num_heads=12, two_confs=True, desc_conf_mode=('exp', 0, inf), freeze='encoder')" \
--train_criterion "ConfLoss(Regr3D(L21, norm_mode='?avg_dis'), alpha=0.2) + 0.075*ConfMatchingLoss(MatchingLoss(InfoNCE(mode='proper', temperature=0.05), negatives_padding=0, blocksize=8192), alpha=10.0, confmode='mean')" \
--test_criterion "Regr3D(L21, norm_mode='?avg_dis', gt_scale=True, sky_loss_value=0) + -1.*MatchingLoss(APLoss(nq='torch', fp=torch.float16), negatives_padding=12288)" \
--pretrained "checkpoints/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth" \
--lr 0.00005 --min_lr 1e-06 --warmup_epochs 3 --epochs 50 --batch_size 4 --accum_iter 1 \
--save_freq 1 --keep_freq 1 --eval_freq 1 --print_freq=10 --disable_cudnn_benchmark \
--output_dir "results/PointSt3R_95.pth"To train PointSt3R with visibility, run the following:
```bash
torchrun --nproc_per_node=4 --master_port=29350 train.py \
--train_dataset "3_334 @ PointOdysseyDUSt3R(2, 16, [10,30,50,70,90,110,130,150,170], 2, False, True, 'linear_1_2', 0, False, False, True, split='train', ROOT='/your/path/to/pointodyssey_v2', aug_crop='auto', aug_monocular=0.005, aug_rot90='diff', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], n_corres=8192, nneg=0.5, use_soft_negs=False, dyn_ratio=0.95, transform=ColorJitter) + 3_333 @ CotrackerKubricVISDUSt3R(2, 16, [10,20,30,40,50,60,70,80,90], 2, False, True, 'linear_1_2', 0, False, True, split='train', ROOT='/your/path/to/CoTracker3_Kubric', aug_crop='auto', aug_monocular=0.005, aug_rot90='diff', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], n_corres=8192, nneg=0.5, use_soft_negs=False, dyn_ratio=0.95, transform=ColorJitter) + 3_333 @ DynamicReplicaVISDUSt3R(2, 16, [10,30,50,70,90,110,130,150,170], 2, False, True, 'linear_1_2', 0, True, split='train', ROOT='/your/path/to/dynamic_stereo/dynamic_replica_data/train', aug_crop='auto', aug_monocular=0.005, aug_rot90='diff', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], n_corres=8192, nneg=0.5, use_soft_negs=False, dyn_ratio=0.95, transform=ColorJitter)" --test_dataset "1000 @ PointOdysseyDUSt3R(2, 16, [1,2,3,4,5,6,7,8,9], 2, False, True, None, 0, False, False, True, split='test', ROOT='/your/path/to/pointodyssey_v2', resolution=[(512, 384)], n_corres=1024, use_soft_negs=False, seed=777)" \
--model "AsymmetricMASt3R(pos_embed='RoPE100', patch_embed_cls='ManyAR_PatchEmbed', img_size=(512, 512), head_type='catmlp+dpt', output_mode='pts3d+desc24+vis', depth_mode=('exp', -inf, inf), conf_mode=('exp', 1, inf), enc_embed_dim=1024, enc_depth=24, enc_num_heads=16, dec_embed_dim=768, dec_depth=12, dec_num_heads=12, two_confs=True, desc_conf_mode=('exp', 0, inf), freeze='encoder')" \
--train_criterion "ConfLoss(Regr3D(L21, norm_mode='?avg_dis'), alpha=0.2) + 0.075*ConfMatchingLoss(MatchingLoss(InfoNCE(mode='proper', temperature=0.05), negatives_padding=0, blocksize=8192), alpha=10.0, confmode='mean') + BalancedVisHeadLossV2(MaskCE(reduction='mean'))" \
--test_criterion "Regr3D(L21, norm_mode='?avg_dis', gt_scale=True, sky_loss_value=0) + -1.*MatchingLoss(APLoss(nq='torch', fp=torch.float16), negatives_padding=12288) + BalancedVisHeadLossV2(MaskCE(reduction='mean'))" \
--pretrained "checkpoints/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth" \
--lr 0.00005 --min_lr 1e-06 --warmup_epochs 3 --epochs 50 --batch_size 4 --accum_iter 1 \
--save_freq 1 --keep_freq 1 --eval_freq 1 --print_freq=10 --disable_cudnn_benchmark \
--output_dir "results/PointSt3R_95_w_vis.pth"@inproceedings{
title = {{PointSt3R}: Point Tracking through 3D Ground Correspondence},
author={Guerrier, Rhodri and Harley, Adam W. and Damen, Dima},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
year={2026}
}
```