Van Nguyen Nguyen · Christian Forster · Sindi Shkodrani · Bugra Tekin · Vincent Lepetit · Cem Keskin · Tomas Hodan
gotrack_teaser.mp4
This is the official implementation of our work GoTrack, which proposes an efficient and accurate CAD-based method for 6DoF pose refinement and tracking of unseen objects. Given a CAD model of an object, an RGB image with known intrinsics that shows the object in an unknown pose, and an initial object pose, GoTrack refines the object pose such that the 2D projection of the model aligns closely with the object's appearance in the image.
We also incorporate two existing methods, CNOS for 2D object detection and FoundPose for coarse pose estimation (both also built on top of DINOv2), to form a three-stage pose estimation pipeline. Note that the results of CNOS and FoundPose are slightly different from the original implementations, as we have simplified the setup and used consistent image resolutions and rendering settings across all three methods.
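The data flow of the three-stage pipeline can be sketched as follows. All three stages here are hypothetical stubs standing in for CNOS, FoundPose, and GoTrack; only the detection → coarse pose → refinement flow reflects the pipeline described above, not the actual APIs of this repository.

```python
# Conceptual sketch of the three-stage pipeline: 2D detection -> coarse pose
# -> refinement. Every function below is a placeholder, not this repo's API.
from dataclasses import dataclass


@dataclass
class Detection:
    obj_id: int
    bbox: tuple  # (x, y, w, h) in pixels


def detect(image) -> list[Detection]:
    """Stand-in for CNOS: returns 2D detections for the image."""
    return [Detection(obj_id=1, bbox=(10, 10, 64, 64))]


def coarse_pose(image, det: Detection) -> dict:
    """Stand-in for FoundPose: turns a detection into a coarse pose hypothesis."""
    return {"obj_id": det.obj_id, "pose": "coarse"}


def refine(image, hyp: dict, num_iterations: int = 5) -> dict:
    """Stand-in for GoTrack: iteratively refines a coarse pose hypothesis."""
    return {**hyp, "pose": "refined"}


def estimate_poses(image) -> list[dict]:
    """Run all three stages: each detection is lifted to a refined pose."""
    return [refine(image, coarse_pose(image, det)) for det in detect(image)]
```

Each stage consumes the previous stage's output, which is why the detections and coarse poses downloaded below can be plugged in directly as inputs to the refinement stage.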
Additionally, this repo supports the fisheye camera model used in the HOT3D dataset.
This repository uses hydra to manage the configuration files. Please make sure to update root_dir in the user configuration located at configs/user/default.yaml before conducting any experiments.
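For reference, the relevant entry might look like the following. This is a minimal sketch only; the actual file may contain additional keys, and the path shown is a placeholder you should replace with your own.

```yaml
# configs/user/default.yaml (sketch; the path below is a placeholder)
root_dir: /path/to/your/workspace
```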
Download the code with the git submodules, navigate to the folder, and setup the conda environment:
git clone --recurse-submodules https://github.com/facebookresearch/gotrack
cd gotrack
conda env create -f environment.yml
bash scripts/env.sh # bop_toolkit and dinov2

For pose refinement, download the BOP datasets and the predictions of coarse pose methods from the BOP Challenge (FoundPose, GigaPose):
# BOP-Classic-Core datasets
python -m scripts.download_bop_classic_core
python -m scripts.download_coarse_poses_bop_classic_core
python -m scripts.download_default_detections_bop_classic_core
# BOP-H3 datasets
python -m scripts.download_bop_h3
python -m scripts.download_default_detections_bop_h3

After downloading, the datasets should be organized in the following directory structure:
bop_datasets/ # This is your $BOP_PATH
├── lmo/ # Dataset directory for LM-O
│   ├── camera.json
│   ├── dataset_info.md
│   ├── models/ # 3D models of the objects
│   ├── models_eval/ # Simplified models for evaluation
│   ├── test/ # Test images and annotations
│   └── ...
├── tudl/
├── ...
├── default_detections/
│   ├── classic_bop23_model_based_unseen/
│   └── cnos-fastsam/
│       ├── cnos-fastsam_ycbv-test_f4f2127c-6f59-447c-95b3-28e1e591f1a1.json
│       └── ...
├── coarse_poses/
│   ├── foundPose/
│   │   ├── foundPose_ycbv-test_f4f2127c-6f59-447c-95b3-28e1e591f1a1.json
│   │   └── ...
│   └── gigaPose/
│       └── ...
export DATASET_NAME=lmo
export COARSE_POSE_METHOD=foundpose # or gigapose
python -m scripts.inference_gotrack mode=pose_refinement dataset_name=$DATASET_NAME coarse_pose_method=$COARSE_POSE_METHOD

Similar to FoundPose, we report the results produced by this open-source repository and compare them with the published results. The main difference between the implementation used to obtain the scores reported in the paper and the open-source implementation is the renderer (for the paper we used an internal renderer):
| Dataset | Published AR | Reproduced AR |
|---|---|---|
| LMO | 56.5 | 56.2 |
| T-LESS | 50.4 | 48.5 |
| YCB-V | 63.1 | 62.9 |
Use the command below to run the three-stage pose estimation pipeline built on top of the frozen features of DINOv2: 2D object detection (CNOS), coarse pose estimation (FoundPose), and refinement (GoTrack):
export DATASET_NAME=lmo
python -m scripts.inference_pose_estimation dataset_name=$DATASET_NAME mode=localization
# For using the default detections
python -m scripts.inference_pose_estimation dataset_name=$DATASET_NAME mode=localization model.use_default_detections=true

Note that when fast_pose_estimation=true, the pipeline retrieves only the nearest template during the coarse pose estimation stage (used in FoundPose) and uses num_iterations_refinement=1 for the pose refinement stage. These settings help speed up inference. Below are the visualization results of the pipeline on the LMO dataset:
Pose tracking using frame-to-frame flow is currently not implemented in this repository. However, tracking using model-to-frame flow can be achieved by propagating the pose estimated in the previous frame to the next frame and using it as the initial pose for refinement with the current GoTrack refiner.
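The pose-propagation loop described above can be sketched as follows. This is a minimal, self-contained illustration: `refine` here is a hypothetical stub (a damped step of a 4x4 pose matrix toward a target) standing in for the GoTrack refiner, which in reality also takes the CAD model, image, and intrinsics; only the propagation logic in `track` mirrors the scheme described above.

```python
# Sketch of model-to-frame tracking by pose propagation. `refine` is a
# placeholder for the GoTrack refiner, NOT the repo's actual API: it simply
# moves the current 4x4 pose estimate partway toward a target pose so that
# the loop is runnable end to end.
import numpy as np


def refine(pose_init: np.ndarray, target: np.ndarray, step: float = 0.5) -> np.ndarray:
    """Stand-in refiner: damped step from the initial pose toward `target`."""
    return pose_init + step * (target - pose_init)


def track(pose_0: np.ndarray, frames: list[np.ndarray]) -> list[np.ndarray]:
    """Propagate the previous frame's refined pose as the next frame's init.

    `frames` here is a list of per-frame target poses (a proxy for the image
    evidence the real refiner would consume).
    """
    poses = []
    pose = pose_0
    for target in frames:
        pose = refine(pose, target)  # previous estimate serves as the init
        poses.append(pose)
    return poses
```

With a sufficiently good initial pose and small inter-frame motion, each frame's refinement starts close to the solution, which is exactly why propagating the previous estimate works as a tracking scheme.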
On HOT3D dataset (Aria):
hot3d_aria.mp4
On HOT3D dataset (Quest 3):
hot3d_quest3.mp4
On YCB-V dataset:
ycbv.mp4
If you find this work useful in your research, please cite:
@article{nguyen2025gotrack,
  author  = {Nguyen, Van Nguyen and Forster, Christian and Tekin, Bugra and Shkodrani, Sindi and Lepetit, Vincent and Keskin, Cem and Hoda{\v{n}}, Tom{\'a}{\v{s}}},
  title   = {GoTrack: Generic 6DoF Object Pose Refinement and Tracking},
  journal = {Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year    = {2025},
}
Thanks to the following repositories: FoundPose, GigaPose, PFA, MegaPose, CNOS, DINOv2, BOP Toolkit, DPT, Crocov2, and Dust3r.
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
