Van Nguyen Nguyen · Christian Forster · Sindi Shkodrani · Bugra Tekin · Vincent Lepetit · Cem Keskin · Tomas Hodan
gotrack_teaser.mp4
This is the official implementation of our work GoTrack, which proposes an efficient and accurate CAD-based method for 6DoF pose refinement and tracking of unseen objects. Given a CAD model of an object, an RGB image with known intrinsics that shows the object in an unknown pose, and an initial object pose, GoTrack refines the object pose such that the 2D projection of the model aligns closely with the object's appearance in the image.
We also incorporate two existing methods, CNOS for 2D object detection and FoundPose for coarse pose estimation (both also built on top of DINOv2), to form a three-stage pose estimation pipeline. Note that the results of CNOS and FoundPose are slightly different from the original implementations, as we have simplified the setup and used consistent image resolutions and rendering settings across all three methods.
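The data flow of the three-stage pipeline can be sketched as follows. All three stages here are hypothetical stubs standing in for CNOS, FoundPose, and GoTrack; only the detection → coarse pose → refinement flow reflects the pipeline described above, not the actual APIs of this repository.

```python
# Conceptual sketch of the three-stage pipeline: 2D detection -> coarse pose
# -> refinement. Every function below is a placeholder, not this repo's API.
from dataclasses import dataclass


@dataclass
class Detection:
    obj_id: int
    bbox: tuple  # (x, y, w, h) in pixels


def detect(image) -> list[Detection]:
    """Stand-in for CNOS: returns 2D detections for the image."""
    return [Detection(obj_id=1, bbox=(10, 10, 64, 64))]


def coarse_pose(image, det: Detection) -> dict:
    """Stand-in for FoundPose: turns a detection into a coarse pose hypothesis."""
    return {"obj_id": det.obj_id, "pose": "coarse"}


def refine(image, hyp: dict, num_iterations: int = 5) -> dict:
    """Stand-in for GoTrack: iteratively refines a coarse pose hypothesis."""
    return {**hyp, "pose": "refined"}


def estimate_poses(image) -> list[dict]:
    """Run all three stages: each detection is lifted to a refined pose."""
    return [refine(image, coarse_pose(image, det)) for det in detect(image)]
```

Each stage consumes the previous stage's output, which is why the detections and coarse poses downloaded below can be plugged in directly as inputs to the refinement stage.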
Additionally, this repo supports the fisheye camera model used in the HOT3D dataset.
This repository uses hydra to manage the configuration files. Please make sure to update root_dir in the user configuration located at configs/user/default.yaml before conducting any experiments.
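For reference, the relevant entry might look like the following. This is a minimal sketch only; the actual file may contain additional keys, and the path shown is a placeholder you should replace with your own.

```yaml
# configs/user/default.yaml (sketch; the path below is a placeholder)
root_dir: /path/to/your/workspace
```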
Download the code with the git submodules, navigate to the folder, and setup the conda environment:
git clone --recurse-submodules https://github.com/facebookresearch/gotrack
cd gotrack
conda env create -f environment.yml
bash scripts/env.sh # bop_toolkit and dinov2

For pose refinement, download the BOP datasets and the predictions of coarse pose methods from the BOP Challenge (FoundPose, GigaPose):
# BOP-Classic-Core datasets
python -m scripts.download_bop_classic_core
python -m scripts.download_coarse_poses_bop_classic_core
python -m scripts.download_default_detections_bop_classic_core
# BOP-H3 datasets
python -m scripts.download_bop_h3
python -m scripts.download_default_detections_bop_h3

After downloading, the datasets should be organized in the following directory structure:
bop_datasets/ # This is your $BOP_PATH
├── lmo/ # Dataset directory for LM-O
│   ├── camera.json
│   ├── dataset_info.md
│   ├── models/ # 3D models of the objects
│   ├── models_eval/ # Simplified models for evaluation
│   ├── test/ # Test images and annotations
│   └── ...
├── tudl/
├── ...
├── default_detections/
│   ├── classic_bop23_model_based_unseen/
│   └── cnos-fastsam/
│       ├── cnos-fastsam_ycbv-test_f4f2127c-6f59-447c-95b3-28e1e591f1a1.json
│       └── ...
├── coarse_poses/
│   ├── foundPose/
│   │   ├── foundPose_ycbv-test_f4f2127c-6f59-447c-95b3-28e1e591f1a1.json
│   │   └── ...
│   └── gigaPose/
│       └── ...
export DATASET_NAME=lmo
export COARSE_POSE_METHOD=foundpose # or gigapose
python -m scripts.inference_gotrack mode=pose_refinement dataset_name=$DATASET_NAME coarse_pose_method=$COARSE_POSE_METHOD

Similar to FoundPose, we report the results produced by this open-source repository and compare them with the published results. The main difference between the implementation used to obtain the scores reported in the paper and the open-source implementation is the renderer (for the paper we used an internal renderer):
| Dataset | Published AR | Reproduced AR |
|---|---|---|
| LMO | 56.5 | 56.2 |
| T-LESS | 50.4 | 48.5 |
| YCB-V | 63.1 | 62.9 |
Use the command below to run the three-stage pose estimation pipeline built on top of the frozen features of DINOv2: 2D object detection (CNOS), coarse pose estimation (FoundPose), and refinement (GoTrack):
export DATASET_NAME=lmo
python -m scripts.inference_pose_estimation dataset_name=$DATASET_NAME mode=localization
# For using the default detections
python -m scripts.inference_pose_estimation dataset_name=$DATASET_NAME mode=localization model.use_default_detections=true

Note that when fast_pose_estimation=true, the pipeline retrieves only the nearest template during the coarse pose estimation stage (used in FoundPose) and uses num_iterations_refinement=1 for the pose refinement stage. These settings help speed up inference. Below are the visualization results of the pipeline on the LMO dataset:
Pose tracking using frame-to-frame flow is currently not implemented in this repository. However, tracking using model-to-frame flow can be achieved by propagating the pose estimated in the previous frame to the next frame and using it as the initial pose for refinement with the current GoTrack refiner.
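The pose-propagation loop described above can be sketched as follows. This is a minimal, self-contained illustration: `refine` here is a hypothetical stub (a damped step of a 4x4 pose matrix toward a target) standing in for the GoTrack refiner, which in reality also takes the CAD model, image, and intrinsics; only the propagation logic in `track` mirrors the scheme described above.

```python
# Sketch of model-to-frame tracking by pose propagation. `refine` is a
# placeholder for the GoTrack refiner, NOT the repo's actual API: it simply
# moves the current 4x4 pose estimate partway toward a target pose so that
# the loop is runnable end to end.
import numpy as np


def refine(pose_init: np.ndarray, target: np.ndarray, step: float = 0.5) -> np.ndarray:
    """Stand-in refiner: damped step from the initial pose toward `target`."""
    return pose_init + step * (target - pose_init)


def track(pose_0: np.ndarray, frames: list[np.ndarray]) -> list[np.ndarray]:
    """Propagate the previous frame's refined pose as the next frame's init.

    `frames` here is a list of per-frame target poses (a proxy for the image
    evidence the real refiner would consume).
    """
    poses = []
    pose = pose_0
    for target in frames:
        pose = refine(pose, target)  # previous estimate serves as the init
        poses.append(pose)
    return poses
```

With a sufficiently good initial pose and small inter-frame motion, each frame's refinement starts close to the solution, which is exactly why propagating the previous estimate works as a tracking scheme.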
On HOT3D dataset (Aria):
hot3d_aria.mp4
On HOT3D dataset (Quest 3):
hot3d_quest3.mp4
On YCB-V dataset:
ycbv.mp4
If you find this work useful in your research, please cite:
@article{nguyen2025gotrack,
  author  = {Nguyen, Van Nguyen and Forster, Christian and Tekin, Bugra and Shkodrani, Sindi and Lepetit, Vincent and Keskin, Cem and Hoda{\v{n}}, Tom{\'a}{\v{s}}},
  title   = {GoTrack: Generic 6DoF Object Pose Refinement and Tracking},
  journal = {Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year    = {2025},
}
Thanks to the following repositories: FoundPose, GigaPose, PFA, MegaPose, CNOS, DINOv2, BOP Toolkit, DPT, Crocov2, and Dust3r.
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
