The pipeline detects the object in the image and creates a mask using isaac_ros_rtdetr.
This mask is used by FoundationPose to start iterating on the pose estimation.
A final pose estimation is provided by FoundationPose.
AFAIK, the models used in step 1 are only valid for objects that fall into certain categories (e.g. those covered by SyntheticaDETR or the YCB set). Also, the API indicates that the pose estimation node subscribes to the /segmentation topic, which must be published by the object detection nodes.
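To make sure I'm reading the wiring correctly, below is a stripped-down sketch of how I understand the tutorial launch files connect the detection output to FoundationPose. The plugin names, parameter names, and topic remappings are my best guess from the documentation, not a verified configuration, so please correct me if any of them are wrong.

```python
# Stripped-down sketch of the detection -> mask -> pose wiring. Plugin,
# parameter and topic names below are my best guess from the docs, not a
# verified configuration.
from launch import LaunchDescription
from launch_ros.actions import ComposableNodeContainer
from launch_ros.descriptions import ComposableNode


def generate_launch_description():
    # Step 1 output (isaac_ros_rtdetr) is a 2D bounding-box detection; this
    # node rasterizes it into the binary segmentation mask FoundationPose needs.
    detection2d_to_mask = ComposableNode(
        package='isaac_ros_foundationpose',
        plugin='nvidia::isaac_ros::foundationpose::Detection2DToMask',
        parameters=[{'mask_width': 640, 'mask_height': 480}],     # guessed parameter names
        remappings=[('detection2_d_array', 'detections_output'),  # RT-DETR output (guessed)
                    ('segmentation', 'segmentation')])            # mask consumed below

    # FoundationPose: takes the CAD mesh/texture of the object plus RGB, depth
    # and the segmentation mask, and publishes the estimated pose.
    foundationpose = ComposableNode(
        package='isaac_ros_foundationpose',
        plugin='nvidia::isaac_ros::foundationpose::FoundationPoseNode',  # guessed plugin name
        parameters=[{
            'mesh_file_path': '/workspace/models/my_object/my_object.obj',  # the CAD data
            'texture_path': '/workspace/models/my_object/texture.png',
        }])

    return LaunchDescription([
        ComposableNodeContainer(
            name='foundationpose_container',
            namespace='',
            package='rclcpp_components',
            executable='component_container_mt',
            composable_node_descriptions=[detection2d_to_mask, foundationpose]),
    ])
```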
With the above in mind, I'd appreciate it if someone could clarify the following questions:
How can I use isaac_ros_foundationpose on a novel, custom object that does not fall into any category of the DetectNet, RT-DETR or YOLOv8 object detection models?
Is it possible to use only CAD data, without any retraining, for this custom, novel object? The documentation states: "FoundationPose is designed to perform pose estimation on previously unseen objects without model retraining."
Yes, a 2D object detection model has to be trained for 3D pose estimation with FoundationPose to work. isaac_ros_foundationpose expects a segmentation mask as one of its inputs. In our tutorials we use SyntheticaDETR for 2D object detection and convert that detection into a segmentation mask using nvidia::isaac_ros::foundationpose::Detection2DToMask.
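Conceptually, that conversion just rasterizes the detected bounding box into a binary mask image at the camera resolution. A rough standalone sketch of the idea (not the actual node implementation, and with the detection fields simplified to plain numbers) would be:

```python
import numpy as np


def bbox_to_mask(center_x, center_y, size_x, size_y, width, height):
    """Rasterize one 2D bounding box (pixel units) into a binary mask image."""
    mask = np.zeros((height, width), dtype=np.uint8)
    x0 = max(int(center_x - size_x / 2), 0)
    y0 = max(int(center_y - size_y / 2), 0)
    x1 = min(int(center_x + size_x / 2), width)
    y1 = min(int(center_y + size_y / 2), height)
    mask[y0:y1, x0:x1] = 255  # pixels inside the box count as "object"
    return mask


# Example: a 200x100 px detection centered at (320, 240) in a 640x480 image.
mask = bbox_to_mask(320, 240, 200, 100, 640, 480)
print(mask.shape, int(mask.sum() / 255))  # (480, 640) 20000
```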
You will have to train a 2D object detection model. Someone else from our team can get back to you on whether/how to do that with Isaac ROS.
The CAD model and a 2D object detection model/segmentation mask are both required. "Without model retraining" refers to not retraining the 3D pose estimation model, i.e. FoundationPose.
I think that should be explicitly stated, at least in the Isaac ROS Pose Estimation overview. I had checked quite a few resources but was not able to find a clear statement on whether retraining was needed.
Here’s an example for DOPE
Thanks! Should I look into TAO to train on objects that don’t fall into the DOPE categories?