Hi,
I am using the isaac_ros_yolov8 package for real-time YOLOv8 inference on a video stream. I have noticed a significant drop in the frame rate of the published ROS 2 images (22 FPS) when this node is in the pipeline, even though trtexec reports a throughput of 45 FPS on the Jetson Orin NX.
Upon further investigation, I found that the bottleneck is the isaac_ros_yolov8 decoder node, which runs NMS on the CPU via cv::dnn::NMSBoxes; this call is extremely slow (perhaps because OpenCV is not built with CUDA support in the Isaac ROS Humble Docker image):
https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_object_detection/blob/main/isaac_ros_yolov8/src/yolov8_decoder_node.cpp#L108
I also discovered that Ultralytics can export the non-maximum suppression (NMS) step as part of the ONNX model, and that TensorRT has plugin support for NMS (e.g. the EfficientNMS_TRT plugin).
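To make the question concrete, here is a rough sketch of how I imagine NMS could be baked into the exported model by appending an EfficientNMS_TRT node with onnx_graphsurgeon, so that NMS runs inside the TensorRT engine. This is based on my own assumptions (not on the Isaac ROS or Ultralytics docs): it assumes the default Ultralytics export with output "output0" of shape [1, 84, 8400] (640x640 input, 80 classes), opset >= 13, and the usual plugin attribute names, so it may well need adjusting:

```python
# Rough sketch (my own assumptions, not from Isaac ROS / Ultralytics docs):
# append a TensorRT EfficientNMS_TRT node to a plain YOLOv8 ONNX export so
# that NMS runs inside the TensorRT engine. Assumes the default Ultralytics
# export: output "output0" with shape [1, 84, 8400] (640x640, 80 classes)
# and opset >= 13 (Split takes the sizes as an input).
import numpy as np
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("yolov8s.onnx"))
raw = graph.outputs[0]  # assumed shape [1, 84, 8400]

# [1, 84, 8400] -> [1, 8400, 84] so boxes/scores can be split on the last axis
preds = gs.Variable("preds", dtype=np.float32, shape=[1, 8400, 84])
graph.nodes.append(gs.Node("Transpose", attrs={"perm": [0, 2, 1]},
                           inputs=[raw], outputs=[preds]))

# First 4 channels are (cx, cy, w, h), the remaining 80 are per-class scores
boxes = gs.Variable("boxes", dtype=np.float32, shape=[1, 8400, 4])
scores = gs.Variable("scores", dtype=np.float32, shape=[1, 8400, 80])
split_sizes = gs.Constant("split_sizes", np.array([4, 80], dtype=np.int64))
graph.nodes.append(gs.Node("Split", attrs={"axis": 2},
                           inputs=[preds, split_sizes], outputs=[boxes, scores]))

# EfficientNMS_TRT emits fixed-size tensors: num_dets, boxes, scores, classes
num_dets = gs.Variable("num_dets", dtype=np.int32, shape=[1, 1])
det_boxes = gs.Variable("det_boxes", dtype=np.float32, shape=[1, 100, 4])
det_scores = gs.Variable("det_scores", dtype=np.float32, shape=[1, 100])
det_classes = gs.Variable("det_classes", dtype=np.int32, shape=[1, 100])
graph.nodes.append(gs.Node(
    "EfficientNMS_TRT",
    attrs={"plugin_version": "1", "max_output_boxes": 100,
           "score_threshold": 0.25, "iou_threshold": 0.45,
           "background_class": -1, "score_activation": 0,
           "box_coding": 1},  # box_coding=1: boxes given as (cx, cy, w, h)
    inputs=[boxes, scores],
    outputs=[num_dets, det_boxes, det_scores, det_classes]))

graph.outputs = [num_dets, det_boxes, det_scores, det_classes]
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "yolov8s_nms.onnx")
```

With a model like this, the decoder would presumably only have to copy the already-suppressed detections into the Detection2DArray message instead of running NMS itself.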
I have two questions:
- Is it possible to export the YOLOv8 model with NMS included in the ONNX graph and use it with the isaac_ros_yolov8 package? If yes, what would the decoder implementation look like?
- Is it possible to use a GPU-accelerated NMS implementation instead of the OpenCV NMSBoxes call? (A minimal example of the kind of operation I mean is sketched after this list.)
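For context on the second question, here is a minimal illustration of a GPU NMS primitive, using torchvision's CUDA kernel. I realize the Isaac ROS decoder is C++, so this is not a drop-in patch for the node; it is only meant to show the kind of GPU-side NMS I am asking about:

```python
# Minimal illustration of a GPU NMS primitive (torchvision's CUDA kernel).
# Not a patch for the C++ decoder node, just an example of the operation.
import torch
import torchvision

boxes = torch.rand(8400, 4, device="cuda")   # (x1, y1, x2, y2)
boxes[:, 2:] += boxes[:, :2]                 # ensure x2 > x1 and y2 > y1
scores = torch.rand(8400, device="cuda")

keep = torchvision.ops.nms(boxes, scores, iou_threshold=0.45)  # runs on the GPU
print(keep.numel(), "boxes kept")
```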
Thank you.