Optimize RetinaNet inference time #2799
Closed
Description
🚀 Feature
The postprocessing step in RetinaNet is slow: as of today, end-to-end RetinaNet inference takes almost twice as long as Faster R-CNN.
In particular, the per-class loop in vision/torchvision/models/detection/retinanet.py (lines 442 to 471 in 5bb81c8) is the bottleneck:
    for class_index in range(num_classes):
        # remove low scoring boxes
        inds = torch.gt(scores_per_image[:, class_index], self.score_thresh)
        boxes_per_class, scores_per_class, labels_per_class = \
            boxes_per_image[inds], scores_per_image[inds, class_index], labels_per_image[inds, class_index]
        other_outputs_per_class = [(k, v[inds]) for k, v in other_outputs_per_image]

        # remove empty boxes
        keep = box_ops.remove_small_boxes(boxes_per_class, min_size=1e-2)
        boxes_per_class, scores_per_class, labels_per_class = \
            boxes_per_class[keep], scores_per_class[keep], labels_per_class[keep]
        other_outputs_per_class = [(k, v[keep]) for k, v in other_outputs_per_class]

        # non-maximum suppression, independently done per class
        keep = box_ops.nms(boxes_per_class, scores_per_class, self.nms_thresh)

        # keep only topk scoring predictions
        keep = keep[:self.detections_per_img]
        boxes_per_class, scores_per_class, labels_per_class = \
            boxes_per_class[keep], scores_per_class[keep], labels_per_class[keep]
        other_outputs_per_class = [(k, v[keep]) for k, v in other_outputs_per_class]

        image_boxes.append(boxes_per_class)
        image_scores.append(scores_per_class)
        image_labels.append(labels_per_class)

        for k, v in other_outputs_per_class:
            if k not in image_other_outputs:
                image_other_outputs[k] = []
            image_other_outputs[k].append(v)
For reference, Detectron2 has already sped up RetinaNet inference several times, with the latest optimization in facebookresearch/detectron2@8999946. It also batches inference over classes, so the only remaining for loop is over the feature maps, whose number is much smaller than the number of COCO classes.
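To illustrate what batching over classes could look like, here is a minimal, dependency-free sketch. It is not the Detectron2 or torchvision implementation. It thresholds every (anchor, class) score in one pass and then runs a single NMS in which each box is shifted by a class-dependent offset, so boxes of different classes can never suppress each other. The names `class_batched_nms` and `postprocess` are hypothetical. The sketch assumes non-negative box coordinates, skips the `remove_small_boxes` step, and keeps the top `detections_per_img` detections overall rather than per class, which differs from the loop above.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_thresh: float) -> List[int]:
    """Greedy NMS; returns kept indices sorted by decreasing score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


def class_batched_nms(boxes, scores, labels, iou_thresh):
    """Hypothetical helper: one NMS call covering all classes.

    Each box is shifted by label * (max coordinate + 1), so boxes with
    different labels never overlap. Assumes non-negative coordinates.
    """
    if not boxes:
        return []
    offset = max(max(b) for b in boxes) + 1.0
    shifted = [tuple(c + lab * offset for c in b) for b, lab in zip(boxes, labels)]
    return nms(shifted, scores, iou_thresh)


def postprocess(boxes, class_scores, score_thresh, iou_thresh, detections_per_img):
    """Hypothetical helper. boxes: N decoded boxes; class_scores: N rows of C scores."""
    cand_boxes, cand_scores, cand_labels = [], [], []
    # Threshold all (anchor, class) pairs in one pass instead of once per class.
    for box, row in zip(boxes, class_scores):
        for label, score in enumerate(row):
            if score > score_thresh:
                cand_boxes.append(box)
                cand_scores.append(score)
                cand_labels.append(label)
    # Single NMS over every class, then keep the top-scoring detections overall.
    keep = class_batched_nms(cand_boxes, cand_scores, cand_labels, iou_thresh)
    keep = keep[:detections_per_img]
    return [(cand_boxes[i], cand_scores[i], cand_labels[i]) for i in keep]
```

With tensors, the same offset idea is available as `torchvision.ops.batched_nms(boxes, scores, idxs, iou_threshold)`, which would replace the pure-Python `nms` and `class_batched_nms` here.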