Change batched NMS threshold for-loop version#4990
Conversation
According to the benchmark link #1311 (comment), for GPU the threshold should be higher (which is why detectron2 used 40k). This PR changes the threshold to be device dependent.
💊 CI failures summary and remediationsAs of commit 51b90a7 (more details on the Dr. CI page):
1 failure not recognized by patterns:
1 job timed out:
This comment was automatically generated by Dr. CI (expand for details).Please report bugs/suggestions to the (internal) Dr. CI Users group. |
NicolasHug
left a comment
There was a problem hiding this comment.
Thanks for the PR @ppwwyyxx
We decided to use the same threshold for CPU and GPU to keep things as simple as possible and avoid potential surprises.
Do you have a use-case where there are more than 4k boxes on a GPU?
|
LVIS has 1200 classes. So in the last NMS of a faster r-cnn trained on LVIS it's common to have 10k-30k boxes. |
NicolasHug
left a comment
There was a problem hiding this comment.
Thanks for the details @ppwwyyxx . The changes LGTM but I'm wondering, did you observe a significant time improvement with the new threshold?
|
IIRC we added thresholding in detectron2 because batched_nms OOM for a specific model with too many boxes. Then we did some benchmarks and determined 40k threshold for GPU. This is similar to the conclusion in #1311. |
fmassa
left a comment
There was a problem hiding this comment.
Thanks for the PR @ppwwyyxx !
I believe this is a net improvement. I think the previous threshold was indeed not ideal for GPU, and pytorch/benchmark#547 (comment) seems to indicate that.
|
Hey @fmassa! You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py |
Summary: torchvision already has the thresholding logic. The threshold in torchvision is not good, which will be fixed in pytorch/vision#4990 Reviewed By: zhanghang1989 Differential Revision: D32661585 fbshipit-source-id: 971b68a914c2bba29dff37298c38e31b02735c1d
Summary: According to the benchmark link #1311 (comment), for GPU the threshold should be higher (which is why detectron2 used 40k). This PR changes the threshold to be device dependent. Reviewed By: NicolasHug Differential Revision: D32759198 fbshipit-source-id: df6fff4b7fed27e41d9cfeb51d17a94b4511b4e7 Co-authored-by: Nicolas Hug <[email protected]> Co-authored-by: Francisco Massa <[email protected]>
According to the benchmark link #1311 (comment), for GPU the threshold should be higher (which is why detectron2 used 40k).
This PR changes the threshold to be device dependent.