@chilo-ms chilo-ms commented Feb 14, 2025

This PR removes the implicit filtering that kept DDS (data-dependent shape) ops from running on TRT. In other words, DDS nodes now run on TRT by default if TRT supports them.

It also adds a new provider option, trt_op_types_to_exclude:

  • Users can provide a list of op types to exclude from running on TRT
  • e.g. trt_op_types_to_exclude="NonMaxSuppression,NonZero,RoiAlign"

(This PR basically adds back the feature (#22681) that was previously held from merging.)

[Note]
There may be performance issues in TRT 10 when running models that contain DDS operations such as NonMaxSuppression, NonZero, and RoiAlign (e.g., Faster-RCNN).
If users encounter significant performance degradation, we suggest excluding those DDS ops from running on TRT, i.e. trt_op_types_to_exclude="NonMaxSuppression,NonZero,RoiAlign". Those DDS nodes will then be run by the CUDA EP or CPU.
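
For illustration, here is a minimal sketch of how the option might be passed from Python. The helper name `build_trt_providers` and the model path in the commented usage are hypothetical. The option key and the op names come from this PR, and the sketch assumes the provider list is passed to `onnxruntime.InferenceSession` in the usual `(name, options)` tuple form.

```python
def build_trt_providers(excluded_ops):
    """Build a provider list that keeps the given op types off TRT.

    `build_trt_providers` is a hypothetical helper, not part of ONNX Runtime.
    """
    # The option value is a single comma-separated string of op types.
    trt_options = {"trt_op_types_to_exclude": ",".join(excluded_ops)}
    return [
        ("TensorrtExecutionProvider", trt_options),
        # Fallbacks for the excluded DDS nodes, in priority order.
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ]


providers = build_trt_providers(["NonMaxSuppression", "NonZero", "RoiAlign"])

# Usage, assuming onnxruntime is installed with TensorRT EP support and
# "model.onnx" (a placeholder path) exists:
#   import onnxruntime as ort
#   session = ort.InferenceSession("model.onnx", providers=providers)
```

Listing the CUDA and CPU providers after TRT gives the excluded nodes somewhere to run, matching the fallback described in the note above.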

@chilo-ms
Contributor Author

This PR also modifies the NonZero and NMS unit tests to accommodate TRT's output.

yf711 commented Feb 21, 2025

I've tested on Windows with FasterRCNN/MaskRCNN and see no perf regression on latency after enabling DDS.

@chilo-ms chilo-ms merged commit 23f787e into main Feb 21, 2025
96 of 98 checks passed
@chilo-ms chilo-ms deleted the chi/trt_ops_to_exclude branch February 21, 2025 18:24
guschmue pushed a commit that referenced this pull request Mar 6, 2025
…TRT (#23705)

This PR removes the implicit filtering that kept DDS ops from running on TRT.
In other words, DDS nodes now run on TRT by default if TRT supports them.

It also adds a new provider option, `trt_op_types_to_exclude`:
- Users can provide a list of op types to exclude from running on TRT
- e.g. `trt_op_types_to_exclude="NonMaxSuppression,NonZero,RoiAlign"`

(This PR basically adds back the feature (#22681) that was previously
held from merging.)

[Note]
There may be performance issues in TRT 10 when running models that
contain DDS operations such as NonMaxSuppression, NonZero, and
RoiAlign (e.g., Faster-RCNN).
If users encounter significant performance degradation, we suggest
excluding those DDS ops from running on TRT, i.e.
`trt_op_types_to_exclude="NonMaxSuppression,NonZero,RoiAlign"`. Those
DDS nodes will then be run by the CUDA EP or CPU.
ashrit-ms pushed a commit that referenced this pull request Mar 17, 2025
…TRT (#23705)
