Add cuda graph implementation for NV TRT RTX EP #25787
…there is no impact on perf
Pull Request Overview
This PR adds comprehensive CUDA Graph support to the NV TensorRT RTX Execution Provider to improve inference performance by reducing per-kernel launch overhead and enabling better GPU throughput for repeated inference runs.
- Implements graph annotation ID-based CUDA graph management for multi-graph support
- Adds automatic detection and disabling of CUDA graphs for unsupported scenarios (shape tensors)
- Refactors stream management to support both user-provided and internally created streams
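The annotation-ID-based management described above can be illustrated with a small, CUDA-free model. This is only a sketch under stated assumptions, not the EP's actual code: `GraphCacheSketch`, `Launch`, and `kSkipGraph` are hypothetical names, a "captured graph" is modelled as a stored list of callables, and the first run for an ID is assumed to both record and execute the work. The real provider captures on a CUDA stream instead.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for one kernel launch.
using Launch = std::function<void()>;

// Toy model of a per-annotation-ID graph cache (all names are illustrative).
class GraphCacheSketch {
 public:
  // Assumed convention: this ID means "run eagerly, never capture".
  static constexpr int kSkipGraph = -1;

  // Graphs are disabled up front when the model uses shape tensors,
  // mirroring the "automatic disabling" bullet in the overview.
  explicit GraphCacheSketch(bool has_shape_tensors)
      : enabled_(!has_shape_tensors) {}

  // Executes `launches`; returns true only if a stored graph was replayed.
  bool Run(int annotation_id, const std::vector<Launch>& launches) {
    assert(annotation_id >= kSkipGraph);  // IDs below the skip value are invalid here
    if (!enabled_ || annotation_id == kSkipGraph) {
      for (const auto& launch : launches) launch();  // eager path
      return false;
    }
    auto it = graphs_.find(annotation_id);
    if (it == graphs_.end()) {
      // First run for this ID: record the work ("capture") and execute it.
      graphs_.emplace(annotation_id, launches);
      for (const auto& launch : launches) launch();
      return false;
    }
    // Later runs with the same ID: replay the recorded work.
    for (const auto& launch : it->second) launch();
    return true;
  }

  std::size_t CapturedCount() const { return graphs_.size(); }

 private:
  bool enabled_;
  std::unordered_map<int, std::vector<Launch>> graphs_;
};
```

In this model each distinct annotation ID gets its own recorded graph, so a caller can alternate between several input configurations without invalidating earlier captures. Constructing the cache with `has_shape_tensors = true` forces every run down the eager path.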
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| nv_execution_provider.h | Adds CUDA graph method declarations and per-thread context data structures |
| nv_execution_provider.cc | Implements core CUDA graph logic with capture/replay functionality and stream management |
| cuda_graph.h | Adds overloaded Replay method signature for sync flag support |
| cuda_graph.cc | Implements sync flag support in CUDA graph replay functionality |
| nv_provider_options.h | Updates CUDA graph enable option name for consistency |
onnxruntime/core/providers/nv_tensorrt_rtx/nv_execution_provider.cc (5 review comments, all outdated and resolved)
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 5 pipeline(s).
@jywu-msft can you mark this for 1.23 release?
08ad1d7 to 49a8b7f
Fixed
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 5 pipeline(s).
aa82e20 to a376b3c
Please help resolve the conflicts
4f3aa35 to c4fd393
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 5 pipeline(s).
open and reopen for the specific CI to run
@microsoft-github-policy-service agree company="NVIDIA"
### Description
This change adds CUDA Graph support to the NV TensorRT RTX Execution Provider (EP).

### Motivation and Context
Integrating CUDA Graphs into the NV TRT RTX EP provides:
- Lower latency by minimizing per-kernel launch overhead.
- Better throughput for repeated inference runs.
- Improved efficiency on GPUs that are sensitive to kernel launch overhead.

---------

Co-authored-by: Maximilian Mueller <[email protected]>
Co-authored-by: Gaurav Garg <[email protected]>
- **Relax WeightBiasQuantization constraint for larger QDQ node group (#25673)**
- **Add cuda graph implementation for NV TRT RTX EP (#25787)**
- **python GPU IO Bindings for NVIDIA (#25776)**
- **Fixes for DynamicQuantizeMatMul and Attention3D tests (#25814)**
- **Fix a long standing bug on file memory mapping on windows. (#25833)**
- **Add API for precompiled model compatibility check using just the compat info (#25841)**
- **Enable ABSL_FLAGS flag registration for onnxruntime_perf_test for mobile build (#25849)**
- **Add default constructor to Ort::Status. (#25860)**
- #25871
- #25878
- #25884
- #25886
- #25866
The change is added to the release branch |
Description
This change adds CUDA Graph support to the NV TensorRT RTX Execution Provider (EP).
Motivation and Context
Integrating CUDA Graphs into the NV TRT RTX EP provides:
- Lower latency by minimizing per-kernel launch overhead.
- Better throughput for repeated inference runs.
- Improved efficiency on GPUs that are sensitive to kernel launch overhead.