
Conversation

@umangb-09
Contributor

Description

This change adds CUDA Graph support to the NV TensorRT RTX Execution Provider (EP).

Motivation and Context

Integrating CUDA Graphs into the NV TRT RTX EP provides:
  • Lower latency by minimizing per-kernel launch overhead.
  • Better throughput for repeated inference runs.
  • Improved efficiency on GPUs that are sensitive to kernel launch overhead.
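The latency savings come from recording a sequence of kernel launches once and then replaying the whole sequence with a single launch call. A minimal sketch of the underlying CUDA runtime pattern (illustrative only, not the EP's actual code; `myKernel` is a placeholder):

```cuda
#include <cuda_runtime.h>

__global__ void myKernel(float* data) {
  // Placeholder work standing in for a model's kernels.
  data[threadIdx.x] += 1.0f;
}

void captureAndReplay(float* d_data, cudaStream_t stream) {
  cudaGraph_t graph;
  cudaGraphExec_t graphExec;

  // 1. Capture: launches on the stream are recorded, not executed.
  cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
  myKernel<<<64, 256, 0, stream>>>(d_data);
  myKernel<<<64, 256, 0, stream>>>(d_data);
  cudaStreamEndCapture(stream, &graph);

  // 2. Instantiate once (the expensive step).
  // CUDA 12.x signature; older toolkits use a 5-argument form.
  cudaGraphInstantiate(&graphExec, graph, 0);

  // 3. Replay: one launch call replaces every recorded launch,
  //    amortizing per-kernel launch overhead across runs.
  for (int i = 0; i < 1000; ++i) {
    cudaGraphLaunch(graphExec, stream);
  }
  cudaStreamSynchronize(stream);

  cudaGraphExecDestroy(graphExec);
  cudaGraphDestroy(graph);
}
```

The win is largest for models with many small kernels, where CPU-side launch cost dominates GPU execution time.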

@jywu-msft jywu-msft requested a review from Copilot August 19, 2025 14:46
@jywu-msft jywu-msft added the ep:NvRTX NV RTX execution provider label Aug 19, 2025
@jywu-msft jywu-msft requested a review from chilo-ms August 19, 2025 14:47
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds comprehensive CUDA Graph support to the NV TensorRT RTX Execution Provider to improve inference performance by reducing per-kernel launch overhead and enabling better GPU throughput for repeated inference runs.

  • Implements graph annotation ID-based CUDA graph management for multi-graph support
  • Adds automatic detection and disabling of CUDA graphs for unsupported scenarios (shape tensors)
  • Refactors stream management to support both user-provided and internally created streams
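Annotation-ID-based management typically keys one instantiated graph per annotation, so a single session can capture and replay several distinct graphs. A hypothetical sketch of that bookkeeping, including a sync flag on replay like the `Replay` overload this PR adds (all names here are illustrative, not ORT's actual types):

```cuda
#include <cuda_runtime.h>
#include <unordered_map>

using GraphAnnotationId = int;  // assumed ID type for this sketch

class CudaGraphSet {
 public:
  bool Has(GraphAnnotationId id) const { return graphs_.count(id) != 0; }

  void BeginCapture(cudaStream_t stream) {
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
  }

  void EndCapture(GraphAnnotationId id, cudaStream_t stream) {
    cudaGraph_t graph;
    cudaStreamEndCapture(stream, &graph);
    cudaGraphExec_t exec;
    cudaGraphInstantiate(&exec, graph, 0);  // CUDA 12.x signature
    cudaGraphDestroy(graph);  // the executable graph keeps what it needs
    graphs_[id] = exec;       // one instantiated graph per annotation ID
  }

  void Replay(GraphAnnotationId id, cudaStream_t stream, bool sync) {
    cudaGraphLaunch(graphs_.at(id), stream);
    if (sync) cudaStreamSynchronize(stream);  // optional blocking replay
  }

 private:
  std::unordered_map<GraphAnnotationId, cudaGraphExec_t> graphs_;
};
```

On each run the provider can then check `Has(id)`: capture on the first run for a given annotation, replay on subsequent runs.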

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| nv_execution_provider.h | Adds CUDA graph method declarations and per-thread context data structures |
| nv_execution_provider.cc | Implements core CUDA graph logic with capture/replay functionality and stream management |
| cuda_graph.h | Adds overloaded Replay method signature for sync flag support |
| cuda_graph.cc | Implements sync flag support in CUDA graph replay functionality |
| nv_provider_options.h | Updates CUDA graph enable option name for consistency |
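The stream-management refactor distinguishes user-provided streams from internally created ones; the latter matter because graph capture is not allowed on the legacy default stream. A hedged sketch of that ownership pattern (illustrative names, not the EP's actual code):

```cuda
#include <cuda_runtime.h>

// Use the caller's stream when one is given, otherwise create and own one.
struct StreamHolder {
  cudaStream_t stream = nullptr;
  bool owned = false;

  explicit StreamHolder(cudaStream_t user_stream) {
    if (user_stream != nullptr) {
      stream = user_stream;  // user-provided: never destroyed here
    } else {
      // A non-default stream is required for graph capture, since
      // capture is disallowed on the legacy default stream.
      cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);
      owned = true;          // internally created: we must clean up
    }
  }

  ~StreamHolder() {
    if (owned) cudaStreamDestroy(stream);
  }

  // Non-copyable: exactly one holder owns the created stream.
  StreamHolder(const StreamHolder&) = delete;
  StreamHolder& operator=(const StreamHolder&) = delete;
};
```

Keeping ownership explicit avoids destroying a stream the application still uses, while guaranteeing cleanup of streams the provider created itself.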


@chilo-ms
Contributor

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).

@umangb-09
Contributor Author

@jywu-msft can you mark this for 1.23 release?

@umangb-09 umangb-09 force-pushed the umang/cuda_graph_msr_main branch from 08ad1d7 to 49a8b7f Compare August 22, 2025 05:57
@umangb-09
Contributor Author

> please fix ClangFormat errors and also resolve conflicts

Fixed

@chilo-ms
Contributor

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).

@umangb-09 umangb-09 force-pushed the umang/cuda_graph_msr_main branch from aa82e20 to a376b3c Compare August 25, 2025 08:36
@chilo-ms
Contributor

Please help resolve the conflicts

@umangb-09 umangb-09 force-pushed the umang/cuda_graph_msr_main branch from 4f3aa35 to c4fd393 Compare August 26, 2025 03:54
@chilo-ms
Contributor

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).

@chilo-ms chilo-ms closed this Aug 26, 2025
@chilo-ms chilo-ms reopened this Aug 26, 2025
@chilo-ms
Contributor

Closed and reopened to trigger the specific CI run.

@umangb-09
Contributor Author

umangb-09 commented Aug 27, 2025

@microsoft-github-policy-service agree company="NVIDIA"

@chilo-ms chilo-ms merged commit 16ae99e into microsoft:main Aug 27, 2025
157 checks passed
snnn pushed a commit that referenced this pull request Aug 28, 2025
### Description
This change adds CUDA Graph support to the NV TensorRT RTX Execution
Provider (EP).

### Motivation and Context
Integrating CUDA Graphs into the NV TRT RTX EP provides:
  • Lower latency by minimizing per-kernel launch overhead.
  • Better throughput for repeated inference runs.
  • Improved efficiency on GPUs that are sensitive to kernel launch overhead.

---------

Co-authored-by: Maximilian Mueller <[email protected]>
Co-authored-by: Gaurav Garg <[email protected]>
snnn pushed a commit that referenced this pull request Aug 29, 2025
- **Relax WeightBiasQuantization constraint for larger QDQ node group
(#25673)**
- **Add cuda graph implementation for NV TRT RTX EP (#25787)**
- **python GPU IO Bindings for NVIDIA  (#25776)**
- **Fixes for DynamicQuantizeMatMul and Attention3D tests (#25814)**
- **Fix a long standing bug on file memory mapping on windows.
(#25833)**
- **Add API for precompiled model compatibility check using just the
compat info (#25841)**
- **Enable ABSL_FLAGS flag registration for onnxruntime_perf_test for
mobile build (#25849)**
- **Add default constructor to Ort::Status. (#25860)**
- #25871
- #25878
- #25884
- #25886
- #25866
@snnn
Contributor

snnn commented Aug 30, 2025

The change is added to the release branch

gedoensmax added a commit to gedoensmax/onnxruntime that referenced this pull request Sep 2, 2025
### Description
This change adds CUDA Graph support to the NV TensorRT RTX Execution
Provider (EP).

### Motivation and Context
Integrating CUDA Graphs into the NV TRT RTX EP provides:
  • Lower latency by minimizing per-kernel launch overhead.
  • Better throughput for repeated inference runs.
  • Improved efficiency on GPUs that are sensitive to kernel launch overhead.

---------

Co-authored-by: Maximilian Mueller <[email protected]>
Co-authored-by: Gaurav Garg <[email protected]>

6 participants