[None][fix] Reuse batch_indices_cuda across CUDA graph captures in EAGLE3 by achartier · Pull Request #14381 · NVIDIA/TensorRT-LLM

achartier · 2026-05-21T03:58:16Z

Summary by CodeRabbit

Chores
- Optimized CUDA memory allocation for speculative execution resource management to improve resource handling and reusability.

Description

Reuse batch_indices_cuda across CUDA graph captures in EAGLE3

This applies the same fix as #13920 to batch_indices_cuda.

Test Coverage

Manually validated.

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

coderabbitai · 2026-05-21T04:01:37Z

Walkthrough

This PR threads a new CUDA-allocated batch_indices_cuda tensor through EAGLE3's speculative decoding resource management hierarchy. The tensor is allocated in the base Eagle3ResourceManager, optionally stored in Eagle3OneModelDynamicTreeResourceManager, and conditionally reused or independently allocated in Eagle3OneModelSpecMetadata with a size assertion.

Changes

EAGLE3 batch_indices_cuda Resource Management

Layer / File(s)	Summary
batch_indices_cuda allocation and threading `tensorrt_llm/_torch/speculative/eagle3.py`	`Eagle3ResourceManager` allocates `batch_indices_cuda` as a CUDA int tensor; `Eagle3OneModelDynamicTreeResourceManager` adds an optional `batch_indices_cuda` attribute initialized in its constructor; `Eagle3OneModelSpecMetadata.__post_init__` prefers reusing `spec_resource_manager.batch_indices_cuda` (with shape assertion) but falls back to allocating its own when unavailable.
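
The reuse-or-allocate behaviour summarized above can be sketched roughly as follows. This is a minimal illustration, not the code in `eagle3.py`: `ResourceManager`, `SpecMetadata`, and `alloc_int_buffer` are hypothetical names, and a standard-library `array` stands in for the `torch.empty(..., device='cuda')` tensor so the sketch runs without a GPU.

```python
from array import array
from dataclasses import dataclass, field
from typing import Optional


def alloc_int_buffer(n: int) -> array:
    """Stand-in (assumption) for torch.empty([n], dtype=torch.int, device='cuda')."""
    return array("i", [0] * n)


class ResourceManager:
    """Hypothetical manager that owns the single shared buffer."""

    def __init__(self, max_num_requests: int):
        self.max_num_requests = max_num_requests
        self.batch_indices_cuda = alloc_int_buffer(max_num_requests)


@dataclass
class SpecMetadata:
    """Hypothetical per-graph metadata; one instance per captured graph."""

    max_num_requests: int
    spec_resource_manager: Optional[ResourceManager] = None
    batch_indices_cuda: array = field(init=False)

    def __post_init__(self):
        # getattr on None returns the default, so a missing manager and a
        # manager without the attribute both take the fallback path.
        shared = getattr(self.spec_resource_manager, "batch_indices_cuda", None)
        if shared is not None:
            # Reuse the manager's buffer; its size must match what we expect.
            assert len(shared) == self.max_num_requests
            self.batch_indices_cuda = shared
        else:
            # Fallback: allocate a private buffer.
            self.batch_indices_cuda = alloc_int_buffer(self.max_num_requests)


manager = ResourceManager(max_num_requests=8)
with_manager = SpecMetadata(8, manager)
without_manager = SpecMetadata(8)
```

Here `with_manager` holds the very same buffer object as `manager`, while `without_manager` owns a separate one. Constructing `SpecMetadata(4, manager)` would trip the size assertion.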

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested reviewers

hyukn

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: reusing batch_indices_cuda across CUDA graph captures in EAGLE3, following the template format with ticket/issue ID and type.
Description check	✅ Passed	The description includes a brief explanation of the fix and references the related PR, with checklist items marked complete. However, the Test Coverage section only states 'Manually validates' without specific test details.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

🧹 Nitpick comments (1)

tensorrt_llm/_torch/speculative/eagle3.py (1)
67-71: 💤 Low value

Optional: Consider removing trailing commas for consistency.

The new torch.empty(...) allocations use trailing commas (after device='cuda',), but existing allocations in this file omit them (e.g., lines 66, 221-226, 401). For consistency with the established style in this file, consider removing the trailing commas.
Minor style adjustment
 self.batch_indices_cuda = torch.empty(
     [max_num_requests],
     dtype=torch.int,
-    device='cuda',
+    device='cuda'
 )
Apply the same change to the other two allocations at lines 143-147 and 411-415.
Also applies to: 143-147, 411-415
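
For context, the suggested change is purely stylistic: a trailing comma after the last argument does not alter how Python parses the call. A small check with the standard `ast` module illustrates this; the call to `empty` is a made-up placeholder that is only parsed, never executed.

```python
import ast

# Two spellings of the same call; `empty` and `n` are placeholders only.
with_comma = """
x = empty(
    [n],
    dtype=int,
    device='cuda',
)
"""
without_comma = """
x = empty(
    [n],
    dtype=int,
    device='cuda'
)
"""

# ast.dump omits line/column attributes by default, so equal dumps mean
# the two sources produce the same syntax tree.
same_ast = ast.dump(ast.parse(with_comma)) == ast.dump(ast.parse(without_comma))
```

`same_ast` evaluates to `True`, so either style can be chosen on consistency grounds alone.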
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/speculative/eagle3.py` around lines 67 - 71, Remove the
trailing commas from the torch.empty(...) allocation calls for consistency with
the file's style: update the call that assigns self.batch_indices_cuda
(currently torch.empty([max_num_requests], dtype=torch.int, device='cuda',)) to
remove the comma after device='cuda', and make the same change to the two other
torch.empty allocations referenced in the comment (the other torch.empty calls
in this file) so none of them end with a trailing comma after the final
argument.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 988561ba-9730-405c-b150-7b6aabd73ad2

📥 Commits

Reviewing files that changed from the base of the PR and between 67b0654 and 0099d6c.

📒 Files selected for processing (1)

tensorrt_llm/_torch/speculative/eagle3.py

achartier · 2026-05-21T04:04:16Z

/bot run

tensorrt-cicd · 2026-05-21T04:10:11Z

PR_Github #49579 [ run ] triggered by Bot. Commit: 0099d6c Link to invocation

tensorrt-cicd · 2026-05-21T08:13:41Z

PR_Github #49579 [ run ] completed with state SUCCESS. Commit: 0099d6c
/LLM/main/L0_MergeRequest_PR pipeline #39204 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

achartier · 2026-05-21T21:25:12Z

/bot run

tensorrt-cicd · 2026-05-21T21:32:48Z

PR_Github #49770 [ run ] triggered by Bot. Commit: 0099d6c Link to invocation

tensorrt-cicd · 2026-05-21T23:46:31Z

PR_Github #49770 [ run ] completed with state SUCCESS. Commit: 0099d6c
/LLM/main/L0_MergeRequest_PR pipeline #39368 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

achartier · 2026-05-22T21:16:26Z

/bot run

tensorrt-cicd · 2026-05-22T21:23:14Z

PR_Github #49989 [ run ] triggered by Bot. Commit: 0099d6c Link to invocation

tensorrt-cicd · 2026-05-23T00:10:13Z

PR_Github #49989 [ run ] completed with state SUCCESS. Commit: 0099d6c
/LLM/main/L0_MergeRequest_PR pipeline #39554 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

achartier · 2026-05-26T15:16:56Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-26T15:22:49Z

PR_Github #50358 [ run ] triggered by Bot. Commit: 13f3798 Link to invocation

tensorrt-cicd · 2026-05-27T15:23:36Z

PR_Github #50358 [ run ] completed with state ABORTED. Commit: 13f3798

Link to invocation

… in Eagle3

Share a single batch_indices_cuda buffer from the resource manager across all CUDA graph metadata copies, mirroring the existing hidden_states dedup pattern (PR NVIDIA#13920). Previously each of the 34 CUDA graph variants allocated its own torch.empty([max_num_requests]) tensor in __post_init__. Since only one graph executes at a time and the buffer is overwritten via [:num_seqs].copy_() before each use, sharing is safe.

Validated on H100 with LLaMA-3.1-8B + Eagle3:
- Baseline: 34 unique batch_indices_cuda tensors
- Fixed: 1 shared tensor, identical inference outputs

Signed-off-by: Aurelien Chartier <[email protected]>
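
The safety argument and the tensor count in this commit message can be illustrated with a small sketch. It is not the validation script used for the PR: `alloc` and `fill` are hypothetical helpers, the buffer size is arbitrary, and a standard-library `array` stands in for the CUDA tensor so the example runs anywhere.

```python
from array import array

MAX_NUM_REQUESTS = 16    # arbitrary size for the sketch
NUM_GRAPH_VARIANTS = 34  # variant count quoted in the commit message


def alloc() -> array:
    # Stand-in (assumption) for torch.empty([MAX_NUM_REQUESTS], dtype=torch.int, device='cuda').
    return array("i", [0] * MAX_NUM_REQUESTS)


def fill(buf: array, batch_indices: list) -> None:
    # Stand-in for buf[:num_seqs].copy_(src): overwrite only the live prefix.
    num_seqs = len(batch_indices)
    buf[:num_seqs] = array("i", batch_indices)


# Baseline: every graph variant allocates its own buffer.
baseline = [alloc() for _ in range(NUM_GRAPH_VARIANTS)]

# Fixed: every variant refers to the same buffer object.
shared = alloc()
fixed = [shared for _ in range(NUM_GRAPH_VARIANTS)]

unique_baseline = len({id(b) for b in baseline})
unique_fixed = len({id(b) for b in fixed})

# Variants run one at a time, and each refills the prefix it is about to read,
# so stale values left behind by an earlier variant are never observed.
fill(fixed[0], [5, 6, 7])
first_read = list(fixed[0][:3])
fill(fixed[1], [1, 2])
second_read = list(fixed[1][:2])
```

After the second `fill`, the element at index 2 still holds the old value 7, but the second variant only reads its own `[:num_seqs]` prefix, which is why the overwrite-before-use pattern makes sharing safe.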

achartier · 2026-05-28T17:02:07Z

/bot run

tensorrt-cicd · 2026-05-28T17:09:04Z

PR_Github #50840 [ run ] triggered by Bot. Commit: 542e27d Link to invocation

tensorrt-cicd · 2026-05-28T17:48:15Z

PR_Github #50840 [ run ] completed with state FAILURE. Commit: 542e27d
/LLM/main/L0_MergeRequest_PR pipeline #40311 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

achartier · 2026-05-28T17:51:09Z

/bot run

tensorrt-cicd · 2026-05-28T17:57:10Z

PR_Github #50853 [ run ] triggered by Bot. Commit: 542e27d Link to invocation

tensorrt-cicd · 2026-05-28T18:42:04Z

PR_Github #50853 [ run ] completed with state FAILURE. Commit: 542e27d
/LLM/main/L0_MergeRequest_PR pipeline #40320 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

achartier · 2026-05-28T19:01:03Z

/bot run

tensorrt-cicd · 2026-05-28T19:13:15Z

PR_Github #50862 [ run ] triggered by Bot. Commit: 542e27d Link to invocation

tensorrt-cicd · 2026-05-28T19:39:43Z

PR_Github #50862 [ run ] completed with state FAILURE. Commit: 542e27d
/LLM/main/L0_MergeRequest_PR pipeline #40330 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

achartier · 2026-05-28T20:34:59Z

/bot run

tensorrt-cicd · 2026-05-28T20:41:54Z

PR_Github #50881 [ run ] triggered by Bot. Commit: 542e27d Link to invocation

tensorrt-cicd · 2026-05-29T02:44:03Z

PR_Github #50881 [ run ] completed with state SUCCESS. Commit: 542e27d
/LLM/main/L0_MergeRequest_PR pipeline #40348 completed with status: 'SUCCESS'

CI Report

Link to invocation

achartier requested a review from a team as a code owner May 21, 2026 03:58

achartier requested a review from zheyuf May 21, 2026 03:58

github-actions Bot assigned achartier May 21, 2026

coderabbitai Bot reviewed May 21, 2026

View reviewed changes

zheyuf approved these changes May 21, 2026

View reviewed changes

achartier force-pushed the dedup-batch-indices-cuda branch from 0099d6c to 13f3798 Compare May 26, 2026 15:12

achartier force-pushed the dedup-batch-indices-cuda branch from 13f3798 to 542e27d Compare May 28, 2026 17:01

achartier merged commit 8cdde83 into NVIDIA:main May 29, 2026
7 checks passed

achartier deleted the dedup-batch-indices-cuda branch June 5, 2026 18:43
