
[https://nvbugs/6001694][fix] Add CUDA profiler API scoping for visual gen nsys profiling#12432

Merged
chang-l merged 5 commits into NVIDIA:main from chang-l:fix/visgen-nsys-profiling-6001694
May 5, 2026

Conversation

@chang-l
Collaborator

@chang-l chang-l commented Mar 22, 2026

Summary

  • Add TLLM_PROFILE_VISUAL_GEN_START_STOP env var support in visual gen BasePipeline. It follows the same scoping pattern as the LLM path (TLLM_PROFILE_START_STOP) but is visual-gen-specific, so the two backends can diverge cleanly. Use it with nsys profile -c cudaProfilerApi ... to scope CUPTI instrumentation via cudaProfilerStart/cudaProfilerStop.
  • Without scoping, CUPTI instruments the whole run, including warmup (79K+ kernel launches for a 14B model). The sustained instrumentation lets CUDA "Command Buffer Full" overhead accumulate until kernel activity tracing degrades: GPU kernel events are silently dropped during generation while CPU NVTX markers continue to be recorded, producing the misleading nsys profile reported in NVBug 6001694.
  • Step indices in the A-B form are per-request (each denoise() call resets the counter to 0). This differs from the LLM path's global executor iteration counter, where one forward pass services all in-flight requests.
  • New keyword modes:
    • predenoise — single-shot capture of per-request pre-loop work inside denoise() (CFG config setup, scheduler refresh, TeaCache reset) up to the first denoise step.
    • postdenoise — single-shot capture from end of last denoise step to pipeline cleanup, covering VAE decode.
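
The per-request scoping rules above can be modeled without a GPU. The sketch below is a hypothetical, stdlib-only simulation, not the BasePipeline code: the helper `captured_phases` and its phase labels are invented for illustration, and it assumes that `all` opens the capture on entry to denoise() (after warmup) and closes it at cleanup. It returns which phases of one request fall inside the cudaProfilerStart/cudaProfilerStop window for a given mode.

```python
def captured_phases(mode: str, num_steps: int) -> list[str]:
    """Hypothetical model: phases of ONE request inside the capture window.

    `mode` is an assumed value of TLLM_PROFILE_VISUAL_GEN_START_STOP:
    "all", "predenoise", "postdenoise", or "A-B[,C-D,...]".
    """
    # Per-request timeline: pre-loop work, denoise steps, then VAE decode.
    # Step indices restart at 0 for every request (every denoise() call).
    steps = [f"step {i}" for i in range(num_steps)]
    if mode == "all":
        return ["predenoise", *steps, "decode"]
    if mode == "predenoise":
        return ["predenoise"]  # pre-loop work up to the first denoise step
    if mode == "postdenoise":
        return ["decode"]      # end of last step through cleanup (VAE decode)
    # Numeric form: each comma-separated part is "A-B" or a bare "N".
    ranges = []
    for part in mode.split(","):
        bounds = [int(x) for x in part.split("-")]
        ranges.append((bounds[0], bounds[-1]))
    return [s for i, s in enumerate(steps)
            if any(lo <= i <= hi for lo, hi in ranges)]


if __name__ == "__main__":
    # Two requests with the same setting capture the same per-request steps.
    for request_id in range(2):
        print(request_id, captured_phases("0-1", num_steps=4))
```

Unlike the LLM path's global executor iteration counter, a value such as `0-1` here selects the first two denoise steps of every request.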

Usage

# Profile full generation (denoise + VAE decode), skip warmup:
TLLM_PROFILE_VISUAL_GEN_START_STOP=all nsys profile -c cudaProfilerApi \
    --capture-range-end=stop-shutdown \
    -w true -t cuda,nvtx,osrt,cudnn,cublas \
    python examples/visual_gen/visual_gen_wan_t2v.py ...

# Profile specific denoise steps only (single request):
TLLM_PROFILE_VISUAL_GEN_START_STOP=0-4 nsys profile -c cudaProfilerApi \
    --capture-range-end=stop \
    -w true -t cuda,nvtx,osrt,cudnn,cublas \
    python examples/visual_gen/visual_gen_wan_t2v.py ...

# Profile only the per-request pre-loop work (CFG setup, scheduler refresh, ...):
TLLM_PROFILE_VISUAL_GEN_START_STOP=predenoise nsys profile -c cudaProfilerApi \
    --capture-range-end=stop \
    -w true -t cuda,nvtx,osrt,cudnn,cublas \
    trtllm-serve <model> --extra_visual_gen_options <config>.yml

# Profile only VAE decode (post-denoise):
TLLM_PROFILE_VISUAL_GEN_START_STOP=postdenoise nsys profile -c cudaProfilerApi \
    --capture-range-end=stop-shutdown \
    -w true -t cuda,nvtx,osrt,cudnn,cublas \
    trtllm-serve <model> --extra_visual_gen_options <config>.yml

# Multi-request capture (e.g. trtllm-serve) — produces one .nsys-rep
# per request, up to N. `stop` / `stop-shutdown` only retain request 1;
# `repeat:N` is required to capture multiple requests:
TLLM_PROFILE_VISUAL_GEN_START_STOP=0-4 nsys profile -c cudaProfilerApi \
    --capture-range-end=repeat:N \
    -w true -t cuda,nvtx,osrt,cudnn,cublas \
    trtllm-serve <model> --extra_visual_gen_options <config>.yml
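
For scripted runs, the same recipes can be driven from Python. This is a hypothetical launcher sketch (`build_nsys_launch` is not part of TensorRT-LLM): it only sets the environment variable and assembles the nsys flags used in the Usage commands above, leaving the choice of `stop`, `stop-shutdown`, or `repeat:N` to the caller as in those recipes.

```python
import os
import shlex


def build_nsys_launch(mode, capture_range_end, app_argv):
    """Return (env, argv) to run app_argv under nsys with profiler-API scoping."""
    env = dict(os.environ)
    # Read by the visual gen pipeline to decide where to call
    # cudaProfilerStart / cudaProfilerStop.
    env["TLLM_PROFILE_VISUAL_GEN_START_STOP"] = mode
    argv = [
        "nsys", "profile",
        "-c", "cudaProfilerApi",                     # record only between Start/Stop
        f"--capture-range-end={capture_range_end}",  # e.g. stop, stop-shutdown, repeat:5
        "-w", "true",
        "-t", "cuda,nvtx,osrt,cudnn,cublas",
        *app_argv,
    ]
    return env, argv


if __name__ == "__main__":
    env, argv = build_nsys_launch(
        "0-4", "repeat:5",
        ["trtllm-serve", "<model>", "--extra_visual_gen_options", "<config>.yml"],
    )
    print(shlex.join(argv))
    # To actually launch (requires nsys and a GPU):
    #   import subprocess; subprocess.run(argv, env=env, check=True)
```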

Test plan

  • Verified TLLM_PROFILE_VISUAL_GEN_START_STOP=0-1: captures only denoise steps 0-1 (5,154 kernels)
  • Verified TLLM_PROFILE_VISUAL_GEN_START_STOP=all: captures full generation including VAE decode (18,117 kernels)
  • Verified unset env var with plain nsys profile: no behavioral change (backward compatible)
  • Verified 4-GPU (cfg=2, ulysses=2) with Wan2.2-T2V-A14B: all steps captured on all ranks (102,214 kernels)
  • All tests on B200 with Wan2.2-T2V-A14B-Diffusers
  • Verified multi-request under trtllm-serve with --capture-range-end=repeat:5 on Wan2.1-T2V-1.3B (B200): 3 sequential POSTs to /v1/videos/generations produced 3 separate .nsys-rep files, each containing exactly the configured denoise_step 0/1/2 NVTX ranges — confirming step indices reset per request and repeat:N is the correct flag for serve-style workloads.
  • Verified predenoise mode under trtllm-serve on Wan2.1-T2V-1.3B (B200): request 200 OK, captured .nsys-rep contains the per-request pre-loop CUDA work (CFG-embed CatArrayBatchedCopy) and 0 denoise_step NVTX ranges.
  • Verified postdenoise mode under trtllm-serve on Wan2.1-T2V-1.3B (B200): request 200 OK, captured .nsys-rep contains 1 _decode_latents NVTX range (~0.78s) plus cuDNN VAE convolution kernels and 0 denoise_step NVTX ranges.
  • Verified the literal Usage commands above on Wan2.1-T2V-1.3B (B200, 480x832x33):
    • all + stop-shutdown + python visual_gen_wan_t2v.py + -t cuda,nvtx,osrt,cudnn,cublas: .nsys-rep contains 50 _scheduler_step instances, 50 unique denoise_step N indices, and _decode_latents (~376 ms VAE).
    • 0-4 + stop + python visual_gen_wan_t2v.py + -t cuda,nvtx,osrt,cudnn,cublas: script exits 0; .nsys-rep contains exactly 5 _scheduler_step instances and denoise_step 0/1/2/3/4 (no other indices).
    • 0-4 + repeat:5 + trtllm-serve + -t cuda,nvtx,osrt,cudnn,cublas with 3 sequential POSTs (44s/35s/21s, all 200 OK): produced 3 separate .nsys-rep files; each contains exactly denoise_step 0/1/2/3/4, confirming per-request scoping for the literal recipe.
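
The "exactly denoise_step 0/1/2/3/4 (no other indices)" checks above amount to a set comparison on NVTX range names. A minimal sketch, assuming the range names have already been extracted from the .nsys-rep (for example with nsys's export or stats tooling, not shown) and that step ranges are named exactly `denoise_step N`; `denoise_step_indices` and `captured_exactly` are hypothetical helpers.

```python
import re
from collections import Counter

# Assumed naming: "denoise_step <N>" for each denoise step range.
_STEP_RE = re.compile(r"denoise_step (\d+)")


def denoise_step_indices(nvtx_names):
    """Return the set of step indices N found in 'denoise_step N' ranges."""
    indices = set()
    for name in nvtx_names:
        match = _STEP_RE.fullmatch(name)
        if match:
            indices.add(int(match.group(1)))
    return indices


def captured_exactly(nvtx_names, expected_steps):
    """True if the report holds exactly the expected steps and no others."""
    return denoise_step_indices(nvtx_names) == set(expected_steps)


if __name__ == "__main__":
    names = (["_scheduler_step"] * 5
             + [f"denoise_step {i}" for i in range(5)]
             + ["_decode_latents"])
    print(Counter(names)["_scheduler_step"])  # instances of the scheduler range
    print(captured_exactly(names, range(5)))
```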

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added profiling capabilities to visual generation with support for step-accurate measurement and configurable profiling ranges.
    • Integrated CUDA profiler for detailed performance analysis during inference.
  • Chores

    • Enhanced profiling state management and lifecycle handling in generation workflows.

@chang-l chang-l requested a review from a team as a code owner March 22, 2026 19:07
@coderabbitai
Contributor

coderabbitai Bot commented Mar 22, 2026

📝 Walkthrough

Adds CUDA profiling capabilities to the visual generation pipeline by introducing profiler state management, parsing environment variable configuration for profiling ranges, and instrumenting the warmup, denoise, decode, and cleanup phases to conditionally record performance metrics.

Changes

CUDA Profiling Instrumentation (tensorrt_llm/_torch/visual_gen/pipeline.py):
Added _parse_profile_range() helper to parse profiling configuration from environment variables. Extended BasePipeline with profiler state tracking (_is_warmup, _profile_range, _profiling_active) and profiler control methods (_cuda_profiler_start(), _cuda_profiler_stop()). Modified warmup(), denoise(), decode_latents(), and cleanup() to conditionally start/stop CUDA profiling based on configuration and step index, excluding warmup phases from profiler scope.
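
The state handling listed in this walkthrough (a warmup flag, an active-capture flag, and start/stop helpers) can be sketched as below. This is a hypothetical illustration, not the merged BasePipeline: the class `ProfilerScope`, its constructor, and the `warmup(run_fn)` signature are assumptions, and the start/stop callbacks are injected so the sketch runs without a GPU. In a real pipeline they could be `torch.cuda.profiler.start` and `torch.cuda.profiler.stop`, which call cudaProfilerStart and cudaProfilerStop.

```python
class ProfilerScope:
    """Hypothetical sketch of BasePipeline-style profiler state handling."""

    def __init__(self, start_fn, stop_fn):
        # Injected callbacks (assumption); e.g. torch.cuda.profiler.start/stop.
        self._start_fn = start_fn
        self._stop_fn = stop_fn
        self._is_warmup = False         # True while warmup() runs
        self._profiling_active = False  # True between a start and its stop

    def _cuda_profiler_start(self):
        # Never open a capture during warmup, and never start twice.
        if self._is_warmup or self._profiling_active:
            return
        self._start_fn()
        self._profiling_active = True

    def _cuda_profiler_stop(self):
        # Only stop a capture that is actually open.
        if not self._profiling_active:
            return
        self._stop_fn()
        self._profiling_active = False

    def warmup(self, run_fn):
        # Any start request issued inside run_fn is ignored.
        self._is_warmup = True
        try:
            run_fn()
        finally:
            self._is_warmup = False

    def cleanup(self):
        # Close a capture that is still open (e.g. "all" or "postdenoise").
        self._cuda_profiler_stop()
```

Making both helpers idempotent means callers at several phases (denoise, decode, cleanup) can request start/stop without tracking whether another phase already did.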

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: ✅ Passed. The title accurately describes the main change: adding CUDA profiler API scoping for visual generation nsys profiling, which matches the PR's core objective.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 90.00%, which is sufficient; the required threshold is 80.00%.
  • Description check: ✅ Passed. The PR description is comprehensive and well-structured, covering the rationale, implementation details, usage examples, and extensive test verification.



Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (3)
tensorrt_llm/_torch/visual_gen/pipeline.py (3)

28-30: Replace en-dash characters with standard hyphens in docstring.

Static analysis flagged ambiguous "–" (en-dash) characters. Use "-" (hyphen-minus) for consistency and to avoid potential issues with ASCII-only tooling.

🔧 Proposed fix
-    * ``A-B``  – profile denoise steps A through B (same ``A-B`` format as LLM path)
-    * ``all``  – profile the full generation forward (denoise + VAE), skip warmup
-    * (unset)  – no profiler API calls; plain ``nsys profile`` captures everything
+    * ``A-B``  - profile denoise steps A through B (same ``A-B`` format as LLM path)
+    * ``all``  - profile the full generation forward (denoise + VAE), skip warmup
+    * (unset)  - no profiler API calls; plain ``nsys profile`` captures everything
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/visual_gen/pipeline.py` around lines 28 - 30, Docstring
uses en-dash characters (–) in the bullet list which static analysis flagged;
replace them with standard hyphen-minus (-) characters. Edit the docstring in
tensorrt_llm._torch.visual_gen.pipeline (the multiline comment containing the
"A-B", "all" and "(unset)" bullets) and change each "–" to "-" so entries read
"* ``A-B``  - profile...", "* ``all``  - profile...", "* (unset)  - no
profiler...". Ensure only the punctuation characters are replaced and
formatting/markdown backticks remain unchanged.

41-46: Consider adding error handling for malformed input.

If the user provides a malformed value like "1-2-3" or "foo", the current code will raise an unhandled ValueError. This could be confusing for users. Consider adding validation with a clearer error message, or documenting the expected failure behavior.

🛡️ Optional: Add error handling
     # A-B format (same parser as LLM path)
     if "-" in val:
-        start, stop = val.split("-")
-        return int(start), int(stop)
+        parts = val.split("-")
+        if len(parts) != 2:
+            raise ValueError(
+                f"Invalid TLLM_PROFILE_START_STOP format: '{val}'. "
+                "Expected 'A-B', 'all', or a single step number."
+            )
+        try:
+            return int(parts[0]), int(parts[1])
+        except ValueError:
+            raise ValueError(
+                f"Invalid TLLM_PROFILE_START_STOP range: '{val}'. "
+                "Start and stop must be integers."
+            )
     # Single step
-    v = int(val)
-    return v, v
+    try:
+        v = int(val)
+        return v, v
+    except ValueError:
+        raise ValueError(
+            f"Invalid TLLM_PROFILE_START_STOP value: '{val}'. "
+            "Expected 'A-B', 'all', or a single step number."
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/visual_gen/pipeline.py` around lines 41 - 46, The parsing
of the range string in the block that reads the variable val currently assumes
either "start-stop" or a single integer and will raise an unhandled ValueError
for inputs like "1-2-3" or "foo"; update this code to validate the format and
provide a clear error when malformed: if "-" in val, split and ensure there are
exactly two parts, attempt to cast both parts to int inside a try/except and
raise a ValueError with a descriptive message if casting fails or parts count !=
2; for the single-value branch, wrap int(val) in try/except and raise a clear
ValueError on failure. Ensure you keep the same return semantics (start, stop)
on success and reference the same variable names (val, start, stop) so callers
remain unchanged.

535-545: Minor style improvements suggested by static analysis.

  1. Line 543: Loop variable name is unused - rename to _name
  2. Lines 535, 545: Use tuple unpacking instead of concatenation
♻️ Proposed style fixes
         if extra_latents:
             extra_results = []
-            for name, (extra_latent, extra_decode_fn) in extra_latents.items():
+            for _name, (extra_latent, extra_decode_fn) in extra_latents.items():
                 extra_results.append(extra_decode_fn(extra_latent))
-            result = (primary_result,) + tuple(extra_results)
+            result = (primary_result, *extra_results)

And similarly for line 535:

             if extra_latents:
                 extra_results = [efn(elat) for _, (elat, efn) in extra_latents.items()]
-                result = (primary_result,) + tuple(extra_results)
+                result = (primary_result, *extra_results)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/visual_gen/pipeline.py` around lines 535 - 545, Rename
the unused loop variable name to _name in the extra_latents loop inside the
branch where self.rank == 0, and replace tuple concatenation usages with tuple
unpacking for clarity: when collecting extra_results (the block that calls
extra_decode_fn on extra_latent) change the loop to for _name, (extra_latent,
extra_decode_fn) in extra_latents.items(), and where the code currently builds
result using (primary_result,) + tuple(extra_results) (and the symmetric case
above), use (primary_result, *extra_results) instead; keep the else case
returning primary_result unchanged. Ensure references are to decode_fn,
extra_latents, extra_results, primary_result, and result.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a4170482-e0ff-426b-aab9-771c10375016

📥 Commits

Reviewing files that changed from the base of the PR and between 3353334 and 3d74db2.

📒 Files selected for processing (1)
  • tensorrt_llm/_torch/visual_gen/pipeline.py

@chang-l chang-l force-pushed the fix/visgen-nsys-profiling-6001694 branch from 3d74db2 to 036b805 Compare March 22, 2026 19:12
@chang-l
Collaborator Author

chang-l commented Mar 22, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39842 [ run ] triggered by Bot. Commit: 036b805 Link to invocation

@chang-l
Collaborator Author

chang-l commented Mar 22, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39842 [ run ] completed with state FAILURE. Commit: 036b805
/LLM/main/L0_MergeRequest_PR pipeline #31017 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39846 [ run ] triggered by Bot. Commit: 036b805 Link to invocation

@chang-l
Collaborator Author

chang-l commented Mar 23, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39857 [ run ] triggered by Bot. Commit: 036b805 Link to invocation

@chang-l chang-l requested review from NVShreyas, zhengd-nv and zhenhuaw-me and removed request for zhengd-nv March 23, 2026 03:23
@chang-l
Collaborator Author

chang-l commented Mar 23, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #39893 [ run ] triggered by Bot. Commit: 036b805 Link to invocation

Comment thread tensorrt_llm/_torch/visual_gen/pipeline.py Outdated
Comment thread tensorrt_llm/_torch/visual_gen/pipeline.py Outdated
@tensorrt-cicd
Collaborator

PR_Github #39893 [ run ] completed with state SUCCESS. Commit: 036b805
/LLM/main/L0_MergeRequest_PR pipeline #31062 completed with status: 'SUCCESS'

CI Report

Link to invocation

@chang-l chang-l force-pushed the fix/visgen-nsys-profiling-6001694 branch from ccc9333 to d2414ab Compare March 30, 2026 18:30
@chang-l chang-l requested a review from a team as a code owner April 30, 2026 18:21
chang-l added 4 commits April 30, 2026 11:47
…l gen nsys profiling

Add TLLM_PROFILE_START_STOP support to visual gen pipelines, following the
same pattern as the LLM path (PyExecutor). This enables proper nsys GPU
kernel capture during generation by scoping CUPTI instrumentation via
cudaProfilerStart/Stop.

Root cause: Without scoping, nsys/CUPTI continuously instruments the full
pipeline including warmup (79K+ kernel launches for a 14B model). The
sustained CUPTI overhead causes CUDA "Command Buffer Full" events to
accumulate, eventually degrading CUPTI's kernel activity tracing. By the
time actual generation starts, GPU kernel events are silently dropped
while CPU NVTX markers continue to work — creating the misleading profile
reported in the bug.

The fix uses the existing TLLM_PROFILE_START_STOP env var with
nsys -c cudaProfilerApi to start CUPTI fresh at generation time:

  TLLM_PROFILE_START_STOP=all nsys profile -c cudaProfilerApi \
      --capture-range-end=stop-shutdown ...
    → captures denoise + VAE decode, skips warmup

  TLLM_PROFILE_START_STOP=0-4 nsys profile -c cudaProfilerApi \
      --capture-range-end=stop ...
    → captures only denoise steps 0 through 4

Verified on B200 (single-GPU and 4-GPU cfg=2 ulysses=2) with
Wan2.2-T2V-A14B: all denoise steps + VAE decode captured with correct
GPU kernel events.

Signed-off-by: Chang Liu <[email protected]>
…STOP in visual gen profiler

Update _parse_profile_range() to accept comma-separated ranges
(e.g. "0-4,10-14") matching the LLM path's _load_iteration_indexes
format. Returns (frozenset(starts), frozenset(stops)) instead of a
single (start, stop) tuple; update the denoise loop accordingly.

Signed-off-by: Chang Liu <[email protected]>
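
A sketch of the comma-separated parsing and frozenset-based gating this commit describes, with the malformed-input checks suggested in the review folded in. It is an illustration under stated assumptions, modeled on `_parse_profile_range()` but not the merged code: `parse_profile_ranges` and `profiler_events` are hypothetical names, the keyword modes (`all`, `predenoise`, `postdenoise`) are assumed to be handled before this parser is reached, and the profiler is assumed to start just before a start step and stop just after a stop step.

```python
def parse_profile_ranges(val):
    """Parse 'A-B[,C-D,...]' into (frozenset(starts), frozenset(stops)).

    A bare 'N' is treated as 'N-N'. Malformed input raises ValueError.
    """
    starts, stops = set(), set()
    for part in val.split(","):
        bounds = part.strip().split("-")
        if len(bounds) > 2:
            raise ValueError(f"Invalid range {part!r} in {val!r}: expected 'A-B' or 'N'")
        try:
            start, stop = int(bounds[0]), int(bounds[-1])
        except ValueError:
            raise ValueError(f"Non-integer range {part!r} in {val!r}") from None
        if start > stop:
            raise ValueError(f"Range {part!r} in {val!r} has start > stop")
        starts.add(start)
        stops.add(stop)
    return frozenset(starts), frozenset(stops)


def profiler_events(starts, stops, num_steps):
    """Simulate one request's denoise loop; return (step, 'start'|'stop') events."""
    events = []
    for step in range(num_steps):  # counter restarts at 0 for every request
        if step in starts:
            events.append((step, "start"))  # cudaProfilerStart before the step
        # ... the denoise step itself would run here ...
        if step in stops:
            events.append((step, "stop"))   # cudaProfilerStop after the step
    return events
```

For example, `profiler_events(*parse_profile_ranges("0-1,3-3"), 5)` opens a capture before step 0, closes it after step 1, and opens and closes another around step 3.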
@chang-l chang-l force-pushed the fix/visgen-nsys-profiling-6001694 branch from 30d9f1d to 3895e52 Compare April 30, 2026 18:49
@chang-l
Collaborator Author

chang-l commented Apr 30, 2026

/bot reuse-pipeline

@tensorrt-cicd
Collaborator

PR_Github #46429 [ reuse-pipeline ] triggered by Bot. Commit: 3895e52 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46429 [ reuse-pipeline ] completed with state SUCCESS. Commit: 3895e52
Can't reuse PR_Github #0 with status: UNKNOWN

Link to invocation

@chang-l
Collaborator Author

chang-l commented Apr 30, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #46433 [ run ] triggered by Bot. Commit: 3895e52 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46433 [ run ] completed with state SUCCESS. Commit: 3895e52
/LLM/main/L0_MergeRequest_PR pipeline #36504 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@chang-l
Collaborator Author

chang-l commented May 4, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #46662 [ run ] triggered by Bot. Commit: 3895e52 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46662 [ run ] completed with state SUCCESS. Commit: 3895e52
/LLM/main/L0_MergeRequest_PR pipeline #36701 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@chang-l
Collaborator Author

chang-l commented May 4, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #46698 [ run ] triggered by Bot. Commit: 3895e52 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46698 [ run ] completed with state SUCCESS. Commit: 3895e52
/LLM/main/L0_MergeRequest_PR pipeline #36735 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@chang-l
Collaborator Author

chang-l commented May 5, 2026

/bot run --disable-fail-fast

@chang-l
Collaborator Author

chang-l commented May 5, 2026

/bot run

@chang-l chang-l enabled auto-merge (squash) May 5, 2026 04:55
@tensorrt-cicd
Collaborator

PR_Github #46745 [ run ] triggered by Bot. Commit: 2964fe7 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46745 [ run ] completed with state SUCCESS. Commit: 2964fe7
/LLM/main/L0_MergeRequest_PR pipeline #36776 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@chang-l
Collaborator Author

chang-l commented May 5, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #46791 [ run ] triggered by Bot. Commit: 2964fe7 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46791 [ run ] completed with state SUCCESS. Commit: 2964fe7
/LLM/main/L0_MergeRequest_PR pipeline #36814 completed with status: 'SUCCESS'

CI Report

Link to invocation

@chang-l chang-l merged commit 5a8d3e1 into NVIDIA:main May 5, 2026
7 checks passed
yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request May 19, 2026
4 participants