[None][feat] Move DWDP contention optimization into DwdpConfig by JintaoPengCS · Pull Request #12974 · NVIDIA/TensorRT-LLM

JintaoPengCS · 2026-04-13T02:09:48Z

Move DWDP contention optimization from environment variable (TRTM_dwdp_contention_opt) to a proper DwdpConfig field. When enabled, uses cudaMemcpyBatchAsync with round-robin slice ordering and 2 MiB slice granularity to reduce NVLink contention during expert prefetch.

Summary by CodeRabbit

Release Notes

New Features
- Added contention_opt configuration option to DWDP settings for memory optimization control.
Performance
- Enhanced prefetch operations with optimized memory copying mechanism to reduce NVLink contention when enabled.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Move DWDP contention optimization from environment variable (TRTM_dwdp_contention_opt) to a proper DwdpConfig field. When enabled, uses cudaMemcpyBatchAsync with round-robin slice ordering and 2 MiB slice granularity to reduce NVLink contention during expert prefetch. Signed-off-by: jintaop <[email protected]> Signed-off-by: JtaoPeng <[email protected]>

Signed-off-by: jintaop <[email protected]> Signed-off-by: JtaoPeng <[email protected]>

JintaoPengCS · 2026-04-13T05:52:27Z

/bot run --disable-fail-fast

coderabbitai · 2026-04-13T05:55:18Z

📝 Walkthrough

Walkthrough

This PR introduces a batched memory copy optimization for the DWDP module. A new configuration option contention_opt enables an alternative prefetch mechanism that batches multiple remote memory copies into a single cudaMemcpyBatchAsync call with round-robin slice interleaving, replacing individual cudaMemcpyAsync operations per peer.

Changes

Cohort / File(s)	Summary
Configuration `tensorrt_llm/llmapi/llm_args.py`	Added `contention_opt` boolean field to `DwdpConfig` (default `False`) to enable limited-contention prefetch optimization with 2 MiB batched slice granularity.
Batched Prefetch Implementation `tensorrt_llm/_torch/pyexecutor/dwdp.py`	Introduced global `MEMCPY_BATCH_SLICE_BYTES` constant and per-layer prefetch plan caching. Added conditional logic to select between baseline `_prefetch_layer_default()` (original per-peer async copies) and new `_prefetch_layer_batched()` (single batched async call with round-robin slice interleaving). Refactored `prefetch_layer()` to route based on `use_batched_prefetch` flag.

Sequence Diagram(s)

sequenceDiagram
    participant DM as DwdpManager
    participant CUDA as CUDA Stream
    participant MEM as GPU Memory

    rect rgba(100, 150, 200, 0.5)
    Note over DM,MEM: Baseline Path (use_batched_prefetch=False)
    DM->>DM: prefetch_layer()
    DM->>DM: _prefetch_layer_default()
    loop For each remote peer
        DM->>CUDA: cudaMemcpyAsync(full region)
        CUDA->>MEM: Copy entire num_prefetch_experts
    end
    CUDA->>CUDA: Synchronize
    end

    rect rgba(150, 200, 100, 0.5)
    Note over DM,MEM: Optimized Path (use_batched_prefetch=True)
    DM->>DM: prefetch_layer()
    DM->>DM: _build_batched_prefetch_copy_plan()
    DM->>DM: Split copies into MEMCPY_BATCH_SLICE_BYTES slices
    DM->>DM: Interleave slices round-robin across peers
    DM->>CUDA: cudaMemcpyBatchAsync(assembled pointers/sizes)
    CUDA->>MEM: Copy interleaved slices in batched call
    CUDA->>CUDA: Synchronize
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The PR description explains the change and motivation, but critical template sections like 'Test Coverage' and 'Description' are incomplete or only partially filled.	Complete the 'Description' section with detailed explanation and fill in the 'Test Coverage' section with relevant test cases that safeguard the changes.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: moving DWDP contention optimization from environment variable to DwdpConfig field.
Docstring Coverage	✅ Passed	Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

tensorrt_llm/_torch/pyexecutor/dwdp.py (2)

265-269: Redundant global reassignment.

The MEMCPY_BATCH_SLICE_BYTES global is already initialized to 1 << 21 at module level (line 25). Reassigning it to the same value when contention_opt is enabled has no effect. Consider removing this block unless the intent is to allow configuration of different slice sizes in the future.

♻️ Suggested simplification

         self.use_batched_prefetch = config.contention_opt
-
-        if config.contention_opt:
-            global MEMCPY_BATCH_SLICE_BYTES
-            MEMCPY_BATCH_SLICE_BYTES = 1 << 21

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/pyexecutor/dwdp.py` around lines 265 - 269, The global
reassignment of MEMCPY_BATCH_SLICE_BYTES inside the dwdp.PyExecutor constructor
is redundant because MEMCPY_BATCH_SLICE_BYTES is already initialized to 1 << 21
at module level; remove the block that sets MEMCPY_BATCH_SLICE_BYTES when
config.contention_opt is true (the lines referencing global
MEMCPY_BATCH_SLICE_BYTES and setting it to 1 << 21) or, if you intend to allow a
different slice size via config, replace that block with reading a new config
property (e.g., config.memcpy_batch_slice_bytes) and assign it to
MEMCPY_BATCH_SLICE_BYTES accordingly, ensuring you reference the same global
symbol in PyExecutor and document the config option.

620-621: Inconsistent None check between prefetch paths.

The batched path (_prefetch_layer_batched → _build_batched_prefetch_copy_plan) includes a defensive None check for dst_tensor (lines 584-587), but the default path here does not. While dst_tensor shouldn't be None for peer_rank != self.dwdp_rank based on buffer initialization logic, consider adding the same guard for consistency and defensive coding.

🛡️ Suggested defensive check

             dst_tensor = self.prefetch_buffer.buffers[buffer_idx][param_name][peer_rank]
+            if dst_tensor is None:
+                raise RuntimeError(
+                    f"Missing prefetch buffer for peer_rank={peer_rank} param={param_name}"
+                )
             dst_ptr = dst_tensor.data_ptr()

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/pyexecutor/dwdp.py` around lines 620 - 621, Add the same
defensive None check used in the batched prefetch path here: when retrieving
dst_tensor via self.prefetch_buffer.buffers[buffer_idx][param_name][peer_rank]
inside the non-batched prefetch flow, verify dst_tensor is not None before
calling dst_tensor.data_ptr(); if it is None, skip this copy/continue (or handle
the missing buffer the same way
_prefetch_layer_batched/_build_batched_prefetch_copy_plan do). This change
should reference dst_tensor, buffer_idx, param_name, peer_rank and keep behavior
consistent with _prefetch_layer_batched/_build_batched_prefetch_copy_plan.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/pyexecutor/dwdp.py`:
- Around line 651-660: The call to cudart.cudaMemcpyBatchAsync requires CUDA
12.8+; update the code around the cudaMemcpyBatchAsync invocation (reference
cudart.cudaMemcpyBatchAsync and self.prefetch_buffer.prefetch_stream.cuda_stream
in dwdp.py) to either document this requirement in the module docstring/comments
and/or add a runtime check (use the CUDA runtime/version API such as
cudart.cudaRuntimeGetVersion or similar) before calling cudaMemcpyBatchAsync and
raise a clear, actionable error if the detected CUDA version is less than 12.8
(e.g., advise upgrading CUDA or fallback to a supported memcpy path).

---

Nitpick comments:
In `@tensorrt_llm/_torch/pyexecutor/dwdp.py`:
- Around line 265-269: The global reassignment of MEMCPY_BATCH_SLICE_BYTES
inside the dwdp.PyExecutor constructor is redundant because
MEMCPY_BATCH_SLICE_BYTES is already initialized to 1 << 21 at module level;
remove the block that sets MEMCPY_BATCH_SLICE_BYTES when config.contention_opt
is true (the lines referencing global MEMCPY_BATCH_SLICE_BYTES and setting it to
1 << 21) or, if you intend to allow a different slice size via config, replace
that block with reading a new config property (e.g.,
config.memcpy_batch_slice_bytes) and assign it to MEMCPY_BATCH_SLICE_BYTES
accordingly, ensuring you reference the same global symbol in PyExecutor and
document the config option.
- Around line 620-621: Add the same defensive None check used in the batched
prefetch path here: when retrieving dst_tensor via
self.prefetch_buffer.buffers[buffer_idx][param_name][peer_rank] inside the
non-batched prefetch flow, verify dst_tensor is not None before calling
dst_tensor.data_ptr(); if it is None, skip this copy/continue (or handle the
missing buffer the same way
_prefetch_layer_batched/_build_batched_prefetch_copy_plan do). This change
should reference dst_tensor, buffer_idx, param_name, peer_rank and keep behavior
consistent with _prefetch_layer_batched/_build_batched_prefetch_copy_plan.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 70f5fe9a-55d6-498f-aeea-677d045b5740

📥 Commits

Reviewing files that changed from the base of the PR and between 28cf4f5 and 657b225.

📒 Files selected for processing (2)

tensorrt_llm/_torch/pyexecutor/dwdp.py
tensorrt_llm/llmapi/llm_args.py

tensorrt-cicd · 2026-04-13T05:58:12Z

PR_Github #42962 [ run ] triggered by Bot. Commit: 657b225 Link to invocation

tensorrt-cicd · 2026-04-13T21:59:49Z

PR_Github #42962 [ run ] completed with state SUCCESS. Commit: 657b225
/LLM/main/L0_MergeRequest_PR pipeline #33619 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

JintaoPengCS · 2026-04-14T01:43:26Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-14T01:50:13Z

PR_Github #43135 [ run ] triggered by Bot. Commit: 657b225 Link to invocation

Superjomn

LGTM on the llmapi changes.

tensorrt-cicd · 2026-04-14T08:58:27Z

PR_Github #43135 [ run ] completed with state SUCCESS. Commit: 657b225
/LLM/main/L0_MergeRequest_PR pipeline #33766 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

…tention_opt The default value is already set to 2 MiB at module level, making this override unnecessary. Signed-off-by: JtaoPeng <[email protected]>

JintaoPengCS · 2026-04-14T09:43:16Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-14T09:50:02Z

PR_Github #43213 [ run ] triggered by Bot. Commit: cff3c6c Link to invocation

tensorrt-cicd · 2026-04-14T09:50:03Z

PR_Github #43213 [ run ] completed with state DISABLED
Freeze main and open the PR merge only after CI is back to healthy https://nvidia.slack.com/archives/C059LSY62BT/p1776141760843319?thread_ts=1775985925.442509&cid=C059LSY62BT

Link to invocation

JintaoPengCS · 2026-04-14T16:25:30Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-14T16:31:56Z

PR_Github #43255 [ run ] triggered by Bot. Commit: fb712d6 Link to invocation

tensorrt-cicd · 2026-04-14T16:31:57Z

PR_Github #43255 [ run ] completed with state DISABLED
Freeze main and open the PR merge only after CI is back to healthy https://nvidia.slack.com/archives/C059LSY62BT/p1776141760843319?thread_ts=1775985925.442509&cid=C059LSY62BT

Link to invocation

JintaoPengCS · 2026-04-19T13:22:01Z

/bot run --disable-fail-fast --reuse-test

tensorrt-cicd · 2026-04-19T13:29:05Z

PR_Github #44190 [ run ] triggered by Bot. Commit: 5008c68 Link to invocation

tensorrt-cicd · 2026-04-19T21:01:13Z

PR_Github #44190 [ run ] completed with state SUCCESS. Commit: 5008c68
/LLM/main/L0_MergeRequest_PR pipeline #34616 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

JintaoPengCS · 2026-04-20T01:30:43Z

/bot run --disable-fail-fast --reuse-test

tensorrt-cicd · 2026-04-20T01:36:41Z

PR_Github #44237 [ run ] triggered by Bot. Commit: 5008c68 Link to invocation

tensorrt-cicd · 2026-04-20T06:03:31Z

PR_Github #44237 [ run ] completed with state FAILURE. Commit: 5008c68
/LLM/main/L0_MergeRequest_PR pipeline #34659 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

JintaoPengCS · 2026-04-20T06:06:47Z

/bot run --disable-fail-fast --reuse-test

tensorrt-cicd · 2026-04-20T06:13:42Z

PR_Github #44349 [ run ] triggered by Bot. Commit: 5008c68 Link to invocation

tensorrt-cicd · 2026-04-20T08:38:11Z

PR_Github #44349 [ run ] completed with state FAILURE. Commit: 5008c68
/LLM/main/L0_MergeRequest_PR pipeline #34767 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

JintaoPengCS · 2026-04-20T08:50:54Z

/bot run --disable-fail-fast --reuse-test

tensorrt-cicd · 2026-04-20T08:57:13Z

PR_Github #44415 [ run ] triggered by Bot. Commit: 5008c68 Link to invocation

tensorrt-cicd · 2026-04-20T13:52:30Z

PR_Github #44415 [ run ] completed with state SUCCESS. Commit: 5008c68
/LLM/main/L0_MergeRequest_PR pipeline #34827 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

JintaoPengCS · 2026-04-21T06:24:46Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-21T06:31:46Z

PR_Github #44653 [ run ] triggered by Bot. Commit: 81bc4e7 Link to invocation

JintaoPengCS · 2026-04-21T09:00:46Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-21T09:07:48Z

PR_Github #44688 [ run ] triggered by Bot. Commit: a1f0d13 Link to invocation

JintaoPengCS · 2026-04-22T05:46:27Z

/bot run --disable-fail-fast --reuse-test

tensorrt-cicd · 2026-04-22T05:53:05Z

PR_Github #44901 [ run ] triggered by Bot. Commit: a1f0d13 Link to invocation

JintaoPengCS · 2026-04-23T00:50:40Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-23T00:56:45Z

PR_Github #45048 [ run ] triggered by Bot. Commit: eb8c955 Link to invocation

JintaoPengCS · 2026-04-23T16:06:59Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-23T16:14:01Z

PR_Github #45214 [ run ] triggered by Bot. Commit: eb8c955 Link to invocation

tensorrt-cicd · 2026-04-23T21:32:51Z

PR_Github #45214 [ run ] completed with state SUCCESS. Commit: eb8c955
/LLM/main/L0_MergeRequest_PR pipeline #35481 completed with status: 'SUCCESS'

CI Report

Link to invocation

…A#12974) Signed-off-by: jintaop <[email protected]> Signed-off-by: JtaoPeng <[email protected]>

Re-implements the DWDP IPC-era contention optimization (PR NVIDIA#12974) on top of the VA-based ``DWDPWeightManager.prefetch_layer``. The ``DwdpConfig.contention_opt`` field that PR NVIDIA#12974 added is already carried through the schema; this commit makes it functional again on the VA path. When ``contention_opt=True``, ``prefetch_layer`` issues a single ``cudaMemcpyBatchAsync`` per layer with sub-slices interleaved across peers in 2 MiB chunks (``MEMCPY_BATCH_SLICE_BYTES``), so that any moment in flight touches every peer's NVLink link rather than saturating one. When ``contention_opt=False`` (default) the existing per-slice ``torch.Tensor.copy_`` path is unchanged. Adaptations from the IPC original onto the VA composite-VA structure: * Plan construction walks the same ``get_remote_slices`` / per-peer chunking that ``_prefetch_layer_per_slice`` uses (so Mode B's non-uniform / overlapping ranges resolved by ``lookup_owner`` interleave correctly, not just uniform partitions). * Each peer's per-layer contribution is split into 2 MiB sub-slices, then a round-robin emitter across peers concatenates them into the final ``(dst_ptrs, src_ptrs, sizes)`` triple consumed by ``cudaMemcpyBatchAsync``. * Plans are cached per ``layer_idx`` in ``self._batched_copy_plans`` since peer ``data_ptr()`` and destination slice ``data_ptr()`` are stable after ``setup_dwdp`` returns; this avoids host-side rebuild on every prefetch and matches the cache discipline used by PR NVIDIA#12974. Plumbing: * ``DwdpConfig.contention_opt`` -> ``DwdpManager.setup()`` -> ``setup_dwdp(contention_opt=...)`` -> ``DWDPWeightManager(contention_opt=...)``. * ``DWDPWeightManager.__slots__`` extended with ``_use_batched_prefetch`` and ``_batched_copy_plans``. Signed-off-by: tianyuz-nv <[email protected]>

github-actions Bot assigned JintaoPengCS Apr 13, 2026

[None][feat] Set default MEMCPY_BATCH_SLICE_BYTES to 2 MiB

657b225

Signed-off-by: jintaop <[email protected]> Signed-off-by: JtaoPeng <[email protected]>

JintaoPengCS changed the title ~~[None][feat] Add contention_opt to DwdpConfig for batched prefetch~~ **[JIRA ticket/NVBugs ID/GitHub issue/None] @coderabbitai title** Apr 13, 2026

JintaoPengCS changed the title ~~**[JIRA ticket/NVBugs ID/GitHub issue/None] @coderabbitai title**~~ [None][feat] Move DWDP contention optimization into DwdpConfig Apr 13, 2026

JintaoPengCS marked this pull request as ready for review April 13, 2026 05:50

JintaoPengCS requested review from a team as code owners April 13, 2026 05:50

JintaoPengCS requested a review from syuoni April 13, 2026 05:50

coderabbitai Bot reviewed Apr 13, 2026

View reviewed changes

Comment thread tensorrt_llm/_torch/pyexecutor/dwdp.py

Superjomn approved these changes Apr 14, 2026

View reviewed changes

syuoni approved these changes Apr 14, 2026

View reviewed changes

Comment thread tensorrt_llm/_torch/pyexecutor/dwdp.py Outdated

Comment thread tensorrt_llm/llmapi/llm_args.py

[None][fix] Remove redundant MEMCPY_BATCH_SLICE_BYTES override in con…

cff3c6c

…tention_opt The default value is already set to 2 MiB at module level, making this override unnecessary. Signed-off-by: JtaoPeng <[email protected]>

JintaoPengCS enabled auto-merge (squash) April 14, 2026 13:17

kaiyux approved these changes Apr 14, 2026

View reviewed changes

Merge branch 'NVIDIA:main' into dwdp_production_pr2

fb712d6

Merge branch 'main' into dwdp_production_pr2

449908f

Merge branch 'NVIDIA:main' into dwdp_production_pr2

81bc4e7

Merge branch 'main' into dwdp_production_pr2

a1f0d13

Merge branch 'main' into dwdp_production_pr2

eb8c955

JintaoPengCS merged commit 79396d8 into NVIDIA:main Apr 23, 2026
5 checks passed

ziyixiong-nv pushed a commit to ziyixiong-nv/TensorRT-LLM that referenced this pull request Apr 24, 2026

[None][feat] Move DWDP contention optimization into DwdpConfig (NVIDI…

ae72bb2

…A#12974) Signed-off-by: jintaop <[email protected]> Signed-off-by: JtaoPeng <[email protected]>

tianyuz-nv mentioned this pull request May 25, 2026

[None][feat] Refactor DWDP from CUDA IPC to CUDA VMM + MNNVL composite VA #14453

Merged

7 tasks

Conversation

JintaoPengCS commented Apr 13, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Description

Test Coverage

PR Checklist

GitHub Bot Help

Uh oh!

JintaoPengCS commented Apr 13, 2026

Uh oh!

coderabbitai Bot commented Apr 13, 2026

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tensorrt-cicd commented Apr 13, 2026

Uh oh!

tensorrt-cicd commented Apr 13, 2026

Uh oh!

JintaoPengCS commented Apr 14, 2026

Uh oh!

tensorrt-cicd commented Apr 14, 2026

Uh oh!

Superjomn left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

tensorrt-cicd commented Apr 14, 2026

Uh oh!

JintaoPengCS commented Apr 14, 2026

Uh oh!

tensorrt-cicd commented Apr 14, 2026

Uh oh!

tensorrt-cicd commented Apr 14, 2026

Uh oh!

JintaoPengCS commented Apr 14, 2026

Uh oh!

tensorrt-cicd commented Apr 14, 2026

Uh oh!

tensorrt-cicd commented Apr 14, 2026

Uh oh!

JintaoPengCS commented Apr 19, 2026

Uh oh!

tensorrt-cicd commented Apr 19, 2026

Uh oh!

tensorrt-cicd commented Apr 19, 2026

Uh oh!

JintaoPengCS commented Apr 20, 2026

Uh oh!

tensorrt-cicd commented Apr 20, 2026

Uh oh!

tensorrt-cicd commented Apr 20, 2026

Uh oh!

JintaoPengCS commented Apr 20, 2026

Uh oh!

tensorrt-cicd commented Apr 20, 2026

Uh oh!

tensorrt-cicd commented Apr 20, 2026

Uh oh!

JintaoPengCS commented Apr 20, 2026

Uh oh!

tensorrt-cicd commented Apr 20, 2026

Uh oh!

tensorrt-cicd commented Apr 20, 2026

Uh oh!

JintaoPengCS commented Apr 21, 2026

Uh oh!

tensorrt-cicd commented Apr 21, 2026

Uh oh!

JintaoPengCS commented Apr 21, 2026

Uh oh!

JintaoPengCS commented Apr 13, 2026 •

edited by coderabbitai Bot

Loading