[None][feat] Add FP4 residual quantization kernel without channel reo… by Tracin · Pull Request #13117 · NVIDIA/TensorRT-LLM

Tracin · 2026-04-16T08:06:48Z

Add nvfp4_quantize_residual_with_block_size kernel that performs ARCQuant-style FP4 quantization with residual, using the quantize_with_block_size loop pattern (grid-stride rows, block-stride columns). Unlike the existing reorder kernel, this version reads directly from global memory without channel reordering, eliminating shared memory and __syncthreads() overhead.
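The loop pattern named above (grid-stride rows, block-stride columns) can be modeled on the host to check that every (row, group) pair is visited exactly once. This is a plain-Python sketch, not the kernel. It assumes one thread per 16-element group, as the description of `cvt_group_to_fp4_residual` suggests, and the function name `loop_assignment` is hypothetical.

```python
from collections import Counter


def loop_assignment(num_rows: int, num_col_threads: int,
                    grid_dim: int, block_dim: int) -> Counter:
    """Count how often each (row, column-group) pair is visited.

    Models a quantize_with_block_size-style loop nest: rows are strided by
    the grid size (blockIdx.x) and column groups by the block size
    (threadIdx.x). One "column thread" handles one 16-element group here.
    """
    visits: Counter = Counter()
    for block_idx in range(grid_dim):            # plays the role of blockIdx.x
        for thread_idx in range(block_dim):      # plays the role of threadIdx.x
            row = block_idx
            while row < num_rows:                # grid-stride loop over rows
                col = thread_idx
                while col < num_col_threads:     # block-stride loop over groups
                    visits[(row, col)] += 1
                    col += block_dim
                row += grid_dim
    return visits


# 10 rows, K = 256 -> 16 groups of 16 elements; a deliberately small grid/block.
v = loop_assignment(num_rows=10, num_col_threads=256 // 16, grid_dim=4, block_dim=6)
assert len(v) == 10 * 16 and set(v.values()) == {1}
```

Because the strides equal the number of blocks and threads, no pair is skipped or duplicated regardless of how the grid and block sizes compare to the problem size.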

Key components:

- cvt_group_to_fp4_residual: reusable device helper (analogous to cvt_warp_fp16_to_fp4) that quantizes a 16-element group to FP4 with optional residual quantization
- Vectorized 128-bit loads, float32 precision throughout, PDL support
- Reuses get_sf_out_offset_128x4, fp32_vec_to_e2m1, reciprocal_approximate_ftz from quantization.cuh
- Torch op fp4_quantize_with_residual exposed for Python access
- Correctness test (GEMM cosine similarity) and perf benchmarks
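A minimal numeric sketch of residual FP4 quantization for one 16-element group: quantize, dequantize, take the error, then quantize that error with its own scale. This is a simplified model under stated assumptions, not the kernel's arithmetic. Scales stay in float (the real path stores FP8 E4M3 scale factors and applies a global scale), ties round to the smaller magnitude rather than to nearest even, and all function names are hypothetical.

```python
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)  # non-negative FP4 (e2m1) values
E2M1_MAX = 6.0


def quantize_group(x: list[float]) -> tuple[float, list[float]]:
    """Quantize one group to FP4 grid values with a single per-group scale.

    Returns (scale, q) where each q[i] lies on +/-E2M1_GRID and x[i] ~= q[i] * scale.
    Ties go to the smaller magnitude here; hardware cvt rounds to nearest even.
    """
    amax = max(abs(v) for v in x)
    scale = amax / E2M1_MAX
    if scale == 0.0:
        return 0.0, [0.0] * len(x)
    q = []
    for v in x:
        mag = min(E2M1_GRID, key=lambda g: abs(abs(v) / scale - g))  # nearest grid point
        q.append(mag if v >= 0 else -mag)
    return scale, q


def dequantize_group(scale: float, q: list[float]) -> list[float]:
    return [scale * v for v in q]


def quantize_group_with_residual(x: list[float], with_residual: bool = True):
    """Two-stage quantization: FP4(x), then optionally FP4 of the leftover error.

    Returns (primary, residual_or_None, reconstruction).
    """
    s1, q1 = quantize_group(x)
    approx = dequantize_group(s1, q1)
    if not with_residual:
        return (s1, q1), None, approx
    residual = [a - b for a, b in zip(x, approx)]
    s2, q2 = quantize_group(residual)
    approx = [a + b for a, b in zip(approx, dequantize_group(s2, q2))]
    return (s1, q1), (s2, q2), approx
```

Because 0 is on the FP4 grid, rounding a residual element to its nearest grid point never leaves it further from the input than leaving it at 0, so the two-stage reconstruction error is never larger than the one-stage error (up to floating-point rounding).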

Summary by CodeRabbit

Release Notes

New Features
- Introduced fp4_quantize_with_residual operation for optimized FP4 quantization without channel reordering
- Supports bfloat16 and fp8 input types with flexible activation and weight quantization modes
Tests
- Added comprehensive correctness and performance validation tests for the new quantization operation
- Includes benchmarking utilities to measure CUDA kernel performance

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.


Tracin · 2026-04-16T08:09:19Z

/bot run

tensorrt-cicd · 2026-04-16T08:14:53Z

PR_Github #43721 [ run ] triggered by Bot. Commit: aa78617 Link to invocation

coderabbitai · 2026-04-16T08:15:53Z

📝 Walkthrough

Walkthrough

This PR adds a new ARCQuant FP4 quantization kernel nvfp4_quantize_residual_with_block_size with block-based residual computation support. The implementation includes device kernel logic, host launcher, PyTorch bindings, and comprehensive correctness and performance tests.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Kernel implementation `cpp/tensorrt_llm/kernels/arcquantFP4.cu` | Added PTX-based FP4 dequantization helper `e2m1_uint32_to_float8`, new device function `cvt_group_to_fp4_residual` for quantizing and dequantizing FP4 groups with residual computation, global kernel `nvfp4_quantize_residual_with_block_size` using grid-stride row/block-stride group loops with vectorized loads and conditional residual quantization, host launcher wrapper, and 3 explicit template instantiations for (bf16/fp8_e4m3, GROUP_SIZE=16, ACT/WEIGHT). Removed local `reciprocal_approximate_ftz` helper (moved to `quantization.cuh`). |
| Kernel interface `cpp/tensorrt_llm/kernels/arcquantFP4.h` | Added templated function declaration `run_nvfp4_quantize_residual_with_block_size` exposing the new quantization kernel interface to host code. |
| PyTorch binding `cpp/tensorrt_llm/thop/fp4Quantize.h`, `cpp/tensorrt_llm/thop/fp4Quantize.cpp` | Added public C++ API `fp4_quantize_with_residual` with dtype validation (bf16/fp8 inputs, float32 scales), shape constraints (KE/KQ/total alignment), output tensor allocation via `computeSwizzledLayoutSFSize`, dtype-specific kernel dispatch with `ArcQuantType::ACT/WEIGHT` selection, and PyTorch operator registration with CUDA implementation mapping. |
| Tests `tests/unittest/_torch/thop/parallel/test_arcquant_fp4.py` | Added benchmarking helpers (`_benchmark`, `_effective_bytes`, `_gb_per_sec`), correctness test `test_arcquant_fp4_with_residual` validating end-to-end GEMM cosine similarity, and two performance comparison tests (`test_arcquant_fp4_perf_no_residual`, `test_arcquant_fp4_perf_with_residual`) comparing old vs new quantization methods. |
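The kernel row mentions `e2m1_uint32_to_float8`, which unpacks eight FP4 (e2m1) codes from one 32-bit word. The sketch below shows the 4-bit encoding (1 sign bit, 2 exponent bits, 1 mantissa bit, giving magnitudes 0 to 6) and a pack/unpack round trip in Python. The nibble order (element i in bits 4i to 4i+3) is an assumption of this sketch, not taken from the PR, and the helper names are hypothetical.

```python
import math

# Magnitude encoded by the low 3 bits of an e2m1 code; bit 3 is the sign.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def encode_e2m1(value: float) -> int:
    """Map a value that is already on the FP4 grid to its 4-bit code."""
    sign = 0x8 if math.copysign(1.0, value) < 0 else 0x0
    return sign | E2M1_MAGNITUDES.index(abs(value))  # ValueError if off-grid


def decode_e2m1(code: int) -> float:
    mag = E2M1_MAGNITUDES[code & 0x7]
    return -mag if code & 0x8 else mag


def pack8(values: list[float]) -> int:
    """Pack 8 on-grid values into one uint32 (assumed: element i in bits [4i, 4i+4))."""
    if len(values) != 8:
        raise ValueError("expected exactly 8 values")
    word = 0
    for i, v in enumerate(values):
        word |= encode_e2m1(v) << (4 * i)
    return word


def unpack8(word: int) -> list[float]:
    """Inverse of pack8: decode eight nibbles back to floats."""
    return [decode_e2m1((word >> (4 * i)) & 0xF) for i in range(8)]
```

A vectorized 128-bit store then carries four such words, that is, 32 FP4 values or two 16-element groups.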

Sequence Diagram

```mermaid
sequenceDiagram
    participant PyTorch as PyTorch Operator
    participant HostLauncher as Host Launcher
    participant DeviceKernel as Device Kernel
    participant Memory as GPU Memory

    PyTorch->>HostLauncher: fp4_quantize_with_residual(X, input_scale, KE, is_act)
    HostLauncher->>HostLauncher: Validate dtypes & shapes
    HostLauncher->>Memory: Allocate output buffers (q_out, q_scale)
    HostLauncher->>DeviceKernel: Launch nvfp4_quantize_residual_with_block_size

    DeviceKernel->>Memory: Load GROUP_SIZE elements (float4 vectorized)
    DeviceKernel->>DeviceKernel: Convert to float32 with global_scale
    DeviceKernel->>DeviceKernel: Compute max-based scale factor
    DeviceKernel->>DeviceKernel: Quantize to FP4 e2m1
    DeviceKernel->>DeviceKernel: Dequantize back (e2m1_uint32_to_float8)
    DeviceKernel->>DeviceKernel: Compute residuals
    alt if residual output enabled
        DeviceKernel->>DeviceKernel: Conditionally quantize residuals
    end

    DeviceKernel->>Memory: Write packed outputs (float2 or float4)
    HostLauncher->>PyTorch: Return (QX, SFX) tensors
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

[None][feat] 2FP4 / Arcquant. #11333: Implements ARCQuant FP4 quantization kernels with related reordered residual quantization interfaces in the same arcquantFP4 files.

Suggested reviewers

binghanc
nv-guomingz
yuxianq
liji-nv
QiJune

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 55.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title is partially related to the changeset—it mentions FP4 residual quantization kernel without channel reorder, which is a real component, but is incomplete (truncated with '…') and does not fully convey the scope of changes including the new Torch op and test coverage. | Complete the truncated title to clearly summarize the main change, e.g., '[None][feat] Add FP4 residual quantization kernel without channel reorder and Torch op' or similar. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | The PR description provides detailed explanation of the solution (kernel implementation, key components, test coverage) and includes a checked checklist item, but the template sections (Description, Test Coverage) are not explicitly filled out—the content is in the code block above the template. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (2)

tests/unittest/_torch/thop/parallel/test_arcquant_fp4.py (1)

9-40: Add return type annotations to the new helpers.

These utilities are missing explicit return types, which is out of step with the repo's Python typing rules.

✍️ Suggested cleanup

```diff
-def _benchmark(fn, warmup=10, iters=100):
+def _benchmark(fn, warmup: int = 10, iters: int = 100) -> float:
@@
-def _effective_bytes(M, K, KE, input_dtype=torch.bfloat16):
+def _effective_bytes(M: int, K: int, KE: int, input_dtype: torch.dtype = torch.bfloat16) -> int:
@@
-def _gb_per_sec(total_bytes, time_ms):
+def _gb_per_sec(total_bytes: int, time_ms: float) -> float:
```

As per coding guidelines, "Always annotate Python function return types; use None if the function does not return anything".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/_torch/thop/parallel/test_arcquant_fp4.py` around lines 9 -
40, Add explicit return type annotations to the three helper functions: annotate
_benchmark(...) -> float, _effective_bytes(M, K, KE, input_dtype: torch.dtype =
torch.bfloat16) -> int, and _gb_per_sec(total_bytes, time_ms) -> float; update
the function signatures (not bodies) to include these types so they comply with
the project's typing rules.

cpp/tensorrt_llm/kernels/arcquantFP4.cu (1)

463-503: Label the get_sf_out_offset_128x4 arguments inline.

These call sites are hard to audit with five positional arguments and two std::optional values. Inline parameter names would make the swizzled-layout indexing much easier to review.

🧭 Suggested cleanup

```diff
-                auto sfOffset
-                    = get_sf_out_offset_128x4(std::nullopt, rowIdx, sfIdx, std::optional<int>(numRows), numSFCols);
+                auto sfOffset = get_sf_out_offset_128x4(
+                    /*batchIdx=*/std::nullopt, /*mIdx=*/rowIdx, /*kIdx=*/sfIdx,
+                    /*numRows=*/std::optional<int>(numRows), /*numColVecs=*/numSFCols);
@@
-                auto sfOffset
-                    = get_sf_out_offset_128x4(std::nullopt, rowIdx, pos, std::optional<int>(numRows), numSFCols);
+                auto sfOffset = get_sf_out_offset_128x4(
+                    /*batchIdx=*/std::nullopt, /*mIdx=*/rowIdx, /*kIdx=*/pos,
+                    /*numRows=*/std::optional<int>(numRows), /*numColVecs=*/numSFCols);
@@
-                    auto sfOffsetRes = get_sf_out_offset_128x4(
-                        std::nullopt, rowIdx, pos + 1, std::optional<int>(numRows), numSFCols);
+                    auto sfOffsetRes = get_sf_out_offset_128x4(
+                        /*batchIdx=*/std::nullopt, /*mIdx=*/rowIdx, /*kIdx=*/pos + 1,
+                        /*numRows=*/std::optional<int>(numRows), /*numColVecs=*/numSFCols);
```

As per coding guidelines, "In C++ function calls with non-obvious parameters, use inline C comments with the format /*paramName=*/ to document parameters".
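For readers auditing those call sites, the following Python sketch spells out what we understand the 128x4 swizzled scale-factor layout to be: each 128-row by 4-column tile of one-byte scale factors occupies 512 contiguous bytes, with rows interleaved in 32-row slices. The formula is our reconstruction, not copied from `quantization.cuh`, and `sf_offset_128x4_sketch` / `swizzled_sf_size` are hypothetical names. The argument order follows the labels in the suggested diff (batchIdx, mIdx, kIdx, numRows, numColVecs).

```python
from typing import Optional


def sf_offset_128x4_sketch(batch_idx: Optional[int], m_idx: int, k_idx: int,
                           num_rows: Optional[int], num_col_vecs: int) -> int:
    """Byte offset of scale factor (m_idx, k_idx) in a 128x4-swizzled buffer."""
    inner_k = k_idx % 4                  # 4 consecutive SF columns are contiguous
    inner_m = (m_idx % 128) // 32        # which 32-row slice of the 128-row tile
    outer_m = m_idx % 32                 # row within that 32-row slice
    k_tile, m_tile = k_idx // 4, m_idx // 128
    tile_bytes = 128 * 4                 # one tile = 512 one-byte scale factors
    num_k_tiles = (num_col_vecs + 3) // 4
    offset = ((m_tile * num_k_tiles + k_tile) * tile_bytes
              + outer_m * 16 + inner_m * 4 + inner_k)
    if batch_idx is not None and num_rows is not None:
        num_m_tiles = (num_rows + 127) // 128
        offset += batch_idx * num_m_tiles * num_k_tiles * tile_bytes
    return offset


def swizzled_sf_size(num_rows: int, num_col_vecs: int) -> int:
    """Padded buffer size: rows rounded up to 128, SF columns rounded up to 4."""
    return ((num_rows + 127) // 128 * 128) * ((num_col_vecs + 3) // 4 * 4)
```

Within one tile, row m and row m + 32 land 4 bytes apart, which is why positional arguments are easy to swap unnoticed and why the inline labels help.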

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/kernels/arcquantFP4.cu` around lines 463 - 503, Call sites
of get_sf_out_offset_128x4 are hard to read because they pass multiple
positional args and std::optional values; update each call (e.g., the ones using
std::nullopt, rowIdx, pos, std::optional<int>(numRows), numSFCols and the pos+1
variant) to use inline parameter-name comments like /*argName=*/ before each
argument (for example /*batchIdx=*/std::nullopt, /*mIdx=*/rowIdx, /*kIdx=*/pos,
/*numRows=*/std::optional<int>(numRows), /*numColVecs=*/numSFCols) so the meaning
of each parameter is explicit and easier to audit.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/tensorrt_llm/thop/fp4Quantize.cpp`:
- Around line 313-328: Add input validation at the top of
fp4_quantize_with_residual: check that M > 0 and KQ > 0 to avoid zero-dimension
launches, verify input_scale.numel() > 0 (and that input_scale.device() ==
X.device()) to avoid invalid device pointer dereferences, and keep the existing
dtype requirement for input_scale; if any check fails, TORCH_CHECK with a clear
message. Reference fp4_quantize_with_residual (and the downstream arcquantFP4.cu
kernel behavior) so the checks guard against blockSize==0 and invalid device
pointers.
- Around line 313-314: The file containing the function
fp4_quantize_with_residual has an outdated NVIDIA copyright header ending at
2023; update the header year range to include 2026 (e.g., change "2023" to
"2023-2026" or similar per project convention) so the modified file reflects
2026 changes and matches the project's copyright header policy.

In `@cpp/tensorrt_llm/thop/fp4Quantize.h`:
- Around line 39-40: Update the copyright header in
cpp/tensorrt_llm/thop/fp4Quantize.h to include 2026 (since the file was modified
in 2026); locate the top-of-file NVIDIA copyright block (near the declaration of
fp4_quantize_with_residual) and change the year range or add 2026 so the header
reflects the modification year.
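The validation CodeRabbit asks for can be modeled as plain checks that run before any launch. In the C++ binding these would be `TORCH_CHECK` calls; this Python version, including the name `check_fp4_residual_args` and its string device arguments, is a hypothetical illustration of the same conditions.

```python
def check_fp4_residual_args(m: int, kq: int, scale_numel: int,
                            scale_device: str, x_device: str) -> None:
    """Raise ValueError for the cases the review flags, before any kernel launch."""
    if m <= 0 or kq <= 0:
        # A zero-sized dimension would otherwise yield a zero-sized grid/block.
        raise ValueError(f"expected M > 0 and KQ > 0, got M={m}, KQ={kq}")
    if scale_numel <= 0:
        # The kernel reads input_scale[0]; an empty tensor has no valid pointer.
        raise ValueError("input_scale must be non-empty")
    if scale_device != x_device:
        raise ValueError(f"input_scale is on {scale_device} but X is on {x_device}")
```

Failing fast on the host gives a readable error instead of an invalid-configuration launch or an illegal memory access inside the kernel.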



ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 9d1bac2b-7524-4ada-a743-01f63a4e5f47

📥 Commits

Reviewing files that changed from the base of the PR and between ac9ea3c and aa78617.

📒 Files selected for processing (5)

cpp/tensorrt_llm/kernels/arcquantFP4.cu
cpp/tensorrt_llm/kernels/arcquantFP4.h
cpp/tensorrt_llm/thop/fp4Quantize.cpp
cpp/tensorrt_llm/thop/fp4Quantize.h
tests/unittest/_torch/thop/parallel/test_arcquant_fp4.py

tensorrt-cicd · 2026-04-16T10:04:22Z

PR_Github #43721 [ run ] completed with state FAILURE. Commit: aa78617
/LLM/main/L0_MergeRequest_PR pipeline #34204 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Tracin · 2026-04-17T02:14:45Z

/bot run

tensorrt-cicd · 2026-04-17T02:21:20Z

PR_Github #43886 [ run ] triggered by Bot. Commit: 5f971f0 Link to invocation

tensorrt-cicd · 2026-04-17T02:38:02Z

PR_Github #43886 [ run ] completed with state FAILURE. Commit: 5f971f0
/LLM/main/L0_MergeRequest_PR pipeline #34338 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Tracin · 2026-04-20T02:49:08Z

/bot run

tensorrt-cicd · 2026-04-20T02:54:31Z

PR_Github #44265 [ run ] triggered by Bot. Commit: f430438 Link to invocation

tensorrt-cicd · 2026-04-20T03:47:58Z

PR_Github #44265 [ run ] completed with state FAILURE. Commit: f430438
/LLM/main/L0_MergeRequest_PR pipeline #34688 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Tracin · 2026-04-20T05:38:56Z

/bot run

tensorrt-cicd · 2026-04-20T05:45:44Z

PR_Github #44323 [ run ] triggered by Bot. Commit: f430438 Link to invocation

tensorrt-cicd · 2026-04-20T05:59:19Z

PR_Github #44323 [ run ] completed with state FAILURE. Commit: f430438
/LLM/main/L0_MergeRequest_PR pipeline #34746 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Tracin · 2026-04-20T08:14:04Z

/bot run

tensorrt-cicd · 2026-04-20T08:20:53Z

PR_Github #44397 [ run ] triggered by Bot. Commit: f430438 Link to invocation

tensorrt-cicd · 2026-04-20T11:10:12Z

PR_Github #44397 [ run ] completed with state FAILURE. Commit: f430438
/LLM/main/L0_MergeRequest_PR pipeline #34814 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Add torch.library.register_fake for trtllm::fp4_quantize_with_residual so the op participates correctly in torch.compile / fake-tensor tracing. Signed-off-by: Tracin <[email protected]>
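A fake (meta) implementation only has to report output shapes and dtypes without touching data. The sketch below computes the shapes we would expect for a packed FP4 output plus a 128x4-swizzled scale buffer. It assumes two FP4 values per byte, one scale factor per 16 elements, and a 1-D scale tensor, and it takes the quantized width `kq` as given because how the op derives it from `X` and `KE` is not shown in this thread. The commented `register_fake` wiring is likewise hypothetical.

```python
def fake_fp4_quantize_with_residual_shapes(m: int, kq: int, sf_vec_size: int = 16):
    """Shapes (qx, sf) a fake implementation would need to return (assumed layout)."""
    if kq % sf_vec_size != 0:
        raise ValueError("kq must be a multiple of sf_vec_size")
    qx_shape = (m, kq // 2)                      # two e2m1 values packed per byte
    num_col_vecs = kq // sf_vec_size             # one scale factor per 16 elements
    padded_rows = (m + 127) // 128 * 128         # swizzled layout pads rows to 128
    padded_cols = (num_col_vecs + 3) // 4 * 4    # and SF columns to a multiple of 4
    return qx_shape, (padded_rows * padded_cols,)


# Hypothetical wiring with PyTorch (not executed here; kq would be derived
# from x and ke in a way this sketch does not model):
# @torch.library.register_fake("trtllm::fp4_quantize_with_residual")
# def _(x, input_scale, ke, is_act):
#     qx_shape, sf_shape = fake_fp4_quantize_with_residual_shapes(x.shape[0], kq)
#     return (x.new_empty(qx_shape, dtype=torch.uint8),
#             x.new_empty(sf_shape, dtype=torch.uint8))
```

Returning correctly shaped empty tensors is what lets `torch.compile` trace through the op without running the CUDA kernel.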

Tracin · 2026-04-21T06:49:53Z

/bot run

tensorrt-cicd · 2026-04-21T06:56:48Z

PR_Github #44660 [ run ] triggered by Bot. Commit: e454c56 Link to invocation

tensorrt-cicd · 2026-04-21T12:32:00Z

PR_Github #44660 [ run ] completed with state SUCCESS. Commit: e454c56
/LLM/main/L0_MergeRequest_PR pipeline #35034 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Tracin · 2026-04-22T02:21:59Z

/bot run

Tracin · 2026-04-22T05:32:18Z

/bot run

tensorrt-cicd · 2026-04-22T05:38:58Z

PR_Github #44899 [ run ] triggered by Bot. Commit: e454c56 Link to invocation

tensorrt-cicd · 2026-04-22T11:29:39Z

PR_Github #44899 [ run ] completed with state SUCCESS. Commit: e454c56
/LLM/main/L0_MergeRequest_PR pipeline #35234 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Tracin · 2026-04-23T02:11:34Z

/bot run

tensorrt-cicd · 2026-04-23T02:17:13Z

PR_Github #45067 [ run ] triggered by Bot. Commit: e454c56 Link to invocation

tensorrt-cicd · 2026-04-23T05:32:23Z

PR_Github #45067 [ run ] completed with state SUCCESS. Commit: e454c56
/LLM/main/L0_MergeRequest_PR pipeline #35370 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Tracin · 2026-04-23T06:34:51Z

/bot run

tensorrt-cicd · 2026-04-23T06:41:01Z

PR_Github #45136 [ run ] triggered by Bot. Commit: f65767d Link to invocation

tensorrt-cicd · 2026-04-23T14:24:34Z

PR_Github #45136 [ run ] completed with state SUCCESS. Commit: f65767d
/LLM/main/L0_MergeRequest_PR pipeline #35422 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Tracin · 2026-04-24T02:10:18Z

/bot run

tensorrt-cicd · 2026-04-24T02:17:23Z

PR_Github #45295 [ run ] triggered by Bot. Commit: f65767d Link to invocation

tensorrt-cicd · 2026-04-24T04:45:28Z

PR_Github #45295 [ run ] completed with state SUCCESS. Commit: f65767d
/LLM/main/L0_MergeRequest_PR pipeline #35549 completed with status: 'SUCCESS'

CI Report

Link to invocation

hyukn

LGTM.

…rder (NVIDIA#13117) Signed-off-by: Tracin <[email protected]>

github-actions Bot assigned Tracin Apr 16, 2026

coderabbitai Bot reviewed Apr 16, 2026


Comment thread cpp/tensorrt_llm/thop/fp4Quantize.cpp

Comment thread cpp/tensorrt_llm/thop/fp4Quantize.cpp

Comment thread cpp/tensorrt_llm/thop/fp4Quantize.h

Merge branch 'main' into fp4_quant_residual

5f971f0

Merge branch 'main' into fp4_quant_residual

f430438

[None][feat] Register fake op for fp4_quantize_with_residual

6ac8598


Tracin requested a review from a team as a code owner April 21, 2026 06:49

Tracin requested a review from hyukn April 21, 2026 06:49

Merge branch 'main' into fp4_quant_residual

e454c56

Merge branch 'main' into fp4_quant_residual

f65767d

hyukn approved these changes Apr 24, 2026


Tracin merged commit c4b8e8e into NVIDIA:main Apr 24, 2026
5 checks passed

yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request May 19, 2026

[None][feat] Add FP4 residual quantization kernel without channel reo…

c4d7758

…rder (NVIDIA#13117) Signed-off-by: Tracin <[email protected]>
