Description
TensorRT: Quantization issues with convtranspose3D
Environment
TensorRT Version: 10.10
GPU Type: RTX5070
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version: Win11
Python Version (if applicable): 3.12
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Relevant Files
My code for exporting the model (quantized model) and my model
code.txt (12.0 KB)
model.txt (59.6 KB)
Hello,
I don’t understand why ConvTranspose3d runs in INT8 with implicit quantization, but as soon as I use explicit quantization or a QAT phase with the Model Optimizer, ConvTranspose3d falls back to FP16.
I am using the Model Optimizer with the default INT8 configuration.
I think this is only a problem with the Model Optimizer, but I’m not sure.
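For reference, the explicit path I mean looks roughly like this; a minimal sketch using Model Optimizer’s default INT8 config (the model class, calibration loader, shapes, and file names here are placeholders, the full version is in code.txt):

```python
import torch
import modelopt.torch.quantization as mtq

model = MyUNet3D().cuda().eval()            # placeholder for my 3D model with ConvTranspose3d layers

def forward_loop(m):
    # Calibration pass over a few representative batches (placeholder loader).
    for batch in calib_loader:
        m(batch.cuda())

# Insert fake-quant (Q/DQ) modules using the default INT8 recipe.
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)

# Export the explicitly quantized model to ONNX with Q/DQ nodes.
dummy = torch.randn(1, 1, 64, 64, 64, device="cuda")   # placeholder input shape
torch.onnx.export(model, dummy, "model_explicit.onnx", opset_version=17)
```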
Here is my implicit ONNX.
Here is the result after TensorRT quantization.
So here, ConvTranspose3D runs in INT8.
Inference time in implicit mode: 9 ms.
Here is explicit ONNX quantization.
Here is the result after TensorRT quantization.
Inference time in explicit mode: 11 ms.
*** Please note that this reply is generated by LLM automatically ***
The issue you’re seeing, where ConvTranspose3D runs in INT8 with implicit quantization but not with explicit quantization or QAT through the Model Optimizer, is likely due to how TensorRT handles quantization for this specific layer.
With implicit quantization, TensorRT determines the quantization scheme for each layer automatically, including ConvTranspose3D. With explicit quantization or QAT, the quantization scheme (the Q/DQ placement) has to be specified for each layer explicitly.
In your case, it seems the resulting engine falls back to FP16 for ConvTranspose3D when explicit quantization or QAT is used. This is likely because TensorRT cannot find an INT8 implementation for ConvTranspose3D that fits the explicit Q/DQ pattern on your specific hardware (RTX 5070).
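Before assuming a hardware limitation, it is worth checking whether the exported ONNX actually has QuantizeLinear/DequantizeLinear pairs feeding the ConvTranspose node; a small sketch (the file name is a placeholder for your explicit export):

```python
import onnx

model = onnx.load("model_explicit.onnx")     # placeholder: your explicitly quantized export
producer = {out: n.op_type for n in model.graph.node for out in n.output}

for node in model.graph.node:
    if node.op_type == "ConvTranspose":      # 3D is determined by the kernel shape, the op type is the same
        # TensorRT can only run this node in INT8 if its activation and weight inputs
        # come from DequantizeLinear nodes (i.e. the layer is wrapped in Q/DQ).
        sources = [producer.get(i, "graph input / initializer") for i in node.input]
        print(node.name, "<-", sources)
```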
There are a few possible reasons for this:
- Hardware limitations: As mentioned in the TensorRT documentation, SM 7.5 and earlier devices may lack INT8 implementations for some layers. Your RTX 5070 is a much newer device, but it is still possible that an INT8 implementation for ConvTranspose3D is not available or not optimized for your hardware; the engine-inspector sketch after this list shows how to check which precision TensorRT actually selected.
- Model Optimizer configuration: The Model Optimizer may not be inserting Q/DQ nodes around ConvTranspose3D, or the TensorRT builder may not be asked to keep that layer in INT8. You can try enabling the INT8 builder flag and requesting INT8 precision on the ConvTranspose3D layer to push TensorRT toward an INT8 kernel.
- QAT limitations: QAT is a more complex process than implicit quantization and may not always produce the same result. It is possible that the QAT path does not map to an optimal INT8 implementation for ConvTranspose3D on your hardware.
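To see which precision TensorRT actually chose for the layer, you can dump the engine’s layer information; a sketch along these lines (the engine path is a placeholder, and the engine should be built with detailed profiling verbosity to get full per-layer detail):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(logger)

with open("model_explicit.engine", "rb") as f:       # placeholder path to your built engine
    engine = runtime.deserialize_cuda_engine(f.read())

inspector = engine.create_engine_inspector()
# The JSON output lists every (fused) layer with the precision and tactic TensorRT selected;
# search it for the ConvTranspose/Deconvolution layer to see whether it ran in Int8 or Half.
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))
```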
To resolve this issue, you can try the following:
- Check the ONNX operator support matrix: Verify that ConvTranspose (3D) is supported in the ONNX opset you are using and that TensorRT has an INT8 implementation available for it.
- Use the builder precision controls: Enable the INT8 builder flag, request INT8 precision on the ConvTranspose3D layer, and set OBEY_PRECISION_CONSTRAINTS so the request is honored; see the sketch after this list.
- Try a different QAT configuration: Experiment with different QAT configurations to see whether one of them lets TensorRT keep ConvTranspose3D in INT8.
- Use different hardware: If possible, run the model on another NVIDIA GPU to check whether the issue is specific to your RTX 5070.
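For the precision-constraint suggestion above, a minimal sketch with the TensorRT Python API (the ONNX path is a placeholder; note that with a Q/DQ network the placement of the Q/DQ nodes ultimately governs precision, so this mainly helps the implicit/calibration path):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(0)                  # explicit batch is the default in TensorRT 10
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:                  # placeholder ONNX path
    parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
config.set_flag(trt.BuilderFlag.FP16)
# Honor per-layer precision requests instead of treating them as hints.
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

for i in range(network.num_layers):
    layer = network.get_layer(i)
    if layer.type == trt.LayerType.DECONVOLUTION:    # ConvTranspose imports as a deconvolution layer
        layer.precision = trt.int8

# Implicit INT8 additionally needs a calibrator or per-tensor dynamic ranges (omitted here).
engine_bytes = builder.build_serialized_network(network, config)
```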
In terms of inference time, the difference between implicit and explicit quantization is most likely due to the extra quantize/dequantize operations that remain in the graph, especially around a layer that falls back to FP16. That said, the difference of 2 ms (9 ms vs 11 ms) is relatively small, and you may want to weigh other factors such as model accuracy and robustness when choosing between implicit and explicit quantization.