TensorRT Quantization for Jetson Inference

Hi Everyone, I’m currently working with a Jetson Orin Nano and I’m trying to optimize a model using TensorRT. The device is running JetPack 6.2, which ships with TensorRT 10.3. I managed to get the engine to build properly on the device; however, inference was too slow.

I then tried to quantize the model for inference on a cloud VM, but that introduced a new set of problems. I managed to quantize the model and export an ONNX model, but when trying to build the engine I got a variety of errors depending on how I tried to fix it. I installed the Model Optimizer following the documentation from here. The setup I used was:

  • a Colab notebook with a T4 GPU
  • ONNX input model

My attempts to fix the model build issues are described below; a minimal sketch of the build script these calls come from follows the list:

  • Allowing INT8 and FP16 data types in the model by setting the builder config flags with:
    • config.set_flag(trt.BuilderFlag.FP16)
    • config.set_flag(trt.BuilderFlag.INT8)
  • Manually making the network strongly typed with:
    • network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.STRONGLY_TYPED))
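
For context, here’s a minimal sketch of the build path I’m calling (file names are placeholders, and it assumes static input shapes); the two attempts above correspond to the two variants marked in the comments:

```python
import tensorrt as trt

LOGGER = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(LOGGER)

# Attempt 1: plain network, precision opt-in via builder config flags.
network = builder.create_network(0)
# Attempt 2: strongly typed network -- precisions are taken from the Q/DQ
# nodes in the ONNX graph, so the INT8/FP16 builder flags must stay unset.
# network = builder.create_network(
#     1 << int(trt.NetworkDefinitionCreationFlag.STRONGLY_TYPED))

parser = trt.OnnxParser(network, LOGGER)
with open("model_quantized.onnx", "rb") as f:  # placeholder file name
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
# Attempt 1 only; combined with a strongly typed network these lines trigger
# the "!config.getFlag(BuilderFlag::kINT8)" parameter check error shown below.
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.INT8)

serialized = builder.build_serialized_network(network, config)
if serialized is None:
    raise RuntimeError("engine build failed")
with open("model_quantized.engine", "wb") as f:  # placeholder file name
    f.write(serialized)
```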

The model build failed with the following errors when trying to build an engine for a quantized model:

CaskFlattenConvolution: No valid tactics for lower_res_blocks.3.conv2.weight + /lower_res_blocks.3/conv2/Conv + /lower_res_blocks.3/Add + PWN(/Add_1)

[10/15/2025-07:19:32] [TRT] [E] [weightsPtr.h::values::182] Error Code 2: Internal Error (Assertion type() == expectedDataType<T>() failed. )

[10/15/2025-07:37:19] [TRT] [E] IBuilder::buildSerializedNetwork: Error Code 3: API Usage Error (Parameter check failed, condition: !config.getFlag(BuilderFlag::kINT8). )

Any suggestions on how to resolve this would be great, as would advice on how to standardise the libraries and environment for the Model Optimizer so that its output works with TensorRT on the Jetson.

Thank you in advance

*** Please note that this reply is generated by LLM automatically ***

I’d be happy to help you with your issue!

It seems like you’re trying to optimize your model with TensorRT on a Jetson Orin Nano. You’ve managed to build an engine on the device, but it’s too slow, and quantizing the model on a cloud VM has introduced new errors when building the engine.

Firstly, let’s talk about the TensorRT version you’re using. You mentioned JetPack 6.2, which ships TensorRT 10.3. On Jetson, the TensorRT version is tied to the JetPack release, so any mismatch between the TensorRT and Model Optimizer versions in your cloud quantization environment and the 10.3 runtime on the device can contribute to the issues you’re seeing.

One of the main issues you’re facing is that you’re getting a variety of errors when trying to build the engine for a quantized model. You’ve tried a few things to fix this, including allowing Int8 and FP16 datatypes in the model, and manually ensuring the network is strongly typed.

However, it seems like these attempts haven’t been successful. You mentioned that you’ve installed the ModelOptimizer, but you didn’t specify which version you’re using. It’s possible that the version of ModelOptimizer you’re using is not compatible with TensorRT 10.3.

Here are a few suggestions that might help you resolve this issue:

  1. Update to a newer version of TensorRT: On Jetson, TensorRT is bundled with JetPack, so moving to a newer TensorRT release generally means moving to a newer JetPack. If a release newer than JetPack 6.2 / TensorRT 10.3 is available for the Orin Nano, it may include fixes relevant to the errors you’re seeing.
  2. Use a compatible version of the Model Optimizer: Make sure you’re using a version of the Model Optimizer that’s compatible with TensorRT 10.3. You can check the Model Optimizer documentation to see which TensorRT versions it supports.
  3. Check the model’s precision: As a sanity check, build the original, unquantized ONNX model without the INT8 or FP16 flags (i.e. plain FP32). If that builds cleanly, the problem lies in the quantized graph rather than in the base model.
  4. Use the trtexec tool: Try building and running the engine with trtexec instead of the Python builder API; its verbose log can help you see whether the failure comes from the engine-building process or from the model itself (example commands follow this list).
  5. Check the model’s architecture: Make sure the model’s architecture is compatible with TensorRT. Some models might require specific modifications to work with TensorRT.
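
For example, suggestions 3 and 4 can be combined into a quick sanity check directly on the Jetson. The commands below are only a sketch (file names are placeholders; run trtexec --help on your JetPack install to confirm the --stronglyTyped flag is available in your build):

```
# 1) Sanity check: build the original, unquantized ONNX at default FP32 precision
trtexec --onnx=model_fp32.onnx --saveEngine=model_fp32.engine --verbose

# 2) Build the quantized (Q/DQ) ONNX as a strongly typed network,
#    without passing --int8 or --fp16
trtexec --onnx=model_quantized.onnx --stronglyTyped --saveEngine=model_quantized.engine --verbose
```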

In terms of standardizing the libraries and environment for the ModelOptimizer, here are a few suggestions:

  1. Use a Docker container: A container pins the OS libraries, CUDA stack, and Python packages used for quantization, so the environment is identical on every machine.
  2. Use a virtual environment: A conda or venv environment with pinned package versions is a lighter-weight way to keep the Python dependencies consistent across machines.
  3. Use a consistent version of the Model Optimizer: Use the same Model Optimizer, ONNX, and TensorRT versions on every machine involved so that results are consistent and reproducible (the version report sketch after this list can help confirm they match).
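
As a starting point, a small version report like the sketch below can be run in both the cloud quantization environment and on the Jetson, and the outputs compared line by line (the package and import names used here are the usual public ones; adjust them if, for example, your TensorRT Python bindings come from JetPack’s apt packages):

```python
# Print the versions of the packages involved so the quantization environment
# and the Jetson environment can be compared line by line.
from importlib.metadata import PackageNotFoundError, version


def report(import_name: str, dist_name: str) -> None:
    ver = None
    try:
        module = __import__(import_name)
        ver = getattr(module, "__version__", None)
    except ImportError:
        pass
    if ver is None:
        # Fall back to the installed distribution metadata.
        try:
            ver = version(dist_name)
        except PackageNotFoundError:
            ver = "not installed"
    print(f"{dist_name}: {ver}")


report("tensorrt", "tensorrt")
report("modelopt", "nvidia-modelopt")  # TensorRT Model Optimizer
report("onnx", "onnx")
```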

I hope these suggestions help you resolve the issues you’re encountering! If you have any further questions or need more help, feel free to ask.

Thanks for the response. The support matrix for the Model Optimizer states that it is compatible with any TensorRT version greater than 10.
The model architecture should be fine, as it’s a relatively simple U-Net-based model with no major customizations to any of the blocks.

Hi,

It looks like you’re using a CNN rather than an LLM. Is that correct?
If so, could you check the steps below to see if they work for you?

Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.