Model outputting NaNs

zakhrik · February 10, 2026, 11:59am

Description

I’m running into an issue while converting my model from .pt to .onnx to .engine.
I have DINOv3 and I’m trying to build two optimized versions of this model: one for inputs of dimension 112, and one for 512.
After converting to ONNX (FP32) and then to a TensorRT engine (FP16), the model for 112 works fine and outputs real numbers, but for some reason the model optimized for 512 outputs NaNs.
I have tried using polygraphy debug precision and the --layerPrecisions flag to allow some layers to remain in FP32, but nothing helps.
It’s important to note that the model works very well for the size of 112.
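
For anyone reproducing this, one thing worth double-checking is how the per-layer precision flags are passed. As far as the trtexec help text describes it, --layerPrecisions only takes effect when --precisionConstraints is set to obey or prefer. The sketch below is a minimal illustration, not the conversion script from the repo: the helper name, the file names, and the layer names are all hypothetical.

```python
import shlex


def build_trtexec_cmd(onnx_path, engine_path, fp32_layers):
    """Return a trtexec argument list for an FP16 build that pins the
    named layers to FP32."""
    args = [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        "--fp16",
    ]
    if fp32_layers:
        # Spec format: layerName:precision, comma separated.
        spec = ",".join(f"{name}:fp32" for name in fp32_layers)
        # Without a precision-constraints mode the per-layer spec is ignored.
        args += ["--precisionConstraints=obey", f"--layerPrecisions={spec}"]
    return args


# Hypothetical layer names; real ones come from the ONNX graph
# or from a verbose build log.
cmd = build_trtexec_cmd(
    "dinov3_512.onnx",
    "dinov3_512_fp16.engine",
    ["/blocks.0/attn/Softmax", "/blocks.0/norm1/LayerNormalization"],
)
print(shlex.join(cmd))
```

If the build log shows the pinned layers still running in FP16, the constraint was probably not applied, which would explain why pinning appears to have no effect.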

Thank you so much for your help!

Environment

TensorRT Version: 10.4.0
GPU Type: V100-PCIE-16GB
Nvidia Driver Version: 535.129.03
CUDA Version: 12.6
CUDNN Version: 9.4.0
Operating System + Version: Red Hat Enterprise Linux 8.10
Python Version (if applicable): 3.11
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.7.1
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tritonserver:24.09-py3

Relevant Files

https://drive.google.com/drive/folders/1Jg3Jad5DkvKsN4rtHtFkV4Au16Hxov1n?usp=drive_link

Steps To Reproduce

This is what I get for 112:

[screenshot: IMG_0170]

And this is what I get for 512:

[screenshot: IMG_0169]

1. Clone the repo: GitHub - zakcory/model-opt: Converting .pt to .onnx to .engine
2. Copy the .pt DINOv3 model into sources/models/raw after cloning
3. Run the ./convert_all_models.sh script to convert to .onnx and then to .engine
4. Run ./test_all_models.sh to run the engines for both 112 and 512 and see the outputs

NOTES:

Make sure to have uv installed so the bash scripts run successfully (they use uv run and uv sync). If you decide to use pip instead, install the dependencies from the requirements.txt file and change uv run inside the bash scripts to python
Make sure to run the scripts on the V100: for me they run fine on an A10, and the problem with the input size of 512 appears on the V100 specifically

avishai11900 · February 10, 2026, 1:05pm

I’m seeing the same issue on my side. The model optimized for input size 112 runs correctly and produces valid outputs, but the 512 version produces NaNs after conversion to TensorRT FP16. I also tried adjusting precision settings and keeping some layers in FP32, but it didn’t resolve the problem.

Thanks in advance,
Avishai

yveskrei · February 18, 2026, 6:14pm

Experiencing the same issue with my V100 GPU.
Compared it with my personal RTX 4070.
On the V100 it keeps outputting NaNs for larger input dimensions, whereas on the newer GPU the issue does not appear.

Happens with FP16 conversion only for some reason.
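
A possible mechanism, sketched under assumptions rather than confirmed for this model: FP16 tops out at 65504, so any intermediate value beyond that becomes inf, and inf - inf (for example in the max-subtraction step of a softmax) yields NaN. The snippet emulates FP16 rounding with Python's struct module; to_fp16 and softmax_fp16 are illustrative helpers, not TensorRT code.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x):
    """Round a Python float to the nearest FP16 value; overflow becomes +/-inf."""
    if math.isnan(x) or math.isinf(x):
        return x
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        return math.copysign(math.inf, x)


def softmax_fp16(logits):
    """Numerically 'stable' softmax with every intermediate rounded to FP16."""
    vals = [to_fp16(v) for v in logits]
    m = max(vals)
    # If m is inf, the overflowed entry computes inf - inf, which is NaN.
    exps = [to_fp16(math.exp(v - m)) for v in vals]
    total = to_fp16(sum(exps))
    return [to_fp16(e / total) for e in exps]


print(to_fp16(300.0 * 300.0))         # 90000 exceeds FP16_MAX, so this is inf
print(softmax_fp16([1.0, 2.0]))       # finite values, sums to roughly 1
print(softmax_fp16([10.0, 70000.0]))  # one overflowed logit turns every output into NaN
```

This would fit the "FP16 only" observation, but it does not by itself explain why only one GPU architecture is affected.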

eliyabs2112 · April 20, 2026, 10:53am

I am seeing the exact same behavior on a similar setup (V100, TensorRT 10.x). Like the OP, my DINOv3 conversions work perfectly at smaller input resolutions (112x112), but yield NaN outputs as soon as the spatial dimension scales up to 512x512.
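
To put rough numbers on how different the two shapes are, here is a small sketch. It assumes a patch size of 16 and 5 extra tokens (one class token plus four register tokens), which is my understanding of the DINOv3 ViT configuration and should be checked against the actual checkpoint; vit_token_count is a hypothetical helper.

```python
def vit_token_count(image_size, patch_size=16, extra_tokens=5):
    """Tokens per square image: patch tokens plus class/register tokens.

    patch_size and extra_tokens are assumptions about the model config.
    """
    if image_size % patch_size:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size
    return per_side * per_side + extra_tokens


for size in (112, 512):
    n = vit_token_count(size)
    # The attention score matrix is n x n per head.
    print(f"{size}x{size}: {n} tokens, {n * n} attention scores per head")
```

Under those assumptions the 112 input gives 54 tokens and the 512 input gives 1029, so the attention matrices are several hundred times larger. That makes it plausible, though not certain, that the builder picks different kernels for the two engines.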

I suspect this might be related to the specific tactics (kernels) being selected for the V100’s Volta architecture during the FP16 optimization phase for larger tensor shapes. I’ve also attempted to isolate the issue using --layerPrecisions to keep the attention heads in FP32, but the NaNs persist.

Given that this works on Ampere (A10) but fails on Volta (V100) for the exact same 512 input, could an NVIDIA engineer look into whether there is a known precision overflow in the scaled dot-product attention or LayerNorm kernels specifically for the V100 on TRT 10.4? This is a major blocker for deploying high-resolution Vision Transformer models on older enterprise hardware.
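
While waiting for an answer, one way to narrow down an overflow is to dump the FP32 activations of the ONNX model and flag every tensor whose magnitude already exceeds the FP16 range, since those are natural candidates to pin to FP32. The sketch below only shows the filtering step: how the activations are collected is left out, and the tensor names and values are made up for illustration.

```python
import math

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def fp16_risky_tensors(activations, headroom=1.0):
    """Return (name, abs_max) pairs, in input order, for tensors whose FP32
    magnitude exceeds headroom * FP16_MAX or that already hold NaN/Inf."""
    risky = []
    for name, values in activations.items():
        abs_max = 0.0
        for v in values:
            if not math.isfinite(v):
                abs_max = math.inf
                break
            abs_max = max(abs_max, abs(v))
        if abs_max > headroom * FP16_MAX:
            risky.append((name, abs_max))
    return risky


# Hypothetical tensor names and values standing in for a real FP32 dump.
dump = {
    "blocks.0.attn.scores": [12.5, -3.0, 88.0],
    "blocks.7.mlp.fc1": [4.0e4, -9.1e4, 120.0],
    "blocks.11.norm2": [0.5, -0.25, 1.0],
}
print(fp16_risky_tensors(dump))  # [('blocks.7.mlp.fc1', 91000.0)]
```

Setting headroom below 1.0 (for example 0.5) also flags tensors that are close to the limit, which may matter if FP16 accumulation pushes them over.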
