cudaErrorNoKernelImageForDevice running NanoOWL

Hi,

I’m trying to run the tutorial from the NVIDIA-AI-IOT/nanoowl repository (https://github.com/NVIDIA-AI-IOT/nanoowl), a project that optimizes OWL-ViT for real-time inference with NVIDIA TensorRT.

I’m running a freshly flashed Jetson Orin Nano 8GB with JetPack 6.2 installed.

I installed all the listed dependencies, verified the installation, built the NanoOWL package, and then built the TensorRT engine.

I try to run the example prediction script owl_predict.py, but I get the error cudaErrorNoKernelImageForDevice.

I tried to verify the CUDA installation with nvcc --version, with no luck.

I then added CUDA 12.6 to the PATH and was able to run nvcc --version successfully, but I still get the cudaErrorNoKernelImageForDevice error.
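For anyone hitting the same nvcc problem, the export looked roughly like this (a sketch assuming the default JetPack toolkit location /usr/local/cuda-12.6; adjust to where your toolkit actually lives):

```shell
# Put the CUDA 12.6 toolkit on the PATH (default JetPack location; adjust as needed).
export PATH=/usr/local/cuda-12.6/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.6/lib64:${LD_LIBRARY_PATH:-}
# nvcc --version should now report release 12.6.
```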

Any idea what is wrong?

I did install pytorch for CUDA12.6 and tested it separately to make sure torch was working.

I installed numba and ran a simple “hello world” CUDA script in Python and it worked just fine.

happy@happy-desktop:~/Desktop$ python3 hello_world.py
/usr/lib/python3/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.25.0)
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
/usr/local/lib/python3.10/dist-packages/numba/cuda/dispatcher.py:536: NumbaPerformanceWarning: Grid size 1 will likely result in GPU under-utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))
Hello from GPU!
Hello from CPU!
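The “hello world” script was along these lines (a hedged sketch, not my exact file: the kernel name and array are made up, and the CUDASIM fallback plus the try/except are only there so it also runs on machines without numba or a GPU):

```python
import os

# Sketch only: force numba's CPU-based CUDA simulator so this also runs
# without a GPU. On a real Jetson you would leave this unset.
os.environ.setdefault("NUMBA_ENABLE_CUDASIM", "1")

gpu_message = None
try:
    import numpy as np
    from numba import cuda

    @cuda.jit
    def hello_kernel(out):
        # Each thread writes a 1 into its slot of the output array.
        i = cuda.grid(1)
        if i < out.size:
            out[i] = 1

    out = np.zeros(8, dtype=np.int32)
    hello_kernel[1, 8](out)  # 1 block of 8 threads -- hence the low-occupancy warning
    if out.sum() == out.size:
        gpu_message = "Hello from GPU!"
        print(gpu_message)
except ImportError:
    pass  # numba/numpy not installed; skip the GPU half

print("Hello from CPU!")
```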

Hi,

Could you try the PyTorch package in the link below:

Thanks.

That did the trick.

I still had to fight some dependencies. I’ll create one last post to list all the commands I had to use to get it to work for the next poor slob.

Hopefully Jetpack 7 will save us all!

Dec 2025

The Story of getting nanoOWL to work on a Jetson Orin Nano Developer Kit 8GB

I re-flashed an SD card with JetPack 6.2.1.

I accepted the Ubuntu updates (if you don’t, snap throws an error and you can’t install Chromium).

I installed Chromium.

I went to the referenced link to install PyTorch (just torch): https://pypi.jetson-ai-lab.io/jp6/cu126

I installed pip: “sudo apt update” then “sudo apt install python3-pip”.

Then I ran “pip install torch-2.8.0-cp310-cp310-linux_aarch64.whl” with the wheel file downloaded from the pypi link.

I then checked the installation with python in the command line:

>>> import torch
>>> x = torch.rand(5, 3)
>>> print(x)
tensor([[0.7626, 0.8550, 0.0565],
        [0.4420, 0.0316, 0.8504],
        [0.2350, 0.7186, 0.2145],
        [0.0988, 0.3881, 0.6704],
        [0.3906, 0.4185, 0.2418]])
>>> torch.cuda.is_available()
True
>>> exit()

Then I installed transformers: “python3 -m pip install transformers”

Then I cloned the nanoowl repository

happy@happy-desktop:~$ git clone https://github.com/NVIDIA-AI-IOT/nanoowl
cd nanoowl
python3 setup.py develop --user

Then I had to download and install torchvision with a wheel from pypi “pip install torchvision-0.23.0-cp310-cp310-linux_aarch64.whl”

Then I had to update pillow “pip install pillow==11.1.0”

Then I had to download and install onnxruntime from pypi “pip install onnxruntime_gpu-1.23.0-cp310-cp310-linux_aarch64.whl”

Then I had to downgrade numpy “pip install --force-reinstall numpy==1.25”

Then I had to install onnx “pip install onnx”
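After all that pinning, a quick sanity check that the right versions actually ended up installed (a sketch using only the standard library; it just reads installed package metadata and prints what it finds):

```python
# Report the installed versions of the packages pinned above.
from importlib.metadata import version, PackageNotFoundError

report = {}
for pkg in ("numpy", "pillow", "onnx"):
    try:
        report[pkg] = version(pkg)
    except PackageNotFoundError:
        report[pkg] = "not installed"
    print(pkg, report[pkg])
```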

Then I had to download and install torch2trt:

git clone https://github.com/NVIDIA-AI-IOT/torch2trt
cd torch2trt
python setup.py install

Then I was able to build the engine: “python3 -m nanoowl.build_image_encoder_engine data/owl_image_encoder_patch32.engine”

NOTE: You have to close Chromium and everything else to free up RAM for the GPU! The GPU cannot use virtual memory, so no swap file will help. The jetson-ai-lab setup recommends disabling the desktop GUI to free up RAM, but I was able to get it to run with just the terminal open.
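A quick way to see whether enough RAM is actually free before kicking off the build (a sketch; the commented systemctl lines are the standard Ubuntu way to stop and restart the GUI, per the jetson-ai-lab recommendation):

```shell
# See how much memory is free before building the engine.
free -h

# Optional, from the jetson-ai-lab advice: drop to a text console to free more RAM.
# sudo systemctl isolate multi-user.target     # stop the desktop GUI
# sudo systemctl isolate graphical.target      # bring it back afterwards
```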

I was then finally able to run the example on GitHub:

cd examples
python3 owl_predict.py \
    --prompt="[an owl, a glove]" \
    --threshold=0.1 \
    --image_encoder_engine=../data/owl_image_encoder_patch32.engine

Hi,

Thanks for the details.
Good to know it works now.

https://pypi.jetson-ai-lab.io/jp6/cu126 is down?
I knew I should have saved the .whl files separately!