How come TensorRT-LLM is not supported on the AGX Thor? I tried to port my TensorRT-LLM-based project from the Orin to the Thor. After a bit of debugging and going down rabbit holes on this forum, I found out that not only is it no longer supported, but it will be replaced by TensorRT Edge-LLM. Is this a totally different framework? Is it a fork of TensorRT-LLM? Also, is it not available yet?
Hi,
TensorRT-LLM doesn’t work on the Jetson platform.
For Thor, we will support TensorRT Edge-LLM as a replacement, starting with the upcoming release.
Thanks.
Hi,
Do you mean on the newest iteration of the Jetson platform? It works just fine on the older AGX Orin (props to @dusty_nv for the tutorials and container images). I am curious whether my TensorRT-LLM Python code could be easily ported to TensorRT Edge-LLM once it's available.
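For context, most of my code uses the high-level LLM API, so the porting effort really depends on whether TensorRT Edge-LLM exposes something similar. A minimal sketch of the kind of code I'd be porting (the model name is just an example):

```python
# A minimal sketch of TensorRT-LLM's high-level LLM API, as used on the Orin.
# The model name is only an example; any supported HF checkpoint works.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
params = SamplingParams(max_tokens=64, temperature=0.8)

for output in llm.generate(["What runs faster on Thor than on Orin?"], params):
    print(output.outputs[0].text)
```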
Thanks.
The lack of transparency around TensorRT Edge-LLM, and the missing explanation for why a fork is needed rather than extensions to TensorRT-LLM, is really frustrating, especially since this is a Blackwell GPU. Any information would be super appreciated @nvidia - thanks :-)
Hi both,
As TensorRT Edge-LLM has not been released yet, we are not able to share detailed information on the public forum.
But there is no plan to support TensorRT-LLM on the Jetson platform.
There was a workable version, but we don't officially support it.
Thanks.
Hi,
What is its approximate release date?
Thanks.
Hi,
TensorRT Edge-LLM will come with JetPack 7.1.
We expect to release JetPack 7.1 early next year.
Thanks.
OK, that makes sense; I suppose Thor is an edge device, or, as ChatGPT likes to say, "an edge device that's been to the gym".
Any chance you can share more on the comment "there was a workable version but we don't officially support it"? Was that a solution from another Thor user? I tried to compile TensorRT-LLM from scratch, but I got into dependency hell with cuBLAS and other issues compiling all the CUDA kernels. Ooof.
Hi,
For now, you can try running LLMs with the container below instead:
Thanks.
Hi there,
I couldn’t find a TensorRT-LLM image in the jetson-containers repo — are you working with the 0.12.0-jetson branch?
Is it possible to use the latest version of TensorRT-LLM on Jetson Orin, and could you point me to some related docs or examples?
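For reference, this is the trivial check I run inside a container to see which build I'm on:

```python
# Print the bundled TensorRT-LLM version
import tensorrt_llm
print(tensorrt_llm.__version__)
```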
Thanks a lot!
Hi,
For Orin, it's recommended to try our Ollama or vLLM containers below:
Ollama:
vLLM:
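Both containers can expose an OpenAI-compatible HTTP endpoint, so a quick client-side sanity check could look like the sketch below. The port and model name are assumptions (vLLM serves on 8000 by default, Ollama's compatible endpoint on 11434); adjust them to match how you launch the server.

```python
# A sketch of a sanity check against an OpenAI-compatible server started
# by either container. Base URL and model name are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example model id
    messages=[{"role": "user", "content": "Hello from the Orin!"}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```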
Thanks.