"/home/nvidia/.cache/uv/builds-v0/.tmpjXZPq7/lib/python3.12/site-packages/setuptools/_distutils/command/build_ext.py",
line 368, in run
self.build_extensions()
File "<string>", line 232, in build_extensions
File "<string>", line 210, in configure
File "/usr/lib/python3.12/subprocess.py", line 413, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake',
'/home/nvidia/.cache/uv/sdists-v9/pypi/vllm/0.10.1.1/itULH14ewSqkQjUU-v7LE/src',
'-G', 'Ninja', '-DCMAKE_BUILD_TYPE=RelWithDebInfo', '-DVLLM_TARGET_DEVICE=cuda',
'-DVLLM_PYTHON_EXECUTABLE=/home/nvidia/.cache/uv/builds-v0/.tmpjXZPq7/bin/python',
'-DVLLM_PYTHON_PATH=/usr/lib/python312.zip:/usr/lib/python3.12:/usr/lib/python3.12/lib-dynload:/home/nvidia/.cache/uv/builds-v0/.tmpjXZPq7/lib/python3.12/site-packages:/home/nvidia/.cache/uv/builds-v0/.tmpjXZPq7/lib/python3.12/site-packages/setuptools/_vendor',
'-DFETCHCONTENT_BASE_DIR=/home/nvidia/.cache/uv/sdists-v9/pypi/vllm/0.10.1.1/itULH14ewSqkQjUU-v7LE/src/.deps',
'-DNVCC_THREADS=1', '-DCMAKE_JOB_POOL_COMPILE:STRING=compile',
'-DCMAKE_JOB_POOLS:STRING=compile=14',
'-DCMAKE_CUDA_COMPILER=/usr/local/cuda-13.0/bin/nvcc']' returned non-zero exit
status 1.
hint: This usually indicates a problem with the package or the build environment.
Since the source build fails, use the prebuilt Triton vLLM container instead:
docker run --ipc=host --net host --gpus all --runtime=nvidia --privileged -it --rm -u 0:0 --name=testvllm -v "$LOCAL_NIM_CACHE:/root/.cache" nvcr.io/nvidia/tritonserver:25.08-vllm-python-py3
(You should point LOCAL_NIM_CACHE at the .cache directory in your home directory, unless you prefer somewhere else. Mounting it keeps the LLMs you download from disappearing when you delete the container.)
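For example, assuming you want to reuse ~/.cache on the host, set this before the docker run:
export LOCAL_NIM_CACHE=~/.cache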
Once in the container, do the following:
log in to HuggingFace with your own token (free to get)
hf auth login
then download a model – I haven’t been able to make medium-to-large models fit even though there would seem to be enough memory; here’s a small one that works
you may also need to empty the memory cache before running the model (one way to do that is shown after the download command)
hf download gghfez/gemma-3-4b-novision
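Assuming “empty the memory cache” means dropping the Linux page cache (which on the Thor shares the unified memory pool with the GPU), the usual way to do it on the host is:
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'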
then run the vLLM server, picking your port and allowing connections from other machines
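Something along these lines should work (--host 0.0.0.0 accepts connections from other machines; the port here matches the curl commands below):
vllm serve gghfez/gemma-3-4b-novision --host 0.0.0.0 --port 1234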
From outside the container, you can check that the server is up:
curl http://localhost:1234/v1/models
curl http://localhost:1234/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer no-key" -d '{
  "messages": [
    {
      "role": "system",
      "content": "You are an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests."
    },
    {
      "role": "user",
      "content": "Write a limerick about Python exceptions"
    }
  ]
}'
the model takes a while to load and used up a lot of memory (either the model or the server did); I’m sure some parameters could be added to tune this
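As an illustration rather than a recommendation (the values below are just guesses), two vLLM flags commonly used for this are --gpu-memory-utilization, which caps the fraction of GPU memory vLLM reserves up front, and --max-model-len, which shrinks the KV cache by limiting the context length:
vllm serve gghfez/gemma-3-4b-novision --host 0.0.0.0 --port 1234 --gpu-memory-utilization 0.7 --max-model-len 8192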
[added after more experimentation:]
I really don’t know what I’m doing with vLLM, but I was able to reduce memory usage and improve performance with a few tweaks. I’ve had much better luck on the Thor with llama.cpp, which I built in NVIDIA’s latest PyTorch container.
FWIW you might want to fire up the server with this command and parameters. I’m sure others could greatly improve it.