Help: error when running vLLM

903592775 · November 14, 2025, 3:47am

(llm311) nvidia@localhost:~$ python -m vllm.entrypoints.openai.api_server \
--model /data/lqy/qwen/Qwen3-VL-8B-Instruct \
--served-model-name Qwen3-VL-8B-Instruct \
--host 0.0.0.0 \
--port 9000 \
--gpu-memory-utilization 0.7 \
--skip-mm-profiling

INFO 11-14 11:34:33 [init.py:216] Automatically detected platform cuda.
(APIServer pid=2490067) INFO 11-14 11:34:34 [api_server.py:1839] vLLM API server version 0.11.0
(APIServer pid=2490067) INFO 11-14 11:34:34 [utils.py:233] non-default args: {‘host’: ‘0.0.0.0’, ‘port’: 9000, ‘model’: ‘/data/lqy/qwen/Qwen3-VL-8B-Instruct’, ‘served_model_name’: [‘Qwen3-VL-8B-Instruct’], ‘gpu_memory_utilization’: 0.7, ‘skip_mm_profiling’: True}
(APIServer pid=2490067) INFO 11-14 11:34:34 [model.py:547] Resolved architecture: Qwen3VLForConditionalGeneration
(APIServer pid=2490067) torch_dtype is deprecated! Use dtype instead!
(APIServer pid=2490067) INFO 11-14 11:34:34 [model.py:1510] Using max model len 262144
(APIServer pid=2490067) INFO 11-14 11:34:35 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=2048.
INFO 11-14 11:34:39 [init.py:216] Automatically detected platform cuda.
(EngineCore_DP0 pid=2490429) INFO 11-14 11:34:41 [core.py:644] Waiting for init message from front-end.
(EngineCore_DP0 pid=2490429) INFO 11-14 11:34:41 [core.py:77] Initializing a V1 LLM engine (v0.11.0) with config: model=‘/data/lqy/qwen/Qwen3-VL-8B-Instruct’, speculative_config=None, tokenizer=‘/data/lqy/qwen/Qwen3-VL-8B-Instruct’, skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=262144, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend=‘auto’, disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=‘’), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen3-VL-8B-Instruct, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, 
compilation_config={“level”:3,“debug_dump_path”:“”,“cache_dir”:“”,“backend”:“”,“custom_ops”:,“splitting_ops”:[“vllm.unified_attention”,“vllm.unified_attention_with_output”,“vllm.mamba_mixer2”,“vllm.mamba_mixer”,“vllm.short_conv”,“vllm.linear_attention”,“vllm.plamo2_mamba_mixer”,“vllm.gdn_attention”,“vllm.sparse_attn_indexer”],“use_inductor”:true,“compile_sizes”:,“inductor_compile_config”:{“enable_auto_functionalized_v2”:false},“inductor_passes”:{},“cudagraph_mode”:[2,1],“use_cudagraph”:true,“cudagraph_num_of_warmups”:1,“cudagraph_capture_sizes”:[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],“cudagraph_copy_inputs”:false,“full_cuda_graph”:false,“use_inductor_graph_partition”:false,“pass_config”:{},“max_capture_size”:512,“local_cache_dir”:null}
(EngineCore_DP0 pid=2490429) /data/conda/envs/llm311/lib/python3.11/site-packages/torch/cuda/init.py:326: UserWarning:
(EngineCore_DP0 pid=2490429) NVIDIA Thor with CUDA capability sm_110 is not compatible with the current PyTorch installation.
(EngineCore_DP0 pid=2490429) The current PyTorch install supports CUDA capabilities sm_80 sm_90 sm_100 sm_120.
(EngineCore_DP0 pid=2490429) If you want to use the NVIDIA Thor GPU with PyTorch, please check the instructions at Get Started
(EngineCore_DP0 pid=2490429)
(EngineCore_DP0 pid=2490429) warnings.warn(
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/core.py”, line 699, in run_engine_core
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/core.py”, line 498, in init
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] super().init(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/core.py”, line 83, in init
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/executor/executor_base.py”, line 54, in init
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] self._init_executor()
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py”, line 54, in _init_executor
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] self.collective_rpc(“init_device”)
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py”, line 83, in collective_rpc
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/utils/init.py”, line 3122, in run_method
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] return func(*args, **kwargs)
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/worker/worker_base.py”, line 259, in init_device
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] self.worker.init_device() # type: ignore
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py”, line 161, in init_device
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] current_platform.set_device(self.device)
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/platforms/cuda.py”, line 83, in set_device
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] _ = torch.zeros(1, device=device)
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] torch.AcceleratorError: CUDA error: no kernel image is available for execution on the device
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708]
(EngineCore_DP0 pid=2490429) Process EngineCore_DP0:
(EngineCore_DP0 pid=2490429) Traceback (most recent call last):
(EngineCore_DP0 pid=2490429) File “/data/conda/envs/llm311/lib/python3.11/multiprocessing/process.py”, line 314, in _bootstrap
(EngineCore_DP0 pid=2490429) self.run()
(EngineCore_DP0 pid=2490429) File “/data/conda/envs/llm311/lib/python3.11/multiprocessing/process.py”, line 108, in run
(EngineCore_DP0 pid=2490429) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=2490429) File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/core.py”, line 712, in run_engine_core
(EngineCore_DP0 pid=2490429) raise e
(EngineCore_DP0 pid=2490429) File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/core.py”, line 699, in run_engine_core
(EngineCore_DP0 pid=2490429) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=2490429) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2490429) File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/core.py”, line 498, in init
(EngineCore_DP0 pid=2490429) super().init(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=2490429) File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/core.py”, line 83, in init
(EngineCore_DP0 pid=2490429) self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=2490429) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2490429) File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/executor/executor_base.py”, line 54, in init
(EngineCore_DP0 pid=2490429) self._init_executor()
(EngineCore_DP0 pid=2490429) File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py”, line 54, in _init_executor
(EngineCore_DP0 pid=2490429) self.collective_rpc(“init_device”)
(EngineCore_DP0 pid=2490429) File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py”, line 83, in collective_rpc
(EngineCore_DP0 pid=2490429) return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=2490429) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2490429) File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/utils/init.py”, line 3122, in run_method
(EngineCore_DP0 pid=2490429) return func(*args, **kwargs)
(EngineCore_DP0 pid=2490429) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2490429) File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/worker/worker_base.py”, line 259, in init_device
(EngineCore_DP0 pid=2490429) self.worker.init_device() # type: ignore
(EngineCore_DP0 pid=2490429) ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2490429) File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py”, line 161, in init_device
(EngineCore_DP0 pid=2490429) current_platform.set_device(self.device)
(EngineCore_DP0 pid=2490429) File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/platforms/cuda.py”, line 83, in set_device
(EngineCore_DP0 pid=2490429) _ = torch.zeros(1, device=device)
(EngineCore_DP0 pid=2490429) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2490429) torch.AcceleratorError: CUDA error: no kernel image is available for execution on the device
(EngineCore_DP0 pid=2490429) CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore_DP0 pid=2490429) For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore_DP0 pid=2490429) Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(EngineCore_DP0 pid=2490429)
(APIServer pid=2490067) Traceback (most recent call last):
(APIServer pid=2490067) File “”, line 198, in _run_module_as_main
(APIServer pid=2490067) File “”, line 88, in _run_code
(APIServer pid=2490067) File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py”, line 1953, in
(APIServer pid=2490067) uvloop.run(run_server(args))
(APIServer pid=2490067) File “/data/conda/envs/llm311/lib/python3.11/site-packages/uvloop/init.py”, line 92, in run
(APIServer pid=2490067) return runner.run(wrapper())
(APIServer pid=2490067) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2490067) File “/data/conda/envs/llm311/lib/python3.11/asyncio/runners.py”, line 118, in run
(APIServer pid=2490067) return self._loop.run_until_complete(task)
(APIServer pid=2490067) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2490067) File “uvloop/loop.pyx”, line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=2490067) File “/data/conda/envs/llm311/lib/python3.11/site-packages/uvloop/init.py”, line 48, in wrapper
(APIServer pid=2490067) return await main
(APIServer pid=2490067) ^^^^^^^^^^
(APIServer pid=2490067) File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py”, line 1884, in run_server
(APIServer pid=2490067) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=2490067) File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py”, line 1902, in run_server_worker
(APIServer pid=2490067) async with build_async_engine_client(
(APIServer pid=2490067) File “/data/conda/envs/llm311/lib/python3.11/contextlib.py”, line 210, in aenter
(APIServer pid=2490067) return await anext(self.gen)
(APIServer pid=2490067) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2490067) File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py”, line 180, in build_async_engine_client
(APIServer pid=2490067) async with build_async_engine_client_from_engine_args(
(APIServer pid=2490067) File “/data/conda/envs/llm311/lib/python3.11/contextlib.py”, line 210, in aenter
(APIServer pid=2490067) return await anext(self.gen)
(APIServer pid=2490067) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2490067) File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py”, line 225, in build_async_engine_client_from_engine_args
(APIServer pid=2490067) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=2490067) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2490067) File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/utils/init.py”, line 1572, in inner
(APIServer pid=2490067) return fn(*args, **kwargs)
(APIServer pid=2490067) ^^^^^^^^^^^^^^^^^^^
(APIServer pid=2490067) File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py”, line 207, in from_vllm_config
(APIServer pid=2490067) return cls(
(APIServer pid=2490067) ^^^^
(APIServer pid=2490067) File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py”, line 134, in init
(APIServer pid=2490067) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=2490067) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2490067) File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/core_client.py”, line 102, in make_async_mp_client
(APIServer pid=2490067) return AsyncMPClient(*client_args)
(APIServer pid=2490067) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2490067) File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/core_client.py”, line 769, in init
(APIServer pid=2490067) super().init(
(APIServer pid=2490067) File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/core_client.py”, line 448, in init
(APIServer pid=2490067) with launch_core_engines(vllm_config, executor_class,
(APIServer pid=2490067) File “/data/conda/envs/llm311/lib/python3.11/contextlib.py”, line 144, in exit
(APIServer pid=2490067) next(self.gen)
(APIServer pid=2490067) File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/utils.py”, line 732, in launch_core_engines
(APIServer pid=2490067) wait_for_engine_startup(
(APIServer pid=2490067) File “/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/utils.py”, line 785, in wait_for_engine_startup
(APIServer pid=2490067) raise RuntimeError("Engine core initialization failed. "
(APIServer pid=2490067) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
(llm311) nvidia@localhost:~$
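I installed CUDA 12.9 in conda via cudatoolkit, then installed torch for CUDA 12.8 in that environment, along with Python 3.11 and vLLM 0.11.0.
Could someone please help me figure out how to fix this? Thanks in advance for any answers.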

whitesscott · November 15, 2025, 7:32am

Are you on Thor? I don’t know conda, but you may need to update the environment to Python 3.12, CUDA 13.0, and torch 2.9.

This post may help; it discusses a non-conda method of installing vLLM 0.11.0 on Thor.

Run VLLM in Thor from VLLM Repository

AastaLLL · November 17, 2025, 2:17am

Hi,

You will need to use CUDA 13.0+ to support Thor.
Please use the vLLM container from our NGC link below for compatibility.

We will upgrade the vLLM branch to 0.11.0 in the upcoming release.
Please wait for our new container for Qwen3 instead.

Thanks.
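
For anyone hitting the same "no kernel image is available" error: the warning in the log above says the installed PyTorch ships kernels for sm_80 sm_90 sm_100 sm_120, while Thor reports sm_110. Below is a rough sketch of how one might check that mismatch before launching vLLM. The helper names (`parse_sm`, `has_kernel_image`, `report`) are made up for illustration, and the matching rule is a simplification: it assumes a binary built for sm_XY runs on devices of the same major version with an equal or higher minor version, and it ignores PTX JIT fallback and arch-specific variants.

```python
import re


def parse_sm(arch):
    """Return (major, minor) for strings like 'sm_80' or 'sm_110', else None."""
    m = re.fullmatch(r"sm_(\d+)[a-z]?", arch)
    if m is None:
        return None
    # The last digit is the minor version, the rest is the major version.
    return divmod(int(m.group(1)), 10)


def has_kernel_image(capability, arch_list):
    """Simplified check: is there a prebuilt arch with the same major
    version and a minor version no higher than the device's?"""
    major, minor = capability
    for arch in arch_list:
        parsed = parse_sm(arch)
        if parsed is not None and parsed[0] == major and parsed[1] <= minor:
            return True
    return False


def report(device_index=0):
    """Print what the local PyTorch build supports versus what the GPU needs."""
    import torch  # imported lazily so the helpers above work without torch

    capability = torch.cuda.get_device_capability(device_index)
    arch_list = torch.cuda.get_arch_list()
    print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
    print("device capability:", capability)
    print("archs in this build:", arch_list)
    print("likely usable:", has_kernel_image(capability, arch_list))
```

Calling `report()` inside the conda environment should show whether the build lists an sm_11x arch for Thor. With the arch list from the log above, `has_kernel_image((11, 0), ["sm_80", "sm_90", "sm_100", "sm_120"])` returns `False`, which matches the failure at `torch.zeros(1, device=device)`.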

903592775 · November 17, 2025, 2:40am

OK, thank you. The problem is solved and inference works now.
