Help needed: error when running vLLM

(llm311) nvidia@localhost:~$ python -m vllm.entrypoints.openai.api_server \
--model /data/lqy/qwen/Qwen3-VL-8B-Instruct \
--served-model-name Qwen3-VL-8B-Instruct \
--host 0.0.0.0 \
--port 9000 \
--gpu-memory-utilization 0.7 \
--skip-mm-profiling

INFO 11-14 11:34:33 [__init__.py:216] Automatically detected platform cuda.
(APIServer pid=2490067) INFO 11-14 11:34:34 [api_server.py:1839] vLLM API server version 0.11.0
(APIServer pid=2490067) INFO 11-14 11:34:34 [utils.py:233] non-default args: {'host': '0.0.0.0', 'port': 9000, 'model': '/data/lqy/qwen/Qwen3-VL-8B-Instruct', 'served_model_name': ['Qwen3-VL-8B-Instruct'], 'gpu_memory_utilization': 0.7, 'skip_mm_profiling': True}
(APIServer pid=2490067) INFO 11-14 11:34:34 [model.py:547] Resolved architecture: Qwen3VLForConditionalGeneration
(APIServer pid=2490067) torch_dtype is deprecated! Use dtype instead!
(APIServer pid=2490067) INFO 11-14 11:34:34 [model.py:1510] Using max model len 262144
(APIServer pid=2490067) INFO 11-14 11:34:35 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=2048.
INFO 11-14 11:34:39 [__init__.py:216] Automatically detected platform cuda.
(EngineCore_DP0 pid=2490429) INFO 11-14 11:34:41 [core.py:644] Waiting for init message from front-end.
(EngineCore_DP0 pid=2490429) INFO 11-14 11:34:41 [core.py:77] Initializing a V1 LLM engine (v0.11.0) with config: model='/data/lqy/qwen/Qwen3-VL-8B-Instruct', speculative_config=None, tokenizer='/data/lqy/qwen/Qwen3-VL-8B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=262144, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen3-VL-8B-Instruct, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention","vllm.sparse_attn_indexer"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,1],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null}
(EngineCore_DP0 pid=2490429) /data/conda/envs/llm311/lib/python3.11/site-packages/torch/cuda/__init__.py:326: UserWarning:
(EngineCore_DP0 pid=2490429) NVIDIA Thor with CUDA capability sm_110 is not compatible with the current PyTorch installation.
(EngineCore_DP0 pid=2490429) The current PyTorch install supports CUDA capabilities sm_80 sm_90 sm_100 sm_120.
(EngineCore_DP0 pid=2490429) If you want to use the NVIDIA Thor GPU with PyTorch, please check the instructions at Get Started
(EngineCore_DP0 pid=2490429)
(EngineCore_DP0 pid=2490429) warnings.warn(
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] self._init_executor()
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 54, in _init_executor
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] self.collective_rpc("init_device")
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] return func(*args, **kwargs)
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/worker/worker_base.py", line 259, in init_device
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] self.worker.init_device() # type: ignore
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 161, in init_device
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] current_platform.set_device(self.device)
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/platforms/cuda.py", line 83, in set_device
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] _ = torch.zeros(1, device=device)
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] torch.AcceleratorError: CUDA error: no kernel image is available for execution on the device
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(EngineCore_DP0 pid=2490429) ERROR 11-14 11:34:43 [core.py:708]
(EngineCore_DP0 pid=2490429) Process EngineCore_DP0:
(EngineCore_DP0 pid=2490429) Traceback (most recent call last):
(EngineCore_DP0 pid=2490429) File "/data/conda/envs/llm311/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=2490429) self.run()
(EngineCore_DP0 pid=2490429) File "/data/conda/envs/llm311/lib/python3.11/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=2490429) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=2490429) File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=2490429) raise e
(EngineCore_DP0 pid=2490429) File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=2490429) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=2490429) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2490429) File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=2490429) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=2490429) File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=2490429) self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=2490429) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2490429) File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=2490429) self._init_executor()
(EngineCore_DP0 pid=2490429) File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 54, in _init_executor
(EngineCore_DP0 pid=2490429) self.collective_rpc("init_device")
(EngineCore_DP0 pid=2490429) File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=2490429) return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=2490429) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2490429) File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=2490429) return func(*args, **kwargs)
(EngineCore_DP0 pid=2490429) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2490429) File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/worker/worker_base.py", line 259, in init_device
(EngineCore_DP0 pid=2490429) self.worker.init_device() # type: ignore
(EngineCore_DP0 pid=2490429) ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2490429) File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 161, in init_device
(EngineCore_DP0 pid=2490429) current_platform.set_device(self.device)
(EngineCore_DP0 pid=2490429) File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/platforms/cuda.py", line 83, in set_device
(EngineCore_DP0 pid=2490429) _ = torch.zeros(1, device=device)
(EngineCore_DP0 pid=2490429) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2490429) torch.AcceleratorError: CUDA error: no kernel image is available for execution on the device
(EngineCore_DP0 pid=2490429) CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore_DP0 pid=2490429) For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore_DP0 pid=2490429) Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(EngineCore_DP0 pid=2490429)
(APIServer pid=2490067) Traceback (most recent call last):
(APIServer pid=2490067) File "", line 198, in _run_module_as_main
(APIServer pid=2490067) File "", line 88, in _run_code
(APIServer pid=2490067) File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1953, in
(APIServer pid=2490067) uvloop.run(run_server(args))
(APIServer pid=2490067) File "/data/conda/envs/llm311/lib/python3.11/site-packages/uvloop/__init__.py", line 92, in run
(APIServer pid=2490067) return runner.run(wrapper())
(APIServer pid=2490067) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2490067) File "/data/conda/envs/llm311/lib/python3.11/asyncio/runners.py", line 118, in run
(APIServer pid=2490067) return self._loop.run_until_complete(task)
(APIServer pid=2490067) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2490067) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=2490067) File "/data/conda/envs/llm311/lib/python3.11/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=2490067) return await main
(APIServer pid=2490067) ^^^^^^^^^^
(APIServer pid=2490067) File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1884, in run_server
(APIServer pid=2490067) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=2490067) File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1902, in run_server_worker
(APIServer pid=2490067) async with build_async_engine_client(
(APIServer pid=2490067) File "/data/conda/envs/llm311/lib/python3.11/contextlib.py", line 210, in __aenter__
(APIServer pid=2490067) return await anext(self.gen)
(APIServer pid=2490067) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2490067) File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client
(APIServer pid=2490067) async with build_async_engine_client_from_engine_args(
(APIServer pid=2490067) File "/data/conda/envs/llm311/lib/python3.11/contextlib.py", line 210, in __aenter__
(APIServer pid=2490067) return await anext(self.gen)
(APIServer pid=2490067) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2490067) File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 225, in build_async_engine_client_from_engine_args
(APIServer pid=2490067) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=2490067) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2490067) File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/utils/__init__.py", line 1572, in inner
(APIServer pid=2490067) return fn(*args, **kwargs)
(APIServer pid=2490067) ^^^^^^^^^^^^^^^^^^^
(APIServer pid=2490067) File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config
(APIServer pid=2490067) return cls(
(APIServer pid=2490067) ^^^^
(APIServer pid=2490067) File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 134, in __init__
(APIServer pid=2490067) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=2490067) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2490067) File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=2490067) return AsyncMPClient(*client_args)
(APIServer pid=2490067) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2490067) File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 769, in __init__
(APIServer pid=2490067) super().__init__(
(APIServer pid=2490067) File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 448, in __init__
(APIServer pid=2490067) with launch_core_engines(vllm_config, executor_class,
(APIServer pid=2490067) File "/data/conda/envs/llm311/lib/python3.11/contextlib.py", line 144, in __exit__
(APIServer pid=2490067) next(self.gen)
(APIServer pid=2490067) File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/utils.py", line 732, in launch_core_engines
(APIServer pid=2490067) wait_for_engine_startup(
(APIServer pid=2490067) File "/data/conda/envs/llm311/lib/python3.11/site-packages/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
(APIServer pid=2490067) raise RuntimeError("Engine core initialization failed. "
(APIServer pid=2490067) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
(llm311) nvidia@localhost:~$
In conda I installed CUDA 12.9 via cudatoolkit, and in that environment installed a torch build for CUDA 12.8, Python 3.11, and vLLM 0.11.0.
Could someone please help me figure out how to fix this? Thanks in advance.

Are you on Thor? I don't use conda, but you may need to update the environment to Python 3.12, CUDA 13.0, and torch 2.9.
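
As a quick check (a minimal sketch, assuming the same conda env is active), you can compare the GPU's compute capability with the architectures your installed torch wheel was compiled for; on Thor (sm_110) the capability will not appear in the arch list, which matches the warning in the log:

# prints torch version, its CUDA version, the GPU's capability, and the compiled arch list
python -c "import torch; print(torch.__version__, torch.version.cuda); print('GPU capability:', torch.cuda.get_device_capability(0)); print('compiled archs:', torch.cuda.get_arch_list())"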

This post may help; it describes a non-conda way to install vLLM 0.11.0 on Thor.

Run VLLM in Thor from VLLM Repository

Hi,

You will need CUDA 13.0+ for Thor support.
Please use the vLLM container from our NGC link below for compatibility.
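
For illustration, a minimal sketch of pulling and running such a container (the image name and tag below are placeholders, not real values; take the exact image reference from the NGC page, and adjust the model mount to your own path):

# image/tag are placeholders; copy the real reference from the NGC page
docker pull nvcr.io/nvidia/<vllm-image>:<tag>
docker run --rm -it --runtime nvidia --network host \
  -v /data/lqy/qwen:/models \
  nvcr.io/nvidia/<vllm-image>:<tag> \
  python -m vllm.entrypoints.openai.api_server \
    --model /models/Qwen3-VL-8B-Instruct \
    --served-model-name Qwen3-VL-8B-Instruct \
    --host 0.0.0.0 --port 9000 \
    --gpu-memory-utilization 0.7 --skip-mm-profiling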

We will upgrade the vLLM branch to 0.11.0 in the upcoming release.
Please wait for the new container to run Qwen3 instead.

Thanks.


Great, thanks. It's resolved now and inference works.
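
For anyone landing here later, a minimal request against a server started with the flags above would look something like this (assuming it is reachable on localhost:9000):

curl http://localhost:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3-VL-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello, who are you?"}],
    "max_tokens": 64
  }'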