Your current environment
- vllm-openai:gptoss (official Dockerfile)
- Running in a Kubernetes cluster with an NVIDIA A100 GPU
- Air-gapped environment (no outbound internet access)
🐛 Describe the bug
At startup, I observe the following failure:
(Normal startup logs)
...
Capturing CUDA graph shapes: 98%|█████████▊| 81/83 [01:40<00:02, 1.27s/it]
Capturing CUDA graph shapes: 99%|█████████▉| 82/83 [01:42<00:01, 1.33s/it]
Capturing CUDA graph shapes: 100%|██████████| 83/83 [01:45<00:00, 1.87s/it]
Capturing CUDA graph shapes: 100%|██████████| 83/83 [01:45<00:00, 1.27s/it]
(VllmWorker pid=418) INFO 08-08 08:01:25 [gpu_model_runner.py:2567] Graph capturing finished in 106 secs, took 0.72 GiB
(EngineCore_0 pid=284) INFO 08-08 08:01:25 [core.py:216] init engine (profile, create kv cache, warmup model) took 196.18 seconds
(EngineCore_0 pid=284) reasoning_end_token_ids [200006, 173781, 200005, 17196, 200008]
(EngineCore_0 pid=284) WARNING 08-08 08:01:26 [core.py:111] Using configured V1 scheduler class vllm.v1.core.sched.async_scheduler.AsyncScheduler. This scheduler interface is not public and compatibility may not be maintained.
(EngineCore_0 pid=284) INFO 08-08 08:01:26 [core.py:156] Batch queue is enabled with size 2
(APIServer pid=1) INFO 08-08 08:01:27 [loggers.py:142] Engine 000: vllm cache_config_info with initialization after num_gpu_blocks is: 154137
(APIServer pid=1) INFO 08-08 08:01:27 [api_server.py:1599] Supported_tasks: ['generate']
(APIServer pid=1) WARNING 08-08 08:01:27 [serving_responses.py:123] For gpt-oss, we ignore --enable-auto-tool-choice and always enable tool use.
(VllmWorker pid=418) INFO 08-08 08:01:57 [multiproc_executor.py:520] Parent process exited, terminating worker
(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1) File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=1) sys.exit(main())
(APIServer pid=1) ^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=1) args.dispatch_function(args)
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 50, in cmd
(APIServer pid=1) uvloop.run(run_server(args))
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
(APIServer pid=1) return __asyncio.run(
(APIServer pid=1) ^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=1) return runner.run(main)
(APIServer pid=1) ^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=1) return self._loop.run_until_complete(task)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=1) return await main
(APIServer pid=1) ^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1827, in run_server
(APIServer pid=1) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1855, in run_server_worker
(APIServer pid=1) await init_app_state(engine_client, vllm_config, app.state, args)
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1657, in init_app_state
(APIServer pid=1) state.openai_serving_responses = OpenAIServingResponses(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_responses.py", line 130, in __init__
(APIServer pid=1) get_stop_tokens_for_assistant_actions())
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/harmony_utils.py", line 187, in get_stop_tokens_for_assistant_actions
(APIServer pid=1) return get_encoding().stop_tokens_for_assistant_actions()
(APIServer pid=1) ^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/harmony_utils.py", line 37, in get_encoding
(APIServer pid=1) _harmony_encoding = load_harmony_encoding(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/openai_harmony/__init__.py", line 670, in load_harmony_encoding
(APIServer pid=1) inner: _PyHarmonyEncoding = _load_harmony_encoding(name)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) openai_harmony.HarmonyError: error downloading or loading vocab file: failed to download or load vocab file
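The failure reproduces outside of vLLM. Below is a minimal sketch that mirrors what the traceback shows `vllm/entrypoints/harmony_utils.py` doing:

```python
from openai_harmony import HarmonyEncodingName, load_harmony_encoding

# Mirrors get_encoding() in vllm/entrypoints/harmony_utils.py: the first call
# tries to fetch the tiktoken vocab file, so without network access it raises
# HarmonyError instead of loading a bundled copy.
load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)
```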
Shouldn't the vLLM image and the openai-harmony package already ship with the vocab file? Why does anything need to be downloaded at runtime?
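In the meantime, here is a workaround sketch for air-gapped deployments. It assumes openai_harmony honors the `TIKTOKEN_RS_CACHE_DIR` environment variable for its vocab cache (that variable comes from its tiktoken-rs backend; verify it against your installed version). Run the snippet once on a machine with internet access, then copy the cache directory into the air-gapped image and set the same variable there:

```python
import os

# Assumed cache env var from tiktoken-rs; the path below is a hypothetical
# example. Set it BEFORE the first load_harmony_encoding() call so the
# downloaded vocab lands in a directory you control.
os.environ["TIKTOKEN_RS_CACHE_DIR"] = "/opt/harmony-cache"

from openai_harmony import HarmonyEncodingName, load_harmony_encoding

# On a connected machine this downloads the vocab file and writes it into
# /opt/harmony-cache. In the air-gapped container, with the same directory
# baked in and the same env var set, the call should load from disk instead
# of reaching the network.
load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)
```

In the Kubernetes deployment, this would mean adding the env var to the vLLM container spec and baking or mounting the pre-populated cache directory into the image.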