
[Bug]: (gpt-oss-20b) openai_harmony.HarmonyError: error downloading or loading vocab file #22525

@andresC98

Your current environment

  • Official vllm-openai:gptoss Docker image
  • Running in a Kubernetes cluster with an NVIDIA A100 GPU
  • Air-gapped environment (no outbound internet access)

🐛 Describe the bug

At startup I observe the following failure:

(Normal startup logs)
...
Capturing CUDA graph shapes: 98%|█████████▊| 81/83 [01:40<00:02, 1.27s/it]
Capturing CUDA graph shapes: 99%|█████████▉| 82/83 [01:42<00:01, 1.33s/it]
Capturing CUDA graph shapes: 100%|██████████| 83/83 [01:45<00:00, 1.87s/it]
Capturing CUDA graph shapes: 100%|██████████| 83/83 [01:45<00:00, 1.27s/it]
(VllmWorker pid=418) INFO 08-08 08:01:25 [gpu_model_runner.py:2567] Graph capturing finished in 106 secs, took 0.72 GiB
(EngineCore_0 pid=284) INFO 08-08 08:01:25 [core.py:216] init engine (profile, create kv cache, warmup model) took 196.18 seconds
(EngineCore_0 pid=284) reasoning_end_token_ids [200006, 173781, 200005, 17196, 200008]
(EngineCore_0 pid=284) WARNING 08-08 08:01:26 [core.py:111] Using configured V1 scheduler class vllm.v1.core.sched.async_scheduler.AsyncScheduler. This scheduler interface is not public and compatibility may not be maintained.
(EngineCore_0 pid=284) INFO 08-08 08:01:26 [core.py:156] Batch queue is enabled with size 2
(APIServer pid=1) INFO 08-08 08:01:27 [loggers.py:142] Engine 000: vllm cache_config_info with initialization after num_gpu_blocks is: 154137
(APIServer pid=1) INFO 08-08 08:01:27 [api_server.py:1599] Supported_tasks: ['generate']
(APIServer pid=1) WARNING 08-08 08:01:27 [serving_responses.py:123] For gpt-oss, we ignore --enable-auto-tool-choice and always enable tool use.
(VllmWorker pid=418) INFO 08-08 08:01:57 [multiproc_executor.py:520] Parent process exited, terminating worker
(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1) File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=1) sys.exit(main())
(APIServer pid=1) ^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=1) args.dispatch_function(args)
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 50, in cmd
(APIServer pid=1) uvloop.run(run_server(args))
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
(APIServer pid=1) return __asyncio.run(
(APIServer pid=1) ^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=1) return runner.run(main)
(APIServer pid=1) ^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=1) return self._loop.run_until_complete(task)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=1) return await main
(APIServer pid=1) ^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1827, in run_server
(APIServer pid=1) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1855, in run_server_worker
(APIServer pid=1) await init_app_state(engine_client, vllm_config, app.state, args)
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1657, in init_app_state
(APIServer pid=1) state.openai_serving_responses = OpenAIServingResponses(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_responses.py", line 130, in __init__
(APIServer pid=1) get_stop_tokens_for_assistant_actions())
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/harmony_utils.py", line 187, in get_stop_tokens_for_assistant_actions
(APIServer pid=1) return get_encoding().stop_tokens_for_assistant_actions()
(APIServer pid=1) ^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/harmony_utils.py", line 37, in get_encoding
(APIServer pid=1) _harmony_encoding = load_harmony_encoding(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/openai_harmony/__init__.py", line 670, in load_harmony_encoding
(APIServer pid=1) inner: _PyHarmonyEncoding = _load_harmony_encoding(name)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) openai_harmony.HarmonyError: error downloading or loading vocab file: failed to download or load vocab file
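
To isolate the failure from vLLM, the same openai_harmony entry point can be called directly inside the container. A minimal sketch, assuming vLLM requests the HARMONY_GPT_OSS encoding (the argument to load_harmony_encoding is truncated in the traceback above):

    # Run inside the same container; with no outbound network this should raise
    # the same HarmonyError, confirming the download is attempted by openai_harmony.
    from openai_harmony import HarmonyEncodingName, load_harmony_encoding

    enc = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)
    print(enc.stop_tokens_for_assistant_actions())  # the call made by harmony_utils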

Shouldn't the vLLM image and the openai-harmony package already ship with the vocab file? Why does it need to download anything at runtime?
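
For reference, a possible workaround in air-gapped setups is to pre-seed the vocab cache during the image build, on a host that still has internet access. This sketch rests on assumptions not confirmed by this issue: that openai_harmony fetches the public o200k_base.tiktoken file used by tiktoken, and that it follows the tiktoken-rs convention of a cache directory set via TIKTOKEN_RS_CACHE_DIR with files keyed by the SHA-1 of the source URL; verify both against your openai_harmony version.

    # Hypothetical pre-seeding script for an image-build step with internet access.
    import hashlib
    import os
    import urllib.request

    # Public tiktoken vocab that the harmony encoding is built on (assumption).
    VOCAB_URL = "https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken"

    # Cache dir env var and SHA-1 file naming follow the tiktoken-rs
    # convention (assumption; check the openai_harmony docs/source).
    cache_dir = os.environ.get("TIKTOKEN_RS_CACHE_DIR", "/opt/harmony-cache")
    os.makedirs(cache_dir, exist_ok=True)
    cache_file = hashlib.sha1(VOCAB_URL.encode()).hexdigest()
    urllib.request.urlretrieve(VOCAB_URL, os.path.join(cache_dir, cache_file))

If this layout matches what openai_harmony expects, pointing TIKTOKEN_RS_CACHE_DIR at the pre-populated directory in the pod spec should avoid any download at startup.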

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
