In the “Install and Use vLLM for Inference” guide for Spark, the nvcr.io/nvidia/vllm:25.09-py3 image packages vllm v0.10.1.
Subsequent upstream releases v0.10.2 and v0.11.0 added support for major new models such as Qwen3-Next and Qwen3-VL that I was hoping to use.
From reading online, I understand this is the very first release of an NVIDIA-specific vLLM container, so my first question is: what release cadence can we expect for integrating upstream releases?
The second question: is there any workaround to set up arbitrary vLLM versions on Spark? I tried naively upgrading vLLM inside the Docker image, but this immediately broke CUDA compatibility and it got lost looking for libcudart.so.12.
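For reference, what I tried was roughly the following (paraphrasing from memory; exact flags may have differed):

```bash
# Launch the container from the guide
docker run --rm -it --gpus all nvcr.io/nvidia/vllm:25.09-py3 bash

# Inside the container: naive in-place upgrade to a newer upstream vllm.
# The upstream wheels bring their own torch/CUDA dependency chain,
# which is what left it unable to find libcudart.so.12.
pip install --upgrade vllm
```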
I was similarly unsuccessful launching the upstream vLLM project's Docker image, hitting various CUDA library issues (tried both the CUDA 13 and CUDA 12 variants).
Appreciate any help or insight on future releases - thanks!
Nice, thank you! I ended up building a new vLLM from the main branch inside the provided container, and it works just fine, but it's nice to have the option to build it on the host system too. I guess I was missing some environment variables when building on the host.
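In case it's useful to others, my in-container build was roughly the standard vLLM build-from-source flow against the torch already shipped in the image (a sketch, not my exact commands):

```bash
# Inside the nvcr.io/nvidia/vllm:25.09-py3 container
git clone https://github.com/vllm-project/vllm.git
cd vllm

# Rewrite the requirements to reuse the container's CUDA-enabled torch
# rather than pulling a fresh wheel
python use_existing_torch.py

pip install -r requirements/build.txt
pip install -e . --no-build-isolation
```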
Thanks for this, these are nice clear instructions.
I found that, following the hints, I needed to include `--prerelease=allow` in the build command in step 3 above. Otherwise it complained about version issues with flashinfer-python and apache-tvm-ffi.
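Concretely, assuming step 3 is a uv-based install (I'm paraphrasing the command from the instructions above, so adapt to whatever it actually runs), the change looks something like:

```bash
# --prerelease=allow lets uv resolve the pre-release
# flashinfer-python / apache-tvm-ffi wheels it was complaining about
uv pip install --prerelease=allow -e . --no-build-isolation
```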
I'm curious about TORCH_CUDA_ARCH_LIST: I've previously only seen "12.1" as the option for the GB10, and I haven't come across any documentation listing "12.1a". Where can I find further documentation on this one?
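For what it's worth, the closest I've found to ground truth is asking torch itself which arch strings the local build carries and what the GPU reports (just a sanity check, not documentation):

```bash
python -c "import torch; print(torch.cuda.get_arch_list()); print(torch.cuda.get_device_capability())"
```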
I've been through several iterations of trying to get a venv with all the right dependencies to run vLLM, and I keep ultimately bumping up against:
```
ImportError: /opt/vllm/vllm/_C.abi3.so: undefined symbol: _Z20cutlass_moe_mm_sm100RN2at6TensorERKS0_S3_S3_S3_S3_S3_S3_S3_S3_bb
```
which is exactly what I hit when I reached the end of these steps as well. Any additional suggestions? Or is this potentially unrelated, and should I open a separate thread?
Apparently, for CUDA 13 you need to specify 12.0f in TORCH_CUDA_ARCH_LIST: source.
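So for a CUDA 13 build, set it before compiling, e.g.:

```bash
# 12.0f is the family-specific arch string to use for GB10 under CUDA 13
# (per the linked source), instead of the 12.1a discussed above
export TORCH_CUDA_ARCH_LIST=12.0f
```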
Also, to avoid the undefined symbol errors, you need to apply the patch from that unmerged pull request (although if you set 12.0f, I'm not sure whether you still need the patch - I'm trying to compile without it now. EDIT: nope, still needed):