What a missed opportunity for nvidia

josephbreda · December 4, 2025, 6:00pm

I get the vision for the Spark, I really do. Sure I could build some multi-gpu system, install a distro, get drivers working and start loading frameworks. The appeal of the Spark was skipping that nonsense and having a great OOBE that runs the latest a platforms without fussing around.

The hardware largely delivers. Sure, there are nits but GB10 seems up to the task.

For the life of me I can’t understand how they failed so badly at having the software sorted for launch. Sitting here a month after opening the box software support is still a mess. The developer QuickStarts on build.nvidia.com are brilliant in concept, but they often don’t work despite covering very simple scenarios. Regardless of the stack you are using, everything seems out of sync and not compiled or packaged for the system, vLLM support? Broken. TRT-LLM support – good luck. Running nvfp4 quant models from HF – keep dreaming. In theory docker packages should help, but they don’t. How many times have you tried to run something that doesn’t recognize SM121 and fails?

This is a company with a 4.5 trillion market cap. How do they fail on a product launch with such high expectations? One could argue “they only care about data center buildouts” and are not investing any developer cycles toward getting the Spark to deliver on its promises. To which I say fine, but then why bother shipping the thing in the first place?

Please fix it.

/end of rant

eugr · December 4, 2025, 7:02pm

Given that consumer Blackwell support is still lagging a bit, almost a year since 5090 launch, I’m not that surprised.

But they could at least come up with the correct playbooks, that don’t end up in a broken state or result in a subpar performance, like their clustering ones.

It took several weeks of community effort here to get vllm working reasonably well, in both standalone and cluster configuration. Something that NVIDIA should have done on their own.

There was some help from NVIDIA folks in these forums - special thanks to @johnny_nv actively participating in the threads and submitting PRs for vllm and others.

But I wish they spent a fraction of their marketing budget to dedicate more engineers to this product. Or hire outside help.

raphael.amorim · December 4, 2025, 7:17pm

Yeah, thanks a lot to you @eugr for spending a ton of your time on the cluster setup and debugging. My second spark is only coming tomorrow. Missed all the fun :(
But on the bright side, there’s an opportunity out there to really evolving the platform, and I heard there’s going to be some movement in regards to engaging differently with the GB10 community. By the way, did they announce if the GB10 is going to be supported on DGX Cloud Lepton’s BYOC?

eugr · December 4, 2025, 7:28pm

I feel like there is a way to squeeze a bit more performance out of NCCL, but I spent waay too much time on this, so I hope someone figures this out.

johnny_nv · December 4, 2025, 7:47pm

hello,
vllm 0.12.0 was released.

thanks @shahizat to ping me

eugr · December 4, 2025, 8:06pm

But if I build from the main branch, I already have all these, right?

johnny_nv · December 4, 2025, 8:08pm

yes,
we are adding full support very soon, during following weeks.
Thanks for your feedback!

eugr · December 4, 2025, 8:23pm

That would be great! Do you think it would be possible to port over changes from SGLang re: MXFP4 performance?

Once I finally got the cluster setup working full speed, I’m getting ~75 t/s on gpt-oss-120b in cluster and ~52 t/s on a single node in SGLang vs. 55 t/s and 36 t/s on VLLM.

I haven’t spent much time dissecting it, but looks like most of the important changes are actually in the Triton itself.

They haven’t merged any of this into either SGLang or Triton though, the spark container is built from one of the forks that is significantly behind the main branch now.

_cjg · December 4, 2025, 11:45pm

You mean there’s hope that they’ll spend more than five minutes in the forum, community members will be taken seriously, and errors in the playbooks won’t remain there for weeks? Awsome!

PrinceHal · December 5, 2025, 3:56am

You know, I have one simple request and that is to have Nvidia’s Spark playbooks for Spark run on my two new Sparks. Can you remind me what I pay you people for, honestly, throw me a bone here!

There are 19 playbooks, all on github. One allegedly takes 2 hours to work through, the rest either 1/2 or 1 hour. As a community, let’s make a list of the ones that work. Shouldn’t be a long list.

PrinceHal · December 5, 2025, 4:00am

For the moment, these playbooks (from Mark Ramsey, based on some great work by @eugr are fully functional if you follow the README instructions.

vllm

sglang

tensorRT-LLM

4 out of 5 doctors recommend them.

eugr · December 5, 2025, 5:21am

I’ll just leave a link to my repo here: GitHub - eugr/spark-vllm-docker: Docker configuration for running VLLM on dual DGX Sparks
It’s pretty barebones and won’t automate everything like Mark’s, but could be useful as a reference for those who have some experience with vllm on other platforms, and just want an example of working setup. Or for those who want to use the main branch.

PrinceHal · December 5, 2025, 5:40pm

I’ve used @eugr ‘s repo and confirm that it works.

antonibertel · December 8, 2025, 10:57pm

I think it’s very well said. It tanks because of OSS community adoption. I’m currently trying to reverse engineer how Nvidia built their containers and what version of pytorch-triton they use with sm_121a - it’s unclear whether it’s public or internally built, and this should have been completely open and documented. I know Nvidia has smart people who thought of this ahead of time and distributed lots of devices to open-source framework builders, but apparently not to the right ones, or not at right scale.

eugr · December 8, 2025, 11:03pm

Just use the latest pytorch wheels for cu130 - see the Dockerfile in my repository that I linked above for the details. When it comes to vllm, etc, they actually contribute to the mainstream, so if you use the latest, you will have all of that.

SGLang spark version is a slightly different story, because the person who’s done GPT-OSS optimizations for Spark at LMSys used their own fork and patched both SGLang and Triton, and those changes are not merged into main yet.

antonibertel · December 8, 2025, 11:11pm

Yep, thank you, that was helpful. I also pulled an official image and looked inside. I run another kind of model on it - VLA (robotics stuff) - so it just creates a few more complexities to make it work with the ecosystem.

Might be helpful to look at Nvidia build:
docker pull nvcr.io/nvidia/pytorch:25.11-py3
NVIDIA NGC Container
CUDA: 13.0, V13.0.88

torch: 2.10.0a0+b558c986e8.nv25.11
pytorch-triton: 3.5.0+gitde3506d2
triton_kernels: 1.0.0+nv25.11
transformer_engine: 2.9.0+70f53666
flash_attn: 2.7.4.post1+25.11
torchvision: 0.25.0a0+7a13ad0f
torch_tensorrt: 2.10.0a0
tensorrt: 10.14.1.48
torchao: 0.14.0+git
nvidia-modelopt: 0.37.0
nvidia-cudnn-frontend: 1.15.0
nvidia-dali-cuda130: 1.52.0
numpy: 2.1.0
safetensors: 0.6.2
tokenizers: 0.22.1

Topic		Replies	Views
Suggestion for https://build.nvidia.com/spark/ DGX Spark / GB10	1	257	October 20, 2025
New bleeding-edge vLLM Docker Image: avarok/vllm-nvfp4-gb10-sm120 DGX Spark / GB10 Projects	34	496	December 17, 2025
DGX Spark Multi-Node LLM Inference Report for Qwen3-235B model DGX Spark / GB10 nim , llama	32	333	December 22, 2025
I'd like to learn how to use the latest vLLM on DGX Spark DGX Spark / GB10 cuda	9	1086	November 29, 2025
vLLM >= 0.12 on DGX Spark? DGX Spark / GB10 cuda , containers , llm , dgx	1	105	December 16, 2025
Help: Running NVFP4 model on 2x DGX Spark with vLLM + Ray (multi-node) DGX Spark / GB10 mistral-large	17	593	December 20, 2025
vLLM container out of date for new models DGX Spark / GB10	10	1189	November 14, 2025
NIM LLM Containers Fail on DGX Spark (GB10): Triton/vLLM Crash on sm_121 and NGC Permission Errors DGX Spark / GB10 jetson , nim , llama , nemotron	2	152	December 3, 2025
Run VLLM in Spark DGX Spark / GB10	120	5382	December 23, 2025
Has anyone tried an alternative Linux distro? DGX Spark / GB10	60	1636	December 8, 2025

What a missed opportunity for nvidia

Related topics