I get the vision for the Spark, I really do. Sure, I could build some multi-GPU system, install a distro, get drivers working, and start loading frameworks. The appeal of the Spark was skipping that nonsense and having a great OOBE that runs the latest AI platforms without fussing around.
The hardware largely delivers. Sure, there are nits, but GB10 seems up to the task.
For the life of me I can’t understand how they failed so badly at having the software sorted for launch. Sitting here a month after opening the box, software support is still a mess. The developer QuickStarts on build.nvidia.com are brilliant in concept, but they often don’t work despite covering very simple scenarios. Regardless of the stack you are using, everything seems out of sync and not compiled or packaged for the system. vLLM support? Broken. TRT-LLM support? Good luck. Running NVFP4 quant models from HF? Keep dreaming. In theory the Docker packages should help, but they don’t. How many times have you tried to run something that doesn’t recognize SM121 and fails?
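For reference, a quick way to see whether a given PyTorch build even knows about the chip (a rough check, and the exact arch strings may vary by build):

python3 -c "import torch; print(torch.cuda.get_device_capability(0))"   # expect (12, 1) on GB10
python3 -c "import torch; print(torch.cuda.get_arch_list())"            # sm_121 / sm_121a should show up here

If it’s not in that list, whatever you’re about to run is going to fail the same way.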
This is a company with a $4.5 trillion market cap. How do they fail on a product launch with such high expectations? One could argue “they only care about data center buildouts” and are not investing any developer cycles toward getting the Spark to deliver on its promises. To which I say: fine, but then why bother shipping the thing in the first place?
Given that consumer Blackwell support is still lagging a bit almost a year after the 5090 launch, I’m not that surprised.
But they could at least publish correct playbooks that don’t end up in a broken state or result in subpar performance, like their clustering ones do.
It took several weeks of community effort here to get vLLM working reasonably well in both standalone and cluster configurations. That’s something NVIDIA should have done on their own.
There was some help from NVIDIA folks in these forums; special thanks to @johnny_nv for actively participating in the threads and submitting PRs for vLLM and other projects.
But I wish they’d spend a fraction of their marketing budget to dedicate more engineers to this product, or hire outside help.
Yeah, thanks a lot to you, @eugr, for spending a ton of your time on the cluster setup and debugging. My second Spark only arrives tomorrow. Missed all the fun :(
But on the bright side, there’s an opportunity out there to really evolve the platform, and I heard there’s going to be some movement with regard to engaging differently with the GB10 community. By the way, did they announce whether the GB10 is going to be supported on DGX Cloud Lepton’s BYOC?
That would be great! Do you think it would be possible to port over changes from SGLang re: MXFP4 performance?
Once I finally got the cluster setup working at full speed, I’m getting ~75 t/s on gpt-oss-120b in the cluster and ~52 t/s on a single node with SGLang, vs. 55 t/s and 36 t/s on vLLM.
I haven’t spent much time dissecting it, but it looks like most of the important changes are actually in Triton itself.
They haven’t merged any of this into either SGLang or Triton, though; the Spark container is built from a fork that is now significantly behind the main branch.
You mean there’s hope that they’ll spend more than five minutes in the forum, community members will be taken seriously, and errors in the playbooks won’t remain there for weeks? Awesome!
You know, I have one simple request, and that is to have Nvidia’s Spark playbooks actually run on my two new Sparks. Can you remind me what I pay you people for? Honestly, throw me a bone here!
There are 19 playbooks, all on GitHub. One allegedly takes 2 hours to work through; the rest take either half an hour or an hour. As a community, let’s make a list of the ones that work. Shouldn’t be a long list.
I’ll just leave a link to my repo here: GitHub - eugr/spark-vllm-docker: Docker configuration for running VLLM on dual DGX Sparks
It’s pretty barebones and won’t automate everything the way Mark’s does, but it could be useful as a reference for those who have some experience with vLLM on other platforms and just want an example of a working setup, or for those who want to use the main branch.
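In spirit, the single-node case boils down to something like this (illustrative only; the image name is a placeholder and the real flags and compose files are in the repo):

docker run --gpus all --ipc=host -p 8000:8000 <your-spark-vllm-image> \
  vllm serve openai/gpt-oss-120b --host 0.0.0.0 --port 8000

The cluster configuration adds the usual multi-node plumbing on top, which is where most of the debugging time went.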
I think it’s very well said. A product like this sinks or swims on OSS community adoption. I’m currently trying to reverse-engineer how Nvidia built their containers and what version of pytorch-triton they use with sm_121a - it’s unclear whether it’s public or internally built, and this should have been completely open and documented. I know Nvidia has smart people who thought of this ahead of time and distributed lots of devices to open-source framework builders, but apparently not to the right ones, or not at the right scale.
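So far I’m just poking at the images directly, roughly like this (the image name is a placeholder for whichever NGC container you’re dissecting):

docker run --rm <nvidia-spark-image> pip list | grep -Ei 'torch|triton'
docker run --rm <nvidia-spark-image> python3 -c "import triton; print(triton.__version__)"

That at least tells you which wheels they baked in, even if it doesn’t tell you where they were built from.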
Just use the latest PyTorch wheels for cu130 - see the Dockerfile in the repository I linked above for the details. When it comes to vLLM, etc., they actually contribute upstream, so if you use the latest builds, you will have all of that.
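For the PyTorch part, the gist is just pulling from the cu130 wheel index (simplified sketch; check the Dockerfile for the exact base image and pinned versions):

# simplified - the actual Dockerfile pins specific versions and a base image
pip install torch --index-url https://download.pytorch.org/whl/cu130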
The SGLang Spark version is a slightly different story: the person at LMSys who did the GPT-OSS optimizations for Spark used their own fork and patched both SGLang and Triton, and those changes are not merged into main yet.
Yep, thank you, that was helpful. I also pulled an official image and looked inside. I run a different kind of model on it - VLA (robotics stuff) - which adds a few more complexities in getting it to work with the ecosystem.
Might be helpful to look at the NVIDIA NGC build:

docker pull nvcr.io/nvidia/pytorch:25.11-py3

It ships with CUDA 13.0 (V13.0.88).
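If you want to sanity-check what’s actually inside before building on it, something like this should do (a rough sketch, but both tools ship in that image):

docker run --rm nvcr.io/nvidia/pytorch:25.11-py3 nvcc --version
docker run --rm --gpus all nvcr.io/nvidia/pytorch:25.11-py3 python3 -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.get_arch_list())"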