Vibe Coding with NVIDIA DGX Spark

Has anyone watched the YouTube video Vibe Coding with NVIDIA DGX Spark? If so, what’s your take on it? I’m a bit confused by the video, particularly regarding it being sped up and what that says about the DGX Spark’s performance.

The video is private. Who published it? NVIDIA? Or some random youtuber?

Confused because it was slow?

Someone shared it on X. Yes it was slow.

Well, without more details on model or software used - could have been an unoptimized driver or inference solution. Bleeding edge…

Seems we’ll get more reliable reports after the 20th of October. That’s the date mentioned by some lucky guy who got the order instructions for his reservation today.

1 Like

Just a reminder for people: don’t click on links. Copy the link title and paste it into Google or YouTube, whichever fits the situation.
Publicly there is no video listed with that title at the time of writing.

Also, vibe coding with a DGX Spark would be pretty much the same as with any other inference setup; there’s nothing special about it. You can set it up with small models and weak hardware and give the LLM tools. It really doesn’t change how it functions, though it will be much faster with a Spark, and also with a Mac Studio.
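To make that concrete, here’s a rough sketch of what “giving the LLM tools” looks like against any OpenAI-compatible local server (Ollama, vLLM, llama.cpp server, whatever); the endpoint URL and model tag below are just placeholders:

```python
# Minimal sketch: "vibe coding" is just a chat model plus tool calls over an
# OpenAI-compatible API; the backend (Spark, Mac Studio, one consumer GPU)
# mostly changes the speed. URL and model tag below are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "write_file",
        "description": "Write source code to a file in the workspace",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "content": {"type": "string"},
            },
            "required": ["path", "content"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-oss:120b",  # any locally served model tag works here
    messages=[{"role": "user", "content": "Create a FastAPI hello-world app."}],
    tools=tools,
)

# A real editor/agent loop would execute the returned tool calls and feed the
# results back in; that loop is identical regardless of the hardware underneath.
msg = resp.choices[0].message
print(msg.tool_calls or msg.content)
```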

Also, using a Spark for just plain inference is like using an exotic sports car for delivering Uber Eats.

1 Like

Here’s the video: https://youtu.be/MLu7TdwJDkw It’s from the official NVIDIA Developer account. It’s horrible. They shut off comments because they were all about how sped up the video was, making it clear they have something to hide regarding performance. Lol @ “not for inference”. What is it for, gaming?

Oh, cool. At least we see gpt-oss-120b at work; that’s actually just what I wanted to test for inference. It’s too early for people to judge since no one has one yet, so I don’t really care if they turn off comments.

It’s mostly for prototyping and dev setups, but it can do a lot else. I bet it’s going to be used a lot for inference, since the alternative would be a Mac Studio, which would be more expensive with similar specs.

For serious stuff I would probably pick an RTX PRO 6000 Blackwell, but yeah, I’m not Mr. Moneybags so I can’t afford it.

It’s certainly not too early to judge, since they’re allegedly selling the thing. Performance is measured with benchmarks, such as tps. If Spark outperformed a Mac Studio they would be shouting it from the rooftop, not trying to mislead with sped up videos and claims that it is not for inference. It appears to be more in the ballpark of a Strix Halo with allegedly better software support at 2x the price. Personally, I decided not to risk it and just bought a 5090.

They used Ollama? Seriously? That’s what my kids use at home… 😂

Isn’t it meant for testing applications that are targeted to be rolled out to their big iron?

Well, let’s assume it’s just a simple demo for one of many use cases. 😅

Looking forward to some demos of more serious applications that make use of the full potential of the Blackwell architecture, e.g. Triton/TensorRT or vLLM using NVFP4 quants (and their benchmarks).

I came across a post on X that showcased an NVIDIA DGX Spark demo at OpenAI DevDay. Basically, the Unsloth AI team used the DGX to demonstrate reinforcement learning with their Unsloth framework and the GPT-OSS model.

3 Likes

Go beyond out-of-the-box models with gpt-oss, OpenAI’s newest open model series. Discover how gpt-oss lets you adapt, extend, and fine-tune to your needs while combining seamlessly with GPT-5 for flexible, high-impact builds.

Dominik presents one of the first of NVIDIA’s DGX Spark AI Computers on stage.

The video is now online. At the end he pulls the Spark out; it’s a pre-production sample.

3 Likes

Another demo with comments turned off. smh

I ordered my Asus Ascent yesterday, so I will be interested to see how it goes. However, I do have an NVIDIA AGX Thor with fairly similar specs (120 GB RAM, Blackwell GPU, but a somewhat lower-grade CPU). I pasted the exact prompt from the video into my VS Code running the same model (gpt-oss-120b) on the Thor, and the speed is not astounding but reasonable: based on the output of the vLLM server on the Thor, it looks like about 28 tokens per second. Mind you, there could be 8 simultaneous users all using this as their VS Code Continue backend and the speed would not be much less for each. I didn’t try it with Ollama on the server end, but I could.
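For anyone who wants to sanity-check numbers like that themselves, here’s a rough sketch of timing a completion against a local OpenAI-compatible endpoint (the URL and model name are just examples, and this includes prompt processing, so it slightly understates pure generation speed):

```python
# Rough sketch: time one completion against a local vLLM (or Ollama)
# OpenAI-compatible endpoint and estimate tokens/second.
# The URL and model name are examples; match whatever the server was started with.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

prompt = "Write a Python function that parses a CSV file and returns a dict."

start = time.time()
resp = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=512,
)
elapsed = time.time() - start

completion_tokens = resp.usage.completion_tokens
print(f"{completion_tokens} tokens in {elapsed:.1f}s "
      f"-> {completion_tokens / elapsed:.1f} tok/s")
```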

By the way, it took about as long for my LLM to start responding as it did in the video; the video speedup didn’t occur until a bit later as the code was streaming out.

Because the memory bandwidth on these devices is 273 GB/s, you are not going to get super-fast token generation. My M1 Mac Studio Ultra has 800 GB/s of bandwidth and it is notably faster on the LLM tests I have done, although vLLM doesn’t run on the Mac so I can’t test multi-user capabilities.
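A rough back-of-envelope, assuming gpt-oss-120b activates roughly 5B parameters per token at ~4-bit weights (those numbers are assumptions, so treat the result as a ceiling rather than a benchmark):

```python
# Back-of-envelope sketch: decode speed on a bandwidth-bound machine is roughly
# memory_bandwidth / bytes_read_per_token. The parameter count and bit width
# below are assumptions, so this gives an upper bound, not a measured figure.
def decode_ceiling_tps(bandwidth_gbps, active_params_billions, bits_per_weight):
    bytes_per_token = active_params_billions * 1e9 * bits_per_weight / 8
    return bandwidth_gbps * 1e9 / bytes_per_token

for name, bw in [("Spark / Thor (~273 GB/s)", 273), ("M1 Ultra (~800 GB/s)", 800)]:
    print(f"{name}: ~{decode_ceiling_tps(bw, 5.1, 4.25):.0f} tok/s upper bound")

# Real numbers land well below this ceiling (KV cache reads, activations,
# scheduling overhead), which is consistent with the ~28 tok/s observed above.
```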

As others have commented, if you really want to cook get the RTX Pro 6000 or use cloud GPUs. For local use on a budget you have to accept a compromise on the speed. High bandwidth memory (HBM) is too expensive for normal home users and it’s also really hard for manufacturers to buy because of supply constraints. It also uses more power. The Thor is pretty power-efficient.

2 Likes

Nothing that requires a Spark (with 128 GB RAM).

These models will run perfectly well on a regular consumer-grade GPU. I assume that in this demo they don’t even run in parallel; Ollama will just do the switching.
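Rough illustration of what I mean, using Ollama’s plain REST API (the model tags are just examples and would need to be pulled first):

```python
# Sketch: Ollama keeps a model resident and loads/unloads on demand, so
# back-to-back requests to different models work on a single consumer GPU;
# they just don't run at the same time. Model tags below are examples.
import requests

for model in ["gpt-oss:20b", "qwen2.5-coder:7b"]:
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": "Say hi in one word.", "stream": False},
        timeout=300,
    )
    print(model, "->", r.json()["response"].strip())
```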

1 Like

After the latest announcement (going on sale on the 15th), they published documentation:

There are also recipes for vLLM and for fine-tuning with Unsloth, Llama Factory, NeMo… that looks more promising. 🧐

1 Like

Thanks for sharing! I’m planning to mess around with vllm and sglang frameworks…