Vibe Coding with NVIDIA DGX Spark

Has anyone watched the YouTube video Vibe Coding with NVIDIA DGX Spark? If so, what’s your take on it? I’m a bit confused by the video, particularly regarding it being sped up and what that says about the DGX Spark’s performance.

The video is private. Who published it? NVIDIA? Or some random youtuber?

Confused because it was slow?

Someone shared it on X. Yes it was slow.

Well, without more details on model or software used - could have been an unoptimized driver or inference solution. Bleeding edge…

Seems we’ll get more reliable reports after the 20th of October. That’s the date mentioned by some lucky guy who got the order instructions for his reservation today.

1 Like

Just a reminder for people: don’t click on links. Copy the link title and paste it into Google or YouTube, whichever fits the situation.
Publicly there is no video listed with that title at the time of writing.

Also, vibe coding with a DGX Spark would be pretty much the same as with any other inference setup; there’s nothing special about it. You can set it up with small models and weak hardware and give the LLM tools. It really doesn’t change how it functions, though it will be much faster with a Spark, and also with a Mac Studio.
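To make that concrete, here’s a rough sketch of what “giving the LLM tools” looks like against any OpenAI-compatible local server (Ollama, vLLM, llama.cpp server, whatever); the endpoint URL and model tag below are just placeholders:

```python
# Minimal sketch: "vibe coding" is just a chat model plus tool calls over an
# OpenAI-compatible API; the backend (Spark, Mac Studio, one consumer GPU)
# mostly changes the speed. URL and model tag below are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "write_file",
        "description": "Write source code to a file in the workspace",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "content": {"type": "string"},
            },
            "required": ["path", "content"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-oss:120b",  # any locally served model tag works here
    messages=[{"role": "user", "content": "Create a FastAPI hello-world app."}],
    tools=tools,
)

# A real editor/agent loop would execute the returned tool calls and feed the
# results back in; that loop is identical regardless of the hardware underneath.
msg = resp.choices[0].message
print(msg.tool_calls or msg.content)
```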

Also, using a Spark for just plain inference is like using an exotic sports car for delivering Uber Eats.

1 Like

Here’s the video: https://youtu.be/MLu7TdwJDkw It’s from the official NVIDIA Developer account. It’s horrible. They shut off comments because they were all about how sped up the video was, making it clear they have something to hide regarding performance. Lol @ “not for inference”. What is it for, gaming?

Oh, cool. At least we see gpt-oss-120b at work; that’s actually just what I wanted to test for inference. It’s too early for people to judge since no one has one yet, so I don’t really care if they turn off comments.

It’s mostly for prototyping and dev setups, but it can do a lot else. I bet it’s going to be used a lot for inference, since the alternative would be a Mac Studio, which would be more expensive with similar specs.

For serious stuff I would probably pick an RTX PRO 6000 Blackwell, but yeah, I’m not Mr. Moneybags so I can’t afford it.

It’s certainly not too early to judge, since they’re allegedly selling the thing. Performance is measured with benchmarks, such as tps. If Spark outperformed a Mac Studio they would be shouting it from the rooftop, not trying to mislead with sped up videos and claims that it is not for inference. It appears to be more in the ballpark of a Strix Halo with allegedly better software support at 2x the price. Personally, I decided not to risk it and just bought a 5090.

They used Ollama? Seriously? That’s what my kids use at home… 😂

Isn’t it meant for testing applications that are targeted to be rolled out to their big iron?

Well, let’s assume it’s just a simple demo for one of many use cases. 😅

Looking forward to some demos of more serious applications that make use of the full potential of the Blackwell architecture, e.g. Triton/TensorRT or vLLM using NVFP4 quants (and their benchmarks).

I came across a post on X that showcased an NVIDIA DGX Spark demo at OpenAI DevDay. Basically, the Unsloth AI team used the DGX to demonstrate reinforcement learning with their Unsloth framework and the GPT-OSS model.

3 Likes

Go beyond out-of-the-box models with gpt-oss, OpenAI’s newest open model series. Discover how gpt-oss lets you adapt, extend, and fine-tune to your needs while combining seamlessly with GPT-5 for flexible, high-impact builds.

Dominik presents one of the first of NVIDIA’s DGX Spark AI Computers on stage.

The video is now online. At the end he pulls the Spark out; it’s a pre-production sample.

3 Likes

Another demo with comments turned off. smh

I ordered my Asus Ascent yesterday, so I will be interested to see how it goes. However, I do have an NVIDIA AGX Thor with fairly similar specs (120 GB RAM, Blackwell GPU, but a somewhat lower-grade CPU). I pasted the exact prompt from the video into my VS Code running the same model (gpt-oss-120b) on the Thor, and the speed is not astounding but reasonable: based on the output of the vLLM server on the Thor, it looks like about 28 tokens per second. Mind you, there could be 8 simultaneous users all using this as their VS Code Continue backend and the speed would not be much less for each. I didn’t try it with Ollama on the server end, but I could.
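For anyone who wants to sanity-check numbers like that themselves, here’s a rough sketch of timing a completion against a local OpenAI-compatible endpoint (the URL and model name are just examples, and this includes prompt processing, so it slightly understates pure generation speed):

```python
# Rough sketch: time one completion against a local vLLM (or Ollama)
# OpenAI-compatible endpoint and estimate tokens/second.
# The URL and model name are examples; match whatever the server was started with.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

prompt = "Write a Python function that parses a CSV file and returns a dict."

start = time.time()
resp = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=512,
)
elapsed = time.time() - start

completion_tokens = resp.usage.completion_tokens
print(f"{completion_tokens} tokens in {elapsed:.1f}s "
      f"-> {completion_tokens / elapsed:.1f} tok/s")
```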

By the way, it took about as long for my LLM to start responding as it did in the video; the video speedup didn’t occur until a bit later as the code was streaming out.

Because the memory bandwidth on these devices is 273 GB/s, you are not going to get super-fast token generation. My M1 Mac Studio Ultra has 800 GB/s of bandwidth and it is notably faster on the LLM tests I have done, although vLLM doesn’t run on the Mac so I can’t test multi-user capabilities.
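A rough back-of-envelope, assuming gpt-oss-120b activates roughly 5B parameters per token at ~4-bit weights (those numbers are assumptions, so treat the result as a ceiling rather than a benchmark):

```python
# Back-of-envelope sketch: decode speed on a bandwidth-bound machine is roughly
# memory_bandwidth / bytes_read_per_token. The parameter count and bit width
# below are assumptions, so this gives an upper bound, not a measured figure.
def decode_ceiling_tps(bandwidth_gbps, active_params_billions, bits_per_weight):
    bytes_per_token = active_params_billions * 1e9 * bits_per_weight / 8
    return bandwidth_gbps * 1e9 / bytes_per_token

for name, bw in [("Spark / Thor (~273 GB/s)", 273), ("M1 Ultra (~800 GB/s)", 800)]:
    print(f"{name}: ~{decode_ceiling_tps(bw, 5.1, 4.25):.0f} tok/s upper bound")

# Real numbers land well below this ceiling (KV cache reads, activations,
# scheduling overhead), which is consistent with the ~28 tok/s observed above.
```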

As others have commented, if you really want to cook get the RTX Pro 6000 or use cloud GPUs. For local use on a budget you have to accept a compromise on the speed. High bandwidth memory (HBM) is too expensive for normal home users and it’s also really hard for manufacturers to buy because of supply constraints. It also uses more power. The Thor is pretty power-efficient.

2 Likes

Nothing that requires a Spark (with 128 GB RAM).

These models will run perfectly well on a regular consumer-grade GPU. I assume that in this demo they don’t even run in parallel; Ollama will just do the switching.
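Rough illustration of what I mean, using Ollama’s plain REST API (the model tags are just examples and would need to be pulled first):

```python
# Sketch: Ollama keeps a model resident and loads/unloads on demand, so
# back-to-back requests to different models work on a single consumer GPU;
# they just don't run at the same time. Model tags below are examples.
import requests

for model in ["gpt-oss:20b", "qwen2.5-coder:7b"]:
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": "Say hi in one word.", "stream": False},
        timeout=300,
    )
    print(model, "->", r.json()["response"].strip())
```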

1 Like

After the latest announcement (going on sale on the 15th), they published documentation:

There are also recipes for vLLM and for fine-tuning with Unsloth, Llama Factory, NeMo… that looks more promising. 🧐

1 Like

Thanks for sharing! I’m planning to mess around with vllm and sglang frameworks…