Inconsistent Official Guides

There appears to be an inconsistency between two of the NVIDIA guides. The first guide instructed me to run Ollama in a container. The second instructed me to install Ollama (again) on the Spark OS itself (on the host). Could the authors of these two guides clarify which approach is correct? I certainly don't want my models downloaded and loaded twice depending on where I'm using them. (Rough command sketches of both setups follow the list below.)

  1. Onboarding / Open WebUI with Ollama

  • Install Open WebUI + Ollama in a web container and download gpt-oss 20B (all inside the container).

  2. Use Case / Vibe Coding in VS Code

  • Install Ollama + the gpt-oss 120B model outside of a container (on the host).
  • Use continue.dev to point to the new Ollama instance.
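For reference, here is roughly what the two guides appear to boil down to. This is only a sketch: the image tag, ports, and volume names follow the Open WebUI README and the Ollama install script as I recall them, and the model tags are placeholders, so check the current Playbooks before running anything.

```sh
# Guide 1: Open WebUI with bundled Ollama, everything inside one container
docker run -d --gpus=all -p 3000:8080 \
  -v ollama:/root/.ollama \
  -v open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:ollama
# pull the model inside that container (or use the Open WebUI admin UI)
docker exec -it open-webui ollama pull gpt-oss:20b

# Guide 2: Ollama installed directly on the host
curl -fsSL https://ollama.com/install.sh | sh
ollama pull gpt-oss:120b
# continue.dev is then pointed at the host instance, e.g. http://localhost:11434
```

Note the two separate model stores (the `ollama` Docker volume vs. the host's Ollama directory), which is exactly the double download the post above is worried about.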

Well, these are separate guides showcasing different ways to set everything up.
Better not to use Ollama at all and use llama.cpp instead - HUGE boost in performance.

Having said that, the guides are not great - there is often conflicting information in them, some are outdated, and almost all of them (other than the ComfyUI one) will result in sub-par performance.
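For anyone who wants to try the llama.cpp route, a minimal sketch of serving a GGUF model with llama-server; the model path, context size, and port are placeholders, so adjust them for your own files and build.

```sh
# Serve a local GGUF with llama.cpp's OpenAI-compatible HTTP server.
#   -m   : path to the GGUF file (placeholder)
#   -ngl : number of layers to offload to the GPU
#   -c   : context window size
llama-server -m /path/to/gpt-oss-20b.gguf -ngl 99 -c 8192 --host 0.0.0.0 --port 8080
```

continue.dev (or any other OpenAI-compatible client) can then be pointed at `http://localhost:8080/v1` instead of the Ollama endpoint.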

1 Like

It’s been two months since DGX Spark’s official launch. Some of the Playbooks released in September should be updated to take into account community improvements and the recent NVIDIA system updates. It’s been a longer cycle just learning how to ride the bike than actually riding the bike. These Playbooks were meant to help with our onboarding, but I’ve yet to find one other than launching ComfyUI that hasn’t hit some sort of bottleneck.

1 Like

It is not always evident when a Playbook has been updated to address some apparent bottleneck. I wrote about ways to handle this in a previous post. Hopefully, something along these lines will be done to improve the situation.

2 Likes

Do you know why this is? I thought Ollama was just a wrapper over llama.cpp?

Yes and no. It started as a llama.cpp wrapper, but they introduced their own engine at some point. In any case, even the llama.cpp parts lag far behind the llama.cpp main branch.

At this point, there are not many reasons to run Ollama. You can use llama.cpp and something like llama-swap to load models on demand.
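If on-demand model loading is the main thing keeping you on Ollama, here is a minimal sketch of a llama-swap setup, assuming a llama-server binary and local GGUF files. The field names follow the llama-swap README as I remember it, and the model names, paths, and ports are placeholders, so check the project's documentation for the exact schema.

```sh
# config.yaml for llama-swap: each entry maps a model name to the command that
# serves it; llama-swap starts/stops the backing llama-server on demand and
# routes OpenAI-compatible requests based on the request's "model" field.
cat > config.yaml <<'EOF'
models:
  "gpt-oss-20b":
    cmd: llama-server --port 9001 -m /models/gpt-oss-20b.gguf -ngl 99
    proxy: http://127.0.0.1:9001
  "qwen2.5-coder-32b":
    cmd: llama-server --port 9002 -m /models/qwen2.5-coder-32b.gguf -ngl 99
    proxy: http://127.0.0.1:9002
EOF

# run the proxy; clients send OpenAI-style requests to :8080 and llama-swap
# loads whichever model the request names
llama-swap --config config.yaml --listen :8080
```

Each request's `model` field selects which entry gets loaded, so you keep Ollama-style model switching while running plain llama.cpp underneath.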

2 Likes