Inconsistent Official Guides

There appears to be an inconsistency between two of the NVIDIA guides. The first guide instructed me to run Ollama in a container. The second instructed me to install Ollama (again) on the Spark OS itself (on the host). Could the authors of these two guides clarify which approach is correct? I certainly don't want my models downloaded and loaded twice depending on where I'm using them. (Rough command sketches of both setups follow the list below.)

  1. Onboarding / Open WebUI with Ollama

  • Install Open WebUI + Ollama in a web container and download gpt-oss 20B (all inside the container).

  2. Use Case / Vibe Coding in VS Code

  • Install Ollama + the gpt-oss 120B model outside of a container (on the host).
  • Use continue.dev to point to the new Ollama instance.
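For reference, here is roughly what the two guides appear to boil down to. This is only a sketch: the image tag, ports, and volume names follow the Open WebUI README and the Ollama install script as I recall them, and the model tags are placeholders, so check the current Playbooks before running anything.

```sh
# Guide 1: Open WebUI with bundled Ollama, everything inside one container
docker run -d --gpus=all -p 3000:8080 \
  -v ollama:/root/.ollama \
  -v open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:ollama
# pull the model inside that container (or use the Open WebUI admin UI)
docker exec -it open-webui ollama pull gpt-oss:20b

# Guide 2: Ollama installed directly on the host
curl -fsSL https://ollama.com/install.sh | sh
ollama pull gpt-oss:120b
# continue.dev is then pointed at the host instance, e.g. http://localhost:11434
```

Note the two separate model stores (the `ollama` Docker volume vs. the host's Ollama directory), which is exactly the double download the post above is worried about.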

Well, these are separate guides showcasing different ways to set everything up.
Better not to use Ollama at all and use llama.cpp instead - HUGE boost in performance.

Having said that, the guides are not great - there is often conflicting information in them, some are outdated, and almost all of them (other than the ComfyUI one) will result in sub-par performance.
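For anyone who wants to try the llama.cpp route, a minimal sketch of serving a GGUF model with llama-server; the model path, context size, and port are placeholders, so adjust them for your own files and build.

```sh
# Serve a local GGUF with llama.cpp's OpenAI-compatible HTTP server.
#   -m   : path to the GGUF file (placeholder)
#   -ngl : number of layers to offload to the GPU
#   -c   : context window size
llama-server -m /path/to/gpt-oss-20b.gguf -ngl 99 -c 8192 --host 0.0.0.0 --port 8080
```

continue.dev (or any other OpenAI-compatible client) can then be pointed at `http://localhost:8080/v1` instead of the Ollama endpoint.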

1 Like

It’s been two months since DGX Spark’s official launch. Some of the Playbooks released in September should be updated to take into account community improvements and the recent NVIDIA system updates. It’s been a longer cycle just learning how to ride the bike than actually riding the bike. These Playbooks were meant to help with our onboarding, but I’ve yet to find one other than launching ComfyUI that hasn’t hit some sort of bottleneck.

1 Like

It is not always evident when a Playbook has been updated to address some apparent bottleneck. I wrote about ways to handle this in a previous post. Hopefully, something along these lines will be done to improve the situation.

2 Likes

Do you know why this is? I thought Ollama was just a wrapper over llama.cpp?

Yes and no. It started as a llama.cpp wrapper, but they introduced their own engine at some point. In any case, even the llama.cpp parts lag far behind the llama.cpp main branch.

At this point, there are not many reasons to run Ollama. You can use llama.cpp and something like llama-swap to load models on demand.
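If on-demand model loading is the main thing keeping you on Ollama, here is a minimal sketch of a llama-swap setup, assuming a llama-server binary and local GGUF files. The field names follow the llama-swap README as I remember it, and the model names, paths, and ports are placeholders, so check the project's documentation for the exact schema.

```sh
# config.yaml for llama-swap: each entry maps a model name to the command that
# serves it; llama-swap starts/stops the backing llama-server on demand and
# routes OpenAI-compatible requests based on the request's "model" field.
cat > config.yaml <<'EOF'
models:
  "gpt-oss-20b":
    cmd: llama-server --port 9001 -m /models/gpt-oss-20b.gguf -ngl 99
    proxy: http://127.0.0.1:9001
  "qwen2.5-coder-32b":
    cmd: llama-server --port 9002 -m /models/qwen2.5-coder-32b.gguf -ngl 99
    proxy: http://127.0.0.1:9002
EOF

# run the proxy; clients send OpenAI-style requests to :8080 and llama-swap
# loads whichever model the request names
llama-swap --config config.yaml --listen :8080
```

Each request's `model` field selects which entry gets loaded, so you keep Ollama-style model switching while running plain llama.cpp underneath.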

2 Likes