I’ve just received my Spark. I’m trying to use it as an appliance, and used xfreerdp3 to connect to it.
I’ve been playing with Ollama, using the Docker setup from the playbook, with the web UI.
I’ve noticed that when I pull a big model like gpt-oss:120b, all other connections seem to fail somehow (my ssh becomes unresponsive, and my xfreerdp session disconnects…) until the download has finished.
Once it has finished, everything returns to normal…
Am I the only one experiencing this (meaning it’s my setup, somehow), or is it something else?
(The Spark is connected to my wired network through an RJ45 port at 1 Gb/s. My other Linux boxes on the same switch don’t suffer from this… I’ve disabled IPv6 on the Spark in case it was interfering with something, since the firewall blocks IPv6…)
Something is wrong with your setup. No networking problems here, either on DGX OS or on Fedora 43: everything works as expected, with no slowdowns even when the network link is saturated.
@dumesnil I get exactly the same symptom when I run gpt-oss:120b. I can see that almost the entire 120GB of memory is being used up. And I get 5 Gb/s on the Ethernet (NOT WiFi) link. So I just gave up running that model. And I don’t even use Ollama; it’s too slow. I use llama.cpp directly. Same issue!
@eugr, you don’t get the same symptom? Did you do any optimization, like turning swap off, or persistence mode off… anything?
@abull I thought NVIDIA Sync is just a fancy UI using SSH in the backend. If SSH gets disconnected, how is NVIDIA Sync going to work?
@eugr hmmm, ok, I can accept that something is messed up on my side… like… both the cable and the WiFi are connected (WiFi is still on from first bootup), so maybe the routes get mixed up somehow even though they have different priorities… Also my setup doesn’t allow IPv6 on the local network (firewalls and such beyond my control)… I’ll investigate a bit, and try to sniff whatever is happening if nothing else works out…
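If it helps anyone checking the same dual-link theory, here’s a minimal diagnostic sketch. It assumes DGX OS uses NetworkManager (so `nmcli` is available), and the IP address is a placeholder for the Spark’s actual address:

```shell
# Show the default routes; the entry with the lowest metric wins,
# so the wired link should normally beat Wi-Fi when both are up
ip route show default

# Temporarily turn Wi-Fi off entirely to rule out route flapping
nmcli radio wifi off

# While the model pull is running, ping the Spark from another box;
# if latency explodes, the host itself is struggling, not just the link
ping -c 50 -i 0.2 192.168.1.42   # placeholder: use the Spark's address
```

If ssh stays responsive with Wi-Fi off, the two simultaneous links were the culprit; if it still stalls, the problem is on the host side.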
@Neurfer this is interesting. I’ve noticed that too, even though the model is only around 60GB… I’ve also noticed (correct me if I’m wrong) that no swap is set up, so everything has to be held in memory. (EDIT: there is no swap partition, but there is a 16GB swapfile… I’ve extended it to 80GB just in case; it’s slower than memory, but better than nothing.)
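In case anyone wants to do the same, growing the swapfile is the usual dance (the `/swapfile` path and the 80GB figure are from this thread, not an official recommendation; all of this needs root):

```shell
# Inspect the current swap setup (on DGX OS this showed a 16GB /swapfile)
swapon --show
free -h

# Take the swapfile offline, grow it to 80GB, and re-enable it
sudo swapoff /swapfile
sudo fallocate -l 80G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
```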
For the moment, I’m trying to use as many official playbooks as possible before tweaking the OS too much.
[btw, THIS is the reason I’m using xfreerdp in the first place: the whole NVIDIA AI stack only agrees to run on Windows, macOS, and Ubuntu (I’m on Debian, and the installer won’t let me run anything because it doesn’t recognize my Debian as a valid platform, so I turned to the Spark to handle the Ubuntu side of things, install the stack, etc… that’s how I noticed the sluggish behaviour).
I was planning on using the Spark as some sort of portable appliance that would fit in my backpack, but this idea seems to be just a dream… I was wondering if there was any way to just connect the USB-C of the Spark to the USB-C of my Linux laptop and turn the Spark into a local workhorse, but from what I read elsewhere, that seems to be a misconception on my part…
It’s a bit too soon for that, though… I need to test stuff beforehand…]
How do you run gpt-oss? Which version? What llama-server command? It should only take about 65GB, even with full context; if it takes 120GB on your system, something is not right.
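For comparison, a typical invocation looks something like this. The model path is a placeholder, and the flags are standard llama.cpp server options, not a Spark-specific recipe:

```shell
# gpt-oss-120b ships in MXFP4, so the GGUF weights are roughly 60-65 GB;
# resident memory should land in that ballpark, not 120GB.
# The model path below is a placeholder for wherever you downloaded the GGUF.
# -ngl 99 offloads all layers to the GPU; -c sets the context length
# (a bigger context means more KV-cache memory on top of the weights).
llama-server -m ~/models/gpt-oss-120b-mxfp4.gguf -ngl 99 -c 32768 \
  --host 127.0.0.1 --port 8080
```

If memory use is still near 120GB with a command like this, it would point at a quantization or configuration problem rather than the model itself.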
NVIDIA Sync does install on Debian; see the instructions in the playbook:
Step 1: Set Up Local Network Access | DGX Spark
Step 2, for the other things you can do with NVIDIA Sync.
BTW: there is no firewall configured on the DGX Spark by default.