Memory/initialization improvements?

So, I’ve gotten the hang of the DGX Spark, and the storage amount is incredible! But what’s the point of so much storage if you’re limited to NIM Docker containers with the NVIDIA API for memory efficiency?

When downloading your own models (not Ollama 🙄), it appears vLLM takes up most of the compute, and if you add additional workloads, like VLM models, they have a hard time running together in the background, which causes memory errors.
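For reference, the closest workaround I’ve found is capping vLLM’s up-front memory reservation (by default it grabs around 90% of GPU memory via `gpu_memory_utilization`). A rough sketch of what I mean, with the model id just a placeholder:

```python
from vllm import LLM, SamplingParams

# vLLM reserves ~90% of GPU memory by default for weights + KV cache.
# Capping it leaves headroom for a second model in another process.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder; any HF model id
    gpu_memory_utilization=0.35,               # reserve ~35% instead of the default 0.9
    max_model_len=8192,                        # shorter context -> smaller KV cache
)

out = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```

The same cap is available as `--gpu-memory-utilization` on `vllm serve` if you’re running it as a server, but even then two models together feel tight.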

Are there DGX Spark-compatible models that are quantized enough to be run together, or optimized enough that they load fast when spun up from idle?

Summary:

It would be nice to run more than one model at a time, since development is slow when I have to unload one model to load another.

If anyone is confused about my question, please feel free to ask for further clarity.

Memory usage is very model-specific, so I can’t give you an exact answer. We do have NVFP4-quantized models, and you can quantize your own model to decrease memory usage if you wish. You can look at our playbook on how to do so: NVFP4 Quantization | DGX Spark
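Not a substitute for the playbook, but roughly the flow it walks through, sketched here with TensorRT Model Optimizer (assuming a modelopt build that ships an NVFP4 config; the model id and calibration prompts are placeholders):

```python
import torch
import modelopt.torch.quantization as mtq
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Toy calibration set; use a few hundred representative samples in practice.
calib_prompts = ["The DGX Spark is", "Quantization reduces memory usage by"]

def forward_loop(m):
    # modelopt runs this to observe activation ranges during calibration.
    with torch.no_grad():
        for p in calib_prompts:
            inputs = tokenizer(p, return_tensors="pt").to(m.device)
            m(**inputs)

# Apply the NVFP4 recipe (config name per the modelopt docs).
model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)
```

NVFP4 weights come out at roughly a quarter the size of BF16, which is often the difference between fitting one resident model and two.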

You can also consider linking multiple Sparks together for more memory.