Memory/initialization improvements?

So, I’ve gotten the hang of the DGX Spark, and the storage amount is incredible! But what’s the point of so much storage if you’re limited to NIM Docker containers with the NVIDIA API for memory efficiency?

When downloading your own models (not Ollama 🙄), it appears vLLM takes up most of the compute, and if you add additional workloads, like VLM models, they have a hard time running together in the background, which causes memory errors.
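For reference, the closest workaround I’ve found is capping vLLM’s up-front memory reservation (by default it grabs around 90% of GPU memory via `gpu_memory_utilization`). A rough sketch of what I mean, with the model id just a placeholder:

```python
from vllm import LLM, SamplingParams

# vLLM reserves ~90% of GPU memory by default for weights + KV cache.
# Capping it leaves headroom for a second model in another process.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder; any HF model id
    gpu_memory_utilization=0.35,               # reserve ~35% instead of the default 0.9
    max_model_len=8192,                        # shorter context -> smaller KV cache
)

out = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```

The same cap is available as `--gpu-memory-utilization` on `vllm serve` if you’re running it as a server, but even then two models together feel tight.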

Are there DGX Spark-compatible models that are quantized enough to be run together, or optimized enough that they load fast when spun up from idle?

Summary:

It would be nice to run more than one model at a time, since development is slow when I have to unload one model to load another.

If anyone is confused about my question, please feel free to ask for further clarity.

Memory usage is very model-specific, so I can’t give you an exact answer. We do have NVFP4-quantized models, and you can quantize your own model to decrease memory usage if you wish. You can look at our playbook on how to do so: NVFP4 Quantization | DGX Spark
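Not a substitute for the playbook, but roughly the flow it walks through, sketched here with TensorRT Model Optimizer (assuming a modelopt build that ships an NVFP4 config; the model id and calibration prompts are placeholders):

```python
import torch
import modelopt.torch.quantization as mtq
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Toy calibration set; use a few hundred representative samples in practice.
calib_prompts = ["The DGX Spark is", "Quantization reduces memory usage by"]

def forward_loop(m):
    # modelopt runs this to observe activation ranges during calibration.
    with torch.no_grad():
        for p in calib_prompts:
            inputs = tokenizer(p, return_tensors="pt").to(m.device)
            m(**inputs)

# Apply the NVFP4 recipe (config name per the modelopt docs).
model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)
```

NVFP4 weights come out at roughly a quarter the size of BF16, which is often the difference between fitting one resident model and two.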

You can also consider linking multiple Sparks together for more memory.