Effective PyTorch and CUDA

Has anyone come up with a better or more efficient way to get the DGX Spark to do GPU training using PyTorch? I had a lot of trouble getting a version of PyTorch or NVRTC to work when trying to use the GPUs for training specifically. Open to suggestions if someone has a better way to make the system function as a training solution. For context, I was trying to process images in a YOLO model. Thanks for any advice you can provide.

The best I was able to put together was:

| Component | Version | Path / Location |
|---|---|---|
| Ubuntu LTS | 22.04.4 LTS | / |
| NVIDIA Driver | 555.xx+ | /usr/lib/modules/&lt;kernel&gt;/kernel/drivers/video/nvidia.ko |
| CUDA Toolkit | 12.4.1 | /usr/local/cuda-12.4/ |
| cuDNN | 9.x (ships with CUDA 12.4) | /usr/local/cuda-12.4/lib64/libcudnn* |
| Python (venv) | 3.12.4 | ~/vllm-env/ |
| PyTorch | 2.5.1+cu124 | ~/vllm-env/lib/python3.12/site-packages/torch/ |
| TorchVision | 0.20.1+cu124 | same site-packages |
| YOLO / Ultralytics | 8.2.85 | ~/vllm-env/bin/yolo |
| vLLM / Transformers | 0.5.3 / 4.44.2 | site-packages |
| TensorRT (optional) | 10.3.0 | /usr/lib/x86_64-linux-gnu/ |

# Clean out any old drivers or CUDA bits
sudo apt purge 'nvidia-*' -y
sudo apt update && sudo apt install -y ubuntu-drivers-common
sudo ubuntu-drivers install
sudo reboot
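
After the reboot, it's worth confirming the driver actually loaded before installing CUDA (a quick sanity check):

# Verify the kernel module is loaded and the GPU is visible
nvidia-smi
cat /proc/driver/nvidia/version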

# CUDA 12.4 Toolkit
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-repo-ubuntu2204_12.4.1-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204_12.4.1-1_amd64.deb
sudo apt update && sudo apt install -y cuda-toolkit-12-4   # toolkit only; the driver was installed by ubuntu-drivers above

echo 'export PATH=/usr/local/cuda-12.4/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
nvcc --version

# Python 3.12 virtual environment
# (on stock Ubuntu 22.04 this may require the deadsnakes PPA, since 22.04 ships Python 3.10)
sudo apt install -y python3.12-venv
python3.12 -m venv ~/vllm-env
source ~/vllm-env/bin/activate
pip install --upgrade pip wheel setuptools

# GPU-enabled PyTorch stack
pip install torch==2.5.1+cu124 torchvision==0.20.1+cu124 torchaudio==2.5.1+cu124 \
  -f https://download.pytorch.org/whl/torch_stable.html

python - <<'EOF'
import torch
print("Torch:", torch.__version__)
print("CUDA:", torch.version.cuda)
print("GPU available:", torch.cuda.is_available())
print("GPU:", torch.cuda.get_device_name(0))
EOF

# YOLOv8 training
pip install ultralytics==8.2.85
yolo checks
yolo train model=yolov8m.pt data=wildfire.yaml imgsz=640 epochs=100 device=0
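
The same training run can also be launched from Python via the Ultralytics API, which is handy inside scripts (equivalent to the CLI line above):

# Python equivalent of the `yolo train` CLI invocation
from ultralytics import YOLO

model = YOLO("yolov8m.pt")                 # load pretrained checkpoint
model.train(data="wildfire.yaml", imgsz=640, epochs=100, device=0)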

# Optional: vLLM / TensorRT
pip install vllm==0.5.3 transformers==4.44.2 accelerate
pip install tensorrt==10.3.0


nvidia-smi
# Shows python/yolo using GPU 90–100%

python -c "import torch; print(torch.cuda.get_device_name(0))"
# → NVIDIA H100 / RTX 5090 etc.

Training logs display:

GPU_mem: 10.4G / 24G
Speed: 2.1ms preprocess, 3.0ms inference


Save as dgx_gpu_setup.sh, then run it as your normal user (the script calls sudo where needed; running the whole script under sudo would write the PATH exports into root's ~/.bashrc):

chmod +x dgx_gpu_setup.sh
./dgx_gpu_setup.sh

#!/bin/bash
# === DGX Spark Full GPU Training Provisioner ===
# Tested Ubuntu 22.04 LTS  |  NVIDIA driver 555+  |  CUDA 12.4
set -e

echo "[1/8] Updating system..."
# (python3.12-venv may require the deadsnakes PPA on stock 22.04)
sudo apt update && sudo apt install -y ubuntu-drivers-common curl wget python3.12-venv

echo "[2/8] Installing NVIDIA driver..."
sudo ubuntu-drivers install

echo "[3/8] Installing CUDA 12.4 toolkit..."
wget -q https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-repo-ubuntu2204_12.4.1-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204_12.4.1-1_amd64.deb
sudo apt update && sudo apt install -y cuda-toolkit-12-4   # toolkit only; the driver is handled by ubuntu-drivers above
echo 'export PATH=/usr/local/cuda-12.4/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
# ~/.bashrc is not re-read inside a non-interactive script, so export for this session too
export PATH=/usr/local/cuda-12.4/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH

echo "[4/8] Creating Python environment..."
python3.12 -m venv ~/vllm-env
source ~/vllm-env/bin/activate
pip install --upgrade pip wheel setuptools

echo "[5/8] Installing GPU-enabled PyTorch stack..."
pip install torch==2.5.1+cu124 torchvision==0.20.1+cu124 torchaudio==2.5.1+cu124 \
  -f https://download.pytorch.org/whl/torch_stable.html

echo "[6/8] Installing YOLOv8 + extras..."
pip install ultralytics==8.2.85 vllm==0.5.3 transformers==4.44.2 accelerate

echo "[7/8] Testing GPU access..."
python - <<'EOF'
import torch
print("Torch:", torch.__version__)
print("CUDA:", torch.version.cuda)
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
EOF

echo "[8/8] Complete! Reboot recommended."



markl02us,

consider using the PyTorch containers from NVIDIA NGC (GPU-optimized AI, Machine Learning, & HPC Software | NVIDIA NGC). It is the same PyTorch image that our CSP and enterprise customers use, regularly updated with security patches and support for new platforms, and tested/validated with its library dependencies. Which allows you to just build.

this should not affect your native host software stack, including the GPU driver.
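
For example, pulling the image and running a quick GPU check looks roughly like this (a sketch; the 25.09-py3 tag is the one discussed later in this thread):

docker pull nvcr.io/nvidia/pytorch:25.09-py3
docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:25.09-py3 \
  python -c "import torch; print(torch.__version__, torch.cuda.is_available())"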


Thanks for this response. In other words, we should be able to run a PyTorch container on our GPUs with CUDA 13.0 drivers even though PyTorch is not yet compatible with CUDA 13.0. Is that correct?

The assumption is that this question is about GPUs other than the GB10. If so, then it's a yes, as long as your system meets the prerequisites listed for the 25.09-py3 container.

For the DGX Spark you might find this playbook interesting: Fine tune with Pytorch

I was thinking that I would be able to develop “locally”, but judging by the playbook you sent, I’ll probably need to use 25.09-py3 as a base image, build my own, run my Python app, and develop in the container. Thanks

I believe I used the -igpu version as well, from the drop-down, which is supposed to be specifically built for the Spark. I don’t know if there is a difference between them, only that -igpu worked out for me. But I will be investigating the other solutions provided as well.

Try uv. Here is my pyproject.toml:

[project]
name = "blah"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "torch>=2.9.0",
    "torchvision>=0.24.0",
]

[tool.uv.sources]
torch = [
    { index = "pytorch-cu130" },
]
torchvision = [
    { index = "pytorch-cu130" },
]

[[tool.uv.index]]
name = "pytorch-cu130"
url = "https://download.pytorch.org/whl/cu130"
explicit = true
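
With that in place, resolving the environment and smoke-testing the GPU is just (a usage sketch):

uv sync
uv run python -c "import torch; print(torch.__version__, torch.cuda.is_available())"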

Yes, you can develop locally. Once you have the playbook set up, it runs locally.
All the playbooks on build.nvidia.com are designed to run locally once the assets are pulled/downloaded.

The 25.09-py3 from ngc.nvidia.com (nvcr.io) with CUDA 13.0.1 support does support the DGX Spark; it’s optimized for the best performance.

Release notes for the NGC container here (PyTorch 2.9.0a/CUDA 13.0.1 and more)


I found my way…
In case anyone is wondering, here is my Dockerfile. I just had to install the packages without the venv.

# Dockerfile
FROM nvcr.io/nvidia/pytorch:25.09-py3

WORKDIR /app

# Install app dependencies on top of the NGC PyTorch stack
RUN pip install fastapi uvicorn[standard] \
    pydantic python-multipart \
    pillow diffusers transformers \
    huggingface \
    peft \
    sentencepiece

COPY . .

ENV MODEL_DIR=/models
ENV PORT=80
EXPOSE 80

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80", "--workers", "1"]

What I am seeing right now is that the PyTorch NGC container may not yet fully support CUDA on the GB10 in the Spark. I think that is what you were saying earlier about PyTorch compatibility. Can someone confirm whether we’re just waiting for that container release to catch up?

If you see my Dockerfile above, that Docker image is what works. I also have some work that I showcased here. In there you’ll find a link to the repo as well.


Is there a non-container solution to address this issue? For instance, guidance to install pytorch 2.9.0a0+50eac811a6.nv25.09 + CUDA 13.0 using pip?

I use miniconda when I can’t use docker.

# The 1st line was from https://build.nvidia.com/spark/cuda-x-data-science.
#  I used this for all conda envs on the DGX Spark.
#  (Set ENV_NAME to your preferred environment name first.)
conda create -n ${ENV_NAME} -c rapidsai-nightly -c conda-forge -c nvidia rapids=25.10 python=3.12 'cuda-version=13.0' jupyter hdbscan umap-learn ipykernel -y

# The key here is 'cu130'. I forgot where I got this from, but it works.
#  I used it to train my own tiny LLM on the DGX Spark.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130
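
A quick check that the cu130 wheels actually see the GPU (the same smoke test as earlier in the thread):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"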

Many thanks! It mostly works, but the model I tested (a ViT) seems to run slower (>7 minutes) than it does in the container (<4 minutes). I also got the following warning message:

/miniconda3/envs/prithvi_env/lib/python3.12/site-packages/torch/cuda/__init__.py:283: UserWarning:
Found GPU0 NVIDIA GB10 which is of cuda capability 12.1.
Minimum and Maximum cuda capability supported by this version of PyTorch is
(8.0) - (12.0)
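
That warning suggests the wheel was not compiled for the GB10's sm_121, so its kernels likely fall back to JIT-compiled PTX, which could explain the slowdown. You can check which architectures a given build actually targets (a diagnostic sketch):

python -c "import torch; print(torch.cuda.get_arch_list())"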

Are you monitoring how much of your GPU is being used or has been allotted to the task?

It is ~96% using conda or container.

I’ve gotten a working PyTorch container from AI Workbench, though you need a couple of small tweaks:

1- Create a new project in AI Workbench. For the container, select the Python with CUDA 12.9 container.

2- Add this to requirements.txt:
```
--index-url=https://download.pytorch.org/whl/cu130
torch
torchvision
torchaudio

```

This has gotten me a working PyTorch container with GPU access.


This is what I’m seeing as well. I can’t use the containers because I don’t have root on the DGX Spark. I’m installing the packages from the whl/cu130 source, which claims to be CUDA 13.0, but it still seems that PyTorch is not built with support for anything beyond compute capability 12.0.


I wish they would just update the AI Workbench “base image” for PyTorch to the latest. I was just getting used to this workflow with the launcher app, but I need a newer PyTorch and would rather not switch to raw Docker right away.


You can install the majority of them as project deps and it’s going to fly; I only needed the Triton compiler working together with PyTorch on CUDA 13.

The inductor/eager/aot_eager backends work just fine.
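
For reference, picking one of those backends is just a keyword argument to torch.compile (a minimal sketch):

import torch

# A trivial model just to exercise the compiler
model = torch.nn.Linear(16, 16).cuda()

# backend can be "inductor" (the default), "eager", or "aot_eager"
compiled = torch.compile(model, backend="inductor")

x = torch.randn(4, 16, device="cuda")
print(compiled(x).shape)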

Compatible versions from the latest image:

torch: 2.10.0a0+b558c986e8.nv25.11
pytorch-triton: 3.5.0+gitde3506d2
triton_kernels: 1.0.0+nv25.11
transformer_engine: 2.9.0+70f53666
flash_attn: 2.7.4.post1+25.11
torchvision: 0.25.0a0+7a13ad0f
torch_tensorrt: 2.10.0a0
tensorrt: 10.14.1.48
torchao: 0.14.0+git
nvidia-modelopt: 0.37.0
nvidia-cudnn-frontend: 1.15.0
nvidia-dali-cuda130: 1.52.0
numpy: 2.1.0
safetensors: 0.6.2
tokenizers: 0.22.1
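
If you want to dump the same version report from inside an image, something like this works (a small sketch using importlib.metadata; the package names are the distributions listed above, and anything not installed is skipped):

import importlib.metadata as md

# Key distributions shipped in the NGC PyTorch image; adjust to taste
pkgs = ["torch", "pytorch-triton", "triton_kernels", "transformer_engine",
        "flash_attn", "torchvision", "torch_tensorrt", "tensorrt",
        "torchao", "nvidia-modelopt", "numpy", "safetensors", "tokenizers"]

for name in pkgs:
    try:
        print(f"{name}: {md.version(name)}")
    except md.PackageNotFoundError:
        pass  # not installed in this image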