Effective PyTorch and CUDA

Has anyone come up with a better or more efficient way to get the DGX Spark to do GPU training using PyTorch? I had a lot of trouble getting a version of PyTorch or NVRTC to work when trying to use the GPUs for training specifically. Open to suggestions if someone has a better way to make the system function as a training solution. For context, I was trying to process images in a YOLO model. Thanks for any advice you can provide.

The best I was able to put together was:

| Component | Version | Path / Location |
|---|---|---|
| Ubuntu LTS | 22.04.4 LTS | / |
| NVIDIA Driver | 555.xx+ | /usr/lib/modules/&lt;kernel&gt;/kernel/drivers/video/nvidia.ko |
| CUDA Toolkit | 12.4.1 | /usr/local/cuda-12.4/ |
| cuDNN | 9.x (ships with CUDA 12.4) | /usr/local/cuda-12.4/lib64/libcudnn* |
| Python (venv) | 3.12.4 | ~/vllm-env/ |
| PyTorch | 2.5.1+cu124 | ~/vllm-env/lib/python3.12/site-packages/torch/ |
| TorchVision | 0.20.1+cu124 | same site-packages |
| YOLO / Ultralytics | 8.2.85 | ~/vllm-env/bin/yolo |
| vLLM / Transformers | 0.5.3 / 4.44.2 | site-packages |
| TensorRT (optional) | 10.3.0 | /usr/lib/x86_64-linux-gnu/ |

# Clean out any old drivers or CUDA bits
sudo apt purge 'nvidia-*' -y
sudo apt update && sudo apt install -y ubuntu-drivers-common
sudo ubuntu-drivers install
sudo reboot
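
After the reboot, it's worth confirming the driver actually loaded before installing CUDA (a quick sanity check):

# Verify the kernel module is loaded and the GPU is visible
nvidia-smi
cat /proc/driver/nvidia/version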

# CUDA 12.4 Toolkit
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-repo-ubuntu2204_12.4.1-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204_12.4.1-1_amd64.deb
sudo apt update && sudo apt install -y cuda-toolkit-12-4   # toolkit only; the driver was installed by ubuntu-drivers above

echo 'export PATH=/usr/local/cuda-12.4/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
nvcc --version

# Python 3.12 virtual environment
# (on stock Ubuntu 22.04 this may require the deadsnakes PPA, since 22.04 ships Python 3.10)
sudo apt install -y python3.12-venv
python3.12 -m venv ~/vllm-env
source ~/vllm-env/bin/activate
pip install --upgrade pip wheel setuptools

# GPU-enabled PyTorch stack
pip install torch==2.5.1+cu124 torchvision==0.20.1+cu124 torchaudio==2.5.1+cu124 \
  -f https://download.pytorch.org/whl/torch_stable.html

python - <<'EOF'
import torch
print("Torch:", torch.__version__)
print("CUDA:", torch.version.cuda)
print("GPU available:", torch.cuda.is_available())
print("GPU:", torch.cuda.get_device_name(0))
EOF

# YOLOv8 training
pip install ultralytics==8.2.85
yolo checks
yolo train model=yolov8m.pt data=wildfire.yaml imgsz=640 epochs=100 device=0
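
The same training run can also be launched from Python via the Ultralytics API, which is handy inside scripts (equivalent to the CLI line above):

# Python equivalent of the `yolo train` CLI invocation
from ultralytics import YOLO

model = YOLO("yolov8m.pt")                 # load pretrained checkpoint
model.train(data="wildfire.yaml", imgsz=640, epochs=100, device=0)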

# Optional: vLLM / TensorRT
pip install vllm==0.5.3 transformers==4.44.2 accelerate
pip install tensorrt==10.3.0


nvidia-smi
# Shows python/yolo using GPU 90–100%

python -c "import torch; print(torch.cuda.get_device_name(0))"
# → NVIDIA H100 / RTX 5090 etc.

Training logs display:

GPU_mem: 10.4G / 24G
Speed: 2.1ms preprocess, 3.0ms inference


Save as dgx_gpu_setup.sh, then run it as your normal user (the script calls sudo where needed; running the whole script under sudo would write the PATH exports into root's ~/.bashrc):

chmod +x dgx_gpu_setup.sh
./dgx_gpu_setup.sh

#!/bin/bash
# === DGX Spark Full GPU Training Provisioner ===
# Tested Ubuntu 22.04 LTS  |  NVIDIA driver 555+  |  CUDA 12.4
set -e

echo "[1/8] Updating system..."
# (python3.12-venv may require the deadsnakes PPA on stock 22.04)
sudo apt update && sudo apt install -y ubuntu-drivers-common curl wget python3.12-venv

echo "[2/8] Installing NVIDIA driver..."
sudo ubuntu-drivers install

echo "[3/8] Installing CUDA 12.4 toolkit..."
wget -q https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-repo-ubuntu2204_12.4.1-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204_12.4.1-1_amd64.deb
sudo apt update && sudo apt install -y cuda-toolkit-12-4   # toolkit only; the driver is handled by ubuntu-drivers above
echo 'export PATH=/usr/local/cuda-12.4/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
# ~/.bashrc is not re-read inside a non-interactive script, so export for this session too
export PATH=/usr/local/cuda-12.4/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH

echo "[4/8] Creating Python environment..."
python3.12 -m venv ~/vllm-env
source ~/vllm-env/bin/activate
pip install --upgrade pip wheel setuptools

echo "[5/8] Installing GPU-enabled PyTorch stack..."
pip install torch==2.5.1+cu124 torchvision==0.20.1+cu124 torchaudio==2.5.1+cu124 \
  -f https://download.pytorch.org/whl/torch_stable.html

echo "[6/8] Installing YOLOv8 + extras..."
pip install ultralytics==8.2.85 vllm==0.5.3 transformers==4.44.2 accelerate

echo "[7/8] Testing GPU access..."
python - <<'EOF'
import torch
print("Torch:", torch.__version__)
print("CUDA:", torch.version.cuda)
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
EOF

echo "[8/8] Complete! Reboot recommended."



markl02us,

consider using the PyTorch containers from NVIDIA NGC (GPU-optimized AI, Machine Learning, & HPC Software | NVIDIA NGC). It is the same PyTorch image that our CSP and enterprise customers use, regularly updated with security patches and support for new platforms, and tested/validated with its library dependencies. Which allows you to just build.

this should not affect your native host software stack, including the GPU driver.
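
For example, pulling the image and running a quick GPU check looks roughly like this (a sketch; the 25.09-py3 tag is the one discussed later in this thread):

docker pull nvcr.io/nvidia/pytorch:25.09-py3
docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:25.09-py3 \
  python -c "import torch; print(torch.__version__, torch.cuda.is_available())"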


Thanks for this response. In other words, we should be able to run a PyTorch container on our GPUs with CUDA 13.0 drivers even though PyTorch is not yet compatible with CUDA 13.0. Is that correct?

The assumption is that this question is about GPUs other than the GB10. If so, then it's a yes, as long as your system meets the prerequisites listed for the 25.09-py3 container.

For the DGX Spark you might find this playbook interesting: Fine tune with Pytorch

I was thinking that I would be able to develop “locally”, but judging by the playbook you sent, I’ll probably need to use 25.09-py3 as a base image, build my own, run my Python app, and develop in the container. Thanks

I believe I used the -igpu version as well, from the drop-down, which is supposed to be specifically built for the Spark. I don’t know if there is a difference between them, only that -igpu worked out for me. But I will be investigating the other solutions provided as well.

Try uv. Here is my pyproject.toml:

[project]
name = "blah"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "torch>=2.9.0",
    "torchvision>=0.24.0",
]

[tool.uv.sources]
torch = [
    { index = "pytorch-cu130" },
]
torchvision = [
    { index = "pytorch-cu130" },
]

[[tool.uv.index]]
name = "pytorch-cu130"
url = "https://download.pytorch.org/whl/cu130"
explicit = true
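
With that in place, resolving the environment and smoke-testing the GPU is just (a usage sketch):

uv sync
uv run python -c "import torch; print(torch.__version__, torch.cuda.is_available())"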

Yes, you can develop locally. Once you have the playbook set up, it runs locally.
All the playbooks on build.nvidia.com are designed to run locally once the assets are pulled/downloaded.

The 25.09-py3 from ngc.nvidia.com (nvcr.io) with CUDA 13.0.1 support does support the DGX Spark; it’s optimized for the best performance.

Release notes for the NGC container here (PyTorch 2.9.0a/CUDA 13.0.1 and more)


I found my way…
In case anyone is wondering, here is my Dockerfile. I just had to install the packages without the venv.

# Dockerfile
FROM nvcr.io/nvidia/pytorch:25.09-py3

WORKDIR /app

# Install app dependencies on top of the NGC PyTorch stack
RUN pip install fastapi uvicorn[standard] \
    pydantic python-multipart \
    pillow diffusers transformers \
    huggingface \
    peft \
    sentencepiece

COPY . .

ENV MODEL_DIR=/models
ENV PORT=80
EXPOSE 80

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80", "--workers", "1"]

What I am seeing right now is that the PyTorch NGC container may not yet fully support CUDA on the GB10 in the Spark. I think that is what you were saying earlier about PyTorch compatibility. Can someone confirm whether we’re just waiting for that container release to catch up?

If you see my Dockerfile above, that Docker image is what works. I also have some work that I showcased here. In there you’ll find a link to the repo as well.


Is there a non-container solution to address this issue? For instance, guidance to install pytorch 2.9.0a0+50eac811a6.nv25.09 + CUDA 13.0 using pip?

I use miniconda when I can’t use docker.

# The 1st line was from https://build.nvidia.com/spark/cuda-x-data-science.
#  I used this for all conda envs on the DGX Spark.
#  (Set ENV_NAME to your preferred environment name first.)
conda create -n ${ENV_NAME} -c rapidsai-nightly -c conda-forge -c nvidia rapids=25.10 python=3.12 'cuda-version=13.0' jupyter hdbscan umap-learn ipykernel -y

# The key here is 'cu130'. I forgot where I got this from, but it works.
#  I used it to train my own tiny LLM on the DGX Spark.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130
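
A quick check that the cu130 wheels actually see the GPU (the same smoke test as earlier in the thread):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"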

Many thanks! It mostly works, but the model I tested (a ViT) seems to run slower (>7 minutes) than it does in the container (<4 minutes). I also got the following warning message:

/miniconda3/envs/prithvi_env/lib/python3.12/site-packages/torch/cuda/__init__.py:283: UserWarning:
Found GPU0 NVIDIA GB10 which is of cuda capability 12.1.
Minimum and Maximum cuda capability supported by this version of PyTorch is
(8.0) - (12.0)
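
That warning suggests the wheel was not compiled for the GB10's sm_121, so its kernels likely fall back to JIT-compiled PTX, which could explain the slowdown. You can check which architectures a given build actually targets (a diagnostic sketch):

python -c "import torch; print(torch.cuda.get_arch_list())"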

Are you monitoring how much of your GPU is being used or has been allotted to the task?

It is ~96% using conda or container.

I’ve gotten a working PyTorch container from AI Workbench, though you need a couple of small tweaks:

1- Create a new project in AI Workbench. For the container, select the Python with CUDA 12.9 container.

2- Add this to requirements.txt:
```
--index-url=https://download.pytorch.org/whl/cu130
torch
torchvision
torchaudio

```

This has gotten me a working PyTorch container with GPU access.


This is what I’m seeing as well. I can’t use the containers because I don’t have root on the DGX Spark. I’m installing the packages from the whl/cu130 source, which claims to be CUDA 13.0, but it still seems that PyTorch is not built with support for anything beyond compute capability 12.0.


I wish they would just update the AI Workbench “base image” for PyTorch to the latest. I was just getting used to this workflow with the launcher app, but I need a newer PyTorch and would rather not switch to raw Docker right away.


You can install the majority of them as project deps and it’s going to fly; I only needed the Triton compiler working together with PyTorch on CUDA 13.

The inductor/eager/aot_eager backends work just fine.
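
For reference, picking one of those backends is just a keyword argument to torch.compile (a minimal sketch):

import torch

# A trivial model just to exercise the compiler
model = torch.nn.Linear(16, 16).cuda()

# backend can be "inductor" (the default), "eager", or "aot_eager"
compiled = torch.compile(model, backend="inductor")

x = torch.randn(4, 16, device="cuda")
print(compiled(x).shape)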

Compatible versions from the latest image:

torch: 2.10.0a0+b558c986e8.nv25.11
pytorch-triton: 3.5.0+gitde3506d2
triton_kernels: 1.0.0+nv25.11
transformer_engine: 2.9.0+70f53666
flash_attn: 2.7.4.post1+25.11
torchvision: 0.25.0a0+7a13ad0f
torch_tensorrt: 2.10.0a0
tensorrt: 10.14.1.48
torchao: 0.14.0+git
nvidia-modelopt: 0.37.0
nvidia-cudnn-frontend: 1.15.0
nvidia-dali-cuda130: 1.52.0
numpy: 2.1.0
safetensors: 0.6.2
tokenizers: 0.22.1
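
If you want to dump the same version report from inside an image, something like this works (a small sketch using importlib.metadata; the package names are the distributions listed above, and anything not installed is skipped):

import importlib.metadata as md

# Key distributions shipped in the NGC PyTorch image; adjust to taste
pkgs = ["torch", "pytorch-triton", "triton_kernels", "transformer_engine",
        "flash_attn", "torchvision", "torch_tensorrt", "tensorrt",
        "torchao", "nvidia-modelopt", "numpy", "safetensors", "tokenizers"]

for name in pkgs:
    try:
        print(f"{name}: {md.version(name)}")
    except md.PackageNotFoundError:
        pass  # not installed in this image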