Boltz-2 Container DGX Deployment Issue

#############################################
The Boltz-2 container is unable to run because the host DGX system, utilizing the NVIDIA GB10 (Blackwell) architecture, is not exposing the necessary NVML power management data (nvmlDeviceGetPowerManagementLimit) to the container runtime, even with --privileged access.

##############################################

  1. Host System and User Information
    Field                  Value
    GPU Architecture       NVIDIA GB10 (Blackwell)
    NVIDIA Driver Version  580.95.05
    CUDA Version           13.0
  2. Container Execution Command

The container failed with and without the --privileged flag.

Command attempted:

docker run --rm --name boltz2 --runtime=nvidia \
  --shm-size=16G \
  -e NGC_API_KEY \
  -v $LOCAL_NIM_CACHE:/opt/nim/.cache \
  -p 8000:8000 \
  --privileged \
  nvcr.io/nim/mit/boltz2:1.4.0

  3. Error Traceback (Summary)

The container consistently fails during initialization when attempting to gather GPU telemetry, specifically related to power management.
Error Component   Detail
Final Error       pynvml.NVMLError_NotSupported: Not Supported
Failing Function  pynvml.nvmlDeviceGetPowerManagementLimit(handle)
Trace Location    Within the container’s dependency chain: cuequivariance_ops/triton/cache_manager.py
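
For anyone who wants to reproduce the failing call outside the container, here is a minimal sketch. It assumes the pynvml module (from the nvidia-ml-py package) is installed on the host; the helper name read_power_limit_watts is made up for illustration and is not part of Boltz-2 or cuequivariance_ops. It calls the same NVML function and returns None instead of crashing when the driver answers "Not Supported".

```python
def read_power_limit_watts(nvml, gpu_index=0):
    """Return the GPU power limit in watts, or None if NVML reports Not Supported.

    `nvml` is the pynvml module (or any object exposing the same functions).
    Hypothetical helper for diagnosis only.
    """
    nvml.nvmlInit()
    try:
        handle = nvml.nvmlDeviceGetHandleByIndex(gpu_index)
        try:
            # NVML returns the limit in milliwatts.
            return nvml.nvmlDeviceGetPowerManagementLimit(handle) / 1000.0
        except nvml.NVMLError_NotSupported:
            # This is the error raised in the traceback summarized above.
            return None
    finally:
        nvml.nvmlShutdown()


# Usage on the host (needs the NVIDIA driver and nvidia-ml-py installed):
#   import pynvml
#   print(read_power_limit_watts(pynvml))
```

If this returns None on the host as well, the limitation is in the driver rather than in the container runtime.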

  4. Host-Level Diagnostics (nvidia-smi Output)

This output shows that the host reports power draw but no power limits (every limit field is N/A), which is the root cause of the container error.

… (Headers Omitted for Brevity)

GPU 0000000F:01:00.0
GPU Power Readings
Average Power Draw : 4.44 W
Instantaneous Power Draw : 6.49 W
Current Power Limit : N/A ← CRITICAL
Requested Power Limit : N/A ← CRITICAL
Default Power Limit : N/A ← CRITICAL
Min Power Limit : N/A
Max Power Limit : N/A
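
To check the same thing on another machine without reading the output by eye, a small sketch like the following could flag the fields that come back as N/A. The function name unavailable_power_fields and the SAMPLE constant are hypothetical; the sample text is copied from the readings above, and the parsing assumes the "Name : Value" layout shown there.

```python
SAMPLE = """\
GPU 0000000F:01:00.0
GPU Power Readings
Average Power Draw : 4.44 W
Instantaneous Power Draw : 6.49 W
Current Power Limit : N/A
Requested Power Limit : N/A
Default Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
"""


def unavailable_power_fields(smi_text):
    """Return the names of fields whose value is reported as N/A."""
    missing = []
    for line in smi_text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # section headers such as "GPU Power Readings"
        # Compare only the first token so trailing annotations are ignored.
        if value.split()[:1] == ["N/A"]:
            missing.append(name.strip())
    return missing


print(unavailable_power_fields(SAMPLE))
```

Any "Power Limit" entry in the result means nvmlDeviceGetPowerManagementLimit is likely to fail on that host in the same way.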

The Boltz-2 NIM is not yet officially supported on DGX Spark. As for this specific issue: because of the unified architecture, the NVIDIA driver cannot report the GPU's power limits, which is why you are seeing this particular error.