[Bug]: Shims fail to detect GPU resources when running inside WSL2 #3211

@frankcholula

Steps to reproduce

Host setup:

  1. Machine A runs both the dstack server and a worker.
  2. Machine B runs as a second worker.
  3. Both workers are registered via SSH and appear in the fleet list.
  4. The dstack server is started inside WSL2 (Ubuntu 24.04) on Machine A.
  5. In the fleet dashboard, both workers show identical specs (CPU = 16 cores, memory ≈ 31 GB) and no GPU info.

Actual behaviour

The shim running on SSH fleet hosts fails to detect NVIDIA GPUs because nvidia-smi cannot be invoked as root inside WSL2. In WSL2, nvidia-smi works only for the non-root user (see NVIDIA WSL2 forum discussion), but dstack always runs the shim as root on SSH fleets. As a result, the host_info reported by both hosts has gpu_vendor set to 'none' and gpu_count set to 0 (see the server logs below).
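
To confirm this on an affected host, it may help to compare how nvidia-smi behaves for the SSH user and for root. The following is a minimal diagnostic sketch, not part of dstack: `probe` is a hypothetical helper name, and the script assumes the SSH user is allowed to run `sudo -n` (non-interactive sudo).

```python
import subprocess


def probe(cmd, timeout=30):
    """Run cmd and return (succeeded, first line of its output or an error note)."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except OSError as exc:
        # Covers a missing binary as well as permission errors.
        return False, f"{cmd[0]}: {exc.strerror}"
    except subprocess.TimeoutExpired:
        return False, f"{cmd[0]}: timed out after {timeout}s"
    output = (result.stdout or result.stderr).strip()
    first_line = output.splitlines()[0] if output else ""
    return result.returncode == 0, first_line


if __name__ == "__main__":
    checks = {
        "current user": ["nvidia-smi", "-L"],
        # Assumption: passwordless sudo is available; otherwise this check fails
        # for a reason unrelated to the GPU.
        "root (sudo -n)": ["sudo", "-n", "nvidia-smi", "-L"],
    }
    for label, cmd in checks.items():
        ok, detail = probe(cmd)
        print(f"{label}: {'ok' if ok else 'FAILED'} | {detail}")
```

If the first check succeeds and the second fails, that matches the behaviour described in this report.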

Expected behaviour

dstack should correctly detect and report GPU resources for each SSH worker even when the server and workers are running under WSL2. Ideally, the GPU detection logic should handle WSL2 environments where nvidia-smi is available only to the non-root user.
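
One way such WSL2 handling could look is sketched below. This is an illustration under stated assumptions, not the shim's actual detection logic: `is_wsl` and `find_nvidia_smi` are hypothetical names, and the sketch assumes (unverified for this report) that root fails to invoke nvidia-smi because its PATH does not include `/usr/lib/wsl/lib`, the directory where WSL2 exposes the binary from the Windows driver.

```python
import os
import shutil
from pathlib import Path

# Location where WSL2 exposes nvidia-smi from the Windows-side driver.
WSL_NVIDIA_SMI = "/usr/lib/wsl/lib/nvidia-smi"


def is_wsl(proc_version="/proc/version"):
    """Heuristic: WSL kernels mention 'microsoft' in /proc/version."""
    try:
        return "microsoft" in Path(proc_version).read_text().lower()
    except OSError:
        return False


def find_nvidia_smi(path_env=None, wsl_candidate=WSL_NVIDIA_SMI, wsl=None):
    """Return a usable nvidia-smi path, or None if none is found.

    Looks on PATH first, then falls back to the WSL2 location.
    """
    found = shutil.which("nvidia-smi", path=path_env)
    if found:
        return found
    if wsl is None:
        wsl = is_wsl()
    if wsl and os.access(wsl_candidate, os.X_OK):
        return wsl_candidate
    return None


if __name__ == "__main__":
    print(find_nvidia_smi() or "nvidia-smi not found")
```

If the fallback path exists but still cannot query the GPU as root, the cause lies elsewhere and this lookup alone would not resolve the issue.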

dstack version

0.19.33

Server logs

DEBUG    dstack._internal.server.utils.provisioning:204 
         Retry after error: cat: /root/.dstack/host_info.json: No such file or directory

DEBUG    dstack._internal.server.background.tasks.process_instances:461 
         The dstack-shim environment variables have been installed

DEBUG    dstack._internal.server.app:259 
         Processed request POST http://127.0.0.1:3000/api/project/main/fleets/get 
         in 0.004402s. Status: 200

DEBUG    dstack._internal.server.utils.provisioning:204 
         Retry after error: cat: /root/.dstack/host_info.json: No such file or directory

[00:00:24] DEBUG    dstack._internal.server.app:259 
           Processed request POST http://127.0.0.1:3000/api/project/main/fleets/get 
           in 0.005425s. Status: 200
[00:00:25] DEBUG    dstack._internal.server.background.tasks.process_instances:477 
           Received a host_info:
           {
               'gpu_vendor': 'none',
               'gpu_name': '',
               'gpu_memory': 0,
               'gpu_count': 0,
               'addresses': [
                   '10.255.255.254/32',
                   '172.30.13.156/20',
                   'fe80::215:5dff:fe4c:693b/64',
                   '100.123.202.13/32',
                   'fd7a:115c:a1e0::3501:ca23/128',
                   'fe80::150b:6255:8bdb:4870/64'
               ],
               'disk_size': 0,
               'cpus': 16,
               'memory': 33437167616
           }

INFO     dstack._internal.server.background.tasks.process_instances:314 
         The instance homelab-fleet-1 (100.123.202.13) was successfully added

DEBUG    dstack._internal.server.background.tasks.process_instances:477 
         Received a host_info:
         {
             'gpu_vendor': 'none',
             'gpu_name': '',
             'gpu_memory': 0,
             'gpu_count': 0,
             'addresses': [
                 '10.255.255.254/32',
                 '172.27.117.30/20',
                 'fe80::215:5dff:fec0:258f/64',
                 '100.96.234.126/32',
                 'fd7a:115c:a1e0::2a01:ea86/128',
                 'fe80::8595:4d45:c27c:26a6/64'
             ],
             'disk_size': 0,
             'cpus': 16,
             'memory': 33240039424
         }
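
The host_info payloads above can be checked quickly with a short script. This is a sketch for reading the log output only: `summarize` is a hypothetical helper, the sample omits the `addresses` list for brevity, and it assumes the `memory` field is in bytes, which is consistent with the roughly 31 GB shown in the dashboard.

```python
import ast

# First host_info payload from the log above, without the 'addresses' list.
HOST_INFO = """{
    'gpu_vendor': 'none',
    'gpu_name': '',
    'gpu_memory': 0,
    'gpu_count': 0,
    'disk_size': 0,
    'cpus': 16,
    'memory': 33437167616
}"""


def summarize(host_info_text):
    """Parse a logged host_info dict and report CPU, memory, and GPU status."""
    # The log prints a Python dict repr (single quotes), so json.loads would fail.
    info = ast.literal_eval(host_info_text)
    return {
        "cpus": info["cpus"],
        "memory_gib": round(info["memory"] / 2**30, 2),  # assumes bytes
        "gpu_detected": info["gpu_count"] > 0 and info["gpu_vendor"] != "none",
    }


print(summarize(HOST_INFO))
```

For the two payloads above this gives about 31.14 GiB and 30.96 GiB of memory respectively, with no GPU detected on either host.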

Additional information

Configuration file

type: fleet
# The name is optional; if not specified, it is generated randomly
name: homelab-fleet

# SSH credentials for the on-prem servers
ssh_config:
  user: frankcholula
  identity_file: ~/.ssh/id_ed25519_dstack
  hosts:
    - hostname: 127.0.0.1
      blocks: auto
    - hostname: 100.x.x.x
      blocks: auto
