Closed as not planned
Labels
bug (Something isn't working)
Description
Steps to reproduce
Host setup:
- Machine A runs as both the dstack server and a worker.
- Machine B runs as another worker.
- Both workers are registered via SSH and appear in the fleet list.
- The dstack server is started inside WSL2 (Ubuntu 24.04) on Machine A.
- Observe that both workers show identical specs (CPU = 16 cores, mem ≈ 31 GB) and no GPU info in the fleet dashboard.
Actual behaviour
The shim running on SSH fleet hosts fails to detect NVIDIA GPUs because nvidia-smi cannot be invoked as root inside WSL2. In WSL2, nvidia-smi works only for the non-root user (see NVIDIA WSL2 forum discussion), but dstack always runs the shim as root on SSH fleets:
User=root
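To confirm the root vs. non-root difference on a given host, a small probe along the following lines could be run once as the regular user and once with sudo. This is only a diagnostic sketch, not dstack's detection code: the function names `parse_gpu_csv` and `probe_gpus` are made up for illustration, and it assumes `nvidia-smi` supports the standard `--query-gpu` and `--format=csv` options.

```python
import os
import shutil
import subprocess

# Standard nvidia-smi query options; memory.total is reported in MiB with nounits.
QUERY = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"]


def parse_gpu_csv(text):
    """Parse 'name, memory_mib' lines into a list of dicts (hypothetical helper)."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so GPU names containing commas stay intact.
        name, memory_mib = [field.strip() for field in line.rsplit(",", 1)]
        gpus.append({"name": name, "memory_mib": int(memory_mib)})
    return gpus


def probe_gpus():
    """Return (euid, gpus, error) for the user running this script."""
    euid = os.geteuid()
    if shutil.which("nvidia-smi") is None:
        return euid, [], "nvidia-smi not found on PATH"
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return euid, [], str(exc)
    if result.returncode != 0:
        return euid, [], result.stderr.strip() or f"exit code {result.returncode}"
    return euid, parse_gpu_csv(result.stdout), None


if __name__ == "__main__":
    euid, gpus, error = probe_gpus()
    print(f"euid={euid} gpus={gpus} error={error}")
```

Saved under a hypothetical name such as `probe.py`, running `python3 probe.py` and then `sudo python3 probe.py` should show whether the GPU list is empty only when `euid=0`, and the error text may hint at whether the binary is missing from root's PATH or failing outright.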
Expected behaviour
dstack should correctly detect and report GPU resources for each SSH worker even when the server and workers are running under WSL2. Ideally, the GPU detection logic should handle WSL2 environments where nvidia-smi is available only to the non-root user.
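One way the requested handling could look is sketched below. It is not how dstack's shim is implemented; `is_wsl`, `gpu_probe_command`, and the `fallback_user` parameter are hypothetical names. The sketch assumes that WSL kernels can be recognised by the string "microsoft" in the kernel release, and that re-running `nvidia-smi` through `sudo -u <user>` is acceptable on the host.

```python
import os


def is_wsl(kernel_release):
    """Heuristic: WSL kernels report 'microsoft' in their release string."""
    return "microsoft" in kernel_release.lower()


def gpu_probe_command(kernel_release, euid, fallback_user):
    """Pick the command a detector could run to query GPUs.

    When running as root (euid 0) on WSL and a non-root user is known,
    re-run nvidia-smi as that user; otherwise call nvidia-smi directly.
    """
    base = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"]
    if euid == 0 and fallback_user and is_wsl(kernel_release):
        # `sudo -u USER -- CMD` runs CMD as USER.
        return ["sudo", "-u", fallback_user, "--"] + base
    return base


if __name__ == "__main__":
    # SUDO_USER is set by sudo to the invoking user's name, if sudo was used.
    command = gpu_probe_command(os.uname().release, os.geteuid(), os.environ.get("SUDO_USER"))
    print(command)
```

For this report, the fallback user would be the SSH user from the fleet configuration. Whether the unprivileged user's environment (for example its PATH) carries over under `sudo -u` would still need to be checked on a real WSL2 host.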
dstack version
0.19.33
Server logs
DEBUG dstack._internal.server.utils.provisioning:204
Retry after error: cat: /root/.dstack/host_info.json: No such file or directory
DEBUG dstack._internal.server.background.tasks.process_instances:461
The dstack-shim environment variables have been installed
DEBUG dstack._internal.server.app:259
Processed request POST http://127.0.0.1:3000/api/project/main/fleets/get
in 0.004402s. Status: 200
DEBUG dstack._internal.server.utils.provisioning:204
Retry after error: cat: /root/.dstack/host_info.json: No such file or directory
[00:00:24] DEBUG dstack._internal.server.app:259
Processed request POST http://127.0.0.1:3000/api/project/main/fleets/get
in 0.005425s. Status: 200
[00:00:25] DEBUG dstack._internal.server.background.tasks.process_instances:477
Received a host_info:
{
    'gpu_vendor': 'none',
    'gpu_name': '',
    'gpu_memory': 0,
    'gpu_count': 0,
    'addresses': [
        '10.255.255.254/32',
        '172.30.13.156/20',
        'fe80::215:5dff:fe4c:693b/64',
        '100.123.202.13/32',
        'fd7a:115c:a1e0::3501:ca23/128',
        'fe80::150b:6255:8bdb:4870/64'
    ],
    'disk_size': 0,
    'cpus': 16,
    'memory': 33437167616
}
INFO dstack._internal.server.background.tasks.process_instances:314
The instance homelab-fleet-1 (100.123.202.13) was successfully added
DEBUG dstack._internal.server.background.tasks.process_instances:477
Received a host_info:
{
    'gpu_vendor': 'none',
    'gpu_name': '',
    'gpu_memory': 0,
    'gpu_count': 0,
    'addresses': [
        '10.255.255.254/32',
        '172.27.117.30/20',
        'fe80::215:5dff:fec0:258f/64',
        '100.96.234.126/32',
        'fd7a:115c:a1e0::2a01:ea86/128',
        'fe80::8595:4d45:c27c:26a6/64'
    ],
    'disk_size': 0,
    'cpus': 16,
    'memory': 33240039424
}
Additional information
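As a cross-check on the two host_info payloads in the server logs, the short sketch below converts the reported `memory` values to GiB and lists the GPU count per host. It assumes `memory` is expressed in bytes, which is consistent with the ≈ 31 GB shown in the dashboard; the `HOSTS` dictionary and helper names are illustrative only, with values copied from the logs.

```python
# Values copied from the two host_info payloads in the server logs.
# Keys are the 100.x addresses that appear in each payload's 'addresses' list.
HOSTS = {
    "100.123.202.13": {"cpus": 16, "memory": 33437167616, "gpu_count": 0},
    "100.96.234.126": {"cpus": 16, "memory": 33240039424, "gpu_count": 0},
}


def memory_gib(num_bytes):
    """Convert bytes to GiB, rounded to two decimals (assumes bytes as input)."""
    return round(num_bytes / 2**30, 2)


def summarize(hosts):
    """Return one human-readable summary line per host."""
    return [
        f"{address}: cpus={info['cpus']} memory={memory_gib(info['memory'])} GiB "
        f"gpus={info['gpu_count']}"
        for address, info in hosts.items()
    ]


if __name__ == "__main__":
    for line in summarize(HOSTS):
        print(line)
```

The two hosts come out at about 31.14 GiB and 30.96 GiB. The payloads are therefore not byte-for-byte identical, but both report 16 CPUs and zero GPUs, which matches the dashboard.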
Configuration file
type: fleet
# The name is optional, if not specified, generated randomly
name: homelab-fleet
# SSH credentials for the on-prem servers
ssh_config:
  user: frankcholula
  identity_file: ~/.ssh/id_ed25519_dstack
  hosts:
    - hostname: 127.0.0.1
      blocks: auto
    - hostname: 100.x.x.x
      blocks: auto
r4victor