The code doesn't seem to know when user specifies "gpus: [0, 1, 2, 3]" in the pytorch_config.yaml (PyTorch engine)

### Is there an existing issue for this?

- [X] I have searched the existing issues

### Bug description

Hi DLC team, I'm using the latest DLC version with the PyTorch engine, and multi-GPU training does not seem to work: the GPU IDs listed in pytorch_config.yaml never reach the training runner.
Details below:

Suppose my pytorch_config.yaml looks like this:
```
…
runner:
  type: PoseTrainingRunner
  gpus:
  - 0
  - 1
  - 2
  - 3
  key_metric: test.mAP
  key_metric_asc: true
  eval_interval: 300
…
```
I can verify that by the time execution reaches line 73 of <DLC conda env folder>/lib/python3.10/site-packages/deeplabcut/pose_estimation_pytorch/apis/train.py, the run_config object has read the [0, 1, 2, 3] list of GPU IDs, as seen below:

run_config
```
…
'model':{'backbone': {'type': 'ResNet', 'model_name': 'resnet50_gn', 'output_stride': 16, 'freeze_bn_stats': True, 'freeze_bn_weights': False}, 'backbone_output_channels': 2048, 'heads': {'bodypart': {...}}}
'net_type':'resnet_50'
'runner':{'type': 'PoseTrainingRunner', 'gpus': [0, 1, 2, 3], 'key_metric': 'test.mAP', 'key_metric_asc': True, 'eval_interval': 300, 'optimizer': {'type': 'AdamW', 'params': {...}}, 'scheduler': {'type': 'LRListScheduler', 'params': {...}}, 'snapshots': {'max_snapshots': 5, 'save_epochs': 25, 'save_optimizer_state': False}}
'train_settings':{'batch_size': 8, 'dataloader_workers': 0, 'dataloader_pin_memory': False, 'display_iters': 100, 'epochs': 200, 'seed': 42}
len():8
```



Execution eventually reaches line 554 of <DLC conda env folder>/lib/python3.10/site-packages/deeplabcut/pose_estimation_pytorch/runners/train.py, where the runner is built. At that point the `gpus` argument should hold the list of GPU IDs. However, `gpus` is still None, so the runner does not set up DataParallel().
```
def build_training_runner(
    runner_config: dict,
    model_folder: Path,
    task: Task,
    model: nn.Module,
    device: str,
    gpus: list[int] | None = None,
    snapshot_path: str | None = None,
    logger: BaseLogger | None = None,
) -> TrainingRunner:
    """
    Build a runner object according to a pytorch configuration file

    Args:
        runner_config: the configuration for the runner
        model_folder: the folder where models should be saved
        task: the task the runner will perform
        model: the model to run
        device: the device to use (e.g. {'cpu', 'cuda:0', 'mps'})
        gpus: the list of GPU indices to use for multi-GPU training
        snapshot_path: the snapshot from which to load the weights
        logger: the logger to use, if any

    Returns:
        the runner that was built
    """
```
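To illustrate the failure mode described above, here is a minimal sketch in plain Python. `build_runner_stub` is a hypothetical stand-in that only mimics the `runner_config` / `gpus` part of the signature; it is not DLC's actual implementation, and the returned dict is just an illustration of "multi-GPU on or off".

```python
from __future__ import annotations


def build_runner_stub(runner_config: dict, gpus: list[int] | None = None) -> dict:
    """Hypothetical stand-in for build_training_runner.

    Only the explicit `gpus` argument decides whether multi-GPU mode is
    enabled; runner_config["gpus"] is never consulted (the suspected bug).
    """
    return {"data_parallel": bool(gpus), "device_ids": gpus or []}


# Runner section as read from pytorch_config.yaml
runner_config = {"type": "PoseTrainingRunner", "gpus": [0, 1, 2, 3]}

# The caller does not forward the GPU list, so `gpus` keeps its default None
runner_without_gpus = build_runner_stub(runner_config)

# Forwarding the list explicitly enables the multi-GPU branch
runner_with_gpus = build_runner_stub(runner_config, gpus=runner_config["gpus"])
```

In this sketch `runner_without_gpus` ends up with multi-GPU mode disabled even though the config lists four GPUs, which matches what I observe.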

As a fix, I manually added the following line at the very beginning of `build_training_runner`, so that the `gpus` argument takes its value from the runner config whenever GPU IDs are specified there:
`gpus = runner_config["gpus"] if runner_config["gpus"] else gpus`
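One caveat with the one-liner is that `runner_config["gpus"]` raises a `KeyError` when the key is missing from the config. A slightly more defensive variant could look like the sketch below. `resolve_gpus` is a hypothetical helper name, not something that exists in DLC, and it keeps the same precedence as my one-liner (the config wins over the argument when both are set).

```python
from __future__ import annotations


def resolve_gpus(runner_config: dict, gpus: list[int] | None = None) -> list[int] | None:
    """Pick the GPU IDs to use for multi-GPU training.

    Prefers a non-empty "gpus" entry in the runner config and otherwise
    falls back to the explicitly passed `gpus` argument.
    """
    # .get() returns None instead of raising KeyError when "gpus" is absent
    config_gpus = runner_config.get("gpus")
    return list(config_gpus) if config_gpus else gpus
```

With this helper, the first line of the function would become `gpus = resolve_gpus(runner_config, gpus)`. Whether the config or the argument should take priority is a design choice for the maintainers.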
I hope someone can take a look. Thanks!


### Operating System

NAME="Red Hat Enterprise Linux"
VERSION="9.4 (Plow)"
ID="rhel"
ID_LIKE="fedora"

### DeepLabCut version

Loading DLC 3.0.0rc4...
DLC loaded in light mode; you cannot use any GUI (labeling, relabeling and standalone GUI)

### DeepLabCut mode

single animal

### Device type

gpu

### Steps To Reproduce

Please see above

### Relevant log output

_No response_

### Anything else?

_No response_

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/DeepLabCut/DeepLabCut/blob/master/CODE_OF_CONDUCT.md)

The code doesn't seem to know when user specifies "gpus: [0, 1, 2, 3]" in the pytorch_config.yaml (PyTorch engine) #2744