Running on Apple ARM #85

@ways

So far I only have a test setup for training. Here's what my Mac colleagues get when running it:

Error executing job with overrides: []
Traceback (most recent call last):
  File "/Users/vegardb/src/github.com/metno/bris-inference/.tox/train/lib/python3.12/site-packages/anemoi/training/train/train.py", line 438, in main
    AnemoiTrainer(config).train()
  File "/Users/vegardb/src/github.com/metno/bris-inference/.tox/train/lib/python3.12/site-packages/anemoi/training/train/train.py", line 396, in train
    trainer = pl.Trainer(
              ^^^^^^^^^^^
  File "/Users/vegardb/src/github.com/metno/bris-inference/.tox/train/lib/python3.12/site-packages/pytorch_lightning/utilities/argparse.py", line 70, in insert_env_defaults
    return fn(self, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/vegardb/src/github.com/metno/bris-inference/.tox/train/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py", line 396, in __init__
    self._accelerator_connector = _AcceleratorConnector(
                                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/vegardb/src/github.com/metno/bris-inference/.tox/train/lib/python3.12/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 131, in __init__
    self._check_config_and_set_final_flags(
  File "/Users/vegardb/src/github.com/metno/bris-inference/.tox/train/lib/python3.12/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 219, in _check_config_and_set_final_flags
    raise ValueError(
ValueError: You set `strategy=<anemoi.training.distributed.strategy.DDPGroupStrategy object at 0x31fb3ce30>` but strategies from the DDP family are not supported on the MPS accelerator. Either explicitly set `accelerator='cpu'` or change the strategy.

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
train: exit 1 (84.38 seconds) /Users/vegardb/src/github.com/metno/bris-inference/.tox/train/tmp> anemoi-training train --config-name train_cpu_test pid=48744
  train: FAIL code 1 (241.85=setup[141.77]+cmd[0.01,14.43,1.25,0.01,84.38] seconds)
  evaluation failed :( (241.95 seconds)

Seems similar to https://lightning.ai/forums/t/how-to-use-ddp-in-lightningmodule-in-apple-m1/5182/3

Config used: https://github.com/metno/bris-inference/blob/main/config/train_cpu_test.yaml
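
For illustration, the first suggestion in the error message (explicitly setting `accelerator='cpu'`) could be sketched roughly as below. This is only a sketch, not the anemoi-training or bris-inference implementation: `is_apple_silicon` and `choose_accelerator` are hypothetical helper names, and where the resulting value would be set in the training config is not shown in this issue.

```python
import platform
from typing import Optional


def is_apple_silicon() -> bool:
    """Return True when Python runs natively on an Apple ARM (M-series) Mac."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"


def choose_accelerator(uses_ddp_strategy: bool, apple_silicon: Optional[bool] = None) -> str:
    """Pick a value for Lightning's `accelerator` argument (hypothetical helper).

    Falls back to "cpu" when a DDP-family strategy is requested on Apple ARM,
    since the traceback above shows that combination being rejected for MPS.
    Otherwise returns "auto" and lets Lightning decide.
    """
    if apple_silicon is None:
        apple_silicon = is_apple_silicon()
    if uses_ddp_strategy and apple_silicon:
        return "cpu"
    return "auto"


if __name__ == "__main__":
    # Assumption: the training setup always uses a DDP-family strategy.
    print(choose_accelerator(uses_ddp_strategy=True))
```

The returned string would then be passed on as the `accelerator` argument of `pl.Trainer`. The other option named in the error message, changing the strategy, is not covered by this sketch.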

Labels: documentation, stalled