Closed
Labels: documentation, stalled
Description
So far I only have a test setup for training. Here's what my Mac colleagues get when running it:
Error executing job with overrides: []
Traceback (most recent call last):
  File "/Users/vegardb/src/github.com/metno/bris-inference/.tox/train/lib/python3.12/site-packages/anemoi/training/train/train.py", line 438, in main
    AnemoiTrainer(config).train()
  File "/Users/vegardb/src/github.com/metno/bris-inference/.tox/train/lib/python3.12/site-packages/anemoi/training/train/train.py", line 396, in train
    trainer = pl.Trainer(
              ^^^^^^^^^^^
  File "/Users/vegardb/src/github.com/metno/bris-inference/.tox/train/lib/python3.12/site-packages/pytorch_lightning/utilities/argparse.py", line 70, in insert_env_defaults
    return fn(self, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/vegardb/src/github.com/metno/bris-inference/.tox/train/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py", line 396, in __init__
    self._accelerator_connector = _AcceleratorConnector(
                                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/vegardb/src/github.com/metno/bris-inference/.tox/train/lib/python3.12/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 131, in __init__
    self._check_config_and_set_final_flags(
  File "/Users/vegardb/src/github.com/metno/bris-inference/.tox/train/lib/python3.12/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 219, in _check_config_and_set_final_flags
    raise ValueError(
ValueError: You set `strategy=<anemoi.training.distributed.strategy.DDPGroupStrategy object at 0x31fb3ce30>` but strategies from the DDP family are not supported on the MPS accelerator. Either explicitly set `accelerator='cpu'` or change the strategy.
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
train: exit 1 (84.38 seconds) /Users/vegardb/src/github.com/metno/bris-inference/.tox/train/tmp> anemoi-training train --config-name train_cpu_test pid=48744
train: FAIL code 1 (241.85=setup[141.77]+cmd[0.01,14.43,1.25,0.01,84.38] seconds)
evaluation failed :( (241.95 seconds)
This seems similar to the Lightning forum thread on using DDP on Apple M1: https://lightning.ai/forums/t/how-to-use-ddp-in-lightningmodule-in-apple-m1/5182/3
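Following the hint in the error message ("explicitly set `accelerator='cpu'` or change the strategy"), one possible workaround is to pick the accelerator before the trainer is built. The snippet below is only a sketch of that decision logic: `resolve_accelerator` and its parameters are hypothetical names, not part of anemoi-training or Lightning, and where this would be wired into the training config is left open.

```python
def resolve_accelerator(requested: str, strategy_is_ddp: bool, mps_available: bool) -> str:
    """Pick an accelerator name that will not combine MPS with a DDP-family strategy.

    Hypothetical helper for illustration only.
    - requested: the accelerator asked for in the config, e.g. "auto", "cpu", "mps".
    - strategy_is_ddp: True if the configured strategy belongs to the DDP family.
    - mps_available: True on Apple Silicon with MPS support; with PyTorch installed
      this could come from torch.backends.mps.is_available().
    """
    # Assumption: on a Mac with MPS available, "auto" ends up selecting MPS,
    # which is consistent with the failure shown in the traceback.
    effective = "mps" if (requested == "auto" and mps_available) else requested

    # Lightning rejects DDP-family strategies on MPS, so fall back to CPU.
    if effective == "mps" and strategy_is_ddp:
        return "cpu"
    return effective


if __name__ == "__main__":
    # Mac with MPS and a DDP-family strategy: fall back to CPU.
    print(resolve_accelerator("auto", strategy_is_ddp=True, mps_available=True))
    # Machine without MPS: leave the requested value untouched.
    print(resolve_accelerator("auto", strategy_is_ddp=True, mps_available=False))
```

The returned value would then be passed as the `accelerator` argument of `pl.Trainer`. The other option named in the error message, changing the strategy, is not covered here.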