Since iris 3.6, it is possible to use Dask distributed with iris. This is a great new feature that will allow for better memory handling and distributed computing. See #1714 for an example implementation. However, it does require some extra configuration.
My proposal would be to allow users to specify the arguments to distributed.Client and to the associated cluster, e.g. distributed.LocalCluster or dask_jobqueue.SLURMCluster, in this configuration. This could either be added under a new key in config-user.yml or go in a new configuration file in the ~/.esmvaltool directory; both options are sketched below.
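For illustration, here is a minimal sketch of what consuming such a configuration could look like. The helper name get_distributed_client and the idea of resolving the cluster class from the type string via importlib are assumptions on my part, not existing ESMValCore API:

```python
# Hypothetical sketch: turn the proposed "dask" configuration section
# (already parsed into a dict) into a connected Dask client.
import importlib

from distributed import Client


def get_distributed_client(dask_config: dict) -> Client:
    """Start a cluster (if configured) and connect a client to it."""
    cluster_config = dict(dask_config.get("cluster", {}))
    client_config = dict(dask_config.get("client", {}))
    if cluster_config:
        # "type" holds the fully qualified class name, e.g.
        # distributed.LocalCluster or dask_jobqueue.SLURMCluster;
        # all other keys are passed on as keyword arguments.
        module_name, class_name = cluster_config.pop("type").rsplit(".", 1)
        cluster_class = getattr(importlib.import_module(module_name), class_name)
        return Client(cluster_class(**cluster_config), **client_config)
    # No cluster section: connect to an externally managed cluster,
    # e.g. via the "address" key under "client".
    return Client(**client_config)
```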
Add to the user configuration file
We could add these new options to config-user.yml under a new key dask, e.g.
Example config-user.yml settings for running locally using a LocalCluster:
```yaml
dask:
  cluster:
    type: distributed.LocalCluster
```

Example settings for using an externally managed cluster (e.g. set it up from a Jupyter notebook):
```yaml
dask:
  client:
    address: tcp://127.0.0.1:45695
```
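The address in this example would come from a cluster that is already running, e.g. one started from a Jupyter notebook along these lines (a sketch; the printed address is what the user would paste into the configuration):

```python
# Sketch: start a cluster from a Jupyter notebook and look up its address.
from distributed import LocalCluster

cluster = LocalCluster(n_workers=2)
print(cluster.scheduler_address)  # e.g. tcp://127.0.0.1:45695
```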
Example settings for running on Levante:

```yaml
dask:
  client: {}
  cluster:
    type: dask_jobqueue.SLURMCluster
    queue: interactive
    account: bk1088
    cores: 8
    memory: 16GiB
    local_directory: "/work/bd0854/b381141/dask-tmp"
    n_workers: 2
```

New configuration file
Or, we could add the new configuration in a separate file, e.g. called ~/.esmvaltool/dask.yml or ~/.esmvaltool/dask-distributed.yml.
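Reading that file could then be as simple as the following sketch, reusing the hypothetical get_distributed_client helper from above (the file name is one of the suggested options):

```python
# Sketch: load the proposed separate configuration file.
from pathlib import Path

import yaml

dask_config_file = Path.home() / ".esmvaltool" / "dask.yml"
dask_config = yaml.safe_load(dask_config_file.read_text())
client = get_distributed_client(dask_config)  # hypothetical helper sketched above
```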
Example settings for running locally using a LocalCluster:
```yaml
cluster:
  type: distributed.LocalCluster
```

Example settings for using an externally managed cluster (e.g. set it up from a Jupyter notebook):
```yaml
client:
  address: tcp://127.0.0.1:45695
```

Example settings for running on Levante:
```yaml
client: {}
cluster:
  type: dask_jobqueue.SLURMCluster
  queue: interactive
  account: bk1088
  cores: 8
  memory: 16GiB
  local_directory: "/work/bd0854/b381141/dask-tmp"
  n_workers: 2
```

@ESMValGroup/esmvaltool-coreteam Does anyone have an opinion on what the best approach is here? A new file or add to config-user.yml?