Since iris 3.6, it is possible to use Dask distributed with iris. This is a great new feature that will allow for better memory handling and distributed computing. See #1714 for an example implementation. However, it does require some extra configuration.
My proposal would be to allow users to specify the arguments to distributed.Client and to the associated cluster, e.g. distributed.LocalCluster or dask_jobqueue.SLURMCluster, in this configuration. This could either be added under a new key in config-user.yml or go in a new configuration file in the ~/.esmvaltool directory; both options are sketched below.
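For illustration, here is a minimal sketch of what consuming such a configuration could look like. The helper name get_distributed_client and the idea of resolving the cluster class from the type string via importlib are assumptions on my part, not existing ESMValCore API:

```python
# Hypothetical sketch: turn the proposed "dask" configuration section
# (already parsed into a dict) into a connected Dask client.
import importlib

from distributed import Client


def get_distributed_client(dask_config: dict) -> Client:
    """Start a cluster (if configured) and connect a client to it."""
    cluster_config = dict(dask_config.get("cluster", {}))
    client_config = dict(dask_config.get("client", {}))
    if cluster_config:
        # "type" holds the fully qualified class name, e.g.
        # distributed.LocalCluster or dask_jobqueue.SLURMCluster;
        # all other keys are passed on as keyword arguments.
        module_name, class_name = cluster_config.pop("type").rsplit(".", 1)
        cluster_class = getattr(importlib.import_module(module_name), class_name)
        return Client(cluster_class(**cluster_config), **client_config)
    # No cluster section: connect to an externally managed cluster,
    # e.g. via the "address" key under "client".
    return Client(**client_config)
```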
Add to the user configuration file
We could add these new options to config-user.yml under a new key dask, e.g.
Example config-user.yml settings for running locally using a LocalCluster:
```yaml
dask:
  cluster:
    type: distributed.LocalCluster
```

Example settings for using an externally managed cluster (e.g. set it up from a Jupyter notebook):
```yaml
dask:
  client:
    address: tcp://127.0.0.1:45695
```
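The address in this example would come from a cluster that is already running, e.g. one started from a Jupyter notebook along these lines (a sketch; the printed address is what the user would paste into the configuration):

```python
# Sketch: start a cluster from a Jupyter notebook and look up its address.
from distributed import LocalCluster

cluster = LocalCluster(n_workers=2)
print(cluster.scheduler_address)  # e.g. tcp://127.0.0.1:45695
```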
Example settings for running on Levante:

```yaml
dask:
  client: {}
  cluster:
    type: dask_jobqueue.SLURMCluster
    queue: interactive
    account: bk1088
    cores: 8
    memory: 16GiB
    local_directory: "/work/bd0854/b381141/dask-tmp"
    n_workers: 2
```

New configuration file
Or, we could add the new configuration in a separate file, e.g. called ~/.esmvaltool/dask.yml or ~/.esmvaltool/dask-distributed.yml.
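Reading that file could then be as simple as the following sketch, reusing the hypothetical get_distributed_client helper from above (the file name is one of the suggested options):

```python
# Sketch: load the proposed separate configuration file.
from pathlib import Path

import yaml

dask_config_file = Path.home() / ".esmvaltool" / "dask.yml"
dask_config = yaml.safe_load(dask_config_file.read_text())
client = get_distributed_client(dask_config)  # hypothetical helper sketched above
```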
Example settings for running locally using a LocalCluster:
```yaml
cluster:
  type: distributed.LocalCluster
```

Example settings for using an externally managed cluster (e.g. set it up from a Jupyter notebook):
```yaml
client:
  address: tcp://127.0.0.1:45695
```

Example settings for running on Levante:
```yaml
client: {}
cluster:
  type: dask_jobqueue.SLURMCluster
  queue: interactive
  account: bk1088
  cores: 8
  memory: 16GiB
  local_directory: "/work/bd0854/b381141/dask-tmp"
  n_workers: 2
```

@ESMValGroup/esmvaltool-coreteam Does anyone have an opinion on what the best approach is here? A new file or add to config-user.yml?