Add support for configuring Dask distributed #2040

@bouweandela


Since iris 3.6, it is possible to use Dask distributed with iris. This is a great new feature that should allow for better memory handling and makes distributed computing possible; see #1714 for an example implementation. However, using it does require some extra configuration.

My proposal is to let users specify, in this configuration, the arguments to distributed.Client and to the associated cluster class, e.g. distributed.LocalCluster or dask_jobqueue.SLURMCluster. These settings could be added either under a new key in config-user.yml or in a new configuration file in the ~/.esmvaltool directory:

Add to the user configuration file

We could add these new options to config-user.yml under a new key dask, e.g.

Example config-user.yml settings for running locally using a LocalCluster:

dask:
  cluster:
    type: distributed.LocalCluster

Example settings for using an externally managed cluster (e.g. one set up from a Jupyter notebook):

dask:
  client:
    address: tcp://127.0.0.1:45695
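For reference, a minimal sketch of how such an externally managed cluster might be started from a notebook session. The helper name `start_notebook_cluster` and the worker count are illustrative assumptions, not part of the proposal; the address to put in the client settings would be whatever `scheduler_address` reports.

```python
def start_notebook_cluster(n_workers=2):
    """Start a local cluster and return it together with its scheduler address.

    Hypothetical helper for illustration only; requires the `distributed`
    package to be installed.
    """
    from distributed import LocalCluster  # imported lazily on first use

    cluster = LocalCluster(n_workers=n_workers)
    # The address has the form "tcp://<host>:<port>" and is the value
    # that would go under `client: address:` in the settings above.
    return cluster, cluster.scheduler_address


if __name__ == "__main__":
    # Keep a reference to `cluster` so it stays alive while the tool runs.
    cluster, address = start_notebook_cluster()
    print(address)
```

The cluster only lives as long as the notebook kernel keeps the `cluster` object around, so the address would need to be updated in the settings each time it is restarted.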

Example settings for running on Levante:

dask:
  client: {}
  cluster:
    type: dask_jobqueue.SLURMCluster
    queue: interactive
    account: bk1088
    cores: 8
    memory: 16GiB
    local_directory: "/work/bd0854/b381141/dask-tmp"
    n_workers: 2
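To make the proposal a bit more concrete, here is a rough sketch of how the tool might turn such a `dask` mapping into a client. The function names `split_dask_config` and `get_client` are hypothetical, and the rule that everything under `cluster` except `type` is passed on as keyword arguments is an assumption about how this could work, not an existing implementation.

```python
import importlib


def split_dask_config(dask_cfg):
    """Split a `dask` mapping into (cluster_type, cluster_kwargs, client_kwargs).

    Assumption: `type` names the cluster class, and all other keys under
    `cluster` and `client` are passed through as keyword arguments.
    """
    client_kwargs = dict(dask_cfg.get("client") or {})
    cluster_kwargs = dict(dask_cfg.get("cluster") or {})
    cluster_type = cluster_kwargs.pop("type", None)
    return cluster_type, cluster_kwargs, client_kwargs


def get_client(dask_cfg):
    """Hypothetical helper: create a distributed.Client from the mapping."""
    from distributed import Client  # imported lazily, only when needed

    cluster_type, cluster_kwargs, client_kwargs = split_dask_config(dask_cfg)
    if cluster_type is None:
        # No cluster configured: connect to the given address, if any.
        return Client(**client_kwargs)

    # Resolve e.g. "dask_jobqueue.SLURMCluster" to the class it names.
    module_name, class_name = cluster_type.rsplit(".", 1)
    cluster_cls = getattr(importlib.import_module(module_name), class_name)
    cluster = cluster_cls(**cluster_kwargs)
    return Client(cluster, **client_kwargs)
```

With the Levante example above, `split_dask_config` would yield the string `dask_jobqueue.SLURMCluster` as the cluster type, the remaining keys (`queue`, `account`, `cores`, ...) as cluster arguments, and an empty set of client arguments.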

New configuration file

Alternatively, we could put the new configuration in a separate file, e.g. ~/.esmvaltool/dask.yml or ~/.esmvaltool/dask-distributed.yml.

Example settings in the new file for running locally using a LocalCluster:

cluster:
  type: distributed.LocalCluster

Example settings for using an externally managed cluster (e.g. one set up from a Jupyter notebook):

client:
  address: tcp://127.0.0.1:45695

Example settings for running on Levante:

client: {}
cluster:
  type: dask_jobqueue.SLURMCluster
  queue: interactive
  account: bk1088
  cores: 8
  memory: 16GiB
  local_directory: "/work/bd0854/b381141/dask-tmp"
  n_workers: 2
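For the separate-file option, reading the settings could look roughly like the sketch below. The function name `load_dask_config` is hypothetical, the default path is just one of the two candidate names mentioned above, and treating a missing file as "no Dask configuration" is an assumption about the desired behaviour.

```python
from pathlib import Path

# One of the candidate file names from this proposal, not a settled choice.
DEFAULT_DASK_CONFIG = "~/.esmvaltool/dask.yml"


def load_dask_config(path=DEFAULT_DASK_CONFIG):
    """Return the Dask settings from `path`, or {} if the file is absent."""
    path = Path(path).expanduser()
    if not path.exists():
        # Assumption: no file means "use the default Dask behaviour".
        return {}

    import yaml  # PyYAML, only needed when a configuration file exists

    with path.open(encoding="utf-8") as file:
        # An empty file parses to None, so fall back to an empty mapping.
        return yaml.safe_load(file) or {}
```

One consequence of this design is that users who never create the file are unaffected, whereas with the config-user.yml option the same effect would come from simply leaving out the `dask` key.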

@ESMValGroup/esmvaltool-coreteam Does anyone have an opinion on what the best approach is here? A new file, or a new key in config-user.yml?

Labels: dask (related to improvements using Dask), enhancement (New feature or request)
