Skip to content

Conversation

@schlunma
Copy link
Contributor

@schlunma schlunma commented Dec 12, 2024

Description

This PR makes Dask configurable in our configuration. These are the default settings:

dask:
  client: {}  # keyword arguments for distributed.Client, can include "address" to use external cluster
  clusters:
    default:
      type: default
    debug:
      type: default
      scheduler: synchronous
    local:
      type: distributed.LocalCluster
      n_workers: 2
      threads_per_worker: 2
      memory_limit: 4GiB
  config: {}  # keyword arguments for dask.config.set
  run: default  # Start the `default` cluster defined above

The entries given under clusters can be selected via the run option; this also works in the command line, e.g.,

esmvaltool run --dask='{"run": "local"}' recipe_example.yml

I chose not to add a predefined SLURMCluster setting since this needs to be customized by the user anyway (account and scratch directory).

This PR also deprecates the usage of the file ~/.esmvaltool/dask.yml (see below).

Closes #2040
Closes #2369

Link to documentation: https://esmvaltool--2616.org.readthedocs.build/projects/ESMValCore/en/2616/quickstart/configure.html#dask-configuration

Deprecation (since v2.12.0, will be removed in v2.14.0) #

Usage of Dask configuration file ~/.esmvaltool/dask.yml: please use the new configuration option dask instead (see details above). If the deprecated file is present, it will be used but a warning will be printed. In this case, all dask configuration options are ignored. To use the new configuration option, delete or move ~/.esmvaltool/dask.yml. To force using the new Dask configuration option and ignore ~/.esmvaltool/dask.yml already now, you can set the environment variable ESMVALTOOL_USE_NEW_DASK_CONFIG to any truthy value.


Before you get started

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.


To help with the number pull requests:

@schlunma schlunma added deprecated feature dask related to improvements using Dask labels Dec 12, 2024
@schlunma schlunma added this to the v2.12.0 milestone Dec 12, 2024
@schlunma schlunma self-assigned this Dec 12, 2024
@codecov
Copy link

codecov bot commented Dec 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.95%. Comparing base (92746cc) to head (282df78).
Report is 106 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2616      +/-   ##
==========================================
+ Coverage   94.93%   94.95%   +0.01%     
==========================================
  Files         252      252              
  Lines       14620    14672      +52     
==========================================
+ Hits        13880    13932      +52     
  Misses        740      740              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@schlunma schlunma marked this pull request as ready for review December 12, 2024 17:35
@schlunma schlunma changed the title Make Dask configurable in configuration Make Dask configurable in our configuration Dec 12, 2024
Copy link
Contributor

@valeriupredoi valeriupredoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great, Manu! I would strongly recommend we removed all valid HPC account/user-related information - this is a public space, and that info is confidential in that it can ease hacking. Put some bogus ones instead 👍

slurm_cluster:
type: dask_jobqueue.SLURMCluster
queue: shared
account: bk1088
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove this as public info, Manu

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in a40fb69

memory: 7680MiB
processes: 2
interface: ib0
local_directory: "/scratch/b/b381141/dask-tmp"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

valid user name, pls remove this as public information

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in a40fb69

Copy link
Member

@bouweandela bouweandela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice to see this finally happening @schlunma!

I just wondered about the format of the configuration file. Would it make sense to be able to specify the configuration parameters per 'cluster'?

How about using the following as a format:

dask:
  use: threaded  # Use the `threaded` profile defined below
  profiles:
    threaded:
      scheduler: threaded
      num_workers: 4
    debug:
      scheduler: synchronous
    local:
      scheduler: distributed.LocalCluster
        n_workers: 2
        threads_per_worker: 2
        memory_limit: 4GiB
    compute:
      scheduler: dask_jobqueue.SLURMCluster
        queue: compute
        account: bd0854
        cores: 128
        memory: 240GiB
        processes: 32
        interface: ib0
        local_directory: /scratch/b/b381141/dask-tmp
        n_workers: 32
        walltime: '2:00:00'
    external:
      scheduler: distributed.Client
         address: "tcp://127.0.0.1:43605"

So all the parameters from the profile selected with use will just be applied using dask.config.set, but we do have a special treatment of the scheduler key if it is not one of the values that are supported by Dask by default, we import the module and instantiate the class.

run: default # This can be omitted
clusters:
default:
type: default
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to local_threaded in a40fb69

client.dashboard_link,
)
else:
logger.warning(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logger.warning(
logger.info(

Maybe this is fine in many cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in d52a5ac

)
else:
logger.warning(
"Using Dask default scheduler, checkout "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rephrase this as:

"Using the Dask threaded scheduler. The distributed scheduler is recommended, please read ..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in d52a5ac

.. code:: yaml
dask:
run: <NAME_OF_CLUSTER>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
run: <NAME_OF_CLUSTER>
use: <NAME_OF_CLUSTER>

Would it be nicer to call this use instead of run?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in f9f4315

@schlunma
Copy link
Contributor Author

schlunma commented Dec 19, 2024

Very nice to see this finally happening @schlunma!

I just wondered about the format of the configuration file. Would it make sense to be able to specify the configuration parameters per 'cluster'?

How about using the following as a format:

dask:
  use: threaded  # Use the `threaded` profile defined below
  profiles:
    threaded:
      scheduler: threaded
      num_workers: 4
    debug:
      scheduler: synchronous
    local:
      scheduler: distributed.LocalCluster
        n_workers: 2
        threads_per_worker: 2
        memory_limit: 4GiB
    compute:
      scheduler: dask_jobqueue.SLURMCluster
        queue: compute
        account: bd0854
        cores: 128
        memory: 240GiB
        processes: 32
        interface: ib0
        local_directory: /scratch/b/b381141/dask-tmp
        n_workers: 32
        walltime: '2:00:00'
    external:
      scheduler: distributed.Client
         address: "tcp://127.0.0.1:43605"

So all the parameters from the profile selected with use will just be applied using dask.config.set, but we do have a special treatment of the scheduler key if it is not one of the values that are supported by Dask by default, we import the module and instantiate the class.

I'm not quite sure I understand. This doesn't look like valid YAML to me 😬 My idea was that you should be able to use arbitrary kwargs for the scheduler, the client, and dask.config.set to be as flexible as possible.

@bouweandela
Copy link
Member

bouweandela commented Dec 19, 2024

I'm not quite sure I understand. This doesn't look like valid YAML to me 😬

🤦 Sorry for the confusion, I tried to write that a bit too fast.

Here is a working YAML file:

dask:
  use: local-threaded  # Use the `local-threaded` profile defined below
  profiles:
    local-threaded:
      scheduler: threaded
      num_workers: 4
    local-distributed:
      cluster:
        type: distributed.LocalCluster
        n_workers: 2
        threads_per_worker: 2
        memory_limit: 4GiB
    debug:
      scheduler: synchronous
    external:
      scheduler-address: "tcp://127.0.0.1:43605"
    dkrz-compute:
      cluster:
        type: dask_jobqueue.SLURMCluster
        queue: compute
        account: bd0854
        cores: 128
        memory: 240GiB
        processes: 32
        interface: ib0
        local_directory: /scratch/b/b381141/dask-tmp
        n_workers: 32
        walltime: '2:00:00'
  • I like the ability to set Dask configuration parameters, but think it would be better if they could be set per profile instead of as global configuration, therefore I propose to extend the definitions in clusters so they can include Dask configuration parameters. To capture this added option, I think profiles might be better name than clusters, but I'm open to other ideas. So in the example above, you would take the profile indicated by use, pop the cluster key and start a cluster with the provided information, and then pass all the remaining settings to dask.config.set.
  • use instead of run indicates better which profile will be used, but open to suggestions if you have a better idea
  • Having 4 (run, client, clusters, and config) options to choose from at the top level seems much. I had a look at the things that can be provided to distributed.Client and most of these options do not look very useful and some can be specified via a dask configuration parameter. It looks like the scheduler-address dask configuration parameter can be used to connect to an existing cluster, and that was the only one we've been using so far. Therefore propose we leave this one out for the moment until someone complains that they need to be able to configure this.
  • I used different names for the profiles vs the scheduler setting value, to avoid confusion

@schlunma
Copy link
Contributor Author

Thanks for your comments @bouweandela! I implemented all your desired changes and tested this thoroughly, everything seems to be working fine! Let me know what you think.

Copy link
Member

@bouweandela bouweandela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the udpates @schlunma! I've now done a full review of the documentation and suggested a few improvements to the code.

@schlunma
Copy link
Contributor Author

Thanks for all the udpates @schlunma! I've now done a full review of the documentation and suggested a few improvements to the code.

Thanks for the review! I think I address everything.

Copy link
Member

@bouweandela bouweandela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🍻

@bouweandela bouweandela merged commit 23400b1 into main Dec 20, 2024
7 checks passed
@bouweandela bouweandela deleted the configurable_dask branch December 20, 2024 14:11
@schlunma schlunma mentioned this pull request Feb 27, 2025
10 tasks
@schlunma schlunma added the config label Jun 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

config dask related to improvements using Dask deprecated feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make dask.yml path configurable Add support for configuring Dask distributed

4 participants