[DF] Make sure the Dask scheduler has information about the workers #9431

Merged
vepadulano merged 1 commit into root-project:master from vepadulano:distrdf-check-workers-dict
Dec 13, 2021

@vepadulano
Member

The current implementation of `optimize_npartitions` in the Dask backend
queries information about the workers from the Dask client object. The
information is stored in the return value of `client.scheduler_info()`,
which is a dictionary that can have the key `workers`.

Supposedly, when this key exists, the Dask client has the needed
information. This is not always true. In certain scenarios, for example
when waiting for a batch system to return the available workers to the
Dask client, the `workers` key will be present but its value will be an
empty dictionary. This is because the scheduler does not yet know
which nodes of the cluster will become workers (this can be mitigated by
calling the `client.wait_for_workers` function beforehand).
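
As a rough illustration of the mitigation mentioned above, the sketch below blocks until the scheduler reports at least one worker and only then reads the workers dictionary. `wait_then_get_workers` is a hypothetical helper name, not part of this PR, and `client` is assumed to behave like a `dask.distributed.Client` (only its `wait_for_workers` and `scheduler_info` methods are used).

```python
def wait_then_get_workers(client, n_workers=1, timeout=None):
    """Wait for workers to register, then return the scheduler's workers dict.

    Hypothetical helper: `client` is assumed to behave like
    dask.distributed.Client.
    """
    # Blocks until the scheduler knows about at least `n_workers` workers
    # (or until `timeout` seconds have passed, if a timeout is given).
    client.wait_for_workers(n_workers, timeout=timeout)
    # Fall back to an empty dict if the "workers" key is missing.
    return client.scheduler_info().get("workers", {})
```

With a batch system such as the one described here, the call to `wait_for_workers` may block for as long as the jobs stay queued, so passing a `timeout` is probably sensible.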

This commit makes the check a bit stronger, getting the value of the
dictionary key `workers` and then checking if that value actually
contains something.
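
A minimal sketch of the stronger check, operating on the dictionary returned by `client.scheduler_info()`. This is an illustration under stated assumptions rather than the exact code of the commit: `FALLBACK_NPARTITIONS` is a hypothetical default, and each worker entry is assumed to carry an `nthreads` field.

```python
FALLBACK_NPARTITIONS = 2  # hypothetical default, not taken from this PR


def optimize_npartitions(scheduler_info):
    """Return a partition count based on the workers known to the scheduler."""
    # .get() returns None when the "workers" key is missing entirely.
    workers = scheduler_info.get("workers")
    # Both None and an empty dict are falsy, so a single truthiness test
    # covers "key missing" and "key present but no workers registered yet".
    if workers:
        # Assumption: every worker entry exposes its thread count as "nthreads".
        return sum(worker["nthreads"] for worker in workers.values())
    return FALLBACK_NPARTITIONS
```

Testing only for the presence of the key (`"workers" in scheduler_info`) would take the first branch even for an empty dictionary, which is the situation described above.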

fixes #9429

@vepadulano
Member Author

@phsft-bot build

@phsft-bot

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, ROOT-ubuntu2004/soversion, mac1015/python3, mac11/cxx17, windows10/cxx14

@vepadulano
Member Author

Unsure why the build system wasn't triggered previously

@phsft-bot

Build failed on mac1015/python3.
Running on macitois22.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@vepadulano vepadulano merged commit cd5210d into root-project:master Dec 13, 2021

Successfully merging this pull request may close these issues.

[DF] Crash with distributed RDataFrame on dask with dask_jobqueue
