New graph on which the scheduler performs poorly #994

@mrocklin

In pydata/xarray#729 we found a real-world problem whose graph dask should have been able to compute easily (I think), but it didn't.

The resulting graph is here: dask.pdf

The graph as a cloudpickled file is here: dask.pkl.txt

I have a zip file with a dataset and script to reproduce this graph.

Some steps:

  1. Take a hard look at this graph to verify that it should indeed be executable in a small amount of memory.
  2. Visualize the result of `dask.ordering.order` on such a graph to check that we intend to walk through this graph cleanly. Perhaps do a tiny bit of simulation to get a more accurate picture than the ordering produced by `order` alone gives.
  3. Perhaps benchmark ghost computations, which can be easily reproduced and look somewhat similar, to check that the scheduler walks through those graphs reasonably cleanly.
  4. ...
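
The "tiny bit of simulation" in step 2 could look something like the sketch below. It is not dask's scheduler and uses no dask code: `simulate_peak_memory`, the `dependencies` dict (key to list of keys it depends on), and the `priority` dict (key to integer, lower runs first) are all hypothetical names chosen for illustration. It assumes tasks run one at a time, that every result costs the same, and that a result is released as soon as its last dependent has run.

```python
def simulate_peak_memory(dependencies, priority):
    """Run tasks sequentially in priority order and return the peak
    number of results held at once (hypothetical helper, not dask API)."""
    # Invert the dependency mapping so we know who consumes each result.
    dependents = {key: set() for key in dependencies}
    for key, deps in dependencies.items():
        for dep in set(deps):
            dependents[dep].add(key)

    # How many consumers of each result have not run yet.
    waiting = {key: len(users) for key, users in dependents.items()}
    in_memory = set()
    peak = 0

    for key in sorted(dependencies, key=priority.__getitem__):
        missing = set(dependencies[key]) - in_memory
        if missing:
            raise ValueError(f"{key!r} scheduled before {sorted(missing)}")
        in_memory.add(key)
        peak = max(peak, len(in_memory))
        # Release inputs whose last consumer has just run.
        for dep in set(dependencies[key]):
            waiting[dep] -= 1
            if waiting[dep] == 0:
                in_memory.discard(dep)
    return peak


# Toy graph: four independent blocks, each combining two inputs.
dependencies = {}
for i in range(4):
    dependencies[f"x{i}"] = []
    dependencies[f"y{i}"] = []
    dependencies[f"z{i}"] = [f"x{i}", f"y{i}"]

# Finish one block before starting the next.
depth_first = {k: n for n, k in enumerate(
    k for i in range(4) for k in (f"x{i}", f"y{i}", f"z{i}"))}

# Load every input first, then combine.
breadth_first = {k: n for n, k in enumerate(
    [f"x{i}" for i in range(4)]
    + [f"y{i}" for i in range(4)]
    + [f"z{i}" for i in range(4)])}

print(simulate_peak_memory(dependencies, depth_first))
print(simulate_peak_memory(dependencies, breadth_first))
```

On this toy graph the block-by-block order holds fewer results at its peak than the load-everything-first order, which is the kind of difference the simulation is meant to surface. Feeding it the real graph's dependencies and the computed ordering would show whether the ordering alone already implies a large footprint, or whether the problem lies elsewhere in the scheduler. Counting results rather than bytes is a simplification; weighting each key by an estimated size would be a natural refinement.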
