-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Blockwise uses Task class
#11568
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Blockwise uses Task class
#11568
Conversation
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 15 files ± 0 15 suites ±0 4h 3m 59s ⏱️ - 9m 54s Results for commit d1501d7. ± Comparison against base commit 307038b. ♻️ This comment has been updated with latest results. |
dask/_task_spec.py
Outdated
| def substitute(self, subs): | ||
| if self.key in subs or self.target.key in subs: | ||
| sub_key = subs.get(self.key, self.key) | ||
| val = subs.get(self.target.key, self.target.key) | ||
| if sub_key == self.key and val == self.target.key: | ||
| return self | ||
| if isinstance(val, GraphNode): | ||
| return val.substitute({}, key=sub_key) | ||
| return Alias(sub_key, val) | ||
| return self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The substitution logic could be factored out into a separate PR on request. It's not used anywhere else so I chose to keep in in here but if it helps with review, I'll break it out
| io_deps=None, | ||
| keys=None, | ||
| ): | ||
| """Tensor operation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- consider putting doc string back. This is now an internal function so I want to only put it back if it helps developers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having the dosctring around would be helpful to future me.
| return [ | ||
| "Comparing two dask graph nodes:", | ||
| f" left: {left.key} right: {right.key}", | ||
| " Diff:", | ||
| ] + diff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still doesn't look great but it's something
2d1f186 to
9cecb08
Compare
| io_deps=None, | ||
| keys=None, | ||
| ): | ||
| """Tensor operation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having the dosctring around would be helpful to future me.
|
Comparing performance to main I ran the test_climatology::test_highlevel_api on small scale and collected profiles. Primarily I looked at the initial ramp up time between starting graph construction client side and when the computation kicks off cluster side. Below are more detailed notes about the differences. Broadly speaking, picking is much faster because we're now deduplicating the subgraphs/tasks properly again (that's a regression in Note that the numbers below are collected with this regression coiled/benchmarks#1622 which blows up the graph size. We're dealing here with 500k tasks (after optimziation it's 300k) Blockwise TaskSpecClient side
Scheduler side
Main TaskSpecClient side
Scheduler side
|
dask/blockwise.py
Outdated
| Applies a task, ``func``, across blocks from many different input | ||
| collections. We arrange the pattern with which those blocks interact with | ||
| sets of matching indices. E.g.:: | ||
| make_blockwise_graph(task, 'z', 'i', 'x', 'i', 'y', 'i') | ||
| yield an embarrassingly parallel communication pattern and is read as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having those doc strings for make_blockwise_graph was a little off. We now only have this function as public interface so I moved the docs over here and migrated the doctests to the current version. Some of them are a little tedious so I skipped them but overall this should be working
|
Can we please ask for a migration guide, and how to make our code backward compatible? |
An update to the documentation is overdue and something I plan for in the next couple of days. Can you point to a package or issue that is affected (I assume dask-awkward)? blockwise isn't used that widely as a user facing API so I expected the splash zone to be small to moderate in size. I'm also happy to help with a migration (most of the time this is straight forward) Edit: Just saw dask-contrib/dask-awkward#556 I'll hop over there and will provide help (tomorrow) |
This migrates
Blockwiseto the Task class (xref #9969)