-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Add fuse to delayed object optimization
#6222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Many of our IO routes start/end with Delayed objects. With fuse enabled, we can more effectively optimize these graphs. Since using `Delayed` to build large graphs is already taking a performance hit (the delayed api is more ergonomic, but less efficient than raw graphs), the slower optimization time should be negligible for users compared to other overheads.
|
Aside from the performance hit of calling |
|
I can't see why we wouldn't? Dask in general is given full liberty about when and where to run tasks, fusing tasks IMO is part of this liberty. If a user wants fusion not to happen, they can set |
|
One, perhaps silly reason to avoid fusing is for education. We often use
dask delayed to show basics around the task scheduler to users in
situations like talks and the tutorial. Fusing there would obfuscate
what's going on a bit.
…On Mon, May 18, 2020 at 9:20 AM Jim Crist-Harif ***@***.***> wrote:
I can't see why we wouldn't? Dask in general is given full liberty about
when and where to run tasks, fusing tasks IMO is part of this liberty.
If a user wants fusion not to happen, they can set
optimization.fuse.enabled=False in the config to disable this
temporarily/globally.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#6222 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACKZTCTVERE3KPPO6MOKUDRSFN3BANCNFSM4NEGX6TQ>
.
|
Calling |
|
I'm aware. At least when I give tutorials or demos I usually have the
dashboard up. It's useful to see the inc/add/dec calls everywhere rather
than having to explain what an inc-dec call is.
…On Mon, May 18, 2020 at 9:28 AM Jim Crist-Harif ***@***.***> wrote:
One, perhaps silly reason to avoid fusing is for education. We often use
dask delayed to show basics around the task scheduler to users in
situations like talks and the tutorial. Fusing there would obfuscate
what's going on a bit.
Calling .visualize() doesn't optimize by default, so users won't see this
when visualizing. They will see it for computation.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#6222 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACKZTFAYN3C55EHLSD4EDLRSFOZHANCNFSM4NEGX6TQ>
.
|
|
But again, this may be silly
…On Mon, May 18, 2020 at 9:33 AM Matthew Rocklin ***@***.***> wrote:
I'm aware. At least when I give tutorials or demos I usually have the
dashboard up. It's useful to see the inc/add/dec calls everywhere rather
than having to explain what an inc-dec call is.
On Mon, May 18, 2020 at 9:28 AM Jim Crist-Harif ***@***.***>
wrote:
> One, perhaps silly reason to avoid fusing is for education. We often use
> dask delayed to show basics around the task scheduler to users in
> situations like talks and the tutorial. Fusing there would obfuscate
> what's going on a bit.
>
> Calling .visualize() doesn't optimize by default, so users won't see
> this when visualizing. They will see it for computation.
>
> —
> You are receiving this because you commented.
> Reply to this email directly, view it on GitHub
> <#6222 (comment)>, or
> unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AACKZTFAYN3C55EHLSD4EDLRSFOZHANCNFSM4NEGX6TQ>
> .
>
|
|
I can see the issue, but I'm not sure if it outweighs the benefits. You also might develop graphs that don't hit this (anything with incoming branches), or develop a new narrative ("dask was smart enough to fuse these into one task"). No strong thoughts. |
|
When I've taught people about Dask graphs, have used visualize for that instead of a computation though, but that might be a style/preference thing :) I guess in the education case one could disable it in the notebook or hide that away in a config. Perhaps it is useful to teach about fusing though (off/on comparison)? |
|
The change here doesn't affect |
|
For educational purposes, it is easy enough to include a dask config that turns the fuse off if that is deemed preferable. So I don't think that the educational reason is enough to justify not adding this feature if it is otherwise an improvement. |
|
Are we still deciding whether this is a good idea or does this PR just need to pass CI? |
|
I want to poke at the underlying issue (better fusion of tasks that convert to/from delayed, with |
|
With the addition of high level graphs and other work since then, this PR no longer makes sense as is. Closing. |
Many of our IO routes start/end with Delayed objects. With fuse enabled,
we can more effectively optimize these graphs. Since using
Delayedtobuild large graphs is already taking a performance hit (the delayed api
is more ergonomic, but less efficient than raw graphs), the slower
optimization time should be negligible for users compared to other
overheads.
See #6219.
black dask/flake8 dask