Fix array copy not being a no-op #9555
Conversation
```python
y_c = y.copy()
assert y is not y_c
assert y.compute() is not y_c.compute()
assert y.name == y_c.name
```
@ian-r-rose I just realized: maybe I should add an `assert y is not y_c` here too, to enforce/confirm/define that the expected behavior of `.copy` is to make a new instance of the Array object that shares the same task graph.
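For reference, a minimal sketch of that expectation (the array construction here is illustrative, not from the PR's test):

```python
import dask.array as da

y = da.ones((4, 4), chunks=2)
y_c = y.copy()

assert y is not y_c        # a new Array instance...
assert y.name == y_c.name  # ...backed by the same task graph
```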
Thanks @djhoese for the CC! Searching back through my memory (and the old issues referenced here) I think the problem was (and still remains, actually) that Iris expects that, after a copy of a dask array, the underlying data lives in a different piece of memory. However, that is an Iris thing and not a dask thing... I guess what would help Iris, then, is if there is a mechanism (perhaps this is just ...)
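One possible mechanism for guaranteed-distinct memory after compute, sketched here as a suggestion rather than anything decided in this thread, is an explicit per-chunk copy via `map_blocks`:

```python
import numpy as np
import dask.array as da

y = da.from_array(np.arange(16).reshape(4, 4), chunks=2)

# map_blocks adds a real task per chunk, so the computed result
# is backed by freshly allocated memory rather than shared memory.
y_deep = y.map_blocks(np.copy)

assert y_deep.name != y.name  # extra tasks were added to the graph
```

Unlike `.copy()`, this deliberately adds tasks to the graph, which is the trade-off for guaranteeing fresh memory.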
@DPeterK Any other comments on this? @ian-r-rose Any chance this can get merged?
ian-r-rose left a comment
I'm happy with this, thanks @djhoese. I, too, am curious about @DPeterK's use case in which the underlying array data needs to be in different pieces of memory, and whether we can address that in a way that is more dask-y. But I also suspect that the previous implementation, where copies only happened for single-partition arrays, wasn't really solving that use case either.
If it is like my case, the single-partition/chunk case was only used in unit tests and is probably not very likely in real-world code.
Thanks @djhoese!
* Replicate legacy dask behaviour pre dask/dask#9555.
* Revert "Replicate legacy dask behaviour pre dask/dask#9555." This reverts commit 7363412.
* Accept shared NumPy arrays when copying certain Dask arrays - dask/dask#9555.
* Updated lock-files.
* What's New entry.
* Re-arrange What's New entry.
For info, this is basically the conclusion we came to in Iris. Now we wait to see if any user tells us otherwise!
Oh sorry @djhoese - glad you went ahead and merged rather than waiting on me... As @rcomer says, Iris had a test that expected this behaviour; indeed, this test was why I introduced the former behaviour into dask in the first place. The Iris devs think there was a reason that we expected this behaviour in Iris - something like it being necessary for some math operations - but we can't remember precisely what the use case was, nor whether that use case is now legacy as well. Otherwise I think this change is a positive step for dask, and I reckon Iris should just follow what dask does for array management. If we find there are user problems we can deal with them if they arise...
See #9533 for details. This reverts the logic implemented in #3754, which was a fix for #3751.
CC @DPeterK, who opened the original issue #3751.
The summary is that users shouldn't assume anything about a dask Array's memory allocation or usage. An `Array.copy` may, once computed, return the same block of allocated memory as its `from_array` input, or it may return a newly allocated block of memory. Dask is free to use memory however it deems most efficient, or at least that's the idea. The thing that should be consistent is that a no-op operation (as `.copy` is currently documented to be) should not add additional tasks to the dask graph.
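As a small illustration of that last point, assuming the behaviour after this PR, `.copy` should leave the graph untouched:

```python
import dask.array as da

y = da.ones((4, 4), chunks=2)
y_c = y.copy()

# .copy() is a no-op on the graph: same keys, no extra tasks.
assert y_c.name == y.name
assert len(y_c.__dask_graph__()) == len(y.__dask_graph__())
```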