Conversation

@djhoese (Contributor) commented Oct 6, 2022

See #9533 for details. This reverts the logic implemented in #3754, which was a fix for #3751.

CC @DPeterK, who opened the original issue #3751.

The summary is that users shouldn't assume anything about a dask Array's memory allocation or usage. Once computed, an Array.copy may return the same block of allocated memory as its from_array input, or it may return a newly allocated block. Dask is free to use memory however it deems most efficient, or at least that's the idea. What should be consistent is that a no-op operation (which is how .copy is currently documented) does not add additional tasks to the dask graph, as the snippet below shows.

    import numpy as np
    import dask.array as da

    x = np.arange(10)
    y = da.from_array(x, chunks=5)
    y_c = y.copy()
    assert y.name == y_c.name  # .copy() added no new tasks to the graph

    y_c = y.copy()
    assert y is not y_c
    assert y.compute() is not y_c.compute()
    assert y.name == y_c.name
@djhoese (Contributor, Author) commented on the diff:

@ian-r-rose I just realized, maybe I should add an assert y is not y_c here too, to enforce/confirm/define the expected behavior of .copy: it makes a new instance of the Array object but keeps the same task graph.

@DPeterK commented Oct 7, 2022

Thanks @djhoese for the CC! Searching back through my memory (and the old issues referenced here), I think the problem was (and still remains, actually) that Iris expects that, after a copy of a Cube, the two arrays are different entities; that is, a is not b. Copying a Cube will eventually copy its data array, and Iris expects that these are not the same item in memory, which is important for most of the operations where Iris makes a copy of a Cube.

However, that is an Iris thing and not a dask thing... I guess what would help Iris, then, is a mechanism (perhaps this is just deepcopy?) that lets Iris make a new dask array in memory when copying a Cube.
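
(For illustration: one such mechanism, sketched below, is to add an explicit per-chunk copy task with map_blocks rather than relying on deepcopy - this is a suggestion, not an existing dask .copy contract.)

    import numpy as np
    import dask.array as da

    x = np.arange(10)
    y = da.from_array(x, chunks=5)

    # Add an explicit copy task per chunk, so the computed result is
    # guaranteed to live in freshly allocated memory.
    y_deep = y.map_blocks(np.copy)

    assert y_deep.name != y.name                      # a real task was added
    assert not np.shares_memory(y_deep.compute(), x)  # result is a new buffer

The copy stays lazy here: no memory is duplicated until compute() runs.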

@djhoese (Contributor, Author) commented Oct 7, 2022

> Copying a Cube will eventually copy its data array, and Iris expects that these are not the same item in memory, which is important for most of the operations where Iris makes a copy of a Cube.

The dask Array objects are different and the task graphs are the same; if a single-chunk numpy array is given to da.from_array, then dask's current threaded scheduler implementation will give you the original numpy array back when you call dask_arr.compute(). Could you shed more light on why Iris needs the memory to not be the same? Does it modify the numpy array under the hood in a map_blocks function or other similar task? Or does it modify the numpy array that it originally gave to dask?
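
(A minimal sketch of the behavior described above; the shared-memory assertion reflects dask's current single-chunk from_array behavior, not a documented guarantee.)

    import numpy as np
    import dask.array as da

    x = np.arange(10)
    y = da.from_array(x, chunks=-1)  # one chunk covering the whole array
    y_c = y.copy()

    assert y is not y_c        # distinct Array objects...
    assert y.name == y_c.name  # ...sharing an identical task graph

    # With a single chunk, the computed result can share (or even be) the
    # original buffer, so mutating it in place would also be visible via x.
    assert np.shares_memory(y_c.compute(), x)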

@djhoese (Contributor, Author) commented Oct 13, 2022

@DPeterK Any other comments on this?

@ian-r-rose Any chance this can get merged?

@ian-r-rose (Collaborator) left a comment:

I'm happy with this, thanks @djhoese. I, too, am curious about @DPeterK's use case that requires the underlying array data to be different pieces of memory, and whether we can address that in a way that is more dask-y. But I also suspect that the previous implementation, where copies only happened for single-partition arrays, wasn't really solving that use case either.

@ian-r-rose mentioned this pull request Oct 13, 2022
@djhoese (Contributor, Author) commented Oct 14, 2022

> But I also suspect that the previous implementation, where copies only happened for single-partition arrays, wasn't really solving that use case either.

If it is like my case, the single-partition/chunk case was only used in unit tests and is probably rare in real-world code.

@ian-r-rose (Collaborator) commented:

Thanks @djhoese!

@ian-r-rose ian-r-rose merged commit d6938ac into dask:main Oct 14, 2022
trexfeathers added a commit to trexfeathers/iris that referenced this pull request Oct 31, 2022
trexfeathers added a commit to trexfeathers/iris that referenced this pull request Nov 2, 2022
trexfeathers added a commit to trexfeathers/iris that referenced this pull request Nov 2, 2022
pp-mo pushed a commit to SciTools/iris that referenced this pull request Nov 4, 2022
* Replicate legacy dask behaviour pre dask/dask#9555.

* Revert "Replicate legacy dask behaviour pre dask/dask#9555."

This reverts commit 7363412.

* Accept shared NumPy arrays when copying certain Dask arrays - dask/dask#9555.

* Updated lock-files.

* What's New entry.

* Re-arrange What's New entry.
@rcomer (Contributor) commented Nov 5, 2022

> If it is like my case, the single-partition/chunk case was only used in unit tests and is probably rare in real-world code.

For info, this is basically the conclusion we came to in Iris. Now we wait to see if any user tells us otherwise!
SciTools/iris#5041

@DPeterK commented Nov 7, 2022

Oh sorry @djhoese - glad you went ahead and merged rather than waiting on me...

As @rcomer says, Iris had a test that expected this behaviour; indeed, this test was why I introduced the former behaviour into dask in the first place. The Iris devs think there was a reason we expected this behaviour in Iris - something like it being necessary for some math operations - but we can't remember precisely what the use case was, nor whether that use case is now legacy as well.

Otherwise I think this change is a positive step for dask, and I reckon Iris should just follow what dask does for array management. If we find there are user problems, we can deal with them if they arise...

@djhoese djhoese deleted the revert-dask-copy-new-task branch April 5, 2023 14:45
Development

Successfully merging this pull request may close these issues:

Array.copy is not a no-op
