Skip to content

WIP: Lazy compress#2548

Closed
jakirkham wants to merge 1 commit intodask:masterfrom
jakirkham:lazy_compress
Closed

WIP: Lazy compress#2548
jakirkham wants to merge 1 commit intodask:masterfrom
jakirkham:lazy_compress

Conversation

@jakirkham
Copy link
Copy Markdown
Member

(Tries to) fix #2540

Attempts to implement Dask Array's compress in a lazy manner. Does this by making used of delayed to wrap up applications of slices to the Dask Array being compressed. Also makes use of np.nan to describe the length of dimension that was compressed. This seems to sort of work.

That said, there are definitely some rough edges. First as delayed returns one big chunk and we are not breaking up array chunks before slicing them. Ideally each chunk would be passed through these delayed calls and reconstructed on the other end. That way chunking could largely be preserved through compress. It's unclear to me that what the cleanest and simplest way to do this is ATM. My best thinking would be slicing the array into pieces after inspecting its chunk sizes and trying to reassemble it afterwards, but that is hardly clean and not really simple AFAICT. Second the tests appear to be failing locally. However this failure doesn't have anything to do with the accuracy of the resulting Dask Array, but instead is due to some property of the graph isn't being maintained. Looking at the error it is not clear to me where this implementation caused it. So the problem may be somewhere deeper in Dask.

ATM I'm not sure that I will be able to solve these problems due to lack of time and/or lack of understanding.

@mrocklin
Copy link
Copy Markdown
Member

Yes, we definitely don't want to send the entire array into a delayed function call. This will instantiate the full array, then call the function on it, which is likely to fail if the array is larger than memory.

@jakirkham
Copy link
Copy Markdown
Member Author

Of course, can you please elaborate on how that would be fixed? Other than the quite messy and undesirable approach I've already noted above, I'm not seeing a good way to do this.

@jakirkham
Copy link
Copy Markdown
Member Author

Could PR ( #2545 ) please be merged in the interim?

Attempts to make Dask Array's compress lazy by wrapping up each slicing
with a delayed closure and noting the sliced dimension has an unknown
shape.
@jakirkham
Copy link
Copy Markdown
Member Author

Closing out in favor of PR ( #2555 ). Thanks @jcrist.

@jakirkham jakirkham closed this Jul 25, 2017
@jakirkham jakirkham deleted the lazy_compress branch July 25, 2017 18:03
@jakirkham
Copy link
Copy Markdown
Member Author

This Travis CI build can be cancelled. Looks like the AppVeyor build already was.

@sinhrks sinhrks added this to the 0.15.2 milestone Aug 30, 2017
@jakirkham
Copy link
Copy Markdown
Member Author

Should this be on a milestone given it was closed out?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Lazy implementation of compress

3 participants