Closed
Conversation
Member
|
Yes, we definitely don't want to send the entire array into a delayed function call. This will instantiate the full array, then call the function on it, which is likely to fail if the array is larger than memory. |
Member
Author
|
Of course, can you please elaborate on how that would be fixed? Other than the quite messy and undesirable approach I've already noted above, I'm not seeing a good way to do this. |
Member
Author
|
Could PR ( #2545 ) please be merged in the interim? |
Attempts to make Dask Array's compress lazy by wrapping up each slicing with a delayed closure and noting the sliced dimension has an unknown shape.
Member
Author
Member
Author
|
This Travis CI build can be cancelled. Looks like the AppVeyor build already was. |
Member
Author
|
Should this be on a milestone given it was closed out? |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
(Tries to) fix #2540
Attempts to implement Dask Array's
compressin a lazy manner. Does this by making used ofdelayedto wrap up applications of slices to the Dask Array beingcompressed. Also makes use ofnp.nanto describe the length of dimension that wascompressed. This seems to sort of work.That said, there are definitely some rough edges. First as
delayedreturns one big chunk and we are not breaking up array chunks before slicing them. Ideally each chunk would be passed through thesedelayedcalls and reconstructed on the other end. That way chunking could largely be preserved throughcompress. It's unclear to me that what the cleanest and simplest way to do this is ATM. My best thinking would be slicing the array into pieces after inspecting its chunk sizes and trying to reassemble it afterwards, but that is hardly clean and not really simple AFAICT. Second the tests appear to be failing locally. However this failure doesn't have anything to do with the accuracy of the resulting Dask Array, but instead is due to some property of the graph isn't being maintained. Looking at the error it is not clear to me where this implementation caused it. So the problem may be somewhere deeper in Dask.ATM I'm not sure that I will be able to solve these problems due to lack of time and/or lack of understanding.