ARROW-10046: [Rust] [DataFusion] Made RecordBatchReader implement Iterator#8225
Closed
jorgecarleitao wants to merge 3 commits intoapache:masterfrom
jorgecarleitao:reader_iterator
Closed
ARROW-10046: [Rust] [DataFusion] Made RecordBatchReader implement Iterator#8225jorgecarleitao wants to merge 3 commits intoapache:masterfrom jorgecarleitao:reader_iterator
RecordBatchReader implement Iterator#8225jorgecarleitao wants to merge 3 commits intoapache:masterfrom
jorgecarleitao:reader_iterator
Conversation
*Iterator implement IteratorRecordBatchReader implement Iterator
Member
Author
|
FYI, I am trying to summarize how DataFusion iterates over data, and I came up with this summary. |
This is a small conceptual change, but of fundamental importance: We can now build the iterator API on top of next_batch, as well as a futures::Stream trait.
alamb
approved these changes
Oct 1, 2020
Contributor
alamb
left a comment
There was a problem hiding this comment.
While this will be a breaking change for any code that currently uses RecordBatchReader I think it is a significant improvement and I vote (not that I am sure what my vote counts for :) ) that it is merged in.
If the API breakage is a concern (by removing RecordBatchReader::next_batch I have a suggestion, inline, that could potentially mitigate that)
andygrove
approved these changes
Oct 2, 2020
Member
andygrove
left a comment
There was a problem hiding this comment.
Very nice. Thanks @jorgecarleitao
andygrove
pushed a commit
that referenced
this pull request
Oct 3, 2020
This PR is built on top of #8225 and Replaces `Arc<Mutex<dyn ...>>` by `Box<dyn ...>`. In the TopK example, I had to move some functions away from the `impl`. This is because `self` cannot be borrowed as mutable and immutable at the same time, and, during iteration, it was being borrowed as mutable (to update the BTree) and as immutable (to access the `input`). There is probably a better way of achieving this e.g. via interior mutability. Closes #8307 from jorgecarleitao/box_iterator Authored-by: Jorge C. Leitao <[email protected]> Signed-off-by: Andy Grove <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is a proposal to change how we programmatically iterate over record batches in arrow and datafusion.
Instead of
use
I.e. via the
Iteratortrait.This allow us to write more expressive code, as well as offer a well documented and popular API to our users (Iterator).
Finally, this change also opens the possibility to implement
future::Stream, the async version ofIterator.