Implement GenericByteViewArray::gc for compacting StringViewArray and ByteViewArray#5707
Implement GenericByteViewArray::gc for compacting StringViewArray and ByteViewArray#5707ClSlaid wants to merge 4 commits intoapache:mainfrom
GenericByteViewArray::gc for compacting StringViewArray and ByteViewArray#5707Conversation
part 1: implement checker to check if current buffer is compact Signed-off-by: 蔡略 <[email protected]>
part2: implement actual gc algorithm Signed-off-by: 蔡略 <[email protected]>
Signed-off-by: 蔡略 <[email protected]>
04efa78 to
c45acb8
Compare
|
The naming of making
renaming |
|
Thank you -- looking forward to this one. |
There was a problem hiding this comment.
I wonder if there is a better algorithm for this.
It's the most straightforward and simplest way I could come up with, though.
There was a problem hiding this comment.
I recommend we pull the "compact checking" algorithm into a new PR to discuss it -- I am not sure about the assumption that StringViewArrays will mostly often be compacted (I would actually expect the opposite)
7e47a1d to
73ab8a3
Compare
Signed-off-by: 蔡略 <[email protected]>
73ab8a3 to
6d2a453
Compare
| } | ||
|
|
||
| let mut new_views = Vec::with_capacity(self.views.len()); | ||
| let mut new_bufs: Vec<Vec<u8>> = vec![vec![]; self.buffers.len()]; |
There was a problem hiding this comment.
The number of buffers may shrink after gc. Every buffer should be filled up to block_size.
See GenericByteViewBuilder::append_value.
GenericByteViewArray::gc for compacting StringViewArray and ByteViewArray
alamb
left a comment
There was a problem hiding this comment.
Thank you for this contributon @ClSlaid and for the review @RinChanNOWWW
My biggest suggestion is to split the code in this PR into two PRs -- one for the "compact_check" and one for the actual compaction.
I had some suggestions on some additional coverage needed for the gc, but I think it is looking quite close.
Thanks again!
There was a problem hiding this comment.
I recommend we pull the "compact checking" algorithm into a new PR to discuss it -- I am not sure about the assumption that StringViewArrays will mostly often be compacted (I would actually expect the opposite)
| let mut view = ByteView::from(array.views[1]); | ||
| view.length = 15; | ||
| let new_views = ScalarBuffer::from(vec![array.views[0], view.into()]); |
There was a problem hiding this comment.
I dn't undertand what this is doing
I think you can more easily create a stringview with multiple buffers like this:
arrow-rs/arrow-cast/src/pretty.rs
Lines 334 to 341 in 905c46b
There was a problem hiding this comment.
Not familiar with the rest of the code, so I made it out brutally.
There was a problem hiding this comment.
that StringViewArrays will mostly often be compacted (I would actually expect the opposite)
We assume the same idea, it's likely to be not compacted.
|
Thanks for your review @alamb, and I'd left this PR to a draft and reopen two PRs for this. |
|
Closed by #5885 |
Which issue does this PR close?
Closes #5513.This PR is being splited into:
Rationale for this change
What changes are included in this PR?
Adding a new
gcmethod forStringViewArrayandByteViewArray.Are there any user-facing changes?
None.