Add is_valid and truncate methods to NullBufferBuilder#7013
Add is_valid and truncate methods to NullBufferBuilder#7013alamb merged 14 commits intoapache:mainfrom
is_valid and truncate methods to NullBufferBuilder#7013Conversation
Jefffrey
left a comment
There was a problem hiding this comment.
Thanks for picking this up, left some comments 👍
arrow-buffer/src/builder/null.rs
Outdated
| if let Some(buf) = self.bitmap_builder.as_ref() { | ||
| buf.capacity() | ||
| } else { | ||
| 0 |
There was a problem hiding this comment.
I think need to reference self.capacity here instead of 0?
There was a problem hiding this comment.
Actually, I'm not sure when bitmap_builder hasn't been initialized, we should return the actual capacity of it (that is 0 because no builder existed) or self.capacity.
There was a problem hiding this comment.
In my opinion it would feel weird if I initialize a NullBufferBuilder with a capacity:
let nbb = NullBufferBuilder::new(10)But then checking the capacity right after would result in it saying 0.
There was a problem hiding this comment.
I agree it is a bit strange, but I think the PR as written makes the most sense.
Specifically, downstream in DataFusion (and other places) we use the capacity as a way to calculate how much memory has been allocated -- if there is no bitmap_builder there is no memory allocated.
I think we can make this less confusing with some comments. I left some suggestions
Another thing that might help might be to rename it to fn allocated_capacity() to make the difference more explicit 🤔
There was a problem hiding this comment.
- I also made Minor: Clarify NullBufferBuilder::new capacity parameter #7016 to try and clarify the behavior of
capacity
There was a problem hiding this comment.
fn allocated_capacity() would make sense, but it seems we already have that:
arrow-rs/arrow-buffer/src/builder/null.rs
Lines 188 to 194 in 0c07ec7
There was a problem hiding this comment.
Or rename capacity field to initial_capacity or initial_bits to clearly indicate this is for initialization only?
pub fn new(capacity: usize) -> Self {
Self {
bitmap_builder: None,
len: 0,
initial_capacity,
}
}
There was a problem hiding this comment.
I think we don't need to add a capacity() method given allocated_size() seems to already do what is required
arrow-buffer/src/builder/null.rs
Outdated
| if len > self.len { | ||
| return; | ||
| } |
There was a problem hiding this comment.
Once we materialize, is self.len accurate anymore?
There was a problem hiding this comment.
I am not sure what you mean -- in this case the user requested to truncate to a value larger than the current length which has no effect.
This behavior seems to be consistent with Vec::truncate so it makes sense to me: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.truncate
There was a problem hiding this comment.
Consider this test case:
#[test]
fn test123() {
let mut builder = NullBufferBuilder::new(0);
assert_eq!(builder.len(), 0);
builder.append_n_nulls(2);
assert_eq!(builder.len(), 2);
builder.truncate(1);
assert_eq!(builder.len(), 1); // fails here
}It would fail at the last assertion because it was materialized after appending two nulls, but then truncating down to 1 is a noop since the internal self.len stays 0 (not updated after materialization)
There was a problem hiding this comment.
You are (of course) correct. This is a great test and find
I took the liberty of pushing a commit to this branch that includes this test case (and will fail CI until it is fixed so block this PR from merging)
There was a problem hiding this comment.
Fixed in d0d3c62 and added some more documentation on the relationship between len and builder
alamb
left a comment
There was a problem hiding this comment.
Thanks @Chen-Yuan-Lai and @Jefffrey -- I think this PR looks really nice.
The comments / suggestions from @Jefffrey are worth looking at I think but in my opinion we could also merge this PR as is.
I will wait for @Jefffrey 's comments before doing so
arrow-buffer/src/builder/null.rs
Outdated
| if let Some(buf) = self.bitmap_builder.as_ref() { | ||
| buf.capacity() | ||
| } else { | ||
| 0 |
There was a problem hiding this comment.
I agree it is a bit strange, but I think the PR as written makes the most sense.
Specifically, downstream in DataFusion (and other places) we use the capacity as a way to calculate how much memory has been allocated -- if there is no bitmap_builder there is no memory allocated.
I think we can make this less confusing with some comments. I left some suggestions
Another thing that might help might be to rename it to fn allocated_capacity() to make the difference more explicit 🤔
arrow-buffer/src/builder/null.rs
Outdated
| if len > self.len { | ||
| return; | ||
| } |
There was a problem hiding this comment.
I am not sure what you mean -- in this case the user requested to truncate to a value larger than the current length which has no effect.
This behavior seems to be consistent with Vec::truncate so it makes sense to me: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.truncate
|
As I am working on the next release, I am going to try and push this PR along |
…-rs into add_required_methods
NullBufferBuilderis_valid and truncate methods to NullBufferBuilder
| builder.append_n_nulls(2); | ||
| builder.append_n_non_nulls(2); | ||
| assert_eq!(6, builder.len()); | ||
| assert_eq!(512, builder.allocated_size()); |
There was a problem hiding this comment.
it appears that @XiangpengHao had already added allocated_size
There was a problem hiding this comment.
Since we removed capacity(), I think assertions for builder.allocated_size() can also be removed in the test_null_buffer_builder_truncate test.
There was a problem hiding this comment.
Since there was no other testing of allocated_size that I could find, I thought it would be good to leave these tests in as the increase coverage for the existing behavior
|
@Chen-Yuan-Lai and @Jefffrey -- I made some changes to this PR that I think address the (excellent) comments. Let me know what you think |
|
Thanks again @Chen-Yuan-Lai and @Jefffrey |
Which issue does this PR close?
Closes #7002 .
Rationale for this change
As #7002 says, some required methods are implemented due to the need to replace
BooleanBufferBuilderwithNullBufferBuilderin DataFsusion.What changes are included in this PR?
capacity()is_valid()(as same asget_bit()inBooleanBufferBuilder)truncate()Are there any user-facing changes?