Minor: Clarify documentation on NullBufferBuilder::allocated_size#7089
Minor: Clarify documentation on NullBufferBuilder::allocated_size#7089Jefffrey merged 3 commits intoapache:mainfrom
NullBufferBuilder::allocated_size#7089Conversation
| } | ||
|
|
||
| /// Returns the capacity of the buffer | ||
| /// Returns the capacity of the buffer, in bits (not bytes) |
There was a problem hiding this comment.
The key detail is that this function returns the value in bits (not bytes)
| } | ||
|
|
||
| /// Return the allocated size of this builder, in bytes, useful for memory accounting. | ||
| /// Return the allocated size of this builder, in bits, useful for memory accounting. |
There was a problem hiding this comment.
This documentation is wrong -- as it uses capacity() directly which returns a size in bits
I actually think it would be better if we had a function that returned allocated size in bytes (maybe we can deprecate this one and make a new one like allocated_bytes() or something for clarity.
There was a problem hiding this comment.
IMO this is a bug and we could fix this without it being a breaking change
Jefffrey
left a comment
There was a problem hiding this comment.
This is something I also noticed in #7082
One thing to note is that NullBufferBuilder::allocated_size() is used here:
arrow-rs/arrow-array/src/builder/generic_bytes_view_builder.rs
Lines 397 to 408 in 92cfd99
And I think it's used with assumption it provides bytes not bits, so may need adjustment.
| /// // empty requires 0 bytes | ||
| /// let b = BooleanBufferBuilder::new(0); | ||
| /// assert_eq!(0, b.capacity()); | ||
| /// // Creating space for 1 bit results in 64 bytes (space for 512 bits) |
There was a problem hiding this comment.
Maybe add a small note here explaining why it over allocates?
NullBufferBuilder::allocated_sizeNullBufferBuilder::allocated_size
|
|
👍 |
…pache#7089) * Minor: Clarify documentaiton on NullBufferBuilder::allocated_size * add note about why allocations are 64 bytes
…e32 (#7074) * Support converting large dates (i.e. +10999-12-31) from string to Date32 * Fix lint * Update arrow-cast/src/parse.rs Co-authored-by: Andrew Lamb <[email protected]> * fix: issue introduced in #6833 - less than equal check for scale in decimal conversion (#7070) * fix <= check for scale in decimal conversion * Update arrow-cast/src/cast/mod.rs name change Co-authored-by: Arttu <[email protected]> * remove incorrect comment --------- Co-authored-by: Arttu <[email protected]> * minor: re-export `OffsetBufferBuilder` in `arrow` crate (#7077) * Add another decimal cast edge test case (#7078) * Add another decimal cast edge test case Before 1019f5b this test would fail, as the cast produced 1. 0 is an edge case worth explicitly testing for. * typo/fmt Co-authored-by: Felipe Oliveira Carvalho <[email protected]> --------- Co-authored-by: Felipe Oliveira Carvalho <[email protected]> * Support both 0x01 and 0x02 as type for list of booleans in thrift metadata (#7052) * Support both 0x01 and 0x02 as type for list of booleans * Also support 0 for false inside boolean collections * Use hex notation in tests * Fix LocalFileSystem with range request that ends beyond end of file (#6751) * Fix LocalFileSystem with range request that ends beyond end of file * fix windows * add comment * Seek error * fix seek check * remove windows flag * Get file length from file metadata * Introduce `UnsafeFlag` to manage disabling `ArrayData` validation (#7027) * Introduce UnsafeFlag to manage disabling validation * fix docs * Refactor arrow-ipc: Rename `ArrayReader` to `RecodeBatchDecoder` (#7028) * Rename `ArrayReader` to `RecordBatchDecoder` * Remove alias for `self` * Minor: Update release schedule (#7086) * Minor: Update release schedule * realism * Refactor some decimal-related code and tests (#7062) * Refactor some decimal-related code and tests in preparation for adding Decimal32 and Decimal64 support * Fixed symbol * Apply PR feedback * Fixed format problem * Fixed logical merge conflicts * PR feedback * Refactor arrow-ipc: Move `create_*_array` methods into `RecordBatchDecoder` (#7029) * Move `create_primitive_array` into RecordBatchReader * Move `create_list-array` into RecordBatchReader * Move `create_dictionay_array` into RecordBatchReader * Print Parquet BasicTypeInfo id when present (#7094) * Print Parquet BasicTypeInfo id when present * Improve print_schema documentation * tiny cleanup * Add a custom implementation `LocalFileSystem::list_with_offset` (#7019) * Initial change from Daniel. * Upgrade unit test to be more generic. * Add comments on why we have filter * Cleanup unit tests. * Update object_store/src/local.rs Co-authored-by: Adam Reeve <[email protected]> * Add changes suggested by Adam. * Cleanup match error. * Apply formatting changes suggested by cargo +stable fmt --all. * Apply cosmetic changes suggested by clippy. * Upgrade test_path_with_offset to create temporary directory + files for testing rather than pointing to existing dir. --------- Co-authored-by: Adam Reeve <[email protected]> * fix: first none/empty list in `ListArray` panics in `cast_with_options` (#7065) * fix: first none in `ListArray` panics in `cast_with_options` * simplify * fix * Update arrow-cast/src/cast/list.rs Co-authored-by: Jeffrey Vo <[email protected]> --------- Co-authored-by: Jeffrey Vo <[email protected]> * Benchmarks for Arrow IPC writer (#7090) * Add benchmarks for Arrow IPC writer * Add benchmarks for Arrow IPC writer * reuse target buffer * rename, etc * Add compression type * update --------- Co-authored-by: Andy Grove <[email protected]> * Minor: Clarify documentation on `NullBufferBuilder::allocated_size` (#7089) * Minor: Clarify documentaiton on NullBufferBuilder::allocated_size * add note about why allocations are 64 bytes * Add more tests for edge cases * Add negative test case for incorrectly formatted large dates --------- Co-authored-by: Andrew Lamb <[email protected]> Co-authored-by: Himadri Pal <[email protected]> Co-authored-by: Arttu <[email protected]> Co-authored-by: Piotr Findeisen <[email protected]> Co-authored-by: Felipe Oliveira Carvalho <[email protected]> Co-authored-by: Jörn Horstmann <[email protected]> Co-authored-by: Kyle Barron <[email protected]> Co-authored-by: Curt Hagenlocher <[email protected]> Co-authored-by: Devin Smith <[email protected]> Co-authored-by: Corwin Joy <[email protected]> Co-authored-by: Adam Reeve <[email protected]> Co-authored-by: irenjj <[email protected]> Co-authored-by: Jeffrey Vo <[email protected]> Co-authored-by: Andy Grove <[email protected]>
Which issue does this PR close?
Related to
Rationale for this change
@Chen-Yuan-Lai found in apache/datafusion#14504 that
NullBufferBuilder::allocated_sizeactually reports the size in bits, not bytes -- nice eyes!What changes are included in this PR?
Update the documentation to reflect what the code does
BooleanBufferBuilder::capacityto verify it is in terms of bitsAre there any user-facing changes?
Better docs