ARROW-15244: [Format] Clarify that offsets are monotonic for binary like arrays#12019
alamb wants to merge 4 commits into apache:master
I also started a mailing list thread on this topic: https://lists.apache.org/thread/fx8k250nn1d9b86sfo9t2gcl1v11mn4f
Co-authored-by: Jorge Leitao <[email protected]>
Co-authored-by: Matthijs Brobbel <[email protected]>
Benchmark runs are scheduled for baseline = 31a07be and contender = e7dc8f5. e7dc8f5 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Rationale
The question "what are the values of the offsets for non-valid (null) entries in arrays?" came up in arrow-rs (apache/arrow-rs#1071), and the existing docs are somewhat vague on this point.
I looked at three implementations of Arrow, and they all seem to assume or validate that the offsets are monotonic:
https://github.com/jorgecarleitao/arrow2/blob/37a9c758826a92d98dc91e992b2a49ce9724095d/src/array/specification.rs#L102-L119
Changes
Thus I propose updating the format docs to make the monotonic offsets explicit.
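To make the property concrete, here is a minimal sketch in plain Python (not taken from any Arrow implementation) of what a monotonic-offsets check for a binary-like array could look like. The helper names `offsets_are_monotonic` and `decode` are hypothetical, and the sketch assumes an array of length n carries n + 1 offsets into a single values buffer.

```python
from typing import List, Optional, Sequence


def offsets_are_monotonic(offsets: Sequence[int], values_len: int) -> bool:
    """Check that offsets never decrease and stay inside the values buffer.

    Hypothetical helper for illustration only.
    """
    if not offsets:
        # Assumption: an array of length n carries n + 1 offsets.
        return False
    if offsets[0] < 0 or offsets[-1] > values_len:
        return False
    # Every slot, valid or null, must satisfy offsets[i] <= offsets[i + 1].
    return all(a <= b for a, b in zip(offsets, offsets[1:]))


def decode(
    offsets: Sequence[int], values: bytes, validity: Sequence[bool]
) -> List[Optional[bytes]]:
    """Slice each slot out of the values buffer; null slots become None."""
    out: List[Optional[bytes]] = []
    for i, valid in enumerate(validity):
        start, end = offsets[i], offsets[i + 1]
        out.append(values[start:end] if valid else None)
    return out


# ["ab", null, "cde"]: the null slot here is empty, so its offset repeats.
offsets = [0, 2, 2, 5]
values = b"abcde"
validity = [True, False, True]

print(offsets_are_monotonic(offsets, len(values)))  # True
print(decode(offsets, values, validity))            # [b'ab', None, b'cde']
print(offsets_are_monotonic([0, 2, 1, 5], 5))       # False: 2 -> 1 decreases
```

Because the check applies to every slot regardless of validity, the length of any slot can be computed as `offsets[i + 1] - offsets[i]` without consulting the validity bitmap and is never negative.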
Background
I think @jorgecarleitao's description in apache/arrow-rs#1071 (comment) explains why having monotonic offsets is a good idea.