fix: Taking slicing into account when writing BooleanBuffers as fast-encoding format #1522
kazuyukitanimura merged 1 commit into apache:main on Mar 25, 2025
Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

| Coverage Diff | main | #1522 | +/- |
|---|---|---|---|
| Coverage | 56.12% | 58.97% | +2.84% |
| Complexity | 976 | 1028 | +52 |
| Files | 119 | 122 | +3 |
| Lines | 11743 | 12268 | +525 |
| Branches | 2251 | 2309 | +58 |
| Hits | 6591 | 7235 | +644 |
| Misses | 4012 | 3875 | -137 |
| Partials | 1140 | 1158 | +18 |
kazuyukitanimura (Contributor) approved these changes on Mar 24, 2025:
Thank you @Kontinuation

Contributor:
Thanks @Kontinuation merged
coderfender pushed a commit to coderfender/datafusion-comet that referenced this pull request on Dec 13, 2025
## Which issue does this PR close?

Closes #1520.

## Rationale for this change

I found this problem while working on #1511: the null bits were not written correctly, which caused test failures. This patch is an attempt to fix it.

This patch only aims to fix correctness problems. As #1190 (comment) pointed out, the fast BatchWriter may write the full data buffer for sliced `Utf8` arrays, so there are still some performance implications when working with sliced arrays.

## What changes are included in this PR?

Correctly take slicing indices and length into account when writing BooleanBuffers. This applies to the null bits of all arrays and to the values of boolean arrays.

## How are these changes tested?

Added a new round-trip test for sliced record batches.
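
To illustrate why the slice offset and length matter when writing bit-packed buffers, here is a minimal standard-library sketch. It is not the code from this patch: `get_bit` and `repack_bits` are hypothetical helpers, and the only assumption carried over from Arrow is that bitmaps are packed least-significant bit first. A sliced array's validity bits can start in the middle of a byte, so copying the underlying bytes as-is would write bits belonging to rows outside the slice.

```rust
/// Reads bit `i` from an LSB-first bit-packed buffer (the bit order Arrow uses).
/// Hypothetical helper for illustration only.
fn get_bit(bytes: &[u8], i: usize) -> bool {
    bytes[i / 8] & (1u8 << (i % 8)) != 0
}

/// Copies `len` bits starting at bit `offset` of `bytes` into a fresh buffer
/// whose first bit corresponds to the first row of the slice.
/// Hypothetical helper; a real writer would likely copy whole bytes when
/// `offset` is a multiple of 8 instead of going bit by bit.
fn repack_bits(bytes: &[u8], offset: usize, len: usize) -> Vec<u8> {
    // Round up so a partially filled trailing byte is still allocated.
    let mut out = vec![0u8; (len + 7) / 8];
    for i in 0..len {
        if get_bit(bytes, offset + i) {
            out[i / 8] |= 1u8 << (i % 8);
        }
    }
    out
}
```

For example, with validity bytes `[0b1010_1100, 0b0000_0111]` and a slice at offset 3 with length 10, `repack_bits` yields `[0b1111_0101, 0b0000_0000]`, whereas writing the original two bytes unchanged would make a reader interpret rows 0 to 2 of the parent array as the first rows of the slice.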