Skip to content

Conversation

@rluvaton
Copy link
Member

Which issue does this PR close?

N/A

Rationale for this change

The Arrow IPC format already contain the null count for each array, reusing it instead of calculating each time to improve performance

What changes are included in this PR?

Update the create_array in RecordBatchDecoder to use the null count when creating ArrayDataBuilder

Are these changes tested?

Existing tests

Are there any user-facing changes?

No

this uses the `null_count()`` on `FieldNode` instead of calculating it each time
@github-actions github-actions bot added the arrow Changes to the arrow crate label Aug 13, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rluvaton -- the code looks good to me

I wonder if you think we should add any new tests to cover this new path, or if you have any benchmarks we should run that might show an improvement?

@rluvaton
Copy link
Member Author

rluvaton commented Aug 13, 2025

Not sure about adding a test, I can add a test for creating an invalid null length and see that it fails, but this will have the same result before and now, so that test should already exist regardless.

and about benchmarks, no I did not run any nor do I have any, any IPC reading benchmark for arrays that have nulls in them will suffice.

Copy link
Contributor

@Dandandan Dandandan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, null counting probably is probably not the most expensive part, but good to avoid it regardless :)

@alamb alamb merged commit de7f866 into apache:main Aug 14, 2025
26 checks passed
@alamb
Copy link
Contributor

alamb commented Aug 14, 2025

Thanks @rluvaton and @Dandandan

@rluvaton rluvaton deleted the skip-null-buffer-calculation-in-arrow-ipc branch August 14, 2025 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants