Avoid unnecessary branching in row read/write if schema is null-free by yjshen · Pull Request #1891 · apache/datafusion

yjshen · 2022-02-27T05:03:34Z

Which issue does this PR close?

Part of #1861

Rationale for this change

When the schema guarantees that a row is null-free, we can omit the null bit set from the row representation and eliminate the per-field null branching during reading and writing. This saves both space and time.
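As a rough illustration of the space saving described above, the following minimal sketch computes how many bytes a row would reserve for its null bit set. The helper names `is_null_free` and `null_width`, and the use of a plain `&[bool]` of nullability flags in place of an Arrow `Schema`, are assumptions made for this example and are not taken from the PR.

```rust
/// Hypothetical stand-in for a schema: one `nullable` flag per field.
/// A row is null-free when no field is declared nullable.
fn is_null_free(nullable: &[bool]) -> bool {
    nullable.iter().all(|n| !n)
}

/// Hypothetical helper: bytes reserved for the null bit set of a row.
/// A null-free row reserves nothing; otherwise one bit per field is
/// reserved, rounded up to whole bytes.
fn null_width(field_count: usize, null_free: bool) -> usize {
    if null_free {
        0
    } else {
        (field_count + 7) / 8
    }
}
```

Under these assumptions, a nine-field schema with at least one nullable field reserves two bytes per row, while the same schema with every field declared non-nullable reserves none.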

Are there any user-facing changes?

No.

alamb

Looks good to me @yjshen

datafusion/src/row/mod.rs

alamb · 2022-03-01T14:32:54Z

datafusion/src/row/reader.rs

+            &[]
+        } else {
+            let start = self.base_offset;
+            &self.data[start..start + self.null_width]


If null_width is always zero, I wonder whether the check for self.null_free is needed at all?

This is for the code path that is not null-free. In fact, this method shouldn't be called at all when tuples are null-free.
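To make the exchange above concrete, here is a minimal sketch of a reader that skips the null bit set entirely for null-free rows. The struct name `RowReader`, the method names `null_bits` and `is_valid_at`, and the bit convention (a set bit means the field is valid) are assumptions for illustration. Only the field names and the two slice expressions come from the quoted diff fragment.

```rust
/// Hypothetical reader over a single row buffer.
struct RowReader<'a> {
    data: &'a [u8],
    base_offset: usize,
    null_width: usize,
    null_free: bool,
}

impl<'a> RowReader<'a> {
    /// Returns the null bit set, or an empty slice for null-free rows.
    fn null_bits(&self) -> &[u8] {
        if self.null_free {
            &[]
        } else {
            let start = self.base_offset;
            &self.data[start..start + self.null_width]
        }
    }

    /// Assumed convention: a set bit marks a valid (non-null) field.
    /// Null-free rows return early and never consult the bit set.
    fn is_valid_at(&self, idx: usize) -> bool {
        if self.null_free {
            return true;
        }
        let bits = self.null_bits();
        (bits[idx / 8] & (1 << (idx % 8))) != 0
    }
}
```

In this sketch the null-free path in `is_valid_at` returns before `null_bits` is called, which mirrors the point that the method should not be reached for null-free tuples.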

alamb · 2022-03-01T14:34:41Z

datafusion/src/row/writer.rs

 use arrow::datatypes::{DataType, Schema};
 use arrow::record_batch::RecordBatch;
 use arrow::util::bit_util::{ceil, round_upto_power_of_2, set_bit_raw, unset_bit_raw};
+#[cfg(feature = "jit")]


I think over time it would be good to start encapsulating the JIT'd code more (that is, reducing the number of #[cfg(feature = "jit")] attributes), perhaps by defining a common interface for creating JIT and non-JIT versions. As I am interested in getting more involved in this project, I would be happy to try to do so (or do it as part of a larger body of work).

Ah, that would be great! Thanks for the offer.

I'll see what I can do over the next day or two
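One possible shape for the encapsulation suggested in this thread is sketched below. It is not the design the project adopted: the trait `RowWriterBackend`, the types `InterpretedWriter` and `JitWriter`, and the function `default_backend` are all hypothetical names. The idea is that the `#[cfg(feature = "jit")]` attributes are confined to one module and one constructor, so callers only see the trait.

```rust
/// Hypothetical common interface shared by JIT and non-JIT writers.
trait RowWriterBackend {
    fn write_u32(&self, buf: &mut Vec<u8>, value: u32);
}

/// Plain (non-JIT) implementation, always compiled.
struct InterpretedWriter;

impl RowWriterBackend for InterpretedWriter {
    fn write_u32(&self, buf: &mut Vec<u8>, value: u32) {
        buf.extend_from_slice(&value.to_le_bytes());
    }
}

/// JIT-backed implementation, compiled only when the feature is on.
/// The body is a placeholder; real code would invoke generated code.
#[cfg(feature = "jit")]
mod jit {
    use super::RowWriterBackend;

    pub struct JitWriter;

    impl RowWriterBackend for JitWriter {
        fn write_u32(&self, buf: &mut Vec<u8>, value: u32) {
            buf.extend_from_slice(&value.to_le_bytes());
        }
    }
}

/// The only constructor that needs to know about the feature flag.
#[cfg(feature = "jit")]
fn default_backend() -> Box<dyn RowWriterBackend> {
    Box::new(jit::JitWriter)
}

#[cfg(not(feature = "jit"))]
fn default_backend() -> Box<dyn RowWriterBackend> {
    Box::new(InterpretedWriter)
}
```

Dynamic dispatch through `Box<dyn ...>` is used here only to keep the sketch short. A generic parameter would avoid the virtual call if that mattered on a hot path.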

yjshen added 2 commits February 27, 2022 12:32

Avoid unnecessary branching in row read/write if schema is null-free

e5a246b

test null free code path for binary as well

41581db

github-actions bot added the datafusion label Feb 27, 2022

alamb approved these changes Mar 1, 2022

View reviewed changes

name nf to null_free

c7680b9

alamb merged commit cc22e17 into apache:master Mar 1, 2022

yjshen mentioned this pull request Mar 4, 2022

[Epic]: Complete ROW Format (Missing features) #1861

Closed

37 tasks

alamb merged 3 commits into apache:master from yjshen:jit
