Skip to content

Conversation

@jayhan94
Copy link
Contributor

What changes were proposed in this pull request?

When all values in column col_0 are NULLs within a row group, and we attempt to apply the predicate pushdown col_0 <=> 'xxx', the evaluatePredicateProto function returns TruthValue.NULL. In this case, we can directly determine the result based on the literal value: if the literal is NULL, return TruthValue.YES, otherwise, return TruthValue.NO.

Why are the changes needed?

See SPARK-52032.
When we pushdown the NULL_SAFE_EQUALS predicate, all values of the column are NULL. The evaluatePredicateProto returns TruthValue.NULL, whose isNeeded returns false so that the whole row group is skipped by SargApplier.pickRowGroups, which actually is incorrect.

How was this patch tested?

There already exists unit test -- TestOrcTimezonePPD.testTimestampAllNulls

Was this patch authored or co-authored using generative AI tooling?

Co-authored using generative AI tooling.

@dongjoon-hyun
Copy link
Member

Thank you for making a PR, @jayhan94 .

Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me. Thanks!

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

Thank you, @jayhan94 and @wgtmac .

Sorry for being later. I was a little busy until Today.

dongjoon-hyun pushed a commit that referenced this pull request May 21, 2025
…get evaluated correctly

### What changes were proposed in this pull request?

When all values in column `col_0` are `NULL`s within a row group, and we attempt to apply the predicate pushdown `col_0 <=> 'xxx'`, the `evaluatePredicateProto` function returns `TruthValue.NULL`. In this case, we can directly determine the result based on the literal value: if the literal is `NULL`, return `TruthValue.YES`, otherwise, return `TruthValue.NO`.

### Why are the changes needed?

See [SPARK-52032](https://issues.apache.org/jira/projects/SPARK/issues/SPARK-52032).
When we pushdown the NULL_SAFE_EQUALS predicate, all values of the column are `NULL`. The `evaluatePredicateProto` returns `TruthValue.NULL`, whose `isNeeded` returns false so that the whole row group is skipped by `SargApplier.pickRowGroups`, which actually is incorrect.

### How was this patch tested?

There already exists unit test -- `TestOrcTimezonePPD.testTimestampAllNulls`

### Was this patch authored or co-authored using generative AI tooling?

Co-authored using generative AI tooling.

Closes #2223 from jayhan94/fix_null_safe_equals_pred_push.

Authored-by: Jay Han <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 467a293)
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request May 21, 2025
…get evaluated correctly

### What changes were proposed in this pull request?

When all values in column `col_0` are `NULL`s within a row group, and we attempt to apply the predicate pushdown `col_0 <=> 'xxx'`, the `evaluatePredicateProto` function returns `TruthValue.NULL`. In this case, we can directly determine the result based on the literal value: if the literal is `NULL`, return `TruthValue.YES`, otherwise, return `TruthValue.NO`.

### Why are the changes needed?

See [SPARK-52032](https://issues.apache.org/jira/projects/SPARK/issues/SPARK-52032).
When we pushdown the NULL_SAFE_EQUALS predicate, all values of the column are `NULL`. The `evaluatePredicateProto` returns `TruthValue.NULL`, whose `isNeeded` returns false so that the whole row group is skipped by `SargApplier.pickRowGroups`, which actually is incorrect.

### How was this patch tested?

There already exists unit test -- `TestOrcTimezonePPD.testTimestampAllNulls`

### Was this patch authored or co-authored using generative AI tooling?

Co-authored using generative AI tooling.

Closes #2223 from jayhan94/fix_null_safe_equals_pred_push.

Authored-by: Jay Han <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 467a293)
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request May 21, 2025
…get evaluated correctly

### What changes were proposed in this pull request?

When all values in column `col_0` are `NULL`s within a row group, and we attempt to apply the predicate pushdown `col_0 <=> 'xxx'`, the `evaluatePredicateProto` function returns `TruthValue.NULL`. In this case, we can directly determine the result based on the literal value: if the literal is `NULL`, return `TruthValue.YES`, otherwise, return `TruthValue.NO`.

### Why are the changes needed?

See [SPARK-52032](https://issues.apache.org/jira/projects/SPARK/issues/SPARK-52032).
When we pushdown the NULL_SAFE_EQUALS predicate, all values of the column are `NULL`. The `evaluatePredicateProto` returns `TruthValue.NULL`, whose `isNeeded` returns false so that the whole row group is skipped by `SargApplier.pickRowGroups`, which actually is incorrect.

### How was this patch tested?

There already exists unit test -- `TestOrcTimezonePPD.testTimestampAllNulls`

### Was this patch authored or co-authored using generative AI tooling?

Co-authored using generative AI tooling.

Closes #2223 from jayhan94/fix_null_safe_equals_pred_push.

Authored-by: Jay Han <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 467a293)
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request May 21, 2025
…get evaluated correctly

### What changes were proposed in this pull request?

When all values in column `col_0` are `NULL`s within a row group, and we attempt to apply the predicate pushdown `col_0 <=> 'xxx'`, the `evaluatePredicateProto` function returns `TruthValue.NULL`. In this case, we can directly determine the result based on the literal value: if the literal is `NULL`, return `TruthValue.YES`, otherwise, return `TruthValue.NO`.

### Why are the changes needed?

See [SPARK-52032](https://issues.apache.org/jira/projects/SPARK/issues/SPARK-52032).
When we pushdown the NULL_SAFE_EQUALS predicate, all values of the column are `NULL`. The `evaluatePredicateProto` returns `TruthValue.NULL`, whose `isNeeded` returns false so that the whole row group is skipped by `SargApplier.pickRowGroups`, which actually is incorrect.

### How was this patch tested?

There already exists unit test -- `TestOrcTimezonePPD.testTimestampAllNulls`

### Was this patch authored or co-authored using generative AI tooling?

Co-authored using generative AI tooling.

Closes #2223 from jayhan94/fix_null_safe_equals_pred_push.

Authored-by: Jay Han <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 467a293)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun
Copy link
Member

Thank you for your first contribution @jayhan94 . I added you to the Apache ORC contributor group and assigned ORC-1898 to you. Welcome to the Apache ORC community.

@dongjoon-hyun
Copy link
Member

This is merged to all live release branches for Apache ORC 2.2/2.1/2.0/1.9/1.8.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants