ORC-1898: When column is all null, NULL_SAFE_EQUALS pushdown doesn't get evaluated correctly #2223

jayhan94 · 2025-05-10T02:01:36Z

What changes were proposed in this pull request?

When all values in column col_0 are NULL within a row group and we apply the pushed-down predicate col_0 <=> 'xxx', the evaluatePredicateProto function returns TruthValue.NULL. In this case, the result can be determined directly from the literal value: if the literal is NULL, return TruthValue.YES; otherwise, return TruthValue.NO.
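The rule above can be sketched in a few lines of Java. This is a minimal illustration, not the code from the patch: the class `NullSafeEqualsSketch`, the helper `evaluateAllNullRowGroup`, and the three-value `TruthValue` enum are hypothetical stand-ins for the real ORC types.

```java
public class NullSafeEqualsSketch {
    // Simplified stand-in for ORC's TruthValue (assumption: only the values used here).
    public enum TruthValue { YES, NO, NULL }

    /**
     * Hypothetical helper: result of `col <=> literal` for a row group
     * whose column contains only NULL values.
     */
    public static TruthValue evaluateAllNullRowGroup(Object literal) {
        // Null-safe equality is never "unknown":
        // NULL <=> NULL is true, NULL <=> non-null is false.
        return literal == null ? TruthValue.YES : TruthValue.NO;
    }

    public static void main(String[] args) {
        System.out.println(evaluateAllNullRowGroup(null));   // prints YES
        System.out.println(evaluateAllNullRowGroup("xxx"));  // prints NO
    }
}
```

The point of the sketch is that, unlike plain equality, a null-safe comparison against an all-null column always has a definite answer, so TruthValue.NULL is never the right result here.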

Why are the changes needed?

See SPARK-52032.
When the NULL_SAFE_EQUALS predicate is pushed down and all values of the column are NULL, evaluatePredicateProto returns TruthValue.NULL. Because isNeeded returns false for TruthValue.NULL, SargApplier.pickRowGroups skips the whole row group, which is incorrect.
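To make the skipping behavior concrete, here is a small self-contained sketch. The `TruthValue` enum and its `isNeeded` method mirror the ORC behavior as I understand it (NO, NULL, and NO_NULL are treated as not needed). The `pickRowGroups` method is a hypothetical simplification that takes one precomputed result per row group; it is not the real signature of SargApplier.pickRowGroups.

```java
import java.util.ArrayList;
import java.util.List;

public class RowGroupSkipSketch {
    // Stand-in for ORC's TruthValue; isNeeded is an assumption modeled on the real enum.
    public enum TruthValue {
        YES, NO, NULL, YES_NULL, NO_NULL, YES_NO, YES_NO_NULL;

        public boolean isNeeded() {
            switch (this) {
                case NO:
                case NULL:
                case NO_NULL:
                    return false; // no row in the group can satisfy the predicate
                default:
                    return true;
            }
        }
    }

    // Hypothetical simplification: return the indexes of row groups that must be read.
    public static List<Integer> pickRowGroups(List<TruthValue> perRowGroup) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < perRowGroup.size(); i++) {
            if (perRowGroup.get(i).isNeeded()) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Row group 1 is all NULL and was evaluated to TruthValue.NULL, so it is dropped.
        List<TruthValue> results = List.of(TruthValue.YES_NO, TruthValue.NULL, TruthValue.NO);
        System.out.println(pickRowGroups(results)); // prints [0]
    }
}
```

Once a row group's result is TruthValue.NULL, nothing downstream in this sketch can bring it back, which is why the evaluation step itself has to return a definite YES or NO for null-safe equality.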

How was this patch tested?

An existing unit test already covers this case: TestOrcTimezonePPD.testTimestampAllNulls

Was this patch authored or co-authored using generative AI tooling?

Co-authored using generative AI tooling.

dongjoon-hyun · 2025-05-10T02:26:23Z

Thank you for making a PR, @jayhan94 .

wgtmac

Looks reasonable to me. Thanks!

dongjoon-hyun

+1, LGTM.

Thank you, @jayhan94 and @wgtmac .

Sorry for being late. I was a little busy until today.

…get evaluated correctly

### What changes were proposed in this pull request?

When all values in column `col_0` are `NULL`s within a row group, and we attempt to apply the predicate pushdown `col_0 <=> 'xxx'`, the `evaluatePredicateProto` function returns `TruthValue.NULL`. In this case, we can directly determine the result based on the literal value: if the literal is `NULL`, return `TruthValue.YES`, otherwise, return `TruthValue.NO`.

### Why are the changes needed?

See [SPARK-52032](https://issues.apache.org/jira/projects/SPARK/issues/SPARK-52032).
When we pushdown the NULL_SAFE_EQUALS predicate, all values of the column are `NULL`. The `evaluatePredicateProto` returns `TruthValue.NULL`, whose `isNeeded` returns false so that the whole row group is skipped by `SargApplier.pickRowGroups`, which actually is incorrect.

### How was this patch tested?

There already exists unit test -- `TestOrcTimezonePPD.testTimestampAllNulls`

### Was this patch authored or co-authored using generative AI tooling?

Co-authored using generative AI tooling.

Closes #2223 from jayhan94/fix_null_safe_equals_pred_push.

Authored-by: Jay Han <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 467a293)
Signed-off-by: Dongjoon Hyun <[email protected]>

dongjoon-hyun · 2025-05-21T00:24:32Z

Thank you for your first contribution @jayhan94 . I added you to the Apache ORC contributor group and assigned ORC-1898 to you. Welcome to the Apache ORC community.

dongjoon-hyun · 2025-05-21T00:25:16Z

This is merged to all live release branches for Apache ORC 2.2/2.1/2.0/1.9/1.8.

fix null_safe_equals pushdown

2420148

github-actions bot added INFRA JAVA labels May 10, 2025

dongjoon-hyun reviewed May 10, 2025

.gitignore (outdated)

revert changes to gitignore

8cc9109

wgtmac approved these changes May 12, 2025

dongjoon-hyun approved these changes May 21, 2025

dongjoon-hyun closed this in 467a293 May 21, 2025

dongjoon-hyun added this to the 1.8.10 milestone May 21, 2025
