Commit 467a293

jayhan94 authored and dongjoon-hyun committed
ORC-1898: When column is all null, NULL_SAFE_EQUALS pushdown doesn't get evaluated correctly
### What changes were proposed in this pull request?

When all values in column `col_0` are `NULL` within a row group and we apply the predicate pushdown `col_0 <=> 'xxx'`, the `evaluatePredicateProto` function returns `TruthValue.NULL`. In this case the result can be determined directly from the literal value: if the literal is `NULL`, return `TruthValue.YES`; otherwise, return `TruthValue.NO`.

### Why are the changes needed?

See [SPARK-52032](https://issues.apache.org/jira/projects/SPARK/issues/SPARK-52032). When the NULL_SAFE_EQUALS predicate is pushed down and all values of the column are `NULL`, `evaluatePredicateProto` returns `TruthValue.NULL`. Because `isNeeded` returns false for that value, `SargApplier.pickRowGroups` skips the whole row group, which is incorrect.

### How was this patch tested?

An existing unit test, `TestOrcTimezonePPD.testTimestampAllNulls`, covers this case; its expected value is updated.

### Was this patch authored or co-authored using generative AI tooling?

Co-authored using generative AI tooling.

Closes #2223 from jayhan94/fix_null_safe_equals_pred_push.

Authored-by: Jay Han <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
1 parent b716d81 commit 467a293
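
To illustrate the skip described in the commit message, here is a minimal sketch using a reduced, hypothetical stand-in for `SearchArgument.TruthValue`. The class `RowGroupSkipSketch`, the three-member `Truth` enum, and the `beforeFix`/`afterFix` helpers are assumptions made for illustration; they are not ORC code, and the real enum has more members than shown here.

```java
// Simplified, hypothetical model of the behaviour described above; not the ORC classes.
final class RowGroupSkipSketch {

  /** Reduced stand-in for SearchArgument.TruthValue (assumption: only three members). */
  enum Truth {
    YES, NO, NULL;

    /** As described in the commit message, NULL (like NO) means the row group is not needed. */
    boolean isNeeded() {
      return this == YES;
    }
  }

  /** Result for an all-null row group before the fix: NULL regardless of the literal. */
  static Truth beforeFix(Object literal) {
    return Truth.NULL;
  }

  /** Result after the fix: decided by the literal alone. */
  static Truth afterFix(Object literal) {
    return literal == null ? Truth.YES : Truth.NO;
  }

  public static void main(String[] args) {
    // col_0 <=> NULL over an all-null row group: every row matches,
    // so the row group must be read.
    System.out.println("before: needed=" + beforeFix(null).isNeeded());
    System.out.println("after:  needed=" + afterFix(null).isNeeded());
  }
}
```

In this model the observable difference appears only when the literal is `NULL`: with a non-null literal, both `NULL` and `NO` lead to the row group being skipped.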

2 files changed, +8 -1 lines changed

java/core/src/java/org/apache/orc/impl/RecordReaderImpl.java

Lines changed: 7 additions & 0 deletions

@@ -763,6 +763,13 @@ static TruthValue evaluatePredicateRange(PredicateLeaf predicate,
     if (!range.hasValues()) {
       if (predicate.getOperator() == PredicateLeaf.Operator.IS_NULL) {
         return TruthValue.YES;
+      } else if (predicate.getOperator() == PredicateLeaf.Operator.NULL_SAFE_EQUALS) {
+        Object literal = predicate.getLiteral();
+        if (literal == null) {
+          return TruthValue.YES;
+        } else {
+          return TruthValue.NO;
+        }
       } else {
         return TruthValue.NULL;
       }
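
As a sanity check on the new branch, the row-level meaning of `<=>` can be brute-forced over an all-null row group. This is only a sketch: `NullSafeEqualsSketch`, `nullSafeEquals`, and `anyRowMatches` are hypothetical names, and `Objects.equals` is used as a stand-in for null-safe equality rather than anything ORC itself calls.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical illustration of null-safe equality; not part of ORC.
final class NullSafeEqualsSketch {

  /** Row-level `a <=> b`: true when both are null or both are equal; never unknown. */
  static boolean nullSafeEquals(Object a, Object b) {
    return Objects.equals(a, b);
  }

  /** True if at least one row in the group satisfies `row <=> literal`. */
  static boolean anyRowMatches(List<Object> rows, Object literal) {
    for (Object row : rows) {
      if (nullSafeEquals(row, literal)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // A row group in which every value is NULL.
    List<Object> allNull = Arrays.asList(null, null, null);
    System.out.println(anyRowMatches(allNull, null));   // literal is NULL
    System.out.println(anyRowMatches(allNull, "xxx"));  // literal is non-null
  }
}
```

Because `<=>` never evaluates to unknown, an all-null row group either matches on every row (literal is `NULL`) or on none (literal is non-null), which lines up with the `YES`/`NO` results returned by the patch.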

java/core/src/test/org/apache/orc/TestOrcTimezonePPD.java

Lines changed: 1 addition & 1 deletion

@@ -387,7 +387,7 @@ public void testTimestampAllNulls(String writerTimeZone, String readerTimeZone)
     PredicateLeaf pred = createPredicateLeaf(
         PredicateLeaf.Operator.NULL_SAFE_EQUALS, PredicateLeaf.Type.TIMESTAMP, "x",
         Timestamp.valueOf("2007-08-01 00:00:00.0"), null);
-    assertEquals(SearchArgument.TruthValue.NULL, RecordReaderImpl.evaluatePredicate(colStats[1], pred, bf));
+    assertEquals(SearchArgument.TruthValue.NO, RecordReaderImpl.evaluatePredicate(colStats[1], pred, bf));
 
     pred = createPredicateLeaf(PredicateLeaf.Operator.IS_NULL, PredicateLeaf.Type.TIMESTAMP, "x", null, null);
     assertEquals(SearchArgument.TruthValue.YES, RecordReaderImpl.evaluatePredicate(colStats[1], pred, bf));
