-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[multistage] proper support for column-level null handling #10423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks.. this is good to see. One note: this is setting nullability at table-level. Most of our use-cases require nullability to be set at column-level. People want do things like IS NULL on some columns while at the same time they want to use sub-queries with semi-joins ( I think long-term we may want column level nullability, in which case we may need to think how backwards compatibility will be handled if we go with table level nullability right now. |
1e7b49a to
1c364a5
Compare
i think this make sense. created a separate ticket for this. |
pinot-query-runtime/src/test/resources/queries/Comparisons.json
Outdated
Show resolved
Hide resolved
|
Just to confirm: we are putting this on hold right? |
the table-level nullability handle will be checked in first and then we will enable column-level nullability. |
Chatted offline. we will hold off b/c without column level nullability this will cause large planner deficiency --> will address #10428 first |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #10423 +/- ##
==========================================
- Coverage 0.11% 0.11% -0.01%
==========================================
Files 2190 2190
Lines 118056 118081 +25
Branches 17873 17878 +5
==========================================
Hits 137 137
- Misses 117899 117924 +25
Partials 20 20
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
684a58f to
522228d
Compare
|
this PR is ready for review CC @ankitsultana @Jackie-Jiang |
522228d to
e306753
Compare
e306753 to
97ece6d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be "and"? isNullSupportEnabled && e.getValue().isNullableField()
Since fieldSpec is already passed to toRelDataType, we can also pass only isNullSupportEnabled here and in toRelDataType we can do:
return createTypeWithNullability(fieldSpec.isSingleValueField() ? createSqlType(SqlTypeName.INTEGER)
: createArrayType(createSqlType(SqlTypeName.INTEGER), -1), isNullSupportEnabled && fieldSpec.isNullableField());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no and to answer the previous comment as well.
we can either enable globally (using table config), or enable via column config
- if global enable. then all fields are nullable.
- if not global enable. we read column level nullability.
- if neither is set, then the column is not nullable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about a fallback mechanism to determine nullability?
- A column level nullability can be true, false, or NULL(default).
- A table level nullability can be true or false (default)
- If the column level nullability is present, it determines the nullability.
- If the column level nullability is not present, we fall back to the table level nullability setting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or the table level nullability can also be NULL (default). If both the column level nullability and table level nullability are NULL, we apply the current behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to propose the same @shenyu0127 proposed in June
pinot-query-planner/src/main/java/org/apache/pinot/query/catalog/PinotTable.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be: nullHandlingMap.getOrDefault(entry.getKey(), true)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no. by default nullability is disabled. this is the default behavior of all tables for pinot nowadays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment was wrt // ALWAYS enable null handling in V2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah i think the comment was not suppose to be there. what i meant was "nullhandling query option" was enabled always, not "null column is always registered for table" :-P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to get TableConfig for at least one type in a bunch of places in the code. Can we add a util method somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea but let's follow up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add some plan-based tests in pinot-query-planner?
Particularly for anti semi-join queries such as: SELECT .. FROM T1 WHERE uuid NOT IN (SELECT uuid FROM ...)
If the uuid is column is nullable, this gets converted to a broadcast join (iirc).
Would be great if we can add test-cases for these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added planner test
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java
Outdated
Show resolved
Hide resolved
|
@walterddr : This query is one example where the join changes to a broadcast join after this change (you'll need to set nullHandlingEnabled: true in TableConfig for both tables). Edit: Also this variant of the query also does a broadcast join: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ Let's add a table or two which have nullHandlingEnabled: true in table-config. Anti semi join queries (NOT IN) kind of queries would turn into broadcast joins in that case.
9890bf5 to
f6708ab
Compare
f6708ab to
c7878db
Compare
|
any additional comment on this approach? @ankitsultana @61yao @Jackie-Jiang |
|
My main concern is with this part: I think if users have configured "enableNullHandling" at table level, we should let column level nullability decide whether a given column is nullable or not. If enableNullHandling is not set at the table level, then we can assume that all columns are non-null (similar to V1 engine). That would essentially mean that column level nullability is not enabled so we assume everything to be non-null. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't this address your concern @ankitsultana
Hmm. I understood your concern. so to link: https://github.com/apache/pinot/pull/10423/files#r1183232264 --> this means in order to enable nullability we should
yes? |
Yeah. The idea being that we only have two states:
|
2ddcc8a to
29324d7
Compare
62832cc to
3b6ef89
Compare
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest making a separate PR for the FieldSpec and v1 related changes to limit the scope so that it is easier to review. We want to handle backward compatibility for table level enableNullHandling
|
|
||
| @ScalarFunction | ||
| public static Object cast(Object value, String targetTypeLiteral) { | ||
| if (value == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't need to handle null here if the scalar function doesn't have nullableParameters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about a fallback mechanism to determine nullability?
- A column level nullability can be true, false, or NULL(default).
- A table level nullability can be true or false (default)
- If the column level nullability is present, it determines the nullability.
- If the column level nullability is not present, we fall back to the table level nullability setting.
| return this; | ||
| } | ||
|
|
||
| private static boolean isNullEnabled(TableCache tableCache, String rawTableName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function gets a table property. Other similar functions are member functions. Do we want change this function to a member function for consistency?
|
closing this one and creating a new one (design doc shared in #10381) |
This is a follow-up on #10381 and a better version of #10148
This PR:
FieldSpec