Skip to content

Conversation

@walterddr
Copy link
Contributor

@walterddr walterddr commented Mar 15, 2023

This is a follow-up on #10381 and a better version of #10148

This PR:

@ankitsultana
Copy link
Contributor

Thanks.. this is good to see. One note: this is setting nullability at table-level.

Most of our use-cases require nullability to be set at column-level. People want do things like IS NULL on some columns while at the same time they want to use sub-queries with semi-joins (uuid IN (...)), so they want uuid to be non-null.

I think long-term we may want column level nullability, in which case we may need to think how backwards compatibility will be handled if we go with table level nullability right now.

@walterddr walterddr force-pushed the pr_fix_null_handling branch 2 times, most recently from 1e7b49a to 1c364a5 Compare March 16, 2023 00:46
@walterddr
Copy link
Contributor Author

Thanks.. this is good to see. One note: this is setting nullability at table-level.

Most of our use-cases require nullability to be set at column-level. People want do things like IS NULL on some columns while at the same time they want to use sub-queries with semi-joins (uuid IN (...)), so they want uuid to be non-null.

I think long-term we may want column level nullability, in which case we may need to think how backwards compatibility will be handled if we go with table level nullability right now.

i think this make sense. created a separate ticket for this.

@ankitsultana
Copy link
Contributor

Just to confirm: we are putting this on hold right?

@walterddr
Copy link
Contributor Author

walterddr commented Mar 16, 2023

Just to confirm: we are putting this on hold right?

the table-level nullability handle will be checked in first and then we will enable column-level nullability.
as long as table didn't enable null handling we shouldn't have any issue.

@walterddr
Copy link
Contributor Author

the table-level nullability handle will be checked in first and then we will enable column-level nullability. as long as table didn't enable null handling we shouldn't have any issue.

Chatted offline. we will hold off b/c without column level nullability this will cause large planner deficiency --> will address #10428 first

@codecov-commenter
Copy link

codecov-commenter commented Mar 17, 2023

Codecov Report

❌ Patch coverage is 0% with 49 lines in your changes missing coverage. Please review.
✅ Project coverage is 0.11%. Comparing base (2d30e28) to head (b51c016).
⚠️ Report is 3959 commits behind head on master.

Files with missing lines Patch % Lines
.../java/org/apache/pinot/query/type/TypeFactory.java 0.00% 20 Missing ⚠️
...a/org/apache/pinot/query/catalog/PinotCatalog.java 0.00% 11 Missing ⚠️
...main/java/org/apache/pinot/spi/data/FieldSpec.java 0.00% 11 Missing ⚠️
...ava/org/apache/pinot/query/catalog/PinotTable.java 0.00% 3 Missing ⚠️
...n/function/scalar/DataTypeConversionFunctions.java 0.00% 2 Missing ⚠️
.../pinot/common/function/scalar/ObjectFunctions.java 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10423      +/-   ##
==========================================
- Coverage    0.11%    0.11%   -0.01%     
==========================================
  Files        2190     2190              
  Lines      118056   118081      +25     
  Branches    17873    17878       +5     
==========================================
  Hits          137      137              
- Misses     117899   117924      +25     
  Partials       20       20              
Flag Coverage Δ
integration1temurin11 0.00% <0.00%> (ø)
integration1temurin17 0.00% <0.00%> (ø)
integration1temurin20 0.00% <0.00%> (ø)
integration2temurin11 0.00% <0.00%> (ø)
integration2temurin17 0.00% <0.00%> (ø)
integration2temurin20 0.00% <0.00%> (ø)
unittests1temurin11 0.00% <0.00%> (ø)
unittests1temurin17 0.00% <0.00%> (ø)
unittests1temurin20 0.00% <0.00%> (ø)
unittests2temurin11 0.11% <0.00%> (-0.01%) ⬇️
unittests2temurin17 0.11% <0.00%> (-0.01%) ⬇️
unittests2temurin20 0.11% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@walterddr walterddr force-pushed the pr_fix_null_handling branch from 684a58f to 522228d Compare March 22, 2023 14:52
@walterddr
Copy link
Contributor Author

this PR is ready for review CC @ankitsultana @Jackie-Jiang

@walterddr walterddr force-pushed the pr_fix_null_handling branch from 522228d to e306753 Compare March 23, 2023 14:40
@walterddr walterddr changed the title [multistage] proper support for null handling [multistage] proper support for column-level null handling Mar 29, 2023
@walterddr walterddr force-pushed the pr_fix_null_handling branch from e306753 to 97ece6d Compare March 29, 2023 17:55
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be "and"? isNullSupportEnabled && e.getValue().isNullableField()

Since fieldSpec is already passed to toRelDataType, we can also pass only isNullSupportEnabled here and in toRelDataType we can do:

        return createTypeWithNullability(fieldSpec.isSingleValueField() ? createSqlType(SqlTypeName.INTEGER)
                : createArrayType(createSqlType(SqlTypeName.INTEGER), -1), isNullSupportEnabled && fieldSpec.isNullableField());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no and to answer the previous comment as well.

we can either enable globally (using table config), or enable via column config

  • if global enable. then all fields are nullable.
  • if not global enable. we read column level nullability.
  • if neither is set, then the column is not nullable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about a fallback mechanism to determine nullability?

  • A column level nullability can be true, false, or NULL(default).
  • A table level nullability can be true or false (default)
  • If the column level nullability is present, it determines the nullability.
  • If the column level nullability is not present, we fall back to the table level nullability setting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or the table level nullability can also be NULL (default). If both the column level nullability and table level nullability are NULL, we apply the current behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to propose the same @shenyu0127 proposed in June

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be: nullHandlingMap.getOrDefault(entry.getKey(), true)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no. by default nullability is disabled. this is the default behavior of all tables for pinot nowadays.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment was wrt // ALWAYS enable null handling in V2.

Copy link
Contributor Author

@walterddr walterddr Mar 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah i think the comment was not suppose to be there. what i meant was "nullhandling query option" was enabled always, not "null column is always registered for table" :-P

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to get TableConfig for at least one type in a bunch of places in the code. Can we add a util method somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea but let's follow up?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add some plan-based tests in pinot-query-planner?

Particularly for anti semi-join queries such as: SELECT .. FROM T1 WHERE uuid NOT IN (SELECT uuid FROM ...)

If the uuid is column is nullable, this gets converted to a broadcast join (iirc).

Would be great if we can add test-cases for these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added planner test

@ankitsultana
Copy link
Contributor

ankitsultana commented Apr 4, 2023

@walterddr : This query is one example where the join changes to a broadcast join after this change (you'll need to set nullHandlingEnabled: true in TableConfig for both tables).

explain plan for select count(*), deviceOS from userAttributes_OFFLINE 
  WHERE userUUID NOT IN (SELECT userUUID FROM userGroups_OFFLINE WHERE groupUUID = 'group-1')
group by deviceOS

Edit: Also this variant of the query also does a broadcast join:

explain plan for select count(*), deviceOS from userAttributes_OFFLINE 
  WHERE NOT (userUUID IN (SELECT userUUID FROM userGroups_OFFLINE WHERE groupUUID = 'group-1'))
group by deviceOS

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ Let's add a table or two which have nullHandlingEnabled: true in table-config. Anti semi join queries (NOT IN) kind of queries would turn into broadcast joins in that case.

@walterddr
Copy link
Contributor Author

any additional comment on this approach? @ankitsultana @61yao @Jackie-Jiang

@ankitsultana
Copy link
Contributor

My main concern is with this part:

builder.add(e.getKey(), toRelDataType(e.getValue(),
          isNullSupportEnabled || e.getValue().isNullableField()));

I think if users have configured "enableNullHandling" at table level, we should let column level nullability decide whether a given column is nullable or not.

If enableNullHandling is not set at the table level, then we can assume that all columns are non-null (similar to V1 engine). That would essentially mean that column level nullability is not enabled so we assume everything to be non-null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't this address your concern @ankitsultana

@walterddr
Copy link
Contributor Author

My main concern is with this part:

builder.add(e.getKey(), toRelDataType(e.getValue(),
          isNullSupportEnabled || e.getValue().isNullableField()));

I think if users have configured "enableNullHandling" at table level, we should let column level nullability decide whether a given column is nullable or not.

If enableNullHandling is not set at the table level, then we can assume that all columns are non-null (similar to V1 engine). That would essentially mean that column level nullability is not enabled so we assume everything to be non-null.

Hmm. I understood your concern. so to link: https://github.com/apache/pinot/pull/10423/files#r1183232264 --> this means in order to enable nullability we should

  1. only do so for a table if table config enables null, otherwise disregard any column level config (this is existing behavior)
  2. add column level nullability config to user setting it to false, otherwise default to true (this is going to change)
  3. create nullable column only if (1) AND (2) are true

yes?

@ankitsultana
Copy link
Contributor

My main concern is with this part:

builder.add(e.getKey(), toRelDataType(e.getValue(),
          isNullSupportEnabled || e.getValue().isNullableField()));

I think if users have configured "enableNullHandling" at table level, we should let column level nullability decide whether a given column is nullable or not.
If enableNullHandling is not set at the table level, then we can assume that all columns are non-null (similar to V1 engine). That would essentially mean that column level nullability is not enabled so we assume everything to be non-null.

Hmm. I understood your concern. so to link: https://github.com/apache/pinot/pull/10423/files#r1183232264 --> this means in order to enable nullability we should

  1. only do so for a table if table config enables null, otherwise disregard any column level config (this is existing behavior)
  2. add column level nullability config to user setting it to false, otherwise default to true (this is going to change)
  3. create nullable column only if (1) AND (2) are true

yes?

Yeah. The idea being that we only have two states:

  1. Either column level nullability feature is disabled (when enableNullHandling is not set at table level)
  2. Or it is enabled, in which case its values will always be honored.

@walterddr walterddr force-pushed the pr_fix_null_handling branch from 2ddcc8a to 29324d7 Compare June 27, 2023 03:08
- index on planner changed
- test example in runtime changed
@walterddr walterddr force-pushed the pr_fix_null_handling branch from 62832cc to 3b6ef89 Compare June 27, 2023 18:58
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest making a separate PR for the FieldSpec and v1 related changes to limit the scope so that it is easier to review. We want to handle backward compatibility for table level enableNullHandling


@ScalarFunction
public static Object cast(Object value, String targetTypeLiteral) {
if (value == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need to handle null here if the scalar function doesn't have nullableParameters

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about a fallback mechanism to determine nullability?

  • A column level nullability can be true, false, or NULL(default).
  • A table level nullability can be true or false (default)
  • If the column level nullability is present, it determines the nullability.
  • If the column level nullability is not present, we fall back to the table level nullability setting.

return this;
}

private static boolean isNullEnabled(TableCache tableCache, String rawTableName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function gets a table property. Other similar functions are member functions. Do we want change this function to a member function for consistency?

@walterddr
Copy link
Contributor Author

closing this one and creating a new one (design doc shared in #10381)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature multi-stage Related to the multi-stage query engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants