Skip to content

Add setting aggregate_functions_null_for_empty#16123

Merged
alexey-milovidov merged 5 commits intoClickHouse:masterfrom
ucasfl:agg-func-setting-null-for-empty
Nov 6, 2020
Merged

Add setting aggregate_functions_null_for_empty#16123
alexey-milovidov merged 5 commits intoClickHouse:masterfrom
ucasfl:agg-func-setting-null-for-empty

Conversation

@ucasfl
Copy link
Copy Markdown
Collaborator

@ucasfl ucasfl commented Oct 18, 2020

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Add setting aggregate_functions_null_for_empty, this option will rewrite all aggregate functions in a query, adding -OrNull suffix to them. fix 10273

Detailed description / Documentation draft:

...

By adding documentation, you'll allow users to try your new feature immediately, not when someone else will have time to document it later. Documentation is necessary for all features that affect user experience in any way. You can add brief documentation draft above, or add documentation right into your patch as Markdown files in docs folder.

If you are doing this for the first time, it's recommended to read the lightweight Contributing to ClickHouse Documentation guide first.

Information about CI checks: https://clickhouse.tech/docs/en/development/continuous-integration/

@alexey-milovidov
Copy link
Copy Markdown
Member

It leads to 6x performance drop even if the argument is not nullable:

milovidov-desktop :) SELECT sum(number) FROM system.numbers

SELECT sum(number)
FROM system.numbers

Query id: 2f21a1b4-c469-4a0b-a8ed-e86efe2edd43

Cancelling query.
Ok.
Query was cancelled.

0 rows in set. Elapsed: 2.906 sec. Processed 8.84 billion rows, 70.74 GB (3.04 billion rows/s., 24.34 GB/s.) 

milovidov-desktop :) SELECT sumOrNull(number) FROM system.numbers

SELECT sumOrNull(number)
FROM system.numbers

Query id: 06dda4aa-9168-41fe-8d2d-63f7857e44fb

Cancelling query.
Ok.
Query was cancelled.

0 rows in set. Elapsed: 4.409 sec. Processed 2.20 billion rows, 17.59 GB (498.66 million rows/s., 3.99 GB/s.)

@alexey-milovidov
Copy link
Copy Markdown
Member

Please cherry-pick this commit: c7618ea
I reordered the settings and removed some of the obsolete settings.

@alexey-milovidov
Copy link
Copy Markdown
Member

alexey-milovidov commented Nov 4, 2020

It's tricky that the function count and count(DISTINCT ...) and some other functions have to continue to return zero value when aggregate_functions_null_for_empty is enabled (if we aim for SQL compatibility), in contrast to countOrNull, uniqExactOrNull, etc...

@alexey-milovidov
Copy link
Copy Markdown
Member

It is controlled by AggregateFunctionProperties.

@ucasfl
Copy link
Copy Markdown
Collaborator Author

ucasfl commented Nov 4, 2020

It's tricky that the function count and count(DISTINCT ...) and some other functions have to continue to return zero value when aggregate_functions_null_for_empty is enabled (if we aim for SQL compatibility), in contrast to countOrNull, uniqExactOrNull, etc...

So, what is the best way to resolve this?

@alexey-milovidov
Copy link
Copy Markdown
Member

alexey-milovidov commented Nov 4, 2020

Look at the function properties (from AggregateFunctionFactory) and don't rewrite functions that should not return NULL.

@alexey-milovidov alexey-milovidov merged commit 4a872a5 into ClickHouse:master Nov 6, 2020
@alexey-milovidov
Copy link
Copy Markdown
Member

@ucasfl We forgot about one little thing...

For -State and -Merge functions we need to add -OrNull before, not after...
Need to check how it will work.

ucasfl added a commit to ucasfl/ClickHouse that referenced this pull request Nov 7, 2020
@ucasfl
Copy link
Copy Markdown
Collaborator Author

ucasfl commented Nov 8, 2020

@ucasfl We forgot about one little thing...

For -State and -Merge functions we need to add -OrNull before, not after...
Need to check how it will work.

Maybe not only -State and -Merge, but also -If, -MergeState?

@alexey-milovidov
Copy link
Copy Markdown
Member

Yes, -MergeState definitely.

About -If - let's check if the behaviour of -IfOrNull different with -OrNullIf.

@ucasfl
Copy link
Copy Markdown
Collaborator Author

ucasfl commented Nov 9, 2020

SELECT sumIfOrNull(1, 0)

Query id: cfeae055-758d-4482-9cda-0ebac2bcca5f

┌─sumIfOrNull(1, 0)─┐
│                 0 │
└───────────────────┘
SELECT sumOrNullIf(1, 0)

Query id: eb9b740c-155e-46e4-9d44-5d67e6d2fb25

┌─sumOrNullIf(1, 0)─┐
│              ᴺᵁᴸᴸ │
└───────────────────┘

It's different.

@ucasfl
Copy link
Copy Markdown
Collaborator Author

ucasfl commented Nov 9, 2020

And there may be multiple combinators in an aggregate function, such as avgIfState, sumStateForEach. I think for aggregate functions with combinators, we shoule move -OrNull to the first when rewrite?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-feature Pull request with new product feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add setting "aggregate_functions_null_for_empty"

3 participants