Skip to content

Short circuit optimization for functions executed over Nullable arguments#60129

Merged
Avogar merged 61 commits intoClickHouse:masterfrom
bigo-sg:short_circut_func
Nov 19, 2024
Merged

Short circuit optimization for functions executed over Nullable arguments#60129
Avogar merged 61 commits intoClickHouse:masterfrom
bigo-sg:short_circut_func

Conversation

@taiyang-li
Copy link
Copy Markdown
Contributor

@taiyang-li taiyang-li commented Feb 19, 2024

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Add 2 new settings short_circuit_function_evaluation_for_nulls and short_circuit_function_evaluation_for_nulls_threshold that allow to execute functions over Nullable columns in short-circuit manner when the ratio of NULL values in the block of data exceeds the specified threshold. It means that the function will be executed only on rows with non-null values. It applies only to functions that return NULL value for rows where at least one argument is NULL.

It also closes #34854

Documentation entry:

Short circuit optimization for defaultImplementationForNulls. It makes sense only when useDefaultImplementationForNulls() = true. For functions with useDefaultImplementationForNulls() = true, and result_type is Nullable(T):

  • If each rows contains argument with null value, skip function evaluation and return column with all rows null directly
  • If short_circuit_function_evaluation_for_nulls and the ratio of rows containing nulls to total rows exceeds short_circuit_function_evaluation_for_nulls_threshold, skip evaluation for those rows containing nulls
  • Otherwise process rows as it was processed before.
SQL:
with null::Nullable(String) as x, 'hello' as y, ' clickhouse' as z  select concat(materialize(x), materialize(y), materialize(z)) from numbers(10000000) format Null;   

Before: 
0 rows in set. Elapsed: 0.253 sec. Processed 10.00 million rows, 80.00 MB (39.57 million rows/s., 316.54 MB/s.)
Peak memory usage: 4.00 MiB.

After:
0 rows in set. Elapsed: 0.518 sec. Processed 10.00 million rows, 80.00 MB (19.30 million rows/s., 154.40 MB/s.)
Peak memory usage: 11.33 MiB.

@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-performance Pull request with some performance improvements label Feb 19, 2024
@robot-clickhouse-ci-2
Copy link
Copy Markdown
Contributor

robot-clickhouse-ci-2 commented Feb 19, 2024

This is an automated comment for commit 3386cbb with description of existing statuses. It's updated for the latest CI running

❌ Click here to open a full report in a separate page

Check nameDescriptionStatus
Integration testsThe integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests❌ failure
Successful checks
Check nameDescriptionStatus
AST fuzzerRuns randomly generated queries to catch program errors. The build type is optionally given in parenthesis. If it fails, ask a maintainer for help✅ success
BuildsThere's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS✅ success
ClickBenchRuns [ClickBench](https://github.com/ClickHouse/ClickBench/) with instant-attach table✅ success
Compatibility checkChecks that clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help✅ success
Docker keeper imageThe check to build and optionally push the mentioned image to docker hub✅ success
Docker server imageThe check to build and optionally push the mentioned image to docker hub✅ success
Docs checkBuilds and tests the documentation✅ success
Fast testNormally this is the first check that is ran for a PR. It builds ClickHouse and runs most of stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here✅ success
Flaky testsChecks if new added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer, and additional randomization of thread scheduling. Integration tests are run up to 10 times. If at least once a new test has failed, or was too long, this check will be red. We don't allow flaky tests, read the doc✅ success
Install packagesChecks that the built packages are installable in a clear environment✅ success
Performance ComparisonMeasure changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests✅ success
Stateful testsRuns stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc✅ success
Stateless testsRuns stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc✅ success
Stress testRuns stateless functional tests concurrently from several clients to detect concurrency-related errors✅ success
Style checkRuns a set of checks to keep the code style clean. If some of tests failed, see the related log from the report✅ success
Unit testsRuns the unit tests for different release types✅ success
Upgrade checkRuns stress tests on server version from last release and then tries to upgrade it to the version from the PR. It checks if the new server can successfully startup without any errors, crashes or sanitizer asserts✅ success

@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label Feb 19, 2024
@Avogar Avogar self-assigned this Feb 19, 2024
@Avogar
Copy link
Copy Markdown
Member

Avogar commented Feb 19, 2024

We should add a setting to be able to turn it on/off

@taiyang-li
Copy link
Copy Markdown
Contributor Author

We should add a setting to be able to turn it on/off

I'm afraid current changes will slow down performance. Let's keep it as draft.

@taiyang-li taiyang-li marked this pull request as draft February 24, 2024 13:11
@Avogar
Copy link
Copy Markdown
Member

Avogar commented Feb 26, 2024

But we still can have it under a setting that will be disabled by default, it will be useful in some cases. For example, afaiu it can solve this problem: #34854 (comment)

@taiyang-li
Copy link
Copy Markdown
Contributor Author

But we still can have it under a setting that will be disabled by default, it will be useful in some cases. For example, afaiu it an solve this problem: #34854 (comment)

I'll try.

@taiyang-li taiyang-li marked this pull request as ready for review February 28, 2024 07:13
@taiyang-li
Copy link
Copy Markdown
Contributor Author

But we still can have it under a setting that will be disabled by default, it will be useful in some cases. For example, afaiu it can solve this problem: #34854 (comment)

Done. Hope for you reviews, thanks!

@taiyang-li
Copy link
Copy Markdown
Contributor Author

taiyang-li commented Mar 6, 2024

@Avogar
Copy link
Copy Markdown
Member

Avogar commented Mar 6, 2024

Do you known why ?

No I don't. Fast test is failed and in this case we don't run any build checks. Let's ask @Felixoid why it shows pending.
BTW, server creashes in Fast test and it's related to the changes, please, take a look and fix

@Algunenano
Copy link
Copy Markdown
Member

Fast tests is a hard check before other tasks are run, like other builds or full stateless tests.

@Felixoid
Copy link
Copy Markdown
Member

Felixoid commented Mar 6, 2024

Both reports are pending because of the crashed server, @Algunenano is right.

@Avogar
Copy link
Copy Markdown
Member

Avogar commented Mar 6, 2024

Both reports are pending because of the crashed server, Algunenano is right.

Probably the question was why it is pending when we will not run any builds (as fast test fail), pending status may be misleading that it will be actually run

@taiyang-li
Copy link
Copy Markdown
Contributor Author

Both reports are pending because of the crashed server, Algunenano is right.

Probably the question was why it is pending when we will not run any builds (as fast test fail), pending status may be misleading that it will be actually run

Yes, this is the most important thing I care about. Where could I find any valuable clues to fix it ?

@taiyang-li
Copy link
Copy Markdown
Contributor Author

Both reports are pending because of the crashed server, @Algunenano is right.

Crash is fixed now. Let wait for ci.

@taiyang-li
Copy link
Copy Markdown
Contributor Author

It is ready for review now. cc @Avogar

@taiyang-li
Copy link
Copy Markdown
Contributor Author

@Avogar I can't reproduce the performance regression in my machine. Let's trigger the CI again and wait for the result.

@taiyang-li
Copy link
Copy Markdown
Contributor Author

@Avogar do you know why there is no performance tests in the newest CI ?

@Avogar
Copy link
Copy Markdown
Member

Avogar commented Nov 13, 2024

@Avogar I can't reproduce the performance regression in my machine. Let's trigger the CI again and wait for the result.

Reproduces for me:
Master:

:) SELECT count() FROM (SELECT toNullable(materialize(1)) AS x1, toNullable(materialize(1)) AS x2 FROM zeros(100000000)) WHERE NOT ignore(xor(x1,x2))

   ┌───count()─┐
1. │ 100000000 │ -- 100.00 million
   └───────────┘

1 row in set. Elapsed: 0.029 sec.

:) SELECT sumCountIf(key, key != -1) FROM ( SELECT materialize(toNullable(number)) AS key FROM numbers(100000000) ) FORMAT Null

Ok.

0 rows in set. Elapsed: 0.072 sec.

This PR:

:) SELECT count() FROM (SELECT toNullable(materialize(1)) AS x1, toNullable(materialize(1)) AS x2 FROM zeros(100000000)) WHERE NOT ignore(xor(x1,x2))


   ┌───count()─┐
1. │ 100000000 │ -- 100.00 million
   └───────────┘

1 row in set. Elapsed: 0.152 sec.

:) SELECT sumCountIf(key, key != -1) FROM ( SELECT materialize(toNullable(number)) AS key FROM numbers(100000000) ) FORMAT Null

Ok.

0 rows in set. Elapsed: 0.144 sec.

Most likely the difference is because of executing extractInvertedMask on each Nullable argument null-mask. So probably we can't enable new behaviour by default

@taiyang-li
Copy link
Copy Markdown
Contributor Author

taiyang-li commented Nov 14, 2024

@Avogar I had improved the performance in 932caea. Let's wait for the newest performance report.

@Avogar
Copy link
Copy Markdown
Member

Avogar commented Nov 14, 2024

The fasttest is broken

@Avogar
Copy link
Copy Markdown
Member

Avogar commented Nov 18, 2024

Performance tests are ok now. Failied test 02809_storage_set_analysis_bug is related (it's flaky and fails only sometimes, you can run this test with clickhouse-test --test-runs 100 -j 10 to reproduce)

@taiyang-li
Copy link
Copy Markdown
Contributor Author

@Avogar 02809_storage_set_analysis_bug is failed because for in function, the first argument contains null in each row. The execution is short-circuited, instead of throwing exception, it returns 0.

@Avogar Avogar changed the title Short circuit optimization for defaultImplementationForNulls Short circuit optimization for functions executed over Nullable arguments Nov 19, 2024
@Avogar Avogar added this pull request to the merge queue Nov 19, 2024
@Algunenano
Copy link
Copy Markdown
Member

It would be nice to have a performance test that verifies the impact of this feature in the real world. Right now all I see is 300 lines of code changed and some dubious performance reports with several degradations.

Merged via the queue into ClickHouse:master with commit e3e4e45 Nov 19, 2024
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Nov 19, 2024
@taiyang-li
Copy link
Copy Markdown
Contributor Author

@Algunenano we will test this feature in apache gluten. cc @baibaichen

@Algunenano
Copy link
Copy Markdown
Member

@Algunenano we will test this feature in apache gluten. cc @baibaichen

That's not ok. Changes to CH upstream should have tests in CH upstream.

@Algunenano
Copy link
Copy Markdown
Member

As suspected and shown by the existing perf tests, it's degrading the performance:

SELECT count() FROM (SELECT toNullable(materialize(1)) AS x1, toNullable(materialize(1)) AS x2 FROM zeros(100000000000)) WHERE NOT ignore(xor(x1,x2)
  • Before the change: 7.70 GB/s.
  • After the change: 6.02 GB/s.

As the perf test show no advantage from having this change, I'm reverting this. Please resubmit with proper proof of the impact and without ignoring degradations in the existing benchmarks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-performance Pull request with some performance improvements pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

"Unexpected inf or nan to integer conversion" error when using toInt32 and if/isNaN check

7 participants