Skip to content

Fix predicates optimization for distributed queries with HAVING#10621

Merged
alexey-milovidov merged 6 commits intoClickHouse:masterfrom
azat:enable_optimize_predicate_expression-order
May 31, 2020
Merged

Fix predicates optimization for distributed queries with HAVING#10621
alexey-milovidov merged 6 commits intoClickHouse:masterfrom
azat:enable_optimize_predicate_expression-order

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented May 1, 2020

Changelog category (leave one):

  • Bug Fix

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix predicates optimization for distributed queries (enable_optimize_predicate_expression=1) for queries with HAVING section (i.e. when filtering on the server initiator is required), by preserving the order of expressions (and this is enough to fix), and also force aggregator use column names over indexes. Fixes: #10613, #11413

Details:
The problem is that server initiator need to do final aggregate, but Aggregator works based on column indexes, and since expression optimizer switch it every time, server initiator will have different order of columns with the results from the shards that it needs to aggregate, I'm looking into make Aggregator respect column names (if that makes sense?), but will take some time

Cc: @zhang2014

@azat azat force-pushed the enable_optimize_predicate_expression-order branch 2 times, most recently from feaab52 to df9fecd Compare May 2, 2020 09:05
@azat
Copy link
Copy Markdown
Member Author

azat commented May 2, 2020

I'm looking into make Aggregator respect column names (if that makes sense?)

e8104d14aa

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Copy Markdown
Contributor

@zhang2014 zhang2014 May 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc: @alexey-milovidov
I think this is an unnecessary change, and the change predicates order should be reasonable.
We keep it so that we can continue to find same bugs.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any strong arguments, but executing different queries on the initiator and remote shards (even if only the order differs) sounds tricky

But catching some possible issues maybe a good argument to ignore trickery

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexey-milovidov friendly ping

@azat azat changed the title [RFC] Fix predicates optimization for distributed queries [RFC] Fix predicates optimization for distributed queries with HAVING May 3, 2020
@alexey-milovidov alexey-milovidov added the pr-bugfix Pull request with bugfix, not backported by default label May 11, 2020
@alexey-milovidov alexey-milovidov self-assigned this May 11, 2020
@azat
Copy link
Copy Markdown
Member Author

azat commented May 12, 2020

Check failures looks like due to problems on CI

@azat
Copy link
Copy Markdown
Member Author

azat commented May 15, 2020

Friendly ping

@azat azat force-pushed the enable_optimize_predicate_expression-order branch from e8104d1 to a4957e7 Compare May 21, 2020 19:09
@azat azat changed the title [RFC] Fix predicates optimization for distributed queries with HAVING Fix predicates optimization for distributed queries with HAVING May 21, 2020
@azat
Copy link
Copy Markdown
Member Author

azat commented May 22, 2020

Test report summary:

  • 01281_group_by_limit_memory_tracking - Fix 01281_group_by_limit_memory_tracking flackiness #11119
  • test_version_update_after_mutation - looks flacky
  • test_replicated_merge_tree_s3
                        "Instance `{}' failed to start. Container status: {}, logs: {}".format(self.name, status,
    >                                                                                          handle.logs()))
    E               Exception: Instance `node1' failed to start. Container status: exited, logs: Processing configuration file '/etc/clickhouse-server/config.xml'.
    
  • and unstable performance

azat added 6 commits May 23, 2020 00:36
…optimize_predicate_expression)

Before this patch enable_optimize_predicate_expression changes the
order *every* time, so:
  foo AND bar -> bar AND foo

And this causes troubles for distributed queries, since query on shard
will have different order of expressions with the initiating server.
Relax this to allow accepting aggregate keys in different order (typical
use case is distributed queries, where the initiator server will do
final merge).
@azat azat force-pushed the enable_optimize_predicate_expression-order branch from a4957e7 to 6a9ec13 Compare May 22, 2020 21:36
@azat
Copy link
Copy Markdown
Member Author

azat commented May 23, 2020

Looks like some tests stuck?

Functional stateless tests (thread) Pending — Started
Stress test (address) Pending — Started
Stress test (memory) Pending — Started
Stress test (thread) Pending — Started

Everything else is green (including performance)

@azat
Copy link
Copy Markdown
Member Author

azat commented May 25, 2020

@alexey-milovidov can you take a look?

@alexey-milovidov
Copy link
Copy Markdown
Member

Ok.

@alexey-milovidov alexey-milovidov merged commit a8f5b10 into ClickHouse:master May 31, 2020
@azat azat deleted the enable_optimize_predicate_expression-order branch May 31, 2020 18:12
@KochetovNicolai
Copy link
Copy Markdown
Member

This pr also fixes #11413

KochetovNicolai pushed a commit that referenced this pull request Jun 3, 2020
…ion-order

Fix predicates optimization for distributed queries with HAVING

(cherry picked from commit a8f5b10)
KochetovNicolai pushed a commit that referenced this pull request Jun 3, 2020
…ion-order

Fix predicates optimization for distributed queries with HAVING

(cherry picked from commit a8f5b10)
KochetovNicolai pushed a commit that referenced this pull request Jun 3, 2020
…ion-order

Fix predicates optimization for distributed queries with HAVING

(cherry picked from commit a8f5b10)
@azat azat mentioned this pull request Dec 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default

Projects

None yet

Development

Successfully merging this pull request may close these issues.

20.3 enable_optimize_predicate_expression wrong result or exception for distributed queries

5 participants