Skip to content

Use query hash instead of string representation to handle the sample block cache#40065

Closed
Algunenano wants to merge 13 commits intoClickHouse:masterfrom
Algunenano:interpretation_sample_block_cache
Closed

Use query hash instead of string representation to handle the sample block cache#40065
Algunenano wants to merge 13 commits intoClickHouse:masterfrom
Algunenano:interpretation_sample_block_cache

Conversation

@Algunenano
Copy link
Copy Markdown
Member

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

  • Use query hash instead of string representation to handle the sample block cache

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

Start pushing to speed up query interpretation, which has been degrading over time. Related to #39996

In this case, change how the sample block cache works to stop using the full query string and instead use the query hash. This is done because the call to getTreeHash is much faster than formatImpl and should be equivalent. In my perf analysis of large queries the query serialization was taking almost 10% of the time:

+    8.88%     0.00%            13  TCPHandler      libclickhouse_parsers.so                             [.] DB::queryToString                                                                                                                              ▒
+    8.87%     0.00%            21  TCPHandler      libclickhouse_parsers.so                             [.] DB::serializeAST

And once implemented I see an almost 10% improvement.

MASTER:

- localhost:9000, queries 200, QPS: 8.331, RPS: 0.000, MiB/s: 0.000, result RPS: 8.331, result MiB/s: 0.003.
- localhost:9000, queries 200, QPS: 8.263, RPS: 0.000, MiB/s: 0.000, result RPS: 8.263, result MiB/s: 0.003.

CHANGES:

- localhost:9000, queries 200, QPS: 8.888, RPS: 0.000, MiB/s: 0.000, result RPS: 8.888, result MiB/s: 0.003.
- localhost:9000, queries 200, QPS: 9.019, RPS: 0.000, MiB/s: 0.000, result RPS: 9.019, result MiB/s: 0.003.

I also added a couple of profile events because that's better than adding logs.

Marking it as draft for now because although I expect this to be equivalent I'm not 100% (almost but not 100%) confident all operations done during query interpretation will be properly reflected in the query hash. Let's see what the test think.

@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-performance Pull request with some performance improvements label Aug 10, 2022
@Algunenano
Copy link
Copy Markdown
Member Author

I had to add the alias to the hash because it was detecting queries as equal that aren't.

A sample of something that breaks (in current code) because of the hash not including aliases is the scalar subquery cache:

SELECT
    (
        SELECT
            1 AS number,
            number
        FROM numbers(1)
    ) AS s,
    (
        SELECT
            1,
            number
        FROM numbers(1)
    ) AS s2
  • 22.7.2.15
┌─s─────┬─s2────┐
│ (1,1) │ (1,1) │
└───────┴───────┘
  • With these changes:
┌─s─────┬─s2────┐
│ (1,1) │ (1,0) │
└───────┴───────┘

The new one is the correct one, the old one is incorrectly reusing the first scalar result. It's a really odd corner case, but I guess it could happen in real queries.

@Algunenano
Copy link
Copy Markdown
Member Author

Now ASTSubquery hash includes the alias which we probably don't want, specially for scalar caches and so on. I'll look into it and maybe remove it for this case only. Needs more thought in any case.

@alexey-milovidov
Copy link
Copy Markdown
Member

Do we still need the sample block cache?

@Algunenano
Copy link
Copy Markdown
Member Author

Do we still need the sample block cache?

The query from the benchmark:

  • With the cache and the changes: ~9 QPS ('SampleBlockCacheHit':256,'SampleBlockCacheMiss':15)
  • Without the cache: 0.078 QPS, that is 115x worse

So I'd say yes, at least for now.

@Algunenano Algunenano force-pushed the interpretation_sample_block_cache branch from e3ebdf3 to 8afd571 Compare August 11, 2022 13:44
@Algunenano
Copy link
Copy Markdown
Member Author

It seems there are several other places that relied on the alias not being part of the hash, so this needs more work (probably after I come back from holidays)

@Algunenano
Copy link
Copy Markdown
Member Author

While looking at the failing tests I've detected some other queries that were working incorrectly because the alias was ignored as part of the node hash. I've added some tests for those queries.

The problem with the failed tests arose from, you guessed it, introducing the alias in the hash, as some parts of the code were implicitly relying on that behaviour. To workaround this I've added some helper methods to calculate other hashes were needed: get the hash of the contents of a subquery (ignoring its alias, id or cte name) to prepare and cache sets, and ignore the alias of functions when processing and optimizing queries based on constraints. I'm thinking that maybe the alias shouldn't be part of the node and be a node on itself, but it's likely better to delay massive changes around that until #23194 is clearer.

In any case, the performance seems still good: master (8.454 QPS) vs PR (9.552 QPS)

@Algunenano
Copy link
Copy Markdown
Member Author

Performance improvement seems like the expected 10%:
image

There are other performance changes but they seem unrelated

@Algunenano Algunenano marked this pull request as ready for review August 24, 2022 09:59
@Algunenano
Copy link
Copy Markdown
Member Author

Although the query analyzer is being worked on, I still think this PR is worth including: it improves the performance of the sample block cache (and that is not going away I think) and fixes several bugs in the query interpretation.

@alexey-milovidov
Copy link
Copy Markdown
Member

@kitaisreal said that Sample Block Cache is entirely unneeded if Analyzer is used.

Ok, let's continue on this PR, but I did not review it yet...

@kitaisreal
Copy link
Copy Markdown
Contributor

@Algunenano no need to continue this pr.
Sample Block cache is only necessary because we have mess and create Interpreters recursively during analyze multiple times. In Analyzer we analyze query only once, and we do not use AST at all.
Methods getContentHash, getTreeHashWithoutAlias, getTreeHash are hard to understand. I expect that there could be complex bugs with distributed processing and aliases that are not covered by our CI/CD.
Profile events SampleBlockCacheHit should not exists because client should not know anything about SampleBlockCache. Client should not know anything about SampleBlock either.

@Algunenano
Copy link
Copy Markdown
Member Author

Sample Block cache is only necessary because we have mess and create Interpreters recursively during analyze multiple times. In Analyzer we analyze query only once, and we do not use AST at all.

Great!

Profile events SampleBlockCacheHit should not exists because client should not know anything about SampleBlockCache. Client should not know anything about SampleBlock either.

I don't consider profile events are not only for clients, but also for debugging. Clients doesn't need to know anything about ZK either, but we do when investigating the behaviour of the query.

Anyway, I'm closing the PR and I'll keep (already were) an eye on the analyzer PR to see that the issues found here are fixed by the analyzer too.

@Algunenano Algunenano closed this Oct 3, 2022
@alexey-milovidov
Copy link
Copy Markdown
Member

@Algunenano thank you!
We need more eyes on the analyzer PR. It is almost ready for review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-performance Pull request with some performance improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants