Skip to content

Refactor cache key calculation for joins#76613

Merged
nickitat merged 31 commits intomasterfrom
refactor_cache_key_calculation
Mar 10, 2025
Merged

Refactor cache key calculation for joins#76613
nickitat merged 31 commits intomasterfrom
refactor_cache_key_calculation

Conversation

@nickitat
Copy link
Copy Markdown
Member

@nickitat nickitat commented Feb 21, 2025

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

As of now, we calculate keys for HashTablesStatistics (cache of hash table sizes for aggregation and joins) mainly from the AST. It has drawbacks. Namely, it doesn't account for all the changes that happen in the planning stage (filters push down, reordering, and so on). Also, we want to use statistics from this cache for optimization itself (e.g. #76185). The bottom line is we better to reimplement this logic in a proper way.

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Feb 21, 2025

Workflow [PR], commit [00306f9]

@clickhouse-gh clickhouse-gh bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label Feb 21, 2025
@nickitat nickitat marked this pull request as ready for review February 25, 2025 21:59
@nickitat
Copy link
Copy Markdown
Member Author

Stress test (tsan) - confirmed that it doesn't reproduce on #76722


const JoinSettings & getSettings() const { return join_settings; }

void setHashTableCacheKeys(UInt64 left_key_hash, UInt64 right_key_hash)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are planning to support multiple tables in the JoinLocial step in the future.
It's ok to keep it here for now.
Another option: do it for the JoinStep (physical one), cause JoinStepLogical is converted to (potentially many) JoinStep (physical).

catch (const Exception & e)
{
// Some steps currently missing serialization support. Let's just skip them.
if (e.code() != ErrorCodes::NOT_IMPLEMENTED)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's not nice to rethrow/catch exceptions in normal query execution. Can we avoid it, please?

writeStringBinary(transform.getSerializationName(), wbuf);
try
{
transform.serialize(ctx);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would definitely prefer to add bool IQueryPlanStep::updateHash() virtual method and implement it for some steps (probably, even via serializaion).

@nickitat nickitat added this pull request to the merge queue Mar 10, 2025
Merged via the queue into master with commit b94312a Mar 10, 2025
123 of 126 checks passed
@nickitat nickitat deleted the refactor_cache_key_calculation branch March 10, 2025 19:10
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants