Fix join squashing with sparse columns#81886
Merged
Conversation
Contributor
Workflow [PR], commit [0ade907] Summary: ❌
novikd (Member) approved these changes on Jun 26, 2025 and left a comment:
LGTM, but I see grace_hash_join became slower in the performance tests. Is it related?
Author (Member):
It looks so. I haven't found any specific problem I could address, so I just decided to ignore it.
baibaichen pushed commits referencing this pull request:
- to Kyligence/gluten on Jul 3, 2025
- to Kyligence/gluten on Jul 4, 2025
- to Kyligence/gluten on Jul 5, 2025
- to apache/gluten on Jul 6, 2025, with the following commit message:
[GLUTEN-1632][CH] Daily Update Clickhouse Version (20250705)
* Fix benchmark build
* Fix Benchmark build due to ClickHouse/ClickHouse#79417
* Revert "Fix Build due to ClickHouse/ClickHouse#80931" (reverts commit 02d12f6)
* Fix Build due to ClickHouse/ClickHouse#81886
* Fix Link issue due to ClickHouse/ClickHouse#83121
* Fix Build due to ClickHouse/ClickHouse#82604
* Fix Build due to ClickHouse/ClickHouse#82945
* Fix Build due to ClickHouse/ClickHouse#83214

Co-authored-by: kyligence-git <[email protected]>
Co-authored-by: Chang chen <[email protected]>
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add new setting `min_joined_block_size_rows` (analogous to `min_joined_block_size_bytes`; defaults to 65409) to control the minimum block size (in rows) for JOIN input and output blocks (if the join algorithm supports it). Small blocks will be squashed.

The motivation for introducing squashing around join transforms for parallel hash was the following. Previously, we physically split blocks to distribute them among parallel hash join "shards". All input blocks automatically became `max_threads` times smaller after splitting. Since not all rows usually have a match during joining, output blocks were even smaller. On top of that, multiple joins might be chained on each other. Because of that, we saw significant slowdowns with parallel hash on some TPC-H queries. At that time, I decided to use only the byte threshold to configure squashing. The problem with `Sparse` columns is that they can have a compression factor close to the number of rows. For such cases, it makes sense to also have a number-of-rows threshold: since our goal is to avoid passing too-small blocks along the pipeline, we shouldn't worry about blocks that are bigger than `DEFAULT_BLOCK_SIZE`.
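The two-threshold squashing described above can be sketched roughly as follows. This is a minimal illustrative model, not the actual ClickHouse implementation: the `Block` and `Squasher` names, the flush policy, and the `min_bytes` default shown here are assumptions. The point it demonstrates is why bytes alone are insufficient: a `Sparse` column can make a block tiny in bytes even when it holds many rows, so an OR over both thresholds is needed.

```python
# Illustrative sketch of joined-block squashing with both a row and a
# byte threshold. Not the real ClickHouse code; names and defaults are
# hypothetical.

from dataclasses import dataclass


@dataclass
class Block:
    rows: int
    bytes: int


class Squasher:
    """Buffers undersized blocks and emits a merged block once the
    accumulated size crosses either the row or the byte threshold."""

    def __init__(self, min_rows: int = 65409, min_bytes: int = 512 * 1024):
        self.min_rows = min_rows    # analogous to min_joined_block_size_rows
        self.min_bytes = min_bytes  # analogous to min_joined_block_size_bytes
        self.buf_rows = 0
        self.buf_bytes = 0

    def add(self, block: Block):
        # With a byte-only threshold, a Sparse column keeps `bytes` tiny
        # even for many rows, so blocks would be buffered far past
        # DEFAULT_BLOCK_SIZE rows. The row threshold bounds that.
        self.buf_rows += block.rows
        self.buf_bytes += block.bytes
        if self.buf_rows >= self.min_rows or self.buf_bytes >= self.min_bytes:
            merged = Block(self.buf_rows, self.buf_bytes)
            self.buf_rows = self.buf_bytes = 0
            return merged
        return None  # still too small: keep buffering

    def flush(self):
        # Emit whatever is left at end of stream.
        if self.buf_rows == 0:
            return None
        merged = Block(self.buf_rows, self.buf_bytes)
        self.buf_rows = self.buf_bytes = 0
        return merged
```

With a row threshold of 100, a block of 105 sparse rows weighing only 90 bytes is still emitted promptly instead of being held until half a megabyte accumulates.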