Fix flaky test_jbod_balancer test by amosbird · Pull Request #88104 · ClickHouse/ClickHouse

amosbird · 2025-10-05T14:24:01Z

Changelog category (leave one):

Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

The test_jbod_balancer test occasionally fails with "There are still merges on-going after 20 assignments". It's likely due to recent changes in merge strategy which may result in more merge iterations needed to fully merge parts. Since the test performs 200 inserts, increase the merge attempt limit from 20 to 200 should be enough. The test still exits early when optimize_throw_if_noop confirms no more merges are possible, so this doesn't affect normal test time.

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

clickhouse-gh · 2025-10-05T14:24:27Z

Workflow [PR], commit [0b01387]

Summary: ✅

azat · 2025-10-05T19:52:42Z

tests/integration/test_jbod_balancer/test.py


 def wait_until_fully_merged(node, table):
-    for i in range(20):
+    for i in range(200):


Interesting that 20 is not enough, since there is assert_eq_with_retry(node, merges_count_query, "0\n", retry_count=20) below, which will wait 10 seconds each, so it means that before the maximum waiting period was 200, and now it is 2000.

Although it looks safe, since likely there will be zero merges usually.

Though maybe a better fix will be to tune merge selector algorithm, @amosbird WDYT?

IIRC, assert_eq_with_retry here is not waiting for OPTIMIZE. It's synchronous, so merges are finished when it returns.
This check was only ensuring there are no background merges running before the test. It can probably be removed. It was likely added to reduce flakiness earlier.

Though maybe a better fix will be to tune merge selector algorithm

That might be too complicated. For this test we just need to wait until background merges are done and no new ones are triggered.

Try to fix flaky test

0b01387

clickhouse-gh bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label Oct 5, 2025

azat self-assigned this Oct 5, 2025

azat added the 🍃 green ci 🌿 Fixing flaky tests in CI label Oct 5, 2025

azat approved these changes Oct 5, 2025

View reviewed changes

azat reviewed Oct 5, 2025

View reviewed changes

azat added this pull request to the merge queue Oct 5, 2025

Merged via the queue into ClickHouse:master with commit 24074bc Oct 5, 2025
123 checks passed

robot-ch-test-poll1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Oct 5, 2025

amosbird mentioned this pull request Oct 6, 2025

Optimize index analysis by applying query condition cache earlier and caching filtered ranges #82380

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix flaky test_jbod_balancer test#88104

Fix flaky test_jbod_balancer test#88104
azat merged 1 commit intoClickHouse:masterfrom
amosbird:try-fix-flaky-test-of-jbod-balancer

amosbird commented Oct 5, 2025

Uh oh!

clickhouse-gh bot commented Oct 5, 2025 •

edited

Loading

Uh oh!

azat Oct 5, 2025

Uh oh!

azat Oct 5, 2025

Uh oh!

amosbird Oct 6, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

amosbird commented Oct 5, 2025

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Documentation entry for user-facing changes

Uh oh!

clickhouse-gh bot commented Oct 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

azat Oct 5, 2025

Choose a reason for hiding this comment

Uh oh!

azat Oct 5, 2025

Choose a reason for hiding this comment

Uh oh!

amosbird Oct 6, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

clickhouse-gh bot commented Oct 5, 2025 •

edited

Loading