Fix flaky test_jbod_balancer test #88104
 def wait_until_fully_merged(node, table):
-    for i in range(20):
+    for i in range(200):
Interesting that 20 is not enough, since there is assert_eq_with_retry(node, merges_count_query, "0\n", retry_count=20) below, which can wait up to 10 seconds per call. That means the maximum waiting period was 200 seconds before and is 2000 seconds now.
Although it looks safe, since likely there will be zero merges usually.
Though maybe a better fix will be to tune merge selector algorithm, @amosbird WDYT?
IIRC, assert_eq_with_retry here is not waiting for OPTIMIZE. OPTIMIZE is synchronous, so its merges are finished when it returns.
This check only ensures that no background merges are running before the test. It can probably be removed. It was likely added earlier to reduce flakiness.
> Though maybe a better fix will be to tune merge selector algorithm
That might be too complicated. For this test we just need to wait until background merges are done and no new ones are triggered.
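
The worst-case wait discussed in this thread can be checked with a little arithmetic. This is only a sketch: the helper name worst_case_wait_seconds is hypothetical, and the 0.5-second sleep between retries is an assumption inferred from the "10 seconds each" figure quoted above for retry_count=20.

```python
def worst_case_wait_seconds(outer_attempts, retry_count=20, sleep_time=0.5):
    """Upper bound on time spent sleeping inside the retry helper.

    sleep_time=0.5 is an assumption: 20 retries * 0.5 s matches the
    10 seconds per call mentioned in the review comment.
    """
    per_call = retry_count * sleep_time  # longest wait of one retry call
    return outer_attempts * per_call     # summed over the outer loop


print(worst_case_wait_seconds(20))   # old limit: 200.0 seconds
print(worst_case_wait_seconds(200))  # new limit: 2000.0 seconds
```

The bound only matters if background merges keep running on every iteration. When the merge count is already zero, the retry helper returns on its first check.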
The test_jbod_balancer test occasionally fails with "There are still merges on-going after 20 assignments". This is likely due to recent changes in the merge strategy, which may require more merge iterations to fully merge parts. Since the test performs 200 inserts, increasing the merge attempt limit from 20 to 200 should be enough. The test still exits early when optimize_throw_if_noop confirms no more merges are possible, so this doesn't affect normal test time.
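
The early-exit behaviour described above can be illustrated with a small self-contained sketch. It is not the test's actual code: FakeNode and QueryError are hypothetical stand-ins for the cluster node and the exception a failed query raises, the max_attempts parameter is added for illustration, and the check for running background merges is left out.

```python
class QueryError(Exception):
    """Hypothetical stand-in for the exception raised by a failed query."""


class FakeNode:
    """Hypothetical node: every OPTIMIZE performs one merge until none are left."""

    def __init__(self, merges_needed):
        self.merges_left = merges_needed

    def query(self, sql):
        # Mimics optimize_throw_if_noop = 1: fail when there is nothing to merge.
        if self.merges_left == 0:
            raise QueryError("nothing to merge")
        self.merges_left -= 1
        return ""


def wait_until_fully_merged(node, table, max_attempts=200):
    """Run OPTIMIZE until it reports a no-op; return how many merges ran."""
    for attempt in range(max_attempts):
        try:
            node.query(
                f"OPTIMIZE TABLE {table} SETTINGS optimize_throw_if_noop = 1"
            )
        except QueryError:
            return attempt  # early exit: no more merges are possible
    raise AssertionError(
        f"There are still merges on-going after {max_attempts} assignments"
    )


# A table that needs 35 merges finishes well before the 200-attempt limit.
print(wait_until_fully_merged(FakeNode(35), "tbl"))
```

With max_attempts=20 the same fake node would hit the limit and raise, which mirrors the failure quoted in the description. The larger limit only costs extra time when merges are actually still pending.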