Fix flaky test test_backup_restore_on_cluster#47577
Conversation
assert_eq_with_retry(
    node0,
    "SELECT count() from clusterAllReplicas('cluster', system.tables) where table='tbl' and database='default'",
    TSV([num_nodes]),
)
for i in range(num_nodes):
    nodes[i].query(f"INSERT INTO tbl VALUES ({i})")
CREATE ... ON CLUSTER is synchronous by default: it waits until the query has been executed by all replicas, or throws an exception. So the table must already exist at that point; we don't need extra checks.
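Since ON CLUSTER DDL blocks until every replica has executed it (or raises), the retry check can simply be dropped. A minimal sketch of the simplified test body, assuming hypothetical `nodes`/`num_nodes` fixtures that mirror the ones in the test:

```python
# Hypothetical sketch: node objects expose .query(), as in the test harness.
def create_and_fill_table(nodes, num_nodes):
    # ON CLUSTER DDL is synchronous by default: it either succeeds on all
    # replicas or raises, so no assert_eq_with_retry is needed afterwards.
    nodes[0].query(
        "CREATE TABLE tbl ON CLUSTER 'cluster' (x UInt64) "
        "ENGINE=ReplicatedMergeTree('/clickhouse/tables/tbl/', '{replica}') "
        "ORDER BY x"
    )
    # The table is guaranteed to exist on every replica at this point.
    for i in range(num_nodes):
        nodes[i].query(f"INSERT INTO tbl VALUES ({i})")
```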
There's something interesting in the text log of node4:
2023.03.13 11:13:09.843597 [ 266 ] {} <Trace> DDLWorker: Too early to clean queue, will do it later.
2023.03.13 11:13:09.843607 [ 275 ] {} <Debug> DDLWorker: Scheduling tasks
2023.03.13 11:13:09.844023 [ 275 ] {} <Trace> DDLWorker: scheduleTasks: initialized=true, size_before_filtering=2, queue_size=2, entries=query-0000000000..query-0000000001, first_failed_task_name=none, current_tasks_size=0, last_current_task=none, last_skipped_entry_name=query-0000000000
2023.03.13 11:13:09.844073 [ 275 ] {} <Debug> DDLWorker: Will schedule 1 tasks starting from query-0000000001
2023.03.13 11:13:09.844108 [ 275 ] {} <Trace> DDLWorker: Checking task query-0000000001
2023.03.13 11:13:09.847102 [ 275 ] {} <Debug> DDLWorker: Will not execute task query-0000000001: There is no a local address in host list
2023.03.13 11:13:09.847134 [ 275 ] {} <Debug> DDLWorker: Waiting for queue updates
2023.03.13 11:13:10.000231 [ 261 ] {} <Trace> AsynchronousMetrics: MemoryTracking: was 760.47 MiB, peak 762.47 MiB, free memory in arenas 0.00 B, will set to 780.64 MiB (RSS), difference: 20.16 MiB
2023.03.13 11:13:10.163435 [ 266 ] {} <Trace> DDLWorker: Too early to clean queue, will do it later.
2023.03.13 11:13:10.163435 [ 275 ] {} <Debug> DDLWorker: Scheduling tasks
2023.03.13 11:13:10.165800 [ 275 ] {} <Trace> DDLWorker: scheduleTasks: initialized=true, size_before_filtering=3, queue_size=3, entries=query-0000000000..query-0000000002, first_failed_task_name=none, current_tasks_size=0, last_current_task=none, last_skipped_entry_name=query-0000000001
2023.03.13 11:13:10.165837 [ 275 ] {} <Debug> DDLWorker: Will schedule 1 tasks starting from query-0000000002
2023.03.13 11:13:10.165869 [ 275 ] {} <Trace> DDLWorker: Checking task query-0000000002
2023.03.13 11:13:10.166897 [ 275 ] {} <Debug> DDLWorker: Will not execute task query-0000000002: There is no a local address in host list
2023.03.13 11:13:10.166945 [ 275 ] {} <Debug> DDLWorker: Waiting for queue updates
2023.03.13 11:13:10.470069 [ 266 ] {} <Trace> DDLWorker: Too early to clean queue, will do it later.
2023.03.13 11:13:10.470074 [ 275 ] {} <Debug> DDLWorker: Scheduling tasks
2023.03.13 11:13:10.472361 [ 275 ] {} <Trace> DDLWorker: scheduleTasks: initialized=true, size_before_filtering=4, queue_size=4, entries=query-0000000000..query-0000000003, first_failed_task_name=none, current_tasks_size=0, last_current_task=none, last_skipped_entry_name=query-0000000002
2023.03.13 11:13:10.472405 [ 275 ] {} <Debug> DDLWorker: Will schedule 1 tasks starting from query-0000000003
2023.03.13 11:13:10.472431 [ 275 ] {} <Trace> DDLWorker: Checking task query-0000000003
2023.03.13 11:13:10.473239 [ 275 ] {} <Debug> DDLWorker: Will not execute task query-0000000003: There is no a local address in host list
2023.03.13 11:13:10.473264 [ 275 ] {} <Debug> DDLWorker: Waiting for queue updates
The most interesting lines are "Will not execute task ...: There is no a local address in host list". So something is wrong with the cluster config and/or the hostnames.
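A replica only picks up a DDL task if its own hostname appears in the task's host list, which is why node4 keeps skipping tasks. A rough, hypothetical illustration of that membership check (not ClickHouse's actual implementation), run against a remote_servers config like the one generated for the test:

```python
import xml.etree.ElementTree as ET

def hosts_in_cluster(config_xml: str, cluster: str) -> list:
    """Collect every <host> listed under the given cluster."""
    root = ET.fromstring(config_xml)
    path = f"./remote_servers/{cluster}/shard/replica/host"
    return [h.text for h in root.findall(path)]

def would_execute(config_xml: str, cluster: str, local_host: str) -> bool:
    # Mirrors the idea behind "There is no a local address in host list":
    # the replica runs a task only if its hostname is in the cluster config.
    return local_host in hosts_in_cluster(config_xml, cluster)
```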
For some reason test_backup_restore_on_cluster/_instances_concurrency_0/node4/configs/config.d/cluster_for_concurrency_test.xml looks like this:
<clickhouse>
<remote_servers>
<cluster>
<shard>
<replica>
<host>node0</host>
<port>9000</port>
</replica>
<replica>
<host>node1</host>
<port>9000</port>
</replica>
<replica>
<host>node2</host>
<port>9000</port>
</replica>
<replica>
<host>node3</host>
<port>9000</port>
</replica>
</shard>
</cluster>
</remote_servers>
</clickhouse>
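That config lists only node0..node3, so node4 never finds itself in the host list. One plausible way such a config ends up that way (a hypothetical generator, not the actual test helper) is a loop over `range(num_nodes)` with num_nodes=4:

```python
def generate_cluster_config(num_nodes: int) -> str:
    # With num_nodes=4 this emits node0..node3 only -- a leftover node4
    # instance from another test will not be in the host list.
    replicas = "\n".join(
        "            <replica>\n"
        f"                <host>node{i}</host>\n"
        "                <port>9000</port>\n"
        "            </replica>"
        for i in range(num_nodes)
    )
    return (
        "<clickhouse>\n"
        "    <remote_servers>\n"
        "        <cluster>\n"
        "            <shard>\n"
        f"{replicas}\n"
        "            </shard>\n"
        "        </cluster>\n"
        "    </remote_servers>\n"
        "</clickhouse>"
    )
```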
And that's because num_nodes in test_disallow_concurrency.py was changed to 4 in #47216. So the flaky check failure was actually related to the changes.
It seems to be caused by that change:
https://github.com/ClickHouse/ClickHouse/pull/47216/files#diff-e0f79cbbfa03684382a2be3d7da44cc585c6c548bc634067b411af295cfc9854R12
Both tests test_disallow_concurrency.py and test_concurrency.py generate cluster config in the same temporary file _gen/cluster_for_concurrency_test.xml.
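Because both tests write to the same path, whichever test generates its config last determines what every instance sees. One way to avoid such a collision (a sketch of the idea, not the actual fix in this PR) is to derive the generated file name from the test:

```python
import os

def cluster_config_path(gen_dir: str, test_name: str) -> str:
    # Deriving the file name from the test keeps two tests from clobbering
    # each other's generated cluster config in the shared _gen directory.
    return os.path.join(gen_dir, f"cluster_for_{test_name}.xml")
```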
Changelog category (leave one):
Fix error
Fixed the error "Table default.tbl doesn't exist" which could sometimes occur in this test.