Skip to content

tests: disable 01646_system_restart_replicas_smoke under stress tests#38212

Merged
qoega merged 5 commits intoClickHouse:masterfrom
azat:no-stress
Jun 20, 2022
Merged

tests: disable 01646_system_restart_replicas_smoke under stress tests#38212
qoega merged 5 commits intoClickHouse:masterfrom
azat:no-stress

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Jun 20, 2022

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

tests: disable 01646_system_restart_replicas_smoke under stress tests (should fix one reason of Backward compatibility check: Error message in clickhouse-server.log (see bc_check_error_messages.txt))

This test executes SYSTEM RESTART REPLICAS, that may leave some tables
in some tests, and the problem test is
01414_mutations_and_errors_zookeeper, that has invalid values in the
table that produces the following error:

2022.06.19 19:02:07.165320 [ 1242562 ] {} <Error> MutateFromLogEntryTask: virtual bool DB::ReplicatedMergeMutateTaskBase::executeStep(): Code: 6. DB::Exception: Cannot parse string 'Hello' as UInt64: syntax error at begin of string. Note: there are toUInt64OrZero and toUInt64OrNull functions, which returns zero/NULL instead of throwing exception.: while executing 'FUNCTION _CAST(value :: 2, 'UInt64' :: 3) -> _CAST(value, 'UInt64') UInt64 : 4': (while reading from part /var/lib/clickhouse/store/700/70043200-eae1-44da-8554-0d43b7e936d7/20191002_1_1_0/): While executing MergeTreeInOrder. (CANNOT_PARSE_TEXT), Stack trace (when copying this message, always include the lines below):

Here is part of the server log that relevant for the test:

Details
...
2022.06.19 18:33:22.495867 [ 391343 ] {e0332447-5473-4653-ba8b-b976acb304a1} <Trace> InterpreterSystemQuery: Restarting replica on test_9.replicated_mutation_table
2022.06.19 18:33:22.503462 [ 390869 ] {} <Information> test_9.replicated_mutation_table (70043200-eae1-44da-8554-0d43b7e936d7): Stopped being leader
...
2022.06.19 18:33:23.396760 [ 395825 ] {09ee374d-a8d9-47db-bdca-611d605b40c6} <Error> executeQuery: Code: 341. DB::Exception: Mutation is not finished because table shutdown was called. It will be done after table restart. (UNFINISHED) (version 22.6.1.1985 (official build)) (from [::1]:40558) (comment: '01414_mutations_and_errors_zookeeper.sh') (in query: ALTER TABLE replicated_mutation_table MODIFY COLUMN value UInt64 SETTINGS replication_alter_partitions_sync = 2), Stack trace (when copying this message, always include the lines below):
...
2022.06.19 18:33:23.467115 [ 390869 ] {} <Debug> test_9.replicated_mutation_table (70043200-eae1-44da-8554-0d43b7e936d7): Loading data parts
2022.06.19 18:33:23.471062 [ 390869 ] {} <Debug> test_9.replicated_mutation_table (70043200-eae1-44da-8554-0d43b7e936d7): Loaded data parts (3 items)
...
2022.06.19 18:33:23.515997 [ 390869 ] {} <Trace> test_9.replicated_mutation_table (ReplicatedMergeTreeRestartingThread): Restarting thread finished
...
2022.06.19 18:33:23.522475 [ 390869 ] {} <Trace> test_9.replicated_mutation_table (PartMovesBetweenShardsOrchestrator): PartMovesBetweenShardsOrchestrator thread finished
...
2022.06.19 18:33:24.960630 [ 391343 ] {e0332447-5473-4653-ba8b-b976acb304a1} <Error> executeQuery: Code: 57. DB::Exception: Cannot attach table with UUID 9b62c1d4-cf4a-4e41-bd11-bafb1446495c, because it was detached but still used by some query. Retry later. (TABLE_ALREADY_EXISTS) (version 22.6.1.1985 (official build)) (from [::1]:47448) (comment: 01646_system_restart_replicas_smoke.sql) (in query: SYSTEM RESTART REPLICAS;), Stack trace (when copying this message, always include the lines below):
...
2022.06.19 18:33:24.490940 [ 400623 ] {00c29852-e786-4e53-a44a-5f1c5f23c698} <Debug> executeQuery: (from [::1]:48804) (comment: '01414_mutations_and_errors_zookeeper.sh') SELECT distinct(value) FROM replicated_mutation_table ORDER BY value (stage: Complete)
2022.06.19 18:33:24.502168 [ 400623 ] {00c29852-e786-4e53-a44a-5f1c5f23c698} <Error> executeQuery: Code: 60. DB::Exception: Table test_9.replicated_mutation_table doesn't exist. (UNKNOWN_TABLE) (version 22.6.1.1985 (official build)) (from [::1]:48804) (comment: '01414_mutations_and_errors_zookeeper.sh') (in query: SELECT distinct(value) FROM replicated_mutation_table ORDER BY value), Stack trace (when copying this message, always include the lines below):
...
2022.06.19 18:33:25.048152 [ 395940 ] {bb31a17f-aca1-411a-ab30-c6b7598c59e5} <Debug> executeQuery: (from [::1]:49236) (comment: '01414_mutations_and_errors_zookeeper.sh') DROP TABLE IF EXISTS replicated_mutation_table (stage: Complete)

And if this table will be left, then checking error messages in
/var/log/clickhouse-server/clickhouse-server.backward.clean.log will
fail, like in 1.

azat added 5 commits June 20, 2022 07:43
Signed-off-by: Azat Khuzhin <[email protected]>
This will make troubleshooting easier.

Signed-off-by: Azat Khuzhin <[email protected]>
This test executes SYSTEM RESTART REPLICAS, that may leave some tables
in some tests, and the problem test is
01414_mutations_and_errors_zookeeper, that has invalid values in the
table that produces the following error:

    2022.06.19 19:02:07.165320 [ 1242562 ] {} <Error> MutateFromLogEntryTask: virtual bool DB::ReplicatedMergeMutateTaskBase::executeStep(): Code: 6. DB::Exception: Cannot parse string 'Hello' as UInt64: syntax error at begin of string. Note: there are toUInt64OrZero and toUInt64OrNull functions, which returns zero/NULL instead of throwing exception.: while executing 'FUNCTION _CAST(value :: 2, 'UInt64' :: 3) -> _CAST(value, 'UInt64') UInt64 : 4': (while reading from part /var/lib/clickhouse/store/700/70043200-eae1-44da-8554-0d43b7e936d7/20191002_1_1_0/): While executing MergeTreeInOrder. (CANNOT_PARSE_TEXT), Stack trace (when copying this message, always include the lines below):

Here is part of the server log that relevant for the test:

    ...
    2022.06.19 18:33:22.495867 [ 391343 ] {e0332447-5473-4653-ba8b-b976acb304a1} <Trace> InterpreterSystemQuery: Restarting replica on test_9.replicated_mutation_table
    2022.06.19 18:33:22.503462 [ 390869 ] {} <Information> test_9.replicated_mutation_table (70043200-eae1-44da-8554-0d43b7e936d7): Stopped being leader
    ...
    2022.06.19 18:33:23.396760 [ 395825 ] {09ee374d-a8d9-47db-bdca-611d605b40c6} <Error> executeQuery: Code: 341. DB::Exception: Mutation is not finished because table shutdown was called. It will be done after table restart. (UNFINISHED) (version 22.6.1.1985 (official build)) (from [::1]:40558) (comment: '01414_mutations_and_errors_zookeeper.sh') (in query: ALTER TABLE replicated_mutation_table MODIFY COLUMN value UInt64 SETTINGS replication_alter_partitions_sync = 2), Stack trace (when copying this message, always include the lines below):
    ...
    2022.06.19 18:33:23.467115 [ 390869 ] {} <Debug> test_9.replicated_mutation_table (70043200-eae1-44da-8554-0d43b7e936d7): Loading data parts
    2022.06.19 18:33:23.471062 [ 390869 ] {} <Debug> test_9.replicated_mutation_table (70043200-eae1-44da-8554-0d43b7e936d7): Loaded data parts (3 items)
    ...
    2022.06.19 18:33:23.515997 [ 390869 ] {} <Trace> test_9.replicated_mutation_table (ReplicatedMergeTreeRestartingThread): Restarting thread finished
    ...
    2022.06.19 18:33:23.522475 [ 390869 ] {} <Trace> test_9.replicated_mutation_table (PartMovesBetweenShardsOrchestrator): PartMovesBetweenShardsOrchestrator thread finished
    ...
    2022.06.19 18:33:24.960630 [ 391343 ] {e0332447-5473-4653-ba8b-b976acb304a1} <Error> executeQuery: Code: 57. DB::Exception: Cannot attach table with UUID 9b62c1d4-cf4a-4e41-bd11-bafb1446495c, because it was detached but still used by some query. Retry later. (TABLE_ALREADY_EXISTS) (version 22.6.1.1985 (official build)) (from [::1]:47448) (comment: 01646_system_restart_replicas_smoke.sql) (in query: SYSTEM RESTART REPLICAS;), Stack trace (when copying this message, always include the lines below):
    ...
    2022.06.19 18:33:24.490940 [ 400623 ] {00c29852-e786-4e53-a44a-5f1c5f23c698} <Debug> executeQuery: (from [::1]:48804) (comment: '01414_mutations_and_errors_zookeeper.sh') SELECT distinct(value) FROM replicated_mutation_table ORDER BY value (stage: Complete)
    2022.06.19 18:33:24.502168 [ 400623 ] {00c29852-e786-4e53-a44a-5f1c5f23c698} <Error> executeQuery: Code: 60. DB::Exception: Table test_9.replicated_mutation_table doesn't exist. (UNKNOWN_TABLE) (version 22.6.1.1985 (official build)) (from [::1]:48804) (comment: '01414_mutations_and_errors_zookeeper.sh') (in query: SELECT distinct(value) FROM replicated_mutation_table ORDER BY value), Stack trace (when copying this message, always include the lines below):
    ...
    2022.06.19 18:33:25.048152 [ 395940 ] {bb31a17f-aca1-411a-ab30-c6b7598c59e5} <Debug> executeQuery: (from [::1]:49236) (comment: '01414_mutations_and_errors_zookeeper.sh') DROP TABLE IF EXISTS replicated_mutation_table (stage: Complete)

And if this table will be left, then checking error messages in
/var/log/clickhouse-server/clickhouse-server.backward.clean.log will
fail, like in [1].

  [1]: https://s3.amazonaws.com/clickhouse-test-reports/38205/90b57e7445d5167ea2170bfe03af29faffc195cf/stress_test__undefined__actions_.html

Signed-off-by: Azat Khuzhin <[email protected]>
@qoega qoega self-assigned this Jun 20, 2022
@qoega
Copy link
Copy Markdown
Member

qoega commented Jun 20, 2022

PullRequestCI / StressTestDebug (pull_request)

Error: 8:03:43 UTC 06/20/20222022-06-20T08:03:43.4902654Z ##[error]No space left on device : '/home/ubuntu/actions-runner/_diag/pages/eb54bfdf-2eac-40dc-a3e1-f3e8c16d523f_0a572bc3-c0d2-5a96-b992-c9a2bcf5ff56_1.log'

Stateless tests #38220

Integration tests (asan, actions) [3/3] no space left on device

@qoega qoega merged commit 24d5be1 into ClickHouse:master Jun 20, 2022
@tavplubix
Copy link
Copy Markdown
Member

@azat, @qoega, I'm not sure, but seems like the correct way to fix it is to add error to the list of errors to ignore or to add no-backward-compatibility-check instead of unconditionally disabling the test.

@azat azat deleted the no-stress branch June 20, 2022 14:33
@tavplubix tavplubix mentioned this pull request Jun 21, 2022
@azat
Copy link
Copy Markdown
Member Author

azat commented Jun 21, 2022

I'm not sure, but seems like the correct way to fix it is to add error to the list of errors to ignore or to add no-backward-compatibility-check instead of unconditionally disabling the test.

@tavplubix I was thinking about this, but I think that having more excludes for stress tests error log check does not scales better.

As for no-backward-compatiblity-check, there is no problems with this test itself it is completely compatible with previous versions, and maybe someday another check will be introduced that will run all tests for the previous release, so the purpose was to distinguish such kind of things, but it is up to you to decided, and I'm completely fine with replacing it with just no-backward-compatiblity-check tag.

What do you think?

@tavplubix
Copy link
Copy Markdown
Member

The problem is that BC check (unlike server-failed-to-start-check) is too strict and it fails on any error in server log. But some errors are expected and that's why we have the suppression list. I don't care if we will have 100 grep patterns in this list. But I don't like no-stress tag, because usually there's no reason to skip some tests in Stress Tests that were supposed to find crashes and that do not even check test output. We should fix errors detection in BC check instead.

As for no-backward-compatiblity-check, we're going to remove this tag at all and use git history instead.

azat added a commit to azat/ClickHouse that referenced this pull request Feb 15, 2023
This reverts commit 24d5be1, reversing
changes made to a756b4b.

After ClickHouse#38717 there is no need in --no-stress, since there will be no
such checks for previous releases that fails.

Reverts: ClickHouse#38212
Requires: ClickHouse#38717
alexey-milovidov added a commit that referenced this pull request Feb 16, 2023
Revert "Merge pull request #38212 from azat/no-stress"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants