Integration test test_backup_restore_on_cluster/test_disallow_concurrency.py is flaky (a bit) #68012

@alexey-milovidov


Test name: test_backup_restore_on_cluster/test_disallow_concurrency.py
Failure reason: helpers.client.QueryRuntimeException: Client failed! Return code: 159, stderr: Received exception from server
CI report: Integration tests (amd_tsan, 5/6)

CIDB statistics: cidb

Test output:

File: test_backup_restore_on_cluster/test_disallow_concurrency.py:191 - in test_concurrent_backups_on_same_node
    nodes[0].query(f"RESTORE TABLE tbl ON CLUSTER 'cluster' FROM {backup_name}")
File: helpers/cluster.py:4479 - in query
    return self.client.query(
File: helpers/client.py:40 - in wrap
    return func(self, *args, **kwargs)
File: helpers/client.py:80 - in query
    ).get_answer()
File: helpers/client.py:256 - in get_answer
    stdout, stderr = self.wait_and_read_output()
File: helpers/client.py:241 - in wait_and_read_output
    self.process.wait(timeout=DEFAULT_QUERY_TIMEOUT)
File: /usr/lib/python3.10/subprocess.py:1209 - in wait
    return self._wait(timeout=timeout)
File: /usr/lib/python3.10/subprocess.py:1951 - in _wait
    raise TimeoutExpired(self.args, timeout)
E   subprocess.TimeoutExpired: Command '['/home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/ci/tmp/clickhouse', 'client', '--host', '172.16.2.5', '--port', '9000', '--stacktrace']' timed out after 600 seconds
[teardown]
File: test_backup_restore_on_cluster/test_disallow_concurrency.py:45 - in drop_after_test
    node0.query(
File: helpers/cluster.py:4479 - in query
    return self.client.query(
File: helpers/client.py:40 - in wrap
    return func(self, *args, **kwargs)
File: helpers/client.py:80 - in query
    ).get_answer()
File: helpers/client.py:269 - in get_answer
    raise QueryRuntimeException(
E   helpers.client.QueryRuntimeException: Client failed! Return code: 159, stderr: Received exception from server (version 26.1.1):
E   Code: 159. DB::Exception: Received from 172.16.2.5:9000. DB::Exception: Distributed DDL task /clickhouse/task_queue/ddl/query-0000000004 is not finished on 2 of 2 hosts (0 of them are currently executing the task, 0 are inactive). They are going to execute the query in background. Was waiting for 360.934403823 seconds, which is longer than distributed_ddl_task_timeout. Stack trace:
E   
E   0. ./contrib/llvm-project/libcxx/include/__exception/exception.h:113: Poco::Exception::Exception(String const&, int) @ 0x000000002f2008a0
E   1. ./ci/tmp/build/./src/Common/Exception.cpp:136: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x00000000164ef978
E   2. ./src/Common/Exception.h:172: DB::Exception::Exception(String&&, int, String, bool) @ 0x0000000009cb086e
E   3. ./src/Common/Exception.h:58: DB::Exception::Exception(PreformattedMessage&&, int) @ 0x0000000009cb001a
E   4. ./src/Common/Exception.h:190: DB::Exception::Exception<String&, unsigned long&, unsigned long, unsigned long&, unsigned long, double, char const*>(int, FormatStringHelperImpl<std::type_identity<String&>::type, std::type_identity<unsigned long&>::type, std::type_identity<unsigned long>::type, std::type_identity<unsigned long&>::type, std::type_identity<unsigned long>::type, std::type_identity<double>::type, std::type_identity<char const*>::type>, String&, unsigned long&, unsigned long&&, unsigned long&, unsigned long&&, double&&, char const*&&) @ 0x00000000206e53e5
E   5. ./ci/tmp/build/./src/Interpreters/DDLOnClusterQueryStatusSource.cpp:85: DB::DDLOnClusterQueryStatusSource::handleTimeoutExceeded() @ 0x0000000020b14b20
E   6. ./ci/tmp/build/./src/Interpreters/DistributedQueryStatusSource.cpp:206: DB::DistributedQueryStatusSource::generate() @ 0x000000002038779d
E   7. ./ci/tmp/build/./src/Processors/ISource.cpp:144: DB::ISource::tryGenerate() @ 0x0000000026b625b8
E   8. ./ci/tmp/build/./src/Processors/ISource.cpp:110: DB::ISource::work() @ 0x0000000026b620ae
E   9. ./ci/tmp/build/./src/Processors/Executors/ExecutionThreadContext.cpp:53: DB::ExecutionThreadContext::executeTask() @ 0x0000000026b8a3b9
E   10. ./ci/tmp/build/./src/Processors/Executors/PipelineExecutor.cpp:351: DB::PipelineExecutor::executeStepImpl(unsigned long, DB::IAcquiredSlot*, std::atomic<bool>*) @ 0x0000000026b76759
E   11. ./ci/tmp/build/./src/Processors/Executors/PipelineExecutor.cpp:279: DB::PipelineExecutor::executeImpl(unsigned long, bool) @ 0x0000000026b7591e
E   12. ./ci/tmp/build/./src/Processors/Executors/PipelineExecutor.cpp:136: DB::PipelineExecutor::execute(unsigned long, bool) @ 0x0000000026b7536a
E   13. ./ci/tmp/build/./src/Processors/Executors/PullingAsyncPipelineExecutor.cpp:76: void std::__function::__policy_func<void ()>::__call_func[abi:ne210105]<ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&&)::'lambda'()>(std::__function::__policy_storage const*) @ 0x0000000026b9658b
E   14. ./contrib/llvm-project/libcxx/include/__functional/function.h:508: ? @ 0x00000000166bf6f1
E   15. ./contrib/llvm-project/libcxx/include/__type_traits/invoke.h:0: void* std::__thread_proxy[abi:ne210105]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>>(void*) @ 0x00000000166c97dc
E   16. __tsan_thread_start_func @ 0x0000000009c1b2a8
E   17. ? @ 0x0000000000094ac3
E   18. ? @ 0x00000000001268c0
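
For triage, the numbers in the Code 159 message are worth pulling out: in this run the teardown query reports 2 of 2 hosts unfinished with 0 executing and 0 inactive, after the earlier `RESTORE ... ON CLUSTER` had already hit the 600-second client timeout. Below is a minimal sketch of a parser for that message. The helper name `parse_ddl_timeout` and the regular expression are hypothetical and are not part of `helpers/client.py`; the pattern is written against the exact wording quoted above and may need adjusting if the server message differs between versions.

```python
import re
from typing import Optional

# Message text copied from the teardown failure in this report.
SAMPLE_STDERR = (
    "Code: 159. DB::Exception: Received from 172.16.2.5:9000. DB::Exception: "
    "Distributed DDL task /clickhouse/task_queue/ddl/query-0000000004 is not "
    "finished on 2 of 2 hosts (0 of them are currently executing the task, "
    "0 are inactive). They are going to execute the query in background. "
    "Was waiting for 360.934403823 seconds, which is longer than "
    "distributed_ddl_task_timeout."
)

# Hypothetical pattern, matched against the wording seen in this report only.
DDL_TIMEOUT_RE = re.compile(
    r"Distributed DDL task (?P<task>\S+) is not finished on "
    r"(?P<unfinished>\d+) of (?P<total>\d+) hosts "
    r"\((?P<executing>\d+) of them are currently executing the task, "
    r"(?P<inactive>\d+) are inactive\)\..*?"
    r"Was waiting for (?P<waited>[\d.]+) seconds",
    re.DOTALL,
)


def parse_ddl_timeout(stderr: str) -> Optional[dict]:
    """Return the fields of a distributed DDL timeout message, or None if absent."""
    match = DDL_TIMEOUT_RE.search(stderr)
    if match is None:
        return None
    return {
        "task": match.group("task"),
        "unfinished": int(match.group("unfinished")),
        "total": int(match.group("total")),
        "executing": int(match.group("executing")),
        "inactive": int(match.group("inactive")),
        "waited_seconds": float(match.group("waited")),
    }


if __name__ == "__main__":
    print(parse_ddl_timeout(SAMPLE_STDERR))
```

Grouping failures by the `executing` and `inactive` counts could help separate runs where hosts never picked up the task from runs where a host was down, though that split is only a suggestion for reading the CIDB results.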

Labels: flaky test, testing
