Skip to content

Increase operation_timeout_ms for database_replicated tests#29462

Closed
azat wants to merge 1 commit intoClickHouse:masterfrom
azat:tests-database-replicated-increase-keeper-timeout
Closed

Increase operation_timeout_ms for database_replicated tests#29462
azat wants to merge 1 commit intoClickHouse:masterfrom
azat:tests-database-replicated-increase-keeper-timeout

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Sep 28, 2021

CI reports 1:

Details 2021.09.28 09:49:37.205294 [ 640 ] {} KeeperStateMachine: Commit request for session 5 with type Remove, log id 770498, path /test/01154_move_partition_long_test_kue21q/s1/src/replicas/r2_1264626517/queue/queue-0000000321 2021.09.28 09:49:37.267731 [ 586 ] {} KeeperTCPHandler: Received heartbeat for session #2 2021.09.28 09:49:39.610989 [ 667 ] {} SystemLog (system.asynchronous_metric_log): Flushing system log, 5796 entries to flush up to offset 3494292 2021.09.28 09:49:43.081774 [ 591 ] {} KeeperTCPHandler: Received heartbeat for session #4 2021.09.28 09:49:43.580865 [ 658 ] {} SystemLog (system.query_thread_log): Flushing system log, 1 entries to flush up to offset 22082 2021.09.28 09:49:43.681288 [ 654 ] {} SystemLog (system.query_log): Flushing system log, 1 entries to flush up to offset 43707 2021.09.28 09:49:44.034109 [ 919 ] {} system.asynchronous_metric_log (bb13afbb-543d-4730-bb13-afbb543d9730) (MergerMutator): Selected 5 parts from 202109_596_596_0 to 202109_600_600_0 2021.09.28 09:49:48.835504 [ 586 ] {} KeeperTCPHandler: Got exception processing session #2: Code: 210. DB::NetException: I/O error: Broken pipe, while writing to socket ([::1]:55544). (NETWORK_ERROR), Stack trace (...)
0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x94fe394 in /usr/lib/debug/.build-id/69/a5fe716478bb310e86532ade0d971b828da8f9.debug
1. DB::WriteBufferFromPocoSocket::nextImpl() @ 0x10e0a77c in /usr/lib/debug/.build-id/69/a5fe716478bb310e86532ade0d971b828da8f9.debug
2. Coordination::ZooKeeperResponse::write(DB::WriteBuffer&) const @ 0x12316383 in /usr/lib/debug/.build-id/69/a5fe716478bb310e86532ade0d971b828da8f9.debug
3. DB::KeeperTCPHandler::runImpl() @ 0x11ef877d in /usr/lib/debug/.build-id/69/a5fe716478bb310e86532ade0d971b828da8f9.debug
4. Poco::Net::TCPServerConnection::start() @ 0x14b5aa2f in /usr/lib/debug/.build-id/69/a5fe716478bb310e86532ade0d971b828da8f9.debug
5. Poco::Net::TCPServerDispatcher::run() @ 0x14b5ce21 in /usr/lib/debug/.build-id/69/a5fe716478bb310e86532ade0d971b828da8f9.debug
6. Poco::PooledThread::run() @ 0x14c715c9 in /usr/lib/debug/.build-id/69/a5fe716478bb310e86532ade0d971b828da8f9.debug
7. Poco::ThreadImpl::runnableEntry(void*) @ 0x14c6ed00 in /usr/lib/debug/.build-id/69/a5fe716478bb310e86532ade0d971b828da8f9.debug
8. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
9. __clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so

And according to logs seems that it requires a little bit higher timeout
in this case (I tried to look at trace_log, but keeper threads are not
traceable).

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Cc: @tavplubix

CI reports [1]:

    2021.09.28 09:49:37.205294 [ 640 ] {} <Test> KeeperStateMachine: Commit request for session 5 with type Remove, log id 770498, path /test/01154_move_partition_long_test_kue21q/s1/src/replicas/r2_1264626517/queue/queue-0000000321
    2021.09.28 09:49:37.267731 [ 586 ] {} <Trace> KeeperTCPHandler: Received heartbeat for session ClickHouse#2
    2021.09.28 09:49:39.610989 [ 667 ] {} <Trace> SystemLog (system.asynchronous_metric_log): Flushing system log, 5796 entries to flush up to offset 3494292
    2021.09.28 09:49:43.081774 [ 591 ] {} <Trace> KeeperTCPHandler: Received heartbeat for session ClickHouse#4
    2021.09.28 09:49:43.580865 [ 658 ] {} <Trace> SystemLog (system.query_thread_log): Flushing system log, 1 entries to flush up to offset 22082
    2021.09.28 09:49:43.681288 [ 654 ] {} <Trace> SystemLog (system.query_log): Flushing system log, 1 entries to flush up to offset 43707
    2021.09.28 09:49:44.034109 [ 919 ] {} <Debug> system.asynchronous_metric_log (bb13afbb-543d-4730-bb13-afbb543d9730) (MergerMutator): Selected 5 parts from 202109_596_596_0 to 202109_600_600_0
    2021.09.28 09:49:48.835504 [ 586 ] {} <Information> KeeperTCPHandler: Got exception processing session ClickHouse#2: Code: 210. DB::NetException: I/O error: Broken pipe, while writing to socket ([::1]:55544). (NETWORK_ERROR), Stack trace (...)

    0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x94fe394 in /usr/lib/debug/.build-id/69/a5fe716478bb310e86532ade0d971b828da8f9.debug
    1. DB::WriteBufferFromPocoSocket::nextImpl() @ 0x10e0a77c in /usr/lib/debug/.build-id/69/a5fe716478bb310e86532ade0d971b828da8f9.debug
    2. Coordination::ZooKeeperResponse::write(DB::WriteBuffer&) const @ 0x12316383 in /usr/lib/debug/.build-id/69/a5fe716478bb310e86532ade0d971b828da8f9.debug
    3. DB::KeeperTCPHandler::runImpl() @ 0x11ef877d in /usr/lib/debug/.build-id/69/a5fe716478bb310e86532ade0d971b828da8f9.debug
    4. Poco::Net::TCPServerConnection::start() @ 0x14b5aa2f in /usr/lib/debug/.build-id/69/a5fe716478bb310e86532ade0d971b828da8f9.debug
    5. Poco::Net::TCPServerDispatcher::run() @ 0x14b5ce21 in /usr/lib/debug/.build-id/69/a5fe716478bb310e86532ade0d971b828da8f9.debug
    6. Poco::PooledThread::run() @ 0x14c715c9 in /usr/lib/debug/.build-id/69/a5fe716478bb310e86532ade0d971b828da8f9.debug
    7. Poco::ThreadImpl::runnableEntry(void*) @ 0x14c6ed00 in /usr/lib/debug/.build-id/69/a5fe716478bb310e86532ade0d971b828da8f9.debug
    8. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
    9. __clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so

  [1]: https://clickhouse-test-reports.s3.yandex.net/28760/20bdda60084621a2589cc45303d0fb2de0f33f5a/functional_stateless_tests_(release,_databasereplicated).html#fail1

And according to logs seems that it requires a little bit higher timeout
in this case (I tried to look at trace_log, but keeper threads are not
traceable).
@alesapin
Copy link
Copy Markdown
Member

We have very strange quirks in our CI https://markify.dev/p/sr1hamK. Look at times in logs.

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Sep 28, 2021

CLA assistant check
All committers have signed the CLA.

@tavplubix
Copy link
Copy Markdown
Member

I don't think that increasing the timeout will really help, 10 seconds should be enough in most cases.

@azat
Copy link
Copy Markdown
Member Author

azat commented Sep 28, 2021

I don't think that increasing the timeout will really help, 10 seconds should be enough in most cases.

Agree, I though about this as a temporary solution.

But maybe initial issue of this PR and log with mixed dates from @alesapin related, and all of this is due to slow filesystem.

@azat azat closed this Sep 28, 2021
@azat azat deleted the tests-database-replicated-increase-keeper-timeout branch July 9, 2022 05:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants