Skip to content

Drop RESTART REPLICAS from stateless tests to avoid locking lots of mutexes#18897

Merged
alexey-milovidov merged 3 commits intoClickHouse:masterfrom
azat:stress-tests-no-restart-replicas
Jan 11, 2021
Merged

Drop RESTART REPLICAS from stateless tests to avoid locking lots of mutexes#18897
alexey-milovidov merged 3 commits intoClickHouse:masterfrom
azat:stress-tests-no-restart-replicas

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Jan 10, 2021

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Drop RESTART REPLICAS from stateless tests to avoid locking lots of mutexes

Under TSan you can lock only not more then 64 mutexes from one thread at
once 1 2, while RESTART REPLICAS can acquire more (it depends on the
number of replicated tables).

And since stress tests run tests in parallel, you can have more then 64
ReplicatedMergeTree tables at once (even though it is unlikely).

Fix this by using RESTART REPLICA table over RESTART REPLICAS.

And also add a smoke test for SYSTEM RESTART REPLICAS

azat added 3 commits January 10, 2021 12:39
…utexes

Under TSan you can lock only not more then 64 mutexes from one thread at
once [1] [2], while RESTART REPLICAS can acquire more (it depends on the
number of replicated tables).

  [1]: google/sanitizers#950 (comment)
  [2]: https://github.com/llvm/llvm-project/blob/b02eab9058e58782fca32dd8b1e53c27ed93f866/compiler-rt/lib/sanitizer_common/sanitizer_deadlock_detector.h#L67

And since stress tests run tests in parallel, you can have more then 64
ReplicatedMergeTree tables at once (even though it is unlikely).

Fix this by using RESTART REPLICA table over RESTART REPLICAS.
@robot-clickhouse robot-clickhouse added the pr-not-for-changelog This PR should not be mentioned in the changelog label Jan 10, 2021
@alexey-milovidov alexey-milovidov self-assigned this Jan 10, 2021
@azat azat changed the title Stress tests no restart replicas Drop RESTART REPLICAS from stateless tests to avoid locking lots of mutexes Jan 10, 2021
@alexey-milovidov
Copy link
Copy Markdown
Member

test_testkeeper_back_to_back

@alesapin What's wrong?

@alexey-milovidov alexey-milovidov added the st-need-info We need extra data to continue (waiting for response). Either some details or a repro of the issue. label Jan 11, 2021
@alesapin
Copy link
Copy Markdown
Member

test_testkeeper_back_to_back

@alesapin What's wrong?

Suspicious.

@alesapin
Copy link
Copy Markdown
Member

alesapin commented Jan 11, 2021

2021.01.10 14:10:29.190107 [ 8 ] {} <Information> Application: Listening for connections to fake zookeeper (tcp): 0.0.0.0:9181
2021.01.10 14:10:29.300407 [ 10 ] {} <Trace> TestKeeperTCPHandlerFactory: Test keeper request. Address: 172.18.0.1:59556
2021.01.10 14:10:29.301208 [ 10 ] {} <Debug> TestKeeperTCPHandler: Received close request for session #0
2021.01.10 14:10:29.301304 [ 10 ] {} <Debug> TestKeeperTCPHandler: Sent close for session #0
2021.01.10 14:10:29.302137 [ 10 ] {} <Trace> TestKeeperTCPHandlerFactory: Test keeper request. Address: 172.18.0.1:59558
2021.01.10 14:10:29.362069 [ 11 ] {} <Trace> TCPHandlerFactory: TCP Request. Address: 172.18.0.1:57028
<...>
2021.01.10 14:10:29.362237 [ 11 ] {} <Information> TCPHandler: Client has not sent any data.
2021.01.10 14:10:29.362250 [ 11 ] {} <Debug> TCPHandler: Done processing connection.
2021.01.10 14:10:59.303377 [ 10 ] {} <Debug> TestKeeperTCPHandler: Session #1 expired

???

@alesapin
Copy link
Copy Markdown
Member

These changes useful and don't related to the failure. Will investigate separately.

@alexey-milovidov alexey-milovidov merged commit f847322 into ClickHouse:master Jan 11, 2021
@azat azat deleted the stress-tests-no-restart-replicas branch January 11, 2021 21:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

no-docs-needed pr-not-for-changelog This PR should not be mentioned in the changelog st-need-info We need extra data to continue (waiting for response). Either some details or a repro of the issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants