Add stress test for distributed queries #21944

Merged
tavplubix merged 1 commit into ClickHouse:master from azat:dist-stress-test
Mar 31, 2021

@azat
Member

@azat azat commented Mar 20, 2021

  • It may find issues like the ones found by the stress test
  • Right now it finds some hangs (due to endless Progress packets)
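
As a rough illustration of how a test can surface hangs like these, here is a minimal sketch. It is not the code added by this PR: `stress`, `run_query`, and `fake_query` are hypothetical names, and `run_query` stands in for whatever client the real test uses.

```python
import concurrent.futures
import time


def stress(run_query, queries, workers=4, timeout_s=1.0):
    """Run queries concurrently and report which ones hung or failed.

    run_query is a hypothetical callable (query text -> result).
    timeout_s is one shared deadline for the whole batch, so queries still
    waiting for a free worker at the deadline are reported as hung too.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=workers)
    futures = {pool.submit(run_query, q): q for q in queries}
    done, not_done = concurrent.futures.wait(futures, timeout=timeout_s)
    # Do not block on stuck workers here; just report them.
    pool.shutdown(wait=False)
    hung = sorted(futures[f] for f in not_done)
    failed = sorted(futures[f] for f in done if f.exception() is not None)
    return hung, failed


def fake_query(query):
    """Stand-in for a real client call, used only for this demonstration."""
    if query == "slow":
        time.sleep(1.0)  # simulated hang, bounded so the demo can exit
    if query == "bad":
        raise RuntimeError("simulated failure")
    return query


if __name__ == "__main__":
    hung, failed = stress(fake_query, ["ok", "slow", "bad"], workers=3, timeout_s=0.2)
    print("hung:", hung)
    print("failed:", failed)
```

A real test would replace `fake_query` with an actual query against the cluster and would likely need a way to kill stuck queries, since a Python thread cannot be interrupted from outside.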

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

@robot-clickhouse robot-clickhouse added the pr-not-for-changelog label (This PR should not be mentioned in the changelog) Mar 20, 2021
@azat azat marked this pull request as draft March 20, 2021 19:42
@azat azat force-pushed the dist-stress-test branch 4 times, most recently from 7d56542 to 3fba17c Compare March 21, 2021 07:25
@tavplubix
Member

Btw, we are going to add a check that runs functional tests on a cluster of three nodes (two shards, two replicas in the first shard), so it will be possible to write simple .sql and .sh functional tests for distributed queries. #21690

@azat
Member Author

azat commented Mar 22, 2021

> Btw, we are going to add a check that runs functional tests on a cluster of three nodes (two shards, two replicas in the first shard), so it will be possible to write simple .sql and .sh functional tests for distributed queries. #21690

@tavplubix this looks very useful for stateless tests; however, it still differs:

  • it does not allow changing global settings
  • the interaction is done via localhost, which is too fast for reproducing some issues

It may find issues like the one in [1]:

    2021.03.18 19:05:38.783328 [ 245 ] {4b1f5ec0-bf2d-478c-a2e1-d312531db206} <Debug> executeQuery: (from 127.0.0.1:40918, using production parser) select * from dist where key = 0;
    2021.03.18 19:05:38.783760 [ 245 ] {4b1f5ec0-bf2d-478c-a2e1-d312531db206} <Debug> StorageDistributed (dist): Skipping irrelevant shards - the query will be sent to the following shards of the cluster (shard numbers): [1]
    2021.03.18 19:05:38.784012 [ 245 ] {4b1f5ec0-bf2d-478c-a2e1-d312531db206} <Trace> ContextAccess (default): Access granted: SELECT(key) ON default.dist
    2021.03.18 19:05:38.784410 [ 245 ] {4b1f5ec0-bf2d-478c-a2e1-d312531db206} <Trace> ContextAccess (default): Access granted: SELECT(key) ON default.dist
    2021.03.18 19:05:38.784488 [ 245 ] {4b1f5ec0-bf2d-478c-a2e1-d312531db206} <Trace> StorageDistributed (dist): Disabling force_optimize_skip_unused_shards for nested queries (force_optimize_skip_unused_shards_nesting exceeded)
    2021.03.18 19:05:38.784572 [ 245 ] {4b1f5ec0-bf2d-478c-a2e1-d312531db206} <Trace> InterpreterSelectQuery: Complete -> Complete
    2021.03.18 19:05:38.819063 [ 245 ] {4b1f5ec0-bf2d-478c-a2e1-d312531db206} <Information> executeQuery: Read 20 rows, 80.00 B in 0.035687783 sec., 560 rows/sec., 2.19 KiB/sec.
    2021.03.18 19:05:38.827842 [ 245 ] {4b1f5ec0-bf2d-478c-a2e1-d312531db206} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.

    2021.03.18 19:05:38.867752 [ 547 ] {} <Fatal> BaseDaemon: ########################################
    2021.03.18 19:05:38.867959 [ 547 ] {} <Fatal> BaseDaemon: (version 21.4.1.1, build id: A0ADEC175BD65E58EA012C47C265E661C32D23B5) (from thread 245) (query_id: 4b1f5ec0-bf2d-478c-a2e1-d312531db206) Received signal Aborted (6)
    2021.03.18 19:05:38.868733 [ 547 ] {} <Fatal> BaseDaemon:
    2021.03.18 19:05:38.868958 [ 547 ] {} <Fatal> BaseDaemon: Stack trace: 0x7fd1394be18b 0x7fd13949d859 0x10c4c99b 0xd434ee1 0xd434f1a
    2021.03.18 19:05:38.870135 [ 547 ] {} <Fatal> BaseDaemon: 3. gsignal @ 0x4618b in /usr/lib/x86_64-linux-gnu/libc-2.31.so
    2021.03.18 19:05:38.870383 [ 547 ] {} <Fatal> BaseDaemon: 4. abort @ 0x25859 in /usr/lib/x86_64-linux-gnu/libc-2.31.so
    2021.03.18 19:05:38.886783 [ 547 ] {} <Fatal> BaseDaemon: 5. /work3/azat/ch/clickhouse/.cmake/../contrib/libunwind/src/UnwindLevel1.c:396: _Unwind_Resume @ 0x10c4c99b in /usr/bin/clickhouse
    2021.03.18 19:05:47.200208 [ 547 ] {} <Fatal> BaseDaemon: 6. ? @ 0xd434ee1 in /usr/bin/clickhouse
    2021.03.18 19:05:47.348738 [ 547 ] {} <Fatal> BaseDaemon: 7.1. inlined from /work3/azat/ch/clickhouse/.cmake/../contrib/boost/boost/context/fiber_fcontext.hpp:253: boost::context::fiber::~fiber()
    2021.03.18 19:05:47.349118 [ 547 ] {} <Fatal> BaseDaemon: 7.2. inlined from ../contrib/boost/boost/context/fiber_fcontext.hpp:252: boost::context::detail::fiber_record<boost::context::fiber, FiberStack&, DB::RemoteQueryExecutorRoutine>::run(void*)
    2021.03.18 19:05:47.349163 [ 547 ] {} <Fatal> BaseDaemon: 7. ../contrib/boost/boost/context/fiber_fcontext.hpp:80: void boost::context::detail::fiber_entry<boost::context::detail::fiber_record<boost::context::fiber, FiberStack&, DB::RemoteQueryExecutorRoutine> >(boost::context::detail::transfer_t) @ 0xd434f1a in /usr/bin/clickhouse
    2021.03.18 19:05:47.618174 [ 547 ] {} <Fatal> BaseDaemon: Calculated checksum of the binary: FF3BA83D0CD648741EEEC242CB1966D9. There is no information about the reference checksum.

  [1]: https://clickhouse-test-reports.s3.yandex.net/0/1b2ed51ff5e4a3dc45567d4967108f43f680c884/stress_test_(debug).html#fail1
@azat azat force-pushed the dist-stress-test branch from 3fba17c to 9db74c4 Compare March 25, 2021 22:05
@azat azat marked this pull request as ready for review March 25, 2021 22:05
@azat
Member Author

azat commented Mar 26, 2021

AST fuzzer (TSan) — Lost connection to server. See the logs.

2021.03.26 03:28:23.310623 [ 38 ] {} <Fatal> Application: Child process was terminated by signal 9 (KILL). If it is not done by 'forcestop' command or manually, the possible cause is OOM Killer (see 'dmesg' and look at the '/var/log/kern.log' for the details).

@tavplubix tavplubix self-assigned this Mar 31, 2021
@tavplubix tavplubix merged commit 5fa2244 into ClickHouse:master Mar 31, 2021
@azat azat deleted the dist-stress-test branch April 1, 2021 04:55