Skip to content

Fix 02050_client_profile_events flakiness#32493

Merged
alesapin merged 6 commits intoClickHouse:masterfrom
azat:fix-02050_client_profile_events
Dec 20, 2021
Merged

Fix 02050_client_profile_events flakiness#32493
alesapin merged 6 commits intoClickHouse:masterfrom
azat:fix-02050_client_profile_events

Conversation

@azat
Copy link
Copy Markdown
Member

@azat azat commented Dec 10, 2021

Changelog:

  • Do not loose ProfileEvents in case of thread destroyed before
  • Merge ProfileEvents in case they were not printed

CI: https://s3.amazonaws.com/clickhouse-test-reports/32303/24751e7d45d94541be854c86ce46d65c2e0f66da/stateless_tests__thread__actions_.html

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Cc: @novikd

@robot-clickhouse robot-clickhouse added the pr-not-for-changelog This PR should not be mentioned in the changelog label Dec 10, 2021
@azat azat marked this pull request as draft December 10, 2021 11:33
@azat azat force-pushed the fix-02050_client_profile_events branch 2 times, most recently from 2dd7872 to e853b88 Compare December 13, 2021 18:06
@azat azat marked this pull request as ready for review December 13, 2021 18:08
@novikd novikd self-assigned this Dec 13, 2021
@azat
Copy link
Copy Markdown
Member Author

azat commented Dec 14, 2021

BuilderDebUBsan

Some internal github actions issue:

  Error: fatal: Unable to create '/home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/.git/modules/contrib/aws/index.lock': File exists.
...
  Unable to checkout '00b03604543367d7e310cb0993973fdcb723ea79' in submodule path 'contrib/aws'

Will do a rebase.

@azat azat force-pushed the fix-02050_client_profile_events branch from e853b88 to f6971ff Compare December 14, 2021 07:17
@azat azat force-pushed the fix-02050_client_profile_events branch from f6971ff to 3da2e1d Compare December 14, 2021 08:26
@azat
Copy link
Copy Markdown
Member Author

azat commented Dec 16, 2021

Integration tests (asan, actions) [2/3] — fail: 1, passed: 543, flaky: 0

Integration tests (release, actions) [1/2] — fail: 1, passed: 746, flaky: 0

Integration tests (thread, actions) [1/4] — fail: 101, passed: 1, flaky: 0

compose.cli.verbose_proxy.proxy_callable: docker pull <- ('clickhouse/integration-test', tag='latest', stream=True, platform=None)
compose.cli.errors.log_api_error: Get "https://registry-1.docker.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Integration tests (thread, actions) [2/4] — fail: 1, passed: 459, flaky: 2

  • test_replicated_fetches_timeouts/test.py::test_no_stall

Stress test (debug, actions) — Killed by signal (in clickhouse-server.log)

Stress test (thread, actions) — Killed by signal (in clickhouse-server.log)

==516==ERROR: ThreadSanitizer failed to allocate 0x12000 (73728) bytes of ThreadSignalContext (error code: 12)
ERROR: Failed to mmap

Stress test (undefined, actions) — check_status.tsv doesn't exists

Some internal issue?

But let's run CI one more time (just in case).

@azat azat force-pushed the fix-02050_client_profile_events branch from 3da2e1d to cf14e4a Compare December 16, 2021 05:07
@azat
Copy link
Copy Markdown
Member Author

azat commented Dec 16, 2021

Now better:

AST fuzzer (debug, actions) — Task failed: $?=210

Hm, some issue in getaddrinfo

Details
2021.12.16 08:45:41.274358 [ 230 ] {} <Trace> BaseDaemon: Received signal 5
2021.12.16 08:45:41.274778 [ 234 ] {} <Fatal> BaseDaemon: ########################################
2021.12.16 08:45:41.274872 [ 234 ] {} <Fatal> BaseDaemon: (version 21.13.1.538, build id: 2D910F2D2502C654) (from thread 99) (no query) Received signal Trace/breakpoint trap (5)
2021.12.16 08:45:41.274933 [ 234 ] {} <Fatal> BaseDaemon: 
2021.12.16 08:45:41.275024 [ 234 ] {} <Fatal> BaseDaemon: Stack trace: 0x7ff1a47f108f 0x7ff1a46f58b8 0x7ff1a47f05fa 0x7ff1a46f4861 0x7ff1a46f58b8 0x7ff1a46f5983 0x7ff1a46f4995 0x7ff1a46d862c 0x7ff1a46d8ed9 0x7ff1a46994f5 0x7ff1a469b0d9 0x27d23d34 0x27d24f16 0x27cf0203 0x27cf0158 0x157e1982 0x27d79645 0x157d8956 0x27d9a2d3 0x157d67e7 0x156f4e65 0x7ff1a45b90b3 0x156f4a6e
2021.12.16 08:45:41.275106 [ 234 ] {} <Fatal> BaseDaemon: 4. ? @ 0x7ff1a47f108f in ?
2021.12.16 08:45:41.275153 [ 234 ] {} <Fatal> BaseDaemon: 5. _dl_catch_exception @ 0x7ff1a46f58b8 in ?
2021.12.16 08:45:41.275189 [ 234 ] {} <Fatal> BaseDaemon: 6. ? @ 0x7ff1a47f05fa in ?
2021.12.16 08:45:41.275226 [ 234 ] {} <Fatal> BaseDaemon: 7. ? @ 0x7ff1a46f4861 in ?
2021.12.16 08:45:41.275264 [ 234 ] {} <Fatal> BaseDaemon: 8. _dl_catch_exception @ 0x7ff1a46f58b8 in ?
2021.12.16 08:45:41.275302 [ 234 ] {} <Fatal> BaseDaemon: 9. _dl_catch_error @ 0x7ff1a46f5983 in ?
2021.12.16 08:45:41.275340 [ 234 ] {} <Fatal> BaseDaemon: 10. __libc_dlopen_mode @ 0x7ff1a46f4995 in ?
2021.12.16 08:45:41.275395 [ 234 ] {} <Fatal> BaseDaemon: 11. ? @ 0x7ff1a46d862c in ?
2021.12.16 08:45:41.275433 [ 234 ] {} <Fatal> BaseDaemon: 12. __nss_lookup_function @ 0x7ff1a46d8ed9 in ?
2021.12.16 08:45:41.275470 [ 234 ] {} <Fatal> BaseDaemon: 13. ? @ 0x7ff1a46994f5 in ?
2021.12.16 08:45:41.275507 [ 234 ] {} <Fatal> BaseDaemon: 14. getaddrinfo @ 0x7ff1a469b0d9 in ?
2021.12.16 08:45:41.362055 [ 234 ] {} <Fatal> BaseDaemon: 15. ./obj-x86_64-linux-gnu/../contrib/poco/Net/src/DNS.cpp:79: Poco::Net::DNS::hostByName(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned int) @ 0x27d23d34 in /workspace/clickhouse
2021.12.16 08:45:41.445293 [ 234 ] {} <Fatal> BaseDaemon: 16. ./obj-x86_64-linux-gnu/../contrib/poco/Net/src/DNS.cpp:194: Poco::Net::DNS::thisHost() @ 0x27d24f16 in /workspace/clickhouse
2021.12.16 08:45:41.517369 [ 234 ] {} <Fatal> BaseDaemon: 17. ./obj-x86_64-linux-gnu/../base/base/getFQDNOrHostName.cpp:11: (anonymous namespace)::getFQDNOrHostNameImpl() @ 0x27cf0203 in /workspace/clickhouse
2021.12.16 08:45:41.578008 [ 234 ] {} <Fatal> BaseDaemon: 18. ./obj-x86_64-linux-gnu/../base/base/getFQDNOrHostName.cpp:23: getFQDNOrHostName() @ 0x27cf0158 in /workspace/clickhouse
2021.12.16 08:45:41.670984 [ 234 ] {} <Fatal> BaseDaemon: 19. ./obj-x86_64-linux-gnu/../programs/server/Server.cpp:769: DB::Server::main(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) @ 0x157e1982 in /workspace/clickhouse
2021.12.16 08:45:41.763793 [ 234 ] {} <Fatal> BaseDaemon: 20. ./obj-x86_64-linux-gnu/../contrib/poco/Util/src/Application.cpp:334: Poco::Util::Application::run() @ 0x27d79645 in /workspace/clickhouse
2021.12.16 08:45:41.900908 [ 234 ] {} <Fatal> BaseDaemon: 21. ./obj-x86_64-linux-gnu/../programs/server/Server.cpp:401: DB::Server::run() @ 0x157d8956 in /workspace/clickhouse
2021.12.16 08:45:41.978903 [ 234 ] {} <Fatal> BaseDaemon: 22. ./obj-x86_64-linux-gnu/../contrib/poco/Util/src/ServerApplication.cpp:611: Poco::Util::ServerApplication::run(int, char**) @ 0x27d9a2d3 in /workspace/clickhouse
2021.12.16 08:45:42.115231 [ 234 ] {} <Fatal> BaseDaemon: 23. ./obj-x86_64-linux-gnu/../programs/server/Server.cpp:179: mainEntryClickHouseServer(int, char**) @ 0x157d67e7 in /workspace/clickhouse
2021.12.16 08:45:42.132032 [ 234 ] {} <Fatal> BaseDaemon: 24. ./obj-x86_64-linux-gnu/../programs/main.cpp:378: main @ 0x156f4e65 in /workspace/clickhouse
2021.12.16 08:45:42.132085 [ 234 ] {} <Fatal> BaseDaemon: 25. __libc_start_main @ 0x7ff1a45b90b3 in ?
2021.12.16 08:45:46.044067 [ 234 ] {} <Fatal> BaseDaemon: 26. _start @ 0x156f4a6e in /workspace/clickhouse
2021.12.16 08:45:47.156760 [ 234 ] {} <Fatal> BaseDaemon: Calculated checksum of the binary: 27B40AB4F18BD9CC69E2165982CA0BF4. There is no information about the reference checksum.
2021.12.16 08:45:47.156830 [ 234 ] {} <Information> SentryWriter: Not sending crash report

Integration tests (asan, actions) [2/3] — fail: 1, passed: 546, flaky: 0

Integration tests (asan, actions) [3/3] — fail: 1, passed: 626, flaky: 1

  • test_s3_zero_copy_replication/test.py::test_s3_zero_copy_replication[s3]

Integration tests (release, actions) [2/2] — fail: 1, passed: 929, flaky: 1

  • test_merge_tree_blob_storage/test.py::test_freeze_unfreeze
E           helpers.client.QueryRuntimeException: Client failed! Return code: 233, stderr: Received exception from server (version 21.13.1):
E           Code: 1001. DB::Exception: Received from 172.16.12.3:9000. DB::Exception: Azure::Storage::StorageException: The specified blob does not exist.

@azat azat requested a review from novikd December 17, 2021 19:23
azat added 6 commits December 19, 2021 23:43
…f failure

Sometimes [1] the test fails like this:

    2021-12-10 03:41:19 --- /usr/share/clickhouse-test/queries/0_stateless/02050_client_profile_events.reference	2021-12-10 03:23:43.000000000 -0500
    2021-12-10 03:41:19 +++ /tmp/clickhouse-test/0_stateless/02050_client_profile_events.617.stdout	2021-12-10 03:41:19.509611205 -0500
    2021-12-10 03:41:19 @@ -1,4 +1,3 @@
    2021-12-10 03:41:19  0
    2021-12-10 03:41:19 -SelectedRows: 131010 (increment)
    2021-12-10 03:41:19  OK
    2021-12-10 03:41:19  OK
    2021-12-10 03:41:19
    2021-12-10 03:41:19
    2021-12-10 03:41:19 Database: test_73d5o0

  [1]: https://s3.amazonaws.com/clickhouse-test-reports/32493/703213a6444f8014e3324df4b6e44d03fa351294/stateless_tests_flaky_check__address__actions_.html

And I did not find anything strange in server logs (and there was 0
exceptions).
v2: drop std::move and add copy ctor for ProfileEvents::Counter::Snapshot
v2: remove std::move
That way with --profile-events-delay-ms=-1 you will always get totals.

Plus, this will fix periodic failures, that can be reproduced by
limitting CPU (5% is enough in my setup), i.e.:

    $ systemd-run --collect --unit ch -p CPUQuota=5% --user clickhouse-server
    $ while clickhouse-client --print-profile-events --profile-events-delay-ms=-1 -q 'select * from numbers (1e5) format Null' |& tee /dev/stderr | fgrep 'SelectedRows: 131010 (increment)'; do :; done

And as a bonus it will make 02050_client_profile_events deterministic.
@azat azat force-pushed the fix-02050_client_profile_events branch from cf14e4a to 1d25ec3 Compare December 19, 2021 20:43
@alesapin alesapin merged commit 5958cac into ClickHouse:master Dec 20, 2021
@azat azat deleted the fix-02050_client_profile_events branch December 20, 2021 18:17
@alexey-milovidov
Copy link
Copy Markdown
Member

@azat This pull request slows down short queries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants