tests: enable back 02232_dist_insert_send_logs_level_hung #39124

Merged
tavplubix merged 1 commit into ClickHouse:master from azat:fix-02232_dist_insert_send_logs_level_hung on Aug 1, 2022

Conversation

@azat
Member

@azat azat commented Jul 12, 2022

The original issue was that log_comment was wrong, and this test relies on a
correct log_comment; however, that had already been fixed in #37932.

Fixes: #38219 (@tavplubix)

P.S. @tavplubix thanks for the detailed reports

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

@robot-ch-test-poll robot-ch-test-poll added the pr-not-for-changelog This PR should not be mentioned in the changelog label Jul 12, 2022
@tavplubix tavplubix self-assigned this Jul 12, 2022
@tavplubix
Member

Seems like it was not only about log_comment; you can find a hung query in the report (elapsed 873s):

{
        "is_initial_query": 1,
        "user": "default",
        "query_id": "1276909a-129d-4b58-a04b-8a1aa5a64706",
        "address": "::1",
        "port": 38790,
        "initial_user": "default",
        "initial_query_id": "1276909a-129d-4b58-a04b-8a1aa5a64706",
        "initial_address": "::1",
        "initial_port": 38790,
        "interface": 1,
        "os_user": "",
        "client_hostname": "2ad5976f619f",
        "client_name": "ClickHouse",
        "client_revision": "54455",
        "client_version_major": "22",
        "client_version_minor": "7",
        "client_version_patch": "1",
        "http_method": 0,
        "http_user_agent": "",
        "http_referer": "",
        "forwarded_for": "",
        "quota_key": "",
        "distributed_depth": "0",
        "elapsed": 873.936196012,
        "is_cancelled": 0,
        "is_all_data_sent": 0,
        "read_rows": "167690",
        "read_bytes": "1341520",
        "total_rows_approx": "1000000",
        "written_rows": "167689",
        "written_bytes": "670756",
        "memory_usage": "6487077",
        "peak_memory_usage": "6786269",
        "query": "insert into function remote('127.2', currentDatabase(), in_02232) select * from numbers(1e6)",
        "thread_ids": [
            "2137",
            "42120",
            "13006"
        ],

So probably something's wrong with the server.

@azat
Member Author

azat commented Jul 12, 2022

Not really, log_comment is indeed wrong here too:

           "log_comment": "'02232_dist_insert_send_logs_level_hung.sh'",

Note the extra single quotes.
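A minimal sketch (hypothetical variable names, not the test's actual code) of how extra single quotes like this can sneak in: the value gets quoted once when it is built and again when it is interpolated into the SETTINGS clause, so the server stores the quotes as part of the log_comment value itself.

```shell
#!/usr/bin/env bash
# Sketch, assuming the double-quoting pattern described above.

test_file="02232_dist_insert_send_logs_level_hung.sh"

# Buggy: the value already carries quotes before interpolation
buggy_comment="'${test_file}'"

# Fixed: keep the bare value; the SETTINGS clause adds the only quotes
fixed_comment="${test_file}"

buggy_sql="SELECT 1 SETTINGS log_comment = '${buggy_comment}'"
fixed_sql="SELECT 1 SETTINGS log_comment = '${fixed_comment}'"

echo "buggy: ${buggy_sql}"
echo "fixed: ${fixed_sql}"
```

With the buggy variant, the stored log_comment is `'02232_dist_insert_send_logs_level_hung.sh'` (quotes included), which matches what the report above shows.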

@tavplubix
Member

Hm, I got it, so it's because KILL QUERY used to fail due to invalid quoting, and the query might hang for a long time because the timeouts are set to huge values (86400 seconds). But it seems this will not work in Stress Tests, because any query may fail at any point (for example, because of memory_tracker_fault_probability). So tests that set timeouts to huge values and rely on KILL QUERY will make the Hung Check in Stress Tests flaky.
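The failure mode described here can be sketched as follows (assumed query shape, not the test's exact code): the cleanup step kills the INSERT by matching log_comment, but with the broken quoting the stored value includes the quotes, so the equality never matches and the query runs until its 86400-second timeouts expire.

```shell
#!/usr/bin/env bash
# Hedged sketch of the mismatch between the stored and the expected
# log_comment value.

comment_in_processes="'02232_dist_insert_send_logs_level_hung.sh'"  # broken stored value
comment_to_match="02232_dist_insert_send_logs_level_hung.sh"        # what the KILL looks for

kill_sql="KILL QUERY WHERE log_comment = '${comment_to_match}' SYNC"
echo "${kill_sql}"

# The comparison that left the query hanging:
if [ "${comment_in_processes}" = "${comment_to_match}" ]; then
  echo "match: query would be killed"
else
  echo "no match: query keeps running"
fi
```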

@azat
Member Author

azat commented Jul 12, 2022

> Hm, I got it, so it's because KILL QUERY used to fail due to invalid quoting, and the query might hang for a long time because the timeouts are set to huge values (86400 seconds).

Yes

> But it seems this will not work in Stress Tests, because any query may fail at any point (for example, because of memory_tracker_fault_probability). So tests that set timeouts to huge values and rely on KILL QUERY will make the Hung Check in Stress Tests flaky.

Indeed. I thought about disabling it for stress tests, but put that on hold because of #38750.

@azat azat marked this pull request as draft July 13, 2022 06:14
The original issue was that log_comment was wrong, and this test relies
on correct log_comment, but this had been fixed already in ClickHouse#37932.

Also adjust the timeout to avoid possible query hung check failures.

Signed-off-by: Azat Khuzhin <[email protected]>
@azat azat force-pushed the fix-02232_dist_insert_send_logs_level_hung branch from 5885877 to aff8c12 on July 31, 2022
@azat azat marked this pull request as ready for review July 31, 2022 11:27
@azat
Member Author

azat commented Jul 31, 2022

@tavplubix I've adjusted the timeout for the INSERT; this should be enough. Let's see if it helps.
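A sketch of the kind of adjustment described here (setting names send_timeout / receive_timeout are real ClickHouse client settings, but the exact values and invocation below are illustrative, not the test's actual code): bounding the INSERT's timeouts means that even if the KILL QUERY cleanup fails, the query errors out quickly instead of sitting in the hung check for ~874 seconds.

```shell
#!/usr/bin/env bash
# Illustrative sketch of bounding the query timeouts.

old_timeout=86400   # the "huge value" from the discussion: effectively never
new_timeout=60      # illustrative bounded value

client_args="--send_timeout=${new_timeout} --receive_timeout=${new_timeout}"
echo "clickhouse-client ${client_args} -q \"insert into function remote('127.2', currentDatabase(), in_02232) select * from numbers(1e6)\""
```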

@tavplubix
Member

Integration tests (asan) [2/3] - ERROR: for zoo3 Get "https://registry-1.docker.io/v2/": net/http: request canceled while waiting for connection (why didn't it pull the image from our cache?), cc: @Felixoid
Stateless tests (ubsan) - 02317_distinct_in_order_optimization - #39688 (comment)

@tavplubix tavplubix merged commit 3bb0603 into ClickHouse:master Aug 1, 2022
@Felixoid
Member

Felixoid commented Aug 1, 2022

@tavplubix
Member

Something went wrong:

2022-08-01 16:21:06 02232_dist_insert_send_logs_level_hung:                                 [ FAIL ] 44.73 sec. - having stderror: 
2022-08-01 16:21:06 Received exception from server (version 22.8.1):
2022-08-01 16:21:06 Code: 163. DB::Exception: Received from localhost:9000. DB::Exception: Received from 127.2:9000. DB::Exception: Query analyze overflow. Try to increase `max_pipeline_depth` or simplify the query: while pushing to view test_epp1sq.mv_02232 (25f6c5b7-3c06-4cb9-b957-fbeba1b5cc50). (TOO_DEEP_PIPELINE)
2022-08-01 16:21:06 (query: insert into function remote('127.2', currentDatabase(), in_02232) select * from numbers(1e6))
2022-08-01 16:21:06 
2022-08-01 16:21:06 stdout:
2022-08-01 16:21:06 0
2022-08-01 16:21:06 
2022-08-01 16:21:06 
2022-08-01 16:21:06 Settings used in the test: --max_insert_threads=16 --group_by_two_level_threshold=100000 --group_by_two_level_threshold_bytes=1152921504606846976 --distributed_aggregation_memory_efficient=1 --fsync_metadata=1 --output_format_parallel_formatting=0 --input_format_parallel_parsing=0 --min_chunk_bytes_for_parallel_parsing=20967610 --max_read_buffer_size=705524 --prefer_localhost_replica=0 --max_block_size=29388 --max_threads=28 --optimize_or_like_chain=0 --optimize_read_in_order=1 --read_in_order_two_level_merge_threshold=22 --optimize_aggregation_in_order=1 --aggregation_in_order_max_block_bytes=23690269 --use_uncompressed_cache=1 --min_bytes_to_use_direct_io=0 --min_bytes_to_use_mmap_io=0 --local_filesystem_read_method=pread --remote_filesystem_read_method=threadpool --local_filesystem_read_prefetch=0 --remote_filesystem_read_prefetch=0 --compile_expressions=1 --compile_aggregate_expressions=0 --compile_sort_description=1 --merge_tree_coarse_index_granularity=23 --optimize_distinct_in_order=1
2022-08-01 16:21:06 

@tavplubix
Member

It was a kind of race condition with #38185.
But I do not understand why this test generates a pipeline that is too deep: the queries are quite simple and should not exceed the default max_pipeline_depth limit (10000). Probably the analyze count is calculated incorrectly. @vdimir, please take a look

@azat azat deleted the fix-02232_dist_insert_send_logs_level_hung branch August 1, 2022 16:04


Development

Successfully merging this pull request may close these issues.

Hung check failed on insertion into Distributed

4 participants