Skip to content

00409_shard_limit_by flakiness (likely something wrong with distributed queries processing) #66244

@azat

Description

@azat

The initial query never finished, but there is nothing specific for this query:

SELECT dummy FROM remote('127.0.0.{2,3}', system.one) LIMIT 1 BY dummy

So I think it is some kind of possible issue for distributed queries.

$ zstd -cdq server.log.zst | grep  -e 00409_shard_limit_by -e 61c71f3d-94a5-4010-b665-5950ec5129a3 -e c9f30e37-30cd-4938-8198-e816f82d82e8 -e 77bd16d7-d37d-4693-8ec1-533d3ec75b4d -a
2024.07.08 12:24:31.738908 [ 181084 ] {} <Trace> DynamicQueryHandler: Request URI: /?query=SELECT+%27Running+test+stateless%2F00409_shard_limit_by.sql+from+pid%3D163343%27&database=system&connect_timeout=30&receive_timeout=30&send_timeout=30&http_connection_timeout=30&http_receive_timeout=30&http_send_timeout=30&output_format_parallel_formatting=0
2024.07.08 12:24:31.739258 [ 181084 ] {35f44159-188f-4ad9-93e0-96451c3a5613} <Debug> executeQuery: (from [::ffff:127.0.0.1]:47516) SELECT 'Running test stateless/00409_shard_limit_by.sql from pid=163343'  (stage: Complete)
2024.07.08 12:24:31.741038 [ 181253 ] {} <Trace> DynamicQueryHandler: Request URI: /?query=CREATE+DATABASE+IF+NOT+EXISTS+test_lib8t25p&database=system&connect_timeout=30&receive_timeout=30&send_timeout=30&http_connection_timeout=30&http_receive_timeout=30&http_send_timeout=30&output_format_parallel_formatting=0&log_comment=00409_shard_limit_by.sql
2024.07.08 12:24:31.741597 [ 181253 ] {fd253261-d46e-4410-8ea2-37b8d264a13e} <Debug> executeQuery: (from [::ffff:127.0.0.1]:47534) (comment: 00409_shard_limit_by.sql) CREATE DATABASE IF NOT EXISTS test_lib8t25p  (stage: Complete)
2024.07.08 12:24:31.767243 [ 181259 ] {49a5d304-80ee-46a7-a20c-9bea980d7f0f} <Debug> executeQuery: (from [::ffff:127.0.0.1]:55606) (comment: 00409_shard_limit_by.sql) DROP TABLE IF EXISTS limit_by; (stage: Complete)
2024.07.08 12:24:31.768016 [ 181259 ] {5da3775d-9c12-441d-8b10-da8f5efca4ea} <Debug> executeQuery: (from [::ffff:127.0.0.1]:55606) (comment: 00409_shard_limit_by.sql) CREATE TABLE limit_by (Num UInt32, Name String) ENGINE = Memory; (stage: Complete)
2024.07.08 12:24:31.769035 [ 181259 ] {51f7cc0f-c078-4026-ae93-c8103726d761} <Debug> executeQuery: (from [::ffff:127.0.0.1]:55606) (comment: 00409_shard_limit_by.sql) INSERT INTO limit_by (Num, Name) VALUES  (stage: Complete)
2024.07.08 12:24:31.784401 [ 181259 ] {70c4c145-8bc8-493a-9cff-faefaef2ac1c} <Debug> executeQuery: (from [::ffff:127.0.0.1]:55606) (comment: 00409_shard_limit_by.sql) INSERT INTO limit_by (Num, Name) VALUES  (stage: Complete)
2024.07.08 12:24:31.785943 [ 181259 ] {90201a54-e9f6-487d-8750-13b1fe00a363} <Debug> executeQuery: (from [::ffff:127.0.0.1]:55606) (comment: 00409_shard_limit_by.sql) INSERT INTO limit_by (Num, Name) VALUES  (stage: Complete)
2024.07.08 12:24:31.787363 [ 181259 ] {611e7e1d-c922-44e7-bda3-fc0f0378af84} <Debug> executeQuery: (from [::ffff:127.0.0.1]:55606) (comment: 00409_shard_limit_by.sql) INSERT INTO limit_by (Num, Name) VALUES  (stage: Complete)
2024.07.08 12:24:31.788732 [ 181259 ] {a2e57ac2-b931-4563-9067-a09d56e6cf2f} <Debug> executeQuery: (from [::ffff:127.0.0.1]:55606) (comment: 00409_shard_limit_by.sql) INSERT INTO limit_by (Num, Name) VALUES  (stage: Complete)
2024.07.08 12:24:31.789982 [ 181259 ] {cc711016-2248-45dd-9c36-82195bede390} <Debug> executeQuery: (from [::ffff:127.0.0.1]:55606) (comment: 00409_shard_limit_by.sql) INSERT INTO limit_by (Num, Name) VALUES  (stage: Complete)
2024.07.08 12:24:31.791141 [ 181259 ] {936ae507-856b-4c87-aa31-9046997f3b10} <Debug> executeQuery: (from [::ffff:127.0.0.1]:55606) (comment: 00409_shard_limit_by.sql) INSERT INTO limit_by (Num, Name) VALUES  (stage: Complete)
2024.07.08 12:24:31.792282 [ 181259 ] {6bb79bf0-b433-4070-8b70-0bd93b1d628c} <Debug> executeQuery: (from [::ffff:127.0.0.1]:55606) (comment: 00409_shard_limit_by.sql) INSERT INTO limit_by (Num, Name) VALUES  (stage: Complete)
2024.07.08 12:24:31.793650 [ 181259 ] {e15a224b-e8de-47cf-9c6c-d9e6bd9cd012} <Debug> executeQuery: (from [::ffff:127.0.0.1]:55606) (comment: 00409_shard_limit_by.sql) INSERT INTO limit_by (Num, Name) VALUES  (stage: Complete)
2024.07.08 12:24:31.795200 [ 181259 ] {5320f407-3388-4936-ad39-4932846e5e1c} <Debug> executeQuery: (from [::ffff:127.0.0.1]:55606) (comment: 00409_shard_limit_by.sql) INSERT INTO limit_by (Num, Name) VALUES  (stage: Complete)
2024.07.08 12:24:31.796597 [ 181259 ] {f18b4c45-d099-47da-afd5-31a8008dfcb7} <Debug> executeQuery: (from [::ffff:127.0.0.1]:55606) (comment: 00409_shard_limit_by.sql) INSERT INTO limit_by (Num, Name) VALUES  (stage: Complete)
2024.07.08 12:24:31.798063 [ 181259 ] {e6ee239b-f82f-4365-93db-61a1088ad599} <Debug> executeQuery: (from [::ffff:127.0.0.1]:55606) (comment: 00409_shard_limit_by.sql) INSERT INTO limit_by (Num, Name) VALUES  (stage: Complete)
2024.07.08 12:24:31.799424 [ 181259 ] {2e6f179b-8a44-4eff-acc4-82d117683529} <Debug> executeQuery: (from [::ffff:127.0.0.1]:55606) (comment: 00409_shard_limit_by.sql) -- Two elemens in each group
2024.07.08 12:24:31.802982 [ 181259 ] {5d765fb4-8215-4d80-b6aa-b86a9df699ac} <Debug> executeQuery: (from [::ffff:127.0.0.1]:55606) (comment: 00409_shard_limit_by.sql) -- LIMIT BY doesn't affect result of GROUP BY
2024.07.08 12:24:31.806980 [ 181259 ] {a5f9242b-c969-42ee-85cb-0d3c8fe2d4fc} <Debug> executeQuery: (from [::ffff:127.0.0.1]:55606) (comment: 00409_shard_limit_by.sql) -- LIMIT BY can be combined with LIMIT
2024.07.08 12:24:31.810412 [ 181259 ] {61c71f3d-94a5-4010-b665-5950ec5129a3} <Debug> executeQuery: (from [::ffff:127.0.0.1]:55606) (comment: 00409_shard_limit_by.sql) -- Distributed LIMIT BY
2024.07.08 12:24:31.810730 [ 181259 ] {61c71f3d-94a5-4010-b665-5950ec5129a3} <Trace> Connection (127.0.0.2:9000): Connecting. Database: (not specified). User: default
2024.07.08 12:24:31.811716 [ 181259 ] {61c71f3d-94a5-4010-b665-5950ec5129a3} <Trace> Connection (127.0.0.2:9000): Connected to ClickHouse server version 24.7.1.
2024.07.08 12:24:31.812291 [ 181401 ] {c37f69ea-10db-4569-9b3e-452e5aba91fe} <Debug> executeQuery: (from [::ffff:127.0.0.1]:57954, initial_query_id: 61c71f3d-94a5-4010-b665-5950ec5129a3) (comment: 00409_shard_limit_by.sql) DESC TABLE system.one (stage: Complete)
2024.07.08 12:24:31.813337 [ 181259 ] {61c71f3d-94a5-4010-b665-5950ec5129a3} <Trace> Planner: Query to stage Complete
2024.07.08 12:24:31.813395 [ 181259 ] {61c71f3d-94a5-4010-b665-5950ec5129a3} <Trace> Planner: Query from stage FetchColumns to stage Complete
2024.07.08 12:24:31.813539 [ 181259 ] {61c71f3d-94a5-4010-b665-5950ec5129a3} <Trace> Planner: Query to stage Complete
2024.07.08 12:24:31.813632 [ 181259 ] {61c71f3d-94a5-4010-b665-5950ec5129a3} <Trace> Planner: Query to stage WithMergeableState only analyze
2024.07.08 12:24:31.813671 [ 181259 ] {61c71f3d-94a5-4010-b665-5950ec5129a3} <Trace> Planner: Query from stage FetchColumns to stage WithMergeableState only analyze
2024.07.08 12:24:31.813760 [ 181259 ] {61c71f3d-94a5-4010-b665-5950ec5129a3} <Trace> Planner: Query to stage WithMergeableState only analyze
2024.07.08 12:24:31.813796 [ 181259 ] {61c71f3d-94a5-4010-b665-5950ec5129a3} <Trace> Planner: Query from stage FetchColumns to stage WithMergeableState only analyze
2024.07.08 12:24:31.813881 [ 181259 ] {61c71f3d-94a5-4010-b665-5950ec5129a3} <Trace> Planner: Query to stage WithMergeableState only analyze
2024.07.08 12:24:31.813916 [ 181259 ] {61c71f3d-94a5-4010-b665-5950ec5129a3} <Trace> Planner: Query from stage FetchColumns to stage WithMergeableState only analyze
2024.07.08 12:24:31.813975 [ 181259 ] {61c71f3d-94a5-4010-b665-5950ec5129a3} <Trace> Planner: Query to stage WithMergeableState only analyze
2024.07.08 12:24:31.814006 [ 181259 ] {61c71f3d-94a5-4010-b665-5950ec5129a3} <Trace> Planner: Query from stage FetchColumns to stage WithMergeableState only analyze
2024.07.08 12:24:31.814043 [ 181259 ] {61c71f3d-94a5-4010-b665-5950ec5129a3} <Trace> Planner: Query from stage WithMergeableState to stage Complete
2024.07.08 12:24:31.814620 [ 172696 ] {61c71f3d-94a5-4010-b665-5950ec5129a3} <Trace> Connection (127.0.0.3:9000): Connecting. Database: (not specified). User: default
2024.07.08 12:24:31.815021 [ 172378 ] {61c71f3d-94a5-4010-b665-5950ec5129a3} <Debug> Connection (127.0.0.2:9000): Sent data for 2 scalars, total 2 rows in 4.296e-05 sec., 46040 rows/sec., 68.00 B (1.49 MiB/sec.), compressed 0.4594594594594595 times to 148.00 B (3.24 MiB/sec.)
2024.07.08 12:24:31.815207 [ 172696 ] {61c71f3d-94a5-4010-b665-5950ec5129a3} <Trace> Connection (127.0.0.3:9000): Connected to ClickHouse server version 24.7.1.
2024.07.08 12:24:31.815542 [ 181401 ] {77bd16d7-d37d-4693-8ec1-533d3ec75b4d} <Debug> executeQuery: (from [::ffff:127.0.0.1]:57954, initial_query_id: 61c71f3d-94a5-4010-b665-5950ec5129a3) (comment: 00409_shard_limit_by.sql) SELECT `__table1`.`dummy` AS `dummy` FROM `system`.`one` AS `__table1` LIMIT _CAST(1, 'UInt64') BY `__table1`.`dummy` (stage: WithMergeableState)
2024.07.08 12:24:31.815549 [ 172696 ] {61c71f3d-94a5-4010-b665-5950ec5129a3} <Debug> Connection (127.0.0.3:9000): Sent data for 2 scalars, total 2 rows in 5.044e-05 sec., 39238 rows/sec., 68.00 B (1.27 MiB/sec.), compressed 0.4594594594594595 times to 148.00 B (2.76 MiB/sec.)
2024.07.08 12:24:31.815861 [ 181045 ] {c9f30e37-30cd-4938-8198-e816f82d82e8} <Debug> executeQuery: (from [::ffff:127.0.0.1]:49382, initial_query_id: 61c71f3d-94a5-4010-b665-5950ec5129a3) (comment: 00409_shard_limit_by.sql) SELECT `__table1`.`dummy` AS `dummy` FROM `system`.`one` AS `__table1` LIMIT _CAST(1, 'UInt64') BY `__table1`.`dummy` (stage: WithMergeableState)
2024.07.08 12:24:31.815868 [ 181401 ] {77bd16d7-d37d-4693-8ec1-533d3ec75b4d} <Trace> Planner: Query to stage WithMergeableState
2024.07.08 12:24:31.815993 [ 181401 ] {77bd16d7-d37d-4693-8ec1-533d3ec75b4d} <Trace> Planner: Query from stage FetchColumns to stage WithMergeableState
2024.07.08 12:24:31.816035 [ 181045 ] {c9f30e37-30cd-4938-8198-e816f82d82e8} <Trace> Planner: Query to stage WithMergeableState
2024.07.08 12:24:31.816086 [ 181045 ] {c9f30e37-30cd-4938-8198-e816f82d82e8} <Trace> Planner: Query from stage FetchColumns to stage WithMergeableState
2024.07.08 12:24:31.816631 [ 181401 ] {77bd16d7-d37d-4693-8ec1-533d3ec75b4d} <Debug> executeQuery: Read 1 rows, 1.00 B in 0.001189 sec., 841.0428931875526 rows/sec., 841.04 B/sec.
2024.07.08 12:24:31.816680 [ 181045 ] {c9f30e37-30cd-4938-8198-e816f82d82e8} <Debug> executeQuery: Read 1 rows, 1.00 B in 0.000881 sec., 1135.0737797956867 rows/sec., 1.11 KiB/sec.
2024.07.08 12:24:31.816684 [ 181401 ] {77bd16d7-d37d-4693-8ec1-533d3ec75b4d} <Debug> MemoryTracker: Peak memory usage (for query): 1.06 MiB.
2024.07.08 12:24:31.816722 [ 181401 ] {77bd16d7-d37d-4693-8ec1-533d3ec75b4d} <Debug> TCPHandler: Processed in 0.00169519 sec.
2024.07.08 12:24:31.816728 [ 181045 ] {c9f30e37-30cd-4938-8198-e816f82d82e8} <Debug> MemoryTracker: Peak memory usage (for query): 1.06 MiB.
2024.07.08 12:24:31.816745 [ 181045 ] {c9f30e37-30cd-4938-8198-e816f82d82e8} <Debug> TCPHandler: Processed in 0.00118999 sec.
2024.07.08 12:34:31.854203 [ 181431 ] {} <Trace> DynamicQueryHandler: Request URI: /?query=DROP+DATABASE+IF+EXISTS+test_lib8t25p&database=system&connect_timeout=20&receive_timeout=20&send_timeout=20&http_connection_timeout=20&http_receive_timeout=20&http_send_timeout=20&output_format_parallel_formatting=0&log_comment=00409_shard_limit_by.sql
2024.07.08 12:34:31.854447 [ 181431 ] {c5fe534b-3fa4-4421-b684-0ca33aef7cde} <Debug> executeQuery: (from [::ffff:127.0.0.1]:48176) (comment: 00409_shard_limit_by.sql) DROP DATABASE IF EXISTS test_lib8t25p  (stage: Complete)
2024-07-08 12:40:35 Found hung queries in processlist:
2024-07-08 12:41:45 Row 1:
2024-07-08 12:41:45 ──────
2024-07-08 12:41:45 is_initial_query:     1
2024-07-08 12:41:45 user:                 default
2024-07-08 12:41:45 query_id:             61c71f3d-94a5-4010-b665-5950ec5129a3
2024-07-08 12:41:45 address:              ::ffff:127.0.0.1
2024-07-08 12:41:45 port:                 55606
2024-07-08 12:41:45 initial_user:         default
2024-07-08 12:41:45 initial_query_id:     61c71f3d-94a5-4010-b665-5950ec5129a3
2024-07-08 12:41:45 initial_address:      ::ffff:127.0.0.1
2024-07-08 12:41:45 initial_port:         55606
2024-07-08 12:41:45 interface:            1
2024-07-08 12:41:45 os_user:              
2024-07-08 12:41:45 client_hostname:      ip-172-31-81-182.ec2.internal
2024-07-08 12:41:45 client_name:          ClickHouse client
2024-07-08 12:41:45 client_revision:      54468
2024-07-08 12:41:45 client_version_major: 24
2024-07-08 12:41:45 client_version_minor: 7
2024-07-08 12:41:45 client_version_patch: 1
2024-07-08 12:41:45 http_method:          0
2024-07-08 12:41:45 http_user_agent:      
2024-07-08 12:41:45 http_referer:         
2024-07-08 12:41:45 forwarded_for:        
2024-07-08 12:41:45 quota_key:            
2024-07-08 12:41:45 distributed_depth:    0
2024-07-08 12:41:45 elapsed:              963.087286
2024-07-08 12:41:45 is_cancelled:         0
2024-07-08 12:41:45 is_all_data_sent:     0
2024-07-08 12:41:45 read_rows:            1 # so it read only 1 row, hm
2024-07-08 12:41:45 read_bytes:           1
2024-07-08 12:41:45 total_rows_approx:    0
2024-07-08 12:41:45 written_rows:         0
2024-07-08 12:41:45 written_bytes:        0
2024-07-08 12:41:45 memory_usage:         7766998
2024-07-08 12:41:45 peak_memory_usage:    7767062
2024-07-08 12:41:45 query:                -- Distributed LIMIT BY
2024-07-08 12:41:45 SELECT dummy FROM remote('127.0.0.{2,3}', system.one) LIMIT 1 BY dummy;
2024-07-08 12:41:45 query_kind:           Select
2024-07-08 12:41:45 thread_ids:           [172378,172696,172656,181259]
2024-07-08 12:41:45 ProfileEvents:        {'Query':1,'SelectQuery':1,'InitialQuery':1,'QueriesWithSubqueries':6,'SelectQueriesWithSubqueries':6,'ReadCompressedBytes':561,'CompressedReadBufferBlocks':8,'CompressedReadBufferBytes':463,'IOBufferAllocs':10,'IOBufferAllocBytes':7341085,'TableFunctionExecute':1,'NetworkReceiveElapsedMicroseconds':16615,'NetworkSendElapsedMicroseconds':274781,'NetworkReceiveBytes':4657,'NetworkSendBytes':3298637,'DistributedConnectionTries':3,'DistributedConnectionUsable':3,'SuspendSendingQueryToShard':3,'SelectedRows':1,'SelectedBytes':1,'ContextLock':28,'RWLockAcquiredReadLocks':2,'RealTimeMicroseconds':600037936,'UserTimeMicroseconds':5437,'SystemTimeMicroseconds':17078,'SoftPageFaults':21,'OSCPUWaitMicroseconds':10,'OSCPUVirtualTimeMicroseconds':22514,'OSReadChars':448,'OSWriteChars':57877,'QueryProfilerRuns':3490,'LogTrace':16,'LogDebug':3,'GWPAsanAllocateSuccess':75,'GWPAsanAllocateFailed':1,'GWPAsanFree':75}
2024-07-08 12:41:45 Settings:             {'connect_timeout_with_failover_ms':'2000','connect_timeout_with_failover_secure_ms':'3000','idle_connection_timeout':'36000','s3_check_objects_after_upload':'1','stream_like_engine_allow_direct_select':'1','replication_wait_for_inactive_replica_timeout':'30','allow_nonconst_timezone_arguments':'1','log_queries':'1','insert_quorum_timeout':'60000','fsync_metadata':'0','http_send_timeout':'60','http_receive_timeout':'60','opentelemetry_start_trace_probability':'0.1','max_untracked_memory':'1048576','memory_profiler_step':'1048576','log_comment':'00409_shard_limit_by.sql','send_logs_level':'warning','allow_introspection_functions':'1','database_atomic_wait_for_drop_and_detach_synchronously':'1','distributed_ddl_entry_format_version':'6','async_insert_busy_timeout_max_ms':'5000','load_marks_asynchronously':'1','allow_prefetched_read_pool_for_remote_filesystem':'0','allow_prefetched_read_pool_for_local_filesystem':'0','filesystem_prefetch_max_memory_usage':'1073741824','insert_keeper_max_retries':'100','insert_keeper_retry_initial_backoff_ms':'1','insert_keeper_retry_max_backoff_ms':'1','insert_keeper_fault_injection_probability':'0.01'}
2024-07-08 12:41:45 current_database:     test_lib8t25p

But, there is no stacktraces for threads, due to musl build

CI: https://s3.amazonaws.com/clickhouse-test-reports/65920/8fbd1fb314a9e9ab94d07c262d698fa40b88dc8d/fast_test.html

Metadata

Metadata

Assignees

No one assigned

    Labels

    comp-distributedDistributed table engine & query routing across shards (sharding/load balancing).flaky testflaky test found by CItestingSpecial issue with list of bugs found by CI

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions