Skip to content

Stateless test 00653_verification_monotonic_data_load is flaky #68626

@shiyer7474

Description

@shiyer7474

In my PR #68323

2024-08-19 18:32:27 00653_verification_monotonic_data_load: [ FAIL ] 600.05 sec.

The test times out in ClickHouse Stateless tests (tsan, s3 storage)

The test is a shell script where the clickhouse client commands output is piped to a grep. Below is the server debug log for a command which was stuck for 4 minutes and then the test was killed due to hitting 600 seconds.

2024.08.19 18:28:11.330041 [ 10795 ] {} <Trace> TCPHandlerFactory: TCP Request. Address: [::1]:47212
2024.08.19 18:28:11.333131 [ 10795 ] {} <Debug> TCPHandler: Connected ClickHouse client version 24.9.0, revision: 54470, database: test_ltxmr5rm, user: default.
2024.08.19 18:28:11.333490 [ 10795 ] {} <Debug> TCP-Session: c36d5f04-e548-48ff-998d-f742b41219f6 Authenticating user 'default' from [::1]:47212
2024.08.19 18:28:11.333813 [ 10795 ] {} <Debug> TCP-Session: c36d5f04-e548-48ff-998d-f742b41219f6 Authenticated with global context as user 94309d50-4f52-5250-31bd-74fecac179db
2024.08.19 18:28:11.333966 [ 10795 ] {} <Debug> TCP-Session: c36d5f04-e548-48ff-998d-f742b41219f6 Creating session context with user_id: 94309d50-4f52-5250-31bd-74fecac179db
2024.08.19 18:28:11.405175 [ 10795 ] {} <Trace> TCP-Session: c36d5f04-e548-48ff-998d-f742b41219f6 Creating query context from session context, user_id: 94309d50-4f52-5250-31bd-74fecac179db, parent context user: default
2024.08.19 18:28:11.412546 [ 10795 ] {9c87b100-7c2a-404c-8cba-878a6e53e849} <Debug> executeQuery: (from [::1]:47212) (comment: 00653_verification_monotonic_data_load.sh) SELECT count() FROM signed_integer_test_table WHERE toUInt32(val) == 0; (stage: Complete)
2024.08.19 18:28:11.418827 [ 10795 ] {9c87b100-7c2a-404c-8cba-878a6e53e849} <Trace> Planner: Query to stage Complete
2024.08.19 18:28:11.420152 [ 10795 ] {9c87b100-7c2a-404c-8cba-878a6e53e849} <Trace> Planner: Query from stage FetchColumns to stage Complete
2024.08.19 18:28:11.423451 [ 10795 ] {9c87b100-7c2a-404c-8cba-878a6e53e849} <Trace> QueryPlanOptimizePrewhere: The min valid primary key position for moving to the tail of PREWHERE is 0
2024.08.19 18:28:11.424281 [ 10795 ] {9c87b100-7c2a-404c-8cba-878a6e53e849} <Debug> test_ltxmr5rm.signed_integer_test_table (66f2add6-d9c5-4eef-adf0-773f6c90e542) (SelectExecutor): Key condition: (toUInt32(column 0) in [0, 0])
2024.08.19 18:28:11.424598 [ 10795 ] {9c87b100-7c2a-404c-8cba-878a6e53e849} <Trace> test_ltxmr5rm.signed_integer_test_table (66f2add6-d9c5-4eef-adf0-773f6c90e542) (SelectExecutor): Filtering marks by primary and secondary keys
2024.08.19 18:28:11.425041 [ 10795 ] {9c87b100-7c2a-404c-8cba-878a6e53e849} <Trace> test_ltxmr5rm.signed_integer_test_table (66f2add6-d9c5-4eef-adf0-773f6c90e542) (SelectExecutor): Used generic exclusion search over index for part all_1_1_0 with 8 steps
2024.08.19 18:28:11.425290 [ 10795 ] {9c87b100-7c2a-404c-8cba-878a6e53e849} <Debug> test_ltxmr5rm.signed_integer_test_table (66f2add6-d9c5-4eef-adf0-773f6c90e542) (SelectExecutor): Selected 1/1 parts by partition key, 1 parts by primary key, 2/7 marks by primary key, 2 marks to read from 1 ranges
2024.08.19 18:28:11.425603 [ 10795 ] {9c87b100-7c2a-404c-8cba-878a6e53e849} <Trace> test_ltxmr5rm.signed_integer_test_table (66f2add6-d9c5-4eef-adf0-773f6c90e542) (SelectExecutor): Spreading mark ranges among streams (default reading)
2024.08.19 18:28:11.426085 [ 10795 ] {9c87b100-7c2a-404c-8cba-878a6e53e849} <Debug> test_ltxmr5rm.signed_integer_test_table (66f2add6-d9c5-4eef-adf0-773f6c90e542) (SelectExecutor): Reading approx. 2 rows with 1 streams
2024.08.19 18:28:11.448629 [ 10795 ] {9c87b100-7c2a-404c-8cba-878a6e53e849} <Debug> executeQuery: Read 2 rows, 8.00 B in 0.039218 sec., 50.99699117752052 rows/sec., 203.99 B/sec.
2024.08.19 18:28:11.450063 [ 10795 ] {9c87b100-7c2a-404c-8cba-878a6e53e849} <Debug> TCPHandler: Processed in 0.047461623 sec.
2024.08.19 18:32:27.679498 [ 10795 ] {} <Debug> TCPHandler: Done processing connection.

At 18:32:27, the test was killed and hence the "Done processing connection" message is seen. Notice the 4 minute gap. The previous command in the test passed successfully and it's debug output looks like :-

2024.08.19 18:28:10.386297 [ 10816 ] {5a70522e-5958-4ef3-9eb7-fcc7061c3193} <Debug> test_ltxmr5rm.signed_integer_test_table (66f2add6-d9c5-4eef-adf0-773f6c90e542) (SelectExecutor): Selected 1/1 parts by partition key, 1 parts by primary key, 2/7 marks by primary key, 2 marks to read from 1 ranges
2024.08.19 18:28:10.386602 [ 10816 ] {5a70522e-5958-4ef3-9eb7-fcc7061c3193} <Trace> test_ltxmr5rm.signed_integer_test_table (66f2add6-d9c5-4eef-adf0-773f6c90e542) (SelectExecutor): Spreading mark ranges among streams (default reading)
2024.08.19 18:28:10.386977 [ 10816 ] {5a70522e-5958-4ef3-9eb7-fcc7061c3193} <Debug> test_ltxmr5rm.signed_integer_test_table (66f2add6-d9c5-4eef-adf0-773f6c90e542) (SelectExecutor): Reading approx. 2 rows with 1 streams
2024.08.19 18:28:10.403503 [ 10816 ] {5a70522e-5958-4ef3-9eb7-fcc7061c3193} <Debug> executeQuery: Read 2 rows, 8.00 B in 0.02959 sec., 67.59040216289287 rows/sec., 270.36 B/sec.
2024.08.19 18:28:10.405171 [ 10816 ] {5a70522e-5958-4ef3-9eb7-fcc7061c3193} <Debug> TCPHandler: Processed in 0.036433273 sec.
2024.08.19 18:28:10.607434 [ 10816 ] {} <Debug> TCPHandler: Done processing connection.
2024.08.19 18:28:10.609280 [ 10816 ] {} <Debug> TCP-Session: ba83d9ed-0a90-4583-94c4-5e30def3029b Logout, user_id: 94309d50-4f52-5250-31bd-74fecac179db

The test's command looks like this :

${CLICKHOUSE_CLIENT} --query="SELECT count() FROM signed_integer_test_table WHERE toUInt32(val) == 0;" 2>&1 |grep -q "2 marks to read from" && echo "monotonic int case: Int32 -> UInt32"

Maybe the last bytes of output in the TCP socket get stuck somewhere or something to do with socket close etc.

Metadata

Metadata

Assignees

Labels

flaky testflaky test found by CItestingSpecial issue with list of bugs found by CI

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions