Skip to content

Fix flaky tests: curl + SYSTEM FLUSH LOGS race condition#96957

Merged
alexey-milovidov merged 4 commits intomasterfrom
fix-02423-flaky-insert-stats
Feb 15, 2026
Merged

Fix flaky tests: curl + SYSTEM FLUSH LOGS race condition#96957
alexey-milovidov merged 4 commits intomasterfrom
fix-02423-flaky-insert-stats

Conversation

@alexey-milovidov
Copy link
Copy Markdown
Member

@alexey-milovidov alexey-milovidov commented Feb 15, 2026

Fix flaky tests caused by a race between HTTP response being sent and the query_log entry being written. The code in executeQuery.cpp:2324 explicitly documents that query_finish_callback() (which sends the HTTP response) runs before onFinish() (which writes the QueryFinish log entry). Under TSan this race window is wide enough for the last entry to be missing.

Fix 17 affected tests by retrying SYSTEM FLUSH LOGS in a loop until all expected entries appear. Add a style check to various_checks.sh that prevents this anti-pattern from reappearing.

Tests fixed:
02423_insert_stats_behaviour, 01194_http_query_id, 01526_initial_query_id, 01526_max_untracked_memory, 01661_referer, 02156_async_insert_query_log, 02246_is_secure_query_log, 02456_async_inserts_logs, 02494_query_cache_http_introspection, 02570_fallback_from_async_insert, 02714_async_inserts_empty_data, 02725_async_insert_table_setting, 02841_parallel_replicas_summary, 02908_many_requests_to_system_replicas, 03237_insert_sparse_columns_mem, 03338_http_compression_profile_events, 03723_max_insert_block_size_bytes_http

Closes #84364

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

The test sends inserts via HTTP (curl) and then immediately queries
`system.query_log`. There is a race between the HTTP response being
sent back to curl and the `query_log` entry being written — under
TSan this race window is wide enough for the last RowBinary insert's
log entry to be missing when `SYSTEM FLUSH LOGS` runs.

Fix by retrying `SYSTEM FLUSH LOGS` in a loop until all 6 expected
insert entries appear in `query_log`.

#84364

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Feb 15, 2026

Workflow [PR], commit [c18c609]

Summary:

alexey-milovidov and others added 3 commits February 15, 2026 05:15
Apply the same fix as for `02423_insert_stats_behaviour`: when tests
send requests via HTTP (curl) and then query `system.query_log` or
other system log tables, there is a race between the HTTP response
being sent and the log entry being written. Under TSan this race
window is wide enough to cause flaky failures.

Fix all affected tests by retrying `SYSTEM FLUSH LOGS` in a loop
until the expected entries appear.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Same fix for `01526_initial_query_id`, `02246_is_secure_query_log`,
and `02908_many_requests_to_system_replicas`.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add a check to `various_checks.sh` that detects shell tests using
curl/HTTP to send requests and then calling `SYSTEM FLUSH LOGS`
without a retry loop. This pattern is racy because the `query_log`
entry is written after the HTTP response is sent (see the comment
in `executeQuery.cpp:2324`), so the entry may not exist when
`FLUSH LOGS` runs.

The check finds .sh test files that use curl + system log tables +
FLUSH LOGS but don't have a retry loop (for/while + sleep/break),
and reports them as style violations.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@alexey-milovidov alexey-milovidov changed the title Fix flaky test 02423_insert_stats_behaviour Fix flaky tests: curl + SYSTEM FLUSH LOGS race condition Feb 15, 2026
@clickhouse-gh clickhouse-gh bot added the pr-ci label Feb 15, 2026
alexey-milovidov added a commit that referenced this pull request Feb 15, 2026
In `AsynchronousInsertQueue::processData`, the `finish_entries` lambda
was calling `entry->finish()` (which sets the promise and wakes up the
client waiting with `wait_for_async_insert=1`) before `logQueryFinish`
(which writes the `QueryFinish` entry to `query_log`).

This created a race window: the client could wake up, execute
`SYSTEM FLUSH LOGS`, and query `system.query_log` before the
`QueryFinish` entry was even added to the log buffer. Under TSan or
CPU contention, this window was wide enough to trigger reliably.

Fix by reordering: write log entries first, then notify the waiting
clients. This is consistent with the error path, where
`logQueryException` already runs before `finishWithException`.

Note: this is complementary to #96957, which addresses a similar race
in the HTTP/curl path at the test level. That PR does not cover this
case because test `02790` uses the native client, not curl, and the
race window is in `AsynchronousInsertQueue`, not in the HTTP response
path.

Closes #80813

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@alexey-milovidov alexey-milovidov self-assigned this Feb 15, 2026
@alexey-milovidov
Copy link
Copy Markdown
Member Author

This is a great observation!

@alexey-milovidov alexey-milovidov merged commit b0fb8ad into master Feb 15, 2026
262 of 264 checks passed
@alexey-milovidov alexey-milovidov deleted the fix-02423-flaky-insert-stats branch February 15, 2026 19:39
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Feb 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-ci pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

02423_insert_stats_behaviour is flaky

2 participants