Skip to content

Enable server-side AST fuzzer in stress test and BuzzHouse#98138

Merged
alexey-milovidov merged 47 commits intomasterfrom
enable-ast-fuzzer-stress-buzzhouse
Mar 22, 2026
Merged

Enable server-side AST fuzzer in stress test and BuzzHouse#98138
alexey-milovidov merged 47 commits intomasterfrom
enable-ast-fuzzer-stress-buzzhouse

Conversation

@alexey-milovidov
Copy link
Copy Markdown
Member

@alexey-milovidov alexey-milovidov commented Feb 26, 2026

Summary

  • Enable ast_fuzzer_runs (set to 5) in the stress test, with ast_fuzzer_any_query so all queries including DDL/INSERT are fuzzed
  • Enable ast_fuzzer_runs (set to 5) in BuzzHouse, without ast_fuzzer_any_query so only read-only queries are fuzzed

Changelog category (leave one):

  • Build/Testing/Packaging Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Enable server-side AST fuzzer in stress test and BuzzHouse CI environments.

🤖 Generated with Claude Code


Note

Medium Risk
Medium risk: changes touch query execution context handling and MergeTree read-path state (min/max_part_offset), which could affect stability or performance beyond CI if triggered.

Overview
Enables the server-side AST fuzzer by default in CI environments: sets ast_fuzzer_runs=5 in BuzzHouse user tweaks and in stress-test user profiles (with ast_fuzzer_any_query=true for stress tests), while explicitly disabling it for the stress-test smoke check to avoid expected stderr noise.

Hardens fuzzed-query execution in executeQuery.cpp by resetting input callbacks, disabling parallel replica reading, applying per-fuzzed-query resource limits, and clearing transactions before context teardown to avoid termination during unwinding; also adds a system.warnings entry when ast_fuzzer_runs>0.

Fixes optimizer utilities to safely handle missing filter outputs (return UNKNOWN instead of dereferencing null), and fixes MergeTree patch-part reads by always computing min_part_offset/max_part_offset from granule offsets even when _part_offset isn’t requested; adds a regression test for the patch-parts bug.

Written by Cursor Bugbot for commit b1aa247. This will update automatically on new commits. Configure here.

Enable the `ast_fuzzer_runs` setting (set to 5) in two CI environments:

- Stress test: with `ast_fuzzer_any_query` enabled, so all queries
  (including DDL/INSERT) get fuzzed.
- BuzzHouse: without `ast_fuzzer_any_query`, so only read-only queries
  are fuzzed.

Both environments add `<readonly/>` constraints for `ast_fuzzer_runs`
and `ast_fuzzer_any_query` to prevent tests from accidentally changing
these settings.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Feb 26, 2026

Workflow [PR], commit [d4373be]

Summary:

@clickhouse-gh clickhouse-gh bot added the pr-build Pull request with build/testing/packaging improvement label Feb 26, 2026
alexey-milovidov and others added 3 commits February 26, 2026 23:29
…query`

The `<readonly/>` constraints conflict with `compatibility` setting
in stress tests. When `compatibility='22.9'` is set, it tries to
reset `ast_fuzzer_runs` to its default (0), which the readonly
constraint rejects with SETTING_CONSTRAINT_VIOLATION.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
When the original query context has parallel replicas enabled,
fuzzed queries inherit these settings and can trigger
"Initiator received more initial requests than there are replicas"
exception in `ParallelReplicasReadingCoordinator`. This is not a
real bug but a consequence of running fuzzed queries in the same
server context. Disable parallel replicas for fuzzed queries by
setting `max_parallel_replicas` to 1.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…nce in filterResultForNotMatchedRows

The BuzzHouse test failed with a `chassert` in `Context::setCurrentTransaction`
because the fuzz context inherits `implicit_transaction = true` but is not
registered as a session or query context.

The AST fuzzer (arm_asan) crashed with a segfault in
`ActionsDAG::evaluatePartialResult` because `tryFindInOutputs` returned nullptr
for fuzzed queries where the filter column name was not found in the DAG outputs.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@tiandiwonder tiandiwonder self-assigned this Feb 26, 2026
alexey-milovidov and others added 3 commits February 27, 2026 00:05
…ing features

Instead of individually disabling `max_parallel_replicas` and
`implicit_transaction` for fuzzed queries, make the fuzz context a proper
independent context by calling `makeQueryContext` and `makeSessionContext`.

This ensures that any state stored via `getQueryContext` or
`getSessionContext` (parallel replicas coordinator callbacks, transactions,
etc.) is isolated to the fuzz context, rather than leaking into the original
query's context.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
When `ast_fuzzer_any_query` is enabled, the fuzzer can produce INSERT
queries that create a pushing pipeline (expecting input data). The code
only handled the pulling case, falling through to
`CompletedPipelineExecutor` which throws a fatal exception
"Pipeline for CompletedPipelineExecutor must be completed" and
terminates the server.

Handle all three pipeline states: skip pushing (no data to provide),
complete pulling with `NullOutputFormat`, and execute completed directly.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=98138&sha=ce2d2fd4afda542d07976faa413573f209ba27e8&name_0=PR&name_1=Stress%20test%20%28amd_release%29

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The smoke check runs `00001_select_1` and
`00234_disjunctive_equality_chains_optimization` before the actual
stress tests to validate that randomized settings don't break basic
functionality. The server-side AST fuzzer produces expected errors
in stderr from fuzzed queries, which the test runner interprets as
test failures.

Pass `ast_fuzzer_runs=0` as a client option for the smoke check
only, so the actual stress test still runs with the fuzzer enabled.

#98138

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@alexey-milovidov
Copy link
Copy Markdown
Member Author

It's likely we should also limit resources for each fuzzed query (max_execution_time, max_memory_usage, etc).

alexey-milovidov and others added 13 commits February 27, 2026 23:40
The AST fuzzer could generate INSERT queries (pushing pipelines) and
attempt to execute them without providing input data. This caused
`SetOrJoinSink` to be destroyed with its write buffers neither finalized
nor cancelled, triggering a "WriteBuffer is neither finalized nor
canceled in destructor" exception.

Two fixes:
- In `executeASTFuzzerQueries`, cancel pushing pipelines instead of
  trying to execute them without data.
- In `SetOrJoinSink`, lazily initialize backup write buffers on first
  `consume` call, so no buffers exist when the sink is never used.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=98138&sha=f6eac0762f9799ea48b8e800ef4d22b08c29b699&name_0=PR&name_1=Stress%20test%20%28arm_asan%29

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The AST fuzzer created a context that was both its own session and query
context. When the fuzzed query triggered an implicit transaction,
`InterpreterTransactionControlQuery::executeBegin` set the transaction
on the session context via `initCurrentTransaction`, then tried to set
it again on the query context via `setCurrentTransaction` — but they
were the same object, hitting the assertion
`!merge_tree_transaction || !txn`.

Fix by creating a separate session context object so that query and
session contexts are distinct, matching the normal execution flow.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=7a972da94a98935d46b6f2d6c98ef79117e082c4&name_0=MasterCI&name_1=Stress%20test%20%28amd_debug%29

Co-Authored-By: Claude Opus 4.6 <[email protected]>
`makeQueryContext`/`makeSessionContext` only change the
`query_context`/`session_context` pointers — they do not clear
direct Context members copied by `Context::createCopy`:
`merge_tree_all_ranges_callback`, `merge_tree_read_task_callback`,
and `client_info.collaborate_with_initiator`.

When the initiator sends a query to replicas with
`ast_fuzzer_runs > 0`, each replica runs the fuzzer after the
original query completes. The fuzz context inherits the TCPHandler
callbacks and the follower flag, so the fuzzed query acts as a
parallel replicas follower and sends a second
`InitialAllRangesAnnouncement` through the inherited callback to
the already-finished coordinator — causing "Duplicate announcement
received" or "Initiator received more initial requests than there
are replicas".

Disable `allow_experimental_parallel_reading_from_replicas` in the
fuzz context so that `canUseTaskBasedParallelReplicas` returns
false, preventing both initiator and follower parallel replicas
behavior regardless of inherited callbacks.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…ring stack unwinding

Move fuzz context creation outside the try block so that contexts are not
destroyed during stack unwinding. When a fuzzed query throws an exception,
`MergeTreeTransactionHolder` destructor calls `rollbackTransaction` (noexcept),
which calls `getCurrentExceptionCode` using bare `throw;`. This only works
inside a catch handler, not during stack unwinding, causing `std::terminate`.

By moving contexts outside try, they are destroyed after the catch block,
and we explicitly reset transactions in the catch block to prevent the
destructor from attempting a rollback.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Skip `logExceptionBeforeStart` for internal queries in `executeQueryImpl`
to avoid polluting server error logs. Internal query errors are always
propagated back to the caller which handles them appropriately.

In the AST fuzzer catch blocks, log only the exception code instead of
the full exception message. This prevents fuzzed query failures (like
"Logical error: Input initializer is not set" from `input()` usage)
from triggering the stress test log scanner that checks for patterns
like "Logical error" in server logs.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
… exceptions

Add a warning to `system.warnings` when `ast_fuzzer_runs` is enabled,
similar to the existing `THREAD_FUZZER_IS_ENABLED` warning.

Change log level of AST fuzzer exception messages from LOG_ERROR/LOG_TRACE
to LOG_DEBUG.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The `Exception` destructor automatically logs all `LOGICAL_ERROR`
exceptions at ERROR level via `ForcedCriticalErrorsLogger` unless
`markAsLogged` has been called. This caused "Logical error: Input
initializer is not set" to appear in server logs despite all other
logging suppression, triggering stress test failure detection.

Use the same `try { throw; } catch (Exception & e) { e.markAsLogged(); }`
pattern as `tryLogCurrentExceptionImpl` in all three fuzzer catch blocks.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The AST fuzzer creates a copy of the original query context, which
inherits the `input_blocks_reader` callback from the TCP connection.
When the fuzzer generates queries using `input()`, `StorageInput::read`
sees the callback and assumes it's a legitimate TCP INSERT with input
data, then tries to call `initializeInput` which triggers a
LOGICAL_ERROR ("Input initializer is not set") or tries to read from the
same input pipe twice ("Trying to read from input() twice").

Reset both input callbacks on the fuzzed context so that `input()` in
fuzzed queries doesn't attempt to use callbacks from the original query.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Revert `if (!internal)` guards on `logExceptionBeforeStart` - exceptions
  should still be logged even for internal queries
- Remove `markAsLogged` calls that were suppressing `LOGICAL_ERROR` exceptions
  in AST fuzzer catch blocks - these exceptions should surface normally
- Restore original `tryLogCurrentException` / `LOG_TRACE` logging for fuzzed
  query failures
- Add null check for `tryFindInOutputs` in `filterResultForMatchedRows`
  (same fix as was done for `filterResultForNotMatchedRows`)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…tress-buzzhouse

# Conflicts:
#	src/Storages/MergeTree/MergeTreeRangeReader.cpp
alexey-milovidov and others added 9 commits March 14, 2026 09:20
The style checker flags `catch (...)` blocks that don't log at
`ERROR`/`WARNING`/`FATAL` level. These two blocks intentionally skip
invalid fuzzed ASTs (too deep or unformattable), so add `// Ok`
suppression comments instead of bumping log severity.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…House/ClickHouse into enable-ast-fuzzer-stress-buzzhouse
The AST fuzzer creates a copy of the original query context for each
fuzzed query via `Context::createCopy`. This copies the
`table_function_results` cache, which contains `StorageInput` objects
from the original query with `was_pipe_used = true`. When a fuzzed
query happens to use `input()` with the same AST structure, it finds
the stale cached `StorageInput` and hits the "Trying to read from
input() twice" logical error in `ReadFromInput::initializePipeline`.

Fix by clearing `table_function_results` on the fuzz context after
creation, so stale table function storage objects are not reused.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=98138&sha=48fbefe2e46da9d88ad55ab43fd40d6380d7c1c5&name_0=PR

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The hung check queries `system.processes` and runs `00001_select_1` to detect
deadlocks. With `ast_fuzzer_runs=5` and `ast_fuzzer_any_query=true` active in
the server profile, these diagnostic queries were also being fuzzed, causing
`CANNOT_SCHEDULE_TASK` errors and timeouts that consumed the entire 600-second
budget.

Pass `ast_fuzzer_runs=0` as a client option to the hung check command, matching
what the smoke check and stress test runs already do.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…tress-buzzhouse

# Conflicts:
#	src/Interpreters/Context.cpp
@alexey-milovidov
Copy link
Copy Markdown
Member Author

Needs #99881


try
/// Reset the transaction (if any), it is stored in session and local context (see InterpreterTransactionControlQuery::executeBegin())
context->getQueryContext()->getSessionContext()->setCurrentTransaction(NO_TRANSACTION_PTR);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resetting currentTransaction on the original query/session context here mutates user-visible state outside the fuzzer sandbox. If ast_fuzzer_runs > 0 is enabled in a session that has an active transaction, this can clear the session transaction even though the main query finished successfully, which changes transaction semantics.

Please avoid touching the original context and reset transaction only on fuzz_session_context / fuzz_context after they are created.

@alexey-milovidov
Copy link
Copy Markdown
Member Author

Depends on #100281

fuzzed_ast->checkDepth(500);
}
catch (...) // Ok: skip fuzzed ASTs that are too deeply nested
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This path swallows formatting/depth exceptions silently and immediately continues. For fuzzing this can be expected, but full silence removes observability when the generator starts producing mostly invalid ASTs.

Can we log a lightweight LOG_TRACE (e.g. getCurrentExceptionMessage(false)) before skipping, so CI can distinguish "no interesting mutations" from systematic formatter/depth failures?

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 22, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 83.80% 83.80% +0.00%
Functions 24.70% 24.40% -0.30%
Branches 76.40% 76.40% +0.00%

PR changed lines: PR changed-lines coverage: 89.81% (97/108, 0 noise lines excluded)
Diff coverage report
Uncovered code

@alexey-milovidov alexey-milovidov merged commit 21e34d9 into master Mar 22, 2026
298 of 300 checks passed
@alexey-milovidov alexey-milovidov deleted the enable-ast-fuzzer-stress-buzzhouse branch March 22, 2026 17:07
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-build Pull request with build/testing/packaging improvement pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants