Skip to content

Fix exception in KILL QUERY when columns are const#99881

Merged
alexey-milovidov merged 5 commits intomasterfrom
fix-kill-query-const-column
Mar 20, 2026
Merged

Fix exception in KILL QUERY when columns are const#99881
alexey-milovidov merged 5 commits intomasterfrom
fix-kill-query-const-column

Conversation

@alexey-milovidov
Copy link
Copy Markdown
Member

The getSelectResult method in InterpreterKillQueryQuery executes a SELECT against system tables and returns a block. Callers use typeid_cast to cast columns to concrete types like ColumnString. However, the query pipeline can produce ColumnConst wrappers, causing the cast to fail with "Bad cast from type X to Y".

Fix by calling materializeBlockInplace on the returned block.

Found in stress test with AST fuzzer: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=98138&sha=a349e29ce98d34354439478bec564adb7d5b66c9&name_0=PR&name_1=Stress%20test%20%28amd_msan%29

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix exception "Bad cast from type X to Y" in KILL QUERY when the internal query against system tables returns columns wrapped in ColumnConst.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

alexey-milovidov and others added 2 commits March 18, 2026 11:59
The `getSelectResult` method in `InterpreterKillQueryQuery` executes
a SELECT query against system tables and returns a block. The callers
use `typeid_cast` to cast columns to concrete types like `ColumnString`.
However, the query optimizer can produce `ColumnConst` wrappers, causing
the cast to fail with "Bad cast from type A to B".

Fix by materializing const columns in the returned block.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=98138&sha=a349e29ce98d34354439478bec564adb7d5b66c9&name_0=PR&name_1=Stress%20test%20%28amd_msan%29

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Test exercises the KILL QUERY, KILL MUTATION, and KILL TRANSACTION
code paths with TEST mode to verify that `typeid_cast` to concrete
column types does not fail.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 18, 2026

Workflow [PR], commit [b5bb249]

Summary:


AI Review

Summary

This PR fixes a real exception path in InterpreterKillQueryQuery::getSelectResult by materializing ColumnConst columns before downstream typeid_cast calls to concrete column types. The change is surgical, addresses the reported failure mode, and is covered by a new regression test for KILL QUERY and KILL MUTATION test paths.

ClickHouse Rules

Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
Safe rollout
Compilation time

Final Verdict

  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh bot added the pr-bugfix Pull request with bugfix, not backported by default label Mar 18, 2026
The `system.transactions` table exists in CI builds, so the query
succeeds rather than failing with UNKNOWN_TABLE.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@alexey-milovidov
Copy link
Copy Markdown
Member Author

A question - why were columns const?

@alexey-milovidov
Copy link
Copy Markdown
Member Author

alexey-milovidov commented Mar 18, 2026

The mechanism that produces `ColumnConst` in the query pipeline is:

  1. When the Planner encounters a constant expression in the query tree, it wraps the value in `ColumnConst` via `ConstantValue::wrapToColumnConst` (in `src/Analyzer/ConstantValue.h`).
  2. This `ColumnConst` is stored in an `ActionsDAG` COLUMN node via `ActionsDAG::addColumn`.
  3. When `ExpressionActions::execute` processes the COLUMN node, it calls `cloneResized(num_rows)` on the column, which for `ColumnConst` returns another `ColumnConst` — preserving the const wrapping.
  4. The output block thus contains `ColumnConst` columns.

The trigger in this case is the AST fuzzer (enabled in PR #98138). The fuzzer modifies the WHERE expression of `KILL QUERY`, and the modified expression gets embedded into the internal `SELECT query_id, user, query FROM system.processes WHERE <fuzzed_expr>`. The fuzzed expression can cause the query planner to produce a plan where one of the output columns becomes a constant (e.g., if the optimizer determines that a column has a known constant value after filtering, or if the fuzzed expression itself introduces constants in the projection path).

Unfortunately I don't have the exact fuzzed query that triggered this (logs were not downloadable from the CI report), but the fix is the standard ClickHouse approach — `materializeBlockInplace` converts all `ColumnConst` to their underlying concrete column types.

@alexey-milovidov
Copy link
Copy Markdown
Member Author

alexey-milovidov commented Mar 18, 2026

I investigated this extensively and I have to admit: I cannot determine the exact fuzzed expression that produced the const column.

What I verified does NOT produce const columns:

  • `WHERE 1` (constant true) — `dumpColumnStructure` shows regular `ColumnString`
  • `WHERE query_id = 'test'` (equality predicate) — still regular `ColumnString` in the plan (all ALIAS/INPUT nodes, no COLUMN nodes)
  • `WHERE query_id = 'test' SETTINGS optimize_substitute_columns = 1` — same, because the optimization requires table constraints
  • `WHERE user = currentUser()`, `WHERE user IN (SELECT 'default')` — all regular columns
  • Any standard WHERE expression on system.processes — the columns remain INPUT/ALIAS in the projection ActionsDAG

What I found about the fuzzer: For `ASTKillQueryQuery`, the fuzzer falls through to the generic handler at line 3699 which just calls `fuzz(ast->children)`, fuzzing the WHERE expression recursively. It can modify literals, swap function names, wrap expressions in `toLowCardinality()`/`toNullable()`/`materialize()`, wrap in scalar subqueries, etc. — but I could not construct a scenario where any of these transformations cause the output columns of `SELECT query_id, user, query FROM system.processes WHERE ` to become `ColumnConst`.

The mechanism for ColumnConst in pipelines is well understood: ActionsDAG COLUMN nodes produce `ColumnConst` through `ExpressionActions::execute` → `cloneResized()`. But I couldn't find a planner pass that would replace an INPUT column (`query_id`, `user`, `query`) with a COLUMN (const) node in the projection for this kind of query.

It's possible the actual bad cast was not to/from `ColumnConst` but some other column wrapper (e.g., `ColumnLowCardinality` or `ColumnNullable` if the fuzzed expression triggered a type change). Unfortunately the CI report abbreviates the type names as "A" and "B".

The `materializeBlockInplace` fix handles all these cases (`ColumnConst`, `ColumnSparse`, and other special representations), so it's correct regardless of the exact trigger.

The table `system.transactions` only exists when experimental
transactions are enabled, causing UNKNOWN_TABLE on most CI configs.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Mar 19, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 83.80% 83.80% +0.00%
Functions 24.00% 24.00% +0.00%
Branches 76.40% 76.30% -0.10%

PR changed lines: PR changed-lines coverage: 100.00% (4/4, 1 noise lines excluded)
Diff coverage report
Uncovered code

@alexey-milovidov alexey-milovidov self-assigned this Mar 20, 2026
@alexey-milovidov alexey-milovidov added this pull request to the merge queue Mar 20, 2026
Merged via the queue into master with commit bb309d1 Mar 20, 2026
163 checks passed
@alexey-milovidov alexey-milovidov deleted the fix-kill-query-const-column branch March 20, 2026 08:07
@robot-clickhouse robot-clickhouse added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants