Fix exception in Parquet PREWHERE when column is not in file #98360
Merged
alexey-milovidov merged 2 commits into master on Mar 1, 2026
Conversation
The `add_prewhere_outputs` function was adding all prewhere DAG output columns matching `sample_block` to the SchemaConverter's `external_columns`. This included INPUT pass-through columns (columns that are just forwarded through the prewhere expression to downstream consumers).

When a pass-through column was not present in the actual Parquet file, marking it as "external" prevented the SchemaConverter from creating a missing column entry. This left `sample_block_to_output_columns_idx` as nullopt for that column, causing `preparePrewhere` to throw: "PREWHERE appears to use its own output as input".

Fix: only add non-INPUT nodes to `external_columns`. INPUT nodes are pass-through columns that should be read from the file (or created as missing columns), not treated as prewhere-computed outputs.

Also improved the error message to include the column name for easier debugging.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=98288&sha=ec1e7cf53779336b87596a8dbd32ba592c80a529&name_0=PR&name_1=AST%20fuzzer%20%28amd_debug%29

Co-Authored-By: Claude Opus 4.6 <[email protected]>
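The failure mode described in this commit message can be sketched with a small standalone model. This is not the ClickHouse implementation: `Node`, `collectExternalColumns`, `mapOutputs`, and `firstUnmapped` are hypothetical names invented for illustration, and the mapping logic is a simplified assumption of how "external" columns interact with missing-column handling. Only the idea of skipping INPUT nodes comes from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the real DAG node types.
enum class NodeType { INPUT, FUNCTION };

struct Node
{
    NodeType type;
    std::string result_name;
};

// Collect prewhere outputs that are treated as "external" (computed by PREWHERE).
// skip_inputs = false models the old behaviour, skip_inputs = true models the fix.
std::set<std::string> collectExternalColumns(
    const std::vector<Node> & prewhere_outputs,
    const std::set<std::string> & sample_block,
    bool skip_inputs)
{
    std::set<std::string> external;
    for (const auto & node : prewhere_outputs)
    {
        if (!sample_block.count(node.result_name))
            continue;
        if (skip_inputs && node.type == NodeType::INPUT)
            continue; // pass-through: read from the file or created as a missing column
        external.insert(node.result_name);
    }
    return external;
}

// Toy schema conversion (an assumption, not the real SchemaConverter):
// columns in the file get an index, absent columns get a "missing column"
// index unless they were marked external, in which case they stay nullopt.
std::vector<std::optional<std::size_t>> mapOutputs(
    const std::vector<std::string> & sample_block,
    const std::set<std::string> & file_columns,
    const std::set<std::string> & external)
{
    std::vector<std::optional<std::size_t>> idx;
    std::size_t next = 0;
    for (const auto & name : sample_block)
    {
        if (file_columns.count(name) || !external.count(name))
            idx.push_back(next++);
        else
            idx.push_back(std::nullopt);
    }
    return idx;
}

// Name of the first column without an output index, mirroring the idea of
// reporting the column name in the error message.
std::optional<std::string> firstUnmapped(
    const std::vector<std::string> & sample_block,
    const std::vector<std::optional<std::size_t>> & idx)
{
    for (std::size_t i = 0; i < idx.size(); ++i)
        if (!idx[i])
            return sample_block[i];
    return std::nullopt;
}

// Usage: `extra` is declared in the schema but absent from the file.
void checkExample()
{
    const std::vector<Node> outputs = {
        {NodeType::INPUT, "number"}, {NodeType::INPUT, "extra"}, {NodeType::FUNCTION, "greater(number, 5)"}};
    const std::vector<std::string> sample = {"number", "extra"};
    const std::set<std::string> sample_set(sample.begin(), sample.end());
    const std::set<std::string> file = {"number"};

    auto before = mapOutputs(sample, file, collectExternalColumns(outputs, sample_set, false));
    auto after = mapOutputs(sample, file, collectExternalColumns(outputs, sample_set, true));
    assert(firstUnmapped(sample, before) == std::optional<std::string>("extra"));
    assert(!firstUnmapped(sample, after).has_value());
}
```

In this model, the old behaviour leaves `extra` without an output index, which is the state that led to the exception, while skipping INPUT nodes lets `extra` be created as a missing column.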
Contributor
alexey-milovidov commented on Mar 1, 2026
@@ -0,0 +1,20 @@
-- Tags: no-fasttest

-- Regression test: PREWHERE on Parquet file with a column declared in the schema
Member
Author
Does not reproduce.
The original test only selected `number` (not `extra`), so the missing column was removed from `format_header` by `applyPrewhereActions` and never appeared as a pass-through in the prewhere DAG outputs.

Now the test selects both `number` and `extra`, keeping `extra` in the format header where `add_prewhere_outputs` would incorrectly add it to `external_columns`.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
alexey-milovidov commented on Mar 1, 2026
@@ -0,0 +1,25 @@
-- Tags: no-fasttest
Member
Author
Now it reproduces.
mkmkme pushed a commit to Altinity/ClickHouse that referenced this pull request on Mar 26, 2026
…ere-external-columns Fix exception in Parquet PREWHERE when column is not in file
27 tasks
Contributor
Hi @alexey-milovidov, the changelog category for this PR might need a look.

Current:

Why: This fixes a `LOGICAL_ERROR` crash when querying Parquet files with PREWHERE on missing columns using the V3 reader. It is a genuine bug fix, not a CI improvement. The crash affected user-facing queries with `input_format_parquet_use_native_reader_v3=1` and `input_format_parquet_allow_missing_columns=1`.

Could you verify this is correct? Ignore if the current category is intentional.
mkmkme pushed a commit to Altinity/ClickHouse that referenced this pull request on Mar 31, 2026
…ere-external-columns Fix exception in Parquet PREWHERE when column is not in file
zvonand added a commit to Altinity/ClickHouse that referenced this pull request on Mar 31, 2026
Antalya 26.1 Backport of ClickHouse#95476, ClickHouse#98360, ClickHouse#100361 - enable prewhere for iceberg (and fixes)
Summary
- Fix `LOGICAL_ERROR` exceptions in the Parquet V3 reader when PREWHERE references a column declared in the schema but absent from the actual Parquet file
- The `add_prewhere_outputs` lambda incorrectly added pass-through `INPUT` nodes to `external_columns`, which caused `SchemaConverter` to mark those columns as "found" without creating an output column entry. This left `sample_block_to_output_columns_idx` as `nullopt`, triggering either "KeyCondition uses PREWHERE output" or "PREWHERE appears to use its own output as input" depending on the code path
- Skip `INPUT` nodes in `add_prewhere_outputs`: only computed (non-`INPUT`) columns should be marked as external

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=98288&sha=ec1e7cf53779336b87596a8dbd32ba592c80a529&name_0=PR&name_1=AST%20fuzzer%20%28amd_debug%29
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix `LOGICAL_ERROR` exception in the Parquet V3 reader when PREWHERE references a column not present in the Parquet file.

🤖 Generated with Claude Code