Fix exception in Parquet PREWHERE when column is not in file #98360

Merged: alexey-milovidov merged 2 commits into master from fix-parquet-prewhere-external-columns on Mar 1, 2026

alexey-milovidov (Member) commented Feb 28, 2026

Summary

  • Fix LOGICAL_ERROR exceptions in the Parquet V3 reader when PREWHERE references a column declared in the schema but absent from the actual Parquet file
  • The add_prewhere_outputs lambda incorrectly added pass-through INPUT nodes to external_columns, which caused SchemaConverter to mark those columns as "found" without creating an output column entry. This left sample_block_to_output_columns_idx as nullopt, triggering either "KeyCondition uses PREWHERE output" or "PREWHERE appears to use its own output as input" depending on the code path
  • Fix: skip INPUT nodes in add_prewhere_outputs — only computed (non-INPUT) columns should be marked as external
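
The filtering rule in the last bullet can be sketched as follows. This is a minimal, self-contained model and not the actual ClickHouse code: `NodeType`, `Node`, and `collectExternalColumns` are hypothetical stand-ins for the real `ActionsDAG` node types and the `add_prewhere_outputs` lambda, and the sample block is reduced to a set of column names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the node kinds of a prewhere DAG.
enum class NodeType { Input, Function, Alias };

// Hypothetical, simplified stand-in for a DAG output node.
struct Node
{
    NodeType type;
    std::string result_name;
};

// Collect the prewhere outputs that should be treated as "external",
// i.e. computed by PREWHERE rather than read from the file.
// Pass-through INPUT nodes are skipped so that the reader still looks
// them up in the file (or creates them as missing columns).
std::set<std::string> collectExternalColumns(
    const std::vector<Node> & prewhere_outputs,
    const std::set<std::string> & sample_block_names)
{
    std::set<std::string> external_columns;
    for (const auto & node : prewhere_outputs)
    {
        // Only outputs that the caller actually requested are relevant.
        if (sample_block_names.count(node.result_name) == 0)
            continue;

        // The fix: an INPUT node is merely forwarded, not computed.
        if (node.type == NodeType::Input)
            continue;

        external_columns.insert(node.result_name);
    }
    return external_columns;
}
```

With outputs `extra` (INPUT), `number` (INPUT) and a computed filter column, only the computed column ends up in the returned set in this model.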

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=98288&sha=ec1e7cf53779336b87596a8dbd32ba592c80a529&name_0=PR&name_1=AST%20fuzzer%20%28amd_debug%29

Changelog category (leave one):

  • CI Fix or Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix LOGICAL_ERROR exception in the Parquet V3 reader when PREWHERE references a column not present in the Parquet file.

🤖 Generated with Claude Code

The `add_prewhere_outputs` function was adding all prewhere DAG output
columns matching `sample_block` to the SchemaConverter's `external_columns`.
This included INPUT pass-through columns (columns that are just forwarded
through the prewhere expression to downstream consumers).

When a pass-through column was not present in the actual Parquet file,
marking it as "external" prevented the SchemaConverter from creating a
missing column entry. This left `sample_block_to_output_columns_idx`
as nullopt for that column, causing `preparePrewhere` to throw:
"PREWHERE appears to use its own output as input".

Fix: only add non-INPUT nodes to `external_columns`. INPUT nodes are
pass-through columns that should be read from the file (or created as
missing columns), not treated as prewhere-computed outputs.

Also improved the error message to include the column name for easier
debugging.
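
The failure mode described above can be illustrated with a small model. It is a sketch under stated assumptions, not the real reader code: `mapSampleBlockToOutputColumns` and `checkPrewhereInputs` are hypothetical helpers, the index vector only imitates the role of `sample_block_to_output_columns_idx`, and the exact wording of the improved error message is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

using OutputIdx = std::vector<std::optional<std::size_t>>;

// Hypothetical model of the schema conversion step: a column marked as
// external is considered "found" and gets no output column entry (nullopt).
// Every other column gets an entry, whether it is read from the file or
// filled in as a missing column.
OutputIdx mapSampleBlockToOutputColumns(
    const std::vector<std::string> & sample_block,
    const std::set<std::string> & external_columns)
{
    OutputIdx idx(sample_block.size());
    std::size_t next = 0;
    for (std::size_t i = 0; i < sample_block.size(); ++i)
        if (external_columns.count(sample_block[i]) == 0)
            idx[i] = next++;
    return idx;
}

// Hypothetical model of the check in the prewhere preparation step:
// every column that PREWHERE needs as input must have an output entry.
// The column name is appended to the message (assumed format).
void checkPrewhereInputs(
    const std::vector<std::string> & sample_block,
    const OutputIdx & idx,
    const std::set<std::string> & prewhere_inputs)
{
    for (std::size_t i = 0; i < sample_block.size(); ++i)
        if (prewhere_inputs.count(sample_block[i]) != 0 && !idx[i].has_value())
            throw std::logic_error(
                "PREWHERE appears to use its own output as input: " + sample_block[i]);
}
```

In this model, marking the pass-through column `extra` as external leaves its index empty and the check throws. With `extra` left out of the external set, both columns receive an index and the check passes.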

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=98288&sha=ec1e7cf53779336b87596a8dbd32ba592c80a529&name_0=PR&name_1=AST%20fuzzer%20%28amd_debug%29

Co-Authored-By: Claude Opus 4.6 <[email protected]>
clickhouse-gh bot (Contributor) commented Feb 28, 2026

Workflow [PR], commit [d97fe9b]

Summary:

@@ -0,0 +1,20 @@
-- Tags: no-fasttest

-- Regression test: PREWHERE on Parquet file with a column declared in the schema
Member Author

Does not reproduce.

The original test only selected `number` (not `extra`), so the missing
column was removed from `format_header` by `applyPrewhereActions` and
never appeared as a pass-through in the prewhere DAG outputs. Now the
test selects both `number` and `extra`, keeping `extra` in the format
header where `add_prewhere_outputs` would incorrectly add it to
`external_columns`.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@@ -0,0 +1,25 @@
-- Tags: no-fasttest
Member Author

Now it reproduces.

Member Author

@alexey-milovidov alexey-milovidov left a comment

LGTM

@alexey-milovidov alexey-milovidov self-assigned this Mar 1, 2026
@alexey-milovidov alexey-milovidov added this pull request to the merge queue Mar 1, 2026
Merged via the queue into master with commit 9f49514 Mar 1, 2026
148 checks passed
@alexey-milovidov alexey-milovidov deleted the fix-parquet-prewhere-external-columns branch March 1, 2026 18:34
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Mar 1, 2026
mkmkme pushed a commit to Altinity/ClickHouse that referenced this pull request Mar 26, 2026
…ere-external-columns

Fix exception in Parquet PREWHERE when column is not in file
@clickgapai
Contributor

Hi @alexey-milovidov — the changelog category for this PR might need a look.

Current: CI Fix or Improvement
Suggested: Bug Fix

Why: This fixes a LOGICAL_ERROR crash when querying Parquet files with PREWHERE on missing columns using the V3 reader. It is a genuine bug fix, not a CI improvement. The crash affected user-facing queries with input_format_parquet_use_native_reader_v3=1 and input_format_parquet_allow_missing_columns=1.

Could you verify this is correct? Ignore if the current category is intentional.

mkmkme pushed a commit to Altinity/ClickHouse that referenced this pull request Mar 31, 2026
…ere-external-columns

Fix exception in Parquet PREWHERE when column is not in file
zvonand added a commit to Altinity/ClickHouse that referenced this pull request Mar 31, 2026
Labels

pr-synced-to-cloud The PR is synced to the cloud repo

3 participants