Antalya-26.1: Fix row policies silently ignored on Iceberg tables with PREWHERE enabled by mkmkme · Pull Request #1597 · Altinity/ClickHouse

mkmkme · 2026-03-30T10:46:50Z

The Iceberg read optimization (allow_experimental_iceberg_read_optimization) identifies constant columns from Iceberg metadata and removes them from the read request. When all requested columns become constant, it sets need_only_count = true, which tells the Parquet reader to skip all initialization — including preparePrewhere — and just return the raw row count from file metadata.

This completely bypasses row_level_filter (row policies) and prewhere_info, returning unfiltered row counts. The InterpreterSelectQuery relies on the storage to apply these filters when supportsPrewhere is true and does not add a fallback FilterStep to the query plan, so the filter is silently lost.

The fix prevents need_only_count from being set when an active row_level_filter or prewhere_info exists in the format filter info.

Fixes #1595

Changelog category (leave one):

Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix row policies silently ignored on Iceberg tables with PREWHERE enabled

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

Regression jobs to run:

mkmkme · 2026-03-30T10:54:13Z

AI audit note: This review comment was generated by AI (claude-4.6-opus-high-thinking).

Audit update for PR #1597 (Fix row policies silently ignored on Iceberg tables with PREWHERE enabled):

Confirmed defects

No confirmed defects in reviewed scope.

The fix is correct and minimal. The analysis follows.

Bug being fixed (pre-existing): When allow_experimental_iceberg_read_optimization is enabled and Iceberg metadata identifies all requested columns as constant, the code unconditionally set need_only_count = true. In the ParquetV3 reader, this causes read() (at ParquetV3BlockInputFormat.cpp:105) to skip initializeIfNeeded() entirely — reading only file_metadata.num_rows from Parquet metadata and returning raw row counts. This completely bypasses preparePrewhere() (called via ReadManager), which is where row_level_filter (row policies) and prewhere_info are applied. Since InterpreterSelectQuery does not add a fallback FilterStep for these when supportsPrewhere() is true, the filters are silently lost.

Fix correctness: The added guard (!format_filter_info || (!format_filter_info->row_level_filter && !format_filter_info->prewhere_info)) prevents the need_only_count override when either filter exists. When need_only_count stays false:

The Parquet reader initializes normally, reads filter columns via format_header, and applies prewhere/row-level filters during reading
ExtractColumnsTransform then strips filter-only columns (producing correct row counts)
generate() restores constant columns from constant_columns_with_values with the filtered row count

Null safety: The fix correctly short-circuits on !format_filter_info before dereferencing members.

Consistency with caller: The caller in StorageObjectStorage.cpp:448-452 already guards need_only_count with !read_from_format_info.prewhere_info && !read_from_format_info.row_level_filter. Both are populated from query_info, matching the format_filter_info fields checked by the fix.

filter_actions_dag not checked (correct by design): The fix intentionally does not guard against filter_actions_dag. This is correct because WHERE-clause pushdown via filter_actions_dag has a plan-level FilterStep fallback. Even if the format reader skips it, the plan re-evaluates the WHERE on the constant columns restored by generate(), producing correct results.

Coverage summary

Scope reviewed: Single-file change in StorageObjectStorageSource::createReader (static overload, lines 573–964); callers in StorageObjectStorage::read and ReadFromObjectStorageStep::initializePipeline; downstream consumers in ParquetV3BlockInputFormat::read, ParquetBlockInputFormat, ReadManager::preparePrewhere; FormatFilterInfo struct definition.
Categories passed: Null-pointer safety, guard condition logic, consistency with caller-level guard, constant-column restoration correctness, cache-path (num_rows_from_cache) correctness, thread-safety of shared_ptr parameter, C++ lifetime/RAII (no new objects or ownership changes).
Categories failed: None.
Assumptions/limits: No regression test is included in the PR; correctness of the fix relies on the plan-level FilterStep fallback for filter_actions_dag (verified by code inspection, not runtime).

mkmkme · 2026-03-30T11:14:08Z

I've also checked the regression test is passing with this change.

ianton-ru

LGTM

CarlosFelipeOR · 2026-03-30T23:32:11Z

QA Verification: Partial fix — issue found

PR #1597 successfully fixes the row policy issue (#1595): the 76 row_policy regression test failures and 122 integration test failures from PR #1581 are now all passing.

However, the branch (mkmkme/antalya-26.1/iceberg-fix-prewhere) which also includes the backport of upstream 100361, introduces a new issue: server crash (SIGABRT) on all amd_debug stateless test jobs. The crash occurs in Reader::applyPrewhere() → addDummyColumnWithRowCount() in the Parquet V3 Reader (Signal 6, STID 2938-44c8). This is deterministic and reproduces on every debug build run. Release builds pass all tests.

Additionally, the regression test prewhere clause (REST + Glue catalogs, both archs) fails because the test expects exitcode=182 (ILLEGAL_PREWHERE) for versions < 26.2, but PREWHERE now works on Iceberg. This is a false positive — the test needs updating.

Details:

Row policy fix confirmed: Row policies silently ignored on Iceberg tables after enabling PREWHERE (PR #1581) #1595
Debug crash issue: Server crash (SIGABRT) in Parquet V3 Reader applyPrewhere - Only Debug Builds #1603
CI report: ci_run_report.html

Adding verified-with-issue label.

mkmkme · 2026-03-31T12:33:57Z

The crash is fixed by @CarlosFelipeOR commit in #1581 (we probably need to move it here).

In the personal discussion with Carlos, we found out there was another issue even with that fix: one of the regression tests didn't pass with old analyzer. I have fixed this test with the latest commit.

mkmkme · 2026-03-31T12:45:06Z

the audit-report for the last commit shows:

No confirmed defects in reviewed scope.

…r_iceberg enable prewhere for iceberg

…ere-external-columns Fix exception in Parquet PREWHERE when column is not in file

…o-assertion Fix exception in `updateFormatPrewhereInfo` when only row-level filter is set

…bled The Iceberg read optimization (`allow_experimental_iceberg_read_optimization`) identifies constant columns from Iceberg metadata and removes them from the read request. When all requested columns become constant, it sets `need_only_count = true`, which tells the Parquet reader to skip all initialization — including `preparePrewhere` — and just return the raw row count from file metadata. This completely bypasses `row_level_filter` (row policies) and `prewhere_info`, returning unfiltered row counts. The InterpreterSelectQuery relies on the storage to apply these filters when `supportsPrewhere` is true and does not add a fallback FilterStep to the query plan, so the filter is silently lost. The fix prevents `need_only_count` from being set when an active `row_level_filter` or `prewhere_info` exists in the format filter info. Fixes #1595

…ss instead of rows_total

…t NULLs The Altinity-specific constant column optimization (`allow_experimental_iceberg_read_optimization`) scans `requested_columns` for nullable columns absent from the Iceberg file metadata and replaces them with constant NULLs. However, `requested_columns` can also contain columns produced by `prewhere_info` or `row_level_filter` expressions (e.g. `equals(boolean_col, false)`). These computed columns are not in the file metadata, and their result type is often `Nullable(UInt8)`, so the optimization incorrectly treats them as missing file columns and replaces them with NULLs. This corrupts the prewhere pipeline: the Parquet reader evaluates the filter expression correctly, but the constant column optimization then overwrites the result with NULLs. With `need_filter = false` (old planner, PREWHERE + WHERE), all rows appear to fail the filter, producing empty output. With `need_filter = true`, the filter column is NULL so all rows are filtered out. The fix skips columns that match the `prewhere_info` or `row_level_filter` column names, since these are computed at read time and never stored in the file.

The base branch was changed.

mkmkme · 2026-03-31T21:20:04Z

Functionally the PR hasn't changed. I moved one commit from #1581 because it belongs here.

mkmkme added bugfix port-antalya PRs to be ported to all new Antalya releases antalya-26.1 antalya-26.1.6.20001 labels Mar 30, 2026

ianton-ru previously approved these changes Mar 30, 2026

View reviewed changes

CarlosFelipeOR mentioned this pull request Mar 30, 2026

Server crash (SIGABRT) in Parquet V3 Reader applyPrewhere - Only Debug Builds #1603

Closed

CarlosFelipeOR added the verified-with-issues Verified by QA and issues found. label Mar 30, 2026

mkmkme changed the title ~~Fix row policies silently ignored on Iceberg tables with PREWHERE enabled~~ Antalya-26.1: Fix row policies silently ignored on Iceberg tables with PREWHERE enabled Mar 31, 2026

CarlosFelipeOR mentioned this pull request Mar 31, 2026

Row policies silently ignored on Iceberg tables after enabling PREWHERE (PR #1581) #1595

Closed

alexey-milovidov and others added 7 commits March 31, 2026 22:58

Merge pull request ClickHouse#95476 from scanhex12/enable_prewhere_fo…

9e018bf

…r_iceberg enable prewhere for iceberg

DataLakeConfiguration: delegate supportsPrewhere() as well

33cd9a0

Merge pull request ClickHouse#98360 from ClickHouse/fix-parquet-prewh…

0bb4fa3

…ere-external-columns Fix exception in Parquet PREWHERE when column is not in file

Merge pull request ClickHouse#100361 from ClickHouse/fix-prewhere-inf…

33a1a86

…o-assertion Fix exception in `updateFormatPrewhereInfo` when only row-level filter is set

Fix assertion failure in Parquet V3 Reader applyPrewhere: use rows_pa…

35be132

…ss instead of rows_total

CarlosFelipeOR added verified Approved for release and removed verified-with-issues Verified by QA and issues found. labels Mar 31, 2026

mkmkme force-pushed the backports/antalya-26.1/95476 branch from 9fbbcaf to 33a1a86 Compare March 31, 2026 21:09

mkmkme force-pushed the mkmkme/antalya-26.1/iceberg-fix-prewhere branch from 341e1ff to b7696a3 Compare March 31, 2026 21:10

mkmkme changed the base branch from backports/antalya-26.1/95476 to antalya-26.1 March 31, 2026 21:12

svb-alt added the antalya label Mar 31, 2026

mkmkme mentioned this pull request Mar 31, 2026

Antalya 26.1 Backport of #95476, #98360, #100361 - enable prewhere for iceberg (and fixes) #1581

Merged

27 tasks

zvonand merged commit fd7dc16 into antalya-26.1 Mar 31, 2026
224 of 233 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Antalya-26.1: Fix row policies silently ignored on Iceberg tables with PREWHERE enabled#1597

Antalya-26.1: Fix row policies silently ignored on Iceberg tables with PREWHERE enabled#1597
zvonand merged 7 commits intoantalya-26.1from
mkmkme/antalya-26.1/iceberg-fix-prewhere

mkmkme commented Mar 30, 2026

Uh oh!

mkmkme commented Mar 30, 2026

Uh oh!

mkmkme commented Mar 30, 2026

Uh oh!

ianton-ru left a comment

Uh oh!

CarlosFelipeOR commented Mar 30, 2026

Uh oh!

mkmkme commented Mar 31, 2026

Uh oh!

mkmkme commented Mar 31, 2026

Uh oh!

mkmkme commented Mar 31, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Conversation

mkmkme commented Mar 30, 2026

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Documentation entry for user-facing changes

CI/CD Options

Exclude tests:

Regression jobs to run:

Uh oh!

mkmkme commented Mar 30, 2026

Confirmed defects

Coverage summary

Uh oh!

mkmkme commented Mar 30, 2026

Uh oh!

ianton-ru left a comment

Choose a reason for hiding this comment

Uh oh!

CarlosFelipeOR commented Mar 30, 2026

Uh oh!

mkmkme commented Mar 31, 2026

Uh oh!

mkmkme commented Mar 31, 2026

Uh oh!

mkmkme commented Mar 31, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants