Skip to content

Antalya-26.1: Fix row policies silently ignored on Iceberg tables with PREWHERE enabled#1597

Merged
zvonand merged 7 commits intoantalya-26.1from
mkmkme/antalya-26.1/iceberg-fix-prewhere
Mar 31, 2026
Merged

Antalya-26.1: Fix row policies silently ignored on Iceberg tables with PREWHERE enabled#1597
zvonand merged 7 commits intoantalya-26.1from
mkmkme/antalya-26.1/iceberg-fix-prewhere

Conversation

@mkmkme
Copy link
Copy Markdown
Collaborator

@mkmkme mkmkme commented Mar 30, 2026

The Iceberg read optimization (allow_experimental_iceberg_read_optimization) identifies constant columns from Iceberg metadata and removes them from the read request. When all requested columns become constant, it sets need_only_count = true, which tells the Parquet reader to skip all initialization — including preparePrewhere — and just return the raw row count from file metadata.

This completely bypasses row_level_filter (row policies) and prewhere_info, returning unfiltered row counts. The InterpreterSelectQuery relies on the storage to apply these filters when supportsPrewhere is true and does not add a fallback FilterStep to the query plan, so the filter is silently lost.

The fix prevents need_only_count from being set when an active row_level_filter or prewhere_info exists in the format filter info.

Fixes #1595

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix row policies silently ignored on Iceberg tables with PREWHERE enabled

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

@mkmkme mkmkme added bugfix port-antalya PRs to be ported to all new Antalya releases antalya-26.1 antalya-26.1.6.20001 labels Mar 30, 2026
@mkmkme
Copy link
Copy Markdown
Collaborator Author

mkmkme commented Mar 30, 2026

AI audit note: This review comment was generated by AI (claude-4.6-opus-high-thinking).

Audit update for PR #1597 (Fix row policies silently ignored on Iceberg tables with PREWHERE enabled):

Confirmed defects

No confirmed defects in reviewed scope.

The fix is correct and minimal. The analysis follows.

Bug being fixed (pre-existing): When allow_experimental_iceberg_read_optimization is enabled and Iceberg metadata identifies all requested columns as constant, the code unconditionally set need_only_count = true. In the ParquetV3 reader, this causes read() (at ParquetV3BlockInputFormat.cpp:105) to skip initializeIfNeeded() entirely — reading only file_metadata.num_rows from Parquet metadata and returning raw row counts. This completely bypasses preparePrewhere() (called via ReadManager), which is where row_level_filter (row policies) and prewhere_info are applied. Since InterpreterSelectQuery does not add a fallback FilterStep for these when supportsPrewhere() is true, the filters are silently lost.

Fix correctness: The added guard (!format_filter_info || (!format_filter_info->row_level_filter && !format_filter_info->prewhere_info)) prevents the need_only_count override when either filter exists. When need_only_count stays false:

  • The Parquet reader initializes normally, reads filter columns via format_header, and applies prewhere/row-level filters during reading
  • ExtractColumnsTransform then strips filter-only columns (producing correct row counts)
  • generate() restores constant columns from constant_columns_with_values with the filtered row count

Null safety: The fix correctly short-circuits on !format_filter_info before dereferencing members.

Consistency with caller: The caller in StorageObjectStorage.cpp:448-452 already guards need_only_count with !read_from_format_info.prewhere_info && !read_from_format_info.row_level_filter. Both are populated from query_info, matching the format_filter_info fields checked by the fix.

filter_actions_dag not checked (correct by design): The fix intentionally does not guard against filter_actions_dag. This is correct because WHERE-clause pushdown via filter_actions_dag has a plan-level FilterStep fallback. Even if the format reader skips it, the plan re-evaluates the WHERE on the constant columns restored by generate(), producing correct results.

Coverage summary

  • Scope reviewed: Single-file change in StorageObjectStorageSource::createReader (static overload, lines 573–964); callers in StorageObjectStorage::read and ReadFromObjectStorageStep::initializePipeline; downstream consumers in ParquetV3BlockInputFormat::read, ParquetBlockInputFormat, ReadManager::preparePrewhere; FormatFilterInfo struct definition.
  • Categories passed: Null-pointer safety, guard condition logic, consistency with caller-level guard, constant-column restoration correctness, cache-path (num_rows_from_cache) correctness, thread-safety of shared_ptr parameter, C++ lifetime/RAII (no new objects or ownership changes).
  • Categories failed: None.
  • Assumptions/limits: No regression test is included in the PR; correctness of the fix relies on the plan-level FilterStep fallback for filter_actions_dag (verified by code inspection, not runtime).

@mkmkme
Copy link
Copy Markdown
Collaborator Author

mkmkme commented Mar 30, 2026

I've also checked the regression test is passing with this change.

ianton-ru
ianton-ru previously approved these changes Mar 30, 2026
Copy link
Copy Markdown

@ianton-ru ianton-ru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@CarlosFelipeOR
Copy link
Copy Markdown
Collaborator

QA Verification: Partial fix — issue found

PR #1597 successfully fixes the row policy issue (#1595): the 76 row_policy regression test failures and 122 integration test failures from PR #1581 are now all passing.

However, the branch (mkmkme/antalya-26.1/iceberg-fix-prewhere) which also includes the backport of upstream 100361, introduces a new issue: server crash (SIGABRT) on all amd_debug stateless test jobs. The crash occurs in Reader::applyPrewhere()addDummyColumnWithRowCount() in the Parquet V3 Reader (Signal 6, STID 2938-44c8). This is deterministic and reproduces on every debug build run. Release builds pass all tests.

Additionally, the regression test prewhere clause (REST + Glue catalogs, both archs) fails because the test expects exitcode=182 (ILLEGAL_PREWHERE) for versions < 26.2, but PREWHERE now works on Iceberg. This is a false positive — the test needs updating.

Details:

Adding verified-with-issue label.

@CarlosFelipeOR CarlosFelipeOR added the verified-with-issues Verified by QA and issues found. label Mar 30, 2026
@mkmkme mkmkme changed the title Fix row policies silently ignored on Iceberg tables with PREWHERE enabled Antalya-26.1: Fix row policies silently ignored on Iceberg tables with PREWHERE enabled Mar 31, 2026
@mkmkme
Copy link
Copy Markdown
Collaborator Author

mkmkme commented Mar 31, 2026

The crash is fixed by @CarlosFelipeOR commit in #1581 (we probably need to move it here).

In the personal discussion with Carlos, we found out there was another issue even with that fix: one of the regression tests didn't pass with old analyzer. I have fixed this test with the latest commit.

@mkmkme
Copy link
Copy Markdown
Collaborator Author

mkmkme commented Mar 31, 2026

the audit-report for the last commit shows:

No confirmed defects in reviewed scope.

alexey-milovidov and others added 7 commits March 31, 2026 22:58
…ere-external-columns

Fix exception in Parquet PREWHERE when column is not in file
…o-assertion

Fix exception in `updateFormatPrewhereInfo` when only row-level filter is set
…bled

The Iceberg read optimization (`allow_experimental_iceberg_read_optimization`)
identifies constant columns from Iceberg metadata and removes them from the
read request. When all requested columns become constant, it sets
`need_only_count = true`, which tells the Parquet reader to skip all
initialization — including `preparePrewhere` — and just return the raw row
count from file metadata.

This completely bypasses `row_level_filter` (row policies) and `prewhere_info`,
returning unfiltered row counts. The InterpreterSelectQuery relies on the
storage to apply these filters when `supportsPrewhere` is true and does not
add a fallback FilterStep to the query plan, so the filter is silently lost.

The fix prevents `need_only_count` from being set when an active
`row_level_filter` or `prewhere_info` exists in the format filter info.

Fixes #1595
…t NULLs

The Altinity-specific constant column optimization
(`allow_experimental_iceberg_read_optimization`) scans `requested_columns`
for nullable columns absent from the Iceberg file metadata and replaces
them with constant NULLs. However, `requested_columns` can also contain
columns produced by `prewhere_info` or `row_level_filter` expressions
(e.g. `equals(boolean_col, false)`). These computed columns are not in
the file metadata, and their result type is often `Nullable(UInt8)`, so
the optimization incorrectly treats them as missing file columns and
replaces them with NULLs.

This corrupts the prewhere pipeline: the Parquet reader evaluates the
filter expression correctly, but the constant column optimization then
overwrites the result with NULLs. With `need_filter = false` (old planner,
PREWHERE + WHERE), all rows appear to fail the filter, producing empty
output. With `need_filter = true`, the filter column is NULL so all rows
are filtered out.

The fix skips columns that match the `prewhere_info` or `row_level_filter`
column names, since these are computed at read time and never stored in
the file.
@CarlosFelipeOR CarlosFelipeOR added verified Approved for release and removed verified-with-issues Verified by QA and issues found. labels Mar 31, 2026
@mkmkme mkmkme force-pushed the backports/antalya-26.1/95476 branch from 9fbbcaf to 33a1a86 Compare March 31, 2026 21:09
@mkmkme mkmkme force-pushed the mkmkme/antalya-26.1/iceberg-fix-prewhere branch from 341e1ff to b7696a3 Compare March 31, 2026 21:10
@mkmkme mkmkme changed the base branch from backports/antalya-26.1/95476 to antalya-26.1 March 31, 2026 21:12
@mkmkme mkmkme dismissed ianton-ru’s stale review March 31, 2026 21:12

The base branch was changed.

@mkmkme
Copy link
Copy Markdown
Collaborator Author

mkmkme commented Mar 31, 2026

Functionally the PR hasn't changed. I moved one commit from #1581 because it belongs here.

@zvonand zvonand merged commit fd7dc16 into antalya-26.1 Mar 31, 2026
224 of 233 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

antalya antalya-26.1 antalya-26.1.6.20001 bugfix port-antalya PRs to be ported to all new Antalya releases verified Approved for release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants