Antalya-26.1: Fix row policies silently ignored on Iceberg tables with PREWHERE enabled#1597
Conversation
|
AI audit note: This review comment was generated by AI (claude-4.6-opus-high-thinking). Audit update for PR #1597 (Fix row policies silently ignored on Iceberg tables with PREWHERE enabled): Confirmed defectsNo confirmed defects in reviewed scope. The fix is correct and minimal. The analysis follows. Bug being fixed (pre-existing): When Fix correctness: The added guard
Null safety: The fix correctly short-circuits on Consistency with caller: The caller in
Coverage summary
|
|
I've also checked the regression test is passing with this change. |
|
QA Verification: Partial fix — issue found PR #1597 successfully fixes the row policy issue (#1595): the 76 row_policy regression test failures and 122 integration test failures from PR #1581 are now all passing. However, the branch ( Additionally, the regression test Details:
Adding |
|
The crash is fixed by @CarlosFelipeOR commit in #1581 (we probably need to move it here). In the personal discussion with Carlos, we found out there was another issue even with that fix: one of the regression tests didn't pass with old analyzer. I have fixed this test with the latest commit. |
|
the audit-report for the last commit shows: |
…r_iceberg enable prewhere for iceberg
…ere-external-columns Fix exception in Parquet PREWHERE when column is not in file
…o-assertion Fix exception in `updateFormatPrewhereInfo` when only row-level filter is set
…bled The Iceberg read optimization (`allow_experimental_iceberg_read_optimization`) identifies constant columns from Iceberg metadata and removes them from the read request. When all requested columns become constant, it sets `need_only_count = true`, which tells the Parquet reader to skip all initialization — including `preparePrewhere` — and just return the raw row count from file metadata. This completely bypasses `row_level_filter` (row policies) and `prewhere_info`, returning unfiltered row counts. The InterpreterSelectQuery relies on the storage to apply these filters when `supportsPrewhere` is true and does not add a fallback FilterStep to the query plan, so the filter is silently lost. The fix prevents `need_only_count` from being set when an active `row_level_filter` or `prewhere_info` exists in the format filter info. Fixes #1595
…ss instead of rows_total
…t NULLs The Altinity-specific constant column optimization (`allow_experimental_iceberg_read_optimization`) scans `requested_columns` for nullable columns absent from the Iceberg file metadata and replaces them with constant NULLs. However, `requested_columns` can also contain columns produced by `prewhere_info` or `row_level_filter` expressions (e.g. `equals(boolean_col, false)`). These computed columns are not in the file metadata, and their result type is often `Nullable(UInt8)`, so the optimization incorrectly treats them as missing file columns and replaces them with NULLs. This corrupts the prewhere pipeline: the Parquet reader evaluates the filter expression correctly, but the constant column optimization then overwrites the result with NULLs. With `need_filter = false` (old planner, PREWHERE + WHERE), all rows appear to fail the filter, producing empty output. With `need_filter = true`, the filter column is NULL so all rows are filtered out. The fix skips columns that match the `prewhere_info` or `row_level_filter` column names, since these are computed at read time and never stored in the file.
9fbbcaf to
33a1a86
Compare
341e1ff to
b7696a3
Compare
|
Functionally the PR hasn't changed. I moved one commit from #1581 because it belongs here. |
The Iceberg read optimization (
allow_experimental_iceberg_read_optimization) identifies constant columns from Iceberg metadata and removes them from the read request. When all requested columns become constant, it setsneed_only_count = true, which tells the Parquet reader to skip all initialization — includingpreparePrewhere— and just return the raw row count from file metadata.This completely bypasses
row_level_filter(row policies) andprewhere_info, returning unfiltered row counts. The InterpreterSelectQuery relies on the storage to apply these filters whensupportsPrewhereis true and does not add a fallback FilterStep to the query plan, so the filter is silently lost.The fix prevents
need_only_countfrom being set when an activerow_level_filterorprewhere_infoexists in the format filter info.Fixes #1595
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix row policies silently ignored on Iceberg tables with PREWHERE enabled
Documentation entry for user-facing changes
...
CI/CD Options
Exclude tests:
Regression jobs to run: