Row policies silently ignored on Iceberg tables after enabling PREWHERE (PR #1581) #1595
Description
✅ I checked the Altinity Stable Builds lifecycle table, and the Altinity Stable Build version I'm using is still supported.
Type of problem
Bug report - something's broken
Describe the situation
Row policies are silently ignored on Iceberg tables when PREWHERE optimization is enabled for Iceberg. A SELECT on an Iceberg table with a RESTRICTIVE row policy returns all rows instead of only the rows matching the policy filter. The same row policy applied to a MergeTree table with identical data correctly filters rows.
This issue:
- Returns incorrect query results (silent data over-exposure) — no error or crash, the row policy is simply not enforced
- Is a regression introduced by PR #1581 ("Antalya 26.1 Backport of #95476, #98360, #100361 - enable prewhere for iceberg (and fixes)"), which backports upstream ClickHouse/ClickHouse#95476 ("enable prewhere for iceberg") and ClickHouse/ClickHouse#98360 ("Fix exception in Parquet PREWHERE when column is not in file")
- Affects all architectures (x86_64 and aarch64) and all Iceberg catalog types (REST, Glue, table engine)
- Has security implications — row-level access control is bypassed on Iceberg tables
How to reproduce the behavior
Environment
- Version: 26.1.4.20001.altinityantalya (build of PR #1581)
- Build type: Release (amd_binary)
Option A: Using the regression test suite
python3 -u iceberg/regression.py \
--clickhouse https://altinity-build-artifacts.s3.amazonaws.com/PRs/1581/aca8923197484b57b374c22cd9b6c309796b9613/build_amd_binary/clickhouse \
--clickhouse-version 26.1.4.20001.altinityantalya \
--log log.log \
--only '/iceberg/iceberg engine/rest catalog/feature/row policies/combination #30/*'
Option B: Manual reproduction
Prerequisites: ClickHouse with the PR #1581 build, MinIO, and an Iceberg REST catalog (e.g. ice-rest-catalog). Use the docker-compose.yml from iceberg/iceberg_env/ in the regression repo.
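The first step of the list that follows creates the Iceberg table with pyiceberg. A minimal sketch of that step is shown here; it is not taken from the regression suite. The table identifier `row_policy.example_table`, the `localhost` endpoints, and the mapping of `admin` / `password` / `foo` onto pyiceberg catalog properties are assumptions based on the `DataLakeCatalog` settings used later and may need adjusting for the docker-compose environment.

```python
import datetime
import random

COLUMNS = ["boolean_col", "long_col", "double_col", "string_col", "date_col"]


def make_rows(n=100, seed=0):
    """Generate n rows of random data for the five columns (pure Python)."""
    rng = random.Random(seed)
    start = datetime.date(2020, 1, 1)
    return [
        {
            "boolean_col": rng.random() < 0.5,
            "long_col": rng.randint(0, 10_000),
            "double_col": rng.uniform(0.0, 10_000.0),
            "string_col": f"row_{i}",
            "date_col": start + datetime.timedelta(days=rng.randint(0, 365)),
        }
        for i in range(n)
    ]


def write_to_iceberg(rows, identifier="row_policy.example_table"):
    """Create the table through the REST catalog and append the rows.

    The identifier is a hypothetical placeholder for <namespace>.<table_name>.
    """
    # Imported lazily so make_rows() works without pyiceberg/pyarrow installed.
    import pyarrow as pa
    from pyiceberg.catalog import load_catalog
    from pyiceberg.schema import Schema
    from pyiceberg.types import (
        BooleanType, DateType, DoubleType, LongType, NestedField, StringType,
    )

    # Assumed connection properties; adjust to the actual environment.
    catalog = load_catalog(
        "rest",
        **{
            "type": "rest",
            "uri": "http://localhost:5000",
            "token": "foo",
            "s3.endpoint": "http://localhost:9000",
            "s3.access-key-id": "admin",
            "s3.secret-access-key": "password",
        },
    )
    iceberg_types = [BooleanType(), LongType(), DoubleType(), StringType(), DateType()]
    arrow_types = [pa.bool_(), pa.int64(), pa.float64(), pa.string(), pa.date32()]
    schema = Schema(*[
        NestedField(field_id=i, name=name, field_type=t, required=False)
        for i, (name, t) in enumerate(zip(COLUMNS, iceberg_types), start=1)
    ])
    arrow_schema = pa.schema(list(zip(COLUMNS, arrow_types)))

    # create_namespace raises if the namespace already exists.
    catalog.create_namespace(identifier.split(".")[0])
    table = catalog.create_table(identifier, schema=schema)
    table.append(pa.Table.from_pylist(rows, schema=arrow_schema))
```

Calling `write_to_iceberg(make_rows(100))` would then produce a 100-row table. The value ranges are arbitrary, so the number of rows matching the row policy will differ from the 3 reported in this issue.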
- Create an Iceberg table via pyiceberg with 5 columns (boolean_col, long_col, double_col, string_col, date_col) and insert 100 rows of random data.
- Create the Iceberg database and a MergeTree table with the same data in ClickHouse:
SET allow_experimental_database_iceberg=true;
CREATE DATABASE row_policy
ENGINE = DataLakeCatalog('http://ice-rest-catalog:5000', 'admin', 'password')
SETTINGS catalog_type = 'rest',
storage_endpoint = 'http://minio:9000/warehouse',
warehouse = 's3://bucket1/',
auth_header = 'Authorization: Bearer foo';
CREATE TABLE merge_tree_table (
boolean_col Nullable(Bool),
long_col Nullable(Int64),
double_col Nullable(Float64),
string_col Nullable(String),
date_col Nullable(Date32)
) ENGINE = MergeTree ORDER BY tuple();
INSERT INTO merge_tree_table
SELECT * FROM row_policy.`<namespace>.<table_name>`;
- Create a user and grant SELECT on both tables:
CREATE USER test_user;
GRANT SELECT ON merge_tree_table TO test_user;
GRANT SELECT ON row_policy.`<namespace>.<table_name>` TO test_user;
- Create a row policy with a condition that actually filters rows:
CREATE ROW POLICY test_policy
ON merge_tree_table, row_policy.`<namespace>.<table_name>`
USING long_col in range(1, 100) OR double_col in range(1, 100)
AS RESTRICTIVE
TO test_user;
- Query both tables as test_user:
clickhouse-client --user test_user -q "SELECT count() FROM merge_tree_table"
# Returns: 3 (filtered by row policy)
clickhouse-client --user test_user -q "SELECT count() FROM row_policy.\`<namespace>.<table_name>\`"
# Returns: 100 (row policy NOT applied, BUG)
Expected behavior
Both queries should return the same number of rows (3), since the same row policy is applied to both tables with identical data.
Actual behavior
The MergeTree table correctly returns 3 rows (filtered). The Iceberg table returns all 100 rows — the row policy is silently ignored.
root@clickhouse1:/# clickhouse-client --user user1 -q "SELECT count() FROM merge_tree_table"
3
root@clickhouse1:/# clickhouse-client --user user1 -q "SELECT count() FROM row_policy.\`row_policy.table_a493118f_2baf_11f1_8e49_d4a2cd784155\`"
100
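The comparison above can be automated. The following is a minimal sketch, assuming `clickhouse-client` is on the PATH and that the user can connect without a password. The helper names `clickhouse_count` and `row_policy_mismatch` are hypothetical and are not part of the regression suite.

```python
import subprocess


def clickhouse_count(table, user="test_user"):
    """Run SELECT count() on `table` through clickhouse-client as `user`."""
    result = subprocess.run(
        ["clickhouse-client", "--user", user, "-q", f"SELECT count() FROM {table}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def row_policy_mismatch(reference_table, iceberg_table, count=clickhouse_count):
    """Return None if both tables yield the same count, else (expected, actual).

    `count` is injectable so the check can be exercised without a server.
    """
    expected = count(reference_table)
    actual = count(iceberg_table)
    return None if expected == actual else (expected, actual)
```

On an affected build, `row_policy_mismatch("merge_tree_table", "row_policy.<namespace>.<table_name>")` (with the Iceberg name backtick-quoted as in the queries above) would be expected to return `(3, 100)`. On a correct build it returns `None`.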
Root cause analysis
Note: This root cause analysis was generated with AI assistance. The upstream verification test was confirmed manually, but the specific code-level root cause has not been fully verified and should be investigated by the developer.
The bug is NOT in the upstream PRs themselves
We verified this by running the same failing test (combination #30) against upstream v26.3.2.3-lts, which contains both upstream PRs ClickHouse#95476 and ClickHouse#98360 (confirmed via git ancestry: both merge commits are ancestors of the v26.3.2.3-lts tag with behind_by: 0). The test passes on v26.3.2.3 — the row policy is correctly applied to the Iceberg table, returning 3 rows for both MergeTree and Iceberg.
python3 -u iceberg/regression.py \
--clickhouse docker://clickhouse/clickhouse-server:26.3.2.3-alpine \
--clickhouse-version 26.3.2.3-alpine \
--log log1.log \
--only '/iceberg/iceberg engine/rest catalog/feature/row policies/combination #30/*'
# Result: OK (both tables return 3 rows with the row policy)
This indicates the regression is caused by the interaction of the backported changes with the antalya-26.1 codebase, rather than by the upstream changes themselves.
Possible root cause
We identified a potential issue in updateFormatPrewhereInfo() in src/Storages/prepareReadingFromFormat.cpp. This function does not store the row_level_filter in the new ReadFromFormatInfo:
ReadFromFormatInfo new_info;
new_info.prewhere_info = prewhere_info;
// row_level_filter is NOT stored in new_info
This same code exists in upstream v26.3.2.3, but the test passes there, possibly because the new query analyzer (enabled by default in v26.3) handles row policies through a different code path. In antalya-26.1, the old query planner is used by default.
An upstream fix for a closely related issue was merged in PR #100361 ("Fix exception in updateFormatPrewhereInfo when only row-level filter is set"), on March 29, 2026. That PR adds the missing new_info.row_level_filter = row_level_filter line. It is possible that including this fix in the backport would resolve the issue.
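To illustrate the suspected failure mode, here is a toy Python model. It is not ClickHouse code: `ReadInfo`, `rebuild_dropping_filter`, `rebuild_keeping_filter`, and `read_rows` are hypothetical stand-ins. It only shows how rebuilding a read-info object without copying the row-level filter would return every row without raising an error.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Predicate = Callable[[dict], bool]


@dataclass
class ReadInfo:
    """Toy stand-in for the info object handed to the format reader."""
    prewhere: Optional[Predicate] = None
    row_level_filter: Optional[Predicate] = None


def rebuild_dropping_filter(prewhere, row_level_filter):
    """Mimics the suspected bug: the row-level filter is never stored."""
    new_info = ReadInfo()
    new_info.prewhere = prewhere
    return new_info


def rebuild_keeping_filter(prewhere, row_level_filter):
    """Mimics the behavior described for the upstream fix: both are stored."""
    new_info = ReadInfo()
    new_info.prewhere = prewhere
    new_info.row_level_filter = row_level_filter
    return new_info


def read_rows(rows, info):
    """Apply whichever filters the info object carries; a missing one passes all rows."""
    def keep(row):
        policy_ok = info.row_level_filter is None or info.row_level_filter(row)
        prewhere_ok = info.prewhere is None or info.prewhere(row)
        return policy_ok and prewhere_ok
    return [row for row in rows if keep(row)]


rows = [{"long_col": i} for i in range(100)]
policy = lambda row: row["long_col"] < 3  # illustrative policy matching 3 rows

buggy = read_rows(rows, rebuild_dropping_filter(None, policy))
fixed = read_rows(rows, rebuild_keeping_filter(None, policy))
print(len(buggy), len(fixed))  # 100 3
```

In this model the reader has no way to detect that a filter was lost, which matches the silent nature of the reported behavior. Whether this is the actual mechanism in antalya-26.1 remains unverified.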
Additional context
CI failure
- PR: #1581, Antalya 26.1 Backport of ClickHouse/ClickHouse#95476 ("enable prewhere for iceberg") and ClickHouse/ClickHouse#98360 ("Fix exception in Parquet PREWHERE when column is not in file") - enable prewhere for iceberg (and fixes)
- Commit: aca8923197484b57b374c22cd9b6c309796b9613
- CI report: ci_run_report.html
- CI run: GitHub Actions #23627983316
Upstream PRs in PR #1581
PR #1581 is a pure backport of two upstream PRs (with a single adaptation line for antalya's StorageIcebergConfiguration class hierarchy):
| PR | Description | Merged to master | First upstream release |
|---|---|---|---|
| #95476 | Enable PREWHERE for Iceberg | Jan 30, 2026 | v26.2.1.1139-stable |
| #98360 | Fix Parquet PREWHERE missing column | Mar 1, 2026 | v26.3.1.896-lts |
Neither ClickHouse#95476 nor ClickHouse#98360 was backported to upstream 26.1.x.
Upstream verification
The same test was run against upstream v26.3.2.3-lts (which contains both ClickHouse#95476 and ClickHouse#98360) and passed. This confirms the issue is specific to the antalya-26.1 branch.
Possibly related upstream fix
- #100361: "Fix exception in updateFormatPrewhereInfo when only row-level filter is set" (merged Mar 29, 2026, not yet in any stable release). It is possible that including this fix in the backport would resolve the issue.
Regression test results database
- 88 new Fail results in row_policy tests, exclusively on the PR #1581 build (aca892319), on both x86_64 (44) and aarch64 (44)
- Zero historical failures for row_policy tests across all other builds in the last 30 days (170+ builds, all OK)
- The base branch antalya-26.1 at commit a76c804b has 1587 OK results per architecture for these tests
Scope of regression
Out of 100 row policy combinations tested, only 4 fail in the REST catalog suite (#30, #53, #58, #87). The other 96 pass because their row policy conditions are trivially true (match all rows), reference non-existent columns, or target users/roles not executing the query — so the bug is masked. The same 4 combinations also fail in the Glue catalog suite and the Iceberg table engine suite (80 failures total across all suites and architectures).
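The masking effect can be shown with a small sketch. It covers only the "trivially true" case, and `policy_exposes_bug` is a hypothetical helper, not part of the test suite: a policy reveals the bug only if enforcing it would change the row count.

```python
def policy_exposes_bug(rows, policy):
    """True when an ignored policy yields a different count than an enforced one."""
    enforced = sum(1 for row in rows if policy(row))
    ignored = len(rows)  # with the bug, every row is returned
    return enforced != ignored


rows = [{"long_col": i} for i in range(100)]
print(policy_exposes_bug(rows, lambda row: True))                # False: bug is masked
print(policy_exposes_bug(rows, lambda row: row["long_col"] < 3)) # True: bug is visible
```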
Integration tests also affected
Beyond the regression test suite (row_policy), the CI report shows that 122 of the 126 "Checks New Fails" entries are Iceberg-related integration tests that fail with the same root pattern, where query results differ depending on whether an optimization is enabled or disabled:
| Test | Failures | Assertion |
|---|---|---|
| test_read_constant_columns_optimization | 56 | result_expected == result_optimized fails when toggling allow_experimental_iceberg_read_optimization |
| test_partition_pruning_with_subquery_set | 42 | data1 == data2 fails when toggling use_iceberg_partition_pruning |
| test_writes_statistics_by_minmax_pruning | 24 | data1 == data2 fails when toggling use_iceberg_partition_pruning |
These tests were not modified by PR #1581 — they already existed on the antalya-26.1 base branch. The failures are deterministic and appear across all job configurations (amd_asan, amd_binary, amd_tsan, arm_binary).
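The assertion pattern shared by these tests can be sketched as follows. This is an illustration rather than the tests' actual code: `run_query` is a hypothetical callable taking a SQL string and a settings dictionary, standing in for whatever client the integration tests use.

```python
from typing import Callable, Dict

# Hypothetical runner signature: (sql, settings) -> query result as text.
QueryRunner = Callable[[str, Dict[str, int]], str]


def results_match_when_toggled(run_query: QueryRunner, sql: str, setting: str) -> bool:
    """Run the same query with `setting` disabled and enabled, then compare results."""
    disabled = run_query(sql, {setting: 0})
    enabled = run_query(sql, {setting: 1})
    return disabled == enabled
```

An optimization such as `use_iceberg_partition_pruning` should never change query results, so a `False` return for any query points to the same class of bug as the row policy failure.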