Row policies silently ignored on Iceberg tables after enabling PREWHERE (PR #1581) #1595

✅ *I checked [the Altinity Stable Builds lifecycle table](https://docs.altinity.com/altinitystablebuilds/#altinity-stable-builds-life-cycle-table), and the Altinity Stable Build version I'm using is still supported.*

## Type of problem
**Bug report** - something's broken

## Describe the situation
Row policies are silently ignored on Iceberg tables when PREWHERE optimization is enabled for Iceberg. A `SELECT` on an Iceberg table with a RESTRICTIVE row policy returns **all rows** instead of only the rows matching the policy filter. The same row policy applied to a MergeTree table with identical data correctly filters rows.

This issue:
- Returns **incorrect query results** (silent data over-exposure) — no error or crash, the row policy is simply not enforced
- Is a **regression introduced by PR #1581** (backport of upstream #95476 + #98360 — enable PREWHERE for Iceberg)
- Affects **all architectures** (x86_64 and aarch64) and all Iceberg catalog types (REST, Glue, table engine)
- Has **security implications** — row-level access control is bypassed on Iceberg tables

---

## How to reproduce the behavior

### Environment
- **Version:** 26.1.4.20001.altinityantalya (PR #1581 build)
- **Build type:** Release (amd_binary)

### Option A: Using the regression test suite

```bash
python3 -u iceberg/regression.py \
    --clickhouse https://altinity-build-artifacts.s3.amazonaws.com/PRs/1581/aca8923197484b57b374c22cd9b6c309796b9613/build_amd_binary/clickhouse \
    --clickhouse-version 26.1.4.20001.altinityantalya \
    --log log.log \
    --only '/iceberg/iceberg engine/rest catalog/feature/row policies/combination #30/*'
```

### Option B: Manual reproduction

Prerequisites: ClickHouse with the PR #1581 build, MinIO, and an Iceberg REST catalog (e.g. `ice-rest-catalog`). Use the `docker-compose.yml` from `iceberg/iceberg_env/` in the regression repo.

1. Create an Iceberg table via pyiceberg with 5 columns (`boolean_col`, `long_col`, `double_col`, `string_col`, `date_col`) and insert 100 rows of random data.

2. Create the Iceberg database and a MergeTree table with the same data in ClickHouse:

```sql
SET allow_experimental_database_iceberg=true;

CREATE DATABASE row_policy 
ENGINE = DataLakeCatalog('http://ice-rest-catalog:5000', 'admin', 'password') 
SETTINGS catalog_type = 'rest',
    storage_endpoint = 'http://minio:9000/warehouse',
    warehouse = 's3://bucket1/',
    auth_header = 'Authorization: Bearer foo';

CREATE TABLE merge_tree_table (
    boolean_col Nullable(Bool), 
    long_col Nullable(Int64), 
    double_col Nullable(Float64), 
    string_col Nullable(String),
    date_col Nullable(Date32)
) ENGINE = MergeTree ORDER BY tuple();

INSERT INTO merge_tree_table 
SELECT * FROM row_policy.`<namespace>.<table_name>`;
```

3. Create a user and grant SELECT on both tables:

```sql
CREATE USER test_user;
GRANT SELECT ON merge_tree_table TO test_user;
GRANT SELECT ON row_policy.`<namespace>.<table_name>` TO test_user;
```

4. Create a row policy with a condition that actually filters rows:

```sql
CREATE ROW POLICY test_policy 
    ON merge_tree_table, row_policy.`<namespace>.<table_name>` 
    USING long_col in range(1, 100) OR double_col in range(1, 100) 
    AS RESTRICTIVE 
    TO test_user;
```

5. Query both tables as `test_user`:

```bash
clickhouse-client --user test_user -q "SELECT count() FROM merge_tree_table"
# Returns: 3 (filtered by row policy)

clickhouse-client --user test_user -q "SELECT count() FROM row_policy.\`<namespace>.<table_name>\`"
# Returns: 100 (row policy NOT applied — BUG)
```
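Step 1 above (creating and filling the Iceberg table with pyiceberg) can be sketched roughly as follows. This is a minimal sketch, not the regression suite's own code: the value ranges, the `localhost` endpoints, and the mapping of `admin`/`password`/`foo` to S3 credentials and bearer token are assumptions based on the `CREATE DATABASE` statement in step 2, and `make_rows` and `write_to_iceberg` are hypothetical helper names. The counts in step 5 depend on the data, so randomly generated rows will not necessarily give 3.

```python
import datetime
import random

COLUMNS = ("boolean_col", "long_col", "double_col", "string_col", "date_col")


def make_rows(n=100, seed=0):
    """Generate n rows of random data for the five columns from step 1."""
    rng = random.Random(seed)
    start = datetime.date(2020, 1, 1)  # arbitrary base date (assumption)
    return [
        {
            "boolean_col": rng.choice([True, False]),
            "long_col": rng.randint(0, 10_000),
            "double_col": rng.uniform(0, 10_000),
            "string_col": f"row_{i}",
            "date_col": start + datetime.timedelta(days=rng.randint(0, 365)),
        }
        for i in range(n)
    ]


def write_to_iceberg(rows, namespace, table_name):
    """Create the table through the REST catalog and append the rows."""
    # Imported lazily so make_rows() works without pyarrow/pyiceberg installed.
    import pyarrow as pa
    from pyiceberg.catalog import load_catalog

    schema = pa.schema([
        ("boolean_col", pa.bool_()),
        ("long_col", pa.int64()),
        ("double_col", pa.float64()),
        ("string_col", pa.string()),
        ("date_col", pa.date32()),
    ])
    # Assumed connection properties; adjust them to your docker-compose setup.
    catalog = load_catalog(
        "rest",
        **{
            "type": "rest",
            "uri": "http://localhost:5000",
            "token": "foo",
            "s3.endpoint": "http://localhost:9000",
            "s3.access-key-id": "admin",
            "s3.secret-access-key": "password",
        },
    )
    catalog.create_namespace(namespace)  # raises if the namespace already exists
    table = catalog.create_table(f"{namespace}.{table_name}", schema=schema)
    table.append(pa.Table.from_pylist(rows, schema=schema))
```

Calling `write_to_iceberg(make_rows(), "<namespace>", "<table_name>")` before step 2 would then give both tables the same 100 rows once the `INSERT ... SELECT` has run.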

---

## Expected behavior
Both queries should return the same number of rows (3), since the same row policy is applied to both tables with identical data.
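The expectation can also be checked with a small script. This is a sketch that assumes `clickhouse-client` is on the `PATH` and reuses only the `--user` and `-q` flags shown in step 5; the helper names are hypothetical and the table names are passed in by the caller.

```python
import subprocess


def run_as(user, query):
    """Run a query with clickhouse-client as the given user; return stdout."""
    result = subprocess.run(
        ["clickhouse-client", "--user", user, "-q", query],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def row_policy_counts(merge_tree_table, iceberg_table, user="test_user", run=run_as):
    """Return (merge_tree_count, iceberg_count) as seen by the restricted user."""
    mt = int(run(user, f"SELECT count() FROM {merge_tree_table}"))
    ice = int(run(user, f"SELECT count() FROM {iceberg_table}"))
    return mt, ice


def policy_is_enforced(merge_tree_table, iceberg_table, user="test_user", run=run_as):
    """True when both tables expose the same number of rows to the user."""
    mt, ice = row_policy_counts(merge_tree_table, iceberg_table, user, run)
    return mt == ice
```

The `run` parameter exists only so the comparison can be exercised without a live server; in a real check the default `run_as` is used.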

---

## Actual behavior
The MergeTree table correctly returns 3 rows (filtered). The Iceberg table returns all 100 rows; the row policy is silently ignored. The session below was captured with a user named `user1` in place of `test_user`.

```
root@clickhouse1:/# clickhouse-client --user user1 -q "SELECT count() FROM merge_tree_table"
3
root@clickhouse1:/# clickhouse-client --user user1 -q "SELECT count() FROM row_policy.\`row_policy.table_a493118f_2baf_11f1_8e49_d4a2cd784155\`"
100
```

---

## Root cause analysis

> **Note:** This root cause analysis was generated with AI assistance. The upstream verification test was confirmed manually, but the specific code-level root cause has not been fully verified and should be investigated by the developer.

### The bug is NOT in the upstream PRs themselves

We verified this by running the same failing test (`combination #30`) against upstream **v26.3.2.3-lts**, which contains both upstream PRs #95476 and #98360 (confirmed via git ancestry: both merge commits are ancestors of the v26.3.2.3-lts tag with `behind_by: 0`). The test **passes** on v26.3.2.3 — the row policy is correctly applied to the Iceberg table, returning 3 rows for both MergeTree and Iceberg.

```bash
python3 -u iceberg/regression.py \
    --clickhouse docker://clickhouse/clickhouse-server:26.3.2.3-alpine \
    --clickhouse-version 26.3.2.3-alpine \
    --log log1.log \
    --only '/iceberg/iceberg engine/rest catalog/feature/row policies/combination #30/*'
# Result: OK (both tables return 3 rows with the row policy)
```

This indicates the regression is caused by the interaction of the backported changes with the `antalya-26.1` codebase, rather than by the upstream changes themselves.
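The ancestry claim above can also be re-checked locally in a ClickHouse clone. The sketch below is one possible way to do it, using `git merge-base --is-ancestor`, which exits with status 0 when the first commit is an ancestor of the second. It is not necessarily how the original check was done, and the merge commit hashes are not listed in this report, so they must be supplied by the caller.

```python
import subprocess


def ancestry_command(commit, ref, repo="."):
    """Build the git command that tests whether `commit` is an ancestor of `ref`."""
    return ["git", "-C", repo, "merge-base", "--is-ancestor", commit, ref]


def is_ancestor(commit, ref, repo=".", run=subprocess.run):
    """Return True when git reports `commit` as an ancestor of `ref`."""
    # Exit status 0 means "is an ancestor"; 1 means "is not".
    return run(ancestry_command(commit, ref, repo)).returncode == 0
```

For example, `is_ancestor("<merge commit of #95476>", "v26.3.2.3-lts", repo="/path/to/ClickHouse")` should return `True` if the tag contains that PR.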

### Possible root cause

We identified a potential issue in `updateFormatPrewhereInfo()` in `src/Storages/prepareReadingFromFormat.cpp`. This function does **not** store the `row_level_filter` in the new `ReadFromFormatInfo`:

```cpp
ReadFromFormatInfo new_info;
new_info.prewhere_info = prewhere_info;
// row_level_filter is NOT stored in new_info
```

This same code exists in upstream v26.3.2.3, but the test passes there — possibly because the new query analyzer (enabled by default in v26.3) handles row policies through a different code path. In `antalya-26.1`, the old query planner is used by default.

An upstream fix for a closely related issue was merged in **PR [#100361](https://github.com/ClickHouse/ClickHouse/pull/100361)** ("Fix exception in `updateFormatPrewhereInfo` when only row-level filter is set"), on March 29, 2026. That PR adds the missing `new_info.row_level_filter = row_level_filter` line. It is possible that including this fix in the backport would resolve the issue.
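To make the suspected mechanism concrete, here is a toy Python model of it. It is not ClickHouse code: `ReadInfo`, `update_info_buggy`, `update_info_fixed`, and `read` are hypothetical stand-ins for `ReadFromFormatInfo`, `updateFormatPrewhereInfo()`, and the reader. Whether this is what actually happens in `antalya-26.1` remains unverified, as noted above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Filter = Callable[[dict], bool]


@dataclass
class ReadInfo:
    """Toy stand-in for ReadFromFormatInfo (hypothetical, heavily simplified)."""
    prewhere: Optional[Filter] = None
    row_level_filter: Optional[Filter] = None


def update_info_buggy(prewhere, row_level_filter):
    new_info = ReadInfo()
    new_info.prewhere = prewhere
    # row_level_filter is never copied, so the reader never sees the policy.
    return new_info


def update_info_fixed(prewhere, row_level_filter):
    # Copies both filters, analogous to adding the missing assignment.
    return ReadInfo(prewhere=prewhere, row_level_filter=row_level_filter)


def read(rows, info):
    """Apply the row-level filter first, then PREWHERE, if either is set."""
    if info.row_level_filter is not None:
        rows = [r for r in rows if info.row_level_filter(r)]
    if info.prewhere is not None:
        rows = [r for r in rows if info.prewhere(r)]
    return rows


rows = [{"long_col": i} for i in range(100)]
policy = lambda r: r["long_col"] < 3  # toy policy that keeps 3 rows

print(len(read(rows, update_info_buggy(None, policy))))  # 100: policy lost
print(len(read(rows, update_info_fixed(None, policy))))  # 3: policy applied
```

In this model the query itself never fails; the only visible symptom is the larger row count, which matches the silent behaviour described in this report.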

---

## Additional context

### CI failure
- **PR:** [#1581](https://github.com/Altinity/ClickHouse/pull/1581) — Antalya 26.1 Backport of #95476, #98360 - enable prewhere for iceberg (and fixes)
- **Commit:** `aca8923197484b57b374c22cd9b6c309796b9613`
- **CI report:** [ci_run_report.html](https://s3.amazonaws.com/altinity-build-artifacts/PRs/1581/aca8923197484b57b374c22cd9b6c309796b9613/23627983316/ci_run_report.html)
- **CI run:** [GitHub Actions #23627983316](https://github.com/Altinity/ClickHouse/actions/runs/23627983316)

### Upstream PRs in PR #1581
PR #1581 is a **pure backport** of two upstream PRs (with a single adaptation line for antalya's `StorageIcebergConfiguration` class hierarchy):

| PR | Description | Merged to master | First upstream release |
|---|---|---|---|
| [#95476](https://github.com/ClickHouse/ClickHouse/pull/95476) | Enable PREWHERE for Iceberg | Jan 30, 2026 | v26.2.1.1139-stable |
| [#98360](https://github.com/ClickHouse/ClickHouse/pull/98360) | Fix Parquet PREWHERE missing column | Mar 1, 2026 | v26.3.1.896-lts |

Neither #95476 nor #98360 was backported to upstream 26.1.x.

### Upstream verification
The same test was run against upstream **v26.3.2.3-lts** (which contains both #95476 and #98360) and **passed**. This confirms the issue is specific to the `antalya-26.1` branch.

### Possibly related upstream fix
- [#100361](https://github.com/ClickHouse/ClickHouse/pull/100361): "Fix exception in `updateFormatPrewhereInfo` when only row-level filter is set" (merged Mar 29, 2026, not yet in any stable release).

### Regression test results database
- **88 new Fail results** in `row_policy` tests, exclusively on PR #1581 build (`aca892319`), on both x86_64 (44) and aarch64 (44)
- **Zero historical failures** for `row_policy` tests across all other builds in the last 30 days (170+ builds, all OK)
- The base branch `antalya-26.1` at commit `a76c804b` has 1587 OK results per architecture for these tests

### Scope of regression
Out of 100 row policy combinations tested, only 4 fail in the REST catalog suite (#30, #53, #58, #87). The other 96 pass because their row policy conditions are trivially true (match all rows), reference non-existent columns, or target users/roles not executing the query — so the bug is masked. The same 4 combinations also fail in the Glue catalog suite and the Iceberg table engine suite (80 failures total across all suites and architectures).

### Integration tests also affected

Beyond the regression test suite (row_policy), the [CI report](https://s3.amazonaws.com/altinity-build-artifacts/PRs/1581/aca8923197484b57b374c22cd9b6c309796b9613/23627983316/ci_run_report.html#checks-fails) shows that **122 of the 126 failures listed under "Checks New Fails"** are Iceberg-related integration tests. They fail with the same root pattern, where query results differ depending on whether an optimization is enabled or disabled:

| Test | Failures | Assertion |
|---|---|---|
| `test_read_constant_columns_optimization` | 56 | `result_expected == result_optimized` fails when toggling `allow_experimental_iceberg_read_optimization` |
| `test_partition_pruning_with_subquery_set` | 42 | `data1 == data2` fails when toggling `use_iceberg_partition_pruning` |
| `test_writes_statistics_by_minmax_pruning` | 24 | `data1 == data2` fails when toggling `use_iceberg_partition_pruning` |

These tests were **not modified by PR #1581** — they already existed on the `antalya-26.1` base branch. The failures are deterministic and appear across all job configurations (amd_asan, amd_binary, amd_tsan, arm_binary).
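The toggle pattern these tests rely on can be sketched as below. This is a simplified illustration rather than the integration tests' own code: `with_setting` and `results_match_when_toggled` are hypothetical helpers, `run` is any callable that executes a query and returns its result, and the query is assumed to have no `SETTINGS` or `FORMAT` clause of its own.

```python
def with_setting(query, name, value):
    """Append a per-query SETTINGS clause to a plain SELECT."""
    return f"{query} SETTINGS {name} = {value}"


def results_match_when_toggled(query, setting, run):
    """Run the query with the setting off and on, and compare the results."""
    off = run(with_setting(query, setting, 0))
    on = run(with_setting(query, setting, 1))
    return off == on
```

For example, `results_match_when_toggled("SELECT * FROM t ORDER BY 1", "use_iceberg_partition_pruning", run)` is expected to be `True` on a healthy build, because the optimization should never change the result set.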
