
Fix TOO_MANY_ROWS exception during exact count optimization with multiple parts#100408

Open
alexey-milovidov wants to merge 18 commits into master from fix-flaky-04004-exact-count-projection

Conversation

@alexey-milovidov (Member)

When optimizeUseAggregateProjection calls selectRangesToRead(find_exact_ranges=true) for the exact count optimization (used for SELECT count() queries), the max_rows_to_read check in filterPartsByPrimaryKeyAndSkipIndexes could throw TOO_MANY_ROWS for tables without user-defined projections.

This happened because has_projections was false, so the graceful exit path (setting exceeded_row_limits = true and returning) was not taken. Instead, the exception was thrown immediately, even though the exact count optimization could serve the query by computing counts directly from the primary key analysis without reading all the data.

The fix includes find_exact_ranges in the has_projections flag, consistent with how support_projection_optimization already treats it. This allows the exact count optimization to proceed: it computes exact counts from primary key ranges and only needs to read the small number of boundary granules, which are well within the limit.

The issue was triggered when CI randomized MergeTree settings caused data to be split into multiple parts with non-aligned granule boundaries, making the total estimated rows slightly exceed the max_rows_to_read limit.

Fixes #100129

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix TOO_MANY_ROWS exception for SELECT count() queries with max_rows_to_read / force_primary_key when data is split across multiple parts with non-aligned granule boundaries.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

…iple parts

When `optimizeUseAggregateProjection` calls `selectRangesToRead(find_exact_ranges=true)`
for the exact count optimization (used for `SELECT count()` queries), the
`max_rows_to_read` check in `filterPartsByPrimaryKeyAndSkipIndexes` could throw
`TOO_MANY_ROWS` for tables without user-defined projections.

This happened because `has_projections` was false, so the graceful exit path
(setting `exceeded_row_limits = true` and returning) was not taken. Instead,
the exception was thrown immediately, even though the exact count optimization
could serve the query by computing counts directly from the primary key
analysis without reading all the data.

The fix includes `find_exact_ranges` in the `has_projections` flag, consistent
with how `support_projection_optimization` already treats it. This allows the
exact count optimization to proceed: it computes exact counts from primary key
ranges and only needs to read the small number of boundary granules, which are
well within the limit.

The issue was triggered when CI randomized MergeTree settings caused data to be
split into multiple parts with non-aligned granule boundaries, making the total
estimated rows slightly exceed the `max_rows_to_read` limit.

#100129

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@clickhouse-gh (Contributor)

clickhouse-gh bot commented Mar 23, 2026

Workflow [PR], commit [7fa82a6]

Summary:

| job_name | test_name | status | info | comment |
| --- | --- | --- | --- | --- |
| Build (arm_fuzzers) | | failure | | |
| | Build ClickHouse | failure | cidb | |
| Stateless tests (amd_tsan, parallel, 2/2) | | failure | | |
| | 03994_map_subcolumns_small_wide | FAIL | cidb | |
| Integration tests (amd_llvm_coverage, 2/5) | | failure | | |
| | test_overcommit_tracker/test.py::test_user_overcommit | FAIL | cidb | |
| Integration tests (amd_asan_ubsan, db disk, old analyzer, 1/6) | | error | | |
| Integration tests (amd_asan_ubsan, db disk, old analyzer, 3/6) | | error | | |
| Integration tests (amd_asan_ubsan, db disk, old analyzer, 5/6) | | error | | |
| Integration tests (amd_asan_ubsan, db disk, old analyzer, 6/6) | | error | | |
| Integration tests (amd_binary, 2/5) | | error | | |
| Integration tests (amd_binary, 4/5) | | error | | |
| Integration tests (amd_tsan, 1/6) | | error | | |

AI Review

Summary

This PR fixes a real planner/runtime mismatch for exact-count optimization: when selectRangesToRead(find_exact_ranges=true) was used, row-limit handling could throw TOO_MANY_ROWS too early on multi-part layouts even though exact-count could satisfy count() mostly from PK analysis. The change aligns has_projections behavior with exact-range mode, clears stale exceeded_row_limits state for the reduced inexact remainder, and adds focused stateless regressions (including force_primary_key + non-aligned granules). High-level verdict: the fix is coherent, targeted, and sufficiently covered.

ClickHouse Rules

  • Deletion logging
  • Serialization versioning
  • Core-area scrutiny
  • No test removal
  • Experimental gate
  • No magic constants
  • Backward compatibility
  • SettingsChangesHistory.cpp
  • PR metadata quality
  • Safe rollout
  • Compilation time

Final Verdict
Final Verdict
  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh bot added the pr-bugfix Pull request with bugfix, not backported by default label Mar 23, 2026
```diff
 /// on the full table scan should not cause an immediate failure.
 bool projection_parts_exist = std::any_of(res_parts.begin(), res_parts.end(), [](const auto & part) { return part.data_part->isProjectionPart(); });
-bool has_projections = metadata_snapshot->hasProjections() || projection_parts_exist;
+bool has_projections = metadata_snapshot->hasProjections() || projection_parts_exist || find_exact_ranges;
```
Contributor

⚠️ This fixes the runtime behavior, but there is still no regression test that exercises SELECT count() with max_rows_to_read / force_primary_key across multiple parts with non-aligned granule boundaries.

Please add a new stateless test (preferably in tests/queries/0_stateless) that reproduces the previous TOO_MANY_ROWS exception and verifies the query succeeds after this fix. Without a dedicated test, this exact planner-path regression is likely to reappear.

alexey-milovidov and others added 17 commits March 24, 2026 00:30
The exact count optimization can now compute `count()` results from
primary key analysis without reading data, so `count()` queries bypass
the `max_rows_to_read` limit on MergeTree tables. Update the test to
expect success for `count()` queries and add `SELECT *` tests to verify
that non-count queries still respect the limit.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The exact count optimization that bypasses `max_rows_to_read` only works
when `optimize_use_projections` is enabled. Since CI randomizes this
setting, the test must keep the original `TOO_MANY_ROWS` expectations
for `count()` queries on MergeTree tables.

The C++ fix in the previous commit is still correct — it prevents the
spurious exception when the projection optimization IS running (as in
the 04004 test).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
… optimization

The exact count optimization that bypasses `max_rows_to_read` requires
`optimize_use_projections = 1`. Set it explicitly so the test works even
when CI randomizes it to false. Also add `SELECT *` queries to verify
that non-count queries still fail fast with `TOO_MANY_ROWS`.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…jections`

The exact count optimization code path in `getAggregateProjectionCandidates`
requires `can_use_minmax_projection` to be true before it will set
`only_count_column`. This flag depends on `optimize_use_implicit_projections`.

When CI randomizes `optimize_use_implicit_projections = false`, the function
returns early (no implicit projections + no explicit aggregate projections),
and the exact count optimization never runs. The count query then hits
`max_rows_to_read` and throws `TOO_MANY_ROWS`.

Also add a dedicated regression test for the original issue (#100129).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…ary-granule reads

The exact count optimization still needs to read boundary granules at
range edges. With `index_granularity = 128` and a BETWEEN condition
spanning two boundaries, 256 rows of boundary granules need to be read.
The previous `max_rows_to_read = 100` was too low, causing execution-time
`TOO_MANY_ROWS`.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…oss multiple parts

Addresses review feedback: adds a dedicated stateless test that exercises
`SELECT count()` with `max_rows_to_read` / `force_primary_key` when data is
split across multiple parts with non-aligned granule boundaries. Before the
fix in this PR, this would throw `TOO_MANY_ROWS`.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The `SELECT *` without a WHERE clause throws `INDEX_NOT_USED` instead of
`TOO_MANY_ROWS` when `force_primary_key` is set. Add a WHERE on the
primary key that covers all rows so the query reaches the row limit check.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@clickhouse-gh (Contributor)

clickhouse-gh bot commented Mar 31, 2026

LLVM Coverage Report

| Metric | Baseline | Current | Δ |
| --- | --- | --- | --- |
| Lines | 84.10% | 84.00% | -0.10% |
| Functions | 90.90% | 90.90% | +0.00% |
| Branches | 76.70% | 76.60% | -0.10% |

Changed lines: 100.00% (11/11) · Uncovered code

Full report · Diff report


Labels

pr-bugfix Pull request with bugfix, not backported by default

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky test: 04004_integral_col_comparison_with_float_key_condition

1 participant