
Optimize index analysis by applying query condition cache earlier and caching filtered ranges#82380

Merged
shankar-iyer merged 11 commits into ClickHouse:master from amosbird:better-query-condition-cache
Oct 9, 2025
Conversation

@amosbird
Collaborator

@amosbird amosbird commented Jun 23, 2025

Ref: ClickHouse/ClickBench#412

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Improved query performance by refactoring the order and integration of the Query Condition Cache (QCC) with index analysis. QCC filtering is now applied before primary key and skip index analysis, reducing unnecessary index computation. Index analysis has been extended to support multiple range filters, and its filtering results are now stored back into the QCC. This significantly speeds up queries where index analysis dominates execution time, especially those relying on skip indexes (e.g. vector or inverted indexes).

Resolves #85779.
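To make the reordering concrete, here is a minimal sketch of the flow the changelog describes: consult the query condition cache first, run index analysis only on the surviving marks, then write the narrowed result back. All names here (MarkRange, QueryConditionCache, CacheKey, filterByCache, toBitmap) are illustrative stand-ins, not ClickHouse's actual types or API.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Half-open range of index granules (marks) within one data part.
struct MarkRange { size_t begin; size_t end; };   // [begin, end)

// Hypothetical cache key: (part identity hash, predicate hash).
using CacheKey = std::pair<uint64_t, uint64_t>;
// Per-mark "may match" bitmap: false means the predicate cannot match there.
using MarkBitmap = std::vector<bool>;

struct QueryConditionCache
{
    std::map<CacheKey, MarkBitmap> entries;

    std::optional<MarkBitmap> read(const CacheKey & key) const
    {
        auto it = entries.find(key);
        if (it == entries.end())
            return std::nullopt;
        return it->second;
    }

    void write(const CacheKey & key, MarkBitmap bitmap) { entries[key] = std::move(bitmap); }
};

// Step 1 (before PK / skip index analysis): drop marks the cache already
// knows cannot match, merging surviving marks back into contiguous ranges.
std::vector<MarkRange> filterByCache(const std::vector<MarkRange> & ranges, const MarkBitmap & may_match)
{
    std::vector<MarkRange> result;
    for (const auto & r : ranges)
        for (size_t m = r.begin; m < r.end; ++m)
            if (may_match[m])
            {
                if (!result.empty() && result.back().end == m)
                    ++result.back().end;   // extend the previous range
                else
                    result.push_back({m, m + 1});
            }
    return result;
}

// Step 2 (after index analysis shrinks the ranges further): persist the
// result, so future queries skip index analysis for the excluded marks.
MarkBitmap toBitmap(const std::vector<MarkRange> & ranges, size_t total_marks)
{
    MarkBitmap bitmap(total_marks, false);
    for (const auto & r : ranges)
        for (size_t m = r.begin; m < r.end; ++m)
            bitmap[m] = true;
    return bitmap;
}
```

A cache hit lets a repeated query skip primary-key and skip-index analysis for marks already known not to match; a miss costs one lookup plus the write-back after analysis.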

@clickhouse-gh
Contributor

clickhouse-gh bot commented Jun 23, 2025

Workflow [PR], commit [23c403f]

@clickhouse-gh clickhouse-gh bot added the pr-performance (Pull request with some performance improvements) and submodule changed (At least one submodule changed in this PR) labels Jun 23, 2025
@amosbird amosbird force-pushed the better-query-condition-cache branch from 02118d5 to 58689af on June 23, 2025 03:50
@amosbird amosbird removed the submodule changed (At least one submodule changed in this PR) label Jun 23, 2025
@rschu1ze rschu1ze self-assigned this Jun 23, 2025
}

/// Fill query condition cache with ranges excluded by index analysis.
if (reader_settings.use_query_condition_cache && query_info_.filter_actions_dag)
Member

Function filterPartsByQueryConditionCache also considers the PREWHERE info:

if (const auto & prewhere_info = select_query_info.prewhere_info)
{
    for (const auto * outputs : prewhere_info->prewhere_actions.getOutputs())
    {
        if (outputs->result_name == prewhere_info->prewhere_column_name)
        {
            auto stats = drop_mark_ranges(outputs);
            LOG_DEBUG(log,
                    "Query condition cache has dropped {}/{} granules for PREWHERE condition {}.",
                    stats.granules_dropped,
                    stats.total_granules,
                    prewhere_info->prewhere_column_name);
            break;
        }
    }
}

if (const auto & filter_actions_dag = select_query_info.filter_actions_dag)
{
    const auto * output = filter_actions_dag->getOutputs().front();
    auto stats = drop_mark_ranges(output);
    LOG_DEBUG(log,
            "Query condition cache has dropped {}/{} granules for WHERE condition {}.",
            stats.granules_dropped,
            stats.total_granules,
            filter_actions_dag->getOutputs().front()->result_name);
}

... whereas here, we don't. Should we?

Collaborator Author

filter_actions_dag is a superset of the PREWHERE information, encompassing the full filtering context: WHERE clauses, row-level security policies, and additional filters. It is the only source used for index analysis, as PREWHERE alone is incomplete and insufficient for that purpose.

Note: To support skip indexes that rely on ORDER BY ... LIMIT semantics (e.g., vector search TopN), the filter will need to be extended to capture ordering and limit information as well. This enhancement is currently a TODO.

auto data_part = remaining_ranges.data_part;
String part_name = data_part->isProjectionPart() ? fmt::format("{}:{}", data_part->getParentPartName(), data_part->name)
: data_part->name;
query_condition_cache->write(
Member

So with this PR, the same query now potentially writes to the cache more than once (during index analysis and again at the end of the scan).

It is not clear to me which entry will prevail in the cache.

Collaborator Author

There's no actual duplication in cache writes. Suppose index analysis didn't happen at all: PREWHERE or WHERE would still evaluate the predicate, potentially drop the entire granule, and update the cache accordingly. So the write during index analysis isn't extra; it just happens earlier in the pipeline.

As a side note, the current QCC implementation tracks granule states at the read task level rather than per granule. While this could be refined, I didn't observe any measurable performance gain on ClickBench, so it's left as a TODO for now.

@amosbird
Collaborator Author

amosbird commented Jul 1, 2025

As far as I can tell, the current QCC implementation does not properly recognize function determinism. For example, it caches predicates involving IN (subquery), which is semantically incorrect and should be avoided; see https://fiddle.clickhouse.com/6a6c6d95-2e6c-4641-862b-fe401b0c30a3. This can be addressed by extending ActionsDAG a little.

Additionally, it's unclear how the system behaves in the event of a hash collision. While such collisions are unlikely in practice, they are theoretically possible and should be accounted for.
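A minimal sketch of the determinism gate suggested above, using a toy expression tree rather than the real ActionsDAG. Node, isDeterministicAcrossQueries, and isCacheable are hypothetical names, and the deny-list is purely illustrative; a real implementation would consult function traits.

```cpp
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for an expression DAG node.
struct Node
{
    std::string function;                          // e.g. "equals", "in_subquery"
    std::vector<std::shared_ptr<Node>> children;
};

// A predicate result may only be cached across queries if re-evaluating it
// later is guaranteed to give the same answer. Subquery results and
// volatile functions violate that, so they must block caching.
bool isDeterministicAcrossQueries(const std::string & function)
{
    // Illustrative deny-list only.
    return function != "in_subquery" && function != "now" && function != "rand";
}

// Recursively check the whole predicate: one non-deterministic node
// anywhere in the tree makes the entire predicate uncacheable.
bool isCacheable(const Node & node)
{
    if (!isDeterministicAcrossQueries(node.function))
        return false;
    for (const auto & child : node.children)
        if (!isCacheable(*child))
            return false;
    return true;
}
```

With such a gate in place, the write path would simply skip the cache for uncacheable predicates instead of storing a result that may be stale on the next query.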

@amosbird amosbird force-pushed the better-query-condition-cache branch from 58689af to 9a40aa8 on July 1, 2025 05:08
/// Fill query condition cache with ranges excluded by index analysis.
if (condition_hash)
{
RangesInDataParts remaining;
Member

@shankar-iyer shankar-iyer Sep 28, 2025

I had the same question as Robert about this code block. But I got it now: we don't want to repeat PK analysis on the ranges that were already rejected by the first PK analysis for a predicate. Hence we update the query condition cache with the excluded ranges.

If a part is fully skipped, the CPU cost is negligible. But if only some of a part's ranges are selected, does computing the remaining ranges take more than a little CPU? And will this code block be executed on every execution of a query with the same predicate?

Collaborator Author

@amosbird amosbird Sep 29, 2025

Yes, but the impact should be negligible. While it does a small diff calculation on the part ranges every time, this calculation has linear complexity and is insignificant compared to the I/O cost that each part range may incur later.
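The linear-time difference described here can be sketched as a single merge-style pass over two sorted, non-overlapping range lists. MarkRange and rangeDifference are illustrative names, and the sketch assumes each kept range lies fully inside one of the original ranges.

```cpp
#include <cstddef>
#include <vector>

// Half-open range of index granules (marks) within one data part.
struct MarkRange { size_t begin; size_t end; };   // [begin, end)

// Compute `all \ kept`: the ranges that index analysis excluded, i.e. the
// ones worth recording in the query condition cache. Both inputs must be
// sorted and non-overlapping; the pass advances a single cursor through
// `kept`, so the total work is linear in the number of ranges.
std::vector<MarkRange> rangeDifference(const std::vector<MarkRange> & all, const std::vector<MarkRange> & kept)
{
    std::vector<MarkRange> excluded;
    size_t k = 0;
    for (const auto & a : all)
    {
        size_t cursor = a.begin;
        while (k < kept.size() && kept[k].end <= a.end)
        {
            if (kept[k].begin > cursor)
                excluded.push_back({cursor, kept[k].begin});   // gap before this kept range
            cursor = kept[k].end;
            ++k;
        }
        if (cursor < a.end)
            excluded.push_back({cursor, a.end});               // tail after the last kept range
    }
    return excluded;
}
```

Each range is visited at most once, so the per-query cost stays proportional to the number of ranges, which is indeed small next to the I/O of actually reading those ranges.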

@shankar-iyer
Member

This significantly speeds up queries where index analysis dominates execution time—especially those relying on skip indexes (e.g. vector or inverted indexes)

Vector search queries don't use query condition cache since the condition matching logic does not analyze ORDER BY ... LIMIT (I think noted in the PR somewhere).

Thanks for the PR! Please refresh the PR and I will complete the review.

@amosbird
Collaborator Author

Vector search queries don't use query condition cache since the condition matching logic does not analyze ORDER BY ... LIMIT (I think noted in the PR somewhere).

Yes. It's #82380 (comment)

Please refresh the PR and I will complete the review.

Sure. Will do it today.

@amosbird
Collaborator Author

amosbird commented Oct 6, 2025

test_jbod_balancer/test.py::test_jbod_balanced_merge

#88104

@shankar-iyer Could you help take another look? It seems the CH Inc sync has some test failures, and they likely need SET use_query_condition_cache = 0;

@shankar-iyer
Member

test_jbod_balancer/test.py::test_jbod_balanced_merge

#88104

@shankar-iyer Could you help take another look? It seems the CH Inc sync has some test failures, and they likely need SET use_query_condition_cache = 0;

The failure is in the ASAN tests; it looks like the runtime limit was exceeded. I will get back after confirming.

@shankar-iyer
Member

@amosbird The CH Inc sync failures have been resolved internally. Can you please refresh (the test file name has changed in PR #88152)?

@amosbird
Collaborator Author

amosbird commented Oct 8, 2025

Sure, refreshed.

@shankar-iyer
Member

Fast test failed.

@amosbird Please check the latest merge, especially for tests prefixed with 02346_text_index.

@amosbird
Collaborator Author

amosbird commented Oct 9, 2025

PR / AST fuzzer (amd_ubsan) (pull_request)

Something went wrong in 03596_parquet_prewhere_page_skip_bug, which should be unrelated to this PR.

@shankar-iyer shankar-iyer added this pull request to the merge queue Oct 9, 2025
Merged via the queue into ClickHouse:master with commit a80044b Oct 9, 2025
121 of 123 checks passed
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud (The PR is synced to the cloud repo) label Oct 9, 2025
@zlareb1
Member

zlareb1 commented Oct 13, 2025

@amosbird Could you clarify why use_query_condition_cache = 0 is explicitly set in some of the tests?

@amosbird
Collaborator Author

Because these tests check the EXPLAIN output for index pruning. This PR moved the query condition cache before the pruning step, which changes the results, so we explicitly disable it in those tests.

@rschu1ze
Member

@amosbird Could you clarify why use_query_condition_cache = 0 is explicitly set in some of the tests?

Added a note to the docs: #88462

@rschu1ze
Member

Issue for broken EXPLAIN indexes = 1: #88467

@azat
Member

azat commented Oct 14, 2025

@zlareb1
Member

zlareb1 commented Oct 14, 2025

This doesn't look related to this change, but I see multiple integration test failures in #88477, where I explicitly disabled use_query_condition_cache and ran all the tests.
Most of these tests are usually stable but consistently fail in #88477 at cluster start: Timed out while waiting for instance 'node4' with ip address 172.16.4.11 to start.

@azat
Member

azat commented Oct 14, 2025

Most of the tests are usually stable but consistently failing in #88477 at cluster start Timed out while waiting for instance 'node4' with ip address 172.16.4.11 to start.

This could mean lots of things, but in your case it is because you enabled this setting for the old server as well (see tests with a non-standard tag, i.e. add_instance(tag=)), which does not have it:

2025.10.14 08:06:30.634010 [ 10 ] {} <Error> Application: Caught exception while setting up access control.: Code: 115. DB::Exception: Setting use_query_condition_cache is neither a builtin setting nor started with the prefix 'custom_' registered for user-defined settings: while parsing profil>

P.S. You can find all the details in artifacts

@shankar-iyer
Member

This may work worse for simple queries - https://pastila.nl/?043a6f47/7cfc432f86e6fd44263b0b6a3960eb71#k/dvV6ZpLnWx3No9s8pmvA==

Yes: too many parts to run index analysis on, only one range per part matching the predicate, and not much data to scan. There is some contention and CPU cost in looking up and updating the cache.

The query condition cache works best with a mix of PK and non-PK predicates, lots of ranges to scan (with or without skip indexes), and low selectivity of the predicates.

I think we need to look at #87498 and consider this: would a PK-only predicate always be better evaluated by binary search / exclusion scan (plus the cost of reading the primary key) versus looking up the query condition cache for that predicate?

@rschu1ze @azat

alexey-milovidov added a commit that referenced this pull request Mar 5, 2026
…ry condition cache

PR #82380 moved Query Condition Cache filtering before the distributed
index analysis code path. QCC can split a single mark range into multiple
ranges when it has cached data about which marks don't match. PR #98269
removed the same assertion from `distributedIndexAnalysis.cpp` but missed
this one in `ReadFromMergeTree.cpp`. The assertion is unnecessary since
the ranges are immediately replaced on the next line.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=98770&sha=ad7e094fdd4b057b0bac0acb5b505270f79f3b3d&name_0=PR&name_1=AST%20fuzzer%20%28amd_debug%29

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Labels

pr-performance (Pull request with some performance improvements), pr-synced-to-cloud (The PR is synced to the cloud repo)

Successfully merging this pull request may close these issues.

Put the result of skip index analysis into query condition cache

7 participants