Skip to content

Filtering by secondary indices should be performed on data reading #75774

@alexey-milovidov

Description

@alexey-milovidov

Company or project name

ClickHouse

Describe the situation

Currently, index analysis is done before the query starts reading the data.
It determines which granules to read by filtering out unnecessary granules.

This has the following disadvantages:

  1. Even if a query will finish earlier due to LIMIT, a larger amount of the index has to be read.
  2. Analysing index produces a visible delay before query processing: the progress bar does not show instantly.
  3. Sometimes, reading indices can be heavier than reading the data, but we cannot make the decision of not reading an index dynamically.
  4. The order in which different indices are processed cannot be adapted dynamically.

It also has some advantages, but all of them do not really matter:

  1. It better estimates the number of records to read, and the pipeline (such as the number of processing lanes) can be adapted from this information.
  2. EXPLAIN ESTIMATE shows more precise results but takes more time.

Solution

When we are going to read a granule, check if there are index granules covering it. Read index granules first, and skip this granule if necessary.

We can collect statistics about how much time it takes to read index granules, and how large portions of the data every index filtered. These statistics can be used to avoid reading certain indices at all, and reading other indices in the optimal order.

These statistics can live only during the query run, there is no need to share them.

Additional context

Probably index analysis also affects the choice of projections, but we don't want it to be too heavy. If it affects join order - we should better estimate it from runtime statistics rather than requiring index analysis in advance.

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions