Skip to content

Index not used for IN operator with literals #10574

@akuzm

Description

@akuzm

Use hits table from Yandex.Metrica dataset.
The first query does IN over a subquery, and we can see that the index is used. On the other hand, when I write out the subquery output as literals, the index is not used. Note the reported number of processed rows, and merge tree optimizer output in server log.

/4/ :) select UserID from hits where (CounterID, EventTime) in (select CounterID, max(EventTime) from hits where CounterID = 25703952 and EventDate < '2014-03-20' group by CounterID)

...

1 rows in set. Elapsed: 0.008 sec. Processed 155.65 thousand rows, 1.26 MB (20.07 million rows/s., 162.94 MB/s.) 

2020.04.29 11:28:41.012947 [ 16461 ] {283cd210-3ad6-4bfd-b4fc-9c4d72ed9565} <Debug> InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "(CounterID, EventTime) IN ((SELECT CounterID, max(EventTime) FROM hits WHERE (CounterID = 25703952) AND (EventDate < '2014-03-20') GROUP BY CounterID) AS _subquery10)" moved to PREWHERE
2020.04.29 11:28:41.013204 [ 16461 ] {283cd210-3ad6-4bfd-b4fc-9c4d72ed9565} <Debug> default.hits (SelectExecutor): Key condition: (column 0 in 1-element set)
2020.04.29 11:28:41.013214 [ 16461 ] {283cd210-3ad6-4bfd-b4fc-9c4d72ed9565} <Debug> default.hits (SelectExecutor): MinMax index condition: unknown
2020.04.29 11:28:41.013246 [ 16461 ] {283cd210-3ad6-4bfd-b4fc-9c4d72ed9565} <Debug> default.hits (SelectExecutor): Selected 1 parts by date, 1 parts by key, 19 marks to read from 1 ranges
2020.04.29 11:28:41.013367 [ 16461 ] {283cd210-3ad6-4bfd-b4fc-9c4d72ed9565} <Trace> default.hits (SelectExecutor): Reading approx. 155648 rows with 1 streams

/4/ :) select UserID from hits where (CounterID, EventTime) in (toUInt32(25703952), toDateTime('2014-03-19 23:59:58'))

...

1 rows in set. Elapsed: 0.030 sec. Processed 8.87 million rows, 71.01 MB (291.92 million rows/s., 2.34 GB/s.) 

2020.04.29 11:22:35.593363 [ 16461 ] {df28294d-5bc4-4738-a47f-27e2a6ed2c12} <Debug> InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "(CounterID, EventTime) IN (toUInt32(25703952), toDateTime('2014-03-19 23:59:58'))" moved to PREWHERE
2020.04.29 11:22:35.593608 [ 16461 ] {df28294d-5bc4-4738-a47f-27e2a6ed2c12} <Debug> default.hits (SelectExecutor): Key condition: unknown
2020.04.29 11:22:35.593619 [ 16461 ] {df28294d-5bc4-4738-a47f-27e2a6ed2c12} <Debug> default.hits (SelectExecutor): MinMax index condition: unknown
2020.04.29 11:22:35.593684 [ 16461 ] {df28294d-5bc4-4738-a47f-27e2a6ed2c12} <Debug> default.hits (SelectExecutor): Selected 1 parts by date, 1 parts by key, 1094 marks to read from 1 ranges
2020.04.29 11:22:35.593862 [ 16461 ] {df28294d-5bc4-4738-a47f-27e2a6ed2c12} <Trace> default.hits (SelectExecutor): Reading approx. 8873898 rows with 6 streams

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugConfirmed user-visible misbehaviour in official releaseperformance

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions