ColumnLazy has performance issues during the reading step #79645

@acking-you

Description

Company or project name

No response

Describe the situation

The ColumnLazy (lazy materialization) optimization performs much worse than directly rewriting the AST, and the gap grows with the LIMIT value.

My tests show that for hot queries with ORDER BY ... LIMIT 10, rewriting the AST takes only 0.12 seconds while ColumnLazy takes about 1 second. At LIMIT 100 the difference is even larger: the AST rewrite takes only 0.18 seconds, while ColumnLazy takes 4.4 seconds.

How to reproduce

Execute the following SQL against the hits dataset; the results are shown below each query:

set query_plan_max_limit_for_lazy_materialization=0;
-- ColumnLazy limit 10 (hot queries)
SELECT * from hits ORDER BY "EventTime" LIMIT 10;
10 rows in set. Elapsed: 1.018 sec. Processed 78.00 million rows, 935.97 MB (76.64 million rows/s., 919.71 MB/s.)
Peak memory usage: 20.39 MiB.
-- ColumnLazy limit 100 (hot queries)
SELECT * from hits ORDER BY "EventTime" LIMIT 100;
100 rows in set. Elapsed: 4.516 sec. Processed 78.00 million rows, 935.97 MB (17.27 million rows/s., 207.25 MB/s.)
Peak memory usage: 28.26 MiB.

-- rewrite sql limit 10 (hot queries)
select * from hits where (_part,_part_offset) in  (SELECT _part,_part_offset from hits ORDER BY "EventTime" LIMIT 10);
10 rows in set. Elapsed: 0.125 sec. Processed 78.07 million rows, 980.26 MB (626.38 million rows/s., 7.86 GB/s.)
Peak memory usage: 47.20 MiB.
-- rewrite sql limit 100 (hot queries)
select * from hits where (_part,_part_offset) in  (SELECT _part,_part_offset from hits ORDER BY "EventTime" LIMIT 100);
100 rows in set. Elapsed: 0.181 sec. Processed 78.71 million rows, 1.37 GB (434.09 million rows/s., 7.53 GB/s.)
Peak memory usage: 298.96 MiB.

Expected performance

At least comparable to the performance of directly rewriting the AST.

Additional context

I reviewed the source code related to MergeTreeLazilyReader and found that it builds the following structure from _part_index/_part_offset, sorts the offsets within each part, reads the rows, and finally restores the original result order via the recorded permutation.

struct RowOffsetWithIdx
{
    size_t row_offset;
    size_t row_idx;

    RowOffsetWithIdx(const size_t row_offset_, const size_t row_idx_)
        : row_offset(row_offset_), row_idx(row_idx_) {}
};

using RowOffsetsWithIdx = std::vector<RowOffsetWithIdx>;
using PartIndexToRowOffsets = std::map<size_t, RowOffsetsWithIdx>;

However, in the following code the value passed as max_rows_to_read is always 1, so read_rows is also always 1. To verify this hypothesis I added a counter, which confirmed that reader->readRows is called once per result row: 10 times for LIMIT 10 and 100 times for LIMIT 100.

auto read_rows = reader->readRows(
    mark_range_iter->begin, mark_range_iter->end, continue_reading,
    /* max_rows_to_read = */ 1, skipped_rows, columns_to_read);

Conclusion

The current implementation of ColumnLazy reading has two defects:

  1. Sequential reads are not batched: contiguous rows are read one at a time instead of in a single readRows call.
  2. Reads are not parallelized, even though the per-part work is independent.
