Improve performance of subcolumns reading from compact parts #76141
Closed
Description
Right now when we request a subcolumn from a compact part we read the whole column and then extract the requested subcolumn in memory:
ClickHouse/src/Storages/MergeTree/MergeTreeReaderCompact.cpp
Lines 188 to 212 in 8ae0991
    if (name_and_type.isSubcolumn())
    {
        const auto & type_in_storage = name_and_type.getTypeInStorage();
        const auto & name_in_storage = name_and_type.getNameInStorage();
        const auto & serialization = serializations_of_full_columns.at(name_in_storage);
        ColumnPtr temp_full_column = getFullColumnFromCache(columns_cache_for_subcolumns, name_in_storage);
        if (!temp_full_column)
        {
            temp_full_column = type_in_storage->createColumn(*serialization);
            serialization->deserializeBinaryBulkWithMultipleStreams(temp_full_column, rows_to_read, deserialize_settings, deserialize_binary_bulk_state_map_for_subcolumns[name_in_storage], nullptr);
            if (columns_cache_for_subcolumns)
                columns_cache_for_subcolumns->emplace(name_in_storage, temp_full_column);
        }
        auto subcolumn = type_in_storage->getSubcolumn(name_and_type.getSubcolumnName(), temp_full_column);
        /// TODO: Avoid extra copying.
        if (column->empty())
            column = IColumn::mutate(subcolumn);
        else
            column->assumeMutable()->insertRangeFrom(*subcolumn, 0, subcolumn->size());
    }
When the column is large (for example, a large JSON column), deserializing all of it just to extract one subcolumn takes a long time.
To read individual subcolumns from compact parts, we need to modify the format slightly and store the offset of each substream (right now we store only the offset of each column), so that individual substreams can be read separately.
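To illustrate the idea, here is a minimal standalone sketch of what per-substream offsets would allow. It is a toy model, not the actual compact part format or ClickHouse code: `ToyGranule`, `writeGranule`, `readSubstream`, and the substream names are all hypothetical, and real marks would also have to handle compression blocks and granule boundaries.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical toy model of one granule in a compact part: the substreams
// of all columns are concatenated into a single data buffer.
struct ToyGranule
{
    std::string data;
    // The proposed extra index: substream name -> (offset, size) in `data`.
    // With only per-column offsets, a reader would have to deserialize every
    // substream of a column to reach the one it needs.
    std::map<std::string, std::pair<std::size_t, std::size_t>> substream_offsets;
};

// Append substreams in order, recording where each one starts.
ToyGranule writeGranule(const std::vector<std::pair<std::string, std::string>> & substreams)
{
    ToyGranule granule;
    for (const auto & [name, bytes] : substreams)
    {
        granule.substream_offsets[name] = {granule.data.size(), bytes.size()};
        granule.data += bytes;
    }
    return granule;
}

// Read a single substream by jumping straight to its recorded offset,
// without touching the bytes of any other substream.
// The returned view is valid only while `granule` is alive.
std::string_view readSubstream(const ToyGranule & granule, const std::string & name)
{
    auto it = granule.substream_offsets.find(name);
    if (it == granule.substream_offsets.end())
        throw std::out_of_range("unknown substream: " + name);
    const auto [offset, size] = it->second;
    return std::string_view(granule.data).substr(offset, size);
}
```

With such an index, requesting a subcolumn like `json.b` would only touch the bytes of its own substreams, at the cost of a larger marks file (one offset per substream instead of one per column).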
@CurtizJ WDYT?