Improve performance of subcolumns reading from compact parts

Right now, when we request a subcolumn from a compact part, we read the whole column and then extract the requested subcolumn in memory: https://github.com/ClickHouse/ClickHouse/blob/8ae0991e7a5a77bfd2ee383d5326189b36177e4e/src/Storages/MergeTree/MergeTreeReaderCompact.cpp#L188-L212.

When the column is large (for example, a large JSON column), this takes quite a lot of time, because the full column is deserialized even if only one small subcolumn is needed.

To read individual subcolumns from compact parts, we need to modify the format slightly and store the offset of each substream (right now we store only the offset of each column), so that individual substreams can be read separately.
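
To make the idea concrete, here is a minimal sketch of per-substream marks. It is a toy model, not the actual compact part format: `ToyCompactPart`, `GranuleMark`, `SubstreamRange`, `writeGranule` and `readSubstream` are hypothetical names invented for illustration, and the "data file" is just an in-memory string. It assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model: NOT the real ClickHouse on-disk layout.
// Byte range of one serialized substream inside the shared data file.
struct SubstreamRange
{
    size_t offset;
    size_t size;
};

// One mark per granule, keyed by substream name (e.g. "json.a")
// instead of by column name.
using GranuleMark = std::map<std::string, SubstreamRange>;

struct ToyCompactPart
{
    std::string data;               // all substreams of all granules, back to back
    std::vector<GranuleMark> marks; // marks[i] describes granule i
};

// Append one granule; `substreams` holds (name, serialized bytes) in write order.
void writeGranule(ToyCompactPart & part, const std::vector<std::pair<std::string, std::string>> & substreams)
{
    GranuleMark mark;
    for (const auto & [name, bytes] : substreams)
    {
        mark[name] = SubstreamRange{part.data.size(), bytes.size()};
        part.data += bytes;
    }
    part.marks.push_back(std::move(mark));
}

// Read a single substream of a granule without touching its neighbours.
// Returns an empty string if the granule has no such substream.
std::string readSubstream(const ToyCompactPart & part, size_t granule, const std::string & name)
{
    assert(granule < part.marks.size());
    const auto & mark = part.marks[granule];
    auto it = mark.find(name);
    if (it == mark.end())
        return {};
    return part.data.substr(it->second.offset, it->second.size);
}
```

With marks like these, a reader asked for `json.b` could seek straight to that substream's range instead of deserializing every substream of `json` first. The cost is a larger marks file, since it grows with the number of substreams rather than the number of columns.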

@CurtizJ WDYT? 

	if (name_and_type.isSubcolumn())
	{
	    const auto & type_in_storage = name_and_type.getTypeInStorage();
	    const auto & name_in_storage = name_and_type.getNameInStorage();
	    const auto & serialization = serializations_of_full_columns.at(name_in_storage);

	    ColumnPtr temp_full_column = getFullColumnFromCache(columns_cache_for_subcolumns, name_in_storage);

	    if (!temp_full_column)
	    {
	        temp_full_column = type_in_storage->createColumn(*serialization);
	        serialization->deserializeBinaryBulkWithMultipleStreams(temp_full_column, rows_to_read, deserialize_settings, deserialize_binary_bulk_state_map_for_subcolumns[name_in_storage], nullptr);

	        if (columns_cache_for_subcolumns)
	            columns_cache_for_subcolumns->emplace(name_in_storage, temp_full_column);
	    }

	    auto subcolumn = type_in_storage->getSubcolumn(name_and_type.getSubcolumnName(), temp_full_column);

	    /// TODO: Avoid extra copying.
	    if (column->empty())
	        column = IColumn::mutate(subcolumn);
	    else
	        column->assumeMutable()->insertRangeFrom(*subcolumn, 0, subcolumn->size());
	}

Improve performance of subcolumns reading from compact parts #76141
