Support of dynamic subcolumns (JSON data type) #23932
alesapin merged 101 commits into ClickHouse:master
This PR depends on #22535 and contains changes from it. If you want to see the changes that relate only to the implementation of dynamic subcolumns without ...
/// Virtual columns must be appended after ordinary, because user can
/// override them.
what do you mean by overriding here?
A user can define physical columns in the table definition with the same names as virtual columns.
I wanted to give the following example, but it doesn't actually work :)
create table kek (_part UInt32) ENGINE = MergeTree ORDER BY tuple();
insert into kek values (1);
select _part from kek;
Received exception from server (version 22.3.1):
Code: 352. DB::Exception: Received from localhost:9000. DB::Exception: Block structure mismatch in (columns with identical name must have identical structure) stream: different types:
_part UInt32 UInt32(size = 0)
_part String String(size = 0). (AMBIGUOUS_COLUMN_NAME)
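To illustrate the ordering argument from the code comment, here is a minimal sketch of a name lookup in which ordinary columns are listed first and virtual columns are appended after them. `ColumnDesc`, `buildColumnList`, and `findColumn` are hypothetical names, and "first match wins" is an assumption made for illustration, not a description of how ClickHouse resolves names (the exception above shows the real case fails in 22.3.1).

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified column description; not a ClickHouse type.
struct ColumnDesc
{
    std::string name;
    std::string type;
    bool is_virtual;
};

// Builds the list in the order the code comment asks for:
// ordinary columns first, virtual columns appended after them.
std::vector<ColumnDesc> buildColumnList(
    const std::vector<ColumnDesc> & ordinary,
    const std::vector<ColumnDesc> & virtuals)
{
    std::vector<ColumnDesc> all = ordinary;
    all.insert(all.end(), virtuals.begin(), virtuals.end());
    return all;
}

// Assumed lookup rule: the first match wins, so an ordinary column
// shadows a virtual column with the same name.
std::optional<ColumnDesc> findColumn(const std::vector<ColumnDesc> & all, const std::string & name)
{
    for (const auto & column : all)
        if (column.name == name)
            return column;
    return std::nullopt;
}
```

Under this assumption, a table that declares its own `_part UInt32` would resolve `_part` to the physical column rather than to the virtual `String` one.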
"Not enough name parts for path {}. Expected at least {}, got {}",
paths[i].getPath(), pos + 1, num_parts);

size_t array_dimensions = kind == Node::NESTED ? 1 : parts[pos].anonymous_array_level;
And this "anonymous_array_level" is also very, very confusing :) Why "anonymous"? I read the comment for it in PathInData, but I still have no clue why it is anonymous.
"anonymous" means that it doesn't have a key related to it. Maybe "unnamed" is better than "anonymous". Just "array_level" is not correct, because this field doesn't represent the number of dimensions of the array as a whole.
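For readers following the thread, a minimal sketch of the expression under review may help. `NodeKind`, `PathPart`, and `arrayDimensions` are hypothetical stand-ins, and the meaning given to `anonymous_array_level` (array levels that have no key of their own) is taken only from the reply above, not from the actual `PathInData` definition.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the node kinds used in the reviewed code.
enum class NodeKind { Scalar, Tuple, Nested };

// Hypothetical stand-in for one part of a path.
struct PathPart
{
    std::string key;
    // Assumed meaning: number of array levels attached to this part
    // that have no key of their own.
    size_t anonymous_array_level;
};

// Mirrors the reviewed expression: a nested node always contributes exactly
// one dimension, any other node contributes its unnamed array levels.
size_t arrayDimensions(NodeKind kind, const PathPart & part)
{
    return kind == NodeKind::Nested ? 1 : part.anonymous_array_level;
}
```

The sketch shows why the field alone is not the array's total dimension count: for a nested node the result is 1 regardless of the stored level.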
Going to fix tidy in master, timeout -- OK.
Before this patch, SELECT queries held parts even if they were not required by the query (they had been eliminated by partition pruning). This delays the removal of parts when there are long-running queries. The behavior was introduced in ClickHouse#23932, with the introduction of StorageSnapshotPtr. Signed-off-by: Azat Khuzhin <[email protected]>
/// Snapshot of storage that fixes set columns that can be read in query.
/// There are 3 sources of columns: regular columns from metadata,
/// dynamic columns from object Types, virtual columns.
struct StorageSnapshot
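The comment above names three column sources and says the snapshot fixes the set of readable columns. A minimal sketch of that idea follows, assuming a snapshot is simply an immutable copy taken once per query. `ToyStorage`, `ToySnapshot`, and `makeSnapshot` are hypothetical and much simpler than the real `StorageSnapshot`.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

using Columns = std::map<std::string, std::string>; // column name -> type name

// Hypothetical storage whose dynamic columns may change between queries.
struct ToyStorage
{
    Columns regular;   // regular columns from table metadata
    Columns dynamic;   // subcolumns discovered in Object-typed data
    Columns virtuals;  // virtual columns such as _part
};

// Hypothetical snapshot: holds a copy of all three sources, so a query keeps
// seeing the same columns even if the storage changes afterwards.
struct ToySnapshot
{
    Columns regular;
    Columns dynamic;
    Columns virtuals;

    bool has(const std::string & name) const
    {
        return regular.count(name) > 0 || dynamic.count(name) > 0 || virtuals.count(name) > 0;
    }
};

// Copies the current state once; the result is read-only for its holders.
std::shared_ptr<const ToySnapshot> makeSnapshot(const ToyStorage & storage)
{
    return std::make_shared<const ToySnapshot>(
        ToySnapshot{storage.regular, storage.dynamic, storage.virtuals});
}
```

In this sketch, a dynamic subcolumn that appears after the snapshot was taken is invisible to it and shows up only in a snapshot created later.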
Hi @CurtizJ , could you kindly explain the difference between StorageInMemoryMetadata vs StorageSnapshot? e.g., when to use StorageInMemoryMetadata::getColumns and when to use StorageSnapshot::getColumns? Thank you!
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
New data type Object(<schema_format>), which supports storing semi-structured data (for now JSON only). Data is written to such types as strings. All paths are then extracted according to the format of the semi-structured data and written as separate columns in the most optimal types that can store all their values. Those columns can be queried by names that match paths in the source data, e.g. data.key1.key2, or with the cast operator: data.key1.key2::Int64.
Detailed description / Documentation draft:
Resolves #23516.
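The changelog entry describes two steps: extracting paths into separate columns and choosing a type that can store all values of a column. A minimal sketch of both follows, assuming integer-only leaves and a hand-built tree instead of a real JSON parser. `Node`, `leaf`, `object`, `extractPaths`, and `narrowestIntType` are hypothetical, and the type rule shown (narrowest signed integer) is a simplification, not the rule ClickHouse applies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parsed document: a node with no keys is treated as a leaf.
struct Node
{
    std::vector<std::string> keys;  // child names, empty for a leaf
    std::vector<Node> children;     // parallel to keys
    int64_t value = 0;              // used only when the node is a leaf
};

Node leaf(int64_t v)
{
    Node n;
    n.value = v;
    return n;
}

Node object(std::vector<std::string> keys, std::vector<Node> children)
{
    Node n;
    n.keys = std::move(keys);
    n.children = std::move(children);
    return n;
}

using ColumnValues = std::map<std::string, std::vector<int64_t>>;

// Appends every leaf of `node` to `columns` under its dotted path.
void extractPaths(const Node & node, const std::string & prefix, ColumnValues & columns)
{
    if (node.keys.empty())
    {
        columns[prefix].push_back(node.value);
        return;
    }
    for (size_t i = 0; i < node.keys.size(); ++i)
    {
        const std::string path = prefix.empty() ? node.keys[i] : prefix + "." + node.keys[i];
        extractPaths(node.children[i], path, columns);
    }
}

// Narrowest signed integer type name that can hold every value of a column.
std::string narrowestIntType(const std::vector<int64_t> & values)
{
    int64_t lo = 0;
    int64_t hi = 0;
    for (int64_t v : values)
    {
        lo = std::min(lo, v);
        hi = std::max(hi, v);
    }
    if (lo >= std::numeric_limits<int8_t>::min() && hi <= std::numeric_limits<int8_t>::max())
        return "Int8";
    if (lo >= std::numeric_limits<int16_t>::min() && hi <= std::numeric_limits<int16_t>::max())
        return "Int16";
    if (lo >= std::numeric_limits<int32_t>::min() && hi <= std::numeric_limits<int32_t>::max())
        return "Int32";
    return "Int64";
}
```

Calling `extractPaths` with the prefix `data` on each inserted row produces keys such as `data.key1.key2`, matching the query syntax in the changelog entry; `narrowestIntType` can then be run once per collected column.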