Skip to content

Efficient read of subcolumns from tables. #14196

@alexey-milovidov

Description

@alexey-milovidov

Use case

Scenario 1:
User has column of type Tuple in table, example:

CREATE TABLE t (x Tuple(Int8, String)) ENGINE = MergeTree ...

If they write

SELECT x.2 FROM t

now both subcolumns of tuple will be read (we will read whole x column).
We want to read only required subcolumn.

Scenario 2 (done, #17310):
Users can create a table with Nested fields.
They already can query subcolumns:

SELECT nested.x, nested.y FROM table

But it is implemented in very special way.
And users cannot query all nested column as an array of tuples:

SELECT nested FROM table

Neither they can do arrayJoin on nested as a whole:

SELECT arrayJoin(nested) FROM table

We want to rectify the code around support for Nested fields.

Scenario 3 (done, #17310):
Allow to have Nested fields with arbitrary nesting:

CREATE TABLE t (x Nested(a Int8, b Nested(u String, v Float32))) ENGINE = MergeTree ...
SELECT x.a, x.b.u, x.b.v FROM t;

Scenario 4:
Allow to write nested.* expressions.

Scenario 5 (done, #24406):
When user is only using array length in query, we don't have to read array contents - we can only read subcolumn with length:

SELECT length(arr) FROM table

Can be rewritten to

SELECT arr.size0 FROM table

And size0 is the name of subcolumn that we already have on disk.

Scenario 6 (done, #17310):
We already support named tuples.
Allow to write x.name for named tuples. Now only the syntax tupleElement(x, 'name') is possible.

Scenario 7:
Support the case when the existence and the types of subcolumns are determined at query time, not at table creation time.
It can be represented by special data type for semistructured data.

Metadata

Metadata

Assignees

Labels

featurest-discussionWhen the implementation aspects are not clear or when the PR is on hold due to questions.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions