Support of dynamic subcolumns (JSON data type) by CurtizJ · Pull Request #23932 · ClickHouse/ClickHouse

CurtizJ · 2021-05-07T02:44:01Z

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
New data type Object(<schema_format>), which supports storing semi-structured data (for now JSON only). Data is written to such types as a string. All paths are then extracted according to the format of the semi-structured data and written as separate columns in the most optimal types that can store all their values. Those columns can be queried by names that match paths in the source data, e.g. data.key1.key2, or with the cast operator: data.key1.key2::Int64.

Detailed description / Documentation draft:
Resolves #23516.
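
To illustrate the changelog entry above, here is a minimal sketch of the idea, not the code from this PR. It assumes the paths have already been extracted into flat "path -> value" rows, considers only signed integers, and fills rows that lack a path with 0. The names Row, collectColumns and narrowestIntType are hypothetical and exist only for this example.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <map>
#include <string>
#include <vector>

// One inserted object, already flattened to "path -> value" (assumption:
// extracting paths from the JSON text is done elsewhere).
using Row = std::map<std::string, int64_t>;

// Hypothetical helper: build one column per path seen in any row.
// Rows that lack a path get 0 (an assumed default value).
std::map<std::string, std::vector<int64_t>> collectColumns(const std::vector<Row> & rows)
{
    std::map<std::string, std::vector<int64_t>> columns;
    for (const auto & row : rows)
        for (const auto & [path, value] : row)
            columns[path]; // register the path with an empty column

    for (const auto & row : rows)
    {
        for (auto & [path, column] : columns)
        {
            auto it = row.find(path);
            column.push_back(it == row.end() ? 0 : it->second);
        }
    }
    return columns;
}

template <typename T>
bool fitsIn(int64_t lo, int64_t hi)
{
    return lo >= std::numeric_limits<T>::min() && hi <= std::numeric_limits<T>::max();
}

// Hypothetical helper: name of the narrowest signed integer type
// that can store every value of the column.
std::string narrowestIntType(const std::vector<int64_t> & values)
{
    int64_t lo = 0;
    int64_t hi = 0;
    for (int64_t v : values)
    {
        lo = std::min(lo, v);
        hi = std::max(hi, v);
    }
    if (fitsIn<int8_t>(lo, hi))
        return "Int8";
    if (fitsIn<int16_t>(lo, hi))
        return "Int16";
    if (fitsIn<int32_t>(lo, hi))
        return "Int32";
    return "Int64";
}
```

With two rows where data.key1.key2 is 1 and then 300, the sketch would store that path as Int16, since that is the narrowest signed type holding both values.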

CurtizJ · 2021-05-07T02:47:46Z

This PR depends on #22535 and contains changes from it. If you want to see only the changes related to the implementation of dynamic subcolumns, without ColumnSparse, you can use the link CurtizJ/ClickHouse@sparse-serialization...CurtizJ:dynamic-columns.

src/Columns/ColumnObject.cpp

src/Storages/StorageSnapshot.cpp

kssenii · 2022-03-07T17:26:56Z

src/Storages/StorageSnapshot.cpp

+            /// Virtual columns must be appended after ordinary, because user can
+            /// override them.


what do you mean by overriding here?

A user can define physical columns in the table definition with the same names as virtual columns.

I wanted to give the following example, but it doesn't actually work :)

create table kek (_part UInt32) ENGINE = MergeTree ORDER BY tuple();
insert into kek values (1);
select _part from kek;

Received exception from server (version 22.3.1):
Code: 352. DB::Exception: Received from localhost:9000. DB::Exception: Block structure mismatch in (columns with identical name must have identical structure) stream: different types:
_part UInt32 UInt32(size = 0)
_part String String(size = 0). (AMBIGUOUS_COLUMN_NAME)
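
The intent of the code comment (ordinary columns first, virtual columns appended afterwards so that a user-defined column with the same name wins) can be sketched as follows. This is only an illustration of the ordering rule, not the StorageSnapshot code, and it does not reproduce the exception shown above. The helper buildColumnLookup and the "name, type" pairs are hypothetical.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Column described as (name, type name); both are plain strings in this sketch.
using NameAndType = std::pair<std::string, std::string>;

// Hypothetical helper: build a name -> type lookup in which ordinary
// (user-defined) columns take precedence over virtual ones.
std::unordered_map<std::string, std::string> buildColumnLookup(
    const std::vector<NameAndType> & ordinary,
    const std::vector<NameAndType> & virtuals)
{
    std::unordered_map<std::string, std::string> lookup;

    // Ordinary columns go first.
    for (const auto & [name, type] : ordinary)
        lookup.emplace(name, type);

    // emplace() does not overwrite an existing key, so a virtual column
    // is added only if the user has not defined a column with that name.
    for (const auto & [name, type] : virtuals)
        lookup.emplace(name, type);

    return lookup;
}
```

Under this rule, a table that defines its own _part UInt32 column would resolve _part to UInt32 rather than to the virtual String column.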

src/Storages/StorageMemory.h

src/Storages/StorageDistributed.cpp

src/Interpreters/getColumnFromBlock.cpp

src/Columns/ColumnObject.cpp

src/DataTypes/Serializations/PathInData.cpp

src/DataTypes/ObjectUtils.cpp

kssenii · 2022-03-10T19:01:37Z

src/DataTypes/ObjectUtils.cpp

+                        "Not enough name parts for path {}. Expected at least {}, got {}",
+                            paths[i].getPath(), pos + 1, num_parts);
+
+                size_t array_dimensions = kind == Node::NESTED ? 1 : parts[pos].anonymous_array_level;


and this "anonymous_array_level" is also very, very confusing :) why "anonymous"? I read the comment for it in PathInData, but I still have no clue why it is anonymous

"anonymous" means that it doesn't have a key related to it. Maybe "unnamed" is better than "anonymous". Just "array_level" is not correct, because this field doesn't represent the number of dimensions of the array as a whole.
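
The ternary in the diff above can be read in isolation with a small sketch. It assumes, per the reply, that anonymous_array_level counts only array levels that have no key of their own. NodeKind, PathPart and arrayDimensions are hypothetical stand-ins for this example and are not the types used in ObjectUtils.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the node kinds referenced in the diff.
enum class NodeKind
{
    Scalar,
    Tuple,
    Nested
};

// Hypothetical stand-in for one part of a path.
struct PathPart
{
    std::string key;
    // Assumed meaning: number of array levels attached to this part
    // that have no key of their own.
    uint8_t anonymous_array_level = 0;
};

// Mirrors the expression from the diff: a Nested node always contributes
// exactly one array dimension, any other node contributes as many
// dimensions as it has key-less array levels.
size_t arrayDimensions(NodeKind kind, const PathPart & part)
{
    return kind == NodeKind::Nested ? 1 : part.anonymous_array_level;
}
```

This also shows why plain "array_level" would be a misleading name: for a Nested node the result is 1 regardless of the stored value.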

Co-authored-by: Kseniia Sumarokova <[email protected]>

alesapin · 2022-03-17T09:45:39Z

Going to fix tidy in master, timeout -- OK.

Before this patch, SELECT queries held parts even if they were not required by the query (they had been eliminated by partition pruning). This deferred the removal of parts when there were long-running queries. This behavior was introduced in ClickHouse#23932, with the introduction of StorageSnapshotPtr. Signed-off-by: Azat Khuzhin <[email protected]>

nudles · 2023-08-13T15:06:58Z

src/Storages/StorageSnapshot.h

+/// Snapshot of storage that fixes set columns that can be read in query.
+/// There are 3 sources of columns: regular columns from metadata,
+/// dynamic columns from object Types, virtual columns.
+struct StorageSnapshot


Hi @CurtizJ , could you kindly explain the difference between StorageInMemoryMetadata vs StorageSnapshot? e.g., when to use StorageInMemoryMetadata::getColumns and when to use StorageSnapshot::getColumns? Thank you!

CurtizJ added 11 commits April 23, 2021 15:53

dynamic subcolumns: wip

aa8d31d

dynamic subcolumns: wip

e237840

dynamic subcolumns: wip

6240169

dynamic subcolumns: wip

644df6b

dynamic subcolumns: support arrays

28a9a5c

dynamic subcolumns: support merges

3bc2a08

dynamic columns: better getting of sample block

e447069

dynamic columns: support of different types

f22f22c

dynamic columns: support of different types

0dea7d2

dynamic columns: better input formats of json type

5150c3b

disable sparse columns

1823ea6

robot-clickhouse added the doc-alert and pr-feature (Pull request with new product feature) labels May 7, 2021

CurtizJ changed the title ~~Dynamic columns~~ Support of dynamic subcolumns May 7, 2021

CurtizJ changed the title ~~Support of dynamic subcolumns~~ Support of dynamic subcolumns (JSON data type) May 7, 2021

fix build

012009c

alexey-milovidov mentioned this pull request May 7, 2021

Roadmap 2021 (discussion) #17623

Closed

CurtizJ added 11 commits May 11, 2021 15:01

fix reading of nested

37989cd

remove unused code

f7582bf

Merge remote-tracking branch 'origin/sparse-serialization' into HEAD

0bdf9d2

dynamic subcolumns: better handling of missed values

13ae569

dynamic subcolumns: add test

b0cc45c

fix build

d83819f

dynamic subcolumns: better handling of nulls and empty arrays

7e5f784

fix conversion of tuples

041e2de

dynamic subcolumns: better handling of incompatible types

24707e6

Merge remote-tracking branch 'origin/sparse-serialization' into HEAD

9f52362

Merge remote-tracking branch 'origin/sparse-serialization' into HEAD

205a232

CurtizJ added the force tests label Jun 8, 2021

hexiaoting mentioned this pull request Feb 23, 2022

Add new DataType Map(key,value) #15806

Merged

CurtizJ added 5 commits February 25, 2022 13:41

Merge remote-tracking branch 'upstream/master' into HEAD

fcdebea

fix deducing of nested types

7df8b38

add more comments

2758db5

minor fixes

04a3a10

Merge remote-tracking branch 'upstream/master' into HEAD

c1fdcf7

CurtizJ marked this pull request as ready for review March 1, 2022 17:22

CurtizJ added 4 commits March 2, 2022 03:31

fix reading of missed subcolumns

d7cd9aa

fix msan

76e40e4

Merge remote-tracking branch 'upstream/master' into HEAD

df3b07f

Merge remote-tracking branch 'upstream/master' into HEAD

0bc57da

kssenii reviewed Mar 10, 2022


CurtizJ and others added 4 commits March 10, 2022 22:24

Apply suggestions from code review

37efe2d

Co-authored-by: Kseniia Sumarokova <[email protected]>

Merge remote-tracking branch 'upstream/master' into HEAD

36ec379

minor fixes

0639177

Merge remote-tracking branch 'upstream/master' into HEAD

0ba78c3

kssenii approved these changes Mar 16, 2022


CurtizJ added 3 commits March 16, 2022 16:51

add experimental settings for Object type

2ced42e

fix race

de2cc23

fix clang-tidy

416c7f2

alesapin merged commit 457fa0d into ClickHouse:master Mar 17, 2022

Astlol mentioned this pull request Mar 28, 2022

New type Object('JSON') is not supported ClickHouse/clickhouse-go#531

Closed

fuziontech mentioned this pull request May 9, 2022

Support clickhouse 22.3 LTS (or later) PostHog/posthog#9685

Closed

cwurm mentioned this pull request May 11, 2022

Add documentation for JSON data type #37127

Merged

azat mentioned this pull request Jun 7, 2022

Fix refcnt for unused MergeTree parts in SELECT queries #37913

Merged

nudles reviewed Aug 13, 2023

alesapin merged 101 commits into ClickHouse:master from CurtizJ:dynamic-columns


8 participants
