Significantly improve performance of JSON subcolumns reading from shared data in MergeTree by Avogar · Pull Request #83777 · ClickHouse/ClickHouse

Avogar · 2025-07-15T11:32:53Z

Changelog category (leave one):

Performance Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Significantly improve performance of JSON subcolumns reading from shared data in MergeTree by implementing new serializations for JSON shared data in MergeTree.

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

Now there are 3 possible JSON shared data serializations in MergeTree:

map - serializes shared data as a single column with type Map(String, String) (old serialization). This serialization is efficient for writing data and reading the whole JSON column, but it's not efficient for reading paths sub-columns.
map_with_buckets - serializes shared data as N columns ("buckets") with type Map(String, String) where each bucket contains only subset of paths. This serialization is less efficient for writing data and reading the whole JSON column, but it's more efficient for reading paths sub-columns because it reads data only from required buckets.
advanced - serializes shared data in a special data structure that maximizes the performance
of paths sub-columns reading by storing some additional information that allows to read only the data of requested paths. This serialization also supports buckets, so each bucket contains only subset of paths. This serialization is quite inefficient for writing data (so it's not recommended to use this serialization for zero-level parts), reading the whole JSON column is slightly less efficient compared to map serialization, but it's very efficient for reading paths sub-columns.

Type of serialization is controlled by MergeTree
settings object_shared_data_serialization_version
and object_shared_data_serialization_version_for_zero_level_parts (zero level part is the part created during inserting data into the table, during merges parts have higher level). Default serialization will be map for next few releases, but after that it will be changed to advanced (for zero-level parts it will be map or map_with_buckets). Note: changing shared data serialization version is supported only for object serialization version v3 (setting object_serialization_version) which is currently v2 by default (but will be changed after a few releases to v3).

Number of buckets is controlled by MergeTree settings object_shared_data_buckets_for_compact_part (8 by default)
and object_shared_data_buckets_for_wide_part (32 by default).

See the performance comparison of JSON paths subcolumns below in the comments

…eading

clickhouse-gh · 2025-07-15T11:33:20Z

Workflow [PR], commit [e36c1ea]

Summary: ❌

job_name	test_name	status
Stateless tests (amd_asan, distributed plan, sequential)		failure
	01150_ddl_guard_rwr	FAIL
Install packages (amd_debug)		failure
	Install clickhouse binary in deb	failure
Stress test (arm_asan)		failure
	Server died	FAIL
	Hung check failed, possible deadlock found (see hung_check.log)	FAIL
	Killed by signal (in clickhouse-server.log)	FAIL
	Fatal message in clickhouse-server.log (see fatal_messages.txt)	FAIL
	Killed by signal (output files)	FAIL
Stress test (amd_msan)		failure
	Server died	FAIL
	Hung check failed, possible deadlock found (see hung_check.log)	FAIL
	Killed by signal (in clickhouse-server.log)	FAIL
	Fatal message in clickhouse-server.log (see fatal_messages.txt)	FAIL
	Killed by signal (output files)	FAIL
	Found signal in gdb.log	FAIL

Avogar · 2025-07-15T11:39:34Z

Performance comparison of reading JSON paths sub-columns on local filesystem

In this comparison we have simple JSONs {"key0" : 0, "key1" : 1, "key2" : 2, ..., "keyN" : N} stored using:

Map
String
JSON with all paths separated into sub-columns
JSON with all paths stored in shared data with map serialization
JSON with all paths stored in shared data with map_with_buckets serialization
JSON with all paths stored in shared data with advanced serialization

For map_with_buckets and advanced shared data serializations 8 buckets was used for Compact parts and 32 for Wide.

JSON with 10 paths, 5,000,000 rows

Results

Compact part:

Q1: Reading single key

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────────────────────────────┐
│ JSON no shared data                 │     0.04 │ 73.41 MiB    │ select json.key0                    │
│ JSON shared data "advanced"         │     0.05 │ 78.84 MiB    │ select json.key0                    │
│ JSON shared data "map_with_buckets" │    0.084 │ 77.76 MiB    │ select json.key0                    │
│ JSON shared data "map"              │    0.095 │ 128.95 MiB   │ select json.key0                    │
│ Map                                 │    0.097 │ 637.94 MiB   │ select json['key0']                 │
│ String                              │    0.138 │ 463.17 MiB   │ select JSONExtractInt(json, 'key0') │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────────────────────────────┘

Q2: Reading non existing key

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────────────────────────────────────────┐
│ JSON no shared data                 │    0.043 │ 74.18 MiB    │ select json.non_existing_key                    │
│ JSON shared data "map_with_buckets" │    0.043 │ 82.75 MiB    │ select json.non_existing_key                    │
│ JSON shared data "advanced"         │    0.049 │ 78.82 MiB    │ select json.non_existing_key                    │
│ JSON shared data "map"              │    0.065 │ 101.99 MiB   │ select json.non_existing_key                    │
│ Map                                 │    0.087 │ 599.00 MiB   │ select json['non_existing_key']                 │
│ String                              │     0.13 │ 428.92 MiB   │ select JSONExtractInt(json, 'non_existing_key') │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────────────────────────────────────────┘

Q3: Reading 5 different keys

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────────────────────────────────────────────────────────────────────┐
│ JSON no shared data                 │     0.05 │ 154.72 MiB   │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ JSON shared data "advanced"         │    0.069 │ 222.61 MiB   │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ Map                                 │    0.164 │ 698.95 MiB   │ select json['key0'], json['key1'], json['key2'], json['key3'], json['key4'] │
│ JSON shared data "map_with_buckets" │    0.247 │ 254.73 MiB   │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ JSON shared data "map"              │    0.253 │ 237.10 MiB   │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ String                              │    0.472 │ 463.18 MiB   │ select JSONExtractInt(json, 'key0'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key1'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key2'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key3'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key4')                                         │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────────────────────────────────────────────────────────────────────┘

Q4: Reading the whole JSON

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────┐
│ String                              │    0.059 │ 384.38 MiB   │ select json │
│ JSON no shared data                 │    0.067 │ 363.50 MiB   │ select json │
│ JSON shared data "map"              │    0.085 │ 583.25 MiB   │ select json │
│ Map                                 │    0.086 │ 570.58 MiB   │ select json │
│ JSON shared data "advanced"         │    0.127 │ 632.79 MiB   │ select json │
│ JSON shared data "map_with_buckets" │    0.314 │ 652.40 MiB   │ select json │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────┘

Wide part:

Q1: Reading single key

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────────────────────────────┐
│ JSON no shared data                 │    0.007 │ 61.07 MiB    │ select json.key0                    │
│ JSON shared data "advanced"         │     0.01 │ 67.58 MiB    │ select json.key0                    │
│ JSON shared data "map_with_buckets" │    0.048 │ 102.98 MiB   │ select json.key0                    │
│ Map                                 │    0.075 │ 100.66 MiB   │ select json['key0']                 │
│ String                              │    0.121 │ 119.15 MiB   │ select JSONExtractInt(json, 'key0') │
│ JSON shared data "map"              │    0.124 │ 775.28 MiB   │ select json.key0                    │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────────────────────────────┘

Q2: Reading non existing key

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────────────────────────────────────────┐
│ JSON shared data "map_with_buckets" │    0.004 │ 31.95 MiB    │ select json.non_existing_key                    │
│ JSON shared data "advanced"         │    0.004 │ 29.49 MiB    │ select json.non_existing_key                    │
│ JSON no shared data                 │    0.005 │ 32.51 MiB    │ select json.non_existing_key                    │
│ Map                                 │    0.062 │ 97.11 MiB    │ select json['non_existing_key']                 │
│ JSON shared data "map"              │    0.091 │ 762.46 MiB   │ select json.non_existing_key                    │
│ String                              │    0.116 │ 119.15 MiB   │ select JSONExtractInt(json, 'non_existing_key') │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────────────────────────────────────────┘

Q3: Reading 5 different keys

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────────────────────────────────────────────────────────────────────┐
│ JSON no shared data                 │    0.034 │ 119.84 MiB   │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ JSON shared data "advanced"         │    0.052 │ 289.89 MiB   │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ Map                                 │    0.116 │ 109.39 MiB   │ select json['key0'], json['key1'], json['key2'], json['key3'], json['key4'] │
│ JSON shared data "map_with_buckets" │    0.236 │ 536.31 MiB   │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ JSON shared data "map"              │    0.257 │ 794.16 MiB   │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ String                              │    0.503 │ 118.41 MiB   │ select JSONExtractInt(json, 'key0'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key1'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key2'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key3'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key4')                                         │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────────────────────────────────────────────────────────────────────┘

Q4: Reading the whole JSON

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────┐
│ String                              │     0.04 │ 92.00 MiB    │ select json │
│ Map                                 │    0.061 │ 95.35 MiB    │ select json │
│ JSON shared data "map"              │    0.067 │ 99.96 MiB    │ select json │
│ JSON no shared data                 │    0.069 │ 100.20 MiB   │ select json │
│ JSON shared data "advanced"         │    0.083 │ 108.27 MiB   │ select json │
│ JSON shared data "map_with_buckets" │    0.441 │ 480.54 MiB   │ select json │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────┘

JSON with 100 paths, 2,000,000 rows

Results

Compact part:

Q1: Reading single key

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────────────────────────────┐
│ JSON no shared data                 │     0.02 │ 65.28 MiB    │ select json.key0                    │
│ JSON shared data "advanced"         │    0.028 │ 69.52 MiB    │ select json.key0                    │
│ JSON shared data "map_with_buckets" │    0.066 │ 102.94 MiB   │ select json.key0                    │
│ JSON shared data "map"              │    0.304 │ 417.05 MiB   │ select json.key0                    │
│ String                              │    0.488 │ 3.46 GiB     │ select JSONExtractInt(json, 'key0') │
│ Map                                 │    0.509 │ 4.89 GiB     │ select json['key0']                 │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────────────────────────────┘

Q2: Reading non existing key

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────────────────────────────────────────┐
│ JSON shared data "advanced"         │    0.027 │ 31.72 MiB    │ select json.non_existing_key                    │
│ JSON no shared data                 │    0.032 │ 31.81 MiB    │ select json.non_existing_key                    │
│ JSON shared data "map_with_buckets" │    0.033 │ 31.49 MiB    │ select json.non_existing_key                    │
│ JSON shared data "map"              │    0.252 │ 413.62 MiB   │ select json.non_existing_key                    │
│ Map                                 │    0.488 │ 4.60 GiB     │ select json['non_existing_key']                 │
│ String                              │      0.5 │ 3.61 GiB     │ select JSONExtractInt(json, 'non_existing_key') │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────────────────────────────────────────┘

Q3: Reading 5 different keys

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────────────────────────────────────────────────────────────────────┐
│ JSON no shared data                 │    0.039 │ 185.00 MiB   │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ JSON shared data "advanced"         │    0.068 │ 195.54 MiB   │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ JSON shared data "map_with_buckets" │     0.25 │ 308.87 MiB   │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ JSON shared data "map"              │     0.44 │ 548.58 MiB   │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ Map                                 │    0.488 │ 5.31 GiB     │ select json['key0'], json['key1'], json['key2'], json['key3'], json['key4'] │
│ String                              │    1.337 │ 3.62 GiB     │ select JSONExtractInt(json, 'key0'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key1'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key2'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key3'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key4')                                         │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────────────────────────────────────────────────────────────────────┘

Q4: Reading the whole JSON

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────┐
│ String                              │    0.276 │ 2.37 GiB     │ select json │
│ JSON no shared data                 │    0.365 │ 3.48 GiB     │ select json │
│ Map                                 │     0.48 │ 4.75 GiB     │ select json │
│ JSON shared data "advanced"         │     0.55 │ 5.51 GiB     │ select json │
│ JSON shared data "map"              │    0.552 │ 5.21 GiB     │ select json │
│ JSON shared data "map_with_buckets" │     1.25 │ 6.19 GiB     │ select json │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────┘

Wide part:

Q1: Reading single key

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────────────────────────────┐
│ JSON no shared data                 │    0.004 │ 25.55 MiB    │ select json.key0                    │
│ JSON shared data "advanced"         │    0.007 │ 31.82 MiB    │ select json.key0                    │
│ JSON shared data "map_with_buckets" │    0.029 │ 434.77 MiB   │ select json.key0                    │
│ Map                                 │     0.29 │ 355.89 MiB   │ select json['key0']                 │
│ JSON shared data "map"              │    0.357 │ 6.68 GiB     │ select json.key0                    │
│ String                              │     0.38 │ 437.79 MiB   │ select JSONExtractInt(json, 'key0') │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────────────────────────────┘

Q2: Reading non existing key

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────────────────────────────────────────┐
│ JSON no shared data                 │    0.003 │ 10.12 MiB    │ select json.non_existing_key                    │
│ JSON shared data "advanced"         │    0.003 │ 17.07 MiB    │ select json.non_existing_key                    │
│ JSON shared data "map_with_buckets" │     0.01 │ 122.53 MiB   │ select json.non_existing_key                    │
│ Map                                 │    0.298 │ 364.51 MiB   │ select json['non_existing_key']                 │
│ JSON shared data "map"              │    0.345 │ 6.65 GiB     │ select json.non_existing_key                    │
│ String                              │    0.388 │ 428.46 MiB   │ select JSONExtractInt(json, 'non_existing_key') │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────────────────────────────────────────┘

Q3: Reading 5 different keys

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────────────────────────────────────────────────────────────────────┐
│ JSON no shared data                 │    0.016 │ 127.10 MiB   │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ JSON shared data "advanced"         │    0.029 │ 295.10 MiB   │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ JSON shared data "map_with_buckets" │    0.156 │ 1.60 GiB     │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ Map                                 │    0.301 │ 361.95 MiB   │ select json['key0'], json['key1'], json['key2'], json['key3'], json['key4'] │
│ JSON shared data "map"              │    0.549 │ 6.66 GiB     │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ String                              │    1.225 │ 424.91 MiB   │ select JSONExtractInt(json, 'key0'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key1'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key2'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key3'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key4')                                         │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────────────────────────────────────────────────────────────────────┘

Q4: Reading the whole JSON

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────┐
│ String                              │    0.192 │ 508.18 MiB   │ select json │
│ Map                                 │    0.286 │ 370.25 MiB   │ select json │
│ JSON shared data "map"              │    0.314 │ 432.68 MiB   │ select json │
│ JSON no shared data                 │     0.32 │ 1.02 GiB     │ select json │
│ JSON shared data "advanced"         │    0.384 │ 397.92 MiB   │ select json │
│ JSON shared data "map_with_buckets" │    1.323 │ 1.07 GiB     │ select json │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────┘

JSON with 1000 paths, 1,000,000 rows

Results

Compact part:

Q1: Reading single key

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────────────────────────────┐
│ JSON shared data "advanced"         │    0.056 │ 16.74 MiB    │ select json.key0                    │
│ JSON shared data "map_with_buckets" │    0.226 │ 126.73 MiB   │ select json.key0                    │
│ JSON no shared data                 │    0.259 │ 16.13 MiB    │ select json.key0                    │
│ JSON shared data "map"              │    1.366 │ 483.22 MiB   │ select json.key0                    │
│ String                              │    2.014 │ 9.70 GiB     │ select JSONExtractInt(json, 'key0') │
│ Map                                 │    2.236 │ 9.47 GiB     │ select json['key0']                 │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────────────────────────────┘

Q2: Reading non existing key

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────────────────────────────────────────┐
│ JSON shared data "advanced"         │    0.074 │ 9.55 MiB     │ select json.non_existing_key                    │
│ JSON shared data "map_with_buckets" │    0.233 │ 8.61 MiB     │ select json.non_existing_key                    │
│ JSON no shared data                 │    0.259 │ 8.74 MiB     │ select json.non_existing_key                    │
│ JSON shared data "map"              │    1.364 │ 491.05 MiB   │ select json.non_existing_key                    │
│ String                              │     2.11 │ 9.70 GiB     │ select JSONExtractInt(json, 'non_existing_key') │
│ Map                                 │    2.371 │ 9.89 GiB     │ select json['non_existing_key']                 │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────────────────────────────────────────┘

Q3: Reading 5 different keys

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────────────────────────────────────────────────────────────────────┐
│ JSON shared data "advanced"         │    0.092 │ 122.48 MiB   │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ JSON no shared data                 │    0.273 │ 115.47 MiB   │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ JSON shared data "map_with_buckets" │    0.727 │ 242.62 MiB   │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ JSON shared data "map"              │    1.548 │ 509.09 MiB   │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ Map                                 │    2.255 │ 9.76 GiB     │ select json['key0'], json['key1'], json['key2'], json['key3'], json['key4'] │
│ String                              │    5.755 │ 11.28 GiB    │ select JSONExtractInt(json, 'key0'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key1'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key2'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key3'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key4')                                         │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────────────────────────────────────────────────────────────────────┘

Q4: Reading the whole JSON

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────┐
│ String                              │    1.263 │ 8.90 GiB     │ select json │
│ JSON no shared data                 │    1.974 │ 8.46 GiB     │ select json │
│ Map                                 │    2.328 │ 10.20 GiB    │ select json │
│ JSON shared data "map"              │     2.45 │ 9.84 GiB     │ select json │
│ JSON shared data "advanced"         │    2.497 │ 9.74 GiB     │ select json │
│ JSON shared data "map_with_buckets" │    6.063 │ 10.77 GiB    │ select json │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────┘

Wide part:

Q1: Reading single key

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────────────────────────────┐
│ JSON no shared data                 │    0.006 │ 9.05 MiB     │ select json.key0                    │
│ JSON shared data "advanced"         │    0.013 │ 18.20 MiB    │ select json.key0                    │
│ JSON shared data "map_with_buckets" │    0.054 │ 450.86 MiB   │ select json.key0                    │
│ Map                                 │    1.363 │ 507.60 MiB   │ select json['key0']                 │
│ JSON shared data "map"              │    1.508 │ 8.98 GiB     │ select json.key0                    │
│ String                              │    1.588 │ 630.03 MiB   │ select JSONExtractInt(json, 'key0') │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────────────────────────────┘

Q2: Reading non existing key

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────────────────────────────────────────┐
│ JSON no shared data                 │    0.005 │ 4.65 MiB     │ select json.non_existing_key                    │
│ JSON shared data "advanced"         │    0.005 │ 11.13 MiB    │ select json.non_existing_key                    │
│ JSON shared data "map_with_buckets" │    0.038 │ 273.85 MiB   │ select json.non_existing_key                    │
│ Map                                 │     1.43 │ 530.90 MiB   │ select json['non_existing_key']                 │
│ JSON shared data "map"              │    1.489 │ 9.19 GiB     │ select json.non_existing_key                    │
│ String                              │    1.617 │ 615.93 MiB   │ select JSONExtractInt(json, 'non_existing_key') │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────────────────────────────────────────┘

Q3: Reading 5 different keys

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────────────────────────────────────────────────────────────────────┐
│ JSON no shared data                 │    0.018 │ 48.25 MiB    │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ JSON shared data "advanced"         │    0.056 │ 139.46 MiB   │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ JSON shared data "map_with_buckets" │    0.271 │ 1.68 GiB     │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ Map                                 │    1.452 │ 514.23 MiB   │ select json['key0'], json['key1'], json['key2'], json['key3'], json['key4'] │
│ JSON shared data "map"              │    1.722 │ 18.53 GiB    │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ String                              │    5.044 │ 567.16 MiB   │ select JSONExtractInt(json, 'key0'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key1'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key2'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key3'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key4')                                         │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────────────────────────────────────────────────────────────────────┘

Q4: Reading the whole JSON

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────┐
│ String                              │    1.206 │ 629.30 MiB   │ select json │
│ Map                                 │    1.466 │ 559.75 MiB   │ select json │
│ JSON shared data "map"              │    1.751 │ 645.92 MiB   │ select json │
│ JSON shared data "advanced"         │    2.096 │ 668.15 MiB   │ select json │
│ JSON no shared data                 │    5.343 │ 8.52 GiB     │ select json │
│ JSON shared data "map_with_buckets" │    6.222 │ 1.32 GiB     │ select json │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────┘

JSON with 10000 paths, 200,000 rows:

Results

Compact part:

Q1: Reading single key

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────────────────────────────┐
│ JSON shared data "advanced"         │    0.242 │ 4.63 MiB     │ select json.key0                    │
│ JSON shared data "map_with_buckets" │    0.464 │ 4.92 MiB     │ select json.key0                    │
│ JSON shared data "map"              │    3.577 │ 483.57 MiB   │ select json.key0                    │
│ Map                                 │    4.391 │ 7.83 GiB     │ select json['key0']                 │
│ JSON no shared data                 │    4.404 │ 128.02 MiB   │ select json.key0                    │
│ String                              │    5.194 │ 9.77 GiB     │ select JSONExtractInt(json, 'key0') │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────────────────────────────┘

Q2: Reading non existing key

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────────────────────────────────────────┐
│ JSON shared data "advanced"         │    0.314 │ 3.68 MiB     │ select json.non_existing_key                    │
│ JSON shared data "map_with_buckets" │    0.552 │ 121.90 MiB   │ select json.non_existing_key                    │
│ JSON shared data "map"              │    3.417 │ 516.85 MiB   │ select json.non_existing_key                    │
│ String                              │    3.778 │ 9.53 GiB     │ select JSONExtractInt(json, 'non_existing_key') │
│ JSON no shared data                 │    4.389 │ 128.02 MiB   │ select json.non_existing_key                    │
│ Map                                 │    4.511 │ 7.88 GiB     │ select json['non_existing_key']                 │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────────────────────────────────────────┘

Q3: Reading 5 different keys

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────────────────────────────────────────────────────────────────────┐
│ JSON shared data "advanced"         │    0.605 │ 21.11 MiB    │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ JSON shared data "map_with_buckets" │    1.325 │ 227.89 MiB   │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ JSON shared data "map"              │    3.403 │ 502.56 MiB   │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ Map                                 │    4.344 │ 7.76 GiB     │ select json['key0'], json['key1'], json['key2'], json['key3'], json['key4'] │
│ JSON no shared data                 │    4.437 │ 127.24 MiB   │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ String                              │   11.144 │ 11.06 GiB    │ select JSONExtractInt(json, 'key0'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key1'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key2'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key3'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key4')                                         │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────────────────────────────────────────────────────────────────────┘

Q4: Reading the whole JSON

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────┐
│ String                              │    2.015 │ 9.81 GiB     │ select json │
│ Map                                 │    4.279 │ 8.09 GiB     │ select json │
│ JSON shared data "advanced"         │    5.595 │ 10.52 GiB    │ select json │
│ JSON shared data "map"              │    5.643 │ 10.92 GiB    │ select json │
│ JSON shared data "map_with_buckets" │    15.55 │ 14.59 GiB    │ select json │
│ JSON no shared data                 │   23.845 │ 13.65 GiB    │ select json │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────┘

Wide part:

Q1: Reading single key

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────────────────────────────┐
│ JSON no shared data                 │    0.027 │ 18.64 MiB    │ select json.key0                    │
│ JSON shared data "advanced"         │    0.063 │ 3.89 MiB     │ select json.key0                    │
│ JSON shared data "map_with_buckets" │    0.087 │ 403.55 MiB   │ select json.key0                    │
│ String                              │    3.216 │ 582.70 MiB   │ select JSONExtractInt(json, 'key0') │
│ Map                                 │    3.594 │ 538.37 MiB   │ select json['key0']                 │
│ JSON shared data "map"              │     3.63 │ 12.53 GiB    │ select json.key0                    │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────────────────────────────┘

Q2: Reading non existing key

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────────────────────────────────────────┐
│ JSON shared data "advanced"         │     0.01 │ 3.36 MiB     │ select json.non_existing_key                    │
│ JSON no shared data                 │    0.027 │ 19.38 MiB    │ select json.non_existing_key                    │
│ JSON shared data "map_with_buckets" │    0.081 │ 378.06 MiB   │ select json.non_existing_key                    │
│ String                              │    3.388 │ 578.70 MiB   │ select JSONExtractInt(json, 'non_existing_key') │
│ JSON shared data "map"              │    3.624 │ 12.00 GiB    │ select json.non_existing_key                    │
│ Map                                 │    3.821 │ 547.12 MiB   │ select json['non_existing_key']                 │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────────────────────────────────────────┘

Q3: Reading 5 different keys

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────────────────────────────────────────────────────────────────────┐
│ JSON no shared data                 │      0.1 │ 92.35 MiB    │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ JSON shared data "advanced"         │     0.27 │ 123.30 MiB   │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ JSON shared data "map_with_buckets" │    0.453 │ 1.89 GiB     │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ Map                                 │    3.665 │ 536.60 MiB   │ select json['key0'], json['key1'], json['key2'], json['key3'], json['key4'] │
│ JSON shared data "map"              │    3.828 │ 22.54 GiB    │ select json.key0, json.key1, json.key2, json.key3, json.key4                │
│ String                              │    9.911 │ 561.36 MiB   │ select JSONExtractInt(json, 'key0'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key1'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key2'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key3'),                                       ↴│
│                                     │          │              │↳       JSONExtractInt(json, 'key4')                                         │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────────────────────────────────────────────────────────────────────┘

Q4: Reading the whole JSON

┌─table_type──────────────────────────┬─time_sec─┬─memory_usage─┬─query───────┐
│ String                              │    2.734 │ 581.15 MiB   │ select json │
│ Map                                 │    3.869 │ 550.14 MiB   │ select json │
│ JSON shared data "map"              │    4.182 │ 618.78 MiB   │ select json │
│ JSON shared data "advanced"         │    4.774 │ 683.27 MiB   │ select json │
│ JSON shared data "map_with_buckets" │   15.271 │ 1.45 GiB     │ select json │
│ JSON no shared data                 │        - │ OOM          │ select json │
└─────────────────────────────────────┴──────────┴──────────────┴─────────────┘

UnamedRus · 2025-07-15T11:42:24Z

Use case, when all paths are read is also important. (And i assume, String still will be winner here)
May be, it make sense to benchmark it as well?

Avogar · 2025-07-15T13:15:16Z

May be, it make sense to benchmark it as well?

Done

Avogar · 2025-07-15T14:03:14Z

~ Half of the added lines is tests

…hared-data-v2

…to json-shared-data-v2

…hared-data-v2

Avogar · 2025-07-17T13:40:39Z

Avogar · 2025-08-21T11:11:29Z

Stateless tests (amd_binary, old analyzer, s3 storage, DatabaseReplicated, parallel) - #85861
Stateless tests (amd_tsan, parallel, 2/2) - #85939
Integration tests (arm_binary, distributed plan, 4/4) - #85972

…hared-data-v2

…s too long

Avogar · 2025-08-25T00:06:19Z

Stateless tests (amd_asan, distributed plan, sequential) - #85781
Stress test (arm_asan) - #84669
Stress test (amd_msan) - #81144
Install packages (amd_debug) - Error DB::Exception: Not enough space for clickhouse binary in /usr/bin, required 5.68 GiB, available 4.24 GiB. (NOT_ENOUGH_SPACE), I don't think it's caused by this PR

piyushsingariya · 2025-08-25T05:57:49Z

hi @Avogar would like to test this improvement, can you share a relevant docker tag for this? Thanks!

Avogar · 2025-08-25T11:25:53Z

hi @Avogar would like to test this improvement, can you share a relevant docker tag for this? Thanks!

It's not included in any release yet, so right now it's visible only in master. To test master I think you can use tag head. Or you can download the binary from master using curl https://clickhouse.com/ | sh.

The closest release is 25.8 and it will include this change

xzx0xzx · 2025-10-10T06:19:10Z

@Avogar Hi, When should we use map_with_buckets or advanced?
If we don’t take storage space into consideration, should we always use advanced?

Avogar · 2025-10-10T09:05:26Z

@Avogar Hi, When should we use map_with_buckets or advanced?
If we don’t take storage space into consideration, should we always use advanced?

For object_shared_data_serialization_version it's better to always use advanced, yes. For object_shared_data_serialization_version_for_zero_level_parts better to use map or map_with_buckets. Using advanced for zero level parts makes insertion very slow, using map_with_buckets makes insertion slower compared to map, but not as slow as advanced, so if you need good performance for zero level parts as well and ready to sacrifice some insertion performance, you can use map_with_buckets for them.

Also, as for all new features, make sure you test it out on your use case before using it on production data to identify any possible problems.

Avogar and others added 3 commits June 27, 2025 10:40

Inotoduce new shared data serialization in JSON for fast subcolumns r…

1626833

…eading

Merge branch 'master' into json-shared-data-v2

c6c9fe6

Make some improvements and adjustments, clean up the code

1e1b078

Avogar mentioned this pull request Jul 15, 2025

[Umbrella] JSON type improvements #68428

Open

56 tasks

clickhouse-gh bot added the pr-performance Pull request with some performance improvements label Jul 15, 2025

Fix style check and tidy build

a6a9896

Avogar and others added 6 commits July 15, 2025 14:51

Fix failing test

448b7a7

Fix tidy build

696e1e3

Merge branch 'master' of github.com:ClickHouse/ClickHouse into json-s…

8426bdf

…hared-data-v2

Merge branch 'json-shared-data-v2' of github.com:Avogar/ClickHouse in…

7467e32

…to json-shared-data-v2

Merge branch 'master' of github.com:ClickHouse/ClickHouse into json-s…

ab471fa

…hared-data-v2

Fix bugs found by CI

0dfc6b9

Avogar force-pushed the json-shared-data-v2 branch from 5f48060 to 0dfc6b9 Compare July 15, 2025 22:42

Avogar added 10 commits July 16, 2025 08:21

Fix failing tests

7b83264

Fix failing test

faca48d

Fix failing test

6ae7030

Fix double reading of shared data

432c09f

Fix performance test

520bc4f

Fix insertion performance

175236f

Fix performance tests

0469156

Fix tests

5d4534a

Update performance tests

6913ba8

Delete old test

74908ca

Remove bad files

ebd215a

Avogar and others added 7 commits August 15, 2025 20:02

Fix reading Array offsets as subcolumn

7742293

Fix usage of substreams cache

dfcfbb3

Better check for cached column in current range

db853f8

Use old settings by default, randomize new settings in tests

9ff9027

Mark some tests as long

e0afb4b

Merge branch 'master' into json-shared-data-v2

7c75156

Fix lazy materialization for Variant column

f9344e4

Avogar added this pull request to the merge queue Aug 21, 2025

github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Aug 21, 2025

Avogar and others added 5 commits August 21, 2025 13:56

Merge branch 'master' into json-shared-data-v2

390d03a

Restart CI

4a40df0

Merge branch 'master' of github.com:ClickHouse/ClickHouse into json-s…

8b9d852

…hared-data-v2

Remove adding zero byte to String column

b18c95d

Don't run 03555_json_shared_data_advanced_paths_indexes with S3 - it'…

e36c1ea

…s too long

Avogar enabled auto-merge August 25, 2025 00:07

Avogar added this pull request to the merge queue Aug 25, 2025

Merged via the queue into ClickHouse:master with commit e6e5d88 Aug 25, 2025
117 of 122 checks passed

Avogar deleted the json-shared-data-v2 branch August 25, 2025 00:37

robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Aug 25, 2025

Avogar mentioned this pull request Sep 15, 2025

JSON: Ability to move skip data paths to shared object. #87037

Open

Avogar mentioned this pull request Sep 29, 2025

High memory usage in merge JSON type columns #87034

Closed

Steffen911 mentioned this pull request Dec 3, 2025

perf: update metadata JSON type and settings for events table langfuse/langfuse#10881

Merged

Algunenano mentioned this pull request Dec 12, 2025

Fix sparse deserialization errors for virtual subcolumns (e.g., string size) #92060

Closed

1 task

Conversation

Avogar commented Jul 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Documentation entry for user-facing changes

Uh oh!

clickhouse-gh bot commented Jul 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Avogar commented Jul 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Performance comparison of reading JSON paths sub-columns on local filesystem

JSON with 10 paths, 5,000,000 rows

Compact part:

Wide part:

JSON with 100 paths, 2,000,000 rows

Compact part:

Wide part:

JSON with 1000 paths, 1,000,000 rows

Compact part:

Wide part:

JSON with 10000 paths, 200,000 rows:

Compact part:

Wide part:

Uh oh!

UnamedRus commented Jul 15, 2025

Uh oh!

Avogar commented Jul 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Avogar commented Jul 15, 2025

Uh oh!

Avogar commented Jul 17, 2025

Uh oh!

Avogar commented Aug 21, 2025

Uh oh!

Avogar commented Aug 25, 2025

Uh oh!

Uh oh!

piyushsingariya commented Aug 25, 2025

Uh oh!

Avogar commented Aug 25, 2025

Uh oh!

xzx0xzx commented Oct 10, 2025

Uh oh!

Avogar commented Oct 10, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Avogar commented Jul 15, 2025 •

edited

Loading

clickhouse-gh bot commented Jul 15, 2025 •

edited

Loading

Avogar commented Jul 15, 2025 •

edited

Loading

Avogar commented Jul 15, 2025 •

edited

Loading