Allow to serialize/deserialize JSON column as single String column in Native format#70312
vdimir merged 11 commits into ClickHouse:master
M(Bool, input_format_native_allow_types_conversion, true, "Allow data types conversion in Native input format", 0) \
M(Bool, input_format_native_decode_types_in_binary_format, false, "Read data types in binary format instead of type names in Native input format", 0) \
M(Bool, output_format_native_encode_types_in_binary_format, false, "Write data types in binary format instead of type names in Native output format", 0) \
M(Bool, output_format_native_write_json_as_string, false, "Write data of JSON column as String column containing JSON strings", 0) \
It may be worth mentioning what happens by default when the setting is disabled, and what the possible drawbacks of this mode are (or why we keep two modes).
We could also add the new setting to settings randomization.
> It may be worth mentioning what happens by default when the setting is disabled
By default we use a complex serialization that is used only internally for inter-server communication via the Native protocol, and I believe no one will support this default serialization. We decided to add the possibility to serialize/deserialize a JSON column as String to simplify integration with clients in other languages, so they can support the new JSON type in the Native format.
So I am not sure what to write here. Something like `Write data of JSON column as String column containing JSON strings instead of default native JSON serialization`?
> We could also add the new setting to settings randomization
Yes, let's do it
And what are the advantages of using that complex serialization for inter-server communication? Can't we switch to the simple String-based one there as well?
> And what are the advantages of using that complex serialization for inter-server communication?
It serializes/deserializes the internal representation of the JSON column: all typed and dynamic paths are serialized as separate subcolumns in separate streams (which means better compression and fast columnar serialization/deserialization), and it has logic for serializing the special shared data column of type Array(Tuple(paths String, values String)), which contains paths and values once the limit on dynamic paths is reached. Basically, it is almost the same as how we serialize a JSON column in MergeTree data parts, because it is the same code. So serialization and deserialization are much faster than serializing/deserializing the JSON column into/from its String representation.
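To make the layout described above more concrete, here is a minimal toy sketch in C++. It is not ClickHouse code: `Row`, `ColumnarJSON`, and `toColumnar` are hypothetical names, a JSON row is simplified to a flat map from path to a textual value, and an empty string stands in for a missing value. It only shows the idea that each path gets its own column until a limit is reached, after which further paths are stored per row as (path, value) pairs, similar in shape to `Array(Tuple(paths String, values String))`.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model (assumption): one JSON row is a flat map from path to its value rendered as text.
using Row = std::map<std::string, std::string>;

struct ColumnarJSON
{
    // One subcolumn per path; an empty string stands in for "no value in this row".
    std::map<std::string, std::vector<std::string>> path_columns;
    // Per-row overflow, shaped like Array(Tuple(paths String, values String)).
    std::vector<std::vector<std::pair<std::string, std::string>>> shared_data;
};

// Split rows into per-path columns. Paths first seen after the limit
// on path columns is reached are stored in shared_data instead.
ColumnarJSON toColumnar(const std::vector<Row> & rows, std::size_t max_dynamic_paths)
{
    ColumnarJSON result;
    result.shared_data.resize(rows.size());

    for (std::size_t i = 0; i < rows.size(); ++i)
    {
        for (const auto & kv : rows[i])
        {
            auto it = result.path_columns.find(kv.first);

            // Create a new subcolumn only while the limit is not yet reached.
            if (it == result.path_columns.end() && result.path_columns.size() < max_dynamic_paths)
                it = result.path_columns.emplace(kv.first, std::vector<std::string>(rows.size())).first;

            if (it != result.path_columns.end())
                it->second[i] = kv.second;
            else
                result.shared_data[i].emplace_back(kv.first, kv.second);
        }
    }
    return result;
}
```

In this model each path column holds values of a single path, which is what makes per-stream compression and columnar reads effective, whereas the String mode would store every row as one opaque JSON text.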
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Allow to serialize/deserialize JSON column as single String column in Native format. For output, use the setting `output_format_native_write_json_as_string`. For input, use serialization version `1` before the column data.
Closes #70281
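As a rough illustration of what "serialization version 1 before the column data" could look like from a client's side, here is a hedged C++ sketch. It rests on assumptions not stated in this PR text: that the version is written as a fixed-width little-endian UInt64, and that each JSON string follows as a varint (LEB128) length plus raw bytes, as String values are usually written in the Native format. The helper names (`writeVarUInt`, `writeUInt64LE`, `encodeJSONColumnAsStrings`) are hypothetical, and the sketch covers only the column data, not the surrounding block header (column count, row count, column name, and type).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Append an unsigned LEB128 varint to the output buffer.
void writeVarUInt(std::uint64_t value, std::string & out)
{
    while (value >= 0x80)
    {
        out.push_back(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<char>(value));
}

// Append a fixed-width little-endian UInt64.
// Assumption: this is how the serialization version is encoded.
void writeUInt64LE(std::uint64_t value, std::string & out)
{
    for (int i = 0; i < 8; ++i)
        out.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
}

// Encode the data of one JSON column whose rows are JSON texts:
// version 1 first, then each row as a length-prefixed string.
std::string encodeJSONColumnAsStrings(const std::vector<std::string> & json_rows)
{
    std::string out;
    writeUInt64LE(1, out); // serialization version 1: "JSON as String"
    for (const auto & row : json_rows)
    {
        writeVarUInt(row.size(), out);
        out += row;
    }
    return out;
}
```

A client would call `encodeJSONColumnAsStrings` with one JSON text per row; the exact field widths should be checked against the ClickHouse sources or documentation before relying on this.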