Implement new JSON data type.#66444
Conversation
…into json-data-type
…into json-data-type
|
I am working on the tests, will add them later |
There was a problem hiding this comment.
Old implementation was moved to ColumnObjectDeprecated.h/cpp files. Github doesn't show that old code was moved and thinks that the code was just changed. Ignore it and read code in these files as completely new. The same for similar files DataTypeObject/SerializationObject.
|
This is an automated comment for commit db691e4 with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page
Successful checks
|
|
Any specific reason to make distinction between SKIP and SKIP PREFIX? I assume, that if a.b.c is "object", that if we set SKIP a.b.c, user may want to skip it as whole, basically behavior of SKIP PREFIX. |
Yes, this is a good remark. It will be better to combine these 2 |
|
@Avogar congratulations. This is amazing. |
|
I think I speak for everybody when I say this is a GREAT news! |
Backport #66444 to 24.8: Implement new JSON data type.
|
I would be thankful for a migration guide from the old experimental JSON type. |
Right now the only way is to do |
|
Amazing PR! Is there benchmark to compare the two versions? |
Right now we don't have such benchmark. But we advise not to use old Object type because it's deprecated and won't be supported anymore (also it has known bugs that won't be fixed). |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Implement new JSON data type.
Ref: #54864
Closes: #54864
Closes: #53908
Documentation entry for user-facing changes
JSON type
To declare a column of
JSONtype, use the following syntax:Where:
max_dynamic_pathsis an optional parameter indicating how many paths can be stored separately as subcolumns across single block of data that is stored separately (for example across single data part for MergeTree table). If this limit is exceeded, all other paths will be stored together in a single structure. Default value ofmax_dynamic_pathsis1000.max_dynamic_typesis an optional parameter between1and255indicating how many different data types can be stored inside a single path column with typeDynamicacross single block of data that is stored separately (for example across single data part for MergeTree table). If this limit is exceeded, all new types will be converted to typeString. Default value ofmax_dynamic_typesis32.some.path TypeNameis an optional type hint for particular path in the JSON. Such paths will be always stored as subcolumns with specified type.SKIP path.to.skipis an optional hint for particular path that should be skipped during JSON parsing. Such paths will never be stored in the JSON column. If specified path is a nested JSON object, the whole nested object will be skipped.SKIP REGEXP 'path_regexp'is an optional hint with a regular expression that is used to skip paths during JSON parsing. All paths that match this regular expression will never be stored in the JSON column.Creating JSON
Using
JSONtype in table column definition:Using CAST from 'String':
CAST from named
Tuple,MapandObject('json')toJSONtype will be supported later.Reading JSON paths as subcolumns
JSON type supports reading every path as a separate subcolumn. If type of the requested path was not specified in the JSON type declaration, the subcolumn of the path will always have type Dynamic.
For example:
If the requested path wasn't found in the data, it will be filled with
NULLvalues:Let's check the data types of returned subcolumns:
As we can see, for
a.bthe type isUInt32as we specified in the JSON type declaration, and for all other subcolumns the type isDynamic.It is also possible to read subcolumns of a
Dynamictype using special syntaxjson.some.path.:TypeName:Reading JSON sub-objects as subcolumns
JSON type supports reading nested objects as subcolumns with type
JSONusing special syntaxjson.^some.path:Handling arrays of JSON objects
JSON paths that contains an array of objects are parsed as type
Array(JSON)and inserted intoDynamiccolumn for this path. To read an array of objects you can extract it fromDynamiccolumn as a subcolumn:As you can notice, the
max_dynamic_types/max_dynamic_pathsparameters of the nestedJSONtype were reduced compared to the default values. It's needed to avoid number of subcolumns to grow uncontrolled on nested arrays of JSON objects.Let's try to read subcolumns from this nested
JSONcolumn:We can avoid writing
Array(JSON)subcolumn name using special syntax:The number of
[]after path indicates the array level.json.path[][]will be transformed tojson.path.:Array(Array(JSON))Let's check the paths and types inside our
Array(JSON):Let's read subcolumns from
Array(JSON)column:We can also read sub-object subcolumns from nested
JSONcolumn:CI Settings (Only check the boxes if you know what you are doing):