[Umbrella] JSON type improvements #68428
Open
This issue tracks tasks for further improvements to the new JSON data type. Continuation of #54864.
Important tasks:
- Reduce memory usage during insertion into JSON column and during merges through better buffer size management - Reduce memory usage of inserts to JSON by using adaptive write buffer size #69272
- Support ALTER to new `JSON` type from `String`/`Map`/`Tuple` - Support alter from String to JSON #70442
- Support ALTER to new `JSON` type from `Object` - Support ALTER from Object to JSON #71784
- Support CAST from `Map`/`Tuple`/`Object` types to new `JSON` - Implement simple CAST from Map/Tuple/Object to new JSON through serialization/deserialization from JSON string #71320
- Support CAST between `JSON` types with different parameters (skip paths, limits, type hints) - Support CAST and ALTER between JSON types with different parameters #72303
- Try to implement default implementation for functions for `Dynamic` type (so any function will be executed on underlying types dynamically and throw an exception on first incompatible type) - Support Dynamic type in most functions #69691
- Add aggregate functions `distinctDynamicTypes`/`distinctJSONPaths`/`distinctJSONPathsAndTypes` for better introspection of the content in the JSON column - Add aggregate functions distinctDynamicTypes/distinctJSONPaths/distinctJSONPathsAndTypes #68463
- Allow to insert/select JSON from binary strings in RowBinary format (Support JSON deserialization from binary string #69443) - Allow to read/write JSON type as binary string in RowBinary format #70288
- Allow to insert/select JSON from single String column in Native format (Support JSON serialization to a string in Native format #70281) - Allow to serialize/deserialize JSON column as single String column in Native format #70312
- Support JSON subcolumns in ORDER BY and data-skipping indices expression during table creation - Support subcolumns in MergeTree sorting key and skip indexes #72644
- Support `Dynamic` type in `ifNull`/`coalesce` functions - Support Dynamic type in functions ifNull and coalesce #72772
- Support `Dynamic` type in functions `toFloat64` and similar - Support Dynamic in functions toFloat64/touInt32/etc #72989
- Support equal comparison between JSON values - Support equal comparison for JSON column #72991
- Try to improve subcolumns formatting - Improve formatting of identifiers with JSON subcolumns #73085
- Try to support `Nullable(JSON)` - Support Nullable(JSON) #73556
- Support subcolumns in MaterializedView queries - Support subcolumns in materialized view select query #74030
- Support referring to JSON subcolumns in default and materialized expressions - Support subcolumns in default and materialized expressions #74403
- Improve performance of reading the whole JSON column from wide parts - Improve performance of the whole JSON column reading in Wide parts from S3 #74827
- Improve performance of subcolumns reading from compact parts - Save marks for each substream in Compact part to be able to read individual subcolumns #77940
- Add new serialization for Dynamic and JSON without SharedVariant/SharedData for better integrations - Implement flattened serialization for Dynamic and JSON in Native format #80499
- Support ALTER UPDATE for `JSON`/`Dynamic` types - Allow ALTER UPDATE in JSON and Dynamic columns #82419
- Use `Array(Dynamic)` instead of unnamed Tuple for arrays of values with different types - Use Array(Dynamic) instead of unnamed tuple for arrays of values with different data types in JSON subcolumn types inference #74937
- Improve deserialization of JSON subcolumns from shared data - Significantly improve performance of JSON subcolumns reading from shared data in MergeTree #83777
- Remove old `Object` data type - Remove deprecated Object type #85718
- Support JSON in `tupleElement` function - Support JSON type in tupleElement #91327
- Add setting `type_json_skip_invalid_typed_paths` to use default value during bad parsing of a path with a hint type - feat(json): add type_json_skip_invalid_typed_paths setting #89886
- Optimize `distinctJSONPaths` aggregate function so it reads only metadata files with the list of paths - Optimize distinctJSONPaths aggregate function #92196
- Optimize insertion into JSON column - Slightly optimize parsing of JSON type #93614 and Slightly optimize squashing of JSON columns for some cases #94247
- Add syntax `json.$a.b` for union of `json.a.b` and `json.^a.b` or function for such use case - Introduce new "combined" subcolumn for JSON data type that combines literal and sub-object values #98788
- Support `JSON` type in `JSONExtract*` functions - Support for native JSON type in JSONExtract* functions. Fixes #88370. #96711
- Support creating indexes on `JSONAllPaths` function similar to `mapKeys` for `Map` type - Add MergeTree skip index support for JSON paths using JSONAllPaths with bloom_filter, tokenbf_v1, ngrambf_v1, and text (inverted) index types #98886
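To make the `json.$a.b` item concrete, here is a rough Python model of what "union of `json.a.b` and `json.^a.b`" could mean for a single row. This is only a sketch and not the ClickHouse implementation: the helper names (`flatten`, `literal`, `sub_object`, `combined`) are hypothetical, the sub-object is returned as a flat map of relative paths, and the rule "literal value first, otherwise the sub-object" is an assumption about the combined subcolumn.

```python
from typing import Any, Optional


def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into {"dotted.path": leaf_value}."""
    out = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out


def literal(row: dict, path: str) -> Optional[Any]:
    """Model of json.a.b: the value stored exactly at `path`, else None."""
    return flatten(row).get(path)


def sub_object(row: dict, path: str) -> dict:
    """Model of json.^a.b: every path nested under `path`, made relative to it."""
    prefix = path + "."
    return {
        p[len(prefix):]: v
        for p, v in flatten(row).items()
        if p.startswith(prefix)
    }


def combined(row: dict, path: str) -> Optional[Any]:
    """Assumed model of json.$a.b: literal value if present, else the sub-object."""
    value = literal(row, path)
    if value is not None:
        return value
    nested = sub_object(row, path)
    return nested if nested else None


rows = [
    {"a": {"b": 42}},                  # a.b holds a literal
    {"a": {"b": {"c": 1, "d": "x"}}},  # a.b holds a sub-object
    {"a": {"x": 1}},                   # a.b is absent
]
results = [combined(row, "a.b") for row in rows]
```

With these three rows, the literal subcolumn alone would miss the second row and the sub-object subcolumn alone would miss the first, which is the gap the combined form is meant to close.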
Tasks with no priority:
- Improve alters of JSON/Dynamic columns in Wide part (avoid whole part rewrite)
- Replace JSON column in the subquery to requested subcolumns - Replace JSON column in the subquery to requested subcolumns #75538
- Add setting `max_subcolumns_to_read`.
- Add support for EPHEMERAL subcolumns in JSON paths types declaration.
- Add support for TTL for JSON subcolumns.
- Add support for CODEC for JSON subcolumns.
- Support ALTER UPDATE for individual subcolumns
- Add function and aggregate function to merge JSON objects
- Match json and jsonb type in PostgreSQL to new JSON type under a setting
- Support JSON type in MongoDB engine (Load Mongo documents straight into JSON type #71970)
- Add function `JSONAllValues` that returns the list of values (maybe cast to String) stored in JSON column, sorted by path
- Add new JSON type to schema inference
- Consider adding new syntax to read subcolumns by regex (comment)
- Consider implementing function for querying paths dynamically (comment)
- Add information about subcolumn sizes in system table.
- Support writing `json.array[N].key` for `Array(JSON)` subcolumns (comment)
- Support declaring the type for all dynamic subcolumns (comment)
- Support `Dynamic` type as first argument of function `has`
- Add function like `jsonuntuple(json, array_of_paths)` to select paths as separate columns
- Support `arrayElement` function for JSON to read subcolumns like `json['key']` and use optimization functions to subcolumns here
- Add new syntax in JSON type definition to define type for path regexp (comment)
- Support JSON in BSONEachRow format
- Add functions `JSONRemoveKeys` and `JSONUpdateKeys`
- Support subcolumns in ReplacingMergeTree arguments for `version` and `is_deleted` columns
- Consider adding function that converts JSON to `Map(String, Dynamic)`
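For the proposed `JSONAllValues` function, a small Python sketch of the intended result may help the discussion: values are collected from all paths, ordered by path, and cast to strings. The function does not exist yet, so everything here is an assumption, including the helper names (`flatten`, `json_all_values`) and the use of Python's `str()` for the cast (ClickHouse would format some values differently, e.g. booleans).

```python
def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into {"dotted.path": leaf_value}."""
    out = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out


def json_all_values(row: dict) -> list:
    """Hypothetical model of JSONAllValues: values as strings, ordered by path."""
    flat = flatten(row)
    return [str(flat[path]) for path in sorted(flat)]


# Paths in sorted order are "a.c", "a.z", "b".
values = json_all_values({"b": 2, "a": {"z": "x", "c": 1.5}})
```

Ordering by path keeps the values aligned position by position with a sorted list of the same row's paths.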
Feel free to add any ideas for improvements/feature requests in the comments in this issue