Introduce new "combined" subcolumn for JSON data type that combines literal and sub-object values#98788
Introduce new "combined" subcolumn for JSON data type that combines literal and sub-object values#98788
Conversation
path subcolumn and the sub-object subcolumn into a single access: - returns the literal value as `Dynamic` if the path holds a scalar, - returns the sub-object as `Dynamic` if the path holds a nested object, - returns `NULL` if the path is absent entirely. Implementation: - Parser: handle `$`-prefixed identifiers in `ParserCompoundIdentifier` via the new `is_dollar_bareword` flag in `ExpressionElementParsers`. - In-memory (Memory engine): `extractCombinedColumn` in `DataTypeObject` merges the literal and sub-object columns row by row. - MergeTree (on-disk): new `SerializationObjectCombinedPath` reads the literal and sub-object substreams and merges them during deserialization, using `insert_only_rows_in_current_range_from_substreams_cache` to avoid stale cache interactions between the two substreams. - `tupleElement` and `JSONExtract` function families extended to support the combined subcolumn path prefix. Tests added in `03467`–`03470` covering parser round-trips, prefix correctness, Memory/Compact/Wide MergeTree variants, and `Array(JSON)`. Documentation added to `newjson.md` with a comparison example of all three subcolumn types (`json.a`, `json.^a`, `json.`) side by side.
|
Workflow [PR], commit [f6525f5] Summary: ✅ AI ReviewSummaryThis PR introduces a new JSON combined subcolumn syntax ( Missing context
ClickHouse Rules
Final Verdict
|
Does, it also apply to root json as well? |
No. Root JSON column can contain only objects. In theory, you can already use Dynamic type for this purpose: create table test (json_value Dynamic) engine=MergeTree order by tuple();
insert into test select 42::Int64;
insert into test select null;
insert into test select 'data';
insert into test select '{"a" : {"b" : 42}}'::JSON;
select json_value, json_value.Int64, json_value.String, json_value.JSON, json_value.JSON.a.b from test;But for this to work with JSON input formats we need to add support for JSON type in schema inference (right now objects are inferred as named tuples). Or you can always wrap your JSON value into an object and just use JSON type: |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
| auto subcolumn_name = subcolumn_name_col->getValue<String>(); | ||
| return wrapInArrays(object->getSubcolumnType(subcolumn_name), count_arrays); | ||
| /// Use combined `$` subcolumn that merges literal value and sub-object. | ||
| auto combined_name = String(1, DataTypeObject::COMBINED_SUBCOLUMN_PREFIX) + "`" + subcolumn_name + "`"; |
There was a problem hiding this comment.
❌ This change drops previously covered behavior for tupleElement on JSON typed subcolumns (for example tupleElement(json, 'c.:Array(JSON)')), and the corresponding test was removed in this PR.
Because combined_name is always built as $`<element_name>` , any element_name containing a dynamic-type suffix (or embedded backquotes) no longer maps to the original subcolumn syntax. This is a backward-compatibility regression and effectively relaxes an existing test instead of preserving behavior.
Please either preserve support for these existing paths in tupleElement or keep the old behavior behind compatibility handling and add explicit tests for both forms.
|
#99920 |
No, I didn't look at it yet. You can go ahead (or assign me to the issue, I can also take a look) |
…fier ambiguity; add serialization pooling support for SerializationObjectCombinedPath
LLVM Coverage Report
PR changed lines: PR changed-lines coverage: 88.70% (259/292, 0 noise lines excluded) |
Introduce new "combined" subcolumn for JSON data type that combines literal and sub-object values
|
Hi @Avogar @bharatnc — while reviewing this PR I found the following:
Could you review the test PR and add the |

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Add combined subcolumn syntax
json.@pathfor theJSONtype that returns the literal value asDynamicif the path holds a scalar, the sub-object asDynamicif the path holds a nested object, orNULLif the path is absent.Documentation entry for user-facing changes
Reading JSON combined sub-columns
The
JSONtype supports reading a path as a combined sub-column using the special syntax[email protected].A combined sub-column for a given path returns:
Dynamic, if the path has a literal value.Dynamic, if the path has no literal value but has nested sub-paths.NULL, if neither a literal value nor any sub-paths exist for that path.This is useful when a path may hold either a scalar value or a nested object across different rows, and is more convenient than separately querying the literal sub-column (
json.a) and the sub-object sub-column (json.^a).The following example compares all three sub-column types for path
a:Row 1:
aholds a literal42.json.areturns it asDynamic(Int64),json.^areturns an empty sub-object{}(no nested keys undera), andjson.@areturns the literal42.Row 2:
aholds a nested object.json.areturnsNULL(no literal at that path),json.^areturns the sub-object asJSON, andjson.@aalso returns the sub-object asDynamic(JSON).Row 3:
ais absent entirely. Bothjson.aandjson.@areturnNULL, whilejson.^areturns an empty{}.Documentation is written (mandatory for new features)