-
Notifications
You must be signed in to change notification settings - Fork 8.3k
JSON type inference #8327
Description
Introduction
JSON is a dynamic format. So there is no type definition for a JSON object. Although ClickHouse can use JSON, users are forced to know the type of a JSON element.
In the following example, user must know that i is an int type and f is a float type:
WITH '{"i": 1, "f": 1.2}' AS json
SELECT
JSONExtract(json, 'i', 'Int8') AS i,
JSONExtract(json, 'f', 'Float32') AS f
┌─i─┬─f─┐
│ 1 │ 0 │
└───┴───┘Although there is a function for extracting raw value(JSONExtractRaw), the resulting type will always be the String:
WITH '{"i": 1, "f": 1.2}' AS json
SELECT
JSONExtractRaw(json, 'i') AS i,
JSONExtractRaw(json, 'f') AS f,
toTypeName(i) AS i_type,
toTypeName(f) AS f_type
┌─i─┬─f──────────────────┬─i_type─┬─f_type─┐
│ 1 │ 1.2000000000000002 │ String │ String │
└───┴────────────────────┴────────┴────────┘There is also a function for checking JSON value type(JSONType).
WITH '{"i": 1, "f": 1.2}' AS json
SELECT JSONType(json, 'i') AS i_type
┌─i_type─┐
│ Int64 │
└────────┘With combining these two functions, users will have a powerful JSON database like MongoDB. But the usage will be really ugly and slow because it must be in two steps:
- User must issue an SQL using
JSONTypeto get the type:
WITH '{"i": 1, "f": 1.2}' AS json
SELECT JSONType(json, 'i') AS i_type
┌─i_type─┐
│ Int64 │
└────────┘- Then user must build a second SQL statement using
JSONExtractand use the type they get from first step:
WITH '{"i": 1, "f": 1.2}' AS json
SELECT
JSONExtract(json, 'i', 'Int64') AS value,
toTypeName(value) AS type
┌─value─┬─type──┐
│ 1 │ Int64 │
└───────┴───────┘Using JSONType output as CAST parameter will not work:
WITH
'{"i": 1, "f": 1.2}' AS json,
JSONType(json, 'i') AS type
SELECT JSONExtract(json, 'i', type)
Received exception from server (version 19.17.2):
Code: 368. DB::Exception: Received from some-clickhouse-server:9000. DB::Exception: std::bad_typeid.Use case
Automatically inferring the type of a JSON value will make ClickHouse a great alternative for MongoDB and ElasticSearch like document stores.
Right now, we have to know the JSON schema, which is not so possible. But if this feature gets implemented, we will be able to freely query tables which will make ClickHouse best alternative for ElasticSearch.
Users will be able to send any JSON structure they want and they will be able to query for any field.
Describe the solution you'd like
Implementing a function that combines JSONType and JSONExtract functions will solve this problem.
A new function named JSON will first check the type of the value then use JSONExtract to extract the value and convert to value type:
WITH '{"i": 1, "f": 1.2}' AS json
SELECT
JSON(json, 'i') AS value,
toTypeName(value) AS type
┌─value─┬─type──┐
│ 1 │ Int64 │
└───────┴───────┘