Skip to content

JSON type inference #8327

@ramazanpolat

Description

@ramazanpolat

Introduction

JSON is a dynamic format. So there is no type definition for a JSON object. Although ClickHouse can use JSON, users are forced to know the type of a JSON element.

In the following example, user must know that i is an int type and f is a float type:

WITH '{"i": 1, "f": 1.2}' AS json
SELECT
    JSONExtract(json, 'i', 'Int8') AS i,
    JSONExtract(json, 'f', 'Float32') AS f

┌─i─┬─f─┐
│ 10 │
└───┴───┘

Although there is a function for extracting raw value(JSONExtractRaw), the resulting type will always be the String:

WITH '{"i": 1, "f": 1.2}' AS json
SELECT
    JSONExtractRaw(json, 'i') AS i,
    JSONExtractRaw(json, 'f') AS f,
    toTypeName(i) AS i_type,
    toTypeName(f) AS f_type

┌─i─┬─f──────────────────┬─i_type─┬─f_type─┐
│ 11.2000000000000002 │ String │ String │
└───┴────────────────────┴────────┴────────┘

There is also a function for checking JSON value type(JSONType).

WITH '{"i": 1, "f": 1.2}' AS json
SELECT JSONType(json, 'i') AS i_type

┌─i_type─┐
│ Int64  │
└────────┘

With combining these two functions, users will have a powerful JSON database like MongoDB. But the usage will be really ugly and slow because it must be in two steps:

  1. User must issue an SQL using JSONType to get the type:
WITH '{"i": 1, "f": 1.2}' AS json
SELECT JSONType(json, 'i') AS i_type

┌─i_type─┐
│ Int64  │
└────────┘
  1. Then user must build a second SQL statement using JSONExtract and use the type they get from first step:
WITH '{"i": 1, "f": 1.2}' AS json
SELECT
    JSONExtract(json, 'i', 'Int64') AS value,
    toTypeName(value) AS type

┌─value─┬─type──┐
│     1 │ Int64 │
└───────┴───────┘

Using JSONType output as CAST parameter will not work:

WITH
    '{"i": 1, "f": 1.2}' AS json,
    JSONType(json, 'i') AS type
SELECT JSONExtract(json, 'i', type)

Received exception from server (version 19.17.2):
Code: 368. DB::Exception: Received from some-clickhouse-server:9000. DB::Exception: std::bad_typeid.

Use case

Automatically inferring the type of a JSON value will make ClickHouse a great alternative for MongoDB and ElasticSearch like document stores.

Right now, we have to know the JSON schema, which is not so possible. But if this feature gets implemented, we will be able to freely query tables which will make ClickHouse best alternative for ElasticSearch.

Users will be able to send any JSON structure they want and they will be able to query for any field.

Describe the solution you'd like

Implementing a function that combines JSONType and JSONExtract functions will solve this problem.

A new function named JSON will first check the type of the value then use JSONExtract to extract the value and convert to value type:

WITH '{"i": 1, "f": 1.2}' AS json
SELECT
    JSON(json, 'i') AS value,
    toTypeName(value) AS type

┌─value─┬─type──┐
│     1 │ Int64 │
└───────┴───────┘

Metadata

Metadata

Labels

featurenot plannedKnown issue, no plans to fix it currenlty

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions