Skip to content

Infer Array(Dynamic) instead of unnamed Tuple for arrays of values with different types in JSON#80859

Merged
Avogar merged 13 commits intoClickHouse:masterfrom
Avogar:infer-array-of-dynamic
Aug 18, 2025
Merged

Infer Array(Dynamic) instead of unnamed Tuple for arrays of values with different types in JSON#80859
Avogar merged 13 commits intoClickHouse:masterfrom
Avogar:infer-array-of-dynamic

Conversation

@Avogar
Copy link
Copy Markdown
Member

@Avogar Avogar commented May 26, 2025

Changelog category (leave one):

  • Backward Incompatible Change

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Infer Array(Dynamic) instead of unnamed Tuple for arrays of values with different types in JSON. To use previous behaviour, disable setting input_format_json_infer_array_of_dynamic_from_array_of_different_types.

Closes #74937

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented May 26, 2025

Workflow [PR], commit [19da2a4]

@clickhouse-gh clickhouse-gh bot added the pr-backward-incompatible Pull request with backwards incompatible changes label May 26, 2025
@pamarcos pamarcos self-assigned this Jun 4, 2025
@pamarcos pamarcos self-requested a review June 4, 2025 15:32
Copy link
Copy Markdown
Member

@pamarcos pamarcos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc is crystal clear, the breaking change after a setting to allow previous behavior, and the comments in the code are perfect. Very nice work 🚀

I left two minor nitpick comments. I don't think the performance comparison have anything to do with your changes. They're known to be unstable for AMD #78555

I'd change the changelog entry to say how to disable the new behavior more explicitly, though. Something along these lines:

Infer Array(Dynamic) instead of unnamed Tuple for arrays of values with different types in JSON. To use previous behaviour, set input_format_json_infer_array_of_dynamic_from_array_of_different_types=0

/// Only unnamed Tuple types can be inferred from arrays.
for (const auto & type : data_types)
{
if (const auto * tuple_type = typeid_cast<const DataTypeTuple *>(type.get()); tuple_type && tuple_type->haveExplicitNames())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is not related to your changes and it's nitpicking, but I think we should rename haveExplicitNames() and the underlying variable have_explicit_names to hasExplicitNames() and has_explicit_names, for consistency.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, but I don't want to change it in this PR. Feel free to create a separate PR for it

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, done in #82194
I added you as a reviewer directly, since you already have the context for this

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Jun 19, 2025

Workflow [PR], commit [4cfe3b5]

Summary:

@pamarcos
Copy link
Copy Markdown
Member

@Avogar everything is green and ready to be merged 🚀

@Avogar Avogar added this pull request to the merge queue Aug 18, 2025
Merged via the queue into ClickHouse:master with commit 8da4ba9 Aug 18, 2025
123 checks passed
@Avogar Avogar deleted the infer-array-of-dynamic branch August 18, 2025 11:46
@robot-clickhouse robot-clickhouse added the pr-synced-to-cloud The PR is synced to the cloud repo label Aug 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-backward-incompatible Pull request with backwards incompatible changes pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use Array(Dynamic) instead of unnamed tuple for arrays of values with different data types in JSON subcolumn types inference

5 participants