Slightly optimize parsing of JSON type#93614

Merged
Avogar merged 10 commits into ClickHouse:master from Avogar:optimize-insert-into-json
Jan 29, 2026

Conversation

@Avogar
Member

@Avogar Avogar commented Jan 7, 2026

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Slightly optimize parsing of JSON type.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@clickhouse-gh
Contributor

clickhouse-gh bot commented Jan 7, 2026

Workflow [PR], commit [4773f84]

Summary:

| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| Stateless tests (amd_debug, parallel) | | failure | | |
| | 00625_arrays_in_nested | FAIL | cidb | IGNORED |
| Stateless tests (amd_msan, parallel, 1/2) | | failure | | |
| | 02473_multistep_prewhere | FAIL | cidb | IGNORED |
| BuzzHouse (amd_debug) | | failure | | |
| | Logical error: 'Inconsistent AST formatting: the query: (STID: 1941-1bfa) | FAIL | cidb, issue | ISSUE EXISTS |
| Performance Comparison (arm_release, master_head, 3/6) | | failure | | |
| | Check Results | failure | | IGNORED |

@clickhouse-gh clickhouse-gh bot added the pr-performance Pull request with some performance improvements label Jan 7, 2026
@Avogar
Member Author

Avogar commented Jan 14, 2026

[Image: Screenshot 2026-01-14 at 20 27 33]

@Avogar Avogar marked this pull request as ready for review January 14, 2026 19:50
Comment on lines -113 to -115
ColumnCheckpoints checkpoints(columns.size());
for (size_t column_idx = 0; column_idx < columns.size(); ++column_idx)
    checkpoints[column_idx] = columns[column_idx]->getCheckpoint();
Member Author

Creating and updating checkpoints slows down parsing in general. It's quite noticeable in the flamegraph when the type contains thousands of subcolumns, because we create and update a checkpoint for each subcolumn.

In general, all input formats should handle errors gracefully and not leave columns in an inconsistent state. In this PR I added fixes for the known places where we didn't do that.
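
To illustrate the trade-off described in this comment, here is a minimal, self-contained sketch. It is not the ClickHouse implementation: `ToyColumn`, `SizeCheckpoints`, and `tryAppendRow` are hypothetical names, and a column is modelled as a plain `std::vector<int>`. The first strategy snapshots every column before each row, which costs time proportional to the number of columns even when nothing fails. The second does no per-row bookkeeping and relies on the parser to undo its own partial writes on the error path.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy stand-in for a column (assumption: NOT ClickHouse's IColumn).
using ToyColumn = std::vector<int>;

// Strategy A: snapshot every column's size before parsing a row and
// restore all of them on failure. Building the snapshot touches every
// column, so it costs O(number of columns) per row even on success.
struct SizeCheckpoints
{
    std::vector<std::size_t> sizes;

    explicit SizeCheckpoints(const std::vector<ToyColumn> & columns)
    {
        sizes.reserve(columns.size());
        for (const auto & column : columns)
            sizes.push_back(column.size());
    }

    void rollback(std::vector<ToyColumn> & columns) const
    {
        for (std::size_t i = 0; i < columns.size(); ++i)
            columns[i].resize(sizes[i]);
    }
};

// Strategy B: no per-row bookkeeping. The parser cleans up after itself
// only when an error actually happens. A std::nullopt field simulates a
// parse error. Assumes all columns hold the same number of rows on entry.
bool tryAppendRow(std::vector<ToyColumn> & columns,
                  const std::vector<std::optional<int>> & fields)
{
    assert(columns.size() == fields.size());
    const std::size_t rows_before = columns.empty() ? 0 : columns[0].size();

    for (std::size_t i = 0; i < fields.size(); ++i)
    {
        if (!fields[i])
        {
            // Error path: drop the values already written for this row
            // so that every column keeps the same number of rows.
            for (std::size_t j = 0; j < i; ++j)
                columns[j].resize(rows_before);
            return false;
        }
        columns[i].push_back(*fields[i]);
    }
    return true;
}
```

In this sketch, a failed `tryAppendRow` leaves every column at its previous size without any snapshot having been taken beforehand. That is the property the comment asks of input formats once the per-column checkpoints are removed.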

@antonio2368 antonio2368 self-assigned this Jan 19, 2026
@Avogar Avogar added this pull request to the merge queue Jan 29, 2026
Merged via the queue into ClickHouse:master with commit 7ea6f88 Jan 29, 2026
129 of 134 checks passed
@Avogar Avogar deleted the optimize-insert-into-json branch January 29, 2026 13:37
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jan 29, 2026
@Avogar Avogar mentioned this pull request Mar 6, 2026
56 tasks
Labels

pr-performance: Pull request with some performance improvements
pr-synced-to-cloud: The PR is synced to the cloud repo

3 participants