Improve performance of the whole JSON column reading in Wide parts from S3 #74827

Merged
Avogar merged 24 commits into ClickHouse:master from Avogar:improve-json-reading on Feb 14, 2025


Avogar (Member) commented Jan 20, 2025

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Improve performance of reading the whole JSON column in Wide parts from S3. This is done by adding prefetches for the deserialization of subcolumn prefixes, a cache of deserialized prefixes, and parallel deserialization of subcolumn prefixes. Reading the JSON column from S3 becomes 4 times faster in queries like SELECT data FROM table and about 10 times faster in queries like SELECT data FROM table LIMIT 10.
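As a rough illustration of two of the ideas named in the changelog entry (a cache of deserialized prefixes and parallel deserialization of subcolumn prefixes), here is a minimal Python sketch. It is not the ClickHouse implementation, which is written in C++. The function name `deserialize_prefixes`, the `read_prefix` callback, and the stream names are all hypothetical and exist only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def deserialize_prefixes(streams, cache, read_prefix, max_workers=4):
    """Return the deserialized prefix for every stream in `streams`.

    `cache` is a plain dict (stream name -> deserialized prefix).
    `read_prefix` is a hypothetical callback standing in for a slow
    remote read (for example from S3) followed by deserialization.
    """
    # Only streams that are not cached yet need a remote read.
    # dict.fromkeys drops duplicate names while keeping their order.
    missing = [s for s in dict.fromkeys(streams) if s not in cache]

    # Read the missing prefixes concurrently instead of one by one.
    # The cache is only updated from the calling thread, so no lock is needed.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for stream, prefix in zip(missing, pool.map(read_prefix, missing)):
            cache[stream] = prefix

    return {s: cache[s] for s in streams}
```

A second call with an overlapping set of streams only invokes `read_prefix` for names that are not in `cache` yet, which is the effect a prefix cache is meant to have when the same subcolumns are touched repeatedly.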

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

CI Settings (Only check the boxes if you know what you are doing)

All builds in Builds_1 and Builds_2 stages are always mandatory and will run independently of the checks below:

  • Only: Stateless tests
  • Only: Stateful tests
  • Only: Integration tests
  • Only: Performance tests

  • Skip: Style check
  • Skip: Fast test

  • Run all checks ignoring all possible failures (Resource-intensive. All test jobs execute in parallel).
  • Disable CI cache

robot-clickhouse-ci-1 added the pr-performance label (Pull request with some performance improvements) on Jan 20, 2025

robot-ch-test-poll4 (Contributor) commented Jan 20, 2025

This is an automated comment for commit d1a0ddc with a description of existing statuses. It is updated for the latest CI run.


| Check name | Description | Status |
|---|---|---|
| Integration tests | The integration tests report. The package type is given in parentheses, and the optional part/total tests in square brackets | ❌ failure |
| Stateless tests | Runs stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc | ❌ failure |
| Stress test | Runs stateless functional tests concurrently from several clients to detect concurrency-related errors | ❌ failure |
Successful checks
| Check name | Description | Status |
|---|---|---|
| AST fuzzer | Runs randomly generated queries to catch program errors. The build type is optionally given in parentheses. If it fails, ask a maintainer for help | ✅ success |
| Builds | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| BuzzHouse (asan) | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| BuzzHouse (debug) | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| BuzzHouse (msan) | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| BuzzHouse (tsan) | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| BuzzHouse (ubsan) | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success |
| ClickBench | Runs ClickBench with instant-attach table | ✅ success |
| Compatibility check | Checks that the clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help | ✅ success |
| Docker keeper image | The check to build and optionally push the mentioned image to docker hub | ✅ success |
| Docker server image | The check to build and optionally push the mentioned image to docker hub | ✅ success |
| Docs check | Builds and tests the documentation | ✅ success |
| Fast test | Normally this is the first check that is run for a PR. It builds ClickHouse and runs most of the stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here | ✅ success |
| Install packages | Checks that the built packages are installable in a clear environment | ✅ success |
| Performance Comparison | Measures changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests | ✅ success |
| Style check | Runs a set of checks to keep the code style clean. If some of the tests failed, see the related log from the report | ✅ success |
| Unit tests | Runs the unit tests for different release types | ✅ success |
| Upgrade check | Runs stress tests on the server version from the last release and then tries to upgrade it to the version from the PR. It checks that the new server can start up successfully without any errors, crashes or sanitizer asserts | ✅ success |

vdimir self-assigned this on Jan 21, 2025

clickhouse-gh bot (Contributor) commented Feb 10, 2025

Workflow [PR], commit [36a930a]

Avogar requested a review from vdimir on February 11, 2025 11:39
Avogar added this pull request to the merge queue on Feb 14, 2025
Avogar mentioned this pull request on Feb 14, 2025
56 tasks
Merged via the queue into ClickHouse:master with commit 27c59fe Feb 14, 2025
114 of 116 checks passed
Avogar deleted the improve-json-reading branch on February 14, 2025 15:30
robot-ch-test-poll1 added the pr-synced-to-cloud label (The PR is synced to the cloud repo) on Feb 14, 2025
baibaichen added a commit to Kyligence/gluten that referenced this pull request Feb 15, 2025
baibaichen added a commit to apache/gluten that referenced this pull request Feb 15, 2025
* [GLUTEN-1632][CH]Daily Update Clickhouse Version (20250215)

* Fix build due to ClickHouse/ClickHouse#75938

* Fix Ut due to ClickHouse/ClickHouse#74827

---------

Co-authored-by: kyligence-git <[email protected]>
Co-authored-by: Chang Chen <[email protected]>
kevinw66 pushed a commit to kevinw66/incubator-gluten that referenced this pull request Mar 3, 2025

Labels

pr-performance (Pull request with some performance improvements)
pr-synced-to-cloud (The PR is synced to the cloud repo)


5 participants