Parallelize object storage output #99548
Conversation
`StorageObjectStorage` (S3, Azure, GCS) with a single file created only one pipeline source without calling `pipe.resize()`. This meant the entire query ran through a single pipeline thread — downstream processors like `AggregatingTransform` could not run in parallel. `StorageFile` already does this resize (line 1913 in StorageFile.cpp), enabling parallel aggregation even with a single input file.

This commit adds the same `pipe.resize(max_threads)` to `ReadFromObjectStorageStep::initializePipeline`. The effect on the ClickBench datalake benchmark is dramatic:

- Q28 (`REGEXP_REPLACE` + `GROUP BY`): 79s → 4.8s (was 79× slower than local Parquet, now 1.1×)
- Q16 (heavy `GROUP BY`): 21× → 1.2×
- Q13: 12× → 1.3×

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Checks that EXPLAIN PIPELINE shows `Resize 1 → N` after `ReadFromObjectStorage`, proving the single source output is distributed across multiple pipeline threads. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Workflow [PR], commit [aee11a3]: ✅ AI Review Summary
> # Without this, queries on a single S3/data-lake Parquet file run
> # entirely single-threaded (e.g. Q28 in ClickBench: 79× slower).
I do not think this is true: we have `ParallelReadBuffer` (which should work with all formats except, AFAIK, the new Parquet reader), with parallelization via `readBigAt`. So if execution is single-threaded, then either `readBigAt` is not supported or the file is simply not big enough for a parallel read to be chosen. See
ClickHouse/src/IO/ParallelReadBuffer.cpp
Lines 296 to 306 in ca135b4
So non-datalake read is much faster than same data as datalake format?
As we have absolutely the same read-related code for both, the performance difference must be data-lake specific, while this PR affects both the non-datalake and datalake implementations. If plain Parquet reading is already fast enough, then I think the issue must be fixed differently from what this PR does. Such a big performance difference looks extremely strange (assuming the datalake run uses the same single Parquet data file, not one split into multiple small files). This needs to be investigated.
Yes.
Note: The data lake mode in ClickBench actually means reading Parquet files from S3 (not related to any data lake metadata formats).
And the investigation pointed to the difference between StorageFile and StorageObjectStorage.
> Note: The data lake mode in ClickBench actually means reading Parquet files from S3 (not related to any data lake metadata formats).
Did not know this, thought it was iceberg s3 parquet vs s3 parquet...
> And the investigation pointed to the difference between StorageFile and StorageObjectStorage.
Ok, looks like it makes sense indeed.
But also, what is the version on which that benchmark was run in data lake mode? We had two fixes for readBigAt in the latest versions.
It's always master.
I also double checked the PR binary vs. the master binary manually, just in case... and it is phenomenally faster.
Preserve the original requested stream count in `max_num_streams`, matching the pattern used by `StorageFile` and `StorageURL`. This ensures that `pipe.resize` respects the stream cap set by `max_streams_for_files_processing_in_cluster_functions` for distributed processing, instead of bypassing it with raw `max_threads`. #99548 (comment) Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…ouse/ClickHouse into parallelize-object-storage-output
LLVM Coverage Report
PR changed-lines coverage: 94.12% (16/17, 0 noise lines excluded)
…t-storage-output Parallelize object storage output
Antalya 26.1 Backport of ClickHouse#99548 - Parallelize object storage output
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Improve the performance of data lakes. In previous versions, reading from object storage didn't resize the pipeline to the number of processing threads.