perf(sql): parallel ORDER BY long_column LIMIT N for high-cardinality GROUP BY#6582
Merged
bluestreak01 merged 7 commits into master on Dec 31, 2025
[PR Coverage check] 😞 fail: 49 / 209 (23.44%)
bluestreak01 approved these changes on Dec 31, 2025
This patch adds parallel execution for the ORDER BY + LIMIT (top K) phase in high-cardinality parallel GROUP BY queries. When the GROUP BY result is sharded (due to high cardinality), the top K selection now processes each shard in parallel using worker threads, then merges the per-shard results.
Changes:
- GroupByLongTopKJob and GroupByLongTopKTask for parallel top K processing
- AsyncGroupByRecordCursor#parallelLongTopK() orchestrates parallel execution when:
  - the map size reaches cairo.sql.parallel.groupby.topk.threshold
- cairo.sql.parallel.groupby.topk.threshold - minimum map size to enable parallel top K (default 5M)
- cairo.sql.parallel.groupby.topk.queue.capacity - task queue capacity (default - same as page frame reduce queue capacity)
Benchmarks
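As an illustration of the per-shard selection and merge described in the summary above, here is a minimal Python sketch. It is not the QuestDB implementation: the function names, the `(key, value)` row layout, and the `workers` parameter are assumptions made for the example, and `threshold` only mimics the role of `cairo.sql.parallel.groupby.topk.threshold`. Python threads will not reproduce the speed-up; the sketch only shows the structure of the algorithm.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def top_k(rows, k, descending=False):
    """Return the k best (key, value) rows, ordered by the long value."""
    pick = heapq.nlargest if descending else heapq.nsmallest
    return pick(k, rows, key=lambda row: row[1])


def parallel_top_k(shards, k, descending=False, threshold=5_000_000, workers=4):
    """Top K over sharded GROUP BY output (hypothetical helper).

    Below `threshold` total rows, fall back to a single sequential pass.
    Otherwise select the top K of every shard in parallel, then merge
    the per-shard candidates with one more top K pass.
    """
    total = sum(len(shard) for shard in shards)
    if total < threshold:
        return top_k(chain.from_iterable(shards), k, descending)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda shard: top_k(shard, k, descending), shards))
    # Each shard contributes at most k rows, so the merge input is small.
    return top_k(chain.from_iterable(partials), k, descending)


shards = [[("a", 5), ("b", 1)], [("c", 3), ("d", 9)], [("e", 2)]]
print(parallel_top_k(shards, 2, threshold=0))  # [('b', 1), ('e', 2)]
```

The merge step is cheap because it sees at most K rows per shard, so nearly all of the work sits in the per-shard scans, which is the part that runs in parallel.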
ClickBench run on a Ryzen 7900X, 64 GB RAM, Ubuntu 24.04.
The queries that benefit from parallel top K and their respective sharded map sizes:
ORDER BY + LIMIT phase (not full query execution) times:
Analysis of Q32 speed-up
Memory bandwidth analysis:
For Q32's map entries (GROUP BY WatchID long, ClientIP ipv4):
Total data: 100M × 48 bytes ≈ 4.8 GB
Ryzen 7900X with DDR5 dual-channel has theoretical bandwidth of ~90 GB/s, practical peak ~60-70 GB/s. The sequential scan is already at ~60-70% of peak memory bandwidth. The parallel version pushes it closer to ~75-80%. We're hitting the memory bandwidth ceiling.
This explains the modest 1.3x improvement - it's not a CPU-bound workload where 23 workers would help proportionally. The single-threaded scan with hardware prefetching already saturates most of the available DRAM bandwidth. Additional threads only marginally improve memory-level parallelism. The queries with better speedups (Q35 at 11.2x) have smaller working sets that benefit more from cache effects and CPU parallelism rather than being memory-bound.
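The arithmetic behind this argument can be checked with a short sketch. It assumes the top K phase is purely memory-bandwidth-bound and that the sequential and parallel versions read the same bytes, so the speed-up is bounded by the ratio of the bandwidth fractions they reach. The function names are hypothetical; the numbers are the ones quoted above.

```python
def total_bytes(entries, bytes_per_entry):
    """Size of the data scanned by the top K phase."""
    return entries * bytes_per_entry


def bandwidth_bound_speedup(seq_fraction, par_fraction):
    """Speed-up when run time is inversely proportional to achieved bandwidth."""
    return par_fraction / seq_fraction


data = total_bytes(100_000_000, 48)            # 4_800_000_000 bytes, about 4.8 GB
best = bandwidth_bound_speedup(0.60, 0.80)     # sequential at 60%, parallel at 80%
worst = bandwidth_bound_speedup(0.70, 0.75)    # sequential at 70%, parallel at 75%
print(data, round(best, 2), round(worst, 2))   # 4800000000 1.33 1.07
```

Under these assumptions the quoted utilization ranges give a speed-up between roughly 1.07x and 1.33x, which is consistent with the 1.3x observed for Q32.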
Queries