-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Open
Labels
PerformancePerformance improvementsPerformance improvementsSQLIssues or changes relating to SQL executionIssues or changes relating to SQL execution
Description
Is your feature request related to a problem?
#4032 added parallel execution for more parallel GROUP BY cases.
Specifically, this ClickBench query involves a large dataset (>90M rows), so we shard the underlying FastMap into smaller maps using radix partitioning:
SELECT WatchID, ClientIP, COUNT(*) AS c, SUM(IsRefresh), AVG(ResolutionWidth) FROM hits GROUP BY WatchID, ClientIP ORDER BY c DESC LIMIT 10;This means that LimitedSizeSortedLightRecordCursor which does the ORDER BY c DESC LIMIT 10 part operates on a ShardedMapCursor cursor which wraps the shard maps. This top K takes around 2 seconds since it runs on a single thread. We should detect the "potentially sharded result set" case, then run top K in parallel, and then do a K-way merge on top of the gathered results. This should speed up the top K part of the query execution significantly.
Describe the solution you'd like.
No response
Describe alternatives you've considered.
No response
Full Name:
Andrei Pechkurov
Affiliation:
QuestDB
Additional context
No response
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
PerformancePerformance improvementsPerformance improvementsSQLIssues or changes relating to SQL executionIssues or changes relating to SQL execution