perf(sql): speed up parallel GROUP BY and Top K queries #6754
bluestreak01 merged 56 commits into master
Conversation
Walkthrough
This pull request introduces an unordered, head-of-line-blocking-free frame processing pathway for parallel GROUP BY and top-K operations. It adds new async reduction components (UnorderedPageFrameSequence, UnorderedPageFrameReduceTask/Reducer, UnorderedPageFrameReduceJob), consolidates filter configuration into AsyncFilterContext, refactors async group-by and top-K factories to use the new sequencing model, updates First/Last aggregate functions to handle out-of-order row processing, and removes explicit PostgreSQL driver loading from benchmarks.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
@mtopolnik there are a number of downsides in reusing the same reduce job + task + sequence. The thing is that
Force-pushed from 4152fb5 to 03f784b
- Free frameFilteredMemoryRecords and ownerPageFrameFiltered record in AsyncGroupByNotKeyedAtom.close() to match AsyncGroupByAtom
- Free ownerRecordA in AsyncTopKAtom.clear() alongside ownerRecordB
- Fix copyright year in UnorderedPageFrameReduceTask (2024 -> 2026)
- Remove redundant circuit breaker init in reduceLocally()

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Wrap the dispatch loop in try/finally so that queuedCount is always set, even when reduceLocally() throws. Without this, await() in close() sees queuedCount == 0 and returns immediately, allowing reset() to free the frameAddressCache while worker threads may still be processing in-flight tasks. Also add frameSequenceId to UnorderedPageFrameReduceTask with an assert-level validation in consumeQueue() to detect stale tasks from a previous query execution (defense-in-depth, matching the pattern used by the ordered PageFrameReduceJob). Remove unnecessary volatile on queuedCount since it is only accessed by the owner thread. Co-Authored-By: Claude Opus 4.6 <[email protected]>
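The try/finally fix described above can be sketched roughly as follows. All names here (dispatchAll, tryPublish, reduceLocally, queuedCount) are illustrative stand-ins for the actual UnorderedPageFrameSequence internals, not the PR's real code:

```java
// Sketch of the dispatch-loop fix: queuedCount must be assigned even when
// a local reduce throws, or close() will under-await in-flight tasks.
class DispatchSketch {
    int queuedCount; // only touched by the owner thread, so no volatile

    void dispatchAll(int frameCount) {
        int queued = 0;
        try {
            for (int i = 0; i < frameCount; i++) {
                if (tryPublish(i)) {
                    queued++;            // frame handed to a worker
                } else {
                    reduceLocally(i);    // queue full: reduce here, may throw
                }
            }
        } finally {
            // Always record the queued count, even when reduceLocally()
            // throws; otherwise await() in close() would see 0 and return
            // while workers may still hold in-flight tasks.
            queuedCount = queued;
        }
    }

    boolean tryPublish(int frame) {
        return frame % 2 == 0; // stand-in: pretend the queue accepts even frames
    }

    void reduceLocally(int frame) {
        if (frame == 3) {
            throw new RuntimeException("reduce failed on frame " + frame);
        }
    }
}
```

With this shape, a throw from frame 3 still leaves queuedCount reflecting the two frames that were already published.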
- Replace assert-only stale task check in UnorderedPageFrameReduceJob with a runtime guard that logs and skips the task without touching the latch or error state.
- Remove unused toTop() from UnorderedPageFrameSequence.
- Call setError() in reduceLocally() before re-throwing so that the error state is consistent with the worker path.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The stealWork() method can process tasks from any UnorderedPageFrameSequence, not just the owner's, which re-initializes localRecord for a foreign symbol table. Add a comment warning callers not to rely on localRecord state after the call. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Work-stealing via consumeQueue may re-initialize the owner's circuit breaker wrapper with a foreign sequence's delegate. Restore it after each stealWork() call to prevent false cancellation of the owner's query. Co-Authored-By: Claude Opus 4.6 <[email protected]>
The three async atom classes (AsyncGroupByAtom, AsyncGroupByNotKeyedAtom, AsyncTopKAtom) shared ~170 lines of identical filter and memory-pool infrastructure: 16 fields, 12 getter methods, constructor allocation, close/clear logic, and filter initialization. Extract the shared code into a new AsyncFilterContext composition helper. Each atom now holds a single filterCtx field and delegates filter-related operations to it. Call sites in the 3 factory classes and 3 cursor classes use atom.getFilterContext().getXxx() to access the shared methods. Co-Authored-By: Claude Opus 4.6 <[email protected]>
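The composition refactor described in this commit can be illustrated with a minimal sketch. The class and accessor names mirror the PR (AsyncFilterContext, getFilterContext()), but the fields are simplified stand-ins, not the actual 16 extracted fields:

```java
// Shared filter/memory-pool state extracted into one context object.
class AsyncFilterContext {
    private final String compiledFilter; // stand-in for filter + memory pools
    private boolean closed;

    AsyncFilterContext(String compiledFilter) {
        this.compiledFilter = compiledFilter;
    }

    String getCompiledFilter() {
        return compiledFilter;
    }

    void close() {
        closed = true; // real code frees records and memory pools here
    }

    boolean isClosed() {
        return closed;
    }
}

// Each atom now holds a single filterCtx field and delegates to it,
// instead of duplicating the same infrastructure three times.
class AsyncGroupByAtomSketch {
    private final AsyncFilterContext filterCtx;

    AsyncGroupByAtomSketch(AsyncFilterContext filterCtx) {
        this.filterCtx = filterCtx;
    }

    AsyncFilterContext getFilterContext() {
        return filterCtx;
    }

    void close() {
        filterCtx.close(); // delegated close/clear logic
    }
}
```

Call sites then reach shared state via atom.getFilterContext().getXxx(), as the commit message describes.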
JDBC-based benchmark that measures query latency and throughput for GROUP BY and TOP-K queries under varying concurrency (1, 2, 4, 8). Creates a skewed-partition table with 500 hourly partitions: 10 large (500K rows) interleaved among 490 small (10K rows) to expose head-of-line blocking in the ordered PageFrameSequence. Co-Authored-By: Claude Opus 4.6 <[email protected]>
JDBC 4.0 auto-discovers drivers via ServiceLoader, so the explicit
Class.forName("org.postgresql.Driver") calls are unnecessary. Also
revert the module-info requires added in the previous commit since
there is no longer any reflection on the PG driver class.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
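For context, JDBC 4.0 driver auto-discovery works as sketched below: DriverManager loads any Driver implementation listed in a jar's META-INF/services/java.sql.Driver file via ServiceLoader, which is why the explicit Class.forName call is redundant. The helper name here is illustrative:

```java
import java.sql.Driver;
import java.util.ServiceLoader;

class DriverDiscovery {
    // Counts Driver implementations discoverable on the classpath;
    // each provider comes from a META-INF/services/java.sql.Driver entry,
    // with no Class.forName("org.postgresql.Driver") needed.
    static int countDiscoverableDrivers() {
        int n = 0;
        for (Driver ignored : ServiceLoader.load(Driver.class)) {
            n++;
        }
        return n;
    }
}
```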
@puzpuzpuz I did a new implementation from scratch and force-pushed it. I think this tracks closer to the ideas you laid out in the issue. |
Resolved review threads:
- core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByAtom.java (outdated)
- core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByNotKeyedAtom.java (outdated)
- core/src/main/java/io/questdb/griffin/engine/table/AsyncFilterUtils.java (outdated)
- ...c/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNullUuidGroupByFunction.java
When reduceLocally() throws, call cancel() on the frame sequence so that other worker threads see it as inactive and skip remaining tasks instead of wasting CPU. Co-Authored-By: Claude Opus 4.6 <[email protected]>
When parallel GROUP BY merges results from different workers, merge() checked only destRowId == LONG_NULL to detect an uninitialized destination. But computeFirst() stores a valid rowId even when the value is null, so a shard where all rows have null values gets a real rowId. When a second shard with non-null values (but a higher rowId) merges into it, the condition srcRowId < destRowId fails and destRowId != LONG_NULL, so the non-null value is incorrectly discarded. The fix adds a dest-value null check to the merge condition in all 22 affected merge() methods across 14 FirstNotNull* GroupByFunction classes (char, date, double, float, int, long, str, symbol, timestamp, uuid, varchar, geohash×4, IPv4, decimal×6). The Array variant was already correct. Co-Authored-By: Claude Opus 4.6 <[email protected]>
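The bug and fix described above can be reconstructed in a hedged sketch, using long as the value type and Long.MIN_VALUE standing in for both the LONG_NULL rowId sentinel and a null value. The real fix touches 22 merge() methods; this only shows the shape of the condition change:

```java
class FirstNotNullMergeSketch {
    static final long LONG_NULL = Long.MIN_VALUE;

    long destRowId = LONG_NULL;
    long destValue = LONG_NULL;

    // Buggy condition: a shard whose rows were all null still stored a
    // valid rowId, so destRowId != LONG_NULL blocks a later non-null
    // value with a higher rowId from merging in.
    void mergeBuggy(long srcRowId, long srcValue) {
        if (srcRowId < destRowId || destRowId == LONG_NULL) {
            destRowId = srcRowId;
            destValue = srcValue;
        }
    }

    // Fixed condition: also merge when the destination *value* is null,
    // not just when the destination rowId is unset.
    void mergeFixed(long srcRowId, long srcValue) {
        if (srcRowId < destRowId || destRowId == LONG_NULL || destValue == LONG_NULL) {
            destRowId = srcRowId;
            destValue = srcValue;
        }
    }
}
```

Merging an all-null shard first (valid rowId, null value) and then a non-null shard with a higher rowId demonstrates the difference: the buggy condition discards the non-null value, the fixed one keeps it.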
[PR Coverage check] 😍 pass: 744 / 853 (87.22%)
Closes #6655
Summary
- A new unordered `PageFrameSequence` that uses `SOUnboundedCountDownLatch` instead of ordered collection via `collectSubSeq`, eliminating head-of-line blocking for reducers that don't need ordered results
- A single `dispatchAllAndAwait()` call in each cursor
- `storeError()` copies into a dedicated `CairoException` instance to prevent thread-local exception recycling from corrupting stored error data

Benchmark results
JDBC client benchmark (`HolBlockingBenchmark`) with a skewed-partition table designed to provoke HOL blocking: 50 hourly partitions (45 × 1K rows + 5 × 1M rows, ~5M rows total). Large partitions appear every 10th hour (1000:1 size ratio). Queries use expensive per-row math (`sqrt`, `log`) to amplify the processing time difference between large and small partitions. The benchmark also reports per-iteration "spread" (max − min latency across concurrent threads), which directly measures how much the slowest thread lags behind the fastest in each batch.

Run with `mvn -pl benchmarks package -DskipTests && java -cp benchmarks/target/benchmarks.jar org.questdb.HolBlockingBenchmark`.

Raw numbers
master / This PR (raw result tables omitted)
Analysis
Per-iteration spread (max − min across threads)
Spread measures how much the slowest thread lags behind the fastest in each batch — a direct proxy for HOL blocking.
At concurrency 4 the reductions are smaller (24–51%).
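The spread metric defined above could be computed as in this small sketch (the benchmark's actual implementation may differ):

```java
import java.util.Arrays;

class SpreadMetric {
    // Spread = max - min latency across one batch of concurrent threads,
    // a direct proxy for how far the slowest thread lags the fastest.
    static double spread(double[] latenciesMs) {
        double min = Arrays.stream(latenciesMs).min().getAsDouble();
        double max = Arrays.stream(latenciesMs).max().getAsDouble();
        return max - min;
    }
}
```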
Tail latency (p99/max)
Latency predictability (p99/p50 ratio)
Throughput at concurrency 8
GroupBy throughput is slightly lower on the PR at concurrency 8.
Throughput scaling from concurrency 1 to 2
On master, GroupBy and GroupByNotKeyed throughput drops when adding a second concurrent thread:
The PR scales monotonically; master regresses at concurrency 2 before recovering at concurrency 4.
Single-thread regression
GroupBy shows a 34% single-thread regression (14.2ms vs 10.6ms). GroupByFiltered, GroupByNotKeyed, and TopK are within noise. The GroupBy regression does not appear at concurrency 8 (92.9ms vs 90.0ms, +3%), suggesting the single-thread ordered path on master benefits from some caching or prefetch effect that the unordered path does not exploit. This warrants further investigation.
Test plan
- `ParallelTopKFuzzTest` — includes new tests for fault tolerance (NPE in filter), query timeout, empty table, and concurrent execution (8 threads x 50 iterations)
- `ParallelGroupByFuzzTest` — verifies error message assertions match the new error path through `storeError()`/`throwStoredError()`
- Existing `PageFrameSequence` behavior (filter-only, window join paths remain unchanged)

🤖 Generated with Claude Code