
perf(sql): speed up parallel GROUP BY and Top K queries#6754

Merged
bluestreak01 merged 56 commits into master from mt_pageframeseq
Feb 17, 2026
Conversation

@mtopolnik
Contributor

@mtopolnik mtopolnik commented Feb 6, 2026

Closes #6655

Summary

  • Add an unordered mode to PageFrameSequence that uses SOUnboundedCountDownLatch instead of ordered collection via collectSubSeq, eliminating head-of-line blocking for reducers that don't need ordered results
  • Convert GROUP BY (keyed), GROUP BY (not-keyed), and Top K factories to use the new unordered mode, replacing ~50 lines of boilerplate ordered collect loops with a single dispatchAllAndAwait() call in each cursor
  • Copy error fields in storeError() into a dedicated CairoException instance to prevent thread-local exception recycling from corrupting stored error data
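
The core idea of the unordered mode can be sketched with a plain java.util.concurrent.CountDownLatch standing in for SOUnboundedCountDownLatch. This is a hypothetical, simplified model, not the actual PageFrameSequence API: workers reduce frames in whatever order they pick them up, and the owner thread waits only on the count of completed frames, never on frame order.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

public class UnorderedDispatchSketch {
    // Dispatch all frames to workers and wait for completion, with no
    // ordered collection: completion order of frames does not matter.
    public static long dispatchAllAndAwait(int frameCount) throws InterruptedException {
        CountDownLatch doneLatch = new CountDownLatch(frameCount); // stand-in for SOUnboundedCountDownLatch
        AtomicLong sum = new AtomicLong();
        ExecutorService workers = Executors.newFixedThreadPool(4);
        for (int i = 0; i < frameCount; i++) {
            final int frameIndex = i;
            workers.submit(() -> {
                sum.addAndGet(frameIndex); // per-frame reduction (commutative merge)
                doneLatch.countDown();     // signals completion regardless of frame order
            });
        }
        doneLatch.await();                 // no head-of-line blocking on frame order
        workers.shutdown();
        return sum.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(dispatchAllAndAwait(100)); // 0 + 1 + ... + 99 = 4950
    }
}
```

Because the merge is commutative, a slow frame only delays the final await, not the consumption of every frame behind it.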

Benchmark results

JDBC client benchmark (HolBlockingBenchmark) with a skewed-partition table designed to provoke HOL blocking: 50 hourly partitions (45 × 1K rows + 5 × 1M rows, ~5M rows total). Large partitions appear every 10th hour (1000:1 size ratio). Queries use expensive per-row math (sqrt, log) to amplify the processing time difference between large and small partitions. The benchmark also reports per-iteration "spread" (max − min latency across concurrent threads), which directly measures how much the slowest thread lags behind the fastest in each batch.

Run with mvn -pl benchmarks package -DskipTests && java -cp benchmarks/target/benchmarks.jar org.questdb.HolBlockingBenchmark.

Raw numbers

master
Query [GroupBy]: SELECT category, count(), sum(sqrt(value)) FROM events GROUP BY category
  Concurrency 1:   avg=10.6ms  p50=9.5ms  p90=14.4ms  p99=20.5ms  max=26.5ms  qps=80
  Concurrency 2:   avg=41.4ms  p50=45.6ms  p90=59.2ms  p99=78.9ms  max=80.7ms  qps=43  spread(avg=10.1ms p50=7.7ms p99=37.6ms)
  Concurrency 4:   avg=43.2ms  p50=40.2ms  p90=65.8ms  p99=91.1ms  max=113.7ms  qps=82  spread(avg=35.2ms p50=34.3ms p99=78.6ms)
  Concurrency 8:   avg=90.0ms  p50=83.7ms  p90=150.1ms  p99=214.5ms  max=256.0ms  qps=79  spread(avg=121.1ms p50=115.7ms p99=211.4ms)

Query [GroupByNotKeyed]: SELECT count(), sum(log(value + 1)) FROM events
  Concurrency 1:   avg=21.1ms  p50=20.6ms  p90=24.1ms  p99=28.1ms  max=28.5ms  qps=42
  Concurrency 2:   avg=54.4ms  p50=54.5ms  p90=70.9ms  p99=86.3ms  max=89.0ms  qps=33  spread(avg=15.2ms p50=13.3ms p99=43.5ms)
  Concurrency 4:   avg=84.9ms  p50=80.1ms  p90=123.3ms  p99=169.0ms  max=186.6ms  qps=42  spread(avg=61.0ms p50=55.4ms p99=125.9ms)
  Concurrency 8:   avg=162.3ms  p50=150.2ms  p90=262.8ms  p99=379.2ms  max=613.4ms  qps=44  spread(avg=205.6ms p50=188.0ms p99=395.3ms)

Query [GroupByFiltered]: SELECT category, count(), sum(sqrt(value)) FROM events WHERE value > 500000 GROUP BY category
  Concurrency 1:   avg=9.8ms  p50=9.6ms  p90=11.5ms  p99=15.3ms  max=15.8ms  qps=92
  Concurrency 2:   avg=19.7ms  p50=19.1ms  p90=25.8ms  p99=36.6ms  max=39.4ms  qps=90  spread(avg=4.5ms p50=3.3ms p99=20.0ms)
  Concurrency 4:   avg=37.1ms  p50=33.7ms  p90=58.7ms  p99=81.2ms  max=95.9ms  qps=95  spread(avg=30.2ms p50=27.7ms p99=68.4ms)
  Concurrency 8:   avg=73.9ms  p50=67.0ms  p90=124.5ms  p99=183.8ms  max=219.0ms  qps=96  spread(avg=99.3ms p50=95.4ms p99=194.8ms)

Query [TopK]: SELECT ts, category, value FROM events ORDER BY value DESC LIMIT 10
  Concurrency 1:   avg=3.7ms  p50=3.5ms  p90=4.4ms  p99=6.8ms  max=7.0ms  qps=244
  Concurrency 2:   avg=7.0ms  p50=6.6ms  p90=9.6ms  p99=14.5ms  max=21.6ms  qps=254  spread(avg=2.0ms p50=1.2ms p99=9.3ms)
  Concurrency 4:   avg=13.1ms  p50=12.2ms  p90=20.8ms  p99=31.2ms  max=41.9ms  qps=270  spread(avg=11.9ms p50=10.4ms p99=32.0ms)
  Concurrency 8:   avg=23.2ms  p50=19.4ms  p90=43.7ms  p99=69.8ms  max=95.2ms  qps=291  spread(avg=38.6ms p50=36.6ms p99=72.1ms)

This PR

Query [GroupBy]: SELECT category, count(), sum(sqrt(value)) FROM events GROUP BY category
  Concurrency 1:   avg=14.2ms  p50=12.8ms  p90=20.3ms  p99=21.5ms  max=22.1ms  qps=63
  Concurrency 2:   avg=25.3ms  p50=24.8ms  p90=34.3ms  p99=38.6ms  max=43.8ms  qps=71  spread(avg=7.4ms p50=6.3ms p99=21.4ms)
  Concurrency 4:   avg=48.3ms  p50=47.4ms  p90=62.7ms  p99=83.4ms  max=115.9ms  qps=74  spread(avg=25.0ms p50=23.2ms p99=70.3ms)
  Concurrency 8:   avg=92.9ms  p50=91.5ms  p90=112.2ms  p99=149.3ms  max=197.4ms  qps=77  spread(avg=39.6ms p50=34.4ms p99=114.3ms)

Query [GroupByNotKeyed]: SELECT count(), sum(log(value + 1)) FROM events
  Concurrency 1:   avg=20.7ms  p50=20.3ms  p90=23.3ms  p99=27.8ms  max=29.2ms  qps=44
  Concurrency 2:   avg=38.1ms  p50=37.7ms  p90=48.9ms  p99=58.1ms  max=58.3ms  qps=47  spread(avg=9.5ms p50=9.0ms p99=28.3ms)
  Concurrency 4:   avg=76.2ms  p50=75.0ms  p90=95.2ms  p99=116.2ms  max=146.8ms  qps=47  spread(avg=29.6ms p50=25.7ms p99=85.0ms)
  Concurrency 8:   avg=151.3ms  p50=151.0ms  p90=172.7ms  p99=212.2ms  max=258.7ms  qps=48  spread(avg=51.2ms p50=42.5ms p99=132.2ms)

Query [GroupByFiltered]: SELECT category, count(), sum(sqrt(value)) FROM events WHERE value > 500000 GROUP BY category
  Concurrency 1:   avg=10.1ms  p50=9.4ms  p90=12.3ms  p99=17.9ms  max=21.0ms  qps=89
  Concurrency 2:   avg=18.5ms  p50=18.2ms  p90=25.1ms  p99=31.3ms  max=35.6ms  qps=94  spread(avg=5.6ms p50=4.2ms p99=17.1ms)
  Concurrency 4:   avg=38.3ms  p50=37.7ms  p90=51.7ms  p99=60.2ms  max=79.2ms  qps=93  spread(avg=19.2ms p50=17.9ms p99=42.1ms)
  Concurrency 8:   avg=70.3ms  p50=69.5ms  p90=86.5ms  p99=111.2ms  max=135.1ms  qps=102  spread(avg=37.2ms p50=32.5ms p99=77.8ms)

Query [TopK]: SELECT ts, category, value FROM events ORDER BY value DESC LIMIT 10
  Concurrency 1:   avg=3.5ms  p50=3.3ms  p90=3.9ms  p99=6.9ms  max=12.1ms  qps=257
  Concurrency 2:   avg=6.7ms  p50=6.4ms  p90=8.9ms  p99=13.3ms  max=15.5ms  qps=269  spread(avg=1.7ms p50=0.8ms p99=8.5ms)
  Concurrency 4:   avg=11.7ms  p50=11.3ms  p90=17.9ms  p99=26.4ms  max=31.2ms  qps=305  spread(avg=9.0ms p50=8.8ms p99=19.2ms)
  Concurrency 8:   avg=22.7ms  p50=22.2ms  p90=31.2ms  p99=40.8ms  max=54.6ms  qps=310  spread(avg=18.0ms p50=17.0ms p99=38.1ms)

Analysis

Per-iteration spread (max − min across threads)

Spread measures how much the slowest thread lags behind the fastest in each batch — a direct proxy for HOL blocking.

Query @ concurrency 8   spread avg (PR)   spread avg (master)   Change
GroupByNotKeyed         51.2ms            205.6ms               -75%
GroupBy                 39.6ms            121.1ms               -67%
GroupByFiltered         37.2ms            99.3ms                -63%
TopK                    18.0ms            38.6ms                -53%

At concurrency 4 the reductions are smaller (24–51%).
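
For clarity, the spread metric is just max minus min latency across the concurrent threads of one iteration. A minimal sketch (hypothetical helper, not the benchmark's actual code):

```java
import java.util.Arrays;

public class SpreadSketch {
    // spread of one benchmark iteration: slowest thread minus fastest thread
    static double spread(double[] perThreadLatenciesMs) {
        double min = Arrays.stream(perThreadLatenciesMs).min().orElseThrow();
        double max = Arrays.stream(perThreadLatenciesMs).max().orElseThrow();
        return max - min;
    }

    public static void main(String[] args) {
        // e.g. one batch of 8 concurrent threads (illustrative numbers)
        double[] batch = {83.7, 91.2, 150.1, 88.4, 95.0, 102.3, 84.1, 120.6};
        System.out.println(spread(batch)); // 150.1 - 83.7 = 66.4
    }
}
```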

Tail latency (p99/max)

Query @ concurrency 8   p99 (PR)   p99 (master)   Change   max (PR)   max (master)   Change
GroupByNotKeyed         212.2ms    379.2ms        -44%     258.7ms    613.4ms        -58%
TopK                    40.8ms     69.8ms         -42%     54.6ms     95.2ms         -43%
GroupByFiltered         111.2ms    183.8ms        -40%     135.1ms    219.0ms        -38%
GroupBy                 149.3ms    214.5ms        -30%     197.4ms    256.0ms        -23%

Latency predictability (p99/p50 ratio)

Query @ concurrency 8   p99/p50 (PR)   p99/p50 (master)
GroupByNotKeyed         1.4x           2.5x
GroupBy                 1.6x           2.6x
GroupByFiltered         1.6x           2.7x
TopK                    1.8x           3.6x

Throughput at concurrency 8

Query             qps (PR)   qps (master)   Change
TopK              310        291            +7%
GroupByNotKeyed   48         44             +9%
GroupByFiltered   102        96             +6%
GroupBy           77         79             -3%

GroupBy throughput is slightly lower on the PR at concurrency 8.

Throughput scaling from concurrency 1 to 2

On master, GroupBy and GroupByNotKeyed throughput drops when adding a second concurrent thread:

Query             qps c=1 → c=2 (master)   qps c=1 → c=2 (PR)
GroupBy           80 → 43 (-46%)           63 → 71 (+13%)
GroupByNotKeyed   42 → 33 (-21%)           44 → 47 (+7%)

The PR scales monotonically; master regresses at concurrency 2 before recovering at concurrency 4.

Single-thread regression

Query             avg c=1 (PR)   avg c=1 (master)   Change
GroupBy           14.2ms         10.6ms             +34%
GroupByFiltered   10.1ms         9.8ms              +3%
GroupByNotKeyed   20.7ms         21.1ms             -2%
TopK              3.5ms          3.7ms              -5%

GroupBy shows a 34% single-thread regression (14.2ms vs 10.6ms). GroupByFiltered, GroupByNotKeyed, and TopK are within noise. The GroupBy regression does not appear at concurrency 8 (92.9ms vs 90.0ms, +3%), suggesting the single-thread ordered path on master benefits from some caching or prefetch effect that the unordered path does not exploit. This warrants further investigation.

Test plan

  • Run ParallelTopKFuzzTest — includes new tests for fault tolerance (NPE in filter), query timeout, empty table, and concurrent execution (8 threads x 50 iterations)
  • Run ParallelGroupByFuzzTest — verifies error message assertions match the new error path through storeError()/throwStoredError()
  • Run the full test suite to verify no regressions in ordered PageFrameSequence behavior (filter-only, window join paths remain unchanged)

🤖 Generated with Claude Code

@coderabbitai

coderabbitai bot commented Feb 6, 2026

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.



Walkthrough

This pull request introduces an unordered, head-of-line-blocking-free frame processing pathway for parallel GROUP BY and top-K operations. It adds new async reduction components (UnorderedPageFrameSequence, UnorderedPageFrameReduceTask/Reducer, UnorderedPageFrameReduceJob), consolidates filter configuration into AsyncFilterContext, refactors async group-by and top-K factories to use the new sequencing model, updates First/Last aggregate functions to handle out-of-order row processing, and removes explicit PostgreSQL driver loading from benchmarks.

Changes

Each entry below lists a cohort, the affected file(s), and a summary of the change.
Documentation and Configuration
CLAUDE.md, core/src/main/resources/io/questdb/site/conf/server.conf, pkg/ami/marketplace/assets/server.conf, core/src/test/resources/server.conf
Added extensive documentation sections to CLAUDE.md; introduced new configuration property cairo.unordered.page.frame.reduce.queue.capacity with default/test values of 4096/2048.
Unordered Frame Reduction Framework
core/src/main/java/io/questdb/cairo/sql/async/UnorderedPageFrameReduceTask.java, core/src/main/java/io/questdb/cairo/sql/async/UnorderedPageFrameReducer.java, core/src/main/java/io/questdb/cairo/sql/async/UnorderedPageFrameReduceJob.java, core/src/main/java/io/questdb/cairo/sql/async/UnorderedPageFrameSequence.java
New classes and interfaces implementing unordered frame reduction: task payload, functional reducer interface, job consumer, and core sequencing logic with work-stealing and circuit-breaker integration.
Filter Configuration Consolidation
core/src/main/java/io/questdb/griffin/engine/table/AsyncFilterContext.java, core/src/main/java/io/questdb/griffin/engine/table/AsyncFilterUtils.java
New AsyncFilterContext class consolidates filter-related state and accessors; AsyncFilterUtils adds compiled-filter overload accepting explicit memory and address caches.
MessageBus Integration
core/src/main/java/io/questdb/MessageBus.java, core/src/main/java/io/questdb/MessageBusImpl.java, core/src/main/java/io/questdb/mp/WorkerPoolUtils.java
Added public accessors for unordered reduce queue/sequences in MessageBus; implemented initialization and wiring in MessageBusImpl; integrated UnorderedPageFrameReduceJob per-worker instantiation in WorkerPoolUtils.
Configuration API Expansion
core/src/main/java/io/questdb/PropertyKey.java, core/src/main/java/io/questdb/PropServerConfiguration.java, core/src/main/java/io/questdb/cairo/CairoConfiguration.java, core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java, core/src/main/java/io/questdb/cairo/CairoConfigurationWrapper.java
Added new property key and corresponding getters across configuration hierarchy to expose unordered reduce queue capacity (default 4096, power-of-two aligned).
Async Group-By Refactoring
core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByAtom.java, core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByRecordCursor.java, core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByRecordCursorFactory.java, core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByNotKeyedAtom.java, core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByNotKeyedRecordCursor.java, core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByNotKeyedRecordCursorFactory.java
Replaced PageFrameSequence with UnorderedPageFrameSequence; migrated filter configuration to AsyncFilterContext; removed reduce-task-factory wiring; refactored frame dispatch and memory management to frameIndex-based model.
Async Top-K Refactoring
core/src/main/java/io/questdb/griffin/engine/table/AsyncTopKAtom.java, core/src/main/java/io/questdb/griffin/engine/table/AsyncTopKRecordCursor.java, core/src/main/java/io/questdb/griffin/engine/table/AsyncTopKRecordCursorFactory.java
Replaced PageFrameSequence with UnorderedPageFrameSequence; consolidated filter state into AsyncFilterContext; updated frame dispatch, memory pool navigation, and late-materialization logic to unordered model.
First Aggregate Functions
core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstArrayGroupByFunction.java, core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstBooleanGroupByFunction.java, core/src/main/java/io/questdb/griffin/engine/functions/groupby/First...GroupByFunction.java (27 functions total)
Updated computeNext logic to conditionally invoke computeFirst when incoming rowId is smaller than stored rowId, enabling correct first-value selection under out-of-order processing.
FirstNotNull Aggregate Functions
core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNullArrayGroupByFunction.java, core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNullCharGroupByFunction.java, core/src/main/java/io/questdb/griffin/engine/functions/groupby/FirstNotNull...GroupByFunction.java (12 functions total)
Modified computeNext to retrieve argument values upfront and update stored state when unset or when current rowId is smaller than stored rowId, ensuring non-null-first with earliest-rowId precedence.
Last Aggregate Functions
core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastArrayGroupByFunction.java, core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastBooleanGroupByFunction.java, core/src/main/java/io/questdb/griffin/engine/functions/groupby/Last...GroupByFunction.java (27 functions total)
Added conditional guards in computeNext to only invoke computeFirst when incoming rowId is greater than stored rowId, implementing strict last-value semantics under out-of-order processing.
LastNotNull Aggregate Functions
core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullArrayGroupByFunction.java, core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNullCharGroupByFunction.java, core/src/main/java/io/questdb/griffin/engine/functions/groupby/LastNotNull...GroupByFunction.java (12 functions total)
Tightened update conditions to only call computeFirst when stored value is unset OR current rowId is greater than stored rowId, preserving last-not-null semantics during unordered processing.
Benchmark Utilities
benchmarks/src/main/java/org/questdb/HolBlockingBenchmark.java
New benchmark tool measuring query latency and throughput for GROUP BY/top-K queries under varying concurrency to expose head-of-line blocking.
Benchmark Driver Updates
benchmarks/src/main/java/org/questdb/IlpAndPgWireInsertSelectBenchmark.java, benchmarks/src/main/java/org/questdb/IlpArrayBenchmark.java, benchmarks/src/main/java/org/questdb/IlpProtocolV2Benchmark.java, benchmarks/src/main/java/org/questdb/IlpTimestampColumnBenchmark.java, benchmarks/src/main/java/org/questdb/PGSymbolInBenchmark.java
Removed explicit PostgreSQL driver registration via Class.forName(), relying on automatic JDBC driver loading instead.
Code Generation
core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java
Removed reduceTaskFactory argument from three code generation call sites (generateOrderBy, generateSelectGroupBy, and parallel-filter scaffolding).
Async Filter Factory
core/src/main/java/io/questdb/griffin/engine/table/AsyncJitFilteredRecordCursorFactory.java
Changed prepareBindVarMemory invocation from static import to explicit AsyncFilterUtils qualification.
Test Updates
core/src/test/java/io/questdb/test/PropServerConfigurationTest.java, core/src/test/java/io/questdb/test/ServerMainTest.java, core/src/test/java/io/questdb/test/cairo/fuzz/ParallelGroupByFuzzTest.java, core/src/test/java/io/questdb/test/cairo/mv/MatViewTest.java, core/src/test/java/io/questdb/test/griffin/engine/table/AsyncFilteredRecordCursorFactoryTest.java
Updated test expectations and error message assertions to reflect new reduce-queue error classification ("unexpected reduce error" vs. "unexpected filter error"); added configuration property assertion for new queue capacity.
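
The First/Last rowId guards described in the aggregate-function cohorts above can be sketched as follows. This is a simplified standalone model, not the actual GroupByFunction API; the field and method names are illustrative. Under unordered frame processing, rows no longer arrive in rowId order, so each update must compare the incoming rowId against the stored one.

```java
public class RowIdGuardSketch {
    long firstRowId = Long.MIN_VALUE; // sentinel meaning "unset"
    long firstValue;
    long lastRowId = Long.MIN_VALUE;
    long lastValue;

    void computeNext(long rowId, long value) {
        // first(): keep the value with the smallest rowId seen so far
        if (firstRowId == Long.MIN_VALUE || rowId < firstRowId) {
            firstRowId = rowId;
            firstValue = value;
        }
        // last(): keep the value with the largest rowId seen so far
        if (lastRowId == Long.MIN_VALUE || rowId > lastRowId) {
            lastRowId = rowId;
            lastValue = value;
        }
    }

    public static void main(String[] args) {
        RowIdGuardSketch agg = new RowIdGuardSketch();
        // frames arrive out of order: rowIds 5, 1, 9
        agg.computeNext(5, 50);
        agg.computeNext(1, 10);
        agg.computeNext(9, 90);
        System.out.println(agg.firstValue + " " + agg.lastValue); // 10 90
    }
}
```

With ordered processing the guards were unnecessary: the first row seen was always the earliest, and each subsequent row was always later.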

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • PR #6130: Modifies AsyncGroupByNotKeyedRecordCursorFactory and related async group-by cursor infrastructure, directly overlapping with frame-sequencing refactoring in this PR.
  • PR #6482: Adds and relocates prepareBindVarMemory in AsyncFilterUtils, directly related to async filter initialization changes in this PR.
  • PR #6432: Touches async GROUP BY frame-reduce pathway, queue/config keys, and frame-sequencing/reducer refactoring, closely related to this PR's unordered frame execution model.

Suggested reviewers

  • mtopolnik
  • bluestreak01
🚥 Pre-merge checks: 4 passed, 1 failed

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 2.17%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Title check: the title clearly summarizes the main change, improving performance of parallel GROUP BY and Top K queries, which is the core objective of this PR.
  • Linked Issues check: all major objectives from issue #6655 are met: unordered PageFrameSequence implemented with SOUnboundedCountDownLatch, applied to GROUP BY and Top K factories, work-stealing preserved, and tests added for correctness and performance.
  • Out of Scope Changes check: all changes are within scope of #6655. Documentation updates (CLAUDE.md) and JDBC driver loading removal are incidental improvements. Configuration files, tests, and benchmark tools support the main objective.
  • Description check: the PR description provides comprehensive context on the changes, including technical details, benchmark results, and test plans.



@mtopolnik mtopolnik added SQL Issues or changes relating to SQL execution Performance Performance improvements labels Feb 6, 2026
@mtopolnik mtopolnik changed the title perf(sql): eliminate head-of-line blocking in PageFrameSequence perf(sql): speed up parallel GROUP BY and Top K queries Feb 6, 2026
@puzpuzpuz
Copy link
Copy Markdown
Contributor

@mtopolnik there are a number of downsides to reusing the same reduce job + task + sequence. PageFrameReduceTask is rather "heavy": it includes the filtered row id list and the page frame memory pool. More importantly, those tasks are meant to be released only once the filtered rows have been iterated (filter factory case) or the rows have been aggregated (group by factory case). For "unordered" tasks, on the other hand, we can copy the task fields and release the task immediately. We also won't need shard queues, and we can make the queue capacity larger than the page frame task queue without worrying about the memory footprint.
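
The copy-then-release idea in this comment can be sketched as follows (a hypothetical, heavily simplified model; the real task carries far more state than two fields):

```java
import java.util.ArrayDeque;

public class CopyAndReleaseSketch {
    static class Task {
        int frameIndex;
        long frameRowCount;
    }

    static final ArrayDeque<Task> pool = new ArrayDeque<>();

    // copy the few fields the consumer needs, then release the task back
    // to the pool right away instead of holding it while rows are consumed
    static long[] consume(Task t) {
        long[] copy = {t.frameIndex, t.frameRowCount};
        pool.push(t); // task is free for reuse immediately
        return copy;
    }

    public static void main(String[] args) {
        Task t = new Task();
        t.frameIndex = 7;
        t.frameRowCount = 1_000_000L;

        long[] copy = consume(t);
        pool.pop().frameIndex = 42; // pooled task overwritten by another producer

        // the copied values are unaffected by the reuse
        System.out.println(copy[0] + " " + copy[1]); // 7 1000000
    }
}
```

Releasing early shrinks the window during which each queue slot pins heavy state, which is what makes a larger queue capacity affordable.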

mtopolnik and others added 21 commits February 8, 2026 10:55
- Free frameFilteredMemoryRecords and ownerPageFrameFiltered record
  in AsyncGroupByNotKeyedAtom.close() to match AsyncGroupByAtom
- Free ownerRecordA in AsyncTopKAtom.clear() alongside ownerRecordB
- Fix copyright year in UnorderedPageFrameReduceTask (2024 -> 2026)
- Remove redundant circuit breaker init in reduceLocally()

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Wrap the dispatch loop in try/finally so that queuedCount is always
set, even when reduceLocally() throws. Without this, await() in
close() sees queuedCount == 0 and returns immediately, allowing
reset() to free the frameAddressCache while worker threads may still
be processing in-flight tasks.

Also add frameSequenceId to UnorderedPageFrameReduceTask with an
assert-level validation in consumeQueue() to detect stale tasks
from a previous query execution (defense-in-depth, matching the
pattern used by the ordered PageFrameReduceJob).

Remove unnecessary volatile on queuedCount since it is only accessed
by the owner thread.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
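
The try/finally bookkeeping described in this commit can be sketched as follows (hypothetical simplified model; names are illustrative). The point is that queuedCount must record how many tasks were actually enqueued even when local reduction throws, otherwise await() returns early and resources are freed under in-flight workers.

```java
public class DispatchFinallySketch {
    int queuedCount = -1; // -1 means "dispatch bookkeeping never ran"

    void dispatch(int frameCount, boolean failHalfway) {
        int queued = 0;
        try {
            for (int i = 0; i < frameCount; i++) {
                if (failHalfway && i == frameCount / 2) {
                    throw new RuntimeException("reduceLocally() failed"); // simulated failure
                }
                queued++; // task handed to the queue
            }
        } finally {
            queuedCount = queued; // always recorded, so await() waits for the right count
        }
    }

    public static void main(String[] args) {
        DispatchFinallySketch seq = new DispatchFinallySketch();
        try {
            seq.dispatch(10, true);
        } catch (RuntimeException ignore) {
        }
        System.out.println(seq.queuedCount); // 5 tasks were enqueued before the failure
    }
}
```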
- Replace assert-only stale task check in UnorderedPageFrameReduceJob
  with a runtime guard that logs and skips the task without touching
  the latch or error state.
- Remove unused toTop() from UnorderedPageFrameSequence.
- Call setError() in reduceLocally() before re-throwing so that the
  error state is consistent with the worker path.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The stealWork() method can process tasks from any
UnorderedPageFrameSequence, not just the owner's, which
re-initializes localRecord for a foreign symbol table.
Add a comment warning callers not to rely on localRecord
state after the call.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Work-stealing via consumeQueue may re-initialize the owner's
circuit breaker wrapper with a foreign sequence's delegate.
Restore it after each stealWork() call to prevent false
cancellation of the owner's query.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The three async atom classes (AsyncGroupByAtom, AsyncGroupByNotKeyedAtom,
AsyncTopKAtom) shared ~170 lines of identical filter and memory-pool
infrastructure: 16 fields, 12 getter methods, constructor allocation,
close/clear logic, and filter initialization.

Extract the shared code into a new AsyncFilterContext composition helper.
Each atom now holds a single filterCtx field and delegates filter-related
operations to it. Call sites in the 3 factory classes and 3 cursor classes
use atom.getFilterContext().getXxx() to access the shared methods.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
JDBC-based benchmark that measures query latency and throughput for
GROUP BY and TOP-K queries under varying concurrency (1, 2, 4, 8).
Creates a skewed-partition table with 500 hourly partitions: 10 large
(500K rows) interleaved among 490 small (10K rows) to expose
head-of-line blocking in the ordered PageFrameSequence.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
JDBC 4.0 auto-discovers drivers via ServiceLoader, so the explicit
Class.forName("org.postgresql.Driver") calls are unnecessary. Also
revert the module-info requires added in the previous commit since
there is no longer any reflection on the PG driver class.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@mtopolnik
Contributor Author

@puzpuzpuz I did a new implementation from scratch and force-pushed it. I think this tracks closer to the ideas you laid out in the issue.

mtopolnik and others added 18 commits February 10, 2026 12:02
When reduceLocally() throws, call cancel() on the frame sequence
so that other worker threads see it as inactive and skip remaining
tasks instead of wasting CPU.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
When parallel GROUP BY merges results from different workers,
merge() checked only destRowId == LONG_NULL to detect an
uninitialized destination. But computeFirst() stores a valid
rowId even when the value is null, so a shard where all rows
have null values gets a real rowId. When a second shard with
non-null values (but a higher rowId) merges into it, the
condition srcRowId < destRowId fails and destRowId != LONG_NULL,
so the non-null value is incorrectly discarded.

The fix adds a dest-value null check to the merge condition in
all 22 affected merge() methods across 14 FirstNotNull*
GroupByFunction classes (char, date, double, float, int, long,
str, symbol, timestamp, uuid, varchar, geohash×4, IPv4,
decimal×6). The Array variant was already correct.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
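
The merge-condition fix described in this commit can be modeled in isolation (a hypothetical sketch; the real merge() methods operate on map value addresses, not arrays). The destination may hold a real rowId but a null value when every row in that shard was null, so the merge condition must also check whether the stored value is null.

```java
public class FirstNotNullMergeSketch {
    static final long LONG_NULL = Long.MIN_VALUE;

    // index 0 = rowId, index 1 = value; NaN models a null value
    static void merge(double[] dest, double[] src) {
        long destRowId = (long) dest[0];
        long srcRowId = (long) src[0];
        boolean destValueIsNull = Double.isNaN(dest[1]);
        // fixed condition: also accept src when dest's stored value is null
        if (!Double.isNaN(src[1])
                && (destRowId == LONG_NULL || destValueIsNull || srcRowId < destRowId)) {
            dest[0] = src[0];
            dest[1] = src[1];
        }
    }

    public static void main(String[] args) {
        // shard A: all rows null, so it stored rowId 3 with a null value
        double[] dest = {3, Double.NaN};
        // shard B: non-null value at a *higher* rowId (7)
        double[] src = {7, 42.0};
        merge(dest, src);
        // without the null check, 42.0 would be discarded here
        System.out.println((long) dest[0] + " " + dest[1]); // 7 42.0
    }
}
```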
@puzpuzpuz puzpuzpuz self-requested a review February 16, 2026 08:22
@glasstiger
Contributor

[PR Coverage check]

😍 pass : 744 / 853 (87.22%)

file detail

path   |   covered lines   |   new lines   |   coverage
🔵 io/questdb/griffin/engine/functions/groupby/FirstNotNullIPv4GroupByFunctionFactory.java 0 1 00.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstNotNullDecimalGroupByFunctionFactory.java 0 10 00.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstNotNullGeoHashGroupByFunctionFactory.java 0 4 00.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstStrGroupByFunction.java 1 10 10.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstVarcharGroupByFunction.java 1 10 10.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstArrayGroupByFunction.java 1 9 11.11%
🔵 io/questdb/griffin/engine/functions/groupby/FirstCharGroupByFunction.java 1 2 50.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstBooleanGroupByFunction.java 1 2 50.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstUuidGroupByFunction.java 1 2 50.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstDoubleGroupByFunction.java 1 2 50.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstSymbolGroupByFunction.java 1 2 50.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstIPv4GroupByFunction.java 1 2 50.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstLongGroupByFunction.java 1 2 50.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstTimestampGroupByFunction.java 1 2 50.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstDateGroupByFunction.java 1 2 50.00%
🔵 io/questdb/cairo/sql/async/UnorderedPageFrameSequence.java 142 181 78.45%
🔵 io/questdb/PropServerConfiguration.java 44 55 80.00%
🔵 io/questdb/cairo/sql/async/UnorderedPageFrameReduceJob.java 51 56 91.07%
🔵 io/questdb/griffin/engine/table/AsyncFilterContext.java 106 110 96.36%
🔵 io/questdb/griffin/engine/functions/groupby/LastUuidGroupByFunction.java 2 2 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstIntGroupByFunction.java 2 2 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstShortGroupByFunction.java 2 2 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastSymbolGroupByFunction.java 2 2 100.00%
🔵 io/questdb/cairo/sql/async/PageFrameSequence.java 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastDoubleGroupByFunction.java 2 2 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastNotNullUuidGroupByFunction.java 2 2 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstFloatGroupByFunction.java 2 2 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstNotNullTimestampGroupByFunction.java 6 6 100.00%
🔵 io/questdb/griffin/engine/table/AsyncTopKRecordCursorFactory.java 34 34 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstNotNullFloatGroupByFunction.java 6 6 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastCharGroupByFunction.java 2 2 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstNotNullArrayGroupByFunction.java 3 3 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastNotNullSymbolGroupByFunction.java 2 2 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastFloatGroupByFunction.java 2 2 100.00%
🔵 io/questdb/cairo/sql/async/PageFrameReduceJob.java 1 1 100.00%
🔵 io/questdb/griffin/engine/table/AsyncGroupByNotKeyedRecordCursor.java 4 4 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastLongGroupByFunction.java 2 2 100.00%
🔵 io/questdb/cairo/DefaultCairoConfiguration.java 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastArrayGroupByFunction.java 6 6 100.00%
🔵 io/questdb/cairo/sql/async/UnorderedPageFrameReduceTask.java 14 14 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastNotNullDoubleGroupByFunction.java 2 2 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastDateGroupByFunction.java 2 2 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastNotNullLongGroupByFunction.java 2 2 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstNotNullIntGroupByFunction.java 6 6 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastIPv4GroupByFunction.java 2 2 100.00%
🔵 io/questdb/PropertyKey.java 1 1 100.00%
🔵 io/questdb/griffin/engine/table/AsyncTopKAtom.java 9 9 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstNotNullCharGroupByFunction.java 6 6 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastNotNullDateGroupByFunction.java 2 2 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastBooleanGroupByFunction.java 2 2 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstNotNullVarcharGroupByFunction.java 4 4 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastByteGroupByFunction.java 2 2 100.00%
🔵 io/questdb/cairo/CairoConfigurationWrapper.java 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstNotNullStrGroupByFunction.java 4 4 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastShortGroupByFunction.java 2 2 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstNotNullLongGroupByFunction.java 6 6 100.00%
🔵 io/questdb/MessageBusImpl.java 9 9 100.00%
🔵 io/questdb/griffin/engine/table/AsyncGroupByNotKeyedRecordCursorFactory.java 38 38 100.00%
🔵 io/questdb/griffin/engine/table/AsyncTopKRecordCursor.java 5 5 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastIntGroupByFunction.java 2 2 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastNotNullStrGroupByFunction.java 6 6 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastStrGroupByFunction.java 10 10 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstNotNullDateGroupByFunction.java 6 6 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastNotNullTimestampGroupByFunction.java 2 2 100.00%
🔵 io/questdb/griffin/engine/table/AsyncFilterUtils.java 12 12 100.00%
🔵 io/questdb/griffin/engine/table/AsyncGroupByAtom.java 26 26 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstNotNullUuidGroupByFunction.java 7 7 100.00%
🔵 io/questdb/mp/WorkerPoolUtils.java 3 3 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastTimestampGroupByFunction.java 2 2 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastNotNullIntGroupByFunction.java 2 2 100.00%
🔵 io/questdb/griffin/engine/table/AsyncGroupByRecordCursorFactory.java 38 38 100.00%
🔵 io/questdb/cairo/sql/async/PageFrameReduceTask.java 12 12 100.00%
🔵 io/questdb/griffin/engine/table/AsyncJitFilteredRecordCursorFactory.java 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastVarcharGroupByFunction.java 10 10 100.00%
🔵 io/questdb/griffin/engine/table/AsyncGroupByRecordCursor.java 4 4 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastNotNullVarcharGroupByFunction.java 6 6 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstNotNullDoubleGroupByFunction.java 6 6 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstNotNullSymbolGroupByFunction.java 6 6 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/FirstByteGroupByFunction.java 2 2 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastNotNullArrayGroupByFunction.java 6 6 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastNotNullCharGroupByFunction.java 2 2 100.00%
🔵 io/questdb/griffin/engine/functions/groupby/LastNotNullFloatGroupByFunction.java 2 2 100.00%
🔵 io/questdb/griffin/engine/table/AsyncGroupByNotKeyedAtom.java 13 13 100.00%

@bluestreak01 bluestreak01 merged commit b06bd2a into master Feb 17, 2026
44 checks passed
@bluestreak01 bluestreak01 deleted the mt_pageframeseq branch February 17, 2026 19:30
maciulis pushed a commit to maciulis/questdb that referenced this pull request Feb 19, 2026

Labels

Performance (Performance improvements), SQL (Issues or changes relating to SQL execution)


Development

Successfully merging this pull request may close these issues.

Alternative PageFrameSequence implementation to get rid of Head-of-Line Blocking

4 participants