perf(sql): improve query scalability on large multicore machines by puzpuzpuz · Pull Request #6459 · questdb/questdb

puzpuzpuz · 2025-11-26T14:57:54Z

Includes the following changes aimed to improve query engine scalability on multi-core machines:

Fixes MapFragment#totalFunctionCardinality not being updated in all cases in AsyncGroupAtom. Due to that, sharding stats were flapping from enable sharding to disable sharding between subsequent runs in case of queries with high-cardinality count_distinct().
Cardinality statistics in AsyncGroupAtom are now calculated as a sum of recorded group by function cardinalities instead of a maximum value. This way is a better estimate of the amount of work we need to do to merge all of the functions.
Sets cairo.page.frame.reduce.queue.capacity and cairo.sql.parallel.groupby.merge.shard.queue.capacity to 4 * cpu_cores with a cap of 256. This improves worker thread utilization for parallel GROUP BY and filter queries.
Introduces cairo.sql.parallel.filter.dispatch.limit config option (set to max(cpu_cores, 32) by default) that limits the number of in-flight page frame tasks dispatched by parallel filter factories in case of LIMIT N queries. The goal is improve the latency of LIMIT queries.
Adds padding to PerWorkerLocks to avoid false sharing.
Improves scenario variations in ParallelGroupByFuzzTest and ParallelFilterTest.

Benchmarks

ClickBench difference vs. the master branch on c6a.metal, 500GB gp2:

coderabbitai · 2025-11-26T14:58:00Z

Walkthrough

Adds a new SQL parallel filter dispatch limit configuration parameter and infrastructure to control in-flight task dispatch for parallel filter operations. Introduces new getSqlParallelFilterDispatchLimit() methods across configuration classes, modifies PageFrameSequence.next() to accept a dispatch limit parameter, updates cursor implementations to signal and respect this limit, and applies cache-line padding in PerWorkerLocks to reduce false sharing.

Changes

Cohort / File(s)	Change Summary
Configuration Infrastructure `PropertyKey.java`, `PropServerConfiguration.java`, `CairoConfiguration.java`, `CairoConfigurationWrapper.java`, `DefaultCairoConfiguration.java`	Added new `CAIRO_SQL_PARALLEL_FILTER_DISPATCH_LIMIT` configuration property with default `Math.min(queryWorkers, 32)`. Exposed via new `getSqlParallelFilterDispatchLimit()` getter in configuration chain. Also increased default reduce queue capacity from `min(2 * queryWorkers, 64)` to `min(4 * queryWorkers, 256)`.
Cursor Iteration Signaling `RecordCursor.java`, `LimitedSizePartiallySortedLightRecordCursor.java`, `SelectedRecordCursor.java`, `VirtualFunctionRecordCursor.java`	Added new default method `expectLimitedIteration()` to `RecordCursor` interface; implemented delegating overrides in multiple cursor implementations to propagate limited iteration intent through the cursor chain.
Async Dispatch Control `PageFrameSequence.java`, `AsyncFilteredRecordCursor.java`, `AsyncFilteredNegativeLimitRecordCursor.java`	Modified `PageFrameSequence.next()` to accept optional `int dispatchLimit` parameter via method overload; updated dispatch logic to respect the limit. Added `dispatchLimit` field to async cursor classes and threaded it through to `PageFrameSequence` calls. Implemented `expectLimitedIteration()` in `AsyncFilteredRecordCursor` to set dispatch limit for LIMIT queries.
Performance Optimization `PerWorkerLocks.java`	Introduced cache-line padding with `INTS_PER_SLOT = 64 / Integer.BYTES` to reduce false sharing in atomic lock array. Updated all slot index calculations to use padded offsets (`INTS_PER_SLOT * id`).
Group-by Sharding Refactoring `AsyncGroupByAtom.java`	Refactored cardinality calculations: replaced local `calculateLocalFunctionCardinality()` with new `getTotalFunctionCardinality(slotId)` helper; updated shard sizing logic in `finalizeShardStats()` to compute max heap size across shards before applying uniformly; changed sharding condition to use aggregated function cardinality.
Constructor Parameter Rename `AsyncJitFilteredRecordCursorFactory.java`	Renamed constructor parameter `sharedQueryWorkerCount` to `workerCount` for clarity; updated call site propagation.
Configuration & Test Files `server.conf` (root and marketplace), `PropServerConfigurationTest.java`, `ServerMainTest.java`, `ParallelGroupByFuzzTest.java`, `ParallelFilterTest.java`, `test/resources/server.conf`	Added new `cairo.sql.parallel.filter.dispatch.limit` configuration property to server configuration files. Updated test assertions for new parameter and changed reduce queue capacity expectations. Introduced field-based `Rnd` randomization in fuzz and parallel filter tests with dynamic dispatch limit, shard count, and threshold generation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

PageFrameSequence dispatch logic: Carefully review the new next(int dispatchLimit) method, dispatch condition (totalDispatched < dispatchLimit), and the interaction between overloaded next() and parameterized variant.
PerWorkerLocks atomic operations: Verify cache-line padding calculations with INTS_PER_SLOT and ensure all atomic compareAndSet operations correctly use padded indices without off-by-one errors.
AsyncGroupByAtom cardinality refactoring: Review the new getTotalFunctionCardinality(slotId) aggregation logic, shard stats sizing in finalizeShardStats(), and the condition change from calculateLocalFunctionCardinality() to aggregated cardinality in sharding decisions.
expectLimitedIteration propagation: Trace the signal flow from cursor implementations through to dispatch limit application in AsyncFilteredRecordCursor and AsyncFilteredNegativeLimitRecordCursor.
Test randomization patterns: Verify that new randomized configurations in ParallelGroupByFuzzTest and ParallelFilterTest cover sufficient parameter space and don't mask regressions.

Possibly related PRs

perf(sql): disable column pre-touch in parallel filter #6280: Modifies parallel filter execution classes (AsyncFilteredRecordCursor/Factory and AsyncJitFilteredRecordCursorFactory) with related dispatch and worker parameter changes.
chore(sql): speedup tests #6347: Updates the same test class (ParallelFilterTest) with test wiring and randomization changes overlapping with the main PR's test infrastructure modifications.
perf(core): introduce three shared thread pools - Query, Write and Network #5805: Modifies PageFrameSequence constructor parameter naming and related worker count propagation, interconnected with the main PR's dispatch limit parameter threading.

Suggested reviewers

bluestreak01

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 8.82% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the primary goal of the changeset: improving query scalability on large multicore machines through configuration and code optimizations.
Description check	✅ Passed	The pull request description directly relates to the changeset, detailing specific improvements to query engine scalability including configuration changes, code modifications for parallelization, and performance benchmarks.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch puzpuzpuz_query_scalability_fix

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…calability_fix

puzpuzpuz · 2025-11-28T13:36:58Z

@coderabbitai review

coderabbitai · 2025-11-28T13:37:09Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (7)

core/src/main/resources/io/questdb/site/conf/server.conf (1)

734-736: Clarify default/auto semantics for the new dispatch limit (optional).

The new option is well placed and described, but consider briefly stating what happens when it’s unset or set to non‑positive values (e.g., auto‑calculated vs. “unlimited”) to keep this consistent with nearby Cairo SQL tuning comments.

core/src/main/java/io/questdb/griffin/engine/PerWorkerLocks.java (1)

41-55: Cache-line padding and index changes look correct; consider reusing a shared cache-line constant.

The padding strategy (INTS_PER_SLOT * workerCount and indexing via INTS_PER_SLOT * id) preserves external semantics (slot remains a logical worker index) and is bounds-safe while reducing false sharing on the lock array. If the codebase already exposes a cache-line size constant (e.g., in Os), you could derive INTS_PER_SLOT from it to avoid duplicating the 64‑byte assumption, but that’s purely cosmetic.

Also applies to: 62-63, 76-77, 87-88

core/src/test/java/io/questdb/test/PropServerConfigurationTest.java (1)

2020-2042: Assertion for getSqlParallelFilterDispatchLimit() is appropriate; consider a small default/validation test.

This new check correctly ties the test server.conf value (100) through PropServerConfiguration and the CairoConfiguration/wrapper API. As a follow‑up, you might add a focused test that covers the default dispatch limit when the property is unset (and possibly how zero/negative values are handled) to lock in the intended behavior.

core/src/test/java/io/questdb/test/cairo/fuzz/ParallelGroupByFuzzTest.java (2)

362-378: Revisit “exactly PAGE_FRAME_MAX_ROWS” partition cases vs new MIN-based constants

The branches commented as “Add partition with exactly PAGE_FRAME_MAX_ROWS” / “PAGE_FRAME_MAX_ROWS + 1” now use MIN_PAGE_FRAME_MAX_ROWS instead of the actual configuration.getSqlPageFrameMaxRows(). That means you only hit a true “exactly max rows” partition when the runtime max equals the minimum; otherwise you’re just near the limit. If the intent is to exercise specific equality branches in count_distinct merge logic, consider deriving these row counts from configuration.getSqlPageFrameMaxRows() (or updating comments to say “around the page-frame size”).

1943-1946: Timeout fuzzing via shared rnd is fine; adjust comments to match randomized frame counts

Both timeout tests now choose tripWhenTicks from the shared rnd (nextLong(39) / nextLong(48) with Math.max(10, …)), which is deterministic per test instance and gives a reasonable spread of timeout points. However, the comments stating “Page frame count is 40.” (and “Page frame count is 40 and shard count is 8.”) are no longer strictly accurate once pageFrameMaxRows is randomized, so it would be clearer to soften or update those comments to avoid confusion when debugging.

Also applies to: 2693-2696

core/src/main/java/io/questdb/cairo/sql/async/PageFrameSequence.java (1)

498-507: Dispatch-limit handling in dispatch(int) looks correct; consider guarding non-positive limits

Using totalDispatched = dispatchStartFrameIndex - (collectedFrameIndex + 1) to cap reducePubSeq.next() when totalDispatched >= dispatchLimit gives a clean notion of “in-flight but not yet collected” frames, and treating that as equivalent to a full queue preserves the existing control flow (including work-stealing and local reduction). One caveat: if a caller ever passes dispatchLimit <= 0, this will bypass the queue entirely and force all work through workLocally(), which may or may not be what you want; you could defensively clamp dispatchLimit to at least 1 or assert on invalid values at the next(int) entry point.

Also applies to: 526-529

core/src/main/java/io/questdb/PropServerConfiguration.java (1)

455-456: Confirm intended default for cairo.sql.parallel.filter.dispatch.limit

The new config is wired end-to-end (field, constructor init, and CairoConfiguration getter) and looks mechanically correct, but the default is Math.min(queryWorkers, 32), i.e., it caps the in‑flight dispatch limit at 32 on large machines. The PR description mentions a default of max(cpu_cores, 32), which would grow with core count instead; these are meaningfully different behaviors on big boxes. Please double-check which behavior you want and either (a) adjust the default expression here or (b) update the docs/description and any tests expecting the max(cpu_cores, 32) semantics. Also consider whether you want to validate that configured values are non‑negative / non‑zero, or explicitly document how 0 / negative values are interpreted by the filter cursors.

Also applies to: 1842-1843, 3885-3888

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8a85eb3 and d0e67a3.

📒 Files selected for processing (22)

core/src/main/java/io/questdb/PropServerConfiguration.java (4 hunks)
core/src/main/java/io/questdb/PropertyKey.java (1 hunks)
core/src/main/java/io/questdb/cairo/CairoConfiguration.java (1 hunks)
core/src/main/java/io/questdb/cairo/CairoConfigurationWrapper.java (1 hunks)
core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java (1 hunks)
core/src/main/java/io/questdb/cairo/sql/RecordCursor.java (1 hunks)
core/src/main/java/io/questdb/cairo/sql/async/PageFrameSequence.java (6 hunks)
core/src/main/java/io/questdb/griffin/engine/PerWorkerLocks.java (4 hunks)
core/src/main/java/io/questdb/griffin/engine/orderby/LimitedSizePartiallySortedLightRecordCursor.java (2 hunks)
core/src/main/java/io/questdb/griffin/engine/table/AsyncFilteredNegativeLimitRecordCursor.java (4 hunks)
core/src/main/java/io/questdb/griffin/engine/table/AsyncFilteredRecordCursor.java (10 hunks)
core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByAtom.java (4 hunks)
core/src/main/java/io/questdb/griffin/engine/table/AsyncJitFilteredRecordCursorFactory.java (2 hunks)
core/src/main/java/io/questdb/griffin/engine/table/SelectedRecordCursor.java (1 hunks)
core/src/main/java/io/questdb/griffin/engine/table/VirtualFunctionRecordCursor.java (1 hunks)
core/src/main/resources/io/questdb/site/conf/server.conf (1 hunks)
core/src/test/java/io/questdb/test/PropServerConfigurationTest.java (1 hunks)
core/src/test/java/io/questdb/test/ServerMainTest.java (2 hunks)
core/src/test/java/io/questdb/test/cairo/fuzz/ParallelGroupByFuzzTest.java (6 hunks)
core/src/test/java/io/questdb/test/griffin/ParallelFilterTest.java (1 hunks)
core/src/test/resources/server.conf (1 hunks)
pkg/ami/marketplace/assets/server.conf (1 hunks)

🧰 Additional context used

🧠 Learnings (2)

📚 Learning: 2025-11-10T14:28:48.329Z

Learnt from: mtopolnik
Repo: questdb/questdb PR: 0
File: :0-0
Timestamp: 2025-11-10T14:28:48.329Z
Learning: In AsOfJoinDenseRecordCursorFactoryBase.java, the `backwardScanExhausted` flag is intentionally NOT reset in `toTop()` because backward scan results are reusable across cursor rewinds. The backward scan caches historical matches that remain valid when the cursor is rewound.

Applied to files:

core/src/main/java/io/questdb/griffin/engine/table/AsyncFilteredRecordCursor.java
core/src/main/java/io/questdb/griffin/engine/orderby/LimitedSizePartiallySortedLightRecordCursor.java
core/src/main/java/io/questdb/griffin/engine/table/AsyncFilteredNegativeLimitRecordCursor.java

📚 Learning: 2025-11-19T12:21:00.062Z

Learnt from: jerrinot
Repo: questdb/questdb PR: 6413
File: core/src/test/java/io/questdb/test/cutlass/pgwire/PGJobContextTest.java:11982-12002
Timestamp: 2025-11-19T12:21:00.062Z
Learning: QuestDB Java tests use a deterministic random seed. The test utilities (e.g., io.questdb.test.tools.TestUtils and io.questdb.std.Rnd) produce reproducible sequences, so rnd_* functions (including rnd_uuid4) yield deterministic outputs across runs. Do not flag tests in core/src/test/** that assert against values produced by rnd_* as flaky due to randomness.

Applied to files:

core/src/test/java/io/questdb/test/griffin/ParallelFilterTest.java
core/src/test/java/io/questdb/test/cairo/fuzz/ParallelGroupByFuzzTest.java

🧬 Code graph analysis (3)

core/src/main/java/io/questdb/griffin/engine/table/AsyncFilteredRecordCursor.java (1)

core/src/main/java/io/questdb/cairo/sql/PageFrameMemoryRecord.java (1)

PageFrameMemoryRecord (62-740)

core/src/test/java/io/questdb/test/griffin/ParallelFilterTest.java (1)

core/src/test/java/io/questdb/test/tools/TestUtils.java (1)

TestUtils (154-2690)

core/src/test/java/io/questdb/test/cairo/fuzz/ParallelGroupByFuzzTest.java (1)

core/src/test/java/io/questdb/test/tools/TestUtils.java (1)

TestUtils (154-2690)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (36)

GitHub Check: build
GitHub Check: New pull request (Coverage Report Coverage Report)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
GitHub Check: New pull request (Hosted Running tests on windows-other-2)
GitHub Check: New pull request (Hosted Running tests on windows-other-1)
GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
GitHub Check: New pull request (Hosted Running tests on mac-other)
GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
GitHub Check: New pull request (Hosted Running tests on mac-cairo)
GitHub Check: New pull request (Hosted Running tests on mac-griffin)
GitHub Check: New pull request (SelfHosted Other tests on linux-x86-graal)
GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
GitHub Check: New pull request (SelfHosted Griffin tests on linux-x86-graal)
GitHub Check: New pull request (SelfHosted Griffin tests on linux-arm64)
GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
GitHub Check: New pull request (SelfHosted Cairo tests on linux-x64-zfs)
GitHub Check: New pull request (SelfHosted Griffin tests on linux-x64-zfs)
GitHub Check: New pull request (SelfHosted Cairo tests on linux-x86-graal)
GitHub Check: New pull request (SelfHosted Cairo tests on linux-arm64)
GitHub Check: New pull request (Trigger Enterprise CI Trigger Enterprise Pipeline)
GitHub Check: New pull request (Check Changes Check changes)

🔇 Additional comments (30)

core/src/main/java/io/questdb/griffin/engine/orderby/LimitedSizePartiallySortedLightRecordCursor.java (1)

118-118: Good optimization signal for LIMIT queries.

The call to expectLimitedIteration() correctly signals to the base cursor that limited iteration will occur, enabling potential dispatch and parallelization optimizations for LIMIT N queries. The placement before obtaining the base record is appropriate.

core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByAtom.java (4)

241-257: LGTM - cleaner two-pass approach for heap size normalization.

The separation of max-finding and application into distinct loops improves readability and makes the intent clear.

397-405: LGTM - accumulating function cardinality at fragment level.

The design tracks cumulative cardinality per fragment across the query execution, with proper reset in close(). Each worker operates on its own fragment, avoiding contention.

549-556: LGTM - clean helper extraction.

The helper properly delegates to getGroupByFunctions(slotId) to handle owner vs. worker slots.

659-665: LGTM - new field properly managed.

The totalFunctionCardinality field is correctly reset in close() and accumulated in maybeEnableSharding().

core/src/main/java/io/questdb/PropertyKey.java (1)

98-103: New PropertyKey for dispatch limit is consistent and correct.

The enum constant name and property path match the configuration files and follow existing naming patterns; this should integrate cleanly with validators and env overrides.

core/src/test/resources/server.conf (1)

156-160: Test config value for dispatch limit is consistent with the new tests.

The new cairo.sql.parallel.filter.dispatch.limit=100 entry is correctly grouped with other parallel filter settings and aligns with the expectations in PropServerConfigurationTest.

core/src/main/java/io/questdb/griffin/engine/table/VirtualFunctionRecordCursor.java (1)

89-92: LGTM!

The delegation pattern is consistent with other cursor methods in this class and aligns with the parallel implementations in SelectedRecordCursor and other wrapper cursors.

core/src/test/java/io/questdb/test/ServerMainTest.java (1)

324-324: Test expectations updated correctly.

The expected values align with the PR's configuration changes:

Queue capacities now use formula 4 * queryWorkers (with 2 workers → 8)

New cairo.sql.parallel.filter.dispatch.limit parameter added with value matching worker count

Also applies to: 403-403, 406-406

core/src/main/java/io/questdb/cairo/sql/RecordCursor.java (1)

144-151: Well-designed API extension.

The new hook method follows good interface design practices:

Default no-op implementation allows incremental adoption by cursor implementations

Clear documentation explains the optimization intent for LIMIT N queries

Consistent with existing default methods in the interface

core/src/main/java/io/questdb/griffin/engine/table/SelectedRecordCursor.java (1)

62-65: LGTM!

Correct delegation pattern that propagates the limited iteration signal through the cursor chain, consistent with other delegating methods in this class.

core/src/main/java/io/questdb/griffin/engine/table/AsyncFilteredNegativeLimitRecordCursor.java (3)

64-64: LGTM - dispatch limit integration.

The dispatchLimit field is appropriately declared as final since it's configuration-based and should remain constant for the cursor's lifetime.

Also applies to: 85-85

183-183: Dispatch limit now applied to frame fetching.

Passing dispatchLimit to frameSequence.next() limits in-flight page frame tasks, which is the core optimization for improving latency on LIMIT N queries as described in the PR objectives.

234-240: I'll help you verify whether pattern matching for instanceof is consistently used throughout the codebase. Let me search for exception handling patterns to assess consistency.
<function_calls>

#!/bin/bash

Search for instanceof CairoException patterns to check consistency

rg -n 'instanceof\s+CairoException' --type java -C2 | head -60

</function_calls>

#!/bin/bash

Count pattern matching vs traditional cast approach for CairoException

echo "=== Pattern matching style (instanceof Type var) ==="
rg -c 'instanceof\s+CairoException\s+\w+' --type java | wc -l

echo -e "\n=== Traditional style (instanceof followed by cast) ==="
rg -n 'instanceof CairoException' --type java | grep -v 'instanceof CairoException [a-zA-Z_]' | head -20

</function_calls>

#!/bin/bash

More comprehensive search for all instanceof patterns with CairoException

rg 'instanceof\s+CairoException' --type java | wc -l
echo "---"
rg 'instanceof\s+CairoException\s+\w+' --type java | wc -l

</function_calls>

core/src/main/java/io/questdb/cairo/CairoConfigurationWrapper.java (1)

999-1002: LGTM!

The new delegation method follows the established pattern in this wrapper class and correctly forwards the call to the underlying configuration.

pkg/ami/marketplace/assets/server.conf (1)

549-550: LGTM!

The commented configuration directive is properly documented and placed in the appropriate section. This follows the standard pattern for introducing new configuration options.

core/src/main/java/io/questdb/cairo/CairoConfiguration.java (1)

587-587: LGTM!

The new configuration method is well-positioned alongside related parallel filter configuration accessors and follows the established naming convention.

core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java (1)

1025-1028: LGTM - Unlimited default is appropriate for test configuration.

Returning Integer.MAX_VALUE effectively disables the dispatch limit in the default test configuration, which is appropriate for ensuring tests aren't artificially constrained. Production configurations will provide actual values via PropServerConfiguration.

core/src/main/java/io/questdb/griffin/engine/table/AsyncFilteredRecordCursor.java (4)

156-160: Clarify the interaction between expectLimitedIteration() and of().

Both expectLimitedIteration() (Line 159) and of() (Lines 409-410) set the dispatchLimit field, but with potentially different logic:

expectLimitedIteration() unconditionally sets dispatchLimit = defaultDispatchLimit

of() sets dispatchLimit based on whether rowsRemaining != Long.MAX_VALUE

When are these methods called relative to each other? If of() is always called during cursor initialization, does expectLimitedIteration() serve as an override mechanism, or is it meant to be called before of()? The current implementation doesn't clearly communicate the intended call sequence.

Consider documenting the expected call sequence or consolidating the dispatch limit initialization logic to avoid potential confusion.

409-410: Verify dispatch limit initialization logic.

The comment states "put a cap the number of in-flight page frame tasks in case of LIMIT N query", and the code sets dispatchLimit = rowsRemaining != Long.MAX_VALUE ? defaultDispatchLimit : Integer.MAX_VALUE.

However, there's also an expectLimitedIteration() method (Lines 156-160) that sets the dispatch limit. Ensure the initialization logic between these two paths is intentional and well-coordinated.

73-79: LGTM!

The constructor initialization is clean and properly ordered:

Uses the constant PageFrameMemoryRecord.RECORD_A_LETTER for clarity

Initializes frame memory pool with appropriate cache capacity

Reads the new dispatch limit configuration value

329-337: LGTM!

The fetchNextFrame() method correctly accepts and propagates the dispatchLimit parameter to frameSequence.next(). All call sites have been consistently updated to pass the dispatchLimit field.

core/src/test/java/io/questdb/test/griffin/ParallelFilterTest.java (2)

166-171: LGTM - Enhanced test coverage through randomization.

The introduction of the rnd field and its use to vary convertToParquet improves test coverage by exercising different execution paths. Per established patterns in QuestDB tests, the random seed is deterministic (via TestUtils.generateRandom()), ensuring reproducibility across test runs.

Based on learnings, QuestDB tests produce deterministic random sequences.

174-183: LGTM - Randomized configuration enhances edge case testing.

The test now varies several configuration parameters:

CAIRO_PAGE_FRAME_SHARD_COUNT: 1-4 shards

CAIRO_SQL_PARALLEL_FILTER_DISPATCH_LIMIT: 1-4 in-flight tasks

CAIRO_SQL_PARALLEL_WORK_STEALING_THRESHOLD: 1-16 threshold

This randomization exercises various edge cases (e.g., single shard, minimal dispatch limit) while remaining deterministic due to the seeded RNG. The comment "We intentionally use small values... to exhibit various edge cases" aligns well with this approach.

core/src/main/java/io/questdb/griffin/engine/table/AsyncJitFilteredRecordCursorFactory.java (1)

81-139: LGTM - Parameter rename improves consistency.

The parameter rename from sharedQueryWorkerCount to workerCount simplifies the signature while maintaining clarity. The internal field name sharedQueryWorkerCount is preserved for use in diagnostic output (Line 264 in toPlan()), which provides useful context when inspecting query plans.

core/src/test/java/io/questdb/test/cairo/fuzz/ParallelGroupByFuzzTest.java (3)

66-75: Random seed promotion and ROW_COUNT definition look solid

Using MIN_PAGE_FRAME_MAX_ROWS as the base for ROW_COUNT and promoting rnd to a field keeps the relationship between data volume and page-frame sizing explicit, while TestUtils.generateRandom(LOG) preserves deterministic seeds across runs. No functional issues here.
Based on learnings, this remains deterministic across runs.

84-95: Randomized parallelism and dispatch configuration in setUp() is reasonable

Randomizing pageFrameMaxRows, shard count, reduce queue capacity, sharding/work-stealing thresholds, and the new CAIRO_SQL_PARALLEL_FILTER_DISPATCH_LIMIT (bounded by PAGE_FRAME_COUNT) increases fuzz coverage for a variety of parallel configurations while staying in safe ranges. This should help exercise the new dispatch-limit path without introducing obvious flakiness.

4081-4083: Assumptions in testParallelGroupByThrowsOnTimeout align with randomized setup

The new Assume.assumeTrue(configuration.getGroupByShardingThreshold() <= 5) and ROW_COUNT / configuration.getSqlPageFrameMaxRows() >= 20 checks are consistent with setUp() (threshold ∈ [1,50], page-frame max rows ∈ [100,109]), so the test only runs when there are many frames and a small sharding threshold, which is sensible for a timeout-focused scenario. The insert into tab ... long_sequence(" + ROW_COUNT + ") line continues to use the global ROW_COUNT and looks correct with the updated constants.

Also applies to: 4121-4121

core/src/main/java/io/questdb/cairo/sql/async/PageFrameSequence.java (1)

326-343: next() overload and prepareForDispatch Javadoc remain consistent

Exposing a parameterless next() that simply delegates to next(Integer.MAX_VALUE) keeps existing callers source- and behavior-compatible, while allowing new users to pass an explicit dispatch limit. The updated Javadoc on prepareForDispatch() referencing next(int) instead of next() correctly reflects this API shift; contract and threading assumptions stay the same.

Also applies to: 430-435

core/src/main/java/io/questdb/PropServerConfiguration.java (1)

1830-1841: Parallel reduce/merge queue default scaling looks good

Bumping defaultReduceQueueCapacity to Math.min(4 * queryWorkers, 256) and using it consistently for both cairoPageFrameReduceQueueCapacity and cairoGroupByMergeShardQueueCapacity matches the PR intent (better utilization on large-core machines) while still capping growth and keeping the ceilPow2 behavior. No issues from a correctness or config-compatibility perspective.

core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByAtom.java

glasstiger · 2025-11-28T14:48:29Z

[PR Coverage check]

😍 pass : 51 / 53 (96.23%)

file detail

	path	covered line	new line	coverage
🔵	io/questdb/griffin/engine/table/VirtualFunctionRecordCursor.java	0	2	00.00%
🔵	io/questdb/griffin/engine/table/AsyncFilteredNegativeLimitRecordCursor.java	3	3	100.00%
🔵	io/questdb/cairo/sql/async/PageFrameSequence.java	4	4	100.00%
🔵	io/questdb/griffin/engine/table/AsyncFilteredRecordCursor.java	13	13	100.00%
🔵	io/questdb/griffin/engine/PerWorkerLocks.java	4	4	100.00%
🔵	io/questdb/cairo/DefaultCairoConfiguration.java	1	1	100.00%
🔵	io/questdb/griffin/engine/orderby/LimitedSizePartiallySortedLightRecordCursor.java	1	1	100.00%
🔵	io/questdb/PropertyKey.java	1	1	100.00%
🔵	io/questdb/cairo/CairoConfigurationWrapper.java	1	1	100.00%
🔵	io/questdb/PropServerConfiguration.java	3	3	100.00%
🔵	io/questdb/griffin/engine/table/AsyncGroupByAtom.java	16	16	100.00%
🔵	io/questdb/griffin/engine/table/AsyncJitFilteredRecordCursorFactory.java	1	1	100.00%
🔵	io/questdb/griffin/engine/table/SelectedRecordCursor.java	2	2	100.00%
🔵	io/questdb/cairo/sql/RecordCursor.java	1	1	100.00%

perf(sql): improve query scalability on large multicore machines

60138de

puzpuzpuz self-assigned this Nov 26, 2025

puzpuzpuz added SQL Issues or changes relating to SQL execution Performance Performance improvements labels Nov 26, 2025

puzpuzpuz added 5 commits November 27, 2025 10:57

Fine-tune page frame reduce queue default capacity

d25819e

Merge remote-tracking branch 'upstream/master' into puzpuzpuz_query_s…

af32eba

…calability_fix

Add dispatch limit for filtered limited queries

6e77483

Fix too strict config prop asserts in tests

616ab66

Avoid false sharing in PerWorkerLocks

f5434e7

puzpuzpuz force-pushed the puzpuzpuz_query_scalability_fix branch from ef5e690 to f5434e7 Compare November 28, 2025 08:54

puzpuzpuz marked this pull request as ready for review November 28, 2025 12:15

Merge branch 'master' into puzpuzpuz_query_scalability_fix

d0e67a3

coderabbitai bot reviewed Nov 28, 2025

View reviewed changes

core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByAtom.java Show resolved Hide resolved

Make code future-proof

e8d3caa

bluestreak01 approved these changes Nov 28, 2025

View reviewed changes

bluestreak01 merged commit ea86684 into master Nov 28, 2025
44 checks passed

bluestreak01 deleted the puzpuzpuz_query_scalability_fix branch November 28, 2025 15:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(sql): improve query scalability on large multicore machines#6459

perf(sql): improve query scalability on large multicore machines#6459
bluestreak01 merged 8 commits intomasterfrom
puzpuzpuz_query_scalability_fix

puzpuzpuz commented Nov 26, 2025 •

edited

Loading

Uh oh!

coderabbitai bot commented Nov 26, 2025 •

edited

Loading

Uh oh!

puzpuzpuz commented Nov 28, 2025

Uh oh!

coderabbitai bot commented Nov 28, 2025

Uh oh!

coderabbitai bot left a comment

Search for instanceof CairoException patterns to check consistency

Count pattern matching vs traditional cast approach for CairoException

More comprehensive search for all instanceof patterns with CairoException

Uh oh!

Uh oh!

glasstiger commented Nov 28, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

puzpuzpuz commented Nov 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmarks

Uh oh!

coderabbitai bot commented Nov 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

puzpuzpuz commented Nov 28, 2025

Uh oh!

coderabbitai bot commented Nov 28, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Search for instanceof CairoException patterns to check consistency

Count pattern matching vs traditional cast approach for CairoException

More comprehensive search for all instanceof patterns with CairoException

Uh oh!

Uh oh!

glasstiger commented Nov 28, 2025

[PR Coverage check]

file detail

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

puzpuzpuz commented Nov 26, 2025 •

edited

Loading

coderabbitai bot commented Nov 26, 2025 •

edited

Loading