perf(sql): apply fast path for ordered and limited queries over single long column to more scenarios#6272
Conversation
…e long column to more scenarios
WalkthroughThe change refactors Long Top-K support to be column-specific. It updates the RecordCursorFactory API to accept a column index, propagates this through group-by, virtual, and selected cursor layers, adjusts the planner’s decision logic, and updates tests and explain-plan expectations accordingly. PriorityMetadata gains a helper to map projected to base indices. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant C as Client
participant P as SqlCodeGenerator
participant F as RecordCursorFactory (possibly Virtual/Selected)
participant V as VirtualFunctionRecordCursor
participant PM as PriorityMetadata
participant B as Base RecordCursorFactory
participant L as LongTopKRecordCursorFactory
C->>P: Plan SELECT ... ORDER BY col LIMIT N
P->>F: recordCursorSupportsLongTopK(columnIndex)?
alt Factory is VirtualRecordCursorFactory
F->>V: getLongTopKColumnIndex(columnIndex)
V->>PM: getBaseColumnIndex(mapped)
PM-->>V: baseColumnIndex or -1
V-->>F: baseColumnIndex or -1
alt valid baseColumnIndex
F->>B: recordCursorSupportsLongTopK(baseColumnIndex)?
B-->>F: true/false
F-->>P: true/false
else invalid
F-->>P: false
end
else Non-virtual factory
F-->>P: type-check on columnIndex (e.g., LONG)
end
alt Supported
P->>L: wrap(B or F, columnIndex, lo, ascending)
L-->>C: LongTopK cursor
else Not supported
P-->>C: Fallback sort/plan
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (10)
core/src/test/java/io/questdb/test/cairo/fuzz/ParallelGroupByFuzzTest.java (2)
1165-1182: Consider asserting the plan uses Long Top K.Add an "explain ..." assert to lock the optimization in, e.g., expect “Long Top K lo: 10 …” before the result assertion. Helps catch regressions in the fast path selection.
1185-1196: Same suggestion: assert plan for Long Top K.Add an Explain assertion to ensure the top‑K path is engaged in this selected-outer scenario.
core/src/main/java/io/questdb/griffin/engine/orderby/LongTopKRecordCursorFactory.java (1)
55-55: Optional: add a runtime guard in addition toassert.Assertions may be disabled in production. Consider:
-assert base.recordCursorSupportsLongTopK(columnIndex); +assert base.recordCursorSupportsLongTopK(columnIndex); +if (!base.recordCursorSupportsLongTopK(columnIndex)) { + throw new IllegalStateException("Base factory doesn't support Long Top K for column " + columnIndex); +}core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java (1)
313-313: Avoid wildcard static imports (minor).Static import of QueryModel.* alongside specific imports increases surface for accidental name clashes. Prefer explicit static imports for referenced members.
core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java (2)
3649-3665: Good coverage for VirtualRecord with Long Top KPlan shape and alias propagation look correct: VirtualRecord functions [y,42,c] sits between Top K and Async Group By, matching the intended fast path extension.
Consider adding:
- A DESC variant (“order by c desc limit 42”) to validate descending path.
- A negative limit case (e.g., “limit -42”) to assert no regression in limit rewriting behavior.
3668-3684: SelectedRecord subquery case validatedUsing Long Top K above SelectedRecord (inner Async Group By keys [y,z]) is consistent and exercises the subquery projection path as intended.
Optionally add a test with:
- Additional projected expression (e.g., constant or computed column) in the outer select to ensure mapping still resolves.
- “order by c desc” to cover descending priority.
core/src/main/java/io/questdb/griffin/engine/table/VirtualFunctionRecordCursor.java (3)
54-61: Constructor: ensure non-null PriorityMetadataLooks good. Consider null-guarding priorityMetadata to fail fast if miswired.
Example:
+ import java.util.Objects; ... - this.priorityMetadata = priorityMetadata; + this.priorityMetadata = Objects.requireNonNull(priorityMetadata, "priorityMetadata");
88-96: Per-column LONG mapping for Top‑K is correct; add small safetyLogic is sound (LongColumn → mapped base index if underlying type is LONG). Add a bounds assert on columnIndex to catch misuse early.
Example:
assert columnIndex >= 0 && columnIndex < functions.size();
125-130: Avoid assert-only guard in production pathReplace the assert with a defensive check to prevent accidental calls when unsupported (asserts may be disabled).
- assert baseColumnIndex != -1; - baseCursor.longTopK(list, baseColumnIndex); + if (baseColumnIndex == -1) { + throw new IllegalStateException("longTopK called for unsupported columnIndex=" + columnIndex); + } + baseCursor.longTopK(list, baseColumnIndex);core/src/main/java/io/questdb/griffin/engine/table/VirtualRecordCursorFactory.java (1)
135-142: Per-column capability check for Long Top‑KGood delegation: resolve baseColumnIndex via cursor and defer capability to base. Consider short-circuiting when baseColumnIndex is invalid using a guard (already implicit); nothing else needed.
You could inline the return for brevity:
final int baseColumnIndex = cursor.getLongTopKColumnIndex(columnIndex); return baseColumnIndex != -1 && base.recordCursorSupportsLongTopK(baseColumnIndex);
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
core/src/main/java/io/questdb/cairo/sql/RecordCursorFactory.java(1 hunks)core/src/main/java/io/questdb/griffin/PriorityMetadata.java(1 hunks)core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java(2 hunks)core/src/main/java/io/questdb/griffin/engine/groupby/GroupByRecordCursorFactory.java(2 hunks)core/src/main/java/io/questdb/griffin/engine/orderby/LongTopKRecordCursorFactory.java(1 hunks)core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByRecordCursorFactory.java(2 hunks)core/src/main/java/io/questdb/griffin/engine/table/SelectedRecordCursor.java(2 hunks)core/src/main/java/io/questdb/griffin/engine/table/SelectedRecordCursorFactory.java(1 hunks)core/src/main/java/io/questdb/griffin/engine/table/VirtualFunctionRecordCursor.java(4 hunks)core/src/main/java/io/questdb/griffin/engine/table/VirtualRecordCursorFactory.java(4 hunks)core/src/test/java/io/questdb/test/cairo/fuzz/ParallelGroupByFuzzTest.java(1 hunks)core/src/test/java/io/questdb/test/griffin/ClickBenchTest.java(2 hunks)core/src/test/java/io/questdb/test/griffin/ExplainPlanTest.java(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Other tests Start X64Zfs Agent)
- GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
- GitHub Check: New pull request (Hosted Running tests on windows-other)
- GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
- GitHub Check: New pull request (Hosted Running tests with cover on linux-other)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin)
- GitHub Check: New pull request (Hosted Running tests on mac-other)
- GitHub Check: New pull request (Hosted Running tests with cover on linux-pgwire)
- GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
- GitHub Check: New pull request (Hosted Running tests with cover on linux-cairo)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
- GitHub Check: New pull request (Hosted Running tests with cover on linux-fuzz)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo)
- GitHub Check: New pull request (SelfHosted Other tests Start ARM Agent)
- GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start X64Zfs Agent)
- GitHub Check: New pull request (Hosted Running tests with cover on linux-griffin)
- GitHub Check: New pull request (Hosted Running tests on mac-griffin)
- GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start ARM Agent)
- GitHub Check: New pull request (Check Changes Check changes)
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (12)
core/src/main/java/io/questdb/griffin/engine/table/SelectedRecordCursorFactory.java (1)
180-183: Correct per-column Long Top K propagation via cross-index.Delegation to base with columnCrossIndex mapping is correct and consistent with other methods.
core/src/test/java/io/questdb/test/griffin/ClickBenchTest.java (2)
370-371: Plan expectation updated to Long Top K.Matches new optimizer path; looks good.
570-571: Plan expectation updated to Long Top K.Good alignment with new plan label.
core/src/main/java/io/questdb/griffin/PriorityMetadata.java (1)
61-66: Helper for mapping to base index is correct.Returning -1 for reserved slots is a sensible sentinel; mapping logic is sound.
core/src/main/java/io/questdb/griffin/engine/table/AsyncGroupByRecordCursorFactory.java (1)
167-169: Per-column capability check is correct.Gating Long Top K by ColumnType.LONG on the output metadata is appropriate.
core/src/main/java/io/questdb/griffin/engine/table/SelectedRecordCursor.java (2)
85-89: Correct delegation of Long Top K through cross-index.Delegating to baseCursor with mapped index is consistent and correct.
95-99: Expose pre-computed state size via delegation.Straightforward and correct.
core/src/main/java/io/questdb/cairo/sql/RecordCursorFactory.java (1)
232-236: Override signatures updated for recordCursorSupportsLongK(int) All implementations now use the new per-column signature; no legacy zero-arg overrides remain.core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java (1)
3741-3756: RecordCursorSupportsLongK properly implemented
VirtualRecordCursorFactory, SelectedRecordCursorFactory, AsyncGroupByRecordCursorFactory, and GroupByRecordCursorFactory override recordCursorSupportsLongTopK appropriately; default returns false.core/src/main/java/io/questdb/griffin/engine/groupby/GroupByRecordCursorFactory.java (1)
30-30: LGTM! Column-specific Top-K check correctly implemented.The refactored method now accepts a column index and correctly validates whether that specific output column (post-aggregation) is of type LONG. This enables the optimization for group-by scenarios where the aggregated column is LONG.
The implementation is consistent with the inner class's
longTopKmethod (line 224) which retrieves the corresponding function viarecordFunctions.getQuick(columnIndex).Also applies to: 132-134
core/src/main/java/io/questdb/griffin/engine/table/VirtualRecordCursorFactory.java (2)
80-87: Cursor wiring with PriorityMetadataPassing PriorityMetadata through to VirtualFunctionRecordCursor is correct and aligns with the new mapping-based flow. LGTM.
101-104: Base column name via PriorityMetadataDelegating to priorityMetadata.getColumnName(idx) makes sense for explain plans where indices are mapped. LGTM.
core/src/main/java/io/questdb/griffin/engine/table/VirtualFunctionRecordCursor.java
Outdated
Show resolved
Hide resolved
core/src/main/java/io/questdb/griffin/engine/table/VirtualRecordCursorFactory.java
Show resolved
Hide resolved
[PR Coverage check]😍 pass : 72 / 75 (96.00%) file detail
|
|
@nwoolmer thanks for the review! |
…e long column to more scenarios (questdb#6272)
Fixes #6285
Our fast path for ORDER BY + LIMIT on a single LONG column didn't apply to queries with virtual columns, expressions, etc., i.e.
VirtualRecordCursorFactoryandSelectedRecordCursorFactoryouter factories case. This patch fixes that, so that the fast path is applied for more queries.Also includes the following changes:
GroupByFunctions that were not marked as non-constant. Such functions would have led to an NPE when used with constant arguments (see NPE infirst()when using multiple casts #6285)With this patch, all of the following queries use the ORDER BY + LIMIT fast path: