Skip to content

perf(sql): improve performance of linear as-of join algo#6338

Merged
bluestreak01 merged 22 commits intomasterfrom
mt_asof-linear-symbol
Nov 3, 2025
Merged

perf(sql): improve performance of linear as-of join algo#6338
bluestreak01 merged 22 commits intomasterfrom
mt_asof-linear-symbol

Conversation

@bluestreak01
Copy link
Copy Markdown
Member

@bluestreak01 bluestreak01 commented Oct 31, 2025

optimisation for as-of joins on single symbol, such as:

SELECT /*+ ASOF_LINEAR_SEARCH(t p) */ avg(bid), max(bid) 
FROM (trades) t 
ASOF JOIN prices p on (t.symbol=p.sym);

It is activated with "ASOF_LINEAR_SEARCH" hint and is non-default, as it is a variant of linear as-of algo.

The performance improvement comes from re-indexing symbols and joining on "int" rather than symbol "text"

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Oct 31, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Adds a single-symbol ASOF join cursor and integrates it into SQL code generation, refactors switch blocks to switch expressions, adds column-access helper overloads for RecordCursor, updates pattern matching and exception flow in AsyncFilteredRecordCursor, and adds tests for the single-symbol ASOF path.

Changes

Cohort / File(s) Summary
ASOF single-symbol join + wiring
core/src/main/java/io/questdb/griffin/engine/join/AsOfJoinSingleSymbolRecordCursorFactory.java, core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java
New specialized ASOF join factory for single-symbol joins; SqlCodeGenerator now selects a single-symbol fast path (AsOfJoinSingleSymbolRecordCursorFactory) when applicable and inlines previous createAsOfJoin logic into join-selection flow.
Column access helper overloads
core/src/main/java/io/questdb/griffin/engine/join/AsofJoinColumnAccessHelper.java, core/src/main/java/io/questdb/griffin/engine/join/SingleSymbolColumnAccessHelper.java, core/src/main/java/io/questdb/griffin/engine/join/NoopColumnAccessHelper.java
Adds of(RecordCursor) overloads (and a default unsupported overload) so column-access helpers support non-TimeFrame slave cursors; minor formatting cleanup.
ASOF join cursor tests
core/src/test/java/io/questdb/test/griffin/AsOfJoinTest.java
Adds tests covering single-symbol ASOF join behavior and dynamic-slave symbol error path.
Switch-expression refactor (date utilities)
core/src/main/java/io/questdb/griffin/engine/functions/date/GenerateSeriesTimestampStringRecordCursorFactory.java
Replaced several traditional switch blocks with Java switch expressions across metadata, unit handling, random access support, and stride adjustment; no behavioral changes.
Exception handling pattern match
core/src/main/java/io/questdb/griffin/engine/table/AsyncFilteredRecordCursor.java
Uses pattern matching for CairoException in catch block and rethrows bound CairoException for non-interruption cases instead of wrapping.
Minor formatting
core/src/main/java/io/questdb/std/Hash.java
Added trailing newline.

Sequence Diagram(s)

sequenceDiagram
%%{init: {"themeVariables":{"sequenceActorBackground":"#f6f8fa","actorBorder":"#cbd5e1"}}}%%
actor Client
participant SqlCodeGenerator
participant MasterFactory
participant SlaveFactory
participant AsOfSingleFactory
participant AsOfLightFactory
participant AsOfCursor as "AsOfSingleSymbolJoinCursor"
Client->>SqlCodeGenerator: compile SELECT ... ASOF ... WITH HINT ASOF_LINEAR_SEARCH
SqlCodeGenerator->>MasterFactory: build master factory
SqlCodeGenerator->>SlaveFactory: analyze slave, detect single-symbol?
alt single-symbol and linear hint
    SqlCodeGenerator->>AsOfSingleFactory: instantiate AsOfJoinSingleSymbolRecordCursorFactory
    AsOfSingleFactory->>AsOfCursor: create inner cursor, build joinKeyMap from slave
    AsOfCursor-->>MasterFactory: drive master rows, probe joinKeyMap for symbol key
else fallback
    SqlCodeGenerator->>AsOfLightFactory: instantiate AsOfJoinLightRecordCursorFactory
    AsOfLightFactory->>SlaveFactory: build slave time frame structure
end
SqlCodeGenerator->>Client: return compiled plan using chosen factory
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Areas needing focused review:
    • AsOfJoinSingleSymbolRecordCursorFactory.java: correctness of joinKeyMap usage, timestamp scaling and tolerance logic, cursor lifecycle and resource cleanup.
    • SqlCodeGenerator.java: join-selection logic ensuring metadata and timestamp index propagation and correctness across ASOF/non-ASOF and CROSS join branches.
    • AsyncFilteredRecordCursor.java: exception handling change (rethrowing CairoException) and its effects on callers and higher-level error handling.

Possibly related PRs

Suggested labels

SQL, Performance

Suggested reviewers

  • mtopolnik
  • RaphDal

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 4.55% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The pull request title "perf(sql): improve performance of linear as-of join algo" directly and accurately summarizes the main change in the changeset. The primary contribution is a new specialized ASOF join cursor factory (AsOfJoinSingleSymbolRecordCursorFactory) that optimizes single-symbol ASOF joins by re-indexing symbols and performing joins on integer IDs rather than text. The title is concise, uses domain-specific terminology ("sql"), indicates the nature of the change ("perf" for performance), and is clear enough for developers scanning PR history to understand the primary objective without requiring detailed inspection.
Description Check ✅ Passed The pull request description is directly related to the changeset and provides meaningful context about the optimization. It explains that the change targets as-of joins on single symbols, provides a concrete example query, specifies the ASOF_LINEAR_SEARCH hint as the activation mechanism, clarifies that it's non-default, and articulates the performance benefit (re-indexing symbols and joining on integers rather than text). The description accurately reflects the new AsOfJoinSingleSymbolRecordCursorFactory and modifications to SqlCodeGenerator to implement this optimization.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@bluestreak01
Copy link
Copy Markdown
Member Author

@CodeRabbit review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Oct 31, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (5)
core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java (2)

314-316: Remove redundant static import

import static io.questdb.griffin.model.QueryModel.*; already brings QUERY; the extra explicit static import is unnecessary.

- import static io.questdb.griffin.model.QueryModel.QUERY;
+ // QUERY already available via wildcard static import above

5298-5302: Pattern-matching instanceof introduces unused af

af isn’t referenced; either use the bound var or keep classic instanceof.

- if (!(f instanceof WindowFunction af)) {
+ if (!(f instanceof WindowFunction)) {
core/src/main/java/io/questdb/griffin/engine/join/AsOfJoinSingleSymbolRecordCursorFactory.java (3)

230-243: Consider adding circuit breaker check for long-running queries.

The while loop could iterate through many slave records without checking for query cancellation. For large slave cursors, this might delay cancellation.

Consider adding a circuit breaker check:

 final Record rec = slaveCursor.getRecord();
 while (slaveCursor.hasNext()) {
+    circuitBreaker.statefulThrowExceptionIfTripped();
     slaveTimestamp = scaleTimestamp(rec.getTimestamp(slaveTimestampIndex), slaveTimestampScale);

Note: You'll need to pass the circuit breaker to the hasNext method or store it as a field if this approach is adopted.


215-218: Consider extracting map update logic to reduce duplication.

The map update logic appears twice (lines 215-218 and 235-238) with identical structure.

Consider extracting to a helper method:

private void updateJoinKeyMap(Record rec, int symbolColumnIndex) {
    MapKey key = joinKeyMap.withKey();
    key.putInt(rec.getInt(symbolColumnIndex));
    MapValue value = key.createValue();
    value.putLong(0, rec.getRowId());
}

Then call it from both locations.

Also applies to: 235-238


249-258: Tolerance recheck is correct but subtle.

The tolerance check on line 255 is necessary because map entries may have been added during previous master record iterations with different tolerance windows. A brief comment would help future maintainers understand why this recheck is required.

Consider adding a clarifying comment:

 key = joinKeyMap.withKey();
 key.putInt(slaveSymbolKey);
 value = key.findValue();
 if (value != null) {
+    // Recheck tolerance: map entry may have been added for a previous master record
     slaveCursor.recordAt(slaveRecord, value.getLong(0));
     slaveTimestamp = scaleTimestamp(slaveRecord.getTimestamp(slaveTimestampIndex), slaveTimestampScale);
     record.hasSlave(toleranceInterval == Numbers.LONG_NULL || slaveTimestamp >= masterTimestamp - toleranceInterval);
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d979ed8 and 8f3e640.

📒 Files selected for processing (8)
  • core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java (9 hunks)
  • core/src/main/java/io/questdb/griffin/engine/functions/date/GenerateSeriesTimestampStringRecordCursorFactory.java (3 hunks)
  • core/src/main/java/io/questdb/griffin/engine/join/AsOfJoinSingleSymbolRecordCursorFactory.java (1 hunks)
  • core/src/main/java/io/questdb/griffin/engine/join/AsofJoinColumnAccessHelper.java (2 hunks)
  • core/src/main/java/io/questdb/griffin/engine/join/SingleSymbolColumnAccessHelper.java (2 hunks)
  • core/src/main/java/io/questdb/griffin/engine/table/AsyncFilteredRecordCursor.java (1 hunks)
  • core/src/main/java/io/questdb/std/Hash.java (1 hunks)
  • core/src/test/java/io/questdb/test/griffin/AsOfJoinTest.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: New pull request (Check Changes Check changes)
  • GitHub Check: build
  • GitHub Check: build
🔇 Additional comments (13)
core/src/main/java/io/questdb/griffin/engine/functions/date/GenerateSeriesTimestampStringRecordCursorFactory.java (4)

83-87: LGTM! Clean switch expression refactor.

The switch expression correctly maps timestamp types to their corresponding metadata. The refactor improves readability while maintaining the same behavior.


205-209: LGTM! Switch expression correctly identifies fixed-length time units.

The logic appropriately returns false for variable-length units ('M', 'y') that don't support random access, and true for all fixed-length units. The refactor is clean and maintains existing semantics.


217-228: LGTM! Switch expression correctly maps time units to timestamp conversions.

The refactor cleanly converts each unit to its corresponding timestamp value. The method is only invoked when supportsRandomAccess() returns true (excluding 'M' and 'y'), and upstream validation at line 148-151 ensures only valid units reach this point.


235-249: LGTM! Switch expression provides clear validation for time unit characters.

The refactor correctly identifies all valid time units and returns the appropriate boolean. The inclusion of both 'h' and 'H' for hours, and 'u' and 'U' for micros provides good compatibility coverage.

core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java (3)

196-196: Import for single‑symbol ASOF path

Importing AsOfJoinSingleSymbolRecordCursorFactory looks correct and scoped.


2672-2700: Switch expression simplification for CROSS OUTER joins LGTM

Cleaner and equivalent control flow; null‑row factories preserved per side.


4388-4402: Switch expression for select model LGTM

Branching is clearer; fall‑through to joins/no‑select is preserved.

core/src/main/java/io/questdb/griffin/engine/join/AsOfJoinSingleSymbolRecordCursorFactory.java (6)

55-89: LGTM! Excellent resource management.

The constructor properly handles resource cleanup in the exception path, ensuring the joinKeyMap is freed before calling close() and rethrowing.


96-109: LGTM! Proper cursor lifecycle management.

The getCursor method correctly handles cleanup of both master and slave cursors in the exception path.


129-135: LGTM! Complete resource cleanup.

All resources are properly freed.


152-173: LGTM! Efficient timestamp scaling logic.

The optimization to avoid timestamp scaling when types match is good for performance.


175-194: LGTM! Standard cursor lifecycle methods.

The close() method properly guards against double-close and resets state.


266-301: LGTM! Complete state management.

The toTop() method correctly resets all state, and the of() method properly handles map reopening for cursor reuse.

@bluestreak01
Copy link
Copy Markdown
Member Author

@CodeRabbit review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Oct 31, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@bluestreak01
Copy link
Copy Markdown
Member Author

@CodeRabbit review. Make sure NoOpAccessor cannot be passed to asof join implementations that expect a functioning accessor

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Nov 1, 2025

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find NoOpAccessor definition and usage
rg -nP "class\s+NoOpAccessor" --type=java -A 10

# Find where NoOpAccessor is instantiated or used
rg -nP "NoOpAccessor" --type=java -C 3

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Find AsofJoinColumnAccessHelper implementations and usages
rg -nP "(implements|extends).*AsofJoinColumnAccessHelper" --type=java -C 5

# Find all asof join factory implementations
rg -nP "class.*AsOf.*Join.*Factory" --type=java -C 3

Length of output: 17413


🏁 Script executed:

#!/bin/bash
# Look at the SqlCodeGenerator to see how it creates AsOfJoinSingleSymbolRecordCursorFactory
ast-grep --pattern 'new AsOfJoinSingleSymbolRecordCursorFactory($$$)'

# Also check AsOfJoinLightRecordCursorFactory creation
ast-grep --pattern 'new AsOfJoinLightRecordCursorFactory($$$)'

Length of output: 3577


🏁 Script executed:

#!/bin/bash
# Look at AsOfJoinSingleSymbolRecordCursorFactory constructor to see what parameters it takes
rg -nP "public\s+AsOfJoinSingleSymbolRecordCursorFactory" --type=java -A 20

Length of output: 3094


🏁 Script executed:

#!/bin/bash
# Find where NoopColumnAccessHelper.INSTANCE is used
rg -nP "NoopColumnAccessHelper\.INSTANCE" --type=java -C 5

Length of output: 6299


🏁 Script executed:

#!/bin/bash
# Look at the context around where columnAccessHelper is determined in SqlCodeGenerator before creating AsOfJoinSingleSymbolRecordCursorFactory
rg -nP "columnAccessHelper\s*=" --type=java -C 10 core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java

Length of output: 7285


🏁 Script executed:

#!/bin/bash
# Check the code path that determines when to use AsOfJoinSingleSymbolRecordCursorFactory vs AsOfJoinLightRecordCursorFactory
rg -nP "AsOfJoinSingleSymbolRecordCursorFactory|AsOfJoinLightRecordCursorFactory" --type=java -B 20 -A 5 core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java | head -100

Length of output: 5828


🏁 Script executed:

#!/bin/bash
# Look at NoopColumnAccessHelper implementation to see what methods throw
cat core/src/main/java/io/questdb/griffin/engine/join/NoopColumnAccessHelper.java

Length of output: 2119


🏁 Script executed:

#!/bin/bash
# Check which AsOf join factories take columnAccessHelper as a constructor parameter
rg -nP "public\s+\w+\s*\(" --type=java -A 15 core/src/main/java/io/questdb/griffin/engine/join/AsOfJoin*Factory.java | rg -A 15 "columnAccessHelper"

Length of output: 10966


🏁 Script executed:

#!/bin/bash
# Look at the full context of AsOfJoinLightRecordCursorFactory to see if it uses column access helper
cat core/src/main/java/io/questdb/griffin/engine/join/AsOfJoinLightRecordCursorFactory.java | head -100

Length of output: 4065


🏁 Script executed:

#!/bin/bash
# Check all places where join factories that take columnAccessHelper are instantiated
rg -nP "new AsOfJoin(Fast|Indexed|Memoized|SingleSymbol)RecordCursorFactory" --type=java -B 5 -A 1 core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java

Length of output: 2641


🏁 Script executed:

#!/bin/bash
# Let's see how columnAccessHelper is used in AsOfJoinSingleSymbolRecordCursorFactory
rg -nP "columnAccessHelper\." --type=java core/src/main/java/io/questdb/griffin/engine/join/AsOfJoinSingleSymbolRecordCursorFactory.java

Length of output: 138


🏁 Script executed:

#!/bin/bash
# Search for any other instantiations of AsOfJoinSingleSymbolRecordCursorFactory in the codebase
rg -nP "new\s+AsOfJoinSingleSymbolRecordCursorFactory" --type=java

Length of output: 170


🏁 Script executed:

#!/bin/bash
# Check if there are any assertions in the AsOfJoinSingleSymbolRecordCursorFactory constructor
rg -nP "assert.*columnAccessHelper" --type=java -A 2 -B 2 core/src/main/java/io/questdb/griffin/engine/join/

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check the same for other factories that use columnAccessHelper
rg -nP "(AsOfJoinFast|AsOfJoinIndexed|AsOfJoinMemoized)RecordCursorFactory.*assert" --type=java -A 3 core/src/main/java/io/questdb/griffin/engine/join/

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Look at the test to see if it validates the single symbol optimization
cat core/src/test/java/io/questdb/test/griffin/AsOfJoinTest.java | grep -A 30 "testSingleSymbolAsOf"

Length of output: 3420


@bluestreak01, I've analyzed the code paths and found that while there is protection in place, it could be strengthened with explicit assertions. Here's what I found:

Current Protection

The code already prevents NoopColumnAccessHelper from being passed to AsOfJoinSingleSymbolRecordCursorFactory through a runtime check in SqlCodeGenerator.java:

// Line 2882-2884
AsofJoinColumnAccessHelper columnAccessHelper = createAsofColumnAccessHelper(masterMetadata, slaveMetadata, selfJoin);
boolean isOptimizable = columnAccessHelper != NoopColumnAccessHelper.INSTANCE;
if (isOptimizable) {
    // Only creates AsOfJoinSingleSymbolRecordCursorFactory when not NoopColumnAccessHelper
    master = new AsOfJoinSingleSymbolRecordCursorFactory(..., columnAccessHelper, ...);
}

If not optimizable, it falls back to AsOfJoinLightRecordCursorFactory (line 2908), which doesn't use columnAccessHelper at all.

Recommended Improvement

However, to fail fast and prevent future bugs, I recommend adding defensive assertions in the constructors:

AsOfJoinSingleSymbolRecordCursorFactory.java:

public AsOfJoinSingleSymbolRecordCursorFactory(
        CairoConfiguration configuration,
        RecordMetadata metadata,
        RecordCursorFactory masterFactory,
        RecordCursorFactory slaveFactory,
        @Transient ColumnTypes joinColumnTypes,
        @Transient ColumnTypes valueTypes,
        int columnSplit,
        int slaveSymbolColumnIndex,
        AsofJoinColumnAccessHelper columnAccessHelper,
        JoinContext joinContext,
        long toleranceInterval
) {
    super(metadata, joinContext, masterFactory, slaveFactory);
    assert columnAccessHelper != NoopColumnAccessHelper.INSTANCE : "NoopColumnAccessHelper cannot be used with AsOfJoinSingleSymbolRecordCursorFactory";
    this.slaveSymbolColumnIndex = slaveSymbolColumnIndex;
    this.columnAccessHelper = columnAccessHelper;
    // ... rest of constructor

The same assertion should be added to:

  • AsOfJoinFastRecordCursorFactory (line 64)
  • AsOfJoinIndexedRecordCursorFactory (line 69)
  • AsOfJoinMemoizedRecordCursorFactory (line 102)

These factories all use columnAccessHelper.getSlaveKey() or similar methods that would throw UnsupportedOperationException if NoopColumnAccessHelper were passed.

Why This Matters

Without the assertion, if a bug were introduced or the code were refactored, NoopColumnAccessHelper could be passed and would only fail at query execution time when getSlaveKey() is called (line 205 in the new factory). With the assertion, it would fail immediately during factory construction in development/testing.

Would you like me to proceed with adding these assertions?

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@glasstiger
Copy link
Copy Markdown
Contributor

[PR Coverage check]

😍 pass : 179 / 200 (89.50%)

file detail

path covered line new line coverage
🔵 io/questdb/griffin/engine/join/AsofJoinColumnAccessHelper.java 0 1 00.00%
🔵 io/questdb/griffin/engine/table/AsyncFilteredRecordCursor.java 1 2 50.00%
🔵 io/questdb/griffin/engine/functions/date/GenerateSeriesTimestampStringRecordCursorFactory.java 14 20 70.00%
🔵 io/questdb/griffin/engine/join/AsOfJoinSingleSymbolRecordCursorFactory.java 116 128 90.62%
🔵 io/questdb/griffin/SqlCodeGenerator.java 44 45 97.78%
🔵 io/questdb/griffin/engine/join/SingleSymbolColumnAccessHelper.java 4 4 100.00%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants