perf(sql): optimize ASOF JOIN for dense interleaving of left-hand and right-hand rows by mtopolnik · Pull Request #6362 · questdb/questdb

mtopolnik · 2025-11-07T16:51:56Z

The Dense ASOF JOIN algorithm is a variant of the Light algorithm:

Light algo starts the scan of the right-hand table at the top
Dense algo uses binary search to quickly jump to the timestamp matching the first left-hand row

This difference is highly important when the right-hand table has history that predates the first left-hand row. The Dense algo skips the entire history, except for the recent part needed to find matches for the first few left-hand rows.

Let's use the diagram below to explain the key differences among the algorithms. It shows two tables, LHS and RHS. LHS rows are less densely distributed over time than RHS rows, but not by much. We show the rows aligned on timestamp, so there are gaps in the LHS column. The gaps don't represent any LHS rows; they are just an artifact of how we visualize the two tables.

row | LHS | RHS
----|-----|----
 1  |     | G
 2  |     | C
 3  |     | G
 4  |     | A
 5  |     | F
 6  |   A | B
 7  |     | D
 8  |     | B
 9  |   C | G
10  |     | F
11  |     | D
12  |   B | E
13  |     | D
14  |     | C
15  |   A | B

Light algo

Light algo uses a forward-only scan of the RHS table. When matching the first LHS symbol (row 6, symbol A), it starts from RHS row 1 and proceeds all the way to row 6, collecting all the symbols into a hashtable. When done, it looks up symbol A in the hashtable and finds that the prevailing RHS row is row 4. When matching the next LHS symbol (row 9, symbol C), it resumes the forward scan, touching rows 7, 8 and 9. Then it looks up symbol C and finds that the prevailing row is row 2.
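To make the walkthrough concrete, here is a minimal Java sketch of the Light strategy replaying the diagram. It is an illustration under simplifying assumptions, not the QuestDB cursor: the RHS table is modeled as a `char[]` of symbols whose timestamps equal their 1-based row numbers, and `LightAsofSketch` and `rowsTouched` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the Light strategy on the example diagram.
// RHS timestamps equal their 1-based row numbers.
class LightAsofSketch {
    static final char[] RHS_SYM = "GCGAFBDBGFDEDCB".toCharArray();

    private final Map<Character, Integer> latestRowBySymbol = new HashMap<>();
    private int nextRow = 1; // next RHS row to read; the scan never moves backward
    int rowsTouched;

    /** Returns the prevailing RHS row for an LHS row, or -1. LHS rows must arrive in timestamp order. */
    int match(long lhsTs, char sym) {
        // Consume every RHS row whose timestamp (== row number here) is not newer than lhsTs.
        while (nextRow <= RHS_SYM.length && nextRow <= lhsTs) {
            latestRowBySymbol.put(RHS_SYM[nextRow - 1], nextRow); // later rows overwrite earlier ones
            nextRow++;
            rowsTouched++;
        }
        return latestRowBySymbol.getOrDefault(sym, -1);
    }

    public static void main(String[] args) {
        LightAsofSketch light = new LightAsofSketch();
        int[] rows = {light.match(6, 'A'), light.match(9, 'C'), light.match(12, 'B'), light.match(15, 'A')};
        System.out.println(Arrays.toString(rows) + ", rows touched: " + light.rowsTouched);
    }
}
```

Running `main` prints `[4, 2, 8, 4], rows touched: 15`: every RHS row is read exactly once, including the history in rows 1 to 5.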

Fast algo

Fast algo uses binary search over RHS timestamps to zero in on row 6 as the most recent row not newer than the first LHS row. Then it scans backward: rows 6, 5, 4, and there it finds the matching symbol A. When matching the next LHS symbol (row 9, symbol C), it uses binary search to zero in on RHS row 9, then scans all the way back to row 2, where it finds symbol C.

When matching symbol A in LHS row 15, it uses binary search to zero in on RHS row 15, then scans backward, again all the way back to row 4.

There's also an optimization that avoids the fixed cost of binary search: it first searches linearly for the matching RHS timestamp, for a small number of steps, and falls back to binary search only if that doesn't find it. This doesn't affect the backward search for the symbol.
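The Fast lookup, including the linear probe before binary search, can be sketched on the same example. This is a minimal illustration assuming non-decreasing LHS timestamps, so the previous timestamp position can seed the probe. `FastAsofSketch` is hypothetical, and `LINEAR_PROBE_STEPS = 4` is an arbitrary illustrative budget; the PR doesn't state the real one.

```java
import java.util.Arrays;

// Hypothetical sketch of the Fast strategy on the example diagram.
class FastAsofSketch {
    static final char[] RHS_SYM = "GCGAFBDBGFDEDCB".toCharArray();
    static final long[] RHS_TS = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
    static final int LINEAR_PROBE_STEPS = 4; // illustrative budget, not QuestDB's value

    private int lastIdx = -1; // index of the previous timestamp match (LHS timestamps never decrease)
    int rowsTouched;          // rows read by the backward symbol scan

    // Index of the last RHS row with ts <= target, or -1 if there is none.
    private int findTimestamp(long target) {
        int idx = lastIdx;
        // Cheap linear probe forward from the previous position.
        for (int step = 0; step < LINEAR_PROBE_STEPS; step++) {
            if (idx + 1 >= RHS_TS.length || RHS_TS[idx + 1] > target) {
                return lastIdx = idx;
            }
            idx++;
        }
        // Probe budget used up: binary search over the remaining rows.
        int lo = idx + 1, hi = RHS_TS.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (RHS_TS[mid] <= target) {
                idx = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return lastIdx = idx;
    }

    /** Returns the prevailing 1-based RHS row, or -1. */
    int match(long lhsTs, char sym) {
        // Scan backward from the timestamp match until the symbol shows up.
        for (int i = findTimestamp(lhsTs); i >= 0; i--) {
            rowsTouched++;
            if (RHS_SYM[i] == sym) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        FastAsofSketch fast = new FastAsofSketch();
        int[] rows = {fast.match(6, 'A'), fast.match(9, 'C'), fast.match(12, 'B'), fast.match(15, 'A')};
        System.out.println(Arrays.toString(rows) + ", rows touched: " + fast.rowsTouched);
    }
}
```

It finds the same rows as the Light sketch, `[4, 2, 8, 4]`, but the backward scans read 28 rows in total, because rows such as 4 to 8 are re-read for several LHS rows.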

Memoized algo

The Memoized algo is a variant of the Fast algo. It uses the exact same linear/binary search to find the matching timestamp in the RHS, and then the same backward search for the symbol. However, for each symbol it remembers where it started the backward search and where it found the symbol.

In our example, this means it handles the first LHS row (6) exactly the same way, scanning backward to row 4. But when it encounters the same symbol A in row 15, it scans backward only until reaching row 6, and then directly uses the remembered result of the previous scan, and matches up with row 4.
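The per-symbol memo can be sketched as follows, again on the example data. This is a hypothetical illustration (`MemoizedAsofSketch` and the `{startRow, foundRow}` layout are assumptions): it uses plain binary search for the timestamp, omitting the linear probe for brevity, and it relies on LHS rows arriving in timestamp order so a new scan never starts below a remembered start row.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the Memoized strategy on the example diagram.
class MemoizedAsofSketch {
    static final char[] RHS_SYM = "GCGAFBDBGFDEDCB".toCharArray();

    // Per symbol: {row where its last backward scan started, row where it was found (0 = not found)}.
    private final Map<Character, int[]> memo = new HashMap<>();
    int rowsTouched;

    // Binary search: last RHS row whose timestamp (== row number) is <= ts, or 0 if none.
    private static int lastRowNotAfter(long ts) {
        int lo = 1, hi = RHS_SYM.length, found = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (mid <= ts) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found;
    }

    int match(long lhsTs, char sym) {
        int start = lastRowNotAfter(lhsTs);
        int[] previous = memo.get(sym);
        // Rows at or below the previous start were already covered by the earlier scan.
        int stop = previous == null ? 0 : previous[0];
        int found = 0;
        for (int row = start; row > stop; row--) {
            rowsTouched++;
            if (RHS_SYM[row - 1] == sym) {
                found = row;
                break;
            }
        }
        if (found == 0 && previous != null) {
            found = previous[1]; // reuse the remembered result
        }
        memo.put(sym, new int[]{start, found});
        return found == 0 ? -1 : found;
    }

    public static void main(String[] args) {
        MemoizedAsofSketch memoized = new MemoizedAsofSketch();
        int[] rows = {memoized.match(6, 'A'), memoized.match(9, 'C'),
                      memoized.match(12, 'B'), memoized.match(15, 'A')};
        System.out.println(Arrays.toString(rows) + ", rows touched: " + memoized.rowsTouched);
    }
}
```

For LHS row 15 the scan reads rows 15 down to 7 and then takes row 4 from the memo, so the total drops from the Fast sketch's 28 rows to 25.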

With Drive-By caching enabled, the Memoized algo remembers not just the symbol it's looking for, but also every other symbol it passes during the backward scan. However, it can only remember a symbol on its first encounter. This is valuable for rare symbols that occur deep in the past, but otherwise it just adds overhead.

Dense algo

The Dense algo starts like the Fast algo, performing a binary search to zero in on RHS row 6 and searching backward to find symbol A in row 4 of RHS. From then on, it behaves more like the Light algo.

To match up LHS row 9 (symbol C), it first does a linear scan forward from row 6 to row 9 (exactly like the Light algo). Since it didn't find C in this scan, it resumes the backward scan, touching rows 3 and 2, and there it finds the symbol C.

At LHS row 12 (symbol B), it resumes the forward scan, touching rows 10, 11, and 12. Then it finds symbol B in the hashtable, getting row 8 as the prevailing row. No backward scan needed here.

At LHS row 15 (symbol A), it resumes the forward scan, touching rows 13, 14, and 15. Then it looks up symbol A in the hashtable of the forward scan, finding nothing. Then it looks up symbol A in the hashtable of the backward scan, and finds it there. The prevailing row is number 4. Again, no backward search was needed.
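The Dense walkthrough above can be replayed with a small Java sketch that keeps two hashtables, one filled by the forward scan and one by the resumable backward scan. This is a minimal illustration under the same assumptions as the diagram (timestamps equal 1-based row numbers, LHS rows in timestamp order); `DenseAsofSketch`, `fwdSymbolToRow` and `bwdSymbolToRow` are hypothetical names, and the real cursor works over time frames and symbol keys rather than a `char[]`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the Dense strategy, replaying the example diagram.
class DenseAsofSketch {
    static final char[] RHS_SYM = "GCGAFBDBGFDEDCB".toCharArray();

    private final Map<Character, Integer> fwdSymbolToRow = new HashMap<>(); // filled by the forward scan
    private final Map<Character, Integer> bwdSymbolToRow = new HashMap<>(); // filled by the backward scan
    private int forwardRow = -1; // last row consumed going forward; -1 until the first LHS row arrives
    private int backwardRow;     // next row to read going backward; 0 means history is exhausted
    int rowsTouched;

    private static long tsOf(int row) {
        return row; // timestamps equal row numbers in the example
    }

    // Binary search: last RHS row whose timestamp is <= ts, or 0 if there is none.
    private static int lastRowNotAfter(long ts) {
        int lo = 1, hi = RHS_SYM.length, found = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (tsOf(mid) <= ts) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found;
    }

    /** Returns the prevailing RHS row for an LHS row, or -1. */
    int match(long lhsTs, char sym) {
        if (forwardRow == -1) {
            // First LHS row: jump straight to its timestamp, skipping older history.
            forwardRow = lastRowNotAfter(lhsTs);
            backwardRow = forwardRow;
        }
        // Forward scan: consume only the rows that came into scope since the previous LHS row.
        while (forwardRow < RHS_SYM.length && tsOf(forwardRow + 1) <= lhsTs) {
            forwardRow++;
            rowsTouched++;
            fwdSymbolToRow.put(RHS_SYM[forwardRow - 1], forwardRow); // newer rows overwrite older ones
        }
        // Forward-scan rows are always newer than backward-scan rows, so check them first.
        Integer row = fwdSymbolToRow.get(sym);
        if (row == null) {
            row = bwdSymbolToRow.get(sym);
        }
        if (row != null) {
            return row;
        }
        // Resume the backward scan where it last stopped, until the symbol shows up.
        while (backwardRow >= 1) {
            int current = backwardRow--;
            char s = RHS_SYM[current - 1];
            rowsTouched++;
            bwdSymbolToRow.putIfAbsent(s, current); // first hit going backward is the most recent
            if (s == sym) {
                return current;
            }
        }
        return -1; // symbol never occurs at or before lhsTs
    }

    public static void main(String[] args) {
        DenseAsofSketch dense = new DenseAsofSketch();
        int[] rows = {dense.match(6, 'A'), dense.match(9, 'C'), dense.match(12, 'B'), dense.match(15, 'A')};
        System.out.println(Arrays.toString(rows) + ", rows touched: " + dense.rowsTouched);
    }
}
```

Running it prints `[4, 2, 8, 4], rows touched: 14`. Row 1 is never read, and the backward scan stops at row 2, as in the walkthrough.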

Discussion

We can see that the Fast and Memoized algos had to touch the most rows. In particular, when matching row 15, the Fast algo had to scan backward to row 4, and Memoized did only slightly better, scanning until row 6.

Light algo had to initially scan all the history (rows 1 to 6), but from then on, it only needed to touch the additional rows that came into scope as the LHS timestamp was moving on.

Dense algo had the same advantage as Light, but it didn't have to scan all the history. It scanned only as far back into history as needed to find the most recent occurrence of a symbol not yet seen in the forward scan.

Additional changes in the PR

The PR also optimizes symbol-to-symbol joins for the existing Light and Fast algos. It works through the RecordSink, which decides what data to copy from the table row to a buffer for comparison. Instead of copying the symbol string, it puts just the symbol key, after mapping the left-hand to the right-hand symbol key.

After applying this to the Light cursor, there was no more need for a dedicated Single Symbol Light cursor, so the PR removes it. A dedicated cursor that works directly with symbol keys is still faster, but the difference is 30%, which isn't worth it for the Light cursor.

The Dense cursor has two implementations, one of them specialized for symbol-to-symbol comparison. This cursor could be critical to the performance of markout analysis, so I thought the specialization was worth it. Both implementations share an abstract base class, avoiding code duplication.
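The core of the symbol-to-symbol optimization is translating a left-hand symbol key into the right-hand table's symbol key once, so the join can hash and compare plain ints instead of strings. A minimal sketch, assuming the master symbol table is modeled as a `String[]` (key to value) and the slave symbol table as a `HashMap` (value to key); `SymbolKeyMappingSketch`, `NOT_FOUND` and `maxCacheSize` are hypothetical names, not QuestDB's `SymbolToSymbolJoinKeyMapping` API. It also shows why a missing key permits a short circuit: if the symbol is absent from the right-hand symbol table, no right-hand row can match.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: map master (LHS) symbol keys to slave (RHS) symbol keys with a bounded cache.
class SymbolKeyMappingSketch {
    static final int NOT_FOUND = -2; // hypothetical sentinel, not QuestDB's constant

    private final String[] masterSymbols;          // master symbol table: key -> value
    private final Map<String, Integer> slaveKeys;  // slave symbol table: value -> key
    private final Map<Integer, Integer> cache = new HashMap<>();
    private final int maxCacheSize;
    int lookups; // string lookups performed against the slave symbol table

    SymbolKeyMappingSketch(String[] masterSymbols, String[] slaveSymbols, int maxCacheSize) {
        this.masterSymbols = masterSymbols;
        this.maxCacheSize = maxCacheSize;
        this.slaveKeys = new HashMap<>();
        for (int key = 0; key < slaveSymbols.length; key++) {
            slaveKeys.put(slaveSymbols[key], key);
        }
    }

    /** Slave symbol key for a master symbol key, or NOT_FOUND. */
    int slaveKey(int masterKey) {
        Integer cached = cache.get(masterKey);
        if (cached != null) {
            return cached;
        }
        lookups++;
        int slaveKey = slaveKeys.getOrDefault(masterSymbols[masterKey], NOT_FOUND);
        if (cache.size() < maxCacheSize) { // keep memory bounded for high-cardinality symbols
            cache.put(masterKey, slaveKey);
        }
        return slaveKey;
    }

    /** True when no slave row can possibly match, so the search can be skipped. */
    boolean isShortCircuit(int masterKey) {
        return slaveKey(masterKey) == NOT_FOUND;
    }

    public static void main(String[] args) {
        SymbolKeyMappingSketch mapping = new SymbolKeyMappingSketch(
                new String[]{"AAPL", "MSFT", "TSLA"}, new String[]{"TSLA", "AAPL"}, 16);
        System.out.println(mapping.slaveKey(0) + " " + mapping.isShortCircuit(1) + " " + mapping.slaveKey(2));
    }
}
```

`main` prints `1 true 0`. A repeated `slaveKey(0)` call is then served from the cache without another string lookup.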

Benchmarking

Measurements taken on r7a.4xlarge.

Tables

Trades: 167.5 million rows, Jan 2, 00:00 to Jan 2, 08:00
Prices: 1.01 billion rows, Jan 1, 16:00 to Jan 2, 08:00

The time span of prices is split 50/50 between history (before the first trade) and overlap with trades.

CREATE TABLE trades (
        symbol SYMBOL,
        side SYMBOL,
        price DOUBLE,
        amount DOUBLE,
        timestamp TIMESTAMP
) timestamp(timestamp) PARTITION BY DAY WAL;

INSERT INTO trades SELECT
    rnd_symbol_zipf(1_000, 2.0) AS symbol,
    rnd_symbol('buy', 'sell') as side,
    rnd_double() * 20 + 10 AS price,
    rnd_double() * 20 + 10 AS amount,
    generate_series as timestamp
  FROM generate_series('2025-01-02', '2025-01-02T08', '172u');

CREATE TABLE prices (
      ts TIMESTAMP,
      sym SYMBOL CAPACITY 1024,
      bid DOUBLE,
      ask DOUBLE
  ) timestamp(ts) PARTITION BY DAY;
INSERT INTO prices
  SELECT
      '2025-01-01T16'::timestamp + (57*x) + rnd_long(-20, 20, 0) as ts,
      rnd_symbol_zipf(1_000, 2.0),
      rnd_double() * 10.0 + 5.0,
      rnd_double() * 10.0 + 5.0
      FROM long_sequence(1_010_000_000);
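The row counts and the history split follow directly from the generator expressions above. A quick Java check (the class name is hypothetical) that ignores the ±20 µs jitter from `rnd_long(-20, 20, 0)`: trades get one row every 172 µs over 8 hours, and prices rows with 57·x below 8 hours land before 2025-01-02T00:00.

```java
// Back-of-the-envelope check of the benchmark table sizes (ignores the +/-20 us jitter).
class BenchmarkDatasetMath {
    static final long MICROS_PER_HOUR = 3_600_000_000L;
    static final long EIGHT_HOURS = 8 * MICROS_PER_HOUR;

    // trades: one row every 172 us from 2025-01-02T00:00 to 2025-01-02T08:00.
    // 8 h is not a multiple of 172 us, so the end bound is never hit exactly.
    static long tradesRows() {
        return EIGHT_HOURS / 172 + 1;
    }

    // prices: ts = 2025-01-01T16:00 + 57 * x us, for x = 1 .. 1_010_000_000.
    static long pricesSpanMicros() {
        return 57L * 1_010_000_000L;
    }

    // Rows with 57 * x < 8 h fall before 2025-01-02T00:00, i.e. in the history half.
    static long historyRows() {
        return (EIGHT_HOURS - 1) / 57;
    }

    public static void main(String[] args) {
        System.out.println("trades rows:  " + tradesRows());  // 167441861
        System.out.printf("prices span:  %.3f h%n", pricesSpanMicros() / (double) MICROS_PER_HOUR); // 15.992 h
        System.out.println("history rows: " + historyRows()); // 505263157
    }
}
```

That gives about 167.4 million trades, a prices span ending just before 08:00 on Jan 2, and 505,263,157 history rows, which matches the offset in the `LIMIT 505_263_157, 1_010_000_000` clause of the "Light with no history" query.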

Query

To avoid timeouts, we limit trades to 10 million rows for most measurements. At the end, we run ASOF Dense on the full dataset.

Fast (Default)

SELECT sum(bid)
FROM (trades limit 10_000_000) t
ASOF JOIN prices p on (t.symbol=p.sym);

59 seconds

Memoized

SELECT /*+ asof_memoized(t p) */ sum(bid)
FROM (trades limit 10_000_000) t
ASOF JOIN prices p on (t.symbol=p.sym);

42 seconds

Memoized with Drive-By Caching

SELECT /*+ asof_memoized_driveby(t p) */ sum(bid)
FROM (trades limit 10_000_000) t
ASOF JOIN prices p on (t.symbol=p.sym);

63 seconds

Light

SELECT /*+ asof_linear(t p) */ sum(bid)
FROM (trades limit 10_000_000) t
ASOF JOIN prices p on (t.symbol=p.sym);

9.7 seconds

Dense Single Symbol

SELECT /*+ asof_dense(t p) */ sum(bid)
FROM (trades limit 10_000_000) t
ASOF JOIN prices p on (t.symbol=p.sym);

730 milliseconds

Dense Single Symbol, Full Dataset

The full dataset is 16.7 times larger than the limited one used above.

SELECT /*+ asof_dense(t p) */ sum(bid)
FROM trades t
ASOF JOIN prices p on (t.symbol=p.sym);

10.9 seconds

Bonus: Light with no history

Here we remove the history part of the prices table, so it covers only the time period present in trades. This puts Light and Dense on an equal footing in terms of algorithm. The remaining difference is the additional single-symbol specialization in the Dense algo.

SELECT /*+ asof_linear(t p) */ sum(bid)
FROM (trades limit 10_000_000) t
ASOF JOIN (prices limit 505_263_157, 1_010_000_000) p on (t.symbol=p.sym);

1.17 seconds


coderabbitai · 2025-11-07T16:52:27Z

Important

Review skipped

Auto reviews are disabled on this repository.


bluestreak01 · 2025-11-10T13:37:33Z

@CodeRabbit please read PR description and review the PR. Pay attention to bugs, NPEs, off-by-ones, edge cases etc

coderabbitai · 2025-11-10T13:40:25Z


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get the list of changed files in this PR
gh pr view 6362 --json files --jq '.files[].path'

Length of output: 2597

🏁 Script executed:

#!/bin/bash
# Read the core Dense algorithm implementation files
cat core/src/main/java/io/questdb/griffin/engine/join/AsOfJoinDenseRecordCursorFactoryBase.java

Length of output: 15300

🏁 Script executed:

#!/bin/bash
cat core/src/main/java/io/questdb/griffin/engine/join/AsOfJoinDenseSingleSymbolRecordCursorFactory.java

Length of output: 5390

🏁 Script executed:

#!/bin/bash
cat core/src/main/java/io/questdb/griffin/engine/join/AsOfJoinDenseRecordCursorFactory.java

Length of output: 6526

🏁 Script executed:

#!/bin/bash
# Read symbol optimization related files
cat core/src/main/java/io/questdb/griffin/engine/join/SymbolToSymbolJoinKeyMapping.java

Length of output: 4103

🏁 Script executed:

#!/bin/bash
cat core/src/main/java/io/questdb/griffin/engine/join/ChainedSymbolShortCircuit.java

Length of output: 1810

🏁 Script executed:

#!/bin/bash
cat core/src/main/java/io/questdb/griffin/engine/join/SymbolKeyMappingRecordCopier.java

Length of output: 1715

🏁 Script executed:

#!/bin/bash
# Read the SQL code generator changes
cat core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java | head -n 500

Length of output: 34716

🏁 Script executed:

#!/bin/bash
# Continue reading SqlCodeGenerator to see the join-related code
cat core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java | tail -n +500 | head -n 1000

Length of output: 40930

🏁 Script executed:

#!/bin/bash
# Look for the ASOF join handling code in SqlCodeGenerator
rg -n -A 10 -B 3 "asof_dense|AsOfJoinDense" core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java

Length of output: 2800

🏁 Script executed:

#!/bin/bash
# Check the modified Light cursor implementation
cat core/src/main/java/io/questdb/griffin/engine/join/AsOfJoinLightRecordCursorFactory.java | head -n 300

Length of output: 11946

🏁 Script executed:

#!/bin/bash
# Check for potential null pointer issues in the symbol mapping classes
rg -n "null" core/src/main/java/io/questdb/griffin/engine/join/SymbolToSymbolJoinKeyMapping.java

Length of output: 281

🏁 Script executed:

#!/bin/bash
# Check the TimeFrameRecordCursor interface to understand frame boundaries
rg -n "getRowLo|getRowHi|getFrameIndex" core/src/main/java/io/questdb/cairo/sql/TimeFrameRecordCursor.java

Length of output: 0

🏁 Script executed:

#!/bin/bash
# Look for test coverage of edge cases
rg -n "testAsofDense|test.*Dense" core/src/test/java/io/questdb/test/griffin/engine/join/AsOfJoinTest.java

Length of output: 0

🏁 Script executed:

#!/bin/bash
# Check how slaveCursorReadyForForwardScan is managed
rg -n "slaveCursorReadyForForwardScan" core/src/main/java/io/questdb/griffin/engine/join/AsOfJoinDenseRecordCursorFactoryBase.java

Length of output: 285

🏁 Script executed:

#!/bin/bash
# Check BitSet usage and initialization
rg -n "class BitSet" core/src/main/java/io/questdb/std/BitSet.java

Length of output: 43

🏁 Script executed:

#!/bin/bash
# Find the TimeFrame interface definition
find core/src/main/java -name "TimeFrame.java" -o -name "*TimeFrame*.java" | head -5

Length of output: 203

🏁 Script executed:

#!/bin/bash
# Check the Rows utility class for row ID manipulation
rg -n "toPartitionIndex|toRowID|toRowIndex" core/src/main/java/io/questdb/std/Rows.java | head -20

Length of output: 128

🏁 Script executed:

#!/bin/bash
# Look for any test files that test the Dense algorithm
fd -e java ".*Dense.*Test" core/src/test/

Length of output: 109

🏁 Script executed:

#!/bin/bash
# Check how backwardScanExhausted and forwardScanExhausted are initialized in toTop
rg -n "backwardScanExhausted|forwardScanExhausted" core/src/main/java/io/questdb/griffin/engine/join/AsOfJoinDenseRecordCursorFactoryBase.java

Length of output: 405

🏁 Script executed:

#!/bin/bash
# Check if there are any issues with the initial state when forwardRowId == -1
rg -n -B 5 -A 10 "forwardRowId == -1" core/src/main/java/io/questdb/griffin/engine/join/AsOfJoinDenseRecordCursorFactoryBase.java

Length of output: 732

🏁 Script executed:

#!/bin/bash
# Check AbstractAsOfJoinFastRecordCursor for the nextSlave implementation
rg -n "nextSlave" core/src/main/java/io/questdb/griffin/engine/join/AbstractAsOfJoinFastRecordCursor.java | head -10

Length of output: 56

🏁 Script executed:

#!/bin/bash
# Check for potential integer overflow in row ID calculations
rg -n "Rows.toRowID" core/src/main/java/io/questdb/griffin/engine/join/AsOfJoinDenseRecordCursorFactoryBase.java

Length of output: 582

🏁 Script executed:

#!/bin/bash
# Check the complete toTop() method to see if backwardScanExhausted is reset
rg -n -A 15 "public void toTop\(\)" core/src/main/java/io/questdb/griffin/engine/join/AsOfJoinDenseRecordCursorFactoryBase.java

Length of output: 618

🏁 Script executed:

#!/bin/bash
# Check the TimeFrame interface to understand row boundaries
cat core/src/main/java/io/questdb/cairo/sql/TimeFrame.java

Length of output: 3432

🏁 Script executed:

#!/bin/bash
# Look at test files to understand expected behavior
rg -n "AsOfJoin.*dense|asof.*dense" -i core/src/test/java/io/questdb/test/griffin/engine/join/AsOfJoinTest.java | head -20

Length of output: 1049

🏁 Script executed:

#!/bin/bash
# Check if there's proper handling when slave cursor is empty
rg -n "hasSlave\(false\)" core/src/main/java/io/questdb/griffin/engine/join/AsOfJoinDenseRecordCursorFactoryBase.java

Length of output: 127

🏁 Script executed:

#!/bin/bash
# Check AbstractKeyedAsOfJoinRecordCursor for inherited state management
cat core/src/main/java/io/questdb/griffin/engine/join/AbstractKeyedAsOfJoinRecordCursor.java | head -100

Length of output: 4289

🏁 Script executed:

#!/bin/bash
# Check if symbolJoinKeyMapping can be null and if it's handled
rg -n "symbolJoinKeyMapping|joinKeyMapping" core/src/main/java/io/questdb/griffin/engine/join/AsOfJoinDenseSingleSymbolRecordCursorFactory.java

Length of output: 271

@bluestreak01, I've completed a thorough review of the Dense ASOF JOIN implementation. Here are my findings:

Critical Issues

1. Bug in `toTop()` - Missing Reset of `backwardScanExhausted`

File: AsOfJoinDenseRecordCursorFactoryBase.java, Lines 297-310

The toTop() method resets forwardScanExhausted to false but does not reset backwardScanExhausted. This will cause incorrect behavior when the cursor is reused:

public void toTop() {
    super.toTop();
    if (fwdScanKeyToRowId.isOpen()) {
        fwdScanKeyToRowId.clear();
    }
    if (bwdScanKeyToRowId.isOpen()) {
        bwdScanKeyToRowId.clear();
    }
    isMasterHasNextPending = true;
    slaveCursorReadyForForwardScan = false;
    forwardScanExhausted = false;  // ✓ reset
    backwardRowId = -1;
    forwardRowId = -1;
    // ❌ backwardScanExhausted NOT reset!
}

Impact: On the second iteration after toTop(), if backwardScanExhausted was true from the previous scan, the backward scan will be skipped entirely, leading to incorrect join results.

Fix: Add backwardScanExhausted = false; in the toTop() method.

2. Potential NPE in `SymbolToSymbolJoinKeyMapping`

File: SymbolToSymbolJoinKeyMapping.java, Line 51

The assertion assert slaveSymbolTable != null only runs in debug mode. In production, if of() is not called before getSlaveKey(), you'll get an NPE:

public int getSlaveKey(Record masterRecord) {
    assert slaveSymbolTable != null : "slaveSymbolTable must be set before calling getSlaveKey";
    // ❌ No production null check - will NPE if of() not called
    int masterKey = masterRecord.getInt(masterSymbolIndex);
    ...
}

Recommendation: Add a production null check or make the API safer by ensuring initialization in the constructor.

3. Missing Null Safety in `ChainedSymbolShortCircuit`

File: ChainedSymbolShortCircuit.java

The record class doesn't validate the mappings array:

public record ChainedSymbolShortCircuit(
        SymbolJoinKeyMapping[] mappings
) implements SymbolShortCircuit {
    @Override
    public boolean isShortCircuit(Record masterRecord) {
        for (int i = 0, n = mappings.length; i < n; i++) {
            // ❌ NPE if mappings is null or contains null elements
            if (mappings[i].getSlaveKey(masterRecord) == StaticSymbolTable.VALUE_NOT_FOUND) {

Recommendation: Add validation or @NotNull annotations.

Potential Issues

4. Backward Scan Frame Boundary Edge Case

File: AsOfJoinDenseRecordCursorFactoryBase.java, Lines 250-280

In the backward scan loop, when transitioning to a previous frame:

if (backwardRowId > frameRowLo) {
    backwardRowId--;
} else {
    if (!slaveTimeFrameCursor.prev()) {
        backwardScanExhausted = true;
        break;
    }
    slaveTimeFrameCursor.open();
    int frameIndex = slaveTimeFrame.getFrameIndex();
    frameRowLo = Rows.toRowID(frameIndex, slaveTimeFrame.getRowLo());
    backwardRowId = Rows.toRowID(frameIndex, slaveTimeFrame.getRowHi() - 1);
}

Analysis: This is actually correct. The code properly handles:

getRowLo() is inclusive, getRowHi() is exclusive (confirmed from TimeFrame interface)
Subtraction of 1 from getRowHi() gets the last valid row
The check backwardRowId > frameRowLo (not >=) is correct for inclusive lower bound

5. Forward Scan Frame Transition

File: AsOfJoinDenseRecordCursorFactoryBase.java, Lines 315-340

The forward scan correctly handles frame boundaries:

forwardRowId++;
if (forwardRowId == frameRowHi) {  // ✓ Correct: frameRowHi is exclusive
    if (!slaveTimeFrameCursor.next()) {
        forwardScanExhausted = true;
        break;
    }
    ...
}

This is correct - using == with exclusive frameRowHi.
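The boundary conditions checked in items 4 and 5 (inclusive `getRowLo()`, exclusive `getRowHi()`) can be exercised in isolation. A minimal sketch, assuming frames are plain `[rowLo, rowHi)` ranges held in arrays; `FrameScanSketch` is hypothetical, keeps `(frame, row)` as separate fields instead of a `Rows.toRowID()` value, and also skips empty frames, an assumption added for safety.

```java
// Hypothetical sketch: stepping a row position across time frames whose ranges are
// [rowLo, rowHi), i.e. rowLo inclusive and rowHi exclusive.
class FrameScanSketch {
    private final long[] rowLo;
    private final long[] rowHi;
    int frame;
    long row;

    FrameScanSketch(long[] rowLo, long[] rowHi, int frame, long row) {
        this.rowLo = rowLo;
        this.rowHi = rowHi;
        this.frame = frame;
        this.row = row;
    }

    /** Moves to the previous row; returns false (position unchanged) once the backward scan is exhausted. */
    boolean prev() {
        if (row > rowLo[frame]) { // strictly greater: rowLo itself is a valid row
            row--;
            return true;
        }
        for (int f = frame - 1; f >= 0; f--) {
            if (rowHi[f] > rowLo[f]) { // skip empty frames
                frame = f;
                row = rowHi[f] - 1; // last valid row, since rowHi is exclusive
                return true;
            }
        }
        return false;
    }

    /** Moves to the next row; returns false (position unchanged) once the forward scan is exhausted. */
    boolean next() {
        if (row + 1 < rowHi[frame]) {
            row++;
            return true;
        }
        for (int f = frame + 1; f < rowLo.length; f++) {
            if (rowHi[f] > rowLo[f]) { // skip empty frames
                frame = f;
                row = rowLo[f]; // first valid row, since rowLo is inclusive
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Frame 0 holds rows 0..2, frame 1 is empty, frame 2 holds rows 5..6.
        FrameScanSketch scan = new FrameScanSketch(new long[]{0, 0, 5}, new long[]{3, 0, 7}, 2, 6);
        StringBuilder backward = new StringBuilder();
        while (scan.prev()) {
            backward.append(scan.frame).append(':').append(scan.row).append(' ');
        }
        StringBuilder forward = new StringBuilder();
        while (scan.next()) {
            forward.append(scan.frame).append(':').append(scan.row).append(' ');
        }
        System.out.println("backward: " + backward + "| forward: " + forward);
    }
}
```

Starting at frame 2, row 6, the backward walk visits `2:5 0:2 0:1 0:0` and the forward walk from there visits `0:1 0:2 2:5 2:6`, so no row is skipped or visited twice at a frame boundary.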

6. Symbol Cache Management

File: SymbolToSymbolJoinKeyMapping.java, Lines 77-78

// we reserve space in the cache for null, so < instead of <=
if (masterKeyToSlaveKey.size() < maxCacheSize) {

This is intentional and correct - the comment explains the design decision.

Minor Observations

7. Circuit Breaker Not Used in Dense Cursor

The circuitBreaker parameter is passed to of() but never actually checked during the scanning loops. The Light and Fast algorithms likely check it. Consider adding circuit breaker checks in the forward/backward scan loops for long-running joins.

8. Map Memory Not Freed on Error

File: AsOfJoinDenseSingleSymbolRecordCursorFactory.java, Lines 62-79

The error handling properly frees maps with try-catch, which is good:

try {
    fwdScanKeyToRowId = MapFactory.createUnorderedMap(...);
    bwdScanKeyToRowId = MapFactory.createUnorderedMap(...);
    this.cursor = new AsOfJoinDenseSingleSymbolRecordCursor(...);
} catch (Throwable th) {
    Misc.free(bwdScanKeyToRowId);
    Misc.free(fwdScanKeyToRowId);
    close();
    throw th;
}

This is correct - good resource management.

Positive Findings

✅ Frame boundary handling is correct (inclusive lo, exclusive hi)
✅ Row ID calculations use Rows.toRowID() consistently
✅ Timestamp scaling is handled properly
✅ Tolerance interval logic is correctly applied
✅ State initialization when forwardRowId == -1 looks correct
✅ Map evacuation strategy is reasonable (no evacuation for performance)
✅ Resource cleanup in factories is proper

Summary

The implementation is generally solid, but the missing reset of backwardScanExhausted in toTop() is a critical bug that will cause correctness issues. The NPE risks are lower priority but should be addressed for robustness.

mtopolnik · 2025-11-10T14:28:02Z

@coderabbitai

Missing Reset of backwardScanExhausted

Results of backward scan are reusable across cursor rewinds.

The assertion assert slaveSymbolTable != null only runs in debug mode. In production, if of() is not called before getSlaveKey(), you'll get an NPE.

Yes. Both are equally bad, and aren't supposed to happen. The assert reassures the IDE that the field isn't expected to ever be null.

ChainedSymbolShortCircuit record class doesn't validate the mappings array

The caller guarantees no nulls in array.

coderabbitai · 2025-11-10T14:28:49Z


@mtopolnik, thank you for the clarifications!

Backward scan reusability: Understood - the backward scan results are intentionally cached and reusable across toTop() rewinds. That's a smart optimization that avoids re-scanning historical data that hasn't changed.

Assertion pattern: Makes sense - the assert serves as an IDE hint while the API contract ensures of() is always called before getSlaveKey().

Caller guarantees: Acknowledged - the ChainedSymbolShortCircuit relies on the caller to provide valid non-null arrays.

All three points are intentional design decisions. The implementation looks solid!

✏️ Learnings added

Learnt from: mtopolnik
URL: https://github.com/questdb/questdb/pull/6362

Timestamp: 2025-11-10T14:28:48.307Z
Learning: In AsOfJoinDenseRecordCursorFactoryBase.java, the `backwardScanExhausted` flag is intentionally NOT reset in `toTop()` because backward scan results are reusable across cursor rewinds. The backward scan caches historical matches that remain valid when the cursor is rewound.


glasstiger · 2025-11-10T20:46:41Z

[PR Coverage check]

😍 pass : 368 / 389 (94.60%)

file detail

	path	covered line	new line	coverage
🔵	io/questdb/griffin/engine/table/TimeFrameRecordCursorImpl.java	0	2	00.00%
🔵	io/questdb/griffin/engine/join/SymbolKeyMappingRecordCopier.java	2	4	50.00%
🔵	io/questdb/griffin/engine/join/ChainedSymbolShortCircuit.java	5	6	83.33%
🔵	io/questdb/griffin/engine/join/AsOfJoinDenseSingleSymbolRecordCursorFactory.java	30	35	85.71%
🔵	io/questdb/griffin/engine/join/AsOfJoinDenseRecordCursorFactory.java	43	48	89.58%
🔵	io/questdb/griffin/engine/join/AsOfJoinDenseRecordCursorFactoryBase.java	154	159	96.86%
🔵	io/questdb/griffin/SqlCodeGenerator.java	81	82	98.78%
🔵	io/questdb/griffin/engine/join/StringToSymbolJoinKeyMapping.java	2	2	100.00%
🔵	io/questdb/griffin/engine/join/VarcharToSymbolJoinKeyMapping.java	2	2	100.00%
🔵	io/questdb/griffin/engine/join/AsOfJoinNoKeyFastRecordCursorFactory.java	1	1	100.00%
🔵	io/questdb/std/BitSet.java	4	4	100.00%
🔵	io/questdb/griffin/engine/join/LtJoinNoKeyFastRecordCursorFactory.java	1	1	100.00%
🔵	io/questdb/griffin/engine/join/AsOfJoinFastRecordCursorFactory.java	4	4	100.00%
🔵	io/questdb/griffin/engine/join/FilteredAsOfJoinNoKeyFastRecordCursorFactory.java	1	1	100.00%
🔵	io/questdb/griffin/SqlHints.java	1	1	100.00%
🔵	io/questdb/griffin/engine/join/AsOfJoinIndexedRecordCursorFactory.java	3	3	100.00%
🔵	io/questdb/griffin/engine/join/SymbolToSymbolJoinKeyMapping.java	12	12	100.00%
🔵	io/questdb/griffin/engine/join/FilteredAsOfJoinFastRecordCursorFactory.java	1	1	100.00%
🔵	io/questdb/griffin/engine/join/NoopSymbolShortCircuit.java	2	2	100.00%
🔵	io/questdb/griffin/engine/join/AsOfJoinMemoizedRecordCursorFactory.java	3	3	100.00%
🔵	io/questdb/griffin/engine/join/AsOfJoinLightRecordCursorFactory.java	16	16	100.00%

puzpuzpuz · 2025-11-18T09:58:13Z

core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java

+                            master,
+                            slave,
+                            keyTypes,
+                            new SymbolKeyMappingRecordCopier(joinKeyMapping),


@mtopolnik this change broke consistency between the key types provided to the map and the actual Key calls: keyTypes contains [11] (single string key), while the actual copier uses the Key#putInt() method.

mtopolnik and others added 30 commits October 30, 2025 15:24

Prototype

714570a

Small fix

3e1a500

Modified hashcode

98776a7

bugfix, integration, test

ffa59a8

Merge branch 'master' into mt_asof-linear-symbol

1835239

remove debris

8f3e640

fixes

8e29a58

Merge remote-tracking branch 'origin/master' into mt_asof-linear-symbol

fa6710d

Merge branch 'master' into mt_asof-linear-symbol

38a1926

AsOfJoinDenseRecordCursorFactory

b65ba09

Fix log line

6a01908

Simplify expression in Ligt record cursor

19a6f62

Merge branch 'master' into mt_asof-linear-symbol

760b58a

fix bugs found in review

d3b969c

oops

d8ed506

Fix some bugs

901fabe

Update test

a8c3bb7

Fix some bugs

5eceb55

Fix AsOfJoinFuzzTest bug

acaebfc

Fix bug

ae54424

Merge remote-tracking branch 'origin/mt_asof-linear-symbol' into mt_asof-linear-symbol

55625e8

Revert default ASOF JOIN to Fast

63f49d3

Merge branch 'master' into mt_asof-dense

1db9317

Cleanup after merge

62d2fe8

Fix test

a411ec2

Fix test

50ed933

Auto-format

f66b22d

Formatting

8d5387c

Formatting

e56e889

Formatting

274b911

mtopolnik added 10 commits November 7, 2025 12:31

Remove unnecessary ignores

13876f4

Stronger self-join test

5bdb549

Test self-join optimization with time offset

e853834

Simplify tests using multiline strings

d7ac53b

Implement Single Symbol Dense Cursor

7abbfa0

Merge branch 'mt_asof-cleanup' into mt_asof-dense

b5c1a60

Let Light cursor set symbolTable on RecordCopier

9f28713

Short-circuit Dense cursor when symbol not found

52cbacb

Delete test of deleted cursor

feb4588

Delete Single Symbol Light cursor

3903341

mtopolnik added 7 commits November 7, 2025 18:01

Remove extra blank line

2071edc

Merge branch 'master' into mt_asof-dense

e827978

Merge branch 'master' into mt_asof-dense

6c4a201

Remove "Scan" from cursor names

d0b772b

Merge branch 'master' into mt_asof-dense

2ee73fe

Remove "Scan" from cursor names

d1125a6

Simplify test assertion

6d0ee6a

mtopolnik and others added 2 commits November 10, 2025 15:30

Add circuit breaker and @NotNull

8aac540

small cleanup

32da72c

bluestreak01 approved these changes Nov 10, 2025


bluestreak01 merged commit f400522 into master Nov 10, 2025
36 checks passed

bluestreak01 deleted the mt_asof-dense branch November 10, 2025 21:05

puzpuzpuz reviewed Nov 18, 2025


tris0laris mentioned this pull request Nov 18, 2025

Real-time markouts for capital markets questdb/roadmap#98

Open


bluestreak01 merged 72 commits into master from mt_asof-dense


4 participants
