Skip to content

perf(sql): optimize parallel HORIZON JOIN for various data distributions#6867

Merged
bluestreak01 merged 19 commits intomasterfrom
puzpuzpuz_deterministic_horizon_join_perf
Mar 16, 2026
Merged

perf(sql): optimize parallel HORIZON JOIN for various data distributions#6867
bluestreak01 merged 19 commits intomasterfrom
puzpuzpuz_deterministic_horizon_join_perf

Conversation

@puzpuzpuz
Copy link
Copy Markdown
Contributor

@puzpuzpuz puzpuzpuz commented Mar 11, 2026

Summary

  • HORIZON JOIN's keyed ASOF lookup now starts in backward-only mode (cheap for low-cardinality key spaces) and adaptively switches to forward scan mode within a frame when backward scan cost
    becomes excessive — e.g., with high-cardinality symbols or rare/infrequent keys that cause deep backward scans.
  • The switch uses two criteria: a relative threshold (SWITCH_FACTOR=8, MIN_GAP=1,024) where backward scan cost at a position must exceed 8x the gap to trigger, and an absolute threshold (BWD_SCAN_ABSOLUTE_THRESHOLD=131,072) to handle cross-partition boundaries where the relative check cannot trigger. Once switched, the frame stays in forward mode for its remainder.
  • Re-enables the ASOF row cache in HorizonJoinTimeFrameHelper and resets bookmarks on toTop() to eliminate jitter from out-of-order frame processing.

Background

The previous backward-only strategy (commit 6145bafc) clears the key map on each ASOF position change and backward-scans ~K rows to repopulate it. This is 30x faster than forward scan for K=50 FX symbols, but degrades badly when K is large (e.g., K=5,000 equity tickers) or when rare keys appear infrequently — each rare-key lookup scans thousands of rows backward to the beginning of the table.

The adaptive approach retains the backward-only advantage for the common low-K case while automatically switching to forward scan when the data distribution makes backward scans expensive.

How it works

Each frame starts in backward-only mode. On each ASOF position change, the algorithm compares the backward scan row count at the previous position against the gap (row distance) to the current position. If bwdScanCost > gap * 8 and gap > 1,024, it switches to forward scan mode permanently for that frame.

Gap calculation uses raw rowId subtraction (asOfRowId - prevAsOfRowId). For same-frame positions, this gives the exact row count. For cross-frame positions, the frame-index bits produce a value >= 2^44, which safely prevents the relative switch condition from triggering. An absolute threshold (bwdScanCost > 131,072) handles the cross-partition case — e.g., a deep backward scan for a rare key right before a partition boundary.

Benchmark results

Benchmarked with 100M-row market_data and 50K-row fx_trades tables across five data distribution scenarios (see this gist).

Test query:

SELECT
    t.symbol,
    t.side,
    h.offset / 1000000000 AS horizon_sec,
    count() AS n,
    avg(
        CASE t.side
            WHEN 'buy'  THEN ((m.best_bid + m.best_ask) / 2 - t.price)
                             / t.price * 10000
            WHEN 'sell' THEN (t.price - (m.best_bid + m.best_ask) / 2)
                             / t.price * 10000
        END
    ) AS avg_markout_bps,
    sum(
        CASE t.side
            WHEN 'buy'  THEN ((m.best_bid + m.best_ask) / 2 - t.price)
                             * t.quantity
            WHEN 'sell' THEN (t.price - (m.best_bid + m.best_ask) / 2)
                             * t.quantity
        END
    ) AS total_pnl
FROM fx_trades_xyz t
HORIZON JOIN market_data_xyz m ON (symbol)
    RANGE FROM 0s TO 30s STEP 5s AS h;

Dev box results

Environment: GraalVM CE 17, Ryzen 7900x, 64GB RAM

Scenario Patch Master Change
dense (K=10, gap~10) 175ms 231ms -24%
sparse (K=15, gap~2000) 13ms 406ms -97%
rare (K=200, Zipf 2.0) 232ms 448ms -48%
equity (K=5000) 1.19s 1.4s -15%
injected (10 common + 5 rare at start) 171s 156s +10%

The _injected scenario is a synthetic worst case where rare symbols appear only once at the very beginning of market_data. Both master and patch handle it poorly (156s and 171s respectively). The patch regresses by 10% because backward-only mode must complete at least one deep scan before the adaptive switch triggers, whereas master's always-forward approach never clears the key map.

c6a.16xlarge EC2 instance results

Scenario Master Patch Change
dense 422.8 ms 331.8 ms -21.5%
sparse 514.2 ms 19.4 ms -96.2%
rare 629.3 ms 509.9 ms -19.0%
equity 1293.5 ms 1120.0 ms -13.4%
injected 121023.7 ms 122681.8 ms +1.4%

The patch shows clear wins on dense (-21%), sparse (-96%, dramatic), rare (-19%), and equity (-13%). The injected scenario (rare symbols planted at the very start forcing full backward scans) is essentially unchanged (+1.4%, within noise).

The sparse result stands out — the adaptive scan switch appears to be extremely effective when the gap between LHS rows is large (~2000 market_data rows per trade) with few symbols.

Also, no huge jitter noticed: the worst jitter was ~26% and was in the equity scenario. In other scenarios it was within 3-10%.

Test plan

  • Existing HorizonJoinTest (113 tests) and HorizonJoinFuzzTest pass
  • New testHorizonJoinKeyedAdaptiveScanSwitch in HorizonJoinTest: deterministic test with a rare key at row 0 that triggers the backward-to-forward switch
  • New testParallelHorizonJoinAdaptiveScan and testParallelHorizonJoinAdaptiveScan2 in ParallelHorizonJoinFuzzTest: parallel tests with microsecond-spaced data in one partition, comparing HORIZON JOIN results against reference UNION ALL of ASOF JOINs
  • Manual benchmarking with test_distributions.sql scenarios (dense, sparse, rare, equity, injected) to verify switching behavior across different cardinalities and distributions

@puzpuzpuz puzpuzpuz self-assigned this Mar 11, 2026
@puzpuzpuz puzpuzpuz added Bug Incorrect or unexpected behavior SQL Issues or changes relating to SQL execution Performance Performance improvements labels Mar 11, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 11, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b661e7ac-f455-4cf5-a7ef-ee5552c12056

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch puzpuzpuz_deterministic_horizon_join_perf
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

You can validate your CodeRabbit configuration file in your editor.

If your editor has YAML language server, you can enable auto-completion and validation by adding # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json at the top of your CodeRabbit configuration file.

@puzpuzpuz puzpuzpuz marked this pull request as draft March 11, 2026 14:20
@puzpuzpuz puzpuzpuz force-pushed the puzpuzpuz_deterministic_horizon_join_perf branch from 5350812 to 1be65f9 Compare March 11, 2026 16:31
@puzpuzpuz puzpuzpuz changed the title fix(sql): remove jitter in HORIZON JOIN queries over large tables fix(sql): optimize parallel HORIZON JOIN for different data distributions Mar 13, 2026
@puzpuzpuz puzpuzpuz changed the title fix(sql): optimize parallel HORIZON JOIN for different data distributions fix(sql): optimize parallel HORIZON JOIN for various data distributions Mar 13, 2026
@puzpuzpuz puzpuzpuz removed the Bug Incorrect or unexpected behavior label Mar 13, 2026
@puzpuzpuz puzpuzpuz changed the title fix(sql): optimize parallel HORIZON JOIN for various data distributions perf(sql): optimize parallel HORIZON JOIN for various data distributions Mar 13, 2026
@puzpuzpuz puzpuzpuz marked this pull request as ready for review March 13, 2026 16:52
@glasstiger
Copy link
Copy Markdown
Contributor

[PR Coverage check]

😍 pass : 85 / 89 (95.51%)

file detail

path covered line new line coverage
🔵 io/questdb/cairo/DefaultCairoConfiguration.java 0 3 00.00%
🔵 io/questdb/griffin/engine/table/HorizonJoinTimeFrameHelper.java 23 24 95.83%
🔵 io/questdb/PropertyKey.java 3 3 100.00%
🔵 io/questdb/cairo/CairoConfigurationWrapper.java 3 3 100.00%
🔵 io/questdb/PropServerConfiguration.java 6 6 100.00%
🔵 io/questdb/griffin/engine/table/AsyncHorizonJoinRecordCursorFactory.java 22 22 100.00%
🔵 io/questdb/griffin/engine/table/BaseAsyncHorizonJoinAtom.java 6 6 100.00%
🔵 io/questdb/griffin/engine/table/HorizonJoinRecordCursorFactory.java 22 22 100.00%

@bluestreak01 bluestreak01 merged commit 6c79b41 into master Mar 16, 2026
53 checks passed
@bluestreak01 bluestreak01 deleted the puzpuzpuz_deterministic_horizon_join_perf branch March 16, 2026 14:19
maciulis pushed a commit to maciulis/questdb that referenced this pull request Mar 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Performance Performance improvements SQL Issues or changes relating to SQL execution

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants