perf(sql): optimized Markout Horizon CROSS JOIN #6283
Merged
bluestreak01 merged 243 commits into master on Nov 21, 2025
Addressed in a1abc77
puzpuzpuz reviewed on Nov 20, 2025, with resolved threads on:
core/src/main/java/io/questdb/griffin/engine/join/MarkoutHorizonRecordCursorFactory.java (four threads, two of them on outdated code)
core/src/test/java/io/questdb/test/griffin/engine/join/AsOfJoinFuzzTest.java
puzpuzpuz previously approved these changes on Nov 20, 2025
[PR Coverage check]😍 pass : 334 / 353 (94.62%)
bluestreak01 approved these changes on Nov 21, 2025
This is the Markout Curve query we're optimizing:
This subquery generates the sampling points over the markout horizon of every order:
It performs poorly with a large number of orders, because the CROSS JOIN output must be fully materialized and then sorted.
This PR introduces a special-case cursor factory that emits the CROSS JOIN output directly in the required order, without having to materialize the `orders` table. It does materialize `offsets` because it needs random access to it, but this table is very small, since it comes from `long_sequence()`.

Example usage and benchmark:
This creates 3,000,001 orders spaced at 200 µs.
This creates the markout horizon sampling grid over 10 minutes, spaced at 10 seconds. There are 121 sampling points for each order. Therefore, this results in 121 * 3,000,001 = 363,000,121 rows.
The query is built to emphasize the worst case for the Markout Horizon algo in terms of RAM usage: tight spacing of orders vs. the markout horizon. The algorithm must hold 3 million iterator structures in RAM at once. It uses 40 bytes per iterator.
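To make the memory argument concrete, here is a minimal Java sketch of the general idea: one iterator per order, merged through a priority queue so that horizon timestamps come out in ascending order without sorting the full CROSS JOIN output. This is not the code in `MarkoutHorizonRecordCursorFactory`; the class `MarkoutHorizonSketch`, the method `emit`, and the field `peakActive` are hypothetical names, and the sketch assumes that order timestamps and offsets are both sorted ascending.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only, not QuestDB's implementation.
class MarkoutHorizonSketch {
    // Peak number of simultaneously active per-order iterators in the last run.
    static int peakActive;

    // Assumes orderTs and offsets are sorted ascending and offsets is non-empty.
    // Returns rows {horizonTs, orderIndex, offsetIndex} ordered by horizonTs.
    static List<long[]> emit(long[] orderTs, long[] offsets) {
        // Heap entries are iterators: {current horizonTs, orderIndex, offsetIndex}.
        PriorityQueue<long[]> heap = new PriorityQueue<>((a, b) -> {
            int c = Long.compare(a[0], b[0]);
            return c != 0 ? c : Long.compare(a[1], b[1]);
        });
        List<long[]> out = new ArrayList<>();
        int next = 0; // first order that has no iterator yet
        peakActive = 0;
        while (next < orderTs.length || !heap.isEmpty()) {
            // Activate every order whose first sampling point is not later
            // than the smallest timestamp currently in the heap.
            while (next < orderTs.length
                    && (heap.isEmpty() || orderTs[next] + offsets[0] <= heap.peek()[0])) {
                heap.add(new long[]{orderTs[next] + offsets[0], next, 0});
                next++;
            }
            peakActive = Math.max(peakActive, heap.size());
            long[] it = heap.poll();
            out.add(new long[]{it[0], it[1], it[2]});
            // Advance this order's iterator to its next offset, if any.
            int k = (int) it[2] + 1;
            if (k < offsets.length) {
                int o = (int) it[1];
                heap.add(new long[]{orderTs[o] + offsets[k], o, k});
            }
        }
        return out;
    }
}
```

In this sketch, an order needs a live iterator from its first sampling point until its last, so the peak count equals the number of orders whose horizons overlap. When orders are tightly spaced relative to the horizon, as in the benchmark, every order is active at once. At 40 bytes per iterator, 3,000,001 iterators come to roughly 120 MB, which is in line with the reported RAM increase from 2.3 GB to 2.4 GB.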
I benchmarked it on an `r7a.4xlarge` EC2 box.

Without the markout hint, the query took 135 seconds, and RAM usage went from a 2.3 GB baseline to 10.7 GB.
With the hint, the query took 17 seconds, with an even split between aggregation and row generation. RAM usage went from 2.3 to 2.4 GB.