feat(sql): ASOF and LT JOIN TOLERANCE support#5713
Conversation
and the slave table does not contain a given symbol at all
|
could we improve ergonomics of this error message - it is total dead end for the user: edit by @jerrinot: done |
|
could we also open a PR to improve syntax highlighting? edit by @jerrinot: PR pending: questdb/sql-grammar#47 |
|
These two errors should be the same: and edit by @jerrinot: done |
|
can we say what the limit is? edit by @jerrinot: done |
|
asof perf is great, just the ergonomics of the SQL could be better |
|
also, as a nit, it would be good to throttle back the asof fuzz test, these tests take close to 1 min on my desktop. A way to do it is not to go through all combinations at all times, but rather pick a combination of parameters randomly. Right now the test randomizes the data, but not the parameter combinations. edit by @jerrinot: done |
|
an unrelated failure, fixed by #5758 |
|
grammar PR: questdb/sql-grammar#47 |
|
@bluestreak01: about year units: we would need to adjust for leap years. for each master row. instead of using the same constant for all rows as we do now. do you think it's worth it? it could have a performance impact too. |
|
@jerrinot we support months - don't we already adjust for leap years? |
|
@bluestreak01 TOLERANCE does not support months either. For the same reason. See supported units: questdb/core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java Lines 771 to 794 in 87117b6 I took the list of units from SAMPLE BY, but removed units where SAMPLE BY has a dedicated sampler instead of converting the period to micros. |
[PR Coverage check]😍 pass : 707 / 724 (97.65%) file detail
|
Documents questdb/questdb#5713 Besides documenting the `TOLERANCE` keyword it also change the SQL HINTS page to reflect ASOF/LT JOINs noiw uses binary search by default. --------- Co-authored-by: Emre Berk Kaya <[email protected]> Co-authored-by: Nick Woolmer <[email protected]>






Currently, ASOF JOIN matches records from the right table with timestamps that are equal to or earlier than the timestamp in the left table. This PR addresses the feature request to limit how far back in time the join should look for a match.
This enhancement adds a
TOLERANCEclause to the ASOF and LT JOIN syntax. TheTOLERANCEparameter accepts a time interval value (e.g., 2s, 100ms, 1d). When specified, a record from the left table t1 at t1.ts will only be joined with a record from the right table t2 at t2.ts if t2.ts <= t1.ts AND t1.ts - t2.ts <= tolerance_value.This provides more fine-grained control over ASOF joins, particularly useful in scenarios with sparse data where a simple "equal or earlier" match might pick records that are too distant in time to be relevant.
Or without keys:
Performance impact
Specifying
TOLERANCEcan also improve performance.ASOF JOINexecution plans often scan backward in time on the right table to find a matching entry for each left-table row.TOLERANCEallows these scans to terminate early - once a right-table record is older than the left-table record by more than the specified tolerance - thus avoiding unnecessary processing of more distant records.💥 Breaking Change: TOLERANCE as a new keyword in JOIN
This change introduces
TOLERANCEas a new keyword specifically within the JOIN clause. This may break existing queries whereTOLERANCEwas used as an unquoted table alias for the right-hand table in a JOIN.Example of affected query:
Reason for breakage:
After this change,
TOLERANCEin the position above is interpreted as the new keyword. The query will fail because the parser expects an interval value (e.g., 2s) to follow theTOLERANCEkeyword, which is missing in the example.Solution:
To use
TOLERANCEas a table alias in this context, it must now be enclosed in double-quotes, as per standard SQL for identifiers that are also keywords:Additional optimizations
This PR introduces a performance enhancement for specific keyed ASOF JOIN scenarios, particularly when the join key is of the
SYMBOLtype:The optimization works as follows: When processing a row from the left_table, if its particular
SYMBOLkey value is entirely absent in the right_table's corresponding symbol_key column (meaning no records in right_table share this key value), the improved execution plan can now detect this. By "exiting early" from the search for this non-existent key, the overall query performance can be significantly improved, especially in cases with many such missing keys.Closes #5562
Documentation PR: questdb/documentation#195