feat: custom join scan planning (with simple execution) #3930
Merged
- Add CompositeKey enum to store actual key values (not just hashes)
- Add KeyValue struct to store copied datum bytes for any PostgreSQL type
- Add JoinKeyInfo struct for runtime key extraction info
- Update JoinKeyPair to include type_oid, typlen, typbyval
- Update extract_join_conditions to capture type info from Vars
- Replace i64 hash key with CompositeKey for correct equality comparison
- Remove CROSS_JOIN_KEY constant in favor of CompositeKey::CrossJoin
- Add extract_composite_key and copy_datum_to_key_value helper functions
- Support varlena (TEXT, BYTEA), cstring, and fixed-length types

This fixes:
- Issue 1: Hash table key type limitation (was i64 only)
- Issue 2: Single join key only (now supports composite keys)
- Issue 3: Cross-join magic key collision (now uses distinct enum variant)
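The switch from an i64 hash to a value-carrying key can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the field layout of `KeyValue`, the `Values` variant name, and the helpers `text_key`, `int8_key`, and `build_table` are assumptions made for the example, and byte-wise equality of the copied datum is assumed to be an acceptable notion of key equality.

```rust
use std::collections::HashMap;

/// Copied datum bytes plus the type OID (hypothetical layout).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct KeyValue {
    type_oid: u32,
    bytes: Vec<u8>,
}

/// Hash table key that stores the actual values, so equality is decided
/// by comparing the values and not only their hashes.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum CompositeKey {
    /// No equi-join keys: a distinct variant that cannot collide with real values.
    CrossJoin,
    /// One KeyValue per join key column, in order (variant name is an assumption).
    Values(Vec<KeyValue>),
}

/// Hypothetical helper: a TEXT key (type OID 25) from a string slice.
fn text_key(s: &str) -> KeyValue {
    KeyValue { type_oid: 25, bytes: s.as_bytes().to_vec() }
}

/// Hypothetical helper: an INT8 key (type OID 20) from an i64.
fn int8_key(v: i64) -> KeyValue {
    KeyValue { type_oid: 20, bytes: v.to_ne_bytes().to_vec() }
}

/// Build side of the join: group row ids by their composite key.
fn build_table(rows: &[(CompositeKey, u64)]) -> HashMap<CompositeKey, Vec<u64>> {
    let mut table: HashMap<CompositeKey, Vec<u64>> = HashMap::new();
    for (key, row_id) in rows {
        table.entry(key.clone()).or_default().push(*row_id);
    }
    table
}
```

Because `CrossJoin` is its own variant, no real key value (including whatever integer the old magic constant used) can land in the cross-join bucket.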
- Add extract_non_equijoin_quals helper to filter restrictlist
- Initialize join_qual_state and join_qual_econtext in begin_custom_scan
- Only create qual state when has_other_conditions is true
- Skip equi-join conditions (Var = Var) that are handled by hash lookup

This fixes Issue 4: join_qual_state was never initialized, causing non-equijoin predicates to be silently ignored during execution.
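The filtering step can be pictured with a toy expression tree. This is only a sketch under stated assumptions: the `Expr` enum and `is_equijoin` are hypothetical stand-ins for PostgreSQL's node types, and the real helper operates on the planner's restrictlist rather than a Rust slice.

```rust
/// Hypothetical, simplified expression tree standing in for PostgreSQL nodes.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Var { rel: u32, attno: i16 },
    Const(i64),
    Op { name: &'static str, left: Box<Expr>, right: Box<Expr> },
}

/// True for `Var = Var` conditions, which the hash lookup already enforces.
fn is_equijoin(expr: &Expr) -> bool {
    match expr {
        Expr::Op { name, left, right } if *name == "=" => matches!(
            (left.as_ref(), right.as_ref()),
            (Expr::Var { .. }, Expr::Var { .. })
        ),
        _ => false,
    }
}

/// Keep only the quals that still need to be evaluated per joined row.
fn extract_non_equijoin_quals(restrictlist: &[Expr]) -> Vec<Expr> {
    restrictlist
        .iter()
        .filter(|qual| !is_equijoin(qual))
        .cloned()
        .collect()
}
```

In this sketch, an empty result corresponds to has_other_conditions being false, in which case no qual state would need to be created.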
Previously, if extract_join_level_conditions failed, it would silently return None. Now it logs a debug1 message to help with troubleshooting why JoinScan wasn't proposed for a particular query.
These methods were never called - PostgreSQL's Limit node handles limiting. Remove the dead code rather than keeping it with #[allow(dead_code)].
Factor out the join-level predicate evaluation logic into a reusable helper method used by both hash join and nested loop execution paths. This eliminates ~40 lines of duplicated code.
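The shape of that refactor might look like the sketch below. All names here (`Row`, `JoinQual`, `JoinState`, `passes_join_quals`, `val_less`) are hypothetical; the real helper evaluates a PostgreSQL qual state against tuple slots, which is replaced here by a plain function pointer.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct Row {
    key: i64,
    val: i64,
}

/// Stand-in for an initialized join qual state (hypothetical).
type JoinQual = fn(&Row, &Row) -> bool;

struct JoinState {
    join_qual: Option<JoinQual>,
}

/// Example non-equijoin predicate: outer.val < inner.val.
fn val_less(outer: &Row, inner: &Row) -> bool {
    outer.val < inner.val
}

impl JoinState {
    /// Shared helper: a pair passes when there is no qual or the qual holds.
    fn passes_join_quals(&self, outer: &Row, inner: &Row) -> bool {
        self.join_qual.map_or(true, |qual| qual(outer, inner))
    }

    /// Hash path: build on the inner side, probe with each outer row.
    fn hash_join(&self, outer: &[Row], inner: &[Row]) -> Vec<(Row, Row)> {
        let mut table: HashMap<i64, Vec<&Row>> = HashMap::new();
        for row in inner {
            table.entry(row.key).or_default().push(row);
        }
        let mut out = Vec::new();
        for o in outer {
            for i in table.get(&o.key).into_iter().flatten() {
                if self.passes_join_quals(o, i) {
                    out.push((o.clone(), (**i).clone()));
                }
            }
        }
        out
    }

    /// Nested loop path: compare every pair, then apply the same helper.
    fn nested_loop(&self, outer: &[Row], inner: &[Row]) -> Vec<(Row, Row)> {
        let mut out = Vec::new();
        for o in outer {
            for i in inner {
                if o.key == i.key && self.passes_join_quals(o, i) {
                    out.push((o.clone(), i.clone()));
                }
            }
        }
        out
    }
}
```

Routing both paths through one helper means a predicate cannot be applied in one execution strategy and forgotten in the other.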
stuhood (Collaborator) reviewed on Jan 21, 2026 and left a comment:
Thanks Moe!
This is awesome work, but it would be great to simplify/restrict it before landing.
mdashti commented on Jan 22, 2026
stuhood (Collaborator) approved these changes on Jan 22, 2026 and left a comment:
Thanks a lot Moe!
Let's get it in and iterate.
59eacbe to 51461c0
This was referenced Jan 23, 2026
stuhood added a commit that referenced this pull request on Jan 28, 2026:
## What

This change swaps from the hash join execution method which was added in #3930 to explicitly using DataFusion's hash join. Future changes will introduce DataFusion's optimizer (by producing logical nodes rather than physical nodes) so that it can take advantage of the sorted segments which will be provided by #3988.

## Why

As described in #3930, the implementation there was explicitly temporary. We will be leaning in to using DataFusion to execute columnar joins.

## Tests

The regression tests and proptests pass, with one exception: numeric columns cannot safely be pulled up from fast fields currently (see #2968). Additionally, improved `qgen`'s handling of panics.
Ticket(s) Closed
What
Implements the planning infrastructure for JoinScan, a PostgreSQL custom scan operator for JOIN queries with BM25 full-text search predicates. Includes a basic execution implementation to validate the planning logic.
Why
When joining tables where one side has a BM25 search predicate and the query has a LIMIT, PostgreSQL's native planner doesn't know that Tantivy can efficiently return top-N results (in score order). This PR lays the groundwork for optimized join execution by:
How
Planning (main focus):
create_custom_path, plan_custom_path)
Execution (simple/unoptimized):
work_mem limit and nested loop fallback
Tests