Skip to content

optimizer: Convert cross joins to hash joins#6302

Merged
mattnibs merged 3 commits intomainfrom
cross-join-convert
Oct 15, 2025
Merged

optimizer: Convert cross joins to hash joins#6302
mattnibs merged 3 commits intomainfrom
cross-join-convert

Conversation

@mattnibs
Copy link
Collaborator

@mattnibs mattnibs commented Oct 13, 2025

This commit adds functionality to the optimizer that rewrites cross joins with equality comparisons filters into hash joins where appropriate.

Partially fixes #6074

@mattnibs mattnibs requested a review from a team October 13, 2025 22:56
@mattnibs mattnibs changed the title Convert appropriate cross joins to hash joins optimizer: Convert cross joins to hash joins Oct 13, 2025
@mattnibs mattnibs force-pushed the cross-join-convert branch 3 times, most recently from 8c82abe to 98992f3 Compare October 13, 2025 23:14
@philrz
Copy link
Contributor

philrz commented Oct 14, 2025

This tests out nicely on the sqllogictests!

First, all the 1,129,186 sqllogictests that have consistently been running clean on tip of main (thousands of which are cross joins) also run clean on this branch at commit 98992f3. 👍

Also, there's been thousands of additional sqllogictests that had been skipped previously because the large cartesian product from their cross joins meant they took an extremely long time and in many cases would effectively never have allowed the queries to complete (e.g., nearly all the select5 ones). However, with this branch I can see many of these can now complete in what I'd call "a reasonable amount of time" (e.g., less than 10 seconds).

To cherry pick one as an example, here's the q48 query from that set which has been slow on main:

$ cat q48.spq 
SELECT x22,x44,x30,x36,x48,x27,x2,x57
  FROM t48 (FORMAT parquet), t57 (FORMAT parquet), t27 (FORMAT parquet), t2 (FORMAT parquet), t36 (FORMAT parquet), t30 (FORMAT parquet), t44 (FORMAT parquet), t22 (FORMAT parquet)
 WHERE b30=a48
   AND b44=a27
   AND a22=b27
   AND b22=a36
   AND b48=a57
   AND a57=7
   AND a30=b36
   AND a44=b2;

$ super -version
Version: b166423d6

$ time super -I q48.spq 
{x22:"table t22 row 7",x44:"table t44 row 10",x30:"table t30 row 5",x36:"table t36 row 9",x48:"table t48 row 7",x27:"table t27 row 4",x2:"table t2 row 6",x57:"table t57 row 7"}

real	0m45.500s
user	0m57.231s
sys	0m1.978s

Whereas on this branch:

$ time super -I q48.spq 
{x22:"table t22 row 7",x44:"table t44 row 10",x30:"table t30 row 5",x36:"table t36 row 9",x48:"table t48 row 7",x27:"table t27 row 4",x2:"table t2 row 6",x57:"table t57 row 7"}

real	0m0.706s
user	0m0.115s
sys	0m0.057s

I need to do some enhancements to my scripts to separate out all the ones that still take forever even with the changes on this branch (there's still definitely some), but since I only see improvements and no new failures I'm a fan of seeing this merged and will report to the team with a more complete summary when I've got it. 👍

This commit adds functionality to the optimizer that rewrites cross
joins with equality comparisons filters into hash joins where
appropriate.
@philrz
Copy link
Contributor

philrz commented Oct 15, 2025

@mattnibs: I know you've got your approval so if you wanna go ahead and merge this and pick it up as a separate bug afterwards, that's fine with me. But I've been enhancing my sqllogictest scripts as mentioned in my last comment and now I can see what look like some bugs.

The first involves this sqllogictest, and the test data and queries to repro are in the attached repro.tgz.

With the contents of that attachment unpacked, here's the expected result in Postgres:

$ psql --version
psql (PostgreSQL) 17.6 (Homebrew)

$ psql postgres -f repro.sql 
...
       x27       |       x41       |       x54       |       x32       
-----------------+-----------------+-----------------+-----------------
 table t27 row 9 | table t41 row 4 | table t54 row 2 | table t32 row 5
(1 row)

Here's that same result when running super at current tip of main.

$ super -version
Version: 18a3cf935

$ super -I q5.spq 
{x27:"table t27 row 9",x41:"table t41 row 4",x54:"table t54 row 2",x32:"table t32 row 5"}

Here's the unexpected result with commit 3f070e5 on this PR's branch.

$ super -version
Version: 3f070e5a6

$ super -I q5.spq 
{x27:"table t27 row 9",x41:"table t41 row 4",x54:"table t54 row 2",x32:"table t32 row 5"}
{x27:"table t27 row 1",x41:error("missing"),x54:error("missing"),x32:error("missing")}
{x27:"table t27 row 3",x41:error("missing"),x54:error("missing"),x32:error("missing")}
{x27:"table t27 row 4",x41:error("missing"),x54:error("missing"),x32:error("missing")}
{x27:"table t27 row 5",x41:error("missing"),x54:error("missing"),x32:error("missing")}
{x27:"table t27 row 6",x41:error("missing"),x54:error("missing"),x32:error("missing")}
{x27:"table t27 row 8",x41:error("missing"),x54:error("missing"),x32:error("missing")}
{x27:"table t27 row 10",x41:error("missing"),x54:error("missing"),x32:error("missing")}
{x27:"table t27 row 2",x41:error("missing"),x54:error("missing"),x32:error("missing")}
{x27:"table t27 row 7",x41:error("missing"),x54:error("missing"),x32:error("missing")}

Here's an interesting twist, though! I also get the correct result if I use the earlier commit on this branch. 🤔

$ super -version
Version: b707672e0

$ super -I q5.spq 
{x27:"table t27 row 9",x41:"table t41 row 4",x54:"table t54 row 2",x32:"table t32 row 5"}

There's several others in that "select5" set, but the unexpected outputs all have a similar essence (i.e., an expected row, then a bunch of unexpected rows with one or more columns having all error("missing") columns), so I'll let you chew on this one and maybe what fixes that first one will fix 'em all.

@mattnibs
Copy link
Collaborator Author

@philrz thanks for pointing this out, this should be fixed in this pr as of 79ca6bd

@mattnibs mattnibs requested a review from nwt October 15, 2025 18:35
@philrz
Copy link
Contributor

philrz commented Oct 15, 2025

@mattnibs: Nice! Indeed, I just tried out the branch at commit 79ca6bd and the fix looks effective from here too! 👍

@mattnibs mattnibs merged commit 26e42ee into main Oct 15, 2025
3 checks passed
@mattnibs mattnibs deleted the cross-join-convert branch October 15, 2025 23:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SQL: Large cartesian product causes very long query runtime

3 participants