optimizer: Convert cross joins to hash joins #6302
8c82abe to 98992f3
This tests out nicely on the sqllogictests! First, all the 1,129,186 sqllogictests that have consistently been running clean on tip of

Also, there have been thousands of additional sqllogictests that had been skipped previously because the large cartesian product from their cross joins meant they took an extremely long time, and in many cases the queries would effectively never have completed (e.g., nearly all the select5 ones). However, with this branch I can see many of these can now complete in what I'd call "a reasonable amount of time" (e.g., less than 10 seconds). To cherry-pick one as an example, here's the q48 query from that set which has been slow on

Whereas on this branch:

I need to do some enhancements to my scripts to separate out all the ones that still take forever even with the changes on this branch (there are definitely still some), but since I only see improvements and no new failures I'm a fan of seeing this merged and will report to the team with a more complete summary when I've got it. 👍
This commit adds functionality to the optimizer that rewrites cross joins with equality comparison filters into hash joins where appropriate.
98992f3 to b707672
42c730c to c9a3a71
c9a3a71 to 3f070e5
@mattnibs: I know you've got your approval so if you wanna go ahead and merge this and pick it up as a separate bug afterwards, that's fine with me. But I've been enhancing my sqllogictest scripts as mentioned in my last comment and now I can see what look like some bugs.

The first involves this sqllogictest, and the test data and queries to repro are in the attached repro.tgz. With the contents of that attachment unpacked, here's the expected result in Postgres:

Here's that same result when running

Here's the unexpected result with commit 3f070e5 on this PR's branch.

Here's an interesting twist, though! I also get the correct result if I use the earlier commit on this branch. 🤔

There are several others in that "select5" set, but the unexpected outputs all have a similar essence (i.e., an expected row, then a bunch of unexpected rows with one or more columns having all
This commit adds functionality to the optimizer that rewrites cross joins with equality comparison filters into hash joins where appropriate.
Partially fixes #6074
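
For readers unfamiliar with the rewrite described in the commit message, here is a minimal Python sketch of the general idea, not the code from this PR. The function names, the dict-based rows, and the sample data are all hypothetical, and the sketch ignores SQL NULL semantics (Python treats `None == None` as true, whereas SQL does not treat `NULL = NULL` as true). It only shows that filtering a cross join on an equality comparison yields the same pairs as building a hash index on one side and probing it with the other, while avoiding the full cartesian product.

```python
from collections import defaultdict
from itertools import product


def cross_join_filter(left, right, left_key, right_key):
    """Naive plan: materialize every (l, r) pair, then keep the equal-key ones."""
    return [
        (l, r)
        for l, r in product(left, right)
        if l[left_key] == r[right_key]
    ]


def hash_join(left, right, left_key, right_key):
    """Rewritten plan: index the right side by key, then probe with the left."""
    index = defaultdict(list)
    for r in right:
        index[r[right_key]].append(r)
    # .get() avoids inserting empty buckets for keys that have no match.
    return [(l, r) for l in left for r in index.get(l[left_key], [])]


# Hypothetical sample rows; "a" and "b" are the columns being compared.
left = [{"a": 1, "x": "p"}, {"a": 2, "x": "q"}, {"a": 2, "x": "r"}]
right = [{"b": 2, "y": "s"}, {"b": 3, "y": "t"}, {"b": 2, "y": "u"}]

naive = cross_join_filter(left, right, "a", "b")
hashed = hash_join(left, right, "a", "b")
```

The naive version inspects `len(left) * len(right)` pairs, while the hash version makes one pass over each input plus the work of emitting matches, which is consistent with the speedups reported above for queries that previously produced large cartesian products.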