Skip to content

Conversation

@ankitsultana
Copy link
Contributor

Summary

The existing tables in the Colocated Join Quickstart have 4 partitions. Adding a table with 8 partitions allows us to test colocation with mismatching partition counts.

Test Plan

Ran the Quickstart. The new physical optimizer is able to eliminate shuffles without any hints (as we expected).

Example queries and plans below (notice how everything is identity exchange):

SET useMultistageEngine=true;
SET usePhysicalOptimizer=true;

EXPLAIN PLAN FOR SELECT * FROM userFactEvents WHERE userUUID NOT IN (
    SELECT userUUID FROM userGroups WHERE groupUUID = 'group-1'
  )
LIMIT 10

Execution Plan
PhysicalSort(fetch=[10])
  PhysicalExchange(exchangeStrategy=[SINGLETON_EXCHANGE], distKeys=[[]], execStrategy=[STREAMING], collation=[[]])
    PhysicalSort(fetch=[10])
      PhysicalProject(cityName=[$0], deviceOS=[$1], distanceMeters=[$2], productCode=[$3], tripAmount=[$4], tripCurrencyCode=[$5], ts=[$6], userUUID=[$7])
        PhysicalFilter(condition=[IS NOT TRUE($10)])
          PhysicalJoin(condition=[=($8, $9)], joinType=[left])
            PhysicalExchange(exchangeStrategy=[IDENTITY_EXCHANGE], distKeys=[[]], execStrategy=[STREAMING], collation=[[]])
              PhysicalProject(cityName=[$3], deviceOS=[$4], distanceMeters=[$5], productCode=[$6], tripAmount=[$7], tripCurrencyCode=[$8], ts=[$9], userUUID=[$10], userUUID0=[$10])
                PhysicalTableScan(table=[[default, userFactEvents]])
            PhysicalExchange(exchangeStrategy=[IDENTITY_EXCHANGE], distKeys=[[]], execStrategy=[STREAMING], collation=[[]])
              PhysicalAggregate(group=[{0}], agg#0=[MIN($1)])
                PhysicalProject(userUUID=[$4], $f1=[true])
                  PhysicalFilter(condition=[=($3, _UTF-8'group-1')])
                    PhysicalTableScan(table=[[default, userGroups]])
SET useMultistageEngine=true;
SET usePhysicalOptimizer=true;

explain plan for WITH tmp AS (
  SELECT userUUID FROM userFactEvents WHERE deviceOS IN ('macos', 'ios') OR tripAmount > 100
  UNION ALL
	SELECT userUUID FROM userAttributes 
      WHERE deviceOS IN ('macos') OR daysSinceFirstTrip > 100
),
tmp2 AS (
  SELECT DISTINCT userUUID FROM tmp
)
SELECT
  userUUID
FROM
  userFactEvents
WHERE
  deviceOS IN ('macos', 'ios')
  AND userUUID IN (SELECT userUUID FROM tmp2)

Execution Plan
PhysicalJoin(condition=[=($0, $1)], joinType=[semi])
  PhysicalExchange(exchangeStrategy=[IDENTITY_EXCHANGE], distKeys=[[]], execStrategy=[STREAMING], collation=[[]])
    PhysicalProject(userUUID=[$10])
      PhysicalFilter(condition=[SEARCH($4, Sarg[_UTF-8'ios':VARCHAR CHARACTER SET "UTF-8", _UTF-8'macos':VARCHAR CHARACTER SET "UTF-8"]:VARCHAR CHARACTER SET "UTF-8")])
        PhysicalTableScan(table=[[default, userFactEvents]])
  PhysicalAggregate(group=[{0}])
    PhysicalUnion(all=[true])
      PhysicalExchange(exchangeStrategy=[IDENTITY_EXCHANGE], distKeys=[[]], execStrategy=[STREAMING], collation=[[]])
        PhysicalProject(userUUID=[$10])
          PhysicalFilter(condition=[OR(SEARCH($4, Sarg[_UTF-8'ios':VARCHAR CHARACTER SET "UTF-8", _UTF-8'macos':VARCHAR CHARACTER SET "UTF-8"]:VARCHAR CHARACTER SET "UTF-8"), >($7, 100.0E0))])
            PhysicalTableScan(table=[[default, userFactEvents]])
      PhysicalExchange(exchangeStrategy=[IDENTITY_EXCHANGE], distKeys=[[]], execStrategy=[STREAMING], collation=[[]])
        PhysicalProject(userUUID=[$6])
          PhysicalFilter(condition=[OR(=($4, _UTF-8'macos'), >($3, 100))])
            PhysicalTableScan(table=[[default, userAttributes]])

@ankitsultana ankitsultana added multi-stage Related to the multi-stage query engine mse-physical-optimizer labels May 14, 2025
@ankitsultana ankitsultana requested a review from Jackie-Jiang May 14, 2025 22:29
Copy link
Collaborator

@shauryachats shauryachats left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@codecov-commenter
Copy link

codecov-commenter commented May 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 63.36%. Comparing base (1a476de) to head (d4e6afd).
Report is 82 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #15800      +/-   ##
============================================
+ Coverage     62.90%   63.36%   +0.46%     
- Complexity     1386     1394       +8     
============================================
  Files          2867     2893      +26     
  Lines        163354   165514    +2160     
  Branches      24952    25312     +360     
============================================
+ Hits         102755   104882    +2127     
+ Misses        52847    52713     -134     
- Partials       7752     7919     +167     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.34% <ø> (+0.47%) ⬆️
java-21 63.30% <ø> (+0.48%) ⬆️
skip-bytebuffers-false ?
skip-bytebuffers-true ?
temurin 63.36% <ø> (+0.46%) ⬆️
unittests 63.36% <ø> (+0.46%) ⬆️
unittests1 56.34% <ø> (+0.52%) ⬆️
unittests2 33.54% <ø> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ankitsultana ankitsultana merged commit c49675d into apache:master May 15, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

mse-physical-optimizer multi-stage Related to the multi-stage query engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants