Skip to content

[multistage] Filter Clause Not Pushed Down In Some Join Queries #10392

@ankitsultana

Description

@ankitsultana

Repro-1: In this case the query has two not-in. With 1 not-in the filters are pushed down but not when there are two or more.

EXPLAIN PLAN FOR 
  SELECT COUNT(*) FROM userAttributes_OFFLINE 
    WHERE deviceOS = 'mac-os' 
    AND userUUID NOT IN (SELECT userUUID FROM userGroups_OFFLINE WHERE groupUUID = 'group-1') 
    AND userUUID NOT IN (SELECT userUUID FROM userGroups_OFFLINE WHERE groupUUID = 'group-2')
Execution Plan
LogicalAggregate(group=[{}], EXPR$0=[$SUM0($0)])
  LogicalExchange(distribution=[hash])
    LogicalAggregate(group=[{}], EXPR$0=[COUNT()])
      LogicalFilter(condition=[IS NOT TRUE($5)])
        LogicalJoin(condition=[=($3, $4)], joinType=[left])
          LogicalExchange(distribution=[hash[3]])
            LogicalProject(deviceOS=[$0], userUUID0=[$2], $f1=[$4], userUUID2=[$1])
              LogicalFilter(condition=[AND(=($0, 'mac-os'), IS NOT TRUE($4))])
                LogicalJoin(condition=[=($2, $3)], joinType=[left])
                  LogicalExchange(distribution=[hash[2]])
                    LogicalProject(deviceOS=[$4], userUUID=[$5], userUUID0=[$5])
                      LogicalTableScan(table=[[userAttributes_OFFLINE]])
                  LogicalExchange(distribution=[hash[0]])
                    LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
                      LogicalExchange(distribution=[hash[0]])
                        LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
                          LogicalProject(userUUID=[$3], $f1=[true])
                            LogicalFilter(condition=[=($0, 'group-1')])
                              LogicalTableScan(table=[[userGroups_OFFLINE]])
          LogicalExchange(distribution=[hash[0]])
            LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
              LogicalExchange(distribution=[hash[0]])
                LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
                  LogicalProject(userUUID=[$3], $f1=[true])
                    LogicalFilter(condition=[=($0, 'group-2')])
                      LogicalTableScan(table=[[userGroups_OFFLINE]])

Repro-2: In this case a broadcast is done on the right side, but still the filters should have been pushed down to the left side

EXPLAIN PLAN FOR
  SELECT COUNT(*) FROM userAttributes_OFFLINE 
    WHERE deviceOS = 'mac-os' 
	AND (
	  userUUID IN (SELECT userUUID FROM userGroups_OFFLINE WHERE groupUUID = 'group-1') 
	  OR userUUID IN (SELECT userUUID FROM userGroups_OFFLINE WHERE groupUUID = 'group-2')
	)


Execution Plan
LogicalAggregate(group=[{}], EXPR$0=[$SUM0($0)])
  LogicalExchange(distribution=[hash])
    LogicalAggregate(group=[{}], EXPR$0=[COUNT()])
      LogicalFilter(condition=[OR(CAST(OR(AND(IS NOT NULL($4), <>($1, 0)), AND(<($2, $1), null, <>($1, 0), IS NULL($4)))):BOOLEAN NOT NULL, CAST(OR(AND(IS NOT NULL($9), <>($5, 0)), AND(<($6, $5), null, <>($5, 0), IS NULL($9)))):BOOLEAN NOT NULL)])
        LogicalJoin(condition=[=($7, $8)], joinType=[left])
          LogicalExchange(distribution=[hash[7]])
            LogicalProject(deviceOS=[$0], $f0=[$2], $f1=[$3], userUUID0=[$4], $f10=[$5], $f00=[$6], $f11=[$7], userUUID2=[$1])
              LogicalFilter(condition=[=($0, 'mac-os')])
                LogicalJoin(condition=[true], joinType=[inner])
                  LogicalExchange(distribution=[random])
                    LogicalProject(deviceOS=[$0], userUUID=[$1], $f0=[$2], $f1=[$3], userUUID0=[$4], $f10=[$6])
                      LogicalJoin(condition=[=($4, $5)], joinType=[left])
                        LogicalExchange(distribution=[hash[4]])
                          LogicalProject(deviceOS=[$0], userUUID=[$1], $f0=[$2], $f1=[$3], userUUID0=[$1])
                            LogicalJoin(condition=[true], joinType=[inner])
                              LogicalExchange(distribution=[random])
                                LogicalProject(deviceOS=[$4], userUUID=[$5])
                                  LogicalTableScan(table=[[userAttributes_OFFLINE]])
                              LogicalExchange(distribution=[broadcast])
                                LogicalProject($f0=[$0], $f00=[$0])
                                  LogicalAggregate(group=[{}], agg#0=[$SUM0($0)])
                                    LogicalExchange(distribution=[hash])
                                      LogicalAggregate(group=[{}], agg#0=[COUNT()])
                                        LogicalProject(groupUUID=[$0], userUUID=[$3])
                                          LogicalFilter(condition=[=($0, 'group-1')])
                                            LogicalTableScan(table=[[userGroups_OFFLINE]])
                        LogicalExchange(distribution=[hash[0]])
                          LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
                            LogicalExchange(distribution=[hash[0]])
                              LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
                                LogicalProject(userUUID=[$3], $f1=[true])
                                  LogicalFilter(condition=[=($0, 'group-1')])
                                    LogicalTableScan(table=[[userGroups_OFFLINE]])
                  LogicalExchange(distribution=[broadcast])
                    LogicalProject($f0=[$0], $f00=[$0])
                      LogicalAggregate(group=[{}], agg#0=[$SUM0($0)])
                        LogicalExchange(distribution=[hash])
                          LogicalAggregate(group=[{}], agg#0=[COUNT()])
                            LogicalProject(groupUUID=[$0], userUUID=[$3])
                              LogicalFilter(condition=[=($0, 'group-2')])
                                LogicalTableScan(table=[[userGroups_OFFLINE]])
          LogicalExchange(distribution=[hash[0]])
            LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
              LogicalExchange(distribution=[hash[0]])
                LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
                  LogicalProject(userUUID=[$3], $f1=[true])
                    LogicalFilter(condition=[=($0, 'group-2')])
                      LogicalTableScan(table=[[userGroups_OFFLINE]])

Metadata

Metadata

Assignees

No one assigned

    Labels

    multi-stageRelated to the multi-stage query engine

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions