Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-29231

Constraints should be inferred from cast equality constraint

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.0.0
    • 3.1.0
    • SQL
    • None

    Description

      How to reproduce:

      scala> spark.sql("create table t1(c11 int, c12 decimal) ")
      res0: org.apache.spark.sql.DataFrame = []
      
      scala> spark.sql("create table t2(c21 bigint, c22 decimal) ")
      res1: org.apache.spark.sql.DataFrame = []
      
      scala> spark.sql("select t1.*, t2.* from t1 left join t2 on t1.c11=t2.c21 where t1.c11=1").explain
      == Physical Plan ==
      SortMergeJoin [cast(c11#0 as bigint)], [c21#2L], LeftOuter
      :- *(2) Sort [cast(c11#0 as bigint) ASC NULLS FIRST], false, 0
      :  +- Exchange hashpartitioning(cast(c11#0 as bigint), 200), true, [id=#30]
      :     +- *(1) Filter (isnotnull(c11#0) AND (c11#0 = 1))
      :        +- Scan hive default.t1 [c11#0, c12#1], HiveTableRelation `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c11#0, c12#1], Statistics(sizeInBytes=8.0 EiB)
      +- *(4) Sort [c21#2L ASC NULLS FIRST], false, 0
         +- Exchange hashpartitioning(c21#2L, 200), true, [id=#37]
            +- *(3) Filter isnotnull(c21#2L)
               +- Scan hive default.t2 [c21#2L, c22#3], HiveTableRelation `default`.`t2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c21#2L, c22#3], Statistics(sizeInBytes=8.0 EiB)
      
      

      PostgreSQL suport this feature:

      postgres=# create table t1(c11 int4, c12 decimal);
      CREATE TABLE
      postgres=# create table t2(c21 int8, c22 decimal);
      CREATE TABLE
      postgres=# explain select t1.*, t2.* from t1 left join t2 on t1.c11=t2.c21 where t1.c11=1;
                                 QUERY PLAN
      ----------------------------------------------------------------
       Nested Loop Left Join  (cost=0.00..51.43 rows=36 width=76)
         Join Filter: (t1.c11 = t2.c21)
         ->  Seq Scan on t1  (cost=0.00..25.88 rows=6 width=36)
               Filter: (c11 = 1)
         ->  Materialize  (cost=0.00..25.03 rows=6 width=40)
               ->  Seq Scan on t2  (cost=0.00..25.00 rows=6 width=40)
                     Filter: (c21 = 1)
      (7 rows)
      

      Attachments

        Issue Links

          Activity

            People

              yumwang Yuming Wang
              yumwang Yuming Wang
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: