Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-31078

outputOrdering should handle aliases correctly

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.0.0
    • 3.1.0
    • SQL
    • None

    Description

      Currently, `outputOrdering` doesn't respect aliases. Thus, the following would produce an unnecessary sort node:

      withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "0") {
            val df = (0 until 20).toDF("i").as("df")
            df.repartition(8, df("i")).write.format("parquet")
              .bucketBy(8, "i").sortBy("i").saveAsTable("t")
            val t1 = spark.table("t")
            val t2 = t1.selectExpr("i as ii")
            t1.join(t2, t1("i") === t2("ii")).explain
          }
      

      would produce an unnecessary sort node:

      == Physical Plan ==
      *(3) SortMergeJoin [i#8], [ii#10], Inner
      :- *(1) Project [i#8]
      :  +- *(1) Filter isnotnull(i#8)
      :     +- *(1) ColumnarToRow
      :        +- FileScan parquet default.t[i#8] Batched: true, DataFilters: [isnotnull(i#8)], Format: Parquet, Location: InMemoryFileIndex[file:/..., PartitionFilters: [], PushedFilters: [IsNotNull(i)], ReadSchema: struct<i:int>, SelectedBucketsCount: 8 out of 8
      +- *(2) Sort [ii#10 ASC NULLS FIRST], false, 0
         +- *(2) Project [i#8 AS ii#10]
            +- *(2) Filter isnotnull(i#8)
               +- *(2) ColumnarToRow
                  +- FileScan parquet default.t[i#8] Batched: true, DataFilters: [isnotnull(i#8)], Format: Parquet, Location: InMemoryFileIndex[file:/..., PartitionFilters: [], PushedFilters: [IsNotNull(i)], ReadSchema: struct<i:int>, SelectedBucketsCount: 8 out of 8
      

      Attachments

        Issue Links

          Activity

            People

              imback82 Terry Kim
              imback82 Terry Kim
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: