[SPARK-16406][SQL] Improve performance of LogicalPlan.resolve #231

markhamstra · 2018-05-08T21:11:01Z

`LogicalPlan.resolve(...)` uses a linear search to find an attribute matching a name. This is fine in normal cases, but becomes a bottleneck when you resolve a large number of columns against a plan with a large number of attributes.

This PR adds an indexing structure to `resolve(...)` so that potential matches can be found more quickly. It improves the reference resolution time for the following code by more than 4x (11.8s -> 2.4s):

val n = 4000
val values = (1 to n).map(_.toString).mkString(", ")
val columns = (1 to n).map("column" + _).mkString(", ")
val query =
  s"""
     |SELECT $columns
     |FROM VALUES ($values) T($columns)
     |WHERE 1=2 AND 1 IN ($columns)
     |GROUP BY $columns
     |ORDER BY $columns
     |""".stripMargin

spark.time(sql(query))
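
To illustrate the idea behind the indexing structure, here is a minimal sketch contrasting a per-lookup linear scan with a lookup against a name index that is built once. It is not the code in this PR: `Attr`, `ResolveSketch`, `resolveLinear`, and `AttrIndex` are hypothetical names, and the real resolver also has to handle qualifiers, nested fields, and configurable case sensitivity, all of which are ignored here.

```scala
import java.util.Locale

// Hypothetical, simplified stand-in for a Catalyst attribute (not Spark's class).
final case class Attr(name: String, id: Int)

object ResolveSketch {
  // Assumption: names are matched case-insensitively via lower-casing.
  private def normalize(name: String): String = name.toLowerCase(Locale.ROOT)

  // Baseline: scan every attribute on each lookup, O(n) per name.
  def resolveLinear(attrs: Seq[Attr], name: String): Seq[Attr] = {
    val key = normalize(name)
    attrs.filter(a => normalize(a.name) == key)
  }

  // Indexed: group the attributes by normalized name once; each lookup is
  // then a hash probe that returns only the candidate matches.
  final class AttrIndex(attrs: Seq[Attr]) {
    private val byName: Map[String, Seq[Attr]] =
      attrs.groupBy(a => normalize(a.name))

    // More than one result means the reference is ambiguous; the caller
    // decides how to report that.
    def resolve(name: String): Seq[Attr] =
      byName.getOrElse(normalize(name), Seq.empty)
  }
}

// Usage: build the index once, then resolve many names against it.
val attrs = (1 to 4000).map(i => Attr(s"column$i", i))
val index = new ResolveSketch.AttrIndex(attrs)
val hit = index.resolve("COLUMN42")
```

With n attributes and m names to resolve, the linear version does on the order of n * m comparisons, while the indexed version pays for one pass over the attributes and then a hash lookup per name.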

Tested with the existing tests.

Author: Herman van Hovell

Closes apache#14083 from hvanhovell/SPARK-16406.



markhamstra · 2018-05-08T21:12:11Z

JMWG

ianlcsd

LGTM

markhamstra · 2018-05-08T22:48:37Z

jttp

markhamstra · 2018-05-08T23:16:50Z

retest this please

markhamstra · 2018-05-14T21:16:21Z

JMWG

csd-jenkins · 2018-05-15T00:15:40Z

Saw merge directive 'JMWG'. CSD Jenkins auto merging

markhamstra requested a review from ianlcsd May 8, 2018 21:11

ianlcsd approved these changes May 8, 2018


Keep old-style messages for AnalysisException with ambiguous references

89f2eb9

csd-jenkins merged commit cdcacda into alteryx:csd-2.2 May 15, 2018
