[SPARK-22608][SQL] add new API to CodeGeneration.splitExpressions() #19821

kiszk · 2017-11-26T09:48:54Z

What changes were proposed in this pull request?

This PR adds a new API to CodeGenerator.splitExpressions to avoid code duplication, since several CodeGenerator.splitExpressions calls are used with ctx.INPUT_ROW.

How was this patch tested?

Used existing test suites.

SparkQA · 2017-11-26T12:44:39Z

Test build #84191 has finished for PR 19821 at commit 7b6526a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-11-27T00:09:50Z

...atalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

keep it unchanged?

good catch, thanks.

gatorsmile · 2017-11-27T00:16:05Z

...atalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

To make it consistent, how about changing it to

def splitExpressions(row: String, arguments: Seq[(String, String)]): String = {

I see. In addition to that, how about this, since many callers pass INPUT_ROW?

def splitExpressions(row: String, arguments: Seq[(String, String)] = Seq(("InternalRow", INPUT_ROW))): String = {

gatorsmile · 2017-11-27T00:17:45Z

...atalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

Let the caller also provide ctx.INPUT_ROW?
Change it to

arguments: Seq[(String, String)]

I agree that it is good from the view of consistency. I have one question in my mind.

If we use the same argument name arguments, will developers be able to distinguish this splitExpressions from the (richer) splitExpressions below when they want to pass only the three arguments expressions, funcName, and arguments?

Could we combine these two functions, since the next one already provides the default values?

Now, we have three splitExpressions in this PR.

1. splitExpressions(row, expressions)

2. splitExpressions(expressions, funcName, arguments)

3. splitExpressions(expressions, funcName, arguments, returnType, ...)

It is hard to combine 2. and 3., since 2. takes care of INPUT_ROW and currentVars while 3. does not.
Are you suggesting that I combine 1. and 2., which both take care of INPUT_ROW and currentVars?

Could you check all the callers of case 3 to ensure we do not miss checking INPUT_ROW and currentVars for any of them?

In addition, case 1 and 2 can be easily combined.

I think we need a different name for case 1 and 2. How about splitExpressionsOnInputRow?

cc @cloud-fan

I see. I will check it tonight.

I confirmed that there are some cases that do not need to check INPUT_ROW and currentVars:

access fields in struct

UnsafeJoiner

comparison for ordering

I will try to merge cases 1 and 2. If a different name is required, I will use splitExpressionsOnInputRow.
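To illustrate the merge of cases 1 and 2 discussed above, here is a minimal standalone sketch. MiniCodegenContext, its fields, and the one-function-per-expression splitting are hypothetical simplifications, not Spark's CodegenContext; the guard condition is an assumption about what "takes care of INPUT_ROW and currentVars" means in this thread.

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified, standalone model for illustration; NOT Spark's CodegenContext.
class MiniCodegenContext(var inputRow: String, var currentVars: Seq[String]) {
  // Generated helper functions, collected in order of creation.
  val functions = ArrayBuffer.empty[String]

  // General version (case 3 in the thread): ignores inputRow and currentVars.
  // For simplicity, every expression becomes its own function here.
  def splitExpressions(
      expressions: Seq[String],
      funcName: String,
      arguments: Seq[(String, String)]): String = {
    val params = arguments.map { case (tpe, name) => s"$tpe $name" }.mkString(", ")
    val args = arguments.map(_._2).mkString(", ")
    expressions.zipWithIndex.map { case (code, i) =>
      val name = s"${funcName}_$i"
      functions += s"private void $name($params) {\n  $code\n}"
      s"$name($args);"
    }.mkString("\n")
  }

  // Merged cases 1 and 2: always passes the input row, plus optional extras.
  def splitExpressionsOnInputRow(
      expressions: Seq[String],
      funcName: String = "apply",
      extraArguments: Seq[(String, String)] = Nil): String = {
    if (inputRow == null || currentVars != null) {
      // Assumed guard: without an input row (or with currentVars set),
      // keep the code inline instead of splitting it.
      expressions.mkString("\n")
    } else {
      splitExpressions(expressions, funcName, ("InternalRow", inputRow) +: extraArguments)
    }
  }
}
```

With default values on the guarded method only, a caller that just wants the input row writes ctx.splitExpressionsOnInputRow(exprs), while callers that must not check INPUT_ROW and currentVars keep using the general version explicitly.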

gatorsmile · 2017-11-27T00:18:07Z

...atalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

Could we avoid using return?

I followed the use of return in the splitExpressions above. Would it be better to write this place as follows?

if (....) { expressions.mkString("\n") } else { splitExpressions(...) }

If-else looks better. Thanks!
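For reference, the two styles compared above behave the same; the if-else form simply uses the fact that if-else is an expression in Scala. The names below (ReturnVsIfElse, canSplit, split) are hypothetical and only stand in for the elided condition and call.

```scala
object ReturnVsIfElse {
  // Stand-in for the real splitting logic (hypothetical).
  private def split(expressions: Seq[String]): String =
    expressions.map(e => s"call($e);").mkString("\n")

  // Early-return style, as in the original code.
  def withReturn(canSplit: Boolean, expressions: Seq[String]): String = {
    if (!canSplit) {
      return expressions.mkString("\n")
    }
    split(expressions)
  }

  // Expression style: the if-else itself is the result of the method.
  def withIfElse(canSplit: Boolean, expressions: Seq[String]): String = {
    if (!canSplit) {
      expressions.mkString("\n")
    } else {
      split(expressions)
    }
  }
}
```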

cloud-fan · 2017-11-27T12:16:23Z

Is it really worth it? It does not seem to be used in many places, and eventually the if-else will be removed after we make splitExpressions work with whole-stage codegen.

kiszk · 2017-11-28T01:37:53Z

I have no strong preference.
@gatorsmile WDYT?

cloud-fan · 2017-11-29T01:52:06Z

@kiszk Can you fix the conflict? Now we can add an intermediate version:

def splitExpressions(
    expressions: Seq[String],
    funcName: String,
    extraArguments: Seq[(String, String)])

kiszk · 2017-11-29T06:21:29Z

Sure, I have resolved the conflict in my environment. I will commit soon.

cloud-fan · 2017-11-29T11:39:24Z

...atalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

+
+  /**
+   * Splits the generated code of expressions into multiple functions, because function has
+   * 64kb code size limit in JVM. This version takes care of INPUT_ROW and currentVars


nit

Similar to [[splitExpressions(expressions: Seq[String])]], but has customized function name and extra arguments.
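As a rough illustration of the splitting that the doc comment describes, the sketch below groups generated snippets into blocks and emits one function per block with a customized name and arguments. ChunkedSplit, buildBlocks, and the maxChars threshold are hypothetical; a character count is used here only as a simple stand-in for the JVM's 64KB per-method bytecode limit, and this is not Spark's implementation.

```scala
import scala.collection.mutable.ArrayBuffer

object ChunkedSplit {
  // Groups snippets into blocks of at most `maxChars` characters each
  // (a single oversized snippet still gets a block of its own).
  def buildBlocks(expressions: Seq[String], maxChars: Int): Seq[String] = {
    val blocks = ArrayBuffer.empty[String]
    val current = new StringBuilder
    for (code <- expressions) {
      // +1 accounts for the newline that would join the snippet to the block.
      if (current.nonEmpty && current.length + code.length + 1 > maxChars) {
        blocks += current.toString
        current.clear()
      }
      if (current.nonEmpty) current.append("\n")
      current.append(code)
    }
    if (current.nonEmpty) blocks += current.toString
    blocks.toSeq
  }

  // Returns the generated function definitions and the code that calls them.
  def split(
      expressions: Seq[String],
      funcName: String,
      arguments: Seq[(String, String)],
      maxChars: Int): (Seq[String], String) = {
    val params = arguments.map { case (tpe, name) => s"$tpe $name" }.mkString(", ")
    val args = arguments.map(_._2).mkString(", ")
    val blocks = buildBlocks(expressions, maxChars)
    val functions = blocks.zipWithIndex.map { case (body, i) =>
      s"private void ${funcName}_$i($params) {\n$body\n}"
    }
    val calls = blocks.indices.map(i => s"${funcName}_$i($args);").mkString("\n")
    (functions, calls)
  }
}
```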

cloud-fan · 2017-11-29T11:40:48Z

LGTM, can you remove WIP in PR title?

SparkQA · 2017-11-29T13:14:06Z

Test build #84293 has finished for PR 19821 at commit 5332f12.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-11-29T17:14:02Z

Test build #84298 has finished for PR 19821 at commit 0a218fc.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2017-11-29T17:19:57Z

thanks, merging to master!

gatorsmile · 2017-11-29T17:21:20Z

LGTM

kiszk mentioned this pull request Nov 26, 2017

[SPARK-22603][SQL] Fix 64KB JVM bytecode limit problem with FormatString #19817

Closed

kiszk changed the title ~~[WIP][SPARK-22608][SQL] add new API to CodeGeneration.splitExpressions()~~ [SPARK-22608][SQL] add new API to CodeGeneration.splitExpressions() Nov 26, 2017

kiszk changed the title ~~[SPARK-22608][SQL] add new API to CodeGeneration.splitExpressions()~~ [WIP][SPARK-22608][SQL] add new API to CodeGeneration.splitExpressions() Nov 26, 2017

gatorsmile reviewed Nov 27, 2017

Initial commit

0669ac4

address review comments

5332f12

kiszk force-pushed the SPARK-22608 branch from 7b6526a to 5332f12 Compare November 29, 2017 10:17

cloud-fan reviewed Nov 29, 2017

address review comment

0a218fc

kiszk changed the title ~~[WIP][SPARK-22608][SQL] add new API to CodeGeneration.splitExpressions()~~ [SPARK-22608][SQL] add new API to CodeGeneration.splitExpressions() Nov 29, 2017

kiszk mentioned this pull request Nov 29, 2017

[SPARK-22570][SQL] Avoid to create a lot of global variables by using a local variable with allocation of an object in generated code #19797

Closed

asfgit closed this in 2848368 Nov 29, 2017

gatorsmile mentioned this pull request Dec 4, 2017

[SPARK-22682][SQL] HashExpression does not need to create global variables #19878

Closed
