ZEPPELIN-1442. UDF can not be found due to 2 instances of SparkSession is created #1452

Closed
zjffdu wants to merge 2 commits into apache:master from zjffdu:ZEPPELIN-1442

@zjffdu
Contributor

@zjffdu zjffdu commented Sep 23, 2016

What is this PR for?

The problem is that zeppelin_pyspark.py creates two SparkSession instances, because it creates the SQLContext first and that in turn creates its own underlying SparkSession. The JVM side therefore ends up with two SparkSession instances and, with them, two Catalog instances, so a UDF registered through the SQLContext cannot be used from the SparkSession. This PR creates the SparkSession first and then assigns its internal SQLContext to sqlContext in pyspark.
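
To make the failure mode concrete, here is a minimal toy model in plain Python. It is not the PySpark or Zeppelin code: `Catalog`, `Session`, `SqlContext`, `buggy_setup`, `fixed_setup`, and `udf_visible` are all hypothetical stand-ins, chosen only to show why a function registered in one catalog is invisible from a session that owns a different catalog, and why creating the session first avoids that.

```python
# Toy model only: hypothetical stand-ins, not the PySpark or Zeppelin API.


class Catalog:
    """Holds the functions registered for one session."""

    def __init__(self):
        self.functions = {}


class Session:
    """Stand-in for a SparkSession: each instance owns its own Catalog."""

    def __init__(self):
        self.catalog = Catalog()
        # The session exposes an internal context that shares its catalog.
        self.sql_context = SqlContext(self)

    def call(self, name, *args):
        if name not in self.catalog.functions:
            raise LookupError("Undefined function: " + name)
        return self.catalog.functions[name](*args)


class SqlContext:
    """Stand-in for a SQLContext: a thin wrapper around a session."""

    def __init__(self, session=None):
        # Built on its own, a context creates a new underlying session.
        self.session = session if session is not None else Session()

    def register_function(self, name, func):
        self.session.catalog.functions[name] = func


def buggy_setup():
    # Context first, session second: two sessions, hence two catalogs.
    sql_context = SqlContext()
    session = Session()
    return session, sql_context


def fixed_setup():
    # Session first, then reuse its internal context: one shared catalog.
    session = Session()
    return session, session.sql_context


def udf_visible(setup):
    """Register a function via the context, then look it up via the session."""
    session, sql_context = setup()
    sql_context.register_function("plus_one", lambda x: x + 1)
    try:
        return session.call("plus_one", 1) == 2
    except LookupError:
        return False
```

In this sketch `udf_visible(buggy_setup)` fails the lookup because the registration landed in the other session's catalog, while `udf_visible(fixed_setup)` succeeds because both handles point at the same catalog.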

What type of PR is it?

[Bug Fix]

Todos

  • - Task

What is the Jira issue?

  • https://issues.apache.org/jira/browse/ZEPPELIN-1442

How should this be tested?

Integration test is added.

Screenshots (if appropriate)

image

Questions:

  • Do the license files need an update? No
  • Are there breaking changes for older versions? No
  • Does this need documentation? No

@zjffdu zjffdu closed this Sep 23, 2016
@zjffdu zjffdu reopened this Sep 23, 2016
@felixcheung
Member

LGTM

@Leemoonsoo
Member

LGTM
@zjffdu Do you mind triggering CI one more time?

@zjffdu zjffdu closed this Sep 23, 2016
@zjffdu zjffdu reopened this Sep 23, 2016
@minahlee
Member

@zjffdu could you rebase and resolve conflicts?

@zjffdu
Contributor Author

zjffdu commented Sep 26, 2016

@minahlee PR is rebased, and the failed test is irrelevant.

- should provide onclick method *** FAILED ***
  The code passed to eventually never returned normally. Attempted 1 times over 325.359079 milliseconds. Last failure message: 0 was not equal to 1. (AbstractAngularElemTest.scala:72)
AngularElem

@minahlee
Member

@zjffdu Thank you! Merging to master and branch-0.6 if there is no more discussion

@asfgit asfgit closed this in 89cf826 Sep 27, 2016
asfgit pushed a commit that referenced this pull request Sep 27, 2016
…n is created

### What is this PR for?
The problem is that zeppelin_pyspark.py creates two SparkSession instances, because it creates the SQLContext first and that in turn creates its own underlying SparkSession. The JVM side therefore ends up with two SparkSession instances and, with them, two Catalog instances, so a UDF registered through the SQLContext cannot be used from the SparkSession. This PR creates the SparkSession first and then assigns its internal SQLContext to sqlContext in pyspark.

### What type of PR is it?
[Bug Fix]

### Todos
* [ ] - Task

### What is the Jira issue?
* https://issues.apache.org/jira/browse/ZEPPELIN-1442

### How should this be tested?
Integration test is added.

### Screenshots (if appropriate)
![image](https://cloud.githubusercontent.com/assets/164491/18774832/7f270de4-818f-11e6-9e4f-c4def4353e5c.png)

### Questions:
* Do the license files need an update? No
* Are there breaking changes for older versions? No
* Does this need documentation? No

…

Author: Jeff Zhang <[email protected]>

Closes #1452 from zjffdu/ZEPPELIN-1442 and squashes the following commits:

a15e3c6 [Jeff Zhang] fix unit test
93060b6 [Jeff Zhang] ZEPPELIN-1442. UDF can not be found due to 2 instances of SparkSession is created

(cherry picked from commit 89cf826)
Signed-off-by: Mina Lee <[email protected]>
pedrozatta pushed a commit to pedrozatta/zeppelin that referenced this pull request Oct 27, 2016