
ZEPPELIN-160 Working with provided Spark, Hadoop. #244

Closed
Leemoonsoo wants to merge 32 commits into apache:master from
Leemoonsoo:spark_provided

Conversation

@Leemoonsoo
Member

Zeppelin currently embeds all Spark dependencies under interpreter/spark and loads them at runtime.

This is useful because a user can try Zeppelin + Spark in local mode without installing or configuring Spark.

However, when a user already has a Spark and Hadoop installation, it is much more convenient to simply point Zeppelin at it, instead of building Zeppelin against a specific combination of Spark and Hadoop versions.

This PR adds the ability to use an external Spark and Hadoop installation by doing the following:

  • The spark-dependencies module packages the Spark/Hadoop dependencies under interpreter/spark/dep, to support local mode (the current behavior).
  • When SPARK_HOME and HADOOP_HOME are defined, bin/interpreter.sh excludes interpreter/spark/dep from the classpath and puts the system-installed Spark and Hadoop on the classpath instead.

This patch makes the Zeppelin binary independent of the Spark version. Once Zeppelin is built, SPARK_HOME can point to any version of Spark.

@Leemoonsoo
Member Author

Here's a summary of the changes made by this patch.

Add spark-dependencies submodule

A spark-dependencies Maven submodule is created. It is responsible for copying all Spark/Hadoop dependencies under interpreter/spark/dep.

The Spark/Hadoop dependencies in the spark Maven submodule are set to provided scope; at runtime they are loaded from either interpreter/spark/dep or SPARK_HOME/HADOOP_HOME.

bin/interpreter.sh

bin/interpreter.sh checks whether SPARK_HOME and HADOOP_HOME are defined.
If they are not defined, it adds interpreter/spark/dep to the classpath.
If they are defined, it instead adds directories from SPARK_HOME and HADOOP_HOME to the classpath.
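The branching described above can be sketched as a small shell function. This is a hedged sketch only, not the actual bin/interpreter.sh; the jar locations under SPARK_HOME and HADOOP_HOME are illustrative assumptions.

```shell
# Sketch of the classpath selection. Directory layouts under
# SPARK_HOME/HADOOP_HOME are illustrative assumptions, not the real script.
select_classpath() {
  local spark_home="$1" hadoop_home="$2" cp
  if [[ -z "${spark_home}" ]]; then
    # No external Spark: fall back to the bundled dependencies (local mode).
    cp="interpreter/spark/dep/*"
  else
    # External Spark: use its jars instead of the bundled ones.
    cp="${spark_home}/lib/*"
    if [[ -n "${hadoop_home}" ]]; then
      cp="${cp}:${hadoop_home}/share/hadoop/common/*"
    fi
  fi
  echo "${cp}"
}
```

Called with no homes it falls back to the bundled dependencies; given a Spark home (and optionally a Hadoop home) it builds the classpath from the system installation instead.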

It also searches for spark-*.conf files under SPARK_HOME/conf and automatically adds their settings to ZEPPELIN_JAVA_OPTS.
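The spark-*.conf handling might look roughly like this. This is a hedged sketch: Spark conf files such as spark-defaults.conf contain whitespace-separated "key value" lines, and the real script may parse them differently.

```shell
# Sketch: convert "key value" lines of a Spark conf file into -Dkey=value
# Java options; comment and blank lines are skipped. Illustrative only.
conf_to_java_opts() {
  local opts="" key value
  while read -r key value; do
    # Skip blank lines and comments.
    [[ -z "${key}" || "${key}" == \#* ]] && continue
    opts="${opts} -D${key}=${value}"
  done < "$1"
  # Strip the leading space before printing.
  echo "${opts# }"
}
```

The resulting string could then be appended to ZEPPELIN_JAVA_OPTS before the interpreter JVM is launched.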

Remove use of travis-install.sh from .travis

While travis-install.sh reduces log output, it causes a problem:
when a build hangs for some reason, Travis terminates the build container before travis-install.sh can detect the error and print it. That makes debugging very hard.

This is ready to review.

Member


Would you elaborate on why we would need to do this? Curious: doesn't SQLContext always have a sql() method?

Member Author


It's because the method signature is not exactly the same:
version 1.3 and later has def sql(sqlText: String): DataFrame, while version 1.2 and earlier has def sql(sqlText: String): SchemaRDD.

Member


Good catch! Maybe it is worth adding a comment about this to the source code itself, for the history?

@felixcheung
Member

Looks good. This is a very important change; thanks for making it.

@bzz
Member

bzz commented Aug 25, 2015

Thank you for a really cool feature!

Quick question:

This patch makes the Zeppelin binary independent of the Spark version. Once Zeppelin is built, SPARK_HOME can point to any version of Spark.

This is true only for Spark, though, not for Hadoop; am I right?

It's a bit unclear whether both SPARK_HOME and HADOOP_HOME are required to use this mode, or whether SPARK_HOME alone is enough.

Member


Why not

if [[ -z "${PYTHONPATH}" ]]; then

@Leemoonsoo
Member Author

It's a bit unclear whether both SPARK_HOME and HADOOP_HOME are required to use this mode, or whether SPARK_HOME alone is enough.

It depends on the Spark distribution.
If the system-provided Spark is "pre-built with user-provided Hadoop", then both SPARK_HOME and HADOOP_HOME are required. Otherwise, SPARK_HOME alone is enough.
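For example, in conf/zeppelin-env.sh (the paths below are purely illustrative):

```shell
# Hypothetical paths. A Spark build that bundles Hadoop needs only SPARK_HOME:
export SPARK_HOME=/opt/spark-1.4.1-bin-hadoop2.6

# A "pre-built with user-provided Hadoop" distribution needs both:
# export SPARK_HOME=/opt/spark-1.4.1-bin-without-hadoop
# export HADOOP_HOME=/opt/hadoop-2.6.0
```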

@Leemoonsoo Leemoonsoo force-pushed the spark_provided branch 2 times, most recently from 7c1745f to 57b3f96 on August 31, 2015.
@Leemoonsoo
Member Author

I have rebased to resolve the merge conflict.
Merging if there are no more discussions.

@bzz
Member

bzz commented Sep 1, 2015

👍

@asfgit asfgit closed this in 5de01c6 Sep 1, 2015
Leemoonsoo added a commit to Leemoonsoo/zeppelin that referenced this pull request Sep 17, 2015

Author: Lee moon soo <[email protected]>

Closes apache#244 from Leemoonsoo/spark_provided and squashes the following commits:

654c378 [Lee moon soo] use consistant, simpler expressions
57b3f96 [Lee moon soo] Add comment
eb4ec09 [Lee moon soo] fix reading spark-*.conf file
bacfd93 [Lee moon soo] Update readme
3a88c77 [Lee moon soo] Test use explicitly %spark
5a17d9c [Lee moon soo] Call sqlContext.sql using reflection
615c395 [Lee moon soo] get correct method
0c28561 [Lee moon soo] call listenerBus() using reflection
62b8c45 [Lee moon soo] Print all logs
5edb6fd [Lee moon soo] Use reflection to call addListener
af7a925 [Lee moon soo] add pyspark flag
5f8a734 [Lee moon soo] test -> package
a0150cf [Lee moon soo] not use travis-install for mvn test
cd4519c [Lee moon soo] try sys.stdout.write instead of print
6304180 [Lee moon soo] enable 1.2.x test
797c0e2 [Lee moon soo] enable 1.3.x test
8de7add [Lee moon soo] trying to find why travis is not closing the test
cf0a61e [Lee moon soo] rm -rf only interpreter directory instead of mvn clean
2606c04 [Lee moon soo] bringing travis-install.sh back
df8f0ba [Lee moon soo] test more efficiently
9d6b40f [Lee moon soo] Update .travis
2ca3d95 [Lee moon soo] set SPARK_HOME
2a61ecd [Lee moon soo] Clear interpreter directory on mvn clean
f1e8789 [Lee moon soo] update travis config
9e812e7 [Lee moon soo] Use reflection not to use import org.apache.spark.scheduler.Stage
c3d96c1 [Lee moon soo] Handle ZEPPELIN_CLASSPATH proper way
0f9598b [Lee moon soo] py4j version as a property
1b7f951 [Lee moon soo] Add dependency for compile and test
b1d62a5 [Lee moon soo] Add scala-library in test scope
c49be62 [Lee moon soo] Add hadoop jar and spark jar from HADOOP_HOME, SPARK_HOME when they are defined
2052aa3 [Lee moon soo] Load interpreter/spark/dep only when SPARK_HOME is undefined
54fdf0d [Lee moon soo] Separate spark-dependency into submodule

(cherry picked from commit 5de01c6)
Signed-off-by: Lee moon soo <[email protected]>
lelou6666 pushed a commit to lelou6666/incubator-zeppelin that referenced this pull request Mar 25, 2016