ZEPPELIN-160 Working with provided Spark, Hadoop. #244
Leemoonsoo wants to merge 32 commits into apache:master from Leemoonsoo:spark_provided
Conversation
2ce745d to 4ebc5ee
Here's a summary of the changes made by this patch.

Add spark-dependencies submodule
A spark-dependencies maven submodule is created. It is responsible for copying all the spark/hadoop dependencies under interpreter/spark/dep. Spark/Hadoop dependencies in the spark maven submodule are set to provided scope, and at runtime they are loaded either from interpreter/spark/dep or from SPARK_HOME/HADOOP_HOME.

bin/interpreter.sh
bin/interpreter.sh checks whether SPARK_HOME and HADOOP_HOME are defined. It also searches for spark-*.conf files under SPARK_HOME/conf and automatically adds them to ZEPPELIN_JAVA_OPTS.

Remove use of travis-install.sh from .travis
While travis-install.sh reduces log output, it causes some problems.

This is ready to review.
Would you elaborate on why we would need to do this? Curious - doesn't SQLContext always have the sql() method?
It's because the method signature is not exactly the same:
version 1.3 and later has def sql(sqlText: String): DataFrame, while version 1.2 and prior has def sql(sqlText: String): SchemaRDD
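Since the question here is how to call sql() without compiling against a version-specific return type, a minimal sketch of the reflection approach (commit 5a17d9c, "Call sqlContext.sql using reflection") might look like the following. This is an illustrative assumption of the pattern, not Zeppelin's actual code; only Object is assumed as the return type, since it is SchemaRDD in Spark 1.2 and DataFrame in 1.3+.

```java
import java.lang.reflect.Method;

// Hedged sketch: invoke sql(String) reflectively so the same compiled code
// works whether sql() returns SchemaRDD (Spark 1.2) or DataFrame (1.3+).
public class ReflectiveSql {
    public static Object sql(Object sqlContext, String sqlText) {
        try {
            // Look up sql(String) on the runtime class; the declared return
            // type differs across Spark versions, so we only assume Object.
            Method m = sqlContext.getClass().getMethod("sql", String.class);
            return m.invoke(sqlContext, sqlText);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "no callable sql(String) on " + sqlContext.getClass(), e);
        }
    }
}
```

The caller then treats the result as a plain Object (or reflects further on it), which is what makes one binary work against both method signatures.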
Good catch! Maybe it is worth adding a comment to the source code itself, for the history?
Looks good. This is a very important change, thanks for making it.
Thank you for a really cool feature! Quick question:
This is true only for Spark, not for Hadoop though, am I right? It's a bit unclear whether it covers both.
bin/interpreter.sh (outdated)
Why not
if [[ -z "${PYTHONPATH}" ]]; then
It depends on the spark distribution.
7c1745f to 57b3f96
I have rebased to resolve the merge conflict.
👍 |
Zeppelin currently embeds all spark dependencies under interpreter/spark and loads them at runtime. This is useful because a user can try Zeppelin + Spark in local mode without installing and configuring spark. However, when a user has an existing spark and hadoop installation, it is really helpful to simply point to them instead of building zeppelin for a specific combination of spark and hadoop versions.

This PR implements the ability to use an external spark and hadoop installation by doing the following:

* The spark-dependencies module packages spark/hadoop dependencies under interpreter/spark/dep, to support local mode (the current behavior)
* When SPARK_HOME and HADOOP_HOME are defined, bin/interpreter.sh excludes interpreter/spark/dep from the classpath and includes the system-installed spark and hadoop instead

This patch makes the Zeppelin binary independent of the spark version. Once Zeppelin has been built, SPARK_HOME can point to any version of spark.

Author: Lee moon soo <[email protected]>

Closes apache#244 from Leemoonsoo/spark_provided and squashes the following commits:

654c378 [Lee moon soo] use consistant, simpler expressions
57b3f96 [Lee moon soo] Add comment
eb4ec09 [Lee moon soo] fix reading spark-*.conf file
bacfd93 [Lee moon soo] Update readme
3a88c77 [Lee moon soo] Test use explicitly %spark
5a17d9c [Lee moon soo] Call sqlContext.sql using reflection
615c395 [Lee moon soo] get correct method
0c28561 [Lee moon soo] call listenerBus() using reflection
62b8c45 [Lee moon soo] Print all logs
5edb6fd [Lee moon soo] Use reflection to call addListener
af7a925 [Lee moon soo] add pyspark flag
5f8a734 [Lee moon soo] test -> package
a0150cf [Lee moon soo] not use travis-install for mvn test
cd4519c [Lee moon soo] try sys.stdout.write instead of print
6304180 [Lee moon soo] enable 1.2.x test
797c0e2 [Lee moon soo] enable 1.3.x test
8de7add [Lee moon soo] trying to find why travis is not closing the test
cf0a61e [Lee moon soo] rm -rf only interpreter directory instead of mvn clean
2606c04 [Lee moon soo] bringing travis-install.sh back
df8f0ba [Lee moon soo] test more efficiently
9d6b40f [Lee moon soo] Update .travis
2ca3d95 [Lee moon soo] set SPARK_HOME
2a61ecd [Lee moon soo] Clear interpreter directory on mvn clean
f1e8789 [Lee moon soo] update travis config
9e812e7 [Lee moon soo] Use reflection not to use import org.apache.spark.scheduler.Stage
c3d96c1 [Lee moon soo] Handle ZEPPELIN_CLASSPATH proper way
0f9598b [Lee moon soo] py4j version as a property
1b7f951 [Lee moon soo] Add dependency for compile and test
b1d62a5 [Lee moon soo] Add scala-library in test scope
c49be62 [Lee moon soo] Add hadoop jar and spark jar from HADOOP_HOME, SPARK_HOME when they are defined
2052aa3 [Lee moon soo] Load interpreter/spark/dep only when SPARK_HOME is undefined
54fdf0d [Lee moon soo] Separate spark-dependency into submodule

(cherry picked from commit 5de01c6)
Signed-off-by: Lee moon soo <[email protected]>
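Commits such as 0c28561 ("call listenerBus() using reflection") and 5edb6fd ("Use reflection to call addListener") apply the same version-independence idea to the listener API. A hedged sketch of that pattern follows; all class and method names apart from listenerBus/addListener are illustrative assumptions, not Zeppelin's actual code.

```java
import java.lang.reflect.Method;

// Hedged sketch: obtain the listener bus via reflection and call
// addListener on it, so nothing here compiles against Spark scheduler
// classes whose packages moved between versions.
public class ReflectiveListener {
    public static void addListener(Object sparkContext, Object listener) {
        try {
            Object bus = sparkContext.getClass()
                .getMethod("listenerBus").invoke(sparkContext);
            // Search by name because the parameter type is a Spark class
            // we deliberately do not reference at compile time.
            for (Method m : bus.getClass().getMethods()) {
                if (m.getName().equals("addListener")
                        && m.getParameterTypes().length == 1) {
                    m.invoke(bus, listener);
                    return;
                }
            }
            throw new NoSuchMethodException("addListener");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective addListener failed", e);
        }
    }
}
```

The design trade-off is the usual one with reflection: a compile-time dependency on a moving API is exchanged for a runtime lookup that must be kept in sync with the method names Spark actually exposes.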