---
layout: page
title: "Apache Spark Interpreter for Apache Zeppelin"
description: "Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs."
group: interpreter
---

<!--
limitations under the License.
-->

## Overview

[Apache Spark](http://spark.apache.org) is a fast and general-purpose cluster computing system.
It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs.
Apache Spark is supported in Zeppelin with the Spark interpreter group, which consists of the five interpreters below.

<table class="table-configuration">
  <tr>
    <th>Name</th>
    <th>Class</th>
    <th>Description</th>
  </tr>
<tr>
<td>%spark</td>
<td>SparkInterpreter</td>
    <td>Creates a SparkContext and provides a Scala environment</td>
</tr>
<tr>
<td>%spark.pyspark</td>
<td>PySparkInterpreter</td>
    <td>Provides a Python environment</td>
</tr>
<tr>
    <td>%spark.r</td>
    <td>SparkRInterpreter</td>
    <td>Provides an R environment</td>
  </tr>
</table>

You can also set other Spark properties which are not listed in the table. For a list of additional properties, refer to [Spark Available Properties](http://spark.apache.org/docs/latest/configuration.html#available-properties).
Without any configuration, the Spark interpreter works out of the box in local mode. But if you want to connect to your Spark cluster, you'll need to follow the two simple steps below.
### 1. Export SPARK_HOME
In `conf/zeppelin-env.sh`, export the `SPARK_HOME` environment variable with your Spark installation path.

For example,
```bash
export SPARK_HOME=/usr/lib/spark
```

You can optionally export `HADOOP_CONF_DIR` and `SPARK_SUBMIT_OPTIONS`.

For Windows, ensure you have `winutils.exe` in `%HADOOP_HOME%\bin`. Please see [Problems running Hadoop on Windows](https://wiki.apache.org/hadoop/WindowsProblems) for the details.

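A sketch of what these optional exports might look like in `conf/zeppelin-env.sh` (the path and flag values below are illustrative, not defaults; adjust them to your environment):

```bash
# Illustrative path to your Hadoop client configuration
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Extra flags forwarded to spark-submit (illustrative values)
export SPARK_SUBMIT_OPTIONS="--driver-memory 2g --executor-memory 4g"
```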
### 2. Set master in Interpreter menu
After starting Zeppelin, go to the **Interpreter** menu and edit the **master** property in your Spark interpreter setting. The value may vary depending on your Spark cluster deployment type.

For example,

 * **local[*]** in local mode
 * **spark://master:7077** in standalone cluster
 * **yarn-client** in Yarn client mode
 * **mesos://host:5050** in Mesos cluster

That's it. Zeppelin will work with any version of Spark and any deployment type without rebuilding Zeppelin in this way.
For further information about Spark and Zeppelin version compatibility, please refer to the "Available Interpreters" section in the [Zeppelin download page](https://zeppelin.apache.org/download.html).

> Note that without exporting `SPARK_HOME`, it runs in local mode with the included version of Spark. The included version may vary depending on the build profile.
SparkContext, SQLContext and ZeppelinContext are automatically created and exposed as the variables `sc`, `sqlContext` and `z`, respectively, in the Scala, Python and R environments.
Starting from 0.6.1, SparkSession is available as the variable `spark` when you are using Spark 2.x.

> Note that the Scala/Python/R environments share the same SparkContext, SQLContext and ZeppelinContext instance.

<a name="dependencyloading"></a>
## Dependency Management
There are two ways to load external libraries into the Spark interpreter. The first is using the interpreter setting menu and the second is loading Spark properties.
### 1. Setting Dependencies via Interpreter Setting
Please see [Dependency Management](../manual/dependencymanagement.html) for the details.
### 2. Loading Spark Properties
Once `SPARK_HOME` is set in `conf/zeppelin-env.sh`, Zeppelin uses `spark-submit` as the Spark interpreter runner. `spark-submit` supports two ways to load configurations.
The first is command line options such as `--master`, which Zeppelin can pass to `spark-submit` by exporting `SPARK_SUBMIT_OPTIONS` in `conf/zeppelin-env.sh`. The second is reading configuration options from `SPARK_HOME/conf/spark-defaults.conf`. Spark properties that users can set to distribute libraries are:

<table class="table-configuration">
  <tr>
    <th>Spark Property</th>
    <th>spark-submit argument</th>
    <th>Description</th>
  </tr>
  <tr>
<td>spark.jars.packages</td>
<td>--packages</td>
    <td>Comma-separated list of Maven coordinates of jars to include on the driver and executor classpaths. Will search the local Maven repo, then Maven Central and any additional remote repositories given by <code>--repositories</code>. The format for the coordinates should be <code>groupId:artifactId:version</code>.</td>
</tr>
<tr>
<td>spark.files</td>
    <td>--files</td>
    <td>Comma-separated list of files to be placed in the working directory of each executor.</td>
  </tr>
</table>
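
For example, a sketch of distributing a package through `SPARK_HOME/conf/spark-defaults.conf` (the coordinate below is purely illustrative; substitute the artifact you actually need):

```
spark.jars.packages    com.databricks:spark-csv_2.11:1.5.0
```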
### 3. Dynamic Dependency Loading via %spark.dep interpreter
> Note: `%spark.dep` interpreter is deprecated since v0.6.0.
The `%spark.dep` interpreter loads libraries to `%spark` and `%spark.pyspark` but not to the `%spark.sql` interpreter, so we recommend using the first option instead.

When your code requires an external library, instead of doing download/copy/restart Zeppelin, you can easily do the following jobs using the `%spark.dep` interpreter.

* Load libraries recursively from a Maven repository
* Load libraries from the local filesystem
* Add an additional Maven repository
* Automatically add libraries to SparkCluster (you can turn this off)

The dep interpreter leverages the Scala environment, so you can write any Scala code here.
Note that the `%spark.dep` interpreter should be used before `%spark`, `%spark.pyspark` and `%spark.sql`.
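
A usage sketch of a `%spark.dep` paragraph (the repository name, URL, coordinates and jar path are placeholders), using the `z.reset()`, `z.addRepo()` and `z.load()` dependency-loader functions:

```scala
%spark.dep
z.reset()                             // clean up previously added artifacts and repositories

// add an additional maven repository (name and URL are placeholders)
z.addRepo("RepoName").url("https://repo.example.com/maven")

// load an artifact recursively from a maven repository (coordinates are placeholders)
z.load("groupId:artifactId:version")

// load a library from the local filesystem (path is a placeholder)
z.load("/path/to/your.jar")
```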
Zeppelin automatically injects `ZeppelinContext` as the variable `z` in your Scala/Python environment. `ZeppelinContext` provides some additional functions and utilities.
### Object Exchange

`ZeppelinContext` extends a map and is shared between the Scala and Python environments, so you can put an object in Scala and read it from Python, and vice versa.

<div class="codetabs">
<div data-lang="scala" markdown="1">

```scala
// Put object from scala
%spark
val myObject = ...
z.put("objName", myObject)
```

</div>
<div data-lang="python" markdown="1">

```python
# Get object from python
%spark.pyspark
myObject = z.get("objName")
```

</div>
</div>
### Form Creation

`ZeppelinContext` provides functions for creating forms.
In Scala and Python environments, you can create forms programmatically.
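
For instance, a Scala paragraph can create a text input and a select form through `z.input` and `z.select` (the form names and option values below are illustrative):

```scala
%spark
// text input form with a default value
val name = z.input("name", "sun")

// select form: a sequence of (value, displayed label) pairs
val day = z.select("day", Seq(("1", "mon"), ("2", "tue"), ("3", "wed")))
```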
To learn more about dynamic form, check out [Dynamic Form](../manual/dynamicform.html).
## Interpreter setting option

You can choose one of the `shared`, `scoped` and `isolated` options when you configure the Spark interpreter. The Spark interpreter creates a separate Scala compiler per notebook but shares a single SparkContext in `scoped` mode (experimental). It creates a separate SparkContext per notebook in `isolated` mode.
## Setting up Zeppelin with Kerberos
Logical setup with Zeppelin, Kerberos Key Distribution Center (KDC), and Spark on YARN:

1. On the server where Zeppelin is installed, install the Kerberos client modules and configuration, krb5.conf.
This is to make the server communicate with the KDC.

2. Set `SPARK_HOME` in `[ZEPPELIN_HOME]/conf/zeppelin-env.sh` to use spark-submit
(Additionally, you might have to set `export HADOOP_CONF_DIR=/etc/hadoop/conf`)

3. Add the two properties below to the Spark configuration (`[SPARK_HOME]/conf/spark-defaults.conf`):

        spark.yarn.principal
        spark.yarn.keytab

> **NOTE:** If you do not have permission to access the above spark-defaults.conf file, you can optionally add the above lines to the Spark interpreter setting through the Interpreter tab in the Zeppelin UI.
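
As an illustration, filled-in values might look like the lines below (the principal and keytab path are hypothetical; substitute the ones issued by your KDC):

```
spark.yarn.principal    zeppelin@EXAMPLE.COM
spark.yarn.keytab       /etc/security/keytabs/zeppelin.keytab
```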