
Commit ac01f2b
Merge pull request #1 from AhyoungRyu/spark_doc_fix/ahyoung: Improve spark.md
2 parents 40d4b11 + 5fa523f

File tree: 1 file changed, +40 -35 lines changed

docs/interpreter/spark.md (40 additions & 35 deletions)
@@ -1,7 +1,7 @@
 ---
 layout: page
 title: "Apache Spark Interpreter for Apache Zeppelin"
-description: "Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs."
+description: "Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution engine."
 group: interpreter
 ---
 <!--
@@ -25,9 +25,8 @@ limitations under the License.
 
 ## Overview
 [Apache Spark](http://spark.apache.org) is a fast and general-purpose cluster computing system.
-It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs
-Apache Spark is supported in Zeppelin with
-Spark Interpreter group, which consists of five interpreters.
+It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs.
+Apache Spark is supported in Zeppelin with Spark interpreter group which consists of below five interpreters.
 
 <table class="table-configuration">
 <tr>
@@ -38,12 +37,12 @@ Spark Interpreter group, which consists of five interpreters.
 <tr>
 <td>%spark</td>
 <td>SparkInterpreter</td>
-<td>Creates a SparkContext and provides a scala environment</td>
+<td>Creates a SparkContext and provides a Scala environment</td>
 </tr>
 <tr>
 <td>%spark.pyspark</td>
 <td>PySparkInterpreter</td>
-<td>Provides a python environment</td>
+<td>Provides a Python environment</td>
 </tr>
 <tr>
 <td>%spark.r</td>
@@ -139,53 +138,55 @@ You can also set other Spark properties which are not listed in the table. For a
 Without any configuration, Spark interpreter works out of box in local mode. But if you want to connect to your Spark cluster, you'll need to follow below two simple steps.
 
 ### 1. Export SPARK_HOME
-In **conf/zeppelin-env.sh**, export `SPARK_HOME` environment variable with your Spark installation path.
+In `conf/zeppelin-env.sh`, export `SPARK_HOME` environment variable with your Spark installation path.
 
-for example
+For example,
 
 ```bash
 export SPARK_HOME=/usr/lib/spark
 ```
 
-You can optionally export HADOOP\_CONF\_DIR and SPARK\_SUBMIT\_OPTIONS
+You can optionally export `HADOOP_CONF_DIR` and `SPARK_SUBMIT_OPTIONS`
 
 ```bash
 export HADOOP_CONF_DIR=/usr/lib/hadoop
 export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.10:1.2.0"
 ```
 
-For Windows, ensure you have `winutils.exe` in `%HADOOP_HOME%\bin`. For more details please see [Problems running Hadoop on Windows](https://wiki.apache.org/hadoop/WindowsProblems)
+For Windows, ensure you have `winutils.exe` in `%HADOOP_HOME%\bin`. Please see [Problems running Hadoop on Windows](https://wiki.apache.org/hadoop/WindowsProblems) for the details.
 
 ### 2. Set master in Interpreter menu
 After start Zeppelin, go to **Interpreter** menu and edit **master** property in your Spark interpreter setting. The value may vary depending on your Spark cluster deployment type.
 
-for example,
+For example,
 
 * **local[*]** in local mode
 * **spark://master:7077** in standalone cluster
 * **yarn-client** in Yarn client mode
 * **mesos://host:5050** in Mesos cluster
 
-That's it. Zeppelin will work with any version of Spark and any deployment type without rebuilding Zeppelin in this way. (Zeppelin 0.5.6-incubating release works up to Spark 1.6.1 )
+That's it. Zeppelin will work with any version of Spark and any deployment type without rebuilding Zeppelin in this way.
+For further information about Spark & Zeppelin version compatibility, please refer to the "Available Interpreters" section in [Zeppelin download page](https://zeppelin.apache.org/download.html).
 
 > Note that without exporting `SPARK_HOME`, it's running in local mode with included version of Spark. The included version may vary depending on the build profile.
 
 ## SparkContext, SQLContext, SparkSession, ZeppelinContext
-SparkContext, SQLContext, ZeppelinContext are automatically created and exposed as variable names 'sc', 'sqlContext' and 'z', respectively, both in scala and python environments.
-Staring from 0.6.1 SparkSession is available as variable 'spark' when you are using Spark 2.x.
+SparkContext, SQLContext and ZeppelinContext are automatically created and exposed as variable names `sc`, `sqlContext` and `z`, respectively, in Scala, Python and R environments.
+Starting from 0.6.1, SparkSession is available as variable `spark` when you are using Spark 2.x.
 
-> Note that scala / python environment shares the same SparkContext, SQLContext, ZeppelinContext instance.
+> Note that Scala/Python/R environments share the same SparkContext, SQLContext and ZeppelinContext instance.
 
 <a name="dependencyloading"> </a>
 
 ## Dependency Management
-There are two ways to load external library in spark interpreter. First is using Interpreter setting menu and second is loading Spark properties.
+There are two ways to load external libraries in Spark interpreter. First is using the interpreter setting menu and second is loading Spark properties.
 
 ### 1. Setting Dependencies via Interpreter Setting
 Please see [Dependency Management](../manual/dependencymanagement.html) for the details.
 
 ### 2. Loading Spark Properties
-Once `SPARK_HOME` is set in `conf/zeppelin-env.sh`, Zeppelin uses `spark-submit` as spark interpreter runner. `spark-submit` supports two ways to load configurations. The first is command line options such as --master and Zeppelin can pass these options to `spark-submit` by exporting `SPARK_SUBMIT_OPTIONS` in conf/zeppelin-env.sh. Second is reading configuration options from `SPARK_HOME/conf/spark-defaults.conf`. Spark properites that user can set to distribute libraries are:
+Once `SPARK_HOME` is set in `conf/zeppelin-env.sh`, Zeppelin uses `spark-submit` as spark interpreter runner. `spark-submit` supports two ways to load configurations.
+The first is command line options such as --master and Zeppelin can pass these options to `spark-submit` by exporting `SPARK_SUBMIT_OPTIONS` in `conf/zeppelin-env.sh`. Second is reading configuration options from `SPARK_HOME/conf/spark-defaults.conf`. Spark properties that user can set to distribute libraries are:
 
 <table class="table-configuration">
 <tr>
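As an aside on the master values listed in the hunk above: they follow recognizable schemes per deployment type. The following is a rough illustration, not part of the patch; `classify_master` is a hypothetical helper, and real validation is done by Spark itself.

```python
import re

def classify_master(master):
    """Classify a Spark master string into the deployment types the doc lists.
    Hypothetical helper for illustration; not part of Zeppelin or Spark."""
    if re.fullmatch(r"local(\[(\*|\d+)\])?", master):
        return "local"           # e.g. local, local[4], local[*]
    if master.startswith("spark://"):
        return "standalone"      # e.g. spark://master:7077
    if master in ("yarn-client", "yarn-cluster"):
        return "yarn"
    if master.startswith("mesos://"):
        return "mesos"           # e.g. mesos://host:5050
    return "unknown"

print(classify_master("local[*]"))             # local
print(classify_master("spark://master:7077"))  # standalone
```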
@@ -201,7 +202,7 @@ Once `SPARK_HOME` is set in `conf/zeppelin-env.sh`, Zeppelin uses `spark-submit`
 <tr>
 <td>spark.jars.packages</td>
 <td>--packages</td>
-<td>Comma-separated list of maven coordinates of jars to include on the driver and executor classpaths. Will search the local maven repo, then maven central and any additional remote repositories given by --repositories. The format for the coordinates should be groupId:artifactId:version.</td>
+<td>Comma-separated list of maven coordinates of jars to include on the driver and executor classpaths. Will search the local maven repo, then maven central and any additional remote repositories given by --repositories. The format for the coordinates should be <code>groupId:artifactId:version</code>.</td>
 </tr>
 <tr>
 <td>spark.files</td>
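The `groupId:artifactId:version` format mentioned in the cell above splits on exactly two colons. A minimal sketch of that split (illustration only; `parse_coordinate` is a hypothetical helper, and `spark-submit` does its own parsing):

```python
def parse_coordinate(coord):
    """Split a maven coordinate of the form groupId:artifactId:version."""
    parts = coord.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected groupId:artifactId:version, got %r" % coord)
    group_id, artifact_id, version = parts
    return {"groupId": group_id, "artifactId": artifact_id, "version": version}

# The coordinate used elsewhere in this doc:
print(parse_coordinate("com.databricks:spark-csv_2.10:1.2.0"))
```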
@@ -212,28 +213,32 @@ Once `SPARK_HOME` is set in `conf/zeppelin-env.sh`, Zeppelin uses `spark-submit`
 
 Here are few examples:
 
-* SPARK\_SUBMIT\_OPTIONS in conf/zeppelin-env.sh
+* `SPARK_SUBMIT_OPTIONS` in `conf/zeppelin-env.sh`
 
+```bash
 export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.10:1.2.0 --jars /path/mylib1.jar,/path/mylib2.jar --files /path/mylib1.py,/path/mylib2.zip,/path/mylib3.egg"
+```
+
+* `SPARK_HOME/conf/spark-defaults.conf`
 
-* SPARK_HOME/conf/spark-defaults.conf
-
+```
 spark.jars /path/mylib1.jar,/path/mylib2.jar
 spark.jars.packages com.databricks:spark-csv_2.10:1.2.0
 spark.files /path/mylib1.py,/path/mylib2.egg,/path/mylib3.zip
+```
 
 ### 3. Dynamic Dependency Loading via %spark.dep interpreter
 > Note: `%spark.dep` interpreter is deprecated since v0.6.0.
-`%spark.dep` interpreter load libraries to `%spark` and `%spark.pyspark` but not to `%spark.sql` interpreter so we recommend you to use first option instead.
+`%spark.dep` interpreter loads libraries to `%spark` and `%spark.pyspark` but not to `%spark.sql` interpreter. So we recommend you to use the first option instead.
 
 When your code requires external library, instead of doing download/copy/restart Zeppelin, you can easily do following jobs using `%spark.dep` interpreter.
 
-* Load libraries recursively from Maven repository
+* Load libraries recursively from maven repository
 * Load libraries from local filesystem
 * Add additional maven repository
 * Automatically add libraries to SparkCluster (You can turn off)
 
-Dep interpreter leverages scala environment. So you can write any Scala code here.
+Dep interpreter leverages Scala environment. So you can write any Scala code here.
 Note that `%spark.dep` interpreter should be used before `%spark`, `%spark.pyspark`, `%spark.sql`.
 
 Here's usages.
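The `spark-defaults.conf` example in the hunk above uses simple "key, whitespace, value" lines. A minimal sketch of how such lines map to properties (assumption: `parse_spark_defaults` is a hypothetical helper written for this illustration; the real parsing is done by `spark-submit`):

```python
def parse_spark_defaults(text):
    """Parse spark-defaults.conf-style lines (key <whitespace> value) into a
    dict, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on first run of whitespace
        if len(parts) == 2:
            props[parts[0]] = parts[1]
    return props

conf = """
spark.jars            /path/mylib1.jar,/path/mylib2.jar
spark.jars.packages   com.databricks:spark-csv_2.10:1.2.0
spark.files           /path/mylib1.py,/path/mylib2.egg,/path/mylib3.zip
"""
props = parse_spark_defaults(conf)
print(props["spark.jars.packages"])  # com.databricks:spark-csv_2.10:1.2.0
```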
@@ -273,11 +278,11 @@ z.load("groupId:artifactId:version").local()
 ```
 
 ## ZeppelinContext
-Zeppelin automatically injects ZeppelinContext as variable 'z' in your scala/python environment. ZeppelinContext provides some additional functions and utility.
+Zeppelin automatically injects `ZeppelinContext` as variable `z` in your Scala/Python environment. `ZeppelinContext` provides some additional functions and utilities.
 
 ### Object Exchange
-ZeppelinContext extends map and it's shared between scala, python environment.
-So you can put some object from scala and read it from python, vise versa.
+`ZeppelinContext` extends map and it's shared between Scala and Python environment.
+So you can put some objects from Scala and read it from Python, vice versa.
 
 <div class="codetabs">
 <div data-lang="scala" markdown="1">
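The object-exchange semantics described in this hunk (one map shared across environments, written via `z.put` and read via `z.get`) can be mimicked with a toy stand-in. This is a sketch only: `ToyContext` is invented for illustration, while the real `ZeppelinContext` is backed by a JVM-side map.

```python
class ToyContext:
    """Toy stand-in for ZeppelinContext's object exchange: one class-level
    map shared by every instance, mimicking the shared Scala/Python store."""
    _store = {}

    def put(self, name, obj):
        self._store[name] = obj

    def get(self, name):
        return self._store[name]

# "Scala side" puts an object, "Python side" reads it back:
z_scala = ToyContext()
z_python = ToyContext()
z_scala.put("objName", [1, 2, 3])
print(z_python.get("objName"))  # [1, 2, 3]
```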
@@ -303,8 +308,8 @@ myObject = z.get("objName")
 
 ### Form Creation
 
-ZeppelinContext provides functions for creating forms.
-In scala and python environments, you can create forms programmatically.
+`ZeppelinContext` provides functions for creating forms.
+In Scala and Python environments, you can create forms programmatically.
 <div class="codetabs">
 <div data-lang="scala" markdown="1">
 
@@ -350,7 +355,7 @@ z.select("formName", [("option1", "option1DisplayName"),
 
 In sql environment, you can create form in simple template.
 
-```
+```sql
 %spark.sql
 select * from ${table=defaultTableName} where text like '%${search}%'
 ```
@@ -360,7 +365,7 @@ To learn more about dynamic form, checkout [Dynamic Form](../manual/dynamicform.
 
 ## Interpreter setting option
 
-Interpreter setting can choose one of 'shared', 'scoped', 'isolated' option. Spark interpreter creates separate scala compiler per each notebook but share a single SparkContext in 'scoped' mode (experimental). It creates separate SparkContext per each notebook in 'isolated' mode.
+You can choose one of `shared`, `scoped` and `isolated` options when you configure Spark interpreter. Spark interpreter creates a separate Scala compiler per notebook but shares a single SparkContext in `scoped` mode (experimental). It creates a separate SparkContext per notebook in `isolated` mode.
 
 
 ## Setting up Zeppelin with Kerberos
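The shared/scoped/isolated behavior described in the hunk above reduces to how many Scala compilers and SparkContexts exist for N notebooks. A toy model of that resource count, as a sketch of the doc's description rather than Zeppelin's actual implementation (`contexts_needed` is a hypothetical helper):

```python
def contexts_needed(mode, notes):
    """Return (num_scala_compilers, num_spark_contexts) for a list of notes
    under the given interpreter option, per the description above."""
    n = len(notes)
    if mode == "shared":
        return (1, 1)   # one compiler and one SparkContext for everyone
    if mode == "scoped":
        return (n, 1)   # per-note compiler, shared SparkContext (experimental)
    if mode == "isolated":
        return (n, n)   # per-note compiler and per-note SparkContext
    raise ValueError("unknown mode: %r" % mode)

print(contexts_needed("scoped", ["note1", "note2", "note3"]))  # (3, 1)
```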
@@ -373,14 +378,14 @@ Logical setup with Zeppelin, Kerberos Key Distribution Center (KDC), and Spark o
 1. On the server that Zeppelin is installed, install Kerberos client modules and configuration, krb5.conf.
 This is to make the server communicate with KDC.
 
-2. Set SPARK\_HOME in `[ZEPPELIN\_HOME]/conf/zeppelin-env.sh` to use spark-submit
-(Additionally, you might have to set `export HADOOP\_CONF\_DIR=/etc/hadoop/conf`)
+2. Set `SPARK_HOME` in `[ZEPPELIN_HOME]/conf/zeppelin-env.sh` to use spark-submit
+(Additionally, you might have to set `export HADOOP_CONF_DIR=/etc/hadoop/conf`)
 
-3. Add the two properties below to spark configuration (`[SPARK_HOME]/conf/spark-defaults.conf`):
+3. Add the two properties below to Spark configuration (`[SPARK_HOME]/conf/spark-defaults.conf`):
 
 spark.yarn.principal
 spark.yarn.keytab
 
-> **NOTE:** If you do not have access to the above spark-defaults.conf file, optionally, you may add the lines to the Spark Interpreter through the Interpreter tab in the Zeppelin UI.
+> **NOTE:** If you do not have permission to access the above spark-defaults.conf file, you can optionally add the above lines to the Spark interpreter setting through the Interpreter tab in the Zeppelin UI.
 
 4. That's it. Play with Zeppelin!
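Step 3 above adds two key/value lines to `spark-defaults.conf`. A minimal sketch of rendering them (the principal and keytab path below are placeholders, not real credentials, and `kerberos_properties` is a hypothetical helper):

```python
def kerberos_properties(principal, keytab):
    """Render the two spark-defaults.conf lines from step 3 above.
    Placeholder values only; substitute your own principal and keytab path."""
    return "spark.yarn.principal {0}\nspark.yarn.keytab {1}\n".format(
        principal, keytab)

print(kerberos_properties("zeppelin@EXAMPLE.COM",
                          "/etc/security/keytabs/zeppelin.keytab"))
```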
