
Commit 69cc1a0

Author: astroshim (committed)
Message: Merge branch 'master' into ZEPPELIN-1446
2 parents fada36b + 1e8559e, commit 69cc1a0

File tree: 15 files changed, +194 and -110 lines changed


docs/README.md (1 addition, 1 deletion)

@@ -1,6 +1,6 @@
 # Apache Zeppelin documentation

-This README will walk you through building the documentation of Apache Zeppelin. The documentation is included here with Apache Zeppelin source code. The online documentation at [https://zeppelin.apache.org/docs/<ZEPPELIN_VERSION>](https://zeppelin.apache.org/docs/latest) is also generated from the files found in here.
+This README will walk you through building the documentation of Apache Zeppelin. The documentation is included here with Apache Zeppelin source code. The online documentation at [https://zeppelin.apache.org/docs/<ZEPPELIN_VERSION>](https://zeppelin.apache.org/docs/latest/) is also generated from the files found in here.

 ## Build documentation
 Zeppelin is using [Jekyll](https://jekyllrb.com/) which is a static site generator and [Github Pages](https://pages.github.com/) as a site publisher. For the more details, see [help.github.com/articles/about-github-pages-and-jekyll/](https://help.github.com/articles/about-github-pages-and-jekyll/).

docs/interpreter/elasticsearch.md (1 addition, 1 deletion)

@@ -243,7 +243,7 @@ delete /index/type/id
 ```

 ### Apply Zeppelin Dynamic Forms
-You can leverage [Zeppelin Dynamic Form]({{BASE_PATH}}/manual/dynamicform.html) inside your queries. You can use both the `text input` and `select form` parameterization features.
+You can leverage [Zeppelin Dynamic Form](../manual/dynamicform.html) inside your queries. You can use both the `text input` and `select form` parameterization features.

 ```bash
 %elasticsearch

docs/interpreter/hive.md (1 addition, 1 deletion)

@@ -151,7 +151,7 @@ select * from my_table;
 You can also run multiple queries up to 10 by default. Changing these settings is not implemented yet.

 ### Apply Zeppelin Dynamic Forms
-You can leverage [Zeppelin Dynamic Form]({{BASE_PATH}}/manual/dynamicform.html) inside your queries. You can use both the `text input` and `select form` parameterization features.
+You can leverage [Zeppelin Dynamic Form](../manual/dynamicform.html) inside your queries. You can use both the `text input` and `select form` parameterization features.

 ```sql
 %hive
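The Dynamic Form link fixed in this hunk documents Zeppelin's `${name=default}` template syntax (it also appears verbatim in the spark.md hunk later in this commit). As a rough illustration of how that substitution behaves (this is not Zeppelin's actual parser; the `render` helper and sample values are invented here), a Python sketch:

```python
import re

# Hypothetical sketch of Zeppelin's ${name=default} text-input form syntax.
# Real Zeppelin parses forms server-side; this only illustrates the template idea.
FORM = re.compile(r"\$\{(\w+)(?:=([^}]*))?\}")

def render(template: str, values: dict) -> str:
    """Replace each ${name=default} with the submitted value, or the default."""
    def sub(m: re.Match) -> str:
        name, default = m.group(1), m.group(2) or ""
        return str(values.get(name, default))
    return FORM.sub(sub, template)

# Template taken from the spark.md hunk in this same commit.
query = "select * from ${table=defaultTableName} where text like '%${search}%'"
print(render(query, {}))  # defaults only: select * from defaultTableName where text like '%%'
print(render(query, {"table": "logs", "search": "error"}))
```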

docs/interpreter/livy.md (1 addition, 1 deletion)

@@ -174,7 +174,7 @@ When Zeppelin server is running with authentication enabled, then this interpret


 ## Apply Zeppelin Dynamic Forms
-You can leverage [Zeppelin Dynamic Form]({{BASE_PATH}}/manual/dynamicform.html). You can use both the `text input` and `select form` parameterization features.
+You can leverage [Zeppelin Dynamic Form](../manual/dynamicform.html). You can use both the `text input` and `select form` parameterization features.

 ```
 %livy.pyspark

docs/interpreter/markdown.md (2 additions, 2 deletions)

@@ -30,9 +30,9 @@ In Zeppelin notebook, you can use ` %md ` in the beginning of a paragraph to inv

 In Zeppelin, Markdown interpreter is enabled by default.

-<img src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/markdown-interpreter-setting.png" width="60%" />
+<img src="../assets/themes/zeppelin/img/docs-img/markdown-interpreter-setting.png" width="60%" />

 ## Example
 The following example demonstrates the basic usage of Markdown in a Zeppelin notebook.

-<img src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/markdown-example.png" width="70%" />
+<img src="../assets/themes/zeppelin/img/docs-img/markdown-example.png" width="70%" />

docs/interpreter/python.md (2 additions, 2 deletions)

@@ -108,7 +108,7 @@ z.show(plt, height='150px', fmt='svg')


 ## Pandas integration
-Apache Zeppelin [Table Display System]({{BASE_PATH}}/displaysystem/basicdisplaysystem.html#table) provides built-in data visualization capabilities. Python interpreter leverages it to visualize Pandas DataFrames though similar `z.show()` API, same as with [Matplotlib integration](#matplotlib-integration).
+Apache Zeppelin [Table Display System](../displaysystem/basicdisplaysystem.html#table) provides built-in data visualization capabilities. Python interpreter leverages it to visualize Pandas DataFrames though similar `z.show()` API, same as with [Matplotlib integration](#matplotlib-integration).

 Example:

@@ -120,7 +120,7 @@ z.show(rates)

 ## SQL over Pandas DataFrames

-There is a convenience `%python.sql` interpreter that matches Apache Spark experience in Zeppelin and enables usage of SQL language to query [Pandas DataFrames](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) and visualization of results though built-in [Table Display System]({{BASE_PATH}}/displaysystem/basicdisplaysystem.html#table).
+There is a convenience `%python.sql` interpreter that matches Apache Spark experience in Zeppelin and enables usage of SQL language to query [Pandas DataFrames](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) and visualization of results though built-in [Table Display System](../displaysystem/basicdisplaysystem.html#table).

 **Pre-requests**

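The hunk above relinks the `%python.sql` convenience interpreter, which runs SQL over Pandas DataFrames. As a stdlib-only analogy of that query-a-table idea (this is not the real interpreter, and the `rates` table and its values are made up for illustration), consider:

```python
import sqlite3

# Stand-in for a Pandas DataFrame: a small in-memory table.
# Table and column names here are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("create table rates (currency text, rate real)")
conn.executemany("insert into rates values (?, ?)",
                 [("EUR", 1.09), ("GBP", 1.27), ("JPY", 0.0067)])

# The SQL step that %python.sql would run against a DataFrame.
rows = conn.execute(
    "select currency, rate from rates where rate > 1 order by rate desc"
).fetchall()
print(rows)  # [('GBP', 1.27), ('EUR', 1.09)]
```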

docs/interpreter/shell.md (1 addition, 1 deletion)

@@ -63,6 +63,6 @@ At the "Interpreters" menu in Zeppelin dropdown menu, you can set the property v
 ## Example
 The following example demonstrates the basic usage of Shell in a Zeppelin notebook.

-<img src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/shell-example.png" />
+<img src="../assets/themes/zeppelin/img/docs-img/shell-example.png" />

 If you need further information about **Zeppelin Interpreter Setting** for using Shell interpreter, please read [What is interpreter setting?](../manual/interpreters.html#what-is-interpreter-setting) section first.

docs/interpreter/spark.md (53 additions, 52 deletions)

@@ -1,7 +1,7 @@
 ---
 layout: page
 title: "Apache Spark Interpreter for Apache Zeppelin"
-description: "Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs."
+description: "Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution engine."
 group: interpreter
 ---
 <!--
@@ -25,9 +25,8 @@ limitations under the License.

 ## Overview
 [Apache Spark](http://spark.apache.org) is a fast and general-purpose cluster computing system.
-It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs
-Apache Spark is supported in Zeppelin with
-Spark Interpreter group, which consists of five interpreters.
+It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs.
+Apache Spark is supported in Zeppelin with Spark interpreter group which consists of below five interpreters.

 <table class="table-configuration">
   <tr>
@@ -38,25 +37,25 @@ Spark Interpreter group, which consists of five interpreters.
   <tr>
     <td>%spark</td>
     <td>SparkInterpreter</td>
-    <td>Creates a SparkContext and provides a scala environment</td>
+    <td>Creates a SparkContext and provides a Scala environment</td>
   </tr>
   <tr>
-    <td>%pyspark</td>
+    <td>%spark.pyspark</td>
     <td>PySparkInterpreter</td>
-    <td>Provides a python environment</td>
+    <td>Provides a Python environment</td>
   </tr>
   <tr>
-    <td>%r</td>
+    <td>%spark.r</td>
     <td>SparkRInterpreter</td>
     <td>Provides an R environment with SparkR support</td>
   </tr>
   <tr>
-    <td>%sql</td>
+    <td>%spark.sql</td>
     <td>SparkSQLInterpreter</td>
     <td>Provides a SQL environment</td>
   </tr>
   <tr>
-    <td>%dep</td>
+    <td>%spark.dep</td>
     <td>DepInterpreter</td>
     <td>Dependency loader</td>
   </tr>
@@ -139,111 +138,113 @@ You can also set other Spark properties which are not listed in the table. For a
 Without any configuration, Spark interpreter works out of box in local mode. But if you want to connect to your Spark cluster, you'll need to follow below two simple steps.

 ### 1. Export SPARK_HOME
-In **conf/zeppelin-env.sh**, export `SPARK_HOME` environment variable with your Spark installation path.
+In `conf/zeppelin-env.sh`, export `SPARK_HOME` environment variable with your Spark installation path.

-for example
+For example,

 ```bash
 export SPARK_HOME=/usr/lib/spark
 ```

-You can optionally export HADOOP\_CONF\_DIR and SPARK\_SUBMIT\_OPTIONS
+You can optionally export `HADOOP_CONF_DIR` and `SPARK_SUBMIT_OPTIONS`

 ```bash
 export HADOOP_CONF_DIR=/usr/lib/hadoop
 export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.10:1.2.0"
 ```

-For Windows, ensure you have `winutils.exe` in `%HADOOP_HOME%\bin`. For more details please see [Problems running Hadoop on Windows](https://wiki.apache.org/hadoop/WindowsProblems)
+For Windows, ensure you have `winutils.exe` in `%HADOOP_HOME%\bin`. Please see [Problems running Hadoop on Windows](https://wiki.apache.org/hadoop/WindowsProblems) for the details.

 ### 2. Set master in Interpreter menu
 After start Zeppelin, go to **Interpreter** menu and edit **master** property in your Spark interpreter setting. The value may vary depending on your Spark cluster deployment type.

-for example,
+For example,

 * **local[*]** in local mode
 * **spark://master:7077** in standalone cluster
 * **yarn-client** in Yarn client mode
 * **mesos://host:5050** in Mesos cluster

-That's it. Zeppelin will work with any version of Spark and any deployment type without rebuilding Zeppelin in this way. (Zeppelin 0.5.6-incubating release works up to Spark 1.6.1 )
+That's it. Zeppelin will work with any version of Spark and any deployment type without rebuilding Zeppelin in this way.
+For the further information about Spark & Zeppelin version compatibility, please refer to "Available Interpreters" section in [Zeppelin download page](https://zeppelin.apache.org/download.html).

 > Note that without exporting `SPARK_HOME`, it's running in local mode with included version of Spark. The included version may vary depending on the build profile.

-## SparkContext, SQLContext, ZeppelinContext
-SparkContext, SQLContext, ZeppelinContext are automatically created and exposed as variable names 'sc', 'sqlContext' and 'z', respectively, both in scala and python environments.
+## SparkContext, SQLContext, SparkSession, ZeppelinContext
+SparkContext, SQLContext and ZeppelinContext are automatically created and exposed as variable names `sc`, `sqlContext` and `z`, respectively, in Scala, Python and R environments.
+Staring from 0.6.1 SparkSession is available as variable `spark` when you are using Spark 2.x.

-> Note that scala / python environment shares the same SparkContext, SQLContext, ZeppelinContext instance.
+> Note that Scala/Python/R environment shares the same SparkContext, SQLContext and ZeppelinContext instance.

 <a name="dependencyloading"> </a>

 ## Dependency Management
-There are two ways to load external library in spark interpreter. First is using Interpreter setting menu and second is loading Spark properties.
+There are two ways to load external libraries in Spark interpreter. First is using interpreter setting menu and second is loading Spark properties.

 ### 1. Setting Dependencies via Interpreter Setting
 Please see [Dependency Management](../manual/dependencymanagement.html) for the details.

 ### 2. Loading Spark Properties
-Once `SPARK_HOME` is set in `conf/zeppelin-env.sh`, Zeppelin uses `spark-submit` as spark interpreter runner. `spark-submit` supports two ways to load configurations. The first is command line options such as --master and Zeppelin can pass these options to `spark-submit` by exporting `SPARK_SUBMIT_OPTIONS` in conf/zeppelin-env.sh. Second is reading configuration options from `SPARK_HOME/conf/spark-defaults.conf`. Spark properites that user can set to distribute libraries are:
+Once `SPARK_HOME` is set in `conf/zeppelin-env.sh`, Zeppelin uses `spark-submit` as spark interpreter runner. `spark-submit` supports two ways to load configurations.
+The first is command line options such as --master and Zeppelin can pass these options to `spark-submit` by exporting `SPARK_SUBMIT_OPTIONS` in `conf/zeppelin-env.sh`. Second is reading configuration options from `SPARK_HOME/conf/spark-defaults.conf`. Spark properties that user can set to distribute libraries are:

 <table class="table-configuration">
   <tr>
     <th>spark-defaults.conf</th>
     <th>SPARK_SUBMIT_OPTIONS</th>
-    <th>Applicable Interpreter</th>
     <th>Description</th>
   </tr>
   <tr>
     <td>spark.jars</td>
     <td>--jars</td>
-    <td>%spark</td>
     <td>Comma-separated list of local jars to include on the driver and executor classpaths.</td>
   </tr>
   <tr>
     <td>spark.jars.packages</td>
     <td>--packages</td>
-    <td>%spark</td>
-    <td>Comma-separated list of maven coordinates of jars to include on the driver and executor classpaths. Will search the local maven repo, then maven central and any additional remote repositories given by --repositories. The format for the coordinates should be groupId:artifactId:version.</td>
+    <td>Comma-separated list of maven coordinates of jars to include on the driver and executor classpaths. Will search the local maven repo, then maven central and any additional remote repositories given by --repositories. The format for the coordinates should be <code>groupId:artifactId:version</code>.</td>
   </tr>
   <tr>
     <td>spark.files</td>
     <td>--files</td>
-    <td>%pyspark</td>
     <td>Comma-separated list of files to be placed in the working directory of each executor.</td>
   </tr>
 </table>
-> Note that adding jar to pyspark is only availabe via `%dep` interpreter at the moment.

 Here are few examples:

-* SPARK\_SUBMIT\_OPTIONS in conf/zeppelin-env.sh
+* `SPARK_SUBMIT_OPTIONS` in `conf/zeppelin-env.sh`

+```bash
 export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.10:1.2.0 --jars /path/mylib1.jar,/path/mylib2.jar --files /path/mylib1.py,/path/mylib2.zip,/path/mylib3.egg"
+```
+
+* `SPARK_HOME/conf/spark-defaults.conf`

-* SPARK_HOME/conf/spark-defaults.conf
-
+```
 spark.jars /path/mylib1.jar,/path/mylib2.jar
 spark.jars.packages com.databricks:spark-csv_2.10:1.2.0
 spark.files /path/mylib1.py,/path/mylib2.egg,/path/mylib3.zip
+```

-### 3. Dynamic Dependency Loading via %dep interpreter
-> Note: `%dep` interpreter is deprecated since v0.6.0.
-`%dep` interpreter load libraries to `%spark` and `%pyspark` but not to `%spark.sql` interpreter so we recommend you to use first option instead.
+### 3. Dynamic Dependency Loading via %spark.dep interpreter
+> Note: `%spark.dep` interpreter is deprecated since v0.6.0.
+`%spark.dep` interpreter loads libraries to `%spark` and `%spark.pyspark` but not to `%spark.sql` interpreter. So we recommend you to use the first option instead.

-When your code requires external library, instead of doing download/copy/restart Zeppelin, you can easily do following jobs using `%dep` interpreter.
+When your code requires external library, instead of doing download/copy/restart Zeppelin, you can easily do following jobs using `%spark.dep` interpreter.

-* Load libraries recursively from Maven repository
+* Load libraries recursively from maven repository
 * Load libraries from local filesystem
 * Add additional maven repository
 * Automatically add libraries to SparkCluster (You can turn off)

-Dep interpreter leverages scala environment. So you can write any Scala code here.
-Note that `%dep` interpreter should be used before `%spark`, `%pyspark`, `%sql`.
+Dep interpreter leverages Scala environment. So you can write any Scala code here.
+Note that `%spark.dep` interpreter should be used before `%spark`, `%spark.pyspark`, `%spark.sql`.

 Here's usages.

 ```scala
-%dep
+%spark.dep
 z.reset() // clean up previously added artifact and repository

 // add maven repository
@@ -277,11 +278,11 @@ z.load("groupId:artifactId:version").local()
 ```

 ## ZeppelinContext
-Zeppelin automatically injects ZeppelinContext as variable 'z' in your scala/python environment. ZeppelinContext provides some additional functions and utility.
+Zeppelin automatically injects `ZeppelinContext` as variable `z` in your Scala/Python environment. `ZeppelinContext` provides some additional functions and utilities.

 ### Object Exchange
-ZeppelinContext extends map and it's shared between scala, python environment.
-So you can put some object from scala and read it from python, vise versa.
+`ZeppelinContext` extends map and it's shared between Scala and Python environment.
+So you can put some objects from Scala and read it from Python, vice versa.

 <div class="codetabs">
 <div data-lang="scala" markdown="1">
@@ -298,7 +299,7 @@ z.put("objName", myObject)

 {% highlight python %}
 # Get object from python
-%pyspark
+%spark.pyspark
 myObject = z.get("objName")
 {% endhighlight %}

@@ -307,8 +308,8 @@ myObject = z.get("objName")

 ### Form Creation

-ZeppelinContext provides functions for creating forms.
-In scala and python environments, you can create forms programmatically.
+`ZeppelinContext` provides functions for creating forms.
+In Scala and Python environments, you can create forms programmatically.
 <div class="codetabs">
 <div data-lang="scala" markdown="1">

@@ -333,7 +334,7 @@ z.select("formName", "option1", Seq(("option1", "option1DisplayName"),
 <div data-lang="python" markdown="1">

 {% highlight python %}
-%pyspark
+%spark.pyspark
 # Create text input form
 z.input("formName")

@@ -354,8 +355,8 @@ z.select("formName", [("option1", "option1DisplayName"),

 In sql environment, you can create form in simple template.

-```
-%sql
+```sql
+%spark.sql
 select * from ${table=defaultTableName} where text like '%${search}%'
 ```

@@ -364,7 +365,7 @@ To learn more about dynamic form, checkout [Dynamic Form](../manual/dynamicform.

 ## Interpreter setting option

-Interpreter setting can choose one of 'shared', 'scoped', 'isolated' option. Spark interpreter creates separate scala compiler per each notebook but share a single SparkContext in 'scoped' mode (experimental). It creates separate SparkContext per each notebook in 'isolated' mode.
+You can choose one of `shared`, `scoped` and `isolated` options wheh you configure Spark interpreter. Spark interpreter creates separated Scala compiler per each notebook but share a single SparkContext in `scoped` mode (experimental). It creates separated SparkContext per each notebook in `isolated` mode.


 ## Setting up Zeppelin with Kerberos
@@ -377,14 +378,14 @@ Logical setup with Zeppelin, Kerberos Key Distribution Center (KDC), and Spark o
 1. On the server that Zeppelin is installed, install Kerberos client modules and configuration, krb5.conf.
 This is to make the server communicate with KDC.

-2. Set SPARK\_HOME in `[ZEPPELIN\_HOME]/conf/zeppelin-env.sh` to use spark-submit
-(Additionally, you might have to set `export HADOOP\_CONF\_DIR=/etc/hadoop/conf`)
+2. Set `SPARK_HOME` in `[ZEPPELIN_HOME]/conf/zeppelin-env.sh` to use spark-submit
+(Additionally, you might have to set `export HADOOP_CONF_DIR=/etc/hadoop/conf`)

-3. Add the two properties below to spark configuration (`[SPARK_HOME]/conf/spark-defaults.conf`):
+3. Add the two properties below to Spark configuration (`[SPARK_HOME]/conf/spark-defaults.conf`):

     spark.yarn.principal
     spark.yarn.keytab

-> **NOTE:** If you do not have access to the above spark-defaults.conf file, optionally, you may add the lines to the Spark Interpreter through the Interpreter tab in the Zeppelin UI.
+> **NOTE:** If you do not have permission to access for the above spark-defaults.conf file, optionally, you can add the above lines to the Spark Interpreter setting through the Interpreter tab in the Zeppelin UI.

 4. That's it. Play with Zeppelin!
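Several hunks in this file rename `%pyspark` to `%spark.pyspark` around the Object Exchange calls `z.put` and `z.get`, which share one map across Scala and Python paragraphs. A toy sketch of those shared-map semantics (the `MiniZeppelinContext` class below is invented for illustration and is not the real ZeppelinContext API):

```python
class MiniZeppelinContext:
    """Toy stand-in for ZeppelinContext's put/get map (illustration only)."""

    def __init__(self):
        self._store = {}  # one map shared by all paragraphs, regardless of language

    def put(self, name, obj):
        self._store[name] = obj

    def get(self, name):
        return self._store[name]

z = MiniZeppelinContext()
z.put("objName", [1, 2, 3])  # e.g. stored from a %spark (Scala) paragraph
print(z.get("objName"))      # read back from a %spark.pyspark paragraph: [1, 2, 3]
```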
