docs/README.md (1 addition, 1 deletion)
@@ -1,6 +1,6 @@
# Apache Zeppelin documentation

-This README will walk you through building the documentation of Apache Zeppelin. The documentation is included here with Apache Zeppelin source code. The online documentation at [https://zeppelin.apache.org/docs/<ZEPPELIN_VERSION>](https://zeppelin.apache.org/docs/latest) is also generated from the files found in here.
+This README will walk you through building the documentation of Apache Zeppelin. The documentation is included here with Apache Zeppelin source code. The online documentation at [https://zeppelin.apache.org/docs/<ZEPPELIN_VERSION>](https://zeppelin.apache.org/docs/latest/) is also generated from the files found in here.

## Build documentation

Zeppelin is using [Jekyll](https://jekyllrb.com/) which is a static site generator and [Github Pages](https://pages.github.com/) as a site publisher. For the more details, see [help.github.com/articles/about-github-pages-and-jekyll/](https://help.github.com/articles/about-github-pages-and-jekyll/).
docs/interpreter/elasticsearch.md (1 addition, 1 deletion)
@@ -243,7 +243,7 @@ delete /index/type/id
```

### Apply Zeppelin Dynamic Forms
-You can leverage [Zeppelin Dynamic Form]({{BASE_PATH}}/manual/dynamicform.html) inside your queries. You can use both the `text input` and `select form` parameterization features.
+You can leverage [Zeppelin Dynamic Form](../manual/dynamicform.html) inside your queries. You can use both the `text input` and `select form` parameterization features.
docs/interpreter/hive.md (1 addition, 1 deletion)
@@ -151,7 +151,7 @@ select * from my_table;
You can also run multiple queries up to 10 by default. Changing these settings is not implemented yet.

### Apply Zeppelin Dynamic Forms
-You can leverage [Zeppelin Dynamic Form]({{BASE_PATH}}/manual/dynamicform.html) inside your queries. You can use both the `text input` and `select form` parameterization features.
+You can leverage [Zeppelin Dynamic Form](../manual/dynamicform.html) inside your queries. You can use both the `text input` and `select form` parameterization features.
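As a hedged sketch of what such a parameterized query can look like (the table, column, and form names are illustrative, not from this diff; the `${name=default}` and `${name=default,opt1|opt2}` templates follow the Dynamic Form syntax):

```sql
%hive
SELECT * FROM my_table
WHERE age > ${maxAge=30}                  -- text input form with default 30
  AND country = '${country=US,US|UK|JP}'  -- select form with three options
```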
docs/interpreter/livy.md (1 addition, 1 deletion)
@@ -174,7 +174,7 @@ When Zeppelin server is running with authentication enabled, then this interpret
## Apply Zeppelin Dynamic Forms
-You can leverage [Zeppelin Dynamic Form]({{BASE_PATH}}/manual/dynamicform.html). You can use both the `text input` and `select form` parameterization features.
+You can leverage [Zeppelin Dynamic Form](../manual/dynamicform.html). You can use both the `text input` and `select form` parameterization features.
docs/interpreter/python.md (partial hunks; file header lost in extraction)

-Apache Zeppelin [Table Display System]({{BASE_PATH}}/displaysystem/basicdisplaysystem.html#table) provides built-in data visualization capabilities. Python interpreter leverages it to visualize Pandas DataFrames though similar `z.show()` API, same as with [Matplotlib integration](#matplotlib-integration).
+Apache Zeppelin [Table Display System](../displaysystem/basicdisplaysystem.html#table) provides built-in data visualization capabilities. Python interpreter leverages it to visualize Pandas DataFrames though similar `z.show()` API, same as with [Matplotlib integration](#matplotlib-integration).

Example:

@@ -120,7 +120,7 @@ z.show(rates)

## SQL over Pandas DataFrames

-There is a convenience `%python.sql` interpreter that matches Apache Spark experience in Zeppelin and enables usage of SQL language to query [Pandas DataFrames](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) and visualization of results though built-in [Table Display System]({{BASE_PATH}}/displaysystem/basicdisplaysystem.html#table).
+There is a convenience `%python.sql` interpreter that matches Apache Spark experience in Zeppelin and enables usage of SQL language to query [Pandas DataFrames](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) and visualization of results though built-in [Table Display System](../displaysystem/basicdisplaysystem.html#table).
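An illustrative notebook sketch (this assumes a Zeppelin paragraph where the `z` context is injected; the data values are made up):

```python
%python
import pandas as pd

# A small DataFrame to render with the Table Display System
rates = pd.DataFrame({"currency": ["USD", "EUR"], "rate": [1.00, 0.89]})
z.show(rates)
```

A follow-up `%python.sql` paragraph could then query it by variable name, e.g. `SELECT currency, rate FROM rates WHERE rate < 1`.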
If you need further information about **Zeppelin Interpreter Setting** for using Shell interpreter, please read [What is interpreter setting?](../manual/interpreters.html#what-is-interpreter-setting) section first.
docs/interpreter/spark.md (53 additions, 52 deletions)
@@ -1,7 +1,7 @@
---
layout: page
title: "Apache Spark Interpreter for Apache Zeppelin"
-description: "Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs."
+description: "Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution engine."
group: interpreter
---
<!--
@@ -25,9 +25,8 @@ limitations under the License.
## Overview
[Apache Spark](http://spark.apache.org) is a fast and general-purpose cluster computing system.
-It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs
-Apache Spark is supported in Zeppelin with
-Spark Interpreter group, which consists of five interpreters.
+It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs.
+Apache Spark is supported in Zeppelin with Spark interpreter group which consists of below five interpreters.

<table class="table-configuration">
  <tr>
@@ -38,25 +37,25 @@ Spark Interpreter group, which consists of five interpreters.
  <tr>
    <td>%spark</td>
    <td>SparkInterpreter</td>
-    <td>Creates a SparkContext and provides a scala environment</td>
+    <td>Creates a SparkContext and provides a Scala environment</td>
  </tr>
  <tr>
-    <td>%pyspark</td>
+    <td>%spark.pyspark</td>
    <td>PySparkInterpreter</td>
-    <td>Provides a python environment</td>
+    <td>Provides a Python environment</td>
  </tr>
  <tr>
-    <td>%r</td>
+    <td>%spark.r</td>
    <td>SparkRInterpreter</td>
    <td>Provides an R environment with SparkR support</td>
  </tr>
  <tr>
-    <td>%sql</td>
+    <td>%spark.sql</td>
    <td>SparkSQLInterpreter</td>
    <td>Provides a SQL environment</td>
  </tr>
  <tr>
-    <td>%dep</td>
+    <td>%spark.dep</td>
    <td>DepInterpreter</td>
    <td>Dependency loader</td>
  </tr>
@@ -139,111 +138,113 @@ You can also set other Spark properties which are not listed in the table. For a
Without any configuration, Spark interpreter works out of box in local mode. But if you want to connect to your Spark cluster, you'll need to follow below two simple steps.

### 1. Export SPARK_HOME
-In **conf/zeppelin-env.sh**, export `SPARK_HOME` environment variable with your Spark installation path.
+In `conf/zeppelin-env.sh`, export `SPARK_HOME` environment variable with your Spark installation path.

-for example
+For example,

```bash
export SPARK_HOME=/usr/lib/spark
```

-You can optionally export HADOOP\_CONF\_DIR and SPARK\_SUBMIT\_OPTIONS
+You can optionally export `HADOOP_CONF_DIR` and `SPARK_SUBMIT_OPTIONS`

-For Windows, ensure you have `winutils.exe` in `%HADOOP_HOME%\bin`. For more details please see [Problems running Hadoop on Windows](https://wiki.apache.org/hadoop/WindowsProblems)
+For Windows, ensure you have `winutils.exe` in `%HADOOP_HOME%\bin`. Please see [Problems running Hadoop on Windows](https://wiki.apache.org/hadoop/WindowsProblems) for the details.
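The optional exports mentioned above can be sketched as follows in `conf/zeppelin-env.sh` (the Hadoop path and the package coordinate are illustrative values, not prescribed by this diff):

```bash
# Point spark-submit at the Hadoop client configuration (example path)
export HADOOP_CONF_DIR=/etc/hadoop/conf
# Extra flags passed through to spark-submit; this coordinate is illustrative
export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.10:1.2.0"
```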
### 2. Set master in Interpreter menu
After start Zeppelin, go to **Interpreter** menu and edit **master** property in your Spark interpreter setting. The value may vary depending on your Spark cluster deployment type.

-for example,
+For example,

* **local[*]** in local mode
* **spark://master:7077** in standalone cluster
* **yarn-client** in Yarn client mode
* **mesos://host:5050** in Mesos cluster

-That's it. Zeppelin will work with any version of Spark and any deployment type without rebuilding Zeppelin in this way. (Zeppelin 0.5.6-incubating release works up to Spark 1.6.1 )
+That's it. Zeppelin will work with any version of Spark and any deployment type without rebuilding Zeppelin in this way.
+For the further information about Spark & Zeppelin version compatibility, please refer to "Available Interpreters" section in [Zeppelin download page](https://zeppelin.apache.org/download.html).
> Note that without exporting `SPARK_HOME`, it's running in local mode with included version of Spark. The included version may vary depending on the build profile.

## SparkContext, SQLContext, ZeppelinContext
-SparkContext, SQLContext, ZeppelinContext are automatically created and exposed as variable names 'sc', 'sqlContext' and 'z', respectively, both in scala and python environments.
+SparkContext, SQLContext and ZeppelinContext are automatically created and exposed as variable names `sc`, `sqlContext` and `z`, respectively, in Scala, Python and R environments.
+Starting from 0.6.1 SparkSession is available as variable `spark` when you are using Spark 2.x.

-> Note that scala / python environment shares the same SparkContext, SQLContext, ZeppelinContext instance.
+> Note that Scala/Python/R environment shares the same SparkContext, SQLContext and ZeppelinContext instance.

<a name="dependencyloading"> </a>
## Dependency Management
-There are two ways to load external library in spark interpreter. First is using Interpreter setting menu and second is loading Spark properties.
+There are two ways to load external libraries in Spark interpreter. First is using interpreter setting menu and second is loading Spark properties.

### 1. Setting Dependencies via Interpreter Setting
Please see [Dependency Management](../manual/dependencymanagement.html) for the details.
### 2. Loading Spark Properties
-Once `SPARK_HOME` is set in `conf/zeppelin-env.sh`, Zeppelin uses `spark-submit` as spark interpreter runner. `spark-submit` supports two ways to load configurations. The first is command line options such as --master and Zeppelin can pass these options to `spark-submit` by exporting `SPARK_SUBMIT_OPTIONS` in conf/zeppelin-env.sh. Second is reading configuration options from `SPARK_HOME/conf/spark-defaults.conf`. Spark properites that user can set to distribute libraries are:
+Once `SPARK_HOME` is set in `conf/zeppelin-env.sh`, Zeppelin uses `spark-submit` as spark interpreter runner. `spark-submit` supports two ways to load configurations.
+The first is command line options such as --master and Zeppelin can pass these options to `spark-submit` by exporting `SPARK_SUBMIT_OPTIONS` in `conf/zeppelin-env.sh`. Second is reading configuration options from `SPARK_HOME/conf/spark-defaults.conf`. Spark properties that user can set to distribute libraries are:
<table class="table-configuration">
  <tr>
    <th>spark-defaults.conf</th>
    <th>SPARK_SUBMIT_OPTIONS</th>
-    <th>Applicable Interpreter</th>
    <th>Description</th>
  </tr>
  <tr>
    <td>spark.jars</td>
    <td>--jars</td>
-    <td>%spark</td>
    <td>Comma-separated list of local jars to include on the driver and executor classpaths.</td>
  </tr>
  <tr>
    <td>spark.jars.packages</td>
    <td>--packages</td>
-    <td>%spark</td>
-    <td>Comma-separated list of maven coordinates of jars to include on the driver and executor classpaths. Will search the local maven repo, then maven central and any additional remote repositories given by --repositories. The format for the coordinates should be groupId:artifactId:version.</td>
+    <td>Comma-separated list of maven coordinates of jars to include on the driver and executor classpaths. Will search the local maven repo, then maven central and any additional remote repositories given by --repositories. The format for the coordinates should be <code>groupId:artifactId:version</code>.</td>
  </tr>
  <tr>
    <td>spark.files</td>
    <td>--files</td>
-    <td>%pyspark</td>
    <td>Comma-separated list of files to be placed in the working directory of each executor.</td>
  </tr>
</table>
-> Note that adding jar to pyspark is only availabe via `%dep` interpreter at the moment.
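As a hedged illustration of the table above (the coordinate and file paths are hypothetical placeholders), the same properties could appear in `SPARK_HOME/conf/spark-defaults.conf` like this:

```
spark.jars              /path/to/mylib.jar
spark.jars.packages     com.databricks:spark-csv_2.10:1.2.0
spark.files             /path/to/file.txt
```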
-### 3. Dynamic Dependency Loading via %dep interpreter
-> Note: `%dep` interpreter is deprecated since v0.6.0.
-`%dep` interpreter load libraries to `%spark` and `%pyspark` but not to `%spark.sql` interpreter so we recommend you to use first option instead.
+### 3. Dynamic Dependency Loading via %spark.dep interpreter
+> Note: `%spark.dep` interpreter is deprecated since v0.6.0.
+`%spark.dep` interpreter loads libraries to `%spark` and `%spark.pyspark` but not to `%spark.sql` interpreter. So we recommend you to use the first option instead.

-When your code requires external library, instead of doing download/copy/restart Zeppelin, you can easily do following jobs using `%dep` interpreter.
+When your code requires external library, instead of doing download/copy/restart Zeppelin, you can easily do following jobs using `%spark.dep` interpreter.
-* Load libraries recursively from Maven repository
+* Load libraries recursively from maven repository
* Load libraries from local filesystem
* Add additional maven repository
* Automatically add libraries to SparkCluster (You can turn off)
-Dep interpreter leverages scala environment. So you can write any Scala code here.
-Note that `%dep` interpreter should be used before `%spark`, `%pyspark`, `%sql`.
+Dep interpreter leverages Scala environment. So you can write any Scala code here.
+Note that `%spark.dep` interpreter should be used before `%spark`, `%spark.pyspark`, `%spark.sql`.

Here's usages.
```scala
-%dep
+%spark.dep
z.reset() // clean up previously added artifact and repository
```
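A fuller sketch of a `%spark.dep` paragraph (the repository URL and artifact coordinate below are illustrative examples, not values from this diff):

```scala
%spark.dep
z.reset()                                      // clean up previously added artifacts and repositories
z.addRepo("MyRepo").url("http://repo.example.com/maven")  // hypothetical additional maven repository
z.load("com.databricks:spark-csv_2.10:1.2.0")  // load an artifact and its transitive dependencies
```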
-Zeppelin automatically injects ZeppelinContext as variable 'z' in your scala/python environment. ZeppelinContext provides some additional functions and utility.
+Zeppelin automatically injects `ZeppelinContext` as variable `z` in your Scala/Python environment. `ZeppelinContext` provides some additional functions and utilities.

### Object Exchange
-ZeppelinContext extends map and it's shared between scala, python environment.
-So you can put some object from scala and read it from python, vise versa.
+`ZeppelinContext` extends map and it's shared between Scala and Python environment.
+So you can put some objects from Scala and read it from Python, vice versa.

<div class="codetabs">
<div data-lang="scala" markdown="1">
@@ -298,7 +299,7 @@ z.put("objName", myObject)
{% highlight python %}
# Get object from python
-%pyspark
+%spark.pyspark
myObject = z.get("objName")
{% endhighlight %}
@@ -307,8 +308,8 @@ myObject = z.get("objName")
### Form Creation

-ZeppelinContext provides functions for creating forms.
-In scala and python environments, you can create forms programmatically.
+`ZeppelinContext` provides functions for creating forms.
+In Scala and Python environments, you can create forms programmatically.
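A minimal sketch of programmatic form creation in a Scala paragraph (the form names and default values are illustrative; `z.input` and `z.select` are the ZeppelinContext form functions described in the Dynamic Form docs):

```scala
%spark
// Text input form with a default value
val name = z.input("name", "sun")
// Select form built from (value, displayed label) pairs
val day = z.select("day", Seq(("1", "mon"), ("2", "tue")))
println(s"name=$name, day=$day")
```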
@@ -364,7 +365,7 @@ To learn more about dynamic form, checkout [Dynamic Form](../manual/dynamicform.
## Interpreter setting option

-Interpreter setting can choose one of 'shared', 'scoped', 'isolated' option. Spark interpreter creates separate scala compiler per each notebook but share a single SparkContext in 'scoped' mode (experimental). It creates separate SparkContext per each notebook in 'isolated' mode.
+You can choose one of `shared`, `scoped` and `isolated` options when you configure Spark interpreter. Spark interpreter creates separated Scala compiler per each notebook but shares a single SparkContext in `scoped` mode (experimental). It creates separated SparkContext per each notebook in `isolated` mode.
## Setting up Zeppelin with Kerberos
@@ -377,14 +378,14 @@ Logical setup with Zeppelin, Kerberos Key Distribution Center (KDC), and Spark o
1. On the server that Zeppelin is installed, install Kerberos client modules and configuration, krb5.conf.
This is to make the server communicate with KDC.

-2. Set SPARK\_HOME in `[ZEPPELIN\_HOME]/conf/zeppelin-env.sh` to use spark-submit
-(Additionally, you might have to set `export HADOOP\_CONF\_DIR=/etc/hadoop/conf`)
+2. Set `SPARK_HOME` in `[ZEPPELIN_HOME]/conf/zeppelin-env.sh` to use spark-submit
+(Additionally, you might have to set `export HADOOP_CONF_DIR=/etc/hadoop/conf`)

-3. Add the two properties below to spark configuration (`[SPARK_HOME]/conf/spark-defaults.conf`):
+3. Add the two properties below to Spark configuration (`[SPARK_HOME]/conf/spark-defaults.conf`):

    spark.yarn.principal
    spark.yarn.keytab

-> **NOTE:** If you do not have access to the above spark-defaults.conf file, optionally, you may add the lines to the Spark Interpreter through the Interpreter tab in the Zeppelin UI.
+> **NOTE:** If you do not have permission to access the above spark-defaults.conf file, optionally, you can add the above lines to the Spark Interpreter setting through the Interpreter tab in the Zeppelin UI.
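For example, the two properties in step 3 could be filled in like this in `spark-defaults.conf` (the principal and keytab path are placeholders, not values from this diff):

```
spark.yarn.principal    zeppelin@EXAMPLE.COM
spark.yarn.keytab       /etc/security/keytabs/zeppelin.service.keytab
```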