description: "Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs."
 group: manual
 ---
 {% include JB/setup %}


-## Pig Interpreter for Apache Zeppelin
+# Pig Interpreter for Apache Zeppelin
+
+<div id="toc"></div>
+
+## Overview
 [Apache Pig](https://pig.apache.org/) is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.

 ## Supported interpreter type
-- %pig.script (default) - All Pig scripts can run in this type of interpreter, and the display type is plain text.
-- %pig.query - Almost the same as %pig.script. The only difference is that you don't need to add an alias in the last statement, and the display type is table.
-
+- `%pig.script` (default)
+
+All Pig scripts can run in this type of interpreter, and the display type is plain text.
+
+- `%pig.query`
+
+Almost the same as `%pig.script`. The only difference is that you don't need to add an alias in the last statement, and the display type is table.

 ## Supported runtime mode
 - Local
 - MapReduce
 - Tez (Only Tez 0.7 is supported)

+## How to use
+
 ### How to setup Pig

 - Local Mode
-Nothing needs to be done for local mode
+
+Nothing needs to be done for local mode

 - MapReduce Mode
-HADOOP_CONF_DIR needs to be specified in `zeppelin-env.sh`
+
+HADOOP\_CONF\_DIR needs to be specified in `ZEPPELIN_HOME/conf/zeppelin-env.sh`.

 - Tez Mode
-HADOOP_CONF_DIR and TEZ_CONF_DIR need to be specified in `zeppelin-env.sh`

-### How to configure interpreter
+HADOOP\_CONF\_DIR and TEZ\_CONF\_DIR need to be specified in `ZEPPELIN_HOME/conf/zeppelin-env.sh`.

-At the Interpreters menu, you have to create a new Pig interpreter and provide the following properties:
+### How to configure interpreter

+At the Interpreters menu, you have to create a new Pig interpreter. The Pig interpreter has the following properties by default.

 <table class="table-configuration">
 <tr>
@@ -50,18 +62,18 @@ At the Interpreters menu, you have to create a new Pig interpreter and provide n
 <tr>
 <td>zeppelin.pig.includeJobStats</td>
 <td>false</td>
-<td>Whether to display jobStats info in %pig</td>
+<td>Whether to display jobStats info in <code>%pig</code></td>
 </tr>
 <tr>
 <td>zeppelin.pig.maxResult</td>
 <td>20</td>
-<td>Maximum number of rows displayed in %pig.query</td>
+<td>Maximum number of rows displayed in <code>%pig.query</code></td>
 </tr>
 </table>

-### How to use
+### Example

-**pig**
+##### pig

 ```
 %pig
@@ -72,7 +84,8 @@ c = foreach b generate COUNT($1);
 dump c;
 ```

-**pig.query**
+##### pig.query
+
 ```
 %pig.query

@@ -81,5 +94,4 @@ c = group b by Category;
 foreach c generate group as category, COUNT($1) as count;
 ```

-
-Data is shared between %pig and %pig.query, so you can do common work in %pig and then run different kinds of queries in %pig.query on that data.
+Data is shared between `%pig` and `%pig.query`, so you can do common work in `%pig` and then run different kinds of queries in `%pig.query` on that data.
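
The data sharing described in the last changed line can be sketched with two notebook paragraphs. This is only an illustration under stated assumptions: the input path `/tmp/sales.csv`, the `raw` alias, and the `Amount` field are hypothetical, while `b`, `c`, and `Category` follow the names visible in the hunk headers of this diff. The first paragraph does the common work in `%pig` and defines the alias `b`:

```
%pig

-- Hypothetical input file and schema, for illustration only.
raw = load '/tmp/sales.csv' using PigStorage(',') as (Category:chararray, Amount:double);

-- Common preparation step whose result is reused by later paragraphs.
b = filter raw by Amount > 0.0;
```

A later `%pig.query` paragraph can then refer to `b` directly. Its last statement carries no alias, and the result would be displayed as a table, limited by `zeppelin.pig.maxResult`:

```
%pig.query

c = group b by Category;
foreach c generate group as category, COUNT($1) as count;
```

Several `%pig.query` paragraphs could reuse `b` in the same way, each with its own grouping or filter, without reloading the data.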