Commit 05a3b9b

add pig.md
1 parent a09a7f7 commit 05a3b9b

1 file changed

+83
-0
lines changed

docs/interpreter/pig.md

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
1+
---
2+
layout: page
3+
title: "Pig Interpreter"
4+
description: ""
5+
group: manual
6+
---
7+
{% include JB/setup %}
8+
9+
10+
## Pig Interpreter for Apache Zeppelin
11+
[Apache Pig](https://pig.apache.org/) is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.
12+
13+
## Supported interpreter types
14+
- %pig.script (default) - Any Pig script can run in this type of interpreter, and the display type is plain text.
15+
- %pig.query - Almost the same as %pig.script. The only difference is that you don't need to add an alias in the last statement, and the display type is table.
16+
17+
18+
## Supported runtime modes
19+
- Local
20+
- MapReduce
21+
- Tez (Only Tez 0.7 is supported)
22+
23+
### How to set up Pig
24+
25+
- Local Mode
26+
Nothing needs to be done for local mode
27+
28+
- MapReduce Mode
29+
HADOOP_CONF_DIR needs to be specified in `zeppelin-env.sh`
30+
31+
- Tez Mode
32+
HADOOP_CONF_DIR and TEZ_CONF_DIR need to be specified in `zeppelin-env.sh`
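
As a sketch of the MapReduce and Tez setup described above, the relevant lines in `zeppelin-env.sh` might look like the following. Both directory paths are placeholders assumed for illustration, not defaults; point them at wherever the Hadoop and Tez configuration files live on your machine.

```shell
# zeppelin-env.sh (sketch; both paths are hypothetical placeholders)

# Needed for MapReduce mode and Tez mode:
# directory that holds the Hadoop client configuration files.
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Needed only for Tez mode:
# directory that holds the Tez configuration files.
export TEZ_CONF_DIR=/etc/tez/conf
```

Local mode needs neither variable, so both lines can be left out when `zeppelin.pig.execType` is set to local.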
33+
34+
### How to configure interpreter
35+
36+
In the Interpreters menu, create a new Pig interpreter and provide the following properties:
37+
38+
39+
<table class="table-configuration">
40+
<tr>
41+
<th>Property</th>
42+
<th>Default</th>
43+
<th>Description</th>
44+
</tr>
45+
<tr>
46+
<td>zeppelin.pig.execType</td>
47+
<td>mapreduce</td>
48+
<td>Execution mode for the Pig runtime: local | mapreduce | tez</td>
49+
</tr>
50+
<tr>
51+
<td>zeppelin.pig.includeJobStats</td>
52+
<td>false</td>
53+
<td>Whether to display jobStats info in %pig</td>
54+
</tr>
55+
<tr>
56+
<td>zeppelin.pig.maxResult</td>
57+
<td>20</td>
58+
<td>Maximum number of rows displayed in %pig.query</td>
59+
</tr>
60+
</table>
61+
62+
### How to use
63+
64+
**pig**
65+
66+
```
%pig

raw_data = load 'dataset/sf_crime/train.csv' using PigStorage(',') as (Dates,Category,Descript,DayOfWeek,PdDistrict,Resolution,Address,X,Y);
b = group raw_data all;
c = foreach b generate COUNT($1);
dump c;
```
74+
75+
**pig.query**
76+
```
%pig.query

b = foreach raw_data generate Category;
c = group b by Category;
foreach c generate group as category, COUNT($1) as count;
```
81+
82+
83+
Data is shared between %pig and %pig.query, so you can do common work in %pig and then run different kinds of queries in %pig.query based on the data from %pig.
