Using partitions

Some types of storage partition their data, allowing you to read only the relevant sections for a given job. The LoadMetadata interface also provides methods for working
with partitions in your data. In order for Pig to request the relevant partitions, it must
know how the data is partitioned. Pig determines this by calling getPartitionKeys. If
this returns a null or the LoadMetadata interface is not implemented by your loader, Pig
will assume it needs to read the entire input.
Pig expects getPartitionKeys to return an array of strings, where each string represents
one field name. Those fields are the keys used to partition the data. Pig will look for a
filter statement immediately following the load statement that includes one or more
of these fields. If such a statement is found, it will be passed to setPartitionFilter. If
the filter includes both partition and nonpartition keys and it can be split,† Pig will
split it and pass just the partition-key-related expression to setPartitionFilter. As an
example, consider an HCatalog‡ table web_server_logs that is partitioned by two fields,
date and colo:
logs = load 'web_server_logs' using HCatLoader();
cleaned = filter logs by date == '20110614' and NotABot(user_id);
...

Pig will call getPartitionKeys, and HCatLoader will return two key names, date and colo. Pig will find the date field in the filter statement and rewrite the filter as shown in the following example, pushing down the date == '20110614' predicate to HCatLoader via setPartitionFilter:
logs = load 'web_server_logs' using HCatLoader();
cleaned = filter logs by NotABot(user_id);
...

It is now up to the HCatalog loader to ensure that it returns only data from web_server_logs where date is 20110614.
The one exception to this is fields used in eval funcs or filter funcs. Pig assumes that
loaders do not understand how to invoke UDFs, so Pig will not push these expressions.
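To make the shape of these methods concrete, here is a sketch (not the HCatLoader implementation) of what a loader over a table partitioned like web_server_logs might do; the partitionFilter member it records is hypothetical:
// Sketch only: tell Pig which fields the data is partitioned by.
public String[] getPartitionKeys(String location, Job job) throws IOException {
    return new String[] {"date", "colo"};
}

// Sketch only: remember the pushed-down filter (e.g., date == '20110614')
// so that only the matching partitions are read.
public void setPartitionFilter(Expression partitionFilter) throws IOException {
    this.partitionFilter = partitionFilter;
}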
Our example loader works on file data, so it does not implement getPartitionKeys or
setPartitionFilter. For an example implementation of these methods, see the
HCatalog code at http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/java/org/apache/hcatalog/pig/HCatLoader.java?view=markup.

† Meaning that the filter can be broken into two filters—one that contains the partition keys and one that does
not—and produce the same end result. This is possible when the expressions are connected by and but not
when they are connected by or.
‡ HCatalog is a table-management service for Hadoop. It includes Pig load and store functions. See “Metadata
in Hadoop” on page 169 for more information on HCatalog.

Casting bytearrays
If you need to control how binary data that your loader loads is cast to other data types,
you can implement the LoadCaster interface. Because this interface contains a lot of
methods, implementers often implement it as a separate class. This also allows load
functions to share implementations of LoadCaster, since Java does not support multiple
inheritance.
The interface consists of a series of methods: bytesToInteger, bytesToLong, etc. These
will be called to convert a bytearray to the appropriate type. Starting in 0.9, there are
two bytesToMap methods. You should implement the one that takes a ResourceFieldSchema;
the other one is for backward compatibility. The bytesToBag, bytesToTuple, and
bytesToMap methods take a ResourceFieldSchema that describes the field being converted.
Calling getSchema on this object will return a schema that describes this bag, tuple,
or map, if one exists. If Pig does not know the intended structure of the object,
getSchema will return null. Keep in mind that the schema of the bag will be one field, a
tuple, which in turn will have a schema describing the contents of that tuple.
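For example, a caster for a hypothetical binary format that stores integers as 4-byte big-endian values and strings as UTF-8 text might implement two of these methods as follows (a sketch, not the book's example loader):
// Hypothetical caster methods; the binary layout assumed here is illustrative.
public Integer bytesToInteger(byte[] b) throws IOException {
    if (b == null || b.length < 4) return null;
    return java.nio.ByteBuffer.wrap(b).getInt(); // 4-byte big-endian integer
}

public String bytesToCharArray(byte[] b) throws IOException {
    if (b == null) return null;
    return new String(b, "UTF-8");
}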
A default load caster, Utf8StorageConverter, is provided. It handles converting UTF8-
encoded text to Pig types. Scalar conversions are done in a straightforward way. Maps
are expected to be surrounded by [] (square brackets), with keys separated from values
by # (hash) and key-value pairs separated by , (commas). Tuples are surrounded by
() (parentheses) and have fields separated by , (commas). Bags are surrounded by {}
(braces) and have tuples separated by , (commas). There is no ability to escape these
special characters.
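For instance, given those delimiters, the default caster would parse text values such as the following (hypothetical sample data) into a map, a tuple, and a bag, respectively:
[name#fred,age#23]
(fred,23,2.1)
{(fred,23),(joe,45)}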

Pushing down projections


Often a Pig Latin script will need to read only a few fields in the input. Some types of
storage formats store their data by fields instead of by records (for example, Hive’s
RCFile). For these types of formats, there is a significant performance gain to be had
by loading only those fields that will be used in the script. Even for record-oriented
storage formats, it can be useful to skip deserializing fields that will not be used.
As part of its optimizations, Pig analyzes Pig Latin scripts and determines what fields
in an input it needs at each step in the script. It uses this information to aggressively
drop fields it no longer needs. If the loader implements the LoadPushDown interface, Pig
can go a step further and provide this information to the loader.
Once Pig knows the fields it needs, it assembles them in a RequiredFieldList and passes
that to pushProjection. The load function's reply indicates whether it can meet the
request: it responds with a RequiredFieldResponse, which is a fancy wrapper around a
Boolean. If the Boolean is true, Pig will assume that only the required fields are being
returned from getNext. If it is false, Pig will assume that all fields are being returned by
getNext, and it will handle dropping the extra ones itself.

The RequiredField class used to describe which fields are required is slightly complex.
Beyond allowing a user to specify whether a given field is required, it provides the ability
to specify which subfields of that field are required. For example, for maps, certain keys
can be listed as required. For tuples and bags, certain fields can be listed as required.
Load functions that implement LoadPushDown should not modify the schema object
returned by getSchema. This should always be the schema of the full input. Pig will
manage the translation between the schema having all of the fields and the results of
getNext having only some.
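To make this concrete, a loader that accepts projection push-down might implement the two LoadPushDown methods roughly as follows (a sketch, not the book's example; requiredIndexes is a hypothetical member the loader would consult later in getNext):
// Sketch only: advertise support for projection push-down.
public List<OperatorSet> getFeatures() {
    return Arrays.asList(OperatorSet.PROJECTION);
}

// Sketch only: record which columns Pig needs and promise to return only those.
public RequiredFieldResponse pushProjection(RequiredFieldList requiredFieldList)
        throws FrontendException {
    if (requiredFieldList == null || requiredFieldList.getFields() == null) {
        // Pig needs everything; keep returning all fields from getNext.
        return new RequiredFieldResponse(false);
    }
    requiredIndexes = new ArrayList<Integer>();
    for (RequiredField f : requiredFieldList.getFields()) {
        requiredIndexes.add(f.getIndex());
    }
    return new RequiredFieldResponse(true);
}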
Our example loader does not implement LoadPushDown. For an example of a loader that
does, see HCatLoader at http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/java/org/apache/hcatalog/pig/HCatLoader.java?view=markup.

Store Functions
Pig’s store function is, in many ways, a mirror image of the load function. It is built on
top of Hadoop’s OutputFormat. It takes Pig Tuples and creates key-value pairs that its
associated output format writes to storage.
StoreFunc is an abstract class, which allows it to provide default implementations for
some methods. However, some functions implement both load and store functionality;
PigStorage is one example. Because Java does not support multiple inheritance, the
interface StoreFuncInterface is provided. These dual load/store functions can implement this interface rather than extending StoreFunc.
Store function operations are split between the frontend and backend of Pig. Pig does
planning and optimization on the frontend. Store functions have an opportunity at this
time to check that a valid schema is being used and set up the storage location. On the
backend, store functions take a tuple from Pig, convert it to a key-value pair, and pass
it to a Hadoop RecordWriter. Store functions can pass information from frontend invocations to backend invocations via UDFContext.

Store Function Frontend Planning


Store functions have three tasks to fulfill on the frontend:
• Instantiate the OutputFormat they will use to store data.
• Check the schema of the data being stored.
• Record the location where the data will be stored.

Determining OutputFormat
Pig calls getOutputFormat to get an instance of the output format that your store function
will use to store records. This method returns an instance rather than the classname or
the class itself. This allows your store function to control how the class is instantiated.

The example store function JsonStorage uses TextOutputFormat. This is an output format that stores text data in HDFS. We have to instantiate this with a key of LongWritable and a value of Text to match the expectations of TextInputFormat:
// JsonStorage.java
public OutputFormat getOutputFormat() throws IOException {
    return new TextOutputFormat<LongWritable, Text>();
}

Setting the output location


Pig calls setStoreLocation to communicate the location string the user provides to your
store function. Given the Pig Latin store Z into 'output';, “output” is the location
string. This method, called on both the frontend and the backend, could be called
multiple times; consequently, it should not have any side effects that will cause a problem if this happens. Your store function will need to communicate the location to its
output format. Our example store function uses the FileOutputFormat utility function
setOutputPath to do this:
// JsonStorage.java
public void setStoreLocation(String location, Job job) throws IOException {
    FileOutputFormat.setOutputPath(job, new Path(location));
}

The Hadoop Job is passed to this function as well. Most output formats store the location information in the job.
Pig calls setStoreLocation on both the frontend and backend because output formats
usually store their location in the job, as we see in our example store function. This
works for MapReduce jobs, where a single output format is guaranteed. But due to the
split operator, Pig can have more than one instance of the same store function in a
job. If multiple instances of a store function call FileOutputFormat.setOutputPath,
whichever instance calls it last will overwrite the others. Pig avoids this by keeping
output-specific information and calling setStoreLocation again on the backend so that
it can properly configure the output format.
For HDFS files, the user might provide a relative path. Pig needs to resolve these to
absolute paths using the current working directory at the time the store is called. To
accomplish this, Pig calls relToAbsPathForStoreLocation with the user-provided location string before calling setStoreLocation. This method translates between relative
and absolute paths. For store functions writing to HDFS, the default implementation
in StoreFunc handles the conversion. If you are writing a store function that does not
use file paths (e.g., HBase), you should override this method to return the string it is
passed.
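A sketch of such an override (this is not part of JsonStorage, which relies on the default) might simply hand back its argument:
// Hypothetical override for a store function whose locations are not
// filesystem paths (for example, an HBase table name).
public String relToAbsPathForStoreLocation(String location, Path curDir)
        throws IOException {
    return location;
}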

Checking the schema


As part of frontend planning, Pig gives your store function a chance to check the schema
of the data to be stored. If you are storing data to a system that expects a certain schema
for the output (such as an RDBMS) or you cannot store certain data types, this is the
place to perform those checks. Oddly enough, this method returns void rather than
a Boolean, so if you detect an issue with the schema, you must throw an IOException.
Our example store function does not have limitations on the schemas it can store.
However, it uses this function as a place to serialize the schema into UDFContext so that
it can be used on the backend when writing data:
// JsonStorage.java
public void checkSchema(ResourceSchema s) throws IOException {
    UDFContext udfc = UDFContext.getUDFContext();
    Properties p =
        udfc.getUDFProperties(this.getClass(), new String[]{udfcSignature});
    p.setProperty("pig.jsonstorage.schema", s.toString());
}
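A store function that did need to restrict the types it stores might instead walk the fields and reject any it cannot handle; a hypothetical sketch:
// Hypothetical check for a store that cannot handle bytearray fields.
public void checkSchema(ResourceSchema s) throws IOException {
    for (ResourceFieldSchema field : s.getFields()) {
        if (field.getType() == DataType.BYTEARRAY) {
            throw new IOException("Cannot store bytearray field " + field.getName());
        }
    }
}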

Store Functions and UDFContext


Store functions work with UDFContext exactly as load functions do, but with one exception: the signature for store functions is passed to the store function via setStoreFuncUDFContextSignature. See “Passing Information from the Frontend to the Backend” on page 148 for a discussion of how load functions work with UDFContext. Our example store function stores the signature in a member variable for later use:
// JsonStorage.java
public void setStoreFuncUDFContextSignature(String signature) {
    udfcSignature = signature;
}

Writing Data
During backend processing, the store function is first initialized, and then takes Pig
tuples and converts them to key-value pairs to be written to storage.

Preparing to write
Pig calls your store function’s prepareToWrite method in each map or reduce task before
writing any data. This call passes a RecordWriter instance to use when writing data.
RecordWriter is a class that OutputFormat uses to write individual records. Pig will get
the record writer it passes to your store function by calling getRecordWriter on the
output format your store function returned from getOutputFormat. Your store function
will need to keep this reference so that it can be used in putNext.
The example store function JsonStorage also uses this method to read the schema out
of the UDFContext. It will use this schema when storing data. Finally, it creates a JsonFactory for use in putNext:
// JsonStorage.java
public void prepareToWrite(RecordWriter writer) throws IOException {
    // Store the record writer reference so we can use it when it's time
    // to write tuples.
    this.writer = writer;

    // Get the schema string from the UDFContext object.
    UDFContext udfc = UDFContext.getUDFContext();
    Properties p =
        udfc.getUDFProperties(this.getClass(), new String[]{udfcSignature});
    String strSchema = p.getProperty("pig.jsonstorage.schema");
    if (strSchema == null) {
        throw new IOException("Could not find schema in UDF context");
    }

    // Parse the schema from the string stored in the properties object.
    ResourceSchema schema =
        new ResourceSchema(Utils.getSchemaFromString(strSchema));
    fields = schema.getFields();

    // Build a Json factory.
    jsonFactory = new JsonFactory();
    jsonFactory.configure(
        JsonGenerator.Feature.WRITE_NUMBERS_AS_STRINGS, false);
}

Writing records
putNext is the core method in the store function class. Pig calls this method for every
tuple it needs to store. Your store function needs to take these tuples and produce the
key-value pairs that its output format expects. For information on the Java objects in
which the data will be stored and how to extract them, see “Interacting with Pig values” on page 122.
JsonStorage encodes the contents of the tuple in JSON format and writes the resulting
string into the value field of TextOutputFormat. The key field is left null:
// JsonStorage.java
public void putNext(Tuple t) throws IOException {
    // Build a ByteArrayOutputStream to write the JSON into.
    ByteArrayOutputStream baos = new ByteArrayOutputStream(BUF_SIZE);
    // Build the generator.
    JsonGenerator json =
        jsonFactory.createJsonGenerator(baos, JsonEncoding.UTF8);

    // Write the beginning of the top-level tuple object.
    json.writeStartObject();
    for (int i = 0; i < fields.length; i++) {
        writeField(json, fields[i], t.get(i));
    }
    json.writeEndObject();
    json.close();

    // Hand a null key and our string to Hadoop.
    try {
        writer.write(null, new Text(baos.toByteArray()));
    } catch (InterruptedException ie) {
        throw new IOException(ie);
    }
}

private void writeField(JsonGenerator json,
                        ResourceFieldSchema field,
                        Object d) throws IOException {

    // If the field is missing or the value is null, write a null.
    if (d == null) {
        json.writeNullField(field.getName());
        return;
    }

    // Based on the field's type, write it out.
    switch (field.getType()) {
        case DataType.INTEGER:
            json.writeNumberField(field.getName(), (Integer)d);
            return;

        case DataType.LONG:
            json.writeNumberField(field.getName(), (Long)d);
            return;

        case DataType.FLOAT:
            json.writeNumberField(field.getName(), (Float)d);
            return;

        case DataType.DOUBLE:
            json.writeNumberField(field.getName(), (Double)d);
            return;

        case DataType.BYTEARRAY:
            json.writeBinaryField(field.getName(), ((DataByteArray)d).get());
            return;

        case DataType.CHARARRAY:
            json.writeStringField(field.getName(), (String)d);
            return;

        case DataType.MAP:
            json.writeFieldName(field.getName());
            json.writeStartObject();
            for (Map.Entry<String, Object> e : ((Map<String, Object>)d).entrySet()) {
                json.writeStringField(e.getKey(), e.getValue().toString());
            }
            json.writeEndObject();
            return;

        case DataType.TUPLE:
            json.writeFieldName(field.getName());
            json.writeStartObject();

            ResourceSchema s = field.getSchema();
            if (s == null) {
                throw new IOException("Schemas must be fully specified to use "
                    + "this storage function. No schema found for field " +
                    field.getName());
            }
            ResourceFieldSchema[] fs = s.getFields();

            for (int j = 0; j < fs.length; j++) {
                writeField(json, fs[j], ((Tuple)d).get(j));
            }
            json.writeEndObject();
            return;

        case DataType.BAG:
            json.writeFieldName(field.getName());
            json.writeStartArray();
            s = field.getSchema();
            if (s == null) {
                throw new IOException("Schemas must be fully specified to use "
                    + "this storage function. No schema found for field " +
                    field.getName());
            }
            fs = s.getFields();
            if (fs.length != 1 || fs[0].getType() != DataType.TUPLE) {
                throw new IOException("Found a bag without a tuple "
                    + "inside!");
            }
            // Drill down the next level to the tuple's schema.
            s = fs[0].getSchema();
            if (s == null) {
                throw new IOException("Schemas must be fully specified to use "
                    + "this storage function. No schema found for field " +
                    field.getName());
            }
            fs = s.getFields();
            for (Tuple t : (DataBag)d) {
                json.writeStartObject();
                for (int j = 0; j < fs.length; j++) {
                    writeField(json, fs[j], t.get(j));
                }
                json.writeEndObject();
            }
            json.writeEndArray();
            return;
    }
}

Failure Cleanup
When jobs fail after execution has started, your store function may need to clean up
partially stored results. Pig will call cleanupOnFailure to give your store function an
opportunity to do this. It passes the location string and the job object so that your store
function knows what it should clean up. In the HDFS case, the default implementation
handles removing any output files created by the store function. You need to implement
this method only if you are storing data somewhere other than HDFS.
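For example, a store function writing to some other system might override it along these lines; this is a hypothetical sketch, and dropPartialOutput stands in for whatever cleanup your storage requires:
// Hypothetical cleanup for a non-HDFS store.
public void cleanupOnFailure(String location, Job job) throws IOException {
    dropPartialOutput(location);
}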

Storing Metadata
If your storage format can store schemas in addition to data, your store function can
implement the interface StoreMetadata. This provides a storeSchema method that is
called by Pig as part of its frontend operations. Pig passes storeSchema a ResourceSchema,
the location string, and the job object so that it can connect to its storage. The
ResourceSchema is very similar to the Schema class described in “Input and Output Schemas”
on page 124. There is one important difference, however. In ResourceFieldSchema,
the schema object associated with a bag always has one field, which is a tuple.
The schema for the tuples in the bag is described by that tuple’s ResourceFieldSchema.
The example store function JsonStorage stores the schema in a side file named
_schema in the same directory as the data. The schema is stored as a string, using the
toString method provided by the class:
// JsonStorage.java
public void storeSchema(ResourceSchema schema, String location, Job job)
        throws IOException {
    // Store the schema in a side file in the same directory. MapReduce
    // does not include files starting with "_" when reading data for a job.
    FileSystem fs = FileSystem.get(job.getConfiguration());
    DataOutputStream out = fs.create(new Path(location + "/_schema"));
    out.writeBytes(schema.toString());
    out.writeByte('\n');
    out.close();
}

StoreMetadata also has a storeStatistics function, but Pig does not use this yet.
