
Big Data Analytics Module-4

Hive is a data warehousing tool developed by Facebook that operates on top of Hadoop, enabling data searching, management, and analysis. It features an SQL dialect called HiveQL, but has limitations such as lack of support for updates and real-time queries. Hive integrates with MapReduce and HDFS, allowing for efficient data processing and querying through its architecture components like Hive Server, Metastore, and execution engine.

Uploaded by

Yohima Shetty


Module-4

MapReduce, Hive and Pig

HIVE
• Hive was created by Facebook.

• Hive is a data warehousing tool and also a data store on top of Hadoop.

• Enterprises use a data warehouse as a large data repository designed to enable searching, managing and analyzing data.

• Hive also manages large volumes of data.

HIVE Features
Hive Characteristics
1. Has the capability to translate queries into MapReduce jobs.

2. Supports web interfaces as well.

3. Provides an SQL dialect (Hive Query Language, abbreviated HiveQL or HQL).
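Characteristic 1 (translating queries into MapReduce jobs) can be pictured with a minimal Python sketch of how a query such as SELECT dept, COUNT(*) FROM emp GROUP BY dept maps onto the map, shuffle and reduce phases. This is an illustration of the idea, not the plan Hive's compiler actually generates; the table emp and its columns are hypothetical.

```python
from collections import defaultdict


def map_phase(rows):
    """Map: emit a (key, 1) pair per input row, keyed by the GROUP BY column."""
    for row in rows:
        yield row["dept"], 1


def shuffle_phase(pairs):
    """Shuffle/sort: collect all values that share the same key, in key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reduce: sum the 1s for each key, which is COUNT(*) for that group."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical rows of an `emp` table.
emp = [
    {"name": "Asha", "dept": "sales"},
    {"name": "Ravi", "dept": "hr"},
    {"name": "Meera", "dept": "sales"},
]

result = reduce_phase(shuffle_phase(map_phase(emp)))
print(result)  # {'hr': 1, 'sales': 2}
```

In a real cluster the map and reduce functions run as parallel tasks on many nodes; here they are ordinary functions chained in one process.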
HIVE Limitations
1. Not a full database. Its main disadvantage is that Hive does not provide row-level update and deletion of records (row-level UPDATE and DELETE are supported only on transactional (ACID) tables, from Hive 0.14 onwards).

2. Not developed for unstructured data.

3. Not designed for real-time queries.

4. Partition columns are always placed last: they appear after all the regular columns of the table.

Hive Architecture
Hive architecture components are:

• Hive Server (Thrift) - An optional service that allows a remote client to submit requests to Hive and retrieve results.

• Hive CLI (Command Line Interface) - The most popular interface for interacting with Hive.

• Web Interface - Hive can also be accessed through a web browser, using the URL http://hadoop:<port no.>/hwi.
• Metastore - A central repository that stores the schema (metadata) of databases, tables and the columns in each table, including their data types.

• Hive Driver - It processes and manages the execution of queries.
Hive Data Types and File Formats
Hive has three collection data types: ARRAY, MAP and STRUCT.
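To make the three collection types (ARRAY, MAP and STRUCT) concrete, here is a minimal Python sketch that parses one row stored in Hive's default text format, where columns are separated by \x01 (Ctrl-A), array elements, map entries and struct fields by \x02 (Ctrl-B), and a map key from its value by \x03 (Ctrl-C). The table layout (name STRING, skills ARRAY<STRING>, marks MAP<STRING,INT>, address STRUCT<city:STRING,zip:STRING>) is hypothetical, and the parser handles only this one level of nesting.

```python
FIELD_SEP = "\x01"  # separates columns (Ctrl-A)
ITEM_SEP = "\x02"   # separates ARRAY elements, MAP entries and STRUCT fields (Ctrl-B)
KEY_SEP = "\x03"    # separates a MAP key from its value (Ctrl-C)


def parse_row(line):
    """Parse one line of the hypothetical table into Python equivalents of Hive types."""
    name, skills, marks, address = line.split(FIELD_SEP)
    return {
        "name": name,                          # STRING -> str
        "skills": skills.split(ITEM_SEP),      # ARRAY<STRING> -> list
        "marks": {                             # MAP<STRING,INT> -> dict
            key: int(value)
            for key, value in (entry.split(KEY_SEP) for entry in marks.split(ITEM_SEP))
        },
        # STRUCT<city:STRING,zip:STRING> -> dict with the declared field names
        "address": dict(zip(("city", "zip"), address.split(ITEM_SEP))),
    }


line = "Asha\x01java\x02hive\x01math\x0390\x02cs\x0385\x01Pune\x02411001"
row = parse_row(line)
print(row["skills"])           # ['java', 'hive']
print(row["marks"]["cs"])      # 85
print(row["address"]["city"])  # Pune
```

An ARRAY is accessed by position, a MAP by key and a STRUCT by field name, which is why each maps to a different Python structure here.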
HIVE file formats and their descriptions
Hive Data Model
Hive Integration and Workflow Steps
Hive integrates with MapReduce and HDFS. The figure below shows the data-flow sequence and the workflow steps between Hive and Hadoop.
1. Execute Query: The Hive interface sends the query to the Driver for execution.

2. Get Plan: The Driver passes the query to the query compiler, which parses it to check the syntax and to work out the query plan (the requirements of the query).

3. Get Metadata: The compiler sends a metadata request to the Metastore (which can be backed by any database, such as MySQL).

4. Send Metadata: The Metastore sends the metadata back to the compiler.

5. Send Plan: The compiler checks the requirements and sends the plan back to the Driver.

6. Execute Plan: The Driver sends the execution plan to the execution engine.

7. Execute Job: Internally, the job is executed as a MapReduce job. In Hadoop 1.x, the execution engine sends the job to the JobTracker, which runs on the master (NameNode) host, and the JobTracker assigns it to the TaskTrackers on the DataNodes, which execute it.

8. Metadata Operations: Meanwhile, the execution engine can perform metadata operations with the Metastore.

9. Fetch Result: The execution engine receives the results from the DataNodes.

10. Send Results: The execution engine sends the results to the Driver.

11. Send Results: The Driver sends the results to the Hive interfaces.
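The eleven steps can be traced with a small Python model in which each component is a plain function that logs the step it performs. The component names follow the list above; the log format, the plan dictionary, the warehouse path and the sample result row are assumptions for illustration, and no real Hive classes or RPC calls are modelled.

```python
log = []


def metastore_get(table):
    log.append("3 Get Metadata")
    log.append("4 Send Metadata")
    # Assumed location under Hive's default warehouse directory.
    return {"table": table, "location": f"/user/hive/warehouse/{table}"}


def compiler_compile(query):
    log.append("2 Get Plan")
    table = query.split("FROM")[1].split()[0]  # naive parse, enough for this sketch
    metadata = metastore_get(table)
    log.append("5 Send Plan")
    return {"query": query, "input": metadata["location"]}


def execution_engine_run(plan):
    # A real engine would turn `plan` into MapReduce jobs on the cluster.
    log.append("6 Execute Plan")
    log.append("7 Execute Job")
    log.append("8 Metadata Operations")
    rows = [("toy_car", 3)]  # pretend result read back from the DataNodes
    log.append("9 Fetch Result")
    log.append("10 Send Results")
    return rows


def driver_execute(query):
    plan = compiler_compile(query)
    rows = execution_engine_run(plan)
    log.append("11 Send Results")
    return rows


def hive_interface(query):
    log.append("1 Execute Query")
    return driver_execute(query)


print(hive_interface("SELECT name, qty FROM toys"))  # [('toy_car', 3)]
print(log)
```

Printing the log lists the steps in the same order as the numbered workflow.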

HIVEQL
Hive Query Language (abbreviated HiveQL) is used for querying large datasets that reside in HDFS.

HiveQL script commands enable data definition, data manipulation and query
processing.
HiveQL Data Definition Language (DDL)
HiveQL data definition commands for databases and tables are

CREATE DATABASE,

SHOW DATABASES (lists all databases),

CREATE SCHEMA (a synonym of CREATE DATABASE),

CREATE TABLE.
The following HiveQL commands create a table:
Example: Creating a Table
Creating a Database

Showing Databases
Dropping a Database

RESTRICT (the default) - Think of it as safe mode: the database cannot be dropped unless it is empty.
CASCADE - Forces deletion of the database and all its contents (tables, views, functions, etc.).
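The difference between RESTRICT and CASCADE can be shown with a toy in-memory catalog in Python. The Catalog class and its methods are hypothetical and only mimic the rules described above; they are not the Hive Metastore API.

```python
class Catalog:
    """Toy catalog: database name -> set of table names (hypothetical)."""

    def __init__(self):
        self.databases = {}

    def create_database(self, name, if_not_exists=False):
        name = name.lower()  # Hive identifiers are case-insensitive
        if name in self.databases:
            if if_not_exists:
                return  # IF NOT EXISTS: silently do nothing
            raise ValueError(f"Database {name} already exists")
        self.databases[name] = set()

    def create_table(self, db, table):
        self.databases[db.lower()].add(table.lower())

    def drop_database(self, name, cascade=False):
        name = name.lower()
        # RESTRICT (the default) refuses to drop a database that still has tables.
        if self.databases[name] and not cascade:
            raise ValueError(f"Database {name} is not empty")
        # CASCADE removes the database together with everything in it.
        del self.databases[name]


catalog = Catalog()
catalog.create_database("toys_companyDB", if_not_exists=True)
catalog.create_table("toys_companyDB", "toys")
try:
    catalog.drop_database("toys_companyDB")  # RESTRICT
except ValueError as err:
    print(err)  # Database toys_companydb is not empty
catalog.drop_database("toys_companyDB", cascade=True)
print(catalog.databases)  # {}
```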

Example: usage of the CREATE, SHOW and DROP database commands.

CREATE DATABASE IF NOT EXISTS toys_companyDB;

SHOW DATABASES;

DROP DATABASE toys_companyDB;
HiveQL Data Manipulation Language (DML)
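HiveQL commands grouped here under data manipulation are

USE <database name>,

DROP DATABASE,

DROP SCHEMA,

ALTER TABLE,

DROP TABLE, and

LOAD DATA.

(The Hive Language Manual itself classifies the USE, DROP and ALTER statements as DDL, and LOAD DATA as DML.)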
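Loading Data into HIVE DB

Syntax: LOAD DATA [LOCAL] INPATH '<file path>' [OVERWRITE] INTO TABLE <table name> [PARTITION (partcol1=val1, partcol2=val2, ...)];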

LOCAL (optional) - If used, the file is read from the local file system (for example, your laptop) and copied into Hive. If omitted, Hive assumes the file is in HDFS (Hadoop Distributed File System) and moves it into the table's directory.

INPATH '<file path>' - Specifies the path of the file to load.

OVERWRITE (optional) - If used, the existing data in the table (or partition) is deleted before the new data is loaded; otherwise the new data is appended.

PARTITION (...) (optional) - If the table is partitioned, you must specify the partition into which the data is loaded.
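The effect of each LOAD DATA option can be sketched in Python with file systems modelled as dictionaries. The load_data function, the file paths and the tables toys and sales are hypothetical; the sketch only encodes the rules above (LOCAL copies, HDFS moves, OVERWRITE replaces, PARTITION selects a sub-directory).

```python
def load_data(local_fs, hdfs, table_dirs, path, table, *,
              local=False, overwrite=False, partition=None):
    """Toy model of LOAD DATA [LOCAL] INPATH path [OVERWRITE] INTO TABLE table [PARTITION (...)].

    local_fs and hdfs map file paths to contents; table_dirs maps a table
    (or partition) directory to the list of files it holds.
    """
    source = local_fs if local else hdfs
    if path not in source:
        raise FileNotFoundError(path)
    # LOCAL copies the file; without LOCAL the HDFS file is moved, so it
    # disappears from its original path.
    contents = source[path] if local else source.pop(path)
    target = table
    if partition:
        # Each partition is a sub-directory named column=value.
        target += "/" + "/".join(f"{col}={val}" for col, val in partition.items())
    files = table_dirs.setdefault(target, [])
    if overwrite:
        files.clear()  # OVERWRITE deletes the existing data first
    files.append(contents)  # otherwise the new data is appended
    return target


local_fs = {"/home/user/toys.csv": "car,3"}
hdfs = {"/data/sales_2024.csv": "doll,5", "/data/sales_new.csv": "kite,7"}
tables = {}

load_data(local_fs, hdfs, tables, "/home/user/toys.csv", "toys", local=True)
load_data(local_fs, hdfs, tables, "/data/sales_2024.csv", "sales", partition={"year": 2024})
load_data(local_fs, hdfs, tables, "/data/sales_new.csv", "sales",
          overwrite=True, partition={"year": 2024})
print(tables)  # {'toys': ['car,3'], 'sales/year=2024': ['kite,7']}
print(hdfs)    # {}
```

After the three loads the local file is still present, while both HDFS files have been moved into the table directories.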
Partitioning
• Table partitioning refers to dividing the table data into parts based on the values of a particular set of columns.

• Hive organizes tables into partitions.

• Partitioning makes querying easier and faster.

• This is because a query that filters on a partition column reads only the matching partitions (stored as separate sub-directories) instead of scanning the whole table.
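The speed-up from partitioning can be sketched in Python by counting how many rows are read with and without partition pruning. The sales rows and the helper functions are hypothetical; the sketch assumes, as Hive does, that the partition column's value lives in the directory name rather than in the stored rows, which is also why it comes back as the last column.

```python
from collections import defaultdict


def partition_rows(rows, column):
    """Group rows into partition directories such as 'year=2024'."""
    partitions = defaultdict(list)
    for row in rows:
        key = f"{column}={row[column]}"
        # The partition column is encoded in the directory name, not stored in the row.
        partitions[key].append({k: v for k, v in row.items() if k != column})
    return dict(partitions)


def full_scan(rows, column, value):
    """Without partitions every row is read and then filtered."""
    return [row for row in rows if row[column] == value], len(rows)


def pruned_scan(partitions, column, value):
    """WHERE column = value: read only the one matching partition directory."""
    stored = partitions.get(f"{column}={value}", [])
    # Re-attach the partition column from the directory name (it ends up last).
    rows = [{**row, column: value} for row in stored]
    return rows, len(stored)


sales = [
    {"toy": "car", "qty": 3, "year": 2023},
    {"toy": "doll", "qty": 5, "year": 2024},
    {"toy": "kite", "qty": 7, "year": 2024},
    {"toy": "ball", "qty": 2, "year": 2022},
]

parts = partition_rows(sales, "year")
rows, scanned = pruned_scan(parts, "year", 2024)
full_rows, full_scanned = full_scan(sales, "year", 2024)
print(scanned, full_scanned)  # 2 4
```

Both scans return the same rows, but the pruned scan reads only the year=2024 directory.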
Aggregation
Hive supports the following built-in aggregation functions.
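As an illustration of how the common built-in aggregates treat missing values, the Python sketch below mimics count(*), count(col), sum, avg, min and max over one column, with None standing in for NULL. It follows the standard SQL behaviour that Hive shares: count(*) counts every row, the other functions skip NULLs, and sum, avg, min and max return NULL when no non-NULL value exists. The aggregate function itself is hypothetical.

```python
def aggregate(values):
    """Return Hive-style aggregates over one column; None plays the role of NULL."""
    non_null = [v for v in values if v is not None]
    return {
        "count(*)": len(values),      # counts all rows, NULLs included
        "count(col)": len(non_null),  # counts only non-NULL values
        "sum": sum(non_null) if non_null else None,
        "avg": sum(non_null) / len(non_null) if non_null else None,
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }


prices = [10, None, 30, 20]
print(aggregate(prices))
# {'count(*)': 4, 'count(col)': 3, 'sum': 60, 'avg': 20.0, 'min': 10, 'max': 30}
```

Note that avg divides by the number of non-NULL values (3), not by the number of rows (4).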
Join
• A JOIN clause combines columns of two or more tables, based on a relation
between them.
• HiveQL joins are largely similar to SQL joins.
• Example:
Join Example
Left Outer Join Example
Right Outer Join Example
Full Outer Join Example
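The four join types differ only in what happens to rows that find no match. A minimal Python sketch, assuming hypothetical customers and orders tables with distinct column names, shows this: unmatched rows are kept by the outer joins and padded with None, which stands for NULL. The join function is illustrative and not how Hive executes joins internally.

```python
def join(left, right, left_key, right_key, how="inner"):
    """Join two tables (lists of dicts with disjoint columns) on left_key = right_key.

    how is "inner", "left", "right" or "full".
    """
    left_cols = list(left[0]) if left else []
    right_cols = list(right[0]) if right else []
    result, matched_right = [], set()
    for lrow in left:
        matches = [i for i, rrow in enumerate(right) if rrow[right_key] == lrow[left_key]]
        for i in matches:
            matched_right.add(i)
            result.append({**lrow, **right[i]})
        if not matches and how in ("left", "full"):
            # Unmatched left row: right-hand columns become NULL.
            result.append({**lrow, **dict.fromkeys(right_cols)})
    if how in ("right", "full"):
        for i, rrow in enumerate(right):
            if i not in matched_right:
                # Unmatched right row: left-hand columns become NULL.
                result.append({**dict.fromkeys(left_cols), **rrow})
    return result


customers = [{"id": 1, "name": "Asha"}, {"id": 2, "name": "Ravi"}, {"id": 3, "name": "Meera"}]
orders = [
    {"oid": 101, "customer_id": 1, "amount": 500},
    {"oid": 102, "customer_id": 3, "amount": 250},
    {"oid": 103, "customer_id": 4, "amount": 100},
]

for how in ("inner", "left", "right", "full"):
    print(how, len(join(customers, orders, "id", "customer_id", how)))
# inner 2
# left 3
# right 3
# full 4
```

Ravi has no order, so he appears only in the left and full outer joins; order 103 has no customer, so it appears only in the right and full outer joins.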
Group By Clause
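A GROUP BY clause collects the rows that share the same value in the grouping column and applies the aggregate functions to each group separately. As a sketch, assuming a hypothetical orders table, the Python below mirrors SELECT customer, COUNT(*), SUM(amount) FROM orders GROUP BY customer.

```python
from itertools import groupby
from operator import itemgetter

orders = [
    {"customer": "Asha", "amount": 500},
    {"customer": "Meera", "amount": 250},
    {"customer": "Asha", "amount": 300},
]

# Sort first, because itertools.groupby only merges adjacent rows with equal keys.
by_customer = sorted(orders, key=itemgetter("customer"))
summary = {
    customer: {"orders": len(rows), "total": sum(r["amount"] for r in rows)}
    for customer, rows in (
        (c, list(g)) for c, g in groupby(by_customer, key=itemgetter("customer"))
    )
}
print(summary)
# {'Asha': {'orders': 2, 'total': 800}, 'Meera': {'orders': 1, 'total': 250}}
```

The sort-then-group approach loosely resembles the shuffle and sort stage of the MapReduce job that Hive generates for a GROUP BY query.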
