04 Bigdata Hive

Hive is a data warehousing system for Hadoop that allows users to query large datasets stored in Hadoop files using a SQL-like language called HiveQL. It addresses the issues of lack of structure and expressiveness in MapReduce programs for analyzing large datasets. Hive provides structure to data stored in files on HDFS and the ability to express queries using a simple SQL-like language. It uses a metastore to store metadata about tables, columns and partitions. The Hive driver compiles HiveQL queries into a directed acyclic graph of MapReduce jobs that are executed by the execution engine on Hadoop clusters.

Uploaded by

Rohit Uppal


HIVE

DATA WAREHOUSING USING HIVE QUERY

Additional Data Warehousing System

- The table shows the problems related to data inflow and expressiveness, and the solutions adopted to address the need for an additional data warehousing system.
- It was difficult to develop MapReduce programs to express queries over the data, so the data lacked expressiveness.
- Hence, Hive was brought into Big Data.

What is HIVE?

Hive is defined as a warehouse system for Hadoop that facilitates ad-hoc queries and the analysis of large data sets stored in Hadoop.
HIVE FACTS
- Hive gives Big Data a SQL-like flavour through HiveQL (HQL), which makes it a popular choice for Big Data analytics on the Hadoop platform.
- It provides massive scale-out and fault-tolerance capabilities for data storage and processing on commodity hardware.
- Because it relies on MapReduce for execution, Hive is batch-oriented and has high latency for query execution.
HIVE | Characteristics

Hive is a system for managing unstructured data and querying it in a structured format.
It relies on the following:
- HDFS for storage and retrieval of data.
- MapReduce for the execution of Hive scripts.

Principles of Hive
- Familiarity: Hive commands are similar to SQL, which is widely used in data warehousing tools.
- Interoperability: an extensible framework supports different file and data formats.
- Performance: the Hive engine uses the best built-in script to reduce execution time while keeping throughput high.
- Extensibility: pluggable MapReduce scripts in the language of your choice, plus rich user-defined data types and user-defined functions.
System Architecture and Components of Hive

[Figure: Hive architecture. Components shown: JDBC, ODBC, Command Line Interface, Web Interface, Thrift Server, MetaStore, Driver (Compiler, Optimizer, Executor), Hadoop (MapReduce + HDFS), Job Tracker, Name Node.]

METADATA (MetaStore)
- The Metastore is the component that stores the system catalog and metadata about tables, columns, partitions, and so on.
- Metadata is stored in a traditional RDBMS format. Apache Hive uses the Derby database by default, but this is not compulsory: if you wish, you can use any JDBC database, such as MySQL, instead.
- Metadata client: metastore_db
DRIVER
The Hive driver is the component that:
- Manages the lifecycle of a Hive Query Language (HQL) statement as it moves through Hive.
- Maintains a session handle and any session statistics.
- Includes three basic components:
  - Compiler
  - Optimizer
  - Executor
Query COMPILER
The query compiler is the driver component that checks a HiveQL statement for errors and, if none are found, converts it into a directed acyclic graph (DAG) of MapReduce tasks.
Query OPTIMIZER
- The query optimizer optimizes HiveQL scripts for faster execution.
- It consists of a chain of transformations, so that the operator DAG resulting from one transformation is passed as input to the next transformation.
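The chaining idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the plan is a flat list of (operator, argument) pairs rather than a real operator DAG, and the two transformations (`drop_trivial_filters`, `merge_adjacent_limits`) are hypothetical names invented here, not Hive's actual optimization rules.

```python
from functools import reduce

def drop_trivial_filters(plan):
    """Hypothetical rule: remove filters whose predicate is always true."""
    return [op for op in plan if op != ("FILTER", "TRUE")]

def merge_adjacent_limits(plan):
    """Hypothetical rule: collapse consecutive LIMITs into the smallest one."""
    merged = []
    for op in plan:
        if merged and op[0] == "LIMIT" and merged[-1][0] == "LIMIT":
            merged[-1] = ("LIMIT", min(merged[-1][1], op[1]))
        else:
            merged.append(op)
    return merged

def optimize(plan, transformations):
    """Pass the output of each transformation as input to the next one."""
    return reduce(lambda current, transform: transform(current), transformations, plan)

# Toy plan standing in for an operator DAG (an assumption of this sketch).
plan = [("SCAN", "weather"), ("FILTER", "TRUE"), ("LIMIT", 100), ("LIMIT", 10)]
optimized = optimize(plan, [drop_trivial_filters, merge_adjacent_limits])
print(optimized)  # [('SCAN', 'weather'), ('LIMIT', 10)]
```

Each transformation takes a plan and returns a new plan, so further rules can be appended to the list without changing `optimize`.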

Query EXECUTOR
Hive execution engine:
- Executes the tasks produced by the compiler in proper dependency order.
- Interacts with the underlying Hadoop interface to stay synchronized with Hadoop services.
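"Proper dependency order" means that a task runs only after every task it depends on has finished, which is a topological ordering of the task DAG. The sketch below illustrates this with Python's standard `graphlib` module (Python 3.9+). The task names and the dependency graph are hypothetical, and the "execution" is just recording the order; it is not how Hive submits jobs to Hadoop.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
task_dependencies = {
    "scan_weather": set(),
    "scan_stations": set(),
    "join": {"scan_weather", "scan_stations"},
    "aggregate": {"join"},
}

def run_in_dependency_order(dependencies):
    """Visit tasks so that every dependency is handled before its dependents."""
    executed = []
    for task in TopologicalSorter(dependencies).static_order():
        executed.append(task)  # a real engine would launch the job here
    return executed

order = run_in_dependency_order(task_dependencies)
print(order)
```

The two scan tasks have no dependency on each other, so either may come first; only the relative order of dependent tasks is guaranteed.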
Hive Server

- Hive Server is the main component that provides a Thrift interface, along with connectivity for JDBC (Java Database Connectivity) and ODBC (Open Database Connectivity) clients.
- It enables the integration of Hive with other applications.
HIVE
DATA MODEL & HIVE QUERY LANGUAGE
HIVE | DATA TYPES

Hive has three kinds of data types that are involved in table creation: primitive types, complex types, and user-defined types.

PRIMITIVE TYPES
- Integers: TINYINT, SMALLINT, INT and BIGINT
- Boolean: BOOLEAN
- Floating types: FLOAT, DOUBLE
- String: STRING (VARCHAR, CHAR)

COMPLEX TYPES
- Structs: {a INT; b INT}
- Maps: M['group']
- Arrays: ['a', 'b', 'c'], A[1] returns 'b'

USER-DEFINED TYPES
- Structures with attributes
- Attributes can be of any type
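As a rough analogy, the access patterns for the complex types resemble those of ordinary Python containers. The sketch below is an analogy only, not Hive code: the `Pair` class stands in for the struct {a INT; b INT}, and the map value "admins" is a made-up placeholder.

```python
from dataclasses import dataclass

@dataclass
class Pair:
    """Stand-in for a struct with two integer fields, a and b."""
    a: int
    b: int

A = ["a", "b", "c"]      # array: indexing is zero-based
M = {"group": "admins"}  # map: looked up by key (value is a placeholder)
S = Pair(a=1, b=2)       # struct: fields are accessed by name

print(A[1])        # b
print(M["group"])  # admins
print(S.a)         # 1
```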
HIVE | DATA MODELS

Tables in Hive are analogous to tables in relational databases. Tables can be filtered, projected, joined and unioned. Additionally, all the data of a table is stored in a directory in HDFS. Hive also supports the notion of external tables, wherein a table can be created on pre-existing files or directories in HDFS by providing the appropriate location in the table creation statement.

There are two types of tables in Hive: managed tables and external tables.

HIVE | DATA MODELS TABLE

- HQL command used to create tables:

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]tab_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[ROW FORMAT row_format_type]
[STORED AS file_format_type]

- For example, the optional clauses can be filled in as:

COMMENT 'db_details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS file_format_type

Hive stores these tables in its warehouse directory on HDFS.
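To see what the ROW FORMAT DELIMITED clauses describe, the following Python sketch splits raw text into rows and columns using the same tab and newline delimiters. It is a simplified illustration, not Hive's reader: the function name `parse_delimited`, the column names, and the sample values are assumptions made here, and every value is left as a string rather than converted to a column type.

```python
def parse_delimited(text, columns, field_sep="\t", line_sep="\n"):
    """Split text into lines, then each line into named fields."""
    rows = []
    for line in text.split(line_sep):
        if not line:
            continue  # skip the empty piece after a trailing newline
        values = line.split(field_sep)
        rows.append(dict(zip(columns, values)))
    return rows

# Made-up sample data with two tab-separated fields per line.
raw = "14732\t2013-01-01\n14733\t2013-01-02\n"
rows = parse_delimited(raw, ["wban", "date"])
print(rows)
```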

HIVE | DATA MODELS EXTERNAL TABLE

- The EXTERNAL keyword lets you create a table and provide a LOCATION so that Hive does not use a default location for this table. This comes in handy if you already have data generated.
- When an EXTERNAL table is dropped, the data in the table is NOT deleted from the file system.
- An EXTERNAL table points to any HDFS location for its storage, rather than being stored in a folder specified by the configuration property.
- HQL command to create an external table:

CREATE EXTERNAL TABLE weatherext (wban INT, `date` STRING)
COMMENT 'this is external table view'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/hive/data/weatherext';
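The difference in drop behaviour between the two table types can be simulated locally. The Python sketch below is a toy model, not Hive itself: a dictionary plays the role of the metastore, temporary directories stand in for HDFS locations, and `drop_table` is a hypothetical helper. It relies on the fact that dropping a managed table also deletes its data, whereas dropping an external table removes only the metadata.

```python
import shutil
import tempfile
from pathlib import Path

def drop_table(catalog, name):
    """Remove a table's metadata; delete its files only if it is managed."""
    table = catalog.pop(name)
    if not table["external"]:
        shutil.rmtree(table["location"])

# Temporary directories stand in for HDFS locations (an assumption).
root = Path(tempfile.mkdtemp())
catalog = {}
for name, external in [("weather", False), ("weatherext", True)]:
    location = root / name
    location.mkdir()
    (location / "data.csv").write_text("14732,2013-01-01\n")
    catalog[name] = {"location": location, "external": external}

drop_table(catalog, "weather")     # managed: metadata and files go
drop_table(catalog, "weatherext")  # external: only metadata goes

managed_exists = (root / "weather").exists()
external_exists = (root / "weatherext").exists()
print(managed_exists, external_exists)  # False True

shutil.rmtree(root)  # clean up the simulation
```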
HIVE | PARTITIONING TABLES

- Hive can store tables in partitions, which divide a table into related parts and make querying more efficient. For example, the data in the weather table above can be partitioned by year and month; when a query is run on the weather table, the partition columns can be used like any other column.
- HQL command to create a partitioned table:

CREATE EXTERNAL TABLE IF NOT EXISTS weatherext
(wban INT, `date` STRING)
PARTITIONED BY (year INT, month STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 'location of text_file';
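Partitioning helps because each partition is kept in its own subdirectory named column=value, so a query that filters on a partition column only needs to read the matching directories. The Python sketch below illustrates that layout and the pruning step. The helper names `partition_path` and `prune`, the base path, and the partition values are assumptions made for illustration.

```python
from pathlib import PurePosixPath

def partition_path(table_location, **partition_values):
    """Build a column=value subdirectory path for one partition."""
    path = PurePosixPath(table_location)
    for column, value in partition_values.items():
        path = path / f"{column}={value}"
    return path

def prune(paths, **wanted):
    """Keep only partition directories matching every wanted column=value."""
    fragments = {f"{column}={value}" for column, value in wanted.items()}
    return [p for p in paths if fragments <= set(p.parts)]

paths = [
    partition_path("/hive/data/weatherext", year=2012, month="Dec"),
    partition_path("/hive/data/weatherext", year=2013, month="Jan"),
    partition_path("/hive/data/weatherext", year=2013, month="Feb"),
]
selected = prune(paths, year=2013)
print([str(p) for p in selected])
```

Here a filter on year=2013 leaves two of the three directories to scan; the 2012 partition is never read.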
HIVE | FLOW OF CSV FILE INTO HIVE
That's all for theory.
LET'S PRACTICE NOW...
