LectureNotes Hive Final

Apache Hive is a data warehouse infrastructure built on Hadoop that enables data summarization and ad-hoc querying. It stores data in HDFS and supports a SQL-like query language called HiveQL. HiveQL statements are compiled into MapReduce jobs that are executed across a Hadoop cluster.

APACHE HIVE

CIS 612
SUNNIE CHUNG
APACHE HIVE IS
A data warehouse infrastructure built on top of Hadoop that enables data summarization and ad-hoc queries.
Initially developed by Facebook.
Hive stores data in the Hadoop Distributed File System (HDFS).
Supports a SQL-like query language: HiveQL.
HiveQL statements are compiled by the Hive service into MapReduce jobs, which are executed across a Hadoop cluster.

Sunnie Chung CIS 612 Lecture Notes
HOW HIVE WORKS?
Hive structures data into well-understood database concepts such as tables, rows, columns, and partitions.
It supports primitive types, as well as associative arrays (maps), lists (arrays), and structs.
HQL supports DDL and DML.
HQL supports only limited equality and join predicates, and it cannot insert rows into existing tables (it can overwrite tables).
Users can embed custom MapReduce scripts.
HIVE
Data in Hive is organized into tables
Provides structure for unstructured Big Data
Works with data inside HDFS
Tables
Data: a file or group of files in HDFS
Schema: metadata stored in a relational database
Each table has a corresponding HDFS directory
Data in a table is serialized
Supports primitive column types and nestable collection types: Array and Map (key-value pairs)
HIVE DATABASE
Data Model

Tables
Analogous to tables in relational database
Each table has a corresponding HDFS directory
Hive provides built-in serialization formats which exploit compression
and lazy-serialization

Partitions
Each table can have one or more partitions (Horizontal Partitions)
Example:
Table T is stored in the directory /wh/T.
If T is partitioned on columns ds and ctry, the data with ds = '20090101' and ctry = 'US' will be
stored in /wh/T/ds=20090101/ctry=US.

Buckets
Data in each partition may in turn be divided into buckets based on the
hash of a column in the table
Each bucket is stored as a file in the partition directory
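The data model above can be illustrated with a short HiveQL sketch. Only the table name T, the partition columns ds and ctry, and the /wh/T path come from the example; the data columns (userid, page_url) and the bucket count are assumptions made for illustration.

```sql
-- Hypothetical table: only T, ds, and ctry come from the example above.
CREATE TABLE T (userid BIGINT, page_url STRING)
PARTITIONED BY (ds STRING, ctry STRING)
CLUSTERED BY (userid) INTO 32 BUCKETS;

-- Register one partition. If the table directory is /wh/T, its data is
-- expected under /wh/T/ds=20090101/ctry=US.
ALTER TABLE T ADD PARTITION (ds = '20090101', ctry = 'US');

-- List the partitions known to the metastore.
SHOW PARTITIONS T;

-- A filter on the partition columns lets Hive read only that directory.
SELECT count(*) FROM T WHERE ds = '20090101' AND ctry = 'US';
```

Partition columns are not stored in the data files; their values are taken from the directory names, which is why filtering on them can skip whole directories.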
TABLE SCHEMA EXAMPLE
CREATE TABLE page_view(viewTime INT, userid BIGINT,
page_url STRING, referrer_url STRING,
friends ARRAY<BIGINT>, properties MAP<STRING, STRING>,
ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
CLUSTERED BY(userid) SORTED BY(viewTime) INTO 32 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '1'
COLLECTION ITEMS TERMINATED BY '2'
MAP KEYS TERMINATED BY '3'
STORED AS SEQUENCEFILE;
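As a sketch of how the collection columns and buckets of page_view might be used, the queries below index into the ARRAY and MAP columns and sample a single bucket. The map key 'browser' and the partition value '2009-01-01' are hypothetical; the notes do not say what the data contains.

```sql
-- Index into the ARRAY and MAP columns; 'browser' is a hypothetical map key.
SELECT userid,
       friends[0]            AS first_friend,
       size(friends)         AS friend_count,
       properties['browser'] AS browser
FROM page_view
WHERE dt = '2009-01-01' AND country = 'US';

-- Sample one of the 32 buckets defined by CLUSTERED BY(userid).
SELECT * FROM page_view TABLESAMPLE(BUCKET 3 OUT OF 32 ON userid);
```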
HIVE QUERY LANGUAGE
SQL-like language: HiveQL
DDL: creates tables with specific serialization formats
DML: load and insert, to load data from external sources and insert query results into Hive tables
Does not support updating or deleting rows in existing tables
Supports multi-table insert
Supports select, project, join, aggregate, union all, and sub-queries in the FROM clause
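A minimal sketch of a sub-query in the FROM clause combined with UNION ALL and an aggregate, assuming the page_view table from the schema example. The country values are illustrative only.

```sql
-- The UNION ALL is wrapped in a FROM-clause sub-query (alias u),
-- and the outer query aggregates over its result.
SELECT u.page_url, count(1) AS views
FROM (
  SELECT pv.page_url FROM page_view pv WHERE pv.country = 'US'
  UNION ALL
  SELECT pv.page_url FROM page_view pv WHERE pv.country = 'CA'
) u
GROUP BY u.page_url;
```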
HIVEQL: UDTF, UDAF
Can be extended with custom functions (UDFs)
User-Defined Table-Generating Function (UDTF)
https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide+UDTF
User-Defined Aggregation Function (UDAF)
Users can embed custom map-reduce scripts written in any language using a simple row-based streaming interface
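The streaming interface can be sketched with TRANSFORM, again assuming the page_view table from the schema example. The script name extract_domain.py and its path are hypothetical placeholders; any executable that reads rows from standard input and writes rows to standard output could take its place.

```sql
-- Ship a hypothetical script to the cluster; the path is a placeholder.
ADD FILE /tmp/extract_domain.py;

-- Each input row is passed to the script as a tab-separated line on stdin;
-- each line the script prints becomes an output row.
SELECT TRANSFORM (userid, page_url)
       USING 'python extract_domain.py'
       AS (userid, domain)
FROM page_view;

-- A built-in table-generating function (UDTF): one output row per array element.
SELECT explode(friends) AS friend_id FROM page_view;
```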
WHAT HIVE DOES?
Hive allows SQL developers to write Hive Query Language (HQL) statements that are similar to SQL statements but support a more limited set of commands.
It therefore allows developers to explore and structure massive amounts of data, analyze it, and then turn it into business insight.
Hive queries have very high latency because Hive is based on Hadoop.
Hive is read-oriented and not appropriate for write operations.
HIVEQL
Running example: Status Meme
When Facebook users update their status, the updates are logged into flat files in an NFS directory, /logs/status_updates.
Compute daily statistics on the frequency of status updates based on gender and school.
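One way the running example might be written in HiveQL is sketched below. All table names, column names, and the date are assumptions (the notes give only the log directory and the goal); the sketch loads one day of logs into a partition and then fills two summary tables from a single scan using a multi-table insert.

```sql
-- Assumed tables (not defined in the notes):
--   status_updates(userid INT, status STRING) PARTITIONED BY (ds STRING)
--   profiles(userid INT, school STRING, gender INT)
--   gender_summary and school_summary, each PARTITIONED BY (ds STRING)
LOAD DATA LOCAL INPATH '/logs/status_updates'
INTO TABLE status_updates PARTITION (ds = '2009-03-20');

-- The joined sub-query is computed once and feeds both inserts.
FROM (
  SELECT a.status, b.school, b.gender
  FROM status_updates a JOIN profiles b
    ON (a.userid = b.userid AND a.ds = '2009-03-20')
) subq1
INSERT OVERWRITE TABLE gender_summary PARTITION (ds = '2009-03-20')
  SELECT subq1.gender, count(1)
  GROUP BY subq1.gender
INSERT OVERWRITE TABLE school_summary PARTITION (ds = '2009-03-20')
  SELECT subq1.school, count(1)
  GROUP BY subq1.school;
```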
ADVANTAGES OF HIVE
Familiar: hundreds of unique users can simultaneously query the data using a language familiar to SQL users.
Fast: response times are typically much faster than other types of queries on the same type of huge datasets.
Scalable and extensible: as data variety and volume grow, more commodity machines can be added to the cluster without a corresponding reduction in performance.
Informative: familiar JDBC and ODBC drivers allow many applications to pull Hive data for seamless reporting. Hive allows users to read data in arbitrary formats, using SerDes and Input/Output formats.
(SerDe: the serializer/deserializer API used to move data in and out of tables.)
HIVE ARCHITECTURE
External Interfaces:
Web UI: management
Hive CLI: run queries, browse tables, etc.
API: JDBC, ODBC
Metastore:
System catalog which contains metadata about Hive tables
Driver:
Manages the life cycle of a HiveQL statement during compilation, optimization and execution
Compiler:
Translates a HiveQL statement into a plan which consists of a DAG of map-reduce jobs
Database: a namespace for tables
Table: the metadata for a table contains the list of columns and their types, owner, storage and SerDe information. It also contains any user-supplied key and value data.
Partition: each partition can have its own columns, SerDe and storage information.
HIVE ARCHITECTURE
External interfaces: user interfaces such as the command line (CLI) and the web UI.
Thrift is a framework for cross-language services, where a server written in one language (like Java) can also support clients in other languages.
Metastore is the system catalog. All other components of Hive interact with the metastore.
The Driver manages the life cycle (statistics) of a HiveQL statement during compilation, optimization and execution.
Figure 1: Hive Architecture

COMMAND LINE INTERFACE
There are several ways to interact with Hive, including some popular graphical user interfaces, but the CLI is sometimes preferable. The CLI allows creating tables, inspecting schemas, querying tables, etc.
All commands and queries go to the Driver, which compiles, optimizes and executes queries, usually with MapReduce jobs.
Hive doesn't generate MapReduce programs; it uses generic Mapper and Reducer modules. Hive communicates with the JobTracker to initiate the MapReduce job.
Data files to be processed are usually in HDFS, which is managed by the NameNode.
Hive uses the Hive Query Language (HQL), which is similar to SQL.
HIVE ARCHITECTURE
MetaStore
The system catalog, which contains metadata about the tables stored in Hive
This data is specified during table creation and reused every time the table is referenced in HiveQL
Contains the following objects:
Database: the namespace for tables
Table: the metadata for a table contains the list of columns and their types, owners, storage and SerDe information
Partition: each partition can have its own columns, SerDe and storage information

HIVE ARCHITECTURE

Figure 2: Query plan with 3 map-reduce jobs for multi-table insert query
HIVE ARCHITECTURE
Compile
The compiler converts the string (DDL/DML/query statement) to a plan.
The parser transforms a query string to a parse tree representation
The semantic analyzer transforms the parse tree to a block-based internal query representation
The logical plan generator converts the internal query representation to a logical plan
The optimizer performs multiple passes over the logical plan and rewrites it in several ways:
Combines multiple joins which share the join key into a single multi-way join, and hence a single map-reduce job
Adds repartition operators
Prunes columns early and pushes predicates closer to the table scan operators
HIVE ARCHITECTURE
Compile (continued)
The optimizer's rewrites of the logical plan also include the following:
In case of partitioned tables, prunes partitions that are not needed by the query
In case of sampling queries, prunes buckets that are not needed
Users can also provide hints to the optimizer to:
Add partial aggregation operators to handle large cardinality grouped aggregation
Add repartition operators to handle skew in grouped aggregations
Perform joins in the map phase instead of the reduce phase
The physical plan generator converts the logical plan into a physical plan, consisting of a directed acyclic graph (DAG) of map-reduce jobs
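The compilation stages can be observed from the query side. The sketch below assumes the page_view table from the schema example and a hypothetical small table user_info(id, age); depending on the Hive version and configuration, the join hint may be ignored in favor of automatic join conversion.

```sql
-- EXPLAIN prints the plan (stages and operators) instead of running the query.
-- The filter on the partition column dt is a candidate for partition pruning.
EXPLAIN
SELECT pv.country, count(1)
FROM page_view pv
WHERE pv.dt = '2009-01-01'
GROUP BY pv.country;

-- Optimizer hint: request a map-side join, holding the (hypothetical) small
-- table user_info in memory on each mapper.
SELECT /*+ MAPJOIN(u) */ pv.page_url, u.age
FROM page_view pv JOIN user_info u ON (pv.userid = u.id);
```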
INPUT DATA
Hive has no row-level insert, update or delete operations. The only way to put data into a table is to use one of the bulk load operations.
The file formats supported in Hive include TEXTFILE, SEQUENCEFILE, ORC and RCFILE.
Example: 'NASDAQ_daily_prices_B.csv', a log file of NASDAQ stock records.
exchange,stock_symbol,date,stock_price_open,stock_price_high,stock_price_low,stock_price_close,stock_volume,stock_price_adj_close
NASDAQ,BBND,2010-02-08,2.92,2.98,2.86,2.96,483800,2.96
NASDAQ,BBND,2010-02-05,2.85,2.94,2.79,2.93,884000,2.93
NASDAQ,BBND,2010-02-04,2.83,2.88,2.78,2.83,1333300,2.83
...
CREATE TABLE TO HOLD THE DATA:
hive> CREATE TABLE IF NOT EXISTS stocks (
exchange STRING,
symbol STRING,
ymd STRING,
price_open FLOAT,
price_high FLOAT,
price_low FLOAT,
price_close FLOAT,
volume INT,
price_adj_close FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
HIVE QUERY LANGUAGE: HIVEQL
Create a database:
hive> CREATE DATABASE financials;
or
hive> CREATE DATABASE IF NOT EXISTS financials;
Describe database:
hive> DESCRIBE DATABASE financials;
OK
financials
hdfs://localhost:54310/user/hive/warehouse/financials.db
Use database:
hive> USE financials;
Drop database:
hive> DROP DATABASE IF EXISTS financials;
HOW TO LOAD DATA INTO HIVE TABLE
Use LOAD DATA to import data into a Hive table
hive> LOAD DATA LOCAL INPATH '/home/sunny/EmployeeDetails.txt' INTO TABLE Employee;
Use the keyword OVERWRITE to replace the data already in the table; without it, the loaded files are added to the existing data
The LOCAL keyword, as in the example above, loads data from the local file system
Insert data into another table by using a SELECT statement
For example:
hive> INSERT OVERWRITE TABLE <table_name> SELECT * FROM Employee;
MANAGING TABLES
Operation                Command Syntax
See current tables       hive> SHOW TABLES;
Check the table schema   hive> DESCRIBE <table_name>;
Change the table name    hive> ALTER TABLE <table_name> RENAME TO mytab;
Add a column             hive> ALTER TABLE <table_name> ADD COLUMNS (MyID STRING);
Drop a partition         hive> ALTER TABLE <table_name> DROP PARTITION (Age>70);
HIVE SUPPORTS THE FOLLOWING:
WHERE clause
UNION ALL and DISTINCT
GROUP BY and HAVING
LIMIT clause
Sub-queries, but only in the FROM clause
JOINS, ORDER BY, SORT BY
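A short sketch of several of these clauses together, assuming the stocks table created earlier in these notes. The date cutoff and the price threshold are arbitrary illustrative values.

```sql
-- Average closing price per symbol since 2010, keeping only averages above 50.
-- ymd is a STRING, so the comparison relies on the yyyy-MM-dd format.
SELECT symbol, avg(price_close) AS avg_close
FROM stocks
WHERE ymd >= '2010-01-01'
GROUP BY symbol
HAVING avg(price_close) > 50.0
ORDER BY avg_close DESC
LIMIT 10;

-- SORT BY orders rows within each reducer only; DISTRIBUTE BY sends all rows
-- of a symbol to the same reducer.
SELECT symbol, ymd, price_close
FROM stocks
DISTRIBUTE BY symbol
SORT BY symbol, ymd;
```

ORDER BY produces a single total ordering of the result, while SORT BY is cheaper on large outputs because each reducer sorts only its own share of the rows.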
OUTPUT DATA
Output data produced by Hive is structured and is stored in Hive tables; the metadata describing those tables is typically stored in a relational database.
For a cluster, MySQL or a similar relational database is required for this metadata.
The result tables can then be manipulated using HiveQL, in much the same way that SQL is used on a relational database.
LOAD FILE INTO TABLE:
hive> LOAD DATA LOCAL INPATH '/Users/nqt289/Desktop/NASDAQ_daily_prices_B.csv'
> OVERWRITE INTO TABLE stocks;
Copying data from file:/Users/nqt289/Desktop/NASDAQ_daily_prices_B.csv
Copying file: file:/Users/nqt289/Desktop/NASDAQ_daily_prices_B.csv
Loading data to table mydb.stocks
Deleted hdfs://localhost:54310/Users/nqt289/Desktop/NASDAQ_daily_prices_B.csv
OK
Time taken: 0.231 seconds
EXAMPLE OF OUTPUT OF HIVE

hive> SELECT * FROM STOCKS WHERE price_open='2.92';
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201403311509_0003, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201403311509_0003
Kill Command = /Users/nqt289/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201403311509_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-03-31 15:39:20,577 Stage-1 map = 0%, reduce = 0%
2014-03-31 15:39:23,597 Stage-1 map = 100%, reduce = 0%
2014-03-31 15:39:26,625 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201403311509_0003
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 21998523 HDFS Write: 5166 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
NASDAQ BBND 2010-02-08 2.92 2.98 2.86 2.96 483800 2.96
NASDAQ BTFG 2009-12-21 2.92 2.92 2.75 2.79 15100 2.79
NASDAQ BJCT 2004-04-21 2.92 2.98 2.9 2.98 3200 2.98
NASDAQ BJCT 2004-04-20 2.92 3.0 2.92 2.95 27900 2.95
…
Time taken: 12.785 seconds
DEFINITION: ACID
Atomicity
Atomicity requires that each transaction be "all or nothing": if one part of the transaction
fails, the entire transaction fails, and the database state is left unchanged. An atomic
system must guarantee atomicity in each and every situation, including power failures,
errors, and crashes. To the outside world, a committed transaction appears (by its effects on
the database) to be indivisible ("atomic"), and an aborted transaction does not happen.
Consistency
The consistency property ensures that any transaction will bring the database from one
valid state to another. Any data written to the database must be valid according to all
defined rules, including constraints, cascades, triggers, and any combination thereof. This
does not guarantee correctness of the transaction in all ways the application programmer
might have wanted (that is the responsibility of application-level code) but merely that any
programming errors cannot result in the violation of any defined rules.
Isolation
The isolation property ensures that the concurrent execution of transactions results in a
system state that would be obtained if transactions were executed serially, i.e. one after the
other. Providing isolation is the main goal of concurrency control. Depending on the
concurrency control method, the effects of an incomplete transaction might not even be
visible to another transaction.
Durability
Durability means that once a transaction has been committed, it will remain so, even in the
event of power loss, crashes, or errors. In a relational database, for instance, once a group
of SQL statements execute, the results need to be stored permanently (even if the database
crashes immediately thereafter). To defend against power loss, transactions (or their
effects) must be recorded in non-volatile memory.
ACID IN HIVE
ACID support was added to Hive with the following use cases in mind:
A set of inserts and updates is processed once an hour.
A set of deletes is processed once a day.
A log of transactions is exported from an RDBMS to reflect new data once an hour.
The delay is not an important issue here because of the purpose of Hive; also, the number of rows committed each time is large (100 to 500 thousand rows).
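In Hive releases that support transactional tables (0.14 and later), batches of this kind might look like the sketch below. The table account_events and its columns are hypothetical, and the two SET lines are only part of the configuration; further settings may be required depending on the version and cluster setup.

```sql
-- Session settings commonly needed for transactional tables (assumed here).
SET hive.support.concurrency = true;
SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- Hypothetical table: bucketed, stored as ORC, and marked transactional.
CREATE TABLE account_events (id BIGINT, status STRING)
CLUSTERED BY (id) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Row-level changes, applied in batches rather than one row at a time.
INSERT INTO TABLE account_events VALUES (1, 'new'), (2, 'new');
UPDATE account_events SET status = 'closed' WHERE id = 1;
DELETE FROM account_events WHERE id = 2;
```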
HIVE ACHIEVEMENTS & FUTURE PLANS
First step to provide a warehousing layer for Hadoop (a web-based Map-Reduce data processing system)
Accepts only a subset of SQL: working to subsume SQL syntax
Working on a rule-based optimizer: plans to build a cost-based optimizer
Enhancing JDBC and ODBC drivers for interaction with commercial BI tools
Working on making it perform better
PROJECTS & TOOLS ON HADOOP
HBase
Hive
Pig
Jaql
ZooKeeper
Avro
UIMA
Sqoop
HIVE TUTORIAL
https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-HiveTutorial
