Apache HIVE
Hive is a data warehouse infrastructure tool to process structured data in Hadoop.
Open source.
For querying and analyzing large datasets stored in Hadoop files.
Hive uses a language called HiveQL (HQL), which is similar to SQL.
HiveQL automatically translates SQL-like queries into MapReduce jobs.
Hive organizes data into tables.
Hive is not
A relational database
A design for OnLine Transaction Processing (OLTP)
A language for real-time queries and row-level updates
Features of Hive
It stores the schema in a database and the processed data in HDFS.
It is designed for OLAP.
It provides SQL type language for querying called HiveQL or HQL.
It is familiar, fast, scalable, and extensible.
Hive Shell
The shell is the primary way in which we interact with Hive.
We can issue our commands or queries in HiveQL inside the Hive shell.
The Hive shell is similar to the MySQL shell.
It is the command-line interface for Hive.
In the Hive shell, users can run HQL queries.
We can run the Hive Shell in two modes which are:
Non-Interactive mode
◦ With -f option we can specify the location of a file which contains HQL queries.
◦ For example: hive -f my-script.q
Interactive mode
◦ We directly need to go to the hive shell and run the queries there.
◦ In hive shell, we can submit required queries manually and get the result.
Hive Services
Hive provides various services, like HiveServer2, Beeline, etc., to perform queries.
1. Beeline
Beeline is a command shell supported by HiveServer2, where the user can submit queries
and commands to the system. It is a JDBC client based on the SQLLine CLI (a pure-Java
console utility for connecting to relational databases and executing SQL queries).
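As a sketch, a typical Beeline session connects to HiveServer2 over JDBC and then submits ordinary HiveQL (the host, port, and table name below are placeholder assumptions; 10000 is HiveServer2's default port):

```sql
-- From the operating-system shell:   $ beeline
-- Inside Beeline, connect to a HiveServer2 instance:
--   beeline> !connect jdbc:hive2://localhost:10000
-- Once connected, plain HiveQL can be submitted:
SHOW DATABASES;
SELECT * FROM employee LIMIT 5;
```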
2. Hive Server 2
HiveServer2 is the successor of HiveServer1. It enables clients to execute queries against
Hive, allows multiple clients to submit requests and retrieve the final results, and is
designed to provide the best support for open API clients such as JDBC and ODBC.
3. Hive Driver
The Hive driver receives the HiveQL statements submitted by the user through the command shell.
It creates the session handles for the query and sends the query to the compiler.
4. Hive Compiler
The Hive compiler parses the query. It performs semantic analysis and type-checking on the
different query blocks and query expressions using the metadata stored in the metastore,
and generates an execution plan.
The execution plan created by the compiler is a DAG (Directed Acyclic Graph), where each
stage is a map/reduce job, an operation on HDFS, or a metadata operation.
5. Optimizer
The optimizer performs transformation operations on the execution plan and splits the tasks
to improve efficiency and scalability.
6. Execution Engine
After the compilation and optimization steps, the execution engine executes the plan
created by the compiler, in order of its dependencies, using Hadoop.
7. Metastore
Metastore is a central repository that stores the metadata information about the structure of tables
and partitions, including column and column type information.
It also stores information about the serializer and deserializer required for read/write
operations, and the HDFS files where the data is stored. This metastore is generally a
relational database.
Metastore provides a Thrift interface for querying and manipulating Hive metadata.
We can configure metastore in any of the two modes:
Remote: In remote mode, metastore is a Thrift service and is useful for non-Java
applications.
Embedded: In embedded mode, the client can directly interact with the metastore using
JDBC.
8. HCatalog
HCatalog is the table and storage management layer for Hadoop. It enables users with different data
processing tools such as Pig, MapReduce, etc. to easily read and write data on the grid.
It is built on the top of Hive metastore and exposes the tabular data of Hive metastore to other data
processing tools.
9. WebHCat
WebHCat is the REST API for HCatalog. It is an HTTP interface for performing Hive metadata
operations. It provides a service for running Hadoop MapReduce (or YARN), Pig, and Hive
jobs.
Architecture of Hive
The following diagram depicts the Hive architecture. Its units and their operations are:

User Interface: Hive is data warehouse infrastructure software that creates interaction
between the user and HDFS. The user interfaces that Hive supports are the Hive Web UI,
the Hive command line, and Hive HD Insight (on Windows Server).

Meta Store: Hive chooses respective database servers to store the schema or metadata of
tables, databases, columns in a table, their data types, and the HDFS mapping.

HiveQL Process Engine: HiveQL is similar to SQL, for querying schema information in the
Metastore. It is one of the replacements for the traditional approach to MapReduce
programs: instead of writing a MapReduce program in Java, we can write a query for the
MapReduce job and process it.

Execution Engine: The conjunction of the HiveQL Process Engine and MapReduce is the Hive
Execution Engine. The execution engine processes the query and generates the same results
as MapReduce. It uses the flavor of MapReduce.

HDFS or HBASE: HDFS (Hadoop Distributed File System) or HBase are the data storage
techniques used to store data in the file system.
Hive Vs RDBMS
An RDBMS is used to maintain a database; Hive is used to maintain a data warehouse.
An RDBMS uses SQL (Structured Query Language); Hive uses HQL (Hive Query Language).
The schema is fixed in an RDBMS; the schema varies in Hive.
An RDBMS stores normalized data; Hive stores both normalized and de-normalized data.
Tables in an RDBMS are sparse; tables in Hive are dense.
An RDBMS doesn't support partitioning; Hive supports automatic partitioning.
An RDBMS uses no partition method; Hive uses the sharding method for partitioning.
Schema on READ only
◦ Operations like updates and modifications don't work with this, because a Hive query
in a typical cluster runs on multiple DataNodes.
◦ In versions below 0.13 it is not possible to update and modify data across multiple
nodes.
READ Many WRITE Once
◦ In later versions, updates are possible after insertion.
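As an illustrative sketch of those later versions (the table name and columns are assumptions), row-level updates require an ACID transactional table, which in Hive means ORC storage with the transactional property set; depending on the Hive version, the transaction manager must also be enabled and the table may need to be bucketed:

```sql
-- ACID table: ORC storage plus the 'transactional' table property
CREATE TABLE employee_txn (
  id     INT,
  name   STRING,
  salary INT
)
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

INSERT INTO employee_txn VALUES (1201, 'Gopal', 45000);

-- Row-level update: possible only on transactional tables (Hive 0.14 and later)
UPDATE employee_txn SET salary = 50000 WHERE id = 1201;
```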
Hive Data Model
Data in Apache Hive can be categorized into:
Table
Same as the tables present in a relational database.
The associated metadata describes the layout of the data in the table.
Hive stores the metadata in a relational database and not in HDFS. Hive has two
types of tables, which are as follows:
Managed Table
When we load data into a Managed table, Hive moves data into Hive
warehouse directory.
External Table
For an external table, Hive does not move the data into the warehouse
directory; the table points to the data at its existing location, and
dropping the table does not delete the underlying data.
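A minimal sketch of the two table types (the table names and HDFS path below are assumptions):

```sql
-- Managed table: data loaded into it is moved under the Hive warehouse directory,
-- and DROP TABLE deletes both metadata and data
CREATE TABLE managed_emp (id INT, name STRING);

-- External table: data stays at the given location, and DROP TABLE removes
-- only the metadata, leaving the files in place
CREATE EXTERNAL TABLE external_emp (id INT, name STRING)
LOCATION '/user/hive/data/emp';
```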
Partition
Apache Hive organizes tables into partitions for grouping the same type of data together
based on a column or partition key.
Each table in Hive can have one or more partition keys to identify a particular
partition.
Using partitions, we can also make queries on slices of the data faster.
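As a sketch (table and column names are assumptions), a partitioned table stores each distinct partition-key value in its own subdirectory, and a filter on the key lets Hive scan only that slice:

```sql
-- Partitioned table: each distinct dept value becomes its own
-- subdirectory under the table's location
CREATE TABLE emp_part (id INT, name STRING)
PARTITIONED BY (dept STRING);

-- A filter on the partition key prunes the scan to one partition
SELECT id, name FROM emp_part WHERE dept = 'HR';
```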
Bucket
Tables or partitions are subdivided into buckets, based on the hash of a column in the
table.
Buckets give extra structure to the data that may be used for more efficient queries.
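As a sketch (table and column names are assumptions), bucketing is declared with CLUSTERED BY; rows are assigned to a bucket by hashing the chosen column:

```sql
-- Bucketed table: rows are distributed into 4 buckets
-- by hashing the id column
CREATE TABLE emp_bucketed (id INT, name STRING)
CLUSTERED BY (id) INTO 4 BUCKETS;
```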
Hive Data Types
1. Arrays
An ordered sequence of similar type elements that are indexable using the zero-based
integers.
Arrays in Hive are similar to the arrays in JAVA.
array<datatype>
Example: array('Apple', 'Orange'). The second element is accessed as array[1].
2. Maps
Map in Hive is a collection of key-value pairs.
Where the fields are accessed using array notations of keys (e.g., [‘key’]).
map<primitive_type, data_type>
Example: 'first' -> 'John', 'last' -> 'Deo', represented as map('first', 'John', 'last', 'Deo').
Now 'John' can be accessed with map['first'].
3. Structs
Similar to the STRUCT in C language.
It is a record type that encapsulates a set of named fields, which can be any primitive data
type.
We can access the elements in STRUCT type using DOT (.) notation.
STRUCT <col_name : data_type [ COMMENT col_comment], ...>
Example: For a column c3 of type STRUCT {c1 INTEGER; c2 INTEGER}, the c1 field is
accessed by the expression c3.c1.
4. Union
Similar to the UNION in C.
UNION types at any point of time can hold exactly one data type from its specified data
types.
The full support for UNIONTYPE data type in Hive is still incomplete.
UNIONTYPE<data_type, data_type, ...>
In Hive data types, missing values are represented by the special value NULL.
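As a combined sketch (the table and field names are assumptions; UNIONTYPE is omitted since its support is incomplete), the complex types above can appear together in one table definition:

```sql
CREATE TABLE person (
  phones ARRAY<STRING>,
  name   MAP<STRING, STRING>,
  addr   STRUCT<city : STRING, zip : INT>
);

-- Accessing elements of each complex type:
SELECT phones[0],        -- first array element (zero-based index)
       name['first'],    -- map lookup by key
       addr.city         -- struct field via dot notation
FROM person;
```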
HiveQL
Query language for Hive to process and analyze structured data in a Metastore.
SELECT
SELECT statement is used to retrieve the data from a table.
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[HAVING having_condition]
[CLUSTER BY col_list | [DISTRIBUTE BY col_list] [SORT BY col_list]]
[LIMIT number];
hive> SELECT * FROM employee WHERE salary>30000;
On successful execution of the query, you get to see the following response:
+------+--------------+-------------+-------------------+--------+
| ID | Name | Salary | Designation | Dept |
+------+--------------+-------------+-------------------+--------+
|1201 | Gopal | 45000 | Technical manager | TP |
|1202 | Manisha | 45000 | Proofreader | PR |
|1203 | Masthanvali | 40000 | Technical writer | TP |
|1204 | Krian | 40000 | Hr Admin | HR |
+------+--------------+-------------+-------------------+--------+
SELECT ORDER BY
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[HAVING having_condition]
[ORDER BY col_list]
[LIMIT number];
hive> SELECT Id, Name, Dept FROM employee ORDER BY DEPT;
SELECT GROUP BY
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[HAVING having_condition]
[ORDER BY col_list]
[LIMIT number];
hive> SELECT Dept,count(*) FROM employee GROUP BY DEPT;
JOINS
JOIN clause is used to combine and retrieve the records from multiple tables.
A plain JOIN is the same as an INNER JOIN in SQL.
There are different types of joins given as follows:
JOIN
LEFT OUTER JOIN
RIGHT OUTER JOIN
FULL OUTER JOIN
Consider the following table named CUSTOMERS:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Consider another table ORDERS as follows:
+-----+---------------------+-------------+--------+
|OID | DATE | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 | 3 | 3000 |
| 100 | 2009-10-08 00:00:00 | 3 | 1500 |
| 101 | 2009-11-20 00:00:00 | 2 | 1560 |
| 103 | 2008-05-20 00:00:00 | 4 | 2060 |
+-----+---------------------+-------------+--------+
hive> SELECT c.ID, c.NAME, c.AGE, o.AMOUNT
FROM CUSTOMERS c JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);
On successful execution of the query, you get to see the following response:
+----+----------+-----+--------+
| ID | NAME | AGE | AMOUNT |
+----+----------+-----+--------+
| 3 | kaushik | 23 | 3000 |
| 3 | kaushik | 23 | 1500 |
| 2 | Khilan | 25 | 1560 |
| 4 | Chaitali | 25 | 2060 |
+----+----------+-----+--------+
LEFT OUTER JOIN
◦ The HiveQL LEFT OUTER JOIN returns all the rows from the left table, even if there
are no matches in the right table.
hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
FROM CUSTOMERS c
LEFT OUTER JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);
On successful execution of the query, you get to see the following response:
+----+----------+--------+---------------------+
| ID | NAME | AMOUNT | DATE |
+----+----------+--------+---------------------+
| 1 | Ramesh | NULL | NULL |
| 2 | Khilan | 1560 | 2009-11-20 00:00:00 |
| 3 | kaushik | 3000 | 2009-10-08 00:00:00 |
| 3 | kaushik | 1500 | 2009-10-08 00:00:00 |
| 4 | Chaitali | 2060 | 2008-05-20 00:00:00 |
| 5 | Hardik | NULL | NULL |
| 6 | Komal | NULL | NULL |
| 7 | Muffy | NULL | NULL |
+----+----------+--------+---------------------+
RIGHT OUTER JOIN
◦ The HiveQL RIGHT OUTER JOIN returns all the rows from the right table, even if
there are no matches in the left table.
hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE FROM CUSTOMERS c RIGHT OUTER
JOIN ORDERS o ON (c.ID = o.CUSTOMER_ID);
On successful execution of the query, you get to see the following response:
+------+----------+--------+---------------------+
| ID | NAME | AMOUNT | DATE |
+------+----------+--------+---------------------+
| 3 | kaushik | 3000 | 2009-10-08 00:00:00 |
| 3 | kaushik | 1500 | 2009-10-08 00:00:00 |
| 2 | Khilan | 1560 | 2009-11-20 00:00:00 |
| 4 | Chaitali | 2060 | 2008-05-20 00:00:00 |
+------+----------+--------+---------------------+
FULL OUTER JOIN
◦ The HiveQL FULL OUTER JOIN combines the records of both the left and the right
outer tables that fulfil the JOIN condition.
hive> SELECT c.ID, c.NAME, o.AMOUNT, o.DATE
FROM CUSTOMERS c
FULL OUTER JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);
On successful execution of the query, you get to see the following response:
+------+----------+--------+---------------------+
| ID | NAME | AMOUNT | DATE |
+------+----------+--------+---------------------+
| 1 | Ramesh | NULL | NULL |
| 2 | Khilan | 1560 | 2009-11-20 00:00:00 |
| 3 | kaushik | 3000 | 2009-10-08 00:00:00 |
| 3 | kaushik | 1500 | 2009-10-08 00:00:00 |
| 4 | Chaitali | 2060 | 2008-05-20 00:00:00 |
| 5 | Hardik | NULL | NULL |
| 6 | Komal | NULL | NULL |
| 7 | Muffy | NULL | NULL |
+------+----------+--------+---------------------+
VIEWS
Same as view in SQL.
CREATE VIEW
CREATE VIEW [IF NOT EXISTS] view_name [(column_name [COMMENT
column_comment], ...) ]
[COMMENT table_comment]
AS SELECT ...
Example
hive> CREATE VIEW emp_30000 AS
SELECT * FROM employee
WHERE salary>30000;
DROP VIEW
DROP VIEW view_name;
◦ The following query drops a view named as emp_30000:
hive> DROP VIEW emp_30000;
MAP JOIN
Also called Auto Map Join, Map-Side Join, or Broadcast Join.
In a normal join, too much activity is spent shuffling data around between nodes.
This slows down Hive queries.
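As a sketch, a map join avoids the shuffle by loading the smaller table into memory on every mapper. Using the CUSTOMERS and ORDERS tables from the earlier examples, the MAPJOIN hint and the hive.auto.convert.join setting are the standard ways to trigger it:

```sql
-- Ask Hive to broadcast the small ORDERS table to every mapper
SELECT /*+ MAPJOIN(o) */ c.ID, c.NAME, o.AMOUNT
FROM CUSTOMERS c JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);

-- Or let Hive convert joins to map joins automatically,
-- based on table size
SET hive.auto.convert.join = true;
```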