Technical Principles of HBase
www.huawei.com
Copyright © 2018 Huawei Technologies Co., Ltd. All rights reserved.
Objectives
Upon completion of this course, you will understand:
System architecture of HBase
Key features of HBase
Basic functions of HBase
Huawei enhanced features of HBase
Contents
1. Introduction to HBase
2. Functions and Architecture of HBase
3. Key Processes of HBase
4. Huawei Enhanced Features of HBase
HBase Overview
HBase is a column-based distributed storage system that
features high reliability, performance, and scalability.
HBase is suitable for storing big table data (which contains billions of rows
and millions of columns) and allows real-time data access.
HBase uses HDFS as the file storage system to provide a distributed
column-oriented database system that allows real-time data reading and
writing.
HBase uses ZooKeeper as the collaboration service.
HBase vs. RDB
HBase:
1. Distributed storage and column-oriented.
2. Dynamic extension of columns.
3. Supports common commercial hardware, lowering the expansion cost.
RDB:
1. Fixed data structure.
2. Pre-defined data structure.
3. I/O intensive and cost-consuming expansion.
Application Scenarios of HBase
HBase applies to the following scenarios:
Massive data (TB and PB)
The Atomicity, Consistency, Isolation, Durability (ACID) feature supported
by traditional relational databases is not required.
High throughput
Efficient random reading of massive data
High scalability
Simultaneous processing of structured and unstructured data
Position of HBase in FusionInsight
Figure: FusionInsight architecture, with these labelled components:
Application service layer
OpenAPI/SDK, REST/SNMP/Syslog
DataFarm (Data, Information, Knowledge, Wisdom): Porter, Miner, Farmer
Manager: system management, service governance, security management
Hadoop API, Plugin API
Hive, M/R, Spark, Storm, Flink
Hadoop, LibrA
YARN/ZooKeeper
HDFS/HBase
HBase is a column-based distributed storage system that features high
reliability, performance, and scalability. It stores massive data and is designed
to eliminate the limitations of relational databases in processing massive data.
Data Stored By Row
ID Name Phone Address
Data is stored by row in an underlying file system. Generally, a fixed amount
of space is allocated to each row.
Advantages: Data can be added, modified, or read by row.
Disadvantages: Some unnecessary data is obtained when data in a column is
queried.
Data Stored by Column
ID Name Phone Address
Data is stored by column in an underlying file system.
Advantages: Data can be read or calculated by column.
Disadvantages: When a row is read, multiple I/O operations may be
required.
KeyValue Storage Model (1)
ID Name Phone Address
Key-01 Value-ID01 Key-01 Value-Name01
Key-01 Value-Phone01 Key-01 Value-Address01
KeyValue has a specific structure. Key is used to quickly query a data record,
and Value is used to store user data.
As a basic user data storage unit, KeyValue must store some description of
itself, such as timestamp and type information. This requires some structured
space.
Data can be expanded dynamically, adapting to changes in data types and
structures. Data is read and written by block. Columns are not associated
with one another, and neither are tables.
KeyValue Storage Model (2)
Partition mode of a KeyValue Database - based on continuous Key range.
Figure: Region_01 through Region_12 distributed across three nodes.
Node1: Region_01, Region_05, Region_09, Region_04
Node2: Region_02, Region_06, Region_10, Region_12
Node3: Region_03, Region_07, Region_11, Region_08
Data is partitioned into subregions by RowKey range. RowKeys are kept in sorted
order (for example, alphabetical order). Each subregion is a basic distributed
storage unit.
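The range-based partitioning described above can be sketched in a few lines of Python. This is an illustration only, not HBase code: the split points, the region names, and the `locate_region` helper are assumptions made for the example.

```python
from bisect import bisect_right

# Hypothetical split points: each value is the first RowKey of a subregion.
# The first subregion starts at the empty key, so it covers every key below "g".
START_KEYS = ["", "g", "n", "t"]
REGION_NAMES = ["Region_01", "Region_02", "Region_03", "Region_04"]

def locate_region(row_key: str) -> str:
    """Return the subregion whose key range contains row_key."""
    # bisect_right counts the start keys <= row_key; the last of them owns the key.
    index = bisect_right(START_KEYS, row_key) - 1
    return REGION_NAMES[index]
```

Because the start keys are sorted, a lookup is a binary search rather than a scan over all subregions.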
KeyValue Storage Model (3)
The underlying data of HBase exists in the form of KeyValue. KeyValue has a
specific format.
KeyValue contains key information such as the timestamp and type.
The same key can be associated with multiple Values. Each KeyValue has a
qualifier.
There can be multiple KeyValues associated with the same Key and Qualifier.
In this case, they are distinguished using timestamps. This is why there are
multiple versions of the same data record.
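The versioning behaviour described above might be modelled as follows. This is a toy Python sketch under stated assumptions: the `VersionedStore` class, its method names, and the sample values are hypothetical and do not reflect the HBase KeyValue byte format.

```python
from collections import defaultdict

class VersionedStore:
    """Toy store keeping several timestamped values per (row key, qualifier)."""

    def __init__(self):
        # (row_key, qualifier) -> {timestamp: value}
        self._cells = defaultdict(dict)

    def put(self, row_key, qualifier, timestamp, value):
        self._cells[(row_key, qualifier)][timestamp] = value

    def get(self, row_key, qualifier, max_versions=1):
        """Return up to max_versions (timestamp, value) pairs, newest first."""
        versions = self._cells.get((row_key, qualifier), {})
        newest_first = sorted(versions.items(), reverse=True)
        return newest_first[:max_versions]

store = VersionedStore()
store.put("Key-01", "Phone", 100, "6875349")
store.put("Key-01", "Phone", 200, "6831475")
latest = store.get("Key-01", "Phone")
history = store.get("Key-01", "Phone", max_versions=2)
```

Here `latest` holds only the value written at timestamp 200, while `history` also returns the older version written at timestamp 100.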
Contents
1. Introduction to HBase
2. Functions and Architecture of HBase
3. Key Processes of HBase
4. Huawei Enhanced Features of HBase
HBase Architecture (1)
HBase Architecture (2)
Store: A Region consists of one or multiple Stores. Each Store corresponds
to a Column Family.
MemStore: A Store contains one MemStore. Data inserted into a Region by a client is
cached in the MemStore.
StoreFile: The data flushed to HDFS is stored as a StoreFile in HDFS.
HFile: HFile defines the storage format of StoreFiles in a file system. HFile is the underlying
implementation of StoreFile.
HLog: HLogs prevent data loss when a RegionServer is faulty. Multiple Regions in a
RegionServer share the same HLog.
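How a shared HLog protects data held in MemStores can be sketched as below. This is a simplified Python model, not the HBase implementation: the `ToyRegionServer` class and its methods are hypothetical, and writing to the log before the MemStore is an assumption of the sketch.

```python
class ToyRegionServer:
    """Toy model: one shared log, one in-memory buffer per (region, column family)."""

    def __init__(self):
        self.hlog = []        # shared by all Regions on this server
        self.memstores = {}   # (region, column_family) -> {row_key: value}

    def put(self, region, column_family, row_key, value):
        # Assumption: log first, so the edit survives a crash that wipes the MemStore.
        self.hlog.append((region, column_family, row_key, value))
        self.memstores.setdefault((region, column_family), {})[row_key] = value

    def crash(self):
        self.memstores = {}   # memory is lost, the log is not

    def replay(self):
        """Rebuild every MemStore from the logged edits."""
        for region, column_family, row_key, value in self.hlog:
            self.memstores.setdefault((region, column_family), {})[row_key] = value
```

Calling `crash()` and then `replay()` restores the same MemStore contents, which is the property the HLog is meant to provide.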
HMaster (1)
"Hey, Region A, please move to
RegionServer 1!"
"RegionServer 2 is gone! Let others take
over its Regions!"
RegionServer1 RegionServer2 RegionServer3
HMaster (2)
The HMaster process manages all the RegionServers.
Handles RegionServer failovers.
The HMaster process performs cluster operations including creating,
modifying, and deleting tables.
The HMaster process migrates Regions.
Allocates Regions when a new table is created.
Ensures load balancing during operation.
Takes over Regions after a RegionServer failover occurs.
RegionServer
RegionServer is the data service process of HBase and is responsible
for processing reading and writing requests of user data.
RegionServer manages Regions. All reading and writing requests of user
data are handled based on interaction among Regions on RegionServers.
Regions can be migrated between RegionServers.
Region (1)
A data table is divided horizontally into subtables based on the
RowKey range to implement distributed storage. A subtable is called
a Region in HBase.
Each Region is associated with a RowKey range, which is described
using a StartKey and an EndKey.
Each Region only needs to record a StartKey, because its EndKey serves as
the StartKey of the next Region.
Region is the most basic distributed storage unit of HBase.
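The observation that recording StartKeys is sufficient can be checked with a short Python sketch. The `region_ranges` helper and the sample keys are hypothetical, and treating the EndKey as exclusive is an assumption of the example.

```python
def region_ranges(start_keys):
    """Pair each StartKey with its EndKey (the next Region's StartKey)."""
    # None marks the open end of the last Region.
    end_keys = start_keys[1:] + [None]
    return list(zip(start_keys, end_keys))

ranges = region_ranges(["Row001", "Row011", "Row021", "Row031"])
```

Every (StartKey, EndKey) pair is recovered from the sorted StartKey list alone, so no EndKey needs to be stored separately.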
Region (2)
Region-1 (StartKey, EndKey): Row001, Row002, ..., Row010
Region-2 (StartKey, EndKey): Row011, Row012, ..., Row020
Region-3 (StartKey, EndKey): Row021, Row022, ..., Row030
Region-4 (StartKey, EndKey): Row031, ...
Region (3)
Figure: one META Region and several User Regions.
Regions are categorized as Meta Region and User Region.
Meta Region records routing information of User Regions.
Perform the following steps to access data in a Region:
Search for the address of the Meta Region.
Search for the address of the User Regions in the Meta Region.
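The two lookup steps above can be sketched in Python as follows. All addresses, table names, and function names are hypothetical; the dictionaries stand in for data that a real client would fetch over the network.

```python
from bisect import bisect_right

# Hypothetical data for the example only.
META_REGION_ADDRESS = "regionserver-1:16020"
META_ROWS = {
    # table -> sorted (StartKey, RegionServer address) pairs
    "user_table": [("", "regionserver-2:16020"), ("Row011", "regionserver-3:16020")],
}

def find_meta_region():
    """Step 1: find where the Meta Region is served."""
    return META_REGION_ADDRESS

def find_user_region(table, row_key):
    """Step 2: read the Meta Region to find the server of the User Region."""
    entries = META_ROWS[table]
    start_keys = [start for start, _ in entries]
    return entries[bisect_right(start_keys, row_key) - 1][1]
```

A client would typically cache the result of step 2 so that later requests for the same key range skip both lookups.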
Column Family
HDFS directory layout:
/HBase/table/region-1/ColumnFamily-1
/HBase/table/region-1/ColumnFamily-2
/HBase/table/region-2/ColumnFamily-1
/HBase/table/region-2/ColumnFamily-2
/HBase/table/region-3/ColumnFamily-1
/HBase/table/region-3/ColumnFamily-2
A ColumnFamily is a physical storage unit of a Region. Multiple column families of the
same Region have different paths in HDFS.
ColumnFamily information is table-level configuration information. That is, multiple
Regions of the same table have the same column family information. (For example,
each Region has two column families and the configuration information of the same
column family of different Regions is the same.)
ZooKeeper
ZooKeeper provides the following functions for HBase:
Distributed lock service
Multiple HMaster processes will try registering a node in ZooKeeper and the node can be
registered only by one HMaster process. The process that successfully registers the node
becomes the active HMaster process.
Event listening mechanism
The active HMaster's record is deleted when the active process fails, and the standby
processes receive an update message indicating that the active HMaster is down.
Micro database role
ZooKeeper stores the addresses of RegionServers. In this case, ZooKeeper can be regarded
as a micro database.
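The lock and event-listening behaviour above can be imitated with an in-memory stand-in. This Python sketch does not use the ZooKeeper API: the `ToyCoordinator` class, its methods, and the process names are hypothetical, and letting the first notified standby win is a simplification.

```python
class ToyCoordinator:
    """In-memory stand-in for the coordination service (not the ZooKeeper API)."""

    def __init__(self):
        self.active = None
        self.watchers = []

    def try_register(self, name):
        """Only the first caller gets the node; later callers become standbys."""
        if self.active is None:
            self.active = name
            return True
        self.watchers.append(name)
        return False

    def fail_active(self):
        """Delete the active record and tell every standby about it."""
        failed, self.active = self.active, None
        standbys, self.watchers = self.watchers, []
        # Standbys race again; here the first one notified simply wins.
        for name in standbys:
            self.try_register(name)
        return failed

zk = ToyCoordinator()
results = [zk.try_register(n) for n in ("hmaster-a", "hmaster-b", "hmaster-c")]
zk.fail_active()
```

After `fail_active()`, exactly one standby has become active and the remaining one is watching again.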
MetaData Table
The MetaData Table hbase:meta stores information about Regions so that a
Client can locate the specific Region it needs.
The MetaData Table is split into multiple Regions, and the metadata
information of these Regions is stored in ZooKeeper.
Figure: mapping relation between the Metadata Table and User Table 1 through User Table N.
Contents
1. Introduction to HBase
2. Functions and Architecture of HBase
3. Key Processes of HBase
4. Huawei Enhanced Features of HBase
Writing Process
Client Initiating a Data Writing Request
Client
The process of initiating a writing request by a client is like sending
books to a library by a book supplier. The book supplier must
determine to which building and floor the books should be sent.
Writing Process - Locating a Region
Writing Process - Grouping Data (1)
Writing Process - Grouping Data (2)
Data grouping includes two division steps:
Find the Region and RegionServer information of the table based on the
meta table.
Assign each piece of data to a specific Region according to its RowKey.
Data for each RegionServer is sent at the same time. At this point, the
data has already been divided by Region.
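The two grouping steps might look like this in Python. The routing table, server names, and the `group_puts` helper are assumptions made for illustration; a real client would obtain the routing entries from the meta table.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical routing data cached from the meta table:
# sorted (StartKey, Region, RegionServer) entries for one table.
ROUTES = [
    ("", "Region-1", "RegionServer1"),
    ("Row011", "Region-2", "RegionServer2"),
    ("Row021", "Region-3", "RegionServer1"),
]

def group_puts(puts):
    """Group (row_key, value) pairs by RegionServer, then by Region."""
    start_keys = [start for start, _, _ in ROUTES]
    batches = defaultdict(lambda: defaultdict(list))
    for row_key, value in puts:
        _, region, server = ROUTES[bisect_right(start_keys, row_key) - 1]
        batches[server][region].append((row_key, value))
    return {server: dict(regions) for server, regions in batches.items()}

batches = group_puts([("Row003", "a"), ("Row015", "b"), ("Row025", "c"), ("Row007", "d")])
```

Each top-level key of `batches` corresponds to one request, so rows bound for the same RegionServer travel together even when they belong to different Regions.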
Writing Process - Sending a Request to a RegionServer
Data is sent using the encapsulated RPC
framework of HBase.
Operations of sending requests to multiple
RegionServers are implemented concurrently.
After sending a data writing request, a client
waits for the request processing result.
If the client does not capture any exception, it
deems that all data has been written successfully.
If writing the data fails completely or partially,
the client can obtain a detailed KeyValue list
relevant to the failure.
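Concurrent sending and failure collection can be sketched with a thread pool. This Python example is an assumption-laden illustration: `send_batch` is a hypothetical stand-in for one RPC, the failure rule is invented, and failures are returned as a dictionary rather than raised as an exception.

```python
from concurrent.futures import ThreadPoolExecutor

def send_batch(server, key_values):
    """Hypothetical stand-in for one RPC; returns the KeyValues that failed."""
    # For illustration, RegionServer2 rejects every key that starts with "bad".
    if server == "RegionServer2":
        return [kv for kv in key_values if kv[0].startswith("bad")]
    return []

def write_all(batches):
    """Send every batch concurrently and collect the failed KeyValues."""
    with ThreadPoolExecutor() as pool:
        futures = {server: pool.submit(send_batch, server, kvs)
                   for server, kvs in batches.items()}
        # result() blocks, so the client waits for every server's outcome.
        failed = {server: future.result() for server, future in futures.items()}
    return {server: kvs for server, kvs in failed.items() if kvs}

failures = write_all({
    "RegionServer1": [("Row003", "a")],
    "RegionServer2": [("Row015", "b"), ("bad-key", "c")],
})
```

An empty result means every KeyValue was written; otherwise the result lists exactly the KeyValues that need to be retried.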
Writing Process - Process of Writing Data to a Region
Writing Process - Flush
Figure: within a Region, MemStore-1 (ColumnFamily-1) and MemStore-2 (ColumnFamily-2) are each flushed to an HFile.
A Flush operation of a MemStore is triggered in any of the following
scenarios:
The total usage of MemStore of a Region reaches the predefined Flush Size
threshold.
The ratio of occupied memory to total memory of RegionServer reaches the
threshold.
The number of WALs reaches the threshold.
MemStore is flushed periodically (every 1 hour by default).
Users can also flush a table or a Region separately by using a shell command.
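The automatic flush conditions can be expressed as a single check. In this Python sketch the `FlushLimits` fields and the `flush_reasons` function are hypothetical names, and every threshold except the one-hour interval is a placeholder rather than an HBase default.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FlushLimits:
    # All values are placeholders for the example, not HBase defaults.
    region_memstore_bytes: int = 128 * 1024 * 1024
    server_memory_ratio: float = 0.4
    max_wal_files: int = 32
    periodic_seconds: int = 3600  # the one-hour interval named above

def flush_reasons(region_memstore_bytes, server_memory_ratio, wal_files,
                  seconds_since_flush, limits=FlushLimits()):
    """Return the names of all flush conditions that currently hold."""
    checks = {
        "region_memstore_size": region_memstore_bytes >= limits.region_memstore_bytes,
        "server_memory_ratio": server_memory_ratio >= limits.server_memory_ratio,
        "wal_count": wal_files >= limits.max_wal_files,
        "periodic": seconds_since_flush >= limits.periodic_seconds,
    }
    return [name for name, hit in checks.items() if hit]
```

Returning the list of reasons, rather than a single boolean, makes it visible when several conditions hold at once.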
Impacts of Multiple HFiles
Over time, the number of HFiles increases and a query request
takes much longer.
Compaction (1)
Compaction aims to reduce the number of small files in a column family in a
Region, thereby increasing reading performance.
There are two kinds of compaction: major and minor.
Minor: compaction covering a small range. Minimum and maximum numbers of
files are specified. Small files from a consecutive time range are merged.
Major: compaction covering the HFiles in a column family in a Region. During
major compaction, deleted data is cleared.
Files are selected based on a certain algorithm during minor compaction.
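The difference between the two compaction kinds can be illustrated with dictionaries standing in for HFiles. This Python sketch is a simplification: the `DELETE` marker and the `compact` function are hypothetical, and the file-selection algorithm used for minor compaction is not modelled.

```python
DELETE = object()  # hypothetical delete marker (tombstone)

def compact(hfiles, major=False):
    """Merge HFiles (ordered oldest to newest) into one sorted file.

    Each HFile is modelled as a dict of row_key -> value.
    """
    merged = {}
    for hfile in hfiles:          # newer files overwrite older values
        merged.update(hfile)
    if major:
        # A major compaction covers all HFiles, so deleted data can be cleared.
        merged = {k: v for k, v in merged.items() if v is not DELETE}
    return dict(sorted(merged.items(), key=lambda item: item[0]))

old = {"Row001": "a", "Row002": "b"}
new = {"Row002": DELETE, "Row003": "c"}
minor_result = compact([old, new])
major_result = compact([old, new], major=True)
```

In `minor_result` the delete marker for Row002 is still present, whereas `major_result` no longer contains Row002 at all.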
Compaction (2)
Figure: compaction flow.
Write (put) -> MemStore -> Flush -> seven HFiles
Minor Compaction -> three HFiles
Major Compaction -> one HFile
Region Split
A common Region splitting operation is performed to split a Region into two
subregions (DaughterRegion-1 and DaughterRegion-2) if the data size of the
parent Region exceeds the predefined threshold.
During splitting, the Region being split suspends its reading and writing
services. The data files of the parent Region are not split and rewritten to
the two subregions. Instead, reference files are created in the new Regions
to achieve quick splitting. Therefore, services of the Region are suspended
only for a short time.
Routing information of the parent Region cached in clients must be updated.
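The idea of splitting through reference files instead of rewriting data can be sketched as follows. The `Reference` class, its fields, and the "bottom"/"top" labels are hypothetical choices for this Python example and are not taken from the slides.

```python
from dataclasses import dataclass

@dataclass
class Reference:
    """Pointer to one half of a parent data file; no data is copied."""
    parent_file: str
    split_key: str
    half: str  # "bottom" = keys below split_key, "top" = the rest

def split_region(parent_files, split_key):
    """Return the reference files for the two daughter Regions."""
    daughter_1 = [Reference(f, split_key, "bottom") for f in parent_files]
    daughter_2 = [Reference(f, split_key, "top") for f in parent_files]
    return daughter_1, daughter_2

def visible(reference, row_key):
    """Tell whether a row in the parent file belongs to this daughter."""
    below = row_key < reference.split_key
    return below if reference.half == "bottom" else not below

d1, d2 = split_region(["hfile-a", "hfile-b"], "Row016")
```

Creating the references takes time proportional to the number of files, not to the amount of data, which is why the suspension is short.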
Reading Process
Client Initiating a Data Reading Request
Get: When a precise key is provided, the Get operation is performed to read a
single row of user data.
Scan: The Scan operation batch scans user data of a specified Key range.
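The contrast between Get and Scan can be shown on a small sorted list. This Python sketch uses hypothetical data and function names, and it assumes that the stop key of a scan is exclusive.

```python
from bisect import bisect_left

ROWS = [("Row001", "a"), ("Row002", "b"), ("Row010", "c"), ("Row011", "d")]
KEYS = [key for key, _ in ROWS]

def get(row_key):
    """Read the single row stored under an exact key, or None."""
    i = bisect_left(KEYS, row_key)
    return ROWS[i][1] if i < len(KEYS) and KEYS[i] == row_key else None

def scan(start_key, stop_key):
    """Yield every row with start_key <= key < stop_key, in key order."""
    for i in range(bisect_left(KEYS, start_key), bisect_left(KEYS, stop_key)):
        yield ROWS[i]
```

Both operations rely on the rows being sorted by key: a Get is one binary search, and a Scan is a binary search followed by a sequential read.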
Locating a Region
Hi, META, I want to look for books whose codes range
from xxx to xxx. Please find the bookshelf number and the
floor information for that code range.
OpenScanner
Figure: a Region with two column families. ColumnFamily-1 holds a MemStore, HFile-11, and HFile-12; ColumnFamily-2 holds a MemStore, HFile-21, and HFile-22.
During the OpenScanner process, scanners corresponding to
MemStore and each HFile are created:
The scanner corresponding to HFile is StoreFileScanner.
The scanner corresponding to MemStore is MemStoreScanner.
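One way to combine a MemStore scanner with several HFile scanners is a k-way merge. The Python sketch below is an illustration, not the StoreFileScanner or MemStoreScanner code: the function names are hypothetical, HFiles are assumed to be listed oldest first, and each source is modelled as a dict.

```python
import heapq

def open_scanners(memstore, hfiles):
    """Create one sorted scanner per source; a higher rank means newer data."""
    sources = list(hfiles) + [memstore]       # MemStore holds the newest data
    return [sorted((key, -rank, value) for key, value in src.items())
            for rank, src in enumerate(sources)]

def merged_scan(memstore, hfiles):
    """Yield each row key once, taking the value from the newest source."""
    last_key = None
    for key, _, value in heapq.merge(*open_scanners(memstore, hfiles)):
        if key != last_key:                   # first hit per key is the newest
            yield key, value
            last_key = key
```

Because every scanner is already sorted, the merge reads each source once and still returns rows in key order.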
Filter
Filter allows users to set filtering criteria during the Scan
operation. Only user data that meets the criteria is returned.
Typical Filter types include:
RowFilter
SingleColumnValueFilter
KeyOnlyFilter
FilterList
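How filters narrow a scan can be sketched with plain predicates. These Python functions are stand-ins that borrow the filter names for readability; they are not the HBase filter API, and `filter_list` models only the case where a row must pass every filter.

```python
def row_filter(prefix):
    """Keep rows whose key starts with prefix (stand-in for a RowFilter)."""
    return lambda key, columns: key.startswith(prefix)

def single_column_value_filter(column, expected):
    """Keep rows whose given column equals expected."""
    return lambda key, columns: columns.get(column) == expected

def filter_list(*filters):
    """Keep rows that pass every filter in the list."""
    return lambda key, columns: all(f(key, columns) for f in filters)

def scan(rows, row_passes):
    """Return only the rows that satisfy the filter."""
    return [(key, columns) for key, columns in rows if row_passes(key, columns)]

ROWS = [
    ("01", {"A:Name": "ZhangSan", "A:Addr": "Beijing"}),
    ("02", {"A:Name": "LiLei", "A:Addr": "Hangzhou"}),
    ("13", {"A:Name": "WangWu", "A:Addr": "Beijing"}),
]
matches = scan(ROWS, filter_list(row_filter("0"),
                                 single_column_value_filter("A:Addr", "Beijing")))
```

Only row "01" satisfies both criteria, so it is the only row returned.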
BloomFilter
BloomFilter is used to optimize scenarios where data is randomly read, that is,
scenarios where the Get operation is performed. It can be used to quickly
check whether a piece of user data exists in a large dataset (most data in the
dataset cannot be loaded to the memory).
A certain error rate exists when BloomFilter checks whether a piece of data
exists. Nevertheless, the conclusion "User data XXXX does not exist" is
always accurate.
The data relevant to BloomFilter of HBase is stored in HFiles.
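The one-sided error described above can be demonstrated with a minimal Bloom filter. This Python sketch is a generic implementation with assumed parameters (1024 bits, 3 hash functions derived from SHA-256); it does not reproduce the HBase BloomFilter format.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # an int used as a bit array

    def _positions(self, item):
        """Derive num_hashes bit positions for item from seeded SHA-256 digests."""
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        """False means definitely absent; True means possibly present."""
        return all((self.bits >> pos) & 1 for pos in self._positions(item))
```

A key that was added always yields True, so a False answer lets a Get skip a file without reading it; a True answer may occasionally be wrong and still requires reading the file.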
Contents
1. Introduction to HBase
2. Functions and Architecture of HBase
3. Key Processes of HBase
4. Huawei Enhanced Features of HBase
Supporting Secondary Index
The secondary index enables HBase to query data based on specific column
values.
Column Family A Column Family B
RowKey A:Name A:Addr. A:Age B:Mobile B:Email
01 ZhangSan Beijing 23 6875349 ……
02 LiLei Hangzhou 43 6831475 ……
03 WangWu Shenzhen 35 6809568 ……
04 …… Wuhan 28 6812645 ……
05 …… Changsha 26 6889763 ……
06 …… Jinan 35 6854912 ……
Without a secondary index, finding a mobile number such as '68XXX' requires matching the mobile field
row by row across the entire table, which results in a long delay.
With a secondary index, the index table is searched first to locate the
mobile number, which narrows down the search scope and reduces the delay.
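The effect of a secondary index can be sketched with two dictionaries. This Python example reuses a few rows from the table above; the function names and the in-memory index layout are assumptions and do not describe how the Huawei secondary index is stored.

```python
from collections import defaultdict

TABLE = {
    "01": {"A:Name": "ZhangSan", "B:Mobile": "6875349"},
    "02": {"A:Name": "LiLei", "B:Mobile": "6831475"},
    "03": {"A:Name": "WangWu", "B:Mobile": "6809568"},
}

def full_scan(column, value):
    """Without an index: test every row of the table."""
    return [key for key, row in TABLE.items() if row.get(column) == value]

def build_index(column):
    """Index table: column value -> RowKeys of the rows holding that value."""
    index = defaultdict(list)
    for key, row in TABLE.items():
        if column in row:
            index[row[column]].append(key)
    return index

MOBILE_INDEX = build_index("B:Mobile")

def indexed_lookup(value):
    """With the index: one lookup yields the RowKeys to fetch."""
    return MOBILE_INDEX.get(value, [])
```

Both approaches return the same RowKeys, but the indexed lookup touches one index entry instead of every row in the table.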
HFS
HBase FileStream (HFS) is a separate module of HBase. As an
encapsulation of HBase and HDFS interfaces, HFS provides
capabilities such as storing, reading, and deleting files for
upper-level applications.
HFS provides the ability to store massive small files and large
files in HDFS. That is, massive small files (less than 10 MB) and
some large files (larger than 10 MB) can be stored in HBase.
HBase MOB (1)
MOB data (100 KB to 10 MB) is stored directly in the file
system (HDFS, for example) as HFiles. The address and size of
each file are stored in HBase as a value. With tools
managing these files, the frequency of compaction and split
can be greatly reduced, and performance can be improved.
HBase MOB (2)
Summary
This module describes the following information about HBase:
KeyValue Storage Model, technical architecture, reading and
writing process and enhanced features of FusionInsight HBase.
Quiz
1. Can a Region in HBase continue to provide services while it is being split?
2. What are the advantages of the Region splitting of HBase?
Quiz
1. What is Compaction used for? ( )
A. Reducing the number of files in a column family and Region
B. Improving data reading performance
C. Reducing the number of files in a column family
D. Reducing the number of files in a Region
Quiz
1. What is the physical storage unit of HBase? ( )
A. Region
B. Column Family
C. Column
D. Cell
More Information
Training materials:
http://support.huawei.com/learning/Certificate!showCertificate?lang=en&pbiPath=term1000025450&id=Node1000011796
Exam outline:
http://support.huawei.com/learning/Certificate!toExamOutlineDetail?lang=en&nodeId=Node1000011797
Mock exam:
http://support.huawei.com/learning/Certificate!toSimExamDetail?lang=en&nodeId=Node1000011798
Authentication process:
http://support.huawei.com/learning/NavigationAction!createNavi#navi[id]=_40
Thank You
www.huawei.com