Big Data Storage Concepts
Lecture 5: Chapter 5 Part 1
Big Data Storage and Data Models
• Data and storage models are the basis for big data ecosystems
• While the storage model captures the physical aspects and features of data storage, the data model captures the logical representation and structures used for data processing and management
• Understanding the storage and data models together is essential for understanding big data ecosystems
• In this chapter we are going to investigate and compare the key
storage and data models in the spectrum of big data frameworks
Big Data Storage Models
• A storage model is at the core of any big-data-related system
• It affects the scalability, data structures, and the programming and computational models of the systems built on top of it
Big Data Main Storage Models
• Block-based storage
• File-based Storage
• Object-based Storage
Block-based storage
• Data is stored as blocks that normally have a fixed size and carry no additional information (metadata)
• Each block has a unique identifier, stored in a data lookup table
• Block-based storage focuses on performance and scalability for storing and accessing very large-scale data
• When data needs to be retrieved, the data lookup table is used to find
the required blocks, which are then reassembled into their original
form
• Block-based storage is usually used as a low-level storage paradigm underpinning higher-level storage systems such as file-based systems, object-based systems and transactional databases
Block-based storage Architecture
A simple model of block-based storage is shown in Figure 1
• Basically, data is stored as blocks that normally have a fixed size and carry no additional information (metadata)
• A unique identifier is used to access each
block
• The identifier is mapped to the exact
location of actual data blocks through
access interfaces
• Traditionally, block-based storage is bound to physical storage protocols, such as SCSI, iSCSI, ATA and SATA
Figure 1: Block-based storage model
Block-based storage Architecture Cont.
• With the development of distributed computing and big data, block-based storage models have also been developed to support distributed and cloud-based environments
• As shown in Figure 2, the architecture of a
distributed block-storage system is composed
of the block server and a group of block nodes
• The block server is responsible for maintaining
the mapping or indexing from block IDs to the
actual data blocks in the block nodes
• The block nodes are responsible for storing the actual data in fixed-size partitions, each of which is considered a block
Figure 2: Architecture of distributed block-based storage
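The block server and block nodes described above can be sketched in a few lines of Python. This is a toy model, not a real storage API: all class and method names are illustrative, and placement is a naive round-robin.

```python
# Minimal sketch of a distributed block store: data is split into
# fixed-size blocks, the "block server" keeps the block-ID -> node
# lookup table, and "block nodes" hold the raw bytes.
import uuid

BLOCK_SIZE = 4  # tiny fixed block size, for illustration only


class BlockNode:
    def __init__(self):
        self.blocks = {}          # block_id -> raw bytes


class BlockServer:
    def __init__(self, nodes):
        self.nodes = nodes
        self.lookup = {}          # block_id -> node index (the lookup table)

    def write(self, data):
        """Split data into fixed-size blocks; return the ordered block IDs."""
        block_ids = []
        for i in range(0, len(data), BLOCK_SIZE):
            block_id = uuid.uuid4().hex               # unique identifier
            node_idx = len(block_ids) % len(self.nodes)  # naive placement
            self.nodes[node_idx].blocks[block_id] = data[i:i + BLOCK_SIZE]
            self.lookup[block_id] = node_idx
            block_ids.append(block_id)
        return block_ids

    def read(self, block_ids):
        """Use the lookup table to find the blocks and reassemble the data."""
        return b"".join(self.nodes[self.lookup[b]].blocks[b] for b in block_ids)


nodes = [BlockNode() for _ in range(3)]
server = BlockServer(nodes)
ids = server.write(b"hello block storage")
```

Note that the caller only sees opaque block IDs; reassembly into the original byte sequence happens through the lookup table, as in Figure 2.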
File-Based Storage
• File-based storage inherits the traditional file system architecture and considers data as files that are maintained in a hierarchical structure
• It is the most common storage model and is relatively
easy to implement and use
• In a big data scenario, a file-based storage system can be built on top of another low-level abstraction to improve its performance and scalability
File-Based Storage Architecture
• The file-based storage
paradigm is shown in Figure 3
• File paths are organized in a
hierarchy and are used as the
entries for accessing data in
the physical storage
Figure 3: File-based storage model
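As a toy illustration of the model in Figure 3, hierarchical file paths can serve as the entries that map to data in the physical storage. The dictionary below is an illustrative stand-in for the physical store, not a real file-system API.

```python
# Sketch of the file-based model: hierarchical paths are the entries
# for accessing data in a (simulated) physical storage layer.
storage = {}   # path -> file contents ("physical" storage stand-in)


def write_file(path, data):
    storage[path] = data


def list_dir(prefix):
    """Return the immediate children of a directory prefix."""
    children = set()
    for path in storage:
        if path.startswith(prefix + "/"):
            children.add(path[len(prefix) + 1:].split("/")[0])
    return sorted(children)


write_file("/logs/2024/app.log", b"...")
write_file("/logs/2024/db.log", b"...")
write_file("/data/input.csv", b"...")
```

Contrast this with the flat namespace of object-based storage later in the chapter: here the hierarchy itself is part of how data is addressed.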
File-Based Storage Architecture Cont.
• For big data scenarios, Distributed File Systems (DFS) are commonly used as the basic storage systems
• Figure 4 shows a typical architecture of a
distributed file system which normally
contains one or several name nodes and a
bunch of data nodes
• The name node is responsible for maintaining the file entry hierarchy for the entire system, while the data nodes are responsible for the persistence of file data
Figure 4: Architecture of distributed file systems
File-Based Storage Architecture Cont.
• For a distributed infrastructure,
replication is very important for
providing fault tolerance in file-
based systems
• Normally, every file has multiple copies stored on the underlying storage nodes. If one of the copies is lost or its node fails, the name node automatically finds the next available copy, making the failure transparent to users
Figure 5: Architecture of Hadoop distributed file systems
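The replica-failover behavior described above can be sketched as follows. The `NameNode` and `DataNode` classes are illustrative stand-ins for the roles in Figure 5, not HDFS's actual API, and placement is simplified to the first N nodes.

```python
# Sketch of replica failover in a distributed file system: the name
# node records several copies per file and transparently falls back
# to the next available copy when a data node fails.
class DataNode:
    def __init__(self):
        self.files = {}       # file name -> bytes
        self.alive = True


class NameNode:
    def __init__(self, data_nodes, replication=3):
        self.data_nodes = data_nodes
        self.replication = replication
        self.replicas = {}    # file name -> list of node indices

    def write(self, name, data):
        targets = list(range(len(self.data_nodes)))[:self.replication]
        for idx in targets:              # simplified placement
            self.data_nodes[idx].files[name] = data
        self.replicas[name] = targets

    def read(self, name):
        # Try each replica in turn; a dead node is skipped transparently.
        for idx in self.replicas[name]:
            node = self.data_nodes[idx]
            if node.alive:
                return node.files[name]
        raise IOError("all replicas unavailable")


nodes = [DataNode() for _ in range(3)]
nn = NameNode(nodes)
nn.write("part-0", b"payload")
nodes[0].alive = False            # simulate a data-node failure
```

From the client's perspective a read still succeeds after the failure, which is exactly the transparency the slide describes.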
HDFS: Hadoop Distributed File System
• As shown in Figure 5, the architecture of HDFS consists of a name
node and a set of data nodes
• The name node manages the file system namespace, regulates access to files, and executes file system operations such as renaming and closing files
• Each data node performs read-write operations on the data it stores, and performs block creation, deletion, and replication according to the instructions of the name node
HDFS: Hadoop Distributed File System Cont.
• Data in HDFS is seen as files and automatically partitioned and
replicated within the cluster
• The capacity of storage for HDFS grows almost linearly by adding
new data nodes into the cluster
• HDFS also provides an automated balancer to improve the
utilization of cluster storage
• In addition, recent versions of HDFS have introduced a backup
node to solve the problem caused by single-node failure of the
primary name node
HDFS: Hadoop Distributed File System Cont.
• HDFS is an open-source distributed file system written in Java
• HDFS is an open-source implementation of the design behind the Google File System (GFS)
• HDFS is the core storage for the Hadoop ecosystem and many existing big data platforms
• HDFS inherits the design principles from GFS to provide highly scalable
and reliable data storage across a large set of commodity server nodes
• HDFS has demonstrated production scalability of up to 200 PB of
storage and a single cluster of 4500 servers, supporting close to a
billion files and blocks
HDFS: Hadoop Distributed File System Cont.
HDFS is designed to serve the following goals:
• Fault detection and recovery: since HDFS runs on a large number of commodity hardware components, component failure is expected to be frequent. Therefore, HDFS has mechanisms for quick, automatic fault detection and recovery
• Huge datasets: HDFS should scale to hundreds of nodes per cluster to manage applications with huge datasets
• Hardware at data: a requested task can be completed efficiently when the computation takes place near the data. Especially where huge datasets are involved, this reduces network traffic and increases throughput
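The "hardware at data" goal can be illustrated with a toy scheduler that prefers a node already holding the needed block. The node names, block IDs, and `schedule` function are hypothetical, not the real Hadoop scheduler.

```python
# Sketch of data-local task scheduling: run the computation on a node
# that already stores a replica of the block, so no data crosses the
# network; fall back to a remote read only when no local node is free.
block_locations = {            # block ID -> nodes holding a replica
    "blk_001": {"node-a", "node-c"},
    "blk_002": {"node-b"},
}


def schedule(block_id, free_nodes):
    """Prefer a free node that stores the block; otherwise pick any free node."""
    local = block_locations[block_id] & free_nodes
    if local:
        return sorted(local)[0]       # data-local: no network transfer
    return sorted(free_nodes)[0]      # remote read as a fallback
```

The set intersection is the whole idea: locality is just "replica holders ∩ free nodes", and the fallback is what generates network traffic.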
Object-Based Storage
• In the object-based storage model, data
is managed as objects. As shown in
Figure 6, every object includes the data
itself, some meta-data, attributes and a
globally unique object identifier (OID)
• The object-based storage model abstracts the lower layers of storage away from administrators and applications
Figure 6: Object-based storage model
Object-Based Storage Architecture
The typical architecture of an object-based storage system is shown in Figure 7
Figure 7: Architecture of object-based storage
Object-Based Storage Architecture Cont.
• An object-based storage system normally uses a flat namespace, in which the identifiers of data and their locations are maintained as key-value pairs in the object server
• The object server provides location-independent addressing and
constant lookup latency for reading every object
• Meta-data is separated from the data and is itself maintained as objects in a meta-data server
• As a result, the meta-data can be processed, analyzed and manipulated in a standard and easier way without affecting the data itself
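A minimal sketch of the object model, assuming in-memory dictionaries as stand-ins for the object server and the separate meta-data server (all names are illustrative):

```python
# Sketch of object-based storage: a flat namespace maps a globally
# unique OID to the object's data, while meta-data lives in a separate
# meta-data server and can change without touching the data itself.
import uuid

object_server = {}     # OID -> raw data (flat key-value namespace)
metadata_server = {}   # OID -> meta-data dict, kept separately


def put_object(data, **metadata):
    oid = uuid.uuid4().hex         # globally unique object identifier
    object_server[oid] = data
    metadata_server[oid] = dict(metadata)
    return oid


def get_object(oid):
    return object_server[oid]      # location-independent lookup by key


def tag_object(oid, **metadata):
    """Update meta-data without rewriting the object's data."""
    metadata_server[oid].update(metadata)


oid = put_object(b"report bytes", owner="alice")
tag_object(oid, retention="30d")   # data is untouched by this call
```

Because the namespace is flat, adding capacity never requires reorganizing a hierarchy: new OIDs simply land in the same key-value space.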
Object-Based Storage Architecture Cont.
• Due to the flat architecture, object-based storage systems are very easy to scale out by adding additional storage nodes
• Besides, added storage is automatically incorporated as capacity available to all users
• Drawing on the object containers and the maintained meta-data, such systems can also provide much more flexible and fine-grained data policies at different levels