Advanced Database Management System Mod14

This chapter discusses Big Data and NoSQL, focusing on the characteristics of Big Data, including volume, velocity, and variety, and how these exceed traditional database capabilities. It also covers the Hadoop framework and its ecosystem, highlighting components like HDFS and MapReduce, as well as various NoSQL database models such as key-value, document, column-oriented, and graph databases. The chapter aims to equip readers with an understanding of modern data management technologies and their applications in business.

Database Systems: Design, Implementation, and Management, 14e
Module 14: Big Data and NoSQL

Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023
Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in 1
whole or in part.
Chapter Objectives (1 of 2)

By the end of this chapter, you should be able to:

1. Explain the role of Big Data in modern business

2. Describe the primary characteristics of Big Data and how these go beyond the
traditional “3 Vs”

3. Explain how the core components of the Hadoop framework operate

4. Identify the major components of the Hadoop ecosystem

5. Summarize the four major approaches of the NoSQL data model and how they
differ from the relational model

Chapter Objectives (2 of 2)

By the end of this chapter, you should be able to (continued):

6. Describe the characteristics of NewSQL databases

7. Understand how to work with document databases using MongoDB

8. Understand how to work with graph databases using Neo4j

Big Data

• Big Data refers to a set of data that displays the characteristics of volume, velocity,
and variety (the 3 Vs) to an extent that makes the data unsuitable for management
by a relational DBMS

• These characteristics can be defined as follows:


− Volume – the quantity of data to be stored
− Velocity – the speed at which data is entering the system
− Variety – the variations in the structure of the data to be stored

Volume

• Volume, the quantity of data to be stored, is a key characteristic of Big Data

• Scaling up is keeping the same number of systems but migrating each one to a
larger system

• Scaling out means that when the workload exceeds server capacity, it is spread
out across a number of servers

Velocity (1 of 2)

• Velocity refers to the rate at which new data enters the system as well as the rate at
which the data must be processed

• The velocity of processing can be broken down into two categories:


− Stream processing focuses on input processing and requires analysis of the data stream as it enters the system
 Scientists have created algorithms to decide ahead of time which data will
be kept
− Feedback loop processing refers to the analysis of the data to produce
actionable results

Velocity (2 of 2)

Figure 14.3 Feedback Loop Processing

Variety

• Variety refers to the vast array of formats and structures in which data may be
captured

• Structured data is data that has been organized to fit a predefined data model

• Unstructured data is data that is not organized to fit into a predefined data model

• Semistructured data combines elements of both – some parts of the data fit a
predefined model while other parts do not

• Relational databases rely on structured data

• One advantage of leaving data unstructured is the flexibility of being able to structure the data in different ways for different applications

Other Characteristics

• Variability refers to the changes in the meaning of data based on context

• Sentiment analysis is a method of text analysis that attempts to determine if a statement conveys a positive, negative, or neutral attitude about a topic

• Veracity refers to the trustworthiness of data

• Value refers to the degree to which the data can be analyzed for meaningful insight

• Visualization is the ability to graphically present data to make it understandable

• Polyglot persistence is the coexistence of a variety of data storage and management technologies within an organization’s infrastructure

Hadoop

• De facto standard for most Big Data storage and processing

• Hadoop is a Java-based framework for distributing and processing very large data
sets across clusters of computers

• The two most important components include the following:


− Hadoop Distributed File System (HDFS) is a low-level distributed file processing
system that can be used directly for data storage
− MapReduce is a programming model that supports processing large data sets

HDFS (1 of 3)

• The Hadoop Distributed File System (HDFS) approach to distributing data is based on the following key assumptions:
− High volume: Hadoop uses a default block size of 64 MB, which can be configured to even larger values
− Write-once, read-many: this model simplifies concurrency issues and improves
data throughput
− Streaming access: Hadoop is optimized for batch processing of entire files as a
continuous stream of data
− Fault tolerance: Hadoop is designed to replicate data across many different
devices so that when one fails, data is still available from another device

HDFS (2 of 3)

• Hadoop uses several types of nodes, which are computers that perform one or more
types of tasks within the system
− Data nodes store the actual file data
− The name node contains file system metadata
− The client node makes requests to the file system as needed to support user
applications

• The data node communicates with the name node and sends block reports and
heartbeats
− A block report is sent every 6 hours and informs the name node which blocks
are on that data node
− A heartbeat is used to let the name node know that the data node is still
available
HDFS (3 of 3)

Figure 14.4 Hadoop Distributed File System (HDFS)

MapReduce

• MapReduce is the computing framework used to process large data sets across
clusters

• A map function takes a collection of data and sorts and filters it into a set of key-
value pairs
− The map function is performed by a program called a mapper

• A reduce function summarizes the results of the map function into a single result
− The reduce function is performed by a program called a reducer

• The implementation of MapReduce complements the HDFS structure


− Job tracker is a central control program used to report on MapReduce processing
jobs
− Task tracker is a program responsible for running map and reduce tasks on a node
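The map, shuffle, and reduce phases described above can be sketched in plain Python. This is a toy word count illustrating the flow of key-value pairs through mappers and reducers, not Hadoop's actual Java API:

```python
from collections import defaultdict

def mapper(line):
    # Map: sort/filter the input into key-value pairs -- here, (word, 1).
    return [(word.lower(), 1) for word in line.split()]

def reducer(key, values):
    # Reduce: summarize all values emitted for one key into a single result.
    return key, sum(values)

lines = ["the quick brown fox", "the lazy dog", "the fox"]

# Shuffle phase: group mapper output by key before reducing.
grouped = defaultdict(list)
for line in lines:
    for key, value in mapper(line):
        grouped[key].append(value)

counts = dict(reducer(k, v) for k, v in grouped.items())
print(counts["the"])  # 3
print(counts["fox"])  # 2
```

In Hadoop, the grouping step is performed by the framework across the cluster, with mappers running on the nodes that already hold the data blocks.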
Hadoop Ecosystem (1 of 3)

• Most organizations that use Hadoop also use a set of other related products that
interact and complement each other to produce an entire ecosystem of applications
and tools

• Like any ecosystem, the interconnected pieces are constantly evolving and their
relationships are changing, so it is a rather fluid situation

• MapReduce Simplification Applications


− Hive is a data warehousing system that sits on top of HDFS and supports its own
SQL-like language
− Pig is a tool for compiling a high-level scripting language, named Pig Latin, into MapReduce jobs for execution in Hadoop

Hadoop Ecosystem (2 of 3)

• Data Ingestion Applications


− Flume is a component for ingesting data in Hadoop
− Sqoop is a tool for converting data back and forth between a relational database
and the HDFS

• Direct Query Applications


− HBase is a column-oriented NoSQL database designed to sit on top of HDFS that quickly processes sparse datasets
− Impala was the first SQL-on-Hadoop application

Hadoop Ecosystem (3 of 3)

Figure 14.6 A Sample of the Hadoop Ecosystem

Hadoop Pushback

• Many organizations benefit from having a customized Hadoop ecosystem that is tailored to their specific needs in a manner that no other solution can duplicate
− However, the learning curve can be steep

• Companies such as IBM and Cloudera offer out-of-the-box Hadoop ecosystems called data platforms

• The perceived complications of Hadoop have helped to propel interest in alternative solutions, such as NoSQL databases

Knowledge Check Activity 14-1

• What is Big Data? Give a brief definition.

Knowledge Check Activity 14-1:
Answer
• What is Big Data? Give a brief definition.

Answer: Big Data is data of such volume, velocity, and/or variety that
it is difficult for traditional relational database technologies to store and
process it.

NoSQL

• NoSQL is the name given to a broad array of nonrelational database technologies that have been developed to address Big Data challenges

• The name does not describe what the NoSQL technologies are, but rather what they
are not

• There are hundreds of products that can be considered as being under the broadly
defined term NoSQL
− Most fit into one of four categories: key-value data stores, document databases,
column-oriented databases, and graph databases

Key-Value Databases

• Key-value (KV) databases are conceptually the simplest of the NoSQL data
models
− A KV database is a NoSQL database that stores data as a collection of key-value
pairs

• Key-value pairs are typically organized into buckets
− A bucket can roughly be thought of as the KV database equivalent of a table
− A bucket is a logical grouping of keys
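The KV model can be sketched in a few lines of Python. The bucket and key names here are invented for illustration; real products (Redis, Riak, etc.) each have their own APIs, but the core contract is the same: opaque values looked up by key, grouped into buckets:

```python
# Minimal key-value store sketch: buckets act roughly like tables,
# and each value is an opaque blob that the engine does not interpret.
store = {}  # bucket name -> {key: value}

def put(bucket, key, value):
    store.setdefault(bucket, {})[key] = value

def get(bucket, key):
    return store.get(bucket, {}).get(key)

# Hypothetical customer bucket; values happen to be JSON text,
# but the store neither knows nor cares.
put("customers", "cust:1001", '{"name": "Ramas", "balance": 120.5}')
put("customers", "cust:1002", '{"name": "Dunne"}')

print(get("customers", "cust:1001"))
```

Because lookups are by key only, there is no query language over the value contents; any filtering on the data inside a value happens in application code.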

Document Databases (1 of 2)

• Figure 14.7 Key-Value Database Storage

• Figure 14.8 Document Database Tagged Format

Document Databases (2 of 2)

• Document databases are conceptually similar to key-value databases
− A document database stores data in key-value pairs in which the value component is composed of a tag-encoded document

• JSON (JavaScript Object Notation) is a human-readable text format for data interchange that defines attributes and values in a document

• BSON (Binary JSON) is a computer-readable format for data interchange that expands the JSON format to include additional data types, including binary objects

• A collection, in document databases, is a logical storage unit that contains similar documents, roughly analogous to a table in a relational database
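A small, hypothetical product document shows what the tag-encoded value looks like. Python's standard json module is used here only to demonstrate the format; the field names are invented for illustration:

```python
import json

# A self-describing JSON document: attributes and values are tagged by name,
# and documents in the same collection need not share an identical structure.
doc_text = '''
{
  "sku": "P-1001",
  "description": "Claw hammer",
  "price": 12.99,
  "tags": ["tools", "hardware"]
}
'''

doc = json.loads(doc_text)   # parse the tagged text into a data structure
print(doc["price"])          # attributes are addressed by name, not by column
print(json.dumps(doc))       # serialize back to the interchange format
```

BSON stores the same logical document in a binary encoding with extra types (dates, binary objects), which is what MongoDB actually keeps on disk.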

Column-Oriented Databases (1 of 2)

• The term column-oriented database refers to the following two technologies:
− Column-centric storage, a storage technique in which data is stored in blocks which hold data from a single column across many rows
− Row-centric storage, a storage technique in which data is stored in blocks which hold data from all columns of a given set of rows

• A column family database is a NoSQL database that organizes data in key-value pairs with keys mapped to a set of columns in the value component

• A super column is a group of columns that are logically related

• In a column family database, a collection of columns or super columns related to a collection of rows is grouped together to create a column family
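The row-centric versus column-centric distinction can be sketched with the same data laid out both ways (the table and column names are invented for illustration):

```python
# Row-centric layout: each block holds all columns of a few rows
# (good for transactional reads/writes of whole records).
rows = [
    {"id": 1, "name": "Ann", "balance": 100},
    {"id": 2, "name": "Bob", "balance": 250},
    {"id": 3, "name": "Cam", "balance": 175},
]

# Column-centric layout: one block per column, holding that column's
# values across many rows (good for scans and aggregates of one column).
columns = {col: [r[col] for r in rows] for col in rows[0]}

# Aggregating a single column only touches that column's block.
print(sum(columns["balance"]))  # 525
```

A column family database applies the same idea at the storage level: rows in the same family keep their related columns physically together.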

Column-Oriented Databases (2 of 2)

Figure 14.10 Column Family Database

Graph Databases (1 of 2)

• A graph database is a NoSQL database based on graph theory to store data about
relationship-rich environments

• The primary components of graph databases are nodes, edges, and properties
− The node is a specific instance of something we want to keep data about
− An edge is a relationship between nodes
− Properties are the attributes or characteristics of a node or edge that are of
interest to the users

• A query in a graph database is called a traversal

Graph Databases (2 of 2)

Figure 14.11 Graph Database Representation

Knowledge Check Activity 14-2

• What are the four basic categories of NoSQL databases?

Knowledge Check Activity 14-2:
Answer
• What are the four basic categories of NoSQL databases?
• Answer: Key-value database, document databases, column family
databases, and graph databases.

Aggregate Awareness

• Key-value, document, and column family databases are aggregate aware
− Aggregate aware means that the data is collected or aggregated around a central topic or entity

• The aggregate aware database models achieve clustering efficiency by making each piece of data relatively independent

• Graph databases, like relational databases, are aggregate ignorant
− Aggregate ignorant models do not organize the data into collections based on a central entity
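An aggregate can be sketched as a single nested document. Instead of decomposing an order into separate ORDER and ORDER_LINE tables, all of the data about one order is kept together (the entity and field names here are invented for illustration):

```python
# Aggregate-aware sketch: one order document carries everything needed to
# process that order, so a single cluster node can serve it without joins.
order = {
    "order_id": 5237,
    "customer": {"name": "Ramas", "city": "Nashville"},
    "lines": [
        {"sku": "P-1001", "qty": 2, "price": 12.99},
        {"sku": "P-2040", "qty": 1, "price": 4.50},
    ],
}

# The whole aggregate is one independent piece of data.
total = sum(l["qty"] * l["price"] for l in order["lines"])
print(round(total, 2))  # 30.48
```

The trade-off is redundancy: customer data repeated across many order aggregates is the price paid for join-free, cluster-friendly access.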

NewSQL Databases

• NewSQL is a database model that attempts to provide ACID-compliant transactions across a highly distributed infrastructure

• Characteristics of NewSQL databases include the following:
− Have no proven track record
− Have been adopted by relatively few organizations

• NewSQL databases support:
− SQL as the primary interface
− ACID-compliant transactions

Working with Document Databases
Using MongoDB (1 of 3)
• MongoDB is a popular document database
− Among the NoSQL databases currently available, MongoDB has been one of the
most successful in penetrating the database market

• The name MongoDB comes from the word humongous, as its developers intended their new product to support extremely large data sets

• It is designed for the following:


− High availability
− High scalability
− High performance

Working with Document Databases
Using MongoDB (2 of 3)
• Importing Documents in MongoDB
− Refer to the text for an importation example and considerations

• Example of a MongoDB Query Using find()


− Methods are programmed functions used to manipulate objects
 The find() method retrieves objects from a collection that match the
restrictions provided
− Refer to the text for a query example
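The semantics of find() can be illustrated with a toy stand-in for a collection. In the real mongo shell the call would be, for example, db.products.find({ "price": { "$lt": 15 } }); the sketch below implements only equality matching and the $lt operator, with invented documents, purely to show what "retrieves objects matching the restrictions" means:

```python
# Toy collection: a list of documents (dicts).
products = [
    {"sku": "P-1001", "description": "Claw hammer", "price": 12.99},
    {"sku": "P-2040", "description": "Box of nails", "price": 4.50},
    {"sku": "P-3310", "description": "Cordless drill", "price": 89.00},
]

def find(collection, criteria):
    # Supports equality and the $lt operator -- a tiny subset of MongoDB.
    def matches(doc):
        for field, cond in criteria.items():
            if isinstance(cond, dict) and "$lt" in cond:
                if not doc.get(field, float("inf")) < cond["$lt"]:
                    return False
            elif doc.get(field) != cond:
                return False
        return True
    return [d for d in collection if matches(d)]

cheap = find(products, {"price": {"$lt": 15}})
print([d["sku"] for d in cheap])  # ['P-1001', 'P-2040']
```

MongoDB's real find() also accepts a projection document to limit which fields are returned, and supports many more operators ($gt, $in, $regex, and so on).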

Working with Document Databases
Using MongoDB (3 of 3)

Figure 14.12 Example of MongoDB Document Query

Working with Graph Databases Using
Neo4j
• Even though Neo4j is not yet as widely adopted as MongoDB, it has been one of the
fastest growing NoSQL databases

• Graph databases still work with concepts similar to entities and relationships
− The focus is on the relationships

• Graph databases are used in environments with complex relationships among entities
− Graph databases are heavily reliant on interdependence among their data

• Neo4j provides several interface options
− It was originally designed with Java programming in mind and optimized for interaction through a Java API
Creating Nodes in Neo4j

• Nodes in a graph database correspond to entity instances in a relational database

• In Neo4j, a label is the closest thing to the concept of a table from the relational
model
− A label is a tag that is used to associate a collection of nodes as being of the
same type or belonging to the same group

• Cypher is the interactive, declarative query language in Neo4j

• Nodes and relationships are created using a CREATE command
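A minimal Cypher sketch of the CREATE command, using invented labels and properties (run against a Neo4j instance, not standalone):

```cypher
// Create two labeled nodes with properties, then a relationship between them.
// (c:Customer ...) binds the variable c to a new node with label Customer.
CREATE (c:Customer {name: 'Dunne', city: 'Nashville'})
CREATE (p:Product {sku: 'P-1001', description: 'Claw hammer'})
CREATE (c)-[:PURCHASED {qty: 2}]->(p)
```

Note that the relationship itself carries properties (qty here), just as nodes do, and its direction is part of the data.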

Retrieving Node Data with MATCH and
WHERE
• Refer to the text for examples of the following:
− Retrieving node data with MATCH and WHERE
− Retrieving relationship data with MATCH and WHERE
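The general shape of such a traversal in Cypher, with invented labels and properties, looks like this (run against a Neo4j instance, not standalone):

```cypher
// Traversal: find what customers in Nashville purchased.
// MATCH describes the graph pattern; WHERE restricts it; RETURN projects.
MATCH (c:Customer)-[r:PURCHASED]->(p:Product)
WHERE = 'Nashville'
RETURN, p.description, r.qty
```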

Retrieving Relationship Data with
MATCH and WHERE

Figure 14.13 Neo4j Query Using MATCH/WHERE/RETURN

Knowledge Check Activity 14-3

• Explain what it means for a database to be aggregate aware.

Knowledge Check Activity 14-3:
Answer
• Explain what it means for a database to be aggregate aware.
• Answer: Aggregate aware means that the designer of the database has to be aware of the way the data in the database will be used, and then design the database around whichever component is central to that usage. Instead of decomposing the data structures to eliminate redundancy, an aggregate aware database collects, or aggregates, all of the data around a central component to minimize the structures required during processing.

Summary (1 of 2)

Now that the lesson has ended, you should be able to:

1. Explain the role of Big Data in modern business

2. Describe the primary characteristics of Big Data and how these go beyond the
traditional “3 Vs”

3. Explain how the core components of the Hadoop framework operate

4. Identify the major components of the Hadoop ecosystem

5. Summarize the four major approaches of the NoSQL data model and how they
differ from the relational model

Summary (2 of 2)

Now that the lesson has ended, you should be able to (continued):

6. Describe the characteristics of NewSQL databases

7. Understand how to work with document databases using MongoDB

8. Understand how to work with graph databases using Neo4j

