Chapter 2:
NoSQL Databases
Big Data Management and Analytics 50
DATABASE SYSTEMS GROUP
NoSQL Database Systems
Outline
• History
• Concepts
  – ACID
  – BASE
  – CAP
• Data Models
  – Key-Value
  – Document
  – Column-based
  – Graph
History
1960s: IBM developed the Hierarchical Database Model
• Tree-like structure
• Data stored as records connected by links
• Supports only one-to-one and one-to-many relationships
Mid-1980s: Rise of the Relational Database Model
• Data stored in a collection of tables (rows and columns)
→ Strict relational schema
• SQL became the standard language (based on relational algebra)
→ Impedance Mismatch!
History – Impedance Mismatch
Object of type Supply:
Supply:
  Supplier:
    LNR: L1
    Lname: Meier
    Status: 20
    Sitz: Wetter
  Project:
    PNR: P2
    Pname: Pleite
    Ort: Bonn
  Pieces:
    TNR: T6
    Tname: Schraube
    Farbe: rot
    Gewicht: 03
  Menge: 700
Relational tables (still empty):
  L   (LNR, Lname, Status, Sitz)
  P   (PNR, Pname, Ort)
  T   (TNR, Tname, Farbe, Gewicht)
  LTP (LNR, PNR, TNR, Menge)
(Sitz = location, Ort = city, Farbe = color, Gewicht = weight, Menge = quantity)
Given the LTP schema from Datenbanksysteme I (Database Systems I)
and an object of type Supply:
How can the data bundled in the Supply object be incorporated
into the DB?
The parts of the Supply object map to rows in separate tables:
  L (LNR, Lname, Status, Sitz):     L1 | Meier | 20 | Wetter
  P (PNR, Pname, Ort):              P2 | Pleite | Bonn
  T (TNR, Tname, Farbe, Gewicht):   T6 | Schraube | rot | 03
INSERT INTO L VALUES (Supply.getSupplier().getLNR(), ...);
INSERT INTO P VALUES (Supply.getProject().getPNR(), ...);
...
The supply relationship itself becomes a row in a fourth table:
  LTP (LNR, PNR, TNR, Menge):       L1 | P2 | T6 | 700
INSERT INTO LTP VALUES (...);
• Object-oriented encapsulation vs. data stored distributed
among several tables
→ The programmer has to maintain a lot of data type mapping by hand
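The mapping effort described above can be sketched in Python with the standard-library sqlite3 module. This is only an illustration under assumptions: the dataclasses and the helpers create_schema and store_supply are hypothetical names, the column types are guessed (the slides only name the columns), and the table name T is assumed from the name of the LTP schema.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Supplier:
    lnr: str
    lname: str
    status: int
    sitz: str


@dataclass
class Project:
    pnr: str
    pname: str
    ort: str


@dataclass
class Piece:
    tnr: str
    tname: str
    farbe: str
    gewicht: str  # kept as text so that "03" survives unchanged


@dataclass
class Supply:
    supplier: Supplier
    project: Project
    piece: Piece
    menge: int


def create_schema(con):
    # Column types are assumptions; the slides only name the columns.
    con.executescript("""
        CREATE TABLE L   (LNR TEXT PRIMARY KEY, Lname TEXT, Status INTEGER, Sitz TEXT);
        CREATE TABLE P   (PNR TEXT PRIMARY KEY, Pname TEXT, Ort TEXT);
        CREATE TABLE T   (TNR TEXT PRIMARY KEY, Tname TEXT, Farbe TEXT, Gewicht TEXT);
        CREATE TABLE LTP (LNR TEXT, PNR TEXT, TNR TEXT, Menge INTEGER);
    """)


def store_supply(con, s):
    # One encapsulated object is taken apart into four INSERT statements.
    with con:  # commit on success, roll back if any INSERT fails
        con.execute("INSERT INTO L VALUES (?, ?, ?, ?)",
                    (s.supplier.lnr, s.supplier.lname, s.supplier.status, s.supplier.sitz))
        con.execute("INSERT INTO P VALUES (?, ?, ?)",
                    (s.project.pnr, s.project.pname, s.project.ort))
        con.execute("INSERT INTO T VALUES (?, ?, ?, ?)",
                    (s.piece.tnr, s.piece.tname, s.piece.farbe, s.piece.gewicht))
        con.execute("INSERT INTO LTP VALUES (?, ?, ?, ?)",
                    (s.supplier.lnr, s.project.pnr, s.piece.tnr, s.menge))


supply = Supply(Supplier("L1", "Meier", 20, "Wetter"),
                Project("P2", "Pleite", "Bonn"),
                Piece("T6", "Schraube", "rot", "03"),
                700)
con = sqlite3.connect(":memory:")
create_schema(con)
store_supply(con, supply)
print(con.execute("SELECT * FROM LTP").fetchall())  # [('L1', 'P2', 'T6', 700)]
```

Reading the object back would need the reverse work: a join over all four tables and manual reconstruction of the nested Supply object.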
History
Mid-1990s: Trend toward the Object-Relational Database Model
• Data stored as objects (including data and methods)
• Avoidance of object-relational mapping
→ Programmer-friendly
• Relational databases nevertheless prevailed in the 1990s
Mid-2000s: Rise of Web 2.0
• Lots of user-generated data through web applications
→ Storage systems had to be scaled up
Approaches to scaling up storage systems
• Two options for meeting the growing storage demand:
• Vertical scaling
Enlarge a single machine
– Limited in space
– Expensive
• Horizontal scaling
Use many commodity machines and form computer
clusters or grids
– Cluster maintenance
Mid-2000s: Birth of the NoSQL Movement
• Problem with computer clusters:
Relational databases do not scale well horizontally
→ Big players like Google and Amazon developed their own
storage systems: NoSQL ("Not Only SQL") databases were
born
Today: Age of NoSQL
• Several different NoSQL systems available (>225)
Characteristics of NoSQL Databases
There is no single definition, but NoSQL databases share some
characteristics:
• Horizontal scalability (cluster-friendliness)
• Non-relational
• Distributed
• Schema-less
• Open-source (at least most of the systems)
About the concepts behind NoSQL Databases
ACID – The holy grail of RDBMSs:
• Atomicity: Transactions happen entirely or not at all. If a
transaction fails (partly), the state of the database is
unchanged.
• Consistency: Any transaction brings the database from one
valid state to another and does not break any of the
predefined rules (such as constraints).
• Isolation: Concurrent execution of transactions results in a
system state that would be obtained if transactions were
executed serially.
• Durability: Once a transaction has been committed, it will
remain so.
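Atomicity and consistency can be tried out with Python's standard-library sqlite3 module. This is a minimal sketch, not a statement about any particular RDBMS: the account table, its CHECK constraint, and the helpers transfer and balances are hypothetical names chosen for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# The CHECK constraint is the "predefined rule": no negative balances.
con.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
            "balance INTEGER CHECK (balance >= 0))")
con.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
con.commit()


def transfer(con, src, dst, amount):
    """Move amount from src to dst in one transaction; return True on success."""
    try:
        with con:  # commit if the block succeeds, roll back if it raises
            # Credit first, so that a failing debit leaves a half-done transfer
            # that the rollback has to undo.
            con.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                        (amount, dst))
            con.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                        (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(con):
    return dict(con.execute("SELECT name, balance FROM account"))


failed = transfer(con, "alice", "bob", 500)   # would make alice's balance negative
after_failed = balances(con)                  # credit to bob was rolled back
ok = transfer(con, "alice", "bob", 30)
after_ok = balances(con)
print(failed, after_failed, ok, after_ok)
```

The failed transfer leaves both balances untouched (atomicity), and the constraint is never violated in a committed state (consistency).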
BASE – An artificial concept for NoSQL databases:
• Basically Available: The system is generally available, but
some data may be unavailable at times (e.g. due to node failures)
• Soft State: The system's state changes over time. Stale data
may expire if not refreshed.
• Eventual consistency: The system is not consistent at every
moment. Updates are propagated through the system over time,
so the replicas converge if there is enough time.
→ BASE sits at the opposite end from ACID on a
"consistency-availability spectrum"
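Eventual consistency can be illustrated with a small in-memory simulation. This is a toy sketch under assumptions, not how any real NoSQL system works: the class EventuallyConsistentStore and its methods are hypothetical, propagation is triggered by hand, and conflict resolution between concurrent writes is left out.

```python
from collections import deque


class EventuallyConsistentStore:
    def __init__(self, n_replicas=3):
        self.replicas = [dict() for _ in range(n_replicas)]
        self.pending = deque()  # (replica_index, key, value) not yet applied

    def write(self, key, value, replica=0):
        # The write is acknowledged after reaching a single replica.
        self.replicas[replica][key] = value
        for i in range(len(self.replicas)):
            if i != replica:
                self.pending.append((i, key, value))

    def read(self, key, replica):
        # A read only consults one replica and may return stale data.
        return self.replicas[replica].get(key)

    def propagate(self):
        # "If there is enough time": deliver all outstanding updates.
        while self.pending:
            i, key, value = self.pending.popleft()
            self.replicas[i][key] = value


store = EventuallyConsistentStore(3)
store.write("x", 1, replica=0)
before = [store.read("x", r) for r in range(3)]   # [1, None, None]
store.propagate()
after = [store.read("x", r) for r in range(3)]    # [1, 1, 1]
print(before, after)
```

Between write and propagate the system stays available on every replica, but two clients can see different values for the same key.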
Levels of Consistency:
Hierarchy from weakest to strongest (figure):
1. Eventual Consistency
2. Monotonic Read Consistency / Read-Your-Own-Writes
3. M.R.C. + R.Y.O.W.
4. Immediate Consistency
5. Strong Consistency
6. Transactions
• Eventual Consistency: Write operations are not spread
across all servers/partitions immediately
• Monotonic Read Consistency: A client that has read an object
once will never read an older version of this object
• Read Your Own Writes: A client that has written an object will
never read an older version of this object
• Immediate Consistency: Updates are propagated
immediately, but not atomically
• Strong Consistency: Updates are propagated immediately, and
atomic operations on single data entities are supported
(usually on master nodes)
• Transactions: Full support of the ACID transaction model
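The two client-side guarantees in the list above, monotonic read consistency and read-your-own-writes, can be sketched with version numbers. This is one possible approach under assumptions, not the mechanism of a specific system: Replica and Session are hypothetical classes, there is a single writer per key, and a session simply skips replicas that are older than what it has already seen.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def get(self, key):
        return self.data.get(key, (0, None))

    def put(self, key, version, value):
        if version > self.get(key)[0]:  # never overwrite newer data
            self.data[key] = (version, value)


class Session:
    """Client session enforcing monotonic reads and read-your-own-writes."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.seen = {}  # key -> newest version this client has read or written

    def write(self, key, value, replica):
        target = self.replicas[replica]
        version = max(self.seen.get(key, 0), target.get(key)[0]) + 1
        target.put(key, version, value)
        self.seen[key] = version  # remember own write
        return version

    def read(self, key, order):
        # Try replicas in the given order; skip any that would go back in time.
        for i in order:
            version, value = self.replicas[i].get(key)
            if version >= self.seen.get(key, 0):
                self.seen[key] = version
                return value
        raise LookupError("no replica is fresh enough for this session")


replicas = [Replica(), Replica()]          # replica 1 never receives the update
writer = Session(replicas)
writer.write("x", "a", replica=0)
own = writer.read("x", order=[1, 0])       # stale replica 1 is skipped

reader = Session(replicas)
first = reader.read("x", order=[0, 1])     # sees version 1 on replica 0
second = reader.read("x", order=[1, 0])    # must not fall back to replica 1
naive = replicas[1].get("x")[1]            # what an unguarded read would return
print(own, first, second, naive)
```

A fresh session with no history could still read the stale replica, which is exactly what plain eventual consistency permits.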
[Figure: a document distributed by data sharding vs. by data replication]
The two types of consistency:
• Logical consistency:
Data is consistent within itself (Data Integrity)
• Replication consistency:
Data is consistent across multiple replicas (on multiple
machines)
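The difference between sharding and replication can be sketched with two toy in-memory stores. All names here (shard_for, ShardedStore, ReplicatedStore) are hypothetical, and hash-modulo placement is only one assumed partitioning strategy; the slides do not prescribe one.

```python
import hashlib


def shard_for(key, n_shards):
    """Map a key to a shard with a stable hash (Python's hash() is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


class ShardedStore:
    """Sharding: each document lives on exactly one node."""

    def __init__(self, n_shards):
        self.nodes = [dict() for _ in range(n_shards)]

    def put(self, key, doc):
        self.nodes[shard_for(key, len(self.nodes))][key] = doc

    def get(self, key):
        return self.nodes[shard_for(key, len(self.nodes))].get(key)


class ReplicatedStore:
    """Replication: every node holds a full copy of every document."""

    def __init__(self, n_replicas):
        self.nodes = [dict() for _ in range(n_replicas)]

    def put(self, key, doc):
        for node in self.nodes:  # written everywhere, so replicas stay consistent
            node[key] = doc

    def get(self, key, replica=0):
        return self.nodes[replica].get(key)


docs = {k: {"id": k} for k in ["a", "b", "c", "d", "e"]}
sharded, replicated = ShardedStore(3), ReplicatedStore(3)
for key, doc in docs.items():
    sharded.put(key, doc)
    replicated.put(key, doc)

print([len(n) for n in sharded.nodes])      # sizes add up to 5
print([len(n) for n in replicated.nodes])   # every node holds all 5
```

Replication consistency in the sense above is the property that all nodes of ReplicatedStore return the same document for a key; sharding raises no such question because only one copy exists.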
Brewer's CAP Theorem:
• Consistency
• Availability
• Partition tolerance
Any networked shared-data system can have at
most two of the three desired properties!
DB systems allowed by the CAP theorem:
• CP-Systems: Fully consistent, partition-tolerant systems
give up availability. Only consistent nodes are available.
• AP-Systems: Fully available, partition-tolerant systems
give up consistency. All nodes answer queries all the
time, even if the answers are inconsistent.
• AC-Systems: Fully available and consistent systems
give up partition tolerance. Only possible if the system is
not distributed.
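The trade-off between CP and AP behavior can be simulated with two replicas and a switchable network partition. This is a deliberately simplified sketch: ReplicatedPair, Unavailable, and demo are hypothetical names, and real systems use quorums, leader election, and conflict resolution that are not modeled here.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request in order to stay consistent."""


class ReplicatedPair:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = [None, None]   # one value per node
        self.partitioned = False     # True while the nodes cannot reach each other

    def write(self, node, value):
        if not self.partitioned:
            self.values = [value, value]   # replicate synchronously
        elif self.mode == "CP":
            raise Unavailable("write rejected during partition")
        else:
            self.values[node] = value      # AP: accept locally, replicas diverge

    def read(self, node):
        return self.values[node]


def demo(mode):
    pair = ReplicatedPair(mode)
    pair.write(0, "v1")
    pair.partitioned = True
    try:
        pair.write(0, "v2")
        accepted = True
    except Unavailable:
        accepted = False
    return accepted, pair.read(0), pair.read(1)


print(demo("CP"))  # (False, 'v1', 'v1'): consistent, but the write was refused
print(demo("AP"))  # (True, 'v2', 'v1'): available, but the nodes disagree
```

An AC-system corresponds to the case where partitioned can never become True, which is why the slide restricts it to non-distributed systems.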
Big Picture
CAP Theorem:
• C (Consistency): All clients always have the same view of the data
• A (Availability): Each client can always read and write
• P (Partition tolerance): The system works well despite physical
network partitions
[Figure: CAP triangle with ACID placed near the consistency corner and
BASE near the availability / partition tolerance side]
Systems along the edges of the triangle:
• AC-Systems: RDBMSs (MySQL, Postgres, …)
• CP-Systems
• AP-Systems