Lecture 01 Introduction

The document outlines the first lecture of the NDBI040 course on Modern Database Concepts, covering topics such as Big Data characteristics, NoSQL databases, and their various types including key-value, document, wide column, and graph databases. It discusses the evolution and necessity of NoSQL databases in response to the limitations of traditional relational databases in handling large volumes, variety, and velocity of data. The lecture also highlights current trends in data processing and the shift towards cloud-based solutions and real-time analytics.

NDBI040: Modern Database Concepts

http://[Link].mff.[Link]/~svoboda/courses/191-NDBI040/

Lecture 1

Introduction
Martin Svoboda
[email protected]ff.[Link]

1. 10. 2019

Charles University, Faculty of Mathematics and Physics


Lecture Outline
Big Data
• Characteristics
• Current trends
NoSQL databases
• Motivation
• Features
Overview of NoSQL database types
• Key-value, wide column, document, graph, …

NDBI040: Modern Database Concepts | Lecture 1: Introduction | 1. 10. 2019


What is Big Data?
No standard definition
• Gartner (research and advisory company):
High Performance Computing

Big Data is high volume, high velocity, and/or high variety
information assets that require new forms of processing to
enable enhanced decision making, insight discovery and
process optimization.



Where is Big Data?
Sources of Big Data
• Social media and networks
…all of us are generating data
• Scientific instruments
…collecting all sorts of data
• Mobile devices
…tracking all objects all the time
• Sensor technology and networks
…measuring all kinds of data



Big Data Characteristics
Volume (Scale)

Source: http://[Link]/


Big Data Characteristics
Variety (Complexity)

Source: http://[Link]/


Big Data Characteristics
Velocity (Speed)

Source: http://[Link]/


Big Data Characteristics
Veracity (Uncertainty)

Source: http://[Link]/


Big Data Characteristics
Basic 4V

• Volume (Scale)
Data volume is increasing exponentially, not linearly
Even large amounts of small data can result in Big Data
• Variety (Complexity)
Various formats, types, and structures
(from semi-structured XML to unstructured multimedia)
• Velocity (Speed)
Data is being generated fast and needs to be processed fast
• Veracity (Uncertainty)
Uncertainty due to inconsistency, incompleteness, latency,
ambiguities, or approximations



Relational Databases
Data model
Instance → database → table → row
Query languages
• Real-world: SQL (Structured Query Language)
• Formal: Relational algebra, relational calculi (domain, tuple)
Query patterns
• Selection based on complex conditions, projection, joins,
aggregation, derivation of new values, recursive queries, …
Representatives
• Oracle Database, Microsoft SQL Server, IBM DB2
• MySQL, PostgreSQL



Relational Databases
Features: Normal Forms

Model
• Functional dependencies
• 1NF, 2NF, 3NF, BCNF (Boyce-Codd normal form)
Objective
• Normalization of database schema to BCNF or 3NF
• Algorithms: decomposition or synthesis
Motivation
• Reduce data redundancy, prevent update anomalies
• However:
Data is scattered into small pieces (high granularity), and so
these pieces have to be joined back together when querying!
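
The cost of normalization can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The customer and orders tables and their contents are hypothetical and serve only as an illustration: the data is stored without redundancy, but a join is needed to put the pieces back together.

```python
import sqlite3

# Hypothetical normalized schema: customer names are stored once,
# orders only reference them through a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customer(id),
                         item TEXT);
    INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 'book'), (11, 1, 'pen'), (12, 2, 'lamp');
""")

# Answering "who ordered what" requires joining the two tables again.
rows = conn.execute("""
    SELECT c.name, o.item
    FROM customer AS c JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()
print(rows)
```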



Relational Databases
Features: Transactions

Model
• Transaction = flat sequence of database operations
(READ, WRITE, COMMIT, ABORT)
Objectives
• Enforcement of ACID properties
• Efficient parallel / concurrent execution (slow hard drives, …)
ACID properties
• Atomicity – partial execution is not allowed (all or nothing)
• Consistency – transactions turn one valid database state into another
• Isolation – uncommitted effects are hidden from other transactions
• Durability – effects of committed transactions are permanent
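
Atomicity and consistency can be sketched with Python's built-in sqlite3 module. The account table, the CHECK constraint and the transfer helper are assumptions made for this illustration only. A transfer that would violate the constraint is rolled back as a whole, including the update that had already been executed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint defines what a valid state is.
conn.execute("CREATE TABLE account ("
             "name TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO account VALUES ('a', 100), ('b', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; return False if the transfer was aborted."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "a", "b", 30)       # succeeds
failed = transfer(conn, "a", "b", 500)  # violates the CHECK, rolled back
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

In the failed transfer the credit to account b is executed first, so the final balances show that the partial effect did not survive the rollback.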



Current Trends
Big Data
• Volume: terabytes → zettabytes
• Variety: structured → structured and unstructured data
• Velocity: batch processing → streaming data
• …
Big users
• Population online, hours spent online, devices online, …
• Rapidly growing companies / web applications
Even millions of users within a few months



Current Trends
Everything is in the cloud
• SaaS: Software as a Service
• PaaS: Platform as a Service
• IaaS: Infrastructure as a Service
Processing paradigms
• OLTP: Online Transaction Processing
• OLAP: Online Analytical Processing
• …but also…
• RTAP: Real-Time Analytical Processing



Current Trends
Data assumptions
• Data format is becoming unknown or inconsistent
• Linear growth → unpredictable exponential growth
• Read requests often outnumber write requests
• Data updates are no longer frequent
• Data is expected to be replaced
• Strong consistency is no longer mission-critical



Current Trends
⇒ A new approach is required
• Relational databases simply do not follow the current trends
Key technologies
• Distributed file systems
• MapReduce and other programming models
• Grid computing, cloud computing
• NoSQL databases
• Data warehouses
• Large scale machine learning



NoSQL Databases
What does NoSQL actually mean?
A bit of history …
• 1998
First used for a relational database that omitted the use of SQL
• 2009
First used during a conference to advocate non-relational
databases
So?
• Not: no to SQL
• Not: not only SQL
• NoSQL is an accidental term with no precise definition



NoSQL Databases
What does NoSQL actually mean?

NoSQL movement = The whole point of seeking alternatives
is that you need to solve a problem that relational databases
are a bad fit for

NoSQL databases = Next generation databases mostly addressing
some of the points: being non-relational, distributed,
open-source and horizontally scalable. The original
intention has been modern web-scale databases. Often more
characteristics apply such as: schema-free, easy replication
support, simple API, eventually consistent, a huge data amount,
and more.
Source: http://[Link]/



Types of NoSQL Databases
Core types
• Key-value stores
• Wide column (column family, column oriented, …) stores
• Document stores
• Graph databases
Non-core types
• Object databases
• Native XML databases
• RDF stores
• …



Key-Value Stores
Data model
• The simplest NoSQL database type
Works as a simple hash table (mapping)
• Key-value pairs
Key (id, identifier, primary key)
Value: binary object, black box for the database system
Query patterns
• Create, update or remove value for a given key
• Get value for a given key
Characteristics
• Simple model ⇒ great performance, easily scaled, …
• Simple model ⇒ not for complex queries nor complex data
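
The data model and query patterns above can be sketched as a thin wrapper around a hash table. The KeyValueStore class below is a hypothetical in-memory illustration, not the API of any particular product. Values are opaque bytes, so the store can only look them up by key and never inspects their content.

```python
class KeyValueStore:
    """Minimal in-memory key-value store; values are opaque to the store."""

    def __init__(self):
        self._data = {}  # plain hash table: key -> value

    def put(self, key: str, value: bytes) -> None:
        # Create a new pair or overwrite the value of an existing key.
        self._data[key] = value

    def get(self, key: str):
        # Return the stored value, or None if the key is unknown.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Remove the pair; report whether the key existed.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("session:42", b'{"user": "alice", "cart": ["book"]}')
value = store.get("session:42")
removed = store.delete("session:42")
missing = store.get("session:42")
print(value, removed, missing)
```

Because the value is just a byte string, a question such as "which sessions contain a book in the cart" cannot be answered by the store itself. The application would have to fetch and decode every value.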



Key-Value Stores
Suitable use cases
• Session data, user profiles, user preferences, shopping carts, …
I.e. when values are only accessed via keys
When not to use
• Relationships among entities
• Queries requiring access to the content of the value part
• Set operations involving multiple key-value pairs
Representatives
• Redis, MemcachedDB, Riak KV, Hazelcast, Ehcache, Amazon
SimpleDB, Berkeley DB, Oracle NoSQL, Infinispan, LevelDB,
Ignite, Project Voldemort
• Multi-model: OrientDB, ArangoDB



Document Stores
Data model
• Documents
Self-describing
Hierarchical tree structures (JSON, XML, …)
– Scalar values, maps, lists, sets, nested documents, …
Identified by a unique identifier (key, …)
• Documents are organized into collections
Query patterns
• Create, update or remove a document
• Retrieve documents according to complex query conditions
Observation
• Extended key-value stores where the value part is examinable!
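
The difference from key-value stores can be sketched as follows. The collection is a plain Python list of nested dictionaries, and the find helper with its dotted path syntax is a hypothetical illustration rather than the query language of any real document store. Unlike in a key-value store, conditions can refer to fields inside the documents.

```python
def get_path(doc, path):
    """Follow a dotted path such as 'address.city' into a nested document."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None  # documents need not share the same structure
        current = current[part]
    return current

def find(collection, conditions):
    """Return documents whose fields equal all the given path -> value pairs."""
    return [doc for doc in collection
            if all(get_path(doc, path) == value
                   for path, value in conditions.items())]


# Hypothetical collection of self-describing documents with similar schema.
people = [
    {"_id": 1, "name": "Alice", "address": {"city": "Prague"}, "tags": ["admin"]},
    {"_id": 2, "name": "Bob", "address": {"city": "Brno"}},
    {"_id": 3, "name": "Carol"},  # no address at all
]

in_prague = find(people, {"address.city": "Prague"})
print([doc["_id"] for doc in in_prague])
```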



Document Stores
Suitable use cases
• Event logging, content management systems, blogs, web
analytics, e-commerce applications, …
I.e. for structured documents with similar schema
When not to use
• Set operations involving multiple documents
• Design of document structure is constantly changing
I.e. when the required level of granularity would outweigh
the advantages of aggregates



Document Stores
Representatives
• MongoDB, Couchbase, Amazon DynamoDB, CouchDB,
RethinkDB, RavenDB, Terrastore
• Multi-model: MarkLogic, OrientDB, OpenLink Virtuoso,
ArangoDB



Wide Column Stores
Data model
• Column family (table)
Table is a collection of similar rows (not necessarily identical)
• Row
Row is a collection of columns
– Should encompass a group of data that is accessed together
Associated with a unique row key
• Column
Column consists of a column name and column value
(and possibly other metadata records)
Scalar values, but also flat sets, lists or maps may be allowed
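
A column family can be pictured as a two-level map from row keys to named columns. The sketch below is an in-memory illustration with hypothetical helper names and data. It ignores storage layout, timestamps and other metadata that real wide column stores attach to columns.

```python
# Column family: row key -> {column name: column value}
users = {}

def put_column(family, row_key, name, value):
    """Create or update a single column within the row identified by row_key."""
    family.setdefault(row_key, {})[name] = value

def get_row(family, row_key):
    """Return a copy of all columns of one row (empty if the row is unknown)."""
    return dict(family.get(row_key, {}))


put_column(users, "alice", "email", "alice@example.org")
put_column(users, "alice", "langs", ["cs", "en"])   # flat list as a value
put_column(users, "bob", "email", "bob@example.org")
put_column(users, "bob", "phone", "555-0100")

# Rows in the same family are similar, but their column sets may differ.
print(sorted(get_row(users, "alice")))
print(sorted(get_row(users, "bob")))
```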



Wide Column Stores
Query patterns
• Create, update or remove a row within a given column family
• Select rows according to a row key or simple conditions
Warning
• Wide column stores are not just a special kind of RDBMS
with a variable set of columns!



Wide Column Stores
Suitable use cases
• Event logging, content management systems, blogs, …
I.e. for structured flat data with similar schema
When not to use
• ACID transactions are required
• Complex queries: aggregation (SUM, AVG, …), joining, …
• Early prototypes: i.e. when database design may change
Representa ves
• Apache Cassandra, Apache HBase, Apache Accumulo,
Hypertable, Google Bigtable



Graph Databases
Data model
• Property graphs
Directed / undirected graphs, i.e. collections of …
– nodes (vertices) for real-world entities, and
– relationships (edges) between these nodes
Both the nodes and relationships can be associated
with additional properties
Types of databases
• Non-transactional = small number of very large graphs
• Transactional = large number of small graphs



Graph Databases
Query patterns
• Create, update or remove a node / relationship in a graph
• Graph algorithms (shortest paths, spanning trees, …)
• General graph traversals
• Sub-graph queries or super-graph queries
• Similarity based queries (approximate matching)
Representatives
• Neo4j, Titan, Apache Giraph, InfiniteGraph, FlockDB
• Multi-model: OrientDB, OpenLink Virtuoso, ArangoDB
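
As a minimal sketch of a property graph and one of the graph algorithms mentioned above, the following code stores nodes and directed relationships with properties in plain Python structures and finds a shortest path by breadth-first search. All node names, relationship types and properties are hypothetical. Real graph databases provide their own storage and query languages for this.

```python
from collections import deque

# Nodes and relationships both carry properties (hypothetical sample data).
nodes = {
    "alice": {"label": "Person"},
    "bob": {"label": "Person"},
    "carol": {"label": "Person"},
    "dave": {"label": "Person"},
    "erin": {"label": "Person"},
}
edges = [  # (source, type, target, properties); edges are directed
    ("alice", "KNOWS", "bob", {"since": 2015}),
    ("bob", "KNOWS", "carol", {"since": 2017}),
    ("carol", "KNOWS", "dave", {"since": 2018}),
    ("alice", "KNOWS", "erin", {"since": 2019}),
]

def neighbours(node):
    """Targets of all relationships leaving the given node."""
    return [dst for src, _type, dst, _props in edges if src == node]

def shortest_path(start, goal):
    """Breadth-first search; returns a list of nodes or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(shortest_path("alice", "dave"))
print(shortest_path("dave", "alice"))
```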



Graph Databases
Suitable use cases
• Social networks, routing, dispatch, and location-based
services, recommendation engines, chemical compounds,
biological pathways, linguistic trees, …
I.e. simply for graph structures
When not to use
• Extensive batch operations are required
Multiple nodes / relationships are to be affected
• Graphs are too large to be stored
Graph distribution is difficult or even impossible



Native XML Databases
Data model
• XML documents
Tree structure with nested elements, attributes, and text values
(beside other less important constructs)
Documents are organized into collections
Query languages
• XPath: XML Path Language (navigation)
• XQuery: XML Query Language (querying)
• XSLT: XSL Transformations (transformation)
Representatives
• Sedna, Tamino, BaseX, eXist-db
• Multi-model: MarkLogic, OpenLink Virtuoso



RDF Stores
Data model
• RDF triples
Components: subject, predicate, and object
Each triple represents a statement about a real-world entity
• Triples can be viewed as graphs
Vertices for subjects and objects
Edges directly correspond to individual statements
Query language
• SPARQL: SPARQL Protocol and RDF Query Language
Representatives
• Apache Jena, rdf4j (Sesame), Algebraix
• Multi-model: MarkLogic, OpenLink Virtuoso
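
The triple model can be sketched with plain Python tuples. The identifiers with the ex: prefix are made up for this illustration, and the match helper only imitates the idea of a triple pattern with wildcards. It is not SPARQL and not the API of any RDF library.

```python
# Each triple is a (subject, predicate, object) statement.
triples = {
    ("ex:Prague", "rdf:type", "ex:City"),
    ("ex:Prague", "ex:locatedIn", "ex:Czechia"),
    ("ex:CharlesUniversity", "ex:locatedIn", "ex:Prague"),
}

def match(pattern, store):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        triple for triple in store
        if all(wanted is None or wanted == actual
               for wanted, actual in zip(pattern, triple))
    )

# All statements with the predicate ex:locatedIn, i.e. all such edges.
located = match((None, "ex:locatedIn", None), triples)
# Everything stated about ex:Prague as a subject.
about_prague = match(("ex:Prague", None, None), triples)
print(located)
print(about_prague)
```

Viewed as a graph, the subjects and objects above form the vertices, and each of the three triples is one labeled edge.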



Features of NoSQL Databases
Data model
• Traditional approach: relational model
• (New) possibilities:
Key-value, document, wide column, graph
Object, XML, RDF, …
• Goal
Respect the real-world nature of data
(i.e. data structure and mutual relationships)



Features of NoSQL Databases
Aggregate structure
• Aggregate definition
Data unit with a complex structure
Collection of related data pieces we wish to treat as a unit
(with respect to data manipulation and data consistency)
• Examples
Value part of key-value pairs in key-value stores
Document in document stores
Row of a column family in wide column stores



Features of NoSQL Databases
Aggregate structure
• Types of systems
Aggregate-ignorant: relational, graph
– It is not a bad thing, it is a feature
Aggregate-oriented: key-value, document, wide column
• Design notes
No universal strategy for drawing aggregate boundaries
Atomicity of database operations:
just a single aggregate at a time



Features of NoSQL Databases
Elastic scaling
• Traditional approach: scaling up (vertical scaling)
Buying bigger servers as database load increases
• New approach: scaling out (horizontal scaling)
Distributing database data across multiple hosts
– Graph databases (unfortunately): difficult or even impossible
Data distribution
• Sharding
Particular ways in which database data is split into separate groups
• Replication
Maintaining several data copies (performance, recovery)
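
One common way to combine sharding and replication is to hash the key to pick a shard and to place copies on the following shards. The sketch below assumes this simple modulo scheme and uses hypothetical function names. Real systems typically use more elaborate strategies, which this example does not attempt to reproduce.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def replicas_for(key: str, num_shards: int, replication_factor: int) -> list:
    """Primary shard plus the next shards in order, one per extra copy."""
    first = shard_for(key, num_shards)
    return [(first + i) % num_shards for i in range(replication_factor)]


# Hypothetical cluster of 4 hosts keeping 3 copies of every key.
placement = {key: replicas_for(key, 4, 3)
             for key in ("user:1", "user:2", "user:3")}
print(placement)
```

A built-in hash() call is avoided on purpose: Python randomizes string hashes per process, so the same key could land on a different shard after a restart.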



Features of NoSQL Databases
Automated processes
• Traditional approach
Expensive and highly trained database administrators
• New approach: automatic recovery, distribution, tuning, …
Relaxed consistency
• Traditional approach
Strong consistency (ACID properties and transactions)
• New approach
Eventual consistency only (BASE properties)
I.e. we have to make trade-offs because of the data distribution
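
The trade-off can be illustrated with a toy simulation. The EventuallyConsistentStore class below is a hypothetical single-process model, not a real replication protocol: a write is applied to one replica immediately and reaches the others only when sync is called, so a read from another replica may return stale data in the meantime.

```python
class EventuallyConsistentStore:
    """Toy model: writes reach the other replicas only after sync()."""

    def __init__(self, num_replicas: int):
        self.replicas = [{} for _ in range(num_replicas)]
        self.pending = []  # queued updates: (replica index, key, value)

    def write(self, key, value):
        # The write is acknowledged after updating a single replica.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica_index):
        # Reads are served by whichever replica the client contacts.
        return self.replicas[replica_index].get(key)

    def sync(self):
        # Deliver all queued updates; afterwards the replicas agree.
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


store = EventuallyConsistentStore(3)
store.write("x", 1)
stale = store.read("x", 2)   # the update has not arrived here yet
store.sync()
fresh = store.read("x", 2)   # all replicas have converged
print(stale, fresh)
```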



Features of NoSQL Databases
Schemalessness
• Relational databases
Database schema present and strictly enforced
• NoSQL databases
Relaxed schema or completely missing
Consequences: higher flexibility
– Dealing with non-uniform data
– Structural changes cause no overhead
However: there is (usually) an implicit schema
– We must know the data structure at the application level
anyway



Features of NoSQL Databases
Open source
• Often community and enterprise versions (with extended
features or extent of support)
Simple APIs
• Often stateless application interfaces (HTTP)



Features of NoSQL Databases
Current State: Five advantages

• Scaling
Horizontal distribution of data among hosts
• Volume
High volumes of data that cannot be handled by RDBMS
• Administrators
No longer needed because of the automated maintenance
• Economics
Usage of cheap commodity servers, lower overall costs
• Flexibility
Relaxed or missing data schema, easier design changes



Features of NoSQL Databases
Current State: Five challenges

• Maturity
Often still in pre-production phase with key features missing
• Support
Mostly open source, limited sources of credibility
• Administration
Sometimes relatively difficult to install and maintain
• Analytics
Missing support for business intelligence and ad-hoc querying
• Expertise
Still low number of NoSQL experts available in the market



Conclusion
The end of relational databases?
• Certainly not
They are still suitable for most projects
Familiarity, stability, feature set, available support, …
• However, we should also consider different database models
and systems
Polyglot persistence = usage of different data stores
in different circumstances



Lecture Conclusion
Big Data
• 4V characteristics: volume, variety, velocity, veracity
NoSQL databases
• (New) logical models
Core: key-value, wide column, document, graph
Non-core: XML, RDF, …
• (New) principles and features
Horizontal scaling, data sharding and replication, eventual
consistency, …



Course Overview
Outline and Objectives

Principles
• Scaling, distribution, consistency
• Transactions, visualization, …
Technologies
• MapReduce programming model
Apache Hadoop
• Data formats
XML, JSON, RDF, …
• NoSQL databases
Core: RiakKV, Redis, MongoDB, Cassandra, Neo4j
Non-core: XML, RDF
Data models, query languages, …
