RDBMS vs NoSQL
Summary
Б.Наранчимэг
Department of Information and Computer Sciences
School of Engineering and Applied Sciences (SEAS), National University of Mongolia (NUM)
naranchimeg@[Link]
Relational and Non-relational DB
• Relational databases
• Also called RDBMS or SQL databases
• Most popular are Microsoft SQL Server, MySQL, Oracle DB, PostgreSQL
• Mostly used in large enterprise applications
• Non-relational databases
• Also called NoSQL databases
• Most popular are MongoDB, DocumentDB, Redis, Neo4j
• Usually grouped into four categories:
• Key-value stores
• Wide-column stores
• Document stores
• Graph stores
NoSQL
• NoSQL is an umbrella term for all databases and data stores that
don’t follow the RDBMS principles
• A class of products
• A collection of several (related) concepts about data storage and
manipulation
• Often related to large data sets
NoSQL and Big Data
• NoSQL grew out of Internet-scale applications, so it is often associated with the “big data” concept
• How big is “big data”?
• More than a few terabytes: enough to start spanning multiple storage units
• Challenges
• Efficiently storing and accessing large amounts of data is difficult, and even more so once fault tolerance and backups are considered
• Manipulating large data sets involves running massively parallel processes
• Managing continuously evolving schemas and metadata for semi-structured and unstructured data is difficult
How did we get here?
• Explosion of social media sites (Facebook, Twitter) with large data needs
• Rise of cloud-based solutions such as Amazon S3 (Simple Storage Service)
• Just as programmers moved to dynamically typed languages (Python, Ruby, Groovy), data moved toward dynamic typing with frequent schema changes
• Open-source community
Why RDBMSs are not suitable for big data
• The context is the Internet
• RDBMSs assume that data are
• Dense
• Largely uniform (structured data)
• Data coming from the Internet are
• Massive and sparse
• Semi-structured or unstructured
• With massive, sparse data sets, the typical storage mechanisms and access methods are stretched to their limits
NoSQL Database Types
Discussing NoSQL databases is complicated
because there are a variety of types:
• Sorted ordered column stores:
• Optimized for queries over large data sets; store columns of data together instead of rows
• Document databases:
• Pair each key with a complex data structure known as a document
• Key-value stores:
• The simplest NoSQL databases; every item in the database is stored as an attribute name (or 'key') together with its value
• Graph databases:
• Used to store information about networks of data, such as social connections
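To make the graph-database description concrete, here is a minimal in-memory sketch of a typical graph query, "friends of friends", done as a breadth-first traversal. The adjacency map, the names in it, and the functions `within_hops` and `friends_of_friends` are hypothetical; an actual graph database such as Neo4j would express this in its own query language rather than in Python.

```python
from collections import deque

# Hypothetical social network: each person maps to the set of people they know.
graph: dict[str, set[str]] = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol", "erin"},
    "erin": {"dave"},
}


def within_hops(start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first search: every person reachable from `start` in at most
    `max_hops` edges, mapped to their hop distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, set()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def friends_of_friends(person: str) -> set[str]:
    """People exactly two hops away: known by a friend, but not a direct friend."""
    return {p for p, d in within_hops(person, 2).items() if d == 2}


print(sorted(friends_of_friends("alice")))  # ['dave']
print(sorted(friends_of_friends("erin")))   # ['bob', 'carol']
```

Relationships are followed edge by edge instead of being reassembled with joins, which is the access pattern graph stores are built for.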
Document Databases (Document Store)
• Documents
• Loosely structured sets of key/value pairs in documents, e.g., XML, JSON,
BSON
• Encapsulate and encode data in some standard formats or encodings
• Are addressed in the database via a unique key
• Documents are treated as a whole rather than being split into their constituent name/value pairs
• Allow retrieving documents by key or by content
• Notable examples:
• MongoDB (used in Foursquare, GitHub, and more)
• CouchDB (used in Apple, BBC, Canonical, CERN, and more)
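A minimal sketch of the two access paths listed above (retrieval by key and by content), assuming a toy in-memory store. `DocumentStore`, `put`, `get`, and `find` are hypothetical names chosen for illustration, not the API of MongoDB or CouchDB; the sample documents reuse fields from the JSON example in this section.

```python
import json
from typing import Any, Optional


class DocumentStore:
    """Toy in-memory document store (hypothetical): each document is kept
    whole under a unique key and can be retrieved either by that key or by
    matching its contents."""

    def __init__(self) -> None:
        self._docs: dict[str, dict[str, Any]] = {}

    def put(self, key: str, doc: dict[str, Any]) -> None:
        # Store a JSON round-tripped copy, so the document is kept as one
        # self-contained unit independent of the caller's object.
        self._docs[key] = json.loads(json.dumps(doc))

    def get(self, key: str) -> Optional[dict[str, Any]]:
        # Retrieval by key: a single dictionary lookup.
        return self._docs.get(key)

    def find(self, **criteria: Any) -> list[str]:
        # Retrieval by content: keys of documents containing every field in
        # `criteria` with an equal value.
        return [
            key
            for key, doc in self._docs.items()
            if all(field in doc and doc[field] == value
                   for field, value in criteria.items())
        ]


store = DocumentStore()
store.put("a1", {"type": "Article", "author": "Derick Rethans",
                 "title": "Introduction to Document Databases with MongoDB"})
store.put("b1", {"type": "Book", "author": "Derick Rethans",
                 "isbn": "978-0-9738621-5-7"})

print(store.get("b1")["isbn"])              # 978-0-9738621-5-7
print(store.find(author="Derick Rethans"))  # ['a1', 'b1']
print(store.find(type="Book"))              # ['b1']
```

Content queries here scan every document; real document stores typically add secondary indexes so such lookups do not require a full scan.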
Key/Value stores
• Store data in a schema-less way
• Store data as maps
• HashMaps or associative arrays
• Provide very efficient data access: hash-based lookups take constant time on average
• Notable examples:
• Couchbase (Zynga, Vimeo, NAVTEQ, ...)
• Redis (Craigslist, Instagram, Stack Overflow, Flickr, ...)
• Amazon Dynamo (Amazon, Elsevier, IMDb, ...)
• Apache Cassandra (Facebook, Digg, Reddit, Twitter, ...)
• Voldemort (LinkedIn, eBay, …)
• Riak (GitHub, Comcast, Mochi, ...)
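Why access is efficient on average can be seen in a small hash table with separate chaining. This is a hypothetical sketch (the class `HashMapKV` and the sample keys are invented for illustration), not how any of the products above is implemented; Python's built-in `dict` is itself a hash table and would normally be used directly.

```python
from typing import Any, Optional


class HashMapKV:
    """Minimal key-value store on a hash table with separate chaining.
    A key's hash selects a single bucket, so put/get/delete scan only that
    bucket: O(1) on average while chains stay short, O(n) in the worst case."""

    def __init__(self, buckets: int = 8) -> None:
        self._buckets: list[list[tuple[str, Any]]] = [[] for _ in range(buckets)]
        self._size = 0

    def _bucket(self, key: str) -> list[tuple[str, Any]]:
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key: str, value: Any) -> None:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self._size += 1
        if self._size > 2 * len(self._buckets):
            self._resize()  # keep the average chain length bounded

    def get(self, key: str) -> Optional[Any]:
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def delete(self, key: str) -> bool:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self._size -= 1
                return True
        return False

    def _resize(self) -> None:
        # Double the bucket count and rehash every entry.
        items = [item for bucket in self._buckets for item in bucket]
        self._buckets = [[] for _ in range(2 * len(self._buckets))]
        for k, v in items:
            self._bucket(k).append((k, v))


kv = HashMapKV()
kv.put("session:42", {"user": "alice"})
kv.put("page:home:views", 17)
kv.put("page:home:views", 18)     # same key: value is replaced
print(kv.get("page:home:views"))  # 18
print(kv.get("missing"))          # None
```

The store never looks inside the value; it only maps keys to opaque values, which is what keeps the model schema-less.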
Document Databases: JSON example
{
_id: ObjectId("51156a1e056d6f966f268f81"),
type: "Article",
author: "Derick Rethans",
title: "Introduction to Document Databases with MongoDB",
date: ISODate("2013-04-24T[Link].911Z"),
body: "This arti…"
},
{
_id: ObjectId("51156a1e056d6f966f268f82"),
type: "Book",
author: "Derick Rethans",
title: "php|architect's Guide to Date and Time Programming with PHP",
isbn: "978-0-9738621-5-7"
}
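The two documents above sit in the same collection yet have different fields (only the book has `isbn`). A small Python sketch of that schema flexibility, assuming the MongoDB shell types `ObjectId(...)` and `ISODate(...)` are replaced by plain strings and leaving out the article's `date` and `body` fields, which are incomplete in this copy. The helper `with_field` is hypothetical and only mimics the idea behind MongoDB's `$exists` operator.

```python
# The two documents above as Python dicts (ObjectId values as plain strings;
# the incomplete date/body fields of the article are omitted).
articles_and_books = [
    {"_id": "51156a1e056d6f966f268f81", "type": "Article",
     "author": "Derick Rethans",
     "title": "Introduction to Document Databases with MongoDB"},
    {"_id": "51156a1e056d6f966f268f82", "type": "Book",
     "author": "Derick Rethans",
     "title": "php|architect's Guide to Date and Time Programming with PHP",
     "isbn": "978-0-9738621-5-7"},
]


def with_field(docs, field):
    """Documents that contain `field`, similar in spirit to MongoDB's $exists."""
    return [d for d in docs if field in d]


# Fields shared by every document vs. fields only some documents have.
common = set.intersection(*(set(d) for d in articles_and_books))
optional = set.union(*(set(d) for d in articles_and_books)) - common

print(sorted(common))    # ['_id', 'author', 'title', 'type']
print(sorted(optional))  # ['isbn']
print([d["type"] for d in with_field(articles_and_books, "isbn")])  # ['Book']
```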
Sorted Ordered Column-Oriented Stores
• Data are stored in a column-oriented way
• Data are stored efficiently
• No space is consumed storing nulls
• Columns are grouped in column families
• Data isn’t stored as a single table but is stored by column families
• The unit of data is a set of key/value pairs
• Identified by a “row-key”
• Ordered and sorted by row-key
• Notable examples:
• Google's Bigtable (used across Google's services)
• HBase (Facebook, StumbleUpon,
Hulu, Yahoo!, ...)
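A toy sketch of the ideas on this slide: cells grouped into column families, only existing cells stored (so nulls cost nothing), and rows kept sorted by row key so that range scans are cheap. `ColumnFamilyTable`, its methods, and the sample row keys are hypothetical; real systems such as Bigtable and HBase add cell versioning, persistence, and distribution that are not modeled here.

```python
import bisect
from collections import defaultdict
from typing import Iterator, Optional


class ColumnFamilyTable:
    """Each row is a sparse map {family: {qualifier: value}}; only cells that
    exist are stored. Row keys are kept in sorted order for range scans."""

    def __init__(self, families: set[str]) -> None:
        self.families = families
        self._row_keys: list[str] = []  # sorted row keys
        self._rows: dict[str, dict[str, dict[str, str]]] = {}

    def put(self, row_key: str, family: str, qualifier: str, value: str) -> None:
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row_key not in self._rows:
            bisect.insort(self._row_keys, row_key)  # keep keys ordered
            self._rows[row_key] = defaultdict(dict)
        self._rows[row_key][family][qualifier] = value

    def get(self, row_key: str, family: str, qualifier: str) -> Optional[str]:
        # A missing cell is simply absent; no null is stored for it.
        return self._rows.get(row_key, {}).get(family, {}).get(qualifier)

    def scan(self, start: str, stop: str) -> Iterator[tuple[str, dict]]:
        """Yield (row_key, row) for start <= row_key < stop, in key order."""
        i = bisect.bisect_left(self._row_keys, start)
        while i < len(self._row_keys) and self._row_keys[i] < stop:
            key = self._row_keys[i]
            yield key, self._rows[key]
            i += 1

    def cell_count(self) -> int:
        return sum(len(cols) for row in self._rows.values() for cols in row.values())


t = ColumnFamilyTable({"info", "stats"})
t.put("user#003", "info", "name", "Carol")
t.put("user#001", "info", "name", "Alice")
t.put("user#001", "stats", "logins", "42")
t.put("user#002", "info", "name", "Bob")

print([k for k, _ in t.scan("user#001", "user#003")])  # ['user#001', 'user#002']
print(t.cell_count())  # 4: no cells exist for Bob's or Carol's missing stats
```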
Why a knowledge base (knowledge graph)?
[Figure: example knowledge graph. Classes: Thing, Building, Organization, Person, Course, Course schedule, Room, connected by is_a links. Instances include Academic Building 3A, SEAS (ХШУИС), the department (МКУТ), the course "Knowledge and Data Integration", the people Амарсанаа and Наранчимэг, rooms 225 and 103, and a lecture on Tuesday.]
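A knowledge graph is often stored as subject-predicate-object triples, and `is_a` edges let a query infer classes that are not stated directly. A minimal Python sketch under that assumption; the triples, the English labels, and the helpers `objects` and `types_of` are hypothetical and only loosely based on the class names in the figure.

```python
from collections import deque

# Hypothetical triples (subject, predicate, object); the figure's actual
# edges are not reproduced here.
triples = {
    ("Building", "is_a", "Thing"),
    ("Organization", "is_a", "Thing"),
    ("Person", "is_a", "Thing"),
    ("Course", "is_a", "Thing"),
    ("Room", "is_a", "Thing"),
    ("SEAS", "instance_of", "Organization"),
    ("KnowledgeAndDataIntegration", "instance_of", "Course"),
    ("room_225", "instance_of", "Room"),
}


def objects(subject: str, predicate: str) -> set[str]:
    """Pattern query (subject, predicate, ?): all matching objects."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def types_of(entity: str) -> set[str]:
    """Every class an entity belongs to: its direct classes plus all classes
    reachable from them by following is_a edges (transitive closure)."""
    found: set[str] = set()
    queue = deque(objects(entity, "instance_of"))
    while queue:
        cls = queue.popleft()
        if cls not in found:
            found.add(cls)
            queue.extend(objects(cls, "is_a"))
    return found


print(sorted(types_of("room_225")))  # ['Room', 'Thing']
print(sorted(types_of("SEAS")))      # ['Organization', 'Thing']
```

The fact that `room_225` is a `Thing` is never stored; it is derived by following `is_a`, which is the kind of inference that motivates a knowledge base over plain tables.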
Summary
• Database
• DBMS
• Relational model
• Relational keys
• Relational integrity
• Entity integrity
• Referential integrity
• SQL
• DDL
• DML
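The integrity rules and SQL sublanguages in this summary can be tried out with Python's built-in `sqlite3` module. In this sketch the `Branch` table follows the relational-model slide, while the `Staff` table, its values, and the helper `rejected` are hypothetical; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# DDL: define relations, keys, and integrity constraints.
conn.execute("""
    CREATE TABLE Branch (
        branchNo TEXT NOT NULL PRIMARY KEY,  -- entity integrity: no NULL, no duplicates
        city     TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE Staff (
        staffNo  TEXT NOT NULL PRIMARY KEY,
        branchNo TEXT NOT NULL REFERENCES Branch(branchNo)  -- referential integrity
    )""")

# DML: insert and query data.
conn.execute("INSERT INTO Branch VALUES ('B005', 'London')")
conn.execute("INSERT INTO Staff VALUES ('SG37', 'B005')")
print(conn.execute("SELECT city FROM Branch WHERE branchNo = 'B005'").fetchone())  # ('London',)


def rejected(sql: str) -> bool:
    """True if the DBMS rejects the statement with an integrity error."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


print(rejected("INSERT INTO Branch VALUES (NULL, 'Leeds')"))   # True: NULL primary key
print(rejected("INSERT INTO Branch VALUES ('B005', 'Leeds')"))  # True: duplicate primary key
print(rejected("INSERT INTO Staff VALUES ('SG14', 'B999')"))    # True: no such branch
```

`NOT NULL` is spelled out on the text primary keys because SQLite, for historical reasons, otherwise accepts NULL in non-integer primary key columns.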
Relational model - Data structures
Table name: Branch
branchNo   Street         City        Postcode
char(4)    varchar(25)    varchar(15) varchar(8)
B005       22 Deer Rd     London      SW1 4EH
B007       16 Argyll St   Aberdeen    AB2 3SU
B003       163 Main St    Glasgow     G11 9QX
B004       32 Manse Rd    Bristol     BS99 1NZ
B002       56 Clover Dr   London      NQ10 6EU
• Schema (table name, column names, data types): stable over time
• Rows: dynamic, change over time
• Degree: the number of columns (here 4)
• Cardinality: the number of rows (here 5)
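Degree and cardinality can be read off a real table. A minimal sketch using Python's built-in `sqlite3` module with the Branch rows from the slide; the helper `degree_and_cardinality` is hypothetical. Deleting rows changes the cardinality but not the degree, matching the "stable schema, dynamic rows" distinction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema (stable over time): table name, column names, and data types.
# Note: SQLite accepts CHAR/VARCHAR declarations but does not enforce lengths.
conn.execute("""
    CREATE TABLE Branch (
        branchNo CHAR(4) PRIMARY KEY,
        Street   VARCHAR(25),
        City     VARCHAR(15),
        Postcode VARCHAR(8)
    )""")

# Rows (dynamic, change over time): the five rows from the slide.
conn.executemany("INSERT INTO Branch VALUES (?, ?, ?, ?)", [
    ("B005", "22 Deer Rd", "London", "SW1 4EH"),
    ("B007", "16 Argyll St", "Aberdeen", "AB2 3SU"),
    ("B003", "163 Main St", "Glasgow", "G11 9QX"),
    ("B004", "32 Manse Rd", "Bristol", "BS99 1NZ"),
    ("B002", "56 Clover Dr", "London", "NQ10 6EU"),
])


def degree_and_cardinality(table: str) -> tuple[int, int]:
    cur = conn.execute(f"SELECT * FROM {table}")  # table name is trusted here
    degree = len(cur.description)      # number of columns (attributes)
    cardinality = len(cur.fetchall())  # number of rows (tuples)
    return degree, cardinality


print(degree_and_cardinality("Branch"))  # (4, 5)
conn.execute("DELETE FROM Branch WHERE City = 'London'")
print(degree_and_cardinality("Branch"))  # (4, 3): rows changed, schema did not
```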
Summary
• Relational Algebra
• Data modeling
• ERD
• EERD
• Mapping ERD/EERD to relations
• Normalization
• Database life cycle
• Database security