NOSQL
MODULE-1
Why NOSQL
Aggregate Data Models
More Details on Data Models
WHY NOSQL
A NoSQL database provides much more
flexibility when it comes to handling
data. There is no requirement to specify
a schema before you start working with the
application. Also, a NoSQL database
doesn't restrict the types of
data you can store together. It allows you
to add new types as your needs
change.
THE VALUE OF RELATIONAL
DATABASES
Getting at Persistent Data
Concurrency
Integration
A (Mostly) Standard Model
GETTING AT PERSISTENT DATA
Applications Need to Store Large Amounts of Data
Two Ways of Storing Data
Main Memory – Limited in Space – Data is Lost on
Power Failure
Backing Store – Large in Size – Slower
Productivity Apps (e.g. Word Processor) – File System
Enterprise Applications – Database
CONCURRENCY
Multiple Users Accessing the Same Data at the Same Time
Often Modifying That Data
Transaction Handling (Deadlock)
Transactions should be Rolled Back if Needed
Hotel Room Booking
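The hotel room example can be sketched with a transaction that is rolled back when the room is already taken. This is a minimal sketch using Python's built-in sqlite3 module; the rooms table, the book_room function, and the guest names are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: one row per room, guest is NULL while the room is free.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rooms (room_no INTEGER PRIMARY KEY, guest TEXT)")
conn.execute("INSERT INTO rooms (room_no, guest) VALUES (101, NULL)")
conn.commit()


def book_room(conn, room_no, guest):
    """Book the room only if it is still free; roll back otherwise."""
    try:
        # 'with conn' commits on success and rolls back if an exception escapes.
        with conn:
            cur = conn.execute(
                "UPDATE rooms SET guest = ? WHERE room_no = ? AND guest IS NULL",
                (guest, room_no),
            )
            if cur.rowcount != 1:
                raise RuntimeError("room already taken")
        return True
    except RuntimeError:
        return False


first = book_room(conn, 101, "Asha")   # room is free, booking succeeds
second = book_room(conn, 101, "Ravi")  # room is taken, transaction rolled back
current_guest = conn.execute(
    "SELECT guest FROM rooms WHERE room_no = 101"
).fetchone()[0]
```

Only one of the two bookings is stored, so the room is never double-booked.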
INTEGRATION
Applications Written by Multiple Teams
Collaboration
Shared Database Integration
Concurrency Control of Database handles Multiple
Applications
A (MOSTLY) STANDARD MODEL
Relational databases have succeeded because they provide
the core benefits we outlined earlier in a (mostly) standard
way
Vendors Might Differ but not the Benefits
Note:
Every RDBMS follows the same relational model and the same basic
definitions; what changes between vendors is mainly the SQL dialect.
IMPEDANCE MISMATCH
Though an RDBMS provides many advantages, it is still not
perfect. One of the main frustrations for developers is
the “Impedance Mismatch”
Impedance Mismatch
The difference between the relational model and the
in-memory data structures
IMPEDANCE MISMATCH
The relational data model organizes data into a structure
of tables and rows, or more properly, relations and tuples
The values in a relational tuple have to be simple—they
cannot contain any structure, such as a nested record or a
list
If you want to use a richer in-memory data structure, you
have to translate it to a relational representation to store
it on disk
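The translation step can be sketched as follows. This is only an illustration: the order record, its field names, and the to_relational function are hypothetical, and the two tuple shapes stand in for two assumed tables (orders and order items).

```python
# Hypothetical in-memory order: a nested record that contains a list.
order = {
    "order_id": 99,
    "customer": "Martin",
    "line_items": [
        {"product": "book", "qty": 1},
        {"product": "pen", "qty": 2},
    ],
}


def to_relational(order):
    """Split one nested order into flat rows for two assumed tables."""
    # Row for an assumed 'orders' table: simple values only.
    order_row = (order["order_id"], order["customer"])
    # Rows for an assumed 'order_items' table, linked back by order_id.
    item_rows = [
        (order["order_id"], item["product"], item["qty"])
        for item in order["line_items"]
    ]
    return order_row, item_rows


order_row, item_rows = to_relational(order)
```

One in-memory object becomes three flat rows across two tables, and reading it back requires joining them again.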
IMPEDANCE MISMATCH
The Solution Proposed in the early 2000s was OOP (object-oriented
programming) and OODBs (object-oriented databases).
OODBs Offered a Solution to the Impedance Mismatch
Major Issue was Integration with RDBMS
Frameworks for Integration, like Hibernate
Solution was not Feasible
APPLICATION AND INTEGRATION
DATABASES
Integration Database
An integration database is shared by multiple applications, usually developed by
separate teams, storing their data in a common
database. This improves communication because all
the applications are operating on a consistent set of
persistent data
Complexity Increases
Coordinating Changes across a Number of Applications is a Tedious Task
In the 2000s the Paradigm Shift was “WEB SERVICES”
APPLICATION AND INTEGRATION
DATABASES
HTTP
Flexibility in Exchanging the Data through HTTP REQ/RESP
XML or JSON
Application-Specific Database instead of Integration
Database
Eg: Flipkart website, working with an application-specific database.
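A minimal sketch of the idea: the application keeps its database private and other applications see only a JSON document. The function get_order_response and the order fields are hypothetical, and the HTTP request/response itself is left out so the sketch stays self-contained.

```python
import json


def get_order_response(order_id):
    """Hypothetical service handler: returns the JSON body of a response."""
    # In a real service this record would come from the application's
    # own database, which no other application touches directly.
    order = {"order_id": order_id, "status": "shipped", "items": ["book", "pen"]}
    return json.dumps(order)


# The calling application receives text and parses it into its own structures.
body = get_order_response(42)
received = json.loads(body)
```

Because only the JSON format is shared, the service is free to change how it stores the data internally.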
ATTACK OF THE CLUSTERS
Growth in the New Millennium in Applications and
Databases
Y2K Problem
Traffic on Websites Increased
Social Media
Log Data
Mapping of Data
To handle this kind of increase, you have two choices: scale up or
scale out
A Site that Cannot Scale Risks Going OUT OF THE MARKET
Eg: Orkut
ATTACK OF THE CLUSTERS
Scaling up implies bigger machines, more processors,
disk storage, and memory. But bigger machines get more
and more expensive, not to mention that there are real
limits as your size increases. The alternative is to use lots
of small machines in a cluster.
A cluster of small machines can use commodity
hardware and ends up being cheaper at these kinds of
scales. It can also be more resilient—while individual
machine failures are common, the overall cluster can be
built to keep going despite such failures, providing high
reliability.
ATTACK OF THE CLUSTERS
Relational databases are not designed to be run on
clusters
Clustered relational databases, such as the Oracle RAC
or Microsoft SQL Server, work on the concept of a
shared disk subsystem
This mismatch between relational databases and clusters
led some organizations to consider an alternative route to
data storage. Two companies in particular, Google and
Amazon, have been very influential:
BigTable from Google and Dynamo from Amazon.
THE EMERGENCE OF NOSQL
The Name "NoSQL" First Appeared in the Late 90s
As the Name of an Open-Source Relational Database
Led by Carlo Strozzi
This database stores its tables as ASCII files, each tuple
represented by a line with fields separated by tabs
The name comes from the fact that the database doesn’t
use SQL as a query language
The database is manipulated through shell scripts that
can be combined into the usual UNIX pipelines
THE EMERGENCE OF NOSQL
Relational databases use ACID transactions to handle
consistency across the whole database.
NoSQL databases offer a range of options for
consistency and distribution
Graph databases are one style of NoSQL database that
uses a distribution model similar to relational databases
but offers a different data model that makes it better at
handling data with complex relationships.
NoSQL databases operate without a schema
Useful when dealing with nonuniform data
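What "without a schema" means in practice can be sketched with plain Python dictionaries standing in for records. The people list, its fields, and the names_with helper are assumptions for illustration, not the API of any particular NoSQL product.

```python
# Hypothetical collection: records do not need to share the same fields.
people = [
    {"name": "Asha", "email": "asha@example.com"},
    {"name": "Ravi", "phone": "555-0100", "languages": ["Hindi", "English"]},
]

# A record with a brand-new field is added without changing any schema.
people.append({"name": "Meera", "twitter": "@meera"})


def names_with(field, records):
    """Return the names of the records that actually carry the given field."""
    return [r["name"] for r in records if field in r]


with_email = names_with("email", people)
all_fields = sorted({key for r in people for key in r})
```

The cost of this freedom is that the application must check which fields are present, as names_with does.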
KEY POINTS
Relational databases have been a successful technology for twenty
years, providing persistence, concurrency control, and an integration
mechanism.
Application developers have been frustrated with the impedance
mismatch between the relational model and the in-memory data
structures.
There is a movement away from using databases as integration points
towards encapsulating databases within applications and integrating
through services.
The vital factor for a change in data storage was the need to support
large volumes of data by running on clusters. Relational databases are
not designed to run efficiently on clusters.
NoSQL is an accidental neologism. There is no prescriptive definition;
all you can make is an observation of common characteristics.
KEY POINTS
The common characteristics of NoSQL databases are
Not using the relational model
Running well on clusters
Open-source
Built for the 21st century web estates
Schemaless
The most important result of the rise of NoSQL is
Polyglot Persistence – Various Data Storage options are
available
AGGREGATE DATA MODELS
A data model is the model through which we perceive
and manipulate our data
Data Model describes how we interact with the data in
the database
Distinct from a storage model, which describes how the
database stores and manipulates the data internally
A developer might point to an entity-relationship diagram
of their database and refer to that as their data model
containing customers, orders, products, and the like
AGGREGATE DATA MODELS
Relational Model
Consists of Rows and Columns in the form of Tables
Each NoSQL solution uses a different model, which
we put into four categories widely used in the NoSQL
ecosystem:
Key-Value
Document
Column-Family
Graph
AGGREGATES
Relational model takes the information that we want to
store and divides it into tuples (rows)
A tuple is a limited data structure
You cannot nest one tuple within another to get nested
records, nor can you put a list of values or tuples within
another.
An aggregate is a collection of related objects that we wish to
treat as a unit. Aggregates are typically written in JSON or XML.
Eg: Kaggle, where you can get datasets
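An aggregate written as JSON can be sketched as below. The order fields and the dictionary named store (standing in for a key-value database) are hypothetical; the point is only that the whole aggregate is saved and loaded as one unit under one key.

```python
import json

# Hypothetical key-value store: a plain dictionary from key to JSON text.
store = {}

# One aggregate: the order together with its items and shipping address.
order_aggregate = {
    "id": 99,
    "customer_id": 1,
    "order_items": [{"product_id": 27, "name": "book", "price": 32.45}],
    "shipping_address": {"city": "Chicago"},
}

# The whole aggregate is written under a single key...
store["order:99"] = json.dumps(order_aggregate)

# ...and read back as a single unit, nested parts included.
loaded = json.loads(store["order:99"])
```

No joins are needed to rebuild the order, because the nested list and record travel with it.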
DATA MODEL ORIENTED AROUND A
RELATIONAL DATABASE (USING UML)
A column store database is a type of database
that stores data using a column oriented model.
A column store database can also be referred to
as a:
• Column database
• Column family database
• Column oriented database
• Wide column store database
• Wide column store
• Columnar database
• Columnar store
The Structure of a Column Store Database
Column store databases use a concept called a keyspace. A
keyspace is kind of like a schema in the relational model. The
keyspace contains all the column families (kind of like tables in
the relational model), which contain rows, which contain
columns.
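The nesting described above (keyspace, column families, rows, columns) can be pictured with nested Python dictionaries. This is a sketch of the logical structure only, not of how any real column store lays data out on disk; the family names, row keys, and the get_column helper are assumptions.

```python
# Hypothetical keyspace: column family -> row key -> column name -> value.
keyspace = {
    "users": {
        "row-1": {"name": "Asha", "city": "Pune"},
        "row-2": {"name": "Ravi", "city": "Delhi"},
    },
    "orders": {
        "row-1": {"user": "row-1", "total": 250},
    },
}


def get_column(keyspace, family, row_key, column, default=None):
    """Walk keyspace -> column family -> row -> column and return the value."""
    return keyspace[family][row_key].get(column, default)


city = get_column(keyspace, "users", "row-1", "city")
missing = get_column(keyspace, "orders", "row-1", "discount")
```

A lookup names the column family, then the row key, then the column, which mirrors the containment order given in the text.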
GRAPH DATABASES
UPDATING MV