Unit 5
NoSQL Database
By - Prof. Priyanka.H.Shingate
Department of Computer Engineering, ZCOER, Pune
CAP Theorem
CAP theorem states that in networked shared-data systems or distributed
systems, we can only achieve at most two out of three guarantees for a
database: Consistency, Availability and Partition Tolerance.
Consistency
Consistency means that all clients see the same data at the same time, no matter which
node they connect to.
For this to happen, whenever data is written to one node, it must be instantly forwarded or
replicated to all the other nodes in the system before the write is deemed ‘successful.’
Availability
Availability means that any client making a request for data gets a response, even if
one or more nodes are down.
Another way to state this—all working nodes in the distributed system return a valid
response for any request, without exception.
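Partition tolerance:
A partition is a communications break within a distributed system, such as a lost or
temporarily delayed connection between two nodes.
Partition tolerance means that the cluster must continue to work despite any number of
communication breakdowns between nodes in the system.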
CP database:
A CP database delivers consistency and partition tolerance at the expense of availability.
When a partition occurs between any two nodes, the system has to shut down the
non-consistent node (i.e., make it unavailable) until the partition is resolved.
Example: In a banking system, availability is less important than consistency.
[Figure: one client writes x=10 to Node A while another reads x from Node B.
Availability lost: one of the two requests must fail.]
AP database:
An AP database delivers availability and partition tolerance at the expense of consistency.
When a partition occurs, all nodes remain available, but those at the wrong end of a
partition might return an older version of data than others. When the partition is
resolved, AP databases typically resync the nodes.
[Figure: consistency lost: both requests are processed.]
Database Management Unit 4
Department of Computer Engineering, ZCOER, Pune
CA database:
A CA database delivers consistency and availability across all nodes.
It can’t do this if there is a partition between any two nodes in the system, however,
and therefore can’t deliver fault tolerance.
[Figure: one client writes x=10 to Node A while another reads x from Node B.]
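The CP and AP trade-offs described above can be illustrated with a toy model of two replicas. This is only a sketch under simplifying assumptions: the names Replica, makeCluster, write, read and heal are hypothetical and do not belong to any real database, and the CP cluster is assumed to reject the write (rather than the read) while the partition lasts.

```javascript
// Toy model of two replicas (Node A and Node B) that may be cut off by a partition.
// All names here are hypothetical; this is not the API of any real database.
class Replica {
  constructor(name) {
    this.name = name;
    this.value = 0; // last value of x known to this replica
  }
}

function makeCluster(mode) {
  // mode is "CP" or "AP"
  return { mode, a: new Replica("A"), b: new Replica("B"), partitioned: false };
}

// Write x through Node A.
function write(cluster, value) {
  if (cluster.partitioned && cluster.mode === "CP") {
    // CP: give up availability rather than let the replicas diverge.
    throw new Error("unavailable: cannot replicate during a partition");
  }
  cluster.a.value = value;
  if (!cluster.partitioned) {
    cluster.b.value = value; // replicate to Node B
  }
  // AP during a partition: the write succeeds on A only, so B becomes stale.
}

// Read x through Node B.
function read(cluster) {
  return cluster.b.value;
}

// Resolve the partition and resync Node B from Node A.
function heal(cluster) {
  cluster.partitioned = false;
  cluster.b.value = cluster.a.value;
}

// CP cluster: the write fails while partitioned (availability lost).
const cp = makeCluster("CP");
cp.partitioned = true;
let cpError = null;
try {
  write(cp, 10);
} catch (e) {
  cpError = e.message;
}

// AP cluster: the write succeeds, but Node B serves the old value (consistency lost).
const ap = makeCluster("AP");
ap.partitioned = true;
write(ap, 10);
const staleRead = read(ap);
heal(ap);
const healedRead = read(ap);
```

In this sketch the CP cluster answers with an error during the partition, while the AP cluster keeps answering but returns an older value from Node B until heal() resyncs the nodes.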
Unstructured Data
All the unstructured files, log files, audio files,
and image files are included in the unstructured
data.
Some organizations have a great deal of data available but do not know how to derive
value from it because the data is raw.
RDBMS vs NoSQL
RDBMS:
• Structured and organized data
• Structured Query Language (SQL)
• Data and its relationships are stored in separate tables
• Data Manipulation Language, Data Definition Language
• Tight consistency
NoSQL:
• Stands for Not Only SQL
• No declarative query language
• No predefined schema
• Key-value pair storage, column store, document store, graph databases
• Eventual consistency rather than ACID properties
• Unstructured and unpredictable data
• CAP theorem
• Prioritizes high performance, high availability and scalability
• BASE transaction
1) Non-Relational:
NoSQL databases do not follow the relational model.
They do not store data in tables of flat, fixed-column records.
They typically omit relational features such as SQL-style query languages, query
planners, referential integrity, joins, and full ACID compliance.
2) Schema-Free:
NoSQL databases are either schema-free or have relaxed schemas.
They do not require the structure of the data to be defined in advance.
They allow heterogeneous data structures within the same domain.
3) Distributed:
Many NoSQL databases can be run in a distributed fashion.
They typically guarantee only eventual consistency.
Graph-based Databases
• Graph-based databases focus on the relationship
between the elements.
• It stores the data in the form of nodes in the database.
• The connections between the nodes are called links or
relationships.
• Key features of a graph database:
In a graph-based database, it is easy to identify the relationships between data
items by following the links.
Queries return results in real time.
• Example: a social network graph.
• Given the people (nodes) and their relationships (edges), you can find out who the
"friends of friends" of a particular person are, for example, the friends of
Howard's friends.
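The "friends of friends" lookup can be sketched with a plain adjacency list. The graph below is hypothetical sample data (only the name Howard comes from the example above), and friendsOfFriends is an illustrative helper, not the query language of any particular graph database.

```javascript
// Hypothetical social graph stored as an adjacency list: person -> direct friends.
const friends = {
  Howard: ["Leonard", "Raj"],
  Leonard: ["Howard", "Sheldon", "Penny"],
  Raj: ["Howard", "Penny"],
  Sheldon: ["Leonard"],
  Penny: ["Leonard", "Raj"],
};

// Follow two links from `person`, excluding the person and their direct friends.
function friendsOfFriends(graph, person) {
  const direct = new Set(graph[person] || []);
  const result = new Set();
  for (const friend of direct) {
    for (const candidate of graph[friend] || []) {
      if (candidate !== person && !direct.has(candidate)) {
        result.add(candidate);
      }
    }
  }
  return [...result].sort();
}

const howardFoF = friendsOfFriends(friends, "Howard"); // ["Penny", "Sheldon"]
```

Because each node stores its links directly, the lookup only touches Howard's friends and their friends, not the whole data set.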
BASE Properties
• The BASE properties of a database management system are a set of principles that
guide the design and operation of modern databases.
• The acronym BASE stands for Basically Available, Soft State, and Eventual
Consistency.
1) Basically Available:
• This property refers to the fact that the database system should always be
available to respond to user requests, even if it cannot guarantee
immediate access to all data.
• The database may experience brief periods of unavailability, but it should
be designed to minimize downtime and provide quick recovery from
failures.
2) Soft State
• This property refers to the fact that the state of the database can change
over time, even without any explicit user intervention.
• This can happen due to the effects of background processes, updates
to data, and other factors.
• The database should be designed to handle this change gracefully, and
ensure that it does not lead to data corruption or loss.
3) Eventual Consistency
• This property means that the data is allowed to be temporarily inconsistent across
nodes after an update.
• The database should eventually converge to a consistent state, even if it takes
some time for all updates to propagate and be reflected in the data.
• This is in contrast to the immediate consistency required by traditional
ACID-compliant databases.
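Eventual consistency can be pictured with a small in-memory model in which a write lands on one replica immediately and reaches the others only when a background step runs. The class EventuallyConsistentStore and its methods are hypothetical, chosen only for this illustration.

```javascript
// Hypothetical model: a write is applied to replica 0 at once and queued for the rest.
class EventuallyConsistentStore {
  constructor(replicaCount) {
    this.replicas = Array.from({ length: replicaCount }, () => ({}));
    this.pending = []; // updates not yet applied to every replica
  }

  write(key, value) {
    this.replicas[0][key] = value;
    for (let i = 1; i < this.replicas.length; i++) {
      this.pending.push({ replica: i, key, value });
    }
  }

  read(replicaIndex, key) {
    return this.replicas[replicaIndex][key];
  }

  // Stand-in for the background propagation of updates.
  propagate() {
    for (const update of this.pending) {
      this.replicas[update.replica][update.key] = update.value;
    }
    this.pending = [];
  }
}

const store = new EventuallyConsistentStore(3);
store.write("x", 10);
const beforeSync = store.read(2, "x"); // undefined: replica 2 has not seen the write yet
store.propagate();
const afterSync = store.read(2, "x"); // 10: all replicas have converged
```

Between write() and propagate() the replicas disagree (the "soft state"); once propagation finishes, every replica returns the same value.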
MongoDB Database
• MongoDB is an open-source, cross-platform, and distributed document-based
database designed for ease of application development and scaling.
• It is classified as a "NoSQL" database.
• Unlike SQL-based databases, it does not normalize data into schemas and tables in
which every table has a fixed structure.
• Instead, it stores data in collections as JSON-like documents and does not enforce a
schema by default.
• JavaScript Object Notation (JSON) is an open data interchange format that is both
human-readable and machine-readable.
• The following table lists the relation between MongoDB and RDBMS terminologies.
• In the RDBMS database, a table can have multiple rows and columns.
• Similarly in MongoDB, a collection can have multiple documents which are equivalent
to the rows.
• Each document has multiple "fields" which are equivalent to the columns. Documents
in a single collection can have different fields.
• The following is an example of JSON based document.
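As an illustration, the sketch below shows two documents that could live in the same collection even though they have different fields. The collection name "students" and all field names and values are hypothetical; plain numbers are used for _id here for readability, whereas MongoDB generates an ObjectId when no _id is supplied.

```javascript
// Two hypothetical documents from one "students" collection.
// They share some fields (name, branch) but not others (marks, email, skills).
const students = [
  { _id: 1, name: "Asha", branch: "Computer", marks: 82 },
  { _id: 2, name: "Ravi", branch: "IT", email: "ravi@example.com", skills: ["C", "Java"] },
];

// The JSON text form of the second document.
const asJson = JSON.stringify(students[1], null, 2);

// Field names per document, showing that the "columns" differ from row to row.
const fieldSets = students.map((doc) => Object.keys(doc));
```

Each key-value pair corresponds to a field, and a field's value can itself be an array (as with skills) or a nested document.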
2) Document Oriented:
In MongoDB, all data is stored in documents instead of tables as in an RDBMS.
In these documents, the data is stored in fields (key-value pairs) instead of rows and
columns, which makes the data much more flexible than in an RDBMS. Each document also
contains its own unique object id.
4) Replication:
MongoDB provides high availability and redundancy through replication. It creates
multiple copies of the data and places them on different servers, so that if one
server fails, the data can be retrieved from another.
5) High Performance:
MongoDB offers high performance and data persistence compared with other databases,
owing to features such as scalability, indexing, and replication.
MongoDB:CRUD Operations
• MongoDB provides a set of basic but essential operations for interacting with the
MongoDB server. These are known as CRUD operations.
• You can perform these operations using the following methods provided by MongoDB:
Method: Description
db.collection.updateMany(): Updates multiple documents in the collection that satisfy
the given criteria.
db.collection.replaceOne(): Replaces a single document in the collection that satisfies
the given criteria.
db.collection.deleteMany(): Deletes multiple documents from the collection that satisfy
the given criteria.
• .pretty(): This method formats the result so that it is easy to read.
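A minimal sketch of how these calls fit together is given below. It is written as a function that receives the db handle that mongosh provides, so it can be pasted into a shell session and called as crudDemo(db). The function name crudDemo is hypothetical, and the "Customer" collection with CustID, Amount and Status fields is assumed purely for illustration.

```javascript
// Sketch only: `db` is the database handle available inside mongosh.
// The collection name and field names are illustrative assumptions.
function crudDemo(db) {
  const customers = db.getCollection("Customer");

  // Create: insert several documents at once.
  customers.insertMany([
    { CustID: "A123", Amount: 500, Status: "pending" },
    { CustID: "A124", Amount: 200, Status: "pending" },
  ]);

  // Read: find the documents that match a filter.
  const pending = customers.find({ Status: "pending" });

  // Update: change a field in every document that matches the filter.
  customers.updateMany({ Status: "pending" }, { $set: { Status: "completed" } });

  // Replace: swap one matching document for a whole new document.
  customers.replaceOne(
    { CustID: "A124" },
    { CustID: "A124", Amount: 250, Status: "completed" }
  );

  // Delete: remove every document that matches the filter.
  customers.deleteMany({ Status: "completed" });

  return pending;
}
```

Note the difference between updateMany(), which modifies selected fields through an operator such as $set, and replaceOne(), which substitutes the entire document apart from its _id.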
MongoDB: Indexing
• Indexes support the efficient resolution of queries.
• Without indexes, MongoDB must scan every document of a collection to select
those documents that match the query statement. This scan is highly inefficient
and requires MongoDB to process a large volume of data.
• Indexes are special data structures that store a small portion of the data set in
an easy-to-traverse form.
• The index stores the value of a specific field or set of fields, ordered by the
value of the field as specified in the index.
• A database index is a way to organize information so that the database engine can
quickly find the relevant results.
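The difference between a full scan and an index lookup can be sketched in plain JavaScript. This is a simplified stand-in: MongoDB's own indexes are B-tree structures, whereas the sketch uses a sorted array with binary search, and the names scan, buildIndex and indexLookup are hypothetical.

```javascript
// 1024 hypothetical documents with titles "title-0000" .. "title-1023".
const collection = Array.from({ length: 1024 }, (_, i) => ({
  _id: i,
  title: "title-" + String(i).padStart(4, "0"),
}));

// Collection scan: examine documents one by one until a match is found.
function scan(docs, title) {
  let examined = 0;
  for (const doc of docs) {
    examined++;
    if (doc.title === title) return { doc, examined };
  }
  return { doc: null, examined };
}

// Index: (field value, position) pairs kept in ascending order of the field value.
function buildIndex(docs, field) {
  return docs
    .map((doc, pos) => ({ key: doc[field], pos }))
    .sort((x, y) => (x.key < y.key ? -1 : x.key > y.key ? 1 : 0));
}

// Index lookup: binary search over the ordered keys, then fetch the document.
function indexLookup(docs, index, key) {
  let lo = 0;
  let hi = index.length - 1;
  let examined = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    examined++;
    if (index[mid].key === key) return { doc: docs[index[mid].pos], examined };
    if (index[mid].key < key) lo = mid + 1;
    else hi = mid - 1;
  }
  return { doc: null, examined };
}

const titleIndex = buildIndex(collection, "title");
const scanResult = scan(collection, "title-1023");
const indexResult = indexLookup(collection, titleIndex, "title-1023");
```

Here the scan has to examine all 1024 documents to reach the last title, while the ordered index needs only about log2(1024) comparisons to find the same document.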
1) The createIndex() Method:
• To create an index, you need to use createIndex() method of MongoDB.
Syntax:
>db.COLLECTION_NAME.createIndex({KEY:1})
Here key is the name of the field on which you want to create index and 1 is
for ascending order. To create index in descending order you need to use -1.
Example:
>db.mycol.createIndex({"title":1})
• In createIndex() method you can pass multiple fields, to create index on multiple
fields.
Example: >db.mycol.createIndex({"title":1,"description":-1})
2. Display Index:
The getIndexes() method:
• This method returns the description of all the indexes in the collection.
Syntax: db.COLLECTION_NAME.getIndexes()
Example: > db.mycol.createIndex({"title":1,"description":-1})
> db.mycol.getIndexes()
3. Drop Index:
The dropIndex() method:
You can drop a particular index using the dropIndex() method of MongoDB.
Syntax: >db.COLLECTION_NAME.dropIndex({KEY:1})
Example: > db.mycol.dropIndex({"title":1})
The dropIndexes() method:
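• Called without arguments, this method drops all indexes on a collection except the
index on _id. It can also be given an index name, an array of index names, or an index
specification document to drop only the specified indexes.
Syntax: >db.COLLECTION_NAME.dropIndexes()
Example: >db.mycol.dropIndexes({"title":1,"description":-1})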
Sorting in descending order:
Example: db.Customer.find().sort({'CustID': -1})
MongoDB: Aggregation
• Aggregation operations process data records and return computed results.
• Aggregation operations group values from multiple documents together and can
perform a variety of operations on the grouped data to return a single result.
• One of the most common use cases of aggregation is to calculate aggregate values
for groups of documents.
• This is similar to the basic aggregation available in SQL with the GROUP BY clause
and the COUNT, SUM and AVG functions.
• Aggregation operations allow you to group, sort, perform calculations, analyze data,
and much more.
The aggregate() method is used to perform aggregation. A pipeline is built from three
kinds of elements: stages, expressions and accumulators.
Stages: Each stage begins with a stage operator, such as:
$match: Filters the documents, which can reduce the number of documents passed to
the next stage.
$project: Selects specific fields from the documents.
$group: Groups documents based on some value.
$sort: Sorts (reorders) the documents.
$skip: Skips the first n documents and passes on the remaining documents.
$limit: Passes on only the first n documents.
$out: Writes the resulting documents to a new collection.
Expressions: An expression refers to the name of a field in the input documents, e.g.
{ $group : { _id : "$id", total : { $sum : "$fare" } } }. Here $id and $fare are expressions.
1. Find the customer whose status is "completed"
customer> db.Customer.aggregate([{$match:{'Status':'completed'}}])
[
{
_id: ObjectId("635a9a5d29e81711afa59d0b"),CustID: 'A123',
Amount: 500, Status: 'completed'
},
{
_id: ObjectId("635a9a7c29e81711afa59d0c"),CustID: 'A124',
Amount: 200, Status: 'completed'
}
]
2. Find sum of amount of customer under status "completed"
Example: customer>db.Customer.aggregate([{$match:{'Status':'completed'}},{$group:
{_id:'$CustID','totalAmount':{$sum:'$Amount'}}}])
Output:
[
{ _id: 'A123', totalAmount: 500 },
{ _id: 'A124', totalAmount: 200 }
]
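To make the two stages in this pipeline concrete, the sketch below reproduces $match followed by $group with $sum on an in-memory array. It only imitates what the stages compute and is not how MongoDB executes a pipeline; matchStage and groupSumStage are hypothetical helpers, and the third sample document (A125) is an assumption added so that the filter has something to exclude.

```javascript
// Sample documents: the first two mirror the output above, the third is hypothetical.
const customerDocs = [
  { CustID: "A123", Amount: 500, Status: "completed" },
  { CustID: "A124", Amount: 200, Status: "completed" },
  { CustID: "A125", Amount: 300, Status: "pending" },
];

// Imitates $match: keep documents whose fields equal every value in `condition`.
function matchStage(docs, condition) {
  return docs.filter((doc) =>
    Object.entries(condition).every(([field, value]) => doc[field] === value)
  );
}

// Imitates $group with a $sum accumulator: one output document per distinct key.
function groupSumStage(docs, keyField, sumField) {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc[keyField], (totals.get(doc[keyField]) || 0) + doc[sumField]);
  }
  return [...totals].map(([_id, totalAmount]) => ({ _id, totalAmount }));
}

// Stages run in order: the output of $match is the input of $group.
const matched = matchStage(customerDocs, { Status: "completed" });
const grouped = groupSumStage(matched, "CustID", "Amount");
// grouped: [ { _id: "A123", totalAmount: 500 }, { _id: "A124", totalAmount: 200 } ]
```

Filtering first means the grouping step only sees the "completed" documents, which is why placing $match early in a pipeline reduces the work done by later stages.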
MongoDB: Replication
• In MongoDB, replication ensures that the same data is accessible on more than one
MongoDB server.
• Replication also helps protect a database against the loss of a particular server.
• With replication, data can be recovered after a hardware failure or service
interruption.
Advantages:
• Helps in disaster recovery and backup of data.
• Data can be kept safe through this redundant backup approach.
• Minimizes downtime for maintenance.
Disadvantages:
• More space required.
How Replication Works in MongoDB:
• MongoDB makes use of a replica set to achieve replication.
• A replica set is a group of mongod instances that host the same data set.
• There is only one primary node in a replica set.
• A minimum of three nodes is recommended for a replica set.
• MongoDB treats one node of the replica set as the primary node and the remaining
nodes as secondary nodes.
• Data is replicated from the primary node to the secondary nodes.
• A new primary node is elected automatically during failover or maintenance.
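The primary/secondary flow and failover can be sketched with a small in-memory model. It is deliberately simplified: real MongoDB replication is asynchronous and elections are decided by a majority vote among members, whereas this sketch copies writes immediately and simply promotes the first healthy member. The class ReplicaSetSketch and the node names are hypothetical.

```javascript
// Hypothetical model of a three-member replica set.
class ReplicaSetSketch {
  constructor(names) {
    this.members = names.map((name) => ({ name, up: true, data: [] }));
    this.primary = this.members[0]; // one primary, the rest are secondaries
  }

  // All writes go to the primary and are then copied to healthy secondaries.
  write(doc) {
    if (!this.primary.up) throw new Error("no primary available");
    this.primary.data.push(doc);
    for (const member of this.members) {
      if (member !== this.primary && member.up) member.data.push(doc);
    }
  }

  // Simulate the primary going down and a secondary taking over.
  failPrimary() {
    this.primary.up = false;
    const candidate = this.members.find((member) => member.up);
    if (!candidate) throw new Error("no healthy member left to promote");
    this.primary = candidate; // simplification: no voting takes place
  }
}

const rs = new ReplicaSetSketch(["node1", "node2", "node3"]);
rs.write({ x: 10 });
rs.failPrimary();
const newPrimaryName = rs.primary.name; // "node2"
rs.write({ x: 11 });
```

After the failover the data written earlier is still available on the new primary, and new writes continue to reach the remaining secondary.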
MongoDB: Sharding
• Sharding is a method for distributing or partitioning data across multiple
machines.
• MongoDB uses sharding to support deployments with very large data sets and high
throughput operations.
There are two methods for addressing system growth: vertical and horizontal
scaling.
Sharded Cluster:
A MongoDB sharded cluster consists of the following components:
shard: Each shard contains a subset of the sharded data. Each shard can be deployed as
a replica set.
mongos: The mongos acts as a query router, providing an interface between client
applications and the sharded cluster. Starting in MongoDB 4.4, mongos can support
hedged reads to minimize latencies.
config servers: Config servers store metadata and configuration settings for the
cluster.
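The routing role of mongos can be illustrated with a toy router that decides which shard a document belongs to based on a shard key. This is a sketch under strong assumptions: MongoDB distributes data by ranges of a shard key (or of its hash) and tracks them in the config servers, while the code below uses a simple hash-modulo rule. The functions hashKey and routeToShard, the choice of CustID as shard key, and the sample documents are all hypothetical.

```javascript
// Simple string hash (illustrative only, not MongoDB's hashing function).
function hashKey(key) {
  let h = 0;
  for (const ch of String(key)) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0; // keep the value within 32 bits
  }
  return h;
}

// Pick a shard number in the range 0 .. shardCount - 1 for a shard key value.
function routeToShard(shardKeyValue, shardCount) {
  return hashKey(shardKeyValue) % shardCount;
}

// Hypothetical documents, distributed over three shards by CustID.
const shardDocs = [
  { CustID: "A123", Amount: 500 },
  { CustID: "A124", Amount: 200 },
  { CustID: "A125", Amount: 300 },
  { CustID: "A126", Amount: 150 },
];
const shards = [[], [], []];
for (const doc of shardDocs) {
  shards[routeToShard(doc.CustID, shards.length)].push(doc);
}

// A query on the shard key only needs to visit one shard.
function findByCustID(custID) {
  const target = shards[routeToShard(custID, shards.length)];
  return target.find((doc) => doc.CustID === custID) || null;
}
const found = findByCustID("A125");
```

Each shard holds only a subset of the documents, and a lookup by shard key goes straight to the shard that owns it instead of asking every machine.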