NoSQL - MongoDB
By
Dr T.LOGESWARI
Assistant Professor
Department of BUSINESS ANALYTICS
PSG College of Arts and Science
Coimbatore – 641014.
UNIT -I
Introduction to MongoDB
• Big Databases
• SQL-NoSQL Tradeoffs
• CAP Theorem
• Eventual Consistency NoSQL
• Database Types
MongoDB - Introduction
• Need
• MongoDB Vs RDBMS
• MongoDB Driver Installation
• Configuration
• Import & Export MongoDB Server Configuration
Big Database
• Definition: Databases specifically designed to store and manage large-scale data efficiently.
• Types:
• Relational (e.g., PostgreSQL, MySQL – but limited for very big data)
• NoSQL (e.g., MongoDB, Cassandra – designed for scale)
• Distributed data stores (e.g., Hadoop HDFS, a distributed file system; Amazon Redshift, a distributed data warehouse)
• Focus: How to organize, store, retrieve, and update large volumes of data efficiently and reliably.
• Examples: Google Bigtable, Apache Cassandra, Amazon DynamoDB.
Big Databases are designed for:
• Big databases handle large-scale data that traditional databases cannot manage efficiently. They are designed for:
1. Scalability
• Handle huge volumes of data, from terabytes to
petabytes.
• Support horizontal scaling (adding more machines)
and vertical scaling (upgrading hardware).
2. High Availability & Fault Tolerance
• Ensure data is always accessible, even if some
servers fail.
• Use replication and clustering to avoid data loss
and downtime.
3. Fast Data Access & Query Performance
• Optimize read and write operations for large data.
• Use indexing, partitioning, and caching to ensure
quick responses.
4. Support for Different Data Models
• Manage structured, semi-structured, and even some unstructured data.
• Examples:
• Relational databases (structured)
• Document databases like MongoDB (semi-structured)
• Column stores like Cassandra (structured but flexible)
5. Distributed Architecture
• Data is stored across multiple machines or locations.
• Used in cloud environments or on-premises data centers.
Examples of Big Databases
Type                Example            Use Case
NoSQL Document      MongoDB            Flexible schema, fast inserts
Column Store        Apache Cassandra   High write throughput, IoT data
Wide-column         Google Bigtable    Real-time analytics, time-series data
Distributed RDBMS   Amazon Aurora      High-availability SQL-based apps
Basic Understanding
1. Which big database is a NoSQL Document type?
a) Google Bigtable
b) MongoDB
c) Amazon Aurora
d) Apache Cassandra
2. Fill in the blank:
________ is used for high write throughput and is suitable for IoT data.
3. What type of database is Amazon Aurora?
Answer: __________
4. Which database supports real-time analytics and time-series data?
5. True or False: Apache Cassandra is a wide-column database.
Introduction to NoSQL
• The term “NoSQL” is usually read as “Not Only SQL”.
• NoSQL databases are “non-relational” databases.
• They typically store data without following a fixed schema.
• Many are distributed data stores built for very high storage capacity.
• They are used for Big Data and real-time web applications.
• For example, Twitter, Facebook and Google collect several terabytes of data about their users every day.
History - NoSQL
• The term was coined in 1998 by Carlo Strozzi to designate his lightweight, open-source relational database, which did not use SQL as its interface.
• It was adopted and popularized by the GAFAM companies, such as Google, Facebook and Amazon, which faced huge volumes of data.
• 2000 - graph database Neo4j
• 2004 - Google Bigtable
• 2005 - CouchDB
• 2007 - Amazon Dynamo
• 2008 - Facebook open-sourced Cassandra, the non-relational database it had built for internal use
Data Store - RDBMS
Data Store - NoSQL
Categories (Data Store) - NoSQL
• Document databases: These databases store data
as semi-structured documents, such as JSON or
XML, and can be queried using document-oriented
query languages.
• Key-value stores: These databases store data as
key-value pairs, and are optimized for simple and
fast read/write operations.
• Column-family stores: These databases store
data as column families, which are sets of columns
that are treated as a single entity. They are
optimized for fast and efficient querying of large
amounts of data.
• Graph databases: These databases store data as
nodes and edges, and are designed to handle
complex relationships between data.
Categories - NoSQL
Data Store - Key Value
True Key-Value Store Format:
In a Key-Value Store, each entry is a key and a value, like this:
• Key → Value
• "user:111" → {FName: "ABC", LName:
"XYZ", City: "Bangalore", ...}
• "order:200" → {UserId: 111, Product:
"Mobile", Amount: 1000}
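The Key → Value format above can be sketched with a plain Python dictionary. This is a minimal in-memory illustration, assuming hypothetical `put` and `get` helpers; a real key-value database adds persistence, replication and distribution around the same idea. It stores the two entries shown above.

```python
from typing import Any, Dict, Optional

# Hypothetical in-memory key-value store: each key maps to one opaque value.
store: Dict[str, Any] = {}

def put(key: str, value: Any) -> None:
    """Insert or overwrite the value stored under `key`."""
    store[key] = value

def get(key: str) -> Optional[Any]:
    """Return the value for `key`, or None if the key is absent."""
    return store.get(key)

put("user:111", {"FName": "ABC", "LName": "XYZ", "City": "Bangalore"})
put("order:200", {"UserId": 111, "Product": "Mobile", "Amount": 1000})

# Lookups are by exact key only; the store does not search inside values.
print(get("order:200")["Product"])  # Mobile
print(get("user:999"))              # None
```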
Data Store - Document
• Analogy:
• “A Document Store is like a folder full of individual Word
files (documents), each with different structures. You
can open and read any one without needing to follow a
strict format.”
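Following the folder analogy, a document store keeps self-describing records that need not share the same fields. The Python sketch below, assuming a hypothetical `find` helper and made-up student data, keeps such documents in one list and filters them by field values. It is only loosely in the spirit of a document-database query and is not MongoDB's API.

```python
from typing import Any, Dict, List

# Hypothetical "collection": a list of documents (dicts) whose fields may differ.
students: List[Dict[str, Any]] = [
    {"_id": 1, "name": "Asha", "dept": "Analytics"},
    {"_id": 2, "name": "Ravi", "dept": "Analytics", "skills": ["SQL", "Python"]},
    {"_id": 3, "name": "Meena", "city": "Coimbatore"},  # no "dept" field at all
]

def find(collection: List[Dict[str, Any]], query: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return documents whose fields equal every key/value pair in `query`.

    A document that lacks a queried field does not match.
    """
    return [
        doc for doc in collection
        if all(field in doc and doc[field] == value for field, value in query.items())
    ]

print([d["name"] for d in find(students, {"dept": "Analytics"})])  # ['Asha', 'Ravi']
print(find(students, {"city": "Coimbatore"})[0]["_id"])            # 3
```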
Data Store - Column
• What is a Wide Column Store?
• A wide column store stores data in rows and columns, but columns are grouped into column families, and each row can have a different set of columns.
• 🔸 Example Systems: Apache Cassandra, Apache HBase
• Think of it like a shelf of boxes, where each box is a row, and inside each box you can have different sets of labeled folders (column families) depending on what that row needs.
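The shelf-of-boxes analogy can be sketched with nested Python dictionaries: row key → column family → columns. This is a simplified in-memory model with hypothetical table, column and helper names; real wide column stores such as Cassandra or HBase also sort, persist and distribute rows across nodes.

```python
from typing import Dict

# Hypothetical layout: row key -> column family -> {column name: value}.
Table = Dict[str, Dict[str, Dict[str, str]]]

users: Table = {
    "user:111": {
        "profile": {"name": "ABC", "city": "Bangalore"},
        "contact": {"email": "abc@example.com"},
    },
    "user:112": {
        "profile": {"name": "DEF"},        # this row stores fewer profile columns
        "contact": {"phone": "555-0100"},  # and different contact columns
    },
}

def read_family(table: Table, row_key: str, family: str) -> Dict[str, str]:
    """Return one column family of one row, or an empty dict if it is absent."""
    return table.get(row_key, {}).get(family, {})

print(read_family(users, "user:112", "contact"))  # {'phone': '555-0100'}
print(read_family(users, "user:999", "profile"))  # {}
```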
Data Store - Graph
SQL - NoSQL Tradeoff
CAP Theorem
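• The CAP theorem states that a distributed data store cannot guarantee all three of Consistency (every read sees the latest write), Availability (every request receives a non-error response) and Partition tolerance (the system keeps working when messages between nodes are lost) at the same time; during a network partition it must choose between consistency and availability.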
• CA (Consistency and Availability):
• CA systems prioritize Consistency and Availability, but are not Partition Tolerant.
• CA is achievable only when network partitions cannot occur (e.g., a database running on a single server), so it is rarely an option for modern distributed databases.
• Example databases: traditional single-node RDBMS such as MySQL or PostgreSQL.
• AP (Availability and Partition Tolerance):
• The system prioritizes availability over consistency and can respond with possibly stale data.
• The system can be distributed across multiple nodes and is designed to operate reliably even in the face of network partitions.
• Example databases: Apache Cassandra, CouchDB, Riak, Voldemort, Amazon DynamoDB.
• CP (Consistency and Partition Tolerance):
• The system prioritizes consistency over availability and responds with the latest updated data (or refuses the request).
• The system can be distributed across multiple nodes and is designed to operate reliably even in the face of network partitions.
• Example databases: Apache HBase, MongoDB, Google Cloud Spanner.
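The AP/CP choice can be made concrete with a toy two-replica cluster. The sketch below is hypothetical (class and method names are invented for illustration) and ignores quorums, leader election and recovery; it only shows what each mode does when a partition cuts replica A off from replica B.

```python
from typing import Dict, Optional

class TwoNodeCluster:
    """Hypothetical two-replica store illustrating the CP vs AP choice.

    mode="CP": refuse work that cannot reach both replicas during a partition.
    mode="AP": keep answering from the local replica, even if it may be stale.
    """

    def __init__(self, mode: str) -> None:
        self.mode = mode
        self.replica_a: Dict[str, int] = {}  # receives writes
        self.replica_b: Dict[str, int] = {}  # serves reads
        self.partitioned = False             # True = A and B cannot talk

    def write(self, key: str, value: int) -> bool:
        """Write via replica A; return False if the write is rejected."""
        if self.partitioned and self.mode == "CP":
            return False                     # cannot replicate, so stay consistent
        self.replica_a[key] = value
        if not self.partitioned:
            self.replica_b[key] = value      # replicate while the link is up
        return True

    def read(self, key: str) -> Optional[int]:
        """Read via replica B; CP mode refuses rather than risk a stale answer."""
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable during partition")
        return self.replica_b.get(key)


for mode in ("CP", "AP"):
    cluster = TwoNodeCluster(mode)
    cluster.write("x", 1)          # replicated to both nodes
    cluster.partitioned = True     # the network splits A from B
    accepted = cluster.write("x", 2)
    try:
        answer = cluster.read("x")
    except RuntimeError as err:
        answer = str(err)
    print(mode, accepted, answer)
# CP False unavailable during partition
# AP True 1
```

In AP mode the read returns 1 although the latest accepted write was 2: that is the stale data an AP system may serve during a partition.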
Database Types
Hierarchical Databases
Data is organized in a tree of levels (ranks): each child record has exactly one parent, and records are linked through these parent-child relationships.
EG: IBM IMS
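A hierarchical (tree) model can be sketched in Python with records that hold lists of children, so navigation always runs from the root down through parent-child links. The record names and helpers below are hypothetical examples, not IMS syntax.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Record:
    """A record in a hierarchical model: one parent, any number of children."""
    name: str
    children: List["Record"] = field(default_factory=list)

    def add(self, child: "Record") -> "Record":
        """Attach `child` under this record and return it for chaining."""
        self.children.append(child)
        return child

def path_to(node: Record, target: str) -> List[str]:
    """Return the root-to-target path by walking parent -> child links only."""
    if node.name == target:
        return [node.name]
    for child in node.children:
        sub_path = path_to(child, target)
        if sub_path:
            return [node.name] + sub_path
    return []  # target not found under this node

college = Record("College")
dept = college.add(Record("Department"))
course = dept.add(Record("Course"))
course.add(Record("Student"))

print(" -> ".join(path_to(college, "Student")))
# College -> Department -> Course -> Student
```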
Network Databases
Records are linked in a network (graph) rather than a strict tree, so a record can have more than one parent. For example, the Student, Faculty and Resources records each have two parent records: Departments and Clubs.
EG: Integrated Data Store
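The two-parent example above can be sketched in Python by storing, for each member record, the list of its owner (parent) records. Names and helpers here are hypothetical; network databases such as Integrated Data Store navigated such links through stored record pointers, not Python dictionaries.

```python
from collections import defaultdict
from typing import DefaultDict, List

# Hypothetical network-model links: a member record may have several owners (parents).
owners: DefaultDict[str, List[str]] = defaultdict(list)

def link(owner: str, member: str) -> None:
    """Record an owner -> member relationship."""
    owners[member].append(owner)

for member in ("Student", "Faculty", "Resources"):
    link("Departments", member)
    link("Clubs", member)

def members_of(owner: str) -> List[str]:
    """Navigate from an owner record to all of its member records."""
    return [m for m, parents in owners.items() if owner in parents]

print(owners["Student"])    # ['Departments', 'Clubs']
print(members_of("Clubs"))  # ['Student', 'Faculty', 'Resources']
```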
Object-Oriented Databases
Information is stored as objects, each an instance of a class in the database model, so an object can be referenced and retrieved directly.
• Objects are linked to one another through references and methods; for example, one can get the address of a Person (represented by a Person object) using the livesAt() method. Eg: ObjectDB
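The livesAt() example can be sketched in Python: a Person object holds a direct reference to an Address object, and the method simply follows that reference. The classes and data are hypothetical; an object database such as ObjectDB would also persist these objects, which this in-memory sketch does not do.

```python
from dataclasses import dataclass

@dataclass
class Address:
    street: str
    city: str

@dataclass
class Person:
    name: str
    address: Address  # a direct reference to another object, not a foreign key

    def livesAt(self) -> Address:  # camelCase kept to match the method name in the text
        """Return the Address object this Person is linked to."""
        return self.address

home = Address("12 Main Street", "Coimbatore")
p = Person("ABC", home)
print(p.livesAt().city)  # Coimbatore
```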
Relational Databases
In this database, data is organized into tables (relations) of rows and columns. Every row has a unique identity, typically a primary key, and tables are related to one another through matching key values (foreign keys).
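Key-based relationships between tables can be tried with Python's built-in `sqlite3` module and an in-memory database. The table names and rows are hypothetical (they reuse the user and order values from the key-value slide); the point is that each row has a primary key and rows in different tables are related by matching key values in a JOIN.

```python
import sqlite3

# In-memory SQLite database; the tables and data below are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,   -- unique identity of each row
        name    TEXT NOT NULL,
        city    TEXT
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL REFERENCES users(user_id),  -- relationship via key
        product  TEXT,
        amount   INTEGER
    );
    INSERT INTO users  VALUES (111, 'ABC', 'Bangalore');
    INSERT INTO orders VALUES (200, 111, 'Mobile', 1000);
""")

# The relationship is used at query time by joining on matching key values.
rows = conn.execute("""
    SELECT u.name, o.product, o.amount
    FROM orders AS o JOIN users AS u ON u.user_id = o.user_id
""").fetchall()
print(rows)  # [('ABC', 'Mobile', 1000)]
conn.close()
```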
Cloud Databases
• A cloud database is built, deployed and accessed on a cloud platform, where it runs in a virtualized environment. Cloud providers offer many services for accessing the data in these databases (e.g., SaaS, PaaS).
• Examples of cloud platforms:
• Amazon Web Services (AWS)
• Google Cloud Platform (GCP)
• Microsoft Azure
Centralized Databases
• A centralized database is stored, located and maintained at a single location. Because all the data is in one place, it is easier to secure when users fetch data from it.
Advantages
• Data Security
• Reduced Redundancy
• Consistency
Disadvantages
• Because all data is held in one place, the database grows large, which increases response and retrieval time.
• Modifying, deleting and updating data is not easy.
Distributed Databases
• A type of database which
consists of multiple
databases that are
connected with each other
and are spread across
different physical
locations.
• Typically, distributed
databases operate on two
or more interconnected
servers on a computer
network.
• EG: Apache Ignite, Apache
Cassandra, Apache HBase,
Amazon SimpleDB
NoSQL Databases
Advantages of NoSQL
• There are many advantages of working
with NoSQL databases such as MongoDB
and Cassandra. The main advantages are
high scalability and high availability.
Disadvantages of NoSQL
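• NoSQL has the following disadvantages:
• No common standard: many NoSQL databases are open-source projects that follow no shared standard, so two systems are likely to differ significantly.
• GUI tools are limited or less mature for some NoSQL databases.
• Backup is a weak point for some NoSQL databases like MongoDB.
• Large document size: JSON-like documents repeat field names in every document, which increases storage and network overhead.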
MongoDB - Introduction
• MongoDB is one of the most popular open-source NoSQL databases and is written in C++.
• As of February 2015, MongoDB was the fourth most popular database management system.
• It was developed by the company 10gen, which is now known as MongoDB Inc.
• MongoDB is a document-oriented database that stores data in JSON-like documents with a dynamic schema.
• This means you can store records without fixing the data structure in advance, such as the number of fields or the types of values they hold.
• MongoDB documents are similar to JSON objects.
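Because MongoDB documents resemble JSON objects, the idea of a dynamic schema can be illustrated with Python's standard `json` module. This sketch does not talk to MongoDB (which stores documents internally as BSON); it only parses two hypothetical documents meant for the same collection and shows that each carries its own set of fields, including an array and an embedded sub-document.

```python
import json

# Two hypothetical documents for one collection: the second adds fields
# (an array and an embedded sub-document) that the first does not have.
docs_json = """
[
  {"_id": 1, "name": "ABC", "city": "Bangalore"},
  {"_id": 2, "name": "XYZ", "phones": ["555-0100", "555-0101"],
   "address": {"street": "12 Main Street", "city": "Coimbatore"}}
]
"""

docs = json.loads(docs_json)
for doc in docs:
    # Each document reports its own set of fields; no table-wide schema exists.
    print(doc["_id"], sorted(doc.keys()))
# 1 ['_id', 'city', 'name']
# 2 ['_id', 'address', 'name', 'phones']
```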
MongoDB - History
• Dwight Merriman and Eliot Horowitz encountered development and scalability issues with traditional relational database approaches.
• MongoDB is a document-based database developed in the C++ programming language.
• The name Mongo is derived from the word “humongous”. MongoDB was first developed in 2007 by 10gen, a New York-based organization.
• 10gen later changed its name to MongoDB Inc., the name it uses today.
• At the beginning, MongoDB was developed as part of a PaaS (Platform as a Service) product.
• In 2009, it was released as an open-source database, MongoDB 1.0.
• The diagram below shows the release history of MongoDB.
• MongoDB 4.0 was released in June 2018; newer releases have followed (the installation steps in this unit use MongoDB 7.0).
MongoDB - Need
• Flexibility:
MongoDB uses documents that can contain sub-documents in complex hierarchies, making it expressive and flexible.
MongoDB drivers map documents to objects in many programming languages, which eases implementation and maintenance.
• Flexible Query Model:
Users can selectively index parts of each document and query by regular expressions, ranges, or attribute values, with as many properties per object as the application layer needs.
• Native Aggregation:
• Native aggregation allows users to extract
and transform data from the database.
• The data can either be loaded into a new
format or exported to other data sources.
• Schema-less model:
• Applications get the power and
responsibility to interpret different
properties found in a collection's
documents.
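Native aggregation in MongoDB is expressed as a pipeline of stages such as `$match` and `$group`. The Python sketch below only imitates those two stages over an in-memory list, assuming hypothetical sample orders and helper names; in MongoDB the equivalent pipeline (shown in the comment) runs inside the server. MongoDB does not guarantee the order of `$group` output, whereas this sketch returns groups in first-seen order.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical sample orders.
orders: List[Dict[str, Any]] = [
    {"city": "Coimbatore", "amount": 1000, "status": "paid"},
    {"city": "Chennai",    "amount": 500,  "status": "paid"},
    {"city": "Coimbatore", "amount": 250,  "status": "paid"},
    {"city": "Chennai",    "amount": 700,  "status": "cancelled"},
]

# Plain-Python analogue of the MongoDB pipeline
#   [{ $match: { status: "paid" } },
#    { $group: { _id: "$city", total: { $sum: "$amount" } } }]
def match(docs: List[Dict[str, Any]], cond: Dict[str, Any]) -> List[Dict[str, Any]]:
    """$match-like stage: keep documents whose fields equal every value in cond."""
    return [d for d in docs if all(d.get(k) == v for k, v in cond.items())]

def group_sum(docs: List[Dict[str, Any]], key: str, field: str) -> List[Dict[str, Any]]:
    """$group/$sum-like stage: total `field` per distinct value of `key`."""
    totals: Dict[Any, int] = defaultdict(int)
    for d in docs:
        totals[d[key]] += d[field]
    return [{"_id": k, "total": v} for k, v in totals.items()]

result = group_sum(match(orders, {"status": "paid"}), "city", "amount")
print(result)  # [{'_id': 'Coimbatore', 'total': 1250}, {'_id': 'Chennai', 'total': 500}]
```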
1. General-Purpose Database:
MongoDB can serve diverse sets of data and multiple purposes within a single application.
2. Flexible Schema Design:
The document-oriented approach allows attributes that were not defined in advance to be added or modified on the fly.
This is a key contrast between MongoDB and relational databases.
3. Load Balancing and Scalability:
It is built to scale, both vertically and horizontally. Using sharding, an architect can achieve both write and read scalability.
Data balancing occurs automatically and transparently to the application.
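Sharding spreads documents across servers according to a shard key. The sketch below routes hypothetical user ids to three hypothetical shards with a stable hash and a modulo rule. This is a simplification, not MongoDB's algorithm: MongoDB groups shard-key values (or their hashes) into ranges called chunks, and its balancer migrates chunks between shards. A plain modulo rule would relocate most keys whenever the number of shards changed, which is one reason real systems use ranges.

```python
import hashlib
from collections import Counter
from typing import List

SHARDS: List[str] = ["shard0", "shard1", "shard2"]  # hypothetical shard names

def route(shard_key_value: str) -> str:
    """Pick a shard by hashing the shard-key value (simplified hashed sharding).

    A stable hash (MD5 here) gives every router the same answer; Python's
    built-in hash() for str is randomized per process, so it is avoided.
    """
    digest = hashlib.md5(shard_key_value.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Spread 1,000 hypothetical user ids across the shards.
placement = Counter(route(f"user:{i}") for i in range(1000))
print(sorted(placement.items()))
```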
4. Aggregation Framework:
MongoDB's aggregation framework performs Extract, Transform, Load (ETL)-style processing inside the database, which reduces the need for complex external data pipelines.
5. Native Replication:
Data is replicated across a replica set without a complicated setup.
6. Security Features:
MongoDB supports authentication and role-based authorization.
7. JSON:
JSON is widely used for front-end and API communication, so it makes sense for the database to use the same format (MongoDB stores documents as BSON, a binary JSON-like format).
8. MapReduce:
MongoDB offers MapReduce for building batch-processing data pipelines. Starting with MongoDB 5.0, map-reduce is deprecated in favour of the aggregation pipeline.
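The map → shuffle → reduce flow behind MapReduce can be sketched in plain Python. MongoDB's own mapReduce command takes JavaScript map and reduce functions; the Python functions and order data below are hypothetical stand-ins that only show the idea of emitting key/value pairs, grouping them by key and reducing each group.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input documents.
orders = [
    {"product": "Mobile", "amount": 1000},
    {"product": "Laptop", "amount": 5000},
    {"product": "Mobile", "amount": 1200},
]

def map_phase(docs: Iterable[dict]) -> Iterable[Tuple[str, int]]:
    """Emit one (key, value) pair per document, like emit(this.product, this.amount)."""
    for doc in docs:
        yield doc["product"], doc["amount"]

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key: str, values: List[int]) -> int:
    """Combine all values emitted for one key into a single result."""
    return sum(values)

totals = {key: reduce_phase(key, values)
          for key, values in shuffle(map_phase(orders)).items()}
print(totals)  # {'Mobile': 2200, 'Laptop': 5000}
```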
Install MongoDB on Windows
• MongoDB 4.4 and later only support 64-bit versions of
Windows.
• MongoDB 7.0 Community Edition supports the following
64-bit versions of Windows on x86_64 architecture:
• Windows Server 2022
• Windows Server 2019
• Windows 11
• Step 1: Go to the MongoDB Download Center
to download the MongoDB Community Server.
• Step 2: When the download is complete, open the .msi file and click Next on the start-up screen.
• Step 3: Accept the End-User License Agreement and click Next.
• Step 4: Select the Complete setup type to install all the program features.
• Step 5: Select “Run service as Network Service user” and copy the path of the data directory. Click Next.
• Step 6: Click Install to start the MongoDB installation process.
• Step 7: After you click Install, the installation of MongoDB begins.
• Step 8: Click Finish to complete the MongoDB installation process.