
Unit-5 NoSQL Data Management-Big Data

The document provides information about NoSQL databases. It defines NoSQL as a non-relational data management system that does not require a fixed schema. It explains that NoSQL databases are used for big data and real-time web applications to store large volumes of data. The document then discusses some of the key features of NoSQL databases, including being non-relational, schema-free, having simple APIs, and being distributed. It also covers the different types of NoSQL databases.

Uploaded by

Purnachary Chary
UNIT-5|NoSQL Data Management

Unit-5
NoSQL Data Management

What is NoSQL?
A NoSQL database is a non-relational data management system that does not
require a fixed schema. It avoids joins and is easy to scale. NoSQL databases are
mainly used for distributed data stores with very large data storage needs, such
as Big Data and real-time web applications. For example, companies like Twitter,
Facebook and Google collect terabytes of user data every single day.

NoSQL stands for "Not Only SQL" or "Not SQL." Though a better term would be
"NoREL", NoSQL caught on. Carlo Strozzi introduced the term NoSQL in 1998.

A traditional RDBMS uses SQL syntax to store and retrieve data for further
insights. In contrast, NoSQL encompasses a wide range of database technologies
that can store structured, semi-structured, unstructured and polymorphic data.

Big Data Analytics Notes composed by M.Purnachary MCA V SEM


Department of Informatics| Nizam College,OU

Why NoSQL?
The concept of NoSQL databases became popular with Internet giants like Google,
Facebook, Amazon, etc. who deal with huge volumes of data. The system
response time becomes slow when you use RDBMS for massive volumes of data.

To resolve this problem, we could "scale up" our systems by upgrading our
existing hardware. This process is expensive.

The alternative for this issue is to distribute database load on multiple hosts
whenever the load increases. This method is known as "scaling out."

NoSQL databases are non-relational, so they scale out better than relational
databases, and they are designed with web applications in mind.
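As a rough illustration of scaling out, the sketch below spreads keys across several hosts by hashing each key. The host names and the `shard_for` helper are hypothetical and are not taken from any particular NoSQL product. Real systems often use consistent hashing instead, so that adding a host moves fewer keys.

```python
import hashlib

# Hypothetical host names, for illustration only.
HOSTS = ["db-host-0", "db-host-1", "db-host-2"]

def shard_for(key: str, hosts: list[str]) -> str:
    """Pick the host responsible for a key using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(hosts)
    return hosts[index]

# A given key is always routed to the same host, so load spreads across machines.
placement = {key: shard_for(key, HOSTS) for key in ["user:1", "user:2", "user:3", "user:4"]}
```

With this simple modulo scheme, changing the number of hosts remaps most keys, which is one reason the sketch should not be read as how a production database partitions data.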

Brief History of NoSQL Databases


• 1998- Carlo Strozzi uses the term NoSQL for his lightweight, open-source
relational database
• 2000- Graph database Neo4j is launched
• 2004- Google BigTable is launched
• 2005- CouchDB is launched
• 2007- The research paper on Amazon Dynamo is released
• 2008- Facebook open sources the Cassandra project
• 2009- The term NoSQL is reintroduced

Features of NoSQL
Non-relational

• NoSQL databases never follow the relational model
• Never provide tables with flat fixed-column records
• Work with self-contained aggregates or BLOBs
• Do not require object-relational mapping and data normalization
• No complex features like query languages, query planners, referential
integrity, joins, and ACID

Schema-free

• NoSQL databases are either schema-free or have relaxed schemas
• Do not require any sort of definition of the schema of the data
• Offer heterogeneous structures of data in the same domain

NoSQL is Schema-Free
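The schema-free idea can be sketched in a few lines. Here a "collection" is modeled as a plain Python list of dictionaries, which is an assumption made for illustration and not the API of any real database. The field names are hypothetical.

```python
# A "collection" is modeled as a plain list of dictionaries (an assumption for this sketch).
products = []

# Records in the same collection may carry different fields.
products.append({"id": 1, "name": "Laptop", "ram_gb": 16})
products.append({"id": 2, "name": "T-shirt", "size": "M", "color": "blue"})

# Readers handle missing fields explicitly instead of relying on a fixed schema.
sizes = [p.get("size", "n/a") for p in products]
```

The flexibility comes at a cost: because no schema is enforced by the store, the code that reads the data has to decide what to do when a field is absent.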

Simple API

• Offers easy-to-use interfaces for storing and querying data
• APIs allow low-level data manipulation and selection methods
• Text-based protocols, mostly HTTP REST with JSON
• Mostly no standards-based NoSQL query language
• Web-enabled databases running as internet-facing services

Distributed

• Multiple NoSQL databases can be executed in a distributed fashion
• Offers auto-scaling and fail-over capabilities
• The ACID concept is often sacrificed for scalability and throughput
• Mostly no synchronous replication between distributed nodes; instead,
asynchronous multi-master replication, peer-to-peer replication, or HDFS
replication
• Often provides only eventual consistency
• Shared-nothing architecture, which enables less coordination and higher
distribution

NoSQL is Shared Nothing.

Architecture with NoSQL


In order to understand how to properly architect applications with NoSQL
databases, you must understand the separation of concerns between data
management and data storage. The past era of SQL-based databases attempted
to satisfy both concerns with databases. This is very difficult, and inevitably
applications would take on part of the task of data management, providing
certain validation tasks and adding modeling logic. One of the key concepts of the
NoSQL movement is to have databases focus on the task of high-performance,
scalable data storage, and provide low-level access to a data management layer in
a way that allows data management tasks to be conveniently written in the
programming language of choice, rather than having data management logic
spread across Turing-complete application languages, SQL, and sometimes even
DB-specific stored procedure languages.

Complex Data Structures

One important capability that most NoSQL databases provide is hierarchical
nested structures in data entities. Hierarchical data and data with list-type
structures are easily described with JSON and other formats used by NoSQL
databases, where multiple tables with relations would be necessary in traditional
SQL databases to describe these data structures. Furthermore, JSON (or an
alternative) provides a format that much more closely matches the data
structures of common programming languages, greatly simplifying object mapping.
The ability to easily store object-style structures without impedance mismatch is
a big attraction of NoSQL.
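A minimal sketch of such a nested entity, using Python's standard `json` module. The blog post and its field names are hypothetical. In a relational design the comments and tags would typically live in separate tables linked to the post.

```python
import json

# A hypothetical blog post with its comments nested inside the parent document.
post = {
    "title": "Intro to NoSQL",
    "tags": ["nosql", "big-data"],
    "comments": [
        {"author": "asha", "text": "Helpful overview."},
        {"author": "ravi", "text": "Thanks!"},
    ],
}

# The whole aggregate serializes to JSON and back without a separate mapping layer.
encoded = json.dumps(post)
decoded = json.loads(encoded)
comment_authors = [c["author"] for c in decoded["comments"]]
```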

Nested data structures work elegantly in situations where the
children/substructures are always accessed from within a parent document.
Object-oriented and RDF databases also work well with data structures that are
uni-directional, where one object is accessed from another but not vice versa.
However, if the data entities may need to be individually accessed and updated,
or relations are bi-directional, real relations become necessary. For example, if
we had a database of employees and employers, we could easily envision scenarios
where we would start with an employee and want to find their employer, or start
with an employer and find all their employees. It may also be desirable to
individually update an employee or employer without having to worry about
updating all the related entities.
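The employee and employer scenario can be sketched as follows. Entities are kept as separate records that refer to each other by ID, and a reverse index supports lookups in the other direction. All names, IDs, and helper functions here are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Entities are stored separately and refer to each other by ID (hypothetical data).
employers = {"e1": {"name": "Acme"}, "e2": {"name": "Globex"}}
employees = {
    "p1": {"name": "Asha", "employer_id": "e1"},
    "p2": {"name": "Ravi", "employer_id": "e1"},
    "p3": {"name": "Meena", "employer_id": "e2"},
}

# Reverse index: employer ID -> employee IDs, so lookups work in both directions.
employees_by_employer = defaultdict(list)
for emp_id, emp in employees.items():
    employees_by_employer[emp["employer_id"]].append(emp_id)

def employer_of(emp_id):
    """Start from an employee and find the employer's name."""
    return employers[employees[emp_id]["employer_id"]]["name"]

def employees_of(employer_id):
    """Start from an employer and list the names of its employees."""
    return [employees[e]["name"] for e in employees_by_employer[employer_id]]

# Updating one employee touches only that record, not the employer.
employees["p2"]["name"] = "Ravi K."
```

Had the employees been nested inside each employer document, the first lookup would require scanning every employer, which is the limitation the paragraph above describes.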

In some situations, nested structures can eliminate unnecessary bi-directional
relations and greatly simplify database design, but there are still critical parts of
real applications where relations are essential.

Types of NoSQL Databases


NoSQL databases are mainly categorized into four types: key-value pair,
column-oriented, graph-based and document-oriented. Every category has its own
attributes and limitations. No single type is best for solving all problems.
Users should select the database based on their product needs.

Types of NoSQL Databases:

• Key-value pair based
• Column-oriented
• Graph-based
• Document-oriented

Key Value Pair Based

Data is stored in key/value pairs. This type of database is designed to handle
large amounts of data and heavy load.

Key-value pair storage databases store data as a hash table where each key is
unique, and the value can be JSON, a BLOB (Binary Large Object), a string, etc.

For example, a key-value pair may contain a key like "Website" associated with a
value like "Guru99".

It is one of the most basic types of NoSQL database. This kind of NoSQL database
is used as a collection, dictionary, associative array, etc. Key-value stores help
the developer to store schema-less data. They work well for data such as shopping
cart contents.

Redis, Dynamo, and Riak are some examples of key-value store databases. Dynamo
is described in Amazon's Dynamo paper, and Riak's design is based on that paper.
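The hash-table model described above can be sketched with a toy in-memory store. The `KeyValueStore` class and its method names are hypothetical and do not correspond to the client API of Redis, Dynamo, or Riak. The point is only that the store treats each value as opaque and looks it up by its unique key.

```python
class KeyValueStore:
    """A toy in-memory key-value store; values are opaque to the store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Writing to an existing key replaces its value, so keys stay unique.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Deleting a missing key is a no-op.
        self._data.pop(key, None)

store = KeyValueStore()
store.put("Website", "Guru99")
store.put("cart:42", ["book", "pen"])
```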

Column-based

Column-oriented databases work on columns and are based on the BigTable paper by
Google. Every column is treated separately. Values of a single column are
stored contiguously.


Column based NoSQL database

They deliver high performance on aggregation queries like SUM, COUNT, AVG,
MIN etc., as the data is readily available in a column.

Column-based NoSQL databases are widely used to manage data warehouses,
business intelligence, CRM, and library card catalogs.

HBase, Cassandra, and Hypertable are examples of column-based NoSQL
databases.
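To see why aggregation suits a column layout, the sketch below stores the same hypothetical records row by row and column by column. It uses plain Python lists and is not how HBase or Cassandra lay out data on disk. It only shows that an aggregate needs to read a single column.

```python
# The same three records in a row-oriented layout (hypothetical data).
rows = [
    {"id": 1, "city": "Hyderabad", "amount": 120},
    {"id": 2, "city": "Delhi", "amount": 80},
    {"id": 3, "city": "Hyderabad", "amount": 200},
]

# Column layout: each column's values are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate reads only the one column it needs.
total = sum(columns["amount"])
average = total / len(columns["amount"])
maximum = max(columns["amount"])
```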

Document-Oriented:

A document-oriented NoSQL database stores and retrieves data as a key-value pair,
but the value part is stored as a document. The document is stored in JSON or XML
format. The value is understood by the database and can be queried.
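Because the value is a structured document, a store can filter on fields inside it. The sketch below assumes documents are held in a Python dictionary keyed by ID. The `find` helper is hypothetical and is not the query API of MongoDB or CouchDB.

```python
# Documents keyed by ID; the values are structured, so the store can look inside them.
documents = {
    "u1": {"name": "Asha", "city": "Hyderabad", "skills": ["python", "sql"]},
    "u2": {"name": "Ravi", "city": "Delhi", "skills": ["java"]},
    "u3": {"name": "Meena", "city": "Hyderabad"},
}

def find(docs, **criteria):
    """Return the IDs of documents whose fields equal all given criteria."""
    return [
        doc_id
        for doc_id, doc in docs.items()
        if all(doc.get(field) == value for field, value in criteria.items())
    ]

hyderabad_ids = find(documents, city="Hyderabad")
```

A plain key-value store could only return a document by its ID. Here the query matches on a field inside the value.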

Relational Vs. Document


In a diagram comparing the two models, the relational database on the left is
shown as rows and columns, and the document database on the right has a
structure similar to JSON. For the relational database, you have to know in
advance what columns you have. For a document database, however, data is stored
like a JSON object, and you are not required to define its structure in advance,
which makes it flexible.

The document type is mostly used for CMS systems, blogging platforms, real-time
analytics and e-commerce applications. It should not be used for complex
transactions which require multiple operations or queries against varying
aggregate structures.

Amazon SimpleDB, CouchDB, MongoDB, Riak, and Lotus Notes are popular
document-oriented DBMS systems.

Graph-Based

A graph database stores entities as well as the relations among those entities.
Each entity is stored as a node, and each relationship as an edge. An edge gives a
relationship between nodes. Every node and edge has a unique identifier.

Compared to a relational database, where tables are loosely connected, a graph
database is multi-relational in nature. Traversing relationships is fast because
they are already captured in the database, and there is no need to calculate them.

Graph databases are mostly used for social networks, logistics, and spatial data.

Neo4J, Infinite Graph, OrientDB, and FlockDB are some popular graph-based
databases.
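Traversal over stored edges can be sketched with an adjacency list and a breadth-first search. The tiny social graph and the `reachable_within` helper are hypothetical. Graph databases such as Neo4J expose their own query languages, which this sketch does not imitate.

```python
from collections import deque

# Nodes and edges of a tiny social graph (hypothetical data); edges are stored, not computed.
follows = {
    "asha": ["ravi", "meena"],
    "ravi": ["meena"],
    "meena": ["kiran"],
    "kiran": [],
}

def reachable_within(graph, start, max_hops):
    """Breadth-first traversal returning nodes reachable in at most max_hops edges."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append((neighbor, hops + 1))
    return found

two_hops = reachable_within(follows, "asha", 2)
```

Each step follows edges that already exist in the data, which is the sense in which relationships do not need to be calculated at query time.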


Query Mechanism tools for NoSQL


The most common data retrieval mechanism is REST-based retrieval of a value
by its key or ID, using a GET request on the resource.

Document store databases offer more complex queries, as they understand the
value in a key-value pair. For example, CouchDB allows defining views with
MapReduce.
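The idea behind a MapReduce view can be sketched in plain Python. This is not CouchDB's view API, which is not shown in these notes. The documents and the names `map_doc` and `reduce_values` are hypothetical. The map step emits key/value pairs per document, and the reduce step combines the values emitted for each key.

```python
from collections import defaultdict

# Hypothetical documents; only some of them are orders.
docs = [
    {"type": "order", "city": "Hyderabad", "amount": 120},
    {"type": "order", "city": "Delhi", "amount": 80},
    {"type": "order", "city": "Hyderabad", "amount": 200},
    {"type": "note", "text": "not an order"},
]

def map_doc(doc):
    """Map step: emit (key, value) pairs for documents of interest."""
    if doc.get("type") == "order":
        yield doc["city"], doc["amount"]

def reduce_values(values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)

# Group emitted values by key, then reduce each group.
grouped = defaultdict(list)
for doc in docs:
    for key, value in map_doc(doc):
        grouped[key].append(value)

view = {key: reduce_values(values) for key, values in grouped.items()}
```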

What is the CAP Theorem?

The CAP theorem is also called Brewer's theorem. It states that it is impossible
for a distributed data store to offer more than two out of the following three
guarantees at the same time:

1. Consistency
2. Availability
3. Partition Tolerance

Consistency - This means that the data in the database remains consistent after
the execution of an operation. For example, after an update operation all clients
see the same data.

Availability - This means that the system is always on (service guarantee
availability), with no downtime.

Partition Tolerance - This means that the system continues to function even if
the communication among the servers is unreliable, i.e. the servers may be
partitioned into multiple groups that cannot communicate with one another.

Theoretically, it is impossible to fulfill all three requirements. CAP therefore
describes the basic choice for a distributed system: it can follow two of the
three requirements. Current NoSQL databases follow different combinations of
C, A and P from the CAP theorem. Here is a brief description of the three
combinations CA, CP and AP:

CA - Single-site cluster, therefore all nodes are always in contact. When a
partition occurs, the system blocks.
CP - Some data may not be accessible, but the rest is still
consistent/accurate.
AP - The system is still available under partitioning, but some of the data
returned may be inaccurate.
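The difference between CP and AP behaviour during a partition can be sketched with a toy model. The `Replica` class, the `read` function, and the `can_reach_peers` flag are hypothetical simplifications. Real systems detect partitions and coordinate replicas in far more involved ways.

```python
class Replica:
    """A toy replica holding a single value."""

    def __init__(self, value):
        self.value = value

def read(replica, can_reach_peers, mode):
    """Read from one replica; mode is "CP" or "AP" (labels used only in this sketch)."""
    if can_reach_peers:
        return replica.value
    if mode == "CP":
        # Refuse to answer rather than risk returning stale data.
        raise RuntimeError("unavailable during partition")
    # AP: answer from local state, which may be stale.
    return replica.value

# Replica b missed the latest write (value 2) because of a partition.
a, b = Replica(2), Replica(1)
ap_result = read(b, can_reach_peers=False, mode="AP")
try:
    read(b, can_reach_peers=False, mode="CP")
    cp_result = "answered"
except RuntimeError:
    cp_result = "refused"
```

In this model the AP read returns the stale value 1, while the CP read is refused, matching the trade-off described above.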

Eventual Consistency

The term "eventual consistency" arises from keeping copies of data on multiple
machines to get high availability and scalability. Changes made to any data
item on one machine then have to be propagated to the other replicas.

Data replication may not be instantaneous: some copies will be updated
immediately, while others are updated in due course of time. These copies may be
mutually inconsistent for a while, but in due course of time they become
consistent. Hence the name eventual consistency.
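The propagation described above can be sketched with three toy replicas and a queue of pending updates. The `write` and `propagate` functions are hypothetical, and delivery is simulated in a single process rather than over a network.

```python
class Replica:
    """A toy replica holding its own copy of the data."""

    def __init__(self):
        self.data = {}

replicas = [Replica(), Replica(), Replica()]
pending = []  # queued (replica_index, key, value) updates not yet delivered

def write(key, value):
    """Apply the write on replica 0 now and queue it for the others."""
    replicas[0].data[key] = value
    for index in range(1, len(replicas)):
        pending.append((index, key, value))

def propagate():
    """Deliver all queued updates, in order."""
    while pending:
        index, key, value = pending.pop(0)
        replicas[index].data[key] = value

write("x", 1)
before = [r.data.get("x") for r in replicas]  # copies disagree at this point
propagate()
after = [r.data.get("x") for r in replicas]   # copies agree once updates arrive
```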

BASE: Basically Available, Soft state, Eventual consistency

• Basically available means the system guarantees availability, in the sense
used in the CAP theorem
• Soft state means the system state may change even without input
• Eventual consistency means that the system will become consistent over
time


Advantages of NoSQL
• Can be used as Primary or Analytic Data Source
• Big Data Capability
• No Single Point of Failure
• Easy Replication
• No Need for Separate Caching Layer
• It provides fast performance and horizontal scalability.
• Can handle structured, semi-structured, and unstructured data with equal
effect
• Supports object-oriented programming that is easy to use and flexible
• NoSQL databases don't need a dedicated high-performance server
• Supports key developer languages and platforms
• Simpler to implement than an RDBMS
• It can serve as the primary data source for online applications.
• Handles big data which manages data velocity, variety, volume, and
complexity
• Excels at distributed database and multi-data center operations
• Eliminates the need for a specific caching layer to store data
• Offers a flexible schema design which can easily be altered without
downtime or service disruption


Disadvantages of NoSQL
• No standardization rules
• Limited query capabilities
• RDBMS databases and tools are comparatively mature
• Often does not offer traditional database capabilities, like consistency when
multiple transactions are performed simultaneously
• When the volume of data increases, it becomes difficult to maintain unique
values as keys
• Doesn't work as well with relational data
• The learning curve is steep for new developers
• Many options are open source, which makes them less popular with enterprises

Summary
• NoSQL is a non-relational DMS that does not require a fixed schema,
avoids joins, and is easy to scale
• The concept of NoSQL databases became popular with Internet giants like
Google, Facebook, Amazon, etc. who deal with huge volumes of data
• In 1998, Carlo Strozzi used the term NoSQL for his lightweight,
open-source relational database
• NoSQL databases never follow the relational model; they are either
schema-free or have relaxed schemas
• The four types of NoSQL database are 1) key-value pair based,
2) column-oriented, 3) graph-based and 4) document-oriented
• NoSQL can handle structured, semi-structured, and unstructured data with
equal effect
• The CAP theorem covers three guarantees: Consistency, Availability, and
Partition Tolerance
• BASE stands for Basically Available, Soft state, Eventual consistency
• The term "eventual consistency" arises from keeping copies of data on multiple
machines to get high availability and scalability
• NoSQL offers limited query capabilities


References:

1. https://www.w3resource.com/
2. https://www.sitepen.com/
3. https://www.guru99.com/
