Unit-5 NoSQL Data Management-Big Data
Unit-5
NoSQL Data Management
What is NoSQL?
A NoSQL database is a non-relational data management system that does not
require a fixed schema. It avoids joins and is easy to scale. NoSQL databases are
mainly used as distributed data stores with very large storage needs, such as
big data and real-time web applications. For example, companies like Twitter,
Facebook and Google collect terabytes of user data every single day.
NoSQL stands for "Not Only SQL" or "Not SQL." Though a better term
would be "NoREL", NoSQL caught on. Carlo Strozzi first used the name NoSQL in
1998 for his lightweight, open-source relational database.
A traditional RDBMS uses SQL syntax to store and retrieve data for further insights.
A NoSQL database system, by contrast, encompasses a wide range of database
technologies that can store structured, semi-structured, unstructured and
polymorphic data.
Why NoSQL?
The concept of NoSQL databases became popular with Internet giants like Google,
Facebook, Amazon, etc. who deal with huge volumes of data. The system
response time becomes slow when you use an RDBMS for massive volumes of data.
To resolve this problem, we could "scale up" our systems by upgrading our
existing hardware, but this process is expensive.
The alternative is to distribute the database load across multiple hosts
whenever the load increases. This method is known as "scaling out."
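One simple way to picture scaling out is to assign each key to one of several hosts with a stable hash. The sketch below is only an illustration under assumptions: the host names and the `pick_host` helper are hypothetical, and it uses plain modulo sharding rather than the method of any particular NoSQL product.

```python
import hashlib

def pick_host(key, hosts):
    """Map a key to one host using a stable hash (simple modulo sharding)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(hosts)
    return hosts[index]

# Hypothetical host names, used only for illustration.
hosts = ["db-node-1", "db-node-2", "db-node-3"]

# Each key is always routed to the same host, so the load is spread out.
placement = {key: pick_host(key, hosts)
             for key in ["user:1", "user:2", "user:3", "user:4"]}

for key, host in placement.items():
    print(key, "->", host)
```

With modulo sharding, adding a host changes the placement of most keys. Real systems often use consistent hashing to limit that reshuffling.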
Features of NoSQL
Non-relational
Schema-free
Simple API
• Offers easy-to-use interfaces for storing and querying data
• APIs allow low-level data manipulation and selection methods
• Text-based protocols, mostly HTTP REST with JSON
Distributed
If the data entities need to be individually accessed and updated, or relations
are bi-directional, real relations become necessary. For example, if we had a
database of employees and employers, we could easily envision scenarios where
we would start with an employee and want to find their employer, or start with
an employer and find all their employees. It may also be desirable to individually
update an employee or employer without having to worry about updating all the
related entities.
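The employee and employer scenario can be sketched with plain dictionaries, where each entity is stored once and related entities refer to it by ID. The sample data and the helper names `employer_of` and `employees_of` are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical sample data: each entity is stored once and referenced by ID.
employers = {"e1": {"name": "Acme"}}
employees = {
    "p1": {"name": "Ravi", "employer_id": "e1"},
    "p2": {"name": "Meena", "employer_id": "e1"},
}

def employer_of(employee_id):
    """Start with an employee and find their employer."""
    return employers[employees[employee_id]["employer_id"]]

def employees_of(employer_id):
    """Start with an employer and find all their employees."""
    return [e for e in employees.values() if e["employer_id"] == employer_id]

# The employer is updated in one place; no employee record has to change.
employers["e1"]["name"] = "Acme Corp"

print(employer_of("p1")["name"])
print([e["name"] for e in employees_of("e1")])
```

Because the employer is not copied into each employee record, it can be updated individually and the change is visible from both directions.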
Key-Value Pair Based
Data is stored in key/value pairs. This type of store is designed to handle large
amounts of data and heavy load.
Key-value pair storage databases store data as a hash table where each key is
unique, and the value can be JSON, a BLOB (Binary Large Object), a string, etc.
For example, a key-value pair may contain a key like "Website" associated with a
value like "Guru99".
It is one of the most basic types of NoSQL database. This kind of NoSQL database
is used for collections, dictionaries, associative arrays, etc. Key-value stores help
the developer store schema-less data. They work well for shopping cart
contents.
Redis, Dynamo and Riak are some examples of key-value store databases.
Riak is based on the design described in Amazon's Dynamo paper.
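The key-value model can be sketched with an in-memory hash table. The `KeyValueStore` class below is a hypothetical illustration, not the API of Redis, Dynamo or Riak. It shows that the store treats each value as opaque, whether it is a string, JSON text or binary data.

```python
import json

class KeyValueStore:
    """A toy in-memory key-value store backed by a Python dict (a hash table)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Keys are unique: writing an existing key overwrites its value.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
store.put("Website", "Guru99")                               # string value
store.put("cart:42", json.dumps({"items": ["book", "pen"]}))  # JSON value
store.put("avatar:42", b"\x89PNG")                            # binary (BLOB) value

print(store.get("Website"))
print(json.loads(store.get("cart:42"))["items"])
```

The store can only look values up by key. It cannot search inside the JSON text, which is the main difference from a document store.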
Column-based
Column-based databases deliver high performance on aggregation queries like SUM,
COUNT, AVG, MIN etc. as the data is readily available in a column.
HBase, Cassandra and Hypertable are examples of column-based
databases.
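To see why aggregation suits a column layout, the sketch below stores the same hypothetical records row by row and column by column. It is a simplified in-memory illustration and does not reflect how HBase or Cassandra lay out data on disk.

```python
# Hypothetical sample records in a row-oriented layout.
rows = [
    {"id": 1, "city": "Pune", "amount": 120},
    {"id": 2, "city": "Delhi", "amount": 80},
    {"id": 3, "city": "Pune", "amount": 200},
]

# The same data in a column-oriented layout: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Aggregations read a single column and never touch the others.
amounts = columns["amount"]
total = sum(amounts)       # SUM
count = len(amounts)       # COUNT
average = total / count    # AVG
smallest = min(amounts)    # MIN

print(total, count, average, smallest)
```

In the row layout, the same SUM would have to visit every record and skip over the `id` and `city` fields.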
Document-Oriented:
A document-oriented NoSQL database stores and retrieves data as a key-value pair,
but the value part is stored as a document. The document is stored in JSON or XML
format. The value is understood by the database and can be queried.
The document type is mostly used for content management systems, blogging
platforms, real-time analytics and e-commerce applications. It should not be used
for complex transactions which require multiple operations or queries against
varying aggregate structures.
Amazon SimpleDB, CouchDB, MongoDB, Riak and Lotus Notes are popular
document-oriented DBMS systems.
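The difference from a plain key-value store is that the database can look inside the value. The sketch below models this with dictionaries. The documents and the helpers `find` and `find_by_tag` are hypothetical and do not correspond to the query API of MongoDB or CouchDB.

```python
# Hypothetical documents; note that they do not share the same fields.
documents = {
    "post:1": {"title": "Intro to NoSQL", "tags": ["nosql", "intro"],
               "author": {"name": "Asha"}},
    "post:2": {"title": "CAP explained", "tags": ["cap"], "views": 10},
}

def find(field, value):
    """Return the keys of documents whose top-level field equals value."""
    return [key for key, doc in documents.items() if doc.get(field) == value]

def find_by_tag(tag):
    """Return the keys of documents whose 'tags' list contains tag."""
    return [key for key, doc in documents.items() if tag in doc.get("tags", [])]

print(find("title", "CAP explained"))
print(find_by_tag("nosql"))
```

Documents that lack a field are simply skipped by the query, which is what a schema-free model allows.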
Graph-Based
A graph database stores entities as well as the relations amongst those entities.
The entity is stored as a node with the relationships as edges. An edge gives a
relationship between nodes. Every node and edge has a unique identifier.
Graph-based databases are mostly used for social networks, logistics and spatial data.
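A minimal sketch of the node and edge model follows, assuming a small social network. The identifiers, the `FOLLOWS` edge type and the helper functions are hypothetical, and a real graph database would index edges instead of scanning them.

```python
# Every node and every edge has its own unique identifier.
nodes = {
    "n1": {"label": "Person", "name": "Asha"},
    "n2": {"label": "Person", "name": "Ravi"},
    "n3": {"label": "Person", "name": "Meena"},
}
edges = {
    "e1": {"source": "n1", "target": "n2", "type": "FOLLOWS"},
    "e2": {"source": "n2", "target": "n3", "type": "FOLLOWS"},
}

def neighbours(node_id, edge_type):
    """Follow outgoing edges of one type from a node."""
    return [e["target"] for e in edges.values()
            if e["source"] == node_id and e["type"] == edge_type]

def two_hops(node_id, edge_type):
    """Nodes reachable in exactly two steps, such as friends of friends."""
    found = []
    for middle in neighbours(node_id, edge_type):
        for target in neighbours(middle, edge_type):
            if target != node_id and target not in found:
                found.append(target)
    return found

print([nodes[n]["name"] for n in neighbours("n1", "FOLLOWS")])
print([nodes[n]["name"] for n in two_hops("n1", "FOLLOWS")])
```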
Document store databases support more complex queries because they understand
the value in a key-value pair. For example, CouchDB allows defining views with
MapReduce.
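The idea behind a MapReduce view can be sketched in Python: a map function emits key/value pairs for each document, and a reduce function combines the values that share a key. CouchDB views are normally written as JavaScript functions, so the code below is a conceptual illustration and not CouchDB's API. The documents and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents of mixed types.
docs = [
    {"_id": "1", "type": "order", "customer": "asha", "total": 30},
    {"_id": "2", "type": "order", "customer": "ravi", "total": 20},
    {"_id": "3", "type": "order", "customer": "asha", "total": 50},
    {"_id": "4", "type": "note", "text": "not an order"},
]

def map_doc(doc):
    """Map step: emit (customer, total) for order documents only."""
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]

def reduce_values(values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)

def build_view(documents):
    """Group the emitted pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_values(values) for key, values in sorted(grouped.items())}

view = build_view(docs)
print(view)
```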
The CAP theorem is also called Brewer's theorem. It states that it is impossible
for a distributed data store to offer more than two out of three guarantees:
1. Consistency
2. Availability
3. Partition Tolerance
Consistency - This means that the data in the database remains consistent after
the execution of an operation. For example, after an update operation, all clients
see the same data.
Availability - This means that the system is always available to serve requests.
Partition Tolerance - This means that the system continues to function even if the
communication among the servers is unreliable, i.e. the servers may be
partitioned into multiple groups that cannot communicate with one another.
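The trade-off during a partition can be sketched with two replicas. The `TinyCluster` class and its "CP" and "AP" modes are hypothetical simplifications: in "CP" mode a write is refused when the other replica is unreachable, and in "AP" mode the write is accepted and the replicas temporarily disagree.

```python
class Replica:
    """One copy of the data."""
    def __init__(self):
        self.data = {}

class TinyCluster:
    """Two replicas; clients always write through replica A in this sketch."""
    def __init__(self, mode):
        self.mode = mode            # "CP" or "AP" (labels assumed for this sketch)
        self.a = Replica()
        self.b = Replica()
        self.partitioned = False    # True when A and B cannot communicate

    def write(self, key, value):
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse the write rather than let replicas diverge.
            raise RuntimeError("write rejected: replica B is unreachable")
        self.a.data[key] = value
        if not self.partitioned:
            self.b.data[key] = value  # normal case: replicate immediately

# CP: stays consistent, but the write is not accepted during the partition.
cp = TinyCluster("CP")
cp.write("x", 1)
cp.partitioned = True
try:
    cp.write("x", 2)
    cp_accepted = True
except RuntimeError:
    cp_accepted = False

# AP: stays available, but the replicas now hold different values.
ap = TinyCluster("AP")
ap.write("x", 1)
ap.partitioned = True
ap.write("x", 2)

print(cp_accepted, cp.a.data, cp.b.data)
print(ap.a.data, ap.b.data)
```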
Eventual Consistency
• Basically Available means the database is available all the time, as per the CAP theorem
• Soft state means the system state may change even without input
• Eventual consistency means that the system will become consistent over
time
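Eventual consistency can be sketched with two replicas that exchange their state and keep the newest version of each key. The versioned tuples and the last-write-wins `merge` rule are assumptions made for this illustration. Real systems use a variety of conflict-resolution strategies.

```python
def merge(local, remote):
    """Last-write-wins: for each key keep the entry with the higher version."""
    merged = dict(local)
    for key, (version, value) in remote.items():
        if key not in merged or version > merged[key][0]:
            merged[key] = (version, value)
    return merged

# Each entry is (version, value). Replica B has seen a newer write than A.
replica_a = {"cart": (1, ["book"])}
replica_b = {"cart": (2, ["book", "pen"]), "theme": (1, "dark")}

# Before synchronisation, a read from replica A returns stale data.
stale_read = replica_a["cart"][1]

# The replicas exchange state in the background, without any client input.
replica_a = merge(replica_a, replica_b)
replica_b = merge(replica_b, replica_a)

print(stale_read)
print(replica_a == replica_b)
```

The state of replica A changes without any client writing to it, which is what "soft state" refers to, and both replicas end up with the same data.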
Advantages of NoSQL
• Can be used as Primary or Analytic Data Source
• Big Data Capability
• No Single Point of Failure
• Easy Replication
• No Need for Separate Caching Layer
• It provides fast performance and horizontal scalability.
• Can handle structured, semi-structured, and unstructured data with equal
effect
• Object-oriented programming which is easy to use and flexible
• NoSQL databases don't need a dedicated high-performance server
• Support Key Developer Languages and Platforms
• Simpler to implement than an RDBMS
• It can serve as the primary data source for online applications.
• Handles big data, managing data velocity, variety, volume, and
complexity
• Excels at distributed database and multi-data center operations
• Eliminates the need for a specific caching layer to store data
• Offers a flexible schema design which can easily be altered without
downtime or service disruption
Disadvantages of NoSQL
• No standardization rules
• Limited query capabilities
• RDBMS databases and tools are comparatively mature
• Many NoSQL databases do not offer traditional database capabilities, like
consistency when multiple transactions are performed simultaneously
• When the volume of data increases, it becomes difficult to maintain unique
values as keys
• Doesn't work as well with relational data
• The learning curve is steep for new developers
• Many options are open source, which makes them less popular with enterprises
Summary
• NoSQL is a non-relational DMS that does not require a fixed schema,
avoids joins, and is easy to scale
• The concept of NoSQL databases became popular with Internet giants like
Google, Facebook, Amazon, etc. who deal with huge volumes of data
• In 1998, Carlo Strozzi used the term NoSQL for his lightweight,
open-source relational database
• NoSQL databases do not follow the relational model; they are either
schema-free or have relaxed schemas
• The four types of NoSQL database are 1) Key-value pair based
2) Column-oriented 3) Graph-based 4) Document-oriented
• NoSQL can handle structured, semi-structured, and unstructured data with
equal effect
• The CAP theorem covers three guarantees: Consistency, Availability, and
Partition Tolerance
• BASE stands for Basically Available, Soft state, Eventual consistency
• Eventual consistency means that copies of data are kept on multiple
machines for high availability and scalability, and that these copies
become consistent over time
• NoSQL offers limited query capabilities