NOSQL Module-3

This document discusses key-value databases and the MapReduce programming model. It begins by describing the basic components of MapReduce including the map and reduce functions, partitioning, and combining. It then provides an example of a two-stage MapReduce job for monthly sales records. The document also discusses key-value databases, describing them as simple hash tables that store data via a key-value pair. Popular key-value databases like Redis, Riak and DynamoDB are listed. Finally, suitable and unsuitable use cases for key-value stores are outlined.

Uploaded by

Amina Sultana

Module:3 Map-reduce

Map Function
Reduce Function
Partitioning
Combining
Unique Reduce Functions
Composing Map-Reduce Calculations
A Two-Stage Map-Reduce Example
Creating records for monthly sales of a product
The second stage mapper creates base records for year-on-year comparisons.
The reduction step is a merge of incomplete records.
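The two stages named above can be sketched in plain Python. This is a minimal single-process illustration, not the slides' own example: the record fields (`product`, `year`, `month`, `quantity`), the sample data, and the helper `run_map_reduce` are all assumptions made for the sketch. A real framework would distribute the map and reduce calls across nodes.

```python
from collections import defaultdict

def run_map_reduce(records, mapper, reducer):
    """Run one map-reduce stage in a single process (illustration only)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)      # shuffle: group values by key
    return [reducer(key, values) for key, values in groups.items()]

# Stage 1: order lines -> total quantity per (product, year, month).
def monthly_mapper(order_line):
    key = (order_line["product"], order_line["year"], order_line["month"])
    yield key, order_line["quantity"]

def monthly_reducer(key, quantities):
    product, year, month = key
    return {"product": product, "year": year, "month": month,
            "quantity": sum(quantities)}

# Stage 2: emit an incomplete base record per month, keyed by
# (product, month), then merge the records of the two years compared.
def comparison_mapper(monthly, current_year=2011):
    key = (monthly["product"], monthly["month"])
    if monthly["year"] == current_year:
        yield key, {"current": monthly["quantity"]}
    elif monthly["year"] == current_year - 1:
        yield key, {"prior": monthly["quantity"]}

def comparison_reducer(key, partials):
    merged = {"product": key[0], "month": key[1], "current": 0, "prior": 0}
    for partial in partials:               # merge the incomplete records
        merged.update(partial)
    merged["increase"] = merged["current"] - merged["prior"]
    return merged

# Hypothetical sample data.
orders = [
    {"product": "puerh", "year": 2010, "month": 12, "quantity": 20},
    {"product": "puerh", "year": 2010, "month": 12, "quantity": 5},
    {"product": "puerh", "year": 2011, "month": 12, "quantity": 40},
]
monthly_sales = run_map_reduce(orders, monthly_mapper, monthly_reducer)
comparisons = run_map_reduce(monthly_sales, comparison_mapper, comparison_reducer)
print(comparisons)
```

The output of the first stage is the input of the second, which is what composing map-reduce calculations means here.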
Incremental Map-Reduce
Key-Value Databases
• A key-value store is a simple hash table, primarily used when all
access to the database is via primary key.
• Think of a table in a traditional RDBMS with two columns, such as ID
and NAME.
• The ID column is the key and the NAME column stores the value. In
an RDBMS, the NAME column is restricted to storing data of type
String.
• The application provides an ID and a value and persists the pair; if
the ID already exists the current value is overwritten, otherwise a
new entry is created.
• Let’s look at how terminology compares in Oracle and Riak.
8.1. What Is a Key-Value Store
• Key-value stores are the simplest NoSQL data stores to use from an
API perspective.
• The client can either get the value for the key, put a value for a key, or
delete a key from the data store.
• The value is a blob that the data store just stores, without caring or
knowing what’s inside; it’s the responsibility of the application to
understand what was stored.
• Since key-value stores always use primary-key access, they generally
have great performance and can be easily scaled.
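The three operations can be sketched with an in-memory stand-in. The class name `KeyValueStore` and its method names are assumptions chosen for illustration and do not correspond to any particular product's client API.

```python
class KeyValueStore:
    """In-memory stand-in for a key-value store (illustration only)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value: bytes):
        # Overwrites the current value if the key exists, else creates it.
        self._data[key] = value

    def get(self, key):
        # Returns the opaque blob, or None when the key is absent.
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
store.put("1001", b"Martin")
store.put("1001", b"Pramod")   # same key: the value is overwritten
value = store.get("1001")
store.delete("1001")
missing = store.get("1001")
```

The store never looks inside the bytes it holds; decoding them is left to the application.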
Some of the popular key-value databases are:
• Riak [Riak]
• Redis (often referred to as a data structure server) [Redis]
• Memcached DB and its flavors [Memcached]
• Berkeley DB [Berkeley DB]
• HamsterDB (especially suited for embedded use) [HamsterDB]
• Amazon DynamoDB [Amazon’s Dynamo] (not open-source)
• In some key-value stores, such as Redis, the aggregate being stored
does not have to be a domain object; it could be any data structure.
Redis supports storing lists, sets, and hashes, and can do range, diff,
union, and intersection operations.
• These features allow Redis to be used in more ways than a
standard key-value store.
• Riak stores keys in buckets, which are just a way to segment the
keys; think of buckets as flat namespaces for the keys.
• If we wanted to store user session data, shopping cart information,
and user preferences in Riak, we could just store all of them in the
same bucket with a single key and single value for all of these
objects.
• In this scenario, we would have a single object that stores all the data
and is put into a single bucket.
• The downside of storing all the different objects (aggregates) in a
single bucket is that the bucket would hold different types
of aggregates, increasing the chance of key conflicts.
• An alternate approach would be to append the name of the object to
the key, such as 288790b8a421_userProfile, so that we can get to
individual objects as they are needed.
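The key-naming approach can be sketched as follows. The helper `object_key`, the `shoppingCart` suffix, and the stored values are hypothetical; a plain dict stands in for the bucket.

```python
def object_key(session_id: str, object_name: str) -> str:
    """Build a per-object key by appending the object name to the session ID."""
    return f"{session_id}_{object_name}"

bucket = {}  # a plain dict standing in for a single bucket
session_id = "288790b8a421"
bucket[object_key(session_id, "userProfile")] = {"language": "en"}
bucket[object_key(session_id, "shoppingCart")] = {"items": ["tea"]}

# Each object can now be fetched on its own, without loading the others.
profile = bucket[object_key(session_id, "userProfile")]
```

Because the suffix names the object type, two aggregates that belong to the same session no longer compete for one key.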
8.2. Key-Value Store Features
Some of the features we discuss for all the NoSQL data stores are:
• consistency,
• transactions,
• query features,
• structure of the data, and
• scaling.
1. Consistency
• Consistency is applicable only for operations on a single key, since
these operations are either a get, put, or delete on a single key.
• Optimistic writes can be performed, but are very expensive to
implement, because a change in value cannot be determined by the
data store.
• In distributed key-value store implementations like Riak, the
eventually consistent model of consistency is implemented.
• Since the value may have already been replicated to other nodes, Riak
has two ways of resolving update conflicts: either the newest write
wins and older writes lose, or both (all) values are returned,
allowing the client to resolve the conflict.
• In Riak, these options can be set up during bucket creation.
• Buckets are just a way to namespace keys so that key collisions can
be reduced.
• For example, all customer keys may reside in the customer bucket.
• When creating a bucket, default values for consistency can be
provided, for example that a write is considered good only when the
data is consistent across all the nodes where the data is stored.
• If we need data in every node to be consistent, we can increase the
numberOfNodesToRespondToWrite set by w to be the same as nVal.
• Of course doing that will decrease the write performance of the cluster.
• To improve the handling of write or read conflicts, we can change the allowSiblings flag
during bucket creation: if it is set to false, we let the last write win and
do not create siblings.
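The two conflict-resolution policies can be sketched as below. This simulates the behavior in plain Python and is not Riak client code. The `Write` record, its `timestamp` field, and the `allow_siblings` parameter (named after the allowSiblings flag) are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Write:
    value: str
    timestamp: float   # assumed wall-clock time of the write

def resolve(conflicting_writes, allow_siblings):
    """Return the value(s) a read sees for one key with conflicting writes."""
    if allow_siblings:
        # Return every sibling; the client must reconcile them.
        return [w.value for w in conflicting_writes]
    # Last write wins: keep only the newest write, older writes lose.
    newest = max(conflicting_writes, key=lambda w: w.timestamp)
    return [newest.value]

writes = [Write("cart: tea", 10.0), Write("cart: tea, coffee", 12.5)]
last_write_wins = resolve(writes, allow_siblings=False)
siblings = resolve(writes, allow_siblings=True)
```

Last write wins is simpler for the client but silently discards the older update. Returning siblings keeps every update and leaves the merge decision to the application.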
2. Transactions
• Different key-value store products have different
specifications of transactions.
• Many data stores implement transactions in different ways.
• Riak uses the concept of quorum, implemented by supplying the W value
(the number of replicas that must acknowledge the write) during the
write API call.
• Assume we have a Riak cluster with a replication factor of 5 and we
supply the W value of 3.
• When writing, the write is reported as successful only when it is
written and reported as a success on at least three of the nodes.
• This allows Riak to have write tolerance; in our example, with N equal
to 5 and W equal to 3, the cluster can tolerate N - W = 2 nodes being
down for write operations.
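The quorum rule from this example (replication factor 5, W of 3) can be checked with a small simulation. The function `quorum_write` and the lists of node states are hypothetical; the sketch only counts acknowledgements and does no real replication.

```python
def quorum_write(node_is_up, w):
    """Report success only if at least `w` replicas acknowledge the write."""
    acks = sum(1 for up in node_is_up if up)   # each live replica acknowledges
    return acks >= w

N, W = 5, 3
all_up = [True] * N
two_down = [True, True, True, False, False]
three_down = [True, True, False, False, False]

results = [quorum_write(nodes, W) for nodes in (all_up, two_down, three_down)]
tolerated_failures = N - W
```

With two replicas down the write still reaches three nodes and succeeds; with three down it cannot reach the quorum and is reported as failed.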
3. Query Features
• All key-value stores can query by the key.
• If you need to query by some attribute of the value, it’s not
possible to use the database: your application
needs to read the value to figure out if the attribute meets the
conditions.
• Query by key also has an interesting side effect. What if we don’t
know the key, especially during ad-hoc querying during debugging?
Most of the data stores will not give you a list of all the primary keys;
even if they did, retrieving lists of keys and then querying for the
value would be very cumbersome.
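A sketch of what querying by an attribute of the value looks like when the application has to do the work. The sample data and the helper `find_keys_by_attribute` are hypothetical, and the loop assumes the keys can be listed at all, which many stores do not support.

```python
import json

store = {
    "u1": json.dumps({"name": "Ann", "language": "en"}),
    "u2": json.dumps({"name": "Bo", "language": "fr"}),
    "u3": json.dumps({"name": "Cy", "language": "en"}),
}

def find_keys_by_attribute(store, attribute, wanted):
    """Client-side scan: fetch every value, decode it, and test the attribute."""
    matches = []
    for key in store:                      # assumes the keys can be listed at all
        document = json.loads(store[key])  # the store itself sees only a blob
        if document.get(attribute) == wanted:
            matches.append(key)
    return matches

english_users = find_keys_by_attribute(store, "language", "en")
```

Every value is fetched and decoded, so the cost grows with the size of the whole data set rather than with the number of matches.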
4. Structure of Data
• Key-value databases don’t care what is stored in the value part of the
key-value pair. The value can be a blob, text, JSON, XML, and so on. In
Riak, we can use the Content-Type in the POST request to specify the
data type.
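Declaring the data type through the Content-Type header can be sketched with the standard library. The host, port, bucket name, key, and URL layout below are assumptions for illustration only; the request is built but never sent.

```python
import json
from urllib.request import Request

# Hypothetical host, bucket, and key; the request is built but never sent.
url = "http://localhost:8098/buckets/session/keys/a7e618d9db25"
body = json.dumps({"user": "ann", "cart": ["tea"]}).encode("utf-8")

request = Request(
    url,
    data=body,
    headers={"Content-Type": "application/json"},  # tells the store how to label the blob
    method="POST",
)
```

The store keeps the bytes and the declared type together, so a later reader knows how to decode the value.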
5. Scaling
• Many key-value stores scale by using sharding.
• With sharding, the value of the key determines on which node the
key is stored.
• Let’s assume we are sharding by the first character of the key; if the
key is f4b19d79587d, which starts with an f, it will be sent to a different
node than the key ad9c7a396542.
• This kind of sharding setup can increase performance as more nodes
are added to the cluster.
• Sharding also introduces some problems.
• If the node used to store f goes down, the data stored on that node
becomes unavailable, nor can new data be written with keys that
start with f.
• Data stores such as Riak allow you to control the aspects of the CAP
Theorem through three settings: N (number of nodes to store the
key-value replicas), R (number of
nodes that have the data being fetched before the read is considered
successful), and W (the number of nodes the write has to be written
to before it is considered successful).
• Let’s assume we have a 5-node Riak cluster.
• Setting N to 3 means that all data is replicated to at least three nodes,
• Setting R to 2 means any two nodes must reply to a GET request for it
to be considered successful, and
• Setting W to 2 ensures that the PUT request is written to two nodes
before the write is considered successful.
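Sharding by the first character of the key, and the effect of losing one node, can be sketched as follows. The class `ShardedStore` and its one-node-per-hex-character layout are assumptions for illustration; the sketch has no replication, so it shows the problem that N, R, and W are meant to address.

```python
def shard_for(key: str) -> str:
    """Shard by the first character of the key (the scheme assumed above)."""
    return key[0]

class ShardedStore:
    def __init__(self, shard_ids):
        self.nodes = {shard: {} for shard in shard_ids}  # one dict per node
        self.down = set()                                # shards whose node failed

    def put(self, key, value):
        shard = shard_for(key)
        if shard in self.down:
            raise ConnectionError(f"node for shard {shard!r} is down")
        self.nodes[shard][key] = value

    def get(self, key):
        shard = shard_for(key)
        if shard in self.down:
            raise ConnectionError(f"node for shard {shard!r} is down")
        return self.nodes[shard].get(key)

store = ShardedStore("0123456789abcdef")   # hex keys: one node per first character
store.put("f4b19d79587d", "session-1")
store.put("ad9c7a396542", "session-2")

store.down.add("f")                        # the node holding keys starting with f fails
still_readable = store.get("ad9c7a396542")
try:
    store.get("f4b19d79587d")
    f_available = True
except ConnectionError:
    f_available = False
```

Keys on other shards stay readable while everything that starts with f is unreachable for both reads and writes.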
8.3. Suitable Use Cases
• Let’s discuss where key-value stores are a good fit.
8.3.1. Storing Session Information
• Generally, every web session is unique and is assigned a unique sessionid
value.
• Applications that store the sessionid on disk or in an RDBMS will greatly
benefit from moving to a key-value store, since everything about the
session can be stored by a single PUT request or retrieved using GET.
• This single-request operation makes it very fast, as everything about the
session is stored in a single object. Solutions such as Memcached are used
by many web applications, and Riak can be used when availability is
important.
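A sketch of storing the whole session as one value, so that a single PUT saves it and a single GET restores it. The dict `session_store`, the helper names, and the session fields are hypothetical.

```python
import json

session_store = {}   # stands in for the key-value store

def save_session(session_id, session):
    # One PUT: the whole session is serialized into a single value.
    session_store[session_id] = json.dumps(session)

def load_session(session_id):
    # One GET: everything about the session comes back together.
    raw = session_store.get(session_id)
    return json.loads(raw) if raw is not None else None

save_session("a7e618d9db25", {"user": "ann", "lastVisit": "2012-06-01", "cart": ["tea"]})
session = load_session("a7e618d9db25")
```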
8.3.2. User Profiles, Preferences:
• Almost every user has a unique userId, username, or some other
attribute, as well as preferences such as language, color, timezone,
which products the user has access to, and so on.
• This can all be put into an object, so getting preferences of a user
takes a single GET operation. Similarly, product profiles can be stored.
8.3.3. Shopping Cart Data:
• E-commerce websites have shopping carts tied to the user.
• As we want the shopping carts to be available all the time, across
browsers, machines, and sessions, all the shopping information can
be put into the value where the key is the userid.
• A Riak cluster would be best suited for these kinds of applications.
8.4. When Not to Use
• There are problem spaces where key-value stores are not the best
solution.
8.4.1. Relationships among Data:
• If you need to have relationships between different sets of data, or
correlate the data between different sets of keys, key-value stores are
not the best solution to use, even though some key-value stores
provide link-walking features.
8.4.2. Multioperation Transactions
• If you’re saving multiple keys and there is a failure to save any one of
them, and you want to revert or roll back the rest of the operations,
key-value stores are not the best solution to be used.
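To see why this is awkward, here is a sketch of the compensation logic a client would have to write itself. The helpers `put_all_or_rollback` and `flaky_put` are hypothetical, and the sketch is not atomic: other clients could observe the partial writes before they are undone.

```python
def put_all_or_rollback(store, updates, put):
    """Write several keys; on failure, restore the keys already written."""
    previous = {}
    try:
        for key, value in updates.items():
            old = store.get(key)
            put(store, key, value)
            previous[key] = old        # recorded only after a successful write
    except Exception:
        for key, old in previous.items():
            if old is None:
                store.pop(key, None)   # key did not exist before
            else:
                store[key] = old
        return False
    return True

def flaky_put(store, key, value):
    # Hypothetical write that fails for one key, to exercise the rollback.
    if key == "order:2":
        raise IOError("write failed")
    store[key] = value

store = {"order:1": "old"}
ok = put_all_or_rollback(store, {"order:1": "new", "order:2": "new"}, flaky_put)
```

The undo step can itself fail, which is one reason a database with multi-operation transactions fits this kind of requirement better.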
8.4.3. Query by Data
• If you need to search the keys based on something found in the value
part of the key-value pairs, then key-value stores are not going to
perform well for you. There is no way to inspect the value on the
database side, with the exception of some products like Riak Search
or indexing engines like Lucene [Lucene] or Solr [Solr].
8.4.4. Operations by Sets
• Since operations are limited to one key at a time, there is no way to
operate upon multiple keys at the same time. If you need to operate
upon multiple keys, you have to handle this from the client side.
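Handling a multi-key operation from the client side can be sketched as a loop of single-key reads. The helper `multi_get` and the sample carts are hypothetical.

```python
def multi_get(store_get, keys):
    """Fetch several keys by issuing one single-key GET per key."""
    return {key: store_get(key) for key in keys}

data = {"cart:ann": ["tea"], "cart:bo": ["coffee", "milk"]}
carts = multi_get(data.get, ["cart:ann", "cart:bo", "cart:cy"])

# Any aggregation across keys also happens in the client.
total_items = sum(len(items) for items in carts.values() if items is not None)
```

Each key costs one round trip, so the work grows with the number of keys involved.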
