Academic Year 2022-23
Course : SY BTech IT
Course Name : ADBMSL
(Assignment No 4)
Map reduce
Map-reduce is a data processing paradigm for condensing large volumes of data into useful
aggregated results. For map-reduce operations, MongoDB provides the mapReduce database
command.
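The paradigm has three steps: map each document to key/value pairs, group the pairs by key, then reduce each group to one value. These steps can be sketched in plain JavaScript without a database. This is only an in-memory illustration under simplifying assumptions. The helper `mapReduce` is hypothetical, and `emit` is passed to the map function as an argument, whereas in MongoDB `emit` is a global available inside the map function.

```javascript
// Hypothetical in-memory sketch of map-reduce; not MongoDB's implementation.
// mapFn runs with `this` bound to each document, like a MongoDB map function,
// but receives emit() as a parameter instead of using a global emit().
function mapReduce(docs, mapFn, reduceFn, query = () => true) {
  // Map phase: apply the query, then collect emitted values grouped by key.
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) {
    if (query(doc)) mapFn.call(doc, emit);
  }

  // Reduce phase: condense each key's list of values into one value.
  // A key with a single value is passed through without calling reduceFn.
  const results = [];
  for (const [key, values] of groups) {
    const value = values.length === 1 ? values[0] : reduceFn(key, values);
    results.push({ _id: key, value });
  }
  return results;
}

// Usage: total of n for each key k.
const demo = mapReduce(
  [{ k: "x", n: 1 }, { k: "y", n: 2 }, { k: "x", n: 3 }],
  function (emit) { emit(this.k, this.n); },
  (key, values) => values.reduce((a, b) => a + b, 0)
);
console.log(demo); // x -> 4, y -> 2
```

The optional `query` predicate plays the role of the `query` option used in the examples below: documents that fail it never reach the map function.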
Map-reduce supports operations on sharded collections, both as an input and as an output.
Sharding is a method for distributing data across multiple machines. MongoDB uses sharding to
support deployments with very large data sets and high throughput operations.
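On a sharded collection the work is split. Each shard can map and partially reduce its own documents, and the partial results are then reduced again. The sketch below uses hypothetical helpers `mapAndReduceLocally` and `shardedMapReduce`, with plain arrays standing in for shards. It shows why this is only safe when reduce can be re-applied to its own output. MongoDB's documentation asks that the reduce function return the same type as the emitted values and be associative, commutative and idempotent. Summation meets those requirements.

```javascript
// Hypothetical sketch: arrays stand in for shards; no real sharding is involved.
function mapAndReduceLocally(docs, mapFn, reduceFn) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => mapFn.call(doc, emit));
  // First (partial) reduce, done on the shard itself.
  const partial = new Map();
  for (const [key, values] of groups) partial.set(key, reduceFn(key, values));
  return partial;
}

function shardedMapReduce(shards, mapFn, reduceFn) {
  const partials = shards.map((docs) => mapAndReduceLocally(docs, mapFn, reduceFn));
  // Merge step: gather the partial results for each key and reduce them again.
  const merged = new Map();
  for (const partial of partials) {
    for (const [key, value] of partial) {
      if (!merged.has(key)) merged.set(key, []);
      merged.get(key).push(value);
    }
  }
  const out = {};
  for (const [key, values] of merged) out[key] = reduceFn(key, values);
  return out;
}

const sum = (key, values) => values.reduce((a, b) => a + b, 0);
const mapOrder = function (emit) { emit(this.cust_id, this.amount); };

// The three status "A" orders used below, split across two pretend shards.
const shards = [
  [{ cust_id: "A123", amount: 500 }, { cust_id: "B212", amount: 200 }],
  [{ cust_id: "A123", amount: 250 }],
];
const totals = shardedMapReduce(shards, mapOrder, sum);
console.log(totals); // { A123: 750, B212: 200 }
```

A reduce that, for example, averaged its inputs would give a different answer when re-applied to partial averages, which is why the requirements above matter.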
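Consider the following map-reduce operation. It inserts four orders, then sums the amount for each cust_id over the orders whose status is "A" and writes the totals to the orders_totals collection: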
Prof. Pallavi M. Tekade
> db.orders.insertMany([
{ cust_id: "A123", amount: 500, status: "A" },
{ cust_id: "A123", amount: 250, status: "A" },
{ cust_id: "B212", amount: 200, status: "A" },
{ cust_id: "A123", amount: 300, status: "D" }
])
{
"acknowledged" : true,
"insertedIds" : [
ObjectId("59cde47a1ea6abd9870348f9"),
ObjectId("59cde47a1ea6abd9870348fa"),
ObjectId("59cde47a1ea6abd9870348fb"),
ObjectId("59cde47a1ea6abd9870348fc")
]
}
> db.orders.find()
{ "_id" : ObjectId("59cde47a1ea6abd9870348f9"), "cust_id" : "A123", "amount" : 500, "status" : "A" }
{ "_id" : ObjectId("59cde47a1ea6abd9870348fa"), "cust_id" : "A123", "amount" : 250, "status" : "A" }
{ "_id" : ObjectId("59cde47a1ea6abd9870348fb"), "cust_id" : "B212", "amount" : 200, "status" : "A" }
{ "_id" : ObjectId("59cde47a1ea6abd9870348fc"), "cust_id" : "A123", "amount" : 300, "status" : "D" }
> db.orders.mapReduce(
function() { emit(this.cust_id, this.amount); },
function(key, values) { return Array.sum(values); },
{ query: { status: "A" }, out: "orders_totals" }
)
{
"result" : "orders_totals",
"timeMillis" : 419,
"counts" : {
"input" : 3,
"emit" : 3,
"reduce" : 1,
"output" : 2
},
"ok" : 1
}
>db.orders_totals.find()
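To see where the `counts` reported by the mapReduce call come from, the same query, map and reduce logic can be replayed on the four orders in plain JavaScript. This is a hypothetical re-enactment, not the MongoDB engine. `runOrdersMapReduce` is an illustrative name, and on larger or sharded data MongoDB may call reduce more often than this sketch does. Summing the status "A" amounts gives A123 = 500 + 250 = 750 and B212 = 200.

```javascript
// Hypothetical plain-JavaScript replay of the orders example (no MongoDB involved).
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

function runOrdersMapReduce(docs) {
  const counts = { input: 0, emit: 0, reduce: 0, output: 0 };
  const groups = new Map();
  // query: { status: "A" }; only matching documents reach the map step.
  for (const doc of docs.filter((d) => d.status === "A")) {
    counts.input++;
    // map: emit(this.cust_id, this.amount)
    if (!groups.has(doc.cust_id)) groups.set(doc.cust_id, []);
    groups.get(doc.cust_id).push(doc.amount);
    counts.emit++;
  }
  const results = [];
  for (const [key, values] of groups) {
    let value = values[0];
    // reduce: Array.sum(values), only needed when a key has several values.
    if (values.length > 1) {
      value = values.reduce((a, b) => a + b, 0);
      counts.reduce++;
    }
    results.push({ _id: key, value });
  }
  counts.output = results.length;
  return { results, counts };
}

const { results, counts } = runOrdersMapReduce(orders);
console.log(results); // A123 -> 750, B212 -> 200
console.log(counts);  // { input: 3, emit: 3, reduce: 1, output: 2 }
```

The order with status "D" is filtered out before the map step, so it contributes to none of the counters. Reduce runs once, only for A123, which is the only key with more than one emitted value.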
Another example
The collection c1 contains documents that store the text of a post (post_text), the user_name of its author and the status of the post.
> db.c1.insertMany([
{ "post_text": "India is an awesome country", "user_name": "sachin", "status": "active" },
{ "post_text": "welcome to India", "user_name": "saurav", "status": "active" },
{ "post_text": "I live in India", "user_name": "yuvraj", "status": "active" },
{ "post_text": "India is great", "user_name": "gautam", "status": "active" }
])
Next, we run mapReduce on the c1 collection to select all active posts, group them by user_name and count the number of posts by each user:
> db.c1.mapReduce(
function() { emit(this.user_name, 1); },
function(key, values) { return Array.sum(values); },
{ query: { status: "active" }, out: "post_total" }
)
> db.post_total.find()
> db.c1.insertMany([{"post_text": "India is an awesome country","user_name": "sachin","status":"active"},
... {"post_text": "welcome to India","user_name": "saurav","status":"active"},
... {"post_text": "I live in India","user_name": "yuvraj","status":"active"},
... {"post_text": "India is great","user_name": "gautam","status":"active"}])
{
"acknowledged" : true,
"insertedIds" : [
ObjectId("59dcdd4a26deb533e8c00fb6"),
ObjectId("59dcdd4a26deb533e8c00fb7"),
ObjectId("59dcdd4a26deb533e8c00fb8"),
ObjectId("59dcdd4a26deb533e8c00fb9")
]
}
> db.c1.mapReduce(
... function() { emit(this.user_name,1); },
...
... function(key, values) {return Array.sum(values)}, {
... query:{status:"active"},
... out:"post_total"
... }
... )
{
"result" : "post_total",
"timeMillis" : 101,
"counts" : {
"input" : 4,
"emit" : 4,
"reduce" : 0,
"output" : 4
},
"ok" : 1
}
> db.post_total.find()
{ "_id" : "gautam", "value" : 1 }
{ "_id" : "sachin", "value" : 1 }
{ "_id" : "saurav", "value" : 1 }
{ "_id" : "yuvraj", "value" : 1 }
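In this run `reduce` is 0 because every user_name was emitted exactly once. MongoDB does not call the reduce function for a key that has only one value, so each `value` is simply the emitted 1. The plain-JavaScript sketch below replays the example to show that behavior. The function name `countPostsPerUser` is hypothetical. The final sort by `_id` only reproduces the order of the listing above; it is not a claim about how MongoDB orders its output.

```javascript
// Hypothetical replay of the c1 example in plain JavaScript (no MongoDB involved).
const posts = [
  { post_text: "India is an awesome country", user_name: "sachin", status: "active" },
  { post_text: "welcome to India", user_name: "saurav", status: "active" },
  { post_text: "I live in India", user_name: "yuvraj", status: "active" },
  { post_text: "India is great", user_name: "gautam", status: "active" },
];

function countPostsPerUser(docs) {
  let reduceCalls = 0;
  const groups = new Map();
  for (const doc of docs) {
    if (doc.status !== "active") continue; // query: { status: "active" }
    // map: emit(this.user_name, 1)
    if (!groups.has(doc.user_name)) groups.set(doc.user_name, []);
    groups.get(doc.user_name).push(1);
  }
  const output = [];
  for (const [user, values] of groups) {
    let value = values[0];
    if (values.length > 1) {
      // reduce: Array.sum(values), only reached when a user emitted more than once
      value = values.reduce((a, b) => a + b, 0);
      reduceCalls++;
    }
    output.push({ _id: user, value });
  }
  // Sort by _id so the result lines up with the db.post_total.find() listing.
  output.sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
  return { output, reduceCalls };
}

const { output, reduceCalls } = countPostsPerUser(posts);
console.log(output.map((d) => `${d._id}: ${d.value}`).join(", "));
// gautam: 1, sachin: 1, saurav: 1, yuvraj: 1
console.log(reduceCalls); // 0
```

Emitting 1 per document and summing is what turns the map-reduce into a count. A user with two active posts would have two emitted 1s, and reduce would be called for that user to produce 2.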