CSCI461 Assignment 2 Spring24

The assignment for CSCI 461 at Nile University requires students to work on a JSON file, import it into MongoDB, and perform specific queries. Additionally, students must convert the JSON to CSV, copy it to HDFS, and write a MapReduce job to calculate average ratings for genres. The assignment is due on May 11, 2024, with penalties for late submissions and requirements for all team members to participate in discussions and submissions.

Uploaded by

nourhano021

Nile University - CSCI 461: Introduction to Big Data - Spring 2025

Assignment #2
--------------------------------------------------------------------------------------------------
INSTRUCTIONS:

- EACH MEMBER MUST UNDERSTAND EVERYTHING IN THE ASSIGNMENT.

- ANY MEMBER MAY BE ASKED TO DO AN UPDATE OR CHANGE IN CODE.
- The assignment deadline is May 11, 2024 @ 11:45 PM.
- Assignment discussions begin in the week starting May 11, 2024. [Discussion slots will
be announced for each TA]
- The assignment's total grade is 10 marks [distribution is specified in the assignment
requirements below] + 1 bonus mark.
- Any submission after the deadline will lose 2 marks from the assignment's total
grade. [Unless you have a clearly accepted reason sent by mail immediately]
- CHEATING in the assignment results in a grade of ZERO for each member.
- All members MUST be present in the discussion [unless you have a
clearly accepted reason sent by mail to the TA before the discussion]; otherwise, there
will be a grade deduction of 1 mark.

ASSIGNMENT REQUIREMENTS:

PART ONE:
1. Work on the data in the attached JSON file named mds.json.
2. Use the MongoDB container we worked on during the lab to access MongoDB,
create a database named moviesDB, and import the JSON file data into MongoDB
in a collection named moviesColl.
3. Write at least 2 queries:
   a. The first query should use the indexing feature of MongoDB. [1.5 Marks]
   b. The second query should use logical operators. [1.5 Marks]

PART TWO:
1. Convert the mds.json from a JSON format to a CSV format.
2. Use the container we used during the lab which contains Hadoop, and copy the data (after
conversion from JSON to CSV) to HDFS.
3. Write a MapReduce job (using IntelliJ; instructions on how to create and run a
MapReduce job in an IntelliJ project are explained in the lab and the slides) to calculate the
average rating for each genre in both movies and TV shows. (Write the code in one
class named ARDriver.java, which also contains the mapper and reducer code; use only the
first genre if more than one exists, and exclude items with nulls in the genre.) [1 Mark for
the driver, 3 Marks for the mapper, 3 Marks for the reducer]
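
The map and reduce logic described in item 3 can be sketched in plain Java before it is wired into Hadoop. This is only an illustration under stated assumptions: it uses no Hadoop classes, the class name AverageRatingSketch is hypothetical, and the field order (type, genres, rating) is assumed because the actual columns of mds.json are not listed in this handout. It also assumes every kept row has a numeric rating.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the map and reduce steps; no Hadoop classes are used.
final class AverageRatingSketch {

    // Map step: returns {key, rating}, or null when the row must be skipped.
    // Assumed (hypothetical) field order: type, genres, rating.
    static String[] map(String type, String genres, String rating) {
        if (genres == null) {
            return null; // exclude items with no genre
        }
        // Keep only the first genre when several are listed.
        String firstGenre = genres.split(",", 2)[0].trim();
        if (firstGenre.isEmpty() || firstGenre.equalsIgnoreCase("null")) {
            return null; // treat empty or literal "null" genres as missing
        }
        return new String[] { type.trim() + ", " + firstGenre, rating.trim() };
    }

    // Reduce step: averages all ratings that share one key.
    static double reduce(List<Double> ratings) {
        double sum = 0;
        for (double r : ratings) {
            sum += r;
        }
        return sum / ratings.size();
    }

    // Runs map, groups values by key (the shuffle), then reduces each group.
    static List<String> run(String[][] rows) {
        Map<String, List<Double>> groups = new TreeMap<>();
        for (String[] row : rows) {
            String[] kv = map(row[0], row[1], row[2]);
            if (kv == null) {
                continue;
            }
            groups.computeIfAbsent(kv[0], k -> new ArrayList<>())
                  .add(Double.parseDouble(kv[1]));
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Double>> e : groups.entrySet()) {
            out.add(String.format(Locale.ROOT, "%s, %.1f", e.getKey(), reduce(e.getValue())));
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"Movie", "Drama, Crime", "8.0"},
            {"Movie", "Drama", "7.0"},
            {"TV", "Comedy", "9.0"},
            {"TV", null, "5.0"},
        };
        run(rows).forEach(System.out::println);
    }
}
```

With the four made-up rows in main, the sketch prints "Movie, Drama, 7.5" and "TV, Comedy, 9.0"; the row with a null genre is dropped. In a real job the same key and value choices would move into the Mapper and Reducer classes shown in the lab.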

BONUS
- Additional query in MongoDB. [0.5 Mark]
- Using the concept of a Combiner in MapReduce in PART TWO. [0.5 Mark]
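
One point worth checking before adding a combiner to an averaging job: Hadoop may run a combiner zero or more times on map output, so it cannot emit finished averages, because an average of averages is generally wrong. A common approach is to carry a (sum, count) pair and divide only in the reducer. The sketch below illustrates that idea in plain Java; the class name SumCountSketch is hypothetical and no Hadoop types are used.

```java
// Partial aggregate that a combiner could emit instead of a finished average.
final class SumCountSketch {
    final double sum;
    final long count;

    SumCountSketch(double sum, long count) {
        this.sum = sum;
        this.count = count;
    }

    // Builds a partial aggregate from the ratings seen by one map task.
    static SumCountSketch of(double... ratings) {
        double total = 0;
        for (double r : ratings) {
            total += r;
        }
        return new SumCountSketch(total, ratings.length);
    }

    // Merging is associative and commutative, so it is safe to apply repeatedly.
    SumCountSketch merge(SumCountSketch other) {
        return new SumCountSketch(sum + other.sum, count + other.count);
    }

    // Only the final reduce step should divide.
    double average() {
        return sum / count;
    }

    public static void main(String[] args) {
        SumCountSketch taskA = of(8.0, 6.0); // partial from one map task
        SumCountSketch taskB = of(10.0);     // partial from another map task

        double correct = taskA.merge(taskB).average();         // 24 / 3
        double naive = (taskA.average() + taskB.average()) / 2; // (7 + 10) / 2

        System.out.println("merged sum/count average: " + correct);
        System.out.println("average of averages:      " + naive);
    }
}
```

For these made-up ratings the merged result is 8.0 while the average of averages is 8.5, which is why the pair has to travel through the combiner unchanged in shape.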

DELIVERABLES

ALL TEAM MEMBERS MUST SUBMIT THE FOLLOWING AS ONE ZIP FILE ON Moodle:
- A text file named mDBQ.txt containing your MongoDB queries.
- A text file named mDBR.txt containing your MongoDB query results.
- The Java class ARDriver.java used for the MapReduce job. (ARDriver.java should
contain the driver code, mapper code, and reducer code)
- The output file of the MapReduce job, which is named part-r-00000.

NOTE #1: The conversion from JSON format to CSV format should look like this:

{
    "name": "Alice Johnson",
    "age": 25,
    "city": "Wonderland",
    "isStudent": true,
    "grades": [90, 88, 95],
    "address": {
        "street": "456 Oak Avenue",
        "zipCode": "54321",
        "country": "Fantasyland"
    }
}

name,age,city,isStudent,grades,address.street,address.zipCode,address.country
Alice Johnson,25,Wonderland,true,"90, 88, 95",456 Oak Avenue,54321,Fantasyland
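
The convention in NOTE #1 (nested keys joined with a dot, arrays collapsed into one quoted field) can be sketched in plain Java. This is a minimal sketch, not the required solution: the class JsonToCsvSketch and its methods are hypothetical names, and the record is built by hand as a Map because parsing mds.json would need a JSON library that this handout does not name.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper that flattens one already-parsed JSON object into CSV fields.
final class JsonToCsvSketch {

    // Recursively copies node into out, joining nested keys with '.'.
    static void flatten(String prefix, Map<String, Object> node, Map<String, String> out) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            String key = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            Object value = e.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) value;
                flatten(key, child, out);
            } else if (value instanceof List) {
                // Arrays become a single field, e.g. "90, 88, 95".
                String joined = ((List<?>) value).stream()
                        .map(v -> String.valueOf(v))
                        .collect(Collectors.joining(", "));
                out.put(key, joined);
            } else {
                out.put(key, value == null ? "" : String.valueOf(value));
            }
        }
    }

    // Quotes a field only when it contains a comma, a quote, or a line break.
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    static String toCsvLine(Collection<String> fields) {
        return fields.stream().map(JsonToCsvSketch::escape).collect(Collectors.joining(","));
    }

    // Rebuilds the record from NOTE #1 and returns {header line, data line}.
    static String[] demo() {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("street", "456 Oak Avenue");
        address.put("zipCode", "54321");
        address.put("country", "Fantasyland");

        Map<String, Object> record = new LinkedHashMap<>();
        record.put("name", "Alice Johnson");
        record.put("age", 25);
        record.put("city", "Wonderland");
        record.put("isStudent", true);
        record.put("grades", Arrays.asList(90, 88, 95));
        record.put("address", address);

        Map<String, String> flat = new LinkedHashMap<>();
        flatten("", record, flat);
        return new String[] { toCsvLine(flat.keySet()), toCsvLine(flat.values()) };
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

Running main reproduces the two CSV lines shown in NOTE #1. A LinkedHashMap is used so that the column order follows the key order of the input object.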


NOTE #2: The output of the MapReduce job should look like this:

Movie, Drama, 7.8
Movie, Horror, 8.9
TV, Drama, 7.6
TV, Comedy, 9.8
