Lab2 Assignment Statement

The lab assignment focuses on using MongoDB for document stores, where students will practice importing, creating, and querying document databases. Participants will explore different modeling alternatives and implement queries related to persons and companies using Python, while measuring execution times. Deliverables include Python scripts and a PDF answering specific questions about query performance and data modeling conclusions.

Uploaded by lgavidiap31


23D020: Big Data Management for Data Science

Lab 1: Document Stores

Assignment
Note: This is a hands-on lab on Document Stores. We will be using one of the most popular document databases: MongoDB. We will practice how to import, create and model document databases, as well as how to query them. The training document provides instructions for setting up the environment; here we list the exercises to be solved. One group member, on behalf of the group, must upload the solution. Remember to include the names of all group members in your solutions. Please check the assignment deadline and make sure you meet it. It is a strict deadline!

A Lab Statement
In this lab you will explore the different modeling alternatives in MongoDB.
You will work with the following conceptual model, depicted in UML.

The queries that you will need to implement with this model are the
following:

Q1: For each person, retrieve their full name and their company’s name.

Q2: For each company, retrieve its name and the number of employees.

Q3: For each person born before 1988, update their age to “30”.

Q4: For each company, update its name to include the word “Company”.

In this exercise you are asked to design the database using the following three models:

M1: Two types of documents, one for each class, linked through referenced fields.

M2: One document for “Person”, with “Company” as an embedded document.

M3: One document for “Company”, with “Person” as embedded documents.
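The UML diagram is not reproduced here, so the field names below (`first_name`, `last_name`, `company_id`, `name`, `employees`, `company`) are hypothetical. As a minimal sketch under that assumption, the same list of persons and companies could be shaped into each of the three models like this:

```python
from copy import deepcopy


def build_m1(companies, persons):
    """M1: two collections; each person references its company through company_id."""
    return deepcopy(companies), deepcopy(persons)


def build_m2(companies, persons):
    """M2: a single persons collection; each person embeds a copy of its company."""
    company_by_id = {c["_id"]: c for c in companies}
    docs = []
    for person in persons:
        doc = {k: v for k, v in person.items() if k != "company_id"}
        # Denormalized: the company is duplicated in every employee's document.
        doc["company"] = deepcopy(company_by_id[person["company_id"]])
        docs.append(doc)
    return docs


def build_m3(companies, persons):
    """M3: a single companies collection; each company embeds its employees."""
    docs = {c["_id"]: {**deepcopy(c), "employees": []} for c in companies}
    for person in persons:
        employee = {k: v for k, v in person.items() if k != "company_id"}
        docs[person["company_id"]]["employees"].append(employee)
    return list(docs.values())
```

Because all three builders start from the same two lists, the models hold exactly the same data. Note that in M2 a company without employees would not appear at all, since companies exist only inside person documents.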

B Python Implementation
For each design model, you need to implement the following tasks in Python:

1. Using Faker (a random data generator for Python), generate random data for persons and companies. Be consistent with the number of companies and the proportion of employees across the three models; that is, use the same number of companies and employees in all three models so that they are comparable. The assumption is that you are modeling the same data with three different models.

2. Insert the data into MongoDB with each of the specified models.

3. Program queries Q1, Q2, Q3 and Q4 for each of the models and print their results to the console.

4. Measure the execution time of each query by adding the following instructions:

    import time

    start_time = time.time()
    # ... query code ...
    end_time = time.time()
    query_time = end_time - start_time
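Task 1 asks for the same number of companies and the same share of employees in every model. One way to guarantee this, sketched here under stated assumptions (the field names, the 1960-2005 birth-year range, the fixed reference date and the round-robin employer assignment are all hypothetical choices), is to generate the data once with a fixed seed and reuse it for all three models:

```python
import random
from datetime import datetime

# Hypothetical fixed "today", so computed ages are reproducible between runs.
REFERENCE_DATE = datetime(2024, 1, 1)


def age_on(birth, today=REFERENCE_DATE):
    """Whole years between birth and today."""
    before_birthday = (today.month, today.day) < (birth.month, birth.day)
    return today.year - birth.year - before_birthday


def generate_dataset(n_companies, n_persons, seed=0, fake=None):
    """Generate one shared set of companies and persons for M1, M2 and M3.

    fake: optional faker.Faker instance. When None, deterministic placeholder
    names are used so the sketch also runs without Faker installed.
    """
    if n_persons < n_companies:
        # Otherwise M2, which stores companies only inside persons, loses some.
        raise ValueError("need at least one employee per company")
    rng = random.Random(seed)  # fixed seed: identical data on every run

    companies = [
        {"_id": c, "name": fake.company() if fake is not None else f"Org-{c}"}
        for c in range(n_companies)
    ]

    persons = []
    for p in range(n_persons):
        # datetime, not date: PyMongo cannot encode datetime.date values.
        birth = datetime(rng.randint(1960, 2005), rng.randint(1, 12), rng.randint(1, 28))
        persons.append({
            "_id": p,
            "first_name": fake.first_name() if fake is not None else f"First-{p}",
            "last_name": fake.last_name() if fake is not None else f"Last-{p}",
            "birth_date": birth,
            "age": age_on(birth),
            # Round-robin assignment: every company gets the same share of
            # employees (n_persons / n_companies, up to rounding).
            "company_id": p % n_companies,
        })
    return companies, persons
```

With Faker installed, `generate_dataset(1000, 50000, fake=Faker())` would take the names from `fake.company()`, `fake.first_name()` and `fake.last_name()`; calling `Faker.seed(0)` beforehand makes those names reproducible too. The resulting documents can then be passed to `insert_many` on the corresponding collection.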

To help you with the exercise, the provided Python project includes sample code for the above tasks in the file example.py. For a full reference on the MongoDB Python API, see https://docs.mongodb.com/drivers/pymongo/.
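As a minimal sketch of the read queries, Q1 and Q2 can be written as PyMongo aggregation pipelines, one per model. It assumes hypothetical collection names (`persons`, `companies`) and field names (`first_name`, `last_name`, `company_id`, `company`, `employees`), and a `db` object obtained as `MongoClient(...)[database_name]`; the timing follows task 4:

```python
import time

FULL_NAME = {"$concat": ["$first_name", " ", "$last_name"]}

# (collection, pipeline) for each model and read query.
READ_QUERIES = {
    # Q1: each person's full name and their company's name.
    ("M1", "Q1"): ("persons", [
        {"$lookup": {"from": "companies", "localField": "company_id",
                     "foreignField": "_id", "as": "company"}},
        {"$unwind": "$company"},
        {"$project": {"_id": 0, "full_name": FULL_NAME,
                      "company_name": "$company.name"}},
    ]),
    ("M2", "Q1"): ("persons", [
        {"$project": {"_id": 0, "full_name": FULL_NAME,
                      "company_name": "$company.name"}},
    ]),
    ("M3", "Q1"): ("companies", [
        {"$unwind": "$employees"},
        {"$project": {"_id": 0, "company_name": "$name",
                      "full_name": {"$concat": ["$employees.first_name", " ",
                                                "$employees.last_name"]}}},
    ]),
    # Q2: each company's name and its number of employees.
    ("M1", "Q2"): ("companies", [
        {"$lookup": {"from": "persons", "localField": "_id",
                     "foreignField": "company_id", "as": "staff"}},
        {"$project": {"_id": 0, "name": 1, "employees": {"$size": "$staff"}}},
    ]),
    ("M2", "Q2"): ("persons", [
        {"$group": {"_id": "$company._id", "name": {"$first": "$company.name"},
                    "employees": {"$sum": 1}}},
    ]),
    ("M3", "Q2"): ("companies", [
        {"$project": {"_id": 0, "name": 1, "employees": {"$size": "$employees"}}},
    ]),
}


def run_read_query(db, model, query):
    """Run one read query on a pymongo Database, print the rows, return seconds."""
    collection_name, pipeline = READ_QUERIES[(model, query)]
    start_time = time.time()
    # list() drains the cursor so the timing covers fetching every result.
    results = list(db[collection_name].aggregate(pipeline))
    end_time = time.time()
    for row in results:
        print(row)
    return end_time - start_time
```

Draining the cursor matters: `aggregate` returns a cursor after fetching only the first batch, so timing the call alone would underestimate the cost of large result sets.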

C Results and Discussion

Once you have completed the above tasks, fill in the following table with the query execution times obtained with a large number of documents (e.g., at least 50,000).

Table 1: Query Execution Times per Model (seconds)

Model | Q1 | Q2 | Q3 | Q4
------+----+----+----+----
M1    |    |    |    |
M2    |    |    |    |
M3    |    |    |    |

Afterwards, answer the following questions (make sure you justify your answers rather than only listing them):

1. Order the models from best to worst for Q1. Which model performs best? Why?

2. Order the models from best to worst for Q2. Which model performs best? Why?

3. Order the models from worst to best for Q3. Which model performs worst? Why?

4. Order the models from worst to best for Q4. Which model performs worst? Why?

5. What are your conclusions about normalizing versus denormalizing data in MongoDB? In the case of updates, which offers better performance?
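For the update questions it helps to see what each update actually touches. A minimal sketch of Q3 and Q4 with PyMongo's `update_many`, assuming hypothetical names (`persons`, `companies`, `birth_date`, `age`, `employees`, `company`); the Q4 updates use an aggregation-pipeline update, which requires MongoDB 4.2 or later:

```python
import time
from datetime import datetime

CUTOFF = datetime(1988, 1, 1)  # Q3: "born before 1988"
NEW_AGE = 30                   # the statement's "30", stored as a number (assumption)
BORN_BEFORE = {"birth_date": {"$lt": CUTOFF}}

# (collection, filter, update, array_filters) for each model and update query.
UPDATE_QUERIES = {
    ("M1", "Q3"): ("persons", BORN_BEFORE, {"$set": {"age": NEW_AGE}}, None),
    ("M2", "Q3"): ("persons", BORN_BEFORE, {"$set": {"age": NEW_AGE}}, None),
    # M3: update only the matching elements of each company's employees array.
    ("M3", "Q3"): ("companies",
                   {"employees.birth_date": {"$lt": CUTOFF}},
                   {"$set": {"employees.$[e].age": NEW_AGE}},
                   [{"e.birth_date": {"$lt": CUTOFF}}]),
    # Q4: an update pipeline computes the new name from the old one.
    ("M1", "Q4"): ("companies", {},
                   [{"$set": {"name": {"$concat": ["$name", " Company"]}}}], None),
    # M2: the company name is copied into every person, so every person changes.
    ("M2", "Q4"): ("persons", {},
                   [{"$set": {"company.name": {"$concat": ["$company.name", " Company"]}}}],
                   None),
    ("M3", "Q4"): ("companies", {},
                   [{"$set": {"name": {"$concat": ["$name", " Company"]}}}], None),
}


def run_update(db, model, query):
    """Run one update query on a pymongo Database and return its duration in seconds."""
    collection_name, flt, update, array_filters = UPDATE_QUERIES[(model, query)]
    start_time = time.time()
    result = db[collection_name].update_many(flt, update, array_filters=array_filters)
    end_time = time.time()
    print(f"{model} {query}: {result.modified_count} documents modified")
    return end_time - start_time
```

Q4 is not idempotent: running it twice appends the word twice, so the data should be reloaded before each new measurement.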

Deliverables
1. Python scripts to implement the tasks defined in Section B. The
scripts must be included in a single zip file.

• The Python code must include comments that make it easy to understand. At the header of each file, include an overall comment explaining the steps implemented in the pipeline, and refer to these steps when explaining the code in the subsequent comments.
• The three pipelines should be easy to run. For instance, the code should not include absolute paths or fixed user credentials (e.g., these should be requested from the user or stored in configuration files).

2. A PDF file (at most two A4 pages) answering the questions in Section C.

In this document you can also state any assumptions you made and justify the decisions you took (if any).
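The header-comment and credentials requirements can be met in several ways. As one hedged sketch, each script could open with a header listing its steps and read the connection string from an environment variable (the name `MONGODB_URI` is a hypothetical choice) or from an interactive prompt, so that no credentials or absolute paths appear in the code:

```python
"""Model M1 pipeline (hypothetical header for one of the three scripts).

Steps:
  1. Read the MongoDB connection string (get_mongo_uri).
  2. Generate persons and companies with Faker.
  3. Insert them into MongoDB using model M1.
  4. Run Q1-Q4, print their results and time each query.
"""
import os
from getpass import getpass
from urllib.parse import quote_plus


def get_mongo_uri(env_var="MONGODB_URI"):
    """Step 1: return a MongoDB connection string without hard-coding credentials.

    Uses the environment variable if it is set; otherwise prompts the user.
    """
    uri = os.environ.get(env_var)
    if uri:
        return uri
    host = input("MongoDB host [localhost:27017]: ").strip() or "localhost:27017"
    user = input("MongoDB user (leave empty for none): ").strip()
    if not user:
        return f"mongodb://{host}/"
    password = getpass("MongoDB password: ")
    # Percent-escape user and password, as MongoDB connection strings require.
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}/"
```

The returned string can be passed directly to PyMongo, e.g. `MongoClient(get_mongo_uri())`.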

Assessment Criteria
i) Conciseness of explanations

ii) Understandability

iii) Coherence

iv) Soundness
