0% found this document useful (0 votes)

18 views7 pages

23CS5PCDEV

This document outlines the structure and content of the April 2025 Semester End Make-Up Examinations for the B.E. program in Computer Science and Engineering at B.M.S. College of Engineering. It includes instructions for answering questions, a breakdown of units with specific questions related to data exploration, visualization, and analysis techniques, as well as Python programming tasks. The exam covers various topics such as exploratory data analysis, data deduplication, random sampling, and visualization strategies.

Uploaded by

siddanthn.me24

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

18 views7 pages

23CS5PCDEV

Uploaded by

siddanthn.me24

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 7

U.S.N.

B.M.S. College of Engineering, Bengaluru-560019

Autonomous Institute Affiliated to VTU

April 2025 Semester End Make-Up Examinations

Programme: B.E. Semester: V

Branch: Computer Science and Engineering Duration: 3 hrs.
Course Code: 23CS5PCDEV Max Marks: 100
Course: Data Exploration and Visualization

Instructions: 1. Answer any FIVE full questions, choosing one full question from each unit.
2. Missing data, if any, may be suitably assumed.
Important Note: Completing your answers, compulsorily draw diagonal cross lines on the remaining blank

UNIT - I CO PO Marks

1 a) Discuss the generic steps involved in Exploratory data analysis CO1 PO 06

(EDA) process. 1

b) Imagine you are a data analyst working for an educational CO2 PO2 06
institute. You have been given a dataset containing information
about students' performance in various subjects. The dataset
includes the following columns:
pages. Revealing of identification, appeal to evaluator will be treated as malpractice.

Student ID, Name, Gender, Age, Grade: The grade level of the
student (e.g., 10th, 11th), Math Score, Science Score, English
Score, Participation in Extracurricular Activities: Indicates if the
student participates in extracurricular activities (e.g., Yes, No).,
Parent's Education Level.
Classify the types of data in the given dataset and identify their
measurement scales.
c) Demonstrate any four types of analysis that could be done for each CO2 PO2 08
of the following domains using EDA techniques. Justify your
answer.
i. Professional Sports
ii. Healthcare
OR
2 a) Define Exploratory data analysis. Point out the primary aim and CO1 PO1 06
explain the significance of the same.
b) Demonstrate any three types of analysis that could be done for CO2 PO2 06
Marketing domain using EDA techniques. Justify your answer.
c) You have been provided with a dataset from a health and fitness CO2 PO2 08
app that tracks users' activities, dietary habits, and other lifestyle
factors. The goal is to help the company understand different types
of data, including their measurement scales, to improve the design
of the app and offer personalized recommendations. The dataset
includes the following variables:
• User Demographics: Age, Gender, Height, Weight, City.
• Physical Activity Data: Daily Step Count, Hours of Sleep,
Calories Burned, Exercise Frequency.
• Dietary Data: Daily Water Intake, Daily Caloric Intake,
Food Categories.
• Health Metrics: Blood Pressure, Cholesterol Level, Blood
Sugar Level.
• User Feedback: Overall Health Rating, Exercise
Enjoyment Rating, Sleep Quality Rating
Classify the types of data in the given dataset and identify their
measurement scales.
UNIT - II
3 a) Discuss the process of data deduplication and the steps involved CO1 PO1 06
in the same with an example program.
b) Write a Python program to perform the append and concatenation CO1 PO1 06
(with axis=1) operations on the following dataframes. Show the
appropriate output.
data1 = { 'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35] }
data2 = { 'Name': ['David', 'Eva'], 'Age': [40, 45] }.
c) A retail company has been collecting data from its sales CO2 PO2 08
transactions, which includes the following information.
a. TansactionID: Unique identifier for each transaction.
b. CustomerID: Identifier for the customer who made the
purchase.
c. ProductID: Identifier for the product purchased.
d. Quantity: Quantity of the product purchased in the
transaction.
e. Price: Price per unit of the product.
f. TotalAmount: Total amount for the transaction (Quantity
* Price).
However, some of the records in the dataset are duplicates as
shown below due to system errors, such as customers placing
multiple orders for the same product, or the same transaction being
recorded more than once.
TransactionID CustomerID ProductID Quantity Price TotalAmount
0 1 101 1001 1 20 20
1 2 102 1002 2 15 30
2 3 101 1001 1 20 20
3 4 104 1003 5 30 150
4 5 103 1002 2 15 30
5 6 102 1002 3 15 45
6 7 101 1001 1 20 20

Write a Python program to do the following and show

subsequent output after each step.
a. Identify duplicate rows.
b. Removing Duplicates
c. Instead of simply removing the duplicates, replace
duplicate value by aggregating data such as summing
up the quantities and total amount and average the
prices for the duplicate records.
OR
4 a) Discuss the importance of random sampling of a dataset. CO1 PO1 06
Differentiate between random sampling with replacement and
without replacement.
b) A retail company has collected data on the ages of its customers CO2 PO2 06
to analyze purchasing behavior across different age groups. To
perform a more meaningful analysis, the company wants to
classify the customers into distinct age groups (bins) rather than
working with individual age values.
The dataset contains the following columns:
• CustomerID: Unique identifier for each customer.
• Age: Age of the customer.
• Spending Amount: Total amount spent by the customer in
the last month.
Write a Python program to categorize customers into predefined
age groups such as "18-25", "26-35", "36-45", etc. and find the
average of their spending behavior within each group.
c) A company collects data from two different sources: one CO2 PO2 08
containing information about its employees, and another
containing the details of the projects that each employee is
working on.
DataFrame- df_employees
EmployeeID Name Department

0 1 Alice HR
1 2 Bob Finance
2 3 Charlie IT
3 4 David Sales
4 5 Eva Marketing
DataFrame- df_projects
EmployeeID ProjectID ProjectName

0 1 101 Project A
1 2 102 Project B
2 4 103 Project C
3 5 104 Project D
4 6 105 Project E
These two data sources need to be merged to gain a complete
picture of each employee and their associated projects.

Analyse the types of joins that needs to performed to do the

following. Write necessary code along with suitable output.
1. If an employee doesn't have a corresponding project, they
will be excluded from the result.
2. A join that returns all rows from the df_employees and the
matching rows from the right df_projects. If there is no
match, the result will contain NaN for columns from the right
DataFrame.
3. A join that returns all rows from the df_projects and the
matching rows from df_employees. If there is no match, the
result will contain NaN for columns from the left DataFrame.
4. A join that returns all rows from both DataFrames, with
matching records from both sides when available. If there is
no match, NaN is filled in for the columns from the
DataFrame without a match.

UNIT - III
5 a) Explain the most common measures for analyzing the distribution CO1 PO1 06
frequency. Demonstrate the same with a Python program.
b) You are a data analyst working for a retail company that tracks CO2 PO2 06
sales data across different regions and products. You have a
dataset containing information about sales figures and profit
margins for various products in different regions.
Data set: 'Region': ['North', 'North', 'South', 'South', 'North',
'South'], 'Product': ['A', 'B', 'A', 'B', 'A', 'B'], 'Sales': [100, 150, 200,
120, 180, 90], 'Profit_Margin': [0.2, 0.3, 0.25, 0.15, 0.18, 0.12]
Write a program to calculate the total sales and maximum profit
margin for each product across all regions.

c) Write a Python program to do the following. CO2 PO2 08

i.Create a dataframe for the following data.
1. Date: The date of the transaction or record.
2. Category: The category of the product sold (e.g.,
Electronics, Clothing, Food).
3. Region: The region where the sale took place (e.g., North,
South, East, West).
4. Sales: The total sales amount for each transaction.
5. Units Sold: The number of units sold in the transaction
ii.Create a pivot table that shows the total Sales by Category and
Region.
iii.Create a cross-tabulation that shows the Units Sold by Category
and Region.
iv.Print the average sales per unit sold for each product category
across different regions.
v.Print the region that has the highest sales for the Electronics
category.

OR
6 a) Illustrate the various characteristics of the time series dataset with CO1 PO1 06
suitable example.
b) As a data analyst for a retail company that wants to understand the CO2 PO2 06
relationship between the amount spent by customers and the time
they spend on the company’s website. This information will aid in
optimizing the website experience and marketing strategies.
1. What type of analysis technique can be applied?
2. Write a panda’s code to explore the relationship between the
time spent on the website (in minutes) and the amount spent by
customers (in dollars) to derive insights for the retail company.

c) We have a dataset with information about employees in an CO2 PO2 08

organization. The dataset includes Employee ID, Department,
Training Hours (hours spent in training during the last quarter),
Job Performance Score (a composite score based on efficiency,
quality of work, and job satisfaction), and Job Satisfaction (on a
scale from 1 to 10).
Employee Department Training Job Job
ID Hours Performance Satisfaction
Scores
101 Sales 40 85 8
102 Marketing 50 90 9
103 HR 20 75 7
104 IT 60 92 9
a. Write a Python program to do the following.
i. Create a data frame for the data shown.
ii. Find the correlation between Training Hours and Job
Performance Score for each department.
iii. Find the overall correlation between Training Hours and
Job Performance Score across all departments.
b. Interpret the output for the given queries.
i. Find the correlation between Training Hours and Job
Performance Score for each department.
ii. Find the overall correlation between Training Hours and
Job Performance Score across all departments.

UNIT - IV
7 a) Illustrate the linear scale and logarithmic scale with respect to axes CO1 PO1 06
representation. Explain how the following data points could be
represented using both the scales.
1,3.16,10,31.6,100
b) Consider the following monthly sales dataset for 4 regions (North, CO2 PO2 06
South, East, West) across 6 months (January to June). The dataset
contains the total sales in dollars for each region each month.

Identify the various plots and charts that are could be used to
visualize the given data under following circumstances. Justify
your answer.
i. To compare the total sales in each region across the
months.
ii. To show individual data points for each month's sales
across regions.
iii. To visualize the sales performance across regions and
months in a more holistic way, where colour intensity
represents sales volume.
c) Explore the various plots and charts used to visualize associations CO2 PO2 08
among two or more quantitative variables. Analyze the
visualization strategies that could be adopted to address the
following relationships.
i. Relationship of two variables.
ii. Same types of variables with two different types of
scales, position and size.
iii. More than three to four quantitative variables.

OR
8 a) List the limitations of Histogram and Density Plot. Explain how 06
these limitations are handled using Empirical Cumulative 1 1
Distribution Functions with an example.
b) Consider a business scenario where a company wants to analyze CO2 PO2 06
the revenue composition across different product categories.
Propose a specific visualization method that breaks down revenue
proportions, highlighting the parts of the total. Justify your choice
and explain how this visualization could guide decision-making
within the company.
c) Describe how confidence intervals are used to convey uncertainty 08
in curve fits. What information do confidence intervals provide, 1 1
and how can they be visually integrated into a curve fit
visualization?
UNIT - V
9 a) Write a Python program to sort an array of words from longest to 06
1 1
shortest length.
b) Consider the following data set. CO2 PO2 06
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [24, 27, 22, 32, 29],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
'Score': [85, 90, 78, 88, 92] }
Write python code by creating a dataframe of the above dataset to
do the following.
i. Access the first name
ii. Select Name and City from first three rows.
iii. Print rows where age>25
c) Write a Python program that demonstrates how to create an CO2 PO2 08
Employee database with ID, Name, Age, Department as attributes
using SQLite, insert data into it and then transfer the data into a
Pandas dataframe.

OR
10 a) Write a Python program demonstrating how to reshape the data CO1 PO1 06
using melt() and pivot() methods. Illustrate the purpose of the
same.
b) Write a Python program demonstrating the use of Web APIs and CO2 PO2 06
processing the retrieved data using pandas.
c) Write a Python program to create a Student dataframe with ID, CO2 PO2 08
Name, Age, Grade and Score. Find the missing values if any and
fill the same using mean, mode and median for whichever data
applicable. Justify your choice of filling the missing values.

******

22CS5PEDEV
No ratings yet
22CS5PEDEV
3 pages
23CS5PCDEV
No ratings yet
23CS5PCDEV
5 pages
22CS5PEDEV
No ratings yet
22CS5PEDEV
5 pages
23CS5PCDEV
No ratings yet
23CS5PCDEV
4 pages
Xii Ip CHN 03 QP
No ratings yet
Xii Ip CHN 03 QP
6 pages
Kendriya Vidyalaya Sangathan: Kolkata Region First Preboard E Informatics Practices New (065) - Class Xii
No ratings yet
Kendriya Vidyalaya Sangathan: Kolkata Region First Preboard E Informatics Practices New (065) - Class Xii
15 pages
Data Science Sample
No ratings yet
Data Science Sample
5 pages
65 (Hs Xii A SC Com Ip 22)
No ratings yet
65 (Hs Xii A SC Com Ip 22)
11 pages
DOC-20231220-WA0095..pdf 20231220 215403 0000
No ratings yet
DOC-20231220-WA0095..pdf 20231220 215403 0000
44 pages
IP Practical File2
No ratings yet
IP Practical File2
35 pages
Programming Notes 3
No ratings yet
Programming Notes 3
3 pages
IT Exam Answer Key
No ratings yet
IT Exam Answer Key
6 pages
Python DataFrame and Series Practices
0% (1)
Python DataFrame and Series Practices
38 pages
Ip MS
No ratings yet
Ip MS
6 pages
IP Practicals
No ratings yet
IP Practicals
30 pages
IP Question Paper 2020-2021
No ratings yet
IP Question Paper 2020-2021
9 pages
Class XII Informatics Practices
100% (1)
Class XII Informatics Practices
8 pages
STD Xii-Tee - Ip
No ratings yet
STD Xii-Tee - Ip
9 pages
Important Questions With Solutions IP
No ratings yet
Important Questions With Solutions IP
5 pages
Computer Paper12
No ratings yet
Computer Paper12
15 pages
Practice Paper 1
No ratings yet
Practice Paper 1
6 pages
Flipkart Analyst Interview Insights
No ratings yet
Flipkart Analyst Interview Insights
16 pages
Python & SQL Exam Paper
No ratings yet
Python & SQL Exam Paper
9 pages
Class Xii Ip - Sahodaya Set2 2023
No ratings yet
Class Xii Ip - Sahodaya Set2 2023
13 pages
Ip Final Practical File
No ratings yet
Ip Final Practical File
22 pages
Informatics Practices Exam Paper 2022-23
No ratings yet
Informatics Practices Exam Paper 2022-23
8 pages
Practical File
No ratings yet
Practical File
27 pages
Holiday Worksheet
No ratings yet
Holiday Worksheet
9 pages
Informatics Practices
No ratings yet
Informatics Practices
9 pages
Xii Ip CHN 02 QP
No ratings yet
Xii Ip CHN 02 QP
5 pages
Class Xii Ip Pb-I QP
No ratings yet
Class Xii Ip Pb-I QP
51 pages
Class XII Informatics Practices Guide
No ratings yet
Class XII Informatics Practices Guide
9 pages
Class 12 Informatics Practices Sample Paper
No ratings yet
Class 12 Informatics Practices Sample Paper
59 pages
Ip 1
No ratings yet
Ip 1
8 pages
XII IP Practical List 2025-26 - KV1UDR
No ratings yet
XII IP Practical List 2025-26 - KV1UDR
4 pages
Python DataFrame and Plotting Tasks
No ratings yet
Python DataFrame and Plotting Tasks
14 pages
Class XII - Informatics Practices
No ratings yet
Class XII - Informatics Practices
7 pages
XIIInfo Pract S E 427
No ratings yet
XIIInfo Pract S E 427
5 pages
Httppython Mykvs inuploadsfilesXIIInfo Pract S E 150 PDF
No ratings yet
Httppython Mykvs inuploadsfilesXIIInfo Pract S E 150 PDF
15 pages
Xii PB 2023 Ip 201123
No ratings yet
Xii PB 2023 Ip 201123
6 pages
Practical List 2022-23
100% (1)
Practical List 2022-23
4 pages
Fds QB
No ratings yet
Fds QB
6 pages
Pragya File
No ratings yet
Pragya File
31 pages
Assignment 1
No ratings yet
Assignment 1
6 pages
Python Project File
No ratings yet
Python Project File
31 pages
Amity International School SESSION: 2024-25 Informatics Practices (065) Class Xii Practical List
No ratings yet
Amity International School SESSION: 2024-25 Informatics Practices (065) Class Xii Practical List
5 pages
SQL and Pandas Exam Practice Questions
No ratings yet
SQL and Pandas Exam Practice Questions
4 pages
List of Practicals XII IP 2023-24
No ratings yet
List of Practicals XII IP 2023-24
5 pages
Xii Ip CHD QP PB1
No ratings yet
Xii Ip CHD QP PB1
6 pages
Work Sheet Informatics Practices
No ratings yet
Work Sheet Informatics Practices
6 pages
Mock2 IP
No ratings yet
Mock2 IP
9 pages
Set-D CT2 Answerkey
No ratings yet
Set-D CT2 Answerkey
11 pages
Class Xii Ip - Sahodaya Set1 2023
No ratings yet
Class Xii Ip - Sahodaya Set1 2023
13 pages
Class 12 Ip Pre Board Set A
No ratings yet
Class 12 Ip Pre Board Set A
9 pages
Bangluru Ip
No ratings yet
Bangluru Ip
6 pages
Self Practical File Tina Gupta
No ratings yet
Self Practical File Tina Gupta
45 pages
IP Practical G12
No ratings yet
IP Practical G12
5 pages
Prac 2
No ratings yet
Prac 2
11 pages
International Shipping Notice
No ratings yet
International Shipping Notice
2 pages
BOOK
No ratings yet
BOOK
10 pages
Upendra-CV - Updated-1
No ratings yet
Upendra-CV - Updated-1
9 pages
Nimit's Resume PDF
No ratings yet
Nimit's Resume PDF
1 page
Balance Sheet: (Company Name)
No ratings yet
Balance Sheet: (Company Name)
2 pages
TQM Strategies for Zambian Firms
No ratings yet
TQM Strategies for Zambian Firms
3 pages
A Study On Customer Satisfaction Towards Maruti Suzuki
No ratings yet
A Study On Customer Satisfaction Towards Maruti Suzuki
61 pages
12go Booking 8060054
No ratings yet
12go Booking 8060054
11 pages
SHE Performance Standard - SHE Officer - Assistant SHE Officer
No ratings yet
SHE Performance Standard - SHE Officer - Assistant SHE Officer
4 pages
Content Strategy for Bike Prompts
No ratings yet
Content Strategy for Bike Prompts
7 pages
Entrepreneurship Development and Project Management
100% (1)
Entrepreneurship Development and Project Management
20 pages
Bsa1202-Fs2122-Incometax-03-Solutionx: Problem 1 - Mr. F. Ficiente
No ratings yet
Bsa1202-Fs2122-Incometax-03-Solutionx: Problem 1 - Mr. F. Ficiente
6 pages
Dove SWOT Analysis and Market Insights
No ratings yet
Dove SWOT Analysis and Market Insights
4 pages
Payment and Receivable Records
No ratings yet
Payment and Receivable Records
4 pages
SAP S/4HANA Overview and Innovations
No ratings yet
SAP S/4HANA Overview and Innovations
27 pages
Previous Reviews
No ratings yet
Previous Reviews
5 pages
Cbebirr Receipt Che5s4qo7l
No ratings yet
Cbebirr Receipt Che5s4qo7l
1 page
Report Timken
50% (4)
Report Timken
45 pages
04 SWOT and CAME Exercise-1
No ratings yet
04 SWOT and CAME Exercise-1
7 pages
HOPEX Agile EA
No ratings yet
HOPEX Agile EA
14 pages
ABTTchapter 1
No ratings yet
ABTTchapter 1
6 pages
Consumer Buying Behaviour Towards Online Food Delivery Apps in North Maharashtra Region
No ratings yet
Consumer Buying Behaviour Towards Online Food Delivery Apps in North Maharashtra Region
4 pages
Work in Progress
No ratings yet
Work in Progress
5 pages
Environmentalmanagement 191215065846
No ratings yet
Environmentalmanagement 191215065846
25 pages
Test-Your-Skills CompetitorResearch Worksheet
No ratings yet
Test-Your-Skills CompetitorResearch Worksheet
2 pages
Chpter 2
No ratings yet
Chpter 2
2 pages
Civil Law Act 1956
No ratings yet
Civil Law Act 1956
38 pages
GSTR1 09brrpc4205j1zo 102024
No ratings yet
GSTR1 09brrpc4205j1zo 102024
5 pages
Marketing Services Agreement
No ratings yet
Marketing Services Agreement
9 pages
Set 2 Tutorial Ias 0ne
No ratings yet
Set 2 Tutorial Ias 0ne
7 pages

23CS5PCDEV

Uploaded by

23CS5PCDEV

Uploaded by

U.S.N.

B.M.S. College of Engineering, Bengaluru-560019

April 2025 Semester End Make-Up Examinations

Programme: B.E. Semester: V

1 a) Discuss the generic steps involved in Exploratory data analysis CO1 PO 06

Write a Python program to do the following and show

Analyse the types of joins that needs to performed to do the

c) Write a Python program to do the following. CO2 PO2 08

c) We have a dataset with information about employees in an CO2 PO2 08

You might also like