
23CS5PCDEV

This document outlines the examination structure for the B.E. program in Computer Science and Engineering at B.M.S. College of Engineering for the January/February 2025 semester. It includes instructions for answering questions from various units related to Data Exploration and Visualization, with specific tasks such as data classification, cleaning datasets, and applying statistical measures. The exam consists of multiple units with questions that require both theoretical explanations and practical Python code snippets.

Uploaded by

siddanthn.me24

U.S.N.

B.M.S. College of Engineering, Bengaluru-560019


Autonomous Institute Affiliated to VTU

January / February 2025 Semester End Main Examinations

Programme: B.E. Semester: V


Branch: Computer Science and Engineering Duration: 3 hrs.
Course Code: 23CS5PCDEV Max Marks: 100
Course: Data Exploration and Visualization

Instructions: 1. Answer any FIVE full questions, choosing one full question from each unit.
2. Missing data, if any, may be suitably assumed.
Important Note: On completing your answers, compulsorily draw diagonal cross lines on the remaining blank
pages. Revealing of identification, appeal to evaluator will be treated as malpractice.

UNIT - I CO PO Marks

1 a) Explicate in brief the steps involved in Exploratory Data Analysis (EDA). CO2 PO1 10

b) With an example, discuss Numerical data and Categorical data CO1 PO2 10
types along with their various sub-classifications.

For the given Student Record, classify the data into Numerical or
Categorical. Justify your answer.
STUDENT_ID = 1001
Name = REYANSH
Address = Mannsverk 61, 5094, M G ROAD, BENGALURU
Date of birth = 10th July 2018
Email = [email protected]
Weight = 60
Gender = Male
OR
2 a) Compare EDA with classical and Bayesian Analysis. CO2 PO1 10
b) List out the different types of measurement scales described in CO2 PO1 10
statistics. Explain each of them with a suitable example.

UNIT - II
3 a) You have the following dataset of ages (in years): CO1 PO3 10
[25,30,"NaN",40,35,28,"missing",22]
i. Clean this dataset by replacing the missing values
("NaN", "missing") with the median of the available
values.
ii. Define data transformation for the above dataset, and
replace the numerical data (ages) with categories such as
(Youth, Gentlemen, Senior). Clearly specify the range
of values considered.
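One possible sketch for 3 a) is given below. The category ranges (Youth below 30, Gentlemen 30 to 39, Senior 40 and above) are an assumption, since the question leaves the ranges to the candidate; the helper name `age_group` is likewise illustrative.

```python
import statistics

ages_raw = [25, 30, "NaN", 40, 35, 28, "missing", 22]

# Keep only genuine numbers; the strings "NaN" and "missing" are placeholders.
valid = [a for a in ages_raw if isinstance(a, (int, float))]
median_age = statistics.median(valid)

# Replace every placeholder with the median of the valid ages.
ages_clean = [a if isinstance(a, (int, float)) else median_age for a in ages_raw]

def age_group(age):
    # Assumed ranges: Youth < 30, Gentlemen 30-39, Senior >= 40.
    if age < 30:
        return "Youth"
    if age < 40:
        return "Gentlemen"
    return "Senior"

age_labels = [age_group(a) for a in ages_clean]
print(median_age)   # 29.0
print(ages_clean)
print(age_labels)
```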
b) What is binning in data transformation? Given data on the heights CO1 PO3 10
of a group of students as follows: height = [120, 122, 125, 127,
121, 123, 137, 131, 161, 145, 141, 132], convert that dataset into
intervals of 118 to 125, 126 to 135, 136 to 160, and finally 160 and
higher.
Write a suitable Python code snippet and the possible
output of the above operation.
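A minimal sketch of the binning in 3 b), using `pandas.cut`. The upper edge of 200 is an assumed cap for the open-ended "160 and higher" interval; any value above the tallest height would serve.

```python
import pandas as pd

height = [120, 122, 125, 127, 121, 123, 137, 131, 161, 145, 141, 132]

# Bin edges; 200 is an assumed upper cap for "160 and higher".
bins = [118, 125, 135, 160, 200]

# pd.cut uses right-closed intervals by default:
# (118, 125], (125, 135], (135, 160], (160, 200]
binned = pd.cut(height, bins)

# Number of heights in each interval, listed in interval order.
counts = pd.Series(binned).value_counts().sort_index()
print(binned)
print(counts)
```

With these edges the four intervals hold 5, 3, 3 and 1 heights respectively. Note that a right-closed first interval excludes the value 118 itself.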

OR
4 a) Given the dataset of income values (in Rupees): CO1 PO3 10
[45000,52000,48000,51000,60000,52000,"error"]
i. Identify and remove the erroneous value ("error")
from the dataset, and calculate the average income
of the cleaned dataset.
ii. Apply a data transformation technique to replace the
income values with categories such as [Fresher,
Experienced, HighNetWorth].
Write the corresponding Python code snippet.
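One way 4 a) might be approached is sketched below. The thresholds (Fresher below 50000, Experienced from 50000 up to but not including 60000, HighNetWorth from 60000) are assumptions made for illustration; the question does not fix them.

```python
import pandas as pd

incomes_raw = pd.Series([45000, 52000, 48000, 51000, 60000, 52000, "error"])

# Coerce non-numeric entries such as "error" to NaN, then drop them.
incomes = pd.to_numeric(incomes_raw, errors="coerce").dropna()
average_income = incomes.mean()

# Assumed thresholds; right=False makes each bin left-closed, e.g. [50000, 60000).
categories = pd.cut(
    incomes,
    bins=[0, 50000, 60000, float("inf")],
    right=False,
    labels=["Fresher", "Experienced", "HighNetWorth"],
)
print(round(average_income, 2))   # 51333.33
print(categories.tolist())
```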
b) Demonstrate with suitable examples how skewness helps in CO2 PO2 10
understanding the distribution of data, and how positive or
negative skewness can be interpreted to identify potential outliers in a
dataset.
UNIT - III
5 a) Illustrate cross-tabulation and Pivot table with suitable example CO1 PO1 10
with appropriate code snippet.
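As an illustration of 5 a), the sketch below builds a cross-tabulation and a pivot table from a small, entirely hypothetical student table (the column names `Gender`, `Dept` and `Marks` and all values are invented for the example).

```python
import pandas as pd

# Hypothetical records, invented for illustration only.
df = pd.DataFrame({
    "Gender": ["M", "F", "M", "F", "M"],
    "Dept": ["CSE", "CSE", "ECE", "ECE", "CSE"],
    "Marks": [80, 90, 70, 60, 100],
})

# Cross-tabulation: frequency of each Gender/Dept combination.
ct = pd.crosstab(df["Gender"], df["Dept"])

# Pivot table: mean Marks for each Gender/Dept combination.
pv = df.pivot_table(values="Marks", index="Gender", columns="Dept", aggfunc="mean")

print(ct)
print(pv)
```

The cross-tabulation counts occurrences, while the pivot table aggregates a value column, here with the mean.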
b) Discuss the concept of linear interpolation and how it is applied to CO1 PO1 10
fill missing values in a time-series dataset with an example.
Examine the potential impact of outliers on the effectiveness of
linear interpolation. How might extreme values influence the
interpolated results?
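A small sketch of the idea in 5 b), using `Series.interpolate` on two hypothetical daily series that differ only in one extreme reading (100.0 in place of 14.0). All values and dates are invented for illustration.

```python
import pandas as pd

nan = float("nan")
dates = pd.date_range("2025-01-01", periods=5, freq="D")

# Hypothetical daily readings with two gaps.
clean = pd.Series([10.0, nan, 14.0, nan, 18.0], index=dates)
# Same series, but the middle reading is an outlier.
with_outlier = pd.Series([10.0, nan, 100.0, nan, 18.0], index=dates)

# Linear interpolation fills each gap on the straight line
# between its nearest known neighbours.
filled_clean = clean.interpolate(method="linear")
filled_outlier = with_outlier.interpolate(method="linear")

print(filled_clean.tolist())    # [10.0, 12.0, 14.0, 16.0, 18.0]
print(filled_outlier.tolist())  # [10.0, 55.0, 100.0, 59.0, 18.0]
```

Because each filled value depends only on its two neighbours, a single outlier pulls both adjacent gaps towards it (55.0 and 59.0 instead of 12.0 and 16.0).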

OR
6 a) Give a case study on univariate and multivariate analysis with CO1 PO4 10
example.
b) What is central tendency and Dispersion? For the given data set, CO1 PO3 10
apply central tendency measures (mean, median and mode) and
also apply Dispersion measures (Range, Variance, Standard
Deviation).
Dataset of daily temperatures recorded over a week: 22°C, 23°C,
21°C, 25°C, 22°C, 24°C, and 20°C.
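The measures in 6 b) could be computed with Python's standard `statistics` module, as sketched below. The question does not say whether the week is to be treated as a population or a sample, so both variants of variance and standard deviation are shown; which one applies is an assumption left to the answer.

```python
import statistics

temps = [22, 23, 21, 25, 22, 24, 20]  # daily temperatures in °C

# Central tendency
mean_t = statistics.mean(temps)
median_t = statistics.median(temps)
mode_t = statistics.mode(temps)

# Dispersion
range_t = max(temps) - min(temps)
pop_var = statistics.pvariance(temps)    # divides by n
pop_std = statistics.pstdev(temps)
samp_var = statistics.variance(temps)    # divides by n - 1
samp_std = statistics.stdev(temps)

print(round(mean_t, 2), median_t, mode_t)        # 22.43 22 22
print(range_t)                                   # 5
print(round(pop_var, 2), round(pop_std, 2))      # 2.53 1.59
print(round(samp_var, 2), round(samp_std, 2))    # 2.95 1.72
```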

UNIT - IV
7 a) A Pandas DataFrame contains the exam scores of students in three CO3 PO3 10
subjects: Math, Science, and English. Give a Python code snippet to
create:
i. A histogram to visualize the distribution of Math
scores.
ii. A density plot (kernel density estimate) to visualize the
distribution of Science scores.
b) Interpret the purpose of a scatter plot matrix (pair plot) and a CO3 PO3 10
correlation matrix heatmap. How do these visualizations help in
understanding the relationships between multiple quantitative
variables? Write python code snippet to draw plot matrix and
heatmap.

OR
8 a) Illustrate how data can be mapped onto aesthetics, scales, and CO3 PO3 10
coordinate systems in data visualization. Provide an example
using a scatter plot where the x-axis represents a numerical
variable, the y-axis represents another numerical variable, and the
color represents a categorical variable.
b) Demonstrate how visualizations like error bars and stacked bar CO2 PO4 10
charts can be used to represent data uncertainty and proportions,
respectively. What insights do these visualizations provide?

UNIT - V
9 a) Case study on Data wrangling: CO3 PO4 10
You are given a sales dataset that contains information on
transactions made over the last year. The dataset has columns such
as TransactionID, ProductCategory, Price, Quantity, and
DateOfSale. Some values in the Price and Quantity columns are
missing, with some rows where both are missing, and there are
a few duplicates based on TransactionID. The
goal is to clean the data for a time-series analysis of monthly sales
trends. How can these missing values and duplicates be handled
efficiently?
List the steps you would apply for the above goal. Give the
corresponding Python code snippet.
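One possible sequence of steps for 9 a) is sketched below on a tiny, hypothetical sample standing in for the sales dataset. The imputation choices (category median for Price, overall median for Quantity) and the derived `Revenue` column are assumptions; other strategies would be equally defensible.

```python
import pandas as pd

# Hypothetical rows using the column names from the question.
sales = pd.DataFrame({
    "TransactionID": [1, 2, 2, 3, 4, 5],
    "ProductCategory": ["A", "A", "A", "B", "B", "A"],
    "Price": [100.0, None, None, 50.0, None, 120.0],
    "Quantity": [2.0, 1.0, 1.0, None, None, 3.0],
    "DateOfSale": ["2024-01-05", "2024-01-20", "2024-01-20",
                   "2024-02-10", "2024-02-11", "2024-02-15"],
})

# 1. Remove duplicate transactions, keeping the first occurrence.
sales = sales.drop_duplicates(subset="TransactionID", keep="first")

# 2. Drop rows where both Price and Quantity are missing.
sales = sales.dropna(subset=["Price", "Quantity"], how="all").copy()

# 3. Impute remaining gaps (assumed strategy): Price by category median,
#    Quantity by overall median.
sales["Price"] = sales["Price"].fillna(
    sales.groupby("ProductCategory")["Price"].transform("median"))
sales["Quantity"] = sales["Quantity"].fillna(sales["Quantity"].median())

# 4. Parse dates and total the revenue per calendar month.
sales["DateOfSale"] = pd.to_datetime(sales["DateOfSale"])
sales["Revenue"] = sales["Price"] * sales["Quantity"]
monthly_sales = sales.groupby(sales["DateOfSale"].dt.to_period("M"))["Revenue"].sum()
print(monthly_sales)
```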
b) Demonstrate, using a Python code snippet, how to scrape the titles and CO3 PO3 10
URLs of all articles from the homepage of a news website.

OR
10 a) Demonstrate string manipulation in Pandas for string replacement CO3 PO3 10
and combining of strings on a Data frame and columns.
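A brief sketch for 10 a), using the `.str` accessor on a hypothetical DataFrame (the column names and values are invented for the example).

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame({
    "first": ["Asha", "Ravi"],
    "last": ["Rao", "Kumar"],
    "city": ["Bangalore", "Bangalore"],
})

# String replacement on a column (literal match, not a regular expression).
df["city"] = df["city"].str.replace("Bangalore", "Bengaluru", regex=False)

# Combining two string columns element-wise with a separator.
df["full_name"] = df["first"].str.cat(df["last"], sep=" ")

print(df)
```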
b) A Pandas DataFrame with monthly sales data for different CO3 PO3 10
products.
i. Create a DataFrame which contains monthly sales of
three products in three months viz. Jan, Feb and March.
ii. Compute the total sales for each product across all
months using vectorized operations.
iii. Calculate the average sales for each product across all
months using vectorized operations in Pandas.
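A minimal sketch for 10 b). The product names and sales figures are hypothetical; rows are months and columns are products, so column-wise reductions give per-product results without explicit loops.

```python
import pandas as pd

# i. Hypothetical monthly sales for three products.
sales = pd.DataFrame(
    {
        "ProductA": [100, 120, 90],
        "ProductB": [80, 95, 110],
        "ProductC": [60, 70, 65],
    },
    index=["Jan", "Feb", "March"],
)

# ii. Total sales per product (vectorized column-wise sum).
total_sales = sales.sum(axis=0)

# iii. Average sales per product (vectorized column-wise mean).
average_sales = sales.mean(axis=0)

print(total_sales)
print(average_sales)
```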

******
