23CS5PCDEV

This document outlines the examination structure for the B.E. program in Computer Science and Engineering at B.M.S. College of Engineering for the January/February 2025 semester. It includes instructions for answering questions from various units related to Data Exploration and Visualization, with specific tasks such as data classification, cleaning datasets, and applying statistical measures. The exam consists of multiple units with questions that require both theoretical explanations and practical Python code snippets.

U.S.N.

B.M.S. College of Engineering, Bengaluru-560019

Autonomous Institute Affiliated to VTU

January / February 2025 Semester End Main Examinations

Programme: B.E. Semester: V

Branch: Computer Science and Engineering Duration: 3 hrs.
Course Code: 23CS5PCDEV Max Marks: 100
Course: Data Exploration and Visualization

Instructions: 1. Answer any FIVE full questions, choosing one full question from each unit.
2. Missing data, if any, may be suitably assumed.
Important Note: On completing your answers, compulsorily draw diagonal cross lines on the remaining blank
pages. Revealing of identification, appeal to evaluator will be treated as malpractice.

UNIT - I CO PO Marks

1 a) Explicate in brief the steps involved in Exploratory Data Analysis
(EDA). CO2 PO1 10

b) With an example, discuss Numerical data and Categorical data
types along with their various sub-classifications.
For the given Student Record, classify the data into Numerical or
Categorical. Justify your answer. CO1 PO2 10
STUDENT_ID = 1001
Name = REYANSH
Address = Mannsverk 61, 5094, M G ROAD, BENGALURU
Date of birth = 10th July 2018
Email = [email protected]
Weight = 60
Gender = Male

OR
2 a) Compare EDA with classical and Bayesian Analysis. CO2 PO1 10
b) List out the different types of measurement scales described in
statistics. Explain each of them with a suitable example. CO2 PO1 10

UNIT - II
3 a) You have the following dataset of ages (in years):
[25,30,"NaN",40,35,28,"missing",22]
i. Clean this dataset by replacing the missing values
("NaN", "missing") with the median of the available
values.
ii. Apply a data transformation to the above dataset that
replaces the numerical data (ages) with categories such as
(Youth, Gentlemen, Senior). Clearly specify the range of
values considered for each category. CO1 PO3 10
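One possible approach to both parts in pandas is sketched below. The age ranges (Youth up to 25, Gentlemen 26 to 35, Senior above 35) are assumptions made for illustration, since the question leaves the ranges to the candidate.

```python
import pandas as pd

ages = [25, 30, "NaN", 40, 35, 28, "missing", 22]

# Coerce the non-numeric entries ("NaN", "missing") to real missing values.
s = pd.to_numeric(pd.Series(ages), errors="coerce")

# i. The median of the six valid ages (22, 25, 28, 30, 35, 40) is 29.
median_age = s.median()
cleaned = s.fillna(median_age)

# ii. Assumed ranges: (0, 25] Youth, (25, 35] Gentlemen, (35, 120] Senior.
categories = pd.cut(cleaned, bins=[0, 25, 35, 120],
                    labels=["Youth", "Gentlemen", "Senior"])

print(cleaned.tolist())
print(categories.tolist())
```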
b) What is binning in data transformation? Given data on the heights
of a group of students as follows: height = [120, 122, 125, 127,
121, 123, 137, 131, 161, 145, 141, 132], convert the dataset into
intervals of 118 to 125, 126 to 135, 136 to 160, and finally 160 and
higher.
Write a suitable Python code snippet for the above operation and
give its possible output. CO1 PO3 10
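A minimal sketch of the binning step using pandas.cut follows. The upper edge of 200 is an assumed cap standing in for "160 and higher". With the default right-closed intervals, a height of exactly 160 would fall into the third bin.

```python
import pandas as pd

height = [120, 122, 125, 127, 121, 123, 137, 131, 161, 145, 141, 132]

# Bin edges give the intervals (118, 125], (125, 135], (135, 160], (160, 200].
# The value 200 is an assumed upper cap for the open-ended last interval.
bins = [118, 125, 135, 160, 200]
category = pd.cut(height, bins)

# Count how many heights fall into each interval, listed in bin order.
counts = pd.Series(category).value_counts().sort_index()
print(counts)
```

For this data the four intervals receive 5, 3, 3 and 1 heights respectively.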

OR
4 a) Given the dataset of income values (in Rupees):
[45000,52000,48000,51000,60000,52000,"error"]
i. Identify and remove the erroneous value ("error")
from the dataset, and calculate the average income
of the cleaned dataset.
ii. Apply a data transformation technique to replace the
income values with categories such as [Fresher,
Experienced, HighNetWorth].
Write the corresponding Python code snippet. CO1 PO3 10
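The two parts could be handled along the following lines. The category thresholds (up to 50,000 for Fresher, up to 55,000 for Experienced, anything higher for HighNetWorth) are assumptions chosen only to illustrate the technique.

```python
import pandas as pd

incomes = [45000, 52000, 48000, 51000, 60000, 52000, "error"]

# i. Coerce non-numeric entries to NaN, then drop them.
cleaned = pd.to_numeric(pd.Series(incomes), errors="coerce").dropna()
average_income = cleaned.mean()  # 308000 / 6

# ii. Assumed thresholds: (0, 50000] Fresher, (50000, 55000] Experienced,
#     above 55000 HighNetWorth.
levels = pd.cut(cleaned, bins=[0, 50000, 55000, float("inf")],
                labels=["Fresher", "Experienced", "HighNetWorth"])

print(round(average_income, 2))
print(levels.tolist())
```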
b) Demonstrate with suitable examples how skewness helps in
understanding the distribution of data, and how positive or
negative skewness can be interpreted to identify potential outliers
in a dataset. CO2 PO2 10
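As a small illustration, the sign of the sample skewness can be checked on two made-up series, one with a single very large value and one with a single very small value. Both series are hypothetical and serve only to show the direction of the effect.

```python
import pandas as pd

# Hypothetical samples, invented for illustration.
right_tailed = pd.Series([20, 22, 23, 24, 25, 26, 90])  # one very large value
left_tailed = pd.Series([2, 70, 72, 73, 74, 75, 76])    # one very small value

# A long right tail gives positive skewness, a long left tail negative.
skew_right = right_tailed.skew()
skew_left = left_tailed.skew()

print(skew_right)
print(skew_left)
```

A strongly positive value hints at unusually large observations and a strongly negative value at unusually small ones, so the sign suggests which tail to inspect for outliers.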
UNIT - III
5 a) Illustrate cross-tabulation and Pivot table with a suitable example
and an appropriate code snippet. CO1 PO1 10
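The difference between the two operations can be sketched on a tiny, made-up sales table. The column names and values below are hypothetical.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration.
df = pd.DataFrame({
    "Region":  ["North", "North", "South", "South", "South"],
    "Product": ["Pen", "Book", "Pen", "Pen", "Book"],
    "Sales":   [10, 20, 30, 40, 50],
})

# Cross-tabulation: frequency of each Region/Product combination.
freq = pd.crosstab(df["Region"], df["Product"])

# Pivot table: total Sales for each Region/Product combination.
totals = df.pivot_table(index="Region", columns="Product",
                        values="Sales", aggfunc="sum")

print(freq)
print(totals)
```

Here the cross-tabulation counts rows, while the pivot table aggregates a value column.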
b) Discuss the concept of linear interpolation and how it is applied to
fill missing values in a time-series dataset with an example.
Examine the potential impact of outliers on the effectiveness of
linear interpolation. How might extreme values influence the
interpolated results? CO1 PO1 10
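The effect can be seen on a short, hypothetical daily series with one missing day, once with ordinary neighbours and once with an extreme neighbour.

```python
import pandas as pd

# Hypothetical daily readings with one missing day.
dates = pd.date_range("2025-01-01", periods=5, freq="D")
clean = pd.Series([10.0, 12.0, None, 16.0, 18.0], index=dates)
with_outlier = pd.Series([10.0, 12.0, None, 160.0, 18.0], index=dates)

# Linear interpolation fills the gap with the midpoint of its neighbours.
filled_clean = clean.interpolate(method="linear")
filled_outlier = with_outlier.interpolate(method="linear")

print(filled_clean.iloc[2])    # (12 + 16) / 2
print(filled_outlier.iloc[2])  # (12 + 160) / 2, pulled up by the outlier
```

Because the filled value depends only on the two neighbouring points, a single extreme neighbour distorts it directly.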

OR
6 a) Give a case study on univariate and multivariate analysis with an
example. CO1 PO4 10
b) What is central tendency and Dispersion? For the given data set,
apply central tendency measures (mean, median and mode) and
also apply Dispersion measures (Range, Variance, Standard
Deviation).
Dataset of daily temperatures recorded over a week: 22°C, 23°C,
21°C, 25°C, 22°C, 24°C, and 20°C. CO1 PO3 10
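The measures can be checked with Python's standard statistics module, as sketched below. The sample (n - 1) form of variance and standard deviation is assumed here; the population form would use statistics.pvariance and statistics.pstdev instead.

```python
import statistics

temps = [22, 23, 21, 25, 22, 24, 20]  # degrees Celsius

# Central tendency
mean_t = statistics.mean(temps)      # 157 / 7
median_t = statistics.median(temps)  # middle of the sorted values
mode_t = statistics.mode(temps)      # most frequent value

# Dispersion
range_t = max(temps) - min(temps)
var_t = statistics.variance(temps)   # sample variance (divides by n - 1)
std_t = statistics.stdev(temps)      # sample standard deviation

print(mean_t, median_t, mode_t, range_t, var_t, std_t)
```

For this data the mean is about 22.43, the median and mode are both 22, the range is 5, and the sample variance is about 2.95.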

UNIT - IV
7 a) A Pandas DataFrame contains the exam scores of students in three
subjects: Math, Science, and English. Give a Python code snippet to
create:
i. A histogram to visualize the distribution of Math
scores.
ii. A density plot (kernel density estimate) to visualize the
distribution of Science scores. CO3 PO3 10
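A minimal sketch using the pandas plotting interface on top of Matplotlib follows. The scores are hypothetical, and the density plot assumes SciPy is installed, since pandas relies on it for kernel density estimation.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical scores, invented for illustration.
df = pd.DataFrame({
    "Math":    [55, 62, 70, 71, 78, 80, 85, 90],
    "Science": [58, 60, 65, 72, 75, 79, 88, 93],
    "English": [50, 64, 68, 70, 74, 81, 83, 95],
})

fig, (ax_hist, ax_kde) = plt.subplots(1, 2, figsize=(10, 4))

# i. Histogram of Math scores.
df["Math"].plot.hist(bins=5, ax=ax_hist, edgecolor="black")
ax_hist.set(title="Math scores", xlabel="Score")

# ii. Kernel density estimate of Science scores (requires SciPy).
df["Science"].plot.kde(ax=ax_kde)
ax_kde.set(title="Science score density", xlabel="Score")

fig.tight_layout()
```

In an interactive session, dropping the backend line and calling plt.show() displays the figure.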
b) Interpret the purpose of a scatter plot matrix (pair plot) and a
correlation matrix heatmap. How do these visualizations help in
understanding the relationships between multiple quantitative
variables? Write a Python code snippet to draw the plot matrix and
heatmap. CO3 PO3 10

OR
8 a) Illustrate how data can be mapped onto aesthetics, scales, and
coordinate systems in data visualization. Provide an example
using a scatter plot where the x-axis represents a numerical
variable, the y-axis represents another numerical variable, and the
color represents a categorical variable. CO3 PO3 10
b) Demonstrate how visualizations like error bars and stacked bar
charts can be used to represent data uncertainty and proportions,
respectively. What insights do these visualizations provide? CO2 PO4 10

UNIT - V
9 a) Case study on Data wrangling:
You are given a sales dataset that contains information on
transactions made over the last year. The dataset has columns such
as TransactionID, ProductCategory, Price, Quantity, and
DateOfSale. Some values in the Price and Quantity columns are
missing, including some rows where both are missing, and there
are a few duplicates based on TransactionID. The
goal is to clean the data for a time-series analysis on monthly sales
trends. How can these missing values and duplicates be handled
efficiently?
List out the steps you would apply for the above goal. Give the
corresponding Python code snippet. CO3 PO4 10
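One plausible cleaning sequence is sketched below on a small, made-up sample with the same column names. Filling the remaining gaps with the per-category median is an assumed imputation strategy, not something the question prescribes.

```python
import pandas as pd

# Hypothetical sample standing in for the sales dataset described above.
sales = pd.DataFrame({
    "TransactionID":   [1, 2, 2, 3, 4, 5],
    "ProductCategory": ["A", "B", "B", "A", "B", "A"],
    "Price":           [100.0, 50.0, 50.0, None, None, 120.0],
    "Quantity":        [2.0, 1.0, 1.0, 3.0, None, None],
    "DateOfSale":      ["2024-01-05", "2024-01-20", "2024-01-20",
                        "2024-02-03", "2024-02-10", "2024-02-15"],
})

# 1. Drop duplicate transactions, keeping the first occurrence.
sales = sales.drop_duplicates(subset="TransactionID", keep="first")

# 2. Drop rows where both Price and Quantity are missing.
sales = sales.dropna(subset=["Price", "Quantity"], how="all").copy()

# 3. Fill remaining gaps with the median of the same product category
#    (an assumed imputation strategy).
for col in ["Price", "Quantity"]:
    group_median = sales.groupby("ProductCategory")[col].transform("median")
    sales[col] = sales[col].fillna(group_median)

# 4. Parse dates and aggregate revenue by calendar month.
sales["DateOfSale"] = pd.to_datetime(sales["DateOfSale"])
sales["Revenue"] = sales["Price"] * sales["Quantity"]
monthly = sales.groupby(sales["DateOfSale"].dt.to_period("M"))["Revenue"].sum()

print(monthly)
```

The resulting monthly series is indexed by period and can be plotted or resampled for trend analysis.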
b) Demonstrate using a Python code snippet to scrape the titles and
URLs of all articles from the homepage of a news website. CO3 PO3 10
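A sketch using only the Python standard library is given below. It assumes that each article headline is a link inside an h2 heading, which is a hypothetical page structure; a real site would need its own selectors, and the example.com URLs are placeholders. Libraries such as requests and BeautifulSoup are a common alternative.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class ArticleLinkParser(HTMLParser):
    """Collect (title, url) pairs from <a> tags nested inside <h2> headings."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.articles = []
        self._in_heading = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_heading = True
        elif tag == "a" and self._in_heading:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only collect text while inside a headline link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            # Resolve relative links against the page URL.
            self.articles.append((title, urljoin(self.base_url, self._href)))
            self._href = None
        elif tag == "h2":
            self._in_heading = False


def scrape_articles(url):
    """Download a page and return its (title, url) pairs."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    parser = ArticleLinkParser(url)
    parser.feed(html)
    return parser.articles


# Offline demonstration on a hypothetical homepage fragment.
sample_html = """
<h2><a href="/news/budget">Budget announced</a></h2>
<h2><a href="https://example.com/sport/final">Cup final tonight</a></h2>
<p><a href="/about">About us</a></p>
"""
demo = ArticleLinkParser("https://example.com")
demo.feed(sample_html)
print(demo.articles)
```

Calling scrape_articles with a real homepage URL would fetch the page over the network and apply the same parser.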
OR
10 a) Demonstrate string manipulation in Pandas for string replacement
and combining of strings on a DataFrame and its columns. CO3 PO3 10
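Both operations can be sketched on a small, hypothetical DataFrame. The column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical records, invented for illustration.
df = pd.DataFrame({
    "first_name": ["Asha", "Ravi"],
    "last_name":  ["Rao", "Kumar"],
    "city":       ["Bangalore", "Bangalore North"],
})

# String replacement on a single column via the .str accessor.
df["city"] = df["city"].str.replace("Bangalore", "Bengaluru", regex=False)

# Combining strings from two columns, element-wise.
df["full_name"] = df["first_name"].str.cat(df["last_name"], sep=" ")

# Replacement across the whole DataFrame (substring match needs regex=True).
df = df.replace("Kumar", "K.", regex=True)

print(df)
```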
b) Consider a Pandas DataFrame with monthly sales data for different
products.
i. Create a DataFrame which contains monthly sales of
three products in three months, viz. Jan, Feb and March.
ii. Compute the total sales for each product across all
months using vectorized operations.
iii. Calculate the average sales for each product across all
months using vectorized operations in Pandas. CO3 PO3 10
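The three parts could look like the sketch below. The product names and sales figures are hypothetical.

```python
import pandas as pd

# i. Hypothetical monthly sales (units) for three products.
sales = pd.DataFrame(
    {"Jan": [100, 80, 60], "Feb": [120, 90, 70], "Mar": [110, 100, 50]},
    index=["ProductA", "ProductB", "ProductC"],
)

# ii. Total per product: sum across the month columns (axis=1).
total_sales = sales.sum(axis=1)

# iii. Average per product across the same columns.
average_sales = sales.mean(axis=1)

print(total_sales)
print(average_sales)
```

Both reductions operate on whole columns at once, so no explicit Python loop over rows is needed.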

******
