
CS3352 Apr2023

The document is an exam paper for the CS 3352 course on Foundations of Data Science, intended for B.E./B.Tech. students in their third semester. It consists of three parts: Part A includes short answer questions, Part B contains detailed questions requiring elaboration and calculations, and Part C focuses on statistical concepts and regression analysis. The exam is scheduled for April/May 2023 and is worth a total of 100 marks.

Uploaded by

Mageshms


CS 3352 – FOUNDATIONS OF DATA SCIENCE


B.E./B.Tech. DEGREE EXAMINATIONS, APRIL/MAY 2023
Third Semester – Computer Science and Engineering
(Common to: Computer and Communication Engineering / Information Technology)
Regulations 2021
Time: Three Hours
Maximum Marks: 100

PART A – (10 × 2 = 20 Marks)


Answer ALL questions.
1. Outline the difference between structured data and unstructured data.
2. Define data mining.
3. Compare and contrast qualitative data and quantitative data with an example.
4. List the differences between a discrete variable and a continuous variable with an example.
5. What is the use of a scatter plot?
6. Define correlation coefficient.
7. State the advantages of using NumPy arrays.
8. Outline the two types of NumPy's UFuncs.
9. State the two possible options in IPython notebook used to embed graphics directly in the notebook.
10. How does the plt.scatter function differ from the plt.plot function?

PART B – (5 × 13 = 65 Marks)
Answer ALL questions.
11. (a) Elaborate the steps in the data science process with a diagram.
Or
(b) What is a data warehouse? Outline the architecture of a data warehouse with a diagram.
12. (a)
(i) What is a frequency distribution? Given the following ratings:
3 7 2 7 8 3 1 4 10 3 2 5 3 5 8 9 7 6 3 7 8 9 7 3 6
Construct a frequency distribution.
(ii) What is relative frequency distribution? Given the GRE score distribution
below, convert it to a relative frequency distribution:

GRE Score    Frequency
725–749        1
700–724        3
675–699       14
650–674       30
625–649       34
600–624       42
575–599       30
550–574       27
525–549       13
500–524        4
475–499        2
Total        200
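As a hedged illustration of what question 12(a) asks for, the sketch below tallies the ratings into a frequency distribution and divides each GRE class frequency by the total to get relative frequencies. It assumes every entry in the ratings list is a single rating between 1 and 10; the variable names are illustrative and not part of the paper.

```python
from collections import Counter

# Ratings from question 12(a)(i).
# Assumption: every entry is a single rating between 1 and 10 (25 values).
ratings = [3, 7, 2, 7, 8, 3, 1, 4, 10, 3, 2, 5, 3,
           5, 8, 9, 7, 6, 3, 7, 8, 9, 7, 3, 6]

# Frequency distribution: how many times each rating occurs.
rating_freq = Counter(ratings)

# GRE class intervals and frequencies from question 12(a)(ii).
gre_freq = {
    "725-749": 1, "700-724": 3, "675-699": 14, "650-674": 30,
    "625-649": 34, "600-624": 42, "575-599": 30, "550-574": 27,
    "525-549": 13, "500-524": 4, "475-499": 2,
}

# Relative frequency = class frequency / total number of observations.
total = sum(gre_freq.values())
gre_rel_freq = {interval: f / total for interval, f in gre_freq.items()}

# Print both distributions, highest rating first.
for rating in sorted(rating_freq, reverse=True):
    print(rating, rating_freq[rating])
for interval, rel in gre_rel_freq.items():
    print(interval, round(rel, 3))
```

The relative frequencies should sum to 1 (up to rounding), which is a quick check on any hand-worked answer.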

Or
(b)
(i) What is a Z-score? Outline the steps to obtain a Z-score.
(ii) Calculate Z-scores for:
 IQ = 135, Mean = 100, SD = 15
 Score = 470, Mean = 500, SD = 100
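One way to check the arithmetic in question 12(b) is the short sketch below, which applies the standard definition z = (x - mean) / SD to the two cases given. The function name `z_score` is a hypothetical helper, not something named in the paper.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

# The two cases from question 12(b)(ii).
z_iq = z_score(135, 100, 15)
z_test = z_score(470, 500, 100)

print(round(z_iq, 2), round(z_test, 2))
```

A positive result means the score lies above the mean and a negative result means it lies below.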
13. (a) Calculate the correlation coefficient for the following data:
Fathers' heights (x): 66 68 68 70 71 72 72
Sons' heights (y): 68 70 69 72 72 72 74
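For question 13(a), a minimal sketch of the Pearson correlation coefficient, computed from sums of squared deviations, might look like the following. The helper name `pearson_r` is an assumption made for illustration; the paper does not prescribe a particular computational formula.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: SS_xy / sqrt(SS_xx * SS_yy)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products of deviations from the means.
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_yy = sum((y - mean_y) ** 2 for y in ys)
    return ss_xy / sqrt(ss_xx * ss_yy)

fathers = [66, 68, 68, 70, 71, 72, 72]
sons = [68, 70, 69, 72, 72, 72, 74]

r = pearson_r(fathers, sons)
print(round(r, 4))
```

The result is always between -1 and 1; a value close to 1 indicates a strong positive linear relationship.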
Or
(b)
Given:
x: 0.5 1.5 2.5 3.5 4.5 5.5 6.5
y: 2.5 3.5 5.5 4.5 6.5 8.5 10.5
(i) Find the least squares regression line y = ax + b
(ii) Estimate the value of y when x = 10
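The calculation in question 13(b) can be sketched as below, using the usual least squares formulas a = SS_xy / SS_xx and b = mean(y) - a * mean(x). The helper name `fit_line` is hypothetical, and predicting at x = 10 extrapolates beyond the observed range of x.

```python
def fit_line(xs, ys):
    """Return slope a and intercept b of the least squares line y = a*x + b."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    if ss_xx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    a = ss_xy / ss_xx
    b = mean_y - a * mean_x
    return a, b

x = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
y = [2.5, 3.5, 5.5, 4.5, 6.5, 8.5, 10.5]

a, b = fit_line(x, y)
y_at_10 = a * 10 + b  # estimate for part (ii)
print(round(a, 4), round(b, 4), round(y_at_10, 4))
```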
14. (a) What is an aggregate function? Elaborate on aggregate functions in NumPy.
Or
(b)
(i) What is broadcasting? Explain the rules with an example.
(ii) Elaborate on the mapping between Python operators and Pandas methods.
15. (a) Explain various visualization charts like line plots, scatter plots, and histograms using Matplotlib with examples.
Or
(b) Outline any two 3D plotting techniques in Matplotlib with examples.

PART C – (1 × 15 = 15 Marks)
Answer ALL questions.
16. (a)
(i) What is the mode? Can a distribution have no mode or more than one mode?
Given: 26.3, 28.7, 27.4, 26.6, 27.4, 26.9 – Find the mode.
(ii) What is the median? Outline the steps and find the median for:
 Set 1: 2, 8, 2, 7, 6
 Set 2: 3, 8, 9, 3, 1, 8
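A hand-worked answer to question 16(a) could be checked with Python's standard `statistics` module, as in the sketch below. It assumes Python 3.8 or later, where `statistics.multimode` is available; the variable names are illustrative.

```python
from statistics import median, multimode

# Data from question 16(a)(i).
values = [26.3, 28.7, 27.4, 26.6, 27.4, 26.9]

# multimode returns every value tied for the highest count,
# so it also covers distributions with more than one mode.
modes = multimode(values)

# Data from question 16(a)(ii).
set1 = [2, 8, 2, 7, 6]      # odd count: the middle value after sorting
set2 = [3, 8, 9, 3, 1, 8]   # even count: mean of the two middle values

median1 = median(set1)
median2 = median(set2)

print(modes, median1, median2)
```

When every value occurs equally often, `multimode` returns all of them, which corresponds to the case usually described as having no mode.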
Or
(b) Fit a multiple linear regression model to the dataset:

y     140  155  159  179  192  200  212  215
x1     60   62   67   70   71   72   75   78
x2     22   25   24   20   15   14   14   11
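For question 16(b), one possible way to fit y = b0 + b1*x1 + b2*x2 is to solve the two normal equations in deviation form, as sketched below. This assumes the dataset reads y = 140, 155, 159, 179, 192, 200, 212, 215 with x1 and x2 as the two eight-value predictor rows; the helper name `fit_two_predictors` is hypothetical.

```python
def fit_two_predictors(y, x1, x2):
    """Least squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations."""
    n = len(y)
    if not (len(x1) == len(x2) == n) or n < 3:
        raise ValueError("need three equal-length samples with at least 3 points")
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    # Sums of squares and cross-products of deviations from the means.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    # Solve  s11*b1 + s12*b2 = s1y  and  s12*b1 + s22*b2 = s2y.
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

# Assumed reading of the dataset in question 16(b).
y = [140, 155, 159, 179, 192, 200, 212, 215]
x1 = [60, 62, 67, 70, 71, 72, 75, 78]
x2 = [22, 25, 24, 20, 15, 14, 14, 11]

b0, b1, b2 = fit_two_predictors(y, x1, x2)
print(round(b0, 3), round(b1, 3), round(b2, 3))
```

The closed-form solution is used here only because there are exactly two predictors; with more predictors a general linear solver would be the more natural choice.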
