CS 3352 – FOUNDATIONS OF DATA SCIENCE
B.E./B.Tech. DEGREE EXAMINATIONS, APRIL/MAY 2023
Third Semester – Computer Science and Engineering
(Common to: Computer and Communication Engineering / Information
Technology)
Regulations 2021
Time: Three Hours
Maximum Marks: 100
PART A – (10 × 2 = 20 Marks)
Answer ALL questions.
1. Outline the difference between structured data and unstructured data.
2. Define data mining.
3. Compare and contrast qualitative data and quantitative data with an
example.
4. List the differences between a discrete variable and a continuous variable
with an example.
5. What is the use of a scatter plot?
6. Define correlation coefficient.
7. State the advantages of using NumPy arrays.
8. Outline the two types of NumPy's UFuncs.
9. State the two options available in the IPython notebook for embedding
graphics directly in the notebook.
10. How does the plt.scatter function differ from the plt.plot function?
PART B – (5 × 13 = 65 Marks)
Answer ALL questions.
11. (a) Elaborate the steps in the data science process with a diagram.
Or
(b) What is a data warehouse? Outline the architecture of a data warehouse with
a diagram.
12. (a)
(i) What is a frequency distribution? Given the following ratings:
3 7 2 7 8 3 1 4 10 3 2 5 3 5 8 9 7 6 3 7 8 9 7 3 6
Construct a frequency distribution.
(ii) What is relative frequency distribution? Given the GRE score distribution
below, convert it to a relative frequency distribution:
GRE Score Frequency
725–749 1
700–724 3
675–699 14
650–674 30
625–649 34
600–624 42
575–599 30
550–574 27
525–549 13
500–524 4
475–499 2
Total 200
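A minimal sketch for checking the arithmetic in 12(a)(ii), assuming the class intervals and frequencies are exactly as listed in the table above. Each relative frequency is the class frequency divided by the total. The variable names are illustrative only.

```python
# Class intervals and frequencies copied from the GRE table in 12(a)(ii).
gre_classes = [
    "725–749", "700–724", "675–699", "650–674", "625–649", "600–624",
    "575–599", "550–574", "525–549", "500–524", "475–499",
]
frequencies = [1, 3, 14, 30, 34, 42, 30, 27, 13, 4, 2]

# Relative frequency = class frequency / total number of observations.
total = sum(frequencies)
relative = [f / total for f in frequencies]

for label, f, r in zip(gre_classes, frequencies, relative):
    print(f"{label}  {f:>3}  {r:.3f}")
print(f"Total    {total:>3}  {sum(relative):.3f}")
```

The relative frequencies should sum to 1 (up to rounding), which is a quick sanity check on the conversion.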
Or
(b)
(i) What is a Z-score? Outline the steps to obtain a Z-score.
(ii) Calculate Z-scores for:
IQ = 135, Mean = 100, SD = 15
Score = 470, Mean = 500, SD = 100
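A minimal sketch for 12(b)(ii), assuming the usual definition z = (x − mean) / SD. The helper name z_score is hypothetical and used only for illustration.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Values taken from question 12(b)(ii).
z_iq = z_score(135, mean=100, sd=15)     # IQ case: above the mean, so positive
z_test = z_score(470, mean=500, sd=100)  # score case: below the mean, so negative

print(f"IQ 135    -> z = {z_iq:.2f}")
print(f"Score 470 -> z = {z_test:.2f}")
```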
13. (a) Calculate the correlation coefficient for the following data:
Fathers' heights (x): 66 68 68 70 71 72 72
Sons' heights (y): 68 70 69 72 72 72 74
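A minimal sketch for checking 13(a), assuming the Pearson correlation coefficient is intended and using the computational formula r = (nΣxy − ΣxΣy) / sqrt[(nΣx² − (Σx)²)(nΣy² − (Σy)²)]. Only the standard library is used; the variable names are illustrative.

```python
import math

# Heights copied from question 13(a).
fathers = [66, 68, 68, 70, 71, 72, 72]  # x
sons = [68, 70, 69, 72, 72, 72, 74]     # y

n = len(fathers)
sum_x, sum_y = sum(fathers), sum(sons)
sum_xy = sum(x * y for x, y in zip(fathers, sons))
sum_x2 = sum(x * x for x in fathers)
sum_y2 = sum(y * y for y in sons)

# Computational form of the Pearson correlation coefficient.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"r = {r:.4f}")
```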
Or
(b)
Given:
x: 0.5 1.5 2.5 3.5 4.5 5.5 6.5
y: 2.5 3.5 5.5 4.5 6.5 8.5 10.5
(i) Find the least squares regression line y = ax + b
(ii) Estimate the value of y when x = 10
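A minimal sketch for checking 13(b), assuming ordinary least squares with slope a = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept b = ȳ − a·x̄. Only the standard library is used; the variable names are illustrative.

```python
# Data copied from question 13(b).
xs = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
ys = [2.5, 3.5, 5.5, 4.5, 6.5, 8.5, 10.5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of squares and cross-products about the means.
s_xx = sum((x - mean_x) ** 2 for x in xs)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

a = s_xy / s_xx            # slope
b = mean_y - a * mean_x    # intercept

# Part (ii): estimate y at x = 10 (an extrapolation beyond the observed x range).
y_at_10 = a * 10 + b

print(f"y = {a:.4f}x + {b:.4f}")
print(f"Estimated y at x = 10: {y_at_10:.4f}")
```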
14. (a) What is an aggregate function? Elaborate on aggregate functions in
NumPy.
Or
(b)
(i) What is broadcasting? Explain the rules with an example.
(ii) Elaborate on the mapping between Python operators and Pandas methods.
15. (a) Explain various visualization charts like line plots, scatter plots, and
histograms using Matplotlib with examples.
Or
(b) Outline any two 3D plotting techniques in Matplotlib with examples.
PART C – (1 × 15 = 15 Marks)
Answer ALL questions.
16. (a)
(i) What is the mode? Can a distribution have no mode or more than one mode?
Given: 26.3, 28.7, 27.4, 26.6, 27.4, 26.9 – Find the mode.
(ii) What is the median? Outline the steps and find the median for:
Set 1: 2, 8, 2, 7, 6
Set 2: 3, 8, 9, 3, 1, 8
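A minimal sketch for checking 16(a) with Python's standard statistics module. statistics.multimode returns every most-frequent value, so it also covers the case of more than one mode. statistics.median sorts the data and averages the two middle values when the count is even.

```python
import statistics

# Data copied from question 16(a).
mode_data = [26.3, 28.7, 27.4, 26.6, 27.4, 26.9]
set_1 = [2, 8, 2, 7, 6]      # odd count: the median is the middle sorted value
set_2 = [3, 8, 9, 3, 1, 8]   # even count: the median is the mean of the two middle values

modes = statistics.multimode(mode_data)
median_1 = statistics.median(set_1)
median_2 = statistics.median(set_2)

print("Mode(s):", modes)
print("Sorted set 1:", sorted(set_1), "median:", median_1)
print("Sorted set 2:", sorted(set_2), "median:", median_2)
```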
Or
(b) Fit a multiple linear regression model to the dataset:
y: 140 155 159 179 192 200 212 215
x1: 60 62 67 70 71 72 75 78
x2: 22 25 24 20 15 14 14 11
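A minimal sketch for checking 16(b), under two assumptions: the y row of the dataset reads 140, 155, 159, 179, 192, 200, 212, 215 (its digits are split across two lines in the table), and the model has the form y = b0 + b1·x1 + b2·x2 fitted by ordinary least squares. The normal equations (XᵀX)b = Xᵀy are solved with a small hand-written Gaussian elimination; solve_linear_system is a hypothetical helper, not a library function.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations act on both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

# Assumed reading of the dataset in question 16(b).
y = [140, 155, 159, 179, 192, 200, 212, 215]
x1 = [60, 62, 67, 70, 71, 72, 75, 78]
x2 = [22, 25, 24, 20, 15, 14, 14, 11]

# Design matrix rows: [1, x1, x2]; the leading 1 carries the intercept.
rows = [[1.0, u, v] for u, v in zip(x1, x2)]

# Normal equations: (X^T X) b = X^T y.
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]

b0, b1, b2 = solve_linear_system(xtx, xty)
print(f"y = {b0:.3f} + {b1:.3f}*x1 + {b2:.3f}*x2")
```

In practice a library routine such as NumPy's least-squares solver would be the more usual choice; the hand-rolled version is kept here only so the steps of the normal-equation method stay visible.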