[Apr-24]
GITAM (Deemed to be University)
[CSCI6061]
[Link]. Degree Examination
Data Science
II Semester
MACHINE LEARNING
(Effective for the admitted batch 2020–21)
Time: 3 Hours Max. Marks: 60
-----------------------------------------------------------------------------------------------------
Instructions: All parts of the unit must be answered in one place only.
-----------------------------------------------------------------------------------------------------
Section-A
1. Answer all Questions: (10 × 2 = 20)
a) What is an abstraction in machine learning? What role does it
play in the process of machine learning?
b) List at least four differences between supervised learning and
unsupervised learning.
c) Distinguish between model underfitting and overfitting.
d) Illustrate the Leave-One-Out Cross-Validation method.
e) Find any two weaknesses of the Bayes classifier.
f) Summarize what supervised learning is. Why is it so called?
g) What are the advantages and disadvantages of neural networks?
h) Write a short note on the Association rule.
i) What is Query by Committee in active learning?
j) Summarize the pros and cons of the Instance-Based Learning
(IBL) method.
Section-B
Answer the following: (5 × 8 = 40)
UNIT-I
2. Draw a box plot for the following data and calculate mean, median,
mode, and IQR:
232 277 173 283 197 251 212 213 213
229 164 219 196 116 247 244 269 276
252 314 161 165 221 260 219 290 251
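The numeric parts of Question 2 can be checked with a short sketch like the one below. It uses only Python's standard library. The quartile convention is an assumption: textbooks differ, and this sketch uses the "exclusive" (n + 1) method that `statistics.quantiles` applies by default, so a different convention may give a slightly different IQR. The 1.5 × IQR fences are the usual Tukey rule for marking outliers on a box plot, also assumed here.

```python
import statistics

# Data exactly as listed in Question 2 (27 observations).
data = [
    232, 277, 173, 283, 197, 251, 212, 213, 213,
    229, 164, 219, 196, 116, 247, 244, 269, 276,
    252, 314, 161, 165, 221, 260, 219, 290, 251,
]

mean = statistics.mean(data)
median = statistics.median(data)
# multimode returns every value tied for the highest frequency.
modes = sorted(statistics.multimode(data))

# Assumption: quartiles by the "exclusive" (n + 1) method, the default here.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Assumption: Tukey fences; values outside them would be plotted as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in data if v < lower_fence or v > upper_fence]

print("mean:", round(mean, 2))
print("median:", median)
print("mode(s):", modes)
print("Q1, Q3, IQR:", q1, q3, iqr)
print("whisker limits:", min(data), max(data), "outliers:", outliers)
```

The box of the plot spans Q1 to Q3 with a line at the median; the whiskers extend to the smallest and largest values that lie inside the fences.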
OR
3. Explain qualitative and quantitative data in detail. Differentiate
between the two, with good examples.
UNIT-II
4. a) For the two binary vectors 01101011 and 11001001, calculate:
i) Hamming distance
ii) Jaccard coefficient
iii) Simple matching coefficient
b) Calculate the cosine similarity of x and y, where x = (2, 4, 0, 0, 2,
1, 3, 0, 0) and y = (2, 1, 0, 0, 3, 2, 1, 0, 1).
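The measures asked for in Question 4 can be verified with a minimal sketch such as the following. The variable names (`f11`, `f00`, `f10`, `f01`) are illustrative and count the positions where the two binary vectors are (1,1), (0,0), (1,0), and (0,1) respectively. The standard definitions are assumed: Jaccard ignores the 0-0 matches, while the simple matching coefficient counts them.

```python
import math

# Binary vectors from Question 4.
a = "01101011"
b = "11001001"

pairs = list(zip(a, b))
f11 = sum(p == "1" and q == "1" for p, q in pairs)  # both 1
f00 = sum(p == "0" and q == "0" for p, q in pairs)  # both 0
f10 = sum(p == "1" and q == "0" for p, q in pairs)  # 1 in a only
f01 = sum(p == "0" and q == "1" for p, q in pairs)  # 1 in b only

hamming = f10 + f01                      # number of differing positions
jaccard = f11 / (f11 + f10 + f01)        # 0-0 matches are ignored
smc = (f11 + f00) / len(pairs)           # all matches over all positions

# Numeric vectors from Question 4.
x = (2, 4, 0, 0, 2, 1, 3, 0, 0)
y = (2, 1, 0, 0, 3, 2, 1, 0, 1)

dot = sum(xi * yi for xi, yi in zip(x, y))
norm_x = math.sqrt(sum(v * v for v in x))
norm_y = math.sqrt(sum(v * v for v in y))
cosine = dot / (norm_x * norm_y)

print("Hamming:", hamming, "Jaccard:", jaccard, "SMC:", smc)
print("cosine similarity:", round(cosine, 4))
```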
OR
5. Describe how PCA achieves dimensionality reduction. What are the
techniques for choosing the number of principal components to retain
after PCA?
UNIT-III
6. Examine whether it is possible to use the Naïve Bayes classifier for
continuous numeric data. If so, how? Given a dataset with the
following emails and labels:
Email                            Label
"get rich quick"                 Spam
"click here for discount"        Spam
"buy now"                        Spam
"hello how are you"              Non-Spam
"important meeting tomorrow"     Non-Spam
"confirm your appointment"       Non-Spam
Using the Naïve Bayes classifier, classify the new email "get
discount now" as Spam or Non-Spam.
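The classification in Question 6 can be checked with a small sketch like the one below. Several choices here are assumptions that the question does not specify: a multinomial bag-of-words model, whitespace tokenization, and Laplace (add-one) smoothing controlled by the hypothetical parameter `alpha`. The helper names `train` and `score` are illustrative only. Exact fractions are used so the hand calculation can be compared term by term.

```python
from collections import Counter
from fractions import Fraction

# Training emails and labels from Question 6.
training = [
    ("get rich quick", "Spam"),
    ("click here for discount", "Spam"),
    ("buy now", "Spam"),
    ("hello how are you", "Non-Spam"),
    ("important meeting tomorrow", "Non-Spam"),
    ("confirm your appointment", "Non-Spam"),
]

def train(examples):
    """Count documents per class and word occurrences per class."""
    doc_counts = Counter()
    word_counts = {}
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return doc_counts, word_counts, vocab

def score(text, label, doc_counts, word_counts, vocab, alpha=1):
    """Unnormalized posterior: prior times smoothed word likelihoods."""
    result = Fraction(doc_counts[label], sum(doc_counts.values()))
    total_words = sum(word_counts[label].values())
    for word in text.split():
        # Assumption: Laplace smoothing with pseudo-count alpha.
        result *= Fraction(word_counts[label][word] + alpha,
                           total_words + alpha * len(vocab))
    return result

doc_counts, word_counts, vocab = train(training)
new_email = "get discount now"
scores = {label: score(new_email, label, doc_counts, word_counts, vocab)
          for label in doc_counts}
predicted = max(scores, key=scores.get)

print(scores)
print("predicted:", predicted)
```

Without smoothing, the Non-Spam likelihood would be exactly zero because none of the three words occurs in a Non-Spam email; smoothing keeps the comparison well defined but leads to the same label here.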
OR
7. Discuss the random forest model in detail. What are the features of a
random forest?
UNIT-IV
8. Examine how Market Basket Analysis uses the concepts of
association analysis.
OR
9. Discuss the strengths and weaknesses of the k-means algorithm.
UNIT-V
10. Explain the concept of active learning. Explain its heuristics.
OR
11. Summarize the Radial Basis Function in detail.
[II S/124]