ML Final 2021

SBE – CUFE Final Exam SBEN456

Faculty of Engineering, Cairo University            Department: SBME
Academic Year: 2021/2022                            Semester: Fall
Course Title: Artificial Intelligence in Medicine   Course Code: SBE456
Day: Monday                                         Date: 17/01/2022
Time: 8:30am to 10:30am                             Full Mark: 75
------------------------------------------------------------------------------------------------------
Please read the following instructions carefully:
1. The questions total 91 points; the exam is graded out of 75 points.
2. The exam duration is 2 hours.
3. Answers must be placed in the corresponding free area after each question. Any MISPLACED answer will be DISCARDED.
4. The exam consists of 10 pages; the equations cheat sheet is on pages 11 and 12.
5. Themed or patterned answers will be discarded.
6. A cheating case will fetch you zero in the exam.
7. Students must write using a blue pen only.
-------------------------------------------------------------------------------------------------------

Answer the following Questions:

Question 1: (20 points) Give an example of a complete pattern recognition problem. Discuss the different modules you would employ within the data science lifecycle to reach the target of the problem, and outline your plan for each module of the project.


Question 2: Bayes Decision Theory


2.1 (4 points) Suppose you trained a Bayes classifier with a general Gaussian model using the data shown in the following figures, with input features (X1, X2) and one output value Y that takes the value A or B. What class would the Bayes classifier predict for the input at the location marked by a question mark in the figures?

2.2. Suppose you have the following training set with one real-valued input feature X and a categorical output Y that takes two values.

X Y
0 A
2 A
3 B
4 B
5 B
6 B
7 B

(a) (3 points) In order to learn the Bayes rule, we need to calculate the following:

μ_A =            σ_A² =            P(Y = A) =

μ_B =            σ_B² =            P(Y = B) =

Answer the following questions in terms of α and β, where

α = p(X = 2 | Y = A)
β = p(X = 2 | Y = B)
(b) (2 points) Calculate 𝑝(𝑋 = 2 ∧ 𝑌 = 𝐴) in terms of 𝛼

(c) (2 points) Calculate 𝑝(𝑋 = 2)
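
For intuition only (not part of the exam paper), here is a minimal Python sketch of the quantities in 2.2, assuming maximum-likelihood (population) variance estimates:

```python
# Hypothetical worked sketch for Q2.2 (assumptions: Gaussian class-conditionals,
# MLE variance). Not the official answer key.
import numpy as np

x_a = np.array([0.0, 2.0])                 # X values where Y = A
x_b = np.array([3.0, 4.0, 5.0, 6.0, 7.0])  # X values where Y = B
n = len(x_a) + len(x_b)

mu_a, var_a = x_a.mean(), x_a.var()        # class A mean and variance
mu_b, var_b = x_b.mean(), x_b.var()        # class B mean and variance
p_a, p_b = len(x_a) / n, len(x_b) / n      # priors P(Y=A), P(Y=B)

def gauss(x, mu, var):
    """Gaussian density N(x; mu, var)."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

alpha = gauss(2.0, mu_a, var_a)            # p(X=2 | Y=A)
beta = gauss(2.0, mu_b, var_b)             # p(X=2 | Y=B)

print(mu_a, var_a, p_a)          # 1.0 1.0 0.2857...
print(alpha * p_a)               # (b) p(X=2 and Y=A) = alpha * P(Y=A)
print(alpha * p_a + beta * p_b)  # (c) p(X=2) = alpha*P(Y=A) + beta*P(Y=B)
```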


Question 3: Decision trees

3.1. The following data can be used to train a decision tree that classifies whether or not an advertisement was clicked, based on its size, position, and whether or not it played a sound.

Clicked Size Position Sound


F Big Top No
F Small Middle Yes
F Small Middle Yes
T Small Bottom No
T Big Bottom No
F Big Top Yes
T Big Bottom Yes
T Small Middle No
T Small Middle No
F Big Top No

3.2. (2 points) What is the initial entropy of Clicked?

3.3. (4 points) Assume that Position is chosen for the root of the decision tree. What is the information gain
associated with this attribute?

3.4. (6 points) Draw the first two layers of the decision tree learned from this data (without any pruning).
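
For intuition only (not part of the exam paper), a short Python sketch that computes 3.2 and 3.3 directly from the table above:

```python
# Hypothetical sketch for Q3.2/3.3: entropy of Clicked and the information
# gain of splitting on Position, using the ten rows of the Q3.1 table.
from collections import Counter
from math import log2

clicked  = ["F", "F", "F", "T", "T", "F", "T", "T", "T", "F"]
position = ["Top", "Middle", "Middle", "Bottom", "Bottom",
            "Top", "Bottom", "Middle", "Middle", "Top"]

def entropy(labels):
    """H = -sum p*log2(p) over the label frequencies."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

h0 = entropy(clicked)            # initial entropy: 5 T vs 5 F -> 1.0

gain = h0                        # Gain = H(S) - sum_v |S_v|/|S| * H(S_v)
for v in set(position):
    subset = [y for y, p in zip(clicked, position) if p == v]
    gain -= len(subset) / len(clicked) * entropy(subset)

print(h0, gain)                  # 1.0 and 0.6 for this data
```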


3.5 (2 points) You’ve just finished training a random forest for spam classification, and it is getting abnormally bad performance on your validation set but good performance on your training set. Your implementation has no bugs. What could be causing the problem?
(a) Your decision trees are too deep
(b) You are randomly sampling too many features when you choose a split
(c) You have too few trees in your ensemble
(d) Your bagging implementation is randomly sampling sample points without replacement

Question 4: Support Vector Machine

4.1. (3 points) Suppose we are using a linear SVM (i.e., no kernel), with some large C value, and are given the
following data set. Draw the decision boundary of linear SVM. Give a brief explanation.

4.2. (4 points) In the following image, circle each point such that removing that example from the training set and retraining the SVM would give a different decision boundary than training on the full sample. You do not need to provide a formal proof, but give a one- or two-sentence explanation.
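
Since the exam’s figure is not reproduced here, the following Python sketch (scikit-learn, with made-up data) illustrates the idea behind 4.2: only the support vectors determine a linear SVM’s decision boundary, so removing any non-support point leaves it unchanged.

```python
# Hypothetical illustration for Q4.2 on synthetic, well-separated data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-2, -2], 0.5, (20, 2)),   # class 0 cloud
               rng.normal([2, 2], 0.5, (20, 2))])    # class 1 cloud
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1e3).fit(X, y)

# pick any training point that is NOT a support vector and drop it
non_sv = next(i for i in range(len(X)) if i not in clf.support_)
mask = np.ones(len(X), dtype=bool)
mask[non_sv] = False

clf2 = SVC(kernel="linear", C=1e3).fit(X[mask], y[mask])
print(np.allclose(clf.coef_, clf2.coef_))  # True: boundary unchanged
```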

Question 5: Recommender Systems

Use these tables to answer the following questions


Single Item Sets                                Number of Items
Magazine Promo = Yes                                   7
Watch Promo = No                                       6
Life Ins Promo = Yes                                   5
Life Ins Promo = No                                    5
Card Insurance = No                                    8
Sex = Male                                             6

Two Item Sets                                   Number of Items
Magazine Promo = Yes & Watch Promo = No                4
Magazine Promo = Yes & Life Ins Promo = Yes            5
Magazine Promo = Yes & Card Insurance = No             5
Watch Promo = No & Card Insurance = No                 5

5.1 (3 points) One two-item set rule that can be generated from the tables above is:
If Magazine Promo = Yes Then Life Ins Promo = Yes. The confidence for this rule is:
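
As an illustrative computation (my reading of the cheat-sheet formula applied to the table counts, not the official key):

$$\text{confidence} = \frac{(\text{Magazine Promo = Yes} \wedge \text{Life Ins Promo = Yes}).\text{count}}{(\text{Magazine Promo = Yes}).\text{count}} = \frac{5}{7} \approx 0.714$$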

Question 6: Dimensionality reduction

6.1 (2 points) Draw the approximate principal components for datasets 1 and 2.

6.2 (4 points) Mark each of the following statements as true or false:

(a) The goal of PCA is to interpret the underlying structure of the data in terms of the principal components that are best at predicting the output.
(b) The output of PCA is a new representation of the data that is always of lower dimensionality than the original feature representation.
(c) The principal components are always orthogonal to each other.

(a) ____  (b) ____  (c) ____
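
Because the datasets for 6.1 appear only in a figure, here is a Python sketch on synthetic data showing how principal components arise as eigenvectors of the covariance matrix (it also illustrates why statement (c) holds):

```python
# Hypothetical PCA sketch: eigendecomposition of the covariance matrix.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.8],
                                          [0.8, 1.0]])  # correlated 2-D data

Xc = X - X.mean(axis=0)                  # 1. center the data
cov = np.cov(Xc, rowvar=False)           # 2. covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # 3. eigenvalue representation

order = np.argsort(eigvals)[::-1]        # sort by explained variance
components = eigvecs[:, order].T         # rows = principal components
print(components @ components.T)         # ~identity: components orthogonal
```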

Question 7: Word to Vector Representation


7.1 (1 point) Which of the following statements about skip-gram are correct?
(a) It predicts the center word from the surrounding context words
(b) The final word vector for a word is the average or sum of the input vector v and output vector u corresponding to
that word
(c) When it comes to a small corpus, it has better performance than GloVe
(d) It makes use of global co-occurrence statistics

7.2 (2 points) Word2Vec represents a family of embedding algorithms that are commonly used in a variety of contexts. Suppose that, in a recommender system for online shopping, we have co-purchase records for items x1, x2, ..., xn (for example, item xi is commonly bought together with item xj). Explain how you would use ideas similar to Word2Vec to recommend similar items to users who have shown interest in any one of the items.
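
One possible answer sketch (the basket data and hyperparameters below are invented for illustration, and gensim is assumed available): treat each co-purchase basket as a “sentence” of item IDs, train skip-gram Word2Vec on those sentences, and recommend the items whose embeddings lie nearest to the item the user showed interest in.

```python
# Hypothetical "item2vec" sketch for Q7.2.
from gensim.models import Word2Vec

# each basket = items purchased together, treated as one sentence
baskets = [
    ["x1", "x2", "x5"],
    ["x2", "x3"],
    ["x1", "x5"],
    ["x3", "x4", "x2"],
]

# sg=1 selects skip-gram; co-purchased items end up with nearby vectors
model = Word2Vec(baskets, vector_size=16, window=5, min_count=1, sg=1)

# recommend the items most similar (by cosine) to the item of interest
print(model.wv.most_similar("x1", topn=3))
```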

Question 8: Reinforcement Learning

This gridworld MDP operates like the one we saw in class. The states are grid squares, identified by their row and
column number (row first). The agent always starts in state (1,1), marked with the letter S. There are two terminal
goal states, (2,3) with reward +5 and (1,3) with reward -5. Rewards are 0 in non-terminal states. (The reward for a
state is received as the agent moves into the state.) The transition function is such that the intended agent movement
(North, South, West, or East) happens with probability .8. With probability .1 each, the agent ends up in one of the
states perpendicular to the intended direction. If a collision with a wall happens, the agent stays in the same state.

[Figure: (a) Gridworld MDP. (b) Transition function.]

8.1. (2 points) Draw the optimal policy for this grid.

S:      (1,1)   (1,2)   (1,3)   (2,1)   (2,2)   (2,3)
π*(S):  Up      Left    NA      Right   Right   NA

8.2. (3 points) Suppose the agent knows the transition probabilities. Give the first two rounds of value-iteration updates for each state, with a discount of 0.9. (Assume V_0 is 0 everywhere and compute V_i for i = 1, 2.)
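
For reference only (not the exam’s answer key), a minimal Python sketch of value iteration under the stated transition model. The grid orientation (North = increasing row) and the convention V(terminal) = 0 are assumptions consistent with the description above:

```python
# Hypothetical value-iteration sketch for the 2x3 gridworld described above.
# Rewards are earned on entering a state; terminals are absorbing.
from itertools import product

ROWS, COLS = 2, 3
TERMINAL_REWARD = {(2, 3): 5.0, (1, 3): -5.0}
MOVE = {"N": (1, 0), "S": (-1, 0), "E": (0, 1), "W": (0, -1)}
PERP = {"N": "EW", "S": "EW", "E": "NS", "W": "NS"}

def step(s, d):
    """Deterministic move; colliding with a wall leaves the state unchanged."""
    r, c = s[0] + MOVE[d][0], s[1] + MOVE[d][1]
    return (r, c) if 1 <= r <= ROWS and 1 <= c <= COLS else s

def transitions(s, a):
    """Intended move with prob .8; each perpendicular slip with prob .1."""
    yield 0.8, step(s, a)
    for d in PERP[a]:
        yield 0.1, step(s, d)

V = {s: 0.0 for s in product(range(1, ROWS + 1), range(1, COLS + 1))}
gamma = 0.9
for i in (1, 2):
    V = {s: 0.0 if s in TERMINAL_REWARD else max(
             sum(p * (TERMINAL_REWARD.get(s2, 0.0) + gamma * V[s2])
                 for p, s2 in transitions(s, a))
             for a in MOVE)
         for s in V}
    print(f"V_{i}:", V)
```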

Question 9: Clustering
Suppose you are given the following <x,y> pairs. You will simulate the k-means algorithm and the Gaussian mixture model learning algorithm to identify TWO clusters in the data. Please simulate the k-means (k=2) algorithm for ONE iteration (an illustrative simulation sketch follows the answer table below).

9.1 (2 points) What are the cluster assignments after ONE iteration?

9.2 (2 points) Assume k-means uses Euclidean distance. What are the cluster assignments until convergence? (Fill in
the table below)

Data # x y
1 1.90 0.97
2 1.76 0.84
3 2.32 1.63
4 2.31 2.09
5 1.14 2.11
6 5.02 3.02
7 5.74 3.84
8 2.25 3.47
9 4.71 3.60
10 3.17 4.96

Data # Cluster Assignment after One Iteration Cluster Assignment after convergence
1
2
3
4
5
6
7
8
9
10
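
For reference only (not the exam’s key), a Python simulation sketch. Since the exam’s initial cluster centers appear in a figure that is not reproduced here, the sketch assumes data points 1 and 6 as initial centers, purely for illustration:

```python
# Hypothetical k-means (k=2) simulation for Q9 on the ten <x, y> pairs.
import numpy as np

X = np.array([[1.90, 0.97], [1.76, 0.84], [2.32, 1.63], [2.31, 2.09],
              [1.14, 2.11], [5.02, 3.02], [5.74, 3.84], [2.25, 3.47],
              [4.71, 3.60], [3.17, 4.96]])

centers = X[[0, 5]].copy()   # assumed initial centers: data points 1 and 6

for it in range(100):
    # assign every point to its nearest center (Euclidean distance)
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    if it == 0:
        print("after one iteration:", labels)
    # recompute each center as the mean of its assigned points
    new_centers = np.array([X[labels == k].mean(axis=0) for k in range(2)])
    if np.allclose(new_centers, centers):
        break
    centers = new_centers

print("at convergence:", labels)
```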

Question 10: (1 point each) Answer these questions by marking the best answer among the choices
given:

10.1. Which of the following can help to reduce overfitting in a decision tree classifier?
a. Pruning
b. Normalizing the data
c. High-degree polynomial features
d. Setting a very low learning rate

10.2. Which of the following are true about bagging?


a. In bagging, we choose random subsamples of the input points with replacement
b. Bagging is ineffective with logistic regression, because all of the learners learn exactly the same decision
boundary
c. The main purpose of bagging is to decrease the bias of learning algorithms.
d. If we use decision trees that have one sample point per leaf, bagging never gives lower training error than one
ordinary decision tree

10.3. Given the dataset shown in the following figure, the initial weight for each point, when using the AdaBoost technique, will be
a. 1
b. 1/2
c. 1/3
d. 1/4

10.4. After the first iteration, the decision boundary can be

10.5. Support Vector Machines (SVMs)


a. Support vectors are used for computing hyperplanes
b. Is a method for minimizing the margin to hyperplanes
c. Nonlinear problems are handled with mapping inputs to lower-dimensional space
d. Kernel functions are used for transforming data

10.6. Which of the following is true for neural networks?


(i) The training time depends on the size of the network.
(ii) Neural networks can be simulated on a combination of linear classifiers.
(iii) Artificial neurons are identical in operation to biological ones.
a. (ii) is true.
b. (i) and (ii) are true.
c. (i) and (iii) are true.
d. (i), (ii), (iii) are true

10.7. One of the most advantageous aspects of the Naive Bayes classifier is:
a. Handles variable interactions and dependency very well
b. Well suited for high-dimensional models
c. Works with continuous variables without any transformation
d. Lets you understand how each of the variables affects the classification problem

10.8. Thinking about dimensionality reduction and feature selection, which of the following statements are true (multiple choice)?
a. PCA is an unsupervised method for dimensionality reduction.
b. Wrappers are feature selection methods that employ a classifier as the performance criterion; they search the space of feature subsets for the minimal subset that obtains the highest accuracy.
c. Filters are unsupervised feature selection methods because they do not use a classifier as the selection criterion.
d. All of the above.


10.9. Overfitting occurs when a model


a. Does fit in future states.
b. Does not fit in future states.
c. Does fit in current state.
d. Does not fit in the current state

10.10. You want to design a recommendation system for an online bookstore that has been launched recently.
The bookstore has over 1 million book titles, but its rating database has only 10,000 ratings. Which of
the following would be a better recommendation system?
a. User-user collaborative filtering
b. Item-item collaborative filtering
c. Content-based recommendation.
d. None of the above

10.11. The kernel trick


a. can be applied to every classification algorithm
b. changes ridge regression so we solve a d × d linear system instead of an n × n system, given n sample points
with d features
c. 15-nearest neighbors
d. Perceptron is commonly used for dimensionality reduction

10.12. Which of the following is correct with respect to random forests?

a. Random forests are difficult to interpret but often very accurate
b. Random forests are easy to interpret but often very accurate
c. Random forests are difficult to interpret and often much less accurate
d. None of the above

10.13. Boosting is said to be a good classifier because


a. It creates all ensemble members in parallel, so their diversity can be boosted.
b. It attempts to minimize the margin distribution.
c. It attempts to maximize the margins on the training data
d. None of the above

10.14. Your customer has asked for a model to understand customer purchase patterns based on 15 variables, including the selection of the features that most effectively split those customers into potential classes that make sense to the business. Which of the following techniques should be used?
a. Clustering
b. Classification
c. Simulation
d. Recommender

10.15. When employing gradient-descent-based linear regression, as the number of iterations goes to infinity, boosting is always guaranteed to reach zero training error.
a. False
b. True

10.16. Overfitting is more likely to happen if the data points have fewer features.
a. False
b. True

10.17. Lasso can be interpreted as least-squares linear regression where

a. weights are regularized with the l1 norm
b. the weights have a Gaussian prior
c. weights are regularized with the l2 norm
d. the solution algorithm is simpler

10.18. Which of the following are true about forward subset selection?

a. O(2^d) models must be trained during the algorithm, where d is the number of features
b. It greedily adds the feature that most improves cross-validation accuracy
c. It finds the subset of features that gives the lowest test error
d. Forward selection is slower than backward selection if few features are relevant to prediction

Best of Luck,
Inas A. Yassine

Equations Cheat Sheet


Bayes Decision Theory

$$\text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}}$$

Decision Trees
$$\text{Entropy}(\text{decision}) = -P_+ \log_2 P_+ - P_- \log_2 P_-$$

$$\text{Entropy}(\text{decision}) = \sum_i P(D_i)\left(-P_+(D_i)\log_2 P_+(D_i) - P_-(D_i)\log_2 P_-(D_i)\right)$$

$$\text{Gain}(S, F) = \text{Entropy}(S) - \sum_{v \in \text{Values}(F)} \frac{|S_v|}{|S|}\,\text{Entropy}(S_v)$$

Dimensionality Reduction
t-Test

PCA Transformation Function: Eigenvalue representation

Covariance Matrix

Neural Networks

Recommender System
$$\text{support} = \Pr(X \cup Y) = \frac{(X \cup Y).\text{count}}{n}$$

$$\text{confidence} = \Pr(Y \mid X) = \frac{(X \cup Y).\text{count}}{X.\text{count}}$$

Support Vector Machine


Hard Margin optimization Function:

$$F(w) = \frac{1}{2} w^t w$$

Soft Margin optimization Function:

$$F(w) = \frac{1}{2} w^t w + C \sum_{k=1}^{R} \varepsilon_k$$

Reinforcement Learning

Markov Decision Process (MDP) model

The expected reward for taking action a in state s:

$$R(s, a) = \sum_{s'} T(s, a, s')\, r(s, a, s')$$

where S is the set of states, A is the set of actions, and T(s, a, s') = P(s' | s, a) is the probability of transitioning from s to s' given action a.
Utility equation:

$$U(s) = \max_a \left( R(s, a) + \gamma \sum_{s'} T(s, a, s')\, U(s') \right)$$

where 0 < γ < 1.

