1) Which of the following is/are true about bagging trees?
1. In bagging trees, individual trees are independent of each other
2. Bagging is the method for improving the performance by aggregating the results of weak
learners
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: C
Both statements are true. In bagging, the individual trees are independent of each other because each
tree is built on a different subset of features and samples.
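As a minimal sketch (assuming scikit-learn; the synthetic dataset and parameter values are only illustrative), bagging fits each tree independently on its own random subset of rows and features:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Each base tree sees its own bootstrap sample of rows and its own random
# subset of features, so the trees are trained independently of each other.
bag = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=50,
    max_samples=0.8,    # fraction of observations drawn for each tree
    max_features=0.8,   # fraction of features drawn for each tree
    bootstrap=True,
    random_state=0,
).fit(X, y)
print(bag.score(X, y))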
2) Which of the following is/are true about boosting trees?
1. In boosting trees, individual weak learners are independent of each other
2. It is the method for improving the performance by aggregating the results of weak learners
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: B
In boosting trees, the individual weak learners are not independent of each other because each tree
corrects the results of the previous trees. Both bagging and boosting can be considered methods for
improving the results of the base learners.
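A minimal sketch of the boosting idea for regression (synthetic data; names are illustrative): each new tree is fit to the residuals of the ensemble built so far, so the learners are sequentially dependent rather than independent:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
pred = np.zeros_like(y)
for _ in range(100):
    residual = y - pred                       # shortcomings of the current model
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += learning_rate * tree.predict(X)   # each new tree corrects the previous ones
print("training MSE:", np.mean((y - pred) ** 2))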
3) Which of the following is/are true about Random Forest and Gradient Boosting ensemble
methods?
1. Both methods can be used for classification task
2. Random Forest is used for classification whereas Gradient Boosting is used for regression tasks
3. Random Forest is used for regression whereas Gradient Boosting is used for classification tasks
4. Both methods can be used for regression task
A) 1
B) 2
C) 3
D) 4
E) 1 and 4
Solution: E
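Both ensembles come in classifier and regressor variants; a quick sketch assuming scikit-learn (synthetic data for illustration):

from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import (GradientBoostingClassifier, GradientBoostingRegressor,
                              RandomForestClassifier, RandomForestRegressor)

Xc, yc = make_classification(random_state=0)   # classification task
Xr, yr = make_regression(random_state=0)       # regression task

print(RandomForestClassifier(random_state=0).fit(Xc, yc).score(Xc, yc))
print(GradientBoostingClassifier(random_state=0).fit(Xc, yc).score(Xc, yc))
print(RandomForestRegressor(random_state=0).fit(Xr, yr).score(Xr, yr))
print(GradientBoostingRegressor(random_state=0).fit(Xr, yr).score(Xr, yr))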
4) In Random Forest you can generate hundreds of trees (say T1, T2, ..., Tn) and then
aggregate the results of these trees. Which of the following is true about an individual tree (Tk) in
Random Forest?
1. An individual tree is built on a subset of the features
2. An individual tree is built on all the features
3. An individual tree is built on a subset of observations
4. An individual tree is built on the full set of observations
A) 1 and 3
B) 1 and 4
C) 2 and 3
D) 2 and 4
Solution: A
Random Forest is based on the bagging concept: it considers a fraction of the samples and a fraction
of the features for building each individual tree.
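In scikit-learn (assuming version 0.22 or later for max_samples), both kinds of subsampling are exposed directly; the values below are illustrative:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,
    max_features="sqrt",  # each split considers only a random subset of features
    bootstrap=True,       # each tree is grown on a bootstrap sample of observations
    max_samples=0.8,      # fraction of rows drawn for each tree
    random_state=0,
).fit(X, y)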
5) Which of the following is true about the "max_depth" hyperparameter in Gradient Boosting?
1. Lower is better in the case of the same validation accuracy
2. Higher is better in the case of the same validation accuracy
3. Increasing the value of max_depth may overfit the data
4. Increasing the value of max_depth may underfit the data
A) 1 and 3
B) 1 and 4
C) 2 and 3
D) 2 and 4
Solution: A
Increasing the depth beyond a certain value may overfit the data, and when two depth values give the
same validation accuracy, we always prefer the smaller depth for the final model.
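A sketch of that selection rule, assuming scikit-learn and a synthetic dataset; on a tie in validation accuracy, the smaller depth wins:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

scores = {}
for depth in (1, 2, 3, 5, 8):
    model = GradientBoostingClassifier(max_depth=depth, random_state=0)
    scores[depth] = model.fit(X_tr, y_tr).score(X_val, y_val)

best = max(scores.values())
best_depth = min(d for d, s in scores.items() if s == best)  # tie -> smaller depth
print(scores, "-> chosen max_depth:", best_depth)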
6) Which of the following algorithms don't use learning rate as one of their
hyperparameters?
1. Gradient Boosting
2. Extra Trees
3. AdaBoost
4. Random Forest
A) 1 and 3
B) 1 and 4
C) 2 and 3
D) 2 and 4
Solution: D
Random Forest and Extra Trees don’t have learning rate as a hyperparameter.
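This is easy to verify against scikit-learn's implementations (a small check using get_params; assuming the scikit-learn classes):

from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)

for Est in (GradientBoostingClassifier, AdaBoostClassifier,
            RandomForestClassifier, ExtraTreesClassifier):
    print(Est.__name__, "learning_rate" in Est().get_params())
# GradientBoosting and AdaBoost print True; RandomForest and ExtraTrees print False.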
7) In random forest or gradient boosting algorithms, features can be of any type. For example,
it can be a continuous feature or a categorical feature. Which of the following options is true
when you consider these types of features?
A) Only Random forest algorithm handles real valued attributes by discretizing them
B) Only Gradient boosting algorithm handles real valued attributes by discretizing them
C) Both algorithms can handle real valued attributes by discretizing them
D) None of these
Solution: C
8) Which of the following algorithms is not an example of an ensemble learning algorithm?
A) Random Forest
B) Adaboost
C) Extra Trees
D) Gradient Boosting
E) Decision Trees
Solution: E
9) Suppose you are using a bagging based algorithm, say Random Forest, in model building.
Which of the following can be true?
1. The number of trees should be as large as possible
2. You will have interpretability after using Random Forest
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: A
Since Random Forest aggregates the results of different weak learners, if possible we would want a
larger number of trees in model building. Random Forest is a black-box model; you will lose
interpretability after using it.
10) Which of the following is true about the Gradient Boosting trees?
1. In each stage, a new regression tree is introduced to compensate for the shortcomings of the
existing model
2. We can use the gradient descent method to minimize the loss function
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: C
11) True-False: Bagging is suitable for high variance, low bias models?
A) TRUE
B) FALSE
Solution: A
Bagging averages many high variance, low bias learners, which reduces the variance without
increasing the bias much.
12) In gradient boosting it is important to use learning rate to get optimum output. Which of
the following is true about choosing the learning rate?
A) Learning rate should be as high as possible
B) Learning Rate should be as low as possible
C) Learning Rate should be low but it should not be very low
D) Learning rate should be high but it should not be very high
Solution: C
13) [True or False] Cross validation can be used to select the number of iterations in boosting;
this procedure may help reduce overfitting.
A) TRUE
B) FALSE
Solution: A
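A sketch of the idea using a held-out set and scikit-learn's staged_predict (cross-validation over n_estimators would work the same way; the data and names are illustrative):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

gbm = GradientBoostingClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
# Validation accuracy after each boosting iteration; pick the best stage.
val_acc = [np.mean(pred == y_val) for pred in gbm.staged_predict(X_val)]
print("best number of iterations:", int(np.argmax(val_acc)) + 1)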
14) When you use a boosting algorithm you always consider weak learners. Which of the
following is the main reason for having weak learners?
1. To prevent overfitting
2. To prevent underfitting
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: A
Strong (complex) learners would fit the training data too closely at each step; using weak learners
helps prevent overfitting.
15) To apply bagging to regression trees, which of the following is/are true in such a case?
1. We build N regression trees with N bootstrap samples
2. We take the average of the N regression trees
3. Each tree has a high variance with low bias
A) 1 and 2
B) 2 and 3
C) 1 and 3
D) 1,2 and 3
Solution: D
16) How do you select the best hyperparameters in tree-based models?
A) Measure performance over training data
B) Measure performance over validation data
C) Both of these
D) None of these
Solution: B
17) In which of the following scenarios is gain ratio preferred over information gain?
A) When a categorical variable has a very large number of categories
B) When a categorical variable has a very small number of categories
C) The number of categories is not the reason
D) None of these
Solution: A
For high-cardinality problems, gain ratio is preferred over the information gain technique.
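A small sketch of both quantities (standard C4.5-style definitions; the toy arrays are illustrative). A feature with many categories, like an ID column, gets maximal information gain, but its large split information tempers the gain ratio:

import numpy as np

def entropy(values):
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, feature):
    values, counts = np.unique(feature, return_counts=True)
    children = sum((c / len(labels)) * entropy(labels[feature == v])
                   for v, c in zip(values, counts))
    return entropy(labels) - children

def gain_ratio(labels, feature):
    split_info = entropy(feature)  # entropy of the partition itself
    return information_gain(labels, feature) / split_info if split_info else 0.0

y = np.array([0, 0, 1, 1, 1, 0])
ids = np.arange(6)  # one category per row, like an ID column
print(information_gain(y, ids), gain_ratio(y, ids))  # IG = 1.0, ratio ~ 0.39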
Skill test Questions and Answers
1) [True or False] k-NN algorithm does more computation at test time than at train time.
A) TRUE
B) FALSE
Solution: A
The training phase of the algorithm consists only of storing the feature vectors and class labels of
the training samples.
In the testing phase, a test point is classified by assigning the label that is most frequent among
the k training samples nearest to that query point; hence the higher computation.
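A minimal sketch makes the asymmetry visible (pure NumPy; the toy data are illustrative): "training" just stores the arrays, while every prediction computes a distance to all stored samples:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    dists = np.linalg.norm(X_train - x_query, axis=1)  # O(n) distances per query
    nearest = np.argsort(dists)[:k]                    # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([4.5, 5.0])))  # -> 1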
2) Which of the following is true about the k-NN algorithm?
A) It can be used for classification
B) It can be used for regression
C) It can be used in both classification and regression
Solution: C
3) Which of the following statements is/are true about the k-NN algorithm?
1. k-NN performs much better if all of the data have the same scale
2. k-NN works well with a small number of input variables (p), but struggles when the number
of inputs is very large
3. k-NN makes no assumptions about the functional form of the problem being solved
A) 1 and 2
B) 1 and 3
C) Only 1
D) All of the above
Solution: D
4) Which of the following machine learning algorithms can be used for imputing missing values
of both categorical and continuous variables?
A) K-NN
B) Linear Regression
C) Logistic Regression
Solution: A
5) Which of the following is true about Manhattan distance?
A) It can be used for continuous variables
B) It can be used for categorical variables
C) It can be used for categorical as well as continuous
D) None of these
Solution: A
6) Which of the following distance measures do we use in the case of categorical variables in k-NN?
1. Hamming Distance
2. Euclidean Distance
3. Manhattan Distance
A) 1
B) 2
C) 3
D) 1 and 2
E) 2 and 3
F) 1,2 and 3
Solution: A
Both Euclidean and Manhattan distances are used in the case of continuous variables, whereas
Hamming distance is used in the case of categorical variables.
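For reference, a tiny sketch of the Hamming distance (the example values are made up): it counts mismatched positions, so it needs no numeric encoding of the categories:

def hamming(a, b):
    # Number of positions at which the two sequences disagree.
    return sum(x != y for x, y in zip(a, b))

print(hamming(["red", "small", "round"], ["red", "large", "round"]))  # -> 1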
7) Which of the following will be the Euclidean distance between the two data points A(1,3) and
B(2,3)?
A) 1
B) 2
C) 4
D) 8
Solution: A
sqrt( (1-2)^2 + (3-3)^2) = sqrt(1^2 + 0^2) = 1
8) Which of the following will be the Manhattan distance between the two data points A(1,3) and
B(2,3)?
A) 1
B) 2
C) 4
D) 8
Solution: A
|1-2| + |3-3| = 1 + 0 = 1 (Manhattan distance is the sum of absolute differences; no square root is involved)
9) Which of the following will be true about k in k-NN in terms of bias?
A) When you increase k, the bias will increase
B) When you decrease k, the bias will increase
C) Can’t say
D) None of these
Solution: A
A large k means a simpler model, and a simpler model is always considered to have high bias.
10) Which of the following will be true about k in k-NN in terms of variance?
A) When you increase k, the variance will increase
B) When you decrease k, the variance will increase
C) Can’t say
D) None of these
Solution: B
A simpler model is considered to have lower variance.
11) When you find noise in the data, which of the following options would you consider in k-NN?
A) I will increase the value of k
B) I will decrease the value of k
C) Noise does not depend on the value of k
D) None of these
Solution: A
To be more sure of which classifications you make, you can try increasing the value of k.
12) In k-NN, it is very likely to overfit due to the curse of dimensionality. Which of the
following options would you consider to handle such a problem?
1. Dimensionality Reduction
2. Feature selection
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: C
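A sketch of both remedies in front of k-NN, assuming scikit-learn (the synthetic high-dimensional dataset and the choice of 10 components/features are illustrative):

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=500, n_features=100, n_informative=10,
                           random_state=0)

# 1. Dimensionality reduction, and 2. feature selection, before the k-NN step.
pca_knn = make_pipeline(PCA(n_components=10), KNeighborsClassifier())
select_knn = make_pipeline(SelectKBest(f_classif, k=10), KNeighborsClassifier())

print(cross_val_score(pca_knn, X, y).mean())
print(cross_val_score(select_knn, X, y).mean())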
13) Two statements are given below. Which of them is/are true?
1. An advantage of k-NN being a memory-based approach is that the classifier immediately adapts as
we collect new training data.
2. The computational complexity for classifying new samples grows linearly with the number
of samples in the training dataset in the worst-case scenario.
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: C
14) A company has built a k-NN classifier that gets 100% accuracy on training data. When
they deployed this model on the client side, it was found that the model is not at all accurate.
Which of the following might have gone wrong?
Note: The model was successfully deployed and no technical issues were found at the client side
except for the model performance
A) It is probably an overfitted model
B) It is probably an underfitted model
C) Can’t say
D) None of these
Solution: A
An overfitted model seems to perform well on training data, but it is not generalized enough to
give the same results on new data.
15) You are given the following 2 statements. Which of these is/are true in the case of
k-NN?
1. In the case of a very large value of k, we may include points from other classes in the
neighborhood.
2. In the case of a too small value of k, the algorithm is very sensitive to noise
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: C
16) Which of the following statements is true for k-NN classifiers?
A) The classification accuracy is better with larger values of k
B) The decision boundary is smoother with smaller values of k
C) The decision boundary is linear
D) k-NN does not require an explicit training step.
Solution: D
Option A: This is not always true. You have to ensure that the value of k is neither too high nor
too low.
Option B: This statement is not true. The decision boundary becomes more jagged with smaller
values of k.
Option C: Same as option B; the boundary is generally not linear.
Option D: This statement is true
17) True-False: It is possible to construct a 2-NN classifier by using the 1-NN classifier?
A) TRUE
B) FALSE
Solution: A
You can implement a 2-NN classifier by ensembling 1-NN classifiers
18) In k-NN, what will happen when you increase/decrease the value of k?
A) The boundary becomes smoother with increasing value of k
B) The boundary becomes smoother with decreasing value of k
C) The smoothness of the boundary doesn't depend on the value of k
D) None of these
Solution: A
19) The following two statements are given for the k-NN algorithm; which is/are true?
1. We can choose the optimal value of k with the help of cross-validation
2. Euclidean distance treats each feature as equally important
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: C
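A sketch combining both points, assuming scikit-learn: scale the features first (so Euclidean distance doesn't let a large-range feature dominate), then cross-validate over k; the grid values are illustrative:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
grid = GridSearchCV(pipe, {"knn__n_neighbors": [1, 3, 5, 7, 9, 11]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)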
Question 1
Answer: Choose k to be the smallest value so that at least 99% of the variance is retained.
Explanation: This maintains the structure of the data while maximally reducing its dimension.
Question 2
Answer:
Question 3 (mark each statement True or False)
1. Data visualization: To take 2D data, and find a different way of plotting it in 2D (using k=2).
Answer: False. Explanation: none needed.
2. As a replacement for (or alternative to) linear regression: For most learning applications, PCA
and linear regression give substantially similar results.
Answer: False. Explanation: PCA is not linear regression. They have different goals (and cost
functions), so they give different results.
3. Data compression: Reduce the dimension of your input data x(i), which will be used in a
supervised learning algorithm (i.e., use PCA so that your supervised learning algorithm runs
faster).
Answer: True. Explanation: If your learning algorithm is too slow because the input dimension is
too high, then using PCA to speed it up is a reasonable choice.
4. Data compression: Reduce the dimension of your data, so that it takes up less memory/disk
space.
Answer: True. Explanation: If memory or disk space is limited, PCA allows you to save space in
exchange for losing a little of the data's information. This can be a reasonable tradeoff.