What is machine learning?
A) The autonomous acquisition of knowledge through the use of computer programs
B) Static data entry
C) Manual programming
D) Human labeling
Answer: A) The autonomous acquisition of knowledge through the use of computer programs
Which of the following is not a type of machine learning?
A) Supervised learning
B) Unsupervised learning
C) Reinforcement learning
D) Descriptive learning
Answer: D) Descriptive learning
In supervised learning, the training data includes:
A) Only input variables
B) Output labels
C) Both input and output labels
D) No labels
Answer: C) Both input and output labels
Which of the following is an example of classification?
A) Predicting the stock price
B) Identifying whether an email is spam or not
C) Estimating the temperature tomorrow
D) Predicting continuous values
Answer: B) Identifying whether an email is spam or not
What is overfitting in machine learning?
A) When a model performs well on new data
B) When a model performs poorly on training data
C) When a model fits the training data too well and performs poorly on
new data
D) When a model generalizes well
Answer: C) When a model fits the training data too well and performs poorly on new data
Which of the following is a dimensionality reduction technique?
A) Decision Trees
B) Support Vector Machines
C) Principal Component Analysis (PCA)
D) K-Means
Answer: C) Principal Component Analysis (PCA)
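As a quick illustration of PCA in practice, here is a minimal sketch (scikit-learn assumed available; the random data is invented for demonstration):

```python
# Minimal PCA sketch: project 5-dimensional data down to 2 dimensions.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.RandomState(0).rand(100, 5)   # 100 samples, 5 features
pca = PCA(n_components=2)                   # keep the 2 strongest components
X_reduced = pca.fit_transform(X)            # shape becomes (100, 2)
print(X_reduced.shape, pca.explained_variance_ratio_)
```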
Which algorithm is used for regression tasks?
A) Linear Regression
B) K-Means
C) Naive Bayes
D) Apriori
Answer: A) Linear Regression
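A minimal regression sketch (scikit-learn assumed; the data is a noiseless line y = 2x + 1, invented for illustration):

```python
# Fit a linear model to points on the line y = 2x + 1.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0], [1], [2], [3]])      # single input feature
y = 2 * X.ravel() + 1                   # continuous target values
reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)        # recovers ~[2.] and ~1.0
```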
The Naive Bayes algorithm is based on which theorem?
A) Bayes Theorem
B) Central Limit Theorem
C) Pythagorean Theorem
D) No theorem
Answer: A) Bayes Theorem
Which evaluation metric is appropriate for classification problems?
A) Mean Absolute Error
B) Root Mean Squared Error
C) Accuracy
D) R-squared
Answer: C) Accuracy
K-Means is a type of:
A) Supervised learning algorithm
B) Reinforcement learning algorithm
C) Unsupervised learning algorithm
D) Semi-supervised learning algorithm
Answer: C) Unsupervised learning algorithm
Which machine learning technique uses a reward-based system to learn?
A) Supervised learning
B) Unsupervised learning
C) Reinforcement learning
D) Clustering
Answer: C) Reinforcement learning
Which algorithm is suitable for finding groups in data without labels?
A) Decision Trees
B) Logistic Regression
C) K-Means Clustering
D) Naive Bayes
Answer: C) K-Means Clustering
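A minimal sketch of grouping unlabeled points with K-Means (scikit-learn assumed; the 2-D points are made up):

```python
# Cluster four unlabeled 2-D points into two groups.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 1], [1.5, 2], [8, 8], [8, 9]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)   # e.g. [0 0 1 1]: two groups found with no labels given
```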
Which type of learning uses labeled training data?
A) Reinforcement learning
B) Unsupervised learning
C) Supervised learning
D) Clustering
Answer: C) Supervised learning
Which of the following is a classification algorithm?
A) K-Means
B) Linear Regression
C) Logistic Regression
D) PCA
Answer: C) Logistic Regression
Which concept helps in reducing multicollinearity in a dataset?
A) Feature scaling
B) Normalization
C) PCA
D) Encoding
Answer: C) PCA
What is clustering in machine learning?
A) Grouping data points with labels
B) Grouping data points without using labels
C) Sorting data in order
D) Using rewards to guide learning
Answer: B) Grouping data points without using labels
Which technique is commonly used to prevent overfitting?
A) Increasing the number of features
B) Removing regularization
C) Cross-validation
D) Ignoring training error
Answer: C) Cross-validation
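A minimal cross-validation sketch (scikit-learn and its bundled iris dataset assumed): scoring on held-out folds estimates generalization rather than training fit, which is why it helps detect and prevent overfitting.

```python
# Estimate generalization accuracy with 5-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5)   # accuracy on each held-out fold
print(scores.mean())                        # average held-out accuracy
```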
Which machine learning algorithm works well with text classification?
A) K-Means
B) Naive Bayes
C) Linear Regression
D) K-Nearest Neighbors
Answer: B) Naive Bayes
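A minimal text-classification sketch with Naive Bayes (scikit-learn assumed; the tiny spam/ham corpus is invented purely for illustration):

```python
# Bag-of-words counts feed a multinomial Naive Bayes spam classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["win money now", "meeting at noon", "free prize win", "lunch tomorrow"]
labels = [1, 0, 1, 0]                    # 1 = spam, 0 = ham
vec = CountVectorizer()
X = vec.fit_transform(texts)             # word-count feature matrix
model = MultinomialNB().fit(X, labels)
print(model.predict(vec.transform(["free money prize"])))  # likely [1] (spam)
```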
What does the term 'feature' refer to in machine learning?
A) The final result
B) The target variable
C) An input variable used for prediction
D) The accuracy metric
Answer: C) An input variable used for prediction
Which of the following is a performance metric for regression tasks?
A) Accuracy
B) Confusion Matrix
C) Mean Squared Error
D) Precision
Answer: C) Mean Squared Error
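A quick sketch of computing Mean Squared Error by hand (NumPy assumed; the predictions are invented):

```python
# MSE is the average of the squared residuals.
import numpy as np

y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.0, 3.0])
mse = np.mean((y_true - y_pred) ** 2)
print(mse)   # (0.25 + 0 + 0.25) / 3 = 0.1667
```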
What is the main goal of regression in machine learning?
A) Grouping similar data points
B) Predicting categorical outcomes
C) Predicting continuous values
D) Finding anomalies
Answer: C) Predicting continuous values
Which type of data does unsupervised learning typically use?
A) Labeled data
B) Cleaned data
C) Unlabeled data
D) Real-time data
Answer: C) Unlabeled data
Which method can be used for feature selection?
A) Decision Tree Importance
B) Clustering
C) Data Augmentation
D) One-Hot Encoding
Answer: A) Decision Tree Importance
What is the output of a classification model?
A) A continuous number
B) A group or category label
C) A clustering score
D) A reward function
Answer: B) A group or category label
Which of the following is used to evaluate a binary classifier?
A) ROC Curve
B) R-squared
C) MAE
D) MSE
Answer: A) ROC Curve
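A minimal sketch of scoring a binary classifier by the area under its ROC curve (scikit-learn assumed; labels and scores are invented):

```python
# AUC summarizes the ROC curve: the probability that a random positive
# instance is scored above a random negative instance.
from sklearn.metrics import roc_auc_score

y_true  = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]         # predicted probabilities
print(roc_auc_score(y_true, y_score))   # 0.75 here
```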
The true error is available to the learner.
A) True
B) False
Answer: B) False
What is one of the drawbacks of Empirical Risk Minimization?
A) Underfitting
B) Both Overfitting and Underfitting
C) Overfitting
D) No drawbacks
Answer: C) Overfitting
The error available to the learner is ______
A) true error
B) error of the classifier
C) training error
D) testing error
Answer: C) training error
Which is the more desirable way to reduce overfitting?
A) Giving an upper bound to the size of the hypothesis class
B) Making the test set larger than the training set
C) Giving an upper bound to the accuracy obtained on the training set
D) Overfitting cannot be reduced
Answer: A) Giving an upper bound to the size of the hypothesis class
What is the relation between Empirical Risk Minimization and Training
Error?
A) ERM tries to maximize training error
B) ERM tries to minimize training error
C) It depends on the dataset
D) ERM is not concerned with training error
Answer: B) ERM tries to minimize training error
What happens due to overfitting?
A) Hypothesis works poorly on training data but works well on test data
B) Hypothesis works well on training data and works well on test data
C) Hypothesis works well on training data but works poorly on test data
D) Hypothesis works poorly on training data and works poorly on test data
Answer: C) Hypothesis works well on training data but works poorly on test data
What is assumed while using empirical risk minimization with inductive
bias?
A) The learner has some prior knowledge about training data
B) The learner has some knowledge about labeling function
C) Reduction of overfitting may lead to underfitting
D) No assumptions are made
Answer: A) The learner has some prior knowledge about training data
The hypothesis space H for inductive bias is a finite class.
A) False
B) True
Answer: B) True
The assumption that the training set instances are independently and
identically distributed is known as the __________
A) empirical risk assumption
B) inductive bias assumption
C) i.i.d. assumption
D) training set rule
Answer: C) i.i.d. assumption
Delta is the __________ parameter of the prediction.
A) training
B) confidence
C) accuracy
D) computing
Answer: B) confidence
Who introduced the concept of PAC learning?
a) Francis Galton
b) Reverend Thomas Bayes
c) J. Ross Quinlan
d) Leslie Valiant
Answer: d) Leslie Valiant
When was PAC learning invented?
a) 1874
b) 1974
c) 1984
d) 1884
Answer: c) 1984
The full form of PAC is ______
a) Partly Approximation Computation
b) Probability Approximation Curve
c) Probably Approximately Correct
d) Partly Approximately Correct
Answer: c) Probably Approximately Correct
What can be explained by PAC learning?
a) Sample Complexity
b) Overfitting
c) Underfitting
d) Label Function
Answer: a) Sample Complexity
What is the significance of epsilon in PAC learning?
a) Probability of approximation < epsilon
b) Maximum error < epsilon
c) Minimum error > epsilon
d) Probability of approximation = delta – epsilon
Answer: b) Maximum error < epsilon
What is the significance of delta in PAC learning?
a) Probability of approximation < delta
b) Error < delta
c) Probability = 1 – delta
d) Probability of approximation = delta – epsilon
Answer: c) Probability = 1 – delta
One of the goals of PAC learning is to give __________
a) maximum accuracy
b) cross-validation complexity
c) error of classifier
d) computational complexity
Answer: d) computational complexity
A learner can be deemed consistent if it produces a hypothesis that
perfectly fits the __________
a) cross-validation data
b) overall dataset
c) test data
d) training data
Answer: d) training data
Number of hypotheses |H| = 973, probability = 95%, error < 0.1. Find the minimum number of training examples, m, required.
a) 98.8
b) 99.8
c) 99
d) 98
Answer: c) 99
In PAC learning, sample complexity grows as the logarithm of the number of hypotheses.
a) False
b) True
Answer: b) True
A Boolean-valued function can be an example of concept learning.
a) True
b) False
Answer: a) True
How do we learn concepts from training examples?
a) Arbitrarily
b) Decrementally
c) Incrementally
d) Non-incrementally
Answer: c) Incrementally
What is the goal of concept learning?
a) To minimize cross-validation set error
b) To maximize test set accuracy
c) To find a hypothesis that is most suitable for training instances
d) To identify all possible predictors
Answer: c) To find a hypothesis that is most suitable for training instances
Which is not a concept learning algorithm?
a) ID3
b) Find-S
c) Candidate Elimination
d) List-Then-Eliminate
Answer: a) ID3
In the list-then-eliminate algorithm, the initial version space contains
_____
a) most specific hypothesis
b) all hypotheses in H
c) most accurate hypothesis
d) most general hypothesis
Answer: b) all hypotheses in H
What happens to the version space in the list-then-eliminate algorithm, at
each step?
a) Remains the same
b) Increases
c) Shrinks
d) Depends on dataset
Answer: c) Shrinks
The list-then-eliminate algorithm can output more than one hypothesis.
a) True
b) False
Answer: a) True
What is the advantage of the list-then-eliminate algorithm?
a) Computation is less
b) Time-effective
c) Overfitting never occurs
d) Contains all hypotheses consistent with observed data
Answer: d) Contains all hypotheses consistent with observed data
For a dataset with 4 attributes, which is the most general hypothesis?
a) (Sunny, Warm, Strong, Humid)
b) (Sunny, ?, ?, ?)
c) (?, ?, ?, ?)
d) (phi, phi, phi, phi)
Answer: c) (?, ?, ?, ?)
How is a hypothesis represented in concept learning?
a) Scalar
b) Vector
c) Polynomial
d) Either scalar or vector
Answer: b) Vector
What is present in the version space of the Find-S algorithm in the
beginning?
a) Set of all hypotheses H
b) Both maximally general and maximally specific hypotheses
c) Maximally general hypothesis
d) Maximally specific hypothesis d) Maximally specific hypothesis;
When does the hypothesis change in the Find-S algorithm during iteration?
a) Any example (positive or negative) is encountered
b) Any negative example is encountered
c) A positive example inconsistent with the hypothesis is encountered
d) Any positive example is encountered
Answer: c) A positive example inconsistent with the hypothesis is encountered
What is one of the assumptions of the Find-S algorithm?
a) No assumptions are made
b) The most specific hypothesis is also the most general hypothesis
c) All training data are correct (there is no noise)
d) Overfitting does not occur
Answer: c) All training data are correct (there is no noise)
What is one of the advantages of the Find-S algorithm?
a) Computation is faster than other concept learning algorithms
b) All correct hypotheses are output
c) Most generalized hypothesis is output
d) Overfitting does not occur
Answer: a) Computation is faster than other concept learning algorithms
How does the hypothesis change gradually in the Find-S algorithm?
a) Specific to Specific
b) Specific to General
c) General to Specific
d) General to General
Answer: b) Specific to General
S = <phi, phi, phi>. Training data = <rainy, cold, white> => No (negative
example). How will S be represented after encountering this training
data?
a) <phi, phi, phi>
b) <sunny, warm, white>
c) <rainy, cold, black>
d) <?, ?, ?>
Answer: a) <phi, phi, phi>
What is one of the drawbacks of the Find-S algorithm?
a) Computation cost is high
b) Time-ineffective
c) All correct hypotheses are not output
d) Most specific accurate hypothesis is not output
Answer: c) All correct hypotheses are not output
Noise or errors in the dataset can severely affect the performance of the
Find-S algorithm.
a) True
b) False
Answer: a) True
S = <phi, phi, phi>. Training data = <square, pointy, white> => Yes (positive
example). How will S be represented after encountering this training
data?
a) <phi, phi, phi>
b) <square, pointy, white>
c) <circular, blunt, black>
d) <?, ?, ?>
Answer: b) <square, pointy, white>
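A minimal Find-S sketch (the attribute values come from the question; the function name is our own). It starts at the maximally specific hypothesis, ignores negatives, and generalizes only on positives:

```python
# Find-S: keep the most specific hypothesis consistent with all positives.
def find_s(examples):
    h = None                              # None stands for <phi, phi, ...>
    for x, label in examples:
        if label != "Yes":
            continue                      # negative examples are ignored
        if h is None:
            h = list(x)                   # first positive: copy it verbatim
        else:
            h = [hi if hi == xi else "?"  # generalize mismatched attributes
                 for hi, xi in zip(h, x)]
    return h

print(find_s([(("square", "pointy", "white"), "Yes")]))
# ['square', 'pointy', 'white'] -- matches the answer above
```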
The Find-S algorithm accommodates all the maximally specific hypotheses.
a) True
b) False
Answer: b) False
The algorithm is trying to find a suitable day for swimming. What is the
most general hypothesis?
a) A rainy day is a positive example
b) A sunny day is a positive example
c) No day is a positive example
d) Every day is a positive example
Answer: d) Every day is a positive example
Candidate-Elimination algorithm can be described by ____________
a) just a set of candidate hypotheses
b) depends on the dataset
c) set of instances, set of candidate hypotheses
d) just a set of instances
Answer: c) set of instances, set of candidate hypotheses
How is the version space represented?
a) Least general members
b) Most general members
c) Most general and least general members
d) Arbitrary members chosen from hypothesis space
Answer: c) Most general and least general members
Let G be the set of maximally general hypotheses. While iterating through
the dataset, when is it changed for the first time?
a) Negative example is encountered for the first time
b) Positive example is encountered for the first time
c) First example encountered, irrespective of whether it is positive or
negative
d) S, the set of maximally specific hypotheses, is changed
Answer: a) Negative example is encountered for the first time
Let S be the set of maximally specific hypotheses. While iterating through
the dataset, when is it changed for the first time?
a) Negative example is encountered for the first time
b) Positive example is encountered for the first time
c) First example encountered, irrespective of whether it is positive or
negative
d) G, the set of maximally general hypotheses, is changed
Answer: b) Positive example is encountered for the first time
S = <sunny, warm, high, same>. Training data = <sunny, warm, normal,
same> => Yes (positive example). How will S be represented after
encountering this training data?
a) <sunny, warm, high, same>
b) <phi, phi, phi, phi>
c) <sunny, warm, ?, same>
d) <sunny, warm, normal, same>
Answer: c) <sunny, warm, ?, same>
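A sketch of just the S-update step of Candidate Elimination on a positive example (the helper name is our own): each attribute that disagrees with the example is minimally generalized to '?'.

```python
# Minimally generalize S so it covers the new positive example.
def generalize_s(s, x):
    return tuple(si if si == xi else "?" for si, xi in zip(s, x))

s = ("sunny", "warm", "high", "same")
x = ("sunny", "warm", "normal", "same")   # positive example
print(generalize_s(s, x))                 # ('sunny', 'warm', '?', 'same')
```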
S = <phi, phi, phi, phi>. Training data = <rainy, cold, normal, change> => No
(negative example). How will S be represented after encountering this
training data?
a) <phi, phi, phi, phi>
b) <sunny, warm, high, same>
c) <rainy, cold, normal, change>
d) <?, ?, ?, ?>
Answer: a) <phi, phi, phi, phi>
G = <?, ?, ?, ?>. Training data = <sunny, warm, normal, same> => Yes
(positive example). How will G be represented after encountering this
training data?
a) <sunny, warm, normal, same>
b) <phi, phi, phi, phi>
c) <rainy, cold, normal, change>
d) <?, ?, ?, ?>
Answer: d) <?, ?, ?, ?>
G = (<sunny, ?, ?, ?> / <?, warm, ?, ?> / <?, ?, high, ?>). Training data =
<sunny, warm, normal, same> => Yes (positive example). How will G be
represented after encountering this training data?
a) <phi, phi, phi, phi>
b) (<sunny, ?, ?, ?> / <?, warm, ?, ?> / <?, ?, high, ?>)
c) (<sunny, ?, ?, ?> / <?, warm, ?, ?>)
d) <?, ?, ?, ?>
Answer: c) (<sunny, ?, ?, ?> / <?, warm, ?, ?>)
It is possible that in the output, set S contains only phi.
a) False
b) True
Answer: b) True
What does VC dimension do?
a) Reduces complexity of hypothesis space
b) Removes noise from dataset
c) Measures complexity of training dataset
d) Measures the complexity of hypothesis space H
Answer: d) Measures the complexity of hypothesis space H
An instance set S is given. How many dichotomies are possible?
a) 2*|S|
b) 2/|S|
c) 2^|S|
d) |S|
Answer: c) 2^|S|
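A quick check of the 2^|S| count (standard library only): each instance can be labeled positive or negative independently, so the dichotomies are exactly the binary label vectors.

```python
# Enumerate all dichotomies of a 3-element instance set.
from itertools import product

S = ["x0", "x1", "x2"]
dichotomies = list(product([0, 1], repeat=len(S)))
print(len(dichotomies))   # 2**3 = 8
```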
If h is a straight line, what is the maximum number of points that can be
shattered?
a) 4
b) 2
c) 3
d) 5
Answer: c) 3
What is the VC dimension of a straight line?
a) 3
b) 2
c) 4
d) 0
Answer: a) 3
A set of 3 instances is shattered by _____ hypotheses.
a) 4
b) 8
c) 3
d) 2
Answer: b) 8
What is the relation between VC dimension and hypothesis space H?
a) VC(H) <= |H|
b) VC(H) != log2|H|
c) VC(H) <= log2|H|
d) VC(H) > log2|H|
Answer: c) VC(H) <= log2|H|
VC Dimension can be infinite.
a) True
b) False
Answer: a) True
Who invented VC dimension?
a) Francis Galton
b) J. Ross Quinlan
c) Leslie Valiant
d) Vapnik and Chervonenkis
Answer: d) Vapnik and Chervonenkis
What is the advantage of VC dimension over PAC learning?
a) VC dimension reduces complexity of training data
b) VC dimension outputs more accurate predictors
c) VC dimension can work for infinite hypothesis space
d) There is no advantage
Answer: c) VC dimension can work for infinite hypothesis space
If VC(H) increases, the minimum number of training examples required (m) increases.
a) False
b) True
Answer: b) True
Instance space X = set of real numbers. Hypothesis space H = the set of intervals on the real number line. a and b can be any constants used to represent the hypothesis. How is H represented?
a) a – b < a + b
b) a + b < x < 2(a+b)
c) a/b < x < a*b
d) a < x < b
Answer: d) a < x < b
S = {3.1, 5.7}. How many hypotheses are required to shatter S?
a) 2
b) 3
c) 4
d) 1
Answer: c) 4
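A sketch verifying that the two points can be shattered by interval hypotheses of the form a < x < b (the endpoint choices below are ours; any endpoints realizing each labeling would do):

```python
# One interval per dichotomy of S = {3.1, 5.7}: 4 hypotheses in total.
S = [3.1, 5.7]
intervals = {(0, 0): (0, 1),   # covers neither point
             (1, 0): (3, 4),   # covers only 3.1
             (0, 1): (5, 6),   # covers only 5.7
             (1, 1): (3, 6)}   # covers both
for labels, (a, b) in intervals.items():
    assert labels == tuple(int(a < x < b) for x in S)
print(len(intervals))   # 4
```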
S = {x0, x1, x2}. This set can be shattered by hypotheses of form a < x < b,
where a and b are arbitrary constants.
a) True
b) False
Answer: b) False
S = {x0, x1, x2}. Hypotheses are of the form a < x < b. What is H?
a) infinite
b) 0
c) 2
d) 1
Answer: a) infinite
S = {x0, x1, x2}. Hypotheses are of the form a < x < b. What is VC(H)?
a) 0
b) 2
c) 1
d) infinite
Answer: b) 2
S = {x0, x1, x2} and H is finite. What is VC(H)?
a) 1
b) 2
c) 3
d) infinite
Answer: c) 3
S = {x0, x1, x2}. Hypotheses are straight lines. What is H?
a) 8
b) 3
c) 4
d) infinite
Answer: a) 8
S = {x0, x1, x2, x3}. Hypotheses are straight lines. What is H?
a) 8
b) 3
c) 4
d) infinite
Answer: d) infinite
S contains 4 instances. H is the hypothesis space and it can shatter S.
What is the correct form of hypotheses?
a) Hypotheses are intervals
b) Hypotheses are straight lines
c) Hypotheses are rectangles
d) No such hypotheses exist
Answer: c) Hypotheses are rectangles
For which combination H is infinite and VC(H) is finite?
a) S contains 2 points and hypotheses are intervals
b) S contains 3 points and hypotheses are intervals
c) S contains 3 points and hypotheses are straight lines
d) S contains 2 points and hypotheses are straight lines
Answer: b) S contains 3 points and hypotheses are intervals
Any ERM rule is a successful PAC learner for hypothesis space H.
a) True
b) False
Answer: a) True
If distribution D assigns zero probability to instances where h is not equal to c, then the error will be ______
a) 1
b) 0.5
c) 0
d) infinite
Answer: c) 0
If distribution D assigns zero probability to instances where h = c, then the error will be ______
a) Cannot be determined
b) 0.5
c) 1
d) 0
Answer: c) 1
Error strongly depends on distribution D.
a) True
b) False
Answer: a) True
PAC learning was introduced by ____________
a) Vapnik
b) Leslie Valiant
c) Chervonenkis
d) Reverend Thomas Bayes
Answer: b) Leslie Valiant
Error is defined over the _____________
a) training set
b) test set
c) domain set
d) cross-validation set
Answer: c) domain set
The error of h with respect to c is the probability that a randomly drawn
instance will fall into the region where _________
a) h and c disagree
b) h and c agree
c) h is greater than c but not less
d) h is lesser than c but not greater
Answer: a) h and c disagree
When was PAC learning invented?
a) 1954
b) 1964
c) 1974
d) 1984
Answer: d) 1984