Lecture 4: Model Selection and Evaluation - MCQ Study Guide


Key Concepts Explained Simply
Model Selection
What is Model Selection? Model selection is the process of choosing the
best model from a set of candidate models. It’s like trying on different shoes to
find the pair that fits best.

Approaches to Model Selection


1. Hold-out Validation: Split data into training and validation sets
2. Cross-Validation: Split data into k folds, train on k-1 folds, test on the
remaining fold
3. Nested Cross-Validation: Cross-validation within cross-validation for
both model selection and evaluation

Cross-Validation Techniques
• K-fold Cross-Validation: Split data into k equal parts
• Stratified K-fold: Maintains the same class distribution in each fold
• Leave-One-Out (LOOCV): Use n-1 samples for training and 1 for testing (where n is the total number of samples)
• Leave-P-Out: Use n-p samples for training and p for testing
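A minimal sketch of the first three techniques, assuming scikit-learn and using a synthetic dataset as a stand-in for real data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=200, random_state=0)  # toy data
model = LogisticRegression(max_iter=1000)

# K-fold: split into 5 equal parts, train on 4, test on the remaining 1, rotate
kfold = cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))

# Stratified K-fold: same idea, but each fold keeps the original class ratio
strat = cross_val_score(model, X, y, cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))

# Leave-One-Out: n folds of size 1 -- thorough but expensive for large n
loo = cross_val_score(model, X, y, cv=LeaveOneOut())

print(kfold.mean(), strat.mean(), loo.mean())
```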

Hyperparameter Tuning
• Grid Search: Try all combinations of a predefined set of hyperparameter
values
• Random Search: Randomly sample hyperparameter values from defined
distributions
• Bayesian Optimization: Use past evaluations to guide the search for
better hyperparameters
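A hedged sketch of the first two, assuming scikit-learn and SciPy; Bayesian optimization typically requires a third-party library (e.g., Optuna or scikit-optimize), so it is omitted here:

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)  # toy data

# Grid search: exhaustively tries every combination (3 x 3 = 9 candidates)
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}, cv=5)
grid.fit(X, y)

# Random search: draws 10 candidates from continuous distributions
rand = RandomizedSearchCV(
    SVC(),
    {"C": loguniform(1e-2, 1e2), "gamma": loguniform(1e-3, 1e0)},
    n_iter=10, cv=5, random_state=0,
)
rand.fit(X, y)

print(grid.best_params_, rand.best_params_)
```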

Model Evaluation
Classification Metrics
• Accuracy: Proportion of correct predictions
• Precision: Proportion of positive identifications that were actually correct
• Recall (Sensitivity): Proportion of actual positives that were identified
correctly
• F1 Score: Harmonic mean of precision and recall
• ROC Curve: Plot of True Positive Rate vs. False Positive Rate
• AUC (Area Under the Curve): Area under the ROC curve

• Confusion Matrix: Table showing correct and incorrect predictions
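All of these metrics are available in scikit-learn; a minimal sketch with made-up labels and scores:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                    # actual labels
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]                    # hard predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1]    # predicted probabilities

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_score))   # note: uses scores, not hard labels
print(confusion_matrix(y_true, y_pred))               # rows = actual, columns = predicted
```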

Regression Metrics
• Mean Absolute Error (MAE): Average of absolute differences between
predicted and actual values
• Mean Squared Error (MSE): Average of squared differences between
predicted and actual values
• Root Mean Squared Error (RMSE): Square root of MSE
• R-squared (Coefficient of Determination): Proportion of variance
explained by the model
• Adjusted R-squared: R-squared adjusted for the number of predictors
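A corresponding sketch for the regression metrics, again assuming scikit-learn and toy values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # actual values
y_pred = np.array([2.5, 5.0, 4.0, 8.0])   # predicted values

mae  = mean_absolute_error(y_true, y_pred)  # average |error|
mse  = mean_squared_error(y_true, y_pred)   # average error squared
rmse = np.sqrt(mse)                         # RMSE is just the square root of MSE
r2   = r2_score(y_true, y_pred)             # 1 - RSS/TSS
print(mae, mse, rmse, r2)
```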

Bias-Variance Tradeoff
Understanding Bias and Variance
• Bias: Error from overly simplistic assumptions (underfitting)
• Variance: Error from sensitivity to small fluctuations in training data
(overfitting)
• Tradeoff: Reducing bias typically increases variance and vice versa

Total Error Decomposition Total Error = Bias² + Variance + Irreducible Error

How to Balance Bias and Variance


• High Bias (Underfitting): Use more complex models, add features
• High Variance (Overfitting): Use simpler models, add regularization,
get more training data
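The tradeoff can be seen empirically by varying model complexity. A minimal sketch, assuming scikit-learn and using polynomial degree as a stand-in for complexity; the high-degree fit typically scores well on training data but worse on held-out data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=100)  # noisy sine wave
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):  # underfit, reasonable, likely overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    print(degree, model.score(X_tr, y_tr), model.score(X_te, y_te))  # train vs. test R²
```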

Regularization
What is Regularization? Regularization is a technique to prevent overfitting by adding a penalty term to the loss function. It’s like adding weight to a seesaw to keep it balanced.

Types of Regularization
• L1 Regularization (Lasso): Adds the sum of absolute values of coefficients to the loss function
– Can lead to sparse models (feature selection)
– Formula: Loss + λ × Σ|w_i|
• L2 Regularization (Ridge): Adds the sum of squared values of coefficients to the loss function
– Shrinks coefficients towards zero but rarely to exactly zero
– Formula: Loss + λ × Σ(w_i)²
• Elastic Net: Combination of L1 and L2

– Formula: Loss + λ₁ × Σ|w_i| + λ₂ × Σ(w_i)²

Regularization Parameter (λ)


• Controls the strength of regularization
• Higher λ = stronger regularization = simpler model
• Lower λ = weaker regularization = more complex model
• Optimal λ is typically found through cross-validation
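A minimal sketch of the L1-vs-L2 effect on coefficients, assuming scikit-learn (where the λ above is called alpha) and synthetic data with only 3 truly informative features out of 10:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5, random_state=0)

for alpha in (0.1, 1.0, 10.0):  # alpha plays the role of lambda
    lasso = Lasso(alpha=alpha, max_iter=10000).fit(X, y)
    ridge = Ridge(alpha=alpha).fit(X, y)
    print(alpha,
          "Lasso zero coefs:", int(np.sum(lasso.coef_ == 0)),   # L1 zeroes some exactly
          "Ridge zero coefs:", int(np.sum(ridge.coef_ == 0)))   # L2 shrinks, rarely zeroes
```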

Ensemble Methods
What are Ensemble Methods? Ensemble methods combine multiple models to improve performance. It’s like asking multiple experts and taking their collective wisdom.

Types of Ensemble Methods


• Bagging (Bootstrap Aggregating):
– Train multiple models on random subsets of the data
– Combine by averaging (regression) or voting (classification)
– Example: Random Forest
• Boosting:
– Train models sequentially, each focusing on errors of previous models
– Combine by weighted voting
– Examples: AdaBoost, Gradient Boosting
• Stacking:
– Train multiple models and use their predictions as inputs to a meta-model
– Meta-model learns how to best combine the predictions

Popular Ensemble Algorithms


• Random Forest: Ensemble of decision trees using bagging
• AdaBoost: Boosts weak learners by focusing on misclassified instances
• Gradient Boosting: Builds trees sequentially to correct errors
• XGBoost: Optimized implementation of gradient boosting
• Voting Classifier/Regressor: Combines different types of models
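A hedged sketch of all three flavors in scikit-learn; XGBoost is a separate third-party package, so the built-in GradientBoostingClassifier stands in for boosting here:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, random_state=0)  # toy data

bagging  = RandomForestClassifier(n_estimators=100, random_state=0)  # bagged trees
boosting = GradientBoostingClassifier(random_state=0)                # sequential trees
stacking = StackingClassifier(
    estimators=[("rf", bagging), ("ada", AdaBoostClassifier(random_state=0))],
    final_estimator=LogisticRegression(),  # meta-model combining base predictions
)

for name, model in [("bagging", bagging), ("boosting", boosting), ("stacking", stacking)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```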

Feature Selection
Why Feature Selection?
• Reduces overfitting
• Improves model performance
• Reduces training time
• Makes models more interpretable

Feature Selection Methods
• Filter Methods: Select features based on statistical measures
– Correlation
– Chi-square test
– Information gain
• Wrapper Methods: Use a model to evaluate feature subsets
– Recursive Feature Elimination (RFE)
– Forward/Backward selection
• Embedded Methods: Feature selection as part of model training
– Lasso regression
– Decision trees
– Random Forest feature importance
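A minimal sketch of one method from each family, assuming scikit-learn and synthetic data with 3 informative features out of 10:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           random_state=0)

# Filter: keep the 3 features with the highest ANOVA F-score
filt = SelectKBest(f_classif, k=3).fit(X, y)

# Wrapper: recursively refit the model, dropping the weakest feature each round
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3).fit(X, y)

# Embedded: an L1 penalty zeroes out uninformative coefficients during training
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)

print(filt.get_support())          # boolean mask of kept features
print(rfe.support_)
print((lasso.coef_ != 0).ravel())  # surviving (non-zero) coefficients
```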

MCQ Practice Questions


Question 1
Which cross-validation technique is most appropriate when dealing with imbalanced classes?
- A) K-fold Cross-Validation
- B) Leave-One-Out Cross-Validation
- C) Stratified K-fold Cross-Validation
- D) Random Subsampling
Answer: C) Stratified K-fold Cross-Validation
Explanation: Stratified K-fold ensures that each fold has the same proportion
of classes as the original dataset, which is crucial for imbalanced datasets to
avoid bias in the validation process.

Question 2
Which regularization technique can reduce coefficients to exactly zero, effectively performing feature selection?
- A) L1 Regularization (Lasso)
- B) L2 Regularization (Ridge)
- C) Dropout
- D) Batch Normalization
Answer: A) L1 Regularization (Lasso)
Explanation: L1 regularization adds the sum of absolute values of coefficients
to the loss function, which can shrink some coefficients to exactly zero, effectively
removing those features from the model.

Question 3
What is the main difference between bagging and boosting ensemble methods?
- A) Bagging uses decision trees while boosting uses neural networks
- B) Bagging trains models in parallel while boosting trains them sequentially
- C) Bagging is for classification while boosting is for regression
- D) Bagging requires more data than boosting

Answer: B) Bagging trains models in parallel while boosting trains them sequentially
Explanation: In bagging, multiple models are trained independently on random subsets of the data. In boosting, models are trained sequentially, with each model focusing on the errors made by previous models.

Question 4
Which metric is most appropriate for evaluating a regression model when outliers are a concern?
- A) Mean Squared Error (MSE)
- B) Root Mean Squared Error (RMSE)
- C) Mean Absolute Error (MAE)
- D) R-squared
Answer: C) Mean Absolute Error (MAE)
Explanation: MAE uses absolute differences rather than squared differences,
making it less sensitive to outliers compared to MSE or RMSE.

Question 5
In the bias-variance tradeoff, what happens as model complexity increases?
- A) Bias increases, variance decreases
- B) Bias decreases, variance increases
- C) Both bias and variance increase
- D) Both bias and variance decrease
Answer: B) Bias decreases, variance increases
Explanation: As model complexity increases, the model can fit the training
data better (reducing bias), but becomes more sensitive to fluctuations in the
training data (increasing variance).

Question 6
Which of the following is NOT a method for hyperparameter tuning?
- A) Grid Search
- B) Random Search
- C) Bayesian Optimization
- D) Principal Component Analysis
Answer: D) Principal Component Analysis
Explanation: Principal Component Analysis (PCA) is a dimensionality reduction technique, not a method for hyperparameter tuning.

Question 7
What does the Area Under the ROC Curve (AUC) measure?
- A) The accuracy of the model
- B) The probability that a randomly chosen positive instance is ranked higher than a randomly chosen negative instance
- C) The precision of the model
- D) The recall of the model
Answer: B) The probability that a randomly chosen positive instance is ranked
higher than a randomly chosen negative instance

Explanation: AUC represents the probability that the model will rank a randomly chosen positive instance higher than a randomly chosen negative instance, making it a measure of the model’s ability to discriminate between classes.

Question 8
Which ensemble method is MOST likely to reduce bias in a model?
- A) Bagging
- B) Boosting
- C) Stacking
- D) Voting with identical models
Answer: B) Boosting
Explanation: Boosting trains models sequentially, each one concentrating on the errors of its predecessors, which makes it particularly effective at reducing bias; bagging, by contrast, mainly reduces variance.

Calculation Problems
Problem 1: Cross-Validation
You have a dataset with 1000 instances and want to perform 5-fold
cross-validation. How many instances will be used for training and
testing in each fold?
Solution:
- Total instances: 1000
- Number of folds: 5
- Testing instances per fold: 1000 ÷ 5 = 200
- Training instances per fold: 1000 - 200 = 800
Therefore, each fold will use 800 instances for training and 200 for testing.
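A quick sanity check of the fold sizes, assuming scikit-learn:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(1000)  # stand-in for 1000 instances
for train_idx, test_idx in KFold(n_splits=5).split(X):
    print(len(train_idx), len(test_idx))  # prints "800 200" five times
```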

Problem 2: Confusion Matrix Metrics


A classification model produces the following confusion matrix for a binary classification problem:
- True Positives (TP): 120
- False Positives (FP): 30
- False Negatives (FN): 20
- True Negatives (TN): 130

Calculate the accuracy, precision, recall, F1 score, and specificity.
Solution:
- Accuracy = (TP + TN) / (TP + TN + FP + FN) = (120 + 130) / (120 + 130 + 30 + 20) = 250 / 300 = 0.833 or 83.3%
- Precision = TP / (TP + FP) = 120 / (120 + 30) = 120 / 150 = 0.8 or 80%
- Recall (Sensitivity) = TP / (TP + FN) = 120 / (120 + 20) = 120 / 140 = 0.857 or 85.7%
- F1 Score = 2 × (Precision × Recall) / (Precision + Recall) = 2 × (0.8 × 0.857) / (0.8 + 0.857) = 1.372 / 1.657 = 0.828 or 82.8%
- Specificity = TN / (TN + FP) = 130 / (130 + 30) = 130 / 160 = 0.813 or 81.3%
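These values can be verified directly in Python from the four counts:

```python
TP, FP, FN, TN = 120, 30, 20, 130

accuracy    = (TP + TN) / (TP + TN + FP + FN)                 # 0.833
precision   = TP / (TP + FP)                                  # 0.800
recall      = TP / (TP + FN)                                  # 0.857
f1          = 2 * precision * recall / (precision + recall)   # 0.828
specificity = TN / (TN + FP)                                  # 0.8125
print(accuracy, precision, recall, f1, specificity)
```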

Problem 3: Regularization Effect


In a linear regression model with two features, the unregularized coefficients are w₁ = 5 and w₂ = -3. If L2 regularization with λ = 0.1 is applied, what will be the regularization penalty term added to the loss function?

Solution:
L2 regularization penalty = λ × (w₁² + w₂²)
= 0.1 × (5² + (-3)²)
= 0.1 × (25 + 9)
= 0.1 × 34
= 3.4
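The same arithmetic as a one-off check in Python:

```python
lam, w1, w2 = 0.1, 5, -3
penalty = lam * (w1 ** 2 + w2 ** 2)  # 0.1 * (25 + 9)
print(round(penalty, 1))             # 3.4
```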

Problem 4: R-squared Calculation


A regression model produces the following predictions and actual values:
- Predicted: [12, 15, 18, 11, 20]
- Actual: [10, 14, 17, 13, 22]

Calculate the R-squared value.
Solution: First, calculate the mean of actual values:
Mean(y) = (10 + 14 + 17 + 13 + 22) / 5 = 76 / 5 = 15.2

Then, calculate the total sum of squares (TSS):
TSS = Σ(y_i - mean(y))² = (10 - 15.2)² + (14 - 15.2)² + (17 - 15.2)² + (13 - 15.2)² + (22 - 15.2)²
= (-5.2)² + (-1.2)² + (1.8)² + (-2.2)² + (6.8)²
= 27.04 + 1.44 + 3.24 + 4.84 + 46.24 = 82.8

Next, calculate the residual sum of squares (RSS):
RSS = Σ(y_i - ŷ_i)² = (10 - 12)² + (14 - 15)² + (17 - 18)² + (13 - 11)² + (22 - 20)²
= (-2)² + (-1)² + (-1)² + (2)² + (2)² = 4 + 1 + 1 + 4 + 4 = 14

Finally, calculate R-squared:
R² = 1 - (RSS / TSS) = 1 - (14 / 82.8) = 1 - 0.169 = 0.831 or 83.1%
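The same result can be reproduced with NumPy (or with sklearn.metrics.r2_score):

```python
import numpy as np

y_true = np.array([10, 14, 17, 13, 22])
y_pred = np.array([12, 15, 18, 11, 20])

tss = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares: 82.8
rss = np.sum((y_true - y_pred) ** 2)         # residual sum of squares: 14
print(1 - rss / tss)                         # R² ≈ 0.831
```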

Key Formulas to Remember


1. Accuracy: (TP + TN) / (TP + TN + FP + FN)
2. Precision: TP / (TP + FP)
3. Recall (Sensitivity): TP / (TP + FN)
4. Specificity: TN / (TN + FP)
5. F1 Score: 2 × (Precision × Recall) / (Precision + Recall)
6. Mean Absolute Error (MAE): (1/n) × Σ|y_i - ŷ_i|
7. Mean Squared Error (MSE): (1/n) × Σ(y_i - ŷ_i)²
8. Root Mean Squared Error (RMSE): √MSE
9. R-squared: 1 - (RSS / TSS)
• RSS: Residual Sum of Squares = Σ(y_i - ŷ_i)²
• TSS: Total Sum of Squares = Σ(y_i - mean(y))²
10. L1 Regularization: Loss + λ × Σ|w_i|
11. L2 Regularization: Loss + λ × Σ(w_i)²

Tips for MCQ Questions


1. Understand evaluation metrics: Know which metrics are appropriate
for different types of problems.
2. Know the tradeoffs: Understand the bias-variance tradeoff and how
different techniques affect it.

3. Remember regularization effects: Know how L1 and L2 regularization
affect model coefficients differently.
4. Understand ensemble methods: Know the differences between bagging, boosting, and stacking.
5. Practice calculations: Be comfortable calculating common metrics from
raw data or confusion matrices.
