Machine Learning Questions
1. Define Machine Learning and describe its applications.
2. Define supervised and unsupervised learning with one example each.
3. Explain the concept of overfitting and underfitting with suitable diagrams.
4. What is the bias-variance trade-off? How does it affect model performance? Explain
with an example.
Generalization Error = Bias² + Variance + Irreducible Error
• Bias: error from overly simple or incorrect model assumptions.
• Variance: error from the model's sensitivity to fluctuations in the training data.
• Irreducible Error: inherent noise in the data (e.g., faulty sensors, outliers).
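The decomposition above can be illustrated numerically. A minimal sketch, assuming numpy is available; the quadratic target, noise level, and polynomial degrees are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a quadratic target: the noise is the irreducible error.
x = np.linspace(0, 1, 20)
y = 1 + 2 * x - 3 * x**2 + rng.normal(0, 0.1, size=x.shape)
x_grid = np.linspace(0, 1, 50)
y_grid_true = 1 + 2 * x_grid - 3 * x_grid**2

def fit_rmse(degree):
    """Fit a polynomial of the given degree; return (train RMSE, RMSE vs true curve)."""
    coefs = np.polyfit(x, y, degree)
    train = np.sqrt(np.mean((np.polyval(coefs, x) - y) ** 2))
    true = np.sqrt(np.mean((np.polyval(coefs, x_grid) - y_grid_true) ** 2))
    return train, true

results = {deg: fit_rmse(deg) for deg in (1, 2, 9)}
for deg, (tr, te) in results.items():
    print(f"degree {deg}: train RMSE {tr:.3f}, error vs true curve {te:.3f}")
# Degree 1 underfits (high bias); degree 9 chases the noise (high variance);
# degree 2 matches the true functional form.
```

Raising the degree always lowers training error, but only the degree that matches the true functional form keeps the error against the true curve low.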
5. Compare Classification and Regression
6. Describe cross-validation and its importance in model evaluation.
https://www.geeksforgeeks.org/machine-learning/cross-validation-machine-learning/
7. Given data:
X = [1, 2, 3, 4, 5], Y = [2, 4, 6, 8, 10]
Fit a linear regression model and predict Y when X=6.
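Since the data lie exactly on Y = 2X, the least-squares line is Y = 2X and the prediction at X = 6 is 12. A quick numpy check:

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([2, 4, 6, 8, 10], dtype=float)

# Least-squares fit of Y = slope * X + intercept.
slope, intercept = np.polyfit(X, Y, 1)
print(round(slope, 6), abs(round(intercept, 6)))  # slope ≈ 2, intercept ≈ 0
print(round(slope * 6 + intercept, 6))            # prediction for X = 6 → 12.0
```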
8. A model gives RMSE=5 on training data and RMSE=15 on test data. What does this
indicate?
9. What is the difference between parametric and non-parametric models?
Parametric Models
Parametric models make strong assumptions about the functional form (shape) of the
relationship between the variables in the data. They have a fixed number of
parameters, which are estimated from the training data and then used to make
predictions (e.g., linear regression, logistic regression).
Non-Parametric Models
Non-parametric models, in contrast, make fewer assumptions about the functional form
of the relationship between variables. Instead of a fixed set of parameters, their
effective number of parameters grows with the size of the data, which lets them model
more complex relationships (e.g., k-nearest neighbours, decision trees).
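A small sketch contrasting the two, with linear regression as the parametric example and 1-nearest-neighbour as the non-parametric one (the data are made up):

```python
import numpy as np

X = np.arange(1, 101, dtype=float)
Y = 3 * X + 1

# Parametric: linear regression keeps a fixed number of parameters (2),
# no matter how much training data it sees.
slope, intercept = np.polyfit(X, Y, 1)

# Non-parametric: 1-NN stores the entire training set; its "parameters"
# grow with the data.
knn_memory = list(zip(X, Y))

print(len([slope, intercept]), len(knn_memory))  # 2 100

def predict_1nn(x):
    """Return the Y of the stored point whose X is closest to x."""
    return min(knn_memory, key=lambda p: abs(p[0] - x))[1]

print(predict_1nn(5.4))  # nearest stored x is 5.0 → 16.0
```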
10. Explain bias–variance tradeoff with examples.
11. Consider the following dataset:
Employment   Credit   Income   Buy
Employed     Bad      Low      No
Employed     Bad      High     Yes
Employed     Good     Low      Yes
Unemployed   Good     High     Yes
Unemployed   Bad      Low      Yes
I. Compute the entropy of the entire dataset.
II. Calculate the Information Gain (IG) for each attribute (Employment, Credit, Income).
III. Identify the attribute that should be chosen as the root node.
IV. Construct the first split of the decision tree.
I. Entropy of dataset = 0.722 bits
II. IG(Employment) = IG(Credit) = IG(Income) = 0.171
III. Any attribute can be chosen as root (say Employment)
IV. First split of decision tree:
• Employment = Unemployed → Yes
• Employment = Employed → further split required
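The worked answer above can be verified with a short script (pure Python, dataset copied from question 11):

```python
from math import log2

# Dataset from question 11: (Employment, Credit, Income, Buy)
rows = [
    ("Employed",   "Bad",  "Low",  "No"),
    ("Employed",   "Bad",  "High", "Yes"),
    ("Employed",   "Good", "Low",  "Yes"),
    ("Unemployed", "Good", "High", "Yes"),
    ("Unemployed", "Bad",  "Low",  "Yes"),
]

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((labels.count(v) / total) * log2(labels.count(v) / total)
                for v in set(labels))

labels = [r[3] for r in rows]
H = entropy(labels)
print(round(H, 3))  # 0.722

igs = {}
for idx, name in enumerate(["Employment", "Credit", "Income"]):
    ig = H
    for value in set(r[idx] for r in rows):
        subset = [r[3] for r in rows if r[idx] == value]
        ig -= len(subset) / len(rows) * entropy(subset)
    igs[name] = ig
    print(name, round(ig, 3))  # each attribute → 0.171
```

All three attributes give the same information gain because each splits the data into the same 3/2 pattern (one No in the larger group, pure Yes in the smaller), so the tie at the root is genuine.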
12. Write in detail about evaluation metrics for regression models.
https://www.geeksforgeeks.org/machine-learning/regression-metrics/
13. Discuss linear regression with equations and assumptions.
14. What is logistic regression? How is it different from linear regression?
15. The following table shows the number of hours studied (X) and the corresponding
marks obtained (Y).
Find the linear relationship between hours studied and marks obtained.
Predict the marks if a student studies for 6 hours.
16. Define the sigmoid function. Why is it used in logistic regression?
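The sigmoid σ(z) = 1 / (1 + e^(−z)) squashes any real number into (0, 1), which is why logistic regression uses it to turn a linear score into a probability. A minimal definition in code:

```python
from math import exp

def sigmoid(z):
    """Map any real number into (0, 1) so the output reads as a probability."""
    return 1 / (1 + exp(-z))

print(sigmoid(0))                                  # 0.5 — the decision boundary
print(round(sigmoid(4), 3), round(sigmoid(-4), 3)) # 0.982 0.018 — symmetric about 0.5
```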
17. List applications of logistic regression.
18. A machine learning model was evaluated on the following dataset of 10 instances
a. Construct the confusion matrix for this dataset.
b. Calculate the following performance metrics for the model:
i. Accuracy, ii. Precision, iii. Recall, iv. F1-Score
c. Interpret the results: what do these values indicate about the model's performance?
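Since the 10-instance table is not reproduced here, the sketch below uses hypothetical labels purely to show how the matrix and metrics are computed:

```python
# Hypothetical true and predicted labels for 10 instances (illustrative only;
# substitute the values from the question's table).
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]

# Confusion-matrix cells.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

accuracy  = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")  # TP=4 FP=1 FN=1 TN=4
print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f1, 3))
```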
19. A logistic regression model gives:
Find the probability of Y=1 when X=3.
Solution: Substitute X = 3 into the given formula.
20. Explain training, validation, and test sets with a neat diagram.
21. Write the steps involved in the machine learning process.
https://www.geeksforgeeks.org/machine-learning/general-steps-to-follow-in-a-machine-
learning-problem/
22. Compare classification and regression tasks.
23. What is entropy in decision trees?
https://www.geeksforgeeks.org/data-science/how-to-calculate-entropy-in-decision-tree/
24. Differentiate between ID3 and CART.
25. Define Gini Index with formula.
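The Gini index of a node is Gini = 1 − Σ pᵢ², where pᵢ is the proportion of class i in the node. A small sketch:

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1 - sum((labels.count(v) / total) ** 2 for v in set(labels))

print(round(gini(["Yes"] * 4 + ["No"]), 2))  # 1 - (0.8**2 + 0.2**2) = 0.32
print(round(gini(["Yes", "No"]), 2))         # maximum for two classes: 0.5
print(round(gini(["Yes"] * 5), 2))           # pure node: 0.0
```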
26. Explain pruning in decision trees. Construct a decision tree using ID3 for the
following dataset (Outlook vs Play Tennis).
https://www.geeksforgeeks.org/machine-learning/pruning-decision-trees/
27. Explain CART algorithm in detail with an example.
CART – Classification and Regression Trees
CART (Classification And Regression Tree) in Machine Learning - GeeksforGeeks
28. Describe KNN algorithm with advantages and disadvantages.
K-Nearest Neighbor(KNN) Algorithm - GeeksforGeeks
29. Discuss lazy learning vs eager learning.
https://www.geeksforgeeks.org/data-science/what-is-the-difference-between-lazy-and-
eager-learning/
30. Given a dataset of 10 students with attributes {Study Hours, Result}, construct a
decision tree using Gini index.
31. A KNN classifier uses k=3. Given training data points (2, A), (4, A), (6, B), (8, B),
predict the class of test point (5)
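A direct computation for question 31. Note that points 2 and 8 tie at distance 3 from the query, so the third neighbour depends on the tie-break; a stable sort keeps (2, A), giving the majority class A:

```python
# k = 3, training points (2, A), (4, A), (6, B), (8, B), query x = 5.
train = [(2, "A"), (4, "A"), (6, "B"), (8, "B")]
x, k = 5, 3

# Sort by absolute distance; Python's sort is stable, so the tie between
# points 2 and 8 (both at distance 3) resolves to (2, "A").
neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
votes = [label for _, label in neighbours]
prediction = max(set(votes), key=votes.count)
print(neighbours, "->", prediction)  # majority of {A, B, A} -> A
```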
32. List assumptions of the Naive Bayes classifier. Give two applications of Naive Bayes.
https://www.geeksforgeeks.org/machine-learning/naive-bayes-classifiers/
33. Compare Naive Bayes and KNN classifiers.
34. A dataset has two classes Spam and Not Spam.
P(Spam)=0.3, P(Not Spam)=0.7
P(word="Free"|Spam)=0.6, P(word="Free"|Not Spam)=0.1
If the word “Free” occurs in a mail, classify it using Naive Bayes.
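A worked check for question 34 using Bayes' rule: the spam score 0.3 × 0.6 = 0.18 beats the not-spam score 0.7 × 0.1 = 0.07, so the mail is classified as Spam (posterior 0.18 / 0.25 = 0.72):

```python
# Question 34: classify a mail containing the word "Free".
p_spam, p_ham = 0.3, 0.7
p_free_given_spam, p_free_given_ham = 0.6, 0.1

# Unnormalised posteriors (Bayes' rule numerators).
score_spam = p_spam * p_free_given_spam  # 0.18
score_ham  = p_ham * p_free_given_ham    # 0.07

posterior_spam = score_spam / (score_spam + score_ham)
print(round(posterior_spam, 2))                          # 0.72
print("Spam" if score_spam > score_ham else "Not Spam")  # Spam
```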