Machine Learning – Assignment Questions
(Compiled)
Unit 1 – Machine Learning Basics
1. Define Machine Learning. Explain various application areas of Machine Learning.
2. Discuss the key terminology used in ML models. Explain the two major phases of building an ML
model.
3. Elaborate on the steps used to develop an ML model.
4. Given the definitions of AI, ML, and Data Science, analyze how they interrelate and overlap.
Provide a Venn diagram and real-life examples.
5. Create a comparison table showing the key differences between ML, AI, and Data Science with
an example.
6. With a suitable diagram, explain the concept of learning in computer systems.
7. Differentiate between Feature Selection and Feature Extraction with suitable examples.
8. State the importance of feature engineering in the performance of ML models.
9. State the importance of feature scaling. Demonstrate standardization using StandardScaler and
min-max normalization using MinMaxScaler, with an example of each.
10. Compare and contrast Supervised vs. Unsupervised Learning approaches.
11. Elaborate on a semi-supervised learning scenario in which only a portion of the labels is available.
12. Explain the terms agent, environment, actions, and rewards in Reinforcement Learning. Explain the
process of learning based on feedback.
13. Describe a real-world scenario where Reinforcement Learning is applied, and explain how
feedback improves performance.
14. Given a dataset of customer transactions, identify the learning type (Supervised, Unsupervised,
Semi-supervised, RL) for each task: predicting future purchases, grouping similar users, and learning
an optimal pricing strategy from feedback.
15. Discuss techniques for handling missing data: forward fill, backward fill, interpolation.
16. Describe Python functions for detecting & managing missing data, with an example.
17. Provide sample code to encode categorical variables using Label Encoding and One-Hot
Encoding.
18. Write a Python script to preprocess a dataset: fill missing Age values with the median, encode
Gender with Label Encoding, and encode Purchased with One-Hot Encoding.
19. Write and explain the train-test split code used when building classifier models.
20. Analyze the effect of a linear vs. non-linear decision surface on classification problems.
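Worked sketch for Question 9 (scaling). The snippet below reproduces, in plain Python, the arithmetic that scikit-learn's StandardScaler (z-score with the population standard deviation) and MinMaxScaler (default range 0 to 1) apply to a single feature. The sample values and the function names standardize and min_max_scale are illustrative assumptions, not part of the assignment data.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Z-score scaling: (x - mean) / population standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)  # population std (ddof=0), matching StandardScaler
    # Assumes the values are not all equal; otherwise sigma would be 0.
    return [(x - mu) / sigma for x in values]


def min_max_scale(values):
    """Min-max scaling to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    # Assumes hi > lo; a constant feature would divide by zero.
    return [(x - lo) / (hi - lo) for x in values]


# Hypothetical feature values, chosen only for illustration.
ages = [10, 20, 30, 40, 50]

z_scores = standardize(ages)   # centred on 0 with unit variance
scaled = min_max_scale(ages)   # squeezed into the range 0..1

print("standardized:", [round(z, 4) for z in z_scores])
print("min-max:     ", scaled)
```

Standardization keeps the shape of the distribution but removes its units, while min-max scaling pins the smallest value to 0 and the largest to 1, so a single extreme value compresses all the others.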
Unit 2 – Regression & Evaluation
1. Differentiate between dependent and independent variables in simple and multiple regression.
2. Explain what residuals represent in regression. Describe the role of cost function in linear
regression.
3. Differentiate between simple regression and multiple regression.
4. Explain why polynomial regression is used for non-linear datasets.
5. Analyze the impact of outliers on linear regression models.
6. Interpret the slope and intercept in a linear regression model.
7. Explain the relationship between gradient descent and minimizing error.
8. Compare Ridge Regression vs. Lasso Regression in terms of feature selection & overfitting.
9. Describe a confusion matrix and interpret its terms.
10. Explain precision & recall in classification with examples.
11. Compute precision & recall for given confusion matrix values.
12. Compute F1 Score given Precision=0.8, Recall=0.6.
13. Describe the importance of ROC-AUC in classification evaluation.
14. Draw a ROC curve given TPR & FPR values.
15. State the significance of R² score in regression.
16. Compute MAE, MSE, RMSE for given actual vs. predicted values (different datasets provided).
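Worked sketch for Questions 11, 12, and 16 (evaluation metrics). The snippet below applies the standard formulas for precision, recall, F1, MAE, MSE, and RMSE. Only Precision=0.8 and Recall=0.6 come from Question 12; the confusion-matrix counts, the actual and predicted values, and the helper names are made-up assumptions standing in for the datasets the assignment provides.

```python
from math import sqrt


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)


def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def regression_errors(actual, predicted):
    """Return (MAE, MSE, RMSE) for paired actual and predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return mae, mse, sqrt(mse)


# Question 12: the values stated in the question.
f1 = f1_score(0.8, 0.6)

# Question 11: hypothetical confusion-matrix counts.
precision, recall = precision_recall(tp=40, fp=10, fn=20)

# Question 16: hypothetical actual vs. predicted values.
mae, mse, rmse = regression_errors([3, 5, 7, 9], [2, 5, 8, 11])

print(f"F1 = {f1:.4f}")
print(f"precision = {precision:.4f}, recall = {recall:.4f}")
print(f"MAE = {mae:.4f}, MSE = {mse:.4f}, RMSE = {rmse:.4f}")
```

Because F1 is a harmonic mean, it sits closer to the lower of the two inputs: with Precision=0.8 and Recall=0.6 it comes out near 0.686 rather than the arithmetic mean of 0.7.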
Unit 3 – Classification & Distance Metrics
1. What does the parameter k signify in k-NN? Explain the effects of small vs. large k.
2. k-NN is called a lazy learner algorithm. Justify.
3. Apply k-NN (k=3) to classify a query point given a dataset of 6 points.
4. Explain the concept of bias and variance in ML with examples.
5. Differentiate between instance-based and model-based learning.
6. Define Euclidean distance and write its general formula.
7. Define Manhattan distance and mention where it is useful.
8. Compute Euclidean & Manhattan distance between given pairs of points.
9. Compare Euclidean vs. Manhattan distance.
10. Differentiate between hard margin SVM vs. soft margin SVM.
11. What are support vectors in SVM? Explain the concept of maximum margin hyperplane.
12. Draw the separating hyperplane for linearly separable points in SVM.
13. What is the kernel trick in SVM? Differentiate between linear & non-linear SVM.
14. Differentiate between linear, polynomial, and RBF kernels in SVM.
15. Compare performance of SVM with linear, polynomial, RBF kernels.
16. Justify why SVM is preferred over k-NN in high-dimensional datasets.
17. Analyze how outliers affect k-NN classification.
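Worked sketch for Questions 3 and 8 (k-NN and distance metrics). The snippet below computes Euclidean and Manhattan distance for one pair of points and then classifies a query point with k=3 by majority vote. The six labelled points, the query point, and the helper names manhattan and knn_classify are hypothetical placeholders for the data given in the assignment; Euclidean distance is assumed as the k-NN metric.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+


def manhattan(p, q):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def knn_classify(train, query, k=3):
    """Label the query by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Question 8: distances between one hypothetical pair of points.
euclid = dist((1, 2), (4, 6))      # sqrt(3^2 + 4^2)
manh = manhattan((1, 2), (4, 6))   # |3| + |4|

# Question 3: hypothetical dataset of 6 labelled points.
points = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B"),
]
query = (2, 2)
predicted_label = knn_classify(points, query, k=3)

print("Euclidean:", euclid, "Manhattan:", manh)
print("Predicted class for", query, "is", predicted_label)
```

Nothing is fitted before the query arrives: all the work happens inside knn_classify at prediction time, which is why k-NN is described as a lazy learner.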