Assignment 1
R1UC525B – Machine Learning
1. Define Machine Learning and list two major goals of ML.
2. Differentiate between Supervised and Unsupervised Learning with one example each.
3. What is the role of training and test datasets in supervised learning?
4. List any two real-world applications of Machine Learning in healthcare.
5. Explain why data preprocessing is important before model training.
6. Define “feature” and “label” with an example from a student exam dataset.
7. Differentiate between Regression and Classification.
8. Write any two performance metrics for classification tasks.
9. What is Cross-Validation and why is it used?
10. List two common issues in Machine Learning models and how they can be addressed.
11. Explain the steps in the Machine Learning Pipeline with a neat diagram.
12. Describe how Maximum Likelihood Estimation (MLE) is used to estimate model
parameters in ML.
13. Illustrate the Bias–Variance Trade-off and its impact on model generalization.
14. A nutritionist collected data for three individuals to study the relationship between
calorie intake and weight gain. The data, given as (calorie intake, weight gain) pairs, are:
(2000, 60), (2500, 65), (3000, 70). Compute the Pearson correlation coefficient and
interpret it.
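Hint for Question 14: a hand calculation of the Pearson coefficient can be cross-checked with a few lines of Python. This is a minimal sketch, not a model answer; the function name `pearson_r` is illustrative and not part of the assignment.

```python
import math

# Data from Question 14, as (calorie intake, weight gain) pairs.
data = [(2000, 60), (2500, 65), (3000, 70)]

def pearson_r(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sum of cross-deviations and sums of squared deviations.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    s_xx = sum((x - mean_x) ** 2 for x, _ in pairs)
    s_yy = sum((y - mean_y) ** 2 for _, y in pairs)
    return s_xy / math.sqrt(s_xx * s_yy)

r = pearson_r(data)
print(f"r = {r:.4f}")
```

The interpretation (direction and strength of the linear relationship) should still be written out in words.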
15. Explain the difference between Simple and Multiple Linear Regression with examples.
16. A logistic regression model predicts whether a patient has diabetes. Given z = −4 +
0.06×Age + 0.04×Glucose. Compute the probability for Age = 40, Glucose = 150. Interpret
the result.
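Hint for Question 16: the probability is obtained by passing z through the sigmoid (logistic) function. The sketch below is one way to verify the arithmetic, assuming the coefficients exactly as stated in the question; the variable names are illustrative.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Inputs from Question 16.
age = 40
glucose = 150

# Linear score z = -4 + 0.06*Age + 0.04*Glucose, as given in the question.
z = -4 + 0.06 * age + 0.04 * glucose
p = sigmoid(z)

print(f"z = {z:.2f}, P(diabetes) = {p:.4f}")
```

A probability above the usual 0.5 threshold would be read as a positive prediction; the choice of threshold is itself part of the interpretation.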
17. Explain how the Naïve Bayes classifier works and state its key assumption.
18. You trained two models for predicting house prices. Model A has high training and test
error; Model B has low training error but high test error. Identify which model is
underfitting and which is overfitting. Suggest two remedies for each case.
19. Explain how the Naïve Bayes classifier computes posterior probabilities and makes the
final classification. Mention one limitation of the independence assumption.
20. Describe how an SVM finds the optimal hyperplane. If data is non-linear, explain how
the kernel trick helps.
21. You built a medical diagnosis model with Precision = 0.7 and Recall = 0.9. Interpret
these values and discuss when you would prefer Precision over Recall.
22. A startup is developing a system to predict crop yield using soil moisture, rainfall, and
temperature. Explain how you would prepare data, select a suitable model, and evaluate its
performance.
23. A marketing analyst collected data on advertising expenditure (in ₹ thousand) and the
corresponding sales revenue (in ₹ lakh) for five months as shown below:
Advertising (₹ thousand)    Sales (₹ lakh)
10                          25
15                          35
20                          45
25                          52
30                          60
Using this data:
a) Fit a simple linear regression model by calculating the slope and intercept using the
standard formulas.
b) Predict the sales revenue when the company spends ₹22,000 on advertising.
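Hint for Question 23: the standard least-squares formulas are slope b = Sxy / Sxx and intercept a = mean(Y) − b · mean(X). The sketch below applies them to the table so that a hand calculation can be checked. It assumes advertising is measured in ₹ thousand, so ₹22,000 corresponds to X = 22; the helper name `fit_line` is illustrative.

```python
# Data from Question 23: advertising in ₹ thousand, sales in ₹ lakh.
advertising = [10, 15, 20, 25, 30]
sales = [25, 35, 45, 52, 60]

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

a, b = fit_line(advertising, sales)

# ₹22,000 of advertising is X = 22 in the table's units (₹ thousand).
predicted_sales = a + b * 22

print(f"Sales = {a:.2f} + {b:.2f} * Advertising")
print(f"Predicted sales at X = 22: {predicted_sales:.2f} (₹ lakh)")
```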
24. An email service provider tested its spam-filtering algorithm on 12,000 emails. Out of
these, 1,000 emails were actually spam. The system identified 900 emails as spam, but only
700 of them were truly spam.
Using this data:
a) Calculate the Precision and Recall of the spam detection system.
b) Interpret the results: what do these values indicate about the system’s ability to
correctly detect spam emails and avoid false alarms?
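Hint for Question 24: Precision = TP / (TP + FP) and Recall = TP / (TP + FN). The sketch below derives the confusion-matrix counts from the numbers in the question and can be used to check a hand calculation; the variable names are illustrative.

```python
# Counts stated in Question 24.
total_emails = 12_000
actual_spam = 1_000      # emails that really are spam
flagged_as_spam = 900    # emails the filter labelled as spam
true_positives = 700     # flagged emails that really are spam

# Derived counts.
false_positives = flagged_as_spam - true_positives   # legitimate mail flagged as spam
false_negatives = actual_spam - true_positives       # spam that slipped through
true_negatives = total_emails - actual_spam - false_positives

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)

print(f"TP={true_positives}, FP={false_positives}, "
      f"FN={false_negatives}, TN={true_negatives}")
print(f"Precision = {precision:.3f}, Recall = {recall:.3f}")
```

Precision speaks to false alarms (legitimate mail flagged as spam), while recall speaks to missed spam; the interpretation in part two of the question should address both.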
25. An agricultural scientist wants to understand the effect of fertilizer quantity on plant
height. The following data was collected from an experiment:
Fertilizer (kg)    Plant Height (cm)
2                  25
3                  35
4                  42
5                  48
6                  55
8                  62
Tasks:
a) Express the relationship using the linear regression model Y = a + bX and explain what
each variable represents.
b) Plot fertilizer quantity vs plant height and indicate the best-fit line.
c) Estimate the expected height of a plant when 7 kg of fertilizer is applied.
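Hint for Question 25: the slope can also be computed from raw sums, b = (nΣXY − ΣXΣY) / (nΣX² − (ΣX)²), with a = (ΣY − bΣX) / n. The sketch below uses this form to check tasks (a) and (c); it is not a model answer, the variable names are illustrative, and the plot for task (b) is left to be drawn by hand or with a plotting tool of your choice.

```python
# Data from Question 25.
fertilizer_kg = [2, 3, 4, 5, 6, 8]
height_cm = [25, 35, 42, 48, 55, 62]

n = len(fertilizer_kg)
sum_x = sum(fertilizer_kg)
sum_y = sum(height_cm)
sum_xy = sum(x * y for x, y in zip(fertilizer_kg, height_cm))
sum_x2 = sum(x * x for x in fertilizer_kg)

# Least-squares slope and intercept from the raw-sum formulas.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

# Task (c): expected height when 7 kg of fertilizer is applied.
height_at_7kg = a + b * 7

print(f"Y = {a:.2f} + {b:.4f} X")
print(f"Estimated height at 7 kg: {height_at_7kg:.1f} cm")
```

Note that 7 kg lies between the observed values 6 kg and 8 kg, so the estimate is an interpolation within the range of the data.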