Unit 3
Scenario-Based Questions
1. Classification
1. A hospital wants to predict whether a patient has diabetes (Yes/No) based on their
glucose level, age, and BMI.
👉 Identify which machine learning approach should be used and explain how
training and test data would be prepared.
2. An email service provider wants to automatically categorize emails as “Spam” or
“Not Spam.”
👉 Explain how classification and evaluation metrics (accuracy, precision, recall)
play a role here.
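As a starting point for question 2, the three metrics can be computed from confusion counts as sketched below. This is a minimal illustration, not a prescribed answer: the function names are hypothetical, the labels and predictions are invented toy data, and "Spam" is assumed to be the positive class.

```python
def confusion_counts(y_true, y_pred, positive="Spam"):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1  # flagged as spam, but it was a legitimate email
        elif truth == positive:
            fn += 1  # spam that slipped through
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive="Spam"):
    """Return accuracy, precision and recall as a dictionary."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Invented toy data, for illustration only.
y_true = ["Spam", "Spam", "Not Spam", "Not Spam", "Spam", "Not Spam"]
y_pred = ["Spam", "Not Spam", "Not Spam", "Spam", "Spam", "Not Spam"]
print(classification_metrics(y_true, y_pred))
```

Precision answers "of the emails flagged as spam, how many really were spam?", while recall answers "of the real spam, how much was caught?". Accuracy alone can mislead when one class is much rarer than the other.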
2. Regression
3. A real estate company wants to predict house prices based on area, location, and
number of rooms.
👉 Suggest whether to use Linear Regression or Logistic Regression and justify your
choice.
4. A university wants to predict a student’s exam score based on study hours and
attendance percentage.
👉 Explain how you would train and test a regression model for this problem.
3. Types of Learning
5. A company has thousands of labeled customer feedback entries as “positive,”
“neutral,” or “negative.”
👉 Identify the type of learning and explain why supervised learning is suitable here.
6. A bank wants to detect fraudulent transactions from millions of records without
prior labels.
👉 Which type of learning (supervised/unsupervised/reinforcement) fits best? Explain
the reasoning.
4. Overfitting and Underfitting
7. During model training, the model performs very well on training data but poorly on
unseen test data.
👉 Describe whether this is overfitting or underfitting and suggest two methods to fix
it.
8. A linear regression model shows high prediction error on both training and test data.
👉 Identify the issue (underfitting) and explain what changes can improve
performance.
5. Bias and Variance
9. A student trains multiple models for predicting student grades. One model always
predicts the same value for all students, while another model gives highly varying
predictions.
👉 Relate both situations to bias and variance concepts and suggest a balanced
approach.
6. Cross Validation
10. You have a small dataset of only 200 samples for a classification problem.
👉 Explain how k-fold cross validation can help in ensuring reliable model
evaluation.
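For question 10, the splitting step of k-fold cross validation can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `k_fold_indices` is hypothetical, k = 5 is an arbitrary choice, and the split is a plain shuffle (not stratified by class).

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly

    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size

    # Each fold serves as the test set exactly once.
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


for fold_number, (train_idx, test_idx) in enumerate(k_fold_indices(200, k=5), start=1):
    print(fold_number, len(train_idx), len(test_idx))
```

With 200 samples and k = 5, every sample is used for testing once and for training four times, and the k scores are averaged to give a steadier estimate than a single train/test split.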
7. Probability & Hypothesis Space
11. A weather prediction model estimates a 70% chance of rain tomorrow.
👉 Explain how probability supports the decision-making process in machine
learning.
12. When training a model to recognize cats in images, multiple possible algorithms
(decision trees, SVM, neural networks) are available.
👉 Discuss the concept of hypothesis space and inductive bias in selecting the most
suitable model.
8. Linear Algebra in ML
13. You are working on an image recognition system that converts each image into a
matrix of pixel values.
👉 Explain how linear algebra (matrices and vectors) is used in this context.
Unit 4
1. Neural Networks – Perceptron, Adaline, Backpropagation
1. A company wants to build a model that predicts whether a customer will buy a
product (Yes/No) based on past purchase patterns, age, and income.
👉 Explain how a Perceptron Network could be used for this classification problem.
2. A student trains a simple Adaline model for predicting exam pass/fail results but
notices that the output values are not perfectly binary (not exactly 0 or 1).
👉 Explain how Adaline differs from the Perceptron and why the output behaves this
way.
3. A handwriting recognition system uses a Backpropagation Neural Network to
identify digits from images.
👉 Describe how backpropagation helps improve the model’s accuracy during
training.
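The perceptron learning rule behind questions 1 and 2 can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, and the four training rows are an invented, linearly separable toy pattern (label 1 only when both binary features are 1), not real customer data.

```python
def predict(weights, bias, x):
    """Step activation: output 1 if the weighted sum is positive, else 0."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


def train_perceptron(samples, labels, learning_rate=1.0, epochs=20):
    """Perceptron rule: adjust weights only when a prediction is wrong."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)  # -1, 0 or +1
            if error != 0:
                weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
                bias += learning_rate * error
                errors += 1
        if errors == 0:  # every training sample is classified correctly
            break
    return weights, bias


# Invented toy pattern: label is 1 only when both features are 1.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
weights, bias = train_perceptron(samples, labels)
print([predict(weights, bias, x) for x in samples])
```

The contrast with Adaline lies in the error term: the perceptron compares the target with the thresholded output, whereas Adaline computes the error from the raw weighted sum before any threshold, which is why its outputs are continuous rather than exactly 0 or 1.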
2. Decision Tree – Entropy, Information Gain, Gini Impurity
4. A bank wants to classify loan applicants as “Approved” or “Rejected” based on
attributes such as income, credit score, and debt.
👉 Explain how a Decision Tree model uses Entropy and Information Gain to select
the best attribute for the root node.
5. Two decision trees are built using different criteria — one with Information Gain
and another with Gini Impurity.
👉 Discuss how the choice of these criteria may affect the structure of the tree and the
final accuracy.
6. An e-commerce platform uses a Decision Tree to recommend whether to offer a
discount. After testing, the tree performs poorly on new customers.
👉 Identify the likely problem (overfitting) and suggest a method (like pruning) to fix
it.
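The quantities named in questions 4 and 5 can be computed as sketched below. This is a minimal sketch: the function names are hypothetical, and the four loan applicants are invented toy data chosen so that one attribute separates the classes perfectly and the other not at all.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy of the parent minus the weighted entropy of the child groups."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    n = len(labels)
    weighted_child_entropy = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - weighted_child_entropy


# Invented toy data, for illustration only.
rows = [
    {"credit": "good", "income": "high"},
    {"credit": "good", "income": "low"},
    {"credit": "bad", "income": "high"},
    {"credit": "bad", "income": "low"},
]
labels = ["Approved", "Approved", "Rejected", "Rejected"]

for attribute in ("credit", "income"):
    print(attribute, information_gain(rows, labels, attribute))
print("gini of parent:", gini(labels))
```

The attribute with the highest information gain is the one chosen for the root node; a Gini-based tree would instead pick the split with the lowest weighted Gini impurity, which often (but not always) selects the same attribute.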
3. Rule-Based Classification
7. A retail store has rules like “IF age < 25 AND purchase history = low THEN discount
= 10%.”
👉 Explain how a Rule-Based Classification System can use such rules to classify
customers automatically.
8. In a medical diagnosis system, experts define a set of IF-THEN rules for predicting
diseases based on symptoms.
👉 Describe how these rules can be converted into a Rule-Based Classifier and what
its limitations might be.
4. Naïve Bayes Classification
9. A spam filter needs to classify incoming emails as “Spam” or “Not Spam” based on
words appearing in the message (e.g., “win,” “offer,” “free”).
👉 Explain how Naïve Bayes Classification uses probabilities to make this decision.
10. A weather forecasting system uses past data to predict whether it will rain tomorrow
given humidity and temperature levels.
👉 Illustrate how conditional probabilities are applied in the Naïve Bayes approach
to calculate the most likely outcome.
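For question 9, the probability calculation can be sketched with a small word-count Naïve Bayes classifier. This is a sketch under stated assumptions: the function names are hypothetical, the four training emails are invented toy data, add-one (Laplace) smoothing is assumed, and words never seen in training are simply skipped.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Collect class counts, per-class word counts and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in zip(docs, labels):
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def classify(words, model):
    """Return the most probable class and the log score of every class."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        log_prob = math.log(count / total_docs)  # log prior P(class)
        total_words = sum(word_counts[label].values())
        for word in words:
            if word not in vocab:
                continue  # assumption: ignore words unseen in training
            # Laplace-smoothed likelihood P(word | class)
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            log_prob += math.log(likelihood)
        scores[label] = log_prob
    return max(scores, key=scores.get), scores


# Invented toy data, for illustration only.
docs = [
    ["win", "free", "offer"],
    ["free", "offer", "now"],
    ["meeting", "tomorrow"],
    ["project", "meeting", "notes"],
]
labels = ["Spam", "Spam", "Not Spam", "Not Spam"]
model = train_naive_bayes(docs, labels)
print(classify(["free", "offer"], model)[0])
```

The "naïve" part is the assumption that words are conditionally independent given the class, which is what allows the per-word likelihoods to be multiplied (here, added in log space to avoid numerical underflow).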
5. Support Vector Machines (SVM)
11. A manufacturing company wants to separate defective and non-defective products
based on sensor readings.
👉 Explain how SVM determines the optimal hyperplane to distinguish between the
two classes.
12. In a dataset where two classes are not linearly separable, an analyst applies a kernel
function in SVM.
👉 Describe how the kernel trick helps handle this kind of non-linear data.
Unit 5
1. Unsupervised Learning
1. A retail company has thousands of customer purchase records but no labels about
their buying habits. The company wants to group customers with similar purchasing
behavior.
👉 Explain how unsupervised learning can be applied to identify different customer
segments.
2. A streaming service wants to recommend similar movies to users without knowing
their ratings or feedback.
👉 Describe how unsupervised learning helps in building such a recommendation
system.
2. Principal Component Analysis (PCA)
3. A data analyst is working with a dataset having 50 features related to financial
performance. Processing is slow and visualization is difficult.
👉 Explain how PCA can be used to reduce dimensionality while retaining most of
the important information.
4. A healthcare researcher uses PCA to analyze patient data containing blood pressure,
cholesterol, sugar levels, and BMI.
👉 Describe how PCA helps identify which factors account for most of the variation in
the patient data.
3. Neural Network – Fixed Weight Competitive Nets
5. A company uses a Fixed Weight Competitive Network to classify customer
purchase patterns into clusters.
👉 Explain how competition among neurons helps the network discover natural
groupings in the data.
6. A researcher trains a competitive network with five neurons to identify groups of
handwritten digits.
👉 Discuss how the winning neuron is determined and how this contributes to the
clustering process.
4. Kohonen Self-Organizing Feature Maps (SOM)
7. A weather department uses a Kohonen Self-Organizing Map to cluster climate
patterns from multiple locations (temperature, humidity, wind).
👉 Explain how the SOM visually represents similar weather patterns on a 2D grid.
8. In an industrial automation system, a SOM is used to detect unusual patterns in
machine vibration signals.
👉 Describe how SOM identifies anomalies or outliers in the dataset.
5. Clustering – Definition and Types
9. An e-commerce company wants to group its products based on user browsing
behavior.
👉 Define clustering and explain which type of clustering (partitional or hierarchical)
is suitable for this case and why.
10. A marketing team wants to group customers into “high spenders,” “moderate
spenders,” and “low spenders” without prior labels.
👉 Discuss how clustering can be used and what features might be included in the
dataset.
6. Hierarchical Clustering
11. A social media company wants to analyze user connections and group similar users
together.
👉 Explain how hierarchical clustering can create a dendrogram to represent user
similarity levels.
12. A data scientist finds that merging small clusters gives a more interpretable structure
of customer data.
👉 Describe how agglomerative hierarchical clustering works in this scenario.
7. K-Means Algorithm
13. A bank wants to divide its customers into groups based on their credit score and
income.
👉 Explain how the K-means algorithm will form these clusters and how the value of
k can be chosen.
14. An image compression system reduces the number of colors in an image using
clustering.
👉 Describe how K-means clustering groups similar color pixels to achieve
compression.
15. During K-means clustering, a student observes that results vary each time the
algorithm is run.
👉 Identify the reason for this variation and suggest how to improve cluster stability.
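The assign-and-update loop behind questions 13 to 15 can be sketched as follows. This is a minimal sketch, not a reference implementation: the function names are hypothetical, the six points are invented toy data, and the initial centroids are drawn at random from the data, which is the source of the run-to-run variation mentioned in question 15.

```python
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Basic K-means: random initial centroids, then assign/update until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random start: results depend on the seed
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


def inertia(centroids, clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, c))
        for c, cluster in zip(centroids, clusters)
        for p in cluster
    )


# Invented toy data: two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]

# Run several seeds and keep the run with the lowest inertia.
best = min((kmeans(points, 2, seed=s) for s in range(5)), key=lambda r: inertia(*r))
print(best[0])
```

Repeating the algorithm with several random starts and keeping the lowest-inertia result is one common way to stabilise the clusters; plotting inertia against k and looking for the "elbow" is one common heuristic for choosing k.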