Examples

Examples for Machine Learning

Scenario-Based Questions

Scenario 1: Real Estate Market Analysis

Scenario: A real estate analyst is tasked with predicting house prices in a metropolitan
area based on various features such as square footage, number of bedrooms, age of the
house, and proximity to schools. The analyst collects a dataset containing these features
along with the corresponding sale prices of homes.

Questions:

1. Model Development: Describe the steps you would take to develop a multiple
linear regression model for predicting house prices. What preprocessing steps
would be necessary before fitting the model?
2. Interpretation and Impact: After fitting the model, you find that the coefficient for
square footage is $200. What does this mean in terms of pricing, and how would
you communicate this finding to potential home buyers and sellers? What other
factors could influence the accuracy of your model, and how might you address
them?

Scenario 2: Customer Satisfaction and Retention

Scenario: A telecommunications company is investigating the factors that influence
customer satisfaction and retention. They collect data on customer demographics, service
usage, customer service interactions, and satisfaction ratings on a scale from 1 to 10. The
company wants to use logistic regression to predict whether a customer will remain with
the service provider or switch to a competitor.

Questions:

3. Model Selection: Explain why logistic regression is a suitable choice for this
scenario. What are the dependent and independent variables in your model, and
how would you handle categorical variables in your dataset?
4. Evaluation and Strategy: After building the model, you find that the model predicts
a 70% probability of retention for customers with a high satisfaction rating. How
would you assess the model’s performance? What strategies could the company
implement to improve customer satisfaction based on the findings from your
model?

Scenario 1: Email Spam Detection

Scenario: A tech company is developing a spam detection system for its email service.
They have a dataset containing features extracted from emails, such as the presence of
certain keywords, the length of the email, and the sender's reputation. The goal is to
classify emails as either "spam" or "not spam."

Questions:

5. Model Development: Describe how you would approach building a classification
model for this spam detection system. Which algorithm would you choose (e.g.,
Logistic Regression, Decision Trees, SVM) and why? What steps would you take for
data preprocessing, including feature selection or engineering?
6. Model Evaluation: After training the model, you achieve an accuracy of 85% on the
test set. However, upon further inspection, you notice a high false positive rate.
How would you evaluate the model’s performance more thoroughly? What metrics
(e.g., precision, recall, F1 score) would you consider, and how would you address
the issue of false positives in your system?

Scenario 2: Disease Diagnosis

Scenario: A healthcare organization is developing a predictive model to diagnose a
particular disease based on patient symptoms and medical history. The dataset includes
features such as age, gender, specific symptoms, and test results. The target variable is
binary, indicating whether a patient has the disease (1) or not (0).

Questions:

7. Feature Importance: Which features do you think would be most important for the
model, and how would you determine their significance? What classification
algorithm would you use, and why?
8. Real-World Implications: After deploying the model, you find that the model's
predictions are leading to a high rate of false negatives, meaning some patients with
the disease are being incorrectly diagnosed as healthy. What steps would you take
to improve the model? How would you communicate the importance of accurate
diagnosis to healthcare providers and patients?
Scenario 1: Customer Segmentation

Scenario: A retail company wants to improve its marketing strategies by segmenting its
customer base. They have collected data on customer purchases, demographics, and
online behavior. The goal is to use clustering techniques to identify distinct customer
segments.

Questions:

Clustering Approach: Describe how you would approach the customer segmentation
problem. Which clustering algorithm would you choose (e.g., K-Means, Hierarchical
Clustering, DBSCAN) and why? What steps would you take for data preprocessing,
including feature selection and scaling?

Interpretation and Action: After applying the clustering algorithm, you identify four
distinct customer segments. How would you interpret the characteristics of each
segment? What marketing strategies would you recommend for each group to enhance
customer engagement and sales?

Scenario 2: Image Compression

Scenario: A tech company is developing an image compression algorithm to reduce the
file size of images while maintaining quality. They decide to use clustering techniques to
group similar pixel colors together in the images.

Questions:

9. Algorithm Selection: Which clustering algorithm would you recommend for this
image compression task, and why? How would you determine the optimal number
of clusters to use for effective compression?
10. Evaluation of Results: After implementing the algorithm, you notice that the
compressed images lose some quality. What metrics would you use to evaluate the
quality of the compressed images compared to the original? How would you adjust
your clustering approach to improve the balance between compression and image
quality?

Numerical-Based Questions

Linear Regression:

• Calculation of coefficients using the least squares method.
• Prediction of values based on a linear model.
• Residuals and their analysis.
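
As an illustration of the linear-regression calculations listed above, the following minimal sketch fits a one-feature least-squares line, makes a prediction, and computes residuals. The data points and the helper names fit_least_squares and predict are hypothetical. They were chosen only so the arithmetic is easy to check by hand.

```python
# Simple linear regression by ordinary least squares, in plain Python.
# The toy data below is made up for illustration only.

def fit_least_squares(xs, ys):
    """Return (b0, b1) for y = b0 + b1 * x, minimising the sum of squared residuals."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: b1 = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through (x_mean, y_mean)
    b0 = y_mean - b1 * x_mean
    return b0, b1

def predict(b0, b1, x):
    """Prediction from the fitted line."""
    return b0 + b1 * x

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_least_squares(xs, ys)

# Residual = observed - predicted; with an intercept, OLS residuals sum to zero
residuals = [y - predict(b0, b1, x) for x, y in zip(xs, ys)]
sse = sum(r ** 2 for r in residuals)

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")   # b0 = 2.20, b1 = 0.60
print("residuals:", [round(r, 2) for r in residuals])
print(f"SSE = {sse:.2f}")                # SSE = 2.40
```

You can check this by hand. Here x̄ = 3, ȳ = 4, Sxy = 6 and Sxx = 10. So b1 = 0.6 and b0 = 4 - 0.6 * 3 = 2.2. Plotting the residuals against x is a common next step for spotting non-linearity.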

Logistic Regression:

• Calculation of odds ratios and probabilities.
• Interpretation of coefficients.
• Confusion matrix metrics (accuracy, precision, recall, F1 score).
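
The sketch below is a short worked example of the probability and odds calculations above. It converts the linear predictor to a probability, then converts that probability to odds. It also shows that a one-unit increase in x multiplies the odds by exp(b1), which is how a logistic coefficient is usually interpreted. The coefficients b0 = -3 and b1 = 0.5 are hypothetical values chosen for illustration, not estimates from any dataset.

```python
import math

# Hypothetical fitted coefficients for log-odds = b0 + b1 * x (illustration values only)
b0, b1 = -3.0, 0.5

def predict_proba(x):
    """P(y = 1 | x): the sigmoid of the linear predictor."""
    z = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds p / (1 - p) corresponding to probability p."""
    return p / (1.0 - p)

p4 = predict_proba(4)   # z = -1.0
p5 = predict_proba(5)   # z = -0.5
odds_ratio = odds(p5) / odds(p4)

# Decision boundary: p = 0.5 exactly where the linear predictor z equals 0
boundary = -b0 / b1

print(f"P(y=1 | x=4) = {p4:.4f}")
print(f"P(y=1 | x=5) = {p5:.4f}")
print(f"odds ratio for +1 in x = {odds_ratio:.4f}; exp(b1) = {math.exp(b1):.4f}")
print(f"p = 0.5 at x = {boundary}")
```

Because log-odds are linear in x, the odds ratio for a one-unit step is the same at every x. A positive b1 therefore always means "higher x, higher odds of class 1".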

K-Nearest Neighbors (KNN):

• Distance calculations (Euclidean, Manhattan).
• Prediction based on majority voting.
• Effect of different values of k on classification results.
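
Here is a minimal KNN sketch covering the three points above: Euclidean and Manhattan distances, majority voting, and the effect of k. The toy training points, their labels and the helper names are hypothetical. The data is arranged so that changing k flips the prediction.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train, query, k, dist=euclidean):
    """Majority vote among the k training points nearest to `query`.

    `train` is a list of (point, label) pairs. A tied vote goes to the label met
    first among the sorted neighbours (Counter.most_common keeps insertion order).
    """
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two "A" points near the query, three "B" points further away
train = [((1, 1), "A"), ((2, 1), "A"), ((4, 4), "B"), ((5, 4), "B"), ((5, 5), "B")]
query = (2, 2)

for k in (1, 3, 5):
    print(k, knn_predict(train, query, k), knn_predict(train, query, k, dist=manhattan))
# k=1 and k=3 predict "A"; k=5 predicts "B" because all three "B" points now vote
```

A small k follows local structure closely. A large k smooths the decision and can be dominated by the larger class. Even values of k can produce ties, so an odd k is a common choice for two-class problems.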

Data Preprocessing:

• Imputation of missing values (mean, median, mode).
• Calculation of z-scores for outlier detection.
• One-hot encoding calculations.
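
The three preprocessing calculations above can be sketched with the Python standard library alone. All values below are made-up illustration data. The z-score cutoff of 2 is an assumption for this tiny sample; with only seven values, no point can reach the more common |z| > 3 rule.

```python
import statistics

# --- Imputation: replace missing values (None) with a summary of the observed ones ---
ages = [25, 30, None, 35, 40, None, 30]
observed = [a for a in ages if a is not None]
fill = {
    "mean": statistics.mean(observed),
    "median": statistics.median(observed),
    "mode": statistics.mode(observed),
}
mean_imputed = [a if a is not None else fill["mean"] for a in ages]

# --- z-scores: how many standard deviations each value lies from the mean ---
values = [10, 12, 11, 13, 12, 11, 50]
mu = statistics.mean(values)
sigma = statistics.pstdev(values)  # population standard deviation
z_scores = [(v - mu) / sigma for v in values]
# Assumed cutoff of 2 for this small sample; |z| > 3 is a common rule for larger samples
outliers = [v for v, z in zip(values, z_scores) if abs(z) > 2]

# --- One-hot encoding: one 0/1 column per category ---
colors = ["red", "green", "blue", "green"]
categories = sorted(set(colors))
one_hot = [[1 if c == cat else 0 for cat in categories] for c in colors]

print("fill values:", fill)
print("z-scores:", [round(z, 2) for z in z_scores])
print("outliers:", outliers)
print("columns:", categories, "rows:", one_hot)
```

Note that the outlier inflates both the mean and the standard deviation it is compared against. This is one reason median-based rules are sometimes preferred for outlier detection.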

Classification Metrics:

• Calculating confusion matrix elements.
• ROC curve and AUC calculations.
• Evaluation metrics like accuracy, precision, recall, F1 score.
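
The following sketch works through the classification-metric calculations above on hypothetical labels and model scores. It builds the confusion matrix at one threshold and derives accuracy, precision, recall and F1 from it. It then sweeps the threshold to trace an ROC curve, with the AUC computed by the trapezoidal rule. The function names are illustrative, not a library API.

```python
def confusion(y_true, scores, threshold):
    """Return (TP, FP, FN, TN) when scores >= threshold are predicted positive."""
    tp = fp = fn = tn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 (0.0 where a denominator is zero)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

def roc_points(y_true, scores):
    """(FPR, TPR) pairs, sweeping the threshold down through every distinct score."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp, fp, _, _ = confusion(y_true, scores, t)
        points.append((fp / negatives, tp / positives))
    return points

def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical ground truth and model scores
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.7, 0.1]

cm = confusion(y_true, scores, 0.5)
print("TP, FP, FN, TN =", cm)                              # (3, 1, 1, 3)
print("accuracy, precision, recall, F1 =", metrics(*cm))
print("AUC =", auc_trapezoid(roc_points(y_true, scores)))  # 0.9375
```

Lowering the threshold raises recall at the cost of more false positives, and raising it does the opposite. This is the trade-off behind the false-positive and false-negative questions in the scenarios.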

Bayesian Methods:

• Application of Bayes' theorem for probability updates.
• Prior and posterior probability calculations.
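
As a worked example of a Bayes' theorem update, this sketch treats a diagnostic test as evidence. The prevalence (1%), sensitivity (90%) and false-positive rate (5%) are assumed illustration values. The second update also assumes the two test results are conditionally independent given disease status.

```python
def posterior(prior, p_pos_given_d, p_pos_given_not_d):
    """P(D | +) by Bayes' theorem: P(+|D)P(D) / [P(+|D)P(D) + P(+|not D)P(not D)]."""
    evidence = p_pos_given_d * prior + p_pos_given_not_d * (1 - prior)
    return p_pos_given_d * prior / evidence

prior = 0.01           # assumed prevalence P(D)
sensitivity = 0.90     # assumed P(+ | D)
false_pos_rate = 0.05  # assumed P(+ | not D) = 1 - specificity

after_one = posterior(prior, sensitivity, false_pos_rate)
# The posterior after one test becomes the prior for the next
# (assumes the two results are conditionally independent given disease status)
after_two = posterior(after_one, sensitivity, false_pos_rate)

print(f"P(D | one positive test)  = {after_one:.4f}")   # 0.1538
print(f"P(D | two positive tests) = {after_two:.4f}")   # 0.7660
```

Even with a fairly accurate test, a single positive result leaves the posterior near 15%, because the low prior dominates.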

Clustering:

• K-means clustering calculations (centroid updates, inertia).
• Silhouette score calculations.
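
The clustering calculations above can be sketched as follows. The code runs K-means (Lloyd's algorithm: assign each point to its nearest centroid, then move each centroid to the mean of its points) from given starting centroids. It then computes the inertia and the mean silhouette score. The six points, the starting centroids and the helper names are hypothetical. The silhouette function assumes at least two clusters.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j])) for p in points]

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm from the given initial centroids; returns (centroids, labels)."""
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Centroid update: mean of assigned points; an empty cluster keeps its old centroid
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)) if members else old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)

def inertia(points, centroids, labels):
    """Within-cluster sum of squared distances to the assigned centroid."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

def silhouette(points, labels):
    """Mean silhouette s = (b - a) / max(a, b) over all points (Euclidean distance)."""
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # common convention for a single-point cluster
            continue
        a = sum(same) / len(same)  # mean distance within the point's own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other) / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, labels = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])
print("centroids:", centroids)
print("labels:", labels)
print("inertia:", inertia(points, centroids, labels))
print("silhouette:", silhouette(points, labels))
```

By hand, the centroids settle at (4/3, 4/3) and (25/3, 25/3) after one update, and the inertia is 8/3. The silhouette is close to 1 because the two groups are far apart relative to their spread.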

Polynomial Regression:

• Fitting polynomial models and calculating coefficients.
• Predictions using polynomial equations.
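
Polynomial regression is still linear least squares, using the columns 1, x, x², ... as features. The sketch below solves the normal equations (XᵀX)β = Xᵀy by Gaussian elimination and then predicts a new value. The data is generated without noise from an assumed quadratic, y = 1 + 2x + 3x², so the recovered coefficients can be checked. The helper names are hypothetical.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ..., c_degree] via the normal equations."""
    # Design matrix columns: 1, x, x^2, ..., x^degree
    X = [[x ** p for p in range(degree + 1)] for x in xs]
    m = degree + 1
    # Normal equations: (X^T X) beta = X^T y
    A = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    b = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution on the upper-triangular system
    beta = [0.0] * m
    for i in reversed(range(m)):
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, m))) / A[i][i]
    return beta

def predict_poly(beta, x):
    """Evaluate c0 + c1*x + c2*x^2 + ..."""
    return sum(c * x ** p for p, c in enumerate(beta))

xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]  # noise-free data from an assumed quadratic
beta = fit_polynomial(xs, ys, 2)

print("coefficients:", [round(c, 6) for c in beta])
print("prediction at x = 5:", round(predict_poly(beta, 5), 6))  # 86.0
```

The normal equations become ill-conditioned as the degree grows. Library routines such as numpy.polyfit solve the same least-squares problem with more numerically stable methods.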
