Data Preprocessing and Model Evaluation

The document discusses key concepts in data preprocessing and model evaluation, emphasizing the importance of metrics for both supervised and unsupervised learning. It outlines various evaluation metrics for classification (Accuracy, Precision, Recall, F1-Score) and regression models (MAE, MSE), as well as issues like overfitting and underfitting. Additionally, it highlights hyperparameter tuning and ensemble methods as strategies to enhance model performance.


Model Evaluation: A critical step to determine how well your model is performing. This
involves metrics like:
For Supervised Learning: Accuracy, Precision, Recall, F1-Score, Mean Squared Error
(MSE), etc.
For Unsupervised Learning: Metrics like Silhouette Score or Adjusted Rand Index.
Overfitting and Underfitting: Understanding these two common problems is crucial.

Overfitting: When a model learns the training data too well, including its noise, and
performs poorly on new data.
Underfitting: When a model is too simple to capture the underlying patterns in the
data.
Hyperparameter Tuning: The process of adjusting a model's settings (hyperparameters) to
optimize its performance.
Ensemble Methods: Combining multiple models to create a single, more powerful
model. This includes techniques like Bagging (e.g., Random Forest) and Boosting (e.g.,
XGBoost, LightGBM); a short sketch combining tuning and an ensemble follows this list.
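
To make the last two ideas concrete, here is a minimal sketch (assuming scikit-learn; the synthetic dataset and the parameter grid are invented for illustration) that tunes a Random Forest, a Bagging-style ensemble, with a simple grid search:

# Minimal sketch: hyperparameter tuning of an ensemble model (illustrative values only).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Try a few hyperparameter settings and keep the best by cross-validated accuracy.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("Best hyperparameters:", search.best_params_)
print("Held-out accuracy:", search.best_estimator_.score(X_test, y_test))

In a setup like this, a large gap between training accuracy and held-out accuracy is a typical sign of overfitting, while a low score on both suggests underfitting.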

Do I need to learn all of the above topics?

Of course. Here are notes on the key evaluation metrics for supervised learning.

Evaluation Metrics for Supervised Learning


Evaluation metrics are crucial for quantifying the performance of a machine learning
model. They tell us how well the model is learning and how accurately it can make
predictions on new, unseen data. The choice of metric depends on the type of problem:
classification or regression.

Metrics for Classification Models


Classification models predict a category or class (e.g., "Spam" vs. "Not Spam", "Cat" vs.
"Dog").

1. Accuracy

What it is: The simplest metric. It measures the percentage of predictions that the
model got right.
Formula: (Number of Correct Predictions) / (Total Number of Predictions)
When to use it: Good for balanced datasets, where each class has a similar number of
examples.
Limitation: Can be misleading on imbalanced datasets. For example, if 95% of emails
are not spam, a model that always predicts "not spam" will be 95% accurate but is
completely useless.
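
To see this limitation in numbers, here is a small sketch (assuming scikit-learn; the 95/5 class split and the "always not spam" model are invented for illustration):

# Illustrative only: accuracy on an imbalanced dataset (95 "not spam" vs 5 "spam").
from sklearn.metrics import accuracy_score

y_true = [0] * 95 + [1] * 5   # 0 = not spam, 1 = spam
y_pred = [0] * 100            # a useless model that always predicts "not spam"

print(accuracy_score(y_true, y_pred))  # 0.95, even though no spam is ever caught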

2. Precision

What it is: Measures the accuracy of the positive predictions. Of all the times the
model predicted a positive outcome, how many were actually positive?
Formula: (True Positives) / (True Positives + False Positives)
When to use it: When the cost of a false positive is high.
Analogy 📧: In a spam filter, you want high precision. You would rather a spam email
get into your inbox (a false negative) than have an important email go to spam (a false
positive).
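
As a quick made-up illustration (assuming scikit-learn; the labels below are invented):

# Precision = TP / (TP + FP) on toy spam-filter predictions.
from sklearn.metrics import precision_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = spam, 0 = not spam
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # contains one false positive and one false negative

# TP = 3, FP = 1, so precision = 3 / 4 = 0.75
print(precision_score(y_true, y_pred))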

3. Recall (Sensitivity)
What it is: Measures how well the model finds all the actual positives. Of all the actual
positive cases, how many did the model correctly identify?
Formula: (True Positives) / (True Positives + False Negatives)
When to use it: When the cost of a false negative is high.
Analogy 🩺: In medical screening for a disease, you want high recall. You would rather
tell a healthy person they might be sick (a false positive) than miss an actual case of
the disease (a false negative).
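
A matching made-up illustration for recall (again assuming scikit-learn; the screening results are invented):

# Recall = TP / (TP + FN) on toy screening results.
from sklearn.metrics import recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # 1 = has the disease
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]   # misses one real case, flags one healthy person

# TP = 3, FN = 1, so recall = 3 / 4 = 0.75
print(recall_score(y_true, y_pred))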

4. F1-Score
What it is: The harmonic mean of Precision and Recall. It provides a single score that
balances both concerns.
Formula: 2 * (Precision * Recall) / (Precision + Recall)
When to use it: When you need a balance between Precision and Recall, especially on
imbalanced datasets where accuracy is not a reliable metric. It's a good all-around
metric.
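
Reusing the toy spam-filter labels from the precision example above (illustrative only), the formula and scikit-learn give the same value:

# F1 is the harmonic mean of precision and recall.
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

p = precision_score(y_true, y_pred)   # 0.75
r = recall_score(y_true, y_pred)      # 0.75
print(2 * p * r / (p + r))            # 0.75, from the formula
print(f1_score(y_true, y_pred))       # same value from scikit-learn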

Metrics for Regression Models


Regression models predict a continuous, numerical value (e.g., price, temperature).

1. MAE (Mean Absolute Error)


What it is: The average of the absolute differences between the predicted and actual
values.
Formula: (1/n) * Σ|Actual - Predicted|
Key Feature: It's easy to interpret because the error is in the same units as the output
variable (e.g., if you are predicting prices in dollars, an MAE of 5 means the predictions
are off by an average of $5). It is not very sensitive to outliers.
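
A tiny worked example (the prices are invented; assuming scikit-learn):

# MAE stays in the same units as the target (here, dollars).
from sklearn.metrics import mean_absolute_error

actual    = [100, 150, 200, 250]
predicted = [110, 140, 205, 260]

# (10 + 10 + 5 + 10) / 4 = 8.75
print(mean_absolute_error(actual, predicted))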

2. MSE (Mean Squared Error)


What it is: The average of the squared differences between the predicted and actual
values.
Formula: (1/n) * Σ(Actual - Predicted)²
Key Feature: It penalizes larger errors much more heavily than smaller ones. If
your model makes a very bad prediction, the MSE will be much higher. Its units are
"squared," making it less intuitive.
