
MODULE 3

Machine Learning (ML) differs from Traditional Programming in several fundamental ways. Here’s a
comparison:

1. Approach to Problem-Solving

 Traditional Programming:

o Developers write explicit rules (logic) to process input data and produce output.

o Example: Writing a function to calculate tax based on income brackets.

o Flow: Input → Program (Rules) → Output

 Machine Learning:

o Instead of writing rules, the system learns patterns from data to make predictions.

o Example: Training a model on past tax records to predict future taxes.

o Flow: Input + Output → Model Training → Learned Model → Predictions
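To make the contrast concrete, below is a minimal sketch in Python. The tax brackets, rates, and training records are invented purely for illustration.

Python code:

# Traditional programming: a developer writes the rules explicitly.
def tax_rule_based(income):
    if income <= 10_000:
        return income * 0.10                     # hypothetical 10% bracket
    elif income <= 50_000:
        return 1_000 + (income - 10_000) * 0.20
    else:
        return 9_000 + (income - 50_000) * 0.30

# Machine learning: the mapping is learned from past (input, output) pairs.
from sklearn.linear_model import LinearRegression

incomes = [[8_000], [20_000], [45_000], [80_000]]  # inputs (X)
taxes = [800, 3_000, 8_000, 18_000]                # observed outputs (Y)

model = LinearRegression().fit(incomes, taxes)     # Input + Output -> Learned Model
print(model.predict([[60_000]]))                   # Learned Model -> Prediction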

2. Handling Complexity

 Traditional Programming:

o Works well for well-defined, rule-based problems (e.g., sorting, calculations).

o Struggles with complex, ambiguous tasks (e.g., image recognition, natural language
processing).

 Machine Learning:

o Excels at problems where rules are hard to define (e.g., spam detection,
recommendation systems).

o Can adapt to new patterns in data without manual updates.

3. Adaptability

 Traditional Programming:

o Requires manual updates if rules change (e.g., updating tax laws means rewriting
code).

 Machine Learning:

o Can improve over time with more data (retraining).

o Adapts to new patterns automatically (if designed properly).

4. Data Dependency

 Traditional Programming:
o Relies on logic written by developers.

o Works even with small or no data (if rules are correct).

 Machine Learning:

o Heavily depends on quality and quantity of data.

o Poor data leads to poor predictions (Garbage In → Garbage Out).

5. Debugging & Interpretability

 Traditional Programming:

o Bugs can be traced to specific lines of code.

o Logic is transparent and explainable.

 Machine Learning:

o Harder to debug (errors may come from data, model choice, or hyperparameters).

o Some models (e.g., deep learning) act as "black boxes" (hard to interpret).

6. Use Cases

| Traditional Programming | Machine Learning |
| --- | --- |
| Calculator apps | Fraud detection |
| Database queries | Speech recognition |
| Web servers | Self-driving cars |
| Sorting algorithms | Personalized recommendations |

Summary

| Feature | Traditional Programming | Machine Learning |
| --- | --- | --- |
| Logic Source | Handwritten by developers | Learned from data |
| Adaptability | Static (needs manual updates) | Dynamic (improves with data) |
| Best For | Rule-based problems | Pattern recognition |
| Debugging | Straightforward | Complex (depends on data/model) |
| Example | Excel formulas | ChatGPT |

Understanding and Formalizing the Learning Problem in Machine Learning
In machine learning, the learning problem refers to the task of training a model to make predictions
or decisions based on data. Unlike traditional programming (where rules are explicitly coded), ML
systems learn from examples to generalize patterns.

1. Key Components of the Learning Problem

To formalize a learning problem, we define:

(A) Input (Features)

 Represented as X (a vector or matrix of features).

 Example: In house price prediction, features could be:

o Size (sq. ft.)

o Number of bedrooms

o Location

(B) Output (Target Variable)

 Represented as Y (what we want to predict).

 Example:

o Regression: Price of the house (continuous value).

o Classification: "Spam" or "Not Spam" (discrete label).

(C) Hypothesis Space (Model Class)

 A set of possible functions/models that can map X → Y.

 Example:

o Linear models: Y = wX + b


o Decision trees, Neural Networks, etc.

(D) Loss Function (Cost Function)

 Measures how well the model performs (difference between predictions and true values).

 Example:

o Mean Squared Error (MSE) for regression.

o Cross-Entropy Loss for classification.

(E) Learning Algorithm

 Adjusts model parameters to minimize the loss function.

 Example:

o Gradient Descent (optimization method).

o Backpropagation (for neural networks).

2. Formal Definition of the Learning Problem

Given:

 A dataset D = {(x1, y1), (x2, y2), ..., (xn, yn)}.

 A hypothesis space H (possible models).

 A loss function L (measures prediction errors).

Goal:
Find a function h ∈ H that minimizes the expected loss over new, unseen data:

h* = argmin_{h ∈ H} E_(x,y)[ L(h(x), y) ]

Interpretation:

 The model should generalize (perform well on unseen data, not just training data).

 Avoid overfitting (memorizing training data) and underfitting (failing to learn patterns).

3. Example: Formalizing a Simple Linear Regression Problem

Problem Statement:

Predict house prices based on size (sq. ft.).

Formalization:
1. Input (X): House sizes [1000, 1500, 2000, ...].

2. Output (Y): Prices [200k, 250k, 300k, ...].

3. Model: Linear function Y = wX + b.

4. Loss Function: Mean Squared Error (MSE).

5. Learning Algorithm: Gradient Descent (optimizes w, b to minimize MSE).
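As a sketch of this formalization, the following NumPy code fits w and b by gradient descent on invented sizes and prices; the learning rate and iteration count are arbitrary choices:

Python code:

import numpy as np

# Invented dataset D = {(x1, y1), ..., (xn, yn)}: sizes (sq. ft.) -> prices.
X = np.array([1000.0, 1500.0, 2000.0, 2500.0])
Y = np.array([200_000.0, 250_000.0, 300_000.0, 350_000.0])

Xn = (X - X.mean()) / X.std()   # scale inputs so a simple learning rate works

w, b = 0.0, 0.0                 # hypothesis h(x) = w*x + b, initialized
lr = 0.1                        # learning rate (a hyperparameter)

for _ in range(500):
    Y_hat = w * Xn + b                  # predictions
    err = Y_hat - Y                     # used by the MSE loss (1/n) * sum(err^2)
    w -= lr * 2 * (err * Xn).mean()     # gradient descent step for w
    b -= lr * 2 * err.mean()            # gradient descent step for b

print(w, b)   # learned parameters after minimizing the training loss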

1. What is a Model?

A model is a function that maps input features (X) to output predictions (Ŷ).

 It defines the relationship between inputs and outputs.

 Different models make different assumptions about data patterns.

Examples of Models:

| Model Type | Formula (Example) | Use Case |
| --- | --- | --- |
| Linear Regression | Ŷ = wX + b | Predicting house prices |
| Logistic Regression | Ŷ = 1 / (1 + e^-(wX + b)) | Binary classification (spam detection) |
| Decision Tree | Splits data based on feature thresholds | Customer churn prediction |
| Neural Network | Complex layered functions: Ŷ = fn(...f2(f1(X))) | Image recognition |

2. What are Parameters?

Parameters are the internal settings of a model that are learned from data during training.

 They define how the model transforms input into output.

 The goal of training is to find the best parameters that minimize prediction errors.

Examples of Parameters:
| Model | Parameters | Role |
| --- | --- | --- |
| Linear Regression | w (weight), b (bias) | Controls the slope and intercept of the line |
| Neural Network | Weights & biases in each neuron | Adjusts how signals propagate |
| Support Vector Machine (SVM) | Support vectors & margin coefficients | Determines the decision boundary |

3. Model Training: How Parameters are Learned

1. Initialize Parameters

o Start with random values (e.g., w = 0.1, b = 0).

2. Make Predictions

o Compute Ŷ = Model(X).

3. Calculate Loss

o Compare predictions (Ŷ) with true values (Y) using a loss function (e.g., Mean
Squared Error).

4. Update Parameters

o Adjust w and b to reduce the loss (using an optimizer such as gradient descent).

5. Repeat

o Iterate until the loss is minimized (or convergence).

Hyperparameters vs. Parameters

| Parameters | Hyperparameters |
| --- | --- |
| Learned from data (e.g., weights in a neural network). | Set before training (e.g., learning rate, number of trees in a random forest). |
| Adjusted automatically during training. | Tuned manually or via grid search. |
| Example: Coefficients in linear regression. | Example: Depth of a decision tree. |

To build a reliable machine learning model, data is typically split into three distinct sets:
1. Training Data
2. Validation Data
3. Test Data

1. Training Data

Purpose

 Used to train the model (i.e., adjust its parameters).

 The model learns patterns from this data.

Characteristics

 Typically 60-80% of the total dataset.

 The larger the training set, the better the model can learn (but needs to be balanced with
validation/test data).

Example

 In image classification, the model sees labeled images (e.g., "cat" or "dog") and adjusts its
weights to minimize prediction errors.

2. Validation Data

Purpose

 Used to tune hyperparameters (e.g., learning rate, number of layers in a neural network).

 Helps select the best model architecture.

 Prevents overfitting (model memorizing training data instead of generalizing).

Characteristics

 Typically 10-20% of the total dataset.

 Not used in training—only for model selection.


Example

 Trying different learning rates (0.01 vs. 0.001) and picking the one that performs best on the
validation set.

3. Test Data

Purpose

 Used only once for final evaluation of the trained model.

 Simulates real-world performance on unseen data.

Characteristics

 Typically 10-20% of the total dataset.

 Must never be used during training or validation (to avoid bias).

Example

 After training a spam classifier, you evaluate its accuracy on a held-out test set of emails.

4. Why Split Data?

| Problem | Solution |
| --- | --- |
| Overfitting (model works well on training data but fails on new data) | Validation set checks performance during training. |
| Optimistic bias (if test data is used for tuning, performance estimates are inflated) | Keep test data completely separate. |
| Model selection (comparing different algorithms) | Use validation data to pick the best one fairly. |

5. Common Splitting Strategies

(A) Holdout Method (Simple Split)

 70% Train | 15% Validation | 15% Test

 Best for large datasets.

(B) K-Fold Cross-Validation (Better for Small Data)

 Split data into K folds (e.g., K=5).

 Train on K-1 folds, validate on the remaining fold.


 Repeat K times and average results.

 No separate test set unless held out initially.

(C) Stratified Splitting (For Imbalanced Data)

 Ensures each split has the same class distribution.

 Example: If 20% of data is "spam," each set (train/val/test) keeps ~20% spam.
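As a sketch, the holdout and stratified strategies can be combined with scikit-learn's train_test_split; the synthetic data and the 20% positive rate below are invented to mirror the spam example:

Python code:

import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data for illustration: 1000 samples, ~20% labeled "spam" (1).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.2).astype(int)

# Hold out 15% as a test set, stratified so it keeps ~20% spam.
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42)

# Split the remainder into 70% train / 15% validation (of the full dataset).
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.15 / 0.85, stratify=y_tmp, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # roughly 700 / 150 / 150

For K-fold cross-validation on small datasets, sklearn.model_selection.KFold (or cross_val_score) performs the repeated splitting automatically.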

Function of Model Parameters in Machine Learning


Model parameters are the internal variables that a machine learning model learns from training
data. They define how input features are transformed into predictions.

Key Functions:

1. Define Model Behavior

o Parameters control how the model makes decisions (e.g., weights in a neural
network, coefficients in linear regression).

2. Optimize Predictions

o Adjusted during training to minimize errors (e.g., via gradient descent).

3. Capture Patterns

o Store learned relationships between features and targets (e.g., positive/negative


correlations).

Types of Parameters

1. Weights (e.g., in Neural Networks, Linear Regression)

o Determine feature importance.

o Example: In y = w1x1 + w2x2 + b, the weights are w1 and w2.

2. Biases (Intercept Terms)

o Shift the prediction function (e.g., b in y = wx + b).

3. Support Vectors (in SVM)

o Define the decision boundary.

Parameter vs. Hyperparameter


| Aspect | Parameters | Hyperparameters |
| --- | --- | --- |
| Learned from | Training data | Set by the developer |
| Example | Weights in a neural network | Learning rate, batch size |
| Optimization | Adjusted via gradient descent | Tuned via grid search |

Key Tradeoffs

1. Bias-Variance Tradeoff

o More parameters → Lower bias (fits training well) but higher variance (overfitting).

2. Interpretability vs. Performance

o Linear models (few params) are interpretable; deep learning (many params) is
powerful but opaque.

When to Adjust Parameters?

 Underfitting? Increase model complexity (more parameters).

 Overfitting? Reduce parameters or add regularization.

Metrics for Evaluating Model Performance


Evaluating a machine learning model's performance is crucial to ensure it generalizes well to unseen
data. The choice of metric depends on the type of problem (classification, regression, clustering)
and business goals.

1. Classification Metrics

Used when the output is a category (e.g., spam/not spam, fraud/legit).

(A) Confusion Matrix

A table showing:

 True Positives (TP): Correctly predicted positives.


 True Negatives (TN): Correctly predicted negatives.

 False Positives (FP): Negative samples wrongly predicted as positive.

 False Negatives (FN): Positive samples wrongly predicted as negative.

|  | Predicted: Yes | Predicted: No |
| --- | --- | --- |
| Actual: Yes | TP | FN |
| Actual: No | FP | TN |

(B) Accuracy

 Measures overall correctness.

 Formula:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

(C) Precision

 Measures how many predicted positives are actually positive.

 Formula: Precision = TP / (TP + FP)

 Use Case: Important when FP are costly (e.g., falsely flagging legit emails as spam).

(D) Recall (Sensitivity)

 Measures how many actual positives were correctly predicted.

 Formula: Recall = TP / (TP + FN)

 Use Case: Important when FN are costly (e.g., missing a cancer diagnosis).

(E) F1-Score

 Harmonic mean of precision and recall (balances both).

 Formula: F1 = 2 × (Precision × Recall) / (Precision + Recall)

 Use Case: Best for imbalanced datasets.

(F) ROC-AUC (Receiver Operating Characteristic - Area Under Curve)

 Measures model’s ability to distinguish classes at different thresholds.

 AUC = 1.0: Perfect classifier.

 AUC = 0.5: Random guessing.

 Use Case: Comparing models in binary classification.
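All of these classification metrics are available in scikit-learn; a minimal sketch with toy labels (invented for illustration):

Python code:

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Toy actual labels, hard predictions, and predicted P(class = 1).
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0, 0, 1]
y_score = [0.1, 0.6, 0.8, 0.7, 0.2, 0.4, 0.3, 0.9]

print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred))
print(recall_score(y_true, y_pred))
print(f1_score(y_true, y_pred))
print(roc_auc_score(y_true, y_score))  # AUC uses scores, not hard labels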


2. Regression Metrics

Used when the output is a continuous value (e.g., house price, temperature).

(A) Mean Absolute Error (MAE)

 Average of absolute errors.

 Formula: MAE = (1/n) Σ |yi − ŷi|

(B) Mean Squared Error (MSE)

 Average of squared errors.

 Formula: MSE = (1/n) Σ (yi − ŷi)²

(C) Root Mean Squared Error (RMSE)

 Square root of MSE (same units as the target variable).

 Formula: RMSE = √MSE
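A quick sketch of the three regression metrics on invented values:

Python code:

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Invented true vs. predicted house prices.
y_true = np.array([200_000, 250_000, 300_000])
y_pred = np.array([210_000, 240_000, 320_000])

mae = mean_absolute_error(y_true, y_pred)  # average |error|
mse = mean_squared_error(y_true, y_pred)   # average squared error
rmse = np.sqrt(mse)                        # same units as the target

print(mae, mse, rmse)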

3. Clustering Metrics (Unsupervised Learning)

(A) Silhouette Score

 Measures how similar a sample is to its own cluster vs. other clusters.

 Range: -1 (worst) to +1 (best).

(B) Davies-Bouldin Index

 Lower values = better clustering.
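Both clustering metrics can be computed with scikit-learn; a sketch on synthetic blobs (the data and K = 3 are invented for illustration):

Python code:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, davies_bouldin_score

# Synthetic, well-separated clusters for illustration.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print(silhouette_score(X, labels))      # closer to +1 is better
print(davies_bouldin_score(X, labels))  # lower is better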

4. Key Takeaways

| Problem Type | Best Metrics |
| --- | --- |
| Classification | Accuracy, Precision, Recall, F1, ROC-AUC |
| Regression | MAE, RMSE, R² |
| Clustering | Silhouette Score, Davies-Bouldin |

Key Machine Learning Evaluation Concepts Explained


1. Accuracy
 What it measures: Overall correctness of predictions

 Formula: (TP + TN) / (TP + TN + FP + FN)

 Best for: Balanced datasets where all classes are equally important

 Limitation: Misleading for imbalanced data (e.g., 95% negative class)

2. Precision

 What it measures: Quality of positive predictions

 Formula: TP / (TP + FP)

 When to use: When false positives are costly (e.g., spam detection)

 Example: High precision means when your model says "spam", it's very likely correct

3. Recall (Sensitivity)

 What it measures: Ability to find all positive instances

 Formula: TP / (TP + FN)

 When to use: When false negatives are dangerous (e.g., cancer detection)

 Example: High recall means your model finds most actual positive cases

4. Confusion Matrix

A visualization tool showing:

|  | Predicted: Positive | Predicted: Negative |
| --- | --- | --- |
| Actual: Positive | TP | FN |
| Actual: Negative | FP | TN |

5. Bias-Variance Tradeoff

 Bias Error: From oversimplified assumptions (underfitting)

o High bias = model is too simple (e.g., linear model for complex data)

 Variance Error: From excessive sensitivity to training data (overfitting)

o High variance = model is too complex (memorizes noise)

 Tradeoff: As model complexity increases:

o Bias decreases (fits training data better)

o Variance increases (generalizes worse to new data)
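The tradeoff can be observed empirically by varying model complexity; a sketch on synthetic data (all values invented), where degree 1 typically underfits and degree 15 typically overfits:

Python code:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Noisy quadratic data (synthetic, for illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 60).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 + rng.normal(scale=1.0, size=60)
x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.3, random_state=0)

for degree in (1, 2, 15):  # too simple / about right / too flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_tr, y_tr)
    print(degree,
          mean_squared_error(y_tr, model.predict(x_tr)),  # training error
          mean_squared_error(y_te, model.predict(x_te)))  # test error

Expect the degree-1 model to show high error on both sets (high bias) and the degree-15 model to show near-zero training error but a larger test error (high variance).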

6. Overfitting vs Underfitting
| Aspect | Overfitting | Underfitting |
| --- | --- | --- |
| Definition | Model memorizes training data (including noise) | Model fails to learn patterns |
| Training Performance | Excellent | Poor |
| Test Performance | Poor | Poor |
| Visualization | Complex, wiggly decision boundary | Overly smooth, simple boundary |
| Solutions | Regularization, more training data, feature selection, early stopping | More complex model, more features, longer training |

Practical Implications

 For medical diagnosis: Prioritize recall (don't miss real cases)

 For spam filters: Prioritize precision (don't block legit emails)

 Model selection: Use validation set to find sweet spot in bias-variance tradeoff

 Debugging:

o High training error? → Likely underfitting

o Large gap between train/test performance? → Likely overfitting

Real-world Example

Imagine building a fraud detection system:

 High precision = When it flags fraud, it's probably right

 High recall = It catches most actual fraud cases

 Overfitting = Model flags transactions as fraud based on random quirks in training data

 Underfitting = Model misses obvious fraud patterns

Types of Machine Learning


Machine learning approaches can be categorized based on the learning paradigm and the nature of
supervision. Here's a comprehensive breakdown:

1. Supervised Learning

Definition: Learns from labeled training data (input-output pairs)


Goal: Predict outputs for new inputs
Key Characteristics:

 Requires a fully labeled dataset

 Most common type in practical applications

 Two main subtypes:

| Type | Output | Examples |
| --- | --- | --- |
| Classification | Discrete categories | Spam detection, image recognition |
| Regression | Continuous values | House pricing, stock market prediction |

Algorithms:

 Linear/Logistic Regression

 Decision Trees, Random Forests

 SVM, Neural Networks

Pros:
✔ Predictions are interpretable
✔ Well-established techniques

Cons:
❌ Requires labeled data (often expensive)
❌ May not generalize beyond training distribution

2. Unsupervised Learning

Definition: Discovers patterns in unlabeled data


Goal: Find hidden structures or groupings
Key Use Cases:

| Technique | Purpose | Examples |
| --- | --- | --- |
| Clustering | Group similar data points | Customer segmentation |
| Dimensionality Reduction | Simplify data while preserving structure | PCA for visualization |
| Anomaly Detection | Identify unusual patterns | Fraud detection in credit card transactions |

Algorithms:

 K-Means, DBSCAN

 Autoencoders

 Apriori (Association Rules)

Pros:
✔ Works with unlabeled data
✔ Reveals hidden insights

Cons:
❌ Harder to evaluate objectively
❌ Results may be ambiguous
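As a sketch of dimensionality reduction, PCA can compress the 4-dimensional iris dataset to 2 dimensions while reporting how much variance is kept:

Python code:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)       # 150 samples x 4 features
pca = PCA(n_components=2).fit(X)
X_2d = pca.transform(X)                 # 150 samples x 2 features

print(X_2d.shape)
print(pca.explained_variance_ratio_)    # variance kept by each new axis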

3. Semi-Supervised Learning

Definition: Uses both labeled and unlabeled data


Goal: Improve learning accuracy with limited labels
Applications:

 Speech recognition

 Medical imaging (where labeling is expensive)

Approaches:

 Self-training

 Co-training

Pros:
✔ Reduces labeling costs
✔ More robust than pure supervised

Cons:
❌ Complex implementation
❌ Quality depends on initial labeled data

4. Reinforcement Learning (RL)


Definition: Learns by interacting with an environment
Goal: Maximize cumulative reward
Key Components:

 Agent: The learner

 Environment: The world agent interacts with

 Reward: Feedback signal

Applications:

 Game playing (AlphaGo)

 Robotics control

 Autonomous vehicles

Algorithms:

 Q-Learning

 Deep Q Networks (DQN)

 Policy Gradient Methods

Pros:
✔ Can handle complex, dynamic environments
✔ Doesn't require pre-labeled data

Cons:
❌ Computationally expensive
❌ Hard to design proper reward functions
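A tabular Q-learning sketch on a toy, invented environment (a 5-state corridor where reaching the rightmost state yields reward 1); the hyperparameters are arbitrary:

Python code:

import numpy as np

# States 0..4; reaching state 4 gives reward 1 and ends the episode.
# Actions: 0 = step left, 1 = step right.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.5, 0.9, 0.3   # learning rate, discount, exploration
rng = np.random.default_rng(0)

for _ in range(300):  # episodes
    s = 0
    while s != 4:
        # Epsilon-greedy: explore sometimes, otherwise act greedily.
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else min(4, s + 1)
        r = 1.0 if s_next == 4 else 0.0
        # TD update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1)[:4])  # learned policy for states 0-3: prefer 1 (right)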

5. Self-Supervised Learning

Definition: Generates its own labels from data


Goal: Learn useful representations
Examples:

 Masked language modeling (BERT)

 Predicting image rotations

Pros:
✔ Eliminates manual labeling
✔ Powerful for pre-training

Cons:
❌ Requires massive data
❌ Task-specific design needed

6. Transfer Learning
Definition: Applies knowledge from one task to another
Approach:

1. Pre-train on large dataset

2. Fine-tune on target task

Applications:

 Image classification (using ImageNet pre-trained models)

 NLP (using BERT/GPT embeddings)

Pros:
✔ Saves computation time
✔ Works well with limited target data

Cons:
❌ Potential negative transfer if domains mismatch

7. Memory-Based Learning

Definition: Systems that store and retrieve specific training instances to make predictions.

Key Types:

1. Instance-Based Learning:

o Stores raw training examples

o Predicts based on similarity to stored cases

o Example: k-Nearest Neighbors (k-NN)

2. Case-Based Reasoning:

o Stores problem-solution pairs

o Retrieves similar past cases to solve new problems

o Used in medical diagnosis, legal reasoning

Characteristics:

 No explicit model training phase

 Prediction happens at query time

 Requires efficient similarity metrics

Pros:
✔ Adapts easily to new data
✔ Handles complex relationships
Cons:
❌ Computationally expensive at runtime
❌ Sensitive to irrelevant features
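A k-NN sketch showing the instance-based pattern: "fitting" just stores the training examples, and prediction happens at query time by majority vote among the k nearest stored instances:

Python code:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)  # stores instances
print(knn.score(X_te, y_te))  # accuracy on held-out queries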

8. Hebbian Learning

Core Principle: "Neurons that fire together, wire together" (Donald Hebb, 1949)

Mechanism:

 If two connected neurons activate simultaneously:

o Connection strength increases

 If activation is uncorrelated:

o Connection weakens

Mathematical Form:
Δw_ij = η x_i x_j
(where η is learning rate, x_i and x_j are neuron activations)
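A one-step NumPy sketch of this rule with invented activations; note that plain Hebbian updates grow without bound, which is why variants such as Oja's rule add normalization:

Python code:

import numpy as np

eta = 0.1                        # learning rate η
x = np.array([1.0, 0.0, 1.0])    # pre-synaptic activations x_i
y = np.array([1.0, 0.5])         # post-synaptic activations x_j
W = np.zeros((2, 3))             # connection weights w_ij

W += eta * np.outer(y, x)        # Δw_ij = η x_i x_j: co-active pairs strengthen
print(W)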

Applications:

 Unsupervised feature learning

 Neural network initialization

 Neuromorphic computing

Modern Variants:

 Oja's Rule (adds weight normalization)

 BCM Theory (incorporates sliding threshold)

9. Other Specialized Learning Paradigms

A. Competitive Learning

 Neurons compete to respond to inputs

 Winner updates its weights (Winner-Take-All)

 Used in:

o Self-Organizing Maps (SOM)

o Learning Vector Quantization (LVQ)

B. Error-Corrective Learning

 Adjusts weights based on output error

 Backpropagation is the most famous example

 Includes:
o Perceptron Learning Rule

o Delta Rule

C. Reinforcement Learning Variants

1. Temporal Difference Learning:

o Updates predictions based on subsequent predictions

o Combines Monte Carlo and dynamic programming

2. Actor-Critic Methods:

o Separates policy (actor) and value function (critic)

o Provides more stable learning

D. Meta-Learning

 "Learning to learn"

 Systems that improve their learning ability over time

 Includes:

o Model-Agnostic Meta-Learning (MAML)

o Memory-Augmented Neural Networks

E. Neuromodulated Learning

 Mimics biological neurotransmitter systems

 Uses additional modulation signals

 Enables:

o Context-dependent learning

o Emotional weighting of memories

Comparative Analysis of Machine Learning Paradigms

| Learning Type | Supervision | Mechanism | Strengths | Weaknesses | Best Use Cases | Example Algorithms |
| --- | --- | --- | --- | --- | --- | --- |
| Supervised | Full labels | Learns input-output mappings | High accuracy, interpretable | Needs labeled data | Classification, Regression | CNN, Random Forest, SVM |
| Unsupervised | No labels | Discovers hidden patterns | Works with unlabeled data | Hard to evaluate | Clustering, Dimensionality Reduction | K-Means, PCA, GANs |
| Semi-Supervised | Partial labels | Uses both labeled/unlabeled data | Reduces labeling costs | Complex implementation | Medical imaging, Speech recognition | Label Propagation, Self-Training |
| Reinforcement | Reward signals | Maximizes cumulative reward | Handles dynamic environments | Needs careful reward design | Game AI, Robotics | Q-Learning, PPO |
| Self-Supervised | Auto-generated | Creates labels from data structure | Eliminates manual labeling | Requires massive data | NLP, Computer Vision | BERT, Contrastive Learning |
| Transfer Learning | Varies | Leverages pre-trained models | Saves computation, works with little data | Domain mismatch risk | All domains | Fine-tuning, Feature Extraction |
| Memory-Based | Varies | Stores/retrieves instances | Adapts to new data easily | Computationally expensive | Recommendation systems | k-NN, Case-Based Reasoning |
| Hebbian Learning | Unsupervised | Strengthens co-active neuron connections | Biologically plausible | Limited to simple tasks | Neuromorphic computing | Oja's Rule, BCM Theory |
| Competitive Learning | Unsupervised | Neurons compete to respond | Good for clustering | Sensitive to initialization | Vector quantization | SOM, LVQ |
| Meta-Learning | Multi-task | Learns learning strategies | Fast adaptation to new tasks | Complex, data-hungry | Few-shot learning | MAML, Reptile |

When to Use Which Paradigm?

| Scenario | Recommended Approach | Reason |
| --- | --- | --- |
| Abundant labeled data | Supervised Learning | Maximizes predictive accuracy |
| No labels but need structure discovery | Unsupervised Learning | Reveals hidden patterns |
| Limited labeled data | Semi-Supervised or Transfer Learning | Leverages unlabeled data / pre-trained models |
| Sequential decision-making | Reinforcement Learning | Optimizes long-term outcomes |
| Neuromorphic hardware | Hebbian/Competitive Learning | Biologically plausible implementation |
| Rapid adaptation to new tasks | Meta-Learning | "Learning to learn" capability |
| Real-time personalization | Memory-Based Learning | Fast instance-based reasoning |

Confusion Matrix: Definition, Processing, and Interpretation


A confusion matrix (or error matrix) is a performance evaluation tool for classification models that
visualizes prediction results by comparing actual vs. predicted class labels. It helps identify model
strengths/weaknesses and calculate key metrics.

1. Structure of a Confusion Matrix

For a binary classifier (e.g., spam detection):

|  | Predicted: Negative (0) | Predicted: Positive (1) |
| --- | --- | --- |
| Actual: Negative (0) | True Negative (TN) | False Positive (FP) |
| Actual: Positive (1) | False Negative (FN) | True Positive (TP) |

 True Positives (TP): Correctly predicted positives.

 True Negatives (TN): Correctly predicted negatives.

 False Positives (FP): Negative samples wrongly predicted as positive (Type I error).

 False Negatives (FN): Positive samples wrongly predicted as negative (Type II error).

Example:

 Actual: 100 emails (90 non-spam, 10 spam)

 Model Predictions:

o Correctly classified 85 non-spam (TN)


o Misclassified 5 non-spam as spam (FP)

o Correctly classified 8 spam (TP)

o Missed 2 spam (FN)

Resulting confusion matrix:

|  | Predicted: 0 | Predicted: 1 |
| --- | --- | --- |
| Actual: 0 | 85 (TN) | 5 (FP) |
| Actual: 1 | 2 (FN) | 8 (TP) |

2. Processing a Confusion Matrix

Step 1: Generate Predictions

 Train a model (e.g., logistic regression, random forest).

 Predict class labels on a test set (unseen data).

Step 2: Build the Matrix

Compare actual vs. predicted labels:

Python code:

from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 0, 1]  # actual labels
y_pred = [0, 1, 1, 0, 0, 1]  # predicted labels

cm = confusion_matrix(y_true, y_pred)
print(cm)  # rows = actual class, columns = predicted class

Step 3: Calculate Metrics

From the matrix, compute:

1. Accuracy: Overall correctness

Accuracy = (TP + TN) / (TP + TN + FP + FN)

2. Precision: How many predicted positives are real?

Precision = TP / (TP + FP)

3. Recall (Sensitivity): How many actual positives were caught?

Recall = TP / (TP + FN)

4. F1-Score: Harmonic mean of precision and recall

F1 = 2 × (Precision × Recall) / (Precision + Recall)
Example Calculations (from the email classifier):

 Accuracy = (85 + 8) / 100 = 93%

 Precision = 8 / (8 + 5) = 61.5%

 Recall = 8 / (8 + 2) = 80%

 F1 = 2 × (0.615 × 0.8) / (0.615 + 0.8) = 69.6%
