CLASSIFICATION ALGORITHMS
PRESENTED BY, DVEIN INNOVATIONS
LOGISTIC REGRESSION
INTRODUCTION

• Despite its name, Logistic Regression is a classification algorithm, not a regression algorithm.
• It is used when the target variable is categorical (e.g., Yes/No, 0/1, Churn/Not Churn).
• Logistic Regression predicts the probability that an input belongs to a particular class using a sigmoid function (also called the logistic function).
WHEN TO USE LOGISTIC REGRESSION?

Use Case Example              Target Outcome
Predicting customer churn     Yes or No
Email spam detection          Spam or Not Spam
Medical diagnosis             Disease or No Disease

WORKING

Step 1: Linear Combination of Inputs
• It starts like Linear Regression: z = w1·x1 + w2·x2 + … + wn·xn + b
• xi: input features
• wi: learned weights
• b: bias term

Step 2: Apply the Sigmoid Function
• The linear output z is passed into the sigmoid function to squash the value between 0 and 1: σ(z) = 1 / (1 + e^(-z))
This gives a probability value:
• Closer to 0 → class 0
• Closer to 1 → class 1
Step 3: Make Prediction
• Using a threshold (typically 0.5):
• If the predicted probability ≥ 0.5 → predict class 1
• Else → predict class 0
Step 4: Model Training (How it Learns)
• Logistic Regression uses a loss function to measure how good its predictions are.
• The most common is Binary Cross-Entropy Loss: L = −[y·log(ŷ) + (1 − y)·log(1 − ŷ)]
Where:
• y: actual label (0 or 1)
• ŷ: predicted probability
• This loss is minimized using Gradient Descent, adjusting the weights to reduce error over time.
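Below is a minimal NumPy sketch of these four steps; the function names, learning rate lr, and step count n_steps are illustrative choices, not taken from the slides.

```python
import numpy as np

def sigmoid(z):
    # Step 2: squash the linear output into a probability between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, lr=0.1, n_steps=1000):
    """Fit weights w and bias b with plain gradient descent on binary cross-entropy."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_steps):
        # Step 1 + 2: linear combination, then sigmoid
        y_hat = sigmoid(X @ w + b)
        # Step 4: gradients of binary cross-entropy w.r.t. w and b
        grad_w = X.T @ (y_hat - y) / n_samples
        grad_b = np.mean(y_hat - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b, threshold=0.5):
    # Step 3: apply the 0.5 threshold to the predicted probabilities
    return (sigmoid(X @ w + b) >= threshold).astype(int)
```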
PROS & CONS

ADVANTAGES
• Simple and interpretable
• Fast to train
• Good for baseline models
• Works well with linearly separable data

LIMITATIONS
• Assumes a linear relationship between features and the log-odds of the outcome
• Doesn’t work well with nonlinear patterns unless features are transformed
• Sensitive to outliers and correlated inputs

K-NEAREST NEIGHBOR
INTRODUCTION

• KNN (K-Nearest Neighbors) is a supervised learning algorithm used for classification and regression (mostly classification).
• It classifies a new data point based on the majority class of its K closest neighbors in the training data.
CORE IDEA

“Tell me who your neighbors are, and I’ll tell you who you are.”

• For a new data point, KNN:

• Looks at the K nearest data points (neighbors)

• Finds out their labels

• Predicts the most frequent label among them


WORKING

Step 1: Choose the Value of K
• K = number of neighbors to consider (typically odd, like 3, 5, 7)
• Small K = sensitive to noise
• Large K = smoother but may ignore local structure

Step 2: Measure the Distance
• KNN computes the distance between the test point and all training points.
Common distance measures:
• Euclidean Distance: d(x, y) = √(Σ (xi − yi)²)
• Others: Manhattan, Minkowski, Cosine
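As a small illustration, the two most common measures can be written directly with NumPy (the function names here are just for demonstration):

```python
import numpy as np

def euclidean(a, b):
    # square root of the sum of squared differences
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    # sum of absolute differences
    return np.sum(np.abs(a - b))

x = np.array([1.0, 2.0])
y = np.array([4.0, 6.0])
print(euclidean(x, y))  # 5.0
print(manhattan(x, y))  # 7.0
```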


Step 3: Find the K Nearest Neighbors

• Sort all training points by distance to the test point

• Select the top K closest ones

Step 4: Vote for the Majority Class

• For classification: The class most represented among the K neighbors is selected.

• For regression: The average value of the K neighbors is taken.


EXAMPLE

Imagine this situation:

• You're classifying a flower based on petal length and width

• K=3

• Among 3 closest flowers:


• 2 are “Setosa”

• 1 is “Versicolor”

• Prediction: Setosa (majority vote)
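A short scikit-learn sketch of the same idea on the built-in iris dataset; the choice of n_neighbors=3 mirrors the K=3 example above, and the scaling step is an added assumption because KNN is distance-based.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Scale features first, since KNN compares raw distances between points
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
knn.fit(X_train, y_train)
print("Test accuracy:", knn.score(X_test, y_test))
```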


PROS & CONS

ADVANTAGES
• Simple and Intuitive
• No Training Phase (Lazy Learner)
• Naturally Handles Multiclass Classification
• Non-Parametric Algorithm
• Adapts to New Data Easily

LIMITATIONS
• Slow prediction
• Curse of dimensionality
• Sensitive to scale
• Sensitive to noisy data

SUPPORT VECTOR MACHINE
INTRODUCTION

• Support Vector Machine (SVM) is a supervised machine learning algorithm used for both classification and regression (mostly classification).
• SVM finds the best boundary (hyperplane) that separates the classes with the maximum margin.
WHEN TO USE SVM

Use Case Example              Target Outcome
Face Recognition              Person A or B
Document Classification      Spam or Not Spam
Cancer Detection              Malignant or Benign
CORE IDEA

• SVM tries to find a hyperplane (a line in 2D, a plane in 3D, or a surface in higher dimensions) that best separates the classes in the dataset.
• Not just any separation — it wants the one with the widest margin between the two classes.
WORKING

Step 1: Find a Decision Boundary (Hyperplane)
• A hyperplane is a boundary (a line in 2D) that divides the dataset into different classes.
• There are many possible hyperplanes, but SVM picks the one with the maximum margin.

Step 2: Maximize the Margin
• Margin = distance between the hyperplane and the nearest data points from each class.
• The points closest to the hyperplane are called Support Vectors.
• SVM chooses the hyperplane that maximizes this margin.
• Larger margin = better generalization to new data
Step 3: Handle Nonlinear Separation (Using Kernels)
• Not all data is linearly separable (i.e., can't be split by a straight line).
SVM solves this using a Kernel Trick:
• Maps data to a higher dimension where it is linearly separable.
Common kernels:
• Linear
• Polynomial
• Radial Basis Function (RBF) or Gaussian
Step 4: Regularization Parameter (C)
• SVM uses a parameter C to control the trade-off:
• High C → tries to classify everything correctly → low bias, high variance (overfit)
• Low C → allows some misclassifications → high bias, low variance (better generalization)
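A brief scikit-learn sketch that ties the kernel and C choices together; the toy make_moons dataset and the parameter values are illustrative assumptions, not from the slides.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# A nonlinearly separable toy dataset
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The RBF kernel handles the nonlinear boundary; C controls the margin/misclassification trade-off
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X_train, y_train)
print("Test accuracy:", svm.score(X_test, y_test))
print("Support vectors per class:", svm.named_steps["svc"].n_support_)
```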
CHARACTERISTICS

Feature        Description
Type           Supervised, binary or multiclass
Nature         Margin-based classifier
Strength       Based on support vectors only
Decision       Works well even in high-dimensional spaces
Limitations    Slower on large datasets, sensitive to noise and parameter tuning
DECISION TREE
INTRODUCTION

• A Decision Tree is a supervised learning algorithm used for classification and regression.
• It models decisions as a tree-like structure where each internal node represents a test on a feature, branches represent outcomes of the test, and leaf nodes represent class labels (for classification) or values (for regression).
WHEN TO USE DECISION TREES

Use Case Example              Target Outcome
Loan approval                 Approve / Reject
Disease diagnosis             Illness categories
Customer segmentation         Group A / B / C
CORE IDEA

• Decision Trees split the dataset into subsets based on feature values that
best differentiate the target variable. This process continues recursively, forming
a tree where:

• Each node → tests a condition

• Each branch → outcome of a decision

• Each leaf → final prediction


WORKING

Step 1: Start at the Root Node
• Begin with the entire dataset.
• Pick the best feature to split the data.

Step 2: Choose the Best Feature to Split
• To decide which feature is "best", we use impurity measures like:
1. Gini Impurity: Gini = 1 − Σ pi²
(where pi is the probability of class i)
2. Entropy (used in Information Gain): Entropy = −Σ pi · log2(pi)
• The goal is to choose the split that maximizes information gain (i.e., gives the purest children).
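As a concrete check of these formulas, here is a tiny sketch of both impurity measures in plain Python (the labels and function names are illustrative):

```python
import math
from collections import Counter

def gini(labels):
    # Gini = 1 - sum of squared class probabilities
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    # Entropy = -sum of p_i * log2(p_i)
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())

print(gini(["yes", "yes", "no", "no"]))     # 0.5 (maximally impure for two classes)
print(entropy(["yes", "yes", "no", "no"]))  # 1.0 (maximum entropy for two classes)
```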
Step 3: Recursively Split the Dataset
For each child node, repeat the process:
• Recalculate Gini or Entropy
• Choose the best split
• Continue until stopping conditions are met

Step 4: Define Stopping Criteria
• All records belong to one class
• No more features to split
• Max depth is reached
• Minimum number of samples in a node is reached

Step 5: Make Predictions
• For new data: start at the root → follow the decision rules → reach a leaf → return the value or class at the leaf.
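A short scikit-learn sketch that puts these steps together; the dataset and the max_depth and min_samples_leaf values are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# criterion selects the impurity measure; max_depth and min_samples_leaf act as stopping criteria
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, min_samples_leaf=5, random_state=1)
tree.fit(X_train, y_train)
print("Test accuracy:", tree.score(X_test, y_test))
print(export_text(tree))  # the learned decision rules as text
```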
PROS & CONS

ADVANTAGES
• Simple to Understand and Interpret
• No Need for Feature Scaling
• Handles Both Numeric and Categorical Features
• Can Model Nonlinear Relationships

LIMITATIONS
• Overfitting
• Unstable
• Biased toward features with more levels
• Greedy algorithm

RANDOM FOREST
INTRODUCTION

• Random Forest is a supervised ensemble learning algorithm used for both classification and regression.
• It builds multiple decision trees and combines their predictions to improve accuracy and avoid overfitting.
WHEN TO USE RANDOM FOREST

Use Case Example              Target Outcome
Fraud detection               Fraud / Not Fraud
Customer churn prediction     Churn / Not Churn
Medical diagnosis             Disease class
CORE IDEA

“Don’t trust one decision tree — take a vote from many.”

• A Random Forest creates many decision trees, each trained on a random subset of data and features, and then takes the majority vote (for classification) or average (for regression).
WORKING

Step 1: Create Bootstrapped Datasets


• From the original dataset, randomly sample (with replacement) to create
many subsets (called bootstrap samples).
• Each tree will be trained on a different bootstrapped dataset.
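A minimal sketch of bootstrap sampling with NumPy, assuming a toy dataset of 10 rows (the seed and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10
# Sample row indices with replacement: some rows repeat, others are left out ("out-of-bag")
bootstrap_indices = rng.integers(0, n_samples, size=n_samples)
print(bootstrap_indices)
```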
Step 2: Grow a Decision Tree on Each Sample
For each bootstrapped dataset:
• A decision tree is built.
• At each split in the tree, only a random subset of features is considered — not all features.
• This adds diversity and reduces correlation between trees.

Step 3: Make Predictions
• For classification:
  • Each tree gives a class prediction.
  • The forest chooses the class with the majority vote.
• For regression:
  • Each tree gives a numerical prediction.
  • The forest returns the average of all predictions.
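A brief scikit-learn sketch of the full procedure; the dataset and the hyperparameter values (n_estimators, max_features) are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators = number of trees; max_features limits the random feature subset per split;
# bootstrap=True resamples the training data with replacement for each tree
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", bootstrap=True, random_state=0)
forest.fit(X_train, y_train)
print("Test accuracy:", forest.score(X_test, y_test))
```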
Step 4: Aggregate the Output

• This combination of trees:

• Reduces overfitting

• Increases robustness

• Improves overall accuracy


CHARACTERISTICS

Feature                Description
Type                   Ensemble, supervised learning
Works For              Classification & regression
Base Model             Decision Tree
Reduces                Overfitting, variance
Increases              Accuracy, robustness
Preprocessing Needed   Very little
PROS & CONS

ADVANTAGES
• High Accuracy and Performance
• Reduces Overfitting from Decision Trees
• Works Well with Missing and Imbalanced Data
• Robust to Noise and Outliers

LIMITATIONS
• Slower Predictions
• Less Interpretable
• Large Memory Usage
• Overfitting (if too many trees without pruning)
