CLASSIFICATION ALGORITHMS
PRESENTED BY
DVEIN INNOVATIONS
LOGISTIC REGRESSION
INTRODUCTION
• Despite its name, Logistic Regression is a classification algorithm, not a
regression algorithm.
• It is used when the target variable is categorical (e.g., Yes/No, 0/1, Churn/Not Churn).
• Logistic Regression predicts the probability that an input belongs to a
particular class using a sigmoid function (also called logistic function).
WHEN TO USE LOGISTIC REGRESSION?
Use Case Example → Target Outcome
• Predicting customer churn → Yes or No
• Email spam detection → Spam or Not Spam
• Medical diagnosis → Disease or No Disease
WORKING
Step 1: Linear Combination of Inputs
• It starts like Linear Regression, with a weighted sum of the inputs:
  z = w1·x1 + w2·x2 + … + wn·xn + b
• xi: input features
• wi: learned weights
• b: bias term
Step 2: Apply the Sigmoid Function
• This linear output z is passed into the sigmoid function to squash the value between 0 and 1:
  σ(z) = 1 / (1 + e^(−z))
This gives a probability value:
• Closer to 0 → class 0
• Closer to 1 → class 1
Step 3: Make Prediction
• Using a threshold (typically 0.5), the predicted probability is converted into a class label:
• If probability ≥ 0.5 → predict class 1
• Else → predict class 0
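A minimal NumPy sketch of Steps 1–3; the feature values, weights, and bias below are made up purely for illustration:

    import numpy as np

    def sigmoid(z):
        # Squash any real number into the (0, 1) range
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([1.5, -0.3])       # input features x1, x2 (made up)
    w = np.array([0.8, 2.0])        # learned weights w1, w2 (made up)
    b = -0.5                        # bias term (made up)

    z = np.dot(w, x) + b            # Step 1: linear combination
    p = sigmoid(z)                  # Step 2: probability between 0 and 1
    y_pred = 1 if p >= 0.5 else 0   # Step 3: threshold at 0.5

    print(f"z = {z:.3f}, p = {p:.3f}, predicted class = {y_pred}")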
Step 4: Model Training (How it Learns)
• Logistic Regression uses a loss function to measure how good its predictions are.
• The most common is Binary Cross-Entropy Loss:
  L = −[ y·log(ŷ) + (1 − y)·log(1 − ŷ) ]
Where:
• y: actual label (0 or 1)
• ŷ: predicted probability
• This loss is minimized using Gradient Descent, adjusting weights to reduce error over
time.
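A small sketch of Step 4: minimizing the binary cross-entropy loss with gradient descent. The toy dataset, learning rate, and iteration count are made up for illustration:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Toy data: one feature, binary labels (made up)
    X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
    y = np.array([0, 0, 0, 1, 1, 1])

    w, b, lr = np.zeros(X.shape[1]), 0.0, 0.1

    for _ in range(2000):
        p = sigmoid(X @ w + b)            # Steps 1-2: predicted probabilities
        grad_w = X.T @ (p - y) / len(y)   # gradient of the loss w.r.t. the weights
        grad_b = np.mean(p - y)           # gradient of the loss w.r.t. the bias
        w -= lr * grad_w                  # gradient descent update
        b -= lr * grad_b

    print("learned weights:", w, "bias:", b)
    print("P(class 1 | x = 2.0):", sigmoid(np.array([2.0]) @ w + b))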
PRO’S & CON’S
ADVANTAGES LIMITATIONS
• Simple and interpretable • Assumes linear relationship between
• Fast to train features and the log-odds of the outcome
• Doesn’t work well
• Good for baseline models
with nonlinear patterns unless features
• Works well with linearly separable data are transformed
• Sensitive to outliers and correlated inputs
K-NEAREST NEIGHBORS
INTRODUCTION
• KNN (K-Nearest Neighbors) is a supervised learning algorithm used
for classification and regression (mostly classification).
• It classifies a new data point based on the majority class of its K closest
neighbors in the training data.
CORE IDEA
“Tell me who your neighbors are, and I’ll tell you who you are.”
• For a new data point, KNN:
• Looks at the K nearest data points (neighbors)
• Finds out their labels
• Predicts the most frequent label among them
WORKING
Step 1: Choose the value of K
• K = number of neighbors to consider (typically an odd number such as 3, 5, or 7, to avoid tied votes)
• Small K → sensitive to noise
• Large K → smoother decision boundary, but may ignore local structure
Step 2: Measure the Distance
• KNN computes distance between the test point and all training points.
Common distance measures:
• Euclidean Distance:
  d(a, b) = √( (a1 − b1)² + (a2 − b2)² + … + (an − bn)² )
• Others: Manhattan, Minkowski, Cosine
Step 3: Find the K Nearest Neighbors
• Sort all training points by distance to the test point
• Select the top K closest ones
Step 4: Vote for the Majority Class
• For classification: The class most represented among the K neighbors is selected.
• For regression: The average value of the K neighbors is taken.
EXAMPLE
Imagine this situation:
• You're classifying a flower based on petal length and width
• K=3
• Among 3 closest flowers:
• 2 are “Setosa”
• 1 is “Versicolor”
• Prediction: Setosa (majority vote)
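A short from-scratch sketch of the whole procedure (distance, nearest neighbors, majority vote). The training points loosely mimic the petal example above and are made up:

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x_new, k=3):
        # Step 2: Euclidean distance from the new point to every training point
        distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
        # Step 3: indices of the k closest training points
        nearest = np.argsort(distances)[:k]
        # Step 4: majority vote among their labels
        return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

    # Hypothetical training data: [petal length, petal width] in cm
    X_train = np.array([[1.4, 0.2], [1.3, 0.2], [1.5, 0.3], [4.7, 1.4], [4.5, 1.5]])
    y_train = ["Setosa", "Setosa", "Setosa", "Versicolor", "Versicolor"]

    print(knn_predict(X_train, y_train, np.array([1.6, 0.4]), k=3))  # -> Setosa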
PRO’S & CON’S
ADVANTAGES LIMITATIONS
• Simple and Intuitive • Slow prediction
• No Training Phase (Lazy Learner) • Curse of dimensionality
• Naturally Handles Multiclass • Sensitive to scale
Classification
• Sensitive to noisy data
• Non-Parametric Algorithm
• Adapts to New Data Easily
SUPPORT VECTOR MACHINE
INTRODUCTION
• Support Vector Machine (SVM) is a supervised machine learning
algorithm used for both classification and regression (mostly
classification).
• SVM finds the best boundary (hyperplane) that separates classes with the
maximum margin.
WHEN TO USE SVM
Use Case Example → Target Outcome
• Face Recognition → Person A or B
• Document Classification → Spam or Not Spam
• Cancer Detection → Malignant or Benign
CORE IDEA
• SVM tries to find a hyperplane (a line in 2D, a plane in 3D, or a surface in
higher dimensions) that best separates the classes in the dataset.
• Not just any separation — it wants the one with the widest margin between
the two classes.
WORKING
Step 1: Find a Decision Boundary (Hyperplane)
• A hyperplane is a decision boundary (a line in 2D, a plane or surface in higher dimensions) that divides the dataset into different classes.
• There are many possible hyperplanes, but SVM picks the one with the maximum margin.
Step 2: Maximize the Margin
• Margin = Distance between the hyperplane and the nearest data points
from each class.
• The points closest to the hyperplane are called Support Vectors.
• SVM chooses the hyperplane that maximizes this margin.
• Larger margin = better generalization to new data
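• In the standard formulation, the hyperplane is the set of points where w·x + b = 0, the support vectors lie on w·x + b = +1 and w·x + b = −1, and the margin width is 2 / ||w||, so maximizing the margin is equivalent to minimizing ||w||.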
Step 3: Handle Nonlinear Separation (Using Kernels)
• Not all data is linearly separable (i.e., can't be split by a straight line).
SVM solves this using a Kernel Trick:
• Maps data to a higher dimension where it is linearly separable.
Common kernels:
• Linear
• Polynomial
• Radial Basis Function (RBF) or Gaussian
Step 4: Regularization Parameter (C)
• SVM uses a parameter C to control the trade-off:
• High C → tries to classify everything correctly → low bias, high variance
(overfit)
• Low C → allows some misclassifications → high bias, low variance
(better generalization)
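A minimal scikit-learn sketch tying the steps together; the synthetic dataset and the kernel/C values are illustrative, not tuned:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=5, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # RBF kernel handles nonlinear boundaries; C controls the margin/error trade-off
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
    model.fit(X_train, y_train)
    print("test accuracy:", model.score(X_test, y_test))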
CHARACTERISTICS
• Type: Supervised, binary or multiclass
• Nature: Margin-based classifier
• Strength: Based on support vectors only
• Decision: Works well even in high-dimensional spaces
• Limitations: Slower on large datasets; sensitive to noise and parameter tuning
DECISION TREE
INTRODUCTION
• A Decision Tree is a supervised learning algorithm used for classification
and regression.
• It models decisions as a tree-like structure where each internal
node represents a test on a feature, branches represent outcomes of the test,
and leaf nodes represent class labels (for classification) or values (for
regression).
WHEN TO USE DECISION TREES
Use Case Example → Target Outcome
• Loan approval → Approve / Reject
• Disease diagnosis → Illness categories
• Customer segmentation → Group A / B / C
CORE IDEA
• Decision Trees split the dataset into subsets based on feature values that
best differentiate the target variable. This process continues recursively, forming
a tree where:
• Each node → tests a condition
• Each branch → outcome of a decision
• Each leaf → final prediction
WORKING
Step 1: Start at the Root Node
• Begin with the entire dataset.
• Pick the best feature to split the data.
Step 2: Choose the Best Feature to Split
• To decide which feature is "best", we use impurity measures like:
1. Gini Impurity:
   Gini = 1 − Σ pi²
   (where pi is the probability of class i in the node)
2. Entropy (used in Information Gain):
   Entropy = − Σ pi · log2(pi)
• The goal is to choose the split that maximizes information gain (i.e., gives the purest child nodes).
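A small sketch computing both impurity measures for a hypothetical node containing three "Yes" labels and one "No" label:

    import numpy as np

    def gini(labels):
        # Gini = 1 - sum(pi^2)
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def entropy(labels):
        # Entropy = -sum(pi * log2(pi))
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    node = ["Yes", "Yes", "Yes", "No"]
    print(gini(node))     # 1 - (0.75^2 + 0.25^2) = 0.375
    print(entropy(node))  # -(0.75*log2(0.75) + 0.25*log2(0.25)) ≈ 0.811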
Step 3: Recursively Split the Dataset
For each child node, repeat the process:
• Recalculate Gini or Entropy
• Choose best split
• Continue until stopping conditions are met
Step 4: Define Stopping Criteria
• All records belong to one class
• No more features to split
• Max depth is reached
• Minimum number of samples in a node
Step 5: Make Predictions
• For new data: start at root → follow decision rules → reach a leaf → return the value or
class at the leaf.
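A minimal scikit-learn sketch of training a tree and printing its learned rules; the Iris dataset and the max_depth value are illustrative choices:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # criterion can be "gini" or "entropy"; max_depth acts as a stopping criterion
    tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
    tree.fit(X_train, y_train)

    print("test accuracy:", tree.score(X_test, y_test))
    print(export_text(tree))  # readable view of the learned decision rules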
PRO’S & CON’S
ADVANTAGES LIMITATIONS
• Simple to Understand and Interpret • Overfitting
• No Need for Feature Scaling • Unstable
• Handles Both Numeric and Categorical • Biased toward features with more levels
Features • Greedy algorithm
• Can Model Nonlinear Relationships
RANDOM FOREST
INTRODUCTION
• Random Forest is a supervised ensemble learning algorithm used for
both classification and regression.
• It builds multiple decision trees and combines their predictions to improve
accuracy and avoid overfitting.
WHEN TO USE RANDOM FOREST
Use Case Example → Target Outcome
• Fraud detection → Fraud / Not Fraud
• Customer churn prediction → Churn / Not Churn
• Medical diagnosis → Disease class
CORE IDEA
“Don’t trust one decision tree — take a vote from many.”
• A Random Forest creates many decision trees, each trained on a random
subset of data and features, and then takes the majority vote (for
classification) or average (for regression).
WORKING
Step 1: Create Bootstrapped Datasets
• From the original dataset, randomly sample (with replacement) to create
many subsets (called bootstrap samples).
• Each tree will be trained on a different bootstrapped dataset.
Step 2: Grow a Decision Tree on Each Sample
For each bootstrapped dataset:
• A decision tree is built.
• At each split in the tree, only a random subset of features is considered
— not all features.
• This adds diversity and reduces correlation between trees.
Step 3: Make Predictions
• For classification:
• Each tree gives a class prediction.
• The forest chooses the class with the majority vote.
• For regression:
• Each tree gives a numerical prediction.
• The forest returns the average of all predictions.
Step 4: Aggregate the Output
• This combination of trees:
• Reduces overfitting
• Increases robustness
• Improves overall accuracy
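A minimal scikit-learn sketch showing the knobs described above; the synthetic dataset and the n_estimators/max_features values are illustrative:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, n_features=10, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # n_estimators = number of trees; max_features = random feature subset per split;
    # bootstrap=True trains each tree on a bootstrapped sample of the data
    forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                    bootstrap=True, random_state=42)
    forest.fit(X_train, y_train)
    print("test accuracy:", forest.score(X_test, y_test))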
CHARACTERISTICS
• Type: Ensemble, supervised learning
• Works for: Classification & regression
• Base model: Decision Tree
• Reduces: Overfitting, variance
• Increases: Accuracy, robustness
• Preprocessing needed: Very little
PRO’S & CON’S
LIMITATIONS ADVANTAGES
• Slower Predictions • High Accuracy and Performance
• Less Interpretable • Reduces Overfitting from
• Large Memory Usage Decision Trees
• Works Well with Missing and
• Overfitting (if too many trees
Imbalanced Data
without pruning)
• Robust to Noise and Outliers