Classification in Machine Learning

Classification is a supervised machine learning task that predicts categorical outputs based on input features, with real-life applications in medical diagnosis, finance, and marketing. Common algorithms include Logistic Regression, K-Nearest Neighbors, Decision Trees, Random Forest, Support Vector Machines, Naive Bayes, and Neural Networks. Evaluation metrics for classification performance include accuracy, precision, recall, F1-score, and confusion matrix, while common pitfalls involve imbalanced classes, overfitting, and bias in data.


What Is Classification?

Classification is a supervised machine learning task used to predict a categorical output variable (class/label) based on one or more input features.

It answers: "Which category does this input belong to?"

Examples:
Email → Spam or Not Spam
Transaction → Fraudulent or Not
Image → Cat, Dog, or Bird
Real-life Examples

Medical Diagnosis: Predicting if a tumor is benign or malignant
Finance: Detecting fraudulent credit card transactions
Marketing: Predicting if a user will click an ad
Agriculture: Classifying plants based on leaf shape or diseases
Most Common Classification Algorithms

Logistic Regression (for binary classification)
K-Nearest Neighbors (KNN)
Decision Trees
Random Forest
Naive Bayes
Support Vector Machines (SVM)
Neural Networks
Logistic Regression

Logistic Regression is a linear model for binary classification. It estimates the probability that a given input belongs to a specific class using a sigmoid (logistic) function.

Mathematical Equation:
P(y = 1 | x) = σ(w·x + b), where σ(z) = 1 / (1 + e^(−z))

Strengths:
Interpretable and fast to train
Works well for linearly separable data
Outputs probabilities, useful in decision-making
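The sigmoid formula above can be sketched in a few lines of plain Python; the weights and bias here are made-up illustrative values, not a trained model.

```python
import math

def sigmoid(z):
    # Logistic function: maps any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    # P(y = 1 | x): dot product of weights and features, plus bias, through sigmoid
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

# Toy example with hypothetical parameters
p = predict_proba([1.0, 2.0], w=[0.5, -0.25], b=0.0)
label = 1 if p >= 0.5 else 0
```

Because the output is a probability, the 0.5 threshold can be moved (e.g. lowered for a medical screening test where missing a positive is costly).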
K-Nearest Neighbors (KNN)

KNN is a non-parametric, lazy learning algorithm. It classifies new data points based on the majority class among the K closest training examples.

Mathematical Equation:
Euclidean distance: d(x, x′) = √( Σᵢ (xᵢ − x′ᵢ)² )

Strengths:
Simple to implement
No training phase (the model just stores the data)
Works well with non-linear decision boundaries
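A minimal sketch of the idea: compute the Euclidean distance to every stored point, then take a majority vote over the K nearest. The tiny dataset is invented for illustration.

```python
import math
from collections import Counter

def knn_classify(query, points, labels, k=3):
    # Distance from the query to every stored training point
    dists = [(math.dist(query, p), y) for p, y in zip(points, labels)]
    dists.sort(key=lambda t: t[0])
    # Majority vote among the k nearest neighbours
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Two small clusters: class "A" near the origin, class "B" near (5, 5)
X = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
y = ["A", "A", "A", "B", "B", "B"]
pred = knn_classify((0.5, 0.5), X, y, k=3)
```

Note that all the work happens at prediction time, which is why KNN is called "lazy": there is nothing to fit.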
Decision Trees

A Decision Tree is a flowchart-like structure that splits data into branches based on feature thresholds. Each path from root to leaf represents a classification rule.

Mathematical Equation:
Gini impurity: G = 1 − Σₖ pₖ²   (splits are chosen to minimize impurity)

Strengths:
Easy to interpret
Handles both numerical and categorical data
Captures non-linear patterns
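The split criterion can be sketched directly from the Gini formula: a pure node scores 0, an evenly mixed node scores higher, and a tree picks the threshold whose weighted child impurity is lowest.

```python
def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_gini(left, right):
    # Weighted impurity of a candidate split into two child nodes
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

pure = gini(["cat", "cat", "cat"])          # one class only
mixed = gini(["cat", "dog", "cat", "dog"])  # evenly mixed
perfect_split = split_gini(["cat", "cat"], ["dog", "dog"])
```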
Random Forest

Random Forest is an ensemble of decision trees. Each tree is trained on a random subset of the data and features. The final prediction is made by majority vote (classification).

Strengths:
Reduces overfitting
High accuracy
Robust to noise and outliers

Used in:
Fraud detection
Risk analysis
Bioinformatics
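The two ingredients above, bootstrap sampling and majority voting, can be sketched without any tree code at all; the vote list stands in for predictions from hypothetical already-trained trees.

```python
import random
from collections import Counter

def bootstrap_sample(X, y, rng):
    # Sample n points with replacement ("bagging"); each tree sees a different sample
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]
    return [X[i] for i in idx], [y[i] for i in idx]

def forest_predict(tree_predictions):
    # Final class = majority vote across the individual trees
    return Counter(tree_predictions).most_common(1)[0][0]

# Votes from five hypothetical trees in a spam classifier
votes = ["spam", "spam", "ham", "spam", "ham"]
final = forest_predict(votes)
```

Averaging many decorrelated trees is what reduces variance relative to a single deep tree.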
Support Vector Machine (SVM)

SVM finds the optimal hyperplane that separates data into classes with the maximum margin. It can be extended to non-linear problems using kernel functions.

Mathematical Equation:
f(x) = sign(w·x + b), with w chosen to maximize the margin 2 / ‖w‖

Strengths:
Works well in high-dimensional space
Effective when there's a clear margin of separation
Can handle non-linear data via RBF/polynomial kernels
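Training an SVM is an optimization problem beyond a few lines, but the decision function itself is simple: which side of the hyperplane does a point fall on? The parameters below are hypothetical, as if already learned.

```python
def svm_decision(x, w, b):
    # Signed score: positive on one side of the hyperplane, negative on the other
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def svm_classify(x, w, b):
    return 1 if svm_decision(x, w, b) >= 0 else -1

# Hypothetical trained hyperplane: x1 + x2 = 3
w, b = [1.0, 1.0], -3.0
pos = svm_classify([2.0, 2.0], w, b)   # above the line
neg = svm_classify([0.5, 0.5], w, b)   # below the line
```

Kernels (RBF, polynomial) replace the dot product w·x so the same idea works for curved boundaries.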
Naive Bayes

A probabilistic classifier based on Bayes' Theorem, assuming feature independence. It's "naive" because it ignores feature correlation.

Mathematical Equation:
P(y | x₁, …, xₙ) ∝ P(y) · Πᵢ P(xᵢ | y)

Strengths:
Fast and efficient
Works well with high-dimensional data
Performs well with text data
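The product above is usually computed in log space to avoid numeric underflow. Here is a minimal sketch with made-up probabilities for a two-word spam example; the prior and likelihood values are assumptions, not estimates from data.

```python
import math

def naive_bayes_log_score(prior, likelihoods):
    # log P(y) + sum of log P(x_i | y); the class with the highest score wins
    return math.log(prior) + sum(math.log(p) for p in likelihoods)

# Hypothetical probabilities for an email containing two particular words
spam_score = naive_bayes_log_score(0.4, [0.8, 0.6])  # P(spam), P(word_i | spam)
ham_score = naive_bayes_log_score(0.6, [0.1, 0.2])   # P(ham),  P(word_i | ham)
prediction = "spam" if spam_score > ham_score else "ham"
```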
Neural Networks (for Classification)

Inspired by the brain, neural networks consist of layers of artificial neurons. For classification, they output a probability distribution using Softmax (for multi-class) or Sigmoid (for binary).

Mathematical Equation:
softmax(z)ᵢ = e^(zᵢ) / Σⱼ e^(zⱼ)

Strengths:
Powerful for complex non-linear problems
Scalable to large datasets
Can learn from raw data (e.g. pixels, text)
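The softmax output layer can be sketched directly from the formula; the logits below are arbitrary example scores, standing in for the raw outputs of a network's final layer.

```python
import math

def softmax(z):
    # Subtract the max for numerical stability, then normalise exponentials
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]        # example raw scores for 3 classes
probs = softmax(logits)         # a valid probability distribution
predicted_class = probs.index(max(probs))
```

The predicted class is simply the index with the highest probability (argmax).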
Loss Functions for Classification

Cross-Entropy Loss (the standard choice): L = −Σₖ yₖ log(ŷₖ), which heavily penalizes confident wrong predictions.

Evaluation Metrics

Accuracy: Correct predictions / Total
Precision: True Positives / (TP + FP)
Recall: True Positives / (TP + FN)
F1-score: Harmonic mean of Precision and Recall
Confusion Matrix: Table showing TP, FP, FN, TN
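All four metrics follow directly from the confusion-matrix counts; the counts below are made up for illustration.

```python
def metrics(tp, fp, fn, tn):
    # Derive the standard metrics from confusion-matrix counts
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Example counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives
acc, prec, rec, f1 = metrics(tp=40, fp=10, fn=20, tn=30)
```

Note how accuracy (0.7) can look fine while recall (≈0.67) reveals that a third of the positives were missed; this gap is exactly why accuracy alone misleads on imbalanced data.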
Common Pitfalls

Imbalanced Classes (e.g., fraud detection)
Solution: SMOTE, class weighting
Overfitting
Solution: regularization, pruning
Bias in Data
Must ensure fair, diverse data sources
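Class weighting can be sketched with the common inverse-frequency scheme (the same formula as scikit-learn's "balanced" mode: n / (k · countₖ)), so rare classes contribute more to the loss. The labels here are a toy imbalanced set.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    # Weight each class inversely to its frequency: rare classes count more
    counts = Counter(labels)
    n = len(labels)
    k = len(counts)
    return {c: n / (k * counts[c]) for c in counts}

# Imbalanced toy labels: 8 legitimate transactions vs 2 fraudulent ones
y = ["legit"] * 8 + ["fraud"] * 2
weights = inverse_frequency_weights(y)
```

These weights would then multiply each sample's contribution to the training loss, pushing the model to take the minority class seriously.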

Bias-Variance Tradeoff
Underfitting → High bias, low variance
Overfitting → Low bias, high variance
Balance = Better generalization
Learning Never Stops

Classification is just the beginning!

I'm sharing simple breakdowns of core ML concepts step by step.

🔔 Follow me to stay updated
