Chapter 4: Classification and Clustering Algorithms
Classification Algorithms
Classification algorithms are used when the output variable is
categorical, that is, when it takes one of a fixed set of classes such as
Yes/No, Male/Female, or True/False.
Popular classification algorithms
➔ Logistic Regression
➔ Decision Trees
➔ Naïve Bayes
➔ Support Vector Machines
➔ KNN
➔ Random Forest
● In a classification algorithm, a function f maps the input variable (x)
to a discrete output (y):
y = f(x), where y is the categorical output
The algorithm that performs classification on a dataset is known as a
classifier.
There are two types of classifiers:
Binary Classifier: If a classification problem has only two possible outcomes,
it is called a binary classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG,
etc.
Multi-class Classifier: If a classification problem has more than two possible outcomes,
it is called a multi-class classifier.
Examples: classification of types of crops, classification of types of music.
Types of ML Classification Algorithms
Classification algorithms can be divided into two main categories:
Linear Models
✔ Logistic Regression
✔ Support Vector Machines
Non-linear Models
✔ K-Nearest Neighbours
✔ Decision Tree
✔ Naïve Bayes
✔ Random Forest
Types of Logistic Regression
Based on the categories of the dependent variable, Logistic Regression can be classified into three
types:
Binomial: In binomial logistic regression, the dependent variable has only two
possible values, such as 0 or 1, Pass or Fail, etc.
Multinomial: In multinomial logistic regression, the dependent variable has 3 or more
possible unordered values, such as "cat", "dog", or
"sheep".
Ordinal: In ordinal logistic regression, the dependent variable has 3 or more
possible ordered values, such as "Low", "Medium", or "High".
ML Implementation of Logistic Regression
Data Pre-processing step
Fitting Logistic Regression to the Training set
Predicting the test result
Test accuracy of the result
Code samples
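A minimal sketch of the four implementation steps (pre-processing, fitting, predicting, testing accuracy) using scikit-learn. The data here is synthetic and the purchase rule is invented purely for illustration; it is an assumption, not the Kaggle user data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): age in years, salary in currency units.
rng = np.random.default_rng(0)
age = rng.integers(18, 60, size=400)
salary = rng.integers(15_000, 150_000, size=400)
X = np.column_stack([age, salary]).astype(float)
# Hypothetical rule: older and better-paid users tend to purchase.
y = ((age - 18) / 42 + (salary - 15_000) / 135_000 > 1.0).astype(int)

# Step 1: data pre-processing (split, then scale using training statistics only).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Step 2: fit logistic regression to the training set.
classifier = LogisticRegression()
classifier.fit(X_train, y_train)

# Step 3: predict the test result.
y_pred = classifier.predict(X_test)

# Step 4: test the accuracy of the result.
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
```

Fitting the scaler on the training split only keeps information from the test set out of the model.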
Example
A car manufacturer has recently launched a new car.
The company wants to predict whether a user will purchase the product. To do so, we
need to find out how Age and Estimated Salary relate to the purchase decision.
Source of data
https://www.kaggle.com/code/sandragracenelson/logistic-regression-on-user-data-csv/input
Data Pre-processing for Our Dataset
Data pre-processing techniques you should apply:
1. Learn about your data using pandas: df.shape, df.describe(), df.isnull().sum()
What if we want to include gender as an independent variable?
Replace male and female with the discrete values 0 and 1
Select the appropriate columns:
X = df.iloc[:, :] or X = df[[, , , ,]]
2. Age and salary are on very different scales, which may bias the model,
so apply feature scaling/normalization using StandardScaler or MinMaxScaler
3. Split the data, then train and test your algorithm
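The pre-processing steps above could look roughly like this in pandas and scikit-learn. The tiny DataFrame and its column names (Gender, Age, EstimatedSalary, Purchased) are made up for illustration and are assumptions about the real file's layout.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up sample rows; the column names are assumptions.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male", "Male", "Female"],
    "Age": [19, 35, 26, 27, 47, 58, 32, 45],
    "EstimatedSalary": [19000, 20000, 43000, 57000, 25000, 144000, 150000, 26000],
    "Purchased": [0, 0, 0, 0, 1, 1, 1, 1],
})

# 1. Learn about the data.
print(df.shape)
print(df.describe())
print(df.isnull().sum())

# Encode gender as the discrete values 0 and 1.
df["Gender"] = df["Gender"].map({"Male": 0, "Female": 1})

# Select the independent variables and the target.
X = df[["Gender", "Age", "EstimatedSalary"]]
y = df["Purchased"]

# 3. Split first, so the scaler never sees the test rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 2. Feature scaling: zero mean and unit variance per column.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

MinMaxScaler can be swapped in for StandardScaler with the same fit_transform/transform calls if values in the range 0 to 1 are preferred.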
K-Nearest Neighbors(KNN)
K-Nearest Neighbors (KNN) is a simple and versatile machine learning
algorithm used for both classification and regression tasks.
The fundamental idea behind KNN is to predict the label of a data point by
looking at its k nearest neighbors in the feature space.
Technique to classify
Given a new, unseen data point, find its k nearest neighbors in the training set based
on some distance metric (for example, Euclidean distance).
For classification: assign the majority class label among the k nearest neighbors to the
new data point.
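The classification technique above can be sketched in a few lines of NumPy. The function name knn_predict and the toy points are hypothetical, chosen only to illustrate the idea.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # Euclidean distance from x_new to every training point.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Majority class label among those neighbors.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy training set (assumption): two well-separated groups.
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                    [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])

print(knn_predict(X_train, y_train, np.array([2.0, 2.0])))
print(knn_predict(X_train, y_train, np.array([6.0, 6.5])))
```

In practice, scikit-learn's KNeighborsClassifier offers the same behaviour with faster neighbor search.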
Advantages
– Conceptually simple, easy to understand and explain
– Very flexible decision boundaries
– Almost no training phase: the training data is simply stored
Disadvantages
– It can be hard to find a good distance measure
– Irrelevant features and noise can be very detrimental
– Typically cannot handle more than a few dozen attributes
– Computational cost: prediction requires a lot of computation and memory
SVM Machine Learning algorithm
Support Vector Machine (SVM) is one of the most useful supervised ML algorithms.
It can be used for both classification and regression tasks.
Basic idea of support vector machines:
• SVM is a geometric model that views the input data as two sets of vectors in an
n-dimensional space.
• It constructs a separating hyperplane in that space, one which maximizes the margin
between the two data sets.
• A good separation is achieved by the hyperplane that has the largest distance to the
neighbouring data points of both classes.
• The vectors (points) that constrain the width of the margin are the support vectors.
• Support vectors are the data points that lie closest to the decision surface.
• An SVM analysis finds the line (or, in general, hyperplane) that is oriented so that the
margin between the support vectors is maximized.
In the figure above, Solution 2 is superior to Solution 1 because it has a larger margin.
• SVMs maximize the margin around the separating hyperplane.
• The decision function is fully specified by a subset of the training samples, the support vectors.
• In 2-D, the separator is a line.
• In 3-D, it is a plane.
• In more dimensions, it is called a hyperplane.
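To make the margin and the support vectors concrete, here is a small sketch with scikit-learn's SVC on six hand-picked 2-D points (the points are an assumption made for illustration). A very large C is used to approximate a hard margin, and the margin width is computed as 2 / ||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (assumption): three points per class.
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel gives a separating line in 2-D; large C approximates a hard margin.
model = SVC(kernel="linear", C=1e6)
model.fit(X, y)

w = model.coef_[0]          # normal vector of the separating line
b = model.intercept_[0]     # offset of the separating line
margin_width = 2.0 / np.linalg.norm(w)

print("Support vectors:")
print(model.support_vectors_)
print(f"Margin width: {margin_width:.3f}")
```

Only the points closest to the boundary end up in support_vectors_; moving any of the other points slightly would leave the fitted line unchanged.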
Basic idea of support vector machines:
– For linearly separable patterns, use a separating hyperplane
– A hyperplane is a linear decision surface that splits the space into two parts
– For data that is not linearly separable, transform the original data into a new space
where it becomes separable; the kernel function performs this mapping
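The effect of a kernel can be seen on data that no straight line can separate. The sketch below uses scikit-learn's make_circles (two concentric rings, a synthetic dataset chosen here as an assumption) and compares a linear kernel with an RBF kernel.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Same algorithm, two different kernels.
linear_acc = SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)
rbf_acc = SVC(kernel="rbf").fit(X_train, y_train).score(X_test, y_test)

print(f"Linear kernel accuracy: {linear_acc:.2f}")
print(f"RBF kernel accuracy:    {rbf_acc:.2f}")
```

The RBF kernel implicitly maps the points into a space where the inner ring and the outer ring can be split by a hyperplane, which the linear kernel cannot do here.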
SVM is important because it:
– Is robust to a very large number of variables and small samples
– Can learn both simple and highly complex classification models
– Employs sophisticated mathematical principles to avoid overfitting
– Can be used for both classification and regression tasks
– Is effective in cases of limited data
SVM Implementation in Python
Scenario
Worldwide, breast cancer is the most common type of cancer in women and the second
highest in terms of mortality rates. Breast cancer is diagnosed when an abnormal
lump is found (from self-examination or x-ray) or a tiny speck of calcium is seen (on an
x-ray).
After a suspicious lump is found, the doctor will conduct a diagnosis to determine whether it
is cancerous or not.
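One way to try this scenario is with the breast cancer dataset bundled with scikit-learn. Using that particular dataset is an assumption, since the scenario above does not name a data source; the sketch scales the features and fits an RBF-kernel SVC.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Bundled dataset (assumption): in this dataset 0 = malignant, 1 = benign.
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=0, stratify=data.target
)

# SVMs are sensitive to feature scale, so scale before fitting.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
```

Wrapping the scaler and the classifier in one pipeline ensures the scaling learned on the training data is reused unchanged at prediction time.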
Naïve Bayes ML Algorithm
The Naïve Bayes classifier is one of the simplest and most effective classification
algorithms. It helps build fast machine learning models that can
make quick predictions.
It is mainly used in text classification, which involves high-dimensional training
datasets.
Popular applications of the Naïve Bayes algorithm include spam filtering,
sentiment analysis, and article classification.
Example : Naïve Bayes ML
Problem: Using the given dataset, predict whether a person will play tennis
under the given conditions.
Step 1: Calculate the prior (class label) probability for Yes and No. Yes appears 9 times and
No appears 5 times out of 14 records, so P(Yes) = 9/14 and P(No) = 5/14.
Step 2: Calculate the conditional probability of each attribute/predictor (Outlook,
Temperature, Humidity, Windy) given each class.
Step 3: Apply the Naïve Bayes formula to classify the new instance: for each class, multiply the
prior by the conditional probabilities of all features, compare the Yes and No values, and finally normalize them.
Finally, we can conclude that with the given features the person will not play tennis.
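The three steps can be traced in code. The sketch below assumes the classic 14-row play-tennis table, whose 9 Yes / 5 No split matches the counts in Step 1, and assumes the new instance is (Sunny, Cool, High, windy); both are assumptions, since the slide's own table and query are not reproduced in the text. No smoothing is applied.

```python
from collections import Counter
from fractions import Fraction

# Assumed data: classic play-tennis table.
# Columns: outlook, temperature, humidity, windy, play
data = [
    ("Sunny", "Hot", "High", False, "No"),
    ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Rain", "Mild", "High", False, "Yes"),
    ("Rain", "Cool", "Normal", False, "Yes"),
    ("Rain", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Rain", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Rain", "Mild", "High", True, "No"),
]


def naive_bayes_scores(rows, instance):
    """Return prior * product of conditional probabilities for each class."""
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label, count in class_counts.items():
        # Step 1: prior probability of the class.
        score = Fraction(count, len(rows))
        # Step 2: conditional probability of each feature value given the class.
        for i, value in enumerate(instance):
            matches = sum(1 for row in rows if row[-1] == label and row[i] == value)
            score *= Fraction(matches, count)
        scores[label] = score
    return scores


# Step 3: score the new instance, normalize, and compare.
new_instance = ("Sunny", "Cool", "High", True)  # assumed query
scores = naive_bayes_scores(data, new_instance)
total = sum(scores.values())
posteriors = {label: float(score / total) for label, score in scores.items()}
prediction = max(posteriors, key=posteriors.get)

print(posteriors)
print("Prediction:", prediction)
```

Under these assumptions the No class receives the larger normalized probability, which agrees with the conclusion stated above.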
Based on the following data, classify the new species:
From the result below, the probability that the new instance belongs to class H is higher than
the probability for class M, so the new instance is classified as H.
Advantages
– Simple
– Supports incremental learning
– Naturally a probability estimator
– Easily handles missing values
Disadvantages / Weaknesses
– Relies on the assumption that attributes are independent
– Works most naturally with categorical/discrete attributes
– Sensitive to attribute values never seen with a class during training (zero-frequency problem)
Example: Naïve Bayes ML Python Implementation
from sklearn.naive_bayes import BernoulliNB
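A minimal sketch of how BernoulliNB might be used. BernoulliNB expects binary (0/1) features; the toy spam data and the meaning given to each column are made up for illustration.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical binary features per message: [contains "free", contains "win", contains "meeting"]
X = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
])
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = spam, 0 = not spam

# alpha=1.0 is Laplace smoothing, which avoids zero probabilities.
classifier = BernoulliNB(alpha=1.0)
classifier.fit(X, y)

new_messages = np.array([[1, 1, 0], [0, 0, 1]])
predictions = classifier.predict(new_messages)
probabilities = classifier.predict_proba(new_messages)

print(predictions)
print(probabilities)
```

For count features (such as word frequencies) MultinomialNB is the usual choice, and GaussianNB for continuous features.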
Decision Tree
Solving a classification problem using a DT is a two-step process:
• Decision Tree Induction: construct a DT using the training data
• Decision Tree Deduction: apply the constructed DT to classify new (test) instances
● To build a tree, we use the CART algorithm, which stands for Classification and
Regression Tree algorithm.
● Pruning: the process of removing unwanted branches from the tree.
● Entropy measures the randomness, or disorder, of the information being
processed in machine learning.
● Every piece of information has a specific value and can be used to draw conclusions.
● The higher the entropy, the more difficult it is to draw any conclusion from that piece of information.
● Consider the case where all observations belong to
the same class; the entropy is then 0.
● When the entropy is 0, the dataset has no
impurity.
● A dataset with zero impurity is not useful for learning, because there is
nothing left to separate. Conversely, for a two-class problem, an entropy of 1
(an even split between the classes) means maximum impurity, so this kind of dataset
is good for learning.
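The two boundary cases can be checked with a short entropy function. This sketch uses base-2 logarithms, so a two-class dataset has entropy between 0 and 1; the label lists are made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return sum(
        -(count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


pure = ["Yes", "Yes", "Yes", "Yes"]   # all observations in one class
balanced = ["Yes", "No", "Yes", "No"]  # evenly split between two classes
mixed = ["Yes"] * 9 + ["No"] * 5       # 9 Yes and 5 No

print(entropy(pure))
print(entropy(balanced))
print(round(entropy(mixed), 3))
```

The pure list gives an entropy of 0 and the evenly split list gives 1, matching the two cases described above; the 9/5 list falls in between.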
Attribute Selection Measures (ASM)
In a DT, the main issue is how to select the best attribute for the root node and
for the sub-nodes.
To solve this problem, we use a technique called Attribute Selection
Measure (ASM).
Information Gain:
✔ Measures the change in entropy after the dataset is segmented based on an attribute.
✔ According to the value of information gain, we split the node and build the decision tree.
✔ The DT algorithm always tries to maximize the value of information gain, and the
node/attribute having the highest information gain is split first.
Gini Index:
✔ A measure of impurity used while creating a decision tree in CART (which uses the Gini index for
splitting).
✔ An attribute with a low Gini index should be preferred over one with a high Gini index.
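Both measures are easy to compute by hand. In this sketch, Gini is taken as 1 minus the sum of squared class proportions, and information gain as the parent entropy minus the size-weighted entropy of the child nodes. The two candidate splits (attribute A and attribute B) are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["Yes"] * 5 + ["No"] * 5

# Hypothetical attribute A separates the classes perfectly.
split_a = [["Yes"] * 5, ["No"] * 5]
# Hypothetical attribute B leaves both child nodes mixed.
split_b = [["Yes", "Yes", "Yes", "No", "No"], ["Yes", "Yes", "No", "No", "No"]]

gain_a = information_gain(parent, split_a)
gain_b = information_gain(parent, split_b)

print(f"Gain A: {gain_a:.3f}, Gain B: {gain_b:.3f}")
print(f"Gini of parent: {gini(parent):.2f}, Gini of a pure node: {gini(split_a[0]):.2f}")
```

Attribute A has the higher information gain, so it would be chosen for the split; its child nodes also have the lowest possible Gini index.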
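Decision Tree: Python Implementation
Step: Import the library and train the model
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier()
classifier.fit(X_train, y_train)  # assumes X_train, y_train come from your own train/test split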
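A complete, runnable version of this step might look like the sketch below. The Iris dataset bundled with scikit-learn is used only as a stand-in, which is an assumption; any labelled dataset would do.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset (assumption): Iris, bundled with scikit-learn.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# criterion="gini" is the default; "entropy" selects information gain instead.
classifier = DecisionTreeClassifier(criterion="gini", random_state=0)
classifier.fit(X_train, y_train)

accuracy = classifier.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
print(f"Tree depth: {classifier.get_depth()}")
```

Limiting max_depth or setting ccp_alpha are two ways to prune the tree if it overfits.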
Key Term
Learn more about how to calculate the Gini index and information gain:
https://www.youtube.com/watch?v=wefc_36d5mU&ab_channel=MaheshHuddar
Evaluating a Classification Model
The confusion matrix presents the prediction results in summarized form: it gives the
total number of correct predictions and incorrect predictions.
The matrix looks like the table below:
Use a Confusion Matrix
The confusion matrix is a table that describes the
performance of the model.
It is also known as the error matrix.
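A short sketch of building a confusion matrix with scikit-learn. The label vectors are made up for illustration. In scikit-learn's layout, rows are the actual classes and columns are the predicted classes, ordered by label value.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
# For binary labels 0/1 the flattened order is TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()

print(cm)
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```

The diagonal cells (TN and TP) count correct predictions; the off-diagonal cells (FP and FN) count the errors.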
Precision, Recall, and F1-Score
These metrics are particularly useful in binary or multiclass classification.
Precision: The ratio of correctly predicted positive observations to the total
predicted positives.
Recall: The ratio of correctly predicted positive observations to all actual
positives.
F1-Score: The harmonic mean of precision and recall.
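In terms of confusion-matrix counts, precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is 2 · precision · recall / (precision + recall). The sketch below computes all three with scikit-learn on made-up labels.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Made-up labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Here TP = 3, FP = 1, FN = 1.
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of the two

print(f"Precision: {precision:.2f}")
print(f"Recall:    {recall:.2f}")
print(f"F1-score:  {f1:.2f}")
```

For multiclass problems these functions take an average argument (for example "macro" or "weighted") to combine the per-class scores.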
Evaluating a Regression Model
Regression Metrics:
Mean Absolute Error (MAE): The average of the absolute differences between
predicted and actual values.
Mean Squared Error (MSE): The average of the squared differences between
predicted and actual values.
Root Mean Squared Error (RMSE): The square root of the MSE, which puts the error
back in the same units as the target variable.
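The three regression metrics follow directly from their definitions. A minimal NumPy sketch on made-up values:

```python
import numpy as np

# Made-up actual and predicted values.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.0, 5.0, 4.0, 8.0])

errors = y_pred - y_true

mae = np.mean(np.abs(errors))   # average absolute difference
mse = np.mean(errors ** 2)      # average squared difference
rmse = np.sqrt(mse)             # square root of the MSE

print(f"MAE:  {mae:.3f}")
print(f"MSE:  {mse:.3f}")
print(f"RMSE: {rmse:.3f}")
```

Because MSE squares each error, the single error of 2 contributes more to MSE and RMSE than it does to MAE. scikit-learn's mean_absolute_error and mean_squared_error compute the same quantities.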