Om Namo Bhagavate Vasudevaya
Introduction to Classification:
Lazy Learners - Nearest Neighbor Classifiers
And some insights into classifier terminology
Dr. Binoy B Nair
Note
◼ Only to be used as a supplementary learning
material. Not for any commercial use.
◼ These slides have been substantially sourced from
Han and Kamber with some modifications made
for easier understanding.
◼ All rights belong to the original authors.
Nearest Neighbor Classifiers
◼ Basic idea:
◼ If it walks like a duck, quacks like a duck, then
it’s probably a duck
(Diagram: the test record is compared with the training records by computing distances, and the k "nearest" records are chosen.)
Nearest Neighbor Classifier
◼ This is amongst the simplest of all classification algorithms
in supervised learning.
◼ This is a method of classifying patterns based on the class
label of the closest training patterns in the feature space.
◼ These are non-parametric methods where no model is
fitted using the training patterns.
◼ The accuracy of nearest neighbor classifiers is good: with enough training data, the
1-NN rule is guaranteed to yield an error rate no worse than twice the Bayes error
rate, which is the optimal error rate.
Nearest Neighbor Classifiers
◼ There is no training time required for this classifier. In
other words, there is no design time for training the
classifier.
◼ Every time a test pattern is to be classified, it has to be
compared with all the training patterns, to find the closest
pattern.
◼ The classification time can be large if there are many training patterns or if the
dimensionality of the patterns is high.
Example: 1-NN classifier
Q. Using a 1-NN classifier, find what the fifth customer is likely to buy from the given dataset.

Customer   Age   Income    Purchased Product
1          45    46000     Book
2          39    100000    Book
3          35    38000     DVD
4          69    150000    DVD
5          58    51000     ???
Example: 1-NN classifier
Solving:
Determine the distance between the unknown sample (sample no. 5) and the other four known
samples using the Euclidean distance after normalization:

$d = \sqrt{\left(\frac{58 - Age}{69 - 35}\right)^2 + \left(\frac{51000 - Income}{150000 - 38000}\right)^2}$
Customer   Age   Income    Norm. Age   Norm. Income   Norm. Distance   Un-norm. Distance
1          45    46000     1.3         0.4            0.41             5000.02
2          39    100000    1.1         0.9            0.72             49000
3          35    38000     1           0.3            0.73             13000.02
4          69    150000    2           1.3            0.85             99000
5          58    51000     1.7         0.5            -                -
Based on the nearest-neighbor computation above, customer 5 is closest to customer 1, so
customer 5 is likely to buy what customer 1 bought, i.e. a book. (A small sketch of this
computation is given below.)
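A minimal sketch of this 1-NN computation in plain Python (the variable names and the range-based normalization simply mirror the worked example above, not any particular library):

```python
# 1-NN for the customer example: normalized Euclidean distance, pure Python.
import math

# (Age, Income, Purchased) for the four known customers
train = [(45, 46000, "Book"), (39, 100000, "Book"),
         (35, 38000, "DVD"), (69, 150000, "DVD")]
query = (58, 51000)                                   # customer 5

age_range, income_range = 69 - 35, 150000 - 38000    # ranges used for normalization

def norm_dist(p, q):
    """Euclidean distance after dividing each attribute difference by its range."""
    return math.sqrt(((p[0] - q[0]) / age_range) ** 2 +
                     ((p[1] - q[1]) / income_range) ** 2)

nearest = min(train, key=lambda rec: norm_dist(query, rec))
print(nearest[2])   # "Book": customer 1 is the nearest neighbor
```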
Issues with 1-NN classifier
◼ Might treat outliers or mislabeled points as representative data, since a single
nearest neighbor decides the class
K-Nearest neighbor algorithm
◼ Let there be n patterns X1, X2, ..., Xn in the training data X and a test pattern P.
◼ If Xk is the pattern in X most similar to P, then the class of P is the class of Xk
(the 1-NN rule; k-NN extends this to the k most similar patterns).
◼ The similarity is usually measured by computing the distance from P to the patterns
X1, X2, ..., Xn.
K-Nearest Neighbor Classification process
◼ Euclidean distance is a very common distance
measure
$d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}$
◼ Determine the class from the nearest-neighbor list using either of the two techniques
below (a small sketch of both voting schemes follows the list):
1. Simply take the majority vote of class labels among the k-nearest
neighbors.
2. Weigh the vote according to distance weight factor, w = 1/d2 and
then find the class with highest weight.
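A minimal sketch of the two voting schemes, assuming the k nearest neighbors have already been found; the (distance, label) pairs below are illustrative only:

```python
# Majority vote vs. distance-weighted vote (w = 1/d^2) over the k nearest neighbors.
from collections import Counter, defaultdict

neighbors = [(0.41, "Book"), (0.72, "Book"), (0.73, "DVD")]   # hypothetical k = 3 list

# 1. Simple majority vote among the class labels
majority_class = Counter(label for _, label in neighbors).most_common(1)[0][0]

# 2. Distance-weighted vote: each neighbor contributes 1/d^2 to its class
weights = defaultdict(float)
for d, label in neighbors:
    weights[label] += 1.0 / (d ** 2 + 1e-12)   # small epsilon guards against d = 0
weighted_class = max(weights, key=weights.get)

print(majority_class, weighted_class)
```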
Distance Measures-Minkowski distances
◼ Euclidean distance belongs to a group of popular
distance measures called Minkowski distances
given by:
$d(X_i, X_j) = \left(|x_{i1} - x_{j1}|^q + |x_{i2} - x_{j2}|^q + \dots + |x_{ip} - x_{jp}|^q\right)^{1/q}$
where Xi = (xi1, xi2, …, xip) and Xj = (xj1, xj2, …, xjp) are two p-dimensional
data objects, and q is a positive integer
Distance Measures-Minkowski distances
◼ If q = 1, the distance measure is Manhattan (or
city block) distance
$d(X_i, X_j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + \dots + |x_{ip} - x_{jp}|$
◼ If q = 2, the distance measure is Euclidean distance
$d(X_i, X_j) = \sqrt{|x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + \dots + |x_{ip} - x_{jp}|^2}$
Distance Measures-Minkowski distances
◼ If q = ∞, the distance measure is Chebyshev
distance
$d(X_i, X_j) = \max\{|x_{i1} - x_{j1}|, |x_{i2} - x_{j2}|, \dots, |x_{ip} - x_{jp}|\}$
Note: Just take the dimension-wise absolute differences and pick the maximum of the
resulting p values. (A small sketch of these distance measures follows.)
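A small sketch of the Minkowski family of distances defined on these slides (plain Python, with illustrative vectors):

```python
# Minkowski distance for general q, plus the Chebyshev (q -> infinity) limit.
def minkowski(x, y, q):
    """(sum_j |x_j - y_j|^q)^(1/q); q = 1 gives Manhattan, q = 2 gives Euclidean."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)

def chebyshev(x, y):
    """Largest dimension-wise absolute difference."""
    return max(abs(a - b) for a, b in zip(x, y))

x, y = [1.0, 5.0, 2.0], [4.0, 1.0, 2.0]
print(minkowski(x, y, 1))   # 7.0  (Manhattan / city block)
print(minkowski(x, y, 2))   # 5.0  (Euclidean)
print(chebyshev(x, y))      # 4.0  (Chebyshev)
```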
K-Nearest neighbor method
◼ Majority vote within the k nearest neighbors
(Diagram: a new point classified as brown with K = 1 and as green with K = 3, showing how the choice of k can change the majority vote.)
Nearest-Neighbor Classifiers
(Diagram: an unknown record plotted among the stored training records.)
Requires three things:
– The set of stored records
– A distance metric to compute the distance between records
– The value of k, the number of nearest neighbors to retrieve
To classify an unknown record:
– Compute its distance to the training records
– Identify the k nearest neighbors
– Use the class labels of the nearest neighbors to determine the class label of the
unknown record (e.g., by taking a majority vote)
Definition of Nearest Neighbor
(Diagrams: (a) the 1-nearest neighbor, (b) the 2-nearest neighbors, and (c) the 3-nearest
neighbors of a point X.)
The k-nearest neighbors of a record x are the data points that have the k smallest
distances to x.
Nearest Neighbor Classification… Issues
◼ Choosing the value of k:
◼ If k is too small, sensitive to noise points
◼ If k is too large, neighborhood may include points from
other classes
Nearest Neighbor Classification…Issues
◼ Scaling issues
◼ Attributes may have to be scaled to prevent the distance measure from being
dominated by one of the attributes (a minimal scaling sketch follows the example
below)
◼ Example:
◼ height of a person may vary from 1.5m to 1.8m
◼ weight of a person may vary from 90lb to 300lb
◼ income of a person may vary from $10K to $1M
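A minimal scaling sketch, assuming simple min-max scaling of each attribute to [0, 1] before distances are computed; the three-attribute toy records mirror the height/weight/income example above:

```python
# Min-max scale each attribute so that no single attribute dominates the distance.
def min_max_scale(records):
    """records: list of equal-length numeric feature vectors."""
    cols = list(zip(*records))
    mins = [min(c) for c in cols]
    rngs = [(max(c) - min(c)) or 1.0 for c in cols]   # avoid division by zero
    return [[(v - lo) / r for v, lo, r in zip(row, mins, rngs)] for row in records]

# height (m), weight (lb), income ($) for three hypothetical people
data = [[1.6, 150, 20000], [1.8, 300, 1000000], [1.5, 90, 10000]]
print(min_max_scale(data))
```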
Discussion on the k-NN Algorithm
◼ Make the classifier robust to noisy data by voting or averaging over the k nearest
neighbors rather than relying on a single neighbor
◼ The k-NN algorithm for a continuous-valued target (k-NN regression):
◼ Predict the mean of the target values of the k nearest neighbors (see the sketch
below)
◼ Curse of dimensionality: the distance between neighbors can be dominated by
irrelevant attributes
◼ To overcome this, stretch (re-weight) the axes or eliminate the least relevant
attributes
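A minimal sketch of k-NN for a continuous-valued target on 1-D toy data (the data points and the value of k are illustrative only):

```python
# k-NN regression: predict the mean target value of the k nearest training points.
def knn_regress(train_xy, query_x, k=3):
    nearest = sorted(train_xy, key=lambda xy: abs(xy[0] - query_x))[:k]
    return sum(y for _, y in nearest) / k

train_xy = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1)]
print(knn_regress(train_xy, 2.5, k=2))   # mean of the two closest targets
```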
We still have the problem of how to choose an optimal value of k?
◼ To better understand this, we need to understand two
commonly used (and misunderstood) terms:
◼ Parameters and
◼ Hyperparameters
Important term: Parameter
◼ Remember: Use term ‘Parameter’ when you talk about the ML model.
◼ In case of ML, the parameters of the trained machine learning model
that describe the model are referred to as model parameters.
◼ The aim of ALL machine learning problems is to find the optimal set of model
parameters that gives the best performance (classification, clustering or forecasting
performance, depending on the ML problem at hand).
◼ E.g., the final set of connection weights in a trained artificial neural network
(ANN) model is the set of parameters for that ANN model.
Important term: Hyperparameter
◼ Remember: Use term ‘Hyperparameter’ when you talk about the ML
algorithm.
◼ Hyperparameters are the settings that must be supplied to the ML algorithm before it
starts, so that it can eventually find an ML model with optimal model parameters.
◼ Typically, these are set manually or tuned via some search procedure.
◼ E.g., the learning rate η and the momentum term µ that must be set before running
the backpropagation algorithm to train the above ANN are two hyperparameters.
◼ The algorithm hyperparameters significantly affect the ML model training process,
hence tuning the hyperparameters is usually quite helpful.
Let’s get back to our problem
The problem we had was: ‘How to choose the
optimal value of k for a kNN classifier?’
◼ By the definitions on the previous slides, k is a hyperparameter: it is needed for
the kNN algorithm to start working.
◼ The kNN classifier is a lazy classifier with no training phase and no fitted model,
so there is no question of model parameters either.
How to find optimal hyperparameters ?
Step 1: Split the entire dataset into three parts: a training set (70%), a validation set
(15%) and a test set (15%).
Step 2: Set the model hyperparameters.
Step 3: Use the training set to train the model.
Step 4: Use the validation set to measure the model performance.
Step 5: Go back to Step 2, change the hyperparameters and repeat Steps 2-4 until you have
found the hyperparameter values that result in the ML model with the best performance on
the validation set.
Step 6: The best performing model is your final model that can be deployed. To estimate how
well it will do, pass the test set through the best model identified in Step 5 and record
the performance. This is the result you are very likely to get when the model is deployed
in the field as well.
Note: The 70-15-15 split is not sacrosanct; you can use your own split ratios. A minimal
sketch of this tuning procedure is given below.
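A minimal sketch of Steps 1-6 for tuning k of a kNN classifier (uses scikit-learn on the Iris dataset; the 70-15-15 split and the candidate k values are examples only):

```python
# Train/validation/test split and a simple search over k for a kNN classifier.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Step 1: 70% training, 15% validation, 15% test
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.30, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, random_state=0)

# Steps 2-5: try several hyperparameter values, keep the best on the validation set
best_k, best_acc = None, -1.0
for k in (1, 3, 5, 7, 9):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = model.score(X_val, y_val)
    if acc > best_acc:
        best_k, best_acc = k, acc

# Step 6: estimate field performance of the chosen model on the held-out test set
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
print(best_k, final_model.score(X_test, y_test))
```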
NOTES
◼ Note 1: Several ML practitioners use only the
training and the test sets. In this case, the test
set serves the same function as the validation set
discussed in the previous slide.
◼ Note 2: A dataset might be so small that this 70-15-15 split is infeasible. In such
cases 'Cross-Validation' is a good alternative (a small sketch follows).
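A small cross-validation sketch with scikit-learn (5 folds and k = 5 are examples only):

```python
# k-fold cross-validation: average accuracy over the folds, no separate validation set.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print(scores.mean())   # mean accuracy across the 5 folds
```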
Performance??
◼ In the previous slides, we talked about ‘…. ML model that
gives the best performance…’.
◼ What do you mean by performance of an ML model?
◼ Typically, the performance of an ML classifier refers to its accuracy; there are
other measures such as sensitivity, specificity, F1-score, etc.
◼ A confusion matrix is a good way of visualizing the performance of an ML classifier
(a small sketch follows)
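A small sketch of a confusion matrix and the measures mentioned above, computed with scikit-learn for hypothetical true and predicted labels:

```python
# Confusion matrix, accuracy and F1-score for an illustrative binary problem.
from sklearn.metrics import confusion_matrix, accuracy_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))   # rows: actual class, columns: predicted class
print(accuracy_score(y_true, y_pred))     # fraction of correct predictions
print(f1_score(y_true, y_pred))           # harmonic mean of precision and recall
```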
Example
◼ Aim: identify the type of flower (Iris setosa, Iris versicolor or Iris virginica)
using the features [Petal Width, Sepal Length].
◼ Measurements of 150 flowers (50 of each type) are already available with us (the
Fisher Iris dataset).
◼ k-NN example: find the flower type for the unknown flower with the given
measurements:
P = [5.2, 1.6]
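A hedged sketch of this Iris example with scikit-learn, assuming the query P = [5.2, 1.6] means sepal length = 5.2 cm and petal width = 1.6 cm, and using k = 5 as an example value:

```python
# k-NN on two Iris features: sepal length (column 0) and petal width (column 3).
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X = iris.data[:, [0, 3]]      # [sepal length (cm), petal width (cm)]
y = iris.target

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
pred = knn.predict([[5.2, 1.6]])
print(iris.target_names[pred[0]])
```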
Major References
◼ Jiawei Han and Micheline Kamber, Data Mining: Concepts and
Techniques, 2nd Ed. (including slides).
◼ http://www.ibm.com/developerworks/library/os-weka3/
◼ M. Narasimha Murty, V. Susheela Devi, Pattern Recognition, NPTEL
Notes.
◼ https://www.cse.msu.edu/~cse847/slides/knn.pptx
◼ chem-eng.utoronto.ca/~datamining/Presentations/KNN.ppt
◼ http://www.cs.uoi.gr/~tsap/teaching/2012s-cs059/material/datamining-lect10.pptx
Questions?