Om Namo Bhagavate Vasudevaya
Introduction to Classification:
Lazy Learners - Nearest Neighbor Classifiers
And some insights into classifier terminology
Dr. Binoy B Nair
Note
◼ Only to be used as a supplementary learning
material. Not for any commercial use.
◼ These slides have been substantially sourced from
Han and Kamber with some modifications made
for easier understanding.
◼ All rights belong to the original authors.
Nearest Neighbor Classifiers
◼ Basic idea:
◼ If it walks like a duck, quacks like a duck, then
it’s probably a duck
(Diagram: the test record is compared with the training records by computing distances, and the k "nearest" records are chosen.)
Nearest Neighbor Classifier
◼ This is amongst the simplest of all classification algorithms
in supervised learning.
◼ This is a method of classifying patterns based on the class
label of the closest training patterns in the feature space.
◼ These are non-parametric methods where no model is
fitted using the training patterns.
◼ The accuracy of nearest neighbor classifiers is good: with enough training data, the
1-NN rule is guaranteed to yield an error rate no worse than twice the Bayes error
rate, which is the optimal error rate.
Nearest Neighbor Classifiers
◼ There is no training time required for this classifier. In
other words, there is no design time for training the
classifier.
◼ Every time a test pattern is to be classified, it has to be
compared with all the training patterns, to find the closest
pattern.
◼ The classification time can be large if there are many training patterns or if the
dimensionality of the patterns is high.
Example: 1-NN classifier
Q. Using a 1-NN classifier, find what the fifth customer is likely to buy from the given dataset.

Customer   Age   Income    Purchased Product
1          45    46000     Book
2          39    100000    Book
3          35    38000     DVD
4          69    150000    DVD
5          58    51000     ???
Example: 1-NN classifier
Solving:
Determine the distance between the unknown sample (sample no. 5) and the other four known
samples using the Euclidean distance after normalization:

$d = \sqrt{\left(\frac{58 - Age}{69 - 35}\right)^2 + \left(\frac{51000 - Income}{150000 - 38000}\right)^2}$
Customer   Age   Income    Norm. Age   Norm. Income   Norm. Distance   Un-norm. Distance
1          45    46000     1.3         0.4            0.41             5000.02
2          39    100000    1.1         0.9            0.72             49000
3          35    38000     1           0.3            0.73             13000.02
4          69    150000    2           1.3            0.85             99000
5          58    51000     1.7         0.5            -                -
Based on the nearest-neighbor computation above, customer 5 is closest to customer 1, so
customer 5 is likely to buy what customer 1 bought, i.e. a book. (A small sketch of this
computation is given below.)
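A minimal sketch of this 1-NN computation in plain Python (the variable names and the range-based normalization simply mirror the worked example above, not any particular library):

```python
# 1-NN for the customer example: normalized Euclidean distance, pure Python.
import math

# (Age, Income, Purchased) for the four known customers
train = [(45, 46000, "Book"), (39, 100000, "Book"),
         (35, 38000, "DVD"), (69, 150000, "DVD")]
query = (58, 51000)                                   # customer 5

age_range, income_range = 69 - 35, 150000 - 38000    # ranges used for normalization

def norm_dist(p, q):
    """Euclidean distance after dividing each attribute difference by its range."""
    return math.sqrt(((p[0] - q[0]) / age_range) ** 2 +
                     ((p[1] - q[1]) / income_range) ** 2)

nearest = min(train, key=lambda rec: norm_dist(query, rec))
print(nearest[2])   # "Book": customer 1 is the nearest neighbor
```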
Issues with 1-NN classifier
◼ Might treat outliers or mislabeled points as representative data, since a single
nearest neighbor decides the class
K-Nearest neighbor algorithm
◼ Let there be n patterns X1, X2, ..., Xn in the training data X and a test pattern P.
◼ If Xk is the pattern in X most similar to P, then the class of P is the class of Xk
(the 1-NN rule; k-NN extends this to the k most similar patterns).
◼ The similarity is usually measured by computing the distance from P to the patterns
X1, X2, ..., Xn.
K-Nearest Neighbor Classification process
◼ Euclidean distance is a very common distance
measure
$d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}$
◼ Determine the class from the nearest-neighbor list using either of the two techniques
below (a small sketch of both voting schemes follows the list):
1. Simply take the majority vote of class labels among the k-nearest
neighbors.
2. Weigh the vote according to distance weight factor, w = 1/d2 and
then find the class with highest weight.
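A minimal sketch of the two voting schemes, assuming the k nearest neighbors have already been found; the (distance, label) pairs below are illustrative only:

```python
# Majority vote vs. distance-weighted vote (w = 1/d^2) over the k nearest neighbors.
from collections import Counter, defaultdict

neighbors = [(0.41, "Book"), (0.72, "Book"), (0.73, "DVD")]   # hypothetical k = 3 list

# 1. Simple majority vote among the class labels
majority_class = Counter(label for _, label in neighbors).most_common(1)[0][0]

# 2. Distance-weighted vote: each neighbor contributes 1/d^2 to its class
weights = defaultdict(float)
for d, label in neighbors:
    weights[label] += 1.0 / (d ** 2 + 1e-12)   # small epsilon guards against d = 0
weighted_class = max(weights, key=weights.get)

print(majority_class, weighted_class)
```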
Distance Measures-Minkowski distances
◼ Euclidean distance belongs to a group of popular
distance measures called Minkowski distances
given by:
$d(X_i, X_j) = \left(|x_{i1} - x_{j1}|^q + |x_{i2} - x_{j2}|^q + \dots + |x_{ip} - x_{jp}|^q\right)^{1/q}$
where Xi = (xi1, xi2, …, xip) and Xj = (xj1, xj2, …, xjp) are two p-dimensional
data objects, and q is a positive integer
Distance Measures-Minkowski distances
◼ If q = 1, the distance measure is Manhattan (or
city block) distance
$d(X_i, X_j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + \dots + |x_{ip} - x_{jp}|$
◼ If q = 2, the distance measure is Euclidean distance
$d(X_i, X_j) = \sqrt{|x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + \dots + |x_{ip} - x_{jp}|^2}$
Distance Measures-Minkowski distances
◼ If q = ∞, the distance measure is Chebyshev
distance
$d(X_i, X_j) = \max\{|x_{i1} - x_{j1}|, |x_{i2} - x_{j2}|, \dots, |x_{ip} - x_{jp}|\}$
Note: Just take the dimension-wise absolute differences and pick the maximum of the
resulting p values. (A small sketch of these distance measures follows.)
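A small sketch of the Minkowski family of distances defined on these slides (plain Python, with illustrative vectors):

```python
# Minkowski distance for general q, plus the Chebyshev (q -> infinity) limit.
def minkowski(x, y, q):
    """(sum_j |x_j - y_j|^q)^(1/q); q = 1 gives Manhattan, q = 2 gives Euclidean."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)

def chebyshev(x, y):
    """Largest dimension-wise absolute difference."""
    return max(abs(a - b) for a, b in zip(x, y))

x, y = [1.0, 5.0, 2.0], [4.0, 1.0, 2.0]
print(minkowski(x, y, 1))   # 7.0  (Manhattan / city block)
print(minkowski(x, y, 2))   # 5.0  (Euclidean)
print(chebyshev(x, y))      # 4.0  (Chebyshev)
```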
K-Nearest neighbor method
◼ Majority vote within the k nearest neighbors
(Diagram: a new point classified as brown with K = 1 and as green with K = 3, showing how the choice of k can change the majority vote.)
Nearest-Neighbor Classifiers
(Diagram: an unknown record plotted among the stored training records.)
Requires three things:
– The set of stored records
– A distance metric to compute the distance between records
– The value of k, the number of nearest neighbors to retrieve
To classify an unknown record:
– Compute its distance to the training records
– Identify the k nearest neighbors
– Use the class labels of the nearest neighbors to determine the class label of the
unknown record (e.g., by taking a majority vote)
Definition of Nearest Neighbor
(Diagrams: (a) the 1-nearest neighbor, (b) the 2-nearest neighbors, and (c) the 3-nearest
neighbors of a point X.)
The k-nearest neighbors of a record x are the data points that have the k smallest
distances to x.
Nearest Neighbor Classification… Issues
◼ Choosing the value of k:
◼ If k is too small, sensitive to noise points
◼ If k is too large, neighborhood may include points from
other classes
Nearest Neighbor Classification…Issues
◼ Scaling issues
◼ Attributes may have to be scaled to prevent the distance measure from being
dominated by one of the attributes (a minimal scaling sketch follows the example
below)
◼ Example:
◼ height of a person may vary from 1.5m to 1.8m
◼ weight of a person may vary from 90lb to 300lb
◼ income of a person may vary from $10K to $1M
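A minimal scaling sketch, assuming simple min-max scaling of each attribute to [0, 1] before distances are computed; the three-attribute toy records mirror the height/weight/income example above:

```python
# Min-max scale each attribute so that no single attribute dominates the distance.
def min_max_scale(records):
    """records: list of equal-length numeric feature vectors."""
    cols = list(zip(*records))
    mins = [min(c) for c in cols]
    rngs = [(max(c) - min(c)) or 1.0 for c in cols]   # avoid division by zero
    return [[(v - lo) / r for v, lo, r in zip(row, mins, rngs)] for row in records]

# height (m), weight (lb), income ($) for three hypothetical people
data = [[1.6, 150, 20000], [1.8, 300, 1000000], [1.5, 90, 10000]]
print(min_max_scale(data))
```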
Discussion on the k-NN Algorithm
◼ Make the classifier robust to noisy data by voting or averaging over the k nearest
neighbors rather than relying on a single neighbor
◼ The k-NN algorithm for a continuous-valued target (k-NN regression):
◼ Predict the mean of the target values of the k nearest neighbors (see the sketch
below)
◼ Curse of dimensionality: the distance between neighbors can be dominated by
irrelevant attributes
◼ To overcome this, stretch (re-weight) the axes or eliminate the least relevant
attributes
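A minimal sketch of k-NN for a continuous-valued target on 1-D toy data (the data points and the value of k are illustrative only):

```python
# k-NN regression: predict the mean target value of the k nearest training points.
def knn_regress(train_xy, query_x, k=3):
    nearest = sorted(train_xy, key=lambda xy: abs(xy[0] - query_x))[:k]
    return sum(y for _, y in nearest) / k

train_xy = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1)]
print(knn_regress(train_xy, 2.5, k=2))   # mean of the two closest targets
```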
We still have the problem of how to choose an optimal value of k?
◼ To better understand this, we need to understand two
commonly used (and misunderstood) terms:
◼ Parameters and
◼ Hyperparameters
Important term: Parameter
◼ Remember: Use term ‘Parameter’ when you talk about the ML model.
◼ In case of ML, the parameters of the trained machine learning model
that describe the model are referred to as model parameters.
◼ The aim of ALL machine learning problems is to find the optimal set of model
parameters that gives the best performance (classification, clustering or forecasting
performance, depending on the ML problem at hand).
◼ E.g., the final set of connection weights in a trained artificial neural network
(ANN) model is the set of parameters for that ANN model.
Important term: Hyperparameter
◼ Remember: Use term ‘Hyperparameter’ when you talk about the ML
algorithm.
◼ Hyperparameters are the settings that must be supplied to the ML algorithm before it
starts, so that it can eventually find an ML model with optimal model parameters.
◼ Typically, these are set manually or tuned via some search procedure.
◼ E.g., the learning rate η and the momentum term µ that must be set before running
the backpropagation algorithm to train the above ANN are two hyperparameters.
◼ The algorithm hyperparameters significantly affect the ML model training process,
hence tuning the hyperparameters is usually quite helpful.
Let’s get back to our problem
The problem we had was: ‘How to choose the
optimal value of k for a kNN classifier?’
◼ By the definitions on the previous slides, k is a hyperparameter: it is needed for
the kNN algorithm to start working.
◼ The kNN classifier is a lazy classifier with no training phase and no fitted model,
so there is no question of model parameters either.
How to find optimal hyperparameters ?
Step 1: Split the entire dataset into three parts: a training set (70%), a validation set
(15%) and a test set (15%).
Step 2: Set the model hyperparameters.
Step 3: Use the training set to train the model.
Step 4: Use the validation set to measure the model performance.
Step 5: Go back to Step 2, change the hyperparameters and repeat Steps 2-4 until you have
found the hyperparameter values that result in the ML model with the best performance on
the validation set.
Step 6: The best performing model is your final model that can be deployed. To estimate how
well it will do, pass the test set through the best model identified in Step 5 and record
the performance. This is the result you are very likely to get when the model is deployed
in the field as well.
Note: The 70-15-15 split is not sacrosanct; you can use your own split ratios. A minimal
sketch of this tuning procedure is given below.
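A minimal sketch of Steps 1-6 for tuning k of a kNN classifier (uses scikit-learn on the Iris dataset; the 70-15-15 split and the candidate k values are examples only):

```python
# Train/validation/test split and a simple search over k for a kNN classifier.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Step 1: 70% training, 15% validation, 15% test
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.30, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, random_state=0)

# Steps 2-5: try several hyperparameter values, keep the best on the validation set
best_k, best_acc = None, -1.0
for k in (1, 3, 5, 7, 9):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = model.score(X_val, y_val)
    if acc > best_acc:
        best_k, best_acc = k, acc

# Step 6: estimate field performance of the chosen model on the held-out test set
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
print(best_k, final_model.score(X_test, y_test))
```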
NOTES
◼ Note 1: Several ML practitioners use only the
training and the test sets. In this case, the test
set serves the same function as the validation set
discussed in the previous slide.
◼ Note 2: A dataset might be so small that this 70-15-15 split is infeasible. In such
cases 'Cross-Validation' is a good alternative (a small sketch follows).
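A small cross-validation sketch with scikit-learn (5 folds and k = 5 are examples only):

```python
# k-fold cross-validation: average accuracy over the folds, no separate validation set.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print(scores.mean())   # mean accuracy across the 5 folds
```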
Performance??
◼ In the previous slides, we talked about ‘…. ML model that
gives the best performance…’.
◼ What do you mean by performance of an ML model?
◼ Typically, the performance of an ML classifier refers to its accuracy; there are
other measures such as sensitivity, specificity, F1-score, etc.
◼ A confusion matrix is a good way of visualizing the performance of an ML classifier
(a small sketch follows)
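A small sketch of a confusion matrix and the measures mentioned above, computed with scikit-learn for hypothetical true and predicted labels:

```python
# Confusion matrix, accuracy and F1-score for an illustrative binary problem.
from sklearn.metrics import confusion_matrix, accuracy_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))   # rows: actual class, columns: predicted class
print(accuracy_score(y_true, y_pred))     # fraction of correct predictions
print(f1_score(y_true, y_pred))           # harmonic mean of precision and recall
```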
Example
◼ Aim: identify the type of flower (Iris setosa, Iris versicolor or Iris virginica)
using the features [Petal Width, Sepal Length].
◼ Measurements of 150 flowers (50 of each type) are already available with us (the
Fisher Iris dataset).
◼ k-NN example: find the flower type for the unknown flower with the given
measurements:
P = [5.2, 1.6]
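A hedged sketch of this Iris example with scikit-learn, assuming the query P = [5.2, 1.6] means sepal length = 5.2 cm and petal width = 1.6 cm, and using k = 5 as an example value:

```python
# k-NN on two Iris features: sepal length (column 0) and petal width (column 3).
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X = iris.data[:, [0, 3]]      # [sepal length (cm), petal width (cm)]
y = iris.target

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
pred = knn.predict([[5.2, 1.6]])
print(iris.target_names[pred[0]])
```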
Major References
◼ Jiawei Han and Micheline Kamber, Data Mining: Concepts and
Techniques, 2nd Ed. (including slides).
◼ http://www.ibm.com/developerworks/library/os-weka3/
◼ M. Narasimha Murty, V. Susheela Devi, Pattern Recognition, NPTEL
Notes.
◼ https://www.cse.msu.edu/~cse847/slides/knn.pptx
◼ chem-eng.utoronto.ca/~datamining/Presentations/KNN.ppt
◼ http://www.cs.uoi.gr/~tsap/teaching/2012s-cs059/material/datamining-lect10.pptx
Questions?