K-NN Algorithm Example
We have data from a questionnaire survey (asking people's opinion)
and from objective testing, with two attributes (acid durability and
strength), to classify whether a special paper tissue is good or not.
Here are four training samples:
X1 = Acid Durability (seconds)    X2 = Strength (kg/square meter)    Y = Classification
7                                 7                                  Bad
7                                 4                                  Bad
3                                 4                                  Good
1                                 4                                  Good
Now the factory produces a new paper tissue that passes the laboratory
test with X1 = 3 and X2 = 7. Without another expensive survey, can we
guess what the classification of this new tissue is?
1. Determine parameter K = number of nearest neighbors.
Suppose we use K = 3.
2. Calculate the distance between the query instance and all the
training samples.
The coordinates of the query instance are (3, 7). Instead of the
Euclidean distance we compute the squared distance, which is faster
(no square root) and preserves the ordering of the distances.
X1 = Acid Durability (seconds)    X2 = Strength (kg/square meter)    Squared Distance to Query (3, 7)
7                                 7                                  (7 - 3)^2 + (7 - 7)^2 = 16
7                                 4                                  (7 - 3)^2 + (4 - 7)^2 = 25
3                                 4                                  (3 - 3)^2 + (4 - 7)^2 = 9
1                                 4                                  (1 - 3)^2 + (4 - 7)^2 = 13
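The squared-distance computation above can be reproduced with a short sketch (the variable names are my own, not from the source):

```python
# Training samples: (X1 acid durability, X2 strength) -> class label
samples = [((7, 7), "Bad"), ((7, 4), "Bad"), ((3, 4), "Good"), ((1, 4), "Good")]
query = (3, 7)

def squared_distance(a, b):
    # Sum of squared coordinate differences; skipping the square root
    # is faster and does not change which neighbors are nearest.
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

for x, label in samples:
    print(x, squared_distance(x, query))
```

Running this prints the four squared distances 16, 25, 9, and 13, matching the table.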
3. Sort the distances and determine the nearest neighbors based on the
K-th minimum distance.
X1 = Acid Durability    X2 = Strength        Squared Distance    Rank by          Included in 3
(seconds)               (kg/square meter)    to Query            Min. Distance    Nearest Neighbors?
7                       7                    16                  3                Yes
7                       4                    25                  4                No
3                       4                    9                   1                Yes
1                       4                    13                  2                Yes
4. Notice in the second row, last column, that the category (Y) of this
sample is not counted as a nearest neighbor, because the rank of this
data point is greater than 3 (= K).
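The sorting and ranking step can be sketched as follows (a minimal illustration; names are mine):

```python
samples = [((7, 7), "Bad"), ((7, 4), "Bad"), ((3, 4), "Good"), ((1, 4), "Good")]
query = (3, 7)
K = 3

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

# Rank samples by squared distance; only the first K count as neighbors.
ranked = sorted(samples, key=lambda s: squared_distance(s[0], query))
for rank, (x, label) in enumerate(ranked, start=1):
    status = "included" if rank <= K else "excluded"
    print(rank, x, label, status)
```

The sample at (7, 4) comes out with rank 4 and is excluded, exactly as in the table.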
5. Use a simple majority of the categories of the nearest neighbors as
the prediction for the query instance.
We have 2 Good and 1 Bad; since 2 > 1, we conclude that the new paper
tissue that passed the laboratory test with X1 = 3 and X2 = 7 belongs
to the Good category.
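Steps 1 through 5 can be combined into one small classifier (a sketch under the assumptions of this example; the function name is my own):

```python
from collections import Counter

samples = [((7, 7), "Bad"), ((7, 4), "Bad"), ((3, 4), "Good"), ((1, 4), "Good")]

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def knn_classify(query, samples, k=3):
    # Step 2-3: sort training samples by squared distance, keep the k nearest.
    nearest = sorted(samples, key=lambda s: squared_distance(s[0], query))[:k]
    # Step 5: simple majority vote over the neighbors' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify((3, 7), samples))  # -> Good
```

With the query (3, 7) the three nearest neighbors are Good, Good, Bad, so the majority vote returns Good.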
Weighted k-NN Algorithm
A refinement of the k-nearest neighbor algorithm is to weight the
contribution of each of the k neighbors according to its distance to
the query point xq, so that nearer neighbors count more.
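One common weighting scheme (my own sketch, not prescribed by the source) weights each neighbor's vote by the inverse of its squared distance to xq:

```python
from collections import defaultdict

samples = [((7, 7), "Bad"), ((7, 4), "Bad"), ((3, 4), "Good"), ((1, 4), "Good")]

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def weighted_knn_classify(query, samples, k=3):
    nearest = sorted(samples, key=lambda s: squared_distance(s[0], query))[:k]
    weights = defaultdict(float)
    for x, label in nearest:
        d2 = squared_distance(x, query)
        if d2 == 0:
            return label  # query coincides with a training point
        weights[label] += 1.0 / d2  # closer neighbors contribute more
    return max(weights, key=weights.get)

print(weighted_knn_classify((3, 7), samples))  # -> Good
```

For the query (3, 7) the weighted votes are Good: 1/9 + 1/13 vs. Bad: 1/16, so the prediction is again Good; with other queries the weighting can change the outcome relative to a plain majority vote.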