Bansilal Ramnath Agarwal Charitable Trust’s
VISHWAKARMA INSTITUTE OF TECHNOLOGY – PUNE
Department of SY Common
MD2201: Data Science
Name of the student: Swaroop Deokar Roll No. 16
Div: CS-AIML-A Batch: 1
Date of performance: 14/09/2023
Experiment No.5
Title: Nearest Neighbor Classifiers.
Aim: To apply nearest neighbor-based classification algorithms (NN, KNN, MKNN, and R-NN).
Software used: Programming language R.
Code Statement:
Consider the 18-point data set referred to in the theory class. Consider the test sample P(3, 2). Apply the
following algorithms to find the class of this test point.
i. NN
ii. KNN with K = 5 and K = 7
iii. MKNN with K = 5
iv. R-NN (radius-based) algorithm with radius 1.45 units
Code:
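#the data file knn1_csv.csv is assumed to contain the 18 data points, one per row,
#with numeric columns x and y and a column named class holding the class label (1, 2 or 3)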
f <- read.csv("knn1_csv.csv")           #read the data set
dist <- sqrt((3 - f$x)^2 + (2 - f$y)^2) #Euclidean distance from P(3, 2) to every point
f <- cbind(f, dist)                     #append the distances to the data frame
f <- f[order(f$dist), ]                 #sort the points by distance in ascending order
#NN ----
c1 <- f$class[1]                        #class of the single nearest neighbour
cat("Class of P for NN is:", c1, "\n")
#KNN for K=5 ----
f1 <- f[1:5,]                           #5 nearest neighbours of P
s1 <- sum(f1$class==1)                  #votes for class 1
s2 <- sum(f1$class==2)                  #votes for class 2
s3 <- sum(f1$class==3)                  #votes for class 3
if(s1 > s2 & s1 > s3)
cat("Class of P for K=5 is 1.")
cat("\n")
if(s2 > s1 & s2 > s3)
cat("Class of P for K=5 is 2.")
cat("\n")
if(s3 > s2 & s3 > s1)
cat("Class of P for K=5 is 3.")
cat("\n")
#KNN for K=7 ----
f2 <- f[1:7,]                           #7 nearest neighbours of P
s11 <- sum(f2$class==1)                 #votes for class 1
s12 <- sum(f2$class==2)                 #votes for class 2
s13 <- sum(f2$class==3)                 #votes for class 3
if(s11 > s12 & s11 > s13)
cat("Class of P for K=7 is 1.")
cat("\n")
if(s12 > s11 & s12 > s13)
cat("Class of P for K=7 is 2.")
cat("\n")
if(s13 > s12 & s13 > s11)
cat("Class of P for K=7 is 3.")
cat("\n")
#RNN ----
f3 <- f[f$dist < 1.45, ]                #all points within radius 1.45 of P
s21 <- sum(f3$class==1)                 #votes for class 1
s22 <- sum(f3$class==2)                 #votes for class 2
s23 <- sum(f3$class==3)                 #votes for class 3
if(s21 > s22 & s21 > s23)
cat("Class of P for R=1.45 is 1.")
cat("\n")
if(s22 > s21 & s22 > s23)
cat("Class of P for R=1.45 is 2.")
cat("\n")
if(s23 > s22 & s23 > s21)
cat("Class of P for R=1.45 is 3.")
cat("\n")
#MKNN ----
f4 <- f[1:5,]                           #5 nearest neighbours of P
weight <- (f4$dist[5] - f4$dist) / (f4$dist[5] - f4$dist[1])  #weight per neighbour: 1 for the closest, 0 for the farthest
f4 <- cbind(f4, weight)                 #append the weights to the neighbour set
s31 <- sum(f4$weight[f4$class==1])      #total weight for class 1
s32 <- sum(f4$weight[f4$class==2])      #total weight for class 2
s33 <- sum(f4$weight[f4$class==3])      #total weight for class 3
if(s31 > s32 & s31 > s33)
cat("\nClass of P for MKNN = 5 is 1.")
if(s32 > s31 & s32 > s33)
cat("\nClass of P for MKNN = 5 is 2.")
if(s33 > s32 & s33 > s31)
cat("\nClass of P for MKNN = 5 is 3.")
cat("\n")
Results:
Conclusion: We applied a set of nearest neighbor-based algorithms to classify the test point P(3,2) using a
data set of 18 points. Each algorithm determines the class of the test point in a slightly different way, and
comparing their results gives insight into their respective strengths and weaknesses. Compact R equivalents
of each decision rule are sketched after the list below.
1. NN (Nearest Neighbor Algorithm): This algorithm assigns the class of the nearest data point to the test
point. It relies on the assumption that points in close proximity share similar characteristics. The class of
point P(3,2) has been determined based on the closest neighbor in the dataset.
2. KNN (K-Nearest Neighbors Algorithm): KNN considers the K closest data points and assigns the class
based on majority voting. In our case, we applied KNN with K=5 and K=7, considering the 5 and 7 nearest
neighbors, respectively.
3. MKNN (Modified K-Nearest Neighbors Algorithm): MKNN is a variation of KNN in which each of the K
nearest neighbours is given a weight that decreases with its distance from the test point, and the class with
the largest total weight wins. We applied MKNN with K=5 to determine the class of point P(3,2).
4. R-NN (Radius-Based Nearest Neighbor Algorithm): R-NN classifies the test point by majority vote among
all data points that lie within a specified radius of it. We used a radius of 1.45 units in this study to classify
the test point.
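The decision rules above can also be written compactly in R as a cross-check of the script in the Code
section. The sketch below is a minimal formulation, assuming the same data frame f (columns x, y, class)
with the dist column already computed and the rows already sorted by distance as in the code above; note
that which.max breaks ties by taking the first class, whereas the if-chains in the script print nothing on a tie.
#compact equivalents of the four decision rules (f is already sorted by dist)
k5 <- f[1:5, ]                                           #5 nearest neighbours of P
k7 <- f[1:7, ]                                           #7 nearest neighbours of P
nn_class   <- f$class[1]                                 #NN: class of the single closest point
knn5_class <- names(which.max(table(k5$class)))          #KNN, K=5: majority vote
knn7_class <- names(which.max(table(k7$class)))          #KNN, K=7: majority vote
w <- (k5$dist[5] - k5$dist) / (k5$dist[5] - k5$dist[1])  #MKNN: distance-based weights
mknn_class <- names(which.max(tapply(w, k5$class, sum))) #class with the largest total weight
rnn <- f[f$dist < 1.45, ]                                #R-NN: all points within radius 1.45 of P
rnn_class  <- names(which.max(table(rnn$class)))         #majority vote within the radius
cat("NN:", nn_class, "KNN K=5:", knn5_class, "KNN K=7:", knn7_class,
    "MKNN:", mknn_class, "R-NN:", rnn_class, "\n")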
Each of these algorithms offers a different perspective on how to classify data points, and their
performance may vary depending on the dataset and specific application. The choice of which algorithm to
use depends on factors such as dataset size, distribution, and the desired trade-off between accuracy and
computational complexity.