Machine Learning Laboratory
9. Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions. Java/Python ML library classes can be used for this
problem.
K-Nearest Neighbor Algorithm
Training Algorithm:
● For each training example (x, f(x)), add the example to the list of training examples
Classification algorithm:
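● Given a query instance x_q to be classified,
● Let x_1 ... x_k denote the k instances from the training examples that are nearest to x_q
● Return $\hat{f}(x_q) \leftarrow \frac{1}{k} \sum_{i=1}^{k} f(x_i)$
● Where f(x_i) is the target value of training example x_i, so the returned value is the mean over the k nearest training examples.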
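The training and classification steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the library implementation used in the program: the function names and the toy data are hypothetical, and Euclidean distance is an assumption, since the text does not name a distance measure. `knn_mean` follows the mean rule stated above; `knn_vote` shows the majority-vote variant commonly used when f(x) is a class label.

```python
import math
from collections import Counter

def k_nearest(training_examples, xq, k):
    """Return the k (x, f(x)) pairs whose x is closest to xq (Euclidean distance assumed)."""
    return sorted(training_examples, key=lambda ex: math.dist(ex[0], xq))[:k]

def knn_mean(training_examples, xq, k):
    """Real-valued target: mean of f(x_i) over the k nearest neighbours."""
    neighbours = k_nearest(training_examples, xq, k)
    return sum(fx for _, fx in neighbours) / len(neighbours)

def knn_vote(training_examples, xq, k):
    """Discrete target: most common label among the k nearest neighbours."""
    neighbours = k_nearest(training_examples, xq, k)
    return Counter(fx for _, fx in neighbours).most_common(1)[0][0]

# Training step: simply store every example (x, f(x)) in a list.
# The points below are made-up toy data, not iris measurements.
training_examples = []
for x, fx in [((1.0, 1.0), 0), ((1.2, 0.8), 0), ((0.9, 1.1), 0),
              ((5.0, 5.0), 1), ((5.2, 4.9), 1), ((4.8, 5.1), 1)]:
    training_examples.append((x, fx))

# Classification step: look up the neighbours of a query instance.
label_near_first_cluster = knn_vote(training_examples, (1.1, 1.0), k=3)
mean_near_second_cluster = knn_mean(training_examples, (4.0, 4.0), k=3)
mean_with_five_neighbours = knn_mean(training_examples, (4.0, 4.0), k=5)

print(label_near_first_cluster)
print(mean_near_second_cluster)
print(mean_with_five_neighbours)
```

With k=5 the query at (4.0, 4.0) pulls in two neighbours from the other cluster, which shows how the choice of k changes the returned value.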
Data Set:
Iris Plants Dataset: the dataset contains 150 instances (50 in each of three classes)
Number of Attributes: 4 numeric, predictive attributes and the Class
Program:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn import datasets

# Load the iris data set: x holds the four features, y the class labels
iris = datasets.load_iris()
x = iris.data
y = iris.target
print('sepal-length', 'sepal-width', 'petal-length', 'petal-width')
print(x)
print('class: 0-Iris-Setosa, 1- Iris-Versicolour, 2- Iris-Virginica')
print(y)

# Hold out 30% of the samples for testing; the split is random, so results vary between runs
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

# Train the model with K = 5 nearest neighbours
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(x_train, y_train)

# Make predictions on the test data
y_pred = classifier.predict(x_test)

# Summarise correct and wrong predictions per class
print('Confusion Matrix')
print(confusion_matrix(y_test, y_pred))
print('Accuracy Metrics')
print(classification_report(y_test, y_pred))
Output:
sepal-length sepal-width petal-length petal-width
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 ...
 [6.2 3.4 5.4 2.3]
 [5.9 3.  5.1 1.8]]
class: 0-Iris-Setosa, 1- Iris-Versicolour, 2- Iris-Virginica
[0 0 0 ... 0 0 1 1 1 ... 1 1 2 2 2 ... 2 2]
Confusion Matrix
Accuracy Metrics
             precision    recall  f1-score   support

          0       1.00      1.00      1.00        20
          1       0.91      1.00      0.95        10
          2       1.00      0.93      0.97        15

avg / total       0.98      0.98      0.98        45
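The problem statement asks for both correct and wrong predictions to be printed, while the confusion matrix and report only summarise them. One way to list them sample by sample is sketched below. The helper name `report_predictions` and the short demo label lists are hypothetical and used for illustration only; they are not output from the program.

```python
def report_predictions(y_true, y_pred, class_names):
    """Print each test sample as correct or wrong and return the two index lists."""
    correct, wrong = [], []
    for i, (actual, predicted) in enumerate(zip(y_true, y_pred)):
        if actual == predicted:
            correct.append(i)
            status = "correct"
        else:
            wrong.append(i)
            status = "WRONG"
        print(f"sample {i:2d}: actual={class_names[actual]:<16} "
              f"predicted={class_names[predicted]:<16} {status}")
    print(f"{len(correct)} correct, {len(wrong)} wrong")
    return correct, wrong

class_names = ["Iris-Setosa", "Iris-Versicolour", "Iris-Virginica"]

# Made-up labels for illustration only (not results from the iris run)
y_true_demo = [0, 1, 2, 2, 1]
y_pred_demo = [0, 1, 1, 2, 1]

correct, wrong = report_predictions(y_true_demo, y_pred_demo, class_names)
```

With the variables from the program, the same helper could be called as `report_predictions(y_test, y_pred, iris.target_names)`.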
Basic knowledge
Confusion Matrix
[[20  0  0]
 [ 0 10  0]
 [ 0  1 14]]
True Positives: data points labelled as positive that are actually positive
False Positives: data points labelled as positive that are actually negative
True Negatives: data points labelled as negative that are actually negative
False Negatives: data points labelled as negative that are actually positive
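Accuracy: how often is the classifier correct?
Accuracy = (TP + TN) / (TP + TN + FP + FN)
F1-Score: the harmonic mean of precision and recall.
F1 = 2 × Precision × Recall / (Precision + Recall), where Precision = TP / (TP + FP) and Recall = TP / (TP + FN)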
Support: the number of actual instances of the class in the test set.
Support = TP + FN
Example:
● Support_A = TP_A + FN_A
  = 30 + (20 + 10)
  = 60
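The same definitions can be applied to the 3 x 3 confusion matrix shown in the output, treating each class in turn as "positive" and the other two as "negative". The sketch below is a hand-rolled check, not scikit-learn's own code; the function name `per_class_metrics` is hypothetical. It assumes scikit-learn's convention that rows are actual classes and columns are predicted classes.

```python
def per_class_metrics(cm):
    """Derive TP, FP, FN, TN, precision, recall, F1 and support for every class."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    metrics = {}
    for c in range(n):
        tp = cm[c][c]                                # actual c, predicted c
        fn = sum(cm[c]) - tp                         # actual c, predicted something else
        fp = sum(cm[r][c] for r in range(n)) - tp    # predicted c, actually something else
        tn = total - tp - fn - fp                    # everything not involving class c
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = {"TP": tp, "FP": fp, "FN": fn, "TN": tn,
                      "precision": precision, "recall": recall,
                      "f1": f1, "support": tp + fn}
    # Accuracy: correctly classified samples (the diagonal) over all samples
    accuracy = sum(cm[c][c] for c in range(n)) / total
    return metrics, accuracy

# Confusion matrix from the output above
cm = [[20, 0, 0],
      [0, 10, 0],
      [0, 1, 14]]

metrics, accuracy = per_class_metrics(cm)
for c, m in metrics.items():
    print(c, round(m["precision"], 2), round(m["recall"], 2),
          round(m["f1"], 2), m["support"])
print("accuracy", round(accuracy, 2))
```

For class 1 this gives precision 10 / 11 and recall 10 / 10, and for class 2 precision 14 / 14 and recall 14 / 15, which round to the values in the report.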