Data Mining - Practical, Fourth Class - IT, Dr. Eman Hato
5. Data Classification
Classification is a data analysis technique used to categorize data into different
classes. It is a supervised learning technique that works in two phases: a training
phase and a testing phase. During the training phase (also known as the learning
phase), a classifier model is constructed by training a classification algorithm with a
predetermined set of training data inputs. In the testing phase, the classifier model
is used to predict the class labels of the test data.
K-Nearest Neighbors Algorithm (KNN)
The k-nearest neighbors (KNN) algorithm is a simple supervised machine learning
algorithm that can be used to solve both classification and regression problems.
KNN calculates the similarity between an input test sample and each training sample
in order to assign the test sample to the category it is most similar to among the
categories of the training set. KNN selects the specified number of samples (K)
closest to the input test sample and then votes for the most frequent category among
them. The right K for a data set is chosen by trying several values of K and picking
the one that works best.
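This K-selection process can be sketched in Python (an illustrative sketch only; the tiny data set, the held-out samples, and the candidate K values below are invented for the example):

```python
import math
from collections import Counter

def predict(train, labels, test, k):
    # Majority class among the k training samples closest to `test`.
    nearest = sorted(zip((math.dist(s, test) for s in train), labels))[:k]
    return Counter(lbl for _, lbl in nearest).most_common(1)[0][0]

# Training data and held-out samples with known labels; each candidate K
# is scored by how many held-out samples it classifies correctly.
train = [(1, 1), (2, 1), (8, 8), (9, 9), (1, 2), (8, 9)]
labels = ["A", "A", "B", "B", "A", "B"]
held_out = [((2, 2), "A"), ((9, 8), "B"), ((1, 3), "A")]

best_k = max([1, 3, 5],
             key=lambda k: sum(predict(train, labels, x, k) == y
                               for x, y in held_out))
print(best_k)
```

In practice the held-out set would be much larger, and the candidate K values are usually odd numbers to avoid ties in the vote.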
Advantages
The algorithm is simple to implement.
The algorithm does not need to build a model.
The algorithm is versatile. It can be used for classification, regression, and search.
Disadvantages
Computationally expensive.
High memory requirement because it stores all of the training data.
Sensitive to irrelevant features and the scale of the data.
KNN Algorithm
The KNN model can be implemented by following these steps:
1. Load the dataset: load the training data as well as the test data.
2. Initialize the value of K (the number of nearest data points).
3. To get the predicted class of a test sample in the test data, do the following:
3.1. Calculate the distance between the test sample and each sample of the training
data (any distance measure can be used).
3.2. Sort the distances and their indices in ascending order of the distance
values.
3.3. Get the top K entries from the sorted array.
3.4. Assign a class to the test sample based on the most frequent class of these
entries.
4. End.
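The steps above can be sketched in Python (an illustrative sketch, not the course's C# implementation given later):

```python
import math
from collections import Counter

def knn_predict(train, labels, test, k=3):
    """Predict the class of `test` from the K nearest training samples."""
    # Step 3.1: distance between the test sample and every training sample
    # (Euclidean distance is used here, but any distance measure works).
    distances = [math.dist(sample, test) for sample in train]
    # Steps 3.2-3.3: sort the indices by distance and keep the top K.
    nearest = sorted(range(len(train)), key=lambda i: distances[i])[:k]
    # Step 3.4: majority vote among the labels of the K nearest samples.
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]
```

Note that no model is built in advance: all the work happens at prediction time, which is exactly the "no training" advantage and the "computationally expensive" disadvantage listed above.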
KNN Example
[Figure: an example of KNN classification, showing the data samples with the neighborhoods for K=3 (inner circle) and K=5 (outer circle).]
The test sample (green dot) should be classified either as a blue square or as a
red triangle. If K = 3, it is assigned to the red triangles because there are 2
triangles and only 1 square inside the inner circle. If K = 5, it is assigned to the
blue squares (3 squares vs. 2 triangles inside the outer circle).
Numerical Example of KNN Algorithm
Suppose the training data has four objects, each with two attributes (Time,
Strength), classified as good or bad as shown in the table below:

Training Sample | Time | Strength | Classification
1               | 7    | 7        | Bad
2               | 7    | 4        | Bad
3               | 3    | 4        | Good
4               | 1    | 4        | Good

The goal is to classify the test sample Test (Time = 3, Strength = 7) as good or
bad.
1. Determine the value of K (the nearest data points), K=3.
2. Calculate the distance between the test sample and all the training samples.
Training Sample | Test Sample | Distance                      | Classification
1 (7,7)         | (3,7)       | D = √((7−3)² + (7−7)²) = 4   | Bad
2 (7,4)         | (3,7)       | D = √((7−3)² + (4−7)²) = 5   | Bad
3 (3,4)         | (3,7)       | D = √((3−3)² + (4−7)²) = 3   | Good
4 (1,4)         | (3,7)       | D = √((1−3)² + (4−7)²) = 3.6 | Good
3. Sort the calculated distances in ascending order based on distance values.
Training Sample | Test Sample | Ascending Order of Distance | Classification
3 (3,4)         | (3,7)       | 3                           | Good
4 (1,4)         | (3,7)       | 3.6                         | Good
1 (7,7)         | (3,7)       | 4                           | Bad
2 (7,4)         | (3,7)       | 5                           | Bad
4. Get the top K items from the sorted array; here K = 3.
Training Sample | Test Sample | Ascending Order of Distance | Classification
3 (3,4)         | (3,7)       | 3                           | Good
4 (1,4)         | (3,7)       | 3.6                         | Good
1 (7,7)         | (3,7)       | 4                           | Bad
5. Assign a class to the test sample based on the most frequent class of the top K
items. The classification labels are (2 Good and 1 Bad), so the predicted class label is Good.

Most Frequent Class of Nearest Neighbors (K=3) | Classification
Good, Good, Bad                                | Good

The class of the test sample (3,7) is Good.
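The worked example can be checked with a short Python sketch (illustrative only; the attribute values are taken from the table above):

```python
import math
from collections import Counter

train = [(7, 7), (7, 4), (3, 4), (1, 4)]   # (Time, Strength) of samples 1-4
labels = ["Bad", "Bad", "Good", "Good"]
test = (3, 7)

# Euclidean distance from the test sample to each training sample.
distances = [math.dist(s, test) for s in train]
print([round(d, 1) for d in distances])    # [4.0, 5.0, 3.0, 3.6]

# Labels of the K = 3 nearest samples, then the majority vote.
nearest = sorted(zip(distances, labels))[:3]
print(Counter(label for _, label in nearest).most_common(1)[0][0])  # Good
```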
The Code
Form Design: the form contains a button (button1) to run the classification and a text box (textBox1) to display the predicted class.
Form CS:
using System;
using System.Linq;
using System.Windows.Forms;

namespace DM_KNN_Algorithm
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        // Initialize the training data: five samples, four attributes each.
        int[,] Train = new int[5, 4] { { 10, 20, 30, 40 },
                                       { 91, 92, 93, 94 },
                                       { 81, 82, 83, 84 },
                                       { 95, 96, 97, 98 },
                                       { 11, 17, 25, 36 } };
        // Class label of each training sample.
        int[] Lable = new int[5] { 1, 2, 2, 2, 1 };

        // Initialize the value of K (the number of nearest data points).
        int K = 3;

        // Compute the Euclidean distance between two vectors.
        public double Euclidean(int[] X, int[] Y)
        {
            double Sum = 0;
            int len = X.Length;
            for (int i = 0; i < len; i++)
            {
                Sum += Math.Pow(X[i] - Y[i], 2.0);
            }
            double Dist = Math.Sqrt(Sum);
            return Dist;
        }

        // Compute the most frequent item in an array.
        public int MostFrequnt(int[] input)
        {
            int[] Fritem;
            int Max = 0;
            // Remove the redundant items from the array.
            int[] Items = input.Distinct().ToArray();
            int len = Items.Length;
            // Give an initial value to the most frequent item.
            int MostFreq = Items[0];
            // Find the number of occurrences of each item.
            for (int i = 0; i < len; i++)
            {
                Fritem = Array.FindAll(input, x => x == Items[i]);
                if (Max < Fritem.Length)
                {
                    Max = Fritem.Length;
                    MostFreq = Items[i];
                }
            }
            return MostFreq;
        }

        private void button1_Click(object sender, EventArgs e)
        {
            // The test sample.
            int[] Test = new int[4] { 40, 30, 20, 10 };
            // Get the number of rows and columns.
            int Norow = Train.GetLength(0);
            int Nocol = Train.GetLength(1);
            // Define the vector that holds one training sample.
            int[] Tsample = new int[Nocol];
            // Define the vector that holds the distance values.
            double[] D = new double[Norow];

            // Calculate the distance between the test sample and each
            // sample of the training data.
            for (int i = 0; i < Norow; i++)
            {
                for (int j = 0; j < Nocol; j++)
                {
                    Tsample[j] = Train[i, j];
                }
                D[i] = Euclidean(Tsample, Test);
            }

            // Sort the labels in ascending order of their distance values.
            Array.Sort(D, Lable);
            // Get the top K items from the sorted array.
            int[] temp = new int[K];
            Array.Copy(Lable, temp, K);
            // Assign a class to the test sample: the most frequent class
            // of the top K items.
            int Class = MostFrequnt(temp);
            // Display the predicted class.
            textBox1.Text = Class.ToString();
        }

        private void textBox1_TextChanged(object sender, EventArgs e)
        {
        }
    }
}