Certificate

I hereby declare that the work presented in this report entitled “Clustering Techniques in
Machine Learning”, submitted in partial fulfillment of the requirements for the award of the
degree of Bachelor of Technology in Computer Science and Engineering/Information Technology
in the Department of Computer Science & Engineering and Information Technology,
Jaypee University of Information Technology, Waknaghat, is an authentic record of my own
work carried out during the period from July 2022 to May 2023 under the supervision of
Dr. Monika Bharti, Assistant Professor (SG).
The matter embodied in the report has not been submitted for the award of any other degree or
diploma.

Vatsal Singh, 191286

This is to certify that the above statement made by the candidate is true to the best of my
knowledge.

Dr. Monika Bharti

Assistant Professor (SG)
Computer Science and Engineering/Information Technology
Dated:

Acknowledgement

The successful completion of any task would be incomplete without acknowledging the people
who made it possible and whose constant guidance and encouragement ensured its success.

First of all, I wish to acknowledge the benevolence of the omnipotent God, who gave me the
strength and courage to overcome all obstacles and showed me the silver lining in the dark
clouds. With a profound sense of gratitude and heartiest regard, I express my sincere
indebtedness to my guide, Dr. Monika Bharti, for her positive attitude, excellent guidance,
constant encouragement, keen interest, invaluable co-operation, generosity and, above all,
her blessings. She has been a source of inspiration for me.

Last but not least, I would like to express my heartfelt thanks to my parents and my friends,
whose thought-provoking views, veracity and wholehearted cooperation helped me complete
this project.

Vatsal Singh

(191286)

Abstract

In today’s era, the data generated by scientific applications and corporate environments has
grown rapidly, not only in size but also in variety. Collecting and analyzing such large volumes
of data is difficult. Data mining is the process of extracting useful information and hidden
relationships from data, but traditional data mining approaches cannot be applied directly to
big data because of its inherent complexity.

Data clustering is one of the most important problems in data mining and machine learning.
Clustering is the task of discovering homogeneous groups among the objects under study.
Recently, many researchers have shown significant interest in developing clustering
algorithms. The main problem in clustering is that we have no prior knowledge about the given
dataset. Moreover, the choice of input parameters, such as the number of clusters, the number
of nearest neighbors and other factors, makes clustering an even more challenging task, and
any incorrect choice of these parameters yields poor clustering results. Furthermore, these
algorithms suffer from unsatisfactory accuracy when the dataset contains clusters of different
complex shapes, densities and sizes, or contains noise and outliers. In this project, we propose
a new approach to the unsupervised clustering task. Our approach consists of three phases. In
the first phase, we use a genetic algorithm, which applies crossover and mutation operations to
the data, to find the initial cluster centroids. In the second phase, K-means clustering is run
from these initial centroids, producing a set of clusters for the given dataset. In the third
phase, these clusters are evaluated using the Davies-Bouldin Index. We name this new
algorithm the Genetic K-means Algorithm (GKA). We present experiments that demonstrate
the strength of the proposed algorithm in discovering clusters of different non-convex shapes,
sizes and densities in the presence of noise and outliers, with higher accuracy. These
experiments show that the proposed algorithm outperforms the standard K-means algorithm.
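The three phases described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions and should not be read as the report's implementation. Each chromosome is assumed to be a list of k distinct row indices whose rows serve as initial centroids, in the spirit of the "Selected Row Indices and Chromosomes" table. The GA fitness is assumed to be the within-cluster sum of squared errors, and the operators are assumed to be truncation selection, single-point crossover and single-gene mutation. The function names (`ga_initial_centroids`, `kmeans`, `davies_bouldin`) and all parameter values are hypothetical.

```python
import numpy as np


def assign(X, centroids):
    """Return the nearest-centroid label and squared distance for every row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), d2.min(axis=1)


def sse(X, centroids):
    """Within-cluster sum of squared errors; assumed GA fitness (lower is fitter)."""
    return assign(X, centroids)[1].sum()


def repair(chrom, n, rng):
    """Replace duplicate row indices so the chromosome names k distinct rows."""
    seen, fixed = set(), []
    for g in chrom:
        while g in seen:
            g = int(rng.integers(n))
        seen.add(g)
        fixed.append(g)
    return fixed


def ga_initial_centroids(X, k, pop_size=20, generations=30, p_mut=0.2, seed=0):
    """Phase 1 (hypothetical operators): evolve k row indices used as initial centroids."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Assumption: a chromosome is a list of k distinct row indices into X.
    pop = [[int(i) for i in rng.choice(n, size=k, replace=False)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: sse(X, X[c]))
        parents = pop[: pop_size // 2]  # truncation selection; the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(len(parents), size=2, replace=False)
            cut = int(rng.integers(1, k)) if k > 1 else 0
            child = parents[a][:cut] + parents[b][cut:]  # single-point crossover
            if rng.random() < p_mut:  # mutation: replace one gene with a random row
                child[int(rng.integers(k))] = int(rng.integers(n))
            children.append(repair(child, n, rng))
        pop = parents + children
    best = min(pop, key=lambda c: sse(X, X[c]))
    return X[best].astype(float)


def kmeans(X, centroids, max_iter=100):
    """Phase 2: standard Lloyd iterations started from the supplied centroids."""
    centroids = centroids.copy()
    for _ in range(max_iter):
        labels = assign(X, centroids)[0]
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(len(centroids))])
        if np.allclose(new, centroids):
            break
        centroids = new
    return assign(X, centroids)[0], centroids


def davies_bouldin(X, labels, centroids):
    """Phase 3: Davies-Bouldin index; assumes every cluster is non-empty."""
    k = len(centroids)
    # s[i]: mean Euclidean distance of cluster i's points to its centroid.
    s = np.array([np.linalg.norm(X[labels == i] - centroids[i], axis=1).mean()
                  for i in range(k)])
    # For each cluster, take its worst (largest) similarity ratio to any other cluster.
    worst = [max((s[i] + s[j]) / np.linalg.norm(centroids[i] - centroids[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return float(np.mean(worst))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    X = np.vstack([c + 0.5 * rng.standard_normal((30, 2)) for c in centers])
    labels, cents = kmeans(X, ga_initial_centroids(X, k=3))
    print("Davies-Bouldin index:", round(davies_bouldin(X, labels, cents), 3))
```

Lower Davies-Bouldin values indicate compact, well-separated clusters. Running the same `kmeans` function from randomly chosen rows and comparing the index gives one way to set up the kind of K-means comparison the abstract describes. Seeding from actual data rows keeps every initial centroid inside the data's range.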

Table of Contents

Certificate
Acknowledgement
Abstract
Table of Contents
List of Figures
List of Tables
1. Introduction
    1.1 Introduction to Machine Learning
    1.2 Unsupervised Learning
        1.2.1 Clustering
    1.3 Types of Clustering
        1.3.1 Partitioning Methods
        1.3.2 Hierarchical Clustering
        1.3.3 Fuzzy Clustering
        1.3.4 Model-Based Clustering
        1.3.5 Density-Based Clustering
    1.4 Comparison of Clusters
        1.4.1 Euclidean Distance
        1.4.2 Manhattan Distance
        1.4.3 Edit Distance
        1.4.4 Hamming Distance
    1.5 Techniques to Find the Optimum Number of Clusters
        1.5.1 Elbow Method
        1.5.2 Average Silhouette Method
        1.5.3 Gap Statistic Method
        1.5.4 Davies-Bouldin Index
        1.5.5 Dunn Index
    1.6 Standard K-means Algorithm
        1.6.1 Flowchart of Standard K-means
        1.6.2 Drawbacks of Standard K-means Clustering
        1.6.3 Example of K-means
    1.7 Genetic Algorithm
        1.7.1 Flowchart of Genetic Algorithm
2. Literature Survey
    2.1 Clustering
    2.2 Partitioning Clustering
    2.3 Unsupervised Learning Technique
    2.4 Conclusion
3. System Design and Development
    3.1 Problem Statement
    3.2 Research Gap
    3.3 Objectives
    3.4 Research Methodology
    3.5 Proposed Hybrid Technique
    3.6 Basic Genetic Algorithm
        3.6.1 Application of Genetic Algorithm
        3.6.2 Example of MaxOne Using Genetic Algorithm
    3.7 Proposed Algorithm
        3.7.1 Example of Proposed Algorithm
4. Experiments and Result Analysis
    4.1 Implementation of Proposed Technique
        4.1.1 Iris Dataset for Implementation
    4.2 Experimental Results
        4.2.1 Confusion Matrix of K-means Clustering
        4.2.2 Confusion Matrix of Genetic K-means Clustering
        4.2.3 Test for Performance Accuracy
        4.2.4 Calculation of Intra-cluster Distance
5. Conclusion and Future Scope
    5.1 Conclusion
    5.2 Limitations
    5.3 Future Scope
References

List of Figures

Figure No.  Description
1.1   Inter- and Intra-cluster Similarities
1.2   Evaluation Graph of Elbow Method
1.3   Evaluation Graph of Silhouette Method
1.4   Evaluation Graph of Gap Statistic Method
1.5   Flowchart of Standard K-means Clustering
1.6   Clustering on Iris Dataset
1.7   Genetic Algorithm Chromosomes and Population
1.8   Execution Steps of Genetic Algorithm
3.1   Implementation Methodology for Clustering of Dataset
3.2   Process of Genetic Algorithm
3.3   Flowchart of Proposed Algorithm
4.1   Code of K-means Clustering on Iris Dataset
4.2   Code of Genetic K-means Clustering on Iris Dataset
4.3   Confusion Matrix Obtained from K-means Algorithm
4.4   Confusion Matrix Obtained from Genetic K-means Algorithm

List of Tables

Table No.  Description
1.1   Crossover Operation on Chromosomes S1 and S3
1.2   Result of Crossover on Chromosomes S1 and S3
1.3   Mutation Operation on Chromosomes S1 and S3
1.4   Result of Mutation on Chromosomes S1 and S3
3.1   Initialization of Chromosomes for the MaxOne Problem
3.2   Arrangement of Chromosomes Based on Fitness Value
3.3   Crossover of Chromosomes S1 and S3
3.4   Crossover Result of Chromosomes S1 and S3
3.5   Crossover of Chromosomes S2 and S4
3.6   Crossover Result of Chromosomes S2 and S4
3.7   Crossover of Chromosomes S5 and S6
3.8   Crossover Result of Chromosomes S5 and S6
3.9   Mutation Result of Chromosomes
3.10  Iris Dataset for Genetic K-means Clustering
3.11  Normalized Dataset for Genetic K-means Clustering
3.12  Selected Row Indices and Chromosomes
3.13  Calculated Distances and Cluster Assignment
3.14  Clusters Obtained for Fifteen Records
4.1   Accuracy Obtained from K-means and Genetic Algorithm
4.2   Intra-cluster Distance Using K-means Algorithm
4.3   Intra-cluster Distance Using Proposed Algorithm
4.4   Inter-cluster Distance Using K-means Algorithm
4.5   Inter-cluster Distance Using Proposed Algorithm