Unit- IV
Partitional Algorithms
Nonhierarchical or partitional clustering creates the clusters in one step as opposed to
several steps. Only one set of clusters is created, and the user must input the desired number k
of clusters.
In addition, some metric or criterion function is used to determine the goodness of any
proposed solution. This measure of quality could be the average distance between
clusters or some other metric. The solution with the best value for the criterion function is
the clustering solution used.
One common measure is a squared error metric, which measures the squared distance
from each point to the centroid of its associated cluster:
seK = Σ_{j=1}^{k} Σ_{t ∈ Kj} ‖ t − CKj ‖²
where CKj denotes the centroid of cluster Kj.
Partitional algorithms suffer from a combinatorial explosion due to the number of
possible solutions. Thus, most algorithms examine only a small subset of all possible clusterings.
The most well-known algorithms are:
1. Minimum Spanning Tree
2. Squared Error Clustering Algorithm
3. K-Means Clustering
4. Nearest Neighbor Algorithm
5. PAM Algorithm
6. Bond Energy Algorithm
7. Clustering with Genetic Algorithms
8. Clustering with Neural Networks
Minimum Spanning Tree
The MST procedure produces a minimum spanning tree given an adjacency (distance) matrix as input.
Clusters are then formed by removing the k − 1 edges with the largest distances from the tree, so
that k connected components remain.
A partitional MST algorithm is a very simplistic approach, but it illustrates how partitional
algorithms work.
Since the clustering problem is to define a mapping, the output of this algorithm shows the
clusters as a set of ordered pairs (ti, j), where f(ti) = Kj.
The time complexity of this algorithm is again dominated by the MST procedure, which is
O(n²). At most k − 1 edges will be removed, so the last three steps of the algorithm, assuming
each step takes constant time, take only O(k − 1) time.
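As a rough Python sketch of this idea (not the textbook's pseudocode), the code below builds an MST from a distance matrix with Prim's algorithm, removes the k − 1 largest MST edges, and labels the resulting connected components as clusters; the function names and the toy distance matrix are illustrative.

```python
# Illustrative partitional MST clustering: build a minimum spanning tree from a
# distance (adjacency) matrix, delete the k-1 largest MST edges, and treat the
# remaining connected components as the k clusters.
import heapq

def mst_edges(dist):
    """Prim's algorithm: return MST edges (d, i, j) for a full distance matrix."""
    n = len(dist)
    visited = [False] * n
    edges, heap = [], [(0.0, 0, 0)]
    while heap and len(edges) < n - 1:
        d, u, v = heapq.heappop(heap)
        if visited[v]:
            continue
        visited[v] = True
        if u != v:
            edges.append((d, u, v))
        for w in range(n):
            if not visited[w]:
                heapq.heappush(heap, (dist[v][w], v, w))
    return edges

def mst_clusters(dist, k):
    """Remove the k-1 largest MST edges and label the connected components."""
    n = len(dist)
    keep = sorted(mst_edges(dist))[: n - k]          # drop the k-1 largest edges
    adj = {i: [] for i in range(n)}
    for _, u, v in keep:
        adj[u].append(v)
        adj[v].append(u)
    labels, cluster = [-1] * n, 0
    for start in range(n):                           # depth-first label each component
        if labels[start] != -1:
            continue
        stack = [start]
        while stack:
            node = stack.pop()
            if labels[node] == -1:
                labels[node] = cluster
                stack.extend(adj[node])
        cluster += 1
    return labels                                    # labels[i] = cluster of item ti

dist = [[0, 1, 9, 8], [1, 0, 8, 9], [9, 8, 0, 2], [8, 9, 2, 0]]
print(mst_clusters(dist, k=2))                       # e.g. [0, 0, 1, 1]
```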
Squared Error Clustering Algorithm
The squared error for a cluster is the sum of the squared Euclidean distances between each
element in the cluster and the cluster centroid, CKi. Given a cluster Ki, let the set of items mapped
to that cluster be {ti1, ti2, . . . , tim}. The squared error is defined as
seKi = Σ_{j=1}^{m} ‖ tij − CKi ‖²
For each iteration in the squared error algorithm, each tuple is assigned to the cluster with the
closest center. Since there are k clusters and n items, this is an O(kn) operation.
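As a small illustration (not from the text itself), the following sketch computes the per-cluster squared error and the total squared error for a clustering; the helper names are hypothetical.

```python
# Illustrative computation of the squared error measure: the sum of squared
# Euclidean distances from each item in a cluster to that cluster's centroid.
def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def squared_error(points, center):
    """seKi = sum of ||tij - CKi||^2 over all items in one cluster."""
    return sum(sum((x - c) ** 2 for x, c in zip(p, center)) for p in points)

def total_squared_error(clusters):
    """Sum the per-cluster squared errors over an entire clustering."""
    return sum(squared_error(pts, centroid(pts)) for pts in clusters)

clusters = [[(1.0, 1.0), (2.0, 1.0)], [(8.0, 8.0), (9.0, 9.0)]]
print(total_squared_error(clusters))   # 0.5 + 1.0 = 1.5
```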
K-Means Clustering
K-means is an iterative clustering algorithm in which items are moved among sets of clusters
until the desired set is reached.
The cluster mean of Ki = {ti1, ti2, . . . , tim} is defined as
mi = (1/m) Σ_{j=1}^{m} tij
This definition assumes that each tuple has only one numeric value as opposed to a tuple with
many attribute values.
This algorithm assumes that the desired number of clusters, k, is an input parameter.
The time complexity of K-means is O(tkn) where t is the number of iterations.
Although the K-means algorithm often produces good results, it is not time-efficient and does
not scale well.
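A minimal K-means sketch under the assumptions of Euclidean distance and convergence when no item changes cluster might look like the following; it is illustrative rather than an optimized implementation, and the names are arbitrary.

```python
# Minimal K-means sketch: repeat the assignment and update steps until no item
# changes cluster or max_iters is reached.
import random

def kmeans(items, k, max_iters=100, seed=0):
    random.seed(seed)
    means = [list(m) for m in random.sample(items, k)]    # start from k random items
    assignment = None
    for _ in range(max_iters):
        # Assignment step: each item goes to the cluster with the closest mean.
        new_assignment = [
            min(range(k),
                key=lambda j: sum((x - m) ** 2 for x, m in zip(p, means[j])))
            for p in items
        ]
        if new_assignment == assignment:                   # converged: nothing moved
            break
        assignment = new_assignment
        # Update step: recompute each cluster mean from its current members.
        for j in range(k):
            members = [p for p, a in zip(items, assignment) if a == j]
            if members:
                means[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, means

items = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
labels, means = kmeans(items, k=2)
print(labels, means)
```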
Nearest Neighbor Algorithm
An algorithm similar to the single link technique is called the nearest neighbor algorithm. With
this serial algorithm, items are iteratively merged into the existing clusters that are closest. In this
algorithm a threshold, t, is used to determine if items will be added to existing clusters or if a
new cluster is created.
The complexity of the nearest neighbor algorithm actually depends on the number of items. For
each loop, each item must be compared to each item already in a cluster. Obviously, this is n in
the worst case. Thus, the time complexity is O(n²).
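A simple sketch of this serial procedure, assuming Euclidean distance and a user-supplied threshold t, could look like the following; the function names are illustrative.

```python
# Nearest neighbor clustering sketch: each new item joins the cluster of its
# closest already-clustered item if that distance is within the threshold t;
# otherwise it starts a new cluster.
def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def nearest_neighbor_clustering(items, t):
    labels = [0]                                     # first item starts cluster 0
    next_label = 1
    for i in range(1, len(items)):
        # Find the closest item among those already placed in a cluster.
        j = min(range(i), key=lambda j: euclidean(items[i], items[j]))
        if euclidean(items[i], items[j]) <= t:
            labels.append(labels[j])                 # join that item's cluster
        else:
            labels.append(next_label)                # too far away: new cluster
            next_label += 1
    return labels

items = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 5.1)]
print(nearest_neighbor_clustering(items, t=1.0))     # [0, 0, 1, 1]
```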
PAM Algorithm
The PAM (partitioning around medoids) algorithm, also called the K-medoids algorithm,
represents a cluster by a medoid.
The PAM algorithm works as follows.
Initially, a random set of k items is taken to be the set of medoids. Then at each step, all items
from the input dataset that are not currently medoids are examined one by one to see if they
should be medoids. That is, the algorithm determines whether there is an item that should replace
one of the existing medoids.
Let Cjih be the cost change for an item tj associated with swapping medoid ti with non-medoid th.
The cost is the change to the sum of all distances from items to their cluster medoids.
There are four cases that must be examined when calculating this cost:
1. tj currently belongs to the cluster of ti and, after the swap, is closest to another existing medoid tm; then Cjih = d(tj, tm) − d(tj, ti).
2. tj currently belongs to the cluster of ti and, after the swap, is closest to the new medoid th; then Cjih = d(tj, th) − d(tj, ti).
3. tj belongs to the cluster of another medoid tm and remains closest to tm; then Cjih = 0.
4. tj belongs to the cluster of another medoid tm but becomes closer to th; then Cjih = d(tj, th) − d(tj, tm).
The total impact on quality of a medoid change, TCih, is then given by
TCih = Σ_{j=1}^{n} Cjih
PAM does not scale well to large datasets because of its computational complexity. For each
iteration, we have k(n − k) pairs of objects i, h for which a cost, TCih, should be determined.
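The sketch below illustrates the swapping idea; for readability it recomputes the total cost for every candidate swap instead of applying the four-case incremental formula for Cjih, so it is an approximation of PAM rather than an efficient implementation, and all names are illustrative.

```python
# Rough PAM-style sketch: start from random medoids, then repeatedly try
# swapping a medoid ti with a non-medoid th, keeping the swap with the most
# negative total cost change TCih (the swap that most reduces the sum of
# distances from items to their closest medoid).
import random

def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def total_cost(items, medoids):
    """Sum of each item's distance to its closest medoid (medoids are indices)."""
    return sum(min(euclidean(p, items[m]) for m in medoids) for p in items)

def pam(items, k, seed=0):
    random.seed(seed)
    medoids = set(random.sample(range(len(items)), k))
    while True:
        base = total_cost(items, medoids)
        best_swap, best_tc = None, 0.0
        for i in medoids:
            for h in set(range(len(items))) - medoids:
                candidate = (medoids - {i}) | {h}
                tc_ih = total_cost(items, candidate) - base   # TCih for this swap
                if tc_ih < best_tc:
                    best_swap, best_tc = (i, h), tc_ih
        if best_swap is None:                                 # no improving swap left
            return medoids
        i, h = best_swap
        medoids = (medoids - {i}) | {h}

items = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.1, 5.2)]
print(sorted(pam(items, k=2)))                                # indices of the medoids
```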
CLARA (Clustering LARge Applications) improves on the time complexity of PAM by using
samples of the dataset. The basic idea is that it applies PAM to a sample of the underlying
database and then uses the medoids found as the medoids for the complete clustering.
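A sketch of CLARA's sampling idea is shown below, continuing the previous PAM sketch (the pam(), total_cost(), and euclidean() helpers and the items list are assumed to be defined there); the sample size and parameter names are illustrative choices, not CLARA's published settings.

```python
# CLARA-style sampling sketch: run PAM on several small random samples and keep
# the medoid set that gives the lowest total cost over the *complete* dataset.
import random

def clara(items, k, num_samples=5, sample_size=40):
    best_medoids, best_cost = None, float("inf")
    for s in range(num_samples):
        random.seed(s)                                   # different sample each round
        sample = random.sample(items, min(sample_size, len(items)))
        sample_medoid_idx = pam(sample, k, seed=s)       # PAM on the sample only
        medoid_points = [sample[m] for m in sample_medoid_idx]
        # Evaluate these medoids against the full dataset, not just the sample.
        cost = sum(min(euclidean(p, m) for m in medoid_points) for p in items)
        if cost < best_cost:
            best_medoids, best_cost = medoid_points, cost
    return best_medoids

print(clara(items, k=2))    # reuses the toy items list from the PAM sketch
```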
CLARANS (clustering large applications based upon randomized search) improves on CLARA
by using multiple different samples. In addition to the normal input to PAM, CLARANS
requires two additional parameters: maxneighbor and numlocal. Maxneighbor is the number of
neighbors of a node to which any specific node can be compared, and numlocal is the number of
samples (local searches) to be taken.
Bond Energy Algorithm
The bond energy algorithm (BEA) was developed and has been used in the database
design area to determine how to group data and how to physically place data on disk.
It can be used to cluster attributes based on usage and then perform logical or physical
design accordingly.
With BEA, the affinity (bond) between database attributes is based on common usage.
This bond is used by the clustering algorithm as a similarity measure.
The actual measure counts the number of times the two attributes are used together during a
given period of time. To find this, all common queries must be identified.
In a distributed database, each resulting cluster is called a vertical fragment and may be
stored at a different site from the other fragments.
The basic steps of this clustering algorithm are:
1. Create an attribute affinity matrix in which each entry indicates the affinity between
the two associated attributes. The entries in the similarity matrix are based on the
frequency of common usage of attribute pairs.
2. The BEA then converts this similarity matrix to a BOND matrix in which the entries
represent a type of nearest neighbor bonding based on the probability of co-access. The BEA
algorithm rearranges rows and columns so that similar attributes appear close together in
the matrix.
3. Finally, the designer draws boxes around regions in the matrix with high similarity.
Two attributes Ai and Aj have a high affinity if they are frequently used together in database
applications. At the heart of the BEA algorithm is the global affinity measure. Suppose that a
database schema consists of n attributes {A1, A2, . . . , An}. The global affinity measure, AM, is
defined as
AM = Σ_{i=1}^{n} Σ_{j=1}^{n} aff(Ai, Aj) [ aff(Ai, Aj−1) + aff(Ai, Aj+1) ]
where aff(Ai, Aj) is the affinity between attributes Ai and Aj, and the affinity involving the
nonexistent boundary attributes A0 and An+1 is taken to be zero.
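As an illustration of the measure only (the actual BEA places columns one at a time, greedily, rather than by exhaustive search), the sketch below evaluates AM for a given attribute ordering and brute-forces a tiny, hypothetical affinity matrix to find the ordering that maximizes AM.

```python
# Evaluate the global affinity measure AM for an ordering of attributes, with
# the affinity of the nonexistent neighbours at the two ends taken as zero.
from itertools import permutations

def global_affinity(aff, order):
    """AM for the attribute ordering 'order' over the affinity matrix 'aff'."""
    n = len(order)
    def neighbour_aff(p, q):
        # Affinity between the attributes at positions p and q; 0 if q is off the ends.
        return aff[order[p]][order[q]] if 0 <= q < n else 0
    return sum(
        aff[order[p]][order[q]] * (neighbour_aff(p, q - 1) + neighbour_aff(p, q + 1))
        for p in range(n)
        for q in range(n)
    )

# Hypothetical attribute affinity matrix for four attributes A1..A4.
aff = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]
best = max(permutations(range(4)), key=lambda order: global_affinity(aff, order))
print(best, global_affinity(aff, best))   # ordering that places similar attributes together
```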
Clustering with Genetic Algorithms
With genetic algorithms, the first decision is how to represent each cluster. One simple approach would be
to use a bit-map representation for each possible cluster.
One possible iterative refinement technique for clustering uses a genetic algorithm. The
approach is similar to the squared error approach in that an initial random
solution is given and successive changes to this converge on a local optimum. A new solution is
generated from the previous solution using crossover and mutation operations.
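A toy sketch of this idea for k = 2 is given below, using a bit-map representation (bit i encodes the cluster of item i), fitness equal to the negative total squared error, single-point crossover, and a one-bit mutation; the population size, mutation scheme, and all names are illustrative choices rather than a prescribed algorithm.

```python
# Toy genetic-algorithm clustering for k = 2: each solution is a bit string,
# fitness is the negative squared error, and new solutions come from
# single-point crossover plus a one-bit mutation.
import random

def fitness(bits, items):
    """Negative total squared error of the 2-cluster solution encoded by bits."""
    total = 0.0
    for label in (0, 1):
        members = [p for p, b in zip(items, bits) if b == label]
        if not members:
            continue
        center = tuple(sum(col) / len(members) for col in zip(*members))
        total += sum(sum((x - c) ** 2 for x, c in zip(p, center)) for p in members)
    return -total

def genetic_clustering(items, pop_size=20, generations=50, seed=0):
    random.seed(seed)
    n = len(items)
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda b: fitness(b, items), reverse=True)
        survivors = pop[: pop_size // 2]                    # keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = random.sample(survivors, 2)
            cut = random.randint(1, n - 1)
            child = p1[:cut] + p2[cut:]                     # single-point crossover
            i = random.randrange(n)
            child[i] = 1 - child[i]                         # mutation: flip one bit
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda b: fitness(b, items))

items = [(0.0, 0.0), (0.3, 0.1), (5.0, 5.0), (5.2, 4.9)]
print(genetic_clustering(items))                            # e.g. [0, 0, 1, 1] or [1, 1, 0, 0]
```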
Self-Organizing Feature Maps
A self-organizing feature map (SOFM) or self-organizing map (SOM) is an NN approach that
uses competitive unsupervised learning. Learning is based on the concept that the behavior of a
node should impact only those nodes and arcs near it. Weights are initially assigned randomly
and adjusted during the learning process to produce better results. During this learning process,
hidden features or patterns in the data are uncovered and the weights are adjusted accordingly.
SOFMs were developed by observing how neurons work in the brain and in ANNs. That is:
• The firing of a neuron impacts the firing of other neurons that are near it.
• Neurons that are far apart seem to inhibit each other.
• Neurons seem to have specific nonoverlapping tasks.
The term self-organizing indicates the ability of these NNs to organize the nodes into clusters
based on the similarity between them.
An example of a SOFM is the Kohonen self-organizing map, which is used extensively in
commercial data mining products to perform clustering.
In a Kohonen map, the competitive layer is viewed as a grid, and each input node is connected
to each node in this grid. Propagation occurs by sending the input
value for each input node to each node in the competitive layer. As with regular NNs, each arc
has an associated weight and each node in the competitive layer has an activation function.
A common approach is to initialize the weights on the input arcs to the competitive layer with
normalized values. The similarity between output nodes and input vectors is then determined by
the dot product of the two vectors. Given an input tuple X = (x1, . . . , xh) and weights w1i, . . . , whi
on the arcs input to a competitive node i, the similarity between X and i can be calculated by
sim(X, i) = Σ_{k=1}^{h} xk wki
The learning process adjusts the weights of the winning node (and its neighbors) toward the input using
wki = wki + c (xk − wki)
In this formula, c indicates the learning rate and may actually vary based on the node rather than
being a constant.
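The sketch below illustrates this competitive learning loop on a small one-dimensional competitive layer; as a simplification it picks the winning node by Euclidean distance rather than the dot product and updates the winner and its immediate grid neighbors, so it is an illustrative toy rather than a full Kohonen map, and all names and parameter values are assumptions.

```python
# Rough SOFM training sketch: the winner is the node whose weight vector is
# closest to the input, and its weights (plus its immediate neighbours') are
# nudged toward the input with wki = wki + c * (xk - wki).
import random

def train_sofm(data, num_nodes=4, epochs=50, c=0.3, seed=0):
    random.seed(seed)
    dim = len(data[0])
    # Randomly initialised weight vector for each competitive-layer node.
    weights = [[random.random() for _ in range(dim)] for _ in range(num_nodes)]
    for _ in range(epochs):
        for x in data:
            # Competition: the winner is the node closest to the input tuple x.
            winner = min(range(num_nodes),
                         key=lambda i: sum((xk - wk) ** 2 for xk, wk in zip(x, weights[i])))
            # Update the winner and its grid neighbours (winner +/- 1 on a 1-D grid).
            for i in (winner - 1, winner, winner + 1):
                if 0 <= i < num_nodes:
                    weights[i] = [wk + c * (xk - wk) for xk, wk in zip(x, weights[i])]
    return weights

data = [(0.1, 0.1), (0.2, 0.0), (0.9, 0.9), (1.0, 0.8)]
for i, w in enumerate(train_sofm(data)):
    print(i, [round(v, 2) for v in w])    # nodes organise toward the two groups of inputs
```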