PSO Data Clustering
2. Data clustering problem formulation

2.1 Notations used

N   the number of data objects to be clustered.
D   the dimension of each of the data objects.
K   the number of clusters.
O   the set of N data objects to be clustered, O = \{O_1, O_2, \ldots, O_N\}. Each data object is represented as O_i = \{o_{i1}, o_{i2}, \ldots, o_{id}\}, where o_{id} is the value of data object i at dimension d.
C   the set of K partitions to which the data objects are assigned, C = \{C_1, \ldots, C_K\}.
Z   the set of cluster centers to which the data objects are assigned, Z = \{Z_1, Z_2, \ldots, Z_K\}. Each cluster center is represented as Z_i = \{z_{i1}, z_{i2}, \ldots, z_{id}\}, where z_{id} is the value of cluster center i at dimension d.

Given O, the goal of partitional clustering is to determine a partition \{C_1, \ldots, C_K\} subject to the following constraints:

C_k \neq \emptyset, \quad k = 1, \ldots, K

C_i \cap C_j = \emptyset \quad \text{for } i \neq j, \quad i, j \in \{1, \ldots, K\}

\bigcup_{k=1}^{K} C_k = O

Z_i = \frac{1}{n_i} \sum_{O_j \in C_i} O_j, \quad i = 1, \ldots, K

where n_i is the number of elements belonging to cluster C_i.
In general, the data objects are assigned to clusters based on distance measures such as the Manhattan, Euclidean and Minkowski distances [3]. In our study, the objects are assigned to clusters using the Euclidean distance measure. Different statistical criteria, or fitness measures, have been proposed in the literature to measure the goodness of a partition. In this paper we consider the fitness measures used by Paterlini and Krink [4] for comparing partitions generated by different clustering algorithms. They are as follows:

Minimization of the Trace Within criterion (TRW): This criterion is based on the pooled within-groups scatter matrix W, defined as W = \sum_{k=1}^{K} W_k, where W_k is the scatter matrix of the data objects allocated to cluster C_k, k = 1, \ldots, K:

W_k = \sum_{l=1}^{n_k} (O_l^k - \bar{O}^k)(O_l^k - \bar{O}^k)' \qquad (1)

Here O_l^k denotes the l-th data object in cluster C_k, n_k is the number of objects in cluster C_k, and \bar{O}^k = \frac{1}{n_k} \sum_{l=1}^{n_k} O_l^k is the centroid of cluster C_k. The quantity to be minimized is \mathrm{trace}(W).

Maximization of the Variance Ratio Criterion (VRC): This criterion is based on the pooled within-groups scatter matrix W and the between-groups scatter matrix B, defined as

B = \sum_{k=1}^{K} n_k (\bar{O}^k - \bar{O})(\bar{O}^k - \bar{O})' \qquad (2)

where \bar{O} = \frac{1}{N} \sum_{i=1}^{N} O_i. The total scatter matrix T of the N data objects is T = B + W, and

VRC = \frac{\mathrm{trace}(B)/(K-1)}{\mathrm{trace}(W)/(N-K)} \qquad (3)

Minimization of Marriott's Criterion (MC): This criterion is based on the pooled within-groups scatter matrix W and the total scatter matrix T:

MC = K^2 \, \frac{\det(W)}{\det(T)} \qquad (4)
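For illustration, the trace-based quantities can be computed directly from a candidate partition. The following C++ sketch (illustrative only, not the implementation used in the experiments; all names are ours) computes \mathrm{trace}(W) and the VRC of equation (3); MC additionally requires the determinants \det(W) and \det(T) of the full D x D scatter matrices and is omitted for brevity.

```cpp
#include <vector>

// Sketch: trace(W) and VRC (equation (3)) for a partition of N points
// into K clusters. Assumes every cluster is non-empty (constraint C_k != empty).
struct Criteria { double trw; double vrc; };

Criteria evaluate(const std::vector<std::vector<double>>& data,  // N x D
                  const std::vector<int>& cluster,               // size N, values 0..K-1
                  int K) {
    const std::size_t N = data.size(), D = data[0].size();
    std::vector<std::vector<double>> cen(K, std::vector<double>(D, 0.0));
    std::vector<double> mean(D, 0.0);
    std::vector<int> n(K, 0);
    for (std::size_t i = 0; i < N; ++i) {              // accumulate sums
        ++n[cluster[i]];
        for (std::size_t d = 0; d < D; ++d) {
            cen[cluster[i]][d] += data[i][d];
            mean[d] += data[i][d];
        }
    }
    for (int k = 0; k < K; ++k)                        // cluster centroids
        for (std::size_t d = 0; d < D; ++d) cen[k][d] /= n[k];
    for (std::size_t d = 0; d < D; ++d) mean[d] /= N;  // global mean

    double trW = 0.0, trB = 0.0;
    for (std::size_t i = 0; i < N; ++i)                // within-groups scatter
        for (std::size_t d = 0; d < D; ++d) {
            const double diff = data[i][d] - cen[cluster[i]][d];
            trW += diff * diff;
        }
    for (int k = 0; k < K; ++k)                        // between-groups scatter
        for (std::size_t d = 0; d < D; ++d) {
            const double diff = cen[k][d] - mean[d];
            trB += n[k] * diff * diff;
        }
    const double vrc = (trB / (K - 1)) / (trW / static_cast<double>(N - K));
    return {trW, vrc};
}
```

The sketch exploits the fact that the trace of a sum of outer products is a sum of squared Euclidean norms, so neither W nor B needs to be formed explicitly.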
3. Introduction to PSO

PSO is a population-based, cooperative search heuristic introduced by Kennedy and Eberhart [2] to find optimal or near-optimal solutions to an unconstrained optimization problem. The ideas that underlie PSO are inspired by the social behavior of bird flocking and fish schooling. PSO is an iterative method based on the search behavior of a swarm in a multidimensional space. A particle i, called Current, at time step t has a position vector x_i^t and a velocity vector v_i^t. The fitness function f determines the quality of a particle's position, i.e., a particle's position represents a solution to the problem being solved.
Each particle i has a vector p_i, called Best, that represents its own best position and has an associated fitness value. The best position the swarm has visited is stored in a vector g, called G-Best. For each particle x_i^t the velocity vector is updated according to (5), and the particle moves to its new position according to (6). For each particle x_i^t the objective function f is then evaluated, and the best position p_i of the particle and the global best position g are updated.

v_i^{t+1} = v_i^t + c_1 r_1 (p_i - x_i^t) + c_2 r_2 (g - x_i^t) \qquad (5)

x_i^{t+1} = x_i^t + v_i^{t+1} \qquad (6)

The two constants c_1 and c_2 are called the cognitive and social acceleration coefficients, and r_1 and r_2 are two uniformly distributed random vectors. The algorithm iterates, updating the velocities and positions of the particles, until the stopping criterion is met.
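In code, (5) and (6) reduce to a per-dimension update. A minimal C++ sketch (the function and parameter names are ours, not from the paper); drawing fresh uniform numbers r1, r2 per dimension is one common way of realizing the random vectors r_1 and r_2:

```cpp
#include <random>
#include <vector>

// Sketch of the velocity update (5) and position update (6) for one particle.
void updateParticle(std::vector<double>& pos, std::vector<double>& vel,
                    const std::vector<double>& pbest,
                    const std::vector<double>& gbest,
                    double c1, double c2, std::mt19937& rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    for (std::size_t d = 0; d < pos.size(); ++d) {
        const double r1 = u(rng), r2 = u(rng);
        vel[d] += c1 * r1 * (pbest[d] - pos[d])
                + c2 * r2 * (gbest[d] - pos[d]);   // equation (5)
        pos[d] += vel[d];                          // equation (6)
    }
}
```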
3.1 PSO Variants

Several variations of this basic PSO scheme have been proposed in the literature for solving continuous optimization problems. Shi and Eberhart [5,6] introduced the idea of a time-varying inertia weight PSO model, which adjusts the swarm's behavior from initial exploration of the entire search space to exploitation of promising regions. Eberhart and Shi [7] proposed another inertia weight variation in which the inertia weight is randomly selected according to a uniform distribution in the range [0.5, 1]. Clerc and Kennedy [8] introduced the constriction factor in PSO to control the convergence properties of the particles; the constriction factor multiplies the entire right-hand side of equation (5), instead of an inertia weight ω, in order to control the overall velocity of the swarm. In the Fully Informed Particle Swarm Optimizer proposed by Mendes et al. [9], a particle uses information from all of its topological neighbors, rather than only the best one, to adjust its velocity. Ratnaweera et al. [10] proposed a Self-organizing Hierarchical Particle Swarm Optimizer with time-varying acceleration coefficients, in which only the social and cognitive parts of the particle are used to estimate its new velocity, and the particles are reinitialized when the search stagnates. Janson and Middendorf [11] proposed an Adaptive Hierarchical Particle Swarm Optimizer with dynamic adaptation of the population topology; the topology is a tree-like structure in which each node represents a particle, and particles move up or down the hierarchy depending on their solution quality. Recently, Chatterjee and Siarry [12] proposed a non-linear variation of the inertia weight PSO model.

3.2 PSO algorithm in Clustering

A PSO-based clustering algorithm was first proposed by van der Merwe and Engelbrecht [13]. Xiao et al. [14] proposed a hybrid approach to cluster gene data, in which a Self-Organizing Map (SOM) trains the weights of the nodes in the first stage and the weights are then optimized using PSO. Chen and Ye [15] employed a PSO representation in which each particle corresponds to the centroids of the clusters; two-dimensional and three-dimensional data were used for evaluation. Omran et al. [16] proposed a dynamic clustering system based on binary PSO and the K-means algorithm; the algorithm automatically identifies the number of clusters and employs a validity index to evaluate the clusters. Cohen et al. [17] proposed a Particle Swarm Clustering (PSC) algorithm in which each particle represents a centroid in the input data space, so the whole population is needed to represent the final clustering solution. Paterlini and Krink [4] compared the performance of Differential Evolution (DE), Random Search (RS), PSO and GA on partitional clustering problems; their empirical results show that PSO and DE perform better than GA and K-means. Recently, Das et al. [18] proposed an automatic clustering technique using an improved differential evolution algorithm. In this work, we consider the data sets used by Paterlini and Krink [4] to evaluate the performance of the following PSO variants:

a) Time-varying inertia weight PSO model (SE-PSO) proposed by Shi and Eberhart [5,6] (see the sketch after this list).
b) Stochastic inertia weight PSO model (ES-PSO) proposed by Eberhart and Shi [7].
c) Constriction type PSO model (CK-PSO) proposed by Clerc and Kennedy [8].
d) Self-organizing Hierarchical Particle Swarm Optimizer with time-varying acceleration coefficients (R-PSO) proposed by Ratnaweera et al. [10].
e) Non-linear inertia weight PSO model (CS-PSO) proposed by Chatterjee and Siarry [12].
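For example, SE-PSO replaces (5) with an inertia-weighted update v_i^{t+1} = w \, v_i^t + c_1 r_1 (p_i - x_i^t) + c_2 r_2 (g - x_i^t), where w decreases over the run. A minimal sketch, assuming the commonly used linear decrease from 0.9 to 0.4 (these bounds are an assumption for illustration; the paper does not list its parameter settings):

```cpp
// Linearly decreasing inertia weight for the time-varying model of
// Shi and Eberhart [5,6]; the bounds 0.9 and 0.4 are commonly used
// values assumed here for illustration.
double inertiaWeight(long t, long T) {
    const double wMax = 0.9, wMin = 0.4;
    return wMax - (wMax - wMin) * static_cast<double>(t) / static_cast<double>(T);
}

// The velocity update (5) then becomes, per dimension d:
//   vel[d] = inertiaWeight(t, T) * vel[d]
//          + c1 * r1 * (pbest[d] - pos[d])
//          + c2 * r2 * (gbest[d] - pos[d]);
```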
4. General Structure of PSO algorithm for data clustering

Notations used:

t   iteration counter.
T   maximum number of iterations.
S   swarm size.
D   maximum number of dimensions in each data object.
K   maximum number of clusters.
N   number of data objects to be clustered.

The data objects to be clustered are represented as a set:
O = \{O_1, O_2, \ldots, O_N\}. Each data object is represented as O_i = \{o_{i1}, o_{i2}, \ldots, o_{id}\}, where o_{id} is the value of data object i in dimension d.

x_n^t      position of particle n (Current) at iteration t.
f(x_n^t)   value of the objective function for particle n (Current) at iteration t.
4.1 General Structure

Step 1: Generate 2S+1 initial solutions randomly according to the swarm size S.
Step 2: Evaluate each of the 2S+1 initial solutions for its fitness measure.
Step 3: Initialize the Current (x_n^t), Best (p_n) and G-Best (g) positions from the 2S+1 initial solutions, where n = 1, \ldots, S.
Step 4: While the termination condition is not met:
        For each particle n = 1, \ldots, S:
            Update the position and velocity vectors of the current particle x_n^t using the PSO heuristic (SE-PSO, ES-PSO, CK-PSO, R-PSO or CS-PSO).
            Evaluate the particle based on the fitness measure (TRW, VRC or MC).
            Update the Best (p_n) and G-Best (g) positions.
Step 5: Return the G-Best (g) particle.
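The loop in Step 4 can be sketched in C++ as follows (illustrative, not the authors' code): each particle encodes K candidate centroids as one flat vector of length K x D, every object is assigned to its nearest centroid by Euclidean distance, and the resulting partition is scored with one of the fitness measures. The helpers updateParticle and evaluate refer to the earlier sketches.

```cpp
#include <limits>
#include <vector>

// A particle encodes K candidate centroids as one flat K*D vector
// (an assumption consistent with centroid-based PSO clustering).
struct Particle {
    std::vector<double> pos, vel, pbest;
    double pbestFit = std::numeric_limits<double>::max();
};

// Index of the centroid (stored flat in `pos`) closest to `obj` in
// squared Euclidean distance.
int nearestCentroid(const std::vector<double>& obj,
                    const std::vector<double>& pos, int K) {
    const std::size_t D = obj.size();
    int best = 0;
    double bestDist = std::numeric_limits<double>::max();
    for (int k = 0; k < K; ++k) {
        double dist = 0.0;
        for (std::size_t d = 0; d < D; ++d) {
            const double diff = obj[d] - pos[k * D + d];
            dist += diff * diff;
        }
        if (dist < bestDist) { bestDist = dist; best = k; }
    }
    return best;
}

// One Step-4 iteration over the swarm, here minimizing TRW:
// for (Particle& p : swarm) {
//     updateParticle(p.pos, p.vel, p.pbest, gbest, c1, c2, rng);  // (5)-(6)
//     std::vector<int> assign(N);
//     for (std::size_t i = 0; i < N; ++i)
//         assign[i] = nearestCentroid(data[i], p.pos, K);
//     const double fit = evaluate(data, assign, K).trw;
//     if (fit < p.pbestFit) { p.pbestFit = fit; p.pbest = p.pos; }
//     if (fit < gbestFit)   { gbestFit = fit;  gbest  = p.pos; }
// }
```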
5. Experimental setup

Five variants of the PSO algorithm are considered in this study for comparative evaluation with respect to the three criteria TRW, VRC and MC. To evaluate the performance of the PSO variants we consider the benchmark data sets reported by Paterlini and Krink [4]. With the maximum number of functional evaluations set to 100000, Paterlini and Krink reported the best results by running the basic version of PSO introduced by Kennedy and Eberhart [2] with a population size of 50. They reported the best results for GA with a population size of 100, and for DE with a crossover factor of 0.9, a scaling factor of 0.3 and a population size of 50. The four real-world data sets considered in this study are listed in Table 1. For a fair comparison, all the PSO algorithms considered in this paper were run 50 times, with the maximum number of functional evaluations set to 100000, for evaluating the performance measures.

Table 1. Real World data sets

The performance of the PSO variants is measured based on the following criteria:

1. Mean best fitness value of the TRW, VRC and MC measures.
2. Mean percent relative increase in the objective value of the TRW, VRC and MC measures.
3. Percentage of runs (i.e., success %) that reach the best known objective function value over 50 simulations.
4. Run Length Distribution (RLD), as proposed by Hoos and Stützle [19].
6. Performance analysis of PSO variants

6.1 Mean best fitness value

All the PSO variants were coded in C++ and allowed to run for a maximum of 100000 functional evaluations. The experiments were repeated 50 times, and the mean best fitness value of each algorithm was calculated with respect to the objective functions considered in this paper. The mean best fitness values of the PSO variants are reported in Table 2, where the results are compared with the basic PSO algorithm (B-PSO), a Genetic Algorithm (GA) and Differential Evolution (DE). For a fair comparison, we tested the PSO variants using the same experimental setup as Paterlini and Krink. The results indicate that the PSO variants considered in this study perform better than the basic PSO, GA and DE algorithms. It is evident from Table 2 that the PSO variants improve the best known VRC and MC measures on all the benchmark problems. On the TRW measure, the PSO variants yield solutions of the same quality for the cancer data set, and improved quality for the vowel data set is also reported in Table 2.

6.2 Mean percent relative increase in objective value

The mean percentage relative increases in objective function value for the benchmark problems are given in Table 3 and are calculated as follows. Let the heuristic solutions yielded by C-K PSO, S-E PSO, R-PSO, C-S PSO and E-S PSO for a given problem be denoted by F_1, F_2, F_3, F_4 and F_5, respectively. These solutions are evaluated relative to one another as given below.
The mean percentage relative increase in the objective function value of the solution yielded by approach i for a minimization problem is

\frac{\left( F_i - \min(F_k,\; k = 1, \ldots, 5) \right) \times 100}{\min(F_k,\; k = 1, \ldots, 5)} \qquad (7)

Similarly, the mean percentage relative increase in the objective function value of the solution yielded by approach i for a maximization problem is

\frac{\left( \max(F_k,\; k = 1, \ldots, 5) - F_i \right) \times 100}{\max(F_k,\; k = 1, \ldots, 5)} \qquad (8)

The results indicate that R-PSO performs best on the Iris data set. On the Cancer data set, CS-PSO performs best for the TRW, VRC and MC criteria. On the Vowel data set, SE-PSO performs better than the other variants.
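Equations (7) and (8) measure each variant's solution against the best of the five; a small C++ sketch (the naming is ours):

```cpp
#include <algorithm>
#include <vector>

// Percent relative increase of solution F[i] among the variants'
// solutions F[0..4]: equation (7) for minimization objectives (TRW, MC),
// equation (8) for maximization objectives (VRC).
double relIncrease(const std::vector<double>& F, std::size_t i, bool minimize) {
    if (minimize) {
        const double best = *std::min_element(F.begin(), F.end());
        return (F[i] - best) * 100.0 / best;   // equation (7)
    }
    const double best = *std::max_element(F.begin(), F.end());
    return (best - F[i]) * 100.0 / best;       // equation (8)
}
```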
6.3 Success Percentage

Table 4 reports the percentage of runs (i.e., success %) that reach the best known objective function value over 50 simulations. The best known values reported by Paterlini and Krink [4] are used for evaluation. The results shown in Table 4 indicate that the PSO variants considered in this study perform well on the MC and VRC measures: all the variants reach almost 100% success on the VRC and MC measures for all the data sets. On the TRW measure, the cancer and vowel data sets give better success rates than the iris and glass data sets.
6.4 Run Length Distribution (RLD)

To study the behavior of stochastic algorithms with respect to solution quality and the number of functional evaluations, run length distribution plots are used. In this paper we adopt the methodology proposed by Hoos and Stützle [19] to plot the RLDs, which were produced for all the data sets with respect to the objective under consideration. An RLD plot shows the convergence of a PSO algorithm with respect to the number of functional evaluations and indicates the probability of reaching a pre-specified objective function value within a specified number of functional evaluations. The probability value (success rate) is the ratio between the number of runs finding a solution of a certain quality and the total number of runs. In this paper we use the best known objective function values reported by Paterlini and Krink [4] as the pre-specified values for plotting the RLDs of the performance measures considered. RLD plots for the benchmark data sets are shown in Figures 1 to 12.
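A point of the empirical RLD at budget e is simply the fraction of the 50 runs whose first-hitting evaluation count is at most e. A minimal sketch, assuming each run records the evaluation count at which it first reached the pre-specified value (the data layout is our assumption):

```cpp
#include <vector>

// Empirical RLD: hit[r] is the functional-evaluation count at which run r
// first reached the pre-specified objective value, or -1 if it never did.
std::vector<double> rld(const std::vector<long>& hit, long maxEvals, long step) {
    std::vector<double> prob;
    for (long e = step; e <= maxEvals; e += step) {
        int success = 0;
        for (long h : hit)
            if (h >= 0 && h <= e) ++success;           // run succeeded within budget e
        prob.push_back(static_cast<double>(success) / hit.size());
    }
    return prob;
}
```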
Table 2. Comparison of mean best fitness of PSO variants with GA, DE and the basic PSO algorithm

Notes:
MC       Marriott's Criterion (minimization objective)
TRW      Trace Within Criterion (minimization objective)
VRC      Variance Ratio Criterion (maximization objective)
GA       Genetic Algorithm results reported by Paterlini and Krink [4]
DE       Differential Evolution results reported by Paterlini and Krink [4]
B-PSO    Basic PSO results reported by Paterlini and Krink [4]
S-E PSO  Time-varying inertia weight PSO model proposed by Shi and Eberhart [5,6]
E-S PSO  Stochastic inertia weight PSO model proposed by Eberhart and Shi [7]
C-K PSO  Constriction type PSO model proposed by Clerc and Kennedy [8]
R-PSO    Self-organizing Hierarchical Particle Swarm Optimizer with time-varying acceleration coefficients proposed by Ratnaweera et al. [10]
C-S PSO  Non-linear inertia weight PSO model proposed by Chatterjee and Siarry [12]
Table 3. Mean percent relative increase in objective value

Dataset  Criteria  C-K PSO  S-E PSO  R-PSO   C-S PSO  E-S PSO
Iris     MC        1.8253   1.9748   0.0000  6.4227   0.7714
Iris     TRW       0.0250   0.0188   0.0000  0.0208   0.0303
Run Length Distributions for each of the PSO variants on the iris data set for the TRW measure are shown in Figure 1. The distribution shows that S-E PSO is the fastest first-hitting algorithm for the best known value and that C-S PSO has a slowly increasing curve to reach the best known value. All the PSO variants are able to find a solution of the required quality, but no variant is capable of doing so with a probability of 1.0; C-S PSO and C-K PSO reach a solution of the required quality with a probability of 0.60.

Figure 2 shows the run length distribution for the cancer data set on the TRW measure. The distribution shows that S-E PSO is the fastest first-hitting algorithm for the best known value and that C-S PSO has a slowly increasing curve to reach the best known value. All the PSO variants are able to reach the best known value with a probability of 1.0.

The Run Length Distribution for the glass data set on the TRW measure is shown in Figure 3. The distribution shows that S-E PSO has the fastest first-hitting time for the best known value and C-S PSO has a slowly increasing curve to reach the best known value. All the PSO variants are able to reach the best known value, and C-S PSO finds it with a probability of 0.32.

Figure 4 shows the run length distribution for the vowel data set on the TRW measure. The distribution shows that S-E PSO is the fastest first-hitting algorithm for the best known value, and C-S PSO has a slowly increasing curve to reach the best known value. All the PSO variants are able to reach a solution of the required quality, and E-S PSO does so with a probability of 0.94.

The Run Length Distribution for the iris data set on the VRC measure is shown in Figure 5. The distribution shows that R-PSO and S-E PSO are the fastest first-hitting algorithms for the best known value and that C-S PSO has a slowly increasing curve to reach the best known value. All the PSO variants are able to find a solution of the required quality with a probability of 1.0.

Table 4. Percentage of runs (success %) that reach the best known objective function value over 50 simulations

Dataset  Criteria  Best Known Value  C-K PSO  S-E PSO  R-PSO  C-S PSO  E-S PSO
Iris     MC        0.198             100      100      100    100      100
Iris     TRW       7885.14           60       44       20     60       52
Iris     VRC       561.63            100      100      100    100      100
Cancer   MC        0.3527            100      100      100    100      100
Cancer   TRW       19323             100      100      100    100      100
Cancer   VRC       1026.26           100      100      100    100      100
Glass    MC        0.01984           100      100      100    100      100
Glass    TRW       336.06            14       22       8      32       20
Glass    VRC       124.62            98       100      100    100      100
Vowel    MC        0.2906            100      100      100    100      100
Vowel    TRW       30690785          90       90       58     88       94
Vowel    VRC       1465.55           96       92       98     98       98
The run length distribution for the cancer data set on the VRC criterion is shown in Figure 6; all the variants reach the best known value within 100 function evaluations. Figure 7 shows the run length distribution for the glass data set on the VRC criterion. The distribution shows that S-E PSO is the fastest first-hitting algorithm for the best known value and C-S PSO has a slowly increasing curve to reach the optimal value for this data set; all the PSO variants are able to find a solution of the required quality with a probability of almost 1. Figure 8 shows the run length distribution for the vowel data set on the VRC criterion. The distribution shows that S-E PSO is the fastest first-hitting algorithm for the best known value and C-S PSO has a slowly increasing curve to reach the best known value. All the PSO variants are able to find a solution of the required quality with a probability of 0.90.

The run length distribution for the iris data set on the MC criterion is shown in Figure 9; all variants find the best known value within 100 function evaluations, and all the PSO variants are able to find a solution of the required quality with a probability of 1. The run length distribution for the cancer data set on the MC criterion is shown in Figure 10; all variants find the best known value within 2000 function evaluations. Figure 11 shows the run length distribution for the glass data set on the MC criterion. The distributions show that E-S PSO is the fastest first-hitting algorithm for the best known value and C-K PSO has a slowly increasing curve to reach the optimal value; all the PSO variants are able to find a solution of the required quality with a probability of 1. Figure 12 shows the run length distribution for the vowel data set on the MC criterion. The distributions show that S-E PSO is the fastest first-hitting algorithm for the best known value and C-S PSO has a slowly increasing curve to reach the best known value. All the PSO variants are able to find a solution of the required quality with a probability of 1.

Interesting observations can be made from the RLDs for the TRW measure. All PSO variants reach the best known values reported by Paterlini and Krink [4]. For all the benchmark data sets, S-E PSO has the fastest hitting time and C-S PSO the slowest hitting time to reach the best known value. It is also observed that the convergence of R-PSO is poor for most of the benchmark data sets. Another interesting observation from the RLDs is that C-S PSO has the maximum probability of finding the best known value for most of the data sets. For the TRW measure, all PSO variants show strong stagnation behavior.

Considering the VRC measure, S-E PSO has the fastest hitting time and C-S PSO the slowest hitting time for the reported best known values. It is also found that E-S PSO and CS-PSO have the maximum probability of finding the best known value for all the data sets. The convergence of the PSO variants on the VRC measure is faster than on the TRW measure for most of the benchmark problems. On the MC measure, the convergence of the PSO variants is slower than on the VRC measure, and all variants reach the reported best known value with a probability of 1.

7. Conclusion

Few attempts have been made to solve the data clustering problem using PSO algorithms. In this paper, the performance of well-known PSO variants for data clustering on real-world data sets has been evaluated. The performance of the PSO variants was compared with the basic PSO algorithm, GA and DE. The comparative evaluation shows that the PSO variants perform better on most of the benchmark data sets for the VRC, TRW and MC criteria, and that they also improve the best known solutions available in the literature for the VRC and MC measures. Run Length Distribution analysis has been carried out to study the stagnation behavior and convergence speed of the PSO variants. The RLD plots indicate that convergence is faster for SE-PSO when the termination criterion is fixed at a smaller number of functional evaluations. As the number of functional evaluations increases, the comparison reveals that no PSO variant dominates all the others on all benchmark data sets.
Figure 1. RLD of IRIS data set for TRW measure
Figure 2. RLD of CANCER data set for TRW measure
Figure 3. RLD of GLASS data set for TRW measure
Figure 4. RLD of VOWEL data set for TRW measure
Figure 5. RLD of IRIS data set for VRC measure
Figure 6. RLD of CANCER data set for VRC measure
Figure 7. RLD of GLASS data set for VRC measure
Figure 8. RLD of VOWEL data set for VRC measure
Figure 9. RLD of IRIS data set for MC measure
Figure 10. RLD of CANCER data set for MC measure
Figure 11. RLD of GLASS data set for MC measure
Acknowledgements

The authors are thankful to the three reviewers for their suggestions and comments, which improved an earlier version of this paper.
References

[1] Xu, R., Wunsch II, D., Survey of clustering algorithms, IEEE Transactions on Neural Networks, Vol. 16, No. 3, May 2005.
[2] Kennedy, J., Eberhart, R.C., Particle swarm optimization, Proceedings of the 1995 IEEE International Conference on Neural Networks, IEEE Press, Piscataway, NJ, Vol. 4, pp. 1942-1948, 1995.
[3] Jain, A.K., Murty, M.N., Flynn, P.J., Data clustering: a review, ACM Computing Surveys, Vol. 31, No. 3, 1999.
[4] Paterlini, S., Krink, T., Differential evolution and particle swarm optimization in partitional clustering, Computational Statistics & Data Analysis, 50(5), 1220-1247, 2006.
[5] Shi, Y., Eberhart, R., A modified particle swarm optimizer, Proceedings of the 1998 IEEE World Congress on Computational Intelligence, pp. 69-73, IEEE Press, Piscataway, NJ, 1998.
[6] Shi, Y., Eberhart, R., Empirical study of particle swarm optimization, Proceedings of the 1999 IEEE Congress on Evolutionary Computation, pp. 1945-1950, IEEE Press, Piscataway, NJ, 1999.
[7] Eberhart, R., Shi, Y., Tracking and optimizing dynamic systems with particle swarms, Proceedings of the 2001 IEEE Congress on Evolutionary Computation, pp. 94-100, IEEE Press, Piscataway, NJ, 2001.
[8] Clerc, M., Kennedy, J., The particle swarm - explosion, stability, and convergence in a multidimensional complex space, IEEE Transactions on Evolutionary Computation, 6, 58-73, 2002.
[9] Mendes, R., Kennedy, J., Neves, J., The fully informed particle swarm: simpler, maybe better, IEEE Transactions on Evolutionary Computation, 8(3), 204-210, 2004.
[10] Ratnaweera, A., Halgamuge, S.K., Watson, H.C., Self-organizing hierarchical particle swarm optimizer with time-varying acceleration coefficients, IEEE Transactions on Evolutionary Computation, 8(3), 240-255, 2004.
[11] Janson, S., Middendorf, M., A hierarchical particle swarm optimizer and its adaptive variant, IEEE Transactions on Systems, Man, and Cybernetics, Part B, 35(6), 1272-1282, 2005.
[12] Chatterjee, A., Siarry, P., Nonlinear inertia weight variation for dynamic adaptation in particle swarm optimization, Computers and Operations Research, 33, 859-871, 2006.
[13] Van der Merwe, D.W., Engelbrecht, A.P., Data clustering using particle swarm optimization, Proceedings of IEEE