PSO Data Clustering
2. Data clustering problem formulation

2.1 Notations used

N   the number of data objects to be clustered.
D   the dimension of each of the data objects.
K   the number of clusters.
O   the set of N data objects to be clustered, O = \{O_1, O_2, \ldots, O_N\}. Each data object is represented as O_i = \{o_{i1}, o_{i2}, \ldots, o_{id}\}, where o_{id} is the value of data object i at dimension d.
C   the set of K partitions to which the data objects are assigned, C = \{C_1, \ldots, C_K\}.
Z   the set of cluster centers to which the data objects are assigned, Z = \{Z_1, Z_2, \ldots, Z_K\}. Each cluster center is represented as Z_i = \{z_{i1}, z_{i2}, \ldots, z_{id}\}, where z_{id} is the value of cluster center i at dimension d.

Given O, the goal of partitional clustering is to determine a partition \{C_1, \ldots, C_K\} subject to the following constraints:

C_k \neq \emptyset, \quad k = 1, \ldots, K

C_i \cap C_j = \emptyset \quad \text{for } i \neq j, \quad i, j \in \{1, \ldots, K\}

\bigcup_{k=1}^{K} C_k = O

Z_i = \frac{1}{n_i} \sum_{O_j \in C_i} O_j, \quad i = 1, \ldots, K

where n_i is the number of elements belonging to cluster C_i.
In general, the data objects are assigned to clusters based on distance measures such as the Manhattan, Euclidean and Minkowski distances [3]. In our study, the objects are assigned to clusters using the Euclidean distance measure. Different statistical criteria, or fitness measures, have been proposed in the literature to measure the goodness of a partition. In this paper we consider the fitness measures used by Paterlini and Krink [4] for comparing partitions generated by different clustering algorithms. They are as follows:

Minimization of the Trace Within criterion (TRW): This criterion is based on the pooled within-groups scatter matrix W, defined as W = \sum_{k=1}^{K} W_k, where W_k is the scatter matrix of the data objects allocated to cluster C_k, k = 1, \ldots, K:

W_k = \sum_{l=1}^{n_k} (O_l^k - \bar{O}^k)(O_l^k - \bar{O}^k)' \qquad (1)

Here O_l^k denotes the l-th data object in cluster C_k, n_k is the number of objects in cluster C_k, and \bar{O}^k = \frac{1}{n_k} \sum_{l=1}^{n_k} O_l^k is the centroid of cluster C_k. The quantity to be minimized is \mathrm{trace}(W).

Maximization of the Variance Ratio Criterion (VRC): This criterion is based on the pooled within-groups scatter matrix W and the between-groups scatter matrix B, defined as

B = \sum_{k=1}^{K} n_k (\bar{O}^k - \bar{O})(\bar{O}^k - \bar{O})' \qquad (2)

where \bar{O} = \frac{1}{N} \sum_{i=1}^{N} O_i. The total scatter matrix T of the N data objects is T = B + W, and

VRC = \frac{\mathrm{trace}(B)/(K-1)}{\mathrm{trace}(W)/(N-K)} \qquad (3)

Minimization of Marriott's Criterion (MC): This criterion is based on the pooled within-groups scatter matrix W and the total scatter matrix T:

MC = K^2 \, \frac{\det(W)}{\det(T)} \qquad (4)
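For illustration, the trace-based quantities can be computed directly from a candidate partition. The following C++ sketch (illustrative only, not the implementation used in the experiments; all names are ours) computes \mathrm{trace}(W) and the VRC of equation (3); MC additionally requires the determinants \det(W) and \det(T) of the full D x D scatter matrices and is omitted for brevity.

```cpp
#include <vector>

// Sketch: trace(W) and VRC (equation (3)) for a partition of N points
// into K clusters. Assumes every cluster is non-empty (constraint C_k != empty).
struct Criteria { double trw; double vrc; };

Criteria evaluate(const std::vector<std::vector<double>>& data,  // N x D
                  const std::vector<int>& cluster,               // size N, values 0..K-1
                  int K) {
    const std::size_t N = data.size(), D = data[0].size();
    std::vector<std::vector<double>> cen(K, std::vector<double>(D, 0.0));
    std::vector<double> mean(D, 0.0);
    std::vector<int> n(K, 0);
    for (std::size_t i = 0; i < N; ++i) {              // accumulate sums
        ++n[cluster[i]];
        for (std::size_t d = 0; d < D; ++d) {
            cen[cluster[i]][d] += data[i][d];
            mean[d] += data[i][d];
        }
    }
    for (int k = 0; k < K; ++k)                        // cluster centroids
        for (std::size_t d = 0; d < D; ++d) cen[k][d] /= n[k];
    for (std::size_t d = 0; d < D; ++d) mean[d] /= N;  // global mean

    double trW = 0.0, trB = 0.0;
    for (std::size_t i = 0; i < N; ++i)                // within-groups scatter
        for (std::size_t d = 0; d < D; ++d) {
            const double diff = data[i][d] - cen[cluster[i]][d];
            trW += diff * diff;
        }
    for (int k = 0; k < K; ++k)                        // between-groups scatter
        for (std::size_t d = 0; d < D; ++d) {
            const double diff = cen[k][d] - mean[d];
            trB += n[k] * diff * diff;
        }
    const double vrc = (trB / (K - 1)) / (trW / static_cast<double>(N - K));
    return {trW, vrc};
}
```

The sketch exploits the fact that the trace of a sum of outer products is a sum of squared Euclidean norms, so neither W nor B needs to be formed explicitly.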
3. Introduction to PSO

PSO is a population-based, cooperative search heuristic introduced by Kennedy and Eberhart [2] to find optimal or near-optimal solutions to an unconstrained optimization problem. The ideas that underlie PSO are inspired by the social behavior of bird flocking and fish schooling. PSO is an iterative method based on the search behavior of a swarm in a multidimensional space. A particle i, called Current, at time step t has a position vector x_i^t and a velocity vector v_i^t. The fitness function f determines the quality of a particle's position, i.e., a particle's position represents a solution to the problem being solved.
Each particle i has a vector p_i, called Best, that represents its own best position and has an associated fitness value. The best position the swarm has visited is stored in a vector g, called G-Best. For each particle x_i^t the velocity vector is updated according to (5), and the particle moves to its new position according to (6). For each particle x_i^t the objective function f is then evaluated, and the best position p_i of the particle and the global best position g are updated.

v_i^{t+1} = v_i^t + c_1 r_1 (p_i - x_i^t) + c_2 r_2 (g - x_i^t) \qquad (5)

x_i^{t+1} = x_i^t + v_i^{t+1} \qquad (6)

The two constants c_1 and c_2 are called the cognitive and social acceleration coefficients, and r_1 and r_2 are two uniformly distributed random vectors. The algorithm iterates, updating the velocities and positions of the particles, until the stopping criterion is met.
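In code, (5) and (6) reduce to a per-dimension update. A minimal C++ sketch (the function and parameter names are ours, not from the paper); drawing fresh uniform numbers r1, r2 per dimension is one common way of realizing the random vectors r_1 and r_2:

```cpp
#include <random>
#include <vector>

// Sketch of the velocity update (5) and position update (6) for one particle.
void updateParticle(std::vector<double>& pos, std::vector<double>& vel,
                    const std::vector<double>& pbest,
                    const std::vector<double>& gbest,
                    double c1, double c2, std::mt19937& rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    for (std::size_t d = 0; d < pos.size(); ++d) {
        const double r1 = u(rng), r2 = u(rng);
        vel[d] += c1 * r1 * (pbest[d] - pos[d])
                + c2 * r2 * (gbest[d] - pos[d]);   // equation (5)
        pos[d] += vel[d];                          // equation (6)
    }
}
```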
3.1 PSO Variants

Several variations of this basic PSO scheme have been proposed in the literature for solving continuous optimization problems. Shi and Eberhart [5,6] introduced the idea of a time-varying inertia weight PSO model, which adjusts the swarm's behavior from initial exploration of the entire search space to exploitation of promising regions. Eberhart and Shi [7] proposed another inertia weight variation in which the inertia weight is randomly selected according to a uniform distribution in the range [0.5, 1]. Clerc and Kennedy [8] introduced the constriction factor in PSO to control the convergence properties of the particles; the constriction factor multiplies the entire right-hand side of equation (5), instead of an inertia weight ω, in order to control the overall velocity of the swarm. In the Fully Informed Particle Swarm Optimizer proposed by Mendes et al. [9], a particle uses information from all of its topological neighbors, rather than only the best one, to adjust its velocity. Ratnaweera et al. [10] proposed a Self-organizing Hierarchical Particle Swarm Optimizer with time-varying acceleration coefficients, in which only the social and cognitive parts of the particle are used to estimate its new velocity, and the particles are reinitialized when the search stagnates. Janson and Middendorf [11] proposed an Adaptive Hierarchical Particle Swarm Optimizer with dynamic adaptation of the population topology; the topology is a tree-like structure in which each node represents a particle, and particles move up or down the hierarchy depending on their solution quality. Recently, Chatterjee and Siarry [12] proposed a non-linear variation of the inertia weight PSO model.

3.2 PSO algorithm in Clustering

A PSO-based clustering algorithm was first proposed by van der Merwe and Engelbrecht [13]. Xiao et al. [14] proposed a hybrid approach to cluster gene data, in which a Self-Organizing Map (SOM) trains the weights of the nodes in the first stage and the weights are then optimized using PSO. Chen and Ye [15] employed a PSO representation in which each particle corresponds to the centroids of the clusters; two-dimensional and three-dimensional data were used for evaluation. Omran et al. [16] proposed a dynamic clustering system based on binary PSO and the K-means algorithm; the algorithm automatically identifies the number of clusters and employs a validity index to evaluate the clusters. Cohen et al. [17] proposed a Particle Swarm Clustering (PSC) algorithm in which each particle represents a centroid in the input data space, so the whole population is needed to represent the final clustering solution. Paterlini and Krink [4] compared the performance of Differential Evolution (DE), Random Search (RS), PSO and GA on partitional clustering problems; their empirical results show that PSO and DE perform better than GA and K-means. Recently, Das et al. [18] proposed an automatic clustering technique using an improved differential evolution algorithm. In this work, we consider the data sets used by Paterlini and Krink [4] to evaluate the performance of the following PSO variants:

a) Time-varying inertia weight PSO model (SE-PSO) proposed by Shi and Eberhart [5,6] (see the sketch after this list).
b) Stochastic inertia weight PSO model (ES-PSO) proposed by Eberhart and Shi [7].
c) Constriction type PSO model (CK-PSO) proposed by Clerc and Kennedy [8].
d) Self-organizing Hierarchical Particle Swarm Optimizer with time-varying acceleration coefficients (R-PSO) proposed by Ratnaweera et al. [10].
e) Non-linear inertia weight PSO model (CS-PSO) proposed by Chatterjee and Siarry [12].
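For example, SE-PSO replaces (5) with an inertia-weighted update v_i^{t+1} = w \, v_i^t + c_1 r_1 (p_i - x_i^t) + c_2 r_2 (g - x_i^t), where w decreases over the run. A minimal sketch, assuming the commonly used linear decrease from 0.9 to 0.4 (these bounds are an assumption for illustration; the paper does not list its parameter settings):

```cpp
// Linearly decreasing inertia weight for the time-varying model of
// Shi and Eberhart [5,6]; the bounds 0.9 and 0.4 are commonly used
// values assumed here for illustration.
double inertiaWeight(long t, long T) {
    const double wMax = 0.9, wMin = 0.4;
    return wMax - (wMax - wMin) * static_cast<double>(t) / static_cast<double>(T);
}

// The velocity update (5) then becomes, per dimension d:
//   vel[d] = inertiaWeight(t, T) * vel[d]
//          + c1 * r1 * (pbest[d] - pos[d])
//          + c2 * r2 * (gbest[d] - pos[d]);
```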
4. General Structure of PSO algorithm for data clustering

Notations used:

t   iteration counter.
T   maximum number of iterations.
S   swarm size.
D   maximum number of dimensions in each data object.
K   maximum number of clusters.
N   number of data objects to be clustered.

The data objects to be clustered are represented as a set:
O = \{O_1, O_2, \ldots, O_N\}. Each data object is represented as O_i = \{o_{i1}, o_{i2}, \ldots, o_{id}\}, where o_{id} is the value of data object i in dimension d.

x_n^t      position of particle n (Current) at iteration t.
f(x_n^t)   value of the objective function for particle n (Current) at iteration t.
4.1 General Structure

Step 1: Generate 2S+1 initial solutions randomly according to the swarm size S.
Step 2: Evaluate each of the 2S+1 initial solutions for its fitness measure.
Step 3: Initialize the Current (x_n^t), Best (p_n) and G-Best (g) positions from the 2S+1 initial solutions, where n = 1, \ldots, S.
Step 4: While the termination condition is not met:
        For each particle n = 1, \ldots, S:
            Update the position and velocity vectors of the current particle x_n^t using the PSO heuristic (SE-PSO, ES-PSO, CK-PSO, R-PSO or CS-PSO).
            Evaluate the particle based on the fitness measure (TRW, VRC or MC).
            Update the Best (p_n) and G-Best (g) positions.
Step 5: Return the G-Best (g) particle.
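The loop in Step 4 can be sketched in C++ as follows (illustrative, not the authors' code): each particle encodes K candidate centroids as one flat vector of length K x D, every object is assigned to its nearest centroid by Euclidean distance, and the resulting partition is scored with one of the fitness measures. The helpers updateParticle and evaluate refer to the earlier sketches.

```cpp
#include <limits>
#include <vector>

// A particle encodes K candidate centroids as one flat K*D vector
// (an assumption consistent with centroid-based PSO clustering).
struct Particle {
    std::vector<double> pos, vel, pbest;
    double pbestFit = std::numeric_limits<double>::max();
};

// Index of the centroid (stored flat in `pos`) closest to `obj` in
// squared Euclidean distance.
int nearestCentroid(const std::vector<double>& obj,
                    const std::vector<double>& pos, int K) {
    const std::size_t D = obj.size();
    int best = 0;
    double bestDist = std::numeric_limits<double>::max();
    for (int k = 0; k < K; ++k) {
        double dist = 0.0;
        for (std::size_t d = 0; d < D; ++d) {
            const double diff = obj[d] - pos[k * D + d];
            dist += diff * diff;
        }
        if (dist < bestDist) { bestDist = dist; best = k; }
    }
    return best;
}

// One Step-4 iteration over the swarm, here minimizing TRW:
// for (Particle& p : swarm) {
//     updateParticle(p.pos, p.vel, p.pbest, gbest, c1, c2, rng);  // (5)-(6)
//     std::vector<int> assign(N);
//     for (std::size_t i = 0; i < N; ++i)
//         assign[i] = nearestCentroid(data[i], p.pos, K);
//     const double fit = evaluate(data, assign, K).trw;
//     if (fit < p.pbestFit) { p.pbestFit = fit; p.pbest = p.pos; }
//     if (fit < gbestFit)   { gbestFit = fit;  gbest  = p.pos; }
// }
```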
5. Experimental setup

Five variants of the PSO algorithm are considered in this study for comparative evaluation with respect to the three criteria TRW, VRC and MC. To evaluate the performance of the PSO variants we consider the benchmark data sets reported by Paterlini and Krink [4]. With the maximum number of functional evaluations set to 100000, Paterlini and Krink reported the best results by running the basic version of PSO introduced by Kennedy and Eberhart [2] with a population size of 50. They reported the best results for GA with a population size of 100, and for DE with a crossover factor of 0.9, a scaling factor of 0.3 and a population size of 50. The four real-world data sets considered in this study are listed in Table 1. For a fair comparison, all the PSO algorithms considered in this paper were run 50 times, with the maximum number of functional evaluations set to 100000, for evaluating the performance measures.

Table 1. Real World data sets

The performance of the PSO variants is measured based on the following criteria:

1. Mean best fitness value of the TRW, VRC and MC measures.
2. Mean percent relative increase in the objective value of the TRW, VRC and MC measures.
3. Percentage of runs (i.e., success %) that reach the best known objective function value over 50 simulations.
4. Run Length Distribution (RLD), as proposed by Hoos and Stützle [19].
6. Performance analysis of PSO variants

6.1 Mean best fitness value

All the PSO variants were coded in C++ and allowed to run for a maximum of 100000 functional evaluations. The experiments were repeated 50 times, and the mean best fitness value of each algorithm was calculated with respect to the objective functions considered in this paper. The mean best fitness values of the PSO variants are reported in Table 2, where the results are compared with the basic PSO algorithm (B-PSO), a Genetic Algorithm (GA) and Differential Evolution (DE). For a fair comparison, we tested the PSO variants using the same experimental setup as Paterlini and Krink. The results indicate that the PSO variants considered in this study perform better than the basic PSO, GA and DE algorithms. It is evident from Table 2 that the PSO variants improve the best known VRC and MC measures on all the benchmark problems. On the TRW measure, the PSO variants yield solutions of the same quality for the cancer data set, and improved quality for the vowel data set is also reported in Table 2.

6.2 Mean percent relative increase in objective value

The mean percentage relative increases in objective function value for the benchmark problems are given in Table 3 and are calculated as follows. Let the heuristic solutions yielded by C-K PSO, S-E PSO, R-PSO, C-S PSO and E-S PSO for a given problem be denoted by F_1, F_2, F_3, F_4 and F_5, respectively. These solutions are evaluated relative to one another as given below.
The mean percentage relative increase in the objective function value of the solution yielded by approach i for a minimization problem is

\frac{\left( F_i - \min(F_k,\; k = 1, \ldots, 5) \right) \times 100}{\min(F_k,\; k = 1, \ldots, 5)} \qquad (7)

Similarly, the mean percentage relative increase in the objective function value of the solution yielded by approach i for a maximization problem is

\frac{\left( \max(F_k,\; k = 1, \ldots, 5) - F_i \right) \times 100}{\max(F_k,\; k = 1, \ldots, 5)} \qquad (8)

The results indicate that R-PSO performs best on the Iris data set. On the Cancer data set, CS-PSO performs best for the TRW, VRC and MC criteria. On the Vowel data set, SE-PSO performs better than the other variants.
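Equations (7) and (8) measure each variant's solution against the best of the five; a small C++ sketch (the naming is ours):

```cpp
#include <algorithm>
#include <vector>

// Percent relative increase of solution F[i] among the variants'
// solutions F[0..4]: equation (7) for minimization objectives (TRW, MC),
// equation (8) for maximization objectives (VRC).
double relIncrease(const std::vector<double>& F, std::size_t i, bool minimize) {
    if (minimize) {
        const double best = *std::min_element(F.begin(), F.end());
        return (F[i] - best) * 100.0 / best;   // equation (7)
    }
    const double best = *std::max_element(F.begin(), F.end());
    return (best - F[i]) * 100.0 / best;       // equation (8)
}
```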
6.3 Success Percentage

Table 4 reports the percentage of runs (i.e., success %) that reach the best known objective function value over 50 simulations. The best known values reported by Paterlini and Krink [4] are used for evaluation. The results shown in Table 4 indicate that the PSO variants considered in this study perform well on the MC and VRC measures: all the variants reach almost 100% success on the VRC and MC measures for all the data sets. On the TRW measure, the cancer and vowel data sets give better success rates than the iris and glass data sets.
6.4 Run Length Distribution (RLD)

To study the behavior of stochastic algorithms with respect to solution quality and the number of functional evaluations, run length distribution plots are used. In this paper we adopt the methodology proposed by Hoos and Stützle [19] to plot the RLDs, which were produced for all the data sets with respect to the objective under consideration. An RLD plot shows the convergence of a PSO algorithm with respect to the number of functional evaluations and indicates the probability of reaching a pre-specified objective function value within a specified number of functional evaluations. The probability value (success rate) is the ratio between the number of runs finding a solution of a certain quality and the total number of runs. In this paper we use the best known objective function values reported by Paterlini and Krink [4] as the pre-specified values for plotting the RLDs of the performance measures considered. RLD plots for the benchmark data sets are shown in Figures 1 to 12.
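A point of the empirical RLD at budget e is simply the fraction of the 50 runs whose first-hitting evaluation count is at most e. A minimal sketch, assuming each run records the evaluation count at which it first reached the pre-specified value (the data layout is our assumption):

```cpp
#include <vector>

// Empirical RLD: hit[r] is the functional-evaluation count at which run r
// first reached the pre-specified objective value, or -1 if it never did.
std::vector<double> rld(const std::vector<long>& hit, long maxEvals, long step) {
    std::vector<double> prob;
    for (long e = step; e <= maxEvals; e += step) {
        int success = 0;
        for (long h : hit)
            if (h >= 0 && h <= e) ++success;           // run succeeded within budget e
        prob.push_back(static_cast<double>(success) / hit.size());
    }
    return prob;
}
```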
Table 2. Comparison of mean best fitness of PSO variants with GA, DE and the basic PSO algorithm

Notes:
MC       Marriott's Criterion (minimization objective)
TRW      Trace Within Criterion (minimization objective)
VRC      Variance Ratio Criterion (maximization objective)
GA       Genetic Algorithm results reported by Paterlini and Krink [4]
DE       Differential Evolution results reported by Paterlini and Krink [4]
B-PSO    Basic PSO results reported by Paterlini and Krink [4]
S-E PSO  Time-varying inertia weight PSO model proposed by Shi and Eberhart [5,6]
E-S PSO  Stochastic inertia weight PSO model proposed by Eberhart and Shi [7]
C-K PSO  Constriction type PSO model proposed by Clerc and Kennedy [8]
R-PSO    Self-organizing Hierarchical Particle Swarm Optimizer with time-varying acceleration coefficients proposed by Ratnaweera et al. [10]
C-S PSO  Non-linear inertia weight PSO model proposed by Chatterjee and Siarry [12]
Table 3. Mean percent relative increase in objective value

Dataset  Criteria  C-K PSO  S-E PSO  R-PSO   C-S PSO  E-S PSO
Iris     MC        1.8253   1.9748   0.0000  6.4227   0.7714
Iris     TRW       0.0250   0.0188   0.0000  0.0208   0.0303
Run Length Distributions for each of the PSO variants on the iris data set for the TRW measure are shown in Figure 1. The distribution shows that S-E PSO is the fastest first-hitting algorithm for the best known value and that C-S PSO has a slowly increasing curve to reach the best known value. All the PSO variants are able to find a solution of the required quality, but no variant is capable of doing so with a probability of 1.0; C-S PSO and C-K PSO reach a solution of the required quality with a probability of 0.60.

Figure 2 shows the run length distribution for the cancer data set on the TRW measure. The distribution shows that S-E PSO is the fastest first-hitting algorithm for the best known value and that C-S PSO has a slowly increasing curve to reach the best known value. All the PSO variants are able to reach the best known value with a probability of 1.0.

The Run Length Distribution for the glass data set on the TRW measure is shown in Figure 3. The distribution shows that S-E PSO has the fastest first-hitting time for the best known value and C-S PSO has a slowly increasing curve to reach the best known value. All the PSO variants are able to reach the best known value, and C-S PSO finds it with a probability of 0.32.

Figure 4 shows the run length distribution for the vowel data set on the TRW measure. The distribution shows that S-E PSO is the fastest first-hitting algorithm for the best known value, and C-S PSO has a slowly increasing curve to reach the best known value. All the PSO variants are able to reach a solution of the required quality, and E-S PSO does so with a probability of 0.94.

The Run Length Distribution for the iris data set on the VRC measure is shown in Figure 5. The distribution shows that R-PSO and S-E PSO are the fastest first-hitting algorithms for the best known value and that C-S PSO has a slowly increasing curve to reach the best known value. All the PSO variants are able to find a solution of the required quality with a probability of 1.0.

Table 4. Percentage of runs (success %) that reach the best known objective function value over 50 simulations

Dataset  Criteria  Best Known Value  C-K PSO  S-E PSO  R-PSO  C-S PSO  E-S PSO
Iris     MC        0.198             100      100      100    100      100
Iris     TRW       7885.14           60       44       20     60       52
Iris     VRC       561.63            100      100      100    100      100
Cancer   MC        0.3527            100      100      100    100      100
Cancer   TRW       19323             100      100      100    100      100
Cancer   VRC       1026.26           100      100      100    100      100
Glass    MC        0.01984           100      100      100    100      100
Glass    TRW       336.06            14       22       8      32       20
Glass    VRC       124.62            98       100      100    100      100
Vowel    MC        0.2906            100      100      100    100      100
Vowel    TRW       30690785          90       90       58     88       94
Vowel    VRC       1465.55           96       92       98     98       98
The run length distribution for the cancer data set on the VRC criterion is shown in Figure 6; all the variants reach the best known value within 100 function evaluations. Figure 7 shows the run length distribution for the glass data set on the VRC criterion. The distribution shows that S-E PSO is the fastest first-hitting algorithm for the best known value and C-S PSO has a slowly increasing curve to reach the optimal value for this data set; all the PSO variants are able to find a solution of the required quality with a probability of almost 1. Figure 8 shows the run length distribution for the vowel data set on the VRC criterion. The distribution shows that S-E PSO is the fastest first-hitting algorithm for the best known value and C-S PSO has a slowly increasing curve to reach the best known value. All the PSO variants are able to find a solution of the required quality with a probability of 0.90.

The run length distribution for the iris data set on the MC criterion is shown in Figure 9; all variants find the best known value within 100 function evaluations, and all the PSO variants are able to find a solution of the required quality with a probability of 1. The run length distribution for the cancer data set on the MC criterion is shown in Figure 10; all variants find the best known value within 2000 function evaluations. Figure 11 shows the run length distribution for the glass data set on the MC criterion. The distributions show that E-S PSO is the fastest first-hitting algorithm for the best known value and C-K PSO has a slowly increasing curve to reach the optimal value; all the PSO variants are able to find a solution of the required quality with a probability of 1. Figure 12 shows the run length distribution for the vowel data set on the MC criterion. The distributions show that S-E PSO is the fastest first-hitting algorithm for the best known value and C-S PSO has a slowly increasing curve to reach the best known value. All the PSO variants are able to find a solution of the required quality with a probability of 1.

Interesting observations can be made from the RLDs for the TRW measure. All PSO variants reach the best known values reported by Paterlini and Krink [4]. For all the benchmark data sets, S-E PSO has the fastest hitting time and C-S PSO the slowest hitting time to reach the best known value. It is also observed that the convergence of R-PSO is poor for most of the benchmark data sets. Another interesting observation from the RLDs is that C-S PSO has the maximum probability of finding the best known value for most of the data sets. For the TRW measure, all PSO variants show strong stagnation behavior.

Considering the VRC measure, S-E PSO has the fastest hitting time and C-S PSO the slowest hitting time for the reported best known values. It is also found that E-S PSO and CS-PSO have the maximum probability of finding the best known value for all the data sets. The convergence of the PSO variants on the VRC measure is faster than on the TRW measure for most of the benchmark problems. On the MC measure, the convergence of the PSO variants is slower than on the VRC measure, and all variants reach the reported best known value with a probability of 1.

7. Conclusion

Few attempts have been made to solve the data clustering problem using PSO algorithms. In this paper, the performance of well-known PSO variants for data clustering on real-world data sets has been evaluated. The performance of the PSO variants was compared with the basic PSO algorithm, GA and DE. The comparative evaluation shows that the PSO variants perform better on most of the benchmark data sets for the VRC, TRW and MC criteria, and that they also improve the best known solutions available in the literature for the VRC and MC measures. Run Length Distribution analysis has been carried out to study the stagnation behavior and convergence speed of the PSO variants. The RLD plots indicate that convergence is faster for SE-PSO when the termination criterion is fixed at a smaller number of functional evaluations. As the number of functional evaluations increases, the comparison reveals that no PSO variant dominates all the others on all benchmark data sets.
Figure 1. RLD of IRIS data set for TRW measure
Figure 2. RLD of CANCER data set for TRW measure
Figure 3. RLD of GLASS data set for TRW measure
Figure 4. RLD of VOWEL data set for TRW measure
Figure 5. RLD of IRIS data set for VRC measure
Figure 6. RLD of CANCER data set for VRC measure
Figure 7. RLD of GLASS data set for VRC measure
Figure 8. RLD of VOWEL data set for VRC measure
Figure 9. RLD of IRIS data set for MC measure
Figure 10. RLD of CANCER data set for MC measure
Figure 11. RLD of GLASS data set for MC measure
Acknowledgements

The authors are thankful to the three reviewers for their suggestions and comments, which improved an earlier version of this paper.
References

[1] Xu, R., Wunsch II, D., Survey of clustering algorithms, IEEE Transactions on Neural Networks, Vol. 16, No. 3, May 2005.
[2] Kennedy, J., Eberhart, R.C., Particle swarm optimization, Proceedings of the 1995 IEEE International Conference on Neural Networks, IEEE Press, Piscataway, NJ, Vol. 4, pp. 1942-1948, 1995.
[3] Jain, A.K., Murty, M.N., Flynn, P.J., Data clustering: a review, ACM Computing Surveys, Vol. 31, No. 3, 1999.
[4] Paterlini, S., Krink, T., Differential evolution and particle swarm optimization in partitional clustering, Computational Statistics & Data Analysis, 50(5), 1220-1247, 2006.
[5] Shi, Y., Eberhart, R., A modified particle swarm optimizer, Proceedings of the 1998 IEEE World Congress on Computational Intelligence, pp. 69-73, IEEE Press, Piscataway, NJ, 1998.
[6] Shi, Y., Eberhart, R., Empirical study of particle swarm optimization, Proceedings of the 1999 IEEE Congress on Evolutionary Computation, pp. 1945-1950, IEEE Press, Piscataway, NJ, 1999.
[7] Eberhart, R., Shi, Y., Tracking and optimizing dynamic systems with particle swarms, Proceedings of the 2001 IEEE Congress on Evolutionary Computation, pp. 94-100, IEEE Press, Piscataway, NJ, 2001.
[8] Clerc, M., Kennedy, J., The particle swarm - explosion, stability, and convergence in a multidimensional complex space, IEEE Transactions on Evolutionary Computation, 6, 58-73, 2002.
[9] Mendes, R., Kennedy, J., Neves, J., The fully informed particle swarm: simpler, maybe better, IEEE Transactions on Evolutionary Computation, 8(3), 204-210, 2004.
[10] Ratnaweera, A., Halgamuge, S.K., Watson, H.C., Self-organizing hierarchical particle swarm optimizer with time-varying acceleration coefficients, IEEE Transactions on Evolutionary Computation, 8(3), 240-255, 2004.
[11] Janson, S., Middendorf, M., A hierarchical particle swarm optimizer and its adaptive variant, IEEE Transactions on Systems, Man, and Cybernetics, Part B, 35(6), 1272-1282, 2005.
[12] Chatterjee, A., Siarry, P., Nonlinear inertia weight variation for dynamic adaptation in particle swarm optimization, Computers and Operations Research, 33, 859-871, 2006.
[13] Van der Merwe, D.W., Engelbrecht, A.P., Data clustering using particle swarm optimization, Proceedings of IEEE