
K-Means Clustering Explained

The document describes Algorithm AS 136, a K-means clustering algorithm. It presents the algorithm in detail, including the key steps of assigning points to clusters based on distance, updating cluster centers, and transferring points between clusters to reduce the within-cluster sum of squares until a local optimum is reached. The algorithm uses live sets of clusters and a quick-transfer stage to improve efficiency compared to a related algorithm, AS 58. Tests on generated data show that AS 136 produces locally optimal solutions and is faster than AS 58 on all but the smallest test case.


Algorithm AS 136: A K-Means Clustering Algorithm

Author(s): J. A. Hartigan and M. A. Wong

Source: Journal of the Royal Statistical Society. Series C (Applied Statistics), Vol. 28, No. 1 (1979), pp. 100-108
Published by: Wiley for the Royal Statistical Society
Stable URL: [Link]
Accessed: 19-12-2015 14:17 UTC


This content downloaded from [Link] on Sat, 19 Dec 2015 [Link] UTC
All use subject to JSTOR Terms and Conditions

APPLIED STATISTICS

100


Algorithm AS 136

A K-Means Clustering Algorithm

By J. A. HARTIGAN and M. A. WONG

Yale University, New Haven, Connecticut, U.S.A.

Keywords: K-MEANS CLUSTERING ALGORITHM; TRANSFER ALGORITHM

LANGUAGE

ISO Fortran
DESCRIPTION AND PURPOSE

The K-means clustering algorithm is described in detail by Hartigan (1975). An efficient version of the algorithm is presented here.

The aim of the K-means algorithm is to divide M points in N dimensions into K clusters so that the within-cluster sum of squares is minimized. It is not practical to require that the solution has minimal sum of squares against all partitions, except when M, N are small and K = 2. We seek instead "local" optima, solutions such that no movement of a point from one cluster to another will reduce the within-cluster sum of squares.
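The quantity being reduced can be made concrete with a short Python sketch (an illustration only, not part of AS 136). It assumes each centre is the mean of its points and uses 0-based cluster labels where the Fortran uses 1 to K; the function name `within_cluster_ss` is hypothetical.

```python
from typing import List, Sequence


def within_cluster_ss(points: Sequence[Sequence[float]],
                      labels: Sequence[int], k: int) -> List[float]:
    """Return the within-cluster sum of squares of each of the k clusters.

    points[i] holds the N coordinates of point i; labels[i] is its cluster
    (0-based). Each centre is the mean of the points assigned to it.
    """
    n_dim = len(points[0])
    counts = [0] * k
    sums = [[0.0] * n_dim for _ in range(k)]
    for x, lab in zip(points, labels):
        counts[lab] += 1
        for j in range(n_dim):
            sums[lab][j] += x[j]
    # Empty clusters get a dummy centre; they contribute nothing to the total.
    centres = [[s / counts[lab] if counts[lab] else 0.0 for s in sums[lab]]
               for lab in range(k)]
    wss = [0.0] * k
    for x, lab in zip(points, labels):
        wss[lab] += sum((x[j] - centres[lab][j]) ** 2 for j in range(n_dim))
    return wss


pts = [[0, 0], [2, 0], [10, 10], [10, 12]]
print(within_cluster_ss(pts, [0, 0, 1, 1], 2))  # [2.0, 2.0]
```

The per-cluster list corresponds to the WSS output array of KMNS; summing it gives the total that the transfers try to lower.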
METHOD

The algorithm requires as input a matrix of M points in N dimensions and a matrix of K initial cluster centres in N dimensions. The number of points in cluster L is denoted by NC(L). D(I, L) is the Euclidean distance between point I and cluster L. The general procedure is to search for a K-partition with locally optimal within-cluster sum of squares by moving points from one cluster to another.
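The quantities R1 and R2 used in the steps below rest on a standard identity: moving point I from cluster L1 (NC(L1) > 1 points) to cluster L2 (NC(L2) points) changes the within-cluster sum of squares by exactly $\frac{NC(L2)\,D(I,L2)^2}{NC(L2)+1} - \frac{NC(L1)\,D(I,L1)^2}{NC(L1)-1}$, with both distances measured to the current centres. The Python sketch below checks this on a small made-up example and applies the incremental centre updates used after a transfer; the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def sq_dist(x: Sequence[float], c: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def mean(pts: Sequence[Sequence[float]]) -> List[float]:
    return [sum(col) / len(pts) for col in zip(*pts)]


def ss(pts: Sequence[Sequence[float]]) -> float:
    """Sum of squares of one cluster about its own mean."""
    c = mean(pts)
    return sum(sq_dist(p, c) for p in pts)


def transfer_terms(x, c1, n1, c2, n2) -> Tuple[float, float]:
    """R1 = n1*d1^2/(n1-1): saving from removing x from cluster 1 (n1 > 1).
    R2 = n2*d2^2/(n2+1): cost of adding x to cluster 2."""
    r1 = n1 * sq_dist(x, c1) / (n1 - 1)
    r2 = n2 * sq_dist(x, c2) / (n2 + 1)
    return r1, r2


# Hypothetical clusters: move x = [7, 0] from cluster 1 to cluster 2.
cl1 = [[0.0, 0.0], [2.0, 0.0], [7.0, 0.0]]
cl2 = [[9.0, 0.0], [11.0, 0.0]]
x = cl1[2]
c1, c2 = mean(cl1), mean(cl2)
r1, r2 = transfer_terms(x, c1, len(cl1), c2, len(cl2))
before = ss(cl1) + ss(cl2)
after = ss(cl1[:2]) + ss(cl2 + [x])
print(r1, r2, after - before)  # 24.0 6.0 -18.0

# Incremental centre updates after the transfer (same form as in OPTRA/QTRAN).
n1, n2 = len(cl1), len(cl2)
new_c1 = [(c * n1 - a) / (n1 - 1) for c, a in zip(c1, x)]
new_c2 = [(c * n2 + a) / (n2 + 1) for c, a in zip(c2, x)]
print(new_c1, new_c2)  # [1.0, 0.0] [9.0, 0.0]
```

Because the change is exactly R2 - R1, a transfer pays off precisely when R2 < R1, which is the test applied in Steps 4a and 6.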


Step 1. For each point I (I = 1, 2, ..., M), find its closest and second closest cluster centres, IC1(I) and IC2(I) respectively. Assign point I to cluster IC1(I).
Step 2. Update the cluster centres to be the averages of points contained within them.
Step 3. Initially, all clusters belong to the live set.
Step 4. This is the optimal-transfer (OPTRA) stage: Consider each point I (I = 1, 2, ..., M) in turn. If cluster L (L = 1, 2, ..., K) is updated in the last quick-transfer (QTRAN) stage, then it belongs to the live set throughout this stage. Otherwise, at each step, it is not in the live set if it has not been updated in the last M optimal-transfer steps. Let point I be in cluster L1. If L1 is in the live set, do Step 4a; otherwise, do Step 4b.
Step 4a. Compute the minimum of the quantity $R2 = \frac{NC(L)\,D(I,L)^2}{NC(L)+1}$ over all clusters L (L ≠ L1, L = 1, 2, ..., K). Let L2 be the cluster with the smallest R2. If this value is greater than or equal to $\frac{NC(L1)\,D(I,L1)^2}{NC(L1)-1}$, no reallocation is necessary and L2 is the new IC2(I). (Note that the value $\frac{NC(L1)\,D(I,L1)^2}{NC(L1)-1}$ is remembered and will remain the same for point I until cluster L1 is updated.) Otherwise, point I is allocated to cluster L2 and L1 is the new IC2(I). Cluster centres are updated to be the means of points assigned to them if reallocation has taken place. The two clusters that are involved in the transfer of point I at this particular step are now in the live set.
Step 4b. This step is the same as Step 4a, except that the minimum R2 is computed only over clusters in the live set.
Step 5. Stop if the live set is empty. Otherwise, go to Step 6 after one pass through the data set.
Step 6. This is the quick-transfer (QTRAN) stage: Consider each point I (I = 1, 2, ..., M) in turn. Let L1 = IC1(I) and L2 = IC2(I). It is not necessary to check the point I if both the clusters L1 and L2 have not changed in the last M steps. Compute the values $R1 = \frac{NC(L1)\,D(I,L1)^2}{NC(L1)-1}$ and $R2 = \frac{NC(L2)\,D(I,L2)^2}{NC(L2)+1}$. (As noted earlier, R1 is remembered and will remain the same until cluster L1 is updated.) If R1 is less than R2, point I remains in cluster L1. Otherwise, switch IC1(I) and IC2(I) and update the centres of clusters L1 and L2. The two clusters are also noted for their involvement in a transfer at this step.
Step 7. If no transfer took place in the last M steps, go to Step 4. Otherwise, go to Step 6.
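For experimentation, the sketch below implements only the core of Steps 1, 2 and 4a: full passes of optimal transfers, repeated until a pass makes no transfer, at which point no single-point move can reduce the within-cluster sum of squares. It is a simplified illustration in Python, not a translation of KMNS: it omits the live set, the quick-transfer stage and the remembered R1 values that make AS 136 efficient. The names `transfer_kmeans` and `total_wss` are hypothetical, and labels are 0-based.

```python
from typing import List, Sequence


def _sq(x: Sequence[float], c: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def total_wss(points, labels, k: int) -> float:
    """Total within-cluster sum of squares, centres taken as cluster means."""
    total = 0.0
    for cl in range(k):
        members = [p for p, lab in zip(points, labels) if lab == cl]
        if members:
            c = [sum(col) / len(members) for col in zip(*members)]
            total += sum(_sq(p, c) for p in members)
    return total


def transfer_kmeans(points, centres, max_passes: int = 100):
    """Return (labels, centres, counts) after repeated optimal-transfer passes."""
    k, dim = len(centres), len(points[0])
    # Steps 1-2: assign each point to its closest centre, then average.
    labels = [min(range(k), key=lambda cl: _sq(x, centres[cl])) for x in points]
    counts = [labels.count(cl) for cl in range(k)]
    if 0 in counts:
        raise ValueError("empty cluster after initial assignment (cf. IFAULT = 1)")
    cen: List[List[float]] = [
        [sum(x[j] for x, lab in zip(points, labels) if lab == cl) / counts[cl]
         for j in range(dim)]
        for cl in range(k)]
    for _ in range(max_passes):
        moved = False
        for i, x in enumerate(points):
            l1 = labels[i]
            if counts[l1] == 1:
                continue  # a singleton cluster never gives up its point
            # R1: WSS removed by taking x out of l1; R2: WSS added by x joining cl.
            r1 = counts[l1] * _sq(x, cen[l1]) / (counts[l1] - 1)
            best, r2 = l1, r1
            for cl in range(k):
                if cl != l1:
                    r = counts[cl] * _sq(x, cen[cl]) / (counts[cl] + 1)
                    if r < r2:
                        best, r2 = cl, r
            if best != l1:  # the move lowers the WSS by r1 - r2 > 0
                n1, n2 = counts[l1], counts[best]
                cen[l1] = [(c * n1 - a) / (n1 - 1) for c, a in zip(cen[l1], x)]
                cen[best] = [(c * n2 + a) / (n2 + 1) for c, a in zip(cen[best], x)]
                counts[l1], counts[best] = n1 - 1, n2 + 1
                labels[i] = best
                moved = True
        if not moved:
            return labels, cen, counts
    raise RuntimeError("max_passes exceeded (cf. IFAULT = 2)")


pts = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [10, 0], [10, 1]]
labels, cen, counts = transfer_kmeans(pts, [pts[0], pts[1], pts[2]])
print(labels, counts, total_wss(pts, labels, 3))
```

Each accepted move strictly lowers the total within-cluster sum of squares, so the loop cannot cycle in exact arithmetic; which local optimum it reaches depends on the initial centres.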
STRUCTURE

SUBROUTINE KMNS (A, M, N, C, K, IC1, IC2, NC, AN1, AN2, NCP, D, ITRAN, LIVE, ITER, WSS, IFAULT)

Formal parameters

A       Real array (M, N)    input:     the data matrix
M       Integer              input:     the number of points
N       Integer              input:     the number of dimensions
C       Real array (K, N)    input:     the matrix of initial cluster centres
                             output:    the matrix of final cluster centres
K       Integer              input:     the number of clusters
IC1     Integer array (M)    output:    the cluster each point belongs to
IC2     Integer array (M)    workspace: this array is used to remember the cluster which each
                                        point is most likely to be transferred to at each step
NC      Integer array (K)    output:    the number of points in each cluster
AN1     Real array (K)       workspace
AN2     Real array (K)       workspace
NCP     Integer array (K)    workspace
D       Real array (M)       workspace
ITRAN   Integer array (K)    workspace
LIVE    Integer array (K)    workspace
ITER    Integer              input:     the maximum number of iterations allowed
WSS     Real array (K)       output:    the within-cluster sum of squares of each cluster
IFAULT  Integer              output:    see Fault Diagnostics below

FAULT DIAGNOSTICS

IFAULT = 0   No fault.
IFAULT = 1   At least one cluster is empty after the initial assignment. (A better set of initial cluster centres is called for.)
IFAULT = 2   The allowed maximum number of iterations is exceeded.
IFAULT = 3   K is less than or equal to 1 or greater than or equal to M.
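The checks behind IFAULT = 3 and IFAULT = 1 happen before any transfers, so they can be reproduced cheaply before calling the subroutine. The Python sketch below mirrors only those two checks; the function name and its 0-based labels are hypothetical, and IFAULT = 2 (iteration limit) can only be detected by running the main loop.

```python
from typing import Sequence


def precheck_ifault(m: int, k: int, initial_labels: Sequence[int]) -> int:
    """Return 3 if K <= 1 or K >= M, 1 if some cluster is empty after the
    initial assignment (labels 0..k-1), and 0 otherwise."""
    if k <= 1 or k >= m:
        return 3
    counts = [0] * k
    for lab in initial_labels:
        counts[lab] += 1
    return 1 if 0 in counts else 0


print(precheck_ifault(5, 1, [0] * 5))          # 3
print(precheck_ifault(5, 2, [0, 0, 0, 0, 0]))  # 1
print(precheck_ifault(5, 2, [0, 1, 0, 1, 1]))  # 0
```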
Auxiliary algorithms

The following auxiliary algorithms are called: SUBROUTINE OPTRA (A, M, N, C, K, IC1, IC2, NC, AN1, AN2, NCP, D, ITRAN, LIVE, INDEX) and SUBROUTINE QTRAN (A, M, N, C, K, IC1, IC2, NC, AN1, AN2, NCP, D, ITRAN, INDEX), which are included.
RELATED ALGORITHMS

A related algorithm is AS 113 (A transfer algorithm for non-hierarchical classification) given by Banfield and Bassill (1977). This algorithm uses swops as well as transfers to try to overcome the problem of local optima; that is, for all pairs of points, a test is made whether exchanging the clusters to [Link] will be substantially more expensive than the present algorithm for large M.

The present algorithm is similar to Algorithm AS 58 (Euclidean cluster analysis) given by Sparks (1973). Both algorithms aim at finding a K-partition of the sample, with within-cluster sum of squares which cannot be reduced by moving points from one cluster to the other. However, the implementation of Algorithm AS 58 does not satisfy this condition. At the stage where each point is examined in turn to see if it should be reassigned to a different cluster, only the closest centre is used to check for possible reallocation of the given point; a cluster centre other than the closest one may have the smallest value of the quantity $\{n_l/(n_l+1)\}\,d_l^2$, where $n_l$ is the number of points in cluster $l$ and $d_l$ is the distance from cluster $l$ to the given point. Hence, in general, Algorithm AS 58 does not provide a locally optimal solution.
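A small numerical case makes the point concrete. The Python sketch below uses made-up values, not data from the paper: a point lies at distance 0.9 from the centre of a cluster with 100 points and at distance 1.0 from the centre of a single-point cluster. The nearer centre is the only one AS 58 would check, yet joining the farther, smaller cluster increases the within-cluster sum of squares less. The function name is hypothetical.

```python
def join_cost(n: int, d: float) -> float:
    """Increase in within-cluster sum of squares when a point at distance d
    from the centre of an n-point cluster joins it: n/(n+1) * d**2."""
    return n / (n + 1) * d * d


near_large = join_cost(100, 0.9)  # closest centre, large cluster
far_small = join_cost(1, 1.0)     # farther centre, singleton cluster
print(round(near_large, 4), far_small)  # 0.802 0.5
```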
The two algorithms are tested on various generated data sets. The time consumed on the IBM 370/158 and the within-cluster sum of squares of the resulting K-partitions are given in Table 1. While comparing the entries of the table, note that AS 58 does not give locally optimal solutions and so should be expected to take less time. The WSS are different for the two algorithms because they arrive at different partitions of the sets of points. A saving of about 50 per cent in time occurs in KMNS due to using "live" sets and due to using a quick-transfer stage which reduces the number of optimal-transfer iterations by a factor of 4. Thus, KMNS compared to AS 58 is locally optimal and takes less time, especially when the number of clusters is large.
TIME AND ACCURACY

The time is approximately equal to CMNKI, where I is the number of iterations. For an IBM 370/158, $C = 2.1 \times 10^{-5}$ sec. However, different data structures require quite different numbers of iterations; and a careful selection of initial cluster centres will also lead to a considerable saving in time.

Storage requirement: M(N + 3) + K(N + 7).
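The storage formula can be checked against the formal-parameter list: the M-indexed arrays are A (M × N), IC1, IC2 and D, and the K-indexed arrays are C (K × N), NC, AN1, AN2, NCP, ITRAN, LIVE and WSS. The Python sketch below tallies those array sizes and evaluates the time model; the machine constant C is left as an argument because it depends on the hardware, and the function names are hypothetical.

```python
def storage_words(m: int, n: int, k: int) -> int:
    """Array elements used by KMNS, tallied from its formal parameters."""
    per_point = {"A": n, "IC1": 1, "IC2": 1, "D": 1}
    per_cluster = {"C": n, "NC": 1, "AN1": 1, "AN2": 1, "NCP": 1,
                   "ITRAN": 1, "LIVE": 1, "WSS": 1}
    return m * sum(per_point.values()) + k * sum(per_cluster.values())


def estimated_seconds(m: int, n: int, k: int, iterations: int, c: float) -> float:
    """Time model C * M * N * K * I, with C machine dependent."""
    return c * m * n * k * iterations


print(storage_words(1000, 10, 10))  # 13170
```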


TABLE 1

Data set                                  Algorithm   Time (sec)         WSS
1. M = 1000, N = 10, K = 10               AS 58            63.86     7056.71
   (random spherical normal)              KMNS             36.66     7065.59
2. M = 1000, N = 10, K = 10               AS 58            43.49     7779.70
   (two widely separated random normals)  KMNS             19.11     7822.01
3. M = 1000, N = 10, K = 50               AS 58           135.71     4543.82
   (random spherical normal)              KMNS             76.00     4561.48
4. M = 1000, N = 10, K = 50               AS 58            95.51     5131.04
   (two widely separated random normals)  KMNS             57.96     5096.23
5. M = 50, N = 2, K = 8                   AS 58             0.17       21.03
   (two widely separated random normals)  KMNS              0.18       21.03

Missing variate values cannot be handled by this algorithm.

The algorithm produces a clustering which is only locally optimal; the within-cluster sum of squares may not be decreased by transferring a point from one cluster to another, but different partitions may have the same or smaller within-cluster sum of squares.

The number of iterations required to attain local optimality is usually less than 10.
ADDITIONAL COMMENTS

One way of obtaining the initial cluster centres is suggested here. The points are first ordered by their distances to the overall mean of the sample. Then, for cluster L (L = 1, 2, ..., K), the {1 + (L - 1) * [M/K]}th point is chosen to be its initial cluster centre. In effect, some K sample points are chosen as the initial cluster centres. By this process, it is guaranteed that no cluster will be empty after the initial assignment in the subroutine. A quick initialization, which is dependent on the input order of the points, takes the first K points as the initial centres.
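The suggested initialization translates into a few lines of Python. In this sketch the points are sorted in increasing order of squared distance to the sample mean (the text does not state the direction, so increasing order is an assumption), [M/K] is taken as the integer part of M/K, and the 1-based position 1 + (L - 1)[M/K] becomes the 0-based position (L - 1)[M/K]. The function names are hypothetical.

```python
from typing import List, Sequence


def ordered_initial_centres(points: Sequence[Sequence[float]], k: int) -> List[List[float]]:
    """Pick the {1 + (L-1)*[M/K]}th point (L = 1..K) after sorting the
    points by distance to the overall mean."""
    m, dim = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / m for j in range(dim)]
    order = sorted(range(m),
                   key=lambda i: sum((points[i][j] - mean[j]) ** 2 for j in range(dim)))
    step = m // k  # [M/K]
    return [list(points[order[idx * step]]) for idx in range(k)]


def quick_initial_centres(points: Sequence[Sequence[float]], k: int) -> List[List[float]]:
    """Order-dependent alternative: the first K points."""
    return [list(p) for p in points[:k]]


pts = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
print(ordered_initial_centres(pts, 3))  # [[2.0], [1.0], [0.0]]
```

Each chosen centre is itself a data point at distance zero from its own centre, which is why no cluster can be empty after the initial assignment, provided no two chosen centres coincide.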
ACKNOWLEDGEMENTS

This research is supported by National Science Foundation Grant MCS75-08374.

REFERENCES

BANFIELD, C. F. and BASSILL, L. C. (1977). Algorithm AS 113. A transfer algorithm for non-hierarchical classification. Appl. Statist., 26, 206-210.
HARTIGAN, J. A. (1975). Clustering Algorithms. New York: Wiley.
SPARKS, D. N. (1973). Algorithm AS 58. Euclidean cluster analysis. Appl. Statist., 22, 126-130.

      SUBROUTINE KMNS(A, M, N, C, K, IC1, IC2, NC, AN1, AN2, NCP,
     *    D, ITRAN, LIVE, ITER, WSS, IFAULT)
C
C        ALGORITHM AS 136  APPL. STATIST. (1979) VOL.28, NO.1
C
C        DIVIDE M POINTS IN N-DIMENSIONAL SPACE INTO K CLUSTERS
C        SO THAT THE WITHIN CLUSTER SUM OF SQUARES IS MINIMIZED.
C
      DIMENSION A(M, N), IC1(M), IC2(M), D(M)
      DIMENSION C(K, N), NC(K), AN1(K), AN2(K), NCP(K)
      DIMENSION ITRAN(K), LIVE(K), WSS(K), DT(2)
C
C        DEFINE BIG TO BE A VERY LARGE POSITIVE NUMBER
C
      DATA BIG /1.0E10/
C
      IFAULT = 3
      IF (K .LE. 1 .OR. K .GE. M) RETURN
C
C        FOR EACH POINT I, FIND ITS TWO CLOSEST CENTRES, IC1(I) AND
C        IC2(I).  ASSIGN IT TO IC1(I).
C
      DO 50 I = 1, M
        IC1(I) = 1
        IC2(I) = 2
        DO 10 IL = 1, 2
          DT(IL) = 0.0
          DO 10 J = 1, N
          DA = A(I, J) - C(IL, J)
          DT(IL) = DT(IL) + DA * DA
   10   CONTINUE
        IF (DT(1) .LE. DT(2)) GOTO 20
        IC1(I) = 2
        IC2(I) = 1
        TEMP = DT(1)
        DT(1) = DT(2)
        DT(2) = TEMP
   20   DO 50 L = 3, K
          DB = 0.0
          DO 30 J = 1, N
            DC = A(I, J) - C(L, J)
            DB = DB + DC * DC
            IF (DB .GE. DT(2)) GOTO 50
   30     CONTINUE
          IF (DB .LT. DT(1)) GOTO 40
          DT(2) = DB
          IC2(I) = L
          GOTO 50
   40     DT(2) = DT(1)
          IC2(I) = IC1(I)
          DT(1) = DB
          IC1(I) = L
   50 CONTINUE
C
C        UPDATE CLUSTER CENTRES TO BE THE AVERAGE
C        OF POINTS CONTAINED WITHIN THEM.
C
      DO 70 L = 1, K
        NC(L) = 0
        DO 60 J = 1, N
   60   C(L, J) = 0.0
   70 CONTINUE
      DO 90 I = 1, M
        L = IC1(I)
        NC(L) = NC(L) + 1
        DO 80 J = 1, N
   80   C(L, J) = C(L, J) + A(I, J)
   90 CONTINUE
C
C        CHECK TO SEE IF THERE IS ANY EMPTY CLUSTER AT THIS STAGE
C
      IFAULT = 1
      DO 100 L = 1, K
        IF (NC(L) .EQ. 0) RETURN
  100 CONTINUE
      IFAULT = 0
      DO 120 L = 1, K
        AA = NC(L)
        DO 110 J = 1, N
  110   C(L, J) = C(L, J) / AA
C
C        INITIALIZE AN1, AN2, ITRAN AND NCP
C        AN1(L) IS EQUAL TO NC(L) / (NC(L) - 1)
C        AN2(L) IS EQUAL TO NC(L) / (NC(L) + 1)
C        ITRAN(L) = 1 IF CLUSTER L IS UPDATED IN THE QUICK-TRANSFER STAGE
C        ITRAN(L) = 0 OTHERWISE
C        IN THE OPTIMAL-TRANSFER STAGE, NCP(L) INDICATES THE STEP AT
C        WHICH CLUSTER L IS LAST UPDATED
C        IN THE QUICK-TRANSFER STAGE, NCP(L) IS EQUAL TO THE STEP AT
C        WHICH CLUSTER L IS LAST UPDATED PLUS M
C
        AN2(L) = AA / (AA + 1.0)
        AN1(L) = BIG
        IF (AA .GT. 1.0) AN1(L) = AA / (AA - 1.0)
        ITRAN(L) = 1
        NCP(L) = -1
  120 CONTINUE
      INDEX = 0
      DO 140 IJ = 1, ITER
C
C        IN THIS STAGE, THERE IS ONLY ONE PASS THROUGH THE DATA.  EACH
C        POINT IS REALLOCATED, IF NECESSARY, TO THE CLUSTER THAT WILL
C        INDUCE THE MAXIMUM REDUCTION IN WITHIN-CLUSTER SUM OF SQUARES.
C
        CALL OPTRA(A, M, N, C, K, IC1, IC2, NC, AN1, AN2, NCP,
     *    D, ITRAN, LIVE, INDEX)
C
C        STOP IF NO TRANSFER TOOK PLACE IN THE LAST M
C        OPTIMAL-TRANSFER STEPS.
C
        IF (INDEX .EQ. M) GOTO 150
C
C        EACH POINT IS TESTED IN TURN TO SEE IF IT SHOULD BE
C        REALLOCATED TO THE CLUSTER WHICH IT IS MOST LIKELY TO
C        BE TRANSFERRED TO (IC2(I)) FROM ITS PRESENT CLUSTER (IC1(I)).
C        LOOP THROUGH THE DATA UNTIL NO FURTHER CHANGE IS TO TAKE PLACE.
C
        CALL QTRAN(A, M, N, C, K, IC1, IC2, NC, AN1, AN2, NCP,
     *    D, ITRAN, INDEX)
C
C        IF THERE ARE ONLY TWO CLUSTERS, THERE IS
C        NO NEED TO RE-ENTER THE OPTIMAL-TRANSFER STAGE.
C
        IF (K .EQ. 2) GOTO 150
C
C        NCP HAS TO BE SET TO 0 BEFORE ENTERING OPTRA.
C
        DO 130 L = 1, K
  130   NCP(L) = 0
  140 CONTINUE
C
C        SINCE THE SPECIFIED NUMBER OF ITERATIONS IS EXCEEDED, IFAULT
C        IS SET TO BE EQUAL TO 2.  THIS MAY INDICATE UNFORESEEN LOOPING.
C
      IFAULT = 2
C
C        COMPUTE WITHIN-CLUSTER SUM OF SQUARES FOR EACH CLUSTER.
C
  150 DO 160 L = 1, K
        WSS(L) = 0.0
        DO 160 J = 1, N
        C(L, J) = 0.0
  160 CONTINUE
      DO 170 I = 1, M
        II = IC1(I)
        DO 170 J = 1, N
        C(II, J) = C(II, J) + A(I, J)
  170 CONTINUE
      DO 190 J = 1, N
        DO 180 L = 1, K
  180   C(L, J) = C(L, J) / FLOAT(NC(L))
        DO 190 I = 1, M
        II = IC1(I)
        DA = A(I, J) - C(II, J)
        WSS(II) = WSS(II) + DA * DA
  190 CONTINUE
      RETURN
END
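The first part of KMNS (find the two closest centres for each point, then replace each centre by the mean of its points, stopping with IFAULT = 1 if a cluster is empty) can be sketched in Python as follows. This is an illustration, not a translation of the listing: the function names are hypothetical, labels are 0-based, at least two centres are assumed (as the IFAULT = 3 check guarantees), and ties are broken in favour of the lower-numbered cluster.

```python
from typing import List, Sequence, Tuple


def two_closest(points: Sequence[Sequence[float]],
                centres: Sequence[Sequence[float]]) -> Tuple[List[int], List[int]]:
    """Step 1: closest (IC1) and second closest (IC2) centre of each point."""
    ic1, ic2 = [], []
    for x in points:
        d = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centres]
        order = sorted(range(len(centres)), key=d.__getitem__)  # stable: ties keep lower index
        ic1.append(order[0])
        ic2.append(order[1])
    return ic1, ic2


def centre_means(points, ic1: Sequence[int], k: int) -> Tuple[List[List[float]], List[int]]:
    """Step 2: centres become cluster means; an empty cluster means IFAULT = 1."""
    dim = len(points[0])
    nc = [0] * k
    sums = [[0.0] * dim for _ in range(k)]
    for x, lab in zip(points, ic1):
        nc[lab] += 1
        for j in range(dim):
            sums[lab][j] += x[j]
    if 0 in nc:
        raise ValueError("empty cluster after initial assignment (IFAULT = 1)")
    return [[s / nc[cl] for s in sums[cl]] for cl in range(k)], nc


pts = [[0, 0], [1, 0], [9, 9], [10, 9]]
ic1, ic2 = two_closest(pts, [[0, 0], [10, 10]])
print(ic1, ic2)                   # [0, 0, 1, 1] [1, 1, 0, 0]
print(centre_means(pts, ic1, 2))  # ([[0.5, 0.0], [9.5, 9.0]], [2, 2])
```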

      SUBROUTINE OPTRA(A, M, N, C, K, IC1, IC2, NC, AN1, AN2, NCP,
     *    D, ITRAN, LIVE, INDEX)
C
C        ALGORITHM AS 136.1  APPL. STATIST. (1979) VOL.28, NO.1
C
C        THIS IS THE OPTIMAL-TRANSFER STAGE.
C
C        EACH POINT IS REALLOCATED, IF NECESSARY, TO THE
C        CLUSTER THAT WILL INDUCE A MAXIMUM REDUCTION IN
C        THE WITHIN-CLUSTER SUM OF SQUARES.
C
      DIMENSION A(M, N), IC1(M), IC2(M), D(M)
      DIMENSION C(K, N), NC(K), AN1(K), AN2(K), NCP(K)
      DIMENSION ITRAN(K), LIVE(K)
C
C        DEFINE BIG TO BE A VERY LARGE POSITIVE NUMBER
C
      DATA BIG /1.0E10/
C
C        IF CLUSTER L IS UPDATED IN THE LAST QUICK-TRANSFER STAGE,
C        IT BELONGS TO THE LIVE SET THROUGHOUT THIS STAGE.
C        OTHERWISE, AT EACH STEP, IT IS NOT IN THE LIVE SET IF IT
C        HAS NOT BEEN UPDATED IN THE LAST M OPTIMAL-TRANSFER STEPS.
C
      DO 10 L = 1, K
        IF (ITRAN(L) .EQ. 1) LIVE(L) = M + 1
   10 CONTINUE
      DO 100 I = 1, M
        INDEX = INDEX + 1
        L1 = IC1(I)
        L2 = IC2(I)
        LL = L2
C
C        IF POINT I IS THE ONLY MEMBER OF CLUSTER L1, NO TRANSFER.
C
        IF (NC(L1) .EQ. 1) GOTO 90
C
C        IF L1 HAS NOT YET BEEN UPDATED IN THIS STAGE,
C        NO NEED TO RECOMPUTE D(I).
C
        IF (NCP(L1) .EQ. 0) GOTO 30
        DE = 0.0
        DO 20 J = 1, N
          DF = A(I, J) - C(L1, J)
          DE = DE + DF * DF
   20   CONTINUE
        D(I) = DE * AN1(L1)
C
C        FIND THE CLUSTER WITH MINIMUM R2.
C
   30   DA = 0.0
        DO 40 J = 1, N
          DB = A(I, J) - C(L2, J)
          DA = DA + DB * DB
   40   CONTINUE
        R2 = DA * AN2(L2)
        DO 60 L = 1, K
C
C        IF I IS GREATER THAN OR EQUAL TO LIVE(L1), THEN L1 IS
C        NOT IN THE LIVE SET.  IF THIS IS TRUE, WE ONLY NEED TO
C        CONSIDER CLUSTERS THAT ARE IN THE LIVE SET FOR POSSIBLE
C        TRANSFER OF POINT I.  OTHERWISE, WE NEED TO CONSIDER
C        ALL POSSIBLE CLUSTERS.
C
          IF (I .GE. LIVE(L1) .AND. I .GE. LIVE(L) .OR. L .EQ. L1
     *      .OR. L .EQ. LL) GOTO 60
          RR = R2 / AN2(L)
          DC = 0.0
          DO 50 J = 1, N
            DD = A(I, J) - C(L, J)
            DC = DC + DD * DD
            IF (DC .GE. RR) GOTO 60
   50     CONTINUE
          R2 = DC * AN2(L)
          L2 = L
   60   CONTINUE
        IF (R2 .LT. D(I)) GOTO 70
C
C        IF NO TRANSFER IS NECESSARY, L2 IS THE NEW IC2(I).
C
        IC2(I) = L2
        GOTO 90
C
C        UPDATE CLUSTER CENTRES, LIVE, NCP, AN1 AND AN2
C        FOR CLUSTERS L1 AND L2, AND UPDATE IC1(I) AND IC2(I).
C
   70   INDEX = 0
        LIVE(L1) = M + I
        LIVE(L2) = M + I
        NCP(L1) = I
        NCP(L2) = I
        AL1 = NC(L1)
        ALW = AL1 - 1.0
        AL2 = NC(L2)
        ALT = AL2 + 1.0
        DO 80 J = 1, N
          C(L1, J) = (C(L1, J) * AL1 - A(I, J)) / ALW
          C(L2, J) = (C(L2, J) * AL2 + A(I, J)) / ALT
   80   CONTINUE
        NC(L1) = NC(L1) - 1
        NC(L2) = NC(L2) + 1
        AN2(L1) = ALW / AL1
        AN1(L1) = BIG
        IF (ALW .GT. 1.0) AN1(L1) = ALW / (ALW - 1.0)
        AN1(L2) = ALT / AL2
        AN2(L2) = ALT / (ALT + 1.0)
        IC1(I) = L2
        IC2(I) = L1
   90   CONTINUE
        IF (INDEX .EQ. M) RETURN
  100 CONTINUE
      DO 110 L = 1, K
C
C        ITRAN(L) IS SET TO ZERO BEFORE ENTERING QTRAN.
C        ALSO, LIVE(L) HAS TO BE DECREASED BY M BEFORE
C        RE-ENTERING OPTRA.
C
        ITRAN(L) = 0
        LIVE(L) = LIVE(L) - M
  110 CONTINUE
      RETURN
      END
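The LIVE bookkeeping in OPTRA is compact but easy to misread. In the Python sketch below (an illustration with hypothetical names, not part of AS 136), a cluster updated at step I of a pass receives LIVE = M + I, every value is decreased by M when the pass ends, and cluster L counts as live at step I whenever I < LIVE(L), the comparison used in OPTRA. A cluster updated during a pass therefore stays live for the rest of that pass and up to, but not including, the same step of the next pass; a cluster flagged by ITRAN starts the pass with LIVE = M + 1 and is live for all M steps.

```python
from typing import Dict, List


def is_live(step: int, live_value: int) -> bool:
    """OPTRA treats cluster L as live at step I when I < LIVE(L)."""
    return step < live_value


def next_pass_live_steps(m: int, updated_at: Dict[int, int]) -> Dict[int, List[int]]:
    """updated_at maps a cluster to the step (1..m) of its last update in
    the current pass. Return the steps of the next pass at which each
    cluster is still live, assuming it is not updated again."""
    result = {}
    for cluster, step in updated_at.items():
        live_value = (m + step) - m  # LIVE(L) = M + I, then LIVE(L) - M at pass end
        result[cluster] = [i for i in range(1, m + 1) if is_live(i, live_value)]
    return result


M = 5
print(all(is_live(i, M + 1) for i in range(1, M + 1)))  # True (ITRAN(L) = 1 case)
print(next_pass_live_steps(M, {1: 3, 2: 5}))  # {1: [1, 2], 2: [1, 2, 3, 4]}
```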

      SUBROUTINE QTRAN(A, M, N, C, K, IC1, IC2, NC, AN1, AN2, NCP,
     *    D, ITRAN, INDEX)
C
C        ALGORITHM AS 136.2  APPL. STATIST. (1979) VOL.28, NO.1
C
C        THIS IS THE QUICK-TRANSFER STAGE.
C        IC1(I) IS THE CLUSTER WHICH POINT I BELONGS TO.
C        IC2(I) IS THE CLUSTER WHICH POINT I IS MOST
C        LIKELY TO BE TRANSFERRED TO.
C        FOR EACH POINT I, IC1(I) AND IC2(I) ARE SWITCHED, IF
C        NECESSARY, TO REDUCE WITHIN-CLUSTER SUM OF SQUARES.
C        THE CLUSTER CENTRES ARE UPDATED AFTER EACH STEP.
C
      DIMENSION A(M, N), IC1(M), IC2(M), D(M)
      DIMENSION C(K, N), NC(K), AN1(K), AN2(K), NCP(K), ITRAN(K)
C
C        DEFINE BIG TO BE A VERY LARGE POSITIVE NUMBER
C
      DATA BIG /1.0E10/
C
C        IN THE OPTIMAL-TRANSFER STAGE, NCP(L) INDICATES THE
C        STEP AT WHICH CLUSTER L IS LAST UPDATED.
C        IN THE QUICK-TRANSFER STAGE, NCP(L) IS EQUAL TO THE
C        STEP AT WHICH CLUSTER L IS LAST UPDATED PLUS M.
C
      ICOUN = 0
      ISTEP = 0
   10 DO 70 I = 1, M
        ICOUN = ICOUN + 1
        ISTEP = ISTEP + 1
        L1 = IC1(I)
        L2 = IC2(I)
C
C        IF POINT I IS THE ONLY MEMBER OF CLUSTER L1, NO TRANSFER.
C
        IF (NC(L1) .EQ. 1) GOTO 60
C
C        IF ISTEP IS GREATER THAN NCP(L1), NO NEED TO RECOMPUTE
C        DISTANCE FROM POINT I TO CLUSTER L1.
C        NOTE THAT IF CLUSTER L1 IS LAST UPDATED EXACTLY M STEPS
C        AGO, WE STILL NEED TO COMPUTE THE DISTANCE FROM POINT I
C        TO CLUSTER L1.
C
        IF (ISTEP .GT. NCP(L1)) GOTO 30
        DA = 0.0
        DO 20 J = 1, N
          DB = A(I, J) - C(L1, J)
          DA = DA + DB * DB
   20   CONTINUE
        D(I) = DA * AN1(L1)
C
C        IF ISTEP IS GREATER THAN OR EQUAL TO BOTH NCP(L1) AND
C        NCP(L2), THERE WILL BE NO TRANSFER OF POINT I AT THIS STEP.
C
   30   IF (ISTEP .GE. NCP(L1) .AND. ISTEP .GE. NCP(L2)) GOTO 60
        R2 = D(I) / AN2(L2)
        DD = 0.0
        DO 40 J = 1, N
          DE = A(I, J) - C(L2, J)
          DD = DD + DE * DE
          IF (DD .GE. R2) GOTO 60
   40   CONTINUE
C
C        UPDATE CLUSTER CENTRES, NCP, NC, ITRAN, AN1 AND AN2
C        FOR CLUSTERS L1 AND L2.  ALSO, UPDATE IC1(I) AND IC2(I).
C        NOTE THAT IF ANY UPDATING OCCURS IN THIS STAGE,
C        INDEX IS SET BACK TO 0.
C
        ICOUN = 0
        INDEX = 0
        ITRAN(L1) = 1
        ITRAN(L2) = 1
        NCP(L1) = ISTEP + M
        NCP(L2) = ISTEP + M
        AL1 = NC(L1)
        ALW = AL1 - 1.0
        AL2 = NC(L2)
        ALT = AL2 + 1.0
        DO 50 J = 1, N
          C(L1, J) = (C(L1, J) * AL1 - A(I, J)) / ALW
          C(L2, J) = (C(L2, J) * AL2 + A(I, J)) / ALT
   50   CONTINUE
        NC(L1) = NC(L1) - 1
        NC(L2) = NC(L2) + 1
        AN2(L1) = ALW / AL1
        AN1(L1) = BIG
        IF (ALW .GT. 1.0) AN1(L1) = ALW / (ALW - 1.0)
        AN1(L2) = ALT / AL2
        AN2(L2) = ALT / (ALT + 1.0)
        IC1(I) = L2
        IC2(I) = L1
C
C        IF NO REALLOCATION TOOK PLACE IN THE LAST M STEPS, RETURN.
C
   60   IF (ICOUN .EQ. M) RETURN
   70 CONTINUE
      GOTO 10
      END
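The quick-transfer stage only ever considers the pair IC1(I), IC2(I). The Python sketch below keeps that structure and the stopping rule (return once M consecutive steps pass without a transfer) but drops the NCP bookkeeping, so it recomputes every distance. It is a simplified illustration with hypothetical names and 0-based labels, not a translation of QTRAN; as in the Fortran, a point is moved only when the move strictly reduces the within-cluster sum of squares.

```python
from typing import List, Sequence


def _sq(x: Sequence[float], c: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def quick_transfer(points, ic1: List[int], ic2: List[int],
                   cen: List[List[float]], counts: List[int]) -> int:
    """Switch ic1[i] and ic2[i] whenever that lowers the within-cluster sum
    of squares, updating centres and counts in place. Returns the number of
    transfers made."""
    m = len(points)
    since_last = 0  # steps since the last transfer (the role of ICOUN)
    transfers = 0
    while True:
        for i, x in enumerate(points):
            since_last += 1
            l1, l2 = ic1[i], ic2[i]
            if counts[l1] > 1:  # a singleton cluster keeps its point
                n1, n2 = counts[l1], counts[l2]
                r1 = n1 * _sq(x, cen[l1]) / (n1 - 1)
                r2 = n2 * _sq(x, cen[l2]) / (n2 + 1)
                if r2 < r1:
                    cen[l1] = [(c * n1 - a) / (n1 - 1) for c, a in zip(cen[l1], x)]
                    cen[l2] = [(c * n2 + a) / (n2 + 1) for c, a in zip(cen[l2], x)]
                    counts[l1], counts[l2] = n1 - 1, n2 + 1
                    ic1[i], ic2[i] = l2, l1
                    since_last = 0
                    transfers += 1
            if since_last == m:
                return transfers


pts = [[0.0], [1.0], [10.0], [11.0]]
ic1, ic2 = [0, 1, 1, 1], [1, 0, 0, 0]
cen, counts = [[0.0], [22.0 / 3.0]], [1, 3]
print(quick_transfer(pts, ic1, ic2, cen, counts), ic1, counts)  # 1 [0, 0, 1, 1] [2, 2]
```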
