Academia.eduAcademia.edu

Clustering via normal mixture models

1997

Abstract

We consider a model-based approach to clustering, whereby each observation is assumed to have arisen from an underlying mixture of a nite number of distributions. The number of components in this mixture model corresponds to the number of clusters to be imposed on the data. A common assumption is to take the component distributions to be multivariate normal with perhaps some restrictions on the component covariance matrices. The model can be tted to the data using maximum likelihood implemented via the EM algorithm. There is a number of computational issues associated with the tting, including the speci cation of initial starting points for the EM algorithm and the carrying out of tests for the number of components in the nal version of the model. We shall discuss some of these problems and describe an algorithm that attempts to handle them automatically.