K-Means Clustering
• K-means clustering is an unsupervised learning algorithm
that groups an unlabeled dataset into different clusters.
Here K is the number of predefined clusters to be created
in the process: for K = 2 there will be two clusters, for
K = 3 there will be three clusters, and so on.
• It is an iterative algorithm that divides the unlabeled dataset into K
different clusters in such a way that each data point belongs to exactly
one group, and the points in a group have similar properties.
• It lets us cluster the data into different groups and is a
convenient way to discover the categories of groups in an
unlabeled dataset on its own, without the need for labeled
training data.
• It is a centroid-based algorithm, where each cluster is
associated with a centroid. The main aim of this algorithm is
to minimize the sum of squared distances between the data points
and the centroids of their corresponding clusters.
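In symbols, one common way to write this objective is the within-cluster sum of squared Euclidean distances. The notation is an assumption, not taken from the slides: $C_j$ is the $j$-th cluster, $x_i$ a data point, and $\mu_j$ the centroid of $C_j$, i.e. the mean of the points in it.

```latex
J = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i
```

Each assignment or centroid update can only decrease $J$ or leave it unchanged, which is why the iterations eventually stop. The result is a local minimum of $J$, not necessarily the global one.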
• The K-means clustering algorithm mainly performs two
tasks:
• Determines the best positions for the K center points (centroids)
through an iterative process.
• Assigns each data point to its closest centroid. The data
points that are nearest to a particular centroid form a
cluster.
• Hence each cluster contains data points with some
commonalities, and it is distinct from the other clusters.
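The two tasks correspond to two update rules, sketched below under assumed notation (not from the slides): $x_i$ is a data point, $\mu_j$ the centroid of cluster $j$, and $c(i)$ the index of the cluster that $x_i$ is assigned to.

```latex
% Assignment: each point joins the cluster whose centroid is nearest
c(i) = \arg\min_{j \in \{1, \dots, K\}} \lVert x_i - \mu_j \rVert^2
% Update: each centroid moves to the mean of the points assigned to it
\mu_j = \frac{1}{\lvert \{ i : c(i) = j \} \rvert} \sum_{i \,:\, c(i) = j} x_i
```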
• The diagram below illustrates how the K-means
clustering algorithm works:
How Does the K-Means Algorithm Work?
• The working of the K-means algorithm is explained in the
steps below:
• Step-1: Select the number K to decide the number of clusters.
• Step-2: Select K random points as the initial centroids. (They may
be points from the input dataset or other points.)
• Step-3: Assign each data point to its closest centroid, which
will form the predefined K clusters.
• Step-4: Compute the mean of the points in each cluster and place
the new centroid of that cluster there.
• Step-5: Repeat the third step, i.e. reassign each data point to
the new closest centroid.
• Step-6: If any reassignment occurs, go to Step-4; otherwise, go to
FINISH.
• Step-7: The model is ready.
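The steps above can be sketched in Python as follows. This is a minimal, unoptimized sketch of the standard batch version (often called Lloyd's algorithm), not the author's code. The function name `kmeans`, its parameters, and the choice of drawing the initial centroids from the data points (Step-2 also allows other points) are assumptions for illustration.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=None):
    """Minimal K-means on a list of equal-length tuples.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Step-2 (assumption): pick K distinct data points at random as initial centroids
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step-3 / Step-5: assign each point to its nearest centroid (Euclidean distance)
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step-6: stop as soon as no point changes cluster
        if new_labels == labels:
            break
        labels = new_labels
        # Step-4: move each centroid to the mean of the points assigned to it
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    # Step-7: the model (final centroids and assignments) is ready
    return centroids, labels


if __name__ == "__main__":
    data = [(185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77)]
    # the result can depend on the random initial centroids chosen via `seed`
    centroids, labels = kmeans(data, k=2, seed=0)
    print("centroids:", centroids)
    print("labels:", labels)
```

The `max_iter` cap is only a safety limit; on small inputs the loop normally exits through the Step-6 check.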
K-Means Algorithm
• Q. Apply the K-means algorithm with K = 2 to the data (185, 72),
(170, 56), (168, 60), (179, 68), (182, 72), (188, 77) for up to two
iterations and show the clusters. Initially choose the first two
objects as the initial centroids.
• Given: number of clusters to be created K = 2 (call them c1 and c2),
number of iterations = 2,
• and the first two objects as the initial centroids:
Centroid for first cluster c1 = (185, 72)
Centroid for second cluster c2 = (170, 56)
The given data points can be represented in tabular form as:
Object   x     y
1        185   72
2        170   56
3        168   60
4        179   68
5        182   72
6        188   77
• Iteration 1: Calculate the similarity of each point to each centroid
using the Euclidean distance: $d(p, c) = \sqrt{(p_x - c_x)^2 + (p_y - c_y)^2}$
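Carrying out the computation gives the following worked sketch for both iterations. Objects are numbered 1 to 6 in the order given, distances are rounded to two decimals, and centroids are recomputed as the mean of each cluster once all points have been assigned (the standard batch update; some textbooks instead update a centroid after each individual assignment).

```latex
% Iteration 1: c_1 = (185, 72), c_2 = (170, 56)
\begin{array}{c|c|c|c|c}
\text{Object} & (x, y) & d(\cdot, c_1) & d(\cdot, c_2) & \text{Cluster} \\ \hline
1 & (185, 72) & 0.00  & 21.93 & c_1 \\
2 & (170, 56) & 21.93 & 0.00  & c_2 \\
3 & (168, 60) & 20.81 & 4.47  & c_2 \\
4 & (179, 68) & 7.21  & 15.00 & c_1 \\
5 & (182, 72) & 3.00  & 20.00 & c_1 \\
6 & (188, 77) & 5.83  & 27.66 & c_1
\end{array}

% New centroids after iteration 1
c_1 = \left(\tfrac{185 + 179 + 182 + 188}{4},\ \tfrac{72 + 68 + 72 + 77}{4}\right) = (183.5,\ 72.25),
\qquad
c_2 = \left(\tfrac{170 + 168}{2},\ \tfrac{56 + 60}{2}\right) = (169,\ 58)

% Iteration 2: c_1 = (183.5, 72.25), c_2 = (169, 58)
\begin{array}{c|c|c|c|c}
\text{Object} & (x, y) & d(\cdot, c_1) & d(\cdot, c_2) & \text{Cluster} \\ \hline
1 & (185, 72) & 1.52  & 21.26 & c_1 \\
2 & (170, 56) & 21.13 & 2.24  & c_2 \\
3 & (168, 60) & 19.76 & 2.24  & c_2 \\
4 & (179, 68) & 6.19  & 14.14 & c_1 \\
5 & (182, 72) & 1.52  & 19.10 & c_1 \\
6 & (188, 77) & 6.54  & 26.87 & c_1
\end{array}
```

The assignments in iteration 2 are the same as in iteration 1, so the centroids stay at (183.5, 72.25) and (169, 58). Final clusters: c1 = {(185, 72), (179, 68), (182, 72), (188, 77)} and c2 = {(170, 56), (168, 60)}.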
• Since the cluster assignments do not change in the second iteration,
the algorithm has converged, so stop iterating here even if the question
does not say so.
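The two iterations of this example can be checked with a short plain-Python sketch (no external libraries; the variable names are illustrative, not from the slides). It uses the first two objects as the initial centroids, assigns each point by Euclidean distance, and recomputes each centroid as the mean of its members after every iteration.

```python
import math

points = [(185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77)]
centroids = [points[0], points[1]]  # first two objects as initial centroids c1, c2

history = []
for iteration in range(2):  # the question asks for two iterations
    # assign every point to its nearest centroid (label 0 = c1, label 1 = c2)
    labels = [min(range(2), key=lambda j: math.dist(p, centroids[j])) for p in points]
    # recompute each centroid as the mean of the points assigned to it
    new_centroids = []
    for j in range(2):
        members = [p for p, lab in zip(points, labels) if lab == j]
        new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
    centroids = new_centroids
    history.append((labels, centroids))
    print(f"Iteration {iteration + 1}: labels={labels}, centroids={centroids}")

# Iteration 1: labels=[0, 1, 1, 0, 0, 0], centroids=[(183.5, 72.25), (169.0, 58.0)]
# Iteration 2: labels=[0, 1, 1, 0, 0, 0], centroids=[(183.5, 72.25), (169.0, 58.0)]
```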
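Disadvantages of K-Means
• Choosing K manually.
• Dependence on the initial centroid values.
• Difficulty clustering data whose clusters vary in size and density.
• Sensitivity to outliers, which can pull centroids away from the rest of their cluster.
• Difficulty scaling with the number of dimensions.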
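To illustrate the dependence on initial values, here is a small sketch on hypothetical toy data (four points at the corners of a 10 × 4 rectangle, not from the slides). The helper `lloyd` is an assumed name for a minimal batch K-means loop that starts from given centroids. Two different initializations both converge, but to different clusterings with different within-cluster sums of squared distances (WCSS).

```python
import math


def lloyd(points, centroids, max_iter=100):
    """Hypothetical helper: batch K-means from given initial centroids.

    Returns (centroids, labels, wcss).
    """
    centroids = list(centroids)
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # assign each point to its nearest centroid
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # converged: no reassignment
            break
        labels = new_labels
        # move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # within-cluster sum of squared distances for the final assignment
    wcss = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, wcss


corners = [(0, 0), (0, 4), (10, 0), (10, 4)]  # hypothetical toy data

# initialization A: centroids on the left and right edges -> left/right split
_, labels_a, wcss_a = lloyd(corners, [(0, 2), (10, 2)])
# initialization B: centroids on the bottom and top edges -> bottom/top split
_, labels_b, wcss_b = lloyd(corners, [(5, 0), (5, 4)])

print(labels_a, wcss_a)  # [0, 0, 1, 1] 16.0
print(labels_b, wcss_b)  # [0, 1, 0, 1] 100.0
```

Initialization B is stable but clearly worse (WCSS 100 versus 16): no point is ever closer to the other centroid, so the algorithm stops at this local optimum. A common remedy is to run several random initializations and keep the result with the lowest WCSS.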