
K-Means Algorithm

K-means clustering is an unsupervised learning algorithm that groups unlabeled datasets into predefined clusters based on similarity. It operates iteratively by selecting centroids, assigning data points to the nearest centroid, and recalculating centroids until no changes occur. The algorithm has limitations, including the need for manual selection of the number of clusters and sensitivity to initial values.

Uploaded by

saadalimubarack

K-means Clustering
• KMeans Clustering is an Unsupervised Learning algorithm,
which groups the unlabeled dataset into different clusters.
Here K defines the number of pre-defined clusters that
need to be created in the process, as if K2, there will be
two clusters, and for K3, there will be three clusters, and
so on.
• It is an iterative algorithm that divides the unlabeled dataset into k
different clusters in such a way that each dataset belongs only one
group that has similar properties.
• It allows us to cluster the data into different groups and is a
convenient way to discover the categories of groups in an
unlabeled dataset on its own, without the need for any labeled
training data.
• It is a centroid-based algorithm, where each cluster is
associated with a centroid. The main aim of this algorithm is
to minimize the sum of squared distances between the data points
and the centroids of their corresponding clusters.
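The quantity that k-means tries to reduce is commonly written as the within-cluster sum of squared Euclidean distances. The sketch below computes it for a tiny made-up dataset; the function name `wcss` and the sample points, labels, and centroids are illustrative assumptions, not taken from the slides.

```python
def wcss(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    total = 0.0
    for point, label in zip(points, labels):
        centroid = centroids[label]
        total += sum((p - c) ** 2 for p, c in zip(point, centroid))
    return total


# Hypothetical data: two small groups, each already assigned to a centroid.
points = [(1.0, 1.0), (1.0, 3.0), (8.0, 8.0), (10.0, 8.0)]
labels = [0, 0, 1, 1]                    # index of the cluster each point belongs to
centroids = [(1.0, 2.0), (9.0, 8.0)]     # mean of each group

print(wcss(points, labels, centroids))   # each point is 1 unit from its centroid
```

A lower value means the points sit closer to their centroids; the iterative steps of k-means never increase this value.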
• The k-means clustering algorithm mainly performs two
tasks:
• Determines the best positions for the K center points, or
centroids, by an iterative process.
• Assigns each data point to its closest centroid. The data
points nearest to a particular centroid form a
cluster.
• Hence each cluster contains data points with some
commonalities and is separated from the other clusters.
• [Diagram: working of the K-means clustering algorithm]
How Does the K-means Algorithm Work?
• The working of the K-means algorithm is explained in the steps
below:
• Step-1: Select the number K to decide the number of clusters.
• Step-2: Select K random points as centroids. (They need not be
points from the input dataset.)
• Step-3: Assign each data point to its closest centroid, which
will form the predefined K clusters.
• Step-4: Calculate the variance and place a new centroid for each
cluster, at the mean of the points assigned to it.
• Step-5: Repeat the third step, that is, reassign each
data point to the new closest centroid.
• Step-6: If any reassignment occurs, go to Step-4; otherwise go to
FINISH.
• Step-7: The model is ready.
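The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names (`nearest`, `mean`, `k_means`), the `max_iter` and `seed` parameters, the choice to draw the initial centroids from the data points, and the sample data are all hypothetical.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 2 (assumption: initial centroids are sampled from the data points).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Steps 3 and 5: assign every point to its closest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        # Step 6: stop as soon as no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
    return centroids, labels


# Hypothetical data: two well-separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(data, k=2)
print(labels)
print(centroids)
```

Which group receives label 0 depends on the random initial centroids, so only the grouping itself is meaningful.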
K-Means Algorithm

• Q. Apply the K-means algorithm (K = 2) to the data (185, 72),
(170, 56), (168, 60), (179, 68), (182, 72), (188, 77) for up to two
iterations and show the clusters. Choose the first two objects as
the initial centroids.
• Given: the number of clusters to be created is K = 2 (say c1 and c2),
the number of iterations is 2, and
• the first two objects are the initial centroids:
Centroid for first cluster c1 = (185, 72)
Centroid for second cluster c2 = (170, 56)
The given data points can be represented in tabular form as:
Object   x     y
1        185   72
2        170   56
3        168   60
4        179   68
5        182   72
6        188   77
• Iteration 1: Calculate similarity using the Euclidean
distance measure: d((x1, y1), (x2, y2)) = √((x1 − x2)² + (y1 − y2)²).
• Since the clustering does not change after the second iteration,
terminate the iteration even if the question does not say so.
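The two iterations of this exercise can be reproduced with a short script. It is a sketch that assumes a batch update, meaning that all points are assigned first and the centroids are recomputed afterwards; the slides do not state which update scheme they use. The helper names `assign` and `update` are illustrative.

```python
import math

points = [(185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77)]
centroids = [(185, 72), (170, 56)]  # c1 and c2: the first two objects, as the question states


def assign(points, centroids):
    """Label each point with the index of its nearest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def update(points, labels, k):
    """Recompute each centroid as the mean of its members (assumes no cluster is empty)."""
    new_centroids = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centroids


history = []
for iteration in (1, 2):
    labels = assign(points, centroids)
    centroids = update(points, labels, k=2)
    history.append(labels)
    print(f"Iteration {iteration}: labels={labels}, centroids={centroids}")
```

Under this assumption, both iterations give c1 = {(185, 72), (179, 68), (182, 72), (188, 77)} and c2 = {(170, 56), (168, 60)}, with updated centroids (183.5, 72.25) and (169, 58).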
Disadvantages of k-means

Choosing 𝑘 manually.
Being dependent on initial values.
Clustering data of varying sizes and density.
Clustering outliers.
Scaling with number of dimensions.
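
The dependence on initial values can be seen on a very small example. The sketch below runs the same batch k-means loop from two different starting centroid pairs on four made-up points; the function name `run_k_means`, its parameters, and the data are illustrative assumptions.

```python
import math


def run_k_means(points, initial_centroids, max_iter=100):
    """Batch k-means from the given starting centroids; returns (labels, wcss)."""
    centroids = list(initial_centroids)  # copy so the caller's list is not modified
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # no reassignment: converged
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Within-cluster sum of squared distances for the final clustering.
    wcss = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, wcss


# Hypothetical data: corners of a 4 x 1 rectangle.
corners = [(0, 0), (0, 1), (4, 0), (4, 1)]

good_labels, good_wcss = run_k_means(corners, [(0, 0), (4, 0)])  # one start per side
bad_labels, bad_wcss = run_k_means(corners, [(0, 0), (0, 1)])    # both starts on the left

print(good_labels, good_wcss)
print(bad_labels, bad_wcss)
```

The first start groups the points into left and right pairs (sum of squared distances 1.0), while the second converges to top and bottom pairs (16.0) and stays there. Running the algorithm from several starting points and keeping the best result is a common way to reduce this effect.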
