K-Means Algorithm
1. What is K-Means Clustering?
K-Means is an unsupervised learning algorithm used for clustering data into K groups based on similarity. It minimizes the variance within each cluster.
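Before walking through the steps, here is a minimal end-to-end sketch using scikit-learn (assumed installed); the dataset matches the worked example later in these notes:

```python
# Minimal K-Means sketch with scikit-learn (assumed installed).
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[2, 3], [3, 3], [5, 7], [8, 8], [9, 10]])

# n_clusters is K; n_init=10 restarts from 10 random seeds and keeps
# the run with the lowest within-cluster variance (inertia).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)  # cluster index (0 or 1) per point

print(labels)
print(kmeans.cluster_centers_)  # one centroid per cluster
```

With K = 2, the two lower-left points end up in one cluster and the three upper-right points in the other.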
2. Steps of K-Means Algorithm
Step 1: Choose the Number of Clusters (K)
* Decide the number of clusters K in advance.
* This can be done using methods like the Elbow Method or Silhouette Score.
Step 2: Initialize K Cluster Centroids
* There are two common ways to initialize:
1. Random Initialization: Randomly select K data points as initial centroids.
2. KMeans++ Initialization: Smartly selects initial centroids to improve accuracy.
Step 3: Assign Each Data Point to the Nearest Centroid
* Calculate the Euclidean distance from each data point to each centroid.
* Assign each point to the nearest centroid, forming K clusters.
Step 4: Compute New Centroids
* For each cluster, compute the new centroid as the mean (average) of all points in that cluster.
Step 5: Repeat Until Convergence
* Repeat Steps 3 and 4 until:
  o Centroids do not change (or change very little).
  o A maximum number of iterations is reached.

3. Example Calculation (Step-by-Step)
Let's go through a small dataset example.
Dataset (2D Points):
Data Points = {(2, 3), (3, 3), (5, 7), (8,8), (9, 10)}
Let's assume K = 2 (two clusters).
Step 1: Initialize K Centroids
Assume random initial centroids:
* C1 = (2, 3)
* C2 = (8, 8)
Step 2: Assign Points to Nearest Centroid
We calculate the Euclidean distance:

d = sqrt((x2 - x1)^2 + (y2 - y1)^2)

Data Point   Distance to C1 (2, 3)   Distance to C2 (8, 8)   Assigned Cluster
(2, 3)       0.00                    7.81                    Cluster 1
(3, 3)       1.00                    7.07                    Cluster 1
(5, 7)       5.00                    3.16                    Cluster 2
(8, 8)       7.81                    0.00                    Cluster 2
(9, 10)      9.90                    2.24                    Cluster 2
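These distances and cluster assignments can be reproduced with a short NumPy sketch (NumPy assumed available):

```python
import numpy as np

points = np.array([[2, 3], [3, 3], [5, 7], [8, 8], [9, 10]])
c1, c2 = np.array([2, 3]), np.array([8, 8])

d1 = np.linalg.norm(points - c1, axis=1)  # distance of each point to C1
d2 = np.linalg.norm(points - c2, axis=1)  # distance of each point to C2
assigned = np.where(d1 <= d2, 1, 2)       # nearest centroid wins

print(np.round(d1, 2))  # C1 distances: 0.00, 1.00, 5.00, 7.81, 9.90
print(np.round(d2, 2))  # C2 distances: 7.81, 7.07, 3.16, 0.00, 2.24
print(assigned)         # clusters: 1, 1, 2, 2, 2
```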
Step 3: Compute New Centroids
New centroid for Cluster 1:
C1 = ((2 + 3)/2, (3 + 3)/2) = (2.5, 3)
New centroid for Cluster 2:
C2 = ((5 + 8 + 9)/3, (7 + 8 + 10)/3) = (7.33, 8.33)
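The centroid update is just a per-coordinate mean, which can be checked in NumPy:

```python
import numpy as np

cluster1 = np.array([[2, 3], [3, 3]])
cluster2 = np.array([[5, 7], [8, 8], [9, 10]])

# New centroid = mean of the points in the cluster, per coordinate.
new_c1 = cluster1.mean(axis=0)  # (2.5, 3.0)
new_c2 = cluster2.mean(axis=0)  # (7.33, 8.33) after rounding

print(new_c1, np.round(new_c2, 2))
```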
Step 4: Repeat Until Convergence
* Reassign points to new centroids.
* Update centroids again.
* Stop when centroids don't change.

1. Intra-Cluster Distance (Within-Cluster Distance)
* The average or sum of distances between data points within the same cluster.
* Measures how compact a cluster is.
Formula:
Intra-Cluster Distance = Σ_{x_i ∈ C_k} d(x_i, C_k)

Where:
* d(x_i, C_k) is the distance between a data point x_i and its cluster centroid C_k.
* The lower the intra-cluster distance, the better the clustering (points are close together).
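A minimal NumPy sketch of the (summed) intra-cluster distance for the example's Cluster 1:

```python
import numpy as np

cluster = np.array([[2, 3], [3, 3]])  # points in one cluster
centroid = cluster.mean(axis=0)       # (2.5, 3.0)

# Sum of distances from each point to its own cluster centroid.
intra = np.linalg.norm(cluster - centroid, axis=1).sum()
print(intra)  # 1.0 (each point is 0.5 from the centroid)
```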
2. Inter-Cluster Distance (Between-Cluster Distance)
Definition:
* The distance between different clusters (usually measured between centroids).
* Measures how well-separated clusters are.
Formula:
Inter-Cluster Distance = d(C_i, C_j)

Where:
* C_i and C_j are centroids of two different clusters.
* The higher the inter-cluster distance, the better the separation.
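Using the two centroids from the worked example, the inter-cluster distance is simply:

```python
import numpy as np

c1 = np.array([2.5, 3.0])    # centroid of Cluster 1
c2 = np.array([7.33, 8.33])  # centroid of Cluster 2

# Inter-cluster distance: Euclidean distance between the two centroids.
inter = np.linalg.norm(c1 - c2)
print(round(inter, 2))  # 7.19
```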
3. Good Clustering Characteristics
* Low Intra-Cluster Distance (points inside clusters are close to each other).
* High Inter-Cluster Distance (clusters are well-separated from each other).
This balance ensures that clusters are coherent and distinct.
The Elbow Method is a technique used to determine the optimal number of clusters (K).

1. Concept of the Elbow Method
The idea is to plot the Within-Cluster Sum of Squares (WCSS) vs. K and find the "elbow point" where adding more clusters doesn't significantly improve the clustering.
* WCSS (Within-Cluster Sum of Squares) measures the compactness of the clusters:

WCSS = Σ_{k=1}^{K} Σ_{x_i ∈ C_k} ||x_i - C_k||²

where:
* x_i is a data point in cluster C_k,
* C_k is the centroid of cluster k,
* ||x_i - C_k||² is the squared Euclidean distance.
* Goal: Minimize WCSS while keeping K reasonable.
2. Steps of the Elbow Method
Step 1: Run K-Means for Different Values of K
* Choose different values of K (e.g., from 1 to 10).
* For each K, calculate the WCSS.
Step 2: Plot K vs. WCSS
* On the x-axis, plot K (number of clusters).
* On the y-axis, plot WCSS.
Step 3: Find the Elbow Point
* The "elbow" is where WCSS stops decreasing sharply.
* This means increasing K further does not reduce WCSS significantly.
* The optimal K is at the elbow point.

[Figure: Elbow Method for Optimal K. WCSS on the y-axis against Number of Clusters (K) on the x-axis; the curve drops steeply at first and then flattens, and the bend marks the elbow.]
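The loop over K can be sketched with scikit-learn, whose inertia_ attribute is exactly the WCSS; the plotting lines are left commented out (matplotlib assumed if uncommented):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[2, 3], [3, 3], [5, 7], [8, 8], [9, 10]])

wcss = []
for k in range(1, 5):  # K must not exceed the number of points
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    wcss.append(km.inertia_)  # inertia_ is the WCSS for this K

# Plot K vs. WCSS and look for the bend:
# import matplotlib.pyplot as plt
# plt.plot(range(1, 5), wcss, marker="o")
# plt.xlabel("Number of Clusters (K)")
# plt.ylabel("WCSS")
# plt.show()
```

WCSS decreases as K grows; the point where the decrease levels off is the elbow.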
Why Use StandardScaler?
1. K-Means Uses Euclidean Distance
* If features have different scales, larger values dominate smaller ones.
* Example:
  - Annual Income (0-100)
  - Spending Score (0-1) → will be effectively ignored by K-Means due to its small range.
2. Improves Performance
* Standardization makes clustering more accurate by balancing feature influence.
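A small sketch of the scale problem, using hypothetical values for the two features: the Euclidean distance between two customers is driven almost entirely by the large-scale income feature.

```python
import numpy as np

# Hypothetical customers: [Annual Income, Spending Score].
# Both features differ a lot relative to their own ranges,
# but spending score lives on a tiny absolute scale.
a = np.array([20.0, 0.1])
b = np.array([80.0, 0.9])

d = np.linalg.norm(a - b)
print(round(d, 4))  # 60.0053: almost entirely the income difference of 60
```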
How StandardScaler Works
It transforms data using the formula:

z = (X - μ) / σ

Where:
* X = original data value
* μ = mean of the feature
* σ = standard deviation of the feature

Example
Imagine this dataset:
Customer   Annual Income ($)   Spending Score
A          20                  10
B          80                  50
C          40                  30
Step 1: Compute Mean & Standard Deviation
* Annual Income: Mean μ = 46.67, Std σ = 24.94
* Spending Score: Mean μ = 30, Std σ = 16.33
Step 2: Apply StandardScaler Formula

Customer   Annual Income (Scaled)        Spending Score (Scaled)
A          (20 - 46.67)/24.94 = -1.07    (10 - 30)/16.33 = -1.22
B          (80 - 46.67)/24.94 = 1.34     (50 - 30)/16.33 = 1.22
C          (40 - 46.67)/24.94 = -0.27    (30 - 30)/16.33 = 0
Now, both features have mean = 0 and variance = 1, making them equally important.
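The hand calculation above can be verified with scikit-learn's StandardScaler, which uses the population standard deviation and so matches the values computed here:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Rows: customers A, B, C; columns: Annual Income, Spending Score.
X = np.array([[20, 10], [80, 50], [40, 30]], dtype=float)

scaler = StandardScaler()  # z = (X - mean) / std, per feature
X_scaled = scaler.fit_transform(X)

print(np.round(X_scaled, 2))
# rows: [-1.07, -1.22], [1.34, 1.22], [-0.27, 0.0]
```

After scaling, each column has mean 0 and variance 1, so both features carry equal weight in the distance computation.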
1. What is WCSS?
WCSS (Within-Cluster Sum of Squares) measures how close data points are to their assigned cluster
centroids.
* Lower WCSS → clusters are tight and well-defined.
* Higher WCSS → clusters are spread out (not ideal).

Formula:

WCSS = Σ_{i=1}^{k} Σ_{x ∈ C_i} ||x - μ_i||²

Where:
* x = data point
* μ_i = centroid of cluster i
* C_i = set of points in cluster i
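As a check, this formula can be evaluated by hand and compared against scikit-learn's inertia_ on the example dataset:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[2, 3], [3, 3], [5, 7], [8, 8], [9, 10]], dtype=float)

km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)

# Manual WCSS: for each point, squared distance to its assigned centroid.
wcss = sum(
    np.sum((x - km.cluster_centers_[label]) ** 2)
    for x, label in zip(X, km.labels_)
)

print(round(wcss, 2))  # equals km.inertia_
```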