Anomaly Detection is the technique of identifying rare events or observations which raise suspicion by being statistically different from the rest of the observations. Such "anomalous" behavior typically translates to some kind of problem, such as credit card fraud, a failing machine in a server, or a cyber-attack.
K-Medoids: K-medoids clustering is a variant of K-means that is more robust to noise and outliers. Instead of using the mean point as the center of a cluster, K-medoids uses an actual point in the cluster to represent it. The medoid is the most centrally located object of the cluster, with the minimum sum of distances to the other points.
Self-organizing map applications: Lending - identifying clusters of borrowers at risk of defaulting on re-payments. Customer segmentation - customers with similar characteristics can be clustered together for further analysis of churn rate, loyalty, promotions, etc.
A Gaussian mixture model (GMM) can be used for clustering, which is the task of grouping a set of data points into clusters. GMMs can find clusters in data sets where the clusters may not be clearly defined. Additionally, GMMs can estimate the probability that a new data point belongs to each cluster.
Hard clustering is a method of grouping the data items such that each item is assigned to exactly one cluster; K-Means is one example. Soft clustering is a method of grouping the data items such that an item can belong to multiple clusters; Fuzzy C-Means (FCM) is an example.
To develop and manage a production-ready model, you must work through the following stages:
•Source and prepare your data.
•Develop your model.
•Train an ML model on your data:
 •Train the model
 •Evaluate model accuracy
 •Tune hyperparameters
•Deploy your trained model.
•Send prediction requests to your model:
 •Online prediction
 •Batch prediction
•Monitor the predictions on an ongoing basis.
•Manage your models and model versions.
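The hard/soft distinction can be illustrated with a minimal plain-Python sketch: a K-means-style nearest-centroid assignment versus a GMM-style membership probability. The two one-dimensional Gaussian components, their shared standard deviation, and the sample point are made-up illustration values, not taken from the text; equal mixing weights are assumed.

```python
import math

# Two made-up 1-D Gaussian components (means, shared std) and K-means centroids.
means = [0.0, 5.0]
std = 1.5
centroids = [0.0, 5.0]
x = 2.0  # a new data point

# Hard assignment (K-means style): the point belongs to exactly one cluster.
hard = min(range(len(centroids)), key=lambda k: abs(x - centroids[k]))

# Soft assignment (GMM style): a probability of membership in each cluster,
# assuming equal mixing weights for both components.
def gauss_pdf(v, mu, sigma):
    return math.exp(-((v - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

likelihoods = [gauss_pdf(x, mu, std) for mu in means]
total = sum(likelihoods)
soft = [p / total for p in likelihoods]

print("hard cluster:", hard)      # a single cluster index
print("soft memberships:", soft)  # probabilities summing to 1
```

The hard assignment discards how close the call was; the soft memberships keep that information, which is why GMMs can express uncertainty for points that sit between clusters.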
What is K-Means Algorithm?
K-Means Clustering is an Unsupervised Learning algorithm which groups the unlabeled dataset into different clusters. Here K defines the number of pre-defined clusters that need to be created in the process: if K=2, there will be two clusters, for K=3 there will be three clusters, and so on.
The algorithm takes the unlabeled dataset as input, divides the dataset into k clusters, and repeats the process until it finds the best clusters. The value of k must be predetermined in this algorithm.
The k-means clustering algorithm mainly performs two tasks:
•Determines the best value for the K center points or centroids by an iterative process.
•Assigns each data point to its closest k-center. The data points near a particular k-center form a cluster.
Hence each cluster has data points with some commonalities, and it is away from other clusters.

Gradient descent                         | Normal equation
Iterative                                | Closed-form
It may converge gradually                | It converges directly
It is effective for large datasets       | It is inefficient for large datasets
It is slower for complex models          | It is quicker
The learning rate must be carefully chosen | No learning rate is needed
It is applicable to different models     | It is restricted to linear regression
It may get stuck in local optima         | It is stable for most cases
It is appropriate for large datasets     | It is constrained by matrix inversion for large datasets
It supports regularization methods       | It requires alteration for regularization
It may require feature scaling           | It is not influenced by feature scaling
How does the K-Means Algorithm Work?
1: Select the number K to decide the number of clusters.
2: Select K random points as centroids. (They may be points other than those in the input dataset.)
3: Assign each data point to its closest centroid, which will form the predefined K clusters.
4: Calculate the variance and place a new centroid in each cluster.
5: Repeat the third step: reassign each data point to the new closest centroid of each cluster.
6: If any reassignment occurs, go to step 4; otherwise go to FINISH.
7: The model is ready.

Classification                                    | Regression
Classification gives out discrete values.         | Regression gives continuous values.
Given a group of data, this method helps group the data into different groups. | It uses the mapping function to map values to continuous output.
In classification, the nature of the predicted data is unordered. | Regression has ordered predicted data.
The mapping function is used to map values to pre-defined classes. | It attempts to find a best-fit line and extrapolates it to find/predict values.
Examples include decision trees and logistic regression. | Examples include regression trees (random forest) and linear regression.
Classification is evaluated by measuring accuracy. | Regression is evaluated using the root mean square error.
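Steps 1-7 of the K-means procedure above can be sketched as a short plain-Python implementation. The `kmeans` helper and the two sample blobs are illustrative, not part of the text; initial centroids are drawn from the data points themselves (the text notes they could also lie outside the dataset).

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain-Python K-means on 2-D points, following steps 1-7 above."""
    rng = random.Random(seed)
    # Step 2: pick K random data points as the initial centroids.
    centroids = rng.sample(points, k)
    assignment = [None] * len(points)
    for _ in range(iters):
        # Steps 3 and 5: assign each point to its closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Step 6: if no reassignment occurred, the model is ready (step 7).
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 4: move each centroid to the mean of its cluster.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids, assignment

# Two made-up blobs around (0, 0) and (10, 10).
pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centroids, labels = kmeans(pts, k=2)
print(centroids)  # one centroid near each blob
```

Because K-means only stops when no point changes cluster (step 6), the loop is guaranteed to terminate, but as the gradient-descent table noted for iterative methods generally, it can settle in a local optimum, so practical implementations rerun it with several random initializations.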