
K-MEANS CLUSTERING IN ML

Academic year 2021/2022

[From the Computational Intelligence course]

[2021/12/24]

Course name: Computational Intelligence
Year of study: Third year
Course code: AI304

k-means clustering in ML

A research study submitted by the students:

Student name: Ahmed Reda Mohamed Abdel Naim    Seat number: 3008
Student name: Ahmed Khaled Fawzy Mansour       Seat number: 3005
Student name: Abdelrahman Qasem Mohamed        Seat number: 3080
Student name: Mostafa Kamel Ali                Seat number: 3168
Student name: Mahmoud Ashraf Amer              Seat number: 3153

Under the supervision of
Dr. Samar El-Bedaihy

1442 AH - 2022 AD
Introduction

Machine learning:
Machine learning is a sub-field of computer science that gives computers the ability to learn without being explicitly programmed.
Machine learning types:
a) supervised learning:
In this type of learning, we supervise and direct the learning model as it performs its tasks so that it can handle new, untrained situations; this is done by training it on the data set that we have.

We teach the model using the data set we have, and the trained model can then predict unknown or future instances.

b) unsupervised learning:
Here we let the model discover the information we want to obtain without guidance or supervision: we give it the data, it trains on that data set, and it then produces the results.

c) reinforcement learning:
In this type, models are trained to make decisions. Everything comes down to making the appropriate decision: the model learns through trial and error until it reaches the best decision to take in a particular situation. The model is not given a data set that contains the decisions to be taken; instead, it makes the decisions by itself in order to perform the task given to it. When there is no data set, the model learns by trial and error.
Major Machine learning techniques:

o Supervised Learning (data with labels):
  A. Classification
  B. Regression

o Unsupervised Learning (data without labels):
  A. Clustering
  B. Association Analysis
  C. Dimensionality Reduction

o Reinforcement Learning (state and action):
  A. Model-Free
  B. Model-Based
K-means clustering in ML
K-Means Clustering is an unsupervised learning algorithm that is used to solve clustering problems in machine learning or data science: it groups the unlabeled dataset into different clusters. Here K defines the number of pre-defined clusters that need to be created in the process; if K=2 there will be two clusters, for K=3 there will be three clusters, and so on.
It allows us to cluster the data into different groups and is a convenient way to discover the categories of groups in the unlabeled dataset on its own, without the need for any training.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this algorithm is to minimize the sum of distances between the data points and their corresponding cluster centroids.
The algorithm takes the unlabeled dataset as input, divides the dataset into K clusters, and repeats the process until it finds the best clusters. The value of K should be predetermined in this algorithm.
The k-means clustering algorithm mainly performs two tasks:
o Determines the best positions for the K center points (centroids) through an iterative process.
o Assigns each data point to its closest centroid; the data points that are near a particular centroid form a cluster.
Hence each cluster contains data points with some commonalities and is kept apart from the other clusters.
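To make these two tasks concrete, here is a minimal sketch of the core k-means loop written with NumPy (the function name, the random initialization, and the stopping rule are illustrative assumptions, not something prescribed by the text above):

import numpy as np

def simple_kmeans(x, k, n_iters=100, seed=42):
    # pick k random data points as the initial centroids
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iters):
        # task 1: assign every point to its closest centroid
        distances = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # task 2: move each centroid to the mean of the points assigned to it
        # (empty clusters are not handled in this simplified sketch)
        new_centroids = np.array([x[labels == j].mean(axis=0) for j in range(k)])
        # stop when the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids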
The below diagram explains the working of the K-means Clustering Algorithm:
Python Implementation of the K-Means Clustering Algorithm

In the above section, we discussed the K-means algorithm; now let's see how it can be implemented using Python.
Here, we have a Mall_Customers dataset, which contains data about customers who visit the mall and spend there.
In the given dataset, we have Customer_Id, Gender, Age, Annual Income (k$), and Spending Score (a calculated value of how much a customer has spent in the mall; the higher the value, the more they have spent). From this dataset, we need to find some patterns; as this is an unsupervised method, we don't know exactly what to calculate.
The steps to be followed for the implementation are given below:
o Data Pre-processing
o Finding the optimal number of clusters using the elbow method
o Training the K-means algorithm on the training dataset
o Visualizing the clusters

Step-1: Data Pre-processing

The first step is data pre-processing, as we did in the earlier topics of Regression and Classification. For the clustering problem, however, it differs from the other models. Let's discuss it:

o Importing Libraries
As we did in previous topics, firstly, we will import the libraries for our model, which is part of
data pre-processing. The code is given below:
# importing the libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# importing the dataset
dataset = pd.read_csv('Mall_Customers_data.csv')
By executing the above lines of code, we will get our dataset in the Spyder IDE.
The dataset looks like the below image:

From the above dataset, we need to find some patterns.


o Extracting Independent Variables
Here we don't need any dependent variable for the data pre-processing step, as this is a clustering problem and we have no idea what to determine. So we will just add a line of code for the matrix of features:

x = dataset.iloc[:, [3, 4]].values
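For clarity, the column indices 3 and 4 are assumed here to correspond to the Annual Income and Spending Score columns of the Mall_Customers file (with Customer_Id, Gender and Age in columns 0-2); this assumption can be checked quickly before continuing, using the dataset and x defined above:

# quick check of which columns were selected for clustering
print(dataset.columns[[3, 4]])   # expected: the Annual Income and Spending Score columns
print(x[:5])                     # first five rows of the feature matrix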

Step-2: Finding the optimal number of clusters using the elbow method
#finding the optimal number of clusters using the elbow method
from sklearn.cluster import KMeans
wcss_list = []  # initializing the list for the values of WCSS

# using a for loop for iterations from 1 to 10
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
    kmeans.fit(x)
    wcss_list.append(kmeans.inertia_)
mtp.plot(range(1, 11), wcss_list)
mtp.title('The Elbow Method Graph')
mtp.xlabel('Number of clusters (k)')
mtp.ylabel('wcss_list')
mtp.show()

Output: After executing the above code, we will get the output below.
From the above plot, we can see that the elbow point is at K = 5, so the number of clusters here will be 5.
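For reference, WCSS (the within-cluster sum of squares) is the quantity scikit-learn exposes as the inertia_ attribute: the sum of squared distances from every point to the centroid of the cluster it is assigned to. The elbow is the value of K beyond which WCSS stops dropping sharply. As an optional sanity check, not part of the original walkthrough, WCSS can be recomputed by hand for any fitted KMeans model:

# recompute WCSS by hand and compare it with scikit-learn's inertia_
assigned_centers = kmeans.cluster_centers_[kmeans.labels_]   # centroid of each point's cluster
wcss_manual = ((x - assigned_centers) ** 2).sum()
print(wcss_manual, kmeans.inertia_)   # the two values should agree up to rounding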

Step-3: Training the K-means algorithm on the training dataset

#training the K-means model on the dataset
kmeans = KMeans(n_clusters=5, init='k-means++', random_state=42)
y_predict = kmeans.fit_predict(x)
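fit_predict both fits the model and returns, for every row of x, the index (0 to 4) of the cluster that row was assigned to; the learned centroids are stored on the fitted model. A quick, optional way to inspect the result (variable names follow the code above; nm is the numpy alias imported in Step-1):

print(y_predict[:10])             # cluster index of the first ten customers
print(kmeans.cluster_centers_)    # centroid coordinates (annual income, spending score) per cluster
print(nm.bincount(y_predict))     # how many customers ended up in each of the five clusters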
Step-4: Visualizing the Clusters
The last step is to visualize the clusters. Since we have 5 clusters for our model, we will visualize them one by one.

To visualize the clusters, we will use a scatter plot drawn with the mtp.scatter() function of matplotlib.

#visualizing the clusters
mtp.scatter(x[y_predict == 0, 0], x[y_predict == 0, 1], s=100, c='blue', label='Cluster 1')     # first cluster
mtp.scatter(x[y_predict == 1, 0], x[y_predict == 1, 1], s=100, c='green', label='Cluster 2')    # second cluster
mtp.scatter(x[y_predict == 2, 0], x[y_predict == 2, 1], s=100, c='red', label='Cluster 3')      # third cluster
mtp.scatter(x[y_predict == 3, 0], x[y_predict == 3, 1], s=100, c='cyan', label='Cluster 4')     # fourth cluster
mtp.scatter(x[y_predict == 4, 0], x[y_predict == 4, 1], s=100, c='magenta', label='Cluster 5')  # fifth cluster
mtp.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300, c='yellow', label='Centroid')
mtp.title('Clusters of customers')
mtp.xlabel('Annual Income (k$)')
mtp.ylabel('Spending Score (1-100)')
mtp.legend()
mtp.show()
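A note on the indexing used above: because x is a NumPy array, an expression such as x[y_predict == 0, 0] selects the first column (annual income) of only those rows assigned to cluster 0, and x[y_predict == 0, 1] selects their spending scores. The same idea spelled out step by step (the names mask and cluster0 are introduced here purely for illustration):

mask = (y_predict == 0)     # boolean array: True for customers placed in the first cluster
cluster0 = x[mask]          # only the rows of x that belong to that cluster
print(cluster0[:5, 0])      # first five annual-income values in the cluster
print(cluster0[:5, 1])      # first five spending-score values in the cluster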
Output:

The output image clearly shows the five different clusters in different colors. The clusters are formed between two parameters of the dataset: the customer's annual income and spending score. We can change the colors and labels as per requirement or choice. We can also observe some points from the above patterns, which are given below:

o Cluster 1 shows the customers with average salary and average spending, so we can categorize these customers as standard.

o Cluster 2 shows the customers with high income but low spending, so we can categorize them as careful.

o Cluster 3 shows the customers with low income and also low spending, so they can be categorized as sensible.

o Cluster 4 shows the customers with low income but very high spending, so they can be categorized as careless.

o Cluster 5 shows the customers with high income and high spending, so they can be categorized as target; these customers can be the most profitable customers for the mall owner.
