
K Means Clustering

This document analyzes mall customer data using k-means clustering. It loads the data, checks it for missing values, encodes the categorical Genre column as 0/1, runs k-means for k values from 1 to 9, and compares the resulting sum of squared distances (SSD). The SSD drops by about 60% when going from 1 to 2 clusters and by about 30% when going from 2 to 3 clusters, suggesting that 2 or 3 clusters suit this data best.

Uploaded by

Walid Sassi


9/1/23, 2:11 PM Mall_kmean - Jupyter Notebook

In [1]:

import pandas as pd

In [5]:

ml = pd.read_csv("mall_kmeans.csv")

In [6]:

ml.head()

Out[6]:

   CustomerID   Genre  Age  Annual Income (k$)  Spending Score (1-100)
0           1    Male   19                  15                      39
1           2    Male   21                  15                      81
2           3  Female   20                  16                       6
3           4  Female   23                  16                      77
4           5  Female   31                  17                      40
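Column names such as `Annual Income (k$)` contain spaces and parentheses, so in pandas they can only be reached with bracket indexing (`ml['Annual Income (k$)']`), whereas `Genre` also works as an attribute (`ml.Genre`). The sketch below parses the five rows shown above with the standard-library `csv` module to make the raw header visible. The inline `SAMPLE` string is a hypothetical stand-in for the first lines of `mall_kmeans.csv`; the file's exact layout is an assumption.

```python
import csv
import io

# Hypothetical excerpt: header plus the five rows displayed by ml.head().
# The exact layout of mall_kmeans.csv is assumed, not read from the file.
SAMPLE = """CustomerID,Genre,Age,Annual Income (k$),Spending Score (1-100)
1,Male,19,15,39
2,Male,21,15,81
3,Female,20,16,6
4,Female,23,16,77
5,Female,31,17,40
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
header = list(rows[0].keys())

# Names that are not valid Python identifiers cannot be used as ml.<name> in pandas.
needs_brackets = [name for name in header if not name.isidentifier()]
print(needs_brackets)  # ['Annual Income (k$)', 'Spending Score (1-100)']
```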

In [8]:

ml.isnull().sum()

Out[8]:

CustomerID 0
Genre 0
Age 0
Annual Income (k$) 0
Spending Score (1-100) 0
dtype: int64

In [9]:

ml.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CustomerID 200 non-null int64
1 Genre 200 non-null object
2 Age 200 non-null int64
3 Annual Income (k$) 200 non-null int64
4 Spending Score (1-100) 200 non-null int64
dtypes: int64(4), object(1)
memory usage: 7.9+ KB

localhost:8888/notebooks/Desktop/ML/Mall_kmeans/Mall_kmean.ipynb 1/10

In [10]:

ml.Genre.value_counts()

Out[10]:

Female 112
Male 88
Name: Genre, dtype: int64

In [11]:

# Encode Genre numerically (Female -> 0, Male -> 1) by assigning back to the column
ml['Genre'] = ml['Genre'].map({'Female': 0, 'Male': 1})

In [14]:

ml.select_dtypes(include='object').columns

Out[14]:

Index([], dtype='object')
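After the encoding step, `Genre` holds 0 for `Female` and 1 for `Male`, which is why `select_dtypes(include='object')` now returns an empty index. Because k-means measures Euclidean distance, this 0/1 column can add at most 1 to the squared distance between two customers, far less than the income or spending columns can. A plain-Python sketch of the mapping and that bound, using the counts reported by `value_counts()` above (the list order is illustrative):

```python
from collections import Counter

# Same mapping as the notebook's encoding cell.
GENRE_CODES = {"Female": 0, "Male": 1}

# Illustrative column with the counts from value_counts(): 112 Female, 88 Male.
genre = ["Female"] * 112 + ["Male"] * 88
encoded = [GENRE_CODES[g] for g in genre]

print(Counter(encoded))  # Counter({0: 112, 1: 88})

# The largest possible squared difference along this column is (1 - 0) ** 2 == 1.
max_sq_gap = (max(encoded) - min(encoded)) ** 2
print(max_sq_gap)  # 1
```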

In [15]:

from sklearn.cluster import KMeans

In [111]:

kmeans_ml = KMeans(n_clusters=5)

In [112]:

kmeans_ml.fit(ml)

Out[112]:

KMeans(n_clusters=5)

In [113]:

kmeans_ml.labels_

Out[113]:

array([2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4,
2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4,
2, 4, 2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 2, 2, 0, 2, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 3, 0, 3, 1, 3, 1, 3,
1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3,
1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3,
1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3,
1, 3])
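`fit` assigns every row to one of `n_clusters` centroids, and `labels_` records that assignment. The core loop behind k-means (Lloyd's algorithm) alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. Below is a minimal pure-Python sketch on toy 2-D points. It starts from a fixed initialization for determinism, whereas scikit-learn's `KMeans` uses k-means++ seeding by default (with restarts governed by `n_init`), so this is an illustration of the technique, not a reimplementation of `KMeans`.

```python
def lloyd_kmeans(points, centroids, max_iter=100):
    """Toy Lloyd iteration: points and centroids are lists of equal-length tuples."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centroids[j])))
            for pt in points
        ]
        # Update step: each centroid moves to the mean of the points assigned to it.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Two well-separated toy groups; start from the first two points.
pts = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = lloyd_kmeans(pts, pts[:2])
print(labels)  # [0, 0, 0, 1, 1, 1]
```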


In [114]:

set(kmeans_ml.labels_)

Out[114]:

{0, 1, 2, 3, 4}

In [115]:

kmeans_ml.cluster_centers_

Out[115]:

array([[ 92.53030303,  0.42424242, 42.72727273, 57.75757576, 49.46969697],
       [164.        ,  0.52777778, 40.80555556, 87.91666667, 17.88888889],
       [ 33.34285714,  0.37142857, 45.31428571, 31.8       , 30.31428571],
       [162.        ,  0.46153846, 32.69230769, 86.53846154, 82.12820513],
       [ 25.16666667,  0.41666667, 25.83333333, 26.95833333, 77.79166667]])

In [116]:

len(kmeans_ml.cluster_centers_)

Out[116]:

5

In [117]:

centroid_df = pd.DataFrame(kmeans_ml.cluster_centers_)

In [118]:

centroid_df.columns = ml.columns

In [119]:

centroid_df

Out[119]:

   CustomerID     Genre        Age  Annual Income (k$)  Spending Score (1-100)
0   92.530303  0.424242  42.727273           57.757576               49.469697
1  164.000000  0.527778  40.805556           87.916667               17.888889
2   33.342857  0.371429  45.314286           31.800000               30.314286
3  162.000000  0.461538  32.692308           86.538462               82.128205
4   25.166667  0.416667  25.833333           26.958333               77.791667
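The centroid table shows `CustomerID` ranging from about 25 to 164 across clusters: the row identifier is acting as one of the clustering features, and it spans 1 to 200, a wider range than any other column. Since k-means uses Euclidean distance, wide-range columns dominate the result. A common remedy, not applied in this notebook, is to drop identifier columns and rescale the remaining ones. The sketch below shows min-max scaling in plain Python on three rows taken from the data in this notebook; the choice of min-max scaling (rather than, say, standardization) is an assumption for illustration.

```python
def min_max_scale(rows):
    """Rescale each column of a list of numeric rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Rows from the encoded data: CustomerID, Genre, Age, Annual Income (k$), Spending Score.
rows = [
    [1, 1, 19, 15, 39],
    [196, 0, 35, 120, 79],
    [200, 1, 30, 137, 83],
]

# Drop CustomerID (column 0): it identifies a row and says nothing about behaviour.
features = [row[1:] for row in rows]
scaled = min_max_scale(features)
print(scaled[0])  # [1.0, 0.0, 0.0, 0.0]
```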


In [120]:

kmeans_ml.score(ml)

Out[120]:

-157141.33959373957
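`KMeans.score` returns the negative of the k-means objective on the data passed in: minus the sum of squared distances (SSD) from each point to its nearest centroid. On the training data it should match `-kmeans_ml.inertia_` up to floating-point rounding, which is why the value above is negative and why the notebook later takes `np.abs` of the collected scores. A plain-Python sketch of the quantity, with toy points and centroids:

```python
def sum_squared_distances(points, centroids):
    """SSD: squared Euclidean distance from each point to its nearest centroid, summed."""
    total = 0.0
    for pt in points:
        total += min(sum((p - c) ** 2 for p, c in zip(pt, cen)) for cen in centroids)
    return total


pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
cents = [(0.0, 1.0), (10.0, 1.0)]

ssd = sum_squared_distances(pts, cents)
score = -ssd  # the sign convention used by KMeans.score
print(ssd, score)  # 4.0 -4.0
```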

In [94]:

lst = []
for k in range(1, 10):
    # A separate name keeps kmeans_ml as the k = 5 model used for the plots below
    km = KMeans(n_clusters=k)
    km.fit(ml)
    score = km.score(ml)  # negative SSD for this k
    lst.append(score)
    print("cluster over are", k, "cluster left are", len(range(1, 10)) - k)
    print("____________________")

C:\Users\MR.GODHADE\anaconda3\lib\site-packages\sklearn\cluster\_kmeans.py:1036: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=1.
  warnings.warn(

cluster over are 1 cluster left are 8

____________________
cluster over are 2 cluster left are 7
____________________
cluster over are 3 cluster left are 6
____________________
cluster over are 4 cluster left are 5
____________________
cluster over are 5 cluster left are 4
____________________
cluster over are 6 cluster left are 3
____________________
cluster over are 7 cluster left are 2
____________________
cluster over are 8 cluster left are 1
____________________
cluster over are 9 cluster left are 0
____________________
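The first value the loop collects (k = 1) has a simple closed form: the best single centroid is the column-wise mean, so the SSD equals the total sum of squares of the data around its mean. The percentage drops computed later start from this value. A sketch on toy points:

```python
def total_sum_of_squares(points):
    """Sum of squared distances from each point to the column-wise mean (the k = 1 SSD)."""
    n = len(points)
    mean = [sum(col) / n for col in zip(*points)]
    return sum(sum((p - m) ** 2 for p, m in zip(pt, mean)) for pt in points)


pts = [(1.0, 2.0), (3.0, 2.0), (2.0, 5.0), (2.0, -1.0)]
print(total_sum_of_squares(pts))  # 20.0
```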

In [121]:

import numpy as np

In [122]:

lst = np.round(np.abs(lst))

In [123]:

cluster_num = list(range(1,10))


In [124]:

import matplotlib.pyplot as plt

In [125]:

plt.plot(cluster_num, lst, marker="*")
plt.grid()

In [126]:

lst

Out[126]:

array([975512., 387066., 271385., 195401., 157621., 122608., 103233.,
        86004.,  77299.])

In [127]:

# Only the last expression in the cell is displayed (Out[127])
(975512 - 387066)*100/975512  # ~60% drop in SSD when k goes from 1 to 2
(387066 - 271397)*100/387066  # ~29.9% drop when k goes from 2 to 3
(271397 - 195401)*100/271397  # ~28% drop when k goes from 3 to 4
(195401 - 157506)*100/195401  # ~19% drop when k goes from 4 to 5
(157506 - 122630)*100/195401  # k from 5 to 6; divides by the k = 4 SSD, dividing by 157506 gives ~22%

Out[127]:

17.848424521880645


In [128]:

(387066 - 271397)*100/387066

Out[128]:

29.88353407429224

In [129]:

(271397 - 195401)*100/271397

Out[129]:

28.001783365328283

In [130]:

(195401 - 157506)*100/195401

Out[130]:

19.393452438830916
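The hand-typed figures in In [127] to In [130] do not match Out[126] exactly (for example 271397 and 157506, where `lst` holds 271385 and 157621; they appear to come from an earlier run), and the last line of In [127] divides by 195401, the k = 4 SSD. Computing every drop directly from the list avoids both problems. A sketch using the rounded SSD values from Out[126]:

```python
# Rounded SSD values from Out[126], for k = 1 .. 9.
ssd = [975512, 387066, 271385, 195401, 157621, 122608, 103233, 86004, 77299]

# Percentage drop in SSD when moving from k to k + 1 clusters.
drops = [
    (k + 1, k + 2, (prev - curr) * 100 / prev)
    for k, (prev, curr) in enumerate(zip(ssd, ssd[1:]))
]

for k_from, k_to, pct in drops:
    print(f"k {k_from} -> {k_to}: {pct:.1f}% drop")
```

Computed this way, the drop from k = 5 to k = 6 is about 22%.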

In [131]:

colormap = np.array(['Red','Green','Blue','Yellow','Black'])

In [140]:

kmeans_ml.labels_

Out[140]:


In [139]:

colormap[kmeans_ml.labels_]

Out[139]:

array(['Blue', 'Black', 'Blue', 'Black', 'Blue', 'Black', 'Blue', 'Black',
'Blue', 'Black', 'Blue', 'Black', 'Blue', 'Black', 'Blue', 'Black',
'Blue', 'Black', 'Blue', 'Black', 'Blue', 'Black', 'Blue', 'Black',
'Blue', 'Black', 'Blue', 'Black', 'Blue', 'Black', 'Blue', 'Black',
'Blue', 'Black', 'Blue', 'Black', 'Blue', 'Black', 'Blue', 'Black',
'Blue', 'Black', 'Blue', 'Black', 'Blue', 'Black', 'Blue', 'Blue',
'Blue', 'Blue', 'Blue', 'Black', 'Blue', 'Blue', 'Blue', 'Blue',
'Blue', 'Blue', 'Red', 'Blue', 'Red', 'Red', 'Red', 'Red', 'Red',
'Red', 'Red', 'Red', 'Red', 'Red', 'Red', 'Red', 'Red', 'Red',
'Red', 'Red', 'Red', 'Red', 'Red', 'Red', 'Red', 'Red', 'Red',
'Red', 'Red', 'Red', 'Red', 'Red', 'Red', 'Red', 'Red', 'Red',
'Red', 'Red', 'Red', 'Red', 'Red', 'Red', 'Red', 'Red', 'Red',
'Red', 'Red', 'Red', 'Red', 'Red', 'Red', 'Red', 'Red', 'Red',
'Red', 'Red', 'Red', 'Red', 'Red', 'Red', 'Red', 'Red', 'Red',
'Red', 'Red', 'Red', 'Red', 'Yellow', 'Red', 'Yellow', 'Red',
'Yellow', 'Green', 'Yellow', 'Green', 'Yellow', 'Green', 'Yellow',
'Green', 'Yellow', 'Green', 'Yellow', 'Green', 'Yellow', 'Green',
'Yellow', 'Green', 'Yellow', 'Green', 'Yellow', 'Green', 'Yellow',
'Green', 'Yellow', 'Green', 'Yellow', 'Green', 'Yellow', 'Green',
'Yellow', 'Green', 'Yellow', 'Green', 'Yellow', 'Green', 'Yellow',
'Green', 'Yellow', 'Green', 'Yellow', 'Green', 'Yellow', 'Green',
'Yellow', 'Green', 'Yellow', 'Green', 'Yellow', 'Green', 'Yellow',
'Green', 'Yellow', 'Green', 'Yellow', 'Green', 'Yellow', 'Green',
'Yellow', 'Green', 'Yellow', 'Green', 'Yellow', 'Green', 'Yellow',
'Green', 'Yellow', 'Green', 'Yellow', 'Green', 'Yellow', 'Green',
'Yellow', 'Green', 'Yellow'], dtype='<U6')
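`colormap[kmeans_ml.labels_]` relies on NumPy integer-array indexing: each label is used as a position in `colormap`, producing one color name per customer. The same lookup in plain Python is below. It also shows why `colormap` needs at least as many entries as there are clusters: a label of 5 or more would be out of range for this five-color list.

```python
colormap = ["Red", "Green", "Blue", "Yellow", "Black"]

# First few labels from Out[113], i.e. the k = 5 model.
labels = [2, 4, 2, 4, 2]

colors = [colormap[label] for label in labels]
print(colors)  # ['Blue', 'Black', 'Blue', 'Black', 'Blue']

# A model with more clusters than colors would need a longer palette.
assert max(labels) < len(colormap), "need at least one color per cluster"
```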

In [133]:

plt.scatter(ml['Age'], ml['Annual Income (k$)'], c=colormap[kmeans_ml.labels_])

Out[133]:

<matplotlib.collections.PathCollection at 0x1c7ec6cefa0>


In [134]:

ml

Out[134]:

     CustomerID  Genre  Age  Annual Income (k$)  Spending Score (1-100)
  0           1      1   19                  15                      39
  1           2      1   21                  15                      81
  2           3      0   20                  16                       6
  3           4      0   23                  16                      77
  4           5      0   31                  17                      40
...         ...    ...  ...                 ...                     ...
195         196      0   35                 120                      79
196         197      0   45                 126                      28
197         198      1   32                 126                      74
198         199      1   32                 137                      18
199         200      1   30                 137                      83

200 rows × 5 columns


In [136]:

plt.scatter(ml['Age'], ml['Spending Score (1-100)'], c=colormap[kmeans_ml.labels_])
plt.xlabel('Age')
plt.ylabel('Spending Score')

Out[136]:

Text(0, 0.5, 'Spending Score')


In [137]:

plt.scatter(ml['Annual Income (k$)'],ml['Spending Score (1-100)'], c = colormap[kmea
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score')

Out[137]:

Text(0, 0.5, 'Spending Score')
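Coloring points through a single color array gives the plots no legend. One alternative is to call `plt.scatter` once per cluster with a `label=` argument and then call `plt.legend()`. The grouping step that would feed those calls is sketched below in plain Python, using the (income, spending) pairs and k = 5 labels of the first five customers. The plotting calls are left as comments so the sketch runs without matplotlib.

```python
from collections import defaultdict

# (Annual Income (k$), Spending Score) for the first five customers, with their k = 5 labels.
points = [(15, 39), (15, 81), (16, 6), (16, 77), (17, 40)]
labels = [2, 4, 2, 4, 2]

by_cluster = defaultdict(list)
for pt, label in zip(points, labels):
    by_cluster[label].append(pt)

for label in sorted(by_cluster):
    xs = [x for x, _ in by_cluster[label]]
    ys = [y for _, y in by_cluster[label]]
    # plt.scatter(xs, ys, label=f"cluster {label}")
    print(label, xs, ys)
# plt.legend()
```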

In [ ]:
