
Customer Segmentation Using Clustering

The document explains customer segmentation using K-Means clustering, which groups customers based on purchasing behavior. It details the steps involved, including data standardization, choosing the number of clusters using the Elbow Method, and running K-Means to identify customer profiles. Additionally, it provides runnable code for generating synthetic data, visualizing distributions, and plotting clusters.

Uploaded by

Sakthi Priya

Customer Segmentation Using Clustering (K-Means)
Customer segmentation means grouping customers into different clusters
based on their purchasing behavior or attributes. This helps businesses tailor
marketing strategies or services to each group.
Clustering is an unsupervised machine learning technique that automatically
finds natural groupings in data without pre-labeled categories.

Why K-Means Clustering?

- It partitions data points into K clusters, where each point belongs to the cluster with the nearest mean.
- It is simple, efficient, and widely used in marketing segmentation.

Step-by-step Explanation
1. Dataset
   We consider two features for each customer:
   - Annual Income (in thousands of dollars)
   - Spending Score (a score from 1 to 100 indicating how much the customer spends)
2. Data Standardization
   Since these features have different scales, we standardize them to zero mean and unit variance. This prevents features with larger numeric ranges from dominating the distance calculations.
3. Choosing the Number of Clusters (K)
   We use the Elbow Method:
   - Run K-Means for a range of K (say 1 to 10).
   - For each K, compute the within-cluster sum of squares (WCSS): the sum of squared distances between points and their cluster centers.
   - Plot WCSS against K and look for the "elbow", the point where adding more clusters no longer reduces WCSS significantly.
   - The elbow marks a good trade-off between model complexity and explained variance.
4. Run K-Means
   Using the chosen K, cluster the data points.
5. Interpretation
   Each cluster represents a group of customers with similar income and spending patterns, helping businesses identify profiles such as:
   - High income, high spending
   - Low income, low spending
   - Medium income, high spending, etc.
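As a quick check on step 2, the z-score standardization can be done by hand and compared against scikit-learn's StandardScaler. The income values below are made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

income = np.array([[15.0], [60.0], [105.0], [150.0]])  # hypothetical incomes in k$

# Manual z-score: subtract the mean, divide by the standard deviation
z_manual = (income - income.mean()) / income.std()

# StandardScaler computes the same transform (population std by default)
z_sklearn = StandardScaler().fit_transform(income)

print(np.allclose(z_manual, z_sklearn))  # True
```

After the transform, each feature has mean 0 and standard deviation 1, so income and spending score contribute equally to the Euclidean distances K-Means minimizes.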

Full runnable code with output plot


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# 1. Create sample data


np.random.seed(42)
data = {
    'CustomerID': range(1, 201),
    'Annual Income (k$)': np.random.randint(15, 150, 200),
    'Spending Score (1-100)': np.random.randint(1, 100, 200)
}
df = pd.DataFrame(data)

# 2. Visualize data distribution


plt.figure(figsize=(8,5))
sns.scatterplot(x='Annual Income (k$)', y='Spending Score (1-100)', data=df)
plt.title('Customer Data Distribution')
plt.show()

# 3. Scale features
features = df[['Annual Income (k$)', 'Spending Score (1-100)']]
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)

# 4. Elbow method to find optimal K


wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, random_state=42)
    kmeans.fit(scaled_features)
    wcss.append(kmeans.inertia_)

plt.figure(figsize=(8,5))
plt.plot(range(1, 11), wcss, marker='o')
plt.title('Elbow Method For Optimal K')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()

# 5. From the elbow plot, let's choose K=5


kmeans = KMeans(n_clusters=5, random_state=42)
df['Cluster'] = kmeans.fit_predict(scaled_features)

# 6. Visualize clusters
plt.figure(figsize=(8,5))
sns.scatterplot(x='Annual Income (k$)', y='Spending Score (1-100)',
                hue='Cluster', palette='Set1', data=df)
plt.title('Customer Segments (K=5)')
plt.show()

# 7. Print cluster centers in original scale (optional)


centers = scaler.inverse_transform(kmeans.cluster_centers_)
print("Cluster centers (Annual Income, Spending Score):")
print(centers)
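To read off the profiles from step 5, one can also average each feature per cluster. This is a sketch reusing the same synthetic data as above; the cluster numbering itself is arbitrary and can vary between runs or scikit-learn versions:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

np.random.seed(42)
df = pd.DataFrame({
    'Annual Income (k$)': np.random.randint(15, 150, 200),
    'Spending Score (1-100)': np.random.randint(1, 100, 200),
})

scaled = StandardScaler().fit_transform(df)
df['Cluster'] = KMeans(n_clusters=5, n_init=10, random_state=42).fit_predict(scaled)

# Average income and spending per cluster, plus cluster sizes
profile = df.groupby('Cluster').mean().round(1)
profile['Size'] = df['Cluster'].value_counts().sort_index()
print(profile)
```

Each row of the printed table is one segment; high/low combinations of the two columns correspond to the customer profiles described in step 5.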

In summary:

- We created synthetic customer data.
- Standardized the features.
- Used the Elbow Method to find the optimal number of clusters.
- Applied K-Means clustering.
- Visualized the customer segments.
