
Cluster Analysis

Cluster analysis is a statistical method for organizing data into groups based on closely
associated characteristics. The goal of cluster analysis is to find distinct groups, or "clusters",
within a data set.

The purpose of clustering is to divide the customer base into subgroups, allowing marketers to
differentiate their approach by segment in order to maximize customer value.
Clustering Algorithms
Two important classes
Agglomerative Methods

These methods create a tree-like structure of clusters. At each step, the algorithm identifies the
two clusters that are closest and merges them.

Partitioning Methods

The goal is to divide the data set into non-overlapping groups such that the points within each
group are relatively similar and points in different groups are relatively dissimilar.
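
A minimal sketch of both classes in Python with scikit-learn, run on a small synthetic data set, is
shown below; the two features, the choice of K = 3, and the random seed are illustrative
assumptions rather than part of the material.

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering, KMeans

    rng = np.random.default_rng(42)
    # Toy two-feature "customer" data drawn from three blobs (assumed for illustration).
    X = np.vstack([
        rng.normal(loc=[0, 0], scale=0.5, size=(50, 2)),
        rng.normal(loc=[3, 3], scale=0.5, size=(50, 2)),
        rng.normal(loc=[0, 3], scale=0.5, size=(50, 2)),
    ])

    # Agglomerative: builds a tree by repeatedly merging the two closest clusters.
    agglo_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

    # Partitioning (K-means): divides the data into K non-overlapping groups.
    kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

    print(np.bincount(agglo_labels), np.bincount(kmeans_labels))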
Hierarchical Clustering

• A hierarchical clustering approach is based on the determination of successive clusters based on
previously defined clusters.

• It is a technique aimed at grouping data into a tree of clusters called a dendrogram, which
graphically represents the hierarchical relationship between the underlying clusters.

• Hierarchical clustering has a variety of applications in our day-to-day life, including biology,
image processing, marketing, economics, and social network analysis.
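
As a small illustration, the sketch below builds a dendrogram with SciPy's hierarchical clustering
routines; the 20-point toy data set is an assumption made for the example.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 2))      # 20 points with 2 features (assumed)

    Z = linkage(X, method="ward")     # records the successive merges of the closest clusters
    dendrogram(Z)                     # draws the tree of clusters
    plt.xlabel("Data point index")
    plt.ylabel("Merge distance")
    plt.show()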
K-means Clustering

• K-means clustering is a popular unsupervised machine learning algorithm for grouping data points
into a predefined number of clusters (K).

• It works by iteratively minimizing the within-cluster variance, aiming to create clusters that
are highly similar within themselves and clearly distinct from each other.

• K-means is a centroid-based, or distance-based, algorithm.
Steps in K-means Clustering
▪ Initialization:

▪ Choose the number of clusters (K).

▪ Randomly pick K data points as the initial centroids (cluster centers/cluster seeds).

▪ Assignment:

▪ Assign each data point to the closest centroid based on distance (usually Euclidean distance).

▪ Centroid Update:

▪ Re-calculate the centroid of each cluster as the average of its assigned data points.

▪ Reassignment & Termination:

▪ Repeat the Assignment and Centroid Update steps:

▪ Re-assign data points based on the new centroids.

▪ Re-calculate centroids based on the newly assigned data points.

▪ Stop when the centroids no longer move significantly (clusters stabilize) or a maximum number of iterations is reached.
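
A bare-bones NumPy sketch of these steps is given below; the function and variable names are
illustrative, and in practice a library implementation such as scikit-learn's KMeans would normally
be used.

    import numpy as np

    def kmeans(X, k, max_iters=100, tol=1e-4, seed=0):
        rng = np.random.default_rng(seed)
        # Initialization: randomly pick K data points as the initial centroids.
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iters):
            # Assignment: each point goes to its closest centroid (Euclidean distance).
            distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = distances.argmin(axis=1)
            # Centroid update: average of the points assigned to each cluster
            # (an empty cluster keeps its previous centroid).
            new_centroids = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                for j in range(k)
            ])
            # Termination: stop when the centroids no longer move significantly.
            if np.linalg.norm(new_centroids - centroids) < tol:
                break
            centroids = new_centroids
        return labels, centroids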
Interpreting Clusters

What cluster members have in common

• Centroid used to define the typical member (a hypothetical customer who has the average value in
each of the cluster dimensions)

How each cluster is different from other clusters

• Key to differentiating segments

• Take the average value of each variable in the cluster and compare it to the average of the same
variable in the entire customer base
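
A short pandas sketch of this comparison is shown below; the column names and the cluster labels
are assumptions made for the example.

    import numpy as np
    import pandas as pd

    # Toy customer table with two variables (assumed names and values).
    customers = pd.DataFrame({
        "spend":  [120, 80, 300, 40, 260, 90],
        "visits": [4, 2, 10, 1, 8, 3],
    })
    labels = np.array([0, 0, 1, 0, 1, 0])   # cluster assignments from a prior clustering (assumed)

    cluster_means = customers.groupby(labels).mean()   # the "typical member" of each cluster
    overall_means = customers.mean()                   # the average over the entire customer base

    # A ratio above 1 means the cluster sits above the base average on that variable.
    profile = cluster_means / overall_means
    print(profile)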
Evaluating Clusters
• Cluster Diameter – The maximum distance between any two points within the cluster; it indicates
the maximum dissimilarity between members of the same cluster. The lower the diameter, the more
similar the cluster members are.

• Cluster Variance – The sum of the squared distances of the cluster's points from its centroid.
The lower the variance, the tighter and more similar the cluster.

• Cluster Silhouette – Measures how well a point in a cluster is matched to that cluster compared
to other clusters. The silhouette score of a clustering solution is the average of the silhouette
scores of all individual customers in the database. Scores can be computed at various levels: the
customer base, individual clusters, and individual customers.
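
The sketch below computes the diameter and variance of a single cluster as described above; the
small array of cluster members is an assumed example.

    import numpy as np
    from scipy.spatial.distance import pdist

    cluster_points = np.array([[1.0, 2.0], [1.5, 1.8], [0.8, 2.3], [1.2, 2.1]])

    # Cluster diameter: maximum pairwise distance within the cluster.
    diameter = pdist(cluster_points).max()

    # Cluster variance: sum of squared distances from the cluster centroid.
    centroid = cluster_points.mean(axis=0)
    variance = np.sum(np.linalg.norm(cluster_points - centroid, axis=1) ** 2)

    print(diameter, variance)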
Silhouette Score
• Silhouette score for each customer = (b − a) / max(a, b)
Where
• a = Mean Intra-cluster distance (Average of distance of each customer from all other
customers within the cluster)
• b = Mean Inter-cluster distance (Average of distance of each customer from all customers
in the nearest cluster)

The higher the silhouette score, the more similar the customer is to other customers in its
cluster relative to customers in the nearest cluster.
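
The sketch below computes silhouette scores at the individual-customer level and for the whole
clustering solution using scikit-learn, which implements the (b − a) / max(a, b) definition above;
the blob data and the choice of K = 3 are assumptions made for the example.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import silhouette_samples, silhouette_score

    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)    # toy customer data (assumed)
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

    per_customer = silhouette_samples(X, labels)                   # one score per customer
    per_cluster = [per_customer[labels == k].mean() for k in range(3)]
    overall = silhouette_score(X, labels)                          # average over the whole base

    print(overall, per_cluster)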
