Partitioning Algorithms

Uploaded by Pradeep ravikumar

Data Mining

(19ADOCN1001)
Mr.M.VijayaKumar, AP/AI&DS

19ADCN1303 - Data Mining 1

Course Outcomes

CO4: Classify data for the given dataset using real-world applications.


UNIT IV – Classification and Clustering
Classification: Basic Concepts - Decision Tree
Induction – Bayes Classification Methods – Rule
Based Classification – K-Nearest-Neighbor
Classifier - Model Evaluation and Selection –
Techniques to Improve Classification Accuracy.
Cluster Analysis: Basic Concepts and Methods -
Cluster Analysis - Partitioning Methods -
Hierarchical Methods - Density-Based Methods -
Grid-Based Methods.


Partitioning Algorithms: Basic Concept
• Partitioning method: Partition a database D of n objects into a set of k clusters such that the sum of squared distances is minimized (where $c_i$ is the centroid or medoid of cluster $C_i$)

$$E = \sum_{i=1}^{k} \sum_{p \in C_i} (p - c_i)^2$$

• Given k, find a partition of k clusters that optimizes the chosen partitioning criterion
• Global optimum: exhaustively enumerate all partitions (feasible only for very small n)
• Heuristic methods: k-means and k-medoids algorithms
• k-means (MacQueen’67, Lloyd’57/’82): Each cluster is represented
by the center of the cluster
• k-medoids or PAM (Partitioning Around Medoids) (Kaufman & Rousseeuw’87): Each cluster is represented by one of the objects in the cluster
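Exhaustive enumeration is practical only for tiny inputs. A minimal Python sketch, assuming points are numeric tuples, (p - c_i)^2 is read as squared Euclidean distance, and each c_i is the cluster centroid; the function names are hypothetical. It evaluates E for every labeling of a small data set, keeps the best, and then counts the partitions with the Stirling number of the second kind S(n, k) to show why enumeration does not scale:

```python
from functools import lru_cache
from itertools import product

Point = tuple[float, ...]


def sq_dist(p: Point, c: Point) -> float:
    """Squared Euclidean distance, used here for (p - c_i)^2."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def cost_E(points: list[Point], labels: tuple[int, ...], k: int) -> float:
    """E = sum over clusters C_i of sum over p in C_i of (p - c_i)^2, with c_i the centroid."""
    total = 0.0
    for j in range(k):
        cluster = [p for p, l in zip(points, labels) if l == j]
        c = tuple(sum(col) / len(cluster) for col in zip(*cluster))
        total += sum(sq_dist(p, c) for p in cluster)
    return total


def exhaustive_best(points: list[Point], k: int):
    """Global optimum by brute force over all k**n labelings with no empty cluster."""
    best = None
    for labels in product(range(k), repeat=len(points)):
        if len(set(labels)) < k:
            continue  # some cluster would be empty
        e = cost_E(points, labels, k)
        if best is None or e < best[0]:
            best = (e, labels)
    return best


@lru_cache(maxsize=None)
def num_partitions(n: int, k: int) -> int:
    """Stirling number of the second kind: ways to split n objects into k nonempty clusters."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    # Object n either joins one of the k existing clusters or starts a new one.
    return k * num_partitions(n - 1, k) + num_partitions(n - 1, k - 1)


points = [(1.0, 1.0), (1.0, 3.0), (8.0, 8.0), (10.0, 8.0)]
print(exhaustive_best(points, k=2))  # (4.0, (0, 0, 1, 1))
print(num_partitions(10, 3), num_partitions(20, 3))  # 9330 580606446
```

The k**n loop also visits every relabeling of the same partition, so it does k! times more work than needed; even so, the partition count alone rules out enumeration beyond very small n.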
The K-Means Clustering Method
• Given k, the k-means algorithm is implemented in four steps:
• Step 1: Partition the objects into k nonempty subsets
• Step 2: Compute seed points as the centroids of the clusters of the current partitioning (the centroid is the center, i.e., the mean point, of the cluster)
• Step 3: Assign each object to the cluster with the nearest seed point
• Step 4: Go back to Step 2; stop when the assignment does not change
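The four steps translate almost line for line into code. A minimal Python sketch, assuming numeric tuples, squared Euclidean distance, and an initial partition (Step 1) supplied by the caller as labels 0..k-1; the names are hypothetical, and a cluster that becomes empty is reported as an error rather than repaired:

```python
Point = tuple[float, ...]


def centroid(cluster: list[Point]) -> Point:
    """Mean point of a nonempty cluster."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance (an assumed choice of dissimilarity)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points: list[Point], labels: list[int], k: int, max_iter: int = 100):
    """Lloyd-style k-means; `labels` is the Step 1 partition."""
    labels = list(labels)
    for _ in range(max_iter):
        clusters = [[p for p, l in zip(points, labels) if l == j] for j in range(k)]
        if any(not c for c in clusters):
            raise ValueError("a cluster became empty; try another initial partition")
        # Step 2: seed points are the centroids of the current partition.
        seeds = [centroid(c) for c in clusters]
        # Step 3: assign every object to its nearest seed point.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, seeds[j])) for p in points]
        # Step 4: stop once no assignment changes; otherwise repeat from Step 2.
        if new_labels == labels:
            return labels, seeds
        labels = new_labels
    raise RuntimeError("no convergence within max_iter iterations")


points = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
labels, seeds = kmeans(points, [0, 1, 0, 1, 0, 1], k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Starting from the deliberately mixed partition [0, 1, 0, 1, 0, 1], the first reassignment already separates the two groups and the second pass changes nothing, so the loop stops.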


An Example of K-Means Clustering


Comments on the K-Means Method
• Strength: Efficient: O(tkn), where n is the number of objects, k is the number of clusters, and t is the number of iterations. Normally, k, t << n.
• Comparing: PAM: O(k(n-k)^2) per iteration, CLARA: O(ks^2 + k(n-k)), where s is the sample size
• Comment: Often terminates at a local optimum.
• Weakness
• Applicable only to objects in a continuous n-dimensional space
• Using the k-modes method for categorical data
• In comparison, k-medoids can be applied to a wide range of
data
• Need to specify k, the number of clusters, in advance (there are ways to determine the best k automatically; see Hastie et al., 2009)
• Sensitive to noisy data and outliers
• Not suitable to discover clusters with non-convex shapes
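The remark that k-means often terminates at a local optimum can be checked on a tiny 1-D data set. A minimal Python sketch, assuming squared distance and hand-picked initial partitions (the function name and data are hypothetical): the same data and k give E = 6 from one start and E = 104 from another, and neither run can improve further on its own.

```python
def kmeans_1d(xs: list[float], labels: list[int], k: int, max_iter: int = 100):
    """1-D k-means from a given initial partition; returns (labels, means, E)."""
    labels = list(labels)
    for _ in range(max_iter):
        # Cluster means of the current partition (assumes no cluster is empty).
        means = [sum(x for x, l in zip(xs, labels) if l == j) / labels.count(j) for j in range(k)]
        new_labels = [min(range(k), key=lambda j: (x - means[j]) ** 2) for x in xs]
        if new_labels == labels:
            e = sum((x - means[l]) ** 2 for x, l in zip(xs, labels))
            return labels, means, e
        labels = new_labels
    raise RuntimeError("no convergence within max_iter iterations")


xs = [0.0, 2.0, 10.0, 12.0, 20.0, 22.0]
good = kmeans_1d(xs, [0, 0, 1, 1, 2, 2], k=3)  # stops with means 1, 11, 21
poor = kmeans_1d(xs, [0, 0, 0, 0, 1, 2], k=3)  # stops with means 6, 20, 22
print(good[2], poor[2])  # 6.0 104.0

# Keep the run with the lowest E among several starts.
best = min((good, poor), key=lambda run: run[2])
```

Choosing among several runs by their E, as in the last line, is one common mitigation; it reduces but does not remove the risk of a poor local optimum.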
Variations of the K-Means Method
• Most variants of k-means differ in

• Selection of the initial k means

• Dissimilarity calculations

• Strategies to calculate cluster means

• Handling categorical data: k-modes

• Replacing means of clusters with modes

• Using new dissimilarity measures to deal with categorical objects
• Using a frequency-based method to update modes of clusters

• A mixture of categorical and numerical data: k-prototype method
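The k-modes ideas above (modes instead of means, a categorical dissimilarity, frequency-based mode updates) can be sketched as follows. This is a simplified illustration, not a published k-modes implementation: it assumes simple matching dissimilarity (the number of mismatched attributes) and initial modes chosen by the caller, and the names and records are hypothetical.

```python
from collections import Counter

Record = tuple[str, ...]


def mismatch(a: Record, b: Record) -> int:
    """Simple matching dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_of(cluster: list[Record]) -> Record:
    """Frequency-based mode: most frequent value of each attribute (ties: first seen)."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*cluster))


def kmodes(records: list[Record], init_modes: list[Record], max_iter: int = 100):
    modes = list(init_modes)
    labels = None
    for _ in range(max_iter):
        # Assign each record to the mode with the fewest mismatches.
        new_labels = [min(range(len(modes)), key=lambda j: mismatch(r, modes[j])) for r in records]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Replace each cluster's mode by its per-attribute most frequent values.
        for j in range(len(modes)):
            members = [r for r, l in zip(records, labels) if l == j]
            if members:  # keep the previous mode if the cluster is empty
                modes[j] = mode_of(members)
    return labels, modes


records = [
    ("red", "small", "round"), ("red", "small", "square"), ("red", "medium", "round"),
    ("blue", "large", "square"), ("blue", "large", "round"), ("green", "large", "square"),
]
labels, modes = kmodes(records, [records[0], records[3]])
print(labels, modes)
```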


What Is the Problem of the K-Means Method?
• The k-means algorithm is sensitive to outliers!
• An object with an extremely large value may substantially distort the distribution of the data, and with it the cluster mean

• K-Medoids: Instead of taking the mean value of the objects in a cluster as a reference point, use a medoid, the most centrally located object in the cluster
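A one-dimensional illustration of the outlier problem, assuming absolute difference as the distance (the data and helper names are hypothetical): replacing a single value with an extreme one drags the mean far from the bulk of the cluster, while the medoid stays put.

```python
def mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)


def medoid(xs: list[float]) -> float:
    """The object with the smallest total distance to all objects in the cluster."""
    return min(xs, key=lambda c: sum(abs(x - c) for x in xs))


cluster = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = [1.0, 2.0, 3.0, 4.0, 500.0]  # one extremely large value

print(mean(cluster), medoid(cluster))            # 3.0 3.0
print(mean(with_outlier), medoid(with_outlier))  # 102.0 3.0
```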


PAM: A Typical K-Medoids Algorithm
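The original slide's figure is not available here. As a stand-in, a minimal Python sketch of the PAM swap phase, assuming Euclidean distance (`math.dist`), medoids stored as indices, and a caller-chosen initial set of medoids (PAM's own initialization step is omitted); the function names are hypothetical. In each pass it tries every medoid/non-medoid swap and applies the one that lowers the total distance most, stopping when no swap helps:

```python
import math

Point = tuple[float, ...]


def total_cost(points: list[Point], medoids: list[int]) -> float:
    """Sum of distances from every object to its nearest medoid (medoids are indices)."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def pam(points: list[Point], init_medoids: list[int]):
    """PAM swap phase: apply the best medoid/non-medoid swap until none lowers the cost."""
    medoids = list(init_medoids)
    cost = total_cost(points, medoids)
    while True:
        best_cost, best_medoids = cost, None
        for i in range(len(medoids)):        # each current medoid
            for h in range(len(points)):     # each non-medoid candidate
                if h in medoids:
                    continue
                candidate = medoids[:i] + [h] + medoids[i + 1:]
                c = total_cost(points, candidate)
                if c < best_cost:
                    best_cost, best_medoids = c, candidate
        if best_medoids is None:             # no swap improves the clustering
            break
        cost, medoids = best_cost, best_medoids
    labels = [min(range(len(medoids)), key=lambda j: math.dist(p, points[medoids[j]])) for p in points]
    return medoids, labels, cost


points = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
medoids, labels, cost = pam(points, init_medoids=[0, 1])
print(medoids, labels, cost)  # [0, 3] [0, 0, 0, 1, 1, 1] 4.0
```

Every pass evaluates k(n - k) swaps and each evaluation touches all n objects, which is why PAM is comfortable on small data sets but not on large ones.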


The K-Medoid Clustering Method
• K-Medoids Clustering: Find representative objects (medoids) in clusters

• PAM (Partitioning Around Medoids, Kaufman & Rousseeuw 1987)

• Starts from an initial set of medoids and iteratively replaces one of the medoids by one of the non-medoids if it improves the total distance of the resulting clustering
• PAM works effectively for small data sets, but does not scale well
for large data sets (due to the computational complexity)

• Efficiency improvement on PAM

• CLARA (Kaufman & Rousseeuw, 1990): PAM on samples

• CLARANS (Ng & Han, 1994): Randomized re-sampling
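CLARA's idea of running PAM on samples can be sketched as below. This is an illustration under stated assumptions, not the published procedure: Euclidean distance, a compact first-improvement PAM swap on each sample, and hypothetical defaults for the number of samples and the sample size. Each sample's medoids are scored against the full data set, and the best set is kept:

```python
import math
import random

Point = tuple[float, ...]


def assign_cost(points: list[Point], medoids: list[Point]) -> float:
    """Total distance from each object to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def pam_swap(points: list[Point], k: int) -> list[Point]:
    """Compact first-improvement PAM swap on a small list; starts from its first k points."""
    medoids = list(points[:k])
    cost = assign_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for h in points:
                if h in medoids:
                    continue
                candidate = medoids[:i] + [h] + medoids[i + 1:]
                c = assign_cost(points, candidate)
                if c < cost:  # accept the first swap that lowers the cost
                    medoids, cost, improved = candidate, c, True
    return medoids


def clara(points: list[Point], k: int, num_samples: int = 5, sample_size: int = 40, seed: int = 0):
    """Run PAM on random samples; keep the medoids with the lowest cost on ALL points."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, math.inf
    for _ in range(num_samples):
        sample = rng.sample(points, min(sample_size, len(points)))
        medoids = pam_swap(sample, k)
        cost = assign_cost(points, medoids)  # scored on the full data set
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost


gen = random.Random(1)
data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)]
data += [(gen.gauss(10, 1), gen.gauss(10, 1)) for _ in range(200)]
medoids, cost = clara(data, k=2)
print(medoids, round(cost, 2))
```

Because only the sample enters the swap loop, the expensive part scales with the sample size rather than with n; the trade-off is that a poor sample can miss the best medoids.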


Summary
• Partitioning Methods


Reference
1. Jiawei Han, Micheline Kamber, Jian Pei, “Data Mining:
Concepts and Techniques”, 3rd Edition, Elsevier, 2012.


Thank you

