INSTITUTE OF INFORMATION TECHNOLOGY & MANAGEMENT
Accredited ‘A’ Grade by NAAC & Recognised U/s 2(f) of the UGC Act
Rated Category ‘A+’ by SFRC & ‘A’ by JAC, Govt. of NCT of Delhi
Approved by AICTE & Affiliated to GGS Indraprastha University, New Delhi
Machine Learning with Python
Programme : BCA
Semester : V
Subject Code : BCAT311
Subject : Machine Learning with Python
Topic : Clustering
Faculty : Ms. Shilpi Bansal
© Institute of Information Technology and Management, D-29, Institutional Area, Janakpuri, New Delhi-110058
List of Topics
Introduction to clustering
K-means clustering
Hierarchical clustering
Examples of Clustering Applications
Marketing: Help marketers discover distinct groups in their customer
bases, and then use this knowledge to develop targeted marketing
programs
Land use: Identification of areas of similar land use in an earth
observation database
Insurance: Identifying groups of motor insurance policy holders with a
high average claim cost
Urban planning: Identifying groups of houses according to their house
type, value, and geographical location
Seismology: Observed earthquake epicenters should cluster along
continental faults
What Is a Good Clustering?
A good clustering method will produce clusters with
High intra-class similarity
Low inter-class similarity
Precise definition of clustering quality is difficult
Application-dependent
Ultimately subjective
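One common way to quantify intra- and inter-cluster similarity in Python is the silhouette coefficient; the snippet below is a minimal sketch using scikit-learn on synthetic, purely illustrative data.

# Silhouette coefficient as a rough measure of clustering quality.
# The synthetic blobs and k = 3 are illustrative assumptions only.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Values near +1 mean high intra-cluster similarity and low inter-cluster
# similarity; values near 0 or below suggest overlapping or poor clusters.
print("silhouette:", silhouette_score(X, labels))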
Requirements for Clustering in Data Mining
Scalability
Ability to deal with different types of attributes
Discovery of clusters with arbitrary shape
Minimal domain knowledge required to determine input
parameters
Ability to deal with noise and outliers
Insensitivity to order of input records
Robustness with respect to high dimensionality
Incorporation of user-specified constraints
Interpretability and usability
Similarity and Dissimilarity Between Objects
The same measures used for instance-based learning (IBL), e.g., the Lp norm
Euclidean distance (p = 2):
d(i, j) = sqrt(|x_i1 − x_j1|² + |x_i2 − x_j2|² + ... + |x_ip − x_jp|²)
Properties of a metric d(i, j):
d(i, j) ≥ 0
d(i, i) = 0
d(i, j) = d(j, i)
d(i, j) ≤ d(i, k) + d(k, j)
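A minimal NumPy sketch of the Euclidean distance and a check of the four metric properties (the example points are illustrative):

import numpy as np

def euclidean(x, y):
    # L2 distance: sqrt of the sum over attributes of |x_k - y_k|^2
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sqrt(np.sum((x - y) ** 2))

i, j, k = np.array([1.0, 2.0]), np.array([4.0, 6.0]), np.array([0.0, 0.0])

assert euclidean(i, j) >= 0                                   # non-negativity
assert euclidean(i, i) == 0                                   # identity
assert euclidean(i, j) == euclidean(j, i)                     # symmetry
assert euclidean(i, j) <= euclidean(i, k) + euclidean(k, j)   # triangle inequality
print(euclidean(i, j))                                        # 5.0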
Major Clustering Approaches
Partitioning: Construct various partitions and then evaluate them by
some criterion
Hierarchical: Create a hierarchical decomposition of the set of objects
using some criterion
Model-based: Hypothesize a model for each cluster and find best fit of
models to data
Density-based: Guided by connectivity and density functions
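As a rough, illustrative mapping (an assumption about common library choices, not part of the slides), each approach has a familiar scikit-learn counterpart:

from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.mixture import GaussianMixture

partitioning  = KMeans(n_clusters=3)                    # partitioning
hierarchical  = AgglomerativeClustering(n_clusters=3)   # hierarchical (agglomerative)
model_based   = GaussianMixture(n_components=3)         # model-based (Gaussian mixture)
density_based = DBSCAN(eps=0.5, min_samples=5)          # density-based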
Partitioning Algorithms
Partitioning method: Construct a partition of a database D of n objects
into a set of k clusters
Given k, find a partition into k clusters that optimizes the chosen
partitioning criterion
Global optimal: exhaustively enumerate all partitions
Heuristic methods: k-means and k-medoids algorithms
k-means (MacQueen, 1967): Each cluster is represented by the center
of the cluster
k-medoids or PAM (Partitioning Around Medoids) (Kaufman & Rousseeuw,
1987): Each cluster is represented by one of the objects in the cluster
K-Means Clustering
Given k, the k-means algorithm consists of four steps:
Select k initial centroids at random.
Assign each object to the cluster with the nearest centroid.
Recompute each centroid as the mean of the objects assigned to it.
Repeat the previous two steps until the assignments no longer change.
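A from-scratch NumPy sketch of these four steps; the variable names, convergence test, and lack of empty-cluster handling are illustrative choices, not part of the slides.

import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: select k initial centroids at random from the data
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each object to the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned objects
        # (empty clusters are not handled in this sketch)
        new_centroids = np.array([X[labels == c].mean(axis=0) for c in range(k)])
        # Step 4: stop once the centroids no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

labels, centers = kmeans(np.random.rand(100, 2), k=3)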
K-Means Clustering (contd.)
Example
[Figure: four scatter plots (axes 0–10) illustrating successive k-means iterations on a 2-D example.]
Comments on the K-Means Method
Strengths
Relatively efficient: O(tkn), where n is # objects, k is # clusters, and t is
# iterations. Normally, k, t << n.
Often terminates at a local optimum. The global optimum may be found
using techniques such as simulated annealing and genetic algorithms
Weaknesses
Applicable only when mean is defined (what about categorical data?)
Need to specify k, the number of clusters, in advance
Trouble with noisy data and outliers
Not suitable to discover clusters with non-convex shapes
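Because k must be chosen in advance, a standard workaround (not from the slides) is to compare the within-cluster sum of squares (inertia) across candidate values of k and look for an "elbow"; a hedged sketch with scikit-learn:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)   # illustrative data

# Inertia decreases as k grows; the "elbow" where it flattens is a common choice.
for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, km.inertia_)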
Hierarchical Clustering
Uses a distance matrix as the clustering criterion. This method does not
require the number of clusters k as an input, but it does need a termination
condition
[Figure: AGNES proceeds agglomeratively from Step 0 to Step 4, merging a, b, c, d, e into ab, de, cde, and finally abcde; DIANA proceeds divisively in the reverse direction, from Step 4 back to Step 0.]
AGNES (Agglomerative Nesting)
Produces tree of clusters (nodes)
Initially: each object is a cluster (leaf)
Recursively merges nodes that have the least dissimilarity
Merge criteria: minimum distance, maximum distance, average distance, centroid distance
Eventually all nodes belong to the same cluster (root)
[Figure: three scatter plots (axes 0–10) illustrating successive AGNES merge steps on a 2-D example.]
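A minimal AGNES-style example using scikit-learn's AgglomerativeClustering; the linkage choice and synthetic data are illustrative assumptions.

from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=60, centers=3, random_state=1)

# linkage='average' corresponds to the average-distance criterion above;
# 'single' and 'complete' correspond to minimum and maximum distance.
agg = AgglomerativeClustering(n_clusters=3, linkage="average")
print(agg.fit_predict(X)[:10])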
A Dendrogram Shows How the Clusters Are Merged Hierarchically
Decompose data objects into several levels of nested
partitioning (tree of clusters), called a dendrogram.
A clustering of the data objects is obtained by cutting the
dendrogram at the desired level. Then each connected
component forms a cluster.
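A sketch of building and cutting a dendrogram with SciPy; the linkage method and the cut threshold are illustrative assumptions.

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(0)
X = rng.random((20, 2))            # illustrative 2-D data

Z = linkage(X, method="average")   # hierarchical tree of clusters
# dendrogram(Z)                    # plot the tree (needs matplotlib)

# Cutting at a chosen height yields one flat clustering; 0.5 is an arbitrary threshold.
labels = fcluster(Z, t=0.5, criterion="distance")
print(labels)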
DIANA (Divisive Analysis)
Inverse order of AGNES
Start with root cluster containing all objects
Recursively divide into subclusters
Eventually each cluster contains a single object
[Figure: three scatter plots (axes 0–10) illustrating successive DIANA splits on a 2-D example.]
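DIANA itself is not implemented in scikit-learn or SciPy; purely to illustrate the divisive (top-down) idea, the sketch below repeatedly bisects the largest cluster with 2-means (a bisecting strategy, not DIANA's exact splitting rule).

import numpy as np
from sklearn.cluster import KMeans

def divisive(X, n_clusters):
    # Start with one cluster holding all objects, then split until n_clusters remain.
    labels = np.zeros(len(X), dtype=int)
    while len(np.unique(labels)) < n_clusters:
        biggest = np.bincount(labels).argmax()          # pick the largest cluster
        mask = labels == biggest                        # (assumed to have >= 2 points)
        sub = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[mask])
        idx = np.where(mask)[0]
        labels[idx[sub == 1]] = labels.max() + 1        # one half becomes a new cluster
    return labels

print(divisive(np.random.rand(60, 2), n_clusters=3))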