DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
(ARTIFICIAL INTELLIGENCE & MACHINE LEARNING)
CURE
Clustering Using REpresentatives (CURE) is an efficient data clustering algorithm designed specifically for large databases.
CURE is robust to outliers.
Traditional clustering algorithms :
In the centroid approach of traditional clustering, a single point (the centroid) is chosen to represent an entire cluster.
Points in a cluster lie close to one another compared with the points of other clusters. The centroid approach works well only for clusters of spherical or elliptical shape.
The drawback of the all-points approach of traditional clustering is that it makes the algorithm highly sensitive to outliers and to minute changes in the positions of data points.
Neither the centroid approach nor the all-points approach works for clusters of arbitrary shape.
CURE Algorithm
The CURE algorithm works well for spherical as well as non-spherical clusters.
− CURE : An Efficient Clustering Algorithm for Large Databases, by Sudipto Guha, Rajeev Rastogi and Kyuseok Shim.
− Instead of the all-points or centroid approach, CURE represents each cluster by a set of well-scattered points.
− CURE uses random sampling and partitioning to speed up clustering.
Overview of CURE (Clustering Using REpresentatives)
(i) Draw a random sample of the data.
(ii) Partition the sample.
(iii) Partially cluster each partition.
(iv) Eliminate outliers.
(v) Cluster the partial clusters.
(vi) Label the data on disk.
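A minimal Python sketch of the first two steps, random sampling and partitioning. The function name and the round-robin split are illustrative choices, not details from the CURE paper:

```python
import random

def sample_and_partition(data, sample_size, num_partitions):
    """Draw a uniform random sample, then split it into partitions
    that will each be pre-clustered independently."""
    sample = random.sample(data, sample_size)
    # Round-robin split keeps partition sizes nearly equal.
    return [sample[i::num_partitions] for i in range(num_partitions)]

random.seed(0)
data = [(random.random(), random.random()) for _ in range(1000)]
parts = sample_and_partition(data, sample_size=100, num_partitions=4)
print([len(p) for p in parts])  # → [25, 25, 25, 25]
```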
Department of Computer Science & Engineering-(AI&ML) | APSIT
Hierarchical Clustering Algorithm
− A centroid point c is chosen for each cluster. The scattered points of the cluster are shrunk towards the centroid by a fraction α.
− Using multiple scattered points per cluster helps discover non-spherical clusters, e.g. elongated clusters.
− The hierarchical clustering algorithm uses space linear in the input size n.
− The worst-case time complexity is O(n² log n), which reduces to O(n²) for low-dimensional data.
CURE algorithm : CURE cluster procedure
− It is similar to the hierarchical clustering approach, but it uses a sample-point variant as the cluster representative rather than every point in the cluster.
− First set a target sample count c. Then select c well-scattered sample points from the cluster.
− The chosen scattered points are shrunk towards the centroid by a fraction α, where 0 ≤ α ≤ 1.
− These points serve as the representatives of the clusters and are used as the points in the dmin (minimum-distance) cluster-merging approach.
− After each merge, c sample points are selected from the original representatives of the previous clusters to represent the new cluster.
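The selection of c scattered points and their shrinking by the fraction α can be sketched in Python. This is a minimal sketch: the function name is illustrative, and farthest-first selection is one plausible reading of "well scattered" (it matches the selection used later in the merge procedure):

```python
def representatives(points, c, alpha):
    """Pick c well-scattered points (farthest-first), then shrink each
    a fraction alpha towards the cluster centroid."""
    dim = len(points[0])
    centroid = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    dist2 = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    # First representative: the point farthest from the centroid.
    scattered = [max(points, key=lambda p: dist2(p, centroid))]
    while len(scattered) < min(c, len(points)):
        # Next: the point maximizing its distance to the nearest chosen point.
        scattered.append(max(points,
                             key=lambda p: min(dist2(p, q) for q in scattered)))
    # Shrink each representative towards the centroid by fraction alpha.
    return [tuple(x + alpha * (m - x) for x, m in zip(p, centroid))
            for p in scattered]

print(representatives([(0, 0), (4, 0), (0, 4), (4, 4)], c=2, alpha=0.5))
# → [(1.0, 1.0), (3.0, 3.0)]
```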
Cluster merging continues until the target number of clusters, k, is reached.
Random Sampling and Partitioning the Sample
− To reduce the size of the input to CURE's clustering algorithm, random sampling is used for large data sets.
− Good clusters can be obtained from moderately sized random samples; sampling provides a trade-off between efficiency and accuracy.
− Partitioning the sample reduces execution time: each partition is pre-clustered, with outliers eliminated, before the final clustering pass.
Eliminating Outliers and Labelling Data
− Outlier points are generally far fewer in number than the points of a cluster.
− Once the random sample has been clustered, the remainder of the data set is labelled by assigning each point to the cluster with the nearest representative point.
− Clustering based on scattered points, i.e. the CURE approach, is found to be more effective than the centroid or all-points approaches of traditional clustering algorithms.
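Labelling the remaining data by nearest representative can be sketched as follows (the function and cluster names are illustrative):

```python
def label(point, clusters):
    """Assign a point to the cluster whose nearest representative is closest.
    `clusters` maps a cluster id to its list of representative points."""
    dist2 = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    best, best_d = None, float("inf")
    for cid, reps in clusters.items():
        d = min(dist2(point, r) for r in reps)
        if d < best_d:
            best, best_d = cid, d
    return best

clusters = {"A": [(0, 0), (1, 1)], "B": [(10, 10), (9, 9)]}
print(label((2, 2), clusters))  # → A
```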
Pseudocode of the CURE clustering procedure :

procedure cluster(S, k)
begin
    T := build_kd_tree(S)
    Q := build_heap(S)
    while size(Q) > k do {
        u := extract_min(Q)
        v := u.closest
        delete(Q, v)
        w := merge(u, v)
        delete_rep(T, u)
        delete_rep(T, v)
        insert_rep(T, w)
        w.closest := x        /* x is an arbitrary cluster in Q */
        for each x ∈ Q do {
            if dist(w, x) < dist(w, w.closest)
                w.closest := x
            if x.closest is either u or v {
                if dist(x, x.closest) < dist(x, w)
                    x.closest := closest_cluster(T, x, dist(x, w))
                else
                    x.closest := w
                relocate(Q, x)
            }
            else if dist(x, x.closest) > dist(x, w) {
                x.closest := w
                relocate(Q, x)
            }
        }
        insert(Q, w)
    }
end
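A simplified, runnable Python version of this merging loop. It omits the kd-tree and heap speedups (so it is much slower than the procedure above) and merges greedily by minimum inter-point distance (dmin):

```python
def cure_cluster(points, k):
    """Greedy agglomerative loop: repeatedly merge the two closest clusters
    until only k remain. Each cluster is a list of points; cluster distance
    is the minimum distance between their points (dmin). No kd-tree/heap."""
    dist2 = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Find the pair of clusters at minimum dmin distance.
        i, j = min(
            ((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
            key=lambda ab: min(dist2(p, q)
                               for p in clusters[ab[0]]
                               for q in clusters[ab[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return clusters

data = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(cure_cluster(data, 2))  # two clusters: {(0,0),(0,1)} and {(10,10),(10,11)}
```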
Procedure for merging clusters :

procedure merge(u, v)
begin
    w := u ∪ v
    w.mean := (|u| · u.mean + |v| · v.mean) / (|u| + |v|)
    tmpSet := ∅
    for i := 1 to c do {
        maxDist := 0
        for each point p in cluster w do {
            if i = 1
                minDist := dist(p, w.mean)
            else
                minDist := min { dist(p, q) : q ∈ tmpSet }
            if (minDist ≥ maxDist) {
                maxDist := minDist
                maxPoint := p
            }
        }
        tmpSet := tmpSet ∪ {maxPoint}
    }
    for each point p in tmpSet do
        w.rep := w.rep ∪ { p + α · (w.mean − p) }
    return w
end
Stream Computing
− Stream computing is useful in real-time systems, for example counting items placed on a conveyor belt.
− IBM announced a stream computing system in 2007, which runs 800 microprocessors and enables software applications to split into tasks and rearrange data into answers.
− ATI Technologies promoted stream computing with Graphics Processing Units (GPUs) delivering high performance alongside low-latency CPUs to solve computational problems.
− ATI preferred stream computing in order to run applications on the GPU instead of the CPU.
A Stream-Clustering Algorithm
− The BDMO algorithm has a complex structure and is designed to give guaranteed performance even in the worst case.
− BDMO was designed by B. Babcock, M. Datar, R. Motwani and L. O'Callaghan.
Details of the BDMO algorithm
(i) The stream of data is partitioned into buckets, each of which is summarized by cluster representations.
(ii) Bucket sizes are restricted: there are one or two buckets of each size, up to a limit, and each allowed size is twice the previous one. For example, the bucket sizes may be 3, 6, 12, 24, 48 and so on.
(iii) Because bucket sizes are constrained this way, the number of buckets is O(log N).
(iv) A bucket's contents include its size, its timestamp, and a record for each of its clusters: the number of points, the centroid, etc.
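The doubling sequence of allowed bucket sizes described in (ii) can be generated with a small helper (illustrative, not part of BDMO itself):

```python
def bucket_sizes(p, limit):
    """Allowed bucket sizes: the smallest size p, then doubling up to a limit."""
    sizes = []
    s = p
    while s <= limit:
        sizes.append(s)
        s *= 2
    return sizes

print(bucket_sizes(3, 48))  # → [3, 6, 12, 24, 48]
```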
A few well-known algorithms for data-stream clustering are :
(a) Small-Space algorithm (b) BIRCH
(c) COBWEB (d) C2ICM
Initializing and Merging Buckets
A smallest bucket size p is chosen. The timestamp of a bucket is the timestamp of the most recent point in the bucket.
− The points of a bucket are clustered by some chosen strategy. The clustering method used at this initial stage provides the centroid or clustroid of each cluster, which becomes the record for that cluster.
Let :
* p be the smallest bucket size.
* Every p points, a new bucket is created; the bucket is timestamped and holds its clustered points.
* Any bucket older than N time units is dropped.
* If there are three buckets of size p, the two oldest are merged.
− The merge may then propagate to larger bucket sizes (2p, 4p, …).
− When buckets are merged, the new bucket is created by reviewing the sequence of buckets.
− If a bucket's timestamp is more than N time units before the current time, nothing in that bucket can still be in the window, so the bucket is dropped.
− Whenever creating a new size-p bucket leaves three buckets of the smallest size, the two oldest of the three are merged into a bucket of size 2p; this merging can cascade through buckets of increasing sizes.
− Merging two consecutive buckets yields a bucket of twice their common size. The timestamp of the newly merged bucket is the more recent of the two buckets' timestamps. The decision of how to merge the clusters within the buckets is made by computing a few parameters.
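The bucket-maintenance rule above (create a smallest bucket; whenever three buckets share a size, merge the two oldest into one of double size, keeping the more recent timestamp) can be sketched as follows. The bucket representation is illustrative and omits the per-cluster records:

```python
def new_bucket(buckets, timestamp, p):
    """Append a smallest-size bucket, then restore the invariant that at
    most two buckets of any size exist by merging the two oldest of a size."""
    buckets.append({"size": p, "time": timestamp})
    size = p
    while sum(b["size"] == size for b in buckets) > 2:
        two_oldest = sorted((b for b in buckets if b["size"] == size),
                            key=lambda b: b["time"])[:2]
        for b in two_oldest:
            buckets.remove(b)
        # Merged bucket: double size, more recent of the two timestamps.
        buckets.append({"size": 2 * size, "time": two_oldest[1]["time"]})
        size *= 2   # the merge may cascade to the next size
    return buckets

buckets = []
for t in range(1, 6):
    new_bucket(buckets, t, 3)
print(sorted(b["size"] for b in buckets))  # → [3, 6, 6]
```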
− In the Euclidean, k-means case, a cluster is represented by its number of points n and its centroid c.
While creating a bucket, cluster its points with k-means, taking p = k or larger.
To merge two clusters : n = n1 + n2, c = (n1 · c1 + n2 · c2) / (n1 + n2).
− In the non-Euclidean case, a cluster is represented by its clustroid and CSD (the sum of squared distances from the clustroid to the cluster's points). To choose a new clustroid while merging, the k points furthest from the old clustroids are considered as candidates. For a point p of the first cluster, the merged CSD is estimated as :
CSDm(p) = CSD1(p) + N2 · (d²(p, c1) + d²(c1, c2)) + CSD2(c2)
where c1, c2 are the clustroids of the two clusters and N2 is the number of points in the second cluster.
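The Euclidean merge of two cluster summaries (counts add; the new centroid is the count-weighted average of the two centroids) can be checked with a small sketch:

```python
def merge_stats(n1, c1, n2, c2):
    """Merge two Euclidean cluster summaries (point count, centroid)."""
    n = n1 + n2
    c = tuple((n1 * a + n2 * b) / n for a, b in zip(c1, c2))
    return n, c

print(merge_stats(2, (0.0, 0.0), 2, (4.0, 4.0)))  # → (4, (2.0, 2.0))
```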
Answering Queries
− Given m, choose the smallest set of the most recent buckets that covers the most recent m points; these buckets contain at most 2m points.
− Bucket construction and solution generation are the two steps used for query rewriting in a shared-variable bucket algorithm, one of the efficient approaches for answering queries.
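Choosing the fewest most-recent buckets that cover m points can be sketched as follows (the bucket representation is illustrative):

```python
def buckets_for_query(buckets, m):
    """Pick the fewest most-recent buckets whose sizes cover at least m
    points. Because each size is at most double the next smaller one, the
    chosen buckets hold fewer than 2m points in total."""
    chosen, covered = [], 0
    for b in sorted(buckets, key=lambda b: b["time"], reverse=True):
        if covered >= m:
            break
        chosen.append(b)
        covered += b["size"]
    return chosen

buckets = [{"size": 3, "time": 5}, {"size": 6, "time": 4},
           {"size": 12, "time": 2}]
print([b["size"] for b in buckets_for_query(buckets, 8)])  # → [3, 6]
```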