Density Based Clustering

Uploaded by cmptup2020
Copyright © All Rights Reserved

Data Mining Techniques

UNIT-III
Unit III Cluster Analysis

Cluster Analysis: Introduction - Applications of Cluster Analysis - Desired Features of Clustering - Distance Metrics - Clustering Methods: K-Means Clustering - K-Medoids - Agglomerative Clustering - Divisive Clustering - Density-Based Clustering - DBSCAN Algorithm - Evaluation of Clustering.

By
Dr. C. Mohanapriya
MSc (CT), MPhil, SET, PhD
Density-Based Methods
• Partitioning and hierarchical methods are designed to find spherical-shaped clusters.
• They have difficulty finding clusters of arbitrary shape, such as the “S”-shaped and oval clusters in the figure.
Density-Based Methods

• Given such data, they would likely identify convex regions inaccurately, with noise or outliers included in the clusters.
• Alternatively, to find clusters of arbitrary shape, we can model clusters as dense regions in the data space, separated by sparse regions.
• This is the main strategy behind density-based clustering methods, which can discover clusters of nonspherical shape.
• The basic density-based clustering techniques are:
• DBSCAN
• OPTICS
• DENCLUE
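The idea of “dense regions separated by sparse regions” can be made concrete by counting how many points lie within a fixed radius of each point. This is only an illustrative sketch: it assumes Euclidean distance, and the names `eps` (radius) and `min_pts` (density threshold) follow the usual DBSCAN convention rather than anything defined on these slides.

```python
import math

def neighbours(points, i, eps):
    """Indices of all points within Euclidean distance eps of points[i], including i."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dense_mask(points, eps, min_pts):
    """True for points whose eps-neighbourhood contains at least min_pts points."""
    return [len(neighbours(points, i, eps)) >= min_pts for i in range(len(points))]

points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),  # a dense group
          (5.0, 5.0)]                                      # an isolated point
print(dense_mask(points, eps=0.5, min_pts=3))  # [True, True, True, True, False]
```

Points flagged `True` lie in a dense region; the isolated point sits in a sparse region and is a candidate for noise.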
Why DBSCAN?

• Partitioning methods (K-means, PAM clustering) and hierarchical clustering work for finding spherical-shaped or convex clusters.
• In other words, they are suitable only for compact and well-separated clusters.
• Moreover, they are severely affected by the presence of noise and outliers in the data.
• Real-life data may contain irregularities, such as:
1. Clusters of arbitrary shape, such as those shown in the figure.
2. Noise in the data.
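To see how an outlier can distort a partitioning method, the following sketch runs plain K-Means (Lloyd's iterations) on 1-D data with two natural groups and one outlier. The data values and the choice to start the centroids at the minimum and maximum value are assumptions made only to keep the run deterministic.

```python
def kmeans_1d(xs, centroids, max_iter=20):
    """Lloyd's K-Means on 1-D data, starting from the given initial centroids."""
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for x in xs:
            # assignment step: put x in the cluster of its nearest centroid
            k = min(range(len(centroids)), key=lambda c: abs(x - centroids[c]))
            clusters[k].append(x)
        # update step: move each centroid to the mean of its cluster
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:  # converged
            break
        centroids = new
    return centroids, clusters

data = [1, 2, 3, 10, 11, 12, 100]  # two real groups plus one outlier
centroids, clusters = kmeans_1d(data, centroids=[min(data), max(data)])
print(clusters)   # [[1, 2, 3, 10, 11, 12], [100]]
print(centroids)  # [6.5, 100.0]
```

With k = 2, the single outlier claims a cluster of its own and the two genuine groups are merged, with their centroid at 6.5, which is far from every member.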
DBSCAN vs K-Means

• Number of clusters: In DBSCAN, we need not specify the number of clusters. K-Means is very sensitive to the number of clusters, so it must be specified.
• Cluster shape: Clusters formed in DBSCAN can be of any arbitrary shape. Clusters formed in K-Means are spherical or convex in shape.
• Noise and outliers: DBSCAN works well with datasets that contain noise and outliers. K-Means does not work well with outlier data; outliers can skew the K-Means clusters to a very large extent.
• Parameters: DBSCAN requires two parameters to train the model (the neighbourhood radius, eps, and the minimum number of points, MinPts). K-Means requires only one parameter (the number of clusters, k).
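The comparison above can be illustrated with a minimal, from-scratch DBSCAN sketch. It is not a library API and uses a brute-force O(n²) neighbour search. `eps` and `min_pts` play the role of DBSCAN's two parameters, and the ring-shaped data is made up for illustration. Label `-1` marks noise.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within Euclidean distance eps of points[i], including i."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, 2, ...) or NOISE (-1)."""
    labels = [None] * len(points)          # None = not yet visited
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region_query(points, i, eps)
        if len(seeds) < min_pts:
            labels[i] = NOISE              # may later become a border point
            continue
        cluster_id += 1                    # i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id     # border point: joins, but is not expanded
                continue
            if labels[j] is not None:
                continue                   # already assigned
            labels[j] = cluster_id
            neighbours = region_query(points, j, eps)
            if len(neighbours) >= min_pts: # j is a core point too: keep expanding
                queue.extend(neighbours)
    return labels

def ring(radius, n):
    """n points evenly spaced on a circle (made-up demo data)."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

points = ring(1.0, 12) + ring(4.0, 24) + [(10.0, 10.0)]  # two rings + one outlier
labels = dbscan(points, eps=1.2, min_pts=3)
print("clusters found:", len(set(labels) - {NOISE}))      # clusters found: 2
print("noise points:", labels.count(NOISE))               # noise points: 1
```

The number of clusters (two) is discovered rather than supplied. K-Means with k = 2 cannot produce this split, because each K-Means cluster is a convex region and the outer ring encloses the inner one.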
When Should We Use DBSCAN Over K-Means in Clustering Analysis?

• DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and K-Means are both clustering algorithms that group together data points with similar characteristics.
• However, they work on different principles and are suitable for different types of data.
• We prefer DBSCAN when the clusters are not spherical in shape or the number of clusters is not known beforehand.
Question 1

• What is a Hierarchical Clustering Method?


• A. A method that groups data objects into a hierarchy or tree of clusters
• B. A method that divides data objects into equal groups
• C. A method that does not require grouping of data objects
• D. None of the above
Answer 1

• The correct answer is: A method that groups data objects into a hierarchy or tree of
clusters
Question 2

• Which clustering method is useful for data summarization and visualization?


• A. Hierarchical Clustering
• B. Partitioning Clustering
• C. Grid-Based Clustering
• D. Density-Based Clustering
Answer 2

• The correct answer is: Hierarchical Clustering


Question 3

• What is the difference between Agglomerative and Divisive hierarchical clustering?


• A. Agglomerative is a bottom-up approach while Divisive is a top-down approach
• B. Agglomerative is a top-down approach while Divisive is a bottom-up approach
• C. Both are top-down approaches
• D. Both are bottom-up approaches
Answer 3

• The correct answer is: Agglomerative is a bottom-up approach while Divisive is a top-down approach
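The bottom-up behaviour of agglomerative clustering can be sketched in a few lines: every object starts as its own cluster, and the two closest clusters are merged until one remains. This sketch assumes single linkage (distance between the closest members) on 1-D values. The slides do not fix a linkage criterion. A divisive method runs in the opposite direction, starting from one all-inclusive cluster.

```python
def single_linkage(a, b):
    """Distance between two clusters = distance of their closest pair of members."""
    return min(abs(x - y) for x in a for y in b)

def agglomerative(values):
    """Bottom-up clustering; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = [[v] for v in values]  # every object starts in its own cluster
    merges = []
    while len(clusters) > 1:
        # pick the pair of clusters with the smallest single-linkage distance
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: single_linkage(clusters[p[0]], clusters[p[1]]),
        )
        merges.append((clusters[i], clusters[j], single_linkage(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

for a, b, d in agglomerative([1, 2, 6, 7, 15]):
    print(a, "+", b, "at distance", d)
# [1] + [2] at distance 1
# [6] + [7] at distance 1
# [1, 2] + [6, 7] at distance 4
# [15] + [1, 2, 6, 7] at distance 8
```

Reading the merge list from top to bottom gives the tree (dendrogram) of clusters; cutting it at a chosen distance yields a flat clustering.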
Question 4

• Which hierarchical clustering method is used in BIRCH?


• A. Multiphase Hierarchical Clustering Using Clustering Feature Trees
• B. Density-Based Clustering
• C. Partitioning Clustering
• D. Grid-Based Clustering
Answer 4

• The correct answer is: Multiphase Hierarchical Clustering Using Clustering Feature
Trees
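In BIRCH, each entry of the Clustering Feature (CF) tree stores a compact summary CF = (N, LS, SS): the number of points, their linear sum, and their sum of squared norms. The sketch below (the class and method names are hypothetical) shows why this is useful. CFs are additive, so sub-clusters can be merged and their centroid and radius computed without revisiting the raw points.

```python
import math
from dataclasses import dataclass

@dataclass
class CF:
    """Clustering feature (N, LS, SS) summarising a set of d-dimensional points."""
    n: int      # number of points
    ls: tuple   # linear sum of the points, one entry per dimension
    ss: float   # sum of squared norms of the points

    @staticmethod
    def of(points):
        dims = len(points[0])
        ls = tuple(sum(p[k] for p in points) for k in range(dims))
        ss = sum(sum(x * x for x in p) for p in points)
        return CF(len(points), ls, ss)

    def __add__(self, other):
        # Additivity: merging two sub-clusters needs only their CFs.
        return CF(self.n + other.n,
                  tuple(a + b for a, b in zip(self.ls, other.ls)),
                  self.ss + other.ss)

    def centroid(self):
        return tuple(x / self.n for x in self.ls)

    def radius(self):
        # Root-mean-square distance of members from the centroid: sqrt(SS/N - ||LS/N||^2)
        c2 = sum(x * x for x in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))

a = CF.of([(1, 1), (2, 2)])
b = CF.of([(3, 3)])
print(a + b)               # CF(n=3, ls=(6, 6), ss=28)
print((a + b).centroid())  # (2.0, 2.0)
```

Summing the two CFs gives exactly the CF that would have been computed from all three points directly, which is what lets BIRCH build the tree in a single scan of the data.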
Question 5

• What does DBSCAN stand for?


• A. Density-Based Spatial Clustering of Applications with Noise
• B. Data-Based Spatial Clustering of Applications with Noise
• C. Density-Based Spatial Clustering of Applications with Nodes
• D. Data-Based Spatial Clustering of Applications with Nodes
Answer 5

• The correct answer is: Density-Based Spatial Clustering of Applications with Noise
Question 6

• In which situation is DBSCAN preferred over K-Means?


• A. When data is not spherical in shape
• B. When data is spherical in shape
• C. When the number of classes is known beforehand
• D. When the data is linear
Answer 6

• The correct answer is: When data is not spherical in shape


Question 7

• What is the key feature of DBSCAN?


• A. It is density-based clustering that connects regions with high density
• B. It uses a top-down approach
• C. It groups data objects into a hierarchy
• D. It is a partitioning clustering method
Answer 7

• The correct answer is: It is density-based clustering that connects regions with high
density
Question 8

• Which of the following is not a hierarchical clustering method?


• A. K-Means Clustering
• B. Agglomerative Clustering
• C. Divisive Clustering
• D. BIRCH Clustering
Answer 8

• The correct answer is: K-Means Clustering


Question 9

• What is Chameleon?
• A. A Multiphase Hierarchical Clustering Using Dynamic Modelling
• B. A Grid-Based Clustering Method
• C. A Density-Based Clustering Method
• D. A Partitioning Clustering Method
Answer 9

• The correct answer is: A Multiphase Hierarchical Clustering Using Dynamic Modelling
Question 10

• Which method is used for Probabilistic Hierarchical Clustering?


• A. Probabilistic Models
• B. Partitioning Methods
• C. Density-Based Methods
• D. Grid-Based Methods
Answer 10

• The correct answer is: Probabilistic Models


Question 11

• What is the purpose of hierarchical clustering?


• A. To group data into a hierarchy or tree of clusters
• B. To divide data into equal groups
• C. To create non-hierarchical clusters
• D. To summarize data with no structure
Answer 11

• The correct answer is: To group data into a hierarchy or tree of clusters
Question 12

• Which clustering method can further partition groups into subgroups?


• A. Hierarchical Clustering
• B. Partitioning Clustering
• C. Grid-Based Clustering
• D. Density-Based Clustering
Answer 12

• The correct answer is: Hierarchical Clustering


Question 13

• Which hierarchical clustering method uses dynamic modeling?


• A. Chameleon
• B. BIRCH
• C. DBSCAN
• D. K-Means
Answer 13

• The correct answer is: Chameleon


Question 14

• In which scenario is hierarchical clustering preferred?


• A. When a hierarchy of clusters is needed
• B. When only flat clusters are required
• C. When data is linearly separable
• D. When data is non-linear
Answer 14

• The correct answer is: When a hierarchy of clusters is needed


Question 15

• Which method helps in finding average values in hierarchical clustering?


• A. Summarization by levels
• B. Random sampling
• C. Partitioning method
• D. Grid-Based method
Answer 15

• The correct answer is: Summarization by levels


Question 16

• What kind of approach is used in agglomerative clustering?


• A. Bottom-up approach
• B. Top-down approach
• C. Random approach
• D. Linear approach
Answer 16

• The correct answer is: Bottom-up approach


Question 17

• How does divisive hierarchical clustering start?


• A. By considering all objects in one cluster
• B. By placing each object in a separate cluster
• C. By randomly assigning objects to clusters
• D. By dividing objects based on density
Answer 17

• The correct answer is: By considering all objects in one cluster


Question 18

• Which algorithm is known for using density-based clustering?


• A. DBSCAN
• B. K-Means
• C. BIRCH
• D. Chameleon
Answer 18

• The correct answer is: DBSCAN


Question 19

• Which method should be avoided when data is spherical?


• A. DBSCAN
• B. Hierarchical Clustering
• C. Grid-Based Clustering
• D. Density-Based Clustering
Answer 19

• The correct answer is: DBSCAN


Question 20

• What does the BIRCH algorithm primarily use?


• A. Clustering Feature Trees
• B. Dynamic Modeling
• C. Density-Based Methods
• D. Grid-Based Methods
Answer 20

• The correct answer is: Clustering Feature Trees
