Data Mining Techniques
UNIT-III
Unit III Cluster Analysis
Cluster Analysis: Introduction - Applications of Cluster Analysis - Desired Features of
Clustering - Distance Metrics - Clustering Methods: K-Means Clustering - K-Medoids -
Agglomerative Clustering - Divisive Clustering - Density-Based Clustering - DBSCAN
Algorithm - Evaluation of Clustering.
By
Dr.C.Mohanapriya
MSc(CT)., MPhil., SET., PhD.
Density-Based Methods
• Partitioning and hierarchical methods are designed to find spherical-shaped clusters.
• They have difficulty finding clusters of arbitrary shape, such as the “S”-shaped and
oval clusters in the figure.
Density-Based Methods
• Given such data, they would likely inaccurately identify convex regions,
where noise or outliers are included in the clusters.
• To find clusters of arbitrary shape, alternatively, we can model clusters as
dense regions in the data space, separated by sparse regions.
• This is the main strategy behind density-based clustering methods, which
can discover clusters of nonspherical shape.
• The basic density-based clustering methods are:
• DBSCAN
• OPTICS
• DENCLUE
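The idea of "dense regions separated by sparse regions" can be made concrete by counting how many points fall within a fixed radius of each point. The following is a minimal sketch only; the names `radius` and `min_neighbors`, the sample points, and the threshold values are assumptions chosen for illustration and do not come from these slides.

```python
import math

def neighbor_count(points, i, radius):
    """Count the points (including points[i] itself) within `radius` of points[i]."""
    return sum(1 for q in points if math.dist(points[i], q) <= radius)

# Hypothetical data: four points packed together and one isolated point.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5), (5.0, 5.0)]

radius = 1.0        # assumed neighbourhood radius
min_neighbors = 3   # assumed density threshold

# A point is treated as lying in a dense region if enough points are nearby.
dense = [neighbor_count(points, i, radius) >= min_neighbors
         for i in range(len(points))]
```

Under these assumed values, the four packed points are flagged as dense and the isolated point is not, which is the distinction a density-based method builds its clusters on.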
Why DBSCAN?
• Partitioning methods (K-means, PAM clustering) and
hierarchical clustering work for finding spherical-shaped
clusters or convex clusters.
• In other words, they are suitable only for compact and
well-separated clusters.
• Moreover, they are also severely affected by the
presence of noise and outliers in the data.
• Real-life data may contain irregularities, such as:
1. Clusters can be of arbitrary shape, such as those shown in
the figure.
2. Data may contain noise.
DBSCAN vs K-Means
DBSCAN | K-Means
The number of clusters need not be specified. | Very sensitive to the number of clusters, so it must be specified.
Clusters can be of any arbitrary shape. | Clusters are spherical or convex in shape.
Works well with datasets that have noise and outliers. | Does not work well with outliers; outliers can skew the clusters to a very large extent.
Two parameters are required for training the model. | Only one parameter is required for training the model.
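The sensitivity of K-Means to outliers follows from how it places cluster centres: each centre is the mean of the points assigned to it, so a single distant point can pull it away from the bulk of the cluster. The snippet below is a small illustration under assumed data; the `centroid` helper and the sample coordinates are hypothetical and only mimic the mean-update step of K-Means.

```python
def centroid(points):
    """Mean of a list of 2-D points, as in the K-Means centre update."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

# Hypothetical compact cluster and one far-away outlier.
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
outlier = (20.0, 20.0)

clean_centroid = centroid(cluster)               # centre of the compact cluster
skewed_centroid = centroid(cluster + [outlier])  # centre after adding the outlier
```

With these values the centre moves from (1.5, 1.5) to roughly (5.2, 5.2), outside the original group of points. A density-based method would instead tend to leave such an isolated point unassigned as noise.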
DBSCAN vs K-Means
When Should We Use DBSCAN Over K-Means in Clustering Analysis?
• DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and
K-Means are both clustering algorithms that group together data with
similar characteristics.
• However, they work on different principles and are suitable for different
types of data.
• We prefer DBSCAN when the clusters are not spherical in shape or the
number of clusters is not known beforehand.
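To illustrate both conditions, the sketch below runs a simplified DBSCAN on two concentric rings, a non-spherical layout in which the number of clusters is never passed to the algorithm. It is a minimal teaching sketch, not a reference implementation: the parameter names `eps` (neighbourhood radius) and `min_pts` (minimum points for a dense neighbourhood), their values, and the `ring` helper are assumptions made for this example.

```python
import math

NOISE = -1  # label for points that end up in no cluster

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue                      # already processed
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE             # may later become a border point
            continue
        cluster_id += 1                   # points[i] is a core point: new cluster
        labels[i] = cluster_id
        frontier = list(neighbors)
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id    # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                frontier.extend(j_neighbors)  # j is also core: keep growing
    return labels

def ring(radius, n):
    """n evenly spaced points on a circle of the given radius (hypothetical data)."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

# Inner ring of 12 points followed by an outer ring of 24 points.
points = ring(1.0, 12) + ring(5.0, 24)
labels = dbscan(points, eps=1.5, min_pts=3)
n_clusters = len(set(labels) - {NOISE})
```

With these assumed settings each ring comes out as its own cluster, even though both rings share the same centre. K-Means with two centres would be expected to cut across the rings, because it assigns every point to its nearest centre.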
Question 1
• What is a Hierarchical Clustering Method?
• A. A method that groups data objects into a hierarchy or tree of clusters
• B. A method that divides data objects into equal groups
• C. A method that does not require grouping of data objects
• D. None of the above
Answer 1
• The correct answer is: A method that groups data objects into a hierarchy or tree of
clusters
Question 2
• Which clustering method is useful for data summarization and visualization?
• A. Hierarchical Clustering
• B. Partitioning Clustering
• C. Grid-Based Clustering
• D. Density-Based Clustering
Answer 2
• The correct answer is: Hierarchical Clustering
Question 3
• What is the difference between Agglomerative and Divisive hierarchical clustering?
• A. Agglomerative is a bottom-up approach while Divisive is a top-down approach
• B. Agglomerative is a top-down approach while Divisive is a bottom-up approach
• C. Both are top-down approaches
• D. Both are bottom-up approaches
Answer 3
• The correct answer is: Agglomerative is a bottom-up approach while Divisive is a top-down approach
Question 4
• Which hierarchical clustering method is used in BIRCH?
• A. Multiphase Hierarchical Clustering Using Clustering Feature Trees
• B. Density-Based Clustering
• C. Partitioning Clustering
• D. Grid-Based Clustering
Answer 4
• The correct answer is: Multiphase Hierarchical Clustering Using Clustering Feature
Trees
Question 5
• What does DBSCAN stand for?
• A. Density-Based Spatial Clustering of Applications with Noise
• B. Data-Based Spatial Clustering of Applications with Noise
• C. Density-Based Spatial Clustering of Applications with Nodes
• D. Data-Based Spatial Clustering of Applications with Nodes
Answer 5
• The correct answer is: Density-Based Spatial Clustering of Applications with Noise
Question 6
• In which situation is DBSCAN preferred over K-Means?
• A. When data is not spherical in shape
• B. When data is spherical in shape
• C. When the number of clusters is known beforehand
• D. When the data is linear
Answer 6
• The correct answer is: When data is not spherical in shape
Question 7
• What is the key feature of DBSCAN?
• A. It is density-based clustering that connects regions with high density
• B. It uses a top-down approach
• C. It groups data objects into a hierarchy
• D. It is a partitioning clustering method
Answer 7
• The correct answer is: It is density-based clustering that connects regions with high
density
Question 8
• Which of the following is not a hierarchical clustering method?
• A. K-Means Clustering
• B. Agglomerative Clustering
• C. Divisive Clustering
• D. BIRCH Clustering
Answer 8
• The correct answer is: K-Means Clustering
Question 9
• What is Chameleon?
• A. A Multiphase Hierarchical Clustering Using Dynamic Modelling
• B. A Grid-Based Clustering Method
• C. A Density-Based Clustering Method
• D. A Partitioning Clustering Method
Answer 9
• The correct answer is: A Multiphase Hierarchical Clustering Using Dynamic Modelling
Question 10
• Which method is used for Probabilistic Hierarchical Clustering?
• A. Probabilistic Models
• B. Partitioning Methods
• C. Density-Based Methods
• D. Grid-Based Methods
Answer 10
• The correct answer is: Probabilistic Models
Question 11
• What is the purpose of hierarchical clustering?
• A. To group data into a hierarchy or tree of clusters
• B. To divide data into equal groups
• C. To create non-hierarchical clusters
• D. To summarize data with no structure
Answer 11
• The correct answer is: To group data into a hierarchy or tree of clusters
Question 12
• Which clustering method can further partition groups into subgroups?
• A. Hierarchical Clustering
• B. Partitioning Clustering
• C. Grid-Based Clustering
• D. Density-Based Clustering
Answer 12
• The correct answer is: Hierarchical Clustering
Question 13
• Which hierarchical clustering method uses dynamic modeling?
• A. Chameleon
• B. BIRCH
• C. DBSCAN
• D. K-Means
Answer 13
• The correct answer is: Chameleon
Question 14
• In which scenario is hierarchical clustering preferred?
• A. When a hierarchy of clusters is needed
• B. When only flat clusters are required
• C. When data is linearly separable
• D. When data is non-linear
Answer 14
• The correct answer is: When a hierarchy of clusters is needed
Question 15
• Which method helps in finding average values in hierarchical clustering?
• A. Summarization by levels
• B. Random sampling
• C. Partitioning method
• D. Grid-Based method
Answer 15
• The correct answer is: Summarization by levels
Question 16
• What kind of approach is used in agglomerative clustering?
• A. Bottom-up approach
• B. Top-down approach
• C. Random approach
• D. Linear approach
Answer 16
• The correct answer is: Bottom-up approach
Question 17
• How does divisive hierarchical clustering start?
• A. By considering all objects in one cluster
• B. By placing each object in a separate cluster
• C. By randomly assigning objects to clusters
• D. By dividing objects based on density
Answer 17
• The correct answer is: By considering all objects in one cluster
Question 18
• Which algorithm is known for using density-based clustering?
• A. DBSCAN
• B. K-Means
• C. BIRCH
• D. Chameleon
Answer 18
• The correct answer is: DBSCAN
Question 19
• Which method is generally not the first choice when the clusters are spherical and well separated?
• A. DBSCAN
• B. Hierarchical Clustering
• C. Grid-Based Clustering
• D. Density-Based Clustering
Answer 19
• The correct answer is: DBSCAN
Question 20
• What does the BIRCH algorithm primarily use?
• A. Clustering Feature Trees
• B. Dynamic Modeling
• C. Density-Based Methods
• D. Grid-Based Methods
Answer 20
• The correct answer is: Clustering Feature Trees