Overview of CLIQUE Algorithm
CLIQUE is a bottom-up, grid-based algorithm for clustering high-dimensional data.
It searches for clusters in subspaces of the original dimensions and uses the
Apriori principle to prune candidate regions efficiently.
The algorithm aims to identify meaningful clusters within subspaces rather than
in the full-dimensional space, where points tend to be too sparse for clusters
to stand out.
Subspace Division and Density Identification
CLIQUE partitions each dimension into intervals, creating a grid structure that
allows for the exploration of single-dimensional subspaces and their combinations.
Within these subspaces it separates dense regions, where points are crowded, from
sparse regions, whose points are treated as noise or outliers.
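The grid construction and the first density pass can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the names grid_cell and dense_1d_units, the parameter xi (number of intervals per dimension), and the count threshold tau are all assumptions made for the example, as is the choice of equal-width intervals spanning each dimension's observed range.

```python
from collections import Counter

def grid_cell(point, lows, highs, xi):
    """Map a point to its tuple of interval indices, one index per dimension."""
    cell = []
    for x, lo, hi in zip(point, lows, highs):
        width = (hi - lo) / xi
        idx = int((x - lo) / width) if width > 0 else 0
        cell.append(min(idx, xi - 1))  # clamp the maximum value into the last interval
    return tuple(cell)

def dense_1d_units(points, xi, tau):
    """Return {(dimension, interval): count} for 1-D units holding more than tau points."""
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    counts = Counter()
    for p in points:
        for d, i in enumerate(grid_cell(p, lows, highs, xi)):
            counts[(d, i)] += 1
    return {unit: n for unit, n in counts.items() if n > tau}

points = [(0.0, 0.0), (0.1, 5.0), (0.2, 10.0), (10.0, 2.0)]
dense = dense_1d_units(points, xi=2, tau=2)
```

With these four points, two intervals per dimension, and a threshold of two, only the first interval of dimension 0 holds more than two points, so it is the only dense one-dimensional unit.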
Steps in the CLIQUE Algorithm
Partitioning: Each dimension is divided into equal-length intervals, which splits
the data space into non-overlapping rectangular units, and the data points falling
into each unit are counted.
Density Evaluation: A unit is considered dense if the number of data points it
contains exceeds a user-defined threshold.
Cluster Formation: Adjacent dense units in the same subspace are connected to form
clusters, with the algorithm working level by level from lower-dimensional to
higher-dimensional subspaces.
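The cluster formation step can be illustrated as a connected-components search over the dense units of one subspace. The sketch below assumes each unit is a tuple of interval indices and that two units are adjacent when they differ by one in exactly one coordinate; the function name connect_dense_units is hypothetical.

```python
from collections import deque

def connect_dense_units(dense_units):
    """Group the dense units of one subspace into clusters of face-adjacent units."""
    remaining = set(dense_units)
    clusters = []
    while remaining:
        start = remaining.pop()
        cluster, queue = {start}, deque([start])
        while queue:
            unit = queue.popleft()
            # Visit the two neighbors along every dimension of the subspace.
            for d in range(len(unit)):
                for step in (-1, 1):
                    neighbor = unit[:d] + (unit[d] + step,) + unit[d + 1:]
                    if neighbor in remaining:
                        remaining.remove(neighbor)
                        cluster.add(neighbor)
                        queue.append(neighbor)
        clusters.append(cluster)
    return clusters

clusters = connect_dense_units({(0, 0), (0, 1), (1, 1), (3, 3)})
```

Here the first three units form one cluster because each shares a face with another, while (3, 3) is isolated and becomes a cluster of its own. Units that touch only at a corner, such as (0, 0) and (1, 1), are not joined directly under this adjacency rule.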
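Apriori Principle Application
The Apriori principle states that if a k-dimensional unit is dense, all its projections
into lower dimensions must also be dense. Equivalently, if any (k-1)-dimensional
projection of a unit is not dense, the unit itself cannot be dense. CLIQUE therefore
generates k-dimensional candidate units only from dense (k-1)-dimensional units and
checks the density of those candidates alone.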
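The pruning rule can be sketched as a candidate-generation step in the style of Apriori itemset mining. This is an illustrative sketch under stated assumptions: each unit is represented as a frozenset of (dimension, interval) pairs, and the function name candidate_units is hypothetical.

```python
from itertools import combinations

def candidate_units(dense_prev):
    """Build k-dimensional candidates from dense (k-1)-dimensional units.

    Each unit is a frozenset of (dimension, interval) pairs.
    """
    dense_prev = set(dense_prev)
    candidates = set()
    for a, b in combinations(dense_prev, 2):
        union = a | b
        if len(union) != len(a) + 1:
            continue  # the two units must differ in exactly one pair
        dims = [d for d, _ in union]
        if len(set(dims)) != len(dims):
            continue  # a unit cannot use the same dimension twice
        # Prune: every (k-1)-dimensional projection must itself be dense.
        if all(union - {pair} in dense_prev for pair in union):
            candidates.add(union)
    return candidates

dense_1d = {frozenset({(0, 1)}), frozenset({(1, 2)}), frozenset({(2, 0)})}
cands_2d = candidate_units(dense_1d)
```

The returned units are only candidates: a pass over the data is still needed to count the points in each one and keep those that exceed the density threshold.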
Strengths and Weaknesses of CLIQUE
Strengths:
Automatically identifies subspaces where data clusters exist.
Insensitive to the order of input records and does not require prior knowledge of
data distribution.
Scales linearly with the number of input records and can find overlapping clusters.
Weaknesses:
Requires a single global density threshold and grid size for all subspaces, which
limits its flexibility when cluster densities vary.
Clustering results can be less intuitive, especially with large datasets, making it
difficult to visualize the overall data grouping.
Conclusion
CLIQUE is a powerful tool for clustering in high-dimensional spaces, but users
must be aware of its limitations and the need for careful parameter selection to
ensure meaningful results.