Grid-Based Clustering
The grid-based clustering methods use a multi-resolution grid data structure. It quantizes the
object areas into a finite number of cells that form a grid structure on which all of the
operations for clustering are implemented.
The benefit of the method is its quick processing time, which is generally independent of the
number of data objects, still dependent on only the multiple cells in each dimension in the
quantized space.
Several interesting methods
STING (a STatistical INformation Grid approach) by Wang, Yang, and Muntz (1997)
WaveCluster by Sheikholeslami, Chatterjee, and Zhang (VLDB’98) - A multi-
resolution clustering approach using wavelet method
CLIQUE - Agrawal, et al. (SIGMOD’98)
STING - A Statistical Information Grid Approach
A STING is a grid-based clustering technique. It uses a multidimensional grid data
structure that quantifies space into a finite number of cells. Instead of focusing on data
points, it focuses on the value space surrounding the data points.
In STING, the spatial area is divided into rectangular cells and several levels of cells
at different resolution levels. High-level cells are divided into several low-level cells.
In STING Statistical Information about attributes in each cell, such as mean,
maximum, and minimum values, are precomputed and stored as statistical parameters.
These statistical parameters are useful for query processing and other data analysis
tasks.
For each cell, the high level is partitioned into several smaller cells in the next lower
level.
The statistical info of each cell is calculated and stored beforehand and is used to
answer queries.
The parameters of higher-level cells can be easily calculated from parameters of
lower-level cell
These parameters contain the following: the attribute-independent parameter, count,
and the attribute-dependent parameters, mean, stdev (standard deviation), min
(minimum), max (maximum).
Type of distribution—normal, uniform, etc.
When the records are loaded into the database, the parameters count, mean, stdev,
min, and a max of the bottom-level cells are computed directly from the records.
CLIQUE(Clustering in QUEST) Algorithm in Data Mining
CLIQUE is a density-based and grid-based subspace clustering algorithm.
Grid-Based Clustering Technique: In Grid-Based Methods, the space of instance is
divided into a grid structure. Clustering techniques are then applied using the Cells of
the grid, instead of individual data points, as the base units.
Density-Based Clustering Technique: In Density-Based Methods, A cluster is a
maximal set of connected dense units in a subspace.
CLIQUE Algorithm:
CLIQUE Algorithm uses density and grid-based technique i.e subspace clustering
algorithm and finds out the cluster by taking density threshold and a number of grids
as input parameters.
It is specially designed to handle datasets with a large number of dimensions.
CLIQUE Algorithm is very scalable with respect to the value of the records, and a
number of dimensions in the dataset because it is grid-based and uses the Apriori
Property effectively.
Working of CLIQUE Algorithm:
The CLIQUE algorithm first divides the data space into grids. It is done by dividing
each dimension into equal intervals called units. After that, it identifies dense units. A
unit is dense if the data points in this are exceeding the threshold value.
Once the algorithm finds dense cells along one dimension, the algorithm tries to find
dense cells along two dimensions, and it works until all dense cells along the entire
dimension are found.
After finding all dense cells in all dimensions, the algorithm proceeds to find the
largest set (“cluster”) of connected dense cells. Finally, the CLIQUE algorithm
generates a minimal description of the cluster. Clusters are then generated from all
dense subspaces using the apriori approach.