UNIT-II: Clustering in Machine Learning
Clustering in Machine Learning:
-------------------------------
1. Types of Clustering Methods:
- Partitioning Clustering: Divides the data into distinct, non-overlapping clusters (e.g., k-means).
- Distribution Model-Based Clustering: Assumes the data is generated by a mixture of underlying
probability distributions.
- Hierarchical Clustering: Builds a hierarchy of clusters either agglomeratively (bottom-up) or
divisively (top-down).
- Fuzzy Clustering: Allows a data point to belong to multiple clusters with varying degrees of
membership.
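The difference between a hard (partitioning) assignment and a fuzzy one can be sketched in a few
lines. This is only an illustration: the function names are hypothetical, the cluster centers are
assumed to be already known, and the membership formula is the standard fuzzy c-means one with
fuzzifier m (m = 2 is a common default, assumed here).

```python
import math

def hard_assignment(point, centers):
    """Partitioning view: return the index of the single nearest center."""
    distances = [math.dist(point, c) for c in centers]
    return distances.index(min(distances))

def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy view: return one membership degree per center (they sum to 1).

    Uses the fuzzy c-means formula u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)),
    where m > 1 is the fuzzifier (assumed value: 2.0).
    """
    distances = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    if 0.0 in distances:
        zero_count = distances.count(0.0)
        return [1.0 / zero_count if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]

# Illustrative data: two centers and one point that is closer to the first.
centers = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)
label = hard_assignment(point, centers)          # a single cluster index
memberships = fuzzy_memberships(point, centers)  # a degree for every cluster
```

The hard assignment keeps only the nearest center, while the fuzzy memberships also record how
strongly the point is associated with the other cluster.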
2. BIRCH Algorithm:
- A clustering algorithm that constructs a CF (Clustering Feature) tree for efficient clustering of
large datasets.
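- It inserts points into the tree incrementally; a threshold limits the size (radius or diameter)
of each leaf subcluster, and raising the threshold yields a smaller, coarser tree, which trades
clustering quality for memory and speed.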
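A Clustering Feature is usually described as the triple (N, LS, SS): the number of points, their
linear sum, and their squared sum. The sketch below shows why this summary is enough for the
threshold test. It is a minimal illustration, not a full CF tree: the class and function names are
hypothetical, and the threshold is assumed to apply to the subcluster radius.

```python
import math
from dataclasses import dataclass

@dataclass
class ClusteringFeature:
    n: int              # N: number of points summarized
    linear_sum: list    # LS: per-dimension sum of the points
    squared_sum: float  # SS: sum of squared norms of the points

    @classmethod
    def from_point(cls, point):
        return cls(1, list(point), sum(x * x for x in point))

    def merge(self, other):
        """CFs are additive: merging two subclusters just adds their fields."""
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.linear_sum, other.linear_sum)],
            self.squared_sum + other.squared_sum,
        )

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def radius(self):
        """Root-mean-square distance of the points from the centroid."""
        c = self.centroid()
        variance = self.squared_sum / self.n - sum(x * x for x in c)
        return math.sqrt(max(variance, 0.0))  # guard against rounding below 0

def absorbs(cf, point, threshold):
    """True if adding the point keeps the subcluster radius within threshold."""
    return cf.merge(ClusteringFeature.from_point(point)).radius() <= threshold

# Illustrative data: a subcluster holding the points (0, 0) and (2, 0).
cf = ClusteringFeature.from_point((0.0, 0.0)).merge(
    ClusteringFeature.from_point((2.0, 0.0))
)
near = absorbs(cf, (1.0, 0.0), threshold=1.0)   # fits into the subcluster
far = absorbs(cf, (10.0, 0.0), threshold=1.0)   # would need a new subcluster
```

Because the summary is additive, the raw points never need to be revisited, which is what makes a
single pass over a large dataset feasible.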
3. CURE Algorithm:
- A hierarchical clustering algorithm designed to handle large datasets.
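- CURE represents each cluster by several well-scattered points that are shrunk toward the cluster
centroid, a middle ground between centroid-based and all-points approaches that captures
non-spherical clusters and reduces the influence of outliers.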
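The representative-point idea can be sketched as follows. This is a simplified illustration under
stated assumptions: the function names are hypothetical, scattered points are picked with a
farthest-point heuristic, and the shrink factor alpha (0.5 here) and the number of representatives
are arbitrary example values. Sampling and partitioning, which CURE uses for large datasets, are
omitted.

```python
import math

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def scattered_points(points, count):
    """Pick up to `count` well-scattered points with a farthest-point heuristic."""
    center = centroid(points)
    # Start with the point farthest from the centroid.
    chosen = [max(points, key=lambda p: math.dist(p, center))]
    while len(chosen) < count:
        remaining = [p for p in points if p not in chosen]
        if not remaining:
            break
        # Next, take the point whose nearest already-chosen point is farthest.
        chosen.append(
            max(remaining, key=lambda p: min(math.dist(p, c) for c in chosen))
        )
    return chosen

def shrink(representatives, center, alpha):
    """Move each representative a fraction alpha of the way toward the centroid."""
    return [
        tuple(r_i + alpha * (c_i - r_i) for r_i, c_i in zip(r, center))
        for r in representatives
    ]

def cluster_distance(reps_a, reps_b):
    """Distance between clusters: the closest pair of representatives."""
    return min(math.dist(a, b) for a in reps_a for b in reps_b)

# Illustrative data: four corners of a square plus its middle point.
cluster = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0), (2.0, 2.0)]
reps = scattered_points(cluster, count=2)
shrunk = shrink(reps, centroid(cluster), alpha=0.5)
```

In a full agglomerative run, the two clusters with the smallest cluster_distance would be merged
and their representatives recomputed.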
4. Gaussian Mixture Models (GMM) and Expectation Maximization (EM):
- GMM is a probabilistic model that assumes all data points are generated from a mixture of
several Gaussian distributions.
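- The EM algorithm estimates the GMM parameters (mixing weights, means, and covariances) by
alternating two steps: the E-step computes each component's responsibility for each data point,
and the M-step re-estimates the parameters from those responsibilities. Each iteration never
decreases the likelihood of the observed data.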
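The two EM steps can be sketched for a one-dimensional mixture. This is a minimal illustration
rather than a production implementation: the function names, the toy data, the initial parameters,
and the small variance floor (1e-6) are all assumptions made for the example.

```python
import math

def gaussian_pdf(x, mean, variance):
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(
        2 * math.pi * variance
    )

def log_likelihood(data, weights, means, variances):
    return sum(
        math.log(
            sum(w * gaussian_pdf(x, m, v)
                for w, m, v in zip(weights, means, variances))
        )
        for x in data
    )

def em_step(data, weights, means, variances):
    k = len(weights)
    # E-step: responsibility of each component for each data point.
    responsibilities = []
    for x in data:
        joint = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        responsibilities.append([j / total for j in joint])
    # M-step: re-estimate the parameters from the responsibilities.
    n_k = [sum(r[j] for r in responsibilities) for j in range(k)]
    new_weights = [n / len(data) for n in n_k]
    new_means = [
        sum(r[j] * x for r, x in zip(responsibilities, data)) / n_k[j]
        for j in range(k)
    ]
    new_variances = [
        # Assumed floor of 1e-6 keeps a component from collapsing to a point.
        max(
            sum(r[j] * (x - new_means[j]) ** 2
                for r, x in zip(responsibilities, data)) / n_k[j],
            1e-6,
        )
        for j in range(k)
    ]
    return new_weights, new_means, new_variances

# Illustrative data: two groups of values, around 1 and around 5.
data = [0.8, 1.0, 1.2, 0.9, 1.1, 4.9, 5.0, 5.2, 5.1, 4.8]
params = ([0.5, 0.5], [0.0, 6.0], [1.0, 1.0])  # weights, means, variances
history = [log_likelihood(data, *params)]
for _ in range(20):
    params = em_step(data, *params)
    history.append(log_likelihood(data, *params))
weights, means, variances = params
```

The history list records the log-likelihood after every iteration, so it can be inspected to check
convergence.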
5. Parameter Estimation:
- Maximum Likelihood Estimation (MLE): A method for estimating the parameters of a statistical
model by maximizing the likelihood function.
- Maximum A Posteriori (MAP): A method similar to MLE that maximizes the posterior, i.e., the
likelihood multiplied by a prior distribution over the parameters, so that prior knowledge
regularizes the estimate.
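A coin-flip example makes the difference concrete. The sketch assumes a Bernoulli likelihood with
a Beta(alpha, beta) prior on the success probability; the function names and the numbers are
illustrative only.

```python
def bernoulli_mle(successes, trials):
    """MLE of the success probability: the observed success rate."""
    return successes / trials

def bernoulli_map(successes, trials, alpha, beta):
    """MAP estimate under a Beta(alpha, beta) prior.

    This is the mode of the Beta(alpha + successes, beta + failures)
    posterior; the formula assumes the posterior mode lies inside (0, 1)
    or that the prior is uniform (alpha = beta = 1).
    """
    return (successes + alpha - 1) / (trials + alpha + beta - 2)

# Illustrative data: three flips, all heads.
mle = bernoulli_mle(3, 3)                            # 1.0
map_estimate = bernoulli_map(3, 3, alpha=2, beta=2)  # 0.8
map_uniform = bernoulli_map(3, 3, alpha=1, beta=1)   # 1.0, same as the MLE
```

With only three observations the MLE claims the coin always lands heads, while the prior pulls the
MAP estimate toward 0.5. With a uniform prior the two estimates coincide.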
6. Applications of Clustering:
- Image segmentation, market segmentation, anomaly detection, social network analysis, and
document categorization are some common applications of clustering.