Unsupervised Machine Learning

The document provides an overview of various unsupervised machine learning models categorized into clustering algorithms, dimensionality reduction algorithms, association rule learning, anomaly detection algorithms, generative models, graph-based models, and neural network-based approaches. Each category includes specific algorithms along with their strengths and weaknesses. The information serves as a comprehensive guide for understanding the capabilities and limitations of different unsupervised learning techniques.

Unsupervised machine learning models are algorithms designed to find patterns or structure in data without predefined labels. Here's a categorized list of commonly used unsupervised machine learning models:
1. Clustering Algorithms
These algorithms group data points into clusters based on their similarities.
• K-Means Clustering
• Hierarchical Clustering (e.g., Agglomerative and Divisive)
• DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
• OPTICS (Ordering Points to Identify Clustering Structure)
• Mean Shift
• Gaussian Mixture Models (GMM)
• Spectral Clustering
• Affinity Propagation
• Self-Organizing Maps (SOMs)
2. Dimensionality Reduction Algorithms
These models reduce the number of features in the dataset while preserving its structure.
• Principal Component Analysis (PCA)
• Kernel PCA
• t-SNE (t-Distributed Stochastic Neighbor Embedding)
• UMAP (Uniform Manifold Approximation and Projection)
• Factor Analysis
• Independent Component Analysis (ICA)
• Non-Negative Matrix Factorization (NMF)
• Latent Dirichlet Allocation (LDA) (for topic modeling)
3. Association Rule Learning
Used to discover relationships between variables in large datasets.
• Apriori Algorithm
• Eclat
• FP-Growth (Frequent Pattern Growth)
4. Anomaly Detection Algorithms
These are used to identify data points that deviate significantly from the majority.
• Isolation Forest
• One-Class SVM
• Autoencoders (unsupervised variants)
• Local Outlier Factor (LOF)
• Elliptic Envelope
5. Matrix Factorization
Used in recommendation systems and collaborative filtering.
• Singular Value Decomposition (SVD)
• Non-Negative Matrix Factorization (NMF)
• Alternating Least Squares (ALS)
6. Generative Models
Used to generate data similar to the input dataset.
• Generative Adversarial Networks (GANs)
• Variational Autoencoders (VAEs)
• Boltzmann Machines (e.g., Restricted Boltzmann Machines)
7. Graph-Based Models
Used to analyze data represented in graph structures.
• Graph Clustering (e.g., Louvain algorithm for community detection)
• DeepWalk
• Node2Vec
• Spectral Graph Algorithms
8. Neural Network-Based Approaches
Unsupervised learning techniques using neural networks.
• Autoencoders
  - Variants: Denoising Autoencoders, Sparse Autoencoders, Contractive Autoencoders
• Self-Organizing Maps (SOMs)
• Contrastive Predictive Coding
• Deep Belief Networks (DBNs)
9. Density Estimation
Used to estimate the probability density function of data.
• Kernel Density Estimation (KDE)
• Gaussian Mixture Models (GMMs)

STRENGTHS AND WEAKNESSES OF THE MODELS

1. Clustering Algorithms
K-Means Clustering
• Strengths:
  - Simple and efficient for large datasets.
  - Easy to interpret.
  - Works well when clusters are spherical and equally sized.
• Weaknesses:
  - Sensitive to the initial centroids.
  - Struggles with non-spherical clusters and varying densities.
  - Requires specifying the number of clusters (k) beforehand.

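A minimal scikit-learn sketch (toy data and parameter values are illustrative): k must be fixed in advance, and rerunning from several random initializations (n_init) mitigates the sensitivity to the initial centroids.

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    # Toy data: three roughly spherical, equally sized clusters
    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

    # k must be chosen beforehand; n_init=10 reruns with different
    # random centroids and keeps the best result
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
    labels = kmeans.fit_predict(X)
    print(kmeans.cluster_centers_)
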
Hierarchical Clustering
• Strengths:
  - No need to pre-specify the number of clusters.
  - Produces a dendrogram for better data visualization.
• Weaknesses:
  - Computationally expensive for large datasets (not scalable).
  - Sensitive to noise and outliers.

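A short SciPy sketch (linkage method and cut level are illustrative): the linkage matrix encodes the full dendrogram, and the number of clusters is chosen only when the tree is cut.

    from scipy.cluster.hierarchy import linkage, fcluster
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=50, centers=3, random_state=0)

    # Ward-linkage agglomerative clustering; Z encodes the full merge
    # tree (scipy.cluster.hierarchy.dendrogram(Z) would draw it)
    Z = linkage(X, method="ward")

    # The cluster count is chosen only when the tree is cut
    labels = fcluster(Z, t=3, criterion="maxclust")
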
DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
• Strengths:
  - Handles clusters of arbitrary shapes.
  - Can detect outliers as noise points.
  - No need to specify the number of clusters.
• Weaknesses:
  - Struggles with varying density clusters.
  - Sensitive to parameters (eps and min_samples).

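A scikit-learn sketch on two interleaved half-moons (eps and min_samples values are illustrative): points that fall in no dense region come back labeled -1, i.e., noise.

    from sklearn.cluster import DBSCAN
    from sklearn.datasets import make_moons

    # Two non-spherical, interleaved clusters
    X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

    # eps and min_samples set the density threshold; these are
    # the parameters the algorithm is sensitive to
    labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)
    print("noise points:", (labels == -1).sum())
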
OPTICS
• Strengths:
  - Extension of DBSCAN for varying density clusters.
  - Better cluster hierarchy detection.
• Weaknesses:
  - Computationally more expensive than DBSCAN.
  - Complex to fine-tune parameters.

Mean Shift
• Strengths:
  - No need to specify the number of clusters.
  - Detects clusters of arbitrary shapes.
• Weaknesses:
  - Computationally intensive for large datasets.
  - Bandwidth parameter selection is challenging.

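A scikit-learn sketch: estimate_bandwidth gives a data-driven starting point for the bandwidth parameter mentioned above (the quantile value is illustrative).

    from sklearn.cluster import MeanShift, estimate_bandwidth
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

    # Bandwidth is the critical knob; estimate it from the data
    bandwidth = estimate_bandwidth(X, quantile=0.2)
    labels = MeanShift(bandwidth=bandwidth).fit_predict(X)
    print("clusters found:", len(set(labels)))
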
Gaussian Mixture Models (GMM)
• Strengths:
  - Handles overlapping clusters well.
  - Provides probabilistic cluster assignments.
• Weaknesses:
  - Assumes data follows a Gaussian distribution.
  - Requires specifying the number of components.

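A scikit-learn sketch (component counts are illustrative): predict_proba exposes the probabilistic assignments, and an information criterion such as BIC is one common way to choose the number of components rather than fixing it blindly.

    from sklearn.datasets import make_blobs
    from sklearn.mixture import GaussianMixture

    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

    gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
    probs = gmm.predict_proba(X)   # soft, probabilistic assignments

    # BIC can guide the choice of n_components (lower is better)
    for k in (2, 3, 4):
        print(k, GaussianMixture(n_components=k, random_state=0).fit(X).bic(X))
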
Spectral Clustering
• Strengths:
  - Effective for non-convex clusters.
  - Works well with similarity graphs.
• Weaknesses:
  - Not scalable to large datasets.
  - Requires specifying the number of clusters.

Affinity Propagation
• Strengths:
  - No need to predefine the number of clusters.
  - Works well with sparse data.
• Weaknesses:
  - Computationally expensive.
  - Tends to converge to suboptimal solutions for large datasets.

Self-Organizing Maps (SOMs)
• Strengths:
  - Useful for visualizing high-dimensional data.
  - Can learn complex relationships in data.
• Weaknesses:
  - Convergence can be slow.
  - Results depend on initialization and hyperparameters.

2. Dimensionality Reduction Algorithms
Principal Component Analysis (PCA)
• Strengths:
  - Computationally efficient.
  - Works well for linearly correlated features.
• Weaknesses:
  - Assumes linear relationships.
  - Components are linear combinations of the original features, which makes them harder to interpret.

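A scikit-learn sketch (the 0.95 target is illustrative): passing a float to n_components keeps just enough components to explain that fraction of the variance.

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X = load_iris().data

    # Keep as many components as needed for 95% of the variance
    pca = PCA(n_components=0.95)
    X_reduced = pca.fit_transform(X)
    print(pca.explained_variance_ratio_)
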
Kernel PCA
• Strengths:
  - Extends PCA to capture non-linear relationships.
  - Uses the kernel trick to avoid explicit high-dimensional mappings.
• Weaknesses:
  - Computationally expensive.
  - Choice of kernel and its parameters affects performance.

t-SNE
• Strengths:
  - Excellent for visualizing high-dimensional data.
  - Preserves local structure.
• Weaknesses:
  - Computationally expensive.
  - Does not preserve global structure.
  - Results vary with the perplexity parameter.

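A scikit-learn sketch (the perplexity value is illustrative): the 2-D embedding is intended for visualization, and the layout changes with perplexity.

    from sklearn.datasets import load_digits
    from sklearn.manifold import TSNE

    X = load_digits().data

    # 2-D embedding for plotting; results change with perplexity
    emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
    print(emb.shape)
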
UMAP
• Strengths:
  - Faster than t-SNE and better at preserving global structure.
  - Works well with large datasets.
• Weaknesses:
  - Sensitive to hyperparameters.
  - May not capture fine details in local relationships.

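A sketch assuming the third-party umap-learn package (the package choice and parameter values are assumptions, not part of the original document); n_neighbors and min_dist are the hyperparameters it is sensitive to.

    import umap  # pip install umap-learn (assumed available)
    from sklearn.datasets import load_digits

    X = load_digits().data

    # n_neighbors trades local vs. global structure; min_dist controls
    # how tightly points are packed in the embedding
    reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=0)
    emb = reducer.fit_transform(X)
    print(emb.shape)
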
Factor Analysis
• Strengths:
  - Reduces data redundancy.
  - Handles linear dependencies.
• Weaknesses:
  - Assumes Gaussian distribution of data.
  - Limited to linear relationships.

Independent Component Analysis (ICA)
• Strengths:
  - Finds statistically independent components in data.
  - Useful in blind signal separation.
• Weaknesses:
  - Sensitive to noise and outliers.
  - Assumes the underlying sources are statistically independent.

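A classic blind-source-separation sketch with scikit-learn's FastICA (the signals and mixing matrix are illustrative): two known sources are mixed linearly, and ICA recovers them up to permutation and scaling.

    import numpy as np
    from sklearn.decomposition import FastICA

    # Two independent sources: a sine wave and a square wave
    t = np.linspace(0, 8, 2000)
    S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]

    # Mix them linearly, then try to unmix
    A = np.array([[1.0, 0.5], [0.5, 1.0]])
    X = S @ A.T

    S_est = FastICA(n_components=2, random_state=0).fit_transform(X)
    print(S_est.shape)  # recovered sources, up to permutation and scaling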

Non-Negative Matrix Factorization (NMF)
• Strengths:
  - Produces interpretable, non-negative features.
  - Good for text and image data.
• Weaknesses:
  - Sensitive to initialization.
  - Struggles with non-linear relationships.

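A scikit-learn sketch on a tiny, made-up corpus: the non-negative factors read as document-topic and topic-term weights, and a deterministic initialization (nndsvd) eases the sensitivity to initialization noted above.

    from sklearn.decomposition import NMF
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = ["cats purr and nap", "dogs bark and fetch",
            "cats and dogs are pets", "stocks rose as markets rallied"]

    tfidf = TfidfVectorizer().fit_transform(docs)

    # init="nndsvd" gives a deterministic starting point
    nmf = NMF(n_components=2, init="nndsvd", random_state=0)
    W = nmf.fit_transform(tfidf)   # document-topic weights
    H = nmf.components_            # topic-term weights
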
3. Association Rule Learning
Apriori Algorithm
• Strengths:
  - Easy to implement.
  - Effective for small datasets.
• Weaknesses:
  - Computationally expensive for large datasets.
  - Requires careful selection of support and confidence thresholds.

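A self-contained toy sketch of the core idea, limited to item pairs for brevity (the transactions and min_support threshold are invented); full implementations iterate to larger itemsets and prune candidates.

    from itertools import combinations

    transactions = [{"milk", "bread"}, {"milk", "eggs"},
                    {"milk", "bread", "eggs"}, {"bread", "eggs"}]
    min_support = 0.5
    n = len(transactions)

    # Support of an itemset = fraction of transactions containing it
    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = set().union(*transactions)
    frequent_pairs = [set(p) for p in combinations(items, 2)
                      if support(set(p)) >= min_support]

    for pair in frequent_pairs:
        a, b = sorted(pair)
        # Confidence of the rule {a} -> {b}
        print(f"{a} -> {b}: conf = {support({a, b}) / support({a}):.2f}")
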
Eclat
• Strengths:
  - More efficient than Apriori for large datasets.
  - Uses a vertical data format.
• Weaknesses:
  - Limited scalability for high-dimensional data.
  - Parameter tuning can be complex.

FP-Growth
• Strengths:
  - Efficient and scalable.
  - Avoids candidate generation.
• Weaknesses:
  - Memory-intensive for very large datasets.
  - More complex to implement than Apriori.
4. Anomaly Detection Algorithms
Isolation Forest
• Strengths:
  - Efficient for high-dimensional data.
  - Handles outliers well.
• Weaknesses:
  - Assumes anomalies are rare and distinctly different from normal points.

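A scikit-learn sketch (data and the contamination value are illustrative): fit_predict returns -1 for points the trees isolate quickly, i.e., likely anomalies.

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (200, 2)),    # normal points
                   rng.uniform(-6, 6, (10, 2))])  # scattered outliers

    iso = IsolationForest(contamination=0.05, random_state=0)
    labels = iso.fit_predict(X)   # -1 = anomaly, 1 = normal
    print("anomalies flagged:", (labels == -1).sum())
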
One-Class SVM
• Strengths:
  - Effective in high-dimensional spaces.
  - Works well for complex boundaries.
• Weaknesses:
  - Sensitive to kernel selection.
  - Computationally expensive.

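A scikit-learn sketch (kernel and nu values are illustrative): the model is fit on "normal" data only, and nu upper-bounds the fraction of training points treated as outliers.

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(0)
    X_train = rng.normal(0, 1, (200, 2))          # "normal" data only
    X_test = np.array([[0.1, -0.2], [5.0, 5.0]])  # one inlier, one outlier

    ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)
    print(ocsvm.predict(X_test))  # 1 = inlier, -1 = outlier
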
Autoencoders
• Strengths:
  - Capable of learning complex representations.
  - Useful for high-dimensional data.
• Weaknesses:
  - Requires large datasets for training.
  - Sensitive to architecture and hyperparameters.

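A crude stand-in sketch using scikit-learn's MLPRegressor trained to reproduce its own input through a narrow bottleneck; real autoencoders are usually built in a deep-learning framework, but the idea of scoring by reconstruction error is the same.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(0, 1, (500, 8))

    # Train the network to reconstruct X from X through a 2-unit bottleneck
    ae = MLPRegressor(hidden_layer_sizes=(8, 2, 8), max_iter=2000,
                      random_state=0).fit(X, X)

    # High reconstruction error suggests an anomalous point
    errors = ((X - ae.predict(X)) ** 2).mean(axis=1)
    print(errors[:5])
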
Local Outlier Factor (LOF)
• Strengths:
  - Detects local anomalies effectively.
  - Works well with varying density.
• Weaknesses:
  - Computationally expensive.
  - Sensitive to the number of neighbors.

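A scikit-learn sketch (the n_neighbors value is illustrative, and it is the setting LOF is sensitive to): the score compares each point's local density with that of its neighbors.

    import numpy as np
    from sklearn.neighbors import LocalOutlierFactor

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (200, 2)),
                   np.array([[4.0, 4.0]])])      # one local outlier

    lof = LocalOutlierFactor(n_neighbors=20)
    labels = lof.fit_predict(X)                  # -1 = outlier, 1 = inlier
    print(labels[-1], lof.negative_outlier_factor_[-1])
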
5. Generative Models
Generative Adversarial Networks (GANs)
• Strengths:
  - Generates realistic synthetic data.
  - Handles complex data distributions.
• Weaknesses:
  - Training is unstable and sensitive to hyperparameters.
  - Prone to mode collapse.

Variational Autoencoders (VAEs)
• Strengths:
  - Produces an interpretable latent space.
  - Effective for generative tasks.
• Weaknesses:
  - Reconstructions may lack sharpness.
  - Requires careful balance between reconstruction loss and regularization.

Boltzmann Machines
• Strengths:
  - Capable of learning joint probability distributions.
  - Useful for feature extraction.
• Weaknesses:
  - Computationally expensive to train.
  - Limited scalability.

6. Graph-Based Models
Spectral Graph Algorithms
• Strengths:
  - Effective for graph-structured data.
  - Captures community structure through the eigenvectors of the graph Laplacian.
• Weaknesses:
  - Not scalable to very large graphs.
  - Requires careful selection of eigenvectors.

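A self-contained NumPy sketch of spectral bisection on an invented six-node graph: the sign pattern of the Fiedler vector (the eigenvector for the second-smallest eigenvalue of the graph Laplacian) splits the nodes into two communities.

    import numpy as np

    # Adjacency matrix: two triangles joined by a single edge
    A = np.array([[0, 1, 1, 0, 0, 0],
                  [1, 0, 1, 0, 0, 0],
                  [1, 1, 0, 1, 0, 0],
                  [0, 0, 1, 0, 1, 1],
                  [0, 0, 0, 1, 0, 1],
                  [0, 0, 0, 1, 1, 0]])

    # Unnormalized graph Laplacian L = D - A
    L = np.diag(A.sum(axis=1)) - A

    # Eigenvectors in ascending eigenvalue order (eigh: L is symmetric)
    _, vecs = np.linalg.eigh(L)

    # Fiedler vector = second column; its sign partitions the nodes
    print(vecs[:, 1] > 0)   # e.g., the two triangles end up separated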