Unit 2: Unsupervised Learning
Below is a detailed elaboration of the topics under Unit 2: Unsupervised Learning,
including definitions, algorithms/procedures, applications, and examples for each
concept.
🟩 1. Clustering: K-means / Kernel K-means
🔹 Definition:
K-means Clustering: A popular iterative algorithm that partitions data into k
distinct clusters based on distances (typically Euclidean).
Kernel K-means: An extension of K-means that uses kernel methods to handle
non-linearly separable data by implicitly mapping it into a higher-dimensional feature space.
🔹 Algorithm:
✅ K-means:
1. Choose number of clusters k .
2. Randomly initialize k centroids.
3. Assign each data point to the nearest centroid.
4. Recalculate centroids as the mean of points in each cluster.
5. Repeat steps 3–4 until convergence (centroids don’t change significantly).
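A minimal NumPy sketch of the K-means steps above (function and variable names are illustrative, and the empty-cluster edge case is ignored for brevity):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal K-means on an (n, d) data array X."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3: assign every point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop once the centroids no longer change significantly
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```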
✅ Kernel K-means:
1. Use a kernel function (e.g., RBF) to implicitly map data into a higher-dimensional
feature space.
2. Perform clustering in this new space using inner products defined by the kernel.
3. Update cluster assignments based on kernel-induced distance.
4. Repeat until convergence.
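A sketch of kernel K-means along the same lines, assuming a precomputed RBF Gram matrix. The distance from a point to a cluster's implicit centroid is computed entirely from kernel values, so the feature-space mapping is never formed explicitly (names and the RBF choice are illustrative assumptions):

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X**2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

def kernel_kmeans(K, k, n_iters=50, seed=0):
    """Kernel K-means on a precomputed (n, n) Gram matrix K."""
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=n)          # random initial assignment
    for _ in range(n_iters):
        dists = np.zeros((n, k))
        for c in range(k):
            members = labels == c
            m = members.sum()
            if m == 0:
                dists[:, c] = np.inf
                continue
            # squared feature-space distance to the centroid of cluster c:
            # K_ii - (2/|c|) * sum_j K_ij + (1/|c|^2) * sum_{j,l} K_jl
            dists[:, c] = (np.diag(K)
                           - 2 * K[:, members].sum(axis=1) / m
                           + K[np.ix_(members, members)].sum() / m**2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```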
🔹 Applications & Examples:
Customer segmentation
Image compression (color quantization)
Document clustering
Anomaly detection
Example: Grouping users by browsing behavior or grouping pixels in an image.
🟩 2. Dimensionality Reduction: PCA and Kernel PCA
🔹 Definition:
PCA (Principal Component Analysis): A linear technique that transforms data
into a lower-dimensional space while preserving maximum variance.
Kernel PCA: A nonlinear extension of PCA that uses the kernel trick to capture complex
patterns.
🔹 Algorithm:
✅ PCA:
1. Standardize the data.
2. Compute covariance matrix.
3. Find eigenvectors and eigenvalues of the covariance matrix.
4. Select top k eigenvectors corresponding to largest eigenvalues.
5. Project original data onto these eigenvectors.
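The PCA steps above can be sketched directly with NumPy (a minimal illustration; names are assumptions, and standardization is reduced to centering here):

```python
import numpy as np

def pca(X, k):
    """Project (n, d) data X onto its top-k principal components."""
    # Step 1: standardize (here simply center; optionally scale by std as well)
    X_centered = X - X.mean(axis=0)
    # Step 2: covariance matrix (d x d)
    cov = np.cov(X_centered, rowvar=False)
    # Step 3: eigendecomposition (eigh, since the covariance matrix is symmetric)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 4: keep the k eigenvectors with the largest eigenvalues
    top = np.argsort(eigvals)[::-1][:k]
    components = eigvecs[:, top]
    # Step 5: project the centered data onto those directions
    return X_centered @ components
```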
✅ Kernel PCA:
1. Apply a kernel function (e.g., Gaussian, polynomial) to compute similarity
between data points.
2. Construct Gram matrix (kernel matrix).
3. Center the kernel matrix.
4. Compute eigenvalues and eigenvectors of the centered kernel matrix.
5. Project data into the reduced space using top eigenvectors.
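A corresponding sketch of kernel PCA with an RBF kernel (an illustrative assumption; the centering formula and the scaling of the projections follow the standard kernel PCA derivation):

```python
import numpy as np

def kernel_pca(X, k, gamma=1.0):
    """Project (n, d) data onto top-k nonlinear components via an RBF kernel."""
    n = X.shape[0]
    # Steps 1-2: build the Gram (kernel) matrix
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    # Step 3: center the kernel matrix in feature space
    one_n = np.ones((n, n)) / n
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # Step 4: eigendecomposition of the centered kernel matrix
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    top = np.argsort(eigvals)[::-1][:k]
    # Step 5: projections are the eigenvectors scaled by sqrt(eigenvalue)
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 1e-12))
```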
🔹 Applications & Examples:
Face recognition (Eigenfaces)
Data visualization (as a simpler, linear alternative to t-SNE)
Noise reduction
Feature extraction
Example: Reducing dimensions of gene expression data for analysis.
🟩 3. Matrix Factorization and Matrix Completion
🔹 Definition:
Matrix Factorization: Decomposes a matrix into multiple matrices whose product
approximates the original; often used for latent factor modeling.
Matrix Completion: Fills in the missing entries of a partially observed matrix, assuming a
low-rank structure.
🔹 Algorithm:
✅ Matrix Factorization (e.g., SVD, Alternating Least Squares):
1. Let R ≈ U·Vᵀ, where:
U : user latent feature matrix
V : item latent feature matrix
2. Minimize reconstruction error using optimization techniques like gradient descent
or ALS.
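A minimal sketch of this factorization using stochastic gradient descent on the observed entries (names, learning rate, and the convention that unobserved entries are stored as zeros are all illustrative assumptions):

```python
import numpy as np

def matrix_factorization(R, k, n_iters=200, lr=0.01, reg=0.1, seed=0):
    """Factor R (n_users x n_items) into U @ V.T, fitting only observed entries."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))   # user latent features
    V = rng.normal(scale=0.1, size=(n_items, k))   # item latent features
    rows, cols = R.nonzero()                       # observed (non-zero) entries
    for _ in range(n_iters):
        for i, j in zip(rows, cols):
            err = R[i, j] - U[i] @ V[j]            # reconstruction error at (i, j)
            u_i = U[i].copy()
            # gradient steps with L2 regularization
            U[i] += lr * (err * V[j] - reg * U[i])
            V[j] += lr * (err * u_i - reg * V[j])
    return U, V
```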
✅ Matrix Completion:
1. Start with a sparse matrix (e.g., ratings matrix).
2. Assume the matrix has a low rank.
3. Optimize to find a low-rank matrix that fits observed entries.
4. Techniques: Nuclear norm minimization, alternating minimization.
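One simple way to realize these steps is iterative soft-thresholded SVD (in the spirit of SoftImpute, a standard nuclear-norm-style heuristic); the sketch below assumes a boolean mask of observed entries and an illustrative threshold tau:

```python
import numpy as np

def soft_impute(R, mask, tau=5.0, n_iters=100):
    """Fill missing entries of R (mask[i, j] = True where observed)
    with a low-rank estimate obtained by soft-thresholding singular values."""
    X = np.where(mask, R, 0.0)                 # start with zeros at missing entries
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        s = np.maximum(s - tau, 0.0)           # shrink singular values (low-rank bias)
        X_low = (U * s) @ Vt                   # low-rank reconstruction
        X = np.where(mask, R, X_low)           # keep observed entries fixed
    return X
```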
🔹 Applications & Examples:
Recommender systems (Netflix Prize)
Topic modeling
Collaborative filtering
Example: Predicting movie ratings a user hasn't seen yet.
🟩 4. Generative Models: Mixture Models and Latent Factor Models
🔹 Definition:
Mixture Models: Probabilistic models that represent subpopulations within the
overall population (e.g., Gaussian Mixture Models).
Latent Factor Models: Models that use hidden variables to explain observed data
(e.g., Probabilistic PCA, Factor Analysis).
🔹 Algorithm:
✅ Mixture Models (GMM + EM Algorithm):
1. Initialize parameters (means, covariances, mixing coefficients).
2. E-step : Estimate probability of each data point belonging to each component.
3. M-step : Update parameters using weighted maximum likelihood.
4. Iterate E-M steps until convergence.
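A compact sketch of EM for a GMM, using SciPy for the Gaussian densities (initialization choices and the small diagonal regularizer are illustrative assumptions):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(X, k, n_iters=100, seed=0):
    """Fit a Gaussian Mixture Model to (n, d) data X with the EM algorithm."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Step 1: initialize means, covariances, and mixing coefficients
    means = X[rng.choice(n, size=k, replace=False)]
    covs = np.array([np.cov(X, rowvar=False) + 1e-6 * np.eye(d) for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iters):
        # E-step: responsibility of each component for each data point
        resp = np.column_stack([
            weights[c] * multivariate_normal.pdf(X, means[c], covs[c])
            for c in range(k)
        ])
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted maximum-likelihood parameter updates
        Nk = resp.sum(axis=0)
        weights = Nk / n
        means = (resp.T @ X) / Nk[:, None]
        for c in range(k):
            diff = X - means[c]
            covs[c] = (resp[:, c, None] * diff).T @ diff / Nk[c] + 1e-6 * np.eye(d)
    return weights, means, covs
```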
✅ Latent Factor Models:
Probabilistic PCA: Assumes observed data is generated from latent variables via
a linear transformation plus Gaussian noise.
Inference is typically done using Expectation-Maximization or variational Bayes.
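For Probabilistic PCA specifically, the maximum-likelihood solution also has a closed form (Tipping & Bishop): the loading matrix comes from the top eigenvectors of the sample covariance and the noise variance is the mean of the discarded eigenvalues. A small sketch under that assumption:

```python
import numpy as np

def probabilistic_pca(X, k):
    """Closed-form ML Probabilistic PCA: x = W z + mean + noise, z ~ N(0, I)."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # ML noise variance: average of the discarded (smallest d - k) eigenvalues
    sigma2 = eigvals[k:].mean()
    # ML loading matrix: W = U_k (Lambda_k - sigma^2 I)^(1/2)
    W = eigvecs[:, :k] * np.sqrt(np.maximum(eigvals[:k] - sigma2, 0.0))
    return W, mean, sigma2
```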
🔹 Applications & Examples:
Clustering (via GMMs)
Density estimation
Anomaly detection
Topic modeling (e.g., LDA – Latent Dirichlet Allocation)
Example: Segmenting customer groups or modeling topics in document
collections.
✅ Summary Table
TOPIC | METHOD | TYPE | KEY ALGORITHM | APPLICATION
Clustering | K-means / Kernel K-means | Partitioning | Iterative reassignment | Customer segmentation, image compression
Dimensionality Reduction | PCA / Kernel PCA | Linear / Nonlinear | Eigendecomposition | Face recognition, visualization
Matrix Methods | Matrix Factorization / Completion | Latent space | ALS, Gradient Descent | Recommender systems, collaborative filtering
Generative Models | GMM, LDA | Probabilistic | EM algorithm | Density estimation, topic modeling