CS 229 – Machine Learning                                https://stanford.edu/~shervine
VIP Cheatsheet: Unsupervised Learning
Afshine Amidi and Shervine Amidi
August 12, 2018
Introduction to Unsupervised Learning

❒ Motivation – The goal of unsupervised learning is to find hidden patterns in unlabeled data {x^(1), ..., x^(m)}.

❒ Jensen's inequality – Let f be a convex function and X a random variable. We have the following inequality:

    E[f(X)] ≥ f(E[X])
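As a quick worked instance of this inequality (an added illustration, not part of the original cheatsheet), take the convex function f(x) = x²:

```latex
% f(x) = x^2 is convex, so Jensen's inequality gives E[X^2] >= (E[X])^2,
% which is equivalent to the familiar fact Var(X) = E[X^2] - (E[X])^2 >= 0.
E[f(X)] = E[X^2] \geq (E[X])^2 = f(E[X])
```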
Expectation-Maximization
❒ Latent variables – Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:

Setting                   Latent variable z    x|z              Comments
Mixture of k Gaussians    Multinomial(φ)       N(µ_j, Σ_j)      µ_j ∈ R^n, φ ∈ R^k
Factor analysis           N(0, I)              N(µ + Λz, ψ)     µ_j ∈ R^n
❒ Algorithm – The Expectation-Maximization (EM) algorithm gives an efficient method for estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower bound on the likelihood (E-step) and optimizing that lower bound (M-step), as follows:

• E-step: Evaluate the posterior probability Q_i(z^(i)) that each data point x^(i) came from a particular cluster z^(i), as follows:

    Q_i(z^(i)) = P(z^(i) | x^(i); θ)

• M-step: Use the posterior probabilities Q_i(z^(i)) as cluster-specific weights on the data points x^(i) to separately re-estimate each cluster model, as follows:

    θ_i = argmax_θ  Σ_i ∫_{z^(i)} Q_i(z^(i)) log( P(x^(i), z^(i); θ) / Q_i(z^(i)) ) dz^(i)
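To make the E-step/M-step alternation concrete, here is a minimal NumPy/SciPy sketch of EM for a mixture of k Gaussians, the first setting in the table above. The function name `em_gmm`, the initialization scheme and the fixed iteration count are illustrative assumptions, not part of the cheatsheet.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iter=100, seed=0):
    """Minimal EM sketch for a mixture of k Gaussians on the m x n data matrix X."""
    m, n = X.shape
    rng = np.random.default_rng(seed)
    # Initialization: uniform priors, random means, identity covariances
    phi = np.full(k, 1.0 / k)
    mu = X[rng.choice(m, size=k, replace=False)].astype(float)
    sigma = np.stack([np.eye(n)] * k)

    for _ in range(n_iter):
        # E-step: posterior Q_i(z^(i) = j) = P(z = j | x^(i); theta)
        q = np.zeros((m, k))
        for j in range(k):
            q[:, j] = phi[j] * multivariate_normal.pdf(X, mu[j], sigma[j])
        q /= q.sum(axis=1, keepdims=True)

        # M-step: re-estimate each cluster model with the posteriors as weights
        for j in range(k):
            w = q[:, j]
            phi[j] = w.mean()
            mu[j] = w @ X / w.sum()
            diff = X - mu[j]
            sigma[j] = (w[:, None] * diff).T @ diff / w.sum()
    return phi, mu, sigma
```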
k-means clustering

We note c^(i) the cluster of data point i and µ_j the center of cluster j.

❒ Algorithm – After randomly initializing the cluster centroids µ_1, µ_2, ..., µ_k ∈ R^n, the k-means algorithm repeats the following step until convergence:

    c^(i) = argmin_j ||x^(i) − µ_j||²    and    µ_j = ( Σ_{i=1}^m 1{c^(i) = j} x^(i) ) / ( Σ_{i=1}^m 1{c^(i) = j} )

❒ Distortion function – In order to see if the algorithm converges, we look at the distortion function defined as follows:

    J(c, µ) = Σ_{i=1}^m ||x^(i) − µ_c^(i)||²
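Below is a minimal NumPy sketch of this alternating update, using the distortion J(c, µ) as the stopping criterion. The function name `kmeans` and the tolerance/iteration parameters are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, n_iter=100, tol=1e-6, seed=0):
    """Minimal k-means sketch: alternate cluster assignment and centroid update."""
    m, n = X.shape
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(m, size=k, replace=False)].astype(float)   # random centroid initialization
    prev_J = np.inf
    for _ in range(n_iter):
        # Assignment step: c^(i) = argmin_j ||x^(i) - mu_j||^2
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (m, k) squared distances
        c = d2.argmin(axis=1)
        J = d2[np.arange(m), c].sum()                             # distortion J(c, mu)
        # Update step: mu_j = mean of the points currently assigned to cluster j
        for j in range(k):
            if np.any(c == j):
                mu[j] = X[c == j].mean(axis=0)
        if prev_J - J < tol:                                      # stop once J stops decreasing
            break
        prev_J = J
    return c, mu
```

Since the result depends on the random initialization of the centroids, the algorithm is typically run several times and the solution with the lowest distortion is kept.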
Hierarchical clustering

❒ Algorithm – It is a clustering algorithm with an agglomerative hierarchical approach that builds nested clusters in a successive manner.

❒ Types – There are different sorts of hierarchical clustering algorithms that aim at optimizing different objective functions, as summed up in the table below:

Ward linkage                      Average linkage                     Complete linkage
Minimize within-cluster           Minimize average distance           Minimize maximum distance
distance                          between cluster pairs               between cluster pairs
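As an added illustration (not from the cheatsheet), the three linkage criteria above correspond directly to the `method` argument of SciPy's agglomerative clustering routines; the sample data and cluster count below are arbitrary.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                          # arbitrary sample data

# Build the merge hierarchy with each of the three linkage criteria from the table
for method in ("ward", "average", "complete"):
    Z = linkage(X, method=method)                     # (m-1, 4) merge history
    labels = fcluster(Z, t=3, criterion="maxclust")   # cut the hierarchy into 3 clusters
    print(method, np.bincount(labels)[1:])            # cluster sizes
```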
Clustering assessment metrics

In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.

❒ Silhouette coefficient – By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:

    s = (b − a) / max(a, b)
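A minimal sketch, assuming scikit-learn is available; the data and the cluster assignment below are arbitrary placeholders used only to show the call:

```python
import numpy as np
from sklearn.metrics import silhouette_samples, silhouette_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                    # arbitrary sample data
labels = (X[:, 0] > 0).astype(int)               # any cluster assignment works here

s_per_sample = silhouette_samples(X, labels)     # s = (b - a) / max(a, b) for each sample
print("mean silhouette:", silhouette_score(X, labels))
```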
❒ Calinski-Harabaz index – By noting k the number of clusters, B_k and W_k the between- and within-clustering dispersion matrices respectively defined as

    B_k = Σ_{j=1}^k n_j (µ_j − µ)(µ_j − µ)^T,    W_k = Σ_{i=1}^m (x^(i) − µ_c^(i))(x^(i) − µ_c^(i))^T

where n_j and µ_j are the number of points in cluster j and its centroid, and µ is the overall centroid, the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the denser and better separated the clusters are. It is defined as follows:

    s(k) = ( Tr(B_k) / Tr(W_k) ) × ( (N − k) / (k − 1) )
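A minimal NumPy sketch of this score, assuming `X` holds the samples and `labels` their cluster assignments; the function name `calinski_harabaz` is an illustrative choice:

```python
import numpy as np

def calinski_harabaz(X, labels):
    """Compute Tr(B_k)/Tr(W_k) * (N - k)/(k - 1) for a given clustering."""
    N, _ = X.shape
    clusters = np.unique(labels)
    k = len(clusters)
    mu = X.mean(axis=0)                               # overall centroid
    tr_B, tr_W = 0.0, 0.0
    for j in clusters:
        Xj = X[labels == j]
        mu_j = Xj.mean(axis=0)
        tr_B += len(Xj) * np.sum((mu_j - mu) ** 2)    # trace of between-cluster dispersion
        tr_W += np.sum((Xj - mu_j) ** 2)              # trace of within-cluster dispersion
    return (tr_B / tr_W) * (N - k) / (k - 1)
```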
Principal component analysis

It is a dimension reduction technique that finds the variance-maximizing directions onto which to project the data.

❒ Eigenvalue, eigenvector – Given a matrix A ∈ R^{n×n}, λ is said to be an eigenvalue of A if there exists a vector z ∈ R^n \ {0}, called an eigenvector, such that we have:

    Az = λz

❒ Spectral theorem – Let A ∈ R^{n×n}. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U ∈ R^{n×n}. By noting Λ = diag(λ_1, ..., λ_n), we have:

    ∃ Λ diagonal,  A = U Λ U^T

Remark: the eigenvector associated with the largest eigenvalue is called the principal eigenvector of matrix A.
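As a small added illustration, NumPy's `eigh` routine performs this symmetric eigendecomposition; the matrix below is arbitrary:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])                        # arbitrary symmetric matrix

lam, U = np.linalg.eigh(A)                        # eigenvalues (ascending) and orthogonal eigenvectors
print(np.allclose(U @ np.diag(lam) @ U.T, A))     # A = U Λ U^T, as in the spectral theorem
print(U[:, -1])                                   # principal eigenvector (largest eigenvalue)
```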
❒ Algorithm – The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:

• Step 1: Normalize the data to have a mean of 0 and standard deviation of 1:

    x_j^(i) ← (x_j^(i) − µ_j) / σ_j    where    µ_j = (1/m) Σ_{i=1}^m x_j^(i)    and    σ_j² = (1/m) Σ_{i=1}^m (x_j^(i) − µ_j)²

• Step 2: Compute Σ = (1/m) Σ_{i=1}^m x^(i) (x^(i))^T ∈ R^{n×n}, which is symmetric with real eigenvalues.

• Step 3: Compute u_1, ..., u_k ∈ R^n, the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.

• Step 4: Project the data on span_R(u_1, ..., u_k). This procedure maximizes the variance among all k-dimensional spaces.
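A minimal NumPy sketch of these four steps; the function name `pca` and the use of `np.linalg.eigh` are illustrative choices rather than a prescribed implementation:

```python
import numpy as np

def pca(X, k):
    """Project the m x n data matrix X onto its top-k principal directions."""
    m, n = X.shape
    # Step 1: normalize each feature to mean 0 and standard deviation 1
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: empirical covariance matrix (symmetric, real eigenvalues)
    Sigma = X.T @ X / m
    # Step 3: k orthogonal eigenvectors with the largest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(Sigma)      # eigh returns eigenvalues in ascending order
    U = eigvecs[:, -k:][:, ::-1]                  # top-k principal eigenvectors
    # Step 4: project the data onto span(u_1, ..., u_k)
    return X @ U
```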
Independent component analysis

It is a technique meant to find the underlying generating sources.

❒ Assumptions – We assume that our data x has been generated by the n-dimensional source vector s = (s_1, ..., s_n), where the s_i are independent random variables, via a mixing and non-singular matrix A as follows:

    x = As

The goal is to find the unmixing matrix W = A^{-1} by an update rule.

❒ Bell and Sejnowski ICA algorithm – This algorithm finds the unmixing matrix W by following the steps below:

• Write the probability of x = As = W^{-1} s as:

    p(x) = Π_{i=1}^n p_s(w_i^T x) · |W|

• Write the log likelihood given our training data {x^(i), i ∈ [[1, m]]} and by noting g the sigmoid function as:

    l(W) = Σ_{i=1}^m ( Σ_{j=1}^n log( g'(w_j^T x^(i)) ) + log |W| )

Therefore, the stochastic gradient ascent learning rule is such that for each training example x^(i), we update W as follows:

    W ← W + α ( [1 − 2g(w_1^T x^(i)), 1 − 2g(w_2^T x^(i)), ..., 1 − 2g(w_n^T x^(i))]^T (x^(i))^T + (W^T)^{-1} )
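A minimal NumPy sketch of this stochastic gradient ascent rule; the learning rate, epoch count and function name `ica_bell_sejnowski` are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ica_bell_sejnowski(X, alpha=0.01, n_epochs=10, seed=0):
    """Estimate the unmixing matrix W from the m x n data matrix X."""
    m, n = X.shape
    rng = np.random.default_rng(seed)
    W = np.eye(n)
    for _ in range(n_epochs):
        for i in rng.permutation(m):                 # stochastic: one training example at a time
            x = X[i].reshape(n, 1)                   # column vector x^(i)
            # gradient of the log likelihood for this example:
            # (1 - 2g(Wx)) x^T + (W^T)^{-1}, since w_j^T x is the j-th entry of Wx
            grad = (1.0 - 2.0 * sigmoid(W @ x)) @ x.T + np.linalg.inv(W.T)
            W += alpha * grad
    return W                                         # recovered sources: S = X @ W.T
```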