Chapter 12:
Unsupervised Learning
Jeremy Selva
Introduction
This chapter introduces a diverse set of unsupervised learning techniques:
Principal Components Analysis
Clustering
12.1 The Challenge of Unsupervised Learning
Unsupervised learning is often performed as part of an exploratory data analysis.
Given a set of $p$ features $X_1, X_2, \ldots, X_p$ measured on $n$ observations, the goal is to find interesting groups based on these features.
Interesting groups include
Finding subgroups among the variables
Finding subgroups among the observations
However, there are no clear criteria to determine whether a group is interesting or not, as this is subjective.
Image from https://www.sciencedirect.com/science/article/pii/S1532046415001380
12.2 Principal Components Analysis (PCA)
When we have a large set of (preferably correlated) features $X_1, X_2, \ldots, X_p$, PCA helps to summarise them into a smaller number of representative variables $Z_1, Z_2, \ldots, Z_M$, where $M < p$, that explain most of the variability in the original set. $Z_1, Z_2, \ldots, Z_M$ are also called principal components.
12.2.1 What Are Principal Components?
Given an $n \times p$ data set $X$, with all $p$ variables centred to have mean 0, a principal component $Z_j$, for $1 \le j \le M$, must be expressed in terms of the features $X_1, X_2, \ldots, X_p$ and loadings $\phi_{1j}, \phi_{2j}, \ldots, \phi_{pj}$ as the linear combination
$$Z_j = \phi_{1j} X_1 + \phi_{2j} X_2 + \ldots + \phi_{pj} X_p$$
where the squared loadings sum to one, i.e. $\sum_{k=1}^{p} \phi_{kj}^2 = 1$.
12.2.1 What Are Principal Components?
These $p$ loadings make up the principal component loading vector (direction), $\phi_j = (\phi_{1j}, \phi_{2j}, \ldots, \phi_{pj})^T$.
Images from TileStats youtube video
12.2.1 What Are Principal Components?
The loading vector $\phi_j$ is used to calculate the scores $z_{ij}$, for $1 \le i \le n$ and $1 \le j \le M$, for each principal component $Z_j$:
$$Z_j = (z_{1j}, z_{2j}, \ldots, z_{ij}, \ldots, z_{nj})^T$$
where $z_{ij}$ is the sum of the products of the loadings and the individual data values:
$$z_{ij} = \sum_{k=1}^{p} \phi_{kj} x_{ik} = \phi_{1j} x_{i1} + \phi_{2j} x_{i2} + \ldots + \phi_{pj} x_{ip}$$
Images from TileStats youtube video
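As a minimal sketch (not part of the original slides), the loadings and scores above can be obtained in R with prcomp(); the built-in USArrests data set is used here purely as an assumed example.

# Loadings (phi) and scores (z) from prcomp(); USArrests is a built-in R data set.
pr_out <- prcomp(USArrests, center = TRUE, scale. = TRUE)

phi <- pr_out$rotation  # p x p matrix of loadings phi_kj
z   <- pr_out$x         # n x p matrix of scores z_ij

# Each loading vector has squared loadings summing to one
colSums(phi^2)

# A score is the linear combination of the (centred, scaled) data values with
# the loadings, e.g. the first observation's score on the first principal component:
x_scaled <- scale(USArrests)
sum(phi[, 1] * x_scaled[1, ])  # matches z[1, 1]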
12.2.1 What Are Principal Components?
Each principal component loading vector $\phi_j = (\phi_{1j}, \phi_{2j}, \ldots, \phi_{pj})^T$ is optimised such that
$$\underset{\phi_{1j}, \ldots, \phi_{pj}}{\text{maximize}} \left\{ \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{k=1}^{p} \phi_{kj} x_{ik} \right)^2 \right\} \quad \text{subject to} \quad \sum_{k=1}^{p} \phi_{kj}^2 = 1$$
where $\sum_{k=1}^{p} \phi_{kj} x_{ik} = z_{ij}$ are the scores of the principal component $Z_j$.
As the mean of the scores $z_{ij}$ is 0, $\frac{1}{n} \sum_{i=1}^{n} (z_{ij} - 0)^2$ is the sample variance of the scores of the principal component $Z_j$.
Images from TileStats youtube video
12.2.2 Another Interpretation
Principal components provide low-dimensional linear surfaces that are closest to the n observations.
The first principal component loading vector $\phi_1 = (\phi_{11}, \phi_{21}, \ldots, \phi_{p1})^T$ defines the line in $p$-dimensional space that is closest to the $n$ observations.
12.2.2 Another Interpretation
The first two principal component loading vectors $\phi_1$ and $\phi_2$ span the 2D plane in $p$-dimensional space that best fits the $n$ observations.
12.2.2 Another Interpretation
By a special property of the loading matrix ($\phi^{-1} = \phi^T$, i.e. the loading matrix is orthogonal), when $M = \min(p, n - 1)$ it is possible to express each data value exactly in terms of the principal component scores and loadings: $x_{ij} = \sum_{m=1}^{M} z_{im} \phi_{jm}$.
12.2.2 Another Interpretation
However, for $M < \min(p, n - 1)$ we only have the approximation $x_{ij} \approx \sum_{m=1}^{M} z_{im} \phi_{jm}$.
The approximation error $x_{ij} - \sum_{m=1}^{M} z_{im} \phi_{jm}$ gets smaller as $M$ gets closer to $\min(p, n - 1)$.
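A minimal sketch of this (again assuming the built-in USArrests data, not from the slides): reconstruct the scaled data from the first $M$ principal components and watch the approximation error shrink as $M$ grows.

# The mean squared error of the M-dimensional approximation shrinks to zero
# as M approaches min(p, n - 1).
pr_out   <- prcomp(USArrests, center = TRUE, scale. = TRUE)
x_scaled <- scale(USArrests)

for (M in 1:4) {
  x_approx <- pr_out$x[, 1:M, drop = FALSE] %*% t(pr_out$rotation[, 1:M, drop = FALSE])
  cat("M =", M, " MSE of approximation:", mean((x_scaled - x_approx)^2), "\n")
}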
12.2.3 The Proportion of Variance Explained
We would like to know how much of the information in a given data set is lost by projecting the observations onto the first few principal components. We first define some quantities.
The total variance present in a data set (assuming that the variables have mean 0) is
$$\sum_{j=1}^{p} \text{Var}(X_j) = \sum_{j=1}^{p} \left( \frac{1}{n} \sum_{i=1}^{n} (x_{ij} - 0)^2 \right)$$
The variance explained by the $m$th principal component is the variance of its scores (which also have mean 0):
$$\frac{1}{n} \sum_{i=1}^{n} z_{im}^2 = \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{j=1}^{p} \phi_{jm} x_{ij} \right)^2$$
12.2.3 The Proportion of Variance Explained
Images from TileStats youtube video
12.2.3 The Proportion of Variance Explained
The proportion of variance explained ($PVE$) by the $m$th principal component is
$$PVE_m = \frac{\text{Variance explained by the } m\text{th principal component}}{\text{Total variance present in the data set}} = \frac{\frac{1}{n} \sum_{i=1}^{n} z_{im}^2}{\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{p} x_{ij}^2}$$
Images from TileStats youtube video
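As a minimal sketch (again assuming the built-in USArrests data), $PVE_m$ can be computed directly from prcomp() output, since the squared standard deviations of the scores are proportional to the variance explained by each component.

pr_out <- prcomp(USArrests, center = TRUE, scale. = TRUE)

pr_var <- pr_out$sdev^2        # variance of the scores of each principal component
pve    <- pr_var / sum(pr_var) # PVE_m for m = 1, ..., p
pve
cumsum(pve)                    # cumulative PVE (interpreted later as an R^2)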
12.2.3 The Proportion of Variance Explained
The variance of the data can also be decomposed into the variance of the first $M$ principal components plus the mean squared error of this $M$-dimensional approximation:
$$\underbrace{\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{p} x_{ij}^2}_{\text{Var. of data}} = \underbrace{\frac{1}{n} \sum_{i=1}^{n} \sum_{m=1}^{M} z_{im}^2}_{\text{Var. of first } M \text{ PCs}} + \underbrace{\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{p} \left( x_{ij} - \sum_{m=1}^{M} z_{im} \phi_{jm} \right)^2}_{\text{MSE of } M\text{-dimensional approximation}}$$
Images from TileStats youtube video
12.2.3 The Proportion of Variance Explained
$$\underbrace{\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{p} x_{ij}^2}_{\text{Var. of data}} = \underbrace{\frac{1}{n} \sum_{i=1}^{n} \sum_{m=1}^{M} z_{im}^2}_{\text{Var. of first } M \text{ PCs}} + \underbrace{\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{p} \left( x_{ij} - \sum_{m=1}^{M} z_{im} \phi_{jm} \right)^2}_{\text{MSE of } M\text{-dimensional approximation}}$$
Bring the MSE of the $M$-dimensional approximation, $\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{p} \left( x_{ij} - \sum_{m=1}^{M} z_{im} \phi_{jm} \right)^2$, to the left hand side:
$$\underbrace{\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{p} x_{ij}^2}_{\text{Var. of data}} - \underbrace{\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{p} \left( x_{ij} - \sum_{m=1}^{M} z_{im} \phi_{jm} \right)^2}_{\text{MSE of } M\text{-dimensional approximation}} = \underbrace{\frac{1}{n} \sum_{i=1}^{n} \sum_{m=1}^{M} z_{im}^2}_{\text{Var. of first } M \text{ PCs}}$$
12.2.3 The Proportion of Variance Explained
$$\underbrace{\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{p} x_{ij}^2}_{\text{Var. of data}} - \underbrace{\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{p} \left( x_{ij} - \sum_{m=1}^{M} z_{im} \phi_{jm} \right)^2}_{\text{MSE of } M\text{-dimensional approximation}} = \underbrace{\frac{1}{n} \sum_{i=1}^{n} \sum_{m=1}^{M} z_{im}^2}_{\text{Var. of first } M \text{ PCs}}$$
Divide both sides by the variance of the data, $\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{p} x_{ij}^2 \ne 0$:
$$1 - \frac{\sum_{i=1}^{n} \sum_{j=1}^{p} \left( x_{ij} - \sum_{m=1}^{M} z_{im} \phi_{jm} \right)^2}{\sum_{i=1}^{n} \sum_{j=1}^{p} x_{ij}^2} = \frac{\frac{1}{n} \sum_{i=1}^{n} \sum_{m=1}^{M} z_{im}^2}{\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{p} x_{ij}^2} = \sum_{m=1}^{M} \frac{\frac{1}{n} \sum_{i=1}^{n} z_{im}^2}{\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{p} x_{ij}^2}$$
Recall that
$$PVE_m = \frac{\text{Variance explained by the } m\text{th principal component}}{\text{Total variance present in the data set}} = \frac{\frac{1}{n} \sum_{i=1}^{n} z_{im}^2}{\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{p} x_{ij}^2}$$
12.2.3 The Proportion of Variance Explained
We have
$$1 - \frac{\sum_{i=1}^{n} \sum_{j=1}^{p} \left( x_{ij} - \sum_{m=1}^{M} z_{im} \phi_{jm} \right)^2}{\sum_{i=1}^{n} \sum_{j=1}^{p} x_{ij}^2} = \sum_{m=1}^{M} \frac{\frac{1}{n} \sum_{i=1}^{n} z_{im}^2}{\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{p} x_{ij}^2} = \sum_{m=1}^{M} PVE_m$$
Using $TSS$ as the total sum of squared elements of $X$, and $RSS$ as the residual sum of squares of the $M$-dimensional approximation given by the $M$ principal components,
$$\sum_{m=1}^{M} PVE_m = 1 - \frac{RSS}{TSS} = R^2$$
We can interpret the cumulative $PVE$ as the $R^2$ of the approximation for $X$ given by the first $M$ principal components.
12.2.4 More on PCA
Why do we need to scale the variables to have standard deviation one? Because otherwise, variables measured on larger scales (and hence with larger variances) would dominate the principal components. A minimal sketch of this effect is shown below.
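The sketch below assumes the built-in USArrests data (not from the slides): without scaling, the variable with the largest raw variance dominates the first loading vector.

# Assault has a much larger variance than the other variables simply
# because of the units it is measured in.
apply(USArrests, 2, var)

# Without scaling, the first principal component loads almost entirely on Assault
prcomp(USArrests, scale. = FALSE)$rotation[, 1]

# After scaling each variable to standard deviation one, the variables
# contribute more comparably to the first principal component
prcomp(USArrests, scale. = TRUE)$rotation[, 1]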
12.2.4 More on PCA
Each principal component loading vector is unique, up to a sign flip.
Given two computed versions $\phi$ and $\phi'$ of the same loading vector (for example, from two different software packages), $\phi'_{jm} = k \, \phi_{jm}$ where $k = 1$ or $-1$.
12.2.4 More on PCA
How many principal components are enough? This is typically decided using a scree plot, though the choice can be subjective (see the sketch below).
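A minimal sketch of a scree plot in base R (assuming the USArrests example used earlier):

pr_out <- prcomp(USArrests, center = TRUE, scale. = TRUE)
pve    <- pr_out$sdev^2 / sum(pr_out$sdev^2)

# Scree plot (left) and cumulative PVE (right); look for an "elbow"
par(mfrow = c(1, 2))
plot(pve, type = "b", xlab = "Principal Component",
     ylab = "Proportion of Variance Explained", ylim = c(0, 1))
plot(cumsum(pve), type = "b", xlab = "Principal Component",
     ylab = "Cumulative PVE", ylim = c(0, 1))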
12.2.4 More on PCA
Image from Nature's Points of Significance Series Principal component analysis
12.2.4 More on PCA
Image from Ten quick tips for effective
dimensionality reduction
12.3 Missing Values and Matrix Completion
It is possible for datasets to have missing values. Unfortunately, the statistical learning methods that we
have seen in this book cannot handle missing values.
We could remove observations with missing values, but this is wasteful.
Another way is to set each missing $x_{ij}$ to be the mean of the $j$th column; a minimal sketch of this is shown below.
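The helper below is hypothetical (not from the slides) and sketches this column-mean strategy; x is assumed to be a numeric matrix containing NA entries.

# Replace each missing value with the mean of its column
mean_impute <- function(x) {
  xhat  <- x
  xbar  <- colMeans(x, na.rm = TRUE)
  index <- which(is.na(x), arr.ind = TRUE)  # row/column positions of the NAs
  xhat[index] <- xbar[index[, "col"]]       # fill each NA with its column mean
  xhat
}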
Although this is a common and convenient strategy, often we can do better by exploiting the correlation
between the variables by using principal components.
12.3 Missing Values and Matrix Completion
(Panel tabs: Full Data, Missing Data, Initialisation, 2a function, Iteration 1, Iteration 1 cont.)
Full Data:
data <- data.frame(
  SBP = c(-3, -1, -1, 1, 1, 3),
  DBP = c(-4, -2, 0, 0, 2, 4))
  SBP   DBP
-3.00 -4.00
-1.00 -2.00
-1.00  0.00
 1.00  0.00
 1.00  2.00
 3.00  4.00
12.3 Missing Values and Matrix Completion
iter <- 1
rel_err <- mss_old_pca
thresh <- 1e-2
while (rel_err > thresh) {
  iter <- iter + 1
  # Step 2(a): approximate the current completed data with principal components
  Xapp <- fit_pca(initialisation_data_pca)
  # Step 2(b): replace the missing entries with their approximated values
  initialisation_data_pca[ismiss] <- Xapp[ismiss]
  # Step 2(c): measure the mean squared error over the observed entries
  mss_pca <- mean(((missing_data - Xapp)[!ismiss])^2)
  rel_err <- mss_old_pca - mss_pca
  mss_old_pca <- mss_pca
  cat("Iter:", iter, "\n",
      "Rel. Err:", rel_err, "\n")
}
#> Iter: 2
#> Rel. Err: 0.05686009
#> Iter: 3
#> Rel. Err: 0.2107907
#> Iter: 4
#> Rel. Err: 0.3222623
#> Iter: 5
#> Rel. Err: 0.2031161
#> Iter: 6
#> Rel. Err: 0.08375675
#> Iter: 7
#> Rel. Err: 0.03220227
#> Iter: 8
#> Rel. Err: 0.01423225
#> Iter: 9
#> Rel. Err: 0.00830349
12.3 Missing Values and Matrix Completion
Actual Data Imputed Data
SBP DBP SBP DBP
-3.00 -4.00 -3.00 4.38
-1.00 -2.00 1.28 -2.00
-1.00 0.00 -1.00 0.00
1.00 0.00 1.00 -1.29
1.00 2.00 1.00 2.00
3.00 4.00 -2.57 4.00
12.3 Missing Values and Matrix Completion
Actual Data Soft Imputed Data
SBP DBP SBP DBP
-3.00 -4.00 -3.00 -4.19
-1.00 -2.00 -1.42 -2.00
-1.00 0.00 -1.00 0.00
1.00 0.00 1.00 1.40
1.00 2.00 1.00 2.00
3.00 4.00 2.85 4.00
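A sketch of how the soft-imputed values above might be produced with the softImpute package; the object name Xna and the parameter choices below are assumptions, not confirmed by the slides.

library(softImpute)

# Xna: the SBP/DBP matrix with missing entries coded as NA (assumed name)
fit  <- softImpute(Xna, rank.max = 1, lambda = 0, type = "svd")
Xhat <- complete(Xna, fit)  # fill in the NA entries from the low-rank fit
Xhat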
12.4 Clustering Methods
Clustering splits the observations into distinct, homogeneous groups such that
Observations within each group are quite similar to each other
Observations in different groups are quite different from each other.
In this section we focus on perhaps the two best-known clustering approaches:
K-means Clustering
Hierarchical Clustering.
12.4 Clustering Methods
Image from Francesco Archetti's slides
12.4.1 K-Means Clustering
K-means clustering is an approach to partition a data set into $K$ distinct, non-overlapping clusters.
To measure how much two observations $x_i$ and $x_{i'}$ differ from each other, the squared Euclidean distance is used:
$$\sum_{j=1}^{p} (x_{ij} - x_{i'j})^2$$
where $i$ indexes the $n$ observations ($x_i$ and $x_{i'}$ are two different observations) and $j$ indexes the $p$ features.
Image from Wikipedia
12.4.1 K-Means Clustering
Let $k$ index the $K$ clusters and $C_k$ denote the $k$th cluster.
The sum of all pairwise squared Euclidean distances between the observations in the $k$th cluster is
$$\sum_{i, i' \in C_k} \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2$$
Let $|C_k|$ be the number of observations in the $k$th cluster.
The within-cluster variation for cluster $C_k$, denoted $W(C_k)$, measures on average how much the observations within the cluster $C_k$ differ from each other:
$$W(C_k) = \frac{1}{|C_k|} \sum_{i, i' \in C_k} \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2$$
12.4.1 K-Means Clustering
K-means clustering is an approach to partition a data set into $K$ distinct, non-overlapping clusters such that the sum of within-cluster variations is as small as possible:
$$\underset{C_1, \ldots, C_K}{\text{minimize}} \left\{ \sum_{k=1}^{K} W(C_k) \right\}$$
$$\underset{C_1, \ldots, C_K}{\text{minimize}} \left\{ \sum_{k=1}^{K} \frac{1}{|C_k|} \sum_{i, i' \in C_k} \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2 \right\}$$
12.4.1 K-Means Clustering
Using $K = 3$ clusters and $p = 2$ features as an example, the algorithm starts by randomly assigning a cluster label $C_1$, $C_2$ or $C_3$ to each of the observations.
12.4.1 K-Means Clustering
We proceed with iteration 1.
For each cluster, calculate the cluster's centroid. The cluster centroids are computed as the mean of the observations assigned to each cluster:
$$\bar{x}_k = \frac{1}{|C_k|} \sum_{i \in C_k} x_i$$
where $|C_k|$ is the number of observations in cluster $C_k$.
12.4.1 K-Means Clustering
Next, for each observation, calculate the distance between the observation and each of the centroids.
Reassign the observation to the cluster whose centroid is closest.
Calculate $\sum_{k=1}^{K} W(C_k)$ for that iteration.
12.4.1 K-Means Clustering
Repeat the iteration step on the newly assigned clusters until there is minimal change in $\sum_{k=1}^{K} W(C_k)$.
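A minimal sketch of K-means in R with the built-in kmeans() function; the simulated two-feature data below are made up for illustration.

set.seed(2)
x <- matrix(rnorm(50 * 2), ncol = 2)  # 50 observations, 2 features
x[1:25, 1] <- x[1:25, 1] + 3          # shift half of them to create structure
x[1:25, 2] <- x[1:25, 2] - 4

# nstart = 20 runs the algorithm from 20 random initial assignments and
# keeps the solution with the smallest total within-cluster variation
km_out <- kmeans(x, centers = 3, nstart = 20)
km_out$tot.withinss                   # the minimised sum of W(C_k)
plot(x, col = km_out$cluster, pch = 20, main = "K-means with K = 3")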
12.4.1 K-Means Clustering
Caveats
We must decide how many clusters we expect in the data.
The results obtained will depend on the initial (random) cluster assignment of each observation, which may give inconsistent results.
For this reason, it is important to run the algorithm multiple times from different random initial configurations. One then selects the best solution, i.e. the one for which $\sum_{k=1}^{K} W(C_k)$ is smallest.
Hierarchical clustering is an alternative approach to overcome some of the weaknesses of K-Means
Clustering
12.4.2 Hierarchical Clustering
Hierarchical clustering results in a tree-based representation of the observations, called a dendrogram. Each
leaf of the dendrogram represents one observation. Clusters are obtained by cutting the dendrogram at a
given height.
12.4.2 Hierarchical Clustering
In practice, people often look at the dendrogram and select by eye a sensible number of clusters, based on
the heights of the fusion and the number of clusters desired.
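A minimal sketch of cutting a dendrogram with cutree() on simulated data (made up for illustration):

set.seed(2)
x <- matrix(rnorm(50 * 2), ncol = 2)

hc_out <- hclust(dist(x), method = "complete")
plot(hc_out, main = "Complete linkage")

cutree(hc_out, k = 3)  # cut the tree to obtain 3 clusters
cutree(hc_out, h = 2)  # or cut at a chosen height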
12.4.2 Hierarchical Clustering
Each leaf of the dendrogram represents one observation. Starting out at the bottom of the dendrogram,
each of the n observations is treated as its own cluster.
Image from Kavana Rudresh GitHub page for R-Ladies Philly
12.4.2 Hierarchical Clustering
The algorithm then measures all possible pairwise distances among the $n$ observations.
The two observations with the smallest distance are then fused into one cluster, leaving $n - 1$ clusters.
Image from Kavana Rudresh GitHub page for R-Ladies Philly
12.4.2 Hierarchical Clustering
The algorithm continues until all observations have been fused into a single cluster.
The height of each fusion represents the distance between the clusters being merged.
Image from Kavana Rudresh GitHub page for R-Ladies Philly
12.4.2 Hierarchical Clustering
How did we determine that the cluster {p5, p6} should be fused with the cluster {p4} ?
The concept of dissimilarity between a pair of observations needs to be extended to a pair of groups of observations. This extension is achieved by developing the notion of linkage.
Image from Kavana Rudresh GitHub page for R-Ladies Philly
12.4.2 Hierarchical Clustering
Single-linkage: the distance between two clusters
is defined as the shortest distance between two
points in each cluster
Complete-linkage: the distance between two
clusters is defined as the longest distance between
two points in each cluster.
Average-linkage: the distance between two
clusters is defined as the average distance
between each point in one cluster to every point in
the other cluster.
Centroid-linkage: finds the centroid of cluster 1 and the centroid of cluster 2, and then calculates the distance between the two before merging. A sketch comparing these linkages in R follows.
Image from Kavana Rudresh GitHub page for R-Ladies Philly
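A minimal sketch comparing the four linkages with hclust() on simulated data (made up for illustration):

set.seed(2)
x <- matrix(rnorm(50 * 2), ncol = 2)
d <- dist(x)  # Euclidean distance matrix

hc_complete <- hclust(d, method = "complete")
hc_single   <- hclust(d, method = "single")
hc_average  <- hclust(d, method = "average")
hc_centroid <- hclust(d, method = "centroid")

par(mfrow = c(1, 4))
plot(hc_complete, main = "Complete linkage", xlab = "", sub = "")
plot(hc_single,   main = "Single linkage",   xlab = "", sub = "")
plot(hc_average,  main = "Average linkage",  xlab = "", sub = "")
plot(hc_centroid, main = "Centroid linkage", xlab = "", sub = "")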
12.4.2 Hierarchical Clustering
Mustafa Murat's Hierarchical clustering slides
12.4.2 Hierarchical Clustering
12.4.2 Hierarchical Clustering
Image from Kavana Rudresh GitHub page for R-Ladies Philly
12.4.3 Practical Issues in Clustering
Image from Kavana Rudresh GitHub page for R-Ladies Philly
12.4.3 Practical Issues in Clustering
Mustafa Murat's Hierarchical clustering slides
12.4.2 Hierarchical Clustering
Different dissimilarity measures give different clusters.
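A minimal sketch of this point, on made-up data with three features (correlation-based dissimilarity needs at least three): Euclidean and correlation-based dissimilarities can give different dendrograms.

set.seed(2)
x <- matrix(rnorm(30 * 3), ncol = 3)

d_euclidean   <- dist(x)                 # Euclidean distance between observations
d_correlation <- as.dist(1 - cor(t(x)))  # 1 - correlation between observations

par(mfrow = c(1, 2))
plot(hclust(d_euclidean,   method = "complete"), main = "Euclidean distance")
plot(hclust(d_correlation, method = "complete"), main = "Correlation-based distance")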
12.4.3 Practical Issues in Clustering
Should the observations or features be standardized to have standard deviation of one?
In the case of hierarchical clustering,
What dissimilarity measure should be used?
What type of linkage should be used?
Where should we cut the dendrogram in order to obtain clusters?
In the case of K-means clustering,
How many clusters should we look for in the data?
12.4.3 Practical Issues in Clustering
Since K-means and hierarchical clustering force every observation into a cluster, the clusters found may be heavily distorted by outliers that do not belong to any cluster.
Clustering methods generally are not very robust to perturbations to the data. We recommend clustering
subsets of the data in order to get a sense of the robustness of the clusters obtained.
12.4.3 Practical Issues in Clustering
EduClust webpage