

Principal Component Analysis and Empirical Orthogonal Functions

GF601 - fall semester 2022

Linear Principal Component Analysis (PCA) and the decomposition into Empirical Orthogonal Functions (EOFs) refer to one and the same technique, used to characterize the co-variability of multiple variables.

→ Because of the multiplicity of data in space, this analysis turns out to be particularly useful, and it is widely used in the Atmospheric and Climate Sciences.

The general objective is the analysis of multiple variables that have some degree of mutual correlation, in terms of a smaller set of variables or leading modes of variability.

Applications

• Analysis of co-variability in a large data set (e.g., two-dimensional fields) to describe climate phenomena (modes) (e.g., ENSO).
• More functional uses:
- Data reduction
- Interpolation
- Reconstructions
- Regression models (statistical forecasting, regionalization)

Computing the principal components (PCs) or EOFs is relatively simple using pre-built functions in many languages, as sketched below.

Understanding the procedure well requires reviewing some concepts of linear algebra.

→ Reviewing Chapter 10 of Wilks is recommended.
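As a minimal sketch of such a pre-built function, assuming MATLAB with the Statistics and Machine Learning Toolbox and purely illustrative data:

    % Minimal sketch: PCA/EOF analysis in one call.
    X = randn(100, 5);            % n = 100 observations of K = 5 variables
    [E, U, lambda] = pca(X);      % E: EOFs (columns), U: PCs, lambda: eigenvalues
    varExplained = 100 * lambda / sum(lambda);   % percent of variance per mode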



Motivating example

We have n observations of two variables x and y, which we want to "reduce" by considering the co-variability between them.

The anomaly of y (y′) can be partly described by x′ (or vice versa) through a simple linear regression:

y′ ≃ α x′

→ The original system of size 2n is reduced to n + 1.

If the correlation coefficient r between the two series is 0.95, the variance of y explained by x is r² ≃ 0.9.

In the previous example, x was used to represent the system composed of x and y.

In PCA, by contrast, new variables are defined, which we will call principal components (PCs; following Wilks we denote them u_m), together with an orthogonal basis (the EOFs, with elements denoted e_km), to represent the original data.

Analogously to a multiple linear regression model, or as in a spectral analysis, the anomalies of the analyzed variables (x_k) are composed as a linear combination of the new variables being sought:

x′_k = Σ_{m=1}^{n−1} e_km u_m ,   k = 1, …, K

The time-dependent vectors u_m(t) represent the new variables we seek.

> They account for the temporal evolution of modes common to the set of K variables.

The coefficient e_km indicates the weight (contribution) of mode m in variable x_k.

> These are the elements of the EOFs.

Equivalently, the new variables u_m are composed from the original variables… (Wilks Eq. 11.1):

u_m = e_mᵀ x′ = Σ_{k=1}^{K} e_km x′_k ,   m = 1, …, M

In matrix form (the analysis formula):

u = [E]ᵀ x′

To be strict, when we speak of principal components (PCs) we will focus on a subset of M modes, ideally much smaller than K (M << K).

The idea is that these few modes describe the original system as fully as possible.

In PCA, the modes sought are also linearly independent of one another: r(u_i, u_j) = 0.

> This requires a linear transformation of the space that diagonalizes the covariance (or correlation) matrix and that assigns maximum variance to the first modes.

See the 2D case…

Mathematically, the matrix E (the EOFs) is obtained by solving an eigenvalue problem for a square matrix A_K×K (a linear operator):

[A] e = λ e   (Wilks Eq. 9.46a)

or, equivalently, ([A] − λ[I]) e = 0.

If A is non-singular, and therefore invertible, there are K pairs of eigenvalues (λ_i) and eigenvectors (e_i). If, in addition, A is symmetric, the eigenvectors will be orthogonal (orthonormal, with the usual unit-length convention):

e_iᵀ e_j = 1 if i = j,  0 if i ≠ j   (Wilks Eq. 9.48)

More generally (the spectral decomposition, Wilks Eq. 9.50):

[A] = [E] [Λ] [E]ᵀ ,   with Λ = diag(λ_1, …, λ_K)

The matrix of eigenvalues Λ corresponds to the matrix A (covariance or correlation) in the new space defined by E. That is, it contains the same information, but seen from another angle. Λ is diagonal because the basis E is orthogonal.

Since A and Λ quantify the joint variance of the system, it holds that (Wilks Eqs. 9.52 and 9.53):

tr[A] = Σ_k a_kk = Σ_k λ_k = tr[Λ]
det[A] = Π_k λ_k = det[Λ]

If A is the covariance matrix of the K variables, the sum of the K eigenvalues will equal the sum of the variances of all the original variables.

In this case the eigenvalues, ordered from largest to smallest, correspond to the variances of the independent (mutually uncorrelated) modes, and they account for the joint variability of the K variables.

→ This is precisely what we are looking for!

The modes are then defined by the eigenvectors e_i. These represent an orthogonal basis and will be the EOFs we seek.
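A sketch of this eigenvalue problem in base MATLAB (no toolboxes), assuming a data matrix X with observations in rows and variables in columns:

    % EOFs as eigenvectors of the covariance matrix.
    S = cov(X);                          % K x K sample covariance matrix
    [E, L] = eig(S, 'vector');           % eigenvectors (columns of E), eigenvalues
    [lambda, idx] = sort(L, 'descend');  % order modes by decreasing variance
    E = E(:, idx);
    % Checks implied by the theory (S symmetric): E'*E ~ eye(K)  (Eq. 9.48),
    % sum(lambda) ~ trace(S)  (Eq. 9.52),  prod(lambda) ~ det(S)  (Eq. 9.53)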



In practice…

PCA allows the eigenvalue problem to be solved for the covariance matrix computed from the data matrix X of anomalies:

S = n⁻¹ X Xᵀ

The solution yields a diagonal matrix Λ_K×K of eigenvalues, and a matrix E_K×K whose columns contain the K eigenvectors, i.e. the corresponding EOFs.
otoño 2022

The time series of the new variables (u_m) result from the projection of the original data X onto the basis defined by E (Wilks Eq. 11.1):

u_m = e_mᵀ x′ = Σ_{k=1}^{K} e_km x′_k ,   m = 1, …, M

The axes of the new basis (e_m) are oriented so that the 1st mode (m = 1) explains the maximum joint variability.

The 2nd axis is oriented so that it is orthogonal to the 1st and also explains the maximum possible variance (after the 1st). And so on successively.

Analogously, u_1 is the new time series with the largest variance (the leading principal component); u_2 is the series with the second-largest variance, independent of u_1; u_3 …

The variances of the modes are the eigenvalues found in the diagonal matrix Λ. Therefore, Λ corresponds to the covariance matrix of U.
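Continuing the base-MATLAB sketch above (E from the eigendecomposition of the covariance matrix):

    % PCs as the projection of the anomalies onto the EOF basis.
    Xa = X - mean(X);    % anomalies x' (column means removed)
    U  = Xa * E;         % n x K principal components u_m(t)
    % var(U)' reproduces lambda: Lambda is the covariance matrix of U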

Thus, a subset of M modes u_m, corresponding to the first modes, will explain the bulk of the joint variability. These modes define the PCs.

The original data can then be reconstructed from the retained modes:

x′ = [E] u   (Wilks Eq. 11.5; all the modes — the synthesis formula)

x′ ≈ [E_M] u_M   (Wilks Eq. 11.6a; with M PCs — the truncated synthesis formula)

or, in scalar form (Wilks Eq. 11.6b):

x′_k ≈ Σ_{m=1}^{M} e_km u_m ,   k = 1, …, K

The relative weight of a mode on a given variable can be quantified through the correlation between the two (Wilks Eq. 11.7):

r_{u,x} = corr(u_m, x_k) = e_km (λ_m / s_kk)^{1/2}
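A sketch of the truncated synthesis and of Eq. 11.7, reusing E, U, lambda, and S from the previous sketches:

    M    = 2;                        % number of retained modes
    Xrec = U(:, 1:M) * E(:, 1:M)';   % x' ~ contribution of the first M modes
    % Correlation between mode m and variable k (Eq. 11.7), as a K x K matrix:
    r_km = E .* sqrt(lambda') ./ sqrt(diag(S));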
As in linear regression or in spectral analysis, a quantity R²_m is defined as the fraction of the joint variance explained by a given mode (Wilks Eq. 11.4):

R²_m = λ_m / (Σ_{k=1}^{K} λ_k) × 100% = λ_m / (Σ_{k=1}^{K} s_kk) × 100%

Here R² is used in the same sense that is familiar from linear regression: the total variation exhibited by the original data is completely represented by the full set of K u_m's, since the sum of the variances of the centered data, Σ_k s_kk, equals the sum of the eigenvalues.

Beware of the (non-standard) terminology…

Wilks Table 11.2, a partial guide to synonymous terminology associated with PCA:
• Eigenvectors / eigenvector elements (e_km): EOFs, Loadings, Modes of Variation, Pattern Vectors, Principal Axes, Principal Vectors, Proper Functions, Principal Directions
• Principal components (u_m): Empirical Orthogonal Variables, Coefficients, Pattern Coefficients, Empirical Orthogonal Weights
• Principal component elements (u_i,m): Scores, Amplitudes, Expansion Coefficients
Let us return to the case of 2 variables, which is particularly useful because its linear transformation can be visualized.

Example from Wilks: Tmin′ at Ithaca (x1) and Canandaigua (x2)

X = [x1, x2] ,   r = 0.92

S = | 185.47  110.84 |
    | 110.84   77.58 |

Carrying out the PCA we obtain:

E = | 0.85  −0.53 |
    | 0.53   0.85 |

Λ = | 254.8    0   |
    |   0    8.29  |

λ1 / (λ1 + λ2) = 0.968

(S and Λ in °F²)


In this example the covariance matrix computed from simple (dimensional) anomalies is used.

In this case x1, having the larger variance, has a greater influence on the orientation of the new coordinates defined by e1 and e2.

This does not occur when the variables are normalized or — equivalently — the correlation matrix is used.
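The 2×2 example can be checked directly (values from the slide; the quoted outputs are approximate):

    S = [185.47 110.84; 110.84 77.58];   % Tmin' covariance matrix, deg F^2
    [E, L] = eig(S, 'vector');
    [lambda, idx] = sort(L, 'descend');  % ~[254.8; 8.3]
    E = E(:, idx);                       % first column ~[0.85; 0.53]
    fracVar = lambda(1) / sum(lambda);   % ~0.968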
Example with more than 2 variables (Wilks Table 11.1). The first four modes cumulatively account for 99.9% of the total variance of the data set. Computing the principal components from the correlation matrix ensures that variations of the temperature and precipitation variables are weighted equally.

(a) Covariance results:

Variable            Sample Variance    e1      e2      e3      e4      e5      e6
Ithaca ppt.         0.059 inch²       .003    .017    .002   −.028    .818   −.575
Ithaca Tmax         892.2 °F²         .359   −.628    .182   −.665   −.014   −.003
Ithaca Tmin         185.5 °F²         .717    .527    .456    .015   −.014    .000
Canandaigua ppt.    0.028 inch²       .002    .010    .005   −.023    .574    .818
Canandaigua Tmax    61.8 °F²          .381   −.557    .020    .737    .037    .000
Canandaigua Tmin    77.6 °F²          .459    .131   −.871   −.115   −.004    .003

Eigenvalues, λk:          337.7    36.9    7.49    2.38    0.065   0.001
Cumulative % variance:    87.8     97.4    99.3    99.9    100.0   100.0

(b) Correlation results:

Variable            Sample Variance    e1      e2      e3      e4      e5      e6
Ithaca ppt.         1.000             .142    .677    .063   −.149   −.219    .668
Ithaca Tmax         1.000             .475   −.203    .557    .093    .587    .265
Ithaca Tmin         1.000             .495    .041   −.526    .688   −.020    .050
Canandaigua ppt.    1.000             .144    .670    .245    .096    .164   −.658
Canandaigua Tmax    1.000             .486   −.220    .374   −.060   −.737   −.171
Canandaigua Tmin    1.000             .502   −.021   −.458   −.695   −.192   −.135

Eigenvalues, λk:          3.532    1.985   0.344   0.074   0.038   0.027
Cumulative % variance:    58.9     92.0    97.7    98.9    99.5    100.0


Modes of variability in geophysical variables:

This technique is widely used for studies of co-variability in spatially distributed variables.

The procedure is the same. In this case, the K variables will correspond to a given physical variable (e.g., PSL′) in different regions, or at grid points in the case of 2D (or ND) fields.


Modes of variability in geophysical variables (cont.):

The k-th variable corresponds to a sample (a time series) in a given region (grid point).

Each element of an EOF corresponds to the influence of the corresponding mode in a given region or grid point. A sketch of how a gridded field is arranged into the data matrix is given below.

If the data consist of L different variables at K gridpoints each, the dimensionality of the data vector is the product KL (see the combined EOFs further on).

[Wilks Figure 11.5: structure of the correlation matrix [R] (with K×K blocks [R_1,1] … [R_L,L]) and of the matrix of eigenvectors [E] (columns e1, e2, …, eM, stacking K rows per variable), for PCA of vector field data.]
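A sketch of the arrangement (array names and sizes are illustrative):

    % Arrange a gridded anomaly field into the data matrix before the PCA.
    [n, nlat, nlon] = size(Z);            % Z: e.g. Z500' as (time x lat x lon)
    X = reshape(Z, n, nlat * nlon);       % one column per grid point
    [E, U, lambda] = pca(X);
    eof1 = reshape(E(:, 1), nlat, nlon);  % first EOF mapped back onto the grid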
Example:

If we analyze annual PSL′ anomalies over 30 years in a region of 10×10 grid points, we will have an initial data matrix

X_30×100

and a covariance or correlation matrix

R_100×100

Mathematically, 100 modes are obtained from these, with their corresponding EOFs (e_k) and time series u_k.

In practice, there are only min(n−1, K) effective modes (the rest are null).

→ princomp(X, ‘econ’), pca

Likewise, the subset of modes that explain the largest joint variance (the first ones) will be the components to retain, and they give us very useful information, which can be (cautiously) interpreted physically.
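A sketch of the effective-modes point, with illustrative data:

    % With n = 30 maps of K = 100 grid points, only min(n-1, K) = 29 modes
    % carry variance; the economy-size option skips the null ones.
    X = randn(30, 100);
    [E, U, lambda] = pca(X, 'Economy', true);   % E is 100 x 29 here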


Example: modes 1, 2, and 3 of the monthly SST anomalies (global)

In general, the phases of natural modes of variability such as ENSO are not canonical (they do not always follow exactly the same pattern), so they can be said to be a combination of the modes described here (plus others).


Example: modes 1, 2, and 3 of the SST anomalies (global) (cont.)

5. Compare (quantitatively) the time series of the leading SST′ mode with the typical ENSO index, computed as the mean SST′ over the ‘Niño 3.4’ region. To support this comparison, make a new correlation map that shows the influence of this index on SST′ over the whole globe. How does the leading SST′ mode contrast with the one obtained from normalized SST′ series? Make a correlation map again to describe your result (try to use the same color palette and range in all of the previous plots).

The following figure shows the correlation between SST′ over the whole globe and the Niño 3.4 index. The pattern is almost identical to that of the leading SST′ mode described in the previous point. In fact, the two series have a very high correlation (0.92), as can also be seen in the scatterplot between the observations of the two series.

The three leading modes obtained from the normalized SST′ data are shown next. The modes obtained in the first instance (non-normalized data) can be recognized, above all the ‘ENSO’ configurations, which in this case emerge as the 2nd and 3rd components. It is worth highlighting the importance of this natural mode of variability, which emerges in the PCA even when the variance distribution of the SST is flattened by the normalization.

Example: monthly SLP′ 1960-2014 (SH only)

Leading modes: AAO or SAM (−); the PSA modes (Pacific–South American Modes), related to ENSO.

Correlation between each mode and precipitation: at the same time, these modes are associated with positive anomalies […] of the high-pressure centers, affecting south-central Chile in particular.

Examples in the literature: modes of low-frequency variability in the Southern Hemisphere

Low-Frequency Variability of Southern Hemisphere Sea Level Pressure and Weather System Activity

MARK R. SINCLAIR, JAMES A. RENWICK, AND JOHN W. KIDSON
National Institute of Water and Atmospheric Research, Ltd., Wellington, New Zealand
(Manuscript received 18 September 1996, in final form 7 January 1997; Monthly Weather Review, 125)

ABSTRACT
This study examines the month-to-month variations in the tracks of Southern Hemisphere weather systems and their relation to low-frequency circulation variability. Cyclones and anticyclones are identified and tracked from ECMWF analyses during 1980–94 via an automated method and the principal patterns of variation identified by EOF analysis of monthly track density anomaly fields. Only the first three EOFs of cyclone track density involving about one-third of the total track variance were distinguishable from noise. Spatial patterns derived from both unrotated and rotated EOF analysis were not reproducible on subsets of the data, pointing to secular changes in the variance structure of the cyclone dataset. An increase in cyclone numbers over the Southern Ocean during the 1980s suggested that detection of small-scale cyclones is sensitive to changes in data coverage and analysis procedure, as associated changes in the mean circulation were small during this period. EOFs of anticyclone track data were found to be more robust, indicating a variety of seesaw patterns across the hemisphere.
The leading modes of sea level pressure variability are found to be associated with regional variations in cyclone and anticyclone activity. The first pressure EOF, the so-called high-latitude mode, modulates cyclone activity between middle and high latitudes, with increased (decreased) westerlies near 55°–56°S accompanied by more (fewer) cyclones in the circumpolar regions and fewer (more) in middle latitudes. The second and third EOFs have centers of action near 60°S, 120°W and 55°S, 165°W, respectively, and are linked with blocking activity in these two regions. Blocks in the New Zealand sector occur in conjunction with a zonal wavenumber 3 pattern, while southeast Pacific blocks have no significant correlations outside the Pacific. A coherent cyclone response to ENSO was also found. During El Niño winters, increased cyclone activity occurs in a band spiraling southeastward from the subtropical Pacific toward South America, while fewer cyclones are found across the subtropical Indian Ocean, Australasia, and the southwest Pacific. During La Niñas, these patterns are almost exactly reversed, suggesting a predominantly linear cyclone response to ENSO.

[FIG. 6: Eigenvectors 1–3 of monthly MSL pressure anomalies, with positive (negative) values solid (dashed).]
[FIG. 7: (a)–(c) Distribution of the correlation coefficient between cyclone track density anomalies and the three leading pressure PCs; (d)–(f) as in (a)–(c) but for anticyclones.]
Examples in the literature: modes of low-frequency variability in the Southern Hemisphere

3. SPATIAL STRUCTURE
Figure 1(a) and (b) shows the spatial patterns of the PSA1 and PSA2 modes. They are the second and third EOF patterns of seasonal mean 500-hPa height anomalies for the SH with all seasons pooled together. (Pacific–South American modes; Int. J. Climatol., 21, 1211–1229, 2001)

[Figure 1: (a) EOF 2 (PSA1) and (b) EOF 3 (PSA2) of the 500-hPa seasonal mean height anomalies. Contour interval five non-dimensional units. Zero contours are omitted, positive loadings are shaded. (c) Same as (a) but for EOF 3 of the 200-hPa eddy streamfunction anomalies (psi200) with zonal means removed, and (d) same as (c) but for EOF 4.]

[Figure: Correlation between seasonal mean SSTA and the PSA1 PC for (a) DJF, (b) MAM, (c) JJA, and (d) SON. Contour interval is 0.1. Only correlations that are statistically significant are plotted. Zero contours are omitted. Positive values are shaded.]
Examples in the literature: modes of low-frequency variability in the Southern Hemisphere

Circulation Regimes and Low-Frequency Oscillations in the South Pacific Sector

ANDREW W. ROBERTSON AND CARLOS R. MECHOSO
Department of Atmospheric Sciences, University of California, Los Angeles, Los Angeles, California
(Manuscript received 27 April 2002, in final form 18 November 2002)

ABSTRACT
The characteristics of subseasonal circulation variability over the South Pacific are examined using 10-day lowpass-filtered 700-hPa geopotential height NCEP–NCAR reanalysis data. The extent to which the variability in each season is characterized by recurrent geographically fixed circulation regimes and/or oscillatory behavior is determined. Two methods of analysis (a K-means cluster analysis and a cross-validated Gaussian mixture model) both indicate three to four geographically fixed circulation regimes in austral fall, winter, and (to some extent) spring. The spatial regime structures are found to be quite similar in each season; they resemble the so-called Pacific–South American (PSA) patterns discussed in previous studies and often referred to as PSA 1 and PSA 2. Oscillatory behavior is investigated using singular spectrum analysis. This identifies a predominantly stationary wave with a period of about 40 days and a spatial structure similar to PSA 1; it is most pronounced in winter and spring and exhibits a noticeable eastward drift as it decays. The power spectrum of variability is otherwise well approximated by a red spectrum, together with enhanced broader-band 15–30-day variability.
The results presented herein indicate that low-frequency variability over the South Pacific is not dominated by a propagating wave whose quadrature phases are PSA 1 and PSA 2, as hitherto described. Rather, it is found that the variability is well described by the occurrence of three to four geographically fixed circulation regimes, with a (near) 40-day oscillation that is predominantly stationary in space. The potential subseasonal predictability implied by this duality is discussed. Only during austral spring is a strong correlation found between El Niño and the frequency of occurrence of the circulation regimes.
To keep in mind…

• We identify the EOFs as the eigenvectors of the system S E = E Λ.

• S is the covariance or correlation matrix of X. Λ is the equivalent matrix in the new orthogonal basis (a diagonal matrix; its entries are the eigenvalues).

• The time series associated with the different modes (the PCs) are obtained by projecting X onto E (U = Eᵀ X).

• By construction (symmetry of S), the EOFs are mutually orthogonal (uncorrelated) in space, and the PCs are uncorrelated in time.

• If the original variables are values of a 2D (or ND) field, the EOFs indicate the influence (direction and amplitude) of each mode in each region (spatial patterns).

• The PCs indicate the temporal evolution of the corresponding mode.


Since the PCs explain the maximum of the joint variability, they will be influenced by:

1. Modes coherent in time that are present in many variables (or in extensive regions).
> The spatial-projection problem must be addressed (mandatory); see the sketch below.

2. Variables or regions with high variance.
> Standardization of the variables should be considered (discretionary).

Specifically, in order not to over-represent the high latitudes in the case of a rectangular (latitude-longitude) projection, one must correct by the cosine of latitude (by cos(lat)^0.5 if applied to X).

The use of normalized variables, or equivalently of the correlation matrix, should be evaluated if one wants to damp the influence of regions with high variability (e.g., SLP or Z at mid-latitudes).

> The EOFs of normalized variables quantify only temporal coherence.
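A sketch of the latitude correction on a regular lat-lon grid (latK is an assumed 1×K vector with the latitude, in degrees, of each column of X):

    w  = sqrt(cosd(latK));               % cos(lat)^0.5 weights
    Xw = (X - mean(X)) .* w;             % apply the weights to the anomalies
    [E, U, lambda] = pca(Xw, 'Centered', false);  % data already centered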

[Figure: Mollweide projection (equal areas) vs. equirectangular (lat-lon) projection. In the equal-area view, N pixels represent a small area near the poles: high latitudes are over-represented in lat-lon data.]

Ex.: Global annual SLP′ (1960-2012) — EOF-1: X without any pre-treatment

Ex.: Global annual SLP′ (1960-2012) — EOF-1: X adjusted by latitude

Ex.: Global annual SLP′ (1960-2012) — EOF-1: X normalized and (then) adjusted by latitude

EOF of monthly SLP′ (May-October, non-normalized) vs. EOF of monthly SLP′ (May-October, normalized)


Combined EOFs

The EOF/PC analysis can be applied simultaneously to variables of a different nature.

To do so, it suffices to build the matrix X considering the multiple variables. From Wilks: the dimension of [R], and of the matrix of eigenvectors [E], will then be (KL × KL). Application of PCA to this kind of correlation matrix will produce principal components successively maximizing the joint variance of the L variables in a way that considers the correlations both between and among these variables at the K locations. This joint PCA procedure is sometimes called combined PCA, or CPCA.

[Wilks Figure 11.5: structure of the correlation matrix and of the matrix of eigenvectors for PCA of vector field data. The basic data consist of multiple observations of L variables at each of K locations, so the dimensions of both [R] and [E] are (KL × KL). The correlation matrix consists of (K × K) submatrices [R_i,j] containing the correlations between sets of the L variables jointly at the K locations; the submatrices on the diagonal of [R] are the ordinary correlation matrices for each of the L variables, while the off-diagonal submatrices contain correlation coefficients and are not symmetrical. Each eigenvector column of [E] stacks K elements per variable.]

In this case, if the variables are of diverse nature (with distinctive units and magnitudes), normalization, or the use of the matrix R, is essential.

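A sketch of a combined analysis of two fields of different nature (names and sizes are illustrative; standardizing each column is equivalent to using R):

    Z1 = (X1 - mean(X1)) ./ std(X1);   % e.g. SLP' standardized, n x K1
    Z2 = (X2 - mean(X2)) ./ std(X2);   % e.g. SST' standardized, n x K2
    [E, U, lambda] = pca([Z1, Z2]);    % joint EOFs span both fields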
Ex.: Combined global annual SLP′ and SST′ (1960-2012) — EOF-1

[Figure; the time-series panel shows PC1 together with −SOI]


Ex.: Combined SH monthly SLP′ and P_CL′ (1960-2014)

The spatial structures of the three leading principal components associated with SLP′ and P′ are illustrated in the upper and lower panels, respectively. These three modes explain 51% of the joint variance of the data; 80% of it is obtained by retaining 14 modes.


Use of different domains, interpolations, and reconstructions

• Once the PC/EOF analysis has been carried out in a certain region, the influence of a given mode can be examined over a larger area or in another region. This can be evaluated by computing the correlations between a given mode and the variable x in the region of interest, or through linear regression using the PCs as predictors (see the sketch below).

• Similarly, the period covered by the PCs can be extended (or a missing period filled in) using some reference predictor that explains a good part of the PCs' variance.

• Alternatively, combined EOFs can be used to make reconstructions in time.

See examples…
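A sketch of the first bullet (assumed names: U from the regional PCA; Y the variable over the larger domain, n × Kbig):

    m = 1;
    r = corr(U(:, m), Y);             % 1 x Kbig correlations (Statistics Toolbox)
    rmap = reshape(r, nlatB, nlonB);  % correlation map over the larger region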
Use of different domains, interpolations, and reconstructions (example)

EOF1 (U850, region)

r{ PC1(U850, region), U850, globe }

Garreaud et al. (2013)


Example: relationship between the circulation over the Pacific and Tmax in Chile

PCA of Z300 in the SH:

The leading modes explain only a few percent (~3%) of the total variance of Z300 in the domain considered. These patterns capture the features (maxima and minima identify ridges and troughs) of the synoptic low- and high-pressure systems that typically propagate westward, so they cannot be interpreted as salient modes of variability, like those occurring on longer time scales (e.g., ENSO). Indeed, the eigenvalue spectrum neither discriminates these modes clearly nor distinguishes them from the rest (Figure 2). To explain the bulk of the joint variance of Z300, 88 components must be considered.

[Figure 2: Explained variance R²_k (%), with typical error bars (North et al.), for the first 88 modes of variability of Z300.]

Low influence of Z300 on Tmax: data problems? Relatively larger influence of the SST at the coast?

b. Now compute the Pearson correlation coefficient between the PCs of the 2 leading modes of Z300′ and the Tmax anomalies (Tmax′). Use maps indicating the correlation at each grid point (restrict the analysis to the regions between 30°S and 40°S). Briefly describe the results in relation to what was obtained in (a).

[Figure: spatial patterns (EOFs) of the 1st and 2nd modes of variability of Z300′.]

[Figure 3: Correlation coefficient between modes 1 and 2 of Z300 and Tmax in different regions of central and southern Chile. The left panel shows the Tmax variance explained by the set of the first 88 modes.]
Example: relationship between the circulation over the Pacific and Tmax in Chile (cont.)

[Figure 4: spatial patterns (EOFs) of the 1st and 2nd modes of variability for Z300′ and Tmax (joint-PCA analysis).]

This joint-PCA method could also be used to build a predictive model for Tmax. In that case, a new group of 'truncated' PC series would have to be generated, projecting only the Z300 data onto the joint EOFs (considering the EOF elements corresponding to Z300), as sketched below. The truncated PCs would be used as predictors in a regression model for Tmax.
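A sketch of the truncated-PC idea (illustrative names: E and M from the joint PCA; K1 the number of Z300 columns; Z300a the standardized Z300 anomalies):

    iZ  = 1:K1;                 % rows of E belonging to the Z300 block
    Utr = Z300a * E(iZ, 1:M);   % project the Z300 data alone onto the joint EOFs
    % Utr would then serve as predictors in a regression model for Tmax.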
Example: reconstruction of precipitation in Chile

We want to estimate P at a large number of locations in Chile for the period before 1979. As a guide for the reconstruction, longer series (1960-2014) from only 13 stations will be used.

• Joint EOF/PCs of P
• Normalization of the variances (equal in both groups)
• Extension of the PCs (projections of group 1)
• Re-normalization of the extended PCs
• Reconstruction of P

(A sketch of these steps is given below.)

[Figure: EOF−1 and EOF−2 (loading vs. latitude) and the corresponding PC−1 and PC−2 score series, 1960-2010.]
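A sketch of the listed steps (all names illustrative: P holds the common-period station data; Zlong the normalized long series on the same columns; i13 indexes the 13 long stations):

    Zs   = (P - mean(P)) ./ std(P);              % normalized precipitation
    [E, U, lambda] = pca(Zs);                    % joint EOF/PCs, common period
    Uext = Zlong(:, i13) * E(i13, 1:M);          % extend PCs: project group 1
    Uext = Uext ./ std(Uext) .* std(U(:, 1:M));  % re-normalize the extended PCs
    Pfull = (Uext * E(:, 1:M)') .* std(P) + mean(P);   % reconstructed P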
Example: reconstruction of precipitation in Chile (cont.)

[Figure: correlation coefficient r between the original and the extended PCs, for components 1-12.]


Consideraciones prácticas

Convención de escala

Las EOFs conforman una base ortonormal, es decir: ||em|| = 1
> Es la solución común (e.g., función pca de Matlab)

Sin embargo, el problema de valores y vectores propios sigue siendo válido si se aplica un factor de escala cualquiera a los em. En tal caso, el significado del valor propio (y de la magnitud de cada um) también cambiará.

Tres escalamientos comunes y sus consecuencias para las PCs (um), su relación con las variables originales (xk) y con las estandarizadas (zk) (Wilks, Table 11.3; sk es la desv. estándar de xk):

  Escalamiento        E[um]   Var[um]   Corr(um, xk)        Corr(um, zk)
  ||em|| = 1            0       λm      ek,m (λm)0.5 / sk   ek,m (λm)0.5
  ||em|| = (λm)0.5      0       λm²     ek,m / sk           ek,m
  ||em|| = (λm)-0.5     0       1       ek,m λm / sk        ek,m λm
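Un chequeo breve de la primera convención, ||em|| = 1 (esquema ilustrativo en Python/NumPy, con datos sintéticos): con EOFs ortonormales, la varianza de cada PC es exactamente el valor propio λm.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))
Xc = X - X.mean(axis=0)

S = np.cov(Xc, rowvar=False)            # matriz de covarianza (K x K)
lam, E = np.linalg.eigh(S)              # valores y vectores propios (||em|| = 1)
lam, E = lam[::-1], E[:, ::-1]          # orden descendente

U = Xc @ E                              # PCs (Eq. 11.1)
print(np.allclose(U.var(axis=0, ddof=1), lam))   # True: Var(um) = λm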

Criterios de selección de PCs

• Definir una varianza mínima a explicar (e.g., 70%):

  ∑m=1..M R²m ≥ R²crit    (Wilks, Eq. 11.12)

  donde R²m = λm / ∑k λk es la fracción de varianza explicada por el modo m (Eq. 11.4). La elección de R²crit es en último término subjetiva; Jolliffe (2002) sugiere que 70% ≤ R²crit ≤ 90% suele ser un rango razonable.

• “Al ojo”, a partir del espectro de varianza (scree test):
> se busca un quiebre (codo) que separe los valores propios grandes de la cola de valores pequeños, y se retienen los modos anteriores al quiebre. El nombre sugiere más objetividad de la que hay: el criterio no involucra inferencia estadística cuantitativa.

• La varianza de modos que solo dan cuenta de ruido debería caer de forma exponencial en el espectro de valores propios. Se podría identificar como una recta utilizando escala logarítmica (log-eigenvalue (LEV) diagram); se retienen los modos cuyos log-valores propios quedan sobre la extrapolación de esa recta. Según el conjunto de datos, puede haber más de una porción cuasi-lineal, o límites poco claros.

[Figura (Wilks, Fig. 11.8): espectros de valores propios en escala lineal (a) y logarítmica (b) para un caso con K = 6; los últimos tres valores propios son muy pequeños y el quiebre sugiere truncar en M = 3.]
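El primer criterio se puede aplicar directamente; un esquema ilustrativo (la función ‘truncar’ y los λ son de este ejemplo, no del material original):

import numpy as np

def truncar(lam, r2_crit=0.7):
    """Número mínimo de PCs cuya varianza acumulada alcanza r2_crit (Eq. 11.12)."""
    r2 = lam / lam.sum()                 # R²m (Eq. 11.4)
    return int(np.searchsorted(np.cumsum(r2), r2_crit) + 1)

lam = np.array([4.0, 2.5, 1.5, 0.4, 0.3, 0.2, 0.1])   # espectro ilustrativo
M = truncar(lam, 0.7)                    # M = 2 (6.5/9.0 ≈ 0.72 ≥ 0.7)
# Para el criterio LEV, graficar np.log(lam) vs. m y buscar la porción recta.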

Criterios más objetivos:

• Cálculo del error a partir de pruebas no paramétricas con datos sintéticos, o de re-muestreos (experimentos Monte-Carlo).
> H0: las PCs representan ruido.
> Por ejemplo, se re-muestrean los datos (o bloques de ellos, para conservar parte de la estructura de autocorrelación), se recalculan los EOFs, y las múltiples realizaciones entregan intervalos de confianza (e.g., 95%) para el espectro (Hannachi et al.).

• Análisis asintótico
> Distribución y error muestral, North et al.: error típico de λ: δλ ∼ λ (2/n*)0.5
  (se asume Normalidad en la dist. de x)
  n*: grados de libertad (número de observaciones independientes). Una estimación (Thiébaux and Zwiers, 1984):
  n* = n [1 + 2 ∑k=1..n-1 (1 − k/n) ρ(k)]-1
  o, usando solo la autocorrelación lag-1: n* = n (1 − ρ1²)/(1 + ρ1²)

Regla general de North et al. (Wilks):
Si el error típico δλ ∼ λ (2/n*)0.5 es comparable o mayor a la diferencia entre λ vecinos, éstos representarán multipletos degenerados de EOFs.
> Dos o más modos “confundidos” (dentro del error) serán combinaciones lineales de los modos reales (de la población).

Notar que los valores propios muestrales tienen sesgo con muestras finitas: los mayores tienden a sobreestimarse y los menores a subestimarse, y el sesgo aumenta al disminuir el tamaño muestral.

[Figura (Hannachi et al.): espectro (%) de la matriz de covarianza de SLP de invierno, con errores típicos calculados con n = 3×52 = 156; los dos primeros valores propios parecen no degenerados y separados del resto, pero el espectro es suave, lo que dificulta el truncamiento.]

[Figura: espectro verdadero vs. estimado con n = 300 (δλ ≃ 1) y n = 1000 (δλ ≃ 0.6).]
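La regla de North et al. se puede aplicar directamente sobre el espectro estimado; un esquema ilustrativo en Python/NumPy (nombres y valores de este ejemplo):

import numpy as np

def north_errores(lam, n_eff):
    """Error típico δλ = λ·(2/n*)**0.5 y marca de los modos cuyo error es
    comparable o mayor que la distancia al λ vecino (multipletos degenerados)."""
    dlam = lam * np.sqrt(2.0 / n_eff)
    gap = np.abs(np.diff(lam))
    degenerado = np.zeros(lam.size, dtype=bool)
    degenerado[:-1] |= dlam[:-1] >= gap
    degenerado[1:] |= dlam[1:] >= gap
    return dlam, degenerado

lam = np.array([30.0, 25.0, 10.0, 9.5, 9.0, 5.0])   # espectro ilustrativo
dlam, deg = north_errores(lam, n_eff=156)           # los modos 3 a 5 resultan degenerados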


Ej: SLP’ mensual (may-oct, 1960-2012)

PC 1:3: ∑λPC/∑λ = 0.46

[Figuras: EOF de SLP’ mensual (mayo-octubre), sin normalizar y normalizado, en mapas con proyección polar; comparar este resultado con el que se obtiene al usar correlación o datos normalizados.]


Rotación de EOF (REOF)

¿Para qué?

Los valores y vectores propios y la descomposición lineal de X entregan información muy útil, pero introducen limitaciones no necesariamente presentes en la naturaleza (ortogonalidad).

Los procesos físicos no son independientes.

→ Como regla general, hay que tener cuidado al interpretar los modos secundarios.

La rotación de EOFs es un post-proceso que “relaja” la condición de ortogonalidad y crea patrones más localizados que, en ciertos casos, son más fáciles de interpretar.



Las R-EOF resultan de transformar linealmente (rotar) las EOFs asociadas a M componentes principales:

  ER = E(M) T

con T la matriz de rotación y E(M) = E(k=1:M).

Las nuevas PCs “rotadas” se obtienen al proyectar X sobre ER:

  UR = ERᵀ X


Si [T][T]ᵀ = [I], la rotación es ortogonal. En caso contrario, la rotación se dice oblicua.

Rotación ortogonal usando criterio Varimax para T

Richman (1986) lista 19 formas de definir la matriz de rotación [T] en busca de una “estructura simple”; con mucho, la más usada es la rotación ortogonal varimax (Kaiser 1958). Se busca maximizar la suma de las varianzas de los elementos (escalados) al cuadrado de los vectores rotados (Wilks, Eq. 11.23a):

  ∑m=1..M [ ∑k=1..K (e*k,m)⁴ − (1/K) ( ∑k=1..K (e*k,m)² )² ]

donde (11.23b)

  e*k,m = ẽk,m / [ ∑m=1..M ẽk,m² ]0.5

son versiones escaladas de los elementos ẽk,m de los vectores rotados (normal varimax; si se usan los ẽk,m sin escalar, se habla de raw varimax).

Maximizar este criterio tiende a llevar los elementos hacia sus valores absolutos máximos o mínimos (0 y 1), y genera patrones más simples y localizados. La solución es iterativa, y es una función estándar de muchos paquetes estadísticos. El resultado puede depender de cuántos de los vectores originales se rotan (rotar M o M+1 vectores puede dar patrones distintos; O’Lenic and Livezey, 1988), y no suele haber una respuesta clara para esa elección.
Caso K = 2

[Figura (Wilks, Fig. 11.11): comparación esquemática en K = 2 de (a) EOFs sin rotar (e1, e2), (b) REOFs con rotación ortogonal (ẽ1, ẽ2) y (c) REOFs con rotación oblicua, sobre nubes de puntos (x1, x2).]
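A modo de ilustración, un esquema mínimo del algoritmo varimax iterativo en Python/NumPy (versión estándar vía SVD; la función ‘varimax’ y sus argumentos son supuestos de este ejemplo, no del material original):

import numpy as np

def varimax(E, gamma=1.0, n_iter=100, tol=1e-8):
    """Rotación varimax (ortogonal) de las columnas de E (K x M).
    Devuelve los vectores rotados ER = E @ T y la matriz de rotación T."""
    K, M = E.shape
    T = np.eye(M)
    var_old = 0.0
    for _ in range(n_iter):
        ER = E @ T
        # Gradiente del criterio varimax (forma estándar de Kaiser)
        B = E.T @ (ER**3 - (gamma / K) * ER @ np.diag(np.sum(ER**2, axis=0)))
        U, s, Vt = np.linalg.svd(B)
        T = U @ Vt
        var_new = np.sum(s)
        if var_new - var_old < tol * var_new:
            break
        var_old = var_new
    return E @ T, T

Aplicado a las M EOFs retenidas (con o sin factor de escala previo), entrega ER y T; las RPCs se obtienen luego por proyección, UR = ERᵀ X.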
En cuanto a la convención de escala:

Si [T] es ortogonal y E no se escala antes de la rotación, las REOFs serán ortogonales.
> Las RPCs asociadas, sin embargo, no estarán descorrelacionadas entre sí.

Si se usa algún factor de escala, las REOFs no serán ortogonales.
> Las RPCs estarán descorrelacionadas solo para el caso ||ek|| = λk-0.5
Ej: SLP’ mensual (HS)

[Figura: EOFs vs. R-EOFs (rotación con 10 PCs).]

[Figura: PC2 y −SOI (series filtradas).]

  <PC2, SOI> = 0.55 ; <PC3, SOI> = 0.47 ; <RPC2, SOI> = 0.14 ; <RPC3, SOI> = 0.54

[Figura: REOF1, REOF2 y REOF3 con factor de escala λk0.5 vs. λk-0.5.]


Análisis de Correlación Canónica (CCA)

De forma similar a PCA, CCA es una metodología que permite caracterizar múltiples variables a partir de un grupo más pequeño de nuevas variables representativas del conjunto, en este caso llamadas ‘variables canónicas’.

CCA se enfoca específicamente en el análisis de dos grupos de variables.

Aunque con resultados diferentes, esta técnica guarda cierta similitud con el análisis
EOF/PC conjunto.


Si se tienen dos grupos de variables x e y, las variables canónicas se construyen como una transformación lineal de las variables contenidas en cada uno de estos grupos:

  vm = amᵀ x’ = ∑i=1..I am,i x’i ,  m = 1, …, min(I, J)    (Wilks, Eq. 12.2a)

  wm = bmᵀ y’ = ∑j=1..J bm,j y’j ,  m = 1, …, min(I, J)    (12.2b)

De manera equivalente a los EOFs, los coeficientes am,i y bm,j conforman una nueva base, y serán los vectores canónicos asociados a cada grupo de variables. A diferencia de PCA, estos vectores no son ortogonales ni de largo unitario, y cada par puede tener distinta dimensión (x’ y am tienen I elementos; y’ y bm tienen J).

De forma matricial:

  v = [A]ᵀ x’ ,  con dimensiones (M×1) = (M×I)(I×1)

  w = [B]ᵀ y’ ,  con dimensiones (M×1) = (M×J)(J×1)

donde las columnas de [A] y [B] son los vectores canónicos am y bm, y [RC] denotará la matriz diagonal (M×M) de las correlaciones canónicas rCm = Corr(vm, wm).
Se busca representar ambos grupos mediante un subconjunto con un mismo número de variables canónicas. Puesto que no necesariamente I = J, este número será M = min(I, J).

A diferencia de la base definida mediante EOFs, en la cual se busca maximizar la varianza conjunta de las variables originales, los vectores canónicos se definen de manera tal que la correlación entre cada par (vm, wm) sea máxima:

  Corr(v1, w1) ≥ Corr(v2, w2) ≥ … ≥ Corr(vM, wM) ≥ 0    (12.3a)

Se requiere también que las variables canónicas tengan varianza 1: Var(vm) = Var(wm) = 1.
Las variables canónicas conjuntas satisfacen (Wilks, Eq. 12.4a):

  Var[ (v ; w) ] = [ [Sv]  [Sv,w] ; [Sw,v]  [Sw] ] = [ [I]  [RC] ; [RC]  [I] ]

Las propiedades anteriores definen un único par de vectores canónicos. Los rCm se reconocen como coeficientes de correlación canónicos, con (12.4b):

  [RC] = diag(rC1, rC2, …, rCM)

Dado que el CCA se calcula a partir de datos centrados x’ e y’ (de medias cero), los promedios de vm y wm son también cero y los interceptos de una regresión entre ellas son nulos. Por construcción, la descomposición que se obtiene de CCA es particularmente útil para derivar predictores para modelos lineales multi-variados. Puesto que las variables canónicas están normalizadas, una variable canónica puede modelarse en función de la otra:

  ŵ = [RC] v

Una vez ajustado el CCA, el procedimiento básico de pronóstico es: centrar el campo predictor x’ (o sus primeras componentes principales), calcular las M variables canónicas vm con v = [A]ᵀ x’ (Eq. 12.5a), pronosticar ŵ = [RC] v, y sintetizar el mapa pronosticado ŷ a partir de sus variables canónicas predichas (Eq. 12.9b) para verlo de forma físicamente interpretable.

Como en PCA, el análisis CCA requiere resolver un problema de valores y vectores propios. En este caso, el conjunto de propiedades buscadas se puede alcanzar resolviendo el problema de V&V para el siguiente par de matrices simétricas (Wilks, Eqs. 12.25a,b):

  [Sxx]-1/2 [Sxy] [Syy]-1 [Syx] [Sxx]-1/2    (12.25a), de dimensión I × I

  [Syy]-1/2 [Syx] [Sxx]-1 [Sxy] [Syy]-1/2    (12.25b), de dimensión J × J

Las matrices raíz-cuadrada recíprocas deben ser simétricas (Wilks, Eq. 9.64), no derivadas de una descomposición de Cholesky; el cálculo de V&V de matrices simétricas es más fácil y numéricamente más estable que el de matrices generales.

Los primeros M valores propios de ambos sistemas serán idénticos. A partir de éstos se derivan los M coeficientes de correlación canónicos, como las raíces positivas de los valores propios no nulos:

  rCm = (λm)0.5 ,  m = 1, …, M    (12.26)

Los vectores propios de cada sistema (em, fm) se utilizan para derivar los vectores canónicos respectivos:

  am = [Sxx]-1/2 em    (12.27a)
  bm = [Syy]-1/2 fm    (12.27b)

Puesto que ||em|| = ||fm|| = 1, esta transformación asegura varianza unitaria de las variables canónicas; por ejemplo:

  var(vm) = amᵀ [Sxx] am = emᵀ [Sxx]-1/2 [Sxx] [Sxx]-1/2 em = emᵀ em = 1    (12.28)

Es buena idea hacer estos cálculos en doble precisión: la acumulación de errores de redondeo puede llevar a divisiones por cero cuando hay valores propios (casi) nulos, sobre todo si la dimensión K es grande.

Si el análisis se hace con variables estandarizadas, los vectores canónicos correspondientes se relacionan con los anteriores mediante las desviaciones estándar: a*m = am [Dx] (12.6). Independiente de si el CCA se calcula usando variables estandarizadas o no, las correlaciones canónicas resultantes son las mismas.

La relación entre las variables canónicas y las variables originales suele caracterizarse mediante su correlación.

Correlaciones homogéneas (entre el m-ésimo par de variables canónicas y las variables originales de su propio grupo):

  corr(vm, xᵀ) = amᵀ [Sx,x] [Dx]-1    (12.7a)
  corr(wm, yᵀ) = bmᵀ [Sy,y] [Dy]-1    (12.7b)

donde [Dx] (I×I) y [Dy] (J×J) son matrices diagonales con las desviaciones estándar de las variables x e y (Eq. 9.31).

Correlaciones heterogéneas (con las variables originales del otro grupo):

  corr(vm, yᵀ) = amᵀ [Sx,y] [Dy]-1    (12.8a)
  corr(wm, xᵀ) = bmᵀ [Sy,x] [Dx]-1    (12.8b)

Los vectores canónicos am y bm se eligen para maximizar las correlaciones entre las variables canónicas vm y wm; a diferencia de PCA, los patrones resultantes pueden o no representar una fracción importante de la varianza de cada grupo.
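Un esquema ilustrativo en Python/NumPy de un CCA vía las Eqs. 12.25–12.27 (las funciones ‘inv_sqrt_sym’ y ‘cca’ son de este ejemplo, no del material original):

import numpy as np

def inv_sqrt_sym(S):
    """Raíz cuadrada recíproca simétrica de una matriz simétrica def. positiva (cf. Eq. 9.64)."""
    lam, E = np.linalg.eigh(S)
    return E @ np.diag(lam ** -0.5) @ E.T

def cca(X, Y):
    """CCA vía el par de matrices simétricas de las Eqs. 12.25a,b.
    X: (n x I), Y: (n x J), datos centrados. Devuelve rC, [A] (I x M), [B] (J x M)."""
    n = X.shape[0]
    Sxx, Syy = X.T @ X / (n - 1), Y.T @ Y / (n - 1)
    Sxy = X.T @ Y / (n - 1)
    Sxxi, Syyi = inv_sqrt_sym(Sxx), inv_sqrt_sym(Syy)
    Mx = Sxxi @ Sxy @ np.linalg.inv(Syy) @ Sxy.T @ Sxxi    # (12.25a)
    My = Syyi @ Sxy.T @ np.linalg.inv(Sxx) @ Sxy @ Syyi    # (12.25b)
    lx, Ex = np.linalg.eigh(Mx); lx, Ex = lx[::-1], Ex[:, ::-1]
    ly, Ey = np.linalg.eigh(My); ly, Ey = ly[::-1], Ey[:, ::-1]
    M = min(X.shape[1], Y.shape[1])
    rC = np.sqrt(np.clip(lx[:M], 0.0, None))               # (12.26)
    A = Sxxi @ Ex[:, :M]                                   # am = [Sxx]^-1/2 em (12.27a)
    B = Syyi @ Ey[:, :M]                                   # bm = [Syy]^-1/2 fm (12.27b)
    # Ojo: el signo relativo de cada par (am, bm) es arbitrario y puede requerir ajuste.
    return rC, A, B

Las variables canónicas resultan de v = X @ A y w = Y @ B, con varianza 1 por construcción (Eq. 12.28).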


Reconstrucción de variables originales

Resolviendo las ecuaciones de análisis (Eq. 12.5) se obtienen las ecuaciones de síntesis:

  x’ = [Ã]-1 v    (12.9a)
  y’ = [B̃]-1 w    (12.9b)

Ojo: ~ denota matrices cuadradas (I ⨉ I o J ⨉ J). Si I = J, [Ã] = [A] y [B̃] = [B], y las inversas indicadas se pueden calcular. Para satisfacer esto en el caso en que I ≠ J, se deben incluir modos “fantasmas” (asociados a valores propios nulos), rellenando con 0s las filas faltantes de [B] (si I > J) o de [A] (si I < J).

Qué tan bien las variables canónicas representan la variabilidad subyacente depende de cuán exactamente las variables originales pueden sintetizarse desde ellas. Las covarianzas de cada grupo pueden escribirse en términos de las columnas ãm y b̃m de las inversas [Ã]-1 y [B̃]-1:

  [Sx,x] = (n − 1)-1 [X’]ᵀ[X’] = [Ã]-1 ([Ã]-1)ᵀ = ∑m ãm ãmᵀ    (12.11a)

y análogamente para [Sy,y] con los b̃m (12.11b).

Las variables canónicas no representan necesariamente modos importantes del conjunto de variables de cada grupo; es decir, no dan cuenta necesariamente de una mayor fracción de varianza conjunta. Por esto, es recomendable evaluar la varianza de cada grupo explicada por cada variable canónica, la cual se puede calcular como:

  R²m(x) = tr(ãm ãmᵀ) / tr([Sx,x])  ,  R²m(y) = tr(b̃m b̃mᵀ) / tr([Sy,y])    (12.12a)
Ejemplo Wilks (Example 12.1): CCA de las temperaturas de enero de 1987

  x = [Tmax, Tmin]ᵀ (Ithaca)  ,  y = [Tmax, Tmin]ᵀ (Canandaigua)

Vectores canónicos, valores propios y correlaciones canónicas (Wilks, Table 12.1; I = J = 2):

          a1 (Ithaca)   b1 (Canandaigua)   a2 (Ithaca)   b2 (Canandaigua)
  Tmax      .0923          .0946            −.1618          −.1952
  Tmin      .0263          .0338             .1022           .1907

  λm = 0.938, 0.593  →  rCm = (λm)0.5 = 0.969, 0.770

(El signo de los vectores canónicos es arbitrario: invertir el segundo par pondría pesos positivos en las máximas y negativos, de magnitud comparable, en las mínimas.)

Las inversas requeridas para la síntesis son (12.16b):

  [Ã]-1 = [ 7.466  −1.921 ; 11.820  6.743 ]  ,  [B̃]-1 = [ 7.740  −1.372 ; 7.923  3.840 ]

Las contribuciones de las variables canónicas a las matrices de covarianza dependen de los productos externos de las columnas de estas matrices (términos de las sumas en Eq. 12.11):

  ã1 ã1ᵀ = [ 55.74  88.25 ; 88.25  139.71 ]    (12.17a)
  ã2 ã2ᵀ = [ 3.690  −12.95 ; −12.95  45.47 ]    (12.17b)
  b̃1 b̃1ᵀ = [ 59.91  61.36 ; 61.36  62.77 ]    (12.17c)
  b̃2 b̃2ᵀ = [ 1.882  −5.279 ; −5.279  14.75 ]    (12.17d)

Por lo tanto, las fracciones de varianza de las temperaturas de Ithaca descritas por sus dos variables canónicas (Eq. 12.12a) son:

  R²1(x) = (55.74 + 139.71)/(59.52 + 185.47) = 0.798    (12.18a)
  R²2(x) = (3.690 + 45.47)/(59.52 + 185.47) = 0.202

y las fracciones correspondientes para Canandaigua:

  R²1(y) = (59.91 + 62.77)/(61.85 + 77.58) = 0.880
  R²2(y) = (1.882 + 14.75)/(61.85 + 77.58) = 0.120    (12.18b)
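Como verificación numérica del ejemplo, un esquema breve con los valores de la Eq. 12.16b (ilustrativo; los nombres son de este ejemplo):

import numpy as np

Ainv = np.array([[7.466, -1.921], [11.820, 6.743]])   # [Ã]^-1
Binv = np.array([[7.740, -1.372], [7.923, 3.840]])    # [B̃]^-1
Sxx = sum(np.outer(Ainv[:, m], Ainv[:, m]) for m in range(2))  # Eq. 12.11a
Syy = sum(np.outer(Binv[:, m], Binv[:, m]) for m in range(2))
R2x = [np.trace(np.outer(Ainv[:, m], Ainv[:, m])) / np.trace(Sxx) for m in range(2)]
R2y = [np.trace(np.outer(Binv[:, m], Binv[:, m])) / np.trace(Syy) for m in range(2)]
print(R2x, R2y)   # ≈ [0.80, 0.20] y [0.88, 0.12]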

Ejemplo: Reconstrucción de la precip. en Chile (PCA/CCA)

Se quiere estimar P en un número grande de localidades en Chile para el periodo anterior a 1979. Como guía para la reconstrucción, se usarán series más largas (1960-2014) de sólo 13 estaciones. Para representar ambos grupos con un mismo número de variables, utilizaremos 13 componentes; éstos explican el 100% y 94% de la varianza de cada grupo.

Método PCA/CCA:    ŵ(PCs(P378)) = RC v(PCs(P13))

b. Mediante la función ‘canoncorr’ de Matlab, y utilizando el periodo común a ambos grupos (1979-2014), se obtienen los vectores, variables y correlaciones canónicas de las PCs. La figura siguiente muestra las correlaciones canónicas para cada modo: son cercanas a 1 para los primeros modos, pero caen a valores bajo 0.8 del modo #8 en adelante.

[Figura: correlaciones canónicas r por componente (1–13).]

c. Se proyectan las PCs del grupo menor de estaciones sobre los vectores canónicos correspondientes para obtener variables canónicas extendidas (1960-2014). La figura siguiente muestra la serie de tiempo de la variable canónica principal obtenida con ‘canoncorr’ (curva negra), y la obtenida como la proyección de las PCs sobre el vector canónico asociado, considerando todo el periodo (en rojo). Éstas tienen, por definición, correlación 1.00 en el periodo común. Sin embargo, se tuvo que corregir un pequeño sesgo en la media: este sesgo resulta de que la función ‘canoncorr’ entrega variables canónicas centradas en 0 aunque las PCs utilizadas para calcularlas no lo sean.

[Figura: extensión del modo 1 (primera variable canónica), 1960–2015.]

d. Mediante una regresión lineal simple se calculan series extendidas de las variables canónicas asociadas al grupo mayor de estaciones. Con estas series y los vectores canónicos correspondientes se reconstruyen las PCs del grupo mayor entre 1960 y 2014, y finalmente se usan las PCs extendidas y las EOFs del grupo mayor para reconstruir la precipitación.

Se utilizaron las correlaciones canónicas para crear modelos simples de las variables canónicas del segundo grupo. Estos modelos son simplemente v̂k = rk uk. Las series extendidas (1960-2014) son, lógicamente, muy buenas estimaciones de las series originales en el caso de los primeros modos, y van empeorando para los modos mayores. Como ejemplo, la figura siguiente muestra la serie original (negro) y la estimada (extendida, rojo) del modo #5; este modo tiene asociada una correlación canónica de ~0.94, por lo que la reconstrucción hacia atrás en el tiempo es aún bastante confiable.

[Figura: segunda variable canónica, modelo del modo 5, 1960–2015.]
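El modelo de extensión v̂k = rk uk se reduce a unas pocas líneas; un esquema ilustrativo (supone A, B y rC de un CCA previo como el esbozado antes, con B cuadrada, J = M; nombres de este ejemplo):

import numpy as np

def extender_pcs(PC13_largo, A, B, rC):
    """Extiende las variables canónicas del grupo guía y reconstruye las PCs
    del grupo mayor (modelo simple v̂ = rC·u, regresión sin intercepto)."""
    u = PC13_largo @ A                 # variables canónicas del grupo guía (1960-2014)
    v_hat = u * rC                     # v̂k = rC,k · uk
    return v_hat @ np.linalg.inv(B)    # v = PC·B  =>  PC = v·B⁻¹ (B cuadrada)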


Ejemplo: Reconstrucción de la precip. en Chile

Se quiere estimar P en un número grande de localidades en Chile para el periodo anterior a 1979. Como guía para la reconstrucción, se usarán series más largas (1960-2014) de sólo 13 estaciones.

La figura siguiente muestra los mapas de anomalías de P para el periodo 1967-69. Los paneles de la izquierda y centro indican las anomalías resultantes de las reconstrucciones con PCA conjunta (Tarea 4) y PCA/CCA, respectivamente. La reconstrucción PCA/CCA es, en este caso, más verosímil comparada con las anomalías obtenidas del grupo de estaciones de referencia (panel derecho). Como se describió en la tarea anterior, la reconstrucción utilizando PCA conjunta muestra una dispersión importante en las estaciones de Chile central que no se aprecia en los datos de referencia.


Aplicación en campos de variables geofísicas (PCA/CCA)

Como en EOF/PCA, esta técnica se puede utilizar para relacionar dos variables
espacialmente distribuidas o disponibles en una grilla regular.

Las primeras variables canónicas serán la combinación lineal de las múltiples


observaciones (regiones) de cada variable que tengan mayor correlación.

Un problema importante en el uso de CCA es que se requiere que n >> M. Por lo


tanto, su aplicación en campos (donde M es usualmente muy grande) no es directa.

En este caso, la solución habitual es la aplicación de CCA no sobre las variables


originales, sino sobre un grupo menor de variables que las representen bien. La
estrategia es usar las componentes principales.

→ Esto es un análisis PCA/CCA.

De esta forma, como variables x e y de entrada en CCA, se utilizarán las M


componentes principales de cada grupo de variables.

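Un esquema ilustrativo de este flujo PCA/CCA con scikit-learn (supuesto alternativo a ‘canoncorr’ de Matlab; el CCA de sklearn usa un algoritmo iterativo y puede diferir levemente de la solución por valores propios; los datos sintéticos son solo para que el esquema corra):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 400))        # campo 1 (e.g., Z300 en grilla)
Y = rng.normal(size=(100, 50))         # campo 2 (e.g., Tmax en estaciones)

# 1) Reducción previa: M componentes principales de cada campo (se requiere n >> M)
M = 10
ux = PCA(n_components=M).fit_transform(X)
uy = PCA(n_components=M).fit_transform(Y)

# 2) CCA sobre las PCs (análisis PCA/CCA)
cca = CCA(n_components=M)
v, w = cca.fit_transform(ux, uy)
r = [np.corrcoef(v[:, m], w[:, m])[0, 1] for m in range(M)]   # correlaciones canónicas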


Ejemplo Wilks:

[Figura (Wilks, Fig. 12.1; Wallace et al. 1992): mapas de correlaciones homogéneas para un par de variables canónicas: (a) C2,2(SST), temperaturas superficiales del mar medias de invierno en el Pacífico Norte, y (b) C2,2(Z500), alturas de 500 mb hemisféricas de invierno (los paneles indican fracciones de varianza de 23% y 18%). El patrón de correlación de SST (y su negativo) se asocia al patrón PNA en las correlaciones de Z500. La correlación canónica de este par es r = 0.79.]

Los parámetros de un CCA pueden ser inestables (i.e., variar mucho de muestra en muestra) con muestras pequeñas (e.g., Bretherton et al. 1992; Cherry 1996).
Ejemplo Wilks (PCA/CCA para pronóstico estacional; Barnston 1994)

  w(JFM Ts CONUS) = RC v(SST globe, +++)

[Figura (Wilks, Fig. 12.2): porciones del primer vector canónico de los predictores (SST) en las tres estaciones (MAM, JJA y SON) que preceden al JFM para el cual se pronostica la temperatura superficial de EE.UU. Cada uno de los tres mapas expresa los seis elementos de a1 en términos de las 235 variables espaciales originales, a través de los elementos correspondientes de la matriz de vectores propios [E] de los predictores. From Barnston (1994).]

El rasgo más prominente de la Fig. 12.2 es la evolución progresiva de valores negativos crecientes en el Pacífico tropical oriental, que claramente representa un evento El Niño (cálido) en desarrollo cuando v1 < 0, y el desarrollo de un evento La Niña (frío) cuando v1 > 0, en la primavera, verano y otoño previos a la estación JFM a pronosticar.

[Figura (Wilks, Fig. 12.3): primer vector canónico para el predictando, la temperatura superficial de EE.UU. en JFM. From Barnston (1994).]
Ejemplo: Relación entre la circulación (Z300) y la Tmax en Chile central (solo mes de enero)

Para explicar el grueso de la varianza conjunta de Z300 se deben considerar los primeros 88 componentes (95.1%). El espectro de valores propios no discrimina claramente estos modos, ni los distingue de los siguientes (Figura 2).

Figura 1: Patrones espaciales (EOF) del 1er y 2do modo de variabilidad de Z300’.

Figura 2: Varianza explicada (R²k, %), y error típico (North et al.), de los primeros 88 modos de variabilidad de Z300.

b. Calcule, ahora, el coeficiente de correlación de Pearson entre Z300’ y las anomalías de Tmax (Tmax’). Utilice mapas qu[…] grilla (restrinja el análisis a las regiones comprendidas e[…]), y comente los resultados en relación a lo obtenido en a.

Figura 3: Coeficiente de correlación entre los modos 1 y 2 de Z300’ [y Tmax’ en estaciones] de Chile. El panel de la izquierda muestra la varianza de Tmax explicada por el conjunto de los primeros 88 modos.
Ejemplo: Relación entre la circulación (Z300) y la Tmax en Chile central (solo mes de enero)

Análisis PCA-conjunto:

Figura 4: Patrones espaciales (EOF) del 1er y 2do modo de variabilidad para Z300’ y Tmax (análisis PCA-conjunto).

Este método PCA-conjunto también se podría utilizar para construir un modelo predictivo de Tmax. En este caso habría que generar un nuevo grupo de series ‘truncadas’ de PCs, proyectando solamente los datos de Z300 sobre los EOF-conjuntos (considerando los elementos de las EOF correspondientes a Z300). Las PCs truncadas se utilizarían como predictores en un modelo de regresión de Tmax.

Análisis PCA/CCA:

a. Se analiza la covarianza de Z300 y Tmax utilizando PCA/CCA. Para ello, se calculan las EOF/PCs de Z300’ y de Tmax’ en el centro-sur de Chile (30-40ºS), separadamente, utilizando un mismo número de PCs en ambos grupos de variables, que debe explicar al menos el 95% de la varianza conjunta de cada grupo. ¿Cuántos modos se retienen?

Figura 6: Correlaciones homogéneas entre las variables canónicas y las variables originales correspondientes (Z300 y Tmax a la izquierda y derecha, respectivamente).
Técnicas de agrupamiento (Clustering)



Los métodos de agrupamiento tienen como objetivo clasificar datos de naturaleza no necesariamente conocida.

> Contrasta con el análisis discriminante, en el cual se tiene un conocimiento a priori de los grupos.

Son métodos de exploración más que de inferencia estadística.

Aplicaciones típicas en Ciencias Atmosféricas:

> Reconocimiento de patrones de circulación

> Clasificación de regímenes de tiempo

El objetivo es encontrar G grupos (clusters) a partir de una métrica que cuantifica la similitud entre sus elementos. Esta métrica medirá la distancia entre elementos de un grupo, la cual debe ser pequeña respecto de la distancia entre grupos.


En general, se definirá una matriz de distancia ∆ con elementos di,j = d(xi, xj).
En un espacio de dimensión K, la distancia entre dos vectores puede definirse de
varias maneras.

Entre ellas, las más utilizadas son:

• Euclidiana: d(x1, x2) = ||x1 – x2|| = [∑k (x1,k – x2,k)2].5

• Euclidiana ponderada: d(x1, x2) = [∑k wk (x1,k – x2,k)2].5


wk = 1 (Euclidiana)
wk = σk-1 (Estandarizada o de Karl Pearson)

• Minkowski d(x1, x2) = [∑k wk |x1,k – x2,k|λ]1/λ


λ=2 Eucl. ponderada
λ=1 City-block

• Mahalanobis: d(x1, x2) = [(x1 – x2)T S-1 (x1 – x2)].5

• Correlación: d(x1, x2) = 1 – < x1, x2 > (K > 2)
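Como referencia, un esquema de estas distancias en Python/NumPy (la función ‘distancias’ y sus argumentos son de esta ilustración):

import numpy as np

def distancias(x1, x2, S=None, w=None, lam=2):
    """Distancias usuales entre dos vectores de dimensión K (esquema ilustrativo)."""
    d = {}
    d["euclidiana"] = np.sqrt(np.sum((x1 - x2) ** 2))
    if w is not None:
        # Euclidiana ponderada; e.g., wk = σk**-1 (estandarizada o de Karl Pearson)
        d["ponderada"] = np.sqrt(np.sum(w * (x1 - x2) ** 2))
        d["minkowski"] = np.sum(w * np.abs(x1 - x2) ** lam) ** (1.0 / lam)
    if S is not None:
        dif = x1 - x2
        d["mahalanobis"] = np.sqrt(dif @ np.linalg.solve(S, dif))
    # 1 − correlación entre los elementos de ambos vectores (requiere K > 2)
    d["correlacion"] = 1.0 - np.corrcoef(x1, x2)[0, 1]
    return d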

Hierarchical clustering

Agglomerative method

Agglomerates N vectors (e.g., observations) starting from G = N groups.

At each step, the two groups closest to each other are merged into a new group.

After N − 1 iterations, all vectors form a single group.

Each stage and the resulting groups can be visualized in a tree diagram (dendrogram).
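A minimal sketch of the agglomeration and the dendrogram with SciPy (toy data; one of several libraries that implement the method):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram

    X = np.random.randn(30, 2)                  # N = 30 observations, K = 2

    # 'method' selects the between-group link discussed next:
    # 'single', 'complete', 'average', 'centroid', 'ward', ...
    Z = linkage(X, method='complete', metric='euclidean')

    dendrogram(Z)                               # tree of the N - 1 merges
    plt.xlabel('observation'); plt.ylabel('merge distance')
    plt.show()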


The grouping criterion depends on:

(1) the type of distance used
(2) the link between groups (with more than one member)

• Single linkage: δ(G1,G2) = min(di,j), i ∈ G1, j ∈ G2

• Complete: δ(G1,G2) = max(di,j)

• Average: δ(G1,G2) = (n1 n2)⁻¹ ∑i,j di,j

• Centroid: δ(G1,G2) = d(mean(xG1), mean(xG2))






Centroid and complete linkage are the most commonly used.

Single linkage tends to generate "chain" groups and can yield complex (non-compact) structures. It is less used, but can be useful in some cases.

[Figure: single vs. complete linkage]


Minimum-variance method (Ward)

The grouping criterion used at each step (G+1 → G) seeks to minimize the variance within the groups: among all possible ways of merging two of the G+1 groups to make G groups, the merger chosen is the one that minimizes the sum of squared distances between the points and the centroids of their respective groups.

For each possible merger of 2 groups, one computes:

    W = ∑g ∑i ||xi − x̄g||² = ∑g ∑i ∑k (xi,k − x̄g,k)²        (Wilks, Eq. 14.7)

with g = 1:G, i = 1:ng and k = 1:K. Then, among the G(G+1)/2 possible mergers, the one with the smallest W is chosen. For each trial pair, the centroid of the trial merged group is recomputed using the data of both previously separate groups before the squared distances are calculated; in effect, Ward's method minimizes the sum, over the K dimensions of x, of the within-group variances.

Note that W increases at each iteration:

    from G = N (initial stage) ⇒ W = 0
    to G = 1 (final stage) ⇒ W = N tr(Sx)

For data vectors whose elements have incommensurate units, operating on standardized values (dividing by the standard deviations) prevents artificial domination of the procedure by one or a few of the K variables.
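In SciPy the same criterion is available as method='ward'; a sketch on standardized toy data:

    import numpy as np
    from scipy.cluster.hierarchy import linkage

    X = np.random.randn(100, 3)
    X = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize incommensurate units

    # At each merge, pick the pair of groups giving the smallest increase in W.
    Z = linkage(X, method='ward')
    print(Z[:5])   # rows: [group_i, group_j, merge distance, size of new group]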











Cutting criteria

Hierarchical methods do not stop until G = 1, a condition that obviously is not the solution sought.

A criterion is needed to choose an intermediate stage, which in general is not very objective.

→ A baseline criterion is to look for a predetermined number of groups.

Otherwise, the distance between the groups merged at each stage can be used to define an objective cutting criterion.

With well-defined groups (i.e., short intra-group and large inter-group distances), the merge distance grows gradually from stage to stage while elements of well-defined groups are being agglomerated, and "explodes" at the stage where the groups themselves are mixed.

This jump is less clear with poorly defined groups, but a cutting criterion can still be applied using the rate of increase of the merge distance.

A Monte-Carlo method can be used to define a limiting distance.
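A sketch of the jump-based cut, using the fact that SciPy stores the merge distance of each stage in the third column of the linkage output (toy data; the "largest jump" rule is the heuristic described above, not a universal prescription):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    X = np.random.randn(60, 2)
    Z = linkage(X, method='complete')

    d = Z[:, 2]                        # merge distance at each stage
    jump = np.diff(d)                  # increase from one stage to the next
    stage = np.argmax(jump)            # stage just before the distance "explodes"
    threshold = 0.5 * (d[stage] + d[stage + 1])

    labels = fcluster(Z, t=threshold, criterion='distance')
    print('number of groups:', labels.max())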

Example with K = 2 (Wilks)

Distance: Karl Pearson. Linkage: complete.

TABLE 13.1 (Wilks) Average July temperature (°F) and precipitation (inches) for locations in three regions of the United States. Averages are for the period 1951-1980, from Quayle and Presnell (1991).

Group 1: Southeast U.S. (O)       Group 2: Central U.S. (X)         Group 3: Northeast U.S. (+)
Station           Temp.  Ppt.     Station           Temp.  Ppt.     Station           Temp.  Ppt.
Athens, GA        79.2   5.18     Concordia, KS     79.0   3.37     Albany, NY        71.4   3.00
Atlanta, GA       78.6   4.73     Des Moines, IA    76.3   3.22     Binghamton, NY    68.9   3.48
Augusta, GA       80.6   4.40     Dodge City, KS    80.0   3.08     Boston, MA        73.5   2.68
Gainesville, FL   80.8   6.99     Kansas City, MO   78.5   4.35     Bridgeport, CT    74.0   3.46
Huntsville, AL    79.3   5.05     Lincoln, NE       77.6   3.20     Burlington, VT    69.6   3.43
Jacksonville, FL  81.3   6.54     Springfield, MO   78.8   3.58     Hartford, CT      73.4   3.09
Macon, GA         81.4   4.46     St. Louis, MO     78.9   3.63     Portland, ME      68.1   2.83
Montgomery, AL    81.7   4.78     Topeka, KS        78.6   4.04     Providence, RI    72.5   3.01
Pensacola, FL     82.3   7.18     Wichita, KS       81.4   3.62     Worcester, MA     69.9   3.58
Savannah, GA      81.2   7.37

Averages:         80.6   5.67                       78.7   3.57                       71.3   3.17
Example with K = 2 (Wilks)

Distance: Karl Pearson. Linkage: complete vs. single.

Single linkage yields somewhat different results. Figure 14.5a shows the distances at which groups are merged for the data in Table 13.1, according to single linkage operating on Karl-Pearson distances. There is a large jump after stage 21, suggesting a possible natural stopping point with seven groups. These seven groups are indicated in Figure 14.5b, which can be compared with the complete-linkage result in Figure 14.4. The clusters denoted G2 and G6 in Figure 14.4 occur also in Figure 14.5b. However, one long and thin group has developed in Figure 14.5b, composed of stations from G3, G4, and G5. This result illustrates the chaining phenomenon to which single-linkage clusters are prone: additional stations or groups are accumulated that are close to a point at one edge or another of a group, even though the added points may be quite far from other points in the same group.

FIGURE 14.5 (Wilks) Clustering of the data in Table 13.1 using single linkage. (a) Merge distance as a function of stage, showing a large jump after 22 stages. (b) The seven resulting clusters (standardized temperature vs. standardized precipitation), illustrating the chaining phenomenon.


Example with K = 2 (winter Tmax regime in central-southern Chile)


Example with K = 2 (precipitation regime observed at three stations in Chile)

Using a Euclidean distance metric, the results are identical whether a centroid or a complete linkage is used. In both cases, one of the resulting groups confuses the data from stations #2 and #3, while the other two groups are built from observations of station #1. With the Mahalanobis distance, the groups tend to deform in the direction of the covariance between the two variables, and the result is worse than with the Euclidean distance. As an example, note that in the centroid-linkage grouping the resulting group 2 (red) shares observations from all three stations.

[Figure: four panels, linkage (centroid/complete) x distance (euclidean/mahalanobis); annual precipitation (mm) vs. July−January precipitation difference (mm).]


Clustering with K > 2

The classification of circulation patterns is a common application of clustering with multiple variables.

In this case, the elements of the groups correspond to the observations (time axis) of a given variable, and the dimension of the problem (K) to the values of the variable in space.

As in PCA, the geographic projection of the data must be taken into account.

• Pros: simple implementation and interpretation

• Cons: does not account for the phases of a single mode of variability

> second-level classification (e.g., of a composite)

Ejemplo: SLP’ anual


SLP’ anual, agrupamiento forzado a 6 grupos (se muestran los 1ros 4):
6 grupos (se muestran los primeros 4):
Vínculo completo, distancia euclidiana
Distancia: Euclidiana Vínculo: Completo

años

G1 G2 G3 G4

GF601 - semestre otoño 2022


Ejemplo: SLP’ anual


Vínculo
6 gruposcompleto, correlación
(se muestran los primeros 4):
G1 G2 G3 G4
Vínculo completo, correlación
Distancia: Euclidiana Vínculo: Completo
G1 G2 G3 G4

SLP’ mensual:
Vínculo completo, correlación
SLP’mensual:
SLP’ mensual Distancia: Correlación Vínculo: Completo
G1correlación
Vínculo completo, G2 G3 G4

G1 G2 G3 G4

GF601 - semestre otoño 2022


Non-hierarchical clustering: K-means

This is an iterative method in which, unlike in hierarchical clustering, an element of a group can be reassigned to another group (with no merging involved) from one iteration to the next.

It is a robust method (although the solution depends on the initial conditions), but the number of groups must be set in advance.

[Diagram: (0) initialization → (1) centroids → (2) distances → (3) xi changes group → ... → convergence]

Algorithm

step (0): an initial set of G groups is defined (at random)

while (convergence criterion: stabilization of the centroids)

    (1): compute the centroids (mean of each group)
    for i = 1:N (vectors or observations)
        (2): compute the distance between vector i and the centroid of each group
        if vector i is not in the group that corresponds to it (the closest one)
            (3): reassign the "membership" of xi to the closest group
            break (return to step 1)
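A compact numpy sketch of the algorithm above (batch reassignment rather than the one-at-a-time update of the pseudocode; no handling of groups that become empty):

    import numpy as np

    def kmeans(X, G, n_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=G, replace=False)]    # step (0)
        for _ in range(n_iter):
            # step (2): distance from every vector to every centroid
            d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = d.argmin(axis=1)                               # step (3)
            new = np.array([X[labels == g].mean(axis=0) for g in range(G)])
            if np.allclose(new, centroids):                         # convergence
                return labels, centroids
            centroids = new                                         # step (1)
        return labels, centroids

Because step (0) is random, different seeds can converge to different solutions, which is the dependence on initial conditions noted above.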

Example with K = 2 (precipitation regime observed at three stations in Chile)

c. Now compute 3 groups using the k-means method. Compare with the previous point.

The k-means clustering method shows a clearly better performance in this case. Three groups are obtained that correspond fairly well to the three stations, even though their elements tend to get mixed up, above all those of stations #... Out of the 36 observations per station, 3 observations from stations #1 ... were misclassified (they were assigned to group 1 of the analysis, in blue), and observations from station #2 ('o') were classified into another group. Note that, since the initial set is defined at random, this result is not unique.

[Figure: K-means clustering; annual precipitation (mm) vs. July−January precipitation difference (mm).]


Example with monthly SLP

Figure 8: Maps of the 3 leading modes from the EOF analysis of SLP' (first row), and the same result using the correlation matrix (second row).

Clustering analysis of days with precipitation in Santiago, based on the atmospheric water content

Figure 5: Mean precipitable-water anomalies for the wet set (left), and the means of the first cluster (center) and the second (right), using the k-means method with a correlation metric.

Question 6:

Finally, Figure 6 shows basic precipitation statistics for Santiago associated with the wet set and with the clusters found in the previous point. It stands out that precipitation events in Santiago occur mostly in the winter season, while the summer season has very few recorded events.

Figure 6: Basic statistics of precipitation in Santiago for the period 2000-2019.
Discriminant Analysis

This technique determines discrimination rules in a population X with predefined groups.

The general idea is to find the linear combination of the variables that differentiates these groups as much as possible.

(Mahalanobis distance)

Statistical distance reflects the dispersion of the data: distances are measured relative to the scatter, so a point can be closer to the origin than another in the Euclidean sense and still be less unusual in the statistical sense.

2D case without correlation. If the two variables are uncorrelated, a distance measure that reflects unusualness in the context of the data scatter can be defined simply as

    D² = (x1 − x̄1)²/s1,1 + (x2 − x̄2)²/s2,2        (Wilks, Eq. 9.7)

For a fixed Mahalanobis distance D², Equation 9.7 defines an ellipse on the data plane, and that ellipse is a circle if s1,1 = s2,2. Extending Equation 9.7 to three dimensions by adding a third term for x3, the set of points at a fixed D² constitutes an ellipsoid, which will be spherical only if all three variances are equal.

FIGURE 9.2 (Wilks) Distance in the context of data scatters centered at the origin. (a) The standard deviation of x1 is approximately three times larger than that of x2: point A is closer to the origin in Euclidean distance, but point B is less unusual relative to the data scatter, and so is closer in statistical distance. (b) The same points rotated through an angle θ = 40°.

2D case with correlation. In general the variables within a multivariate data vector x will not be uncorrelated, and these correlations must also be accounted for when defining distances. In terms of the original variables x1 and x2, the Mahalanobis distance takes the form

    D² = a1,1 (x1 − x̄1)² + 2 a1,2 (x1 − x̄1)(x2 − x̄2) + a2,2 (x2 − x̄2)²

where the coefficients a1,1, a1,2 and a2,2 are fairly complicated functions of the rotation angle θ and of the covariances s1,1, s1,2 and s2,2.

Better in matrix form: the Mahalanobis distance between points x and y in their K-dimensional space is

    D² = (x − y)ᵀ [S]⁻¹ (x − y)

where [S] is the covariance matrix in the context of which the distance is being measured. If the K variables are mutually uncorrelated, D² reduces to the sum of the squared standardized anomalies zk, as in Equation 9.7; when some or all of the variables are correlated, the Mahalanobis distance accounts for the correlations as well.
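A direct numpy sketch of the matrix form (toy numbers; solve is used instead of forming S⁻¹ explicitly):

    import numpy as np

    def mahalanobis2(x, y, S):
        """Squared Mahalanobis distance D^2 = (x - y)^T S^-1 (x - y)."""
        diff = np.asarray(x, float) - np.asarray(y, float)
        return diff @ np.linalg.solve(S, diff)

    # With S = identity, D^2 reduces to the squared Euclidean distance.
    S = np.array([[1.76, 0.37],
                  [0.37, 0.85]])
    print(mahalanobis2([1.9, 2.1], [0.0, 0.0], S))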
Fisher's linear discriminant for separating two groups (LDA)

The methodology applies to 2 groups of data, X1 and X2, of dimension K.

→ It concerns populations with equal covariance matrices: the co-variability among the K variables is assumed to be the same in both groups.

A global covariance matrix of X is estimated from the samples of both groups as a weighted average:

    Spool = [(N1 − 1) S1 + (N2 − 1) S2] / (N1 + N2 − 2)

where NG is the number of observations in group G.



The goal of the method is to find a vector a (the discriminant vector) in the space defined by K that differentiates the two groups as much as possible.

This vector is defined so that the distance between the group means, relative to the covariance matrix, is maximized:

    a = Spool⁻¹ [mean(X1) − mean(X2)]

Note the contrast with the principal component of a single group of data.

[Figure: 2D example of the discriminant vector a.]

The distance from an observation x(n) to a group g, relative to the covariance matrix, is the Mahalanobis distance

    aᵀ [x(n) − mean(Xg)]

and it measures how statistically far x(n) is from g (how improbable it is).

The discriminant vector allows the Mahalanobis distance between the two groups (between their means) to be computed:

    D² = aᵀ [mean(X1) − mean(X2)]

The projection of any element of X onto a defines the discriminant function δ1:

    δ1(n) = aᵀ x(n)

δ1 is a scalar that makes it possible to discriminate/classify the observations of the sample and to predict the group membership of new observations.

> The projections onto a are analogous to the new variables in PCA or CCA.

The value of δ1 located at the midpoint between the samples of the two groups is used as the discriminant threshold:

    m = 0.5 aᵀ [mean(X1) + mean(X2)]

Then, an observation x(n) is assigned to group 1 if

    aᵀ x(n) ≥ m
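A minimal numpy sketch of the whole procedure (toy inputs, no validation; rows are observations):

    import numpy as np

    def fisher_lda(X1, X2):
        n1, n2 = len(X1), len(X2)
        S1 = np.cov(X1, rowvar=False)
        S2 = np.cov(X2, rowvar=False)
        Spool = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
        dmean = X1.mean(axis=0) - X2.mean(axis=0)
        a = np.linalg.solve(Spool, dmean)                    # a = Spool^-1 (m1 - m2)
        m = 0.5 * a @ (X1.mean(axis=0) + X2.mean(axis=0))    # discriminant threshold
        return a, m

    # A new observation x is assigned to group 1 if a @ x >= m, else to group 2.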

Let us look at the K = 2 example described in Wilks:

G1: T, P in the southeast U.S.
G2: T, P in the central U.S.

(Data from Table 13.1 above.)

N1 = 10; N2 = 9

→ Spool = (9 S1 + 8 S2) / 17 = | 1.76  0.37 |
                               | 0.37  0.85 |

   Spool⁻¹ = |  0.63  −0.27 |
             | −0.27   1.30 |

mean(X1) − mean(X2) = | 1.9 |
                      | 2.1 |

→ a = Spool⁻¹ [mean(X1) − mean(X2)] = | 0.62 |
                                      | 2.20 |

FIGURE 13.1 (Wilks) Illustration of the geometry of linear discriminant analysis applied to the southeastern (circles) and central (×) U.S. data in Table 13.1, showing the discriminant function δ1 and the threshold m in the temperature-precipitation plane.

Another 2D example: 2 variables at two observation stations.

The '+' and 'o' symbols mark the two stations, respectively, and the colors indicate the classification from the analysis. In fact, the projections onto the discriminant vector (a) of only two elements of group #1 fell below the critical value m (see dashed line), and they were therefore erroneously classified into group #2.

[Figure: discriminant analysis; annual precipitation (mm) vs. July−January precipitation difference (mm), with the dividing line a·x = m.]

Now let us look at an example with K > 2...

Similarly to what was evaluated with PCA and CCA, let us use a field whose grid points define the K variables, and whose time steps define the N observations.

Take the annual SST anomalies between 1960 and 2012 over a tropical domain, and predefine 2 groups:

G1: El Niño
G2: non-El Niño

For this we use the El Niño years since 1960: {1963 1965 1969 1972 1977 1982 1987 1991 1992 1993 1994 1997 2002 2004 2006 2009}

→ N = 53; N1 = 16; N2 = 37

Problem: the spatial domain must not be too large, so that the inverse of Spool can be computed.

> As in CCA, this problem can be addressed using principal components.
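A hedged sketch of that reduction (hypothetical array names): keep the leading PCs of the SST field and discriminate in PC space, where Spool is small and invertible:

    import numpy as np

    sst_anom = np.random.randn(53, 500)          # stand-in for (years x grid points)

    U, s, Vt = np.linalg.svd(sst_anom, full_matrices=False)
    pcs = U[:, :10] * s[:10]                     # leading 10 PC series (53 x 10)

    is_nino = np.zeros(53, dtype=bool)           # El Niño years; indices = year - 1960
    is_nino[[3, 5, 9, 12, 17, 22, 27, 31, 32, 33, 34, 37, 42, 44, 46, 49]] = True
    X1, X2 = pcs[is_nino], pcs[~is_nino]         # N1 = 16, N2 = 37

    # ... then apply fisher_lda(X1, X2) from the sketch above, in PC space.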

Multiple Discriminant Analysis (MDA)

This is the generalization of LDA to discriminate among G groups. In this case, one seeks J discriminant vectors and functions:

    δj(n) = ajᵀ x(n),  j = 1:J;  J = min(G − 1, K)

With G predefined groups, the estimate of the covariance matrix from the sample of each group is given by:

    Spool = (N − G)⁻¹ ∑g (Ng − 1) Sg

Spool measures the mean co-variability between observations within each group.

To determine the discriminant vectors A = (a1, a2, ..., aJ), a measure of the co-variability between groups, SB, must also be taken into account.

Analogously to Sg:

    SB = (G − 1)⁻¹ ∑g [mean(Xg) − mean*(X)] [mean(Xg) − mean*(X)]ᵀ

with mean*(X) = N⁻¹ ∑g Ng mean(Xg)

Similar to the procedure for obtaining the orthogonal basis in PCA, A is obtained by solving the eigenvalue/eigenvector problem for the matrix Spool⁻¹ SB:

    Spool⁻¹ SB E = E L

Typically, one requires ajᵀ Spool aj = 1:

    → aj = ej / (ejᵀ Spool ej)^1/2

The classification rule for an observation x(n) is determined by its distance to group g in the basis defined by A:

    D²(x(n), g) = ∑j [ajᵀ (x(n) − mean(Xg))]²

x(n) is assigned to group g if:

    D²(x(n), g) ≤ D²(x(n), h), for all h ≠ g

Example from Wilks for K = 2 and G = 3...
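A sketch of MDA following the formulas above (toy inputs; scipy.linalg.eigh solves the generalized eigenproblem and already normalizes so that ajᵀ Spool aj = 1):

    import numpy as np
    from scipy.linalg import eigh

    def mda(groups):
        """groups: list of (Ng, K) arrays with the predefined groups."""
        G = len(groups)
        N = sum(len(Xg) for Xg in groups)
        K = groups[0].shape[1]
        Spool = sum((len(Xg) - 1) * np.cov(Xg, rowvar=False)
                    for Xg in groups) / (N - G)
        gmeans = np.array([Xg.mean(axis=0) for Xg in groups])
        mean_star = sum(len(Xg) * m for Xg, m in zip(groups, gmeans)) / N
        SB = sum(np.outer(m - mean_star, m - mean_star) for m in gmeans) / (G - 1)
        evals, E = eigh(SB, Spool)               # SB e = l Spool e
        order = np.argsort(evals)[::-1]
        A = E[:, order[:min(G - 1, K)]]          # discriminant vectors as columns
        return A, gmeans

    def classify(x, A, gmeans):
        D2 = ((A.T @ (x - gmeans).T) ** 2).sum(axis=0)   # D^2(x, g) for each g
        return int(np.argmin(D2))                        # group with smallest D^2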

