
Lecture 14. HGR Maximal Correlation (After-Class)

Notes: HGR correlation is a correlation metric from statistics. Compared with commonly used correlation measures, it has the advantage of capturing non-linear statistical dependence. The HGR maximal correlation can be computed by the ACE (Alternating Conditional Expectations) algorithm, sketched below.
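As a concrete illustration, here is a minimal numpy sketch of the ACE idea for two discrete variables: alternate between setting g(y) ∝ E[f(X) | Y = y] and f(x) ∝ E[g(Y) | X = x], standardizing each to zero mean and unit variance under the corresponding marginal. The function name and interface are my own illustration, not from the lecture.

```python
import numpy as np

def ace_maximal_correlation(P_xy, iters=200, seed=0):
    """Estimate the HGR maximal correlation of a discrete pair (X, Y)
    via alternating conditional expectations (a power-iteration sketch).

    P_xy: joint pmf as a |X| x |Y| array. Assumes X and Y are dependent;
    if they are independent, the normalization below divides by zero.
    """
    rng = np.random.default_rng(seed)
    P_x = P_xy.sum(axis=1)                 # marginal pmf of X
    P_y = P_xy.sum(axis=0)                 # marginal pmf of Y
    f = rng.standard_normal(P_x.size)      # random initial feature of X
    for _ in range(iters):
        g = (P_xy * f[:, None]).sum(axis=0) / P_y   # g(y) = E[f(X) | Y = y]
        g -= P_y @ g                                # enforce E[g(Y)] = 0
        g /= np.sqrt(P_y @ g**2)                    # enforce E[g^2(Y)] = 1
        f = (P_xy * g[None, :]).sum(axis=1) / P_x   # f(x) = E[g(Y) | X = x]
        f -= P_x @ f                                # enforce E[f(X)] = 0
        f /= np.sqrt(P_x @ f**2)                    # enforce E[f^2(X)] = 1
    return f @ P_xy @ g, f, g              # rho = E[f(X) g(Y)], plus features
```

On the binary example worked out later in these notes, this iteration converges to ρ = 1/3.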

HGR: Hirschfeld-Gebelein-Rényi

Setup
Given two discrete random variables X and Y, how can we measure the correlation between them?

Example (Pearson Correlation Coefficient):

$$\gamma(X, Y) \triangleq \frac{\mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])]}{\sqrt{\operatorname{var}(X)\operatorname{var}(Y)}}$$

However, γ(X, Y) = 0 ⇏ X, Y are independent — see the counterexample below.
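For instance (a minimal numpy check; the specific distribution is my choice): take X uniform on {−1, 0, 1} and Y = X². Then Y is a deterministic function of X, yet γ(X, Y) = 0.

```python
import numpy as np

# X uniform on {-1, 0, 1}, Y = X^2: fully dependent, yet uncorrelated.
x = np.array([-1.0, 0.0, 1.0])
y = x ** 2
p = np.full(3, 1 / 3)                   # uniform pmf on the three points
cov = p @ (x * y) - (p @ x) * (p @ y)   # E[XY] - E[X] E[Y]
print(cov)                              # 0.0, so gamma(X, Y) = 0
```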

A good correlation measure ρ(X, Y) should satisfy:

1. Symmetry: ρ(X, Y) = ρ(Y, X)
2. 0 ≤ ρ(X, Y) ≤ 1
3. ρ(X, Y) = 0 if and only if X, Y are independent
4. ρ(X, Y) = 1 if Y = ξ(X) or X = η(Y), for some deterministic functions ξ, η
5. For one-to-one functions ξ, η: ρ(ξ(X), η(Y)) = ρ(X, Y)

Example (Pearson Correlation Coefficient):

$$\gamma(X, Y) \triangleq \frac{\mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])]}{\sqrt{\operatorname{var}(X)\operatorname{var}(Y)}}, \quad -1 \le \gamma(X, Y) \le 1$$

|γ(X, Y)| satisfies (1), (2), but not (3), (4), (5).

|γ(X, Y)| = 1 iff X = aY + b or Y = cX + d for some constants a, b, c, d.

Example (Mutual Information):



$$I(X; Y) \triangleq \sum_{x, y} P_{XY}(x, y) \log \frac{P_{XY}(x, y)}{P_X(x) P_Y(y)}$$

I(X; Y) satisfies (1), (3), (5), but not (2), (4): for instance, I(X; X) = H(X) can exceed 1, so (2) fails, and when Y = ξ(X) the value I(X; Y) = H(Y) depends on the distribution rather than equaling 1, so (4) fails.


Can we find some ρ(X, Y) satisfying (1)-(5)?

Definition: The HGR maximal correlation ρ(X, Y) is defined as

$$\rho(X, Y) \triangleq \max_{\substack{f: \mathcal{X} \to \mathbb{R},\ g: \mathcal{Y} \to \mathbb{R} \\ \mathbb{E}[f(X)] = \mathbb{E}[g(Y)] = 0 \\ \mathbb{E}[f^2(X)] = \mathbb{E}[g^2(Y)] = 1}} \mathbb{E}[f(X)\, g(Y)]$$

Equivalently, $\rho(X, Y) = \max_{f, g} \gamma(f(X), g(Y))$.


Given data X, Y, we want to find features of X and of Y that are maximally related; the optimizing f, g are the HGR maximal correlation functions.

ρ(X, Y) satisfies (1)-(5).


Check (5): for one-to-one functions ξ, η,

$$\rho(\xi(X), \eta(Y)) = \max_{f, g} \gamma(f(\xi(X)), g(\eta(Y))) = \max_{f', g'} \gamma(f'(X), g'(Y)) = \rho(X, Y)$$

since $f' = f \circ \xi$ and $g' = g \circ \eta$ range over all functions as f, g do, ξ and η being one-to-one.

Check (4): If Y = ξ(X), let $K: \mathcal{Y} \to \mathbb{R}$ be a one-to-one function such that $\mathbb{E}[K(Y)] = 0$ and $\mathbb{E}[K^2(Y)] = 1$; then take $f(X) = K(\xi(X))$, $g(Y) = K(Y)$.

$$\Rightarrow \mathbb{E}[f(X) \cdot g(Y)] = \mathbb{E}[K(\xi(X)) \cdot K(Y)] = \mathbb{E}[K^2(Y)] = 1 \Rightarrow \rho(X, Y) = 1$$

Definition: The canonical dependence matrix $\tilde{B} \in \mathbb{R}^{|\mathcal{Y}| \times |\mathcal{X}|}$ is defined as

$$\tilde{B}(y, x) \triangleq \frac{P_{XY}(x, y) - P_X(x) P_Y(y)}{\sqrt{P_X(x) P_Y(y)}}, \quad \forall x \in \mathcal{X},\ y \in \mathcal{Y}.$$

Example: X, Y are binary, with joint pmf

$$P_{XY}(x, y) = \begin{cases} \frac{1}{3} & \text{if } (x, y) = (0, 0) \\ \frac{1}{6} & \text{if } (x, y) = (0, 1) \\ \frac{1}{6} & \text{if } (x, y) = (1, 0) \\ \frac{1}{3} & \text{if } (x, y) = (1, 1) \end{cases}$$

so that $P_X(0) = P_X(1) = P_Y(0) = P_Y(1) = \frac{1}{2}$, and

$$\tilde{B}(0, 0) = \tilde{B}(1, 1) = \frac{\frac{1}{3} - \frac{1}{2} \cdot \frac{1}{2}}{\sqrt{\frac{1}{2} \cdot \frac{1}{2}}} = \frac{1}{6}, \qquad \tilde{B}(0, 1) = \tilde{B}(1, 0) = \frac{\frac{1}{6} - \frac{1}{2} \cdot \frac{1}{2}}{\sqrt{\frac{1}{2} \cdot \frac{1}{2}}} = -\frac{1}{6}$$



$$\tilde{B} = \begin{bmatrix} \frac{1}{6} & -\frac{1}{6} \\ -\frac{1}{6} & \frac{1}{6} \end{bmatrix}$$
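A short numerical check (assuming numpy; the array layout is my choice) reproduces these entries directly from the joint pmf:

```python
import numpy as np

# Joint pmf of the binary example, indexed as P_xy[x, y]
P_xy = np.array([[1/3, 1/6],
                 [1/6, 1/3]])
P_x = P_xy.sum(axis=1)   # [1/2, 1/2]
P_y = P_xy.sum(axis=0)   # [1/2, 1/2]

# B_tilde(y, x) = (P_XY(x, y) - P_X(x) P_Y(y)) / sqrt(P_X(x) P_Y(y))
B = (P_xy.T - np.outer(P_y, P_x)) / np.sqrt(np.outer(P_y, P_x))
print(B)                 # [[ 1/6, -1/6], [-1/6, 1/6]]
```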

Definition: The information vectors ϕ, ψ associated with functions f, g are defined as

$$\phi(x) \triangleq \sqrt{P_X(x)}\, f(x), \qquad \psi(y) \triangleq \sqrt{P_Y(y)}\, g(y)$$

Whenever functions f, g are given, we have the corresponding ϕ, ψ.


Property:

Norm-square: $\|\phi\|^2 = \sum_x \phi^2(x) = \sum_x P_X(x) f^2(x) = \mathbb{E}[f^2(X)]$, and similarly $\|\psi\|^2 = \mathbb{E}[g^2(Y)]$.

Inner product: writing $\sqrt{P_X}$ for the vector with entries $\sqrt{P_X(x)}$ and $\sqrt{P_Y}$ for the vector with entries $\sqrt{P_Y(y)}$,

$$\langle \phi, \sqrt{P_X} \rangle = \sum_x \phi(x) \sqrt{P_X(x)} = \sum_x P_X(x) f(x) = \mathbb{E}[f(X)]$$

$$\langle \psi, \sqrt{P_Y} \rangle = \sum_y \psi(y) \sqrt{P_Y(y)} = \sum_y P_Y(y) g(y) = \mathbb{E}[g(Y)]$$
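These identities are easy to verify numerically (a tiny sanity check, assuming numpy, with an arbitrary f of my choosing):

```python
import numpy as np

P_x = np.array([0.5, 0.5])        # marginal of X from the running example
f = np.array([2.0, -1.0])         # an arbitrary feature f : X -> R
phi = np.sqrt(P_x) * f            # information vector phi(x) = sqrt(P_X(x)) f(x)

assert np.isclose(phi @ phi, P_x @ f**2)         # ||phi||^2 = E[f^2(X)]
assert np.isclose(phi @ np.sqrt(P_x), P_x @ f)   # <phi, sqrt(P_X)> = E[f(X)]
```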

Theorem: The HGR maximal correlation ρ(X, Y) equals the largest singular value of $\tilde{B}$.

Proof: Note that

$$\mathbb{E}[f(X) \cdot g(Y)] = \sum_{x, y} P_{XY}(x, y) f(x) g(y) = \sum_{x, y} \frac{P_{XY}(x, y)}{\sqrt{P_X(x) P_Y(y)}} \Big(\sqrt{P_X(x)}\, f(x)\Big)\Big(\sqrt{P_Y(y)}\, g(y)\Big)$$

Subtracting the term

$$\sum_{x, y} \frac{P_X(x) P_Y(y)}{\sqrt{P_X(x) P_Y(y)}} \Big(\sqrt{P_X(x)}\, f(x)\Big)\Big(\sqrt{P_Y(y)}\, g(y)\Big) = \sum_{x, y} P_X(x) f(x)\, P_Y(y) g(y) = \mathbb{E}[f(X)]\, \mathbb{E}[g(Y)] = 0$$

changes nothing, so

$$\Rightarrow \mathbb{E}[f(X) \cdot g(Y)] = \sum_{x, y} \underbrace{\frac{P_{XY}(x, y) - P_X(x) P_Y(y)}{\sqrt{P_X(x) P_Y(y)}}}_{\tilde{B}(y, x)} \underbrace{\Big(\sqrt{P_X(x)}\, f(x)\Big)}_{\phi(x)} \underbrace{\Big(\sqrt{P_Y(y)}\, g(y)\Big)}_{\psi(y)} = \sum_{x, y} \tilde{B}(y, x)\, \phi(x)\, \psi(y) = \psi^T \tilde{B} \phi$$



$$\Rightarrow \rho(X, Y) = \max_{\phi, \psi}\ \psi^T \tilde{B} \phi$$
$$\text{subject to: } \|\phi\|^2 = \|\psi\|^2 = 1 \ (\leftrightarrow \mathbb{E}[f^2(X)] = \mathbb{E}[g^2(Y)] = 1)$$
$$\langle \phi, \sqrt{P_X} \rangle = \langle \psi, \sqrt{P_Y} \rangle = 0 \ (\leftrightarrow \mathbb{E}[f(X)] = \mathbb{E}[g(Y)] = 0)$$

Fact 1: $\arg\max_{\|\phi\|^2 = \|\psi\|^2 = 1} \psi^T \tilde{B} \phi$ is attained at the largest right/left singular vectors of $\tilde{B}$.
Fact 2: singular vectors are orthogonal to each other.

Check:

$$[\tilde{B} \cdot \sqrt{P_X}](y) = \sum_x \tilde{B}(y, x) \sqrt{P_X(x)} = \sum_x \frac{P_{XY}(x, y) - P_X(x) P_Y(y)}{\sqrt{P_Y(y)}} = \frac{P_Y(y) - P_Y(y)}{\sqrt{P_Y(y)}} = 0$$

⇒ $\sqrt{P_X}$ is a right singular vector of $\tilde{B}$ with singular value 0, since $\tilde{B} \cdot \sqrt{P_X} = 0$. Similarly, $\sqrt{P_Y}$ is a left singular vector of $\tilde{B}$ with singular value 0.
⇒ $\rho(X, Y) = \sigma_1$ (the largest singular value of $\tilde{B}$), with equality iff ϕ and ψ are the largest right/left singular vectors of $\tilde{B}$. By Fact 2, these are orthogonal to $\sqrt{P_X}$ and $\sqrt{P_Y}$, so the zero-mean constraints are automatically satisfied.

⇒ Let $\phi_1$ and $\psi_1$ be the largest right/left singular vectors of $\tilde{B}$; then

$$f^*(x) = \frac{1}{\sqrt{P_X(x)}}\, \phi_1(x), \qquad g^*(y) = \frac{1}{\sqrt{P_Y(y)}}\, \psi_1(y)$$
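Putting the theorem to work on the binary example (a numpy sketch; np.linalg.svd returns factors with B = U · diag(S) · Vt):

```python
import numpy as np

P_xy = np.array([[1/3, 1/6],            # joint pmf, indexed P_xy[x, y]
                 [1/6, 1/3]])
P_x, P_y = P_xy.sum(axis=1), P_xy.sum(axis=0)

# Canonical dependence matrix B_tilde(y, x)
B = (P_xy.T - np.outer(P_y, P_x)) / np.sqrt(np.outer(P_y, P_x))

U, S, Vt = np.linalg.svd(B)
rho = S[0]                              # HGR maximal correlation = sigma_1
f_star = Vt[0] / np.sqrt(P_x)           # f*(x) = phi_1(x) / sqrt(P_X(x))
g_star = U[:, 0] / np.sqrt(P_y)         # g*(y) = psi_1(y) / sqrt(P_Y(y))

print(rho)                                       # 1/3
print(np.isclose(f_star @ P_xy @ g_star, rho))   # E[f*(X) g*(Y)] = rho
```

(The singular vectors are defined only up to a joint sign flip, so f* and g* may both come out negated.)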
Check (3): ρ(X, Y) = 0 iff $\sigma_1 = 0$ iff $\tilde{B} = 0$ iff $P_{XY}(x, y) = P_X(x) P_Y(y)$ for all x, y, iff X, Y are independent.

