Multivariate Analysis

Taranga Mukherjee
Contents

1 Introduction
2.1 Distribution function
2.2 Independence
2.5 Marginal and conditional distribution
2.6 Mean vector
2.7 Dispersion matrix and variance-covariance matrix
2.8 Correlation matrix
    2.8.1 Relation between correlation matrix and dispersion matrix
3 Concept of regression
4 Multiple linear regression
5 Multiple correlation coefficient
6 Partial correlation coefficient
8.1 Moment generating function
9 Multinomial distribution
    9.1 Introduction
    9.3 Moment generating function
    9.4 Modified multinomial distribution
    9.4.3 Distribution of subset
    9.5 Conditional distribution
10 Ellipsoid of concentration
1 Introduction

Suppose that in a board examination, n students each take papers in p subjects. For each student the marks in the p subjects are recorded, so a student's performance is represented by a p-dimensional vector. The information on all n students therefore consists of n p-variate vectors, and the overall data can be arranged in an n × p data matrix.
Let us consider a probability space (Ω, A, P). Then

X = (X_1, X_2, ..., X_p)'

is said to be a p-variate random vector defined on (Ω, A, P) if X is a mapping from Ω → R^p such that, for every x ∈ R^p, the set {ω : X_1(ω) ≤ x_1, ..., X_p(ω) ≤ x_p} belongs to A.
2.1 Distribution function:

Let X = (X_1, X_2, ..., X_p)' be a p-variate random vector. Then its cumulative distribution function is given by

F_X(x) = P[X_1 ≤ x_1, X_2 ≤ x_2, ..., X_p ≤ x_p],  x = (x_1, x_2, ..., x_p)' ∈ R^p.
2.1.1 Properties:

(i) F_X(x) is monotonically non-decreasing in each argument.
(ii) F_X(x) → 0 as any x_i → −∞.
(iii) F_X(∞, ∞, ..., ∞) = 1.

Define A_i = {X_i ≤ x_i}, i = 1, 2, ..., p.
Again,

F_X(x) = P(∩_{i=1}^p A_i) = P[(∪_{i=1}^p A_i')'] = 1 − P(∪_{i=1}^p A_i')

Now

P(∪_{i=1}^p A_i') ≤ Σ_{i=1}^p P(A_i')
⇒ P(∪_{i=1}^p A_i') ≤ Σ_{i=1}^p (1 − P(A_i))
⇒ 1 − P(∪_{i=1}^p A_i') ≥ 1 − p + Σ_{i=1}^p P(A_i)
⇒ F_X(x) ≥ Σ_{i=1}^p F_{X_i}(x_i) − (p − 1)   (2)
2.2 Independence

Let (X_1, ..., X_r, Y_1, ..., Y_s) be an (r + s)-dimensional random vector. Then X = (X_1, ..., X_r)' and Y = (Y_1, ..., Y_s)' are said to be independent if

P[X_1 ≤ x_1, ..., X_r ≤ x_r, Y_1 ≤ y_1, ..., Y_s ≤ y_s]
  = P[X_1 ≤ x_1, ..., X_r ≤ x_r] · P[Y_1 ≤ y_1, ..., Y_s ≤ y_s]
  = F_X(x) F_Y(y).

In particular, the components of X are mutually independent if

F_X(x) = Π_{i=1}^p F_{X_i}(x_i).
2.5 Marginal and conditional distribution

Partition X = (X_(1)', X_(2)')', where X_(1) consists of the first q components. The marginal density of X_(1) is

f_{X_(1)}(x_(1)) = ∫ f_X(x) dx_(2),  x_(1) ∈ R^q,

the integral being taken over the range of x_(2). The conditional distribution of X_(2) | X_(1) is given by

f_{X_(2)|X_(1)}(x_(2)|x_(1)) = f_X(x) / f_{X_(1)}(x_(1)),  provided f_{X_(1)}(x_(1)) > 0.
Example: Let X = (X_1, ..., X_p)' have the uniform density f_X(x) = k on the ball Σ_{i=1}^p x_i^2 ≤ c^2. Find k, the marginal density of X_(1) = (X_1, ..., X_q)', and the conditional density of X_(2) | X_(1).

Solution:

∫ f(x) dx = 1 ⇒ k ∫_{Σ x_i^2 ≤ c^2} dx = 1

Use the polar transformation

x_1 = r cos θ_1
x_2 = r sin θ_1 cos θ_2
...
x_p = r sin θ_1 sin θ_2 ... sin θ_{p−1}

so that Σ_{i=1}^p x_i^2 = r^2 with r > 0, and r^2 ≤ c^2 ⇒ r ≤ c [∵ r > 0]. The Jacobian of the transformation is r^{p−1} sin^{p−2} θ_1 sin^{p−3} θ_2 ... sin θ_{p−2}. Hence

k ∫_0^c r^{p−1} dr ∫_0^π sin^{p−2} θ_1 dθ_1 ∫_0^π sin^{p−3} θ_2 dθ_2 ... ∫_0^{2π} dθ_{p−1} = 1

⇒ k (c^p/p) β((p−1)/2, 1/2) β((p−2)/2, 1/2) ... β(1, 1/2) · 2π = 1

The product of beta functions telescopes, giving

⇒ k π^{p/2} c^p / Γ(p/2 + 1) = 1

∴ k = Γ(p/2 + 1) / (π^{p/2} c^p)

Integrating out x_{q+1}, ..., x_p (which range over a ball of radius (c^2 − Σ_{i=1}^q x_i^2)^{1/2} in p − q dimensions),

f_{X_(1)}(x_(1)) = [Γ(p/2 + 1) / (π^{p/2} c^p)] · [π^{(p−q)/2} / Γ((p−q)/2 + 1)] (c^2 − Σ_{i=1}^q x_i^2)^{(p−q)/2}

⇒ f_{X_(1)}(x_(1)) = [Γ(p/2 + 1) / (Γ((p−q)/2 + 1) π^{q/2} c^p)] (c^2 − Σ_{i=1}^q x_i^2)^{(p−q)/2},  Σ_{i=1}^q x_i^2 ≤ c^2.

The conditional pdf of X_(2) | X_(1) is

f_{X_(2)|X_(1)}(x_(2)|x_(1)) = f_X(x) / f_{X_(1)}(x_(1))
  = π^{(q−p)/2} Γ((p−q)/2 + 1) / (c^2 − Σ_{i=1}^q x_i^2)^{(p−q)/2},  Σ_{i=1}^q x_i^2 ≤ c^2.
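As a quick numerical sanity check on the normalizing constant, the sketch below (an illustration assuming numpy/scipy, not part of the original notes) estimates the volume of the p-ball by Monte Carlo and compares it with π^{p/2} c^p / Γ(p/2 + 1) = 1/k.

import numpy as np
from scipy.special import gammaln

# Monte Carlo check that vol{x : sum x_i^2 <= c^2} = pi^{p/2} c^p / Gamma(p/2 + 1),
# i.e. that k = Gamma(p/2 + 1) / (pi^{p/2} c^p) integrates to 1 over the ball.
rng = np.random.default_rng(0)
p, c, N = 4, 2.0, 1_000_000
u = rng.uniform(-c, c, size=(N, p))          # uniform points in the cube [-c, c]^p
inside = (u ** 2).sum(axis=1) <= c ** 2      # fraction landing in the ball
vol_mc = (2 * c) ** p * inside.mean()        # cube volume times hit fraction
vol_exact = np.exp((p / 2) * np.log(np.pi) + p * np.log(c) - gammaln(p / 2 + 1))
print(vol_mc, vol_exact)                     # both ~ 78.96 for p = 4, c = 2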
2.6 Mean vector:

Let X = (X_1, X_2, ..., X_p)' be a p-variate random vector. Then its mean vector is given by

μ = E(X) = (E(X_1), E(X_2), ..., E(X_p))' = (μ_1, μ_2, ..., μ_p)'.

2.7 Dispersion matrix:

Σ = ((σ_ij))_{p×p},  where σ_ij = Var(X_i) if i = j and σ_ij = cov(X_i, X_j) if i ≠ j.

Since cov(X_i, X_j) = cov(X_j, X_i), Σ is symmetric.
Essentially Σ is given by

Σ = E[(X − μ)(X − μ)']
  = E[(X_1 − μ_1, X_2 − μ_2, ..., X_p − μ_p)'(X_1 − μ_1, X_2 − μ_2, ..., X_p − μ_p)],

whose (i, j)-th entry is E[(X_i − μ_i)(X_j − μ_j)]: the diagonal entries are E(X_i − μ_i)^2 = σ_ii and the off-diagonal entries are σ_ij, so Σ = ((σ_ij)).
Result: Let X be a random vector with mean vector μ and dispersion matrix Σ. Then for Y = a'X, E(Y) = a'μ and Var(Y) = a'Σa.

→ Here Y = a'X = Σ_{i=1}^p a_i X_i is a univariate random variable.

E(Y) = Σ_{i=1}^p a_i E(X_i) = Σ_{i=1}^p a_i μ_i = a'μ

Now

Var(Y) = E[(Y − a'μ)(Y − a'μ)'] = E[(a'X − a'μ)(a'X − a'μ)']
       = E[a'(X − μ)(X − μ)'a] = a' E[(X − μ)(X − μ)'] a = a'Σa
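A small numerical illustration (not from the notes; the matrix and vector below are arbitrary choices) that a'μ and a'Σa match the sample mean and variance of Y = a'X:

import numpy as np

# Check E(a'X) = a'mu and Var(a'X) = a' Sigma a by simulation.
rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0, 0.5])                      # assumed mean vector
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])                  # assumed dispersion matrix
a = np.array([1.0, 2.0, -1.0])
X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ a
print(Y.mean(), a @ mu)                              # both ~ -3.5
print(Y.var(ddof=1), a @ Sigma @ a)                  # both ~ 8.3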
Result: Σ is the variance-covariance matrix of a non-degenerate random vector X iff Σ is positive definite.

If part: Let Σ be positive definite. Then there exists a non-singular matrix B such that Σ = B'B. Define X = μ + B'Y, where Y is a random vector with mean vector 0 and dispersion matrix I. Then

E(X) = μ + B' E(Y) = μ + 0 = μ

and the dispersion matrix of X is

Disp(X) = E[(X − μ)(X − μ)'] = E[(μ + B'Y − μ)(μ + B'Y − μ)']
        = E[B'YY'B] = B' E[(Y − 0)(Y − 0)'] B = B' Disp(Y) B = B'IB = B'B = Σ.

Only if part: Here Σ is the variance-covariance matrix of a non-degenerate random vector X. For any a, let Y = a'X; then a'Σa = Var(Y) ≥ 0, so Σ is at least positive semi-definite. If a'Σa = 0 for some a ≠ 0, then Var(a'X) = 0, i.e., a'X is constant and X is degenerate, a contradiction. Hence a'Σa > 0 for all a ≠ 0.

∴ Σ is a positive definite matrix.
Result: For a fixed matrix A, E(X'AX) = Tr(AΣ) + μ'Aμ.

→ Since X'AX is a scalar,

E(X'AX) = E[Tr(X'AX)] = E[Tr(AXX')] = Tr[E(AXX')] = Tr[A E(XX')]

Now, Σ = Disp(X) = E[(X − μ)(X − μ)'] = E[XX'] + μμ' − E[X]μ' − μE[X'] = E[XX'] − μμ'

⇒ E(XX') = Σ + μμ'

Therefore

E(X'AX) = Tr[A(Σ + μμ')] = Tr(AΣ) + Tr(Aμμ') = Tr(AΣ) + μ'Aμ   (Proved)
2.8 Correlation matrix

ρ_ij = correlation coefficient between X_i and X_j = cov(X_i, X_j) / (√Var(X_i) √Var(X_j))

The correlation matrix is R = ((ρ_ij)), with ρ_ii = 1 for all i = 1(1)p.
2.8.1 Relation between correlation matrix and dispersion matrix:

We know

ρ_ij = σ_ij / (√σ_ii √σ_jj)

so the (i, j)-th entry of R is σ_ij scaled by 1/√σ_ii on the left and 1/√σ_jj on the right. In matrix form,

R = DΣD,  where D = diag(1/√σ_11, 1/√σ_22, ..., 1/√σ_pp).
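In code the relation R = DΣD is one line; the sketch below (illustrative values assumed, not from the notes) also confirms it against a sample correlation matrix:

import numpy as np

# R = D Sigma D with D = diag(1/sqrt(sigma_ii)); checked against a sample correlation.
rng = np.random.default_rng(2)
Sigma = np.array([[4.0, 1.2, -0.8],
                  [1.2, 9.0,  0.6],
                  [-0.8, 0.6, 1.0]])                 # assumed dispersion matrix
D = np.diag(1.0 / np.sqrt(np.diag(Sigma)))
R = D @ Sigma @ D                                    # unit diagonal, rho_ij off it
X = rng.multivariate_normal(np.zeros(3), Sigma, size=100_000)
print(np.round(R, 3))
print(np.round(np.corrcoef(X.T), 3))                 # agrees with R up to sampling error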
Result: If Var(X_i) exists for all i, then for any set of real numbers l_1, l_2, ..., l_p,

E(Σ_{i=1}^p l_i X_i) = Σ_{i=1}^p l_i μ_i  and  Var(Σ_{i=1}^p l_i X_i) = Σ_i Σ_j l_i l_j σ_ij.

→ E(Σ_{i=1}^p l_i X_i) = ∫ (l_1 x_1 + l_2 x_2 + ... + l_p x_p) f(x) dx
  = l_1 ∫ x_1 f(x) dx + ... + l_p ∫ x_p f(x) dx
  = l_1 ∫ x_1 f(x_1) dx_1 + ... + l_p ∫ x_p f(x_p) dx_p   [integrating out the other coordinates]
  = l_1 μ_1 + l_2 μ_2 + ... + l_p μ_p = Σ_{i=1}^p l_i μ_i   (Proved)

Recall σ_ij = E[(X_i − μ_i)(X_j − μ_j)]. Now

Var(Σ_{i=1}^p l_i X_i) = E[Σ_{i=1}^p l_i X_i − Σ_{i=1}^p l_i μ_i]^2
  = E[Σ_{i=1}^p l_i (X_i − μ_i)]^2
  = E[Σ_i Σ_j l_i l_j (X_i − μ_i)(X_j − μ_j)]
  = Σ_{i=1}^p Σ_{j=1}^p l_i l_j σ_ij   (Proved)
Result: |R| ≤ 1.

→ Let λ_1, ..., λ_p be the (non-negative) eigenvalues of R. By the AM-GM inequality,

(1/p) Σ_{i=1}^p λ_i ≥ (Π_{i=1}^p λ_i)^{1/p}

Since R = ((ρ_ij)), Tr(R) = Σ_{i=1}^p ρ_ii = Σ_{i=1}^p 1 = p, and Π_{i=1}^p λ_i = |R|, so

Tr(R)/p ≥ |R|^{1/p} ⇒ p/p ≥ |R|^{1/p} ⇒ |R| ≤ 1   (Proved)
Result: If Σ is pd, find E[(X − μ)'Σ^{-1}(X − μ)] and find a non-trivial upper bound of P[(X − μ)'Σ^{-1}(X − μ) ≥ λ].

→ Here Σ is pd. Write A = (X − μ)'Σ^{-1} and B = (X − μ). Then

E[(X − μ)'Σ^{-1}(X − μ)] = E[Tr{(X − μ)'Σ^{-1}(X − μ)}]
  = E[Tr{AB}] = E[Tr(BA)]
  = E[Tr{(X − μ)(X − μ)'Σ^{-1}}]
  = Tr[E{(X − μ)(X − μ)'} Σ^{-1}]
  = Tr[ΣΣ^{-1}] = Tr(I_p) = p

Now, by Markov's inequality,

P[(X − μ)'Σ^{-1}(X − μ) ≥ λ] ≤ E[(X − μ)'Σ^{-1}(X − μ)]/λ = p/λ.
Q. Show that −1/(p − 1) < ρ < 1 for a random vector X whose correlation matrix R_{p×p} has unit diagonal and all off-diagonal entries equal to ρ.

→ Replacing the first row by the sum of all rows [R_1 → Σ_{i=1}^p R_i] makes every entry of the new first row equal to 1 + (p − 1)ρ; taking that factor out and subtracting the first column from the others [C_i → C_i − C_1 ∀ i = 2, 3, ..., p] leaves a triangular determinant, so

|R| = [1 + (p − 1)ρ](1 − ρ)^{p−1}

For a non-degenerate X, R must be positive definite, so |R| and all leading principal minors of the same form must be positive; this forces 1 + (p − 1)ρ > 0 and 1 − ρ > 0, i.e.,

−1/(p − 1) < ρ < 1.
Q. If Var(X_i) = 1 for i = 1, 2, 3, show that ρ_12 + ρ_13 + ρ_23 ≥ −3/2.

→ Let Z = X_1 + X_2 + X_3. Then

Var(Z) ≥ 0
⇒ Var(X_1 + X_2 + X_3) ≥ 0
⇒ Var(X_1) + Var(X_2) + Var(X_3) + 2 cov(X_1, X_2) + 2 cov(X_1, X_3) + 2 cov(X_2, X_3) ≥ 0

Since Var(X_i) = 1, cov(X_i, X_j) = ρ_ij, so

3 + 2ρ_12 + 2ρ_13 + 2ρ_23 ≥ 0 ⇒ ρ_12 + ρ_13 + ρ_23 ≥ −3/2   (Proved)
Result: Let X be a p-variate random vector with mean vector μ and dispersion matrix Σ. Then for any non-random matrix B_{p×p}, E(BX) = Bμ and Disp(BX) = BΣB'.

→ Let Y = BX = (Y_1, Y_2, ..., Y_p)'. Then

E(Y) = B E(X) = Bμ

Disp(BX) = E[(BX − Bμ)(BX − Bμ)'] = E[B(X − μ)(X − μ)'B'] = B E[(X − μ)(X − μ)'] B' = BΣB'
Q. Each of the random variables X, Y, Z has mean 0 and variance 1, while aX + bY + cZ = 0. Find the dispersion matrix of (X, Y, Z) and show that a^4 + b^4 + c^4 ≤ 2(b^2c^2 + c^2a^2 + a^2b^2).

→ From Var(aX + bY) = Var(−cZ) = c^2 we get a^2 + b^2 + 2ab cov(X, Y) = c^2, so cov(X, Y) = {c^2 − (a^2 + b^2)}/(2ab), and similarly for the other pairs. Hence

Σ = Disp(X, Y, Z) =
[ 1                          {c^2 − (a^2 + b^2)}/(2ab)  {b^2 − (a^2 + c^2)}/(2ac) ]
[ {c^2 − (a^2 + b^2)}/(2ab)  1                          {a^2 − (b^2 + c^2)}/(2bc) ]
[ {b^2 − (a^2 + c^2)}/(2ac)  {a^2 − (b^2 + c^2)}/(2bc)  1                         ]

Now, ρ_{XY}^2 ≤ 1 ⇒ [{c^2 − (a^2 + b^2)}/(2ab)]^2 ≤ 1
⇒ {c^2 − (a^2 + b^2)}^2 ≤ 4a^2b^2
⇒ a^4 + b^4 + c^4 ≤ 2(a^2b^2 + b^2c^2 + a^2c^2)   (Proved)
Q. Suppose X_1, X_2, ..., X_2p denote scores on 2p questions in an aptitude test. Suppose they have a common mean μ and a common variance σ^2, while the correlation coefficient between any pair of them is the same, ρ > 0. Let Y_1 be the sum of scores on the odd-numbered questions and Y_2 the sum of scores on the even-numbered questions. Show that the correlation coefficient between Y_1 and Y_2 tends to unity as p increases.

Solution:

Y_1 = X_1 + X_3 + ... + X_{2p−1}

Var(Y_1) = Var(X_1) + Var(X_3) + ... + Var(X_{2p−1}) + ΣΣ_{i≠j, both odd} cov(X_i, X_j)
         = pσ^2 + p(p − 1)ρσ^2

Similarly, Var(Y_2) = pσ^2 + p(p − 1)ρσ^2.

Now,

cov(Y_1, Y_2) = cov(X_1 + X_3 + ... + X_{2p−1}, X_2 + X_4 + ... + X_{2p})
  = Σ_{i odd} Σ_{j even} cov(X_i, X_j) = p · p · ρσ^2 = p^2 ρσ^2

∴ ρ_{Y_1,Y_2} = p^2 ρσ^2 / [pσ^2 + p(p − 1)ρσ^2] = pρ/[1 + (p − 1)ρ]
  = ρ/[1/p + (1 − 1/p)ρ] → ρ/ρ = 1 as p → ∞   (Proved)
3 Concept Of Regression

Let X be a p-variate random vector, X = (X_1, X_2, ..., X_p)', with mean vector μ = (μ_1, μ_2, ..., μ_p)'. Here X_1 is the dependent (response) variable and X_(2) = (X_2, ..., X_p)' collects the remaining variables. A predictor of X_1 based on X_2, ..., X_p is

g(X_(2)) = E(X_1 | X_(2)).
Result: E(X_1 | X_(2)) is the best predictor of X_1 based on X_2, X_3, ..., X_p in terms of MSE.

→ Let f(X_(2)) be any other predictor of X_1 based on X_2, X_3, ..., X_p.

E[X_1 − f(X_(2))]^2 = E[(X_1 − g(X_(2))) + (g(X_(2)) − f(X_(2)))]^2
  = E[X_1 − g(X_(2))]^2 + E[g(X_(2)) − f(X_(2))]^2 + 2E[(X_1 − g(X_(2)))(g(X_(2)) − f(X_(2)))]

For the cross term,

2E[(X_1 − g(X_(2)))(g(X_(2)) − f(X_(2)))]
  = 2E[ E{(X_1 − g(X_(2)))(g(X_(2)) − f(X_(2))) | X_(2)} ]
  = 2E[ (g(X_(2)) − f(X_(2))) E{X_1 − g(X_(2)) | X_(2)} ]
  = 2E[ (g(X_(2)) − f(X_(2))) (E(X_1 | X_(2)) − E(X_1 | X_(2))) ] = 0

Therefore,

E[X_1 − f(X_(2))]^2 = E[X_1 − g(X_(2))]^2 + E[g(X_(2)) − f(X_(2))]^2 ≥ E[X_1 − g(X_(2))]^2,

i.e., g(X_(2)) is the best predictor of X_1 based on X_2, X_3, ..., X_p   (Proved).
Result: The correlation coefficient between X_1 and its best predictor is positive.

→ The best predictor of X_1 based on X_2, X_3, ..., X_p is g(X_(2)) = E(X_1 | X_(2)), with

E[g(X_(2))] = E[E(X_1 | X_(2))] = E(X_1)

Now,

cov(X_1, g(X_(2))) = E[(X_1 − E(X_1))(g(X_(2)) − E g(X_(2)))]
  = E[(X_1 − E(X_1))(g(X_(2)) − E(X_1))]
  = E[ E{(X_1 − E(X_1))(g(X_(2)) − E(X_1)) | X_(2)} ]
  = E[ (g(X_(2)) − E(X_1)) E{X_1 − E(X_1) | X_(2)} ]
  = E[ (g(X_(2)) − E(X_1)) (E(X_1 | X_(2)) − E(X_1)) ]
  = E[ (g(X_(2)) − E g(X_(2)))^2 ]
  = Var(g(X_(2))) ≥ 0

i.e., cov(X_1, g(X_(2))) > 0 (in the non-degenerate case), hence corr(X_1, g(X_(2))) > 0.   (Proved)
Result: The correlation coefficient between X_1 and its best predictor is maximum.

→ Let f(X_(2)) be any other predictor of X_1 based on X_2, X_3, ..., X_p, and let g(X_(2)) be the best predictor of X_1 based on X_(2), so that E[g(X_(2))] = E[E(X_1 | X_(2))] = E(X_1).

Now,

cov(X_1, f(X_(2))) = E[(X_1 − E(X_1))(f(X_(2)) − E f(X_(2)))]
  = E[X_1(f(X_(2)) − E f(X_(2)))] − E(X_1) E[f(X_(2)) − E f(X_(2))]   [second term = 0]
  = E[ E{X_1(f(X_(2)) − E f(X_(2))) | X_(2)} ]
  = E[ (f(X_(2)) − E f(X_(2))) E(X_1 | X_(2)) ]
  = E[ (f(X_(2)) − E f(X_(2))) (g(X_(2)) − E g(X_(2)) + E g(X_(2))) ]
  = E[ (f(X_(2)) − E f(X_(2))) (g(X_(2)) − E g(X_(2))) ] + E[g(X_(2))] E[f(X_(2)) − E f(X_(2))]   [second term = 0]
  = cov(f(X_(2)), g(X_(2)))   (1)

Again, cov(X_1, g(X_(2))) = Var(g(X_(2))) = σ_g^2 (say). Note that

ρ_{X_1, g(X_(2))} = cov(X_1, g(X_(2))) / (√Var(X_1) √Var(g(X_(2)))) = σ_g^2/(σ_1σ_g) = σ_g/σ_1

Again,

ρ^2_{X_1, f(X_(2))} = cov(X_1, f(X_(2)))^2 / (Var(X_1) Var(f(X_(2))))
  = [cov(f(X_(2)), g(X_(2)))^2 / (σ_f^2 σ_g^2)] · (σ_g^2/σ_1^2)
  = ρ^2_{f,g} × ρ^2_{X_1, g(X_(2))}

Therefore ρ^2_{X_1, f(X_(2))} ≤ ρ^2_{X_1, g(X_(2))}, since ρ^2_{f,g} ≤ 1.   (Proved)
Result: E[X_1 − g(X_(2))]^2 = Var(X_1)[1 − ρ^2_{X_1, g(X_(2))}], where g(X_(2)) = E(X_1 | X_(2)).

→ E[X_1 − g(X_(2))]^2 = E[(X_1 − E(X_1)) + (E(X_1) − g(X_(2)))]^2
  = E[X_1 − E(X_1)]^2 + E[E(X_1) − g(X_(2))]^2 + 2E[(X_1 − E(X_1))(E(X_1) − g(X_(2)))]
  = Var(X_1) + E[g(X_(2)) − E g(X_(2))]^2 − 2E[(X_1 − E(X_1))(g(X_(2)) − E(X_1))]
  = Var(X_1) + Var(g(X_(2))) − 2 cov(X_1, g(X_(2)))
  = Var(X_1) − cov(X_1, g(X_(2)))   [∵ cov(X_1, g(X_(2))) = Var(g(X_(2)))]
  = Var(X_1)[1 − cov(X_1, g(X_(2)))/Var(X_1)]
  = Var(X_1)[1 − cov(X_1, g(X_(2)))^2 / (Var(X_1) Var(g(X_(2))))]   [again using cov(X_1, g(X_(2))) = Var(g(X_(2)))]
  = Var(X_1)[1 − ρ^2_{X_1, g(X_(2))}]   (Proved)
4 Multiple linear regression:

Partition

X_{p×1} = (X_1, X_(2)')',  μ = (μ_1, μ_(2)')',

where X_1 is the response and X_2, X_3, ..., X_p are the covariates, and

Σ = [ σ_11    σ_(2)' ]
    [ σ_(2)   Σ_2    ]

with σ_(2) = (σ_12, ..., σ_1p)' and Σ_2 the dispersion matrix of X_(2). The multiple linear regression equation of X_1 on X_2, ..., X_p is

X_{1.23...p} = α + β_2X_2 + β_3X_3 + ... + β_pX_p.

Minimising E[X_1 − α − Σ_{i=2}^p β_iX_i]^2, the normal equation for α gives α̂ = μ_1 − Σ_{i=2}^p β_iμ_i, and the normal equation for β_j gives, for j = 2, 3, ..., p,
E[X_1X_j] − α E(X_j) − β_2 E(X_2X_j) − ... − β_p E(X_pX_j) = 0

⇒ E[X_1X_j] = α E(X_j) + Σ_{i=2}^p β_i E(X_iX_j)
⇒ E[X_1X_j] = (μ_1 − Σ_{i=2}^p μ_iβ_i) μ_j + Σ_{i=2}^p β_i E(X_iX_j)   [replacing the value of α̂]
⇒ E[X_1X_j] = μ_1μ_j + Σ_{i=2}^p β_i (E(X_iX_j) − μ_iμ_j)
⇒ E[X_1X_j] − μ_1μ_j = Σ_{i=2}^p β_iσ_ij ⇒ σ_1j = Σ_{i=2}^p β_iσ_ij  ∀ j = 2, 3, ..., p

For j = 2:  σ_12 = β_2σ_22 + β_3σ_32 + ... + β_pσ_p2
For j = 3:  σ_13 = β_2σ_23 + β_3σ_33 + ... + β_pσ_p3
...
For j = p:  σ_1p = β_2σ_2p + β_3σ_3p + ... + β_pσ_pp

In matrix form, σ_(2) = Σ_2β ⇒ β̂ = Σ_2^{-1}σ_(2)   [assuming Σ is PD, so Σ_2^{-1} exists].
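A short numerical sketch of β̂ = Σ_2^{-1}σ_(2) and α̂ = μ_1 − β̂'μ_(2) (the mean vector and covariance matrix below are arbitrary illustrative choices, not from the notes):

import numpy as np

# Population regression coefficients of X1 on X2,...,Xp from a partitioned Sigma.
mu = np.array([1.0, 0.0, 2.0, -1.0])                 # (mu_1, mu_(2))
Sigma = np.array([[4.0, 1.0, 0.5, 0.2],
                  [1.0, 2.0, 0.3, 0.1],
                  [0.5, 0.3, 1.5, 0.4],
                  [0.2, 0.1, 0.4, 1.0]])             # assumed PD dispersion matrix
sigma_2 = Sigma[1:, 0]                               # sigma_(2) = (sigma_12,...,sigma_1p)'
Sigma_2 = Sigma[1:, 1:]                              # dispersion matrix of X_(2)
beta = np.linalg.solve(Sigma_2, sigma_2)             # beta_hat = Sigma_2^{-1} sigma_(2)
alpha = mu[0] - beta @ mu[1:]                        # alpha_hat = mu_1 - beta' mu_(2)
print(beta, alpha)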
Note that X_1 = X_{1.23...p} + e_{1.23...p}, where e_{1.23...p} is the residual.

Result:

Var(e_{1.23...p}) = |Σ|/|Σ_2|

→ e_{1.23...p} = X_1 − X_{1.23...p}
⇒ Var(e_{1.23...p}) = Var(X_1) + Var(X_{1.23...p}) − 2 cov(X_{1.23...p}, X_1)

Now, cov(X_1, X_{1.23...p}) = cov(X_{1.23...p} + e_{1.23...p}, X_{1.23...p})
  = Var(X_{1.23...p}) + cov(e_{1.23...p}, X_{1.23...p}) = Var(X_{1.23...p}),

since cov(X_{1.23...p}, e_{1.23...p}) = E[X_{1.23...p} e_{1.23...p}] − E[e_{1.23...p}] E[X_{1.23...p}] = 0, both expectations vanishing by the normal equations.

Therefore,

Var(e_{1.23...p}) = Var(X_1) − Var(X_{1.23...p}) = σ_11 − Var(α̂ + Σ_{i=2}^p β̂_iX_i)
  = σ_11 − β̂'Σ_2β̂ = σ_11 − σ_(2)'Σ_2^{-1}σ_(2)

Now, from the partitioned form of Σ,

|Σ| = |Σ_2| (σ_11 − σ_(2)'Σ_2^{-1}σ_(2))
⇒ |Σ|/|Σ_2| = σ_11 − σ_(2)'Σ_2^{-1}σ_(2)   [∵ σ_11 − σ_(2)'Σ_2^{-1}σ_(2) is a scalar]
⇒ Var(e_{1.23...p}) = |Σ|/|Σ_2|   (Proved)
5 Multiple correlation coefficient:

The correlation coefficient between X_1 and X_{1.23...p} is termed the multiple correlation coefficient of X_1 on X_2, X_3, ..., X_p, denoted ρ_{1.23...p}:

ρ_{1.23...p} = √(Var(X_{1.23...p})/Var(X_1)) = √(Var(α + β'X_(2))/Var(X_1)) = √(β̂'Σ_2β̂/σ_11)

Remark:

ρ^2_{1.23...p} = Var(X_{1.23...p})/Var(X_1)

Now, Var(X_{1.23...p}) = Var(X_1 − e_{1.23...p})
  = Var(X_1) + Var(e_{1.23...p}) − 2 cov(X_1, e_{1.23...p})
  = Var(X_1) + Var(e_{1.23...p}) − 2 cov(X_{1.23...p} + e_{1.23...p}, e_{1.23...p})
  = Var(X_1) + Var(e_{1.23...p}) − 2Var(e_{1.23...p}) − 2 cov(X_{1.23...p}, e_{1.23...p})   [last term 0 by the normal equations]
  = Var(X_1) − Var(e_{1.23...p})

∴ ρ^2_{1.23...p} = [Var(X_1) − Var(e_{1.23...p})]/Var(X_1) = 1 − Var(e_{1.23...p})/Var(X_1) = 1 − |Σ|/(σ_11|Σ_2|)
Remark: If the correlation matrix R is given, then the multiple correlation coefficient can be obtained in terms of R. Partition

R = [ 1      ρ_(2)' ]
    [ ρ_(2)  R_2    ]

where ρ_(2) = (ρ_12, ..., ρ_1p)' and R_2 is the correlation matrix of X_(2). With D = diag(1/√σ_11, ..., 1/√σ_pp) we have R = DΣD, so

|R| = |D|^2 |Σ| = |Σ| / Π_{i=1}^p σ_ii
⇒ |Σ| = Π_{i=1}^p σ_ii |R|  and similarly  |Σ_2| = Π_{i=2}^p σ_ii |R_2|

Therefore,

ρ^2_{1.23...p} = 1 − |Σ|/(σ_11|Σ_2|) = 1 − |R|/|R_2|.
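The determinant formula is easy to check numerically; the sketch below (an arbitrary illustrative R, assuming numpy) compares 1 − |R|/|R_2| with the regression form ρ_(2)'R_2^{-1}ρ_(2):

import numpy as np

# Multiple correlation of X1 on the rest: rho^2 = 1 - |R|/|R2| = rho_(2)' R2^{-1} rho_(2).
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.2],
              [0.3, 0.2, 1.0]])                      # assumed correlation matrix
rho2_det = 1 - np.linalg.det(R) / np.linalg.det(R[1:, 1:])
rho2_reg = R[1:, 0] @ np.linalg.solve(R[1:, 1:], R[1:, 0])
print(rho2_det, rho2_reg)                            # both give the same value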
Result: ρ^2_{1.23...p} = 0 iff ρ_1j = 0 for all j = 2, 3, ..., p.

Solution:

If part: Suppose ρ_1j = 0 ∀ j. Then

R = [ 1  0'  ]
    [ 0  R_2 ]

so |R| = |R_2| and ρ^2_{1.23...p} = 1 − |R|/|R_2| = 1 − 1 = 0.

Only if part: It is given that ρ^2_{1.23...p} = 0 ⇒ 1 − |R|/|R_2| = 0 ⇒ |R| = |R_2|.

∴ |R_2|(1 − ρ_(2)'R_2^{-1}ρ_(2)) = |R_2| ⇒ 1 − ρ_(2)'R_2^{-1}ρ_(2) = 1   [∵ the quantity is a scalar]
⇒ ρ_(2)'R_2^{-1}ρ_(2) = 0

Now R_2 is PD, so R_2^{-1} is also PD, i.e., ρ_(2)'R_2^{-1}ρ_(2) = 0 only when ρ_(2) = 0.
So ρ_1j = 0 ∀ j = 2, 3, ..., p   (Proved)
Result: ρ^2_{1.23...p} ≥ ρ^2_{1.23...p−1}, i.e., the multiple correlation coefficient cannot decrease when a covariate is added.

Solution: We know ρ^2_{1.23...p} = 1 − Var(e_{1.23...p})/Var(X_1), where Var(e_{1.23...p}) = E[X_1 − α̂ − β̂_2X_2 − ... − β̂_pX_p]^2 and α̂, β̂_2, ..., β̂_p are the least squares coefficients of the multiple linear regression of X_1 on X_2, X_3, ..., X_p.

Next, consider the multiple linear regression of X_1 on X_2, X_3, ..., X_{p−1}:

Var(e_{1.23...p−1}) = E[e^2_{1.23...p−1}] = E[X_1 − γ̂ − δ̂_2X_2 − ... − δ̂_{p−1}X_{p−1}]^2

where γ̂, δ̂_2, ..., δ̂_{p−1} are the least squares coefficients in the MLR of X_1 on X_2, X_3, ..., X_{p−1}.

Now set α° = γ̂, β°_2 = δ̂_2, ..., β°_{p−1} = δ̂_{p−1}, β°_p = 0. Since (α̂, β̂_2, ..., β̂_p) minimise the p-covariate MSE,

E[X_1 − α° − β°_2X_2 − ... − β°_pX_p]^2 ≥ E[X_1 − α̂ − β̂_2X_2 − ... − β̂_pX_p]^2
⇒ E[X_1 − γ̂ − δ̂_2X_2 − ... − δ̂_{p−1}X_{p−1}]^2 ≥ E[X_1 − α̂ − β̂_2X_2 − ... − β̂_pX_p]^2
⇒ Var(e_{1.23...p−1}) ≥ Var(e_{1.23...p})
⇒ 1 − Var(e_{1.23...p−1})/Var(X_1) ≤ 1 − Var(e_{1.23...p})/Var(X_1)
⇒ ρ^2_{1.23...p} ≥ ρ^2_{1.23...p−1}   (Proved)
Problem: For a random vector X the correlation matrix R is the p×p equicorrelation matrix (unit diagonal, all off-diagonal entries ρ). Obtain the multiple correlation coefficient of X_1 on X_2, X_3, ..., X_p.

Solution: As before [R_1' → Σ_{i=1}^p R_i, then C_i → C_i − C_1 ∀ i = 2, 3, ..., p],

|R|_{p×p} = (1 + (p − 1)ρ)(1 − ρ)^{p−1}

and, R_2 being the (p−1)×(p−1) matrix of the same form,

|R_2| = (1 + (p − 2)ρ)(1 − ρ)^{p−2}

∴ ρ_{1.23...p} = √(1 − |R|/|R_2|) = √(1 − (1 + (p − 1)ρ)(1 − ρ)/(1 + (p − 2)ρ))
  = √([1 + (p − 2)ρ − (1 − ρ)(1 + (p − 1)ρ)] / (1 + (p − 2)ρ))
  = √([1 + (p − 2)ρ − 1 − (p − 1)ρ + ρ + (p − 1)ρ^2] / (1 + (p − 2)ρ))
  = √(ρ^2(p − 1)/(1 + (p − 2)ρ)) = ρ √((p − 1)/(1 + (p − 2)ρ))
6 Partial correlation coefficient:

Let X be a p-variate random vector with mean vector μ and variance-covariance matrix Σ. Partition

X = (X_1, X_2, X_(3)')',  μ = (μ_1, μ_2, μ_(3)')',  where X_(3) = (X_3, X_4, ..., X_p)',

and correspondingly

Σ = [ σ_11    σ_12    σ_(13)' ]
    [ σ_21    σ_22    σ_(23)' ]
    [ σ_(13)  σ_(23)  Σ_3     ]

where σ_(13) = (σ_31, ..., σ_p1)', σ_(23) = (σ_32, ..., σ_p2)' [∵ σ_ij = σ_ji], and Σ_3 is the (p−2)×(p−2) dispersion matrix of X_(3).
Let e_{1.34...p} and e_{2.34...p} be the residuals of X_1 and X_2 after their linear regressions on X_(3). The partial correlation coefficient between X_1 and X_2, eliminating the effect of X_3, ..., X_p, is the correlation coefficient between these residuals. Writing Σ_ij for the cofactor of σ_ij in Σ, the earlier result gives

Var(e_{1.34...p}) = Σ_22/|Σ_3|  and similarly  Var(e_{2.34...p}) = Σ_11/|Σ_3|.

Now, with δ̂ = Σ_3^{-1}σ_(23) the regression coefficient vector of X_2 on X_(3),

cov(e_{1.34...p}, e_{2.34...p}) = cov(X_1 − X_{1.34...p}, e_{2.34...p}) = cov(X_1, e_{2.34...p})   [cov(X_{1.34...p}, e_{2.34...p}) = 0 by the normal equations]
  = σ_12 − Σ_{i=3}^p δ̂_iσ_1i = σ_12 − δ̂'σ_(13) = σ_12 − σ_(23)'Σ_3^{-1}σ_(13)

Again, expanding the bordered determinant,

| σ_21    σ_(23)' |
| σ_(13)  Σ_3     | = |Σ_3|(σ_21 − σ_(23)'Σ_3^{-1}σ_(13)) = |Σ_3|(σ_12 − σ_(23)'Σ_3^{-1}σ_(13)),

and this bordered determinant is the minor of σ_12 in Σ, i.e., it equals −Σ_12.

⇒ cov(e_{1.34...p}, e_{2.34...p}) = −Σ_12/|Σ_3|

Hence

ρ_{12.34...p} = cov(e_{1.34...p}, e_{2.34...p}) / √(Var(e_{1.34...p}) Var(e_{2.34...p})) = −Σ_12/√(Σ_11Σ_22)

In terms of the correlation matrix,

ρ_{12.34...p} = −R_12/√(R_11R_22),

where R_ij is the cofactor of ρ_ij in R.
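A numerical sketch of ρ_{12.34...p} = −R_12/√(R_11R_22); for an invertible R the cofactor of ρ_ij is |R| times the (j, i) entry of R^{-1}. The matrix below is an arbitrary illustrative choice:

import numpy as np

# Partial correlation of X1, X2 given the rest via cofactors of R.
R = np.array([[1.0, 0.4, 0.3, 0.2],
              [0.4, 1.0, 0.2, 0.1],
              [0.3, 0.2, 1.0, 0.4],
              [0.2, 0.1, 0.4, 1.0]])                 # assumed correlation matrix
C = np.linalg.det(R) * np.linalg.inv(R).T            # C[i, j] = cofactor of R[i, j]
rho_12_rest = -C[0, 1] / np.sqrt(C[0, 0] * C[1, 1])
print(rho_12_rest)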
Ex: Show that (1 − ρ_13^2)(1 − ρ_12.3^2) = (1 − ρ_1.23^2).

→ ρ_12.3 = −R_12/√(R_11R_22). With

|R| = | 1     ρ_12  ρ_13 |
      | ρ_21  1     ρ_23 |
      | ρ_31  ρ_32  1    |
    = 1 − ρ_23^2 − ρ_12(ρ_21 − ρ_23ρ_31) + ρ_13(ρ_12ρ_23 − ρ_13)
    = 1 − ρ_23^2 − ρ_13^2 − ρ_12^2 + 2ρ_12ρ_13ρ_23,

R_11 = 1 − ρ_23^2, R_22 = 1 − ρ_13^2 and R_12 = −(ρ_21 − ρ_23ρ_31), so

ρ_12.3 = (ρ_21 − ρ_23ρ_31)/√((1 − ρ_23^2)(1 − ρ_13^2))

ρ_1.23 = √(1 − |R|/R_11) = √((ρ_13^2 + ρ_12^2 − 2ρ_12ρ_13ρ_23)/(1 − ρ_23^2))

⇒ 1 − ρ_1.23^2 = (1 − ρ_23^2 − ρ_12^2 − ρ_13^2 + 2ρ_12ρ_13ρ_23)/(1 − ρ_23^2)

Now,

(1 − ρ_13^2)(1 − ρ_12.3^2) = (1 − ρ_13^2)[1 − (ρ_21 − ρ_23ρ_31)^2/((1 − ρ_23^2)(1 − ρ_13^2))]
  = [(1 − ρ_13^2)(1 − ρ_23^2) − (ρ_21 − ρ_23ρ_31)^2]/(1 − ρ_23^2)
  = (1 − ρ_12^2 − ρ_13^2 − ρ_23^2 + 2ρ_12ρ_13ρ_23)/(1 − ρ_23^2)
  = 1 − ρ_1.23^2   (Proved)
Problem: If the X_i's are uncorrelated for i = 2, 3, ..., p, show that ρ^2_{1.23...p} = ρ_12^2 + ρ_13^2 + ... + ρ_1p^2.

Solution: Here

R = [ 1     ρ_12  ρ_13  ...  ρ_1p ]
    [ ρ_12  1     0     ...  0    ]
    [ ρ_13  0     1     ...  0    ]
    [ ...                         ]
    [ ρ_1p  0     0     ...  1    ]

so R_2 = I_{p−1}. Now |R| = |R_2|(1 − ρ_(2)'R_2^{-1}ρ_(2)) = 1 × (1 − ρ_(2)'Iρ_(2)) = 1 − Σ_{j=2}^p ρ_1j^2

∴ ρ^2_{1.23...p} = 1 − |R|/|R_2| = 1 − (1 − Σ_{j=2}^p ρ_1j^2) = ρ_12^2 + ρ_13^2 + ... + ρ_1p^2   (Proved)
Problem: If ρ_12 = 0, what can be said about ρ_12.3? Hence comment on ρ_12 as a measure of correlation between X_1 and X_2.

Solution:

R = [ 1     0     ρ_13 ]
    [ 0     1     ρ_23 ]
    [ ρ_13  ρ_23  1    ]

R_12 = ρ_13ρ_23;  R_11 = 1 − ρ_23^2;  R_22 = 1 − ρ_13^2

∴ ρ_12.3 = −ρ_13ρ_23/√((1 − ρ_23^2)(1 − ρ_13^2)) ≠ 0 (in general)

So ρ_12 is not a proper measure of association.
Problem: If ρ_ij = ρ ∀ i ≠ j, obtain ρ_12.34...p.

Solution: Here R is the p×p equicorrelation matrix. Expanding the minor of ρ_12 after the column operations C_i → C_i − C_1,

R_12 = −ρ(1 − ρ)^{p−2}

and

R_11 = R_22 = (1 + (p − 2)ρ)(1 − ρ)^{p−2}

∴ ρ_12.34...p = −R_12/√(R_11R_22) = ρ(1 − ρ)^{p−2}/[(1 − ρ)^{p−2}(1 + (p − 2)ρ)] = ρ/(1 + (p − 2)ρ)
Problem: Suppose Σ_{i=1}^p a_iX_i = k, a constant. What is the partial correlation coefficient between X_i and X_j eliminating the effect of the remaining variables?

Solution: Note that Σ_{i=1}^p a_iX_i = k. While calculating the partial correlation coefficient between X_i and X_j, the other variables are kept fixed, i.e., a_iX_i + a_jX_j = k* (a constant). Under this scenario X_j is an exact linear function of X_i with slope −a_i/a_j, so the usual correlation coefficient between them is ±1:

ρ_{ij.(rest)} = 1 iff a_i and a_j are of different signs, and −1 iff a_i and a_j are of the same sign.
Problem: If ρ_1j = ρ for j = 2, 3, ..., p and ρ_ij = ρ' for i, j = 2, 3, ..., p, i ≠ j, obtain ρ_1.23...p and ρ_12.34...p.

Solution: Here

R = [ 1   ρ    ρ    ...  ρ  ]
    [ ρ   1    ρ'   ...  ρ' ]
    [ ρ   ρ'   1    ...  ρ' ]
    [ ...                   ]
    [ ρ   ρ'   ρ'   ...  1  ]

Sweeping out the first column [C_i' → C_i − ρC_1, i = 2, ..., p] and expanding along the first row,

|R| = [(1 − ρ^2) + (p − 2)(ρ' − ρ^2)](1 − ρ')^{p−2}

Again,

|R_2| = (1 + (p − 2)ρ')(1 − ρ')^{p−2}

∴ ρ^2_{1.23...p} = 1 − |R|/|R_2| = ρ^2(p − 1)/(1 + (p − 2)ρ')

For the partial correlation, expanding the cofactor of ρ_12,

R_12 = −ρ(1 − ρ')^{p−2}

∴ ρ_12.34...p = −R_12/√(R_11R_22) = ρ√(1 − ρ') / √[(1 + (p − 2)ρ')(1 + (p − 3)ρ' − ρ^2(p − 2))]
We consider the multiple linear regression equation of X_1 on X_2, X_3, ..., X_p as follows:

X_{1.23...p} = α + β'X_(2),  where β = (β_2, β_3, ..., β_p)' and X_(2) = (X_2, X_3, ..., X_p)'.

If X_(2) = (0, 0, ..., 1, ..., 0, 0)' with 1 in the j-th position, then X_{1.23...p} = α + β_j. This β_j is termed the partial regression coefficient of X_1 on X_j eliminating the effect of X_2, X_3, ..., X_{j−1}, X_{j+1}, ..., X_p. Here β_j refers to the change in the response X_1 due to a unit change in the j-th covariate X_j, the other covariates being held fixed.
The normal equations give σ_(2) = Σ_2β̂ and α̂ = μ_1 − β̂'μ_(2), i.e., componentwise,

σ_i1 = σ_i2β_2 + σ_i3β_3 + ... + σ_ipβ_p   (1)

On the other hand, expanding |Σ| by alien cofactors, Σ_{j=1}^p σ_ijΣ_1j = 0 for i ≠ 1, i.e.,

σ_i1 = Σ_{j=2}^p σ_ij (−Σ_1j/Σ_11)   (2)

Comparing equations (1) and (2) we get

β_2 = −Σ_12/Σ_11, ..., β_p = −Σ_1p/Σ_11,  i.e.,  β_j = −Σ_1j/Σ_11,

i.e., the partial regression coefficient of X_1 on X_j eliminating the effect of X_2, X_3, ..., X_{j−1}, X_{j+1}, ..., X_p is given by

β_{1j.23...j−1,j+1...p} = −Σ_1j/Σ_11

Next we derive the relationship between the partial regression coefficient and the partial correlation coefficient.
Note that

β_{12.34...p} = −Σ_12/Σ_11  and  β_{21.34...p} = −Σ_21/Σ_22

so

β_{12.34...p} × β_{21.34...p} = Σ_12^2/(Σ_11Σ_22) = ρ^2_{12.34...p}

Also

Var(e_{1.34...p}) = σ^2_{1.34...p} (say) = Σ_22/|Σ_3|
Var(e_{2.34...p}) = σ^2_{2.34...p} (say) = Σ_11/|Σ_3|

⇒ σ^2_{1.34...p}/σ^2_{2.34...p} = Σ_22/Σ_11 ⇒ σ_{1.34...p}/σ_{2.34...p} = √(Σ_22/Σ_11)

Again, ρ_{12.34...p} = −Σ_12/√(Σ_11Σ_22), so

ρ_{12.34...p} × σ_{1.34...p}/σ_{2.34...p} = −Σ_12/Σ_11 = β_{12.34...p},

i.e., the partial regression coefficient equals the partial correlation coefficient times the ratio of residual standard deviations.
Result: 1 − ρ^2_{1.23...p} = (1 − ρ_12^2)(1 − ρ_13.2^2) ... (1 − ρ^2_{1p.23...p−1}).
→ Var(e_{1.23...p}) = E[e^2_{1.23...p}] = E[e_{1.23...p} e_{1.23...p}]
  = E[(X_1 − X_{1.23...p}) e_{1.23...p}] = E[X_1 e_{1.23...p}]   [E(X_{1.23...p} e_{1.23...p}) = 0]

Treating X_p as the covariate added to the regression on X_2, ..., X_{p−1},

Var(e_{1.23...p}) = E[X_1 e_{1.23...p−1}] − β̂_{1p.23...p−1} E[e_{1.23...p−1} X_p]   [the rest of the terms are zero by the normal equations]
  = E[(X_{1.23...p−1} + e_{1.23...p−1}) e_{1.23...p−1}] − β_{1p.23...p−1} E[X_p e_{1.23...p−1}]
  = E[e^2_{1.23...p−1}] + E[X_{1.23...p−1} e_{1.23...p−1}] − β_{1p.23...p−1} E[X_p e_{1.23...p−1}]   [middle term = 0]
  = σ^2_{1.23...p−1} − β_{1p.23...p−1} E[(X_{p.23...p−1} + e_{p.23...p−1}) e_{1.23...p−1}]   [here we take X_p as response]
  = σ^2_{1.23...p−1} − β_{1p.23...p−1} E[e_{p.23...p−1} e_{1.23...p−1}]

Now

E[e_{p.23...p−1} e_{1.23...p−1}] = ρ_{1p.23...p−1} σ_{p.23...p−1} σ_{1.23...p−1}

and β_{1p.23...p−1} = ρ_{1p.23...p−1} σ_{1.23...p−1}/σ_{p.23...p−1}. Therefore,

Var(e_{1.23...p}) = σ^2_{1.23...p−1} − ρ_{1p.23...p−1} (σ_{1.23...p−1}/σ_{p.23...p−1}) × ρ_{1p.23...p−1} σ_{p.23...p−1} σ_{1.23...p−1}
  = σ^2_{1.23...p−1} (1 − ρ^2_{1p.23...p−1})

Note that

ρ^2_{1.23...p} = 1 − σ^2_{1.23...p}/σ_11 ⇒ σ^2_{1.23...p} = σ_11(1 − ρ^2_{1.23...p})

∴ Var(e_{1.23...p}) = σ^2_{1.23...p} = σ^2_{1.23...p−1}(1 − ρ^2_{1p.23...p−1})
⇒ σ_11(1 − ρ^2_{1.23...p}) = σ_11(1 − ρ^2_{1.23...p−1})(1 − ρ^2_{1p.23...p−1})
⇒ 1 − ρ^2_{1.23...p} = (1 − ρ^2_{1.23...p−1})(1 − ρ^2_{1p.23...p−1})

Again, (1 − ρ^2_{1.23...p−1}) = (1 − ρ^2_{1.23...p−2})(1 − ρ^2_{1,p−1.23...p−2}), and recursively we get the final result,

1 − ρ^2_{1.23...p} = (1 − ρ_12^2)(1 − ρ_13.2^2) ... (1 − ρ^2_{1p.23...p−1})
Result: σ^2_{1.23...p} = σ^2_{1.34...p}(1 − ρ^2_{12.34...p})

→ σ^2_{1.23...p} = Var(e_{1.23...p}) = E[e_{1.23...p} × e_{1.23...p}]
  = E[(X_1 − X_{1.23...p}) e_{1.23...p}] = E[X_1 e_{1.23...p}]   [E(X_{1.23...p} e_{1.23...p}) = 0]

Treating X_2 as the covariate added to the regression of X_1 on X_3, ..., X_p,

  = E[X_1 e_{1.34...p}] − β_2 E[e_{1.34...p} X_2]   [the rest of the terms vanish by the normal equations]
  = Var(e_{1.34...p}) + E[e_{1.34...p} X_{1.34...p}] − β_2 E[e_{1.34...p} X_2]   [middle term = 0]
  = σ^2_{1.34...p} − β_2 E[(X_{2.34...p} + e_{2.34...p}) e_{1.34...p}]
  = σ^2_{1.34...p} − β_2 E[X_{2.34...p} e_{1.34...p}] − β_2 E[e_{1.34...p} e_{2.34...p}]   [first expectation = 0]
  = σ^2_{1.34...p} − ρ_{12.34...p} (σ_{1.34...p}/σ_{2.34...p}) × ρ_{12.34...p} (σ_{1.34...p} × σ_{2.34...p})
  = σ^2_{1.34...p} (1 − ρ^2_{12.34...p})   (Proved)
Problem: Var(e_{1.23...p}) = Var(e_{1.23...p−1})(1 − ρ^2_{1p.23...p−1}). Give an interpretation of ρ^2_{1p.23...p−1} from this result.

Solution:

1 − ρ^2_{1p.23...p−1} = Var(e_{1.23...p})/Var(e_{1.23...p−1}) ⇒ ρ^2_{1p.23...p−1} = 1 − Var(e_{1.23...p})/Var(e_{1.23...p−1})

So ρ^2_{1p.23...p−1} = 0 if Var(e_{1.23...p}) = Var(e_{1.23...p−1}), i.e., if adding X_p does not improve the prediction, and it increases as Var(e_{1.23...p}) falls relative to Var(e_{1.23...p−1}). If eliminating X_p results in a large increase in the prediction error of X_1, this indicates that X_1 and X_p are highly correlated even after the effects of X_2, X_3, ..., X_{p−1} are eliminated.
Remark: Taking λ = 3p in the earlier Markov bound gives P[(X − μ)'Σ^{-1}(X − μ) ≥ 3p] ≤ p/(3p) = 1/3 < 1/2, so

P[(X − μ)'Σ^{-1}(X − μ) > 3p] < P[(X − μ)'Σ^{-1}(X − μ) < 3p].
Problem: Let X = (X_1, X_2, X_3)', μ = 0 and

Σ = [ 1  c  0 ]
    [ c  1  c ]
    [ 0  c  1 ]

Is it possible to find c for which a'X and b'X are independently distributed, where a' = (1, −1, −1) and b' = (1, 1, 1)? Justify.

→ Let Y_1 = a'X and Y_2 = b'X. Being jointly normal, they are independent iff their covariance vanishes:

cov(a'X, b'X) = 0 ⇒ Σ_{i=1}^3 Σ_{j=1}^3 a_ib_j cov(X_i, X_j) = a'Σb = 0,

a single linear equation in c; any c satisfying it (while keeping Σ positive definite) works.
Problem: Show that if ρ^2_{1.23...p} = ρ^2_{1.34...p}, then ρ_{12.34...p} = 0.

Solution: Note that

ρ^2_{12.34...p} = 1 − Var(e_{1.23...p})/Var(e_{1.34...p}),
ρ^2_{1.23...p} = 1 − Var(e_{1.23...p})/Var(X_1),
ρ^2_{1.34...p} = 1 − Var(e_{1.34...p})/Var(X_1).

If ρ^2_{1.23...p} = ρ^2_{1.34...p}, then Var(e_{1.23...p}) = Var(e_{1.34...p}), i.e., ρ^2_{12.34...p} = 0.
8.1 Moment generating function:

Let X ∼ N_p(μ, Σ). Then

M_X(t) = E(e^{t_1X_1 + t_2X_2 + ... + t_pX_p}) = E(e^{t'X})
  = (2π)^{-p/2}|Σ|^{-1/2} ∫_{R^p} e^{t'x} e^{−(1/2)(x − μ)'Σ^{-1}(x − μ)} dx

Σ is positive definite, so Σ^{-1} is also pd, and there exists a non-singular matrix C such that Σ^{-1} = C'C; then |Σ^{-1}| = |C'C| = |C|^2 ⇒ |C| = 1/|Σ|^{1/2}.

Substituting y = C(x − μ), i.e., x = μ + C^{-1}y, with Jacobian |C^{-1}| = |Σ|^{1/2},

M_X(t) = (|Σ|^{1/2}/((2π)^{p/2}|Σ|^{1/2})) ∫ e^{t'(μ + C^{-1}y) − (1/2)y'y} dy
  = (1/(2π)^{p/2}) e^{t'μ} ∫ e^{a'y − (1/2)y'y} dy,  where a' = t'C^{-1}
  = (1/(2π)^{p/2}) e^{t'μ + (1/2)a'a} ∫ e^{−(1/2)(y'y − 2a'y + a'a)} dy
  = e^{t'μ + (1/2)a'a} ∫ (1/(2π)^{p/2}) e^{−(1/2)(y − a)'(y − a)} dy   [the integrand is the N_p(a, I_p) density, so the integral is 1]
  = e^{t'μ + (1/2)t'Σt}   [∵ a'a = t'C^{-1}(C^{-1})'t = t'(C'C)^{-1}t = t'Σt]
Ex. X ∼ N_p(μ, Σ). Obtain the distribution of A'X.

→ Let Z = A'X. Then

M_Z(t) = E(e^{t'Z}) = E(e^{t'A'X}) = E(e^{(At)'X}),  At = h (say)
  = E(e^{h'X}) = e^{h'μ + (1/2)h'Σh} = e^{(At)'μ + (1/2)(At)'Σ(At)}
  = e^{t'A'μ + (1/2)t'A'ΣAt}

So by the uniqueness of the mgf, Z ∼ N(A'μ, A'ΣA).
Problem: If X ∼ N_p(μ, Σ) then for any matrix B_{r×p} with R(B) = r, show that BX ∼ N_r(Bμ, BΣB').

Solution: Let Z = BX. Then

M_Z(t) = E(e^{t'(BX)}) = E(e^{(B't)'X}) = E(e^{h'X}),  h = B't
  = e^{h'μ + (1/2)h'Σh} = e^{(B't)'μ + (1/2)(B't)'Σ(B't)} = e^{t'(Bμ) + (1/2)t'(BΣB')t}

∴ Z ∼ N_r(Bμ, BΣB')
Ex. X ∼ N_p(μ, Σ). Show that (X − μ)'Σ^{-1}(X − μ) ∼ χ^2_p.

→ Write Σ^{-1} = B'B with B non-singular, and set y = B(x − μ) ⇒ x = μ + B^{-1}y. The Jacobian is

J(x → y) = |B^{-1}| = |Σ|^{1/2}

so

f_Y(y) = (1/((2π)^{p/2}|Σ|^{1/2})) |Σ|^{1/2} e^{−(1/2)y'y} = (2π)^{-p/2} e^{−(1/2)y'y}

∴ Y ∼ N_p(0, I_p), i.e., Y_i iid ∼ N(0, 1), and

(X − μ)'Σ^{-1}(X − μ) = Y'Y = Σ_{i=1}^p Y_i^2 ∼ χ^2_p   (Proved)
Result: X ∼ N_p(μ, Σ) if and only if l'X is univariate normal for every fixed vector l.

Only if part: Here X ∼ N_p(μ, Σ). Let Z = l'X. Then

M_Z(t) = E(e^{tl'X}) = E(e^{h'X}),  h = tl
  = e^{h'μ + (1/2)h'Σh} = e^{tl'μ + (1/2)t^2 l'Σl}

i.e., Z ∼ N(l'μ, l'Σl).

If part: Suppose Z = l'X ∼ N(l'μ, l'Σl) for every l. Taking t = 1 in the univariate mgf,

E(e^Z) = e^{l'μ + (1/2)l'Σl}
⇒ E(e^{l'X}) = e^{l'μ + (1/2)l'Σl} for all l,

and by the uniqueness of the mgf, X ∼ N_p(μ, Σ).
Ex. E(e^{t_1X + t_2Y}) = e^{t_1^2 + 5t_2^2 − t_1t_2 − 2t_2}. Obtain the mgf of (X − 2Y, X + 3Y).

→ Let Z_1 = X − 2Y and Z_2 = X + 3Y. Then

M_{Z_1,Z_2}(t_1, t_2) = E(e^{t_1Z_1 + t_2Z_2}) = E(e^{(t_1 + t_2)X + (−2t_1 + 3t_2)Y}) = M_{X,Y}(t_1*, t_2*),

with t_1* = t_1 + t_2 and t_2* = −2t_1 + 3t_2. Hence

M_{Z_1,Z_2}(t_1, t_2) = e^{t_1*^2 + 5t_2*^2 − t_1*t_2* − 2t_2*} = e^{23t_1^2 + 43t_2^2 − 59t_1t_2 + 4t_1 − 6t_2}

∴ (Z_1, Z_2) ∼ N_2(μ_1 = 4, μ_2 = −6, σ_11 = 46, σ_22 = 86, ρ = −59/(√46 √86))
Result: If X = (X_(1)', X_(2)')' ∼ N_p(μ, Σ) with X_(1) of dimension m and X_(2) of dimension u, then X_(1) and X_(2) are independent iff Σ_12 = 0.

Solution (if part): Here Σ_12 = 0, so the dispersion matrix is

Σ = [ Σ_1  0   ]
    [ 0    Σ_2 ]

and |Σ| = |Σ_1||Σ_2|, Σ^{-1} = diag(Σ_1^{-1}, Σ_2^{-1}). Hence

f_X(x) = (2π)^{-p/2}(|Σ_1||Σ_2|)^{-1/2} e^{−(1/2)[(x_(1)−μ_(1))'Σ_1^{-1}(x_(1)−μ_(1)) + (x_(2)−μ_(2))'Σ_2^{-1}(x_(2)−μ_(2))]}
  = (2π)^{-m/2}|Σ_1|^{-1/2} e^{−(1/2)(x_(1)−μ_(1))'Σ_1^{-1}(x_(1)−μ_(1))} × (2π)^{-u/2}|Σ_2|^{-1/2} e^{−(1/2)(x_(2)−μ_(2))'Σ_2^{-1}(x_(2)−μ_(2))}

⇒ f_X(x) = f_{X_(1)}(x_(1)) · f_{X_(2)}(x_(2)),

so X_(1) and X_(2) are independent. (The converse is immediate: independence forces cov(X_(1), X_(2)) = Σ_12 = 0.)
Result: Any subset of a multivariate normal random vector is also multivariate normal.

→ X = (X_(1)', X_(2)')', μ = (μ_(1)', μ_(2)')',

Σ = [ Σ_1   Σ_12 ]
    [ Σ_21  Σ_2  ]

Define Y_(1) = X_(1) − Σ_12Σ_2^{-1}X_(2) and Y_(2) = X_(2), i.e., consider the transformation

[ Y_(1) ]   [ I  −Σ_12Σ_2^{-1} ] [ X_(1) ]
[ Y_(2) ] = [ 0   I            ] [ X_(2) ] = PX,

where |P| = 1, i.e., P is non-singular. Now, X ∼ N_p(μ, Σ) ⇒ PX ∼ N_p(Pμ, PΣP'), and

PΣP' = [ I  −Σ_12Σ_2^{-1} ] [ Σ_1   Σ_12 ] [ I              0 ]
       [ 0   I            ] [ Σ_21  Σ_2  ] [ −Σ_2^{-1}Σ_21  I ]
     = [ Σ_1 − Σ_12Σ_2^{-1}Σ_21   0   ]
       [ 0                        Σ_2 ]

Pμ = [ μ_(1) − Σ_12Σ_2^{-1}μ_(2) ]
     [ μ_(2)                     ]

∴ Y_(2) ∼ N_u(μ_(2), Σ_2) and Y_(1) ∼ N_m(μ_(1) − Σ_12Σ_2^{-1}μ_(2), Σ_1 − Σ_12Σ_2^{-1}Σ_21).

Since Y_(2) = X_(2), the subset X_(2) of X follows a multivariate normal distribution.   (Proved)
Moreover, the block-diagonal form of PΣP' shows that Y_(1) and Y_(2) are independent. Changing variables back (the Jacobian of the transformation is |P| = 1), the joint density factors as

f_X(x) = (2π)^{-u/2}|Σ_2|^{-1/2} e^{−(1/2)(x_(2)−μ_(2))'Σ_2^{-1}(x_(2)−μ_(2))}
       × (2π)^{-m/2}|Σ_{11.2}|^{-1/2} e^{−(1/2)(x_(1)−μ_(1)−Σ_12Σ_2^{-1}(x_(2)−μ_(2)))'Σ_{11.2}^{-1}(x_(1)−μ_(1)−Σ_12Σ_2^{-1}(x_(2)−μ_(2)))}

where Σ_{11.2} = Σ_1 − Σ_12Σ_2^{-1}Σ_21. Dividing by the marginal density of X_(2),

X_(1) | X_(2) = x_(2) ∼ N_m(μ_(1) + Σ_12Σ_2^{-1}(x_(2) − μ_(2)), Σ_{11.2}).
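A small numerical sketch of the conditional-distribution formulas (partition sizes and matrix values are arbitrary illustrative choices):

import numpy as np

# Conditional distribution X_(1) | X_(2) = x2 for a partitioned multivariate normal.
mu = np.array([0.0, 1.0, -1.0, 2.0])
Sigma = np.array([[2.0, 0.6, 0.3, 0.1],
                  [0.6, 1.5, 0.2, 0.4],
                  [0.3, 0.2, 1.0, 0.5],
                  [0.1, 0.4, 0.5, 2.0]])             # assumed PD dispersion matrix
m = 2                                                # X_(1) = first m components
S1, S12 = Sigma[:m, :m], Sigma[:m, m:]
S21, S2 = Sigma[m:, :m], Sigma[m:, m:]
x2 = np.array([0.5, 1.0])                            # conditioning value
cond_mean = mu[:m] + S12 @ np.linalg.solve(S2, x2 - mu[m:])
cond_cov = S1 - S12 @ np.linalg.solve(S2, S21)       # Sigma_{11.2}
print(cond_mean, cond_cov, sep="\n")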
Ex. X ∼ N_p(μ, Σ), Σ is PD and a is a fixed vector. If r_i is the correlation between X_i and a'X, show that

r = (r_1, r_2, ..., r_p)' = C^{-1/2} DΣa,  where C = a'Σa and D = Diag(1/√σ_ii).

Solution:

r_i = cov(X_i, a'X) / √(Var(X_i) Var(a'X)) = cov(X_i, Σ_{j=1}^p a_jX_j)/(√σ_ii √(a'Σa)) = Σ_{j=1}^p a_jσ_ij / (√σ_ii √(a'Σa))

Writing σ_(i)' for the i-th row of Σ, the numerator Σ_j a_jσ_ij is the i-th entry of Σa, so stacking,

r = (1/√(a'Σa)) (a'σ_(1)/√σ_11, a'σ_(2)/√σ_22, ..., a'σ_(p)/√σ_pp)'
  = C^{-1/2} Diag(1/√σ_11, ..., 1/√σ_pp) Σa = C^{-1/2} DΣa   (Proved)
Ex. X ∼ N_3(0, Σ) with

Σ = [ 1  P  0 ]
    [ P  1  P ]
    [ 0  P  1 ]

Is there any value of P such that X_1 + X_2 + X_3 and X_1 − X_2 − X_3 are independent?

→ Being jointly normal, they are independent iff

cov(X_1 + X_2 + X_3, X_1 − X_2 − X_3) = 0
⇒ Var(X_1) − Var(X_2) − Var(X_3) − 2 cov(X_2, X_3) = 0   [the cov(X_1, X_2) and cov(X_1, X_3) terms cancel in pairs]
⇒ 1 − 1 − 1 − 2P = 0
⇒ P = −1/2
Ex. X ∼ N_3(0, Σ), Σ = [[1, ρ, ρ^2], [ρ, 1, ρ], [ρ^2, ρ, 1]]. Show that for c > 0

P[(X_2^2 + c)ρ^2 − 2ρ(X_1X_2 + X_2X_3) + X_1^2 + X_2^2 + X_3^2 − c ≤ 0] = ∫_0^c (1/√(2π)) e^{−y/2} y^{1/2} dy

Solution: Define

Y_1 = X_1/√(1 − ρ^2) − ρX_2/√(1 − ρ^2)
Y_2 = X_2
Y_3 = X_3/√(1 − ρ^2) − ρX_2/√(1 − ρ^2)

i.e.,

Y = [ 1/√(1−ρ^2)   −ρ/√(1−ρ^2)   0          ] X = PX
    [ 0             1             0          ]
    [ 0            −ρ/√(1−ρ^2)   1/√(1−ρ^2) ]

⇒ Y = PX ∼ N_3(0, I_3)   [∵ PΣP' = I_3 (check)]

Now

Y_1^2 + Y_2^2 + Y_3^2 = [(X_1 − ρX_2)^2 + (X_3 − ρX_2)^2]/(1 − ρ^2) + X_2^2
  = [X_1^2 + X_2^2 + X_3^2 + ρ^2X_2^2 − 2ρ(X_1X_2 + X_2X_3)]/(1 − ρ^2),

so the event in question is exactly {Y_1^2 + Y_2^2 + Y_3^2 ≤ c}. Hence

P[Y_1^2 + Y_2^2 + Y_3^2 ≤ c] = P[Z ≤ c],  Z ∼ χ^2_3   [∵ Y_i iid ∼ N(0, 1)]
  = ∫_0^c (1/(2^{3/2}Γ(3/2))) e^{−y/2} y^{3/2−1} dy = ∫_0^c (1/√(2π)) e^{−y/2} y^{1/2} dy   (Proved)
Ex. X ∼ N_p(μ, Σ) with μ = (μ, μ, ..., μ)' and σ_ij = min(i, j), i.e.,

Σ = [ 1  1  1  ...  1 ]
    [ 1  2  2  ...  2 ]
    [ 1  2  3  ...  3 ]
    [ ...             ]
    [ 1  2  3  ...  p ]

Obtain the distribution of (X_2 − X_1)^2 + (X_3 − X_2)^2 + ... + (X_p − X_{p−1})^2.

→ Define Y_1 = X_1 − μ and Y_j = (X_j − μ) − (X_{j−1} − μ) = X_j − X_{j−1} for j = 2, ..., p. Inverting,

X_j − μ = Y_1 + Y_2 + ... + Y_j,  j = 1, ..., p,

i.e., (X − μ) = CY where C is the lower-triangular matrix of ones, and Y = C^{-1}(X − μ).

|J(X → Y)| = |C| = 1, and (X − μ) ∼ N_p(0, Σ). Moreover

(CC')_ij = min(i, j) ⇒ CC' = Σ ⇒ |Σ| = |C|^2 = 1,

so Y = C^{-1}(X − μ) ∼ N_p(0, C^{-1}Σ(C^{-1})') = N_p(0, I_p). The joint pdf of Y_1, Y_2, ..., Y_p is

f_Y(y) = (2π)^{-p/2} e^{−(1/2)y'y}

Therefore Y_1, Y_2, ..., Y_p iid ∼ N(0, 1), and

(X_2 − X_1)^2 + ... + (X_p − X_{p−1})^2 = Y_2^2 + Y_3^2 + ... + Y_p^2 ∼ χ^2_{p−1}
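A quick numerical confirmation that the lower-triangular matrix of ones factors this Σ, i.e. CC' = ((min(i, j))) (illustrative check with p = 5):

import numpy as np

# CC' = Sigma with sigma_ij = min(i, j) when C is lower triangular of ones.
p = 5
C = np.tril(np.ones((p, p)))
i, j = np.indices((p, p))
Sigma = np.minimum(i + 1, j + 1).astype(float)
print(np.allclose(C @ C.T, Sigma))   # True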
Ex. X ∼ N_4(μ, Σ) with μ = (μ, μ, μ, μ)' and

Σ = [ 1  1  1  1 ]
    [ 1  2  2  2 ]
    [ 1  2  3  3 ]
    [ 1  2  3  4 ]

Obtain the distribution of (X_1 − X_2)^2 + (X_3 − X_4)^2.

→ Let y_1 = (x_1 − μ) − (x_2 − μ) and y_2 = (x_3 − μ) − (x_4 − μ), i.e.,

y = B(x − μ),  B = [ 1  −1  0   0 ]
                   [ 0   0  1  −1 ]

(x − μ) ∼ N_4(0, Σ) ⇒ B(x − μ) ∼ N_2(0, BΣB'), and

BΣB' = [ 1  0 ]
       [ 0  1 ] = I_2

∴ y_1, y_2 iid ∼ N(0, 1), i.e., y_1^2 + y_2^2 ∼ χ^2_2.
Ex. Let X and Y be p-variate random vectors such that

Z = (X', Y')' ∼ N_2p(0, [ A  B ; B'  C ])

Show that (X + Y) and (X − Y) are independent if A = C and B = B'.

Solution: Let U = X + Y and V = X − Y, i.e.,

[ U ]   [ I   I ] [ X ]
[ V ] = [ I  −I ] [ Y ] = DZ ∼ N_2p(0, DΣD')

DΣD' = [ I   I ] [ A   B ] [ I   I ]
       [ I  −I ] [ B'  C ] [ I  −I ]
     = [ A + B' + B + C   A + B' − B − C ]
       [ A − B' + B − C   A − B' − B + C ]

If A = C and B = B', the off-diagonal blocks vanish:

Disp(U, V) = [ 2(A + B)   0        ]
             [ 0          2(A − B) ]

Since the cross-covariance blocks are zero and (U', V')' is jointly normal, U and V are independent.
Ex. X = (X_1, X_2, X_3)' ∼ N_3(0, R), where R has unit diagonal and off-diagonal entries ρ_12, ρ_13, ρ_23. Show that

P[X_1 > 0, X_2 > 0, X_3 > 0] = 1/8 + (sin^{-1}ρ_12 + sin^{-1}ρ_13 + sin^{-1}ρ_23)/(4π)

→ Let A_i = {X_i > 0}. By inclusion-exclusion,

P(A_1 ∩ A_2 ∩ A_3) = 1 − Σ_i P(A_i') + Σ_{i<j} P(A_i' ∩ A_j') − P(A_1' ∩ A_2' ∩ A_3').

Here P(A_i') = 1/2; for a standardised bivariate normal pair, P(X_i < 0, X_j < 0) = 1/4 + sin^{-1}ρ_ij/(2π); and by the symmetry X ≡ −X, P(A_1' ∩ A_2' ∩ A_3') = P(A_1 ∩ A_2 ∩ A_3). Hence

⇒ 2P(X_1 > 0, X_2 > 0, X_3 > 0) = 1 − (1/2 + 1/2 + 1/2) + 3/4 + (1/(2π))(sin^{-1}ρ_12 + sin^{-1}ρ_13 + sin^{-1}ρ_23)

⇒ P(X_1 > 0, X_2 > 0, X_3 > 0) = 1/8 + (1/(4π))(sin^{-1}ρ_12 + sin^{-1}ρ_13 + sin^{-1}ρ_23)   (Proved)
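A Monte Carlo check of the orthant-probability formula (illustrative correlation values, assuming numpy):

import numpy as np

# P(X1>0, X2>0, X3>0) = 1/8 + (asin r12 + asin r13 + asin r23) / (4 pi).
rng = np.random.default_rng(3)
r12, r13, r23 = 0.5, 0.2, -0.3
R = np.array([[1.0, r12, r13],
              [r12, 1.0, r23],
              [r13, r23, 1.0]])
X = rng.multivariate_normal(np.zeros(3), R, size=2_000_000)
p_mc = np.all(X > 0, axis=1).mean()
p_formula = 1 / 8 + (np.arcsin(r12) + np.arcsin(r13) + np.arcsin(r23)) / (4 * np.pi)
print(p_mc, p_formula)   # agree to ~3 decimal places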
Ex. Let X ∼ N_p(0, Σ) where

Σ = [ Σ_1   Σ_12 ]
    [ Σ_21  Σ_2  ],  X = (X_(1)', X_(2)')',  β = Σ_12Σ_2^{-1},  Σ_{11.2} = Σ_1 − Σ_12Σ_2^{-1}Σ_21.

For fixed vectors l and λ, consider Y_1 = l'X and Y_2 = λ'X. Stacking c = (l, λ)' (a 2×p matrix),

(Y_1, Y_2)' = cX ∼ N_2(cμ, cΣc'),  and cμ = (l'μ, λ'μ)' = 0.

Again,

cov(Y_1, Y_2) = cov(l'X, λ'X) = E[(l'X − l'μ)(λ'X − λ'μ)'] = l' E[(X − μ)(X − μ)'] λ = l'Σλ,

so Y_1 and Y_2 are independently distributed precisely when l'Σλ = 0.
Ex. Suppose Z ∼ N(0, 1), Y|Z ∼ N(1 − Z, 1) and (X|Y, Z) ∼ N(1 − Y, 1). (a) Find the joint distribution of (X, Y, Z). (b) Find the joint distribution of (U, V)' = (1 + Z, 1 − Y)'. (c) Find E(Y | U = 1.7).

a) f_Z(z) = (1/√(2π)) e^{−z^2/2}
   f_{Y|Z}(y|z) = (1/√(2π)) e^{−(1/2)[y − (1 − z)]^2}

f_{Y,Z}(y, z) = f_{Y|Z}(y|z) × f_Z(z) = (1/(2π)) e^{−(1/2){z^2 + (y − 1 + z)^2}}

f_{X|Y,Z}(x|y, z) = (1/√(2π)) e^{−(1/2)(x − 1 + y)^2}

∴ f(x, y, z) = f_{X|Y,Z}(x|y, z) × f_{Y,Z}(y, z) = (1/(2π)^{3/2}) e^{−(1/2)[(x − 1 + y)^2 + (y − 1 + z)^2 + z^2]}

b) (U, V) = (1 + Z, 1 − Y) ⇒ Z = U − 1, Y = 1 − V, with Jacobian of absolute value 1, so

f_{U,V}(u, v) = (1/(2π)) e^{−(1/2)[(u − 1)^2 + (u − 1 − v)^2]} = (1/(2π)) e^{−(1/2)[2u^2 + v^2 − 4u − 2uv + 2v + 2]},  u ∈ R, v ∈ R

c) Since U = 1 + Z, the event {U = 1.7} is {Z = 0.7}, and E(Y | Z = z) = 1 − z, so

E(Y | U = 1.7) = 1 − 0.7 = 0.3.
Problem: Let X_1, X_2, ..., X_n be jointly normal with E(X_i) = μ, Var(X_i) = σ^2 and corr(X_i, X_j) = ρ for all i ≠ j. With X̄ = (1/n)ΣX_i and (n − 1)S^2 = Σ(X_i − X̄)^2, find the distribution of

T = (√n(X̄ − μ)/S) √((1 − ρ)/(1 + (n − 1)ρ))

Solution: Write Z = X − μ1 ∼ N_n(0, Σ) with

Σ = σ^2 [(1 − ρ)I_n + ρ11']

Let P be the n×n Helmert orthogonal matrix, whose first row is (1/√n)1' and whose remaining rows are orthogonal to 1:

P = [ 1/√n         1/√n         1/√n         ...  1/√n               ]
    [ 1/√2        −1/√2         0            ...  0                  ]
    [ 1/√6         1/√6        −2/√6         ...  0                  ]
    [ ...                                                            ]
    [ 1/√(n(n−1))  1/√(n(n−1))  1/√(n(n−1))  ...  −(n−1)/√(n(n−1))   ]

Define Y = PZ ∼ N_n(0, PΣP'). Since PP' = I_n and P1 = (√n, 0, ..., 0)',

PΣP' = σ^2 [(1 − ρ)I_n + ρ(P1)(P1)'] = σ^2 [(1 − ρ)I_n + ρ diag(n, 0, ..., 0)]

⇒ PΣP' = σ^2 diag(1 + (n − 1)ρ, 1 − ρ, ..., 1 − ρ) = Σ* (say)

∴ Y ∼ N_n(0, Σ*), i.e., Y_1, Y_2, ..., Y_n are independent as Σ* is a diagonal matrix, with

Y_1 ∼ N(0, σ^2(1 + (n − 1)ρ)),  Y_i ∼ N(0, (1 − ρ)σ^2) ∀ i = 2, 3, ..., n.

Now, Y_1 = √n(X̄ − μ), and since P is orthogonal, Σ Y_i^2 = Σ(X_i − μ)^2, so

(n − 1)S^2 = Σ_{i=1}^n (X_i − X̄)^2 = Σ_{i=1}^n (X_i − μ)^2 − n(X̄ − μ)^2 = Σ_{i=1}^n Y_i^2 − Y_1^2 = Σ_{i=2}^n Y_i^2

∴ (n − 1)S^2/(σ^2(1 − ρ)) = Σ_{i=2}^n Y_i^2/(σ^2(1 − ρ)) ∼ χ^2_{n−1},

independently of Y_1. And finally,

[√n(X̄ − μ)/(σ√(1 + (n − 1)ρ))] / √[(n − 1)S^2/(σ^2(1 − ρ)) · 1/(n − 1)]
  = (√n(X̄ − μ)/S) √((1 − ρ)/(1 + (n − 1)ρ)) ∼ t_{n−1}
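A simulation sketch of the final claim; it draws equicorrelated normal samples and compares the empirical quantiles of T with those of t_{n−1} (parameter values are illustrative):

import numpy as np
from scipy import stats

# T = sqrt(n)(Xbar - mu)/S * sqrt((1 - rho)/(1 + (n - 1) rho)) should be t_{n-1}.
rng = np.random.default_rng(4)
n, mu, sigma, rho, reps = 8, 2.0, 1.5, 0.4, 200_000
Sigma = sigma ** 2 * ((1 - rho) * np.eye(n) + rho * np.ones((n, n)))
X = rng.multivariate_normal(mu * np.ones(n), Sigma, size=reps)
xbar, s = X.mean(axis=1), X.std(axis=1, ddof=1)
T = np.sqrt(n) * (xbar - mu) / s * np.sqrt((1 - rho) / (1 + (n - 1) * rho))
q = [0.05, 0.25, 0.5, 0.75, 0.95]
print(np.quantile(T, q))
print(stats.t(df=n - 1).ppf(q))   # the two quantile sets should agree closely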
Result: Let X ∼ N_p(0, Σ). Then for positive constants a_1, ..., a_p,

P[|X_1| > a_1, |X_2| > a_2, ..., |X_p| > a_p] ≤ √(2/π) Σ_{i=1}^p √σ_ii / Σ_{i=1}^p a_i

Solution: The joint event implies Σ|X_i| > Σa_i, so

P[|X_1| > a_1, ..., |X_p| > a_p] ≤ P[Σ_{i=1}^p |X_i| > Σ_{i=1}^p a_i]
  ≤ E[Σ_{i=1}^p |X_i|] / Σ_{i=1}^p a_i = Σ_{i=1}^p E|X_i| / Σ_{i=1}^p a_i   [Markov's inequality]

Now X_i ∼ N(0, σ_ii) ⇒ E|X_i − 0| = √(2σ_ii/π)   [mean deviation about the mean]

Therefore,

P[|X_1| > a_1, |X_2| > a_2, ..., |X_p| > a_p] ≤ √(2/π) Σ_{i=1}^p √σ_ii / Σ_{i=1}^p a_i   (Proved)
Question: (X_1, X_2, X_3)' ∼ N_3(0, R), where R has unit diagonal and off-diagonal entries ρ_12, ρ_13, ρ_23. Show that 1 + 2ρ_12ρ_13ρ_23 ≥ ρ_12^2 + ρ_13^2 + ρ_23^2.

Solution: As R is a dispersion (correlation) matrix, |R| ≥ 0:

| 1     ρ_12  ρ_13 |
| ρ_12  1     ρ_23 | ≥ 0
| ρ_13  ρ_23  1    |

⇒ 1 − ρ_23^2 − ρ_12(ρ_12 − ρ_13ρ_23) + ρ_13(ρ_12ρ_23 − ρ_13) ≥ 0
⇒ 1 + 2ρ_12ρ_13ρ_23 ≥ ρ_12^2 + ρ_13^2 + ρ_23^2   (Proved)
9 Multinomial distribution

Important remark (used below):

ρ^2_{1.23...p} = Var(E(X_1 | X_(2))) / Var(X_1)

ρ_{12.34...p} = E[cov(X_1, X_2 | X_(3))] / √(E[Var(X_1 | X_(3))] E[Var(X_2 | X_(3))])
9.1 Introduction

Let us consider k disjoint categories 1, 2, ..., k, the probability of falling into the i-th category being p_i, i = 1, 2, ..., k. Suppose we have n independent trials, each falling into one of these categories. Define

X_i = number of trials falling into the i-th category, so that Σ_{i=1}^k X_i = n and Σ_{i=1}^k p_i = 1.

The joint pmf of X_1, X_2, ..., X_k is

f_{X_1,...,X_k}(x_1, x_2, ..., x_k) = (n!/(x_1!x_2!...x_k!)) p_1^{x_1} p_2^{x_2} ... p_k^{x_k},

where Σ p_i = 1, Σ x_i = n and 0 < p_i < 1. X = (X_1, X_2, ..., X_k)' is said to follow the multinomial distribution with parameters n, p_1, p_2, ..., p_k.
9.3 Moment Generating Function

M_X(t) = E(e^{t'X}) = E(e^{t_1X_1 + t_2X_2 + ... + t_kX_k})
  = Σ ... Σ e^{t_1x_1 + ... + t_kx_k} (n!/(x_1!x_2!...x_k!)) p_1^{x_1} p_2^{x_2} ... p_k^{x_k}
  = Σ ... Σ (n!/(x_1!...x_k!)) (p_1e^{t_1})^{x_1} (p_2e^{t_2})^{x_2} ... (p_ke^{t_k})^{x_k}
  = (p_1e^{t_1} + p_2e^{t_2} + ... + p_ke^{t_k})^n   [multinomial theorem]

Differentiating,

E(X_i) = ∂M/∂t_i |_{t=0} = n(p_1e^{t_1} + ... + p_ke^{t_k})^{n−1} p_ie^{t_i} |_{t=0} = np_i.
Dispersion matrix: Var(X_i) = np_i(1 − p_i) and cov(X_i, X_j) = −np_ip_j, so

Σ = [ np_1(1 − p_1)  −np_1p_2       ...  −np_1p_k      ]
    [ −np_2p_1       np_2(1 − p_2)  ...  −np_2p_k      ]
    [ ...                                              ]
    [ −np_kp_1       −np_kp_2       ...  np_k(1 − p_k) ]

Replacing the first column by the sum of all columns makes it the zero column (each row sum is np_i[(1 − p_i) − Σ_{j≠i} p_j] = np_i(1 − Σ_j p_j) = 0), so |Σ| = 0. Thus it is clear that X is a degenerate random vector (indeed Σ X_i = n identically).
9.4 Modified multinomial distribution:

Dropping the redundant last count, consider (X_1, ..., X_{k−1}). Its mgf is

E(e^{Σ_{i=1}^{k−1} t_iX_i})
  = Σ ... Σ (n!/(x_1!...x_{k−1}!(n − Σ_{i=1}^{k−1} x_i)!)) (p_1e^{t_1})^{x_1} ... (p_{k−1}e^{t_{k−1}})^{x_{k−1}} (1 − Σ_{i=1}^{k−1} p_i)^{n − Σ_{i=1}^{k−1} x_i}
  = (p_1e^{t_1} + p_2e^{t_2} + ... + p_{k−1}e^{t_{k−1}} + (1 − Σ_{i=1}^{k−1} p_i))^n   [modified multinomial theorem]

Differentiating as before,

E(X_i) = ∂M/∂t_i |_{t=0} = n(p_1e^{t_1} + ... + p_{k−1}e^{t_{k−1}} + 1 − Σ_{i=1}^{k−1} p_i)^{n−1} p_ie^{t_i} |_{t=0} = np_i
Similarly one checks Var(X_i) = np_i(1 − p_i) and cov(X_i, X_j) = −np_ip_j. Also note that now |Σ| ≠ 0, so the modified multinomial distribution is not degenerate.
9.4.3 Distribution of subset:

M_X(t) = (p_1e^{t_1} + ... + p_ke^{t_k})^n = (Σ_{i=1}^k p_ie^{t_i})^n

For t = (t_1, ..., t_k)', take t_{r+1} = t_{r+2} = ... = t_k = 0:

M_X(t_1, t_2, ..., t_r, 0, 0, ..., 0) = (Σ_{i=1}^r e^{t_i}p_i + Σ_{i=r+1}^k p_i)^n

Note that Σ_{i=1}^k p_i = 1 ⇒ Σ_{i=r+1}^k p_i = 1 − Σ_{i=1}^r p_i. Thus

M_X(t_1, t_2, ..., t_r, 0, ..., 0) = (Σ_{i=1}^r e^{t_i}p_i + 1 − Σ_{i=1}^r p_i)^n,

which is again a (modified) multinomial mgf, i.e., any subset (X_1, ..., X_r) is multinomial.
P
Ex. Let (X1 , X2 , . . . , Xk ) ∼ M N (n, p1 , p2 , . . . , pk ) and pi = 1.
i=1
r
X
Obtain distribution of Xi , r < k
i=1
r
P
Solution: Let Z = Xi
i=1
( r k
)n
n
X X
MZ (t) = E(etz ) = E(etX1 +tX2 +tXr ) = et pi + = q + pet
pi
i=1 i=r+1
k
X k
X r
X
where q = pi and p = 1 − q = 1 − pi = pi
i=r+1 i=r+1 i=1
51
r r
!
X X
So, Z = Xi ∼ Bin n, pi
i=1 i=1
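A quick simulation check that a partial sum of multinomial counts is binomial (illustrative parameters, assuming numpy/scipy):

import numpy as np
from scipy import stats

# Z = X1 + X2 for (X1,...,X4) ~ MN(n; p) should be Bin(n, p1 + p2).
rng = np.random.default_rng(5)
n, p = 20, np.array([0.1, 0.3, 0.4, 0.2])
X = rng.multinomial(n, p, size=500_000)
Z = X[:, :2].sum(axis=1)
k = np.arange(n + 1)
emp = np.bincount(Z, minlength=n + 1) / len(Z)
print(np.max(np.abs(emp - stats.binom(n, p[:2].sum()).pmf(k))))   # ~ 0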
9.5 Conditional distribution:

For the (non-degenerate) multinomial (X_1, ..., X_{k−1}) ∼ MN(n; p_1, ..., p_{k−1}),

P[X_1 = x_1, ..., X_r = x_r | X_{r+1} = x_{r+1}, ..., X_{k−1} = x_{k−1}]
  = P[X_1 = x_1, ..., X_{k−1} = x_{k−1}] / P[X_{r+1} = x_{r+1}, ..., X_{k−1} = x_{k−1}]

The numerator is the full multinomial pmf and the denominator is the marginal (multinomial) pmf of (X_{r+1}, ..., X_{k−1}). Writing

n° = n − Σ_{i=r+1}^{k−1} x_i  and  p° = 1 − Σ_{i=r+1}^{k−1} p_i,

the factorials and powers involving x_{r+1}, ..., x_{k−1} cancel, and the ratio simplifies to

P[X_1 = x_1, ..., X_r = x_r | X_{r+1}, ..., X_{k−1}]
  = (n°!/(x_1!x_2!...x_r!(n° − Σ_{i=1}^r x_i)!)) (p_1/p°)^{x_1} ... (p_r/p°)^{x_r} (1 − Σ_{i=1}^r p_i/p°)^{n° − Σ_{i=1}^r x_i}

∴ (X_1, X_2, ..., X_r | X_{r+1}, ..., X_{k−1}) ∼ MN(n°; p_1/p°, p_2/p°, ..., p_r/p°).
Ex. (X_1, X_2, ..., X_{k−1}) ∼ MN(n; p_1, p_2, ..., p_{k−1}). Show that

ρ^2_{1.23...k−1} = p_1(p_2 + p_3 + ... + p_{k−1}) / [(1 − p_1)(1 − p_2 − p_3 − ... − p_{k−1})]

Solution: By the previous result,

X_1 | X_2, X_3, ..., X_{k−1} ∼ Bin(n − Σ_{i=2}^{k−1} x_i, p_1/(1 − p_2 − p_3 − ... − p_{k−1}))

E(X_1 | X_2, ..., X_{k−1}) = (n − Σ_{i=2}^{k−1} x_i) p_1/(1 − p_2 − ... − p_{k−1})

Using ρ^2_{1.23...p} = Var(E(X_1 | X_(2)))/Var(X_1) and Var(X_1) = E[Var(X_1 | X_(2))] + Var(E(X_1 | X_(2))),

1 − ρ^2_{1.23...k−1} = E[Var(X_1 | X_(2))]/Var(X_1)

Now,

Var(X_1 | X_(2)) = (n − Σ_{i=2}^{k−1} X_i) [p_1/(1 − p_2 − ... − p_{k−1})][1 − p_1/(1 − p_2 − ... − p_{k−1})]

∴ E[Var(X_1 | X_(2))] = (n − Σ_{i=2}^{k−1} np_i) [p_1/(1 − p_2 − ... − p_{k−1})][1 − p_1/(1 − p_2 − ... − p_{k−1})]
  = np_1 [1 − p_1/(1 − p_2 − ... − p_{k−1})]

Since Var(X_1) = np_1(1 − p_1),

1 − ρ^2_{1.23...k−1} = [1 − p_1/(1 − p_2 − ... − p_{k−1})]/(1 − p_1)
  = (1 − p_1 − p_2 − ... − p_{k−1}) / [(1 − p_1)(1 − p_2 − ... − p_{k−1})]

⇒ ρ^2_{1.23...k−1} = [(1 − p_1)(1 − p_2 − ... − p_{k−1}) − (1 − p_1 − p_2 − ... − p_{k−1})] / [(1 − p_1)(1 − p_2 − ... − p_{k−1})]
  = p_1(p_2 + p_3 + ... + p_{k−1}) / [(1 − p_1)(1 − p_2 − ... − p_{k−1})]   (Proved)
Problem: Let X_i ∼ Poi(λ_i) independently, i = 1(1)k. Find the conditional distribution of (X_1, X_2, ..., X_k) given Σ_{i=1}^k X_i = n, and of (X_1, ..., X_r) given X_{r+1}, ..., X_{k−1} and Σ_{i=1}^k X_i = n.

i) Since Σ_{i=1}^k X_i ∼ Poi(Σ_{i=1}^k λ_i), for x_1 + ... + x_k = n,

P[X_1 = x_1, ..., X_k = x_k | Σ X_i = n] = P[X_1 = x_1, ..., X_k = x_k] / P[Σ X_i = n]
  = [Π_{i=1}^k e^{−λ_i} λ_i^{x_i}/x_i!] / [e^{−Σλ_j} (Σλ_j)^n / n!]
  = (n!/(x_1!x_2!...x_k!)) Π_{i=1}^k (λ_i/Σ_{j=1}^k λ_j)^{x_i}

i.e., (X_1, ..., X_k) | Σ X_i = n ∼ MN(n; λ_1/Σλ_j, ..., λ_k/Σλ_j).

ii) Given X_{r+1} = x_{r+1}, ..., X_{k−1} = x_{k−1} and Σ_{i=1}^k X_i = n, the remaining counts (X_1, ..., X_r, X_k) are constrained to sum to x_1 + x_2 + ... + x_r + x_k = n − Σ_{i=r+1}^{k−1} x_i. Proceeding exactly as in (i), with P[X_{r+1} = x_{r+1}] ... P[X_{k−1} = x_{k−1}] P[X_1 + ... + X_r + X_k = n − Σ_{i=r+1}^{k−1} x_i] in the denominator, the factors for X_{r+1}, ..., X_{k−1} cancel and

P[X_1 = x_1, ..., X_r = x_r | X_{r+1}, ..., X_{k−1}, Σ_{i=1}^k X_i = n]
  = ((x_1 + ... + x_r + x_k)!/(x_1!...x_r!x_k!)) (λ_1/λ°)^{x_1} ... (λ_r/λ°)^{x_r} (λ_k/λ°)^{x_k},

where λ° = λ_k + Σ_{i=1}^r λ_i and x_k = n − Σ_{i=1}^{k−1} x_i, i.e., the conditional distribution is again multinomial with total n − Σ_{i=r+1}^{k−1} x_i and cell probabilities λ_i/λ°.
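A simulation check of part (i), that independent Poissons conditioned on their total are multinomial (illustrative rates):

import numpy as np

# (X1,...,Xk) | sum = n should be MN(n; lambda_i / sum(lambda)).
rng = np.random.default_rng(6)
lam = np.array([1.0, 2.0, 3.0])
X = rng.poisson(lam, size=(2_000_000, 3))
n = 6
sel = X[X.sum(axis=1) == n]                          # condition on the total
print(sel.mean(axis=0))                              # ~ n * lam / lam.sum() = [1, 2, 3]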
Question: Find the partial correlation coefficient between X_1 and X_2 eliminating the effects of X_3, ..., X_{k−1}.

By the conditional-distribution result, with s = Σ_{i=3}^{k−1} p_i,

(X_1, X_2 | X_3, X_4, ..., X_{k−1}) ∼ MN(n − Σ_{i=3}^{k−1} x_i; p_1/(1 − s), p_2/(1 − s))

so

cov(X_1, X_2 | X_(3)) = −(n − Σ_{i=3}^{k−1} x_i) p_1p_2/(1 − s)^2

Var(X_j | X_(3)) = (n − Σ_{i=3}^{k−1} x_i) [p_j/(1 − s)][1 − p_j/(1 − s)],  j = 1, 2

Taking expectations, E(n − Σ_{i=3}^{k−1} X_i) = n(1 − s), so

E[Var(X_j | X_(3))] = np_j [1 − p_j/(1 − s)],  j = 1, 2
E[cov(X_1, X_2 | X_(3))] = −np_1p_2/(1 − s)

Using the important remark,

ρ_{12.34...k−1} = E[cov(X_1, X_2 | X_(3))] / √(E[Var(X_1 | X_(3))] E[Var(X_2 | X_(3))])
  = [−np_1p_2/(1 − s)] / [n √(p_1p_2 (1 − p_1/(1 − s))(1 − p_2/(1 − s)))]

Since 1 − p_j/(1 − s) = (1 − p_j − s)/(1 − s), this simplifies to

ρ_{12.34...k−1} = −√( p_1p_2 / [(1 − p_1 − Σ_{i=3}^{k−1} p_i)(1 − p_2 − Σ_{i=3}^{k−1} p_i)] )
10 Ellipsoid Of Concentration

Let X be a random vector with mean vector μ and dispersion matrix Σ, assumed positive definite. By the ellipsoid of concentration of X, we mean the ellipsoid

(y − a)'B(y − a) ≤ 1   ...(i)

where a ∈ R^p and B is PD, with a and B so chosen that a random vector Y distributed uniformly over the region (i) has the same mean vector μ and dispersion matrix Σ as X.

Let f_Y(y) = k on the region (i). Write B = C'C with C non-singular, and substitute z = C(y − a); the Jacobian of the transformation is |∂y/∂z| = 1/|C|, and |B| = |C||C'| ⇒ |C| = √|B|. Then

∫ k dy = 1 ⇒ (k/|C|) ∫_{z'z ≤ 1} dz = 1 ⇒ (k/√|B|) π^{p/2}/Γ(p/2 + 1) = 1 ⇒ k = √|B| Γ(p/2 + 1)/π^{p/2}

∴ f_Y(y) = √|B| Γ(p/2 + 1)/π^{p/2} if (y − a)'B(y − a) ≤ 1, and 0 otherwise;

f_Z(z) = Γ(p/2 + 1)/π^{p/2} if z'z ≤ 1, and 0 otherwise,

i.e., Z is uniform on the unit ball. By symmetry (odd integrands), E(Z_i) = 0 and cov(Z_i, Z_j) = 0 for i ≠ j, while a direct computation gives Var(Z_i) = E(Z_i^2) = 1/(p + 2) for all i = 1, 2, ..., p.

∴ E(Z) = 0 ⇒ E(C(Y − a)) = 0 ⇒ E(Y) = a

Also, Disp(Z) = I_p/(p + 2) ⇒ C Disp(Y) C' = I_p/(p + 2)
⇒ Disp(Y) = C^{-1}(C^{-1})'/(p + 2) = (C'C)^{-1}/(p + 2) = B^{-1}/(p + 2)

Now, a and B are such that

E(X) = E(Y) ⇒ μ = a
Disp(X) = Disp(Y) ⇒ Σ = B^{-1}/(p + 2) ⇒ B = Σ^{-1}/(p + 2)

So (y − μ)'(Σ^{-1}/(p + 2))(y − μ) ≤ 1 ⇔ (y − μ)'Σ^{-1}(y − μ) ≤ p + 2. Thus the ellipsoid of concentration of X is given by

{ y : (y − μ)'Σ^{-1}(y − μ) ≤ p + 2 }.
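A Monte Carlo check of the two facts the construction rests on, E(Z_i) = 0 and Var(Z_i) = 1/(p + 2) for Z uniform on the unit p-ball (illustrative, p = 5):

import numpy as np

# For Z uniform on the unit p-ball, each coordinate has mean 0 and variance 1/(p + 2).
rng = np.random.default_rng(7)
p, N = 5, 2_000_000
g = rng.standard_normal((N, p))
u = rng.uniform(size=(N, 1)) ** (1 / p)              # radius with density prop. to r^{p-1}
Z = u * g / np.linalg.norm(g, axis=1, keepdims=True)
print(Z.mean(axis=0))                                # ~ 0
print(Z.var(axis=0), 1 / (p + 2))                    # ~ 1/7 each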