MULTIVARIATE ANALYSIS

Taranga Mukherjee
Contents

1 Introduction
2 Multivariate Random Vector
  2.1 Distribution function
    2.1.1 Properties
  2.2 Independence
  2.3 Discrete random vector
  2.4 Continuous random vector
  2.5 Marginal and conditional distribution
  2.6 Mean vector
  2.7 Dispersion matrix and variance-covariance matrix
  2.8 Correlation matrix
    2.8.1 Relation between correlation matrix and dispersion matrix
3 Concept of Regression
4 Multiple linear regression
5 Multiple correlation coefficient
6 Partial correlation coefficient
7 Partial regression coefficient
8 Multivariate normal distribution
  8.1 Moment generating function
  8.2 Marginal distribution of multivariate normal
  8.3 Conditional distribution
9 Multinomial distribution
  9.1 Introduction
  9.2 Multinomial theorem
  9.3 Moment generating function
  9.4 Modified multinomial distribution
    9.4.1 Modified multinomial theorem
    9.4.2 Moment generating function
    9.4.3 Distribution of subset
  9.5 Conditional distribution
10 Ellipsoid of Concentration
1 Introduction
Suppose that in a board examination, n students each sit papers in p subjects, so that p marks are recorded for every student. The marks of one student can therefore be represented as a p-tuple, i.e., a p-dimensional vector. The observations on each student form one p-variate record, the information on all n students consists of n such p-variate vectors, and the overall data can be arranged in an n × p data matrix.
Let us consider the probability space $(\Omega, \mathcal{A}, P)$. Then
$$X = (X_1, X_2, \ldots, X_p)'$$
is said to be a p-variate random vector if $X$ is a mapping from $\Omega \to \mathbb{R}^p$.

2 Multivariate Random Vector

Equivalently, $X$ is said to be a p-variate random vector defined on $(\Omega, \mathcal{A}, P)$ if it is a mapping from $\Omega \to \mathbb{R}^p$ such that
$$\{\omega : X_1(\omega) \le x_1, X_2(\omega) \le x_2, \ldots, X_p(\omega) \le x_p\} \in \mathcal{A} \quad \forall\, (x_1, \ldots, x_p) \in \mathbb{R}^p.$$


Result: $X = (X_1, X_2, \ldots, X_p)'$ is a p-variate random vector iff $X_1, X_2, \ldots, X_p$ are univariate random variables.

Proof: (If part) Here $X_1, X_2, \ldots, X_p$ are univariate random variables defined on $(\Omega, \mathcal{A}, P)$, i.e.,
$$\{\omega : X_i(\omega) \le x_i\} \in \mathcal{A} \quad \forall\, i = 1(1)p$$
$$\Rightarrow \bigcap_{i=1}^{p} \{\omega : X_i(\omega) \le x_i\} \in \mathcal{A}
\Rightarrow \{\omega : X_1(\omega) \le x_1, \ldots, X_p(\omega) \le x_p\} \in \mathcal{A},$$
i.e., $X$ is a p-variate random vector.

(Only if part) Let $X$ be a p-variate random vector defined on $(\Omega, \mathcal{A}, P)$. Then
$$\{\omega : X_1(\omega) \le x_1, \ldots, X_p(\omega) \le x_p\} = \bigcap_{i=1}^{p} \{\omega : X_i(\omega) \le x_i\} \in \mathcal{A}.$$
Fixing the i-th coordinate and letting each of the remaining $x_j \to \infty$ gives $\{\omega : X_i(\omega) \le x_i\} \in \mathcal{A}$ for every $i = 1(1)p$, i.e., $X_1, X_2, \ldots, X_p$ are univariate random variables.

2.1 Distribution function:
Let $X = (X_1, X_2, \ldots, X_p)'$ be a p-variate random vector. Then its cumulative distribution function is given by
$$F_X(x) = P[X_1 \le x_1, X_2 \le x_2, \ldots, X_p \le x_p], \qquad x = (x_1, x_2, \ldots, x_p)'.$$

2.1.1 Properties:

(i) $F_X(x)$ is monotonically increasing in each argument.

(ii) $F_X(-\infty, x_2, \ldots, x_p) = F_X(x_1, -\infty, x_3, \ldots, x_p) = \cdots = F_X(x_1, \ldots, x_{p-1}, -\infty) = 0$.

(iii) $F_X(\infty, \infty, \ldots, \infty) = 1$.

(iv) $F_X(x)$ is right continuous.
 
X1
 X2 
 
Exercise: X =  ..  . Fxe (x) = P [X1 ≤ x1 , X2 ≤ x2 , . . . , Xp ≤ xp ]

e  .
ra
  e
Xp
and FXi (xi ) = P [Xi ≤ xi ] then show that,
p
( p )1/p
X Y
FXi (xi ) − (p − 1) ≤ FX (x) ≤ FXi (xi ) .
i=1 i=1
e e
Proof: $F_X(x) = P[X_1 \le x_1, \ldots, X_p \le x_p]$. Define $A_i = \{X_i \le x_i\}$.

Since $\bigcap_{i=1}^{p} A_i \subseteq A_i$ and $A \subseteq B \Rightarrow P(A) \le P(B)$,
$$F_X(x) = P\!\left(\bigcap_{i=1}^{p} A_i\right) \le P(A_i) = F_{X_i}(x_i) \quad \forall\, i = 1(1)p.$$
Multiplying these p inequalities,
$$\left[P\!\left(\bigcap_{i=1}^{p} A_i\right)\right]^p \le \prod_{i=1}^{p} P(A_i)
\;\Rightarrow\; F_X(x) \le \left\{\prod_{i=1}^{p} F_{X_i}(x_i)\right\}^{1/p}. \tag{1}$$

Again,
$$F_X(x) = P\!\left(\bigcap_{i=1}^{p} A_i\right) = 1 - P\!\left(\bigcup_{i=1}^{p} A_i^c\right).$$
By Boole's inequality,
$$P\!\left(\bigcup_{i=1}^{p} A_i^c\right) \le \sum_{i=1}^{p} P(A_i^c) = \sum_{i=1}^{p} \big(1 - P(A_i)\big)
\;\Rightarrow\; 1 - P\!\left(\bigcup_{i=1}^{p} A_i^c\right) \ge 1 - p + \sum_{i=1}^{p} P(A_i)$$
$$\Rightarrow F_X(x) \ge \sum_{i=1}^{p} F_{X_i}(x_i) - (p-1). \tag{2}$$

Combining (1) and (2) we get the result.
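The two bounds can also be checked numerically. Below is a minimal sketch (not part of the notes, all numbers illustrative) that estimates the joint cdf and the marginal cdfs of a correlated trivariate normal by Monte Carlo and verifies that the joint value lies between the two bounds.

```python
# Minimal numerical sketch: check
#   sum_i F_i(x_i) - (p-1) <= F_X(x) <= (prod_i F_i(x_i))^(1/p)
# by Monte Carlo for an equicorrelated trivariate normal.
import numpy as np

rng = np.random.default_rng(0)
p, rho = 3, 0.6
Sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))   # equicorrelation matrix
L = np.linalg.cholesky(Sigma)
X = rng.standard_normal((100_000, p)) @ L.T             # samples from N_p(0, Sigma)

x = np.array([0.3, -0.2, 0.5])                          # evaluation point
F_joint = np.mean(np.all(X <= x, axis=1))               # estimate of F_X(x)
F_marg = np.array([np.mean(X[:, i] <= x[i]) for i in range(p)])

lower = F_marg.sum() - (p - 1)
upper = F_marg.prod() ** (1 / p)
print(lower, F_joint, upper)    # expect lower <= F_joint <= upper
```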

2.2 Independence
Let $(X_1, \ldots, X_r, Y_1, \ldots, Y_s)$ be an $(r+s)$-dimensional random vector. Then $X = (X_1, \ldots, X_r)'$ and $Y = (Y_1, \ldots, Y_s)'$ are said to be independent if
$$P[X_1 \le x_1, \ldots, X_r \le x_r, Y_1 \le y_1, \ldots, Y_s \le y_s]
= P[X_1 \le x_1, \ldots, X_r \le x_r]\; P[Y_1 \le y_1, \ldots, Y_s \le y_s] = F_X(x)\,F_Y(y).$$
For a random vector $X = (X_1, \ldots, X_p)'$, the components $X_1, X_2, \ldots, X_p$ are said to be independent iff
$$F_X(x) = \prod_{i=1}^{p} F_{X_i}(x_i).$$

2.3 Discrete random vector

A p-variate random vector $X$ is said to be discrete if there exists a countable set $S \subseteq \mathbb{R}^p$ such that $P[X \in S] = 1$, i.e., the random vector can take at most countably many values.
Then $f_X(x_1, \ldots, x_p) = P[X_1 = x_1, \ldots, X_p = x_p]$ is said to be the joint pmf of $X$ if
(i) $f_X(x) \ge 0$ and (ii) $\sum_{x \in S} f_X(x) = 1$.

2.4 Continuous random vector:

A random vector $X$ is said to be continuous if its joint cdf is absolutely continuous. A function $f_X(x)$ is said to be the joint pdf of $X$ if
(i) $f_X(x) \ge 0$ and (ii) $\int_{x_1}\int_{x_2}\cdots\int_{x_p} f_X(x)\,dx = 1$, and in that case
$$f_X(x_1, \ldots, x_p) = \frac{\partial^p F_X(x_1, \ldots, x_p)}{\partial x_1\,\partial x_2 \cdots \partial x_p}.$$

2.5 Marginal And Conditional Distribution

Suppose $X^{p\times 1} = \begin{pmatrix} X_1^{\,q\times 1} \\ X_2^{\,(p-q)\times 1} \end{pmatrix}$. The marginal pdf of $X_1$ is given by
$$f_{X_1}(x_1) = \int_{x_2} f_X(x)\,dx_2, \qquad x_1 \in \mathbb{R}^q.$$
The conditional distribution of $X_2 \mid X_1$ is given by
$$f_{X_2\mid X_1}(x_2\mid x_1) = \frac{f_X(x)}{f_{X_1}(x_1)}, \quad \text{provided } f_{X_1}(x_1) > 0.$$
Note that the random vectors $X_1$ and $X_2$ are independent iff
$$f_X(x) = f_{X_1}(x_1)\,f_{X_2}(x_2) \quad \forall\, x_1 \in \mathbb{R}^q,\; x_2 \in \mathbb{R}^{p-q}.$$
Q. Let $X$ be a p-dimensional random vector with pdf
$$f(x) = \begin{cases} k, & \sum_{i=1}^{p} x_i^2 \le c^2, \\ 0, & \text{otherwise.} \end{cases}$$
i) Find $k$, ii) the marginal pdf of $X_1$ (the first $q$ components) and iii) the conditional pdf of $X_2 \mid X_1$.

Solution:
$$\int f(x)\,dx = 1 \;\Rightarrow\; k\int_{\sum x_i^2 \le c^2} dx = 1.$$
We consider the polar transformation $(X_1, \ldots, X_p) \to (r, \theta_1, \ldots, \theta_{p-1})$ with
$$x_1 = r\cos\theta_1,\quad x_2 = r\sin\theta_1\cos\theta_2,\;\ldots,\; x_p = r\sin\theta_1\sin\theta_2\cdots\sin\theta_{p-1},$$
so that $\sum_{i=1}^{p} x_i^2 = r^2$ and $r^2 \le c^2 \Rightarrow 0 < r \le c$. The Jacobian of the transformation is
$$|J| = r^{p-1}\sin^{p-2}\theta_1\,\sin^{p-3}\theta_2\cdots\sin\theta_{p-2},$$
where $0 < \theta_{p-1} < 2\pi$ and $0 < \theta_i < \pi$ for $i = 1, \ldots, p-2$. Hence
$$k\int_0^{c} r^{p-1}dr\int_0^{\pi}\sin^{p-2}\theta_1\,d\theta_1\cdots\int_0^{\pi}\sin\theta_{p-2}\,d\theta_{p-2}\int_0^{2\pi}d\theta_{p-1} = 1$$
$$\Rightarrow k\,\frac{c^p}{p}\,\beta\!\left(\frac{p-1}{2},\frac12\right)\beta\!\left(\frac{p-2}{2},\frac12\right)\cdots\beta\!\left(1,\frac12\right)\cdot 2\pi = 1
\;\Rightarrow\; \frac{k\,\pi^{p/2}c^p}{\Gamma\!\left(\frac p2+1\right)} = 1
\;\Rightarrow\; k = \frac{\Gamma\!\left(\frac p2+1\right)}{\pi^{p/2}c^p},$$
i.e., $1/k$ is the volume of the p-dimensional ball of radius $c$.

The marginal pdf of $X_1$: writing $R^2 = c^2 - \sum_{i=1}^{q} x_i^2$, the region $\sum_{i=1}^{p} x_i^2 \le c^2$ becomes $\sum_{i=q+1}^{p} x_i^2 \le R^2$, so
$$f_{X_1}(x_1) = \int_{\sum_{i=q+1}^{p} x_i^2 \le R^2} k\,dx_2 = k\cdot\frac{\pi^{(p-q)/2}R^{p-q}}{\Gamma\!\left(\frac{p-q}{2}+1\right)}
= \frac{\Gamma\!\left(\frac p2+1\right)}{\Gamma\!\left(\frac{p-q}{2}+1\right)\pi^{q/2}c^p}\left(c^2 - \sum_{i=1}^{q} x_i^2\right)^{\frac{p-q}{2}},\qquad \sum_{i=1}^{q} x_i^2 \le c^2.$$

The conditional pdf of $X_2 \mid X_1$ is
$$f_{X_2\mid X_1}(x_2\mid x_1) = \frac{f_X(x)}{f_{X_1}(x_1)} = \frac{\Gamma\!\left(\frac{p-q}{2}+1\right)}{\pi^{(p-q)/2}\left(c^2 - \sum_{i=1}^{q} x_i^2\right)^{\frac{p-q}{2}}},\qquad \sum_{i=1}^{p} x_i^2 \le c^2,$$
i.e., given $X_1 = x_1$, $X_2$ is uniform on the $(p-q)$-dimensional ball of radius $\sqrt{c^2 - \sum_{i=1}^{q} x_i^2}$.
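The constant $k$ is simply the reciprocal of the volume of the p-ball, so it can be checked numerically. A minimal sketch (not from the notes, all values illustrative):

```python
# Minimal sketch: check k = Gamma(p/2+1) / (pi^(p/2) c^p) numerically,
# since 1/k is the volume of the p-ball of radius c (rejection sampling from the cube).
import numpy as np
from math import gamma, pi

rng = np.random.default_rng(1)
p, c = 4, 2.0
n = 1_000_000
u = rng.uniform(-c, c, size=(n, p))                 # uniform points in [-c, c]^p
inside = np.mean((u ** 2).sum(axis=1) <= c ** 2)    # fraction landing in the ball
vol_mc = inside * (2 * c) ** p                      # Monte Carlo volume of the ball
k_formula = gamma(p / 2 + 1) / (pi ** (p / 2) * c ** p)
print(1 / vol_mc, k_formula)                        # the two values should be close
```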

2.6 Mean vector:
Let $X = (X_1, X_2, \ldots, X_p)'$ be a p-variate random vector. Then its mean vector is given by
$$\mu = E(X) = \big(E(X_1), E(X_2), \ldots, E(X_p)\big)' = (\mu_1, \mu_2, \ldots, \mu_p)'.$$

2.7 Dispersion matrix and variance-covariance matrix:

Let $X$ be a p-variate random vector with mean vector $\mu$. Then the variance-covariance (dispersion) matrix of $X$ is defined as
$$\Sigma = \begin{pmatrix}\sigma_{11} & \sigma_{12} & \ldots & \sigma_{1p}\\ \sigma_{21} & \sigma_{22} & \ldots & \sigma_{2p}\\ \vdots & \vdots & & \vdots\\ \sigma_{p1} & \sigma_{p2} & \ldots & \sigma_{pp}\end{pmatrix},
\qquad \sigma_{ij} = \begin{cases} Var(X_i), & i = j,\\ cov(X_i, X_j), & i \ne j.\end{cases}$$
Since $cov(X_i, X_j) = cov(X_j, X_i)$, $\Sigma$ is symmetric.

Essentially $\Sigma$ is given by
$$\Sigma = E\big[(X-\mu)(X-\mu)'\big]
= E\begin{pmatrix}(X_1-\mu_1)^2 & (X_1-\mu_1)(X_2-\mu_2) & \ldots & (X_1-\mu_1)(X_p-\mu_p)\\ (X_2-\mu_2)(X_1-\mu_1) & (X_2-\mu_2)^2 & \ldots & (X_2-\mu_2)(X_p-\mu_p)\\ \vdots & & & \vdots\\ (X_p-\mu_p)(X_1-\mu_1) & \ldots & \ldots & (X_p-\mu_p)^2\end{pmatrix}
= \begin{pmatrix}\sigma_{11} & \ldots & \sigma_{1p}\\ \vdots & & \vdots\\ \sigma_{p1} & \ldots & \sigma_{pp}\end{pmatrix}.$$
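The two definitions above translate directly into computation. A minimal sketch (not from the notes), replacing expectations by sample averages:

```python
# Minimal sketch: mean vector and dispersion matrix of a sample, computed directly
# from the definitions mu = E(X) and Sigma = E[(X - mu)(X - mu)'].
import numpy as np

rng = np.random.default_rng(2)
n = 5000
X = rng.multivariate_normal(mean=[1.0, -2.0, 0.5],
                            cov=[[2.0, 0.3, 0.0],
                                 [0.3, 1.0, -0.4],
                                 [0.0, -0.4, 1.5]],
                            size=n)

mu_hat = X.mean(axis=0)                 # estimate of the mean vector
centred = X - mu_hat
Sigma_hat = centred.T @ centred / n     # estimate of E[(X - mu)(X - mu)']
print(mu_hat)
print(Sigma_hat)                        # symmetric, close to the true covariance
```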

Result: Let $X$ be a random vector with mean vector $\mu$ and dispersion matrix $\Sigma$. Then for $Y = a'X$, $E(Y) = a'\mu$ and $Var(Y) = a'\Sigma a$.

Here $Y = a'X = \sum_{i=1}^{p} a_iX_i$ is a univariate random variable, so
$$E(Y) = \sum_{i=1}^{p} a_i E(X_i) = \sum_{i=1}^{p} a_i\mu_i = a'\mu.$$
Now
$$Var(Y) = E\big[(Y - a'\mu)(Y - a'\mu)'\big] = E\big[(a'X - a'\mu)(a'X - a'\mu)'\big]
= E\big[a'(X-\mu)(X-\mu)'a\big] = a'\,E\big[(X-\mu)(X-\mu)'\big]\,a = a'\Sigma a.$$
Result: $\Sigma$ is the variance-covariance matrix of a non-degenerate random vector $X$ iff $\Sigma$ is positive definite.

If part: Let $\Sigma$ be positive definite. Then there exists a non-singular matrix $B$ such that $\Sigma = B'B$. Define $X = \mu + B'Y$, where $Y$ is a random vector with mean vector $0$ and dispersion matrix $I$. Then
$$E(X) = \mu + B'E(Y) = \mu + 0 = \mu,$$
and the dispersion matrix of $X$ is
$$Disp(X) = E\big[(X-\mu)(X-\mu)'\big] = E\big[(\mu + B'Y - \mu)(\mu + B'Y - \mu)'\big]
= E\big[B'YY'B\big] = B'\,Disp(Y)\,B = B'IB = B'B = \Sigma,$$
so $\Sigma$ arises as the dispersion matrix of a non-degenerate random vector.

Only if part: Let $\Sigma$ be the variance-covariance matrix of a non-degenerate random vector $X$, and take $Y = a'X$ for arbitrary $a \ne 0$. Since $X$ is non-degenerate, $a'X$ is not a constant, so
$$Var(Y) > 0 \;\Rightarrow\; a'\Sigma a > 0 \quad \forall\, a \ne 0,$$
while $a'\Sigma a = 0$ for $a = 0$. Therefore $\Sigma$ is a positive definite matrix.

Result: $E(X'AX) = \mathrm{Tr}(A\Sigma) + \mu'A\mu$

Proof:
$$E(X'AX) = E[\mathrm{Tr}(X'AX)] = E[\mathrm{Tr}(AXX')] = \mathrm{Tr}\big(E[AXX']\big) = \mathrm{Tr}\big(A\,E[XX']\big) \qquad [\because \mathrm{Tr}(AB) = \mathrm{Tr}(BA)].$$
Now
$$\Sigma = Disp(X) = E[(X-\mu)(X-\mu)'] = E[XX'] + \mu\mu' - E[X]\mu' - \mu E[X'] = E[XX'] - \mu\mu'
\;\Rightarrow\; E(XX') = \Sigma + \mu\mu'.$$
Hence
$$E(X'AX) = \mathrm{Tr}\big[A\,E(XX')\big] = \mathrm{Tr}\big[A(\Sigma + \mu\mu')\big] = \mathrm{Tr}(A\Sigma) + \mathrm{Tr}(A\mu\mu') = \mathrm{Tr}(A\Sigma) + \mathrm{Tr}(\mu'A\mu) = \mathrm{Tr}(A\Sigma) + \mu'A\mu. \quad \text{(Proved)}$$

2.8 Correlation matrix

The correlation matrix of a random vector $X$ is defined as
$$R = \begin{pmatrix}1 & \rho_{12} & \rho_{13} & \ldots & \rho_{1p}\\ \rho_{21} & 1 & \rho_{23} & \ldots & \rho_{2p}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ \rho_{p1} & \rho_{p2} & \rho_{p3} & \ldots & 1\end{pmatrix},$$
where $\rho_{ij}$ = correlation coefficient between $X_i$ and $X_j$ = $\dfrac{cov(X_i, X_j)}{\sqrt{Var(X_i)}\sqrt{Var(X_j)}}$, with $\rho_{ii} = 1$ for all $i = 1(1)p$.

2.8.1 Relation between correlation matrix and dispersion matrix:

Since $\rho_{ij} = \dfrac{\sigma_{ij}}{\sqrt{\sigma_{ii}}\sqrt{\sigma_{jj}}}$, we may write
$$R = \begin{pmatrix}\frac{\sigma_{11}}{\sqrt{\sigma_{11}}\sqrt{\sigma_{11}}} & \frac{\sigma_{12}}{\sqrt{\sigma_{11}}\sqrt{\sigma_{22}}} & \ldots & \frac{\sigma_{1p}}{\sqrt{\sigma_{11}}\sqrt{\sigma_{pp}}}\\ \vdots & \vdots & & \vdots\\ \frac{\sigma_{p1}}{\sqrt{\sigma_{pp}}\sqrt{\sigma_{11}}} & \frac{\sigma_{p2}}{\sqrt{\sigma_{pp}}\sqrt{\sigma_{22}}} & \ldots & \frac{\sigma_{pp}}{\sqrt{\sigma_{pp}}\sqrt{\sigma_{pp}}}\end{pmatrix}
= D\,\Sigma\,D,$$
i.e., $R = D\Sigma D$, where
$$D = \mathrm{Diag}\!\left(\frac{1}{\sqrt{\sigma_{11}}}, \frac{1}{\sqrt{\sigma_{22}}}, \ldots, \frac{1}{\sqrt{\sigma_{pp}}}\right).$$
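A minimal sketch (not from the notes) of this relation in code, with an illustrative dispersion matrix:

```python
# Minimal sketch: recover the correlation matrix from a dispersion matrix via
# R = D Sigma D with D = diag(1 / sqrt(sigma_ii)).
import numpy as np

Sigma = np.array([[4.0, 1.2, -0.8],
                  [1.2, 9.0,  0.6],
                  [-0.8, 0.6, 1.0]])
D = np.diag(1.0 / np.sqrt(np.diag(Sigma)))
R = D @ Sigma @ D
print(R)     # unit diagonal, entries sigma_ij / sqrt(sigma_ii * sigma_jj)
```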
Result: If $Var(X_i)$ exists for all $i$, then for any set of real numbers $l_1, l_2, \ldots, l_p$,
$$E\left(\sum_{i=1}^{p} l_iX_i\right) = \sum_{i=1}^{p} l_i\mu_i \quad\text{and}\quad Var\left(\sum_{i=1}^{p} l_iX_i\right) = \sum_i\sum_j l_il_j\sigma_{ij}.$$

Proof:
$$E\left(\sum_{i=1}^{p} l_iX_i\right) = \int_x (l_1x_1 + \cdots + l_px_p)f(x)\,dx
= l_1\int_{x_1} x_1 f(x_1)\,dx_1 + \cdots + l_p\int_{x_p} x_p f(x_p)\,dx_p
= l_1\mu_1 + \cdots + l_p\mu_p = \sum_{i=1}^{p} l_i\mu_i,$$
where integrating out the remaining coordinates in each term produces the corresponding marginal pdf. (Proved)

With $\sigma_{ij} = E[(X_i - \mu_i)(X_j - \mu_j)]$,
$$Var\left(\sum_{i=1}^{p} l_iX_i\right) = E\left[\sum_{i=1}^{p} l_iX_i - \sum_{i=1}^{p} l_i\mu_i\right]^2
= E\left[\sum_{i=1}^{p} l_i(X_i - \mu_i)\right]^2
= E\left[\sum_{i=1}^{p}\sum_{j=1}^{p} l_il_j(X_i-\mu_i)(X_j-\mu_j)\right]
= \sum_{i=1}^{p}\sum_{j=1}^{p} l_il_j\sigma_{ij}. \quad \text{(Proved)}$$

Result: Show that $|R| \le 1$, where $R = ((\rho_{ij}))$ is the correlation matrix.

Since $R$ is non-negative definite, its eigenvalues $\lambda_1, \ldots, \lambda_p$ are non-negative and $|R| = \prod_{i=1}^{p}\lambda_i$. By the AM-GM inequality,
$$\frac{1}{p}\sum_{i=1}^{p}\lambda_i \ge \left(\prod_{i=1}^{p}\lambda_i\right)^{1/p}.$$
Now $\mathrm{Tr}(R) = \sum_{i=1}^{p}\rho_{ii} = \sum_{i=1}^{p} 1 = p$, so
$$\frac{\mathrm{Tr}(R)}{p} \ge (|R|)^{1/p} \;\Rightarrow\; \frac{p}{p} \ge (|R|)^{1/p} \;\Rightarrow\; |R| \le 1. \quad \text{(Proved)}$$
Result: If $\Sigma$ is pd, find $E\big[(X-\mu)'\Sigma^{-1}(X-\mu)\big]$ and a non-trivial upper bound of $P\big[(X-\mu)'\Sigma^{-1}(X-\mu) \ge \lambda\big]$.

Writing $A = (X-\mu)'\Sigma^{-1}$ and $B = (X-\mu)$ and using that the quadratic form is a scalar,
$$E\big[(X-\mu)'\Sigma^{-1}(X-\mu)\big] = E\big[\mathrm{Tr}\{(X-\mu)'\Sigma^{-1}(X-\mu)\}\big] = E[\mathrm{Tr}(AB)] = E[\mathrm{Tr}(BA)]
= \mathrm{Tr}\big[E\{(X-\mu)(X-\mu)'\}\Sigma^{-1}\big] = \mathrm{Tr}(\Sigma\Sigma^{-1}) = \mathrm{Tr}(I_p) = p.$$
Now, by Markov's inequality,
$$P\big[(X-\mu)'\Sigma^{-1}(X-\mu) \ge \lambda\big] \le \frac{E\big[(X-\mu)'\Sigma^{-1}(X-\mu)\big]}{\lambda} = \frac{p}{\lambda}.$$
12
Q. Show that $-\dfrac{1}{p-1} < \rho < 1$ for a random vector $X$ with correlation matrix
$$R_{p\times p} = \begin{pmatrix}1 & \rho & \ldots & \rho\\ \rho & 1 & \ldots & \rho\\ \vdots & \vdots & & \vdots\\ \rho & \rho & \ldots & 1\end{pmatrix}.$$
Adding all rows to the first row ($R_1 \to \sum_{i=1}^{p} R_i$), every entry of the first row becomes $1 + (p-1)\rho$; taking this factor out and then subtracting the first column from the others ($C_i \to C_i - C_1$, $i = 2, 3, \ldots, p$) leaves a triangular determinant, so
$$|R| = [1 + (p-1)\rho]\,(1-\rho)^{p-1}.$$
Now, since $R$ is positive definite, $|R| > 0$ (and every principal minor is positive), whence
$$1 + (p-1)\rho > 0 \Rightarrow \rho > -\frac{1}{p-1}, \qquad 1 - \rho > 0 \Rightarrow \rho < 1.$$
$$\therefore\; -\frac{1}{p-1} < \rho < 1. \quad \text{(Proved)}$$

Q. Show that $\rho_{12} + \rho_{13} + \rho_{23} \ge -\frac{3}{2}$.

Define $Z = X_1 + X_2 + X_3$. Then
$$Var(Z) \ge 0 \Rightarrow Var(X_1) + Var(X_2) + Var(X_3) + 2\,cov(X_1,X_2) + 2\,cov(X_1,X_3) + 2\,cov(X_2,X_3) \ge 0.$$
Here we may assume (since correlations are unaffected by scaling) that $Var(X_i) = 1$ for $i = 1, 2, 3$, so
$$3 + 2\rho_{12} + 2\rho_{13} + 2\rho_{23} \ge 0 \;\Rightarrow\; \rho_{12} + \rho_{13} + \rho_{23} \ge -\frac{3}{2}. \quad \text{(Proved)}$$

Result: Let $X$ be a p-variate random vector with mean vector $\mu$ and dispersion matrix $\Sigma$. Then for any non-random matrix $B_{p\times p}$, $E(BX) = B\mu$ and $Disp(BX) = B\Sigma B'$.

Let $Y = BX = (Y_1, \ldots, Y_p)'$. Then $E(Y) = B\,E(X) = B\mu$, and
$$Disp(BX) = E\big[(BX - B\mu)(BX - B\mu)'\big] = E\big[B(X-\mu)(X-\mu)'B'\big] = B\,E\big[(X-\mu)(X-\mu)'\big]B' = B\Sigma B'.$$
Q. Each of the random variables X, Y, Z has mean 0 and variance 1, while $aX + bY + cZ = 0$. Find the dispersion matrix of (X, Y, Z) and show that $a^4 + b^4 + c^4 \le 2(b^2c^2 + c^2a^2 + a^2b^2)$.

Solution: From $aX + bY + cZ = 0$ we get $aX = -(bY + cZ)$, so
$$Var(aX) = Var(bY + cZ) \Rightarrow a^2 = b^2 + c^2 + 2bc\,cov(Y,Z) \Rightarrow cov(Y,Z) = \frac{a^2 - (b^2 + c^2)}{2bc}.$$
Similarly,
$$cov(X,Y) = \frac{c^2 - (a^2 + b^2)}{2ab}, \qquad cov(X,Z) = \frac{b^2 - (a^2 + c^2)}{2ac}.$$
So the dispersion matrix is
$$\Sigma = Disp(X,Y,Z) = \begin{pmatrix}1 & \frac{c^2-(a^2+b^2)}{2ab} & \frac{b^2-(a^2+c^2)}{2ac}\\[2pt] \frac{c^2-(a^2+b^2)}{2ab} & 1 & \frac{a^2-(b^2+c^2)}{2bc}\\[2pt] \frac{b^2-(a^2+c^2)}{2ac} & \frac{a^2-(b^2+c^2)}{2bc} & 1\end{pmatrix}.$$
Now
$$\rho_{XY}^2 \le 1 \Rightarrow \left(\frac{c^2-(a^2+b^2)}{2ab}\right)^2 \le 1
\Rightarrow \frac{c^4 + a^4 + b^4 + 2a^2b^2 - 2c^2a^2 - 2c^2b^2}{4a^2b^2} \le 1
\Rightarrow a^4 + b^4 + c^4 \le 2(a^2b^2 + b^2c^2 + a^2c^2). \quad \text{(Proved)}$$
14
Q. Suppose $X_1, X_2, \ldots, X_{2p}$ denote scores on 2p questions in an aptitude test. Suppose they have a common mean $\mu$ and a common variance $\sigma^2$, while the correlation coefficient between any pair of them is the same, $\rho > 0$. Let $Y_1$ be the sum of scores on the odd-numbered questions and $Y_2$ the sum of scores on the even-numbered questions. Show that the correlation coefficient between $Y_1$ and $Y_2$ tends to unity as p increases.

Solution: With $Y_1 = X_1 + X_3 + \cdots + X_{2p-1}$ (p terms),
$$Var(Y_1) = \sum_{\text{odd } i} Var(X_i) + \sum_{\text{odd } i\ne j} cov(X_i, X_j) = p\sigma^2 + p(p-1)\rho\sigma^2,$$
and similarly $Var(Y_2) = p\sigma^2 + p(p-1)\rho\sigma^2$. Now
$$cov(Y_1, Y_2) = cov(X_1 + X_3 + \cdots + X_{2p-1},\; X_2 + X_4 + \cdots + X_{2p}) = p\cdot p\,\rho\sigma^2 = p^2\rho\sigma^2,$$
since each of the $p\times p$ cross pairs contributes $\rho\sigma^2$. Therefore
$$\rho_{Y_1,Y_2} = \frac{p^2\rho\sigma^2}{p\sigma^2 + p(p-1)\rho\sigma^2} = \frac{p\rho}{1+(p-1)\rho}
= \frac{\rho}{\frac1p + \left(1-\frac1p\right)\rho} \;\to\; \frac{\rho}{\rho} = 1 \quad \text{as } p \to \infty. \quad \text{(Proved)}$$

3 Concept Of Regression

Let $X = (X_1, X_2, \ldots, X_p)'$ be a p-variate random vector with mean vector $\mu = (\mu_1, \ldots, \mu_p)'$. Here $X_1$ is the response, to be predicted from $X_{(2)} = (X_2, \ldots, X_p)'$. The (regression) predictor of $X_1$ on $X_2, \ldots, X_p$ is given by
$$g(X_{(2)}) = E\big(X_1 \mid X_{(2)}\big).$$
Result: $E(X_1\mid X_{(2)})$ is the best predictor of $X_1$ based on $X_2, X_3, \ldots, X_p$ in terms of MSE.

Let $f(X_{(2)})$ be any other predictor of $X_1$ based on $X_2, \ldots, X_p$. Then
$$E\big[X_1 - f(X_{(2)})\big]^2 = E\big[\{X_1 - g(X_{(2)})\} + \{g(X_{(2)}) - f(X_{(2)})\}\big]^2
= E\big[X_1 - g(X_{(2)})\big]^2 + E\big[g(X_{(2)}) - f(X_{(2)})\big]^2 + 2E\big[\{X_1 - g(X_{(2)})\}\{g(X_{(2)}) - f(X_{(2)})\}\big].$$
For the cross term, conditioning on $X_{(2)}$,
$$2E\big[\{X_1 - g(X_{(2)})\}\{g(X_{(2)}) - f(X_{(2)})\}\big]
= 2E\Big[\{g(X_{(2)}) - f(X_{(2)})\}\,E\big\{X_1 - g(X_{(2)}) \mid X_{(2)}\big\}\Big]
= 2E\Big[\{g(X_{(2)}) - f(X_{(2)})\}\underbrace{\{E(X_1\mid X_{(2)}) - E(X_1\mid X_{(2)})\}}_{0}\Big] = 0.$$
Therefore
$$E\big[X_1 - f(X_{(2)})\big]^2 = E\big[X_1 - g(X_{(2)})\big]^2 + \underbrace{E\big[g(X_{(2)}) - f(X_{(2)})\big]^2}_{\ge 0} \;\ge\; E\big[X_1 - g(X_{(2)})\big]^2,$$
i.e., $g(X_{(2)})$ is the best predictor of $X_1$ based on $X_2, X_3, \ldots, X_p$. (Proved)
Result: The correlation coefficient between $X_1$ and its best predictor is non-negative.

The best predictor of $X_1$ based on $X_2, \ldots, X_p$ is $g(X_{(2)}) = E(X_1\mid X_{(2)})$, and $E[g(X_{(2)})] = E[E(X_1\mid X_{(2)})] = E(X_1)$. Now
$$cov\big(X_1, g(X_{(2)})\big) = E\big[(X_1 - E(X_1))\{g(X_{(2)}) - E(X_1)\}\big]
= E\Big[\{g(X_{(2)}) - E(X_1)\}\,E\big\{X_1 - E(X_1)\mid X_{(2)}\big\}\Big]$$
$$= E\big[\{g(X_{(2)}) - E(X_1)\}\{g(X_{(2)}) - E(X_1)\}\big]
= E\big[g(X_{(2)}) - E\{g(X_{(2)})\}\big]^2 = Var\big(g(X_{(2)})\big) \ge 0,$$
i.e., $cov\big(X_1, g(X_{(2)})\big) \ge 0$ and hence $corr\big(X_1, g(X_{(2)})\big) \ge 0$, with strict positivity whenever $g(X_{(2)})$ is non-degenerate. (Proved)
Result: The correlation coefficient between $X_1$ and its best predictor is maximum among all predictors.

Let $f(X_{(2)})$ be any other predictor of $X_1$ based on $X_2, X_3, \ldots, X_p$ and $g(X_{(2)}) = E(X_1\mid X_{(2)})$ the best predictor, with $E[g(X_{(2)})] = E(X_1)$. Now
$$cov\big(X_1, f(X_{(2)})\big) = E\big[(X_1 - E(X_1))\{f(X_{(2)}) - E f(X_{(2)})\}\big]
= E\Big[\{f(X_{(2)}) - E f(X_{(2)})\}\,E\big(X_1\mid X_{(2)}\big)\Big]
= E\Big[\{f(X_{(2)}) - E f(X_{(2)})\}\,g(X_{(2)})\Big] = cov\big(f(X_{(2)}), g(X_{(2)})\big). \tag{1}$$
Also $cov\big(X_1, g(X_{(2)})\big) = Var\big(g(X_{(2)})\big) = \sigma_g^2$ (say). Then
$$\rho_{X_1, g(X_{(2)})} = \frac{cov(X_1, g(X_{(2)}))}{\sqrt{Var(X_1)}\sqrt{Var(g(X_{(2)}))}} = \frac{\sigma_g^2}{\sigma_1\sigma_g} = \frac{\sigma_g}{\sigma_1},$$
and, using (1),
$$\rho^2_{X_1, f(X_{(2)})} = \frac{\big[cov(X_1, f(X_{(2)}))\big]^2}{Var(X_1)\,Var(f(X_{(2)}))}
= \frac{\big[cov(f(X_{(2)}), g(X_{(2)}))\big]^2}{\sigma_f^2\,\sigma_g^2}\cdot\frac{\sigma_g^2}{\sigma_1^2}
= \rho^2_{f,g}\times\rho^2_{X_1, g(X_{(2)})}.$$
Therefore $\rho^2_{X_1, f(X_{(2)})} \le \rho^2_{X_1, g(X_{(2)})}$, since $\rho^2_{f,g} \le 1$. (Proved)
 
 2 2
 
Result: E X1 − g X (2) = V ar(X1 ) 1 − ρX ,g X where g(X 2 ) = E X1 |X 2
1 ( (2) )
e e e e
 2  2
Now, E X1 − g(X 2 ) = E X1 − E(X1 ) + E(X1 ) − g(X 2 )
e e 2
= E [X1 − E(X1 )]2 + E E(X1 ) − g(X 2 ) + 2E (X1 − E(X1 )) E(X1 ) − g X 2
  
 e2  e
= V ar(X1 ) + E g(X 2 ) − E g(X 2 ) − 2E (X1 − E(X1 )) g(X 2 ) − E(X1 )
e  e  e
= V ar(X1 ) + V ar g(X 2 ) − 2cov X1 , g(X 2 )
e e

17
 
= V ar(X1 ) − cov(X1 , g(X 2 )) ∵ cov(X1 , g(X 2 )) = V ar(g(X 2 ))
 
cov(X1 , g(X 2 ))
e e e
= V ar(X1 ) 1 −
V ar(X1 )
e
 !2 
cov(X1 , g(X 2 )) V ar(g(X 2 ))
= V ar(X1 ) 1 − p p e . 
cov(X1 , g(X 2 ))
e
V ar(X1 ) V ar(g(X 2 ))
h i  e e
2

= V ar(X1 ) 1 − ρX1 ,g(X 2 ) (1) ∵ cov(X1 , g(X 2 )) = V ar(g(X 2 ))
2 h e i e e
⇒ E X1 − g(X 2 ) = V ar(X1 ). 1 − ρ2X1 ,g(X 2 )

(P roved)
e e

4 Multiple linear regression:

Let $X = (X_1, X_{(2)}')'$, with $X_1$ the response and $X_2, X_3, \ldots, X_p$ the covariates, mean vector $\mu = (\mu_1, \mu_{(2)}')'$ and dispersion matrix
$$\Sigma = \begin{pmatrix}\sigma_{11} & \sigma_{(2)}'\\ \sigma_{(2)} & \Sigma_2\end{pmatrix}.$$
We consider the following regression equation of $X_1$ on $X_2, X_3, \ldots, X_p$:
$$X_{1.23\ldots p} = \alpha + \beta_2X_2 + \beta_3X_3 + \cdots + \beta_pX_p,$$
where the constants $\alpha, \beta_2, \ldots, \beta_p$ are determined by the method of least squares, i.e., by minimising the mean square error
$$S^2 = E\big[X_1 - X_{1.23\ldots p}\big]^2 = E\big[X_1 - \alpha - \beta_2X_2 - \cdots - \beta_pX_p\big]^2.$$
Now
$$\frac{\partial S^2}{\partial\alpha} = -2\,E\big[X_1 - \alpha - \beta_2X_2 - \cdots - \beta_pX_p\big] = 0
\;\Rightarrow\; \mu_1 - \alpha - \beta_2\mu_2 - \cdots - \beta_p\mu_p = 0
\;\Rightarrow\; \hat\alpha = \mu_1 - \beta'\mu_{(2)}, \qquad \beta = (\beta_2, \ldots, \beta_p)'.$$
Again,
$$\frac{\partial S^2}{\partial\beta_j} = 0 \;\Rightarrow\; E\big[(X_1 - \alpha - \beta_2X_2 - \cdots - \beta_pX_p)X_j\big] = 0 \quad \forall\, j = 2, 3, \ldots, p$$
$$\Rightarrow E[X_1X_j] = \alpha E(X_j) + \sum_{i=2}^{p}\beta_i E(X_iX_j)
= \Big(\mu_1 - \sum_{i=2}^{p}\mu_i\beta_i\Big)\mu_j + \sum_{i=2}^{p}\beta_i E(X_iX_j) \quad [\text{replacing } \hat\alpha]$$
$$\Rightarrow E[X_1X_j] - \mu_1\mu_j = \sum_{i=2}^{p}\beta_i\big(E(X_iX_j) - \mu_i\mu_j\big)
\;\Rightarrow\; \sigma_{1j} = \sum_{i=2}^{p}\beta_i\sigma_{ij} \quad \forall\, j = 2, 3, \ldots, p.$$
Writing these $p-1$ equations together,
$$\begin{pmatrix}\sigma_{12}\\ \sigma_{13}\\ \vdots\\ \sigma_{1p}\end{pmatrix}
= \begin{pmatrix}\sigma_{22} & \sigma_{23} & \ldots & \sigma_{2p}\\ \sigma_{32} & \sigma_{33} & \ldots & \sigma_{3p}\\ \vdots & & & \vdots\\ \sigma_{p2} & \sigma_{p3} & \ldots & \sigma_{pp}\end{pmatrix}
\begin{pmatrix}\beta_2\\ \beta_3\\ \vdots\\ \beta_p\end{pmatrix}
\;\Rightarrow\; \sigma_{(2)} = \Sigma_2\beta \;\Rightarrow\; \hat\beta = \Sigma_2^{-1}\sigma_{(2)} \quad [\text{assuming } \Sigma \text{ is PD, so } \Sigma_2^{-1} \text{ exists}].$$
The multiple regression equation of $X_1$ on $X_2, X_3, \ldots, X_p$ is therefore $X_{1.23\ldots p} = \hat\alpha + \hat\beta'X_{(2)}$.

Note that $X_1 = X_{1.23\ldots p} + e_{1.23\ldots p}$, where $e_{1.23\ldots p}$ is the residual.
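A minimal sketch (not from the notes) of the population least-squares coefficients derived above, with an illustrative mean vector and dispersion matrix:

```python
# Minimal sketch: regression of X1 on X2,...,Xp from a mean vector and dispersion
# matrix, using beta_hat = Sigma_2^{-1} sigma_(2), alpha_hat = mu_1 - beta_hat' mu_(2).
import numpy as np

mu = np.array([1.0, 2.0, -1.0, 0.5])
Sigma = np.array([[4.0, 1.0, 0.5, 0.2],
                  [1.0, 2.0, 0.3, 0.1],
                  [0.5, 0.3, 1.5, 0.4],
                  [0.2, 0.1, 0.4, 1.0]])

sigma_2 = Sigma[1:, 0]                      # cov(X1, Xj), j = 2..p
Sigma2 = Sigma[1:, 1:]                      # dispersion matrix of X_(2)
beta_hat = np.linalg.solve(Sigma2, sigma_2)
alpha_hat = mu[0] - beta_hat @ mu[1:]
print(alpha_hat, beta_hat)
```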

Result:
$$Var(e_{1.23\ldots p}) = \frac{|\Sigma|}{|\Sigma_2|}$$

Since $e_{1.23\ldots p} = X_1 - X_{1.23\ldots p}$,
$$Var(e_{1.23\ldots p}) = Var(X_1) + Var(X_{1.23\ldots p}) - 2\,cov(X_1, X_{1.23\ldots p}).$$
Now $cov(X_1, X_{1.23\ldots p}) = cov(X_{1.23\ldots p} + e_{1.23\ldots p}, X_{1.23\ldots p}) = Var(X_{1.23\ldots p}) + cov(e_{1.23\ldots p}, X_{1.23\ldots p}) = Var(X_{1.23\ldots p})$, because by the normal equations
$$cov(X_{1.23\ldots p}, e_{1.23\ldots p}) = \underbrace{E[X_{1.23\ldots p}\,e_{1.23\ldots p}]}_{0\ (\text{normal eq. }2,\ldots,p)} - \underbrace{E[e_{1.23\ldots p}]}_{0\ (\text{normal eq. }1)}E[X_{1.23\ldots p}] = 0.$$
Therefore
$$Var(e_{1.23\ldots p}) = Var(X_1) - Var(X_{1.23\ldots p}) = \sigma_{11} - Var\big(\hat\beta'X_{(2)}\big) = \sigma_{11} - \hat\beta'\,Disp(X_{(2)})\,\hat\beta
= \sigma_{11} - \sigma_{(2)}'\Sigma_2^{-1}\Sigma_2\Sigma_2^{-1}\sigma_{(2)} = \sigma_{11} - \sigma_{(2)}'\Sigma_2^{-1}\sigma_{(2)} \quad [\because \Sigma_2 \text{ is symmetric}].$$
Now, from the partition
$$\Sigma = \begin{pmatrix}\sigma_{11} & \sigma_{(2)}'\\ \sigma_{(2)} & \Sigma_2\end{pmatrix}
\;\Rightarrow\; |\Sigma| = |\Sigma_2|\,\big(\sigma_{11} - \sigma_{(2)}'\Sigma_2^{-1}\sigma_{(2)}\big)
\;\Rightarrow\; \frac{|\Sigma|}{|\Sigma_2|} = \sigma_{11} - \sigma_{(2)}'\Sigma_2^{-1}\sigma_{(2)}$$
(the factor being a scalar), whence
$$Var(e_{1.23\ldots p}) = \frac{|\Sigma|}{|\Sigma_2|}. \quad \text{(Proved)}$$
5 Multiple correlation coefficient:
The correlation coefficient between $X_1$ and $X_{1.23\ldots p}$ is termed the multiple correlation coefficient of $X_1$ on $X_2, X_3, \ldots, X_p$:
$$\rho_{1.23\ldots p} = \frac{cov(X_1, X_{1.23\ldots p})}{\sqrt{Var(X_1)}\sqrt{Var(X_{1.23\ldots p})}} = \frac{Var(X_{1.23\ldots p})}{\sqrt{Var(X_1)\,Var(X_{1.23\ldots p})}}
= \sqrt{\frac{Var(X_{1.23\ldots p})}{Var(X_1)}} = \sqrt{\frac{\hat\beta'\Sigma_2\hat\beta}{\sigma_{11}}}.$$

Remark:
$$\rho^2_{1.23\ldots p} = \frac{Var(X_{1.23\ldots p})}{Var(X_1)}.$$
Now
$$Var(X_{1.23\ldots p}) = Var(X_1 - e_{1.23\ldots p}) = Var(X_1) + Var(e_{1.23\ldots p}) - 2\,cov(X_1, e_{1.23\ldots p})$$
$$= Var(X_1) + Var(e_{1.23\ldots p}) - 2\,Var(e_{1.23\ldots p}) - 2\underbrace{cov(X_{1.23\ldots p}, e_{1.23\ldots p})}_{0\ (\text{normal equations})} = Var(X_1) - Var(e_{1.23\ldots p}),$$
so
$$\rho^2_{1.23\ldots p} = \frac{Var(X_1) - Var(e_{1.23\ldots p})}{Var(X_1)} = 1 - \frac{Var(e_{1.23\ldots p})}{Var(X_1)} = 1 - \frac{|\Sigma|}{\sigma_{11}|\Sigma_2|}.$$

Remark: If the correlation matrix $R$ is given, then the multiple correlation coefficient can be obtained in terms of $R$. Write
$$R = \begin{pmatrix}1 & \rho_{(2)}'\\ \rho_{(2)} & R_2\end{pmatrix},\qquad
\Sigma = \begin{pmatrix}\sigma_{11} & \sigma_{(2)}'\\ \sigma_{(2)} & \Sigma_2\end{pmatrix},\qquad R = D\Sigma D,\quad D = \mathrm{Diag}\!\left(\frac{1}{\sqrt{\sigma_{11}}}, \ldots, \frac{1}{\sqrt{\sigma_{pp}}}\right).$$
Then
$$|R| = |D\Sigma D| = |D|^2|\Sigma| = \frac{|\Sigma|}{\prod_{i=1}^{p}\sigma_{ii}}
\;\Rightarrow\; |\Sigma| = \prod_{i=1}^{p}\sigma_{ii}\,|R|, \quad\text{and similarly}\quad |\Sigma_2| = \prod_{i=2}^{p}\sigma_{ii}\,|R_2|.$$
Therefore
$$\rho^2_{1.23\ldots p} = 1 - \frac{|R|}{|R_2|}.$$
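A minimal sketch (not from the notes) of this last formula in code, with an illustrative correlation matrix:

```python
# Minimal sketch: multiple correlation of X1 on X2,...,Xp via rho^2 = 1 - |R| / |R2|.
import numpy as np

R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.2],
              [0.3, 0.2, 1.0]])
rho_sq = 1 - np.linalg.det(R) / np.linalg.det(R[1:, 1:])
print(np.sqrt(rho_sq))      # multiple correlation coefficient of X1 on (X2, X3)
```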

Result: The multiple correlation coefficient is the maximum correlation between the response and any linear predictor.

Let $f(X_{(2)}) = a + b'X_{(2)}$ be any linear predictor of $X_1$ based on $X_2, X_3, \ldots, X_p$. Then
$$\rho^2_{X_1, f(X_{(2)})} = \frac{\big[cov(X_1, a + b'X_{(2)})\big]^2}{Var(X_1)\,Var(a + b'X_{(2)})}
= \frac{\big[\sum_{i=2}^{p} b_i\sigma_{1i}\big]^2}{\sigma_{11}\,b'\Sigma_2b}
= \frac{\big[b'\sigma_{(2)}\big]^2}{\sigma_{11}\,b'\Sigma_2b}
= \frac{\big[b'\Sigma_2\beta\big]^2}{\sigma_{11}\,b'\Sigma_2b}\quad[\because \beta = \Sigma_2^{-1}\sigma_{(2)} \Rightarrow \sigma_{(2)} = \Sigma_2\beta].$$
Since $\Sigma_2$ is PD, there exists a non-singular matrix $P$ such that $\Sigma_2 = P'P$, so
$$\rho^2_{X_1, f(X_{(2)})} = \frac{\big[(Pb)'(P\beta)\big]^2}{\sigma_{11}\,(Pb)'(Pb)}.$$
By the Cauchy-Schwarz inequality, $(u'v)^2 \le (u'u)(v'v)$; taking $u = Pb$ and $v = P\beta$,
$$\rho^2_{X_1, f(X_{(2)})} \le \frac{(Pb)'(Pb)\,(P\beta)'(P\beta)}{\sigma_{11}\,(Pb)'(Pb)} = \frac{\beta'\Sigma_2\beta}{\sigma_{11}} = \rho^2_{1.23\ldots p}.$$
Therefore $\rho^2_{1.23\ldots p} \ge \rho^2_{X_1, f(X_{(2)})}$. (Proved)

Result: $\rho^2_{1.23\ldots p} = 0$ iff $\rho_{1j} = 0$ for all $j = 2, 3, \ldots, p$.

Solution:
If part: with $\rho_{(2)} = 0$,
$$|R| = \begin{vmatrix}1 & 0'\\ 0 & R_2\end{vmatrix} = |R_2|,
\quad\text{so}\quad \rho^2_{1.23\ldots p} = 1 - \frac{|R|}{|R_2|} = 1 - 1 = 0.$$
Only if part: It is given that $\rho^2_{1.23\ldots p} = 0 \Rightarrow 1 - \frac{|R|}{|R_2|} = 0 \Rightarrow |R| = |R_2|$. Since $|R| = |R_2|\,(1 - \rho_{(2)}'R_2^{-1}\rho_{(2)})$, this gives $1 - \rho_{(2)}'R_2^{-1}\rho_{(2)} = 1$ (the quantity being a scalar), i.e., $\rho_{(2)}'R_2^{-1}\rho_{(2)} = 0$.
Now $R_2$ is PD, so $R_2^{-1}$ is also PD, and hence $\rho_{(2)}'R_2^{-1}\rho_{(2)} = 0$ only when $\rho_{(2)} = 0$. So $\rho_{1j} = 0$ for all $j = 2, 3, \ldots, p$. (Proved)

Result: $\rho^2_{1.23\ldots p} \ge \rho^2_{1.23\ldots(p-1)}$, i.e., the multiple correlation coefficient is a non-decreasing function of its order.

Solution: We know $\rho^2_{1.23\ldots p} = 1 - \dfrac{Var(e_{1.23\ldots p})}{Var(X_1)}$, where
$$e_{1.23\ldots p} = X_1 - X_{1.23\ldots p} = X_1 - \hat\alpha - \hat\beta_2X_2 - \cdots - \hat\beta_pX_p,\qquad
Var(e_{1.23\ldots p}) = E\big[X_1 - \hat\alpha - \hat\beta_2X_2 - \cdots - \hat\beta_pX_p\big]^2,$$
and $\hat\alpha, \hat\beta_2, \ldots, \hat\beta_p$ are the least squares coefficients of the regression of $X_1$ on $X_2, \ldots, X_p$. Next, consider the multiple linear regression of $X_1$ on $X_2, X_3, \ldots, X_{p-1}$:
$$X_{1.23\ldots p-1} = \hat\gamma + \hat\delta_2X_2 + \cdots + \hat\delta_{p-1}X_{p-1},\qquad
Var(e_{1.23\ldots p-1}) = E\big[X_1 - \hat\gamma - \hat\delta_2X_2 - \cdots - \hat\delta_{p-1}X_{p-1}\big]^2.$$
Now set $\alpha^0 = \hat\gamma,\ \beta_2^0 = \hat\delta_2, \ldots, \beta_{p-1}^0 = \hat\delta_{p-1},\ \beta_p^0 = 0$. Since $(\hat\alpha, \hat\beta_2, \ldots, \hat\beta_p)$ minimises the mean square error,
$$E\big[X_1 - \alpha^0 - \beta_2^0X_2 - \cdots - \beta_p^0X_p\big]^2 \ge E\big[X_1 - \hat\alpha - \hat\beta_2X_2 - \cdots - \hat\beta_pX_p\big]^2$$
$$\Rightarrow Var(e_{1.23\ldots p-1}) \ge Var(e_{1.23\ldots p})
\;\Rightarrow\; 1 - \frac{Var(e_{1.23\ldots p-1})}{Var(X_1)} \le 1 - \frac{Var(e_{1.23\ldots p})}{Var(X_1)}
\;\Rightarrow\; \rho^2_{1.23\ldots p} \ge \rho^2_{1.23\ldots p-1}. \quad \text{(Proved)}$$

Problem: For a random vector $X$ the correlation matrix is the equicorrelation matrix, with unit diagonal and every off-diagonal entry equal to $\rho$. Obtain the multiple correlation coefficient of $X_1$ on $X_2, X_3, \ldots, X_p$.

Solution: As before (adding all rows to the first row and then subtracting the first column from each of the others),
$$|R|_{p\times p} = (1 + (p-1)\rho)(1-\rho)^{p-1},\qquad |R_2|_{(p-1)\times(p-1)} = (1 + (p-2)\rho)(1-\rho)^{p-2}.$$
Therefore
$$\rho_{1.23\ldots p} = \sqrt{1 - \frac{|R|}{|R_2|}} = \left[1 - \frac{(1+(p-1)\rho)(1-\rho)}{1+(p-2)\rho}\right]^{1/2}
= \left[\frac{1+(p-2)\rho - (1-\rho)(1+(p-1)\rho)}{1+(p-2)\rho}\right]^{1/2}
= \left[\frac{\rho^2(p-1)}{1+(p-2)\rho}\right]^{1/2} = \rho\sqrt{\frac{p-1}{1+(p-2)\rho}}.$$

6 Partial correlation coefficient:
Let $X = (X_1, X_2, \ldots, X_p)'$ be a p-variate random vector with mean vector $\mu$ and variance-covariance matrix $\Sigma$. We partition $X$, $\mu$ and $\Sigma$ as
$$X = \begin{pmatrix}X_1\\ X_2\\ X_{(3)}\end{pmatrix},\quad \mu = \begin{pmatrix}\mu_1\\ \mu_2\\ \mu_{(3)}\end{pmatrix},\quad
\Sigma = \begin{pmatrix}\sigma_{11} & \sigma_{12} & \sigma_{(13)}'\\ \sigma_{21} & \sigma_{22} & \sigma_{(23)}'\\ \sigma_{(13)} & \sigma_{(23)} & \Sigma_3\end{pmatrix},$$
where $X_{(3)} = (X_3, \ldots, X_p)'$, $\mu_{(3)} = (\mu_3, \ldots, \mu_p)'$, $\sigma_{(13)} = (\sigma_{31}, \ldots, \sigma_{p1})'$, $\sigma_{(23)} = (\sigma_{32}, \ldots, \sigma_{p2})'$ [$\because \sigma_{ij} = \sigma_{ji}$] and $\Sigma_3$ is the $(p-2)\times(p-2)$ dispersion matrix of $X_{(3)}$.

Next we consider the multiple linear regression of $X_1$ on $X_3, X_4, \ldots, X_p$,
$$X_{1.34\ldots p} = \hat\alpha + \hat\beta'X_{(3)},\qquad \hat\alpha = \mu_1 - \hat\beta'\mu_{(3)},\quad \hat\beta = \Sigma_3^{-1}\sigma_{(13)}\quad(\text{normal equations}),$$
so that $X_1 = X_{1.34\ldots p} + e_{1.34\ldots p}$. Also we consider the multiple linear regression of $X_2$ on $X_3, X_4, \ldots, X_p$,
$$X_{2.34\ldots p} = \hat\gamma + \hat\delta'X_{(3)},\qquad \hat\gamma = \mu_2 - \hat\delta'\mu_{(3)},\quad \hat\delta = \Sigma_3^{-1}\sigma_{(23)}\quad(\text{normal equations}),$$
so that $X_2 = X_{2.34\ldots p} + e_{2.34\ldots p}$.

Note that
$$Var(e_{1.34\ldots p}) = Var(X_1) - Var(X_{1.34\ldots p}) = \sigma_{11} - \hat\beta'\Sigma_3\hat\beta = \sigma_{11} - \sigma_{(13)}'\Sigma_3^{-1}\sigma_{(13)}.$$
Again,
$$\begin{vmatrix}\sigma_{11} & \sigma_{(13)}'\\ \sigma_{(13)} & \Sigma_3\end{vmatrix} = |\Sigma_3|\,\big(\sigma_{11} - \sigma_{(13)}'\Sigma_3^{-1}\sigma_{(13)}\big),$$
and this determinant is the cofactor of $\sigma_{22}$ in $\Sigma$, denoted $\Sigma_{22}$, so
$$\Sigma_{22} = |\Sigma_3|\,Var(e_{1.34\ldots p}) \;\Rightarrow\; Var(e_{1.34\ldots p}) = \frac{\Sigma_{22}}{|\Sigma_3|}.$$
Similarly,
$$Var(e_{2.34\ldots p}) = \frac{\Sigma_{11}}{|\Sigma_3|},\quad\text{where } \Sigma_{11} \text{ is the cofactor of } \sigma_{11} \text{ in } \Sigma.$$
Now
$$cov(e_{1.34\ldots p}, e_{2.34\ldots p}) = cov\big[(X_1 - X_{1.34\ldots p}), e_{2.34\ldots p}\big] = cov(X_1, e_{2.34\ldots p}) - \underbrace{cov(X_{1.34\ldots p}, e_{2.34\ldots p})}_{0\ (\text{normal equations})}$$
$$= cov\big(X_1, X_2 - X_{2.34\ldots p}\big) = \sigma_{12} - cov\big(X_1, \hat\gamma + \hat\delta'X_{(3)}\big)
= \sigma_{12} - \sum_{i=3}^{p}\hat\delta_i\sigma_{1i} = \sigma_{12} - \hat\delta'\sigma_{(13)} = \sigma_{12} - \sigma_{(23)}'\Sigma_3^{-1}\sigma_{(13)}\quad[\because \hat\delta = \Sigma_3^{-1}\sigma_{(23)}].$$
Again,
$$\begin{vmatrix}\sigma_{21} & \sigma_{(23)}'\\ \sigma_{(13)} & \Sigma_3\end{vmatrix} = |\Sigma_3|\,\big(\sigma_{12} - \sigma_{(23)}'\Sigma_3^{-1}\sigma_{(13)}\big),$$
and this determinant equals minus the cofactor of $\sigma_{12}$ in $\Sigma$, denoted $\Sigma_{12}$, so
$$cov(e_{1.34\ldots p}, e_{2.34\ldots p}) = -\frac{\Sigma_{12}}{|\Sigma_3|}.$$
Therefore the partial correlation coefficient between $X_1$ and $X_2$ eliminating the effect of $X_3, X_4, \ldots, X_p$ is defined as
$$\rho_{12.34\ldots p} = \frac{cov(e_{1.34\ldots p}, e_{2.34\ldots p})}{\sqrt{Var(e_{1.34\ldots p})\,Var(e_{2.34\ldots p})}} = -\frac{\Sigma_{12}}{\sqrt{\Sigma_{11}\Sigma_{22}}}.$$
In general,
$$\rho_{ij.12\ldots(i-1)(i+1)\ldots(j-1)(j+1)\ldots p} = -\frac{\Sigma_{ij}}{\sqrt{\Sigma_{ii}\Sigma_{jj}}},$$
where $\Sigma_{ij}$ denotes the (signed) cofactor of $\sigma_{ij}$ in $\Sigma$. In terms of the correlation matrix,
$$\rho_{12.34\ldots p} = -\frac{R_{12}}{\sqrt{R_{11}R_{22}}},$$
where $R_{ij}$ denotes the cofactor of the (i, j)-th entry of $R$.
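Since the cofactor ratio above equals the corresponding ratio of entries of $\Sigma^{-1}$, the partial correlation is convenient to compute from the precision matrix. A minimal sketch (not from the notes), with an illustrative $\Sigma$:

```python
# Minimal sketch: partial correlation of X1 and X2 eliminating the rest, using the
# precision matrix P = Sigma^{-1}: rho_12.rest = -P[0,1] / sqrt(P[0,0] * P[1,1]).
import numpy as np

Sigma = np.array([[1.0, 0.6, 0.4],
                  [0.6, 1.0, 0.5],
                  [0.4, 0.5, 1.0]])
P = np.linalg.inv(Sigma)
partial_12 = -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])
print(partial_12)        # partial correlation of X1, X2 eliminating X3
```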
Ex: Show that $(1 - \rho_{13}^2)(1 - \rho_{12.3}^2) = (1 - \rho_{1.23}^2)$.

With
$$R = \begin{pmatrix}1 & \rho_{12} & \rho_{13}\\ \rho_{21} & 1 & \rho_{23}\\ \rho_{31} & \rho_{32} & 1\end{pmatrix},$$
$$|R| = 1 - \rho_{23}^2 - \rho_{12}(\rho_{21} - \rho_{23}\rho_{31}) + \rho_{13}(\rho_{12}\rho_{23} - \rho_{13}) = 1 - \rho_{23}^2 - \rho_{13}^2 - \rho_{12}^2 + 2\rho_{12}\rho_{13}\rho_{23},$$
and the cofactors are $R_{11} = 1 - \rho_{23}^2$, $R_{22} = 1 - \rho_{13}^2$, $R_{12} = -(\rho_{21} - \rho_{23}\rho_{31})$. Hence
$$\rho_{12.3} = -\frac{R_{12}}{\sqrt{R_{11}R_{22}}} = \frac{\rho_{21} - \rho_{23}\rho_{31}}{\sqrt{(1-\rho_{23}^2)(1-\rho_{13}^2)}},\qquad
\rho_{1.23} = \sqrt{1 - \frac{|R|}{R_{11}}} = \sqrt{\frac{\rho_{13}^2 + \rho_{12}^2 - 2\rho_{12}\rho_{13}\rho_{23}}{1-\rho_{23}^2}},$$
so that
$$1 - \rho_{1.23}^2 = \frac{1 - \rho_{23}^2 - \rho_{12}^2 - \rho_{13}^2 + 2\rho_{12}\rho_{13}\rho_{23}}{1-\rho_{23}^2}.$$
Therefore
$$(1 - \rho_{13}^2)(1 - \rho_{12.3}^2) = (1 - \rho_{13}^2) - \frac{(\rho_{21} - \rho_{23}\rho_{31})^2}{1-\rho_{23}^2}
= \frac{(1-\rho_{13}^2)(1-\rho_{23}^2) - (\rho_{21}-\rho_{23}\rho_{31})^2}{1-\rho_{23}^2}
= \frac{1 - \rho_{12}^2 - \rho_{13}^2 - \rho_{23}^2 + 2\rho_{12}\rho_{13}\rho_{23}}{1-\rho_{23}^2} = 1 - \rho_{1.23}^2. \quad \text{(Proved)}$$

Problem: If $X_2, X_3, \ldots, X_p$ are mutually uncorrelated, show that $\rho^2_{1.23\ldots p} = \rho_{12}^2 + \rho_{13}^2 + \cdots + \rho_{1p}^2$.

Solution: Here
$$R = \begin{pmatrix}1 & \rho_{12} & \rho_{13} & \ldots & \rho_{1p}\\ \rho_{12} & 1 & 0 & \ldots & 0\\ \rho_{13} & 0 & 1 & \ldots & 0\\ \vdots & & & \ddots & \\ \rho_{1p} & 0 & 0 & \ldots & 1\end{pmatrix},\qquad R_2 = I_{p-1}.$$
Now $|R| = |R_2|\,(1 - \rho_{(2)}'R_2^{-1}\rho_{(2)}) = 1\times\big(1 - \rho_{(2)}'I\rho_{(2)}\big) = 1 - \sum_{i=2}^{p}\rho_{1i}^2$, so
$$\rho^2_{1.23\ldots p} = 1 - \frac{|R|}{|R_2|} = 1 - \frac{1 - \sum_{i=2}^{p}\rho_{1i}^2}{1} = \rho_{12}^2 + \rho_{13}^2 + \cdots + \rho_{1p}^2. \quad \text{(Proved)}$$

26
Problem: If ρ12 = 0, then what can be said about ρ12.3 ? Hence comment on ρ12 as a measure
of correlation between X1 and X2 .
 
1 0 ρ13
Solution: R =  0 1 ρ23 
 

ρ13 ρ23 1
R12 = ρ13 ρ23 ; R11 = 1 − ρ223 ; R22 = 1 − ρ213
−ρ13 ρ23
∴ ρ12.3 = p 6= 0
(1 − ρ223 ) (1 − ρ213 )
So, ρ12 is not a proper measure of association.

ga
Problem: If ρij = ρ ∀ i, j, obtain ρ12.34...p
 
1 ρ ... ρ
ρ 1 ... ρ
 
Solution: In the problem R = 
 .. .. .. 

n . . .
ρ ρ ... 1
ρ ρ ρ ... ρ ρ 0 0 ... 0
ρ 1 ρ ... ρ ρ 1−ρ 0 ... 0
Now, R12 = − ρ ρ 1 ... ρ =− ρ 0 1 − ρ ... 0 = −ρ(1 − ρ)p−2
ra
.. .. .. .. .. .. .. ..
. . . . . . . .
ρ ρ ρ ... 1 ρ 0 0 ... 1 − ρ
1 ρ ... 0
ρ 1 ... ρ p−2
Again, R11 = R22 = .
.. .. .. = (1 + (p − 2)ρ) (1 − ρ)
. .
Ta

ρ ρ ... 1

ρ(1 − ρ)p−2 ρ
∴ ρ12.34...p = p−2
=
(1 − ρ) (1 + (p − 2)ρ) 1 + (p − 2)ρ

Problem: If a1 X1 + a2 X2 + . . . ap Xp = k (constant) then what will be the partial correlation


coefficients of different orders? What will be the multiple correlation coefficient ρ1.23...p ?

Solution:
p
X
Note that, ai X i = k
i=1

⇒ ai Xi + aj Xj = k − [a1 X1 + . . . + aj−1 Xj−1 + aj+1 Xj+1 + . . . + ai−1 Xi−1 + ai+1 Xi+1 + . . . + ap Xp ]

While calculating partial correlation coefficient between Xi and Xj , the other variables are
kept fixed, i.e., ai Xi + aj Xj = k ∗ (constant)

27
Under this scenario, the usual correlation coefficient between Xi and Xj will be ±1
(
1 iff ai and aj are of different signs
ρij.12...p =
−1 iff ai and aj are of same sign

The multiple correlation coefficient will be +1, because Xi is linearly dependent on X1 , X2 , . . . , Xp .

ρ1j = ρ; j = 2, 3, . . . , p
Problem: If . Obtain ρ1.23...p and ρ12.34...p
ρij = ρ0 ; i, j = 2, 3, . . . , p, i 6= j
 
1 ρ ρ ... ρ

ga
ρ 1 ρ 0 . . . ρ0 
 
 
Solution: Here, R = ρ ρ0 1 . . . ρ0 .
 
. . . .. 
 .. .. .. .
 
ρ ρ 0 ρ0 . . . 1
1 ρ ρ ... ρ 1 0 0 ... 0
0 0 2 0 2 0
ρ 1 ρ ... ρ ρ 1−ρ ρ −ρ . . . ρ − ρ2
n
Now, |R| = ρ ρ0 1 . . . ρ0 = ρ ρ0 − ρ2 1 − ρ2 . . . ρ0 − ρ2
.. .. ..
. . .
..
.
..
.
ρ ρ0 ρ0 . . . 1
..
.
..
.
..
.
ρ ρ0 − ρ2 ρ0 − ρ2 . . . 1 − ρ2
p×p
[Ci0 → Ci − ρCi ]
ra
2 0 2 0 2
1−ρ ρ −ρ ... ρ − ρ
0
ρ −ρ 2
1−ρ 2
. . . ρ0 − ρ2
⇒ |R| = .. .. .. = ((1 − ρ2 ) + (p − 2)(ρ0 − ρ2 )) (1 − ρ0 )p−2 .
. . .
ρ0 − ρ2 ρ0 − ρ2 . . . 1 − ρ2
p−1×p−1

1 ρ0 . . . ρ0
Ta

ρ0 1 . . . ρ0 0 0 p−2
Again, |R2 | = . . .. = (1 + (p − 2)ρ ) (1 − ρ )
.. .. .
ρ0 ρ0 . . . ρ0

|R| ρ2 (p − 1)
∴ ρ21.23...p =1− =
|R2 | 1 + (p − 2)ρ0

R11 = |R2 | = (1 − ρ0 )p−2 (1 + (p − 2)ρ0 ).


1 ρ ρ ... ρ
ρ 1 ρ0 . . . ρ0 0 p−3 0 2
R22 = . . .
.. .. .. .. = (1 − ρ ) (1 + (p − 3)ρ − ρ (p − 2))
.
ρ ρ 0 ρ0 . . . 1

28
ρ ρ0 ρ0 . . . ρ0 1 ρ0 ρ0 . . . ρ0 1 0 0 ... 0
0 0 0 0 0
ρ 1 ρ ... ρ 1 1 ρ ... ρ 1 1−ρ 0 ... 0
R12 = − ρ ρ0 1 . . . ρ0 = −ρ 1 ρ0 1 . . . ρ0 = −ρ 1 0 1 − ρ0 . . . 0
.. .. .. . .. .. .. .. .. .. .. ..
. . . . . . .. . . . . . . . .
ρ ρ 0 ρ0 . . . 1 1 ρ0 ρ0 . . . 1 1 0 0 . . . 1 − ρ0
⇒ R12 = −ρ(1 − ρ0 )p−2

ρ 1 − ρ0
∴ ρ12.34...p = p
1 + (p − 2)ρ0 (1 + (p − 3)ρ0 − ρ2 (p − 2))

7 Partial regression coefficient

ga
We consider the multiple linear regression equation of X1 on X2 , X3 . . . Xp as follows
   
β2 X2
β3  X 
   
X1.23...p = α + β X (2) , where β =   and X (2) =  3 
.  .. 
e  ..   . 
 
ee e
βp Xp
n  0
If X (2) = 0 0 . . . 1 . . . 0 0 → jth element is 1
Then, X 1.23...p = α +βj . This βj termed as partial regression coefficient of X1 on Xj eliminating
e
the effect
e of X , X , . . . , X , X , . . . , X .
ra
2 3 j−1 j+1 p
Here βj refers to as change in the response X1 due to unit change in jth covariate Xj , while
other covariates are fixed.

We denote it as βij.23...j−1 j+1 ...p .


Now
Ta

σ (2) = Σ2 β̂
e
and α̂ = µ1 − β̂ 0 µ(2)
e
ee
    
σ12 σ22 σ23 . . . σ2p β2
σ13  σ32 σ33 . . . σ3p  β3 
    
σ (2) =
 ..  =  ..
 
..   .. 
 
e  .   . .  . 
σ1p σp2 σp3 . . . σpp β1
⇒ σi1 = σi2 β2 + σi3 β3 + · · · + σip βp (1)

Now it can be shown that


σ1i Σ11 + σ2i Σ12 + · · · + σpi Σ1p = 0
where Σij is the cofactor of σij in the dispersion matrix.
Σ12 Σ13 Σ1p
i.e., σ1i = − σ2i − σ3i − · · · − σpi (2)
Σ11 Σ11 Σ11

29
Comparing equation (1) and (2) we get

Σ12 Σ11
β2 = − . . . βp = −
Σ11 Σ11
Σ1j
∴ βj = −
Σ11
i.e., partial regression coefficient of X1 on Xj eliminating the effect of X2 , X3 , . . . , Xj−1 , Xj+1 ,
. . . , Xp is given by
Σ1j
β1j.23...j−1, j+1...p = −
Σ11

Relationship between partial regression coefficient and partial correlation coeffi-

ga
cient.

Note that
Σ12
β12.34...p = −
Σ11
Σ21
β21.34...p =−

Again
n β12.34...p × β21.34...p =
Σ22
Σ212
Σ11 × Σ22
= ρ212.34...p
ra
2 Σ22
V ar(e1.34...p ) = σ1.34...p (say) =
|Σ3 |
2 Σ11
V ar(e2.34...p ) = σ2.34...p (say) =
|Σ3 |
2
r
σ1.34...p Σ22 σ1.34...p Σ22
Ta

2
= ⇒ =
σ2.34...p Σ11 σ2.34...p Σ11
Again
−Σ12
ρ12.34...p = √ √
Σ11 Σ22
σ1.34...p Σ12
ρ12.34...p × =− = −β12.34...p
σ2.34...p Σ11

  
Result: $\big(1 - \rho^2_{1.23\ldots p}\big) = (1 - \rho_{12}^2)(1 - \rho_{13.2}^2)\cdots\big(1 - \rho^2_{1p.23\ldots p-1}\big)$, i.e., derive the relationship between the multiple correlation coefficient and the partial correlation coefficients.

30
→ V ar(e1.23...p ) = E e21.23...p = E(e1.23...p e1.23...p )


= E [(X1 − X1.23...p )e1.23...p ]


= E [X1 e1.23...p ] − E[X1.23...p e1.23...p ]
| {z }
0
 
= E (X1 − X1.23...p−1 + X1.23...p−1 )e1.23...p
 
= E (X1 − X1.23...p−1 )e1.23...p + E[X1.23...p−1 e1.23...p ]
| {z }
0
 
= E e1.23...p−1 e1.23...p
h i
= E e1.23...p−1 (X1 − α − β̂2 X2 − · · · − β̂p Xp )

ga
= E[X1 e1.23...p−1 ] − β̂p E[e1.23...p−1 Xp ]
[rest of the terms are zero due to normal equation]
 
= E (X1.23...p−1 + e1.23...p−1 )e1.23...p−1 − β̂1p.23...p−1 E[Xp e1.23...p−1 ]
= E[e21.23...p−1 ] + E[X1.23...p−1 e1.23...p−1 ] −β1p.23...p−1 E[Xp e1.23...p−1 ]
| {z }
n 0
2
= σ1.23...p−1 − β1p.23...p−1 E[(Xp.23...p−1 + ep.23...p−1 )e1.23...p−1 ]
[here we take Xp as response]
2
= σ1.23...p−1 − β1p.23...p−1 E[ep.23...p−1 e1.23...p−1 ]
ra
Now
E[ep.23...p−1 e1.23...p−1 ] = ρ1p.23...p−1 × σp.23...p−1 × σ1.23...p−1
Therefore,
2
σ1.23...p−1
V ar(e1.23...p ) = σ1.23...p−1 − ρ1p.23...p−1 × β1p.23...p−1
σp.23...p−1
σ1.23...p−1
Ta

2
= σ1.23...p−1 − ρ1p.23...p−1 × σ1.23...p−1 σp.23...p−1 × ρ1p.23...p−1 ×
σp.23...p−1
2
= σ1.23...p−1 − ρ21.23...p−1 σ1.23...p−1
2

2
= σ1.23...p−1 (1 − ρ21.23...p−1 )
 
2 2
= σ1.23...p−1 1 − ρ1p.23...p−1

Note that,
2
σ1.23...p
ρ21.23...p = 1 − 2
⇒ σ1.23...p = σ11 (1 − ρ21.23...p )
σ11
2 2
1 − ρ21.23...p

∴ V ar(e1.23...p ) = σ1.23...p = σ1.23...p−1
  
2 2 2

⇒ σ11 1 − ρ1.23...p = σ11 1 − ρ1.23...p−1 1 − ρ1p.23...p−1
   
⇒ 1 − ρ21.23...p = 1 − ρ21.23...p−1 1 − ρ21p.23...p−1
Again, (1 − ρ21.23...p−1 ) = (1 − ρ21.23...p−2 )(1 − ρ21p−1,2...p−2 )

31
Recursively we get the final result,
  
1 − ρ21.23...p = (1 − ρ212 ) (1 − ρ213.2 ) . . . 1 − ρ21p.23...p−1

2 2
Result: σ1.23...p = σ1.34...p (1 − ρ212.34...p )
2 2
→ σ1.23...p = V ar(e1.23...p ) = E(σ1.23...p ) = E[e1.23...p × e1.23...p ]
= E[(X1 − X1.23...p )e1.23...p ] = E[X1 e1.23...p ] − E[X1.23...p e1.23...p ]
| {z }
0

= E[(X1.34...p + e1.34...p )e1.234...p ] = E[X1.34...p e1.23...p ] +E[e1.34...p e1.23...p ]


| {z }
0

= E[e1.34...p (X1 − X1.23...p )] = E [X1 e1.34...p ] − E [X1.23...p e1.34...p ]

ga
= V ar(e1.34...p ) + E [e1.34...p X1.34...p ] −E [e1.34...p (α + β2 X2 + . . . + βp Xp )]
| {z }
0
2
= σ1.34...p − β2 E[e1.34...p X2 ] [ rest of the terms vanish from normal equations]
2
= σ1.34...p − β2 E [(X2.34...p + e2.34...p )e1.34...p ]
2
= − β2 E [X2.34...p e1.34...p ]0 −β2 E [e1.34...p e2.34...p ]
σ1.34...p
| {z }
 
n 2
= σ1.34...p
2
= σ1.34...p
− ρ12.34...p ×
σ1.34...p
σ2.34...p
(1 − ρ212.34...p )
(P roved)

× ρ12.34...p (σ1.34...p × σ2.34...p )


ra
Problem: V ar(e1.23...p ) = V ar(e1.23...p−1 ) 1 − ρ21p.23...p−1
Give an interpretation of ρ21p.23...p−1 from the above result.

Solution:
V ar(e1.23...p ) V ar(e1.23...p )
1 − ρ21p.23...p−1 = ⇒ ρ21p.23...p−1 = 1 −
V ar(e1.23...p−1 ) V ar(e1.23...p−1 )
Ta

(
0 if V ar(e1.23...p ) = V ar(e1.23...p−1 )
⇒ρ21p.23...p−1 =
↑ if V ar(e1.23...p ) ↑ and V ar(e1.23...p−1 ) ↓
If eliminating Xp results in high variation in prediction of X1 then it indicates that X1 and Xp
are highly correlated, even when the effects of X2 , X3 , . . . , Xp−1 are eliminated.

Problem: Verify which probability is larger,


P [(X − µ)0 Σ−1 (X − µ) > 3p] or P [(X − µ)0 Σ−1 (X − µ) < 3p]
e e e e e e e e

Solution: We know that, E[(X − µ)0 Σ−1 (X − µ)] = p


By Markov’s inequality, e e e e

E[(X − µ)0 Σ−1 (X − µ)] 1


P [(X − µ)0 Σ−1 (X − µ) > 3p] ≤ e e e e =
e e e e 3p 3
1 2
⇒ P [(X − µ)0 Σ−1 (X − µ) < 3p] ≥ 1 − =
e e e e 3 3

32
∴ P [(X − µ)0 Σ−1 (X − µ) > 3p] < P [(X − µ)0 Σ−1 (X − µ) < 3p]
e e e e e e e e
     
X1 0 1 c 0
Problem: Let X = X2  , µ = 0 and Σ =  c 1 1.
     
e
X3 0 0 c 1
e
Is
 it possible to find cfor which
 e
0 0
a X and b X are independently distributed where a0 =
1 −1 −1 and b0 = 1 1 1 .Justify
e ee e
e

Solution: Let Y1 = a0 X and Y2 = b0 X


Now, c is so chosen that,
ee ee

ga
⇒ cov a0 X , b0 X = 0

cov(Y1 , Y2 ) = 0
3 X
3
ee ee
X
⇒ ai bj cov(Xi , Xj ) = 0
i=1 j=1

⇒V ar(X1 ) + cov(X1 , X2 ) + cov(X1 , X3 ) − cov(X1 , X2 ) − V ar(X2 ) − cov(X2 , X3 )


− cov(X1 , X3 ) − cov(X3 , X2 ) − V ar(X3 ) = 0
⇒ 1+c+0−c−1−c−0−c−1=0
n
⇒c = −
1
2

Problem: Let r1.23...p denote the sample multiple correlation coefficient of X1 on X2 , X3 , . . . , Xp ; p ≥


ra
4. Discuss when i) r1.23...p = r1.34...p and ii) r1.34...p = 1.

2 V ar(e1.23...p ) 2 V ar(e1.23...p )
Solution:i) Note that, r12.34...p =1− ; r1.23...p = 1 − ;
V ar(e1.34...p ) V ar(X1 )
2 V ar(e1.34...p )
r1.34...p =1− .
V ar(X1 )
2 2
If r1.23...p = r1.34...p , then V ar(e1.23...p ) = V ar(e1.34...p ), i.e., ρ212.34...p = 0
Ta

So, partial correlation coefficient between X1 and X2 elminating X3 , X4 , . . . , Xp is 0.

ii) r1.34...p = 1, so multiple linear regression equation of X1 on X3 , X4 , . . . , Xp is actually a


perfect linear relationship.

8 Multivariate normal distribution:

A p-variate random vector $X^{p\times 1}$ is said to follow the multivariate normal distribution if it has the joint pdf
$$f_X(x) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\,e^{-\frac12(x-\mu)'\Sigma^{-1}(x-\mu)},\qquad x \in \mathbb{R}^p,$$
where $\mu$ is the mean vector and $\Sigma$ the dispersion matrix.

33
8.1 Moment generating function:

$$M_X(t) = E\big(e^{t_1X_1 + t_2X_2 + \cdots + t_pX_p}\big) = E\big(e^{t'X}\big)
= \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\int_{\mathbb{R}^p} e^{t'x - \frac12(x-\mu)'\Sigma^{-1}(x-\mu)}\,dx.$$
Since $\Sigma$ is positive definite, $\Sigma^{-1}$ is also pd, so there exists a non-singular matrix $C$ such that $\Sigma^{-1} = C'C$, whence
$$|\Sigma^{-1}| = |C'C| = |C|^2 \;\Rightarrow\; |C| = \frac{1}{|\Sigma|^{1/2}}.$$
Now we consider the transformation $X \to Y$ with $Y = C(X-\mu)$, i.e., $X = \mu + C^{-1}Y$, whose Jacobian is $|J| = |C^{-1}| = |\Sigma|^{1/2}$, and note that
$$(x-\mu)'\Sigma^{-1}(x-\mu) = (x-\mu)'C'C(x-\mu) = \big(C(x-\mu)\big)'\big(C(x-\mu)\big) = y'y.$$
Therefore, writing $a' = t'C^{-1}$,
$$M_X(t) = \frac{|\Sigma|^{1/2}}{(2\pi)^{p/2}|\Sigma|^{1/2}}\int e^{t'(\mu + C^{-1}y) - \frac12 y'y}\,dy
= \frac{e^{t'\mu}}{(2\pi)^{p/2}}\int e^{a'y - \frac12 y'y}\,dy
= e^{t'\mu + \frac12 a'a}\underbrace{\frac{1}{(2\pi)^{p/2}}\int e^{-\frac12(y-a)'(y-a)}\,dy}_{=1,\ N_p(a,\,I_p)\text{ density}}$$
$$= e^{t'\mu + \frac12 t'\Sigma t}\qquad\big[\because a'a = t'C^{-1}(C^{-1})'t = t'(C'C)^{-1}t = t'\Sigma t\big].$$
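The same factorisation idea runs in reverse for simulation: if $Y \sim N_p(0, I)$ and $LL' = \Sigma$, then $\mu + LY \sim N_p(\mu, \Sigma)$. A minimal sketch (not from the notes), with illustrative parameters:

```python
# Minimal sketch: generate N_p(mu, Sigma) samples as X = mu + L Y with Y ~ N_p(0, I)
# and L the Cholesky factor of Sigma, then check the mean vector and dispersion matrix.
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
L = np.linalg.cholesky(Sigma)
Y = rng.standard_normal((200_000, 2))
X = mu + Y @ L.T                        # samples from N_2(mu, Sigma)
print(X.mean(axis=0))                   # approx mu
print(np.cov(X, rowvar=False))          # approx Sigma
```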
Ex. X ∼ Np (µ, Σ) Obtain the distribution of A0 X .
→ Z = A0 X
e e e
e e
0 0 0 0
MZ (t) = E(eet Ze ) = E(eet A Xe ) = E(e(Ate) X ), At = h (say)
e e
h0 µ+ 1 h0 Σh h0 µ+ 1 (At)0 Σ(At)
0
e e e
= E(ehe Xe ) = ee e 2 e e = ee e 2 e e
h0 µ+ 12 t0 A0 ΣAt t0 A0 µ+ 12 t0 A0 ΣAt
= ee e e e = ee e e e

34
So by uniqueness of mgf, Z ∼ NP (A0 µ, A0 ΣA)
e e

 
Problem: If X ∼ Np (µ, Σ) then for any matrix Br×p with R(B)=r, show that BX ∼ Nr Bµ, BΣB 0 .
e e e
Solution: Let Z = BX
e e h 0 i h 0 i h 00 i h 0 i
t (BX ) ( B t )X
MZ (t) = E ee e = E ee e = E e e e = E ehe Xe
tZ
e e
h0 µ+ 1 h0 Σh (B 0 t)0 µ+ 1 (B 0 t)0 Σ(B 0 t) t0 (B µ)+ 12 t0 (B 0 ΣB)t
= ee e 2 e e = e e e 2 e e = ee e e e

Now, B is a r×p matrix with R(B)=r, so R(BΣB’)=r

ga
 
0
∴ Z ∼ Nr Bµ, BΣB
e
Ex. $X \sim N_p(\mu, \Sigma)$. Show that $(X-\mu)'\Sigma^{-1}(X-\mu) \sim \chi^2_p$.

Solution: $\Sigma$ is pd, hence $\Sigma^{-1}$ is also pd, so there exists a non-singular matrix $B$ such that $\Sigma^{-1} = B'B$. We consider the transformation $X \to Y$ with
$$y = B(x-\mu) \;\Rightarrow\; x = \mu + B^{-1}y,\qquad |J| = |B^{-1}| = |\Sigma|^{1/2}.$$
Then
$$f_Y(y) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\,|\Sigma|^{1/2}\,e^{-\frac12 y'y} = \frac{1}{(2\pi)^{p/2}}\,e^{-\frac12 y'y},$$
i.e., $Y \sim N_p(0, I_p)$, so $Y_1, \ldots, Y_p \overset{iid}{\sim} N(0,1)$ and
$$(X-\mu)'\Sigma^{-1}(X-\mu) = Y'Y = \sum_{i=1}^{p} Y_i^2 \sim \chi^2_p. \quad \text{(Proved)}$$
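A minimal sketch (not from the notes) checking this empirically: the quadratic form of simulated multivariate normal data should have sample mean close to $p$, the mean of a $\chi^2_p$ variable.

```python
# Minimal sketch: Mahalanobis form (X - mu)' Sigma^{-1} (X - mu) of an MVN sample
# behaves like chi-square with p degrees of freedom (mean approx p).
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([0.5, -1.0, 2.0])
Sigma = np.array([[1.0, 0.3, 0.2],
                  [0.3, 2.0, 0.5],
                  [0.2, 0.5, 1.5]])
X = rng.multivariate_normal(mu, Sigma, size=100_000)
diff = X - mu
Q = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)  # row-wise quadratic form
print(Q.mean(), len(mu))        # sample mean of Q approx p = 3
```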

Result: X ∼ Np (µ, Σ) iff l0 X ∼ Np (l0 µ, l0 Σl)


e e ee ee e e

Here, X ∼ Np (µ, Σ)
Only if part:
e Z =e l0 x
ee 0 0 0
MZ (t) = E(eet Xe ) = E(eet el Xe ), where h0 = tl0
h0 µ+ 1 h0 Σh tl0 µ+ 1 t0 l0 Σlt
e 0
e e
= E(ehe Xe ) = ee e 2 e e = eee e 2 e e ee
i.e., Z ∼ Np (l0 µ, l0 Σl)
If part: ee e e

Z = l0 Z ∼ Np (l0 µ, l0 Σl)
ee ee e e
l0 µ+ 1 l0 Σl
E(eZ ) = ee e 2e e
0 l0 µ+ 12 l0 Σl
⇒ E(eel Xe ) = ee e e e

35
By uniqueness of mgf, X ∼ Np (µ, Σ)
e e
2 2
Ex. $E(e^{t_1X + t_2Y}) = e^{t_1^2 + 5t_2^2 - t_1t_2 - 2t_2}$. Obtain the mgf of $(X - 2Y, X + 3Y)$.

Let $Z_1 = X - 2Y$ and $Z_2 = X + 3Y$. Then
$$M_{Z_1,Z_2}(t_1, t_2) = E\big(e^{t_1Z_1 + t_2Z_2}\big) = E\big(e^{t_1(X-2Y) + t_2(X+3Y)}\big) = E\big(e^{X(t_1+t_2) + Y(3t_2-2t_1)}\big) = E\big(e^{t_1^*X + t_2^*Y}\big),$$
where $t_1^* = t_1 + t_2$ and $t_2^* = 3t_2 - 2t_1$. Hence
$$M_{Z_1,Z_2}(t_1,t_2) = e^{t_1^{*2} + 5t_2^{*2} - t_1^*t_2^* - 2t_2^*}
= e^{(t_1+t_2)^2 + 5(3t_2-2t_1)^2 - (t_1+t_2)(3t_2-2t_1) - 2(3t_2-2t_1)}
= e^{4t_1 - 6t_2 + 23t_1^2 + 43t_2^2 - 59t_1t_2}.$$
So, by the uniqueness of the mgf,
$$(Z_1, Z_2) \sim N_2\!\left(4,\; -6,\; 46,\; 86,\; -\frac{59}{\sqrt{46}\sqrt{86}}\right).$$

8.2 Marginal Distribution of Multivariate Normal


!m+u=p
X m×1
(1)
Result: Let X ∼ Np (µ, Σ). We partition, X = e u×1
X (2)
ra
e e e
e
! !
µ( 1)m×1 Σ1 Σ12
µ= e u×1 Σ= Disp (X (1) ) = Σ1 Disp (X (2) ) = Σ2
e µ( 2) Σ21 Σ2 e e
e
Show that X (1) and X (2) are independent iff Σ12 = 0
e e
Ta

Solution: If part
!
Σ1 0
Here Σ12 = 0 therefore the dispersion matrix is Σ =
0 Σ2

The joint pdf of X is given by


e
1 − 21 (x−µ)0 Σ−1 (x−µ)
fX (x) = e
(2π)p/2 |Σ|1/2
e e e e
e e
!
Σ1 0 −1 Σ−1
1 0
Now, |Σ| = = |Σ1 ||Σ2 | and Σ =
0 Σ2 0 Σ−1
2
! !
  Σ−1 0 x1 − µ 1
1
(x − µ)0 Σ−1 (x − µ) = (x1 − µ1 )0 (x2 − µ2 )0
Σ−1
e
0 x2 − µ 2
e
e e e e e e e e 2
e
= (x1 − µ1 )0 Σ−1 0 0 −1
e
1 (x 1 − µ 1 ) + (x 2 − µ 2 ) Σ 2 (x2 − µ2 )
e e e e e e e e

36
1
h i
− 12 (x1 −µ1 )Σ−1 0 −1
1 (x1 −µ1 )+(x2 −µ2 ) Σ2 (x2 −µ2 )
∴ fX (x) = e
(2π)p/2 |Σ1 ||Σ2 |
e e e e e e e e
e e
1 − 1 (x −µ )Σ (x −µ ) 1 − 1 (x −µ )Σ−1 (x −µ )
= m/2
e 2 e1 e 1 1 1 e 1 u/2
e 2 e2 f2 2 e2 e2
(2π) |Σ1 | (2π) |Σ2 |
⇒ fX (x) = fX 1 (x1 ).fX 2 (x2 )
e e e e e e

i.e., X 1 and X 2 are independent.


e e
Only if part → trivial.

Result: Any subset of multivariate normal random variable is also multivariate normal.
! !
X (1) µ(1)
→X= e

ga
,µ = e
e X (2) e µ(2)
e !
Σ1 Σ12
e
Σ=
Σ21 Σ22

Define

n Y (1) = X (1) + BX (2)


e e e
Y (2) = X (2)
e e
[B is so chosen that Y (1) and Y (2) are independent]
e e
i.e., cov(Y (1) , Y (2) ) = cov(X (1) , X (2) ) + B Disp(X (2) ) = 0
ra
e e e e e
⇒ Σ12 + B Disp(X (2) ) = 0
e
⇒ Σ12 = −BΣ2
⇒ B = −Σ12 Σ−1
2

i.e., Y (1) = X (1) − Σ12 Σ−1


2 X (2)
Ta

e e(2) e
y (2) = x
e e
   
i.e., we are considering the transformation X 1 X 2 → Y 1 Y 2 such that
e e e e
! ! !
Y (1) I −Σ12 Σ−12 X (1)
e = e = PX
Y (2) 0 I X (2) e
e e
where |P | = 1 i.e., P is non-singular.
Now, X ∼ Np (µ, Σ) ⇒ P X ∼ Np (P µ, P ΣP 0 )
e e e e

37
! ! !
I −Σ12 Σ−1
2 Σ1 Σ12 I 0
P ΣP 0 =
0 I Σ21 Σ2 −Σ−1
2 Σ21 I
! !
Σ1 − Σ12 Σ−1
2 Σ21 0 I 0
=
Σ21 Σ2 −Σ−1
2 Σ21 I
!
Σ1 − Σ12 Σ−1
2 Σ21 0
=
0 Σ2
! !
I −Σ12 Σ−1
2 µ(1)
and P µ =
0 I µ(2)
e

ga
e
e!
µ(1) − Σ12 Σ−12 µ(2)
=
µ(2)
e e
e  
−1 (2) −1
∴ Y (2) ∼ Nu (µ(2) , Σ2 ) and Y (1) ∼ Nm µ(1) − Σ12 Σ2 µ , Σ1 − Σ12 Σ2 Σ21
e e e e e
Y (2) = X (2) subset of X follows multivariate normal. (Proved)
e n e e

8.3 Conditional distribution:


X ∼ Np (µ, Σ)
ra
e ! e !
X (1) µ(1)
X= e µ=
X (2) µ(2)
e
e e
e ! e
Σ1 Σ12
Σ=
Σ21 Σ2
Ta

Define Y (1) = X (1) − Σ12 Σ−1 2 X (2)


e e e
Y (2) = X (2)
e e
From the previous result we know that Y (1) and Y (2) are independent.
e e
Also the joint distribution of Y (1) and Y (2) are given by
e e
1 − 12 (y (1) −µ(1) )0 (Σ1 −Σ12 Σ−1
2 Σ11 )
−1 (y
(1) −µ(1) )
0
fY (1) Y (2) (y 1 , y 2 ) = 1/2
e
(2π)m/2 Σ1 − Σ12 Σ−1
e e e e
2 Σ21
e e e e
1 − 12 (y (2) −µ(2) )0 Σ−1
2 (y (2) −µ(2) )
0
× 1/2
e
(2π)u/2 |Σ2 |
e e e e

Define µ∗(1) = µ(1) − Σ12 Σ−1 −1


2 µ(2) and Σ11.2 = Σ1 − Σ12 Σ2 Σ21
e e e
Note that X (1) = Y (1) + Σ12 Σ−1 2 Y2
e e e
X (2) = Y (2)
e e

38
I Σ12 Σ−1
2
Jocobian of the transformation = |J| = = |I| = 1
0 I

fX (1) X (2) (x(1) , x(2) )


e e
1
e e
− 21 (x1 −Σ12 Σ−1 −1 0 −1 −1 −1
2 x2 +µ1 +Σ12 Σ2 µ2 ) Σ11.2 (x1 −Σ12 Σ2 x2 +µ1 +Σ12 Σ2 µ2 )
= e
(2π)m/2 |Σ11.2 |1/2
e e e e e e e e

1 − 12 (x(2) −µ(2) )0 −1
P
2 (x(2) −µ(2) )
× µ/2
P 1/2
e
(2π) | 2 |
e e e e

Conditional pdf of X (1) |X (2) is given by,


e e
f (x(1) , x(2) )
f (x(1) |x(2) ) = e e
fX (2) (x(2) )

ga
e e
e e 0
1
   
− 21 x(1) −µ(1) −Σ12 Σ−1
2 (x(2) −µ
(2) ) Σ−1 −1
11.2 x(1) −µ(1) −Σ12 Σ2 x(2) −µ(2)
= ×e e e
(2π)m/2 |Σ11.2 |1/2
e e e e e e

 
X (1) |X (2) ∼ Nm µ(1) + Σ12 Σ−1 2 (x (2) − µ (2) ), Σ 11.2
e e e e e
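The conditional mean and covariance above are straightforward to compute for given parameters. A minimal sketch (not from the notes), with an illustrative partition of $\mu$ and $\Sigma$:

```python
# Minimal sketch: conditional distribution of a partitioned multivariate normal,
# X_(1) | X_(2) = x2 ~ N( mu1 + S12 S2^{-1}(x2 - mu2), S1 - S12 S2^{-1} S21 ).
import numpy as np

mu = np.array([1.0, 0.0, 2.0])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.4],
                  [0.3, 0.4, 1.5]])
idx1, idx2 = [0], [1, 2]                     # X_(1) = X1, X_(2) = (X2, X3)
S1 = Sigma[np.ix_(idx1, idx1)]
S12 = Sigma[np.ix_(idx1, idx2)]
S2 = Sigma[np.ix_(idx2, idx2)]

x2 = np.array([0.5, 1.0])                    # observed value of X_(2)
cond_mean = mu[idx1] + S12 @ np.linalg.solve(S2, x2 - mu[idx2])
cond_cov = S1 - S12 @ np.linalg.solve(S2, S12.T)
print(cond_mean, cond_cov)
```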
Ex. X ∼ Np (µ, Σ) and Σ is PD and a is a fixed vector. If ri be the correlation between Xi and
a0 X then
e
n show e that e
ee  
r1
r2 
   
− 1
0 1
r=.  = C DΣa
2 where C = a Σa and D = Diag √
e   .
. e e e e σii
ra
rp
Solution:
cov Xi , Σpj=1 aj Xj Σpj=1 cov(Xi , Xj )aj Σpj=1 aj σij

cov(Xi , a0 X )
→ ri = p = √ p = p = p
V ar(X i )V ar(a0 X ) σii a0 Σa σii a0 Σa σii a0 Σa
ee
e ee e e e e e e
Now,
   Σaj σij 
Ta

   
0
r1 √ p σ11 σ 12 . . . σ 1p σ (1)
   σ11 a0 Σa   e0 
r2   σ σ . . . σ2p σ 

.. e e 

 21 22  = e(2) 
  
r= .
= . Σ =  .. .. ..   .. 
 ..  
 
e   Σaj σpj 
  . . .   . 
rp √ p 0
σp1 σp2 . . . σpp σ 0(p)
σpp a Σa
 √ e e e
a0 σ (1) / σ11
e0 e √ 
a σ (2) / σ22 
= √aΣa 
1
e e ..

.

e e  

a0 σ / σpp
 e e(p) 
√1

σ11
0 . . . 0 σ 0(1)
  e 
1  0
 √1 ... 0  σ 0(2) 
−2 σ 22 
=C  . .. ..  e ..  e
 a
 .. . .  . 


0 0 ... √
1
σpp
σ 0(p)
1 e
= c− 2 DΣa (P roved)
e
39
Ex. X ∼ N3 (0, Σ)
e 
1 P 0
Σ = P 1 P  is there any value of P such that X1 + X2 + X3 and X1 − X2 − X3 are
 

0 p 1
independent.
cov(X1 + X2 + X3 , X1 − X2 − X3 ) = 0
=V ar(X1 ) − cov(X1 , X2 ) − cov(X1 , X3 ) + cov(X1 , X2 ) − V ar(X2 ) − cov(X2 , X3 )
+ cov(X1 , X3 ) − cov(X2 , X3 ) − V ar(X3 )
⇒ 1 − 1 − 1 − 2P = 0
1

ga
⇒P =
2
 
1 ρ ρ2
Ex. X ∼ N3 (0, Σ), Σ =  ρ 1 ρ  show that for c > 0
 

ρ2 ρ
e e
1
Z c
1 −y2
X22 2
X12 X22 X32 √ e 2 y 1/2 dy
  
P + c ρ − 2ρ(X1 X2 + X2 X3 ) + + + −c≤0 =
2x

→P
n
Solution:

X22 + c ρ2 − 2ρ(X1 X2 + X2 X3 ) + X12 + X22 + X32 − c ≤ 0
 

=P ρ2 X22 − 2ρX1 X2 + X12 + X32 + 2ρX2 X3 + ρ2 X22 − ρ2 X22 + X22 − c(1 − ρ2 ) ≤ 0


 
0
ra
=P (ρX2 − X1 )2 + (ρX2 − X3 )2 + X22 (1 − ρ2 ) − c(1 − ρ2 ) ≤ 0
 
 !2  
2
ρX2 − X1 X3 − ρX2
⇒P p + + X22 ≤ c
1−ρ 2 1 − ρ2

We define the following,


X1 ρX2
Ta

Y1 = p −p
1 − ρ2 1 − ρ2

Y2 = X2
X3 ρX2
Y3 = p −p
1−ρ 2 1 − ρ2
 
 
Y1
√1
2
−√ ρ 2 0 
X1

 1−ρ 1−ρ 
⇒ Y = Y2  =  0 1 0  X2 
    
 ρ

−√ 2 √ 1
e
Y3 0 2
X3
1−ρ 1−ρ

⇒Y = P X ∼ N3 0, I3 [∵ P ΣP 0 = I3 (check)]

e e e iid
∴ P Y12 + Y22 + Y32 ≤ c = P [Z ≤ c] where Z ∼ χ23 [∵ Yi ∼ N (0, 1)]
 
Z c
1 y 3
= √ e− 2 y 2 −1 dy (Proved)
0 2π

40
Ex. X ∼ Np (µ, Σ)
e  
1 1 1 ... 1
e
 
µ  
1 2 2 . . . 2
µ
   
µ= Σ = 1 2 3 . . . 3
 
 .. 

e  . .
 .. .. 
.
µ
 
1 2 3 ... p
obtain the distribution, (X2 − X1 )2 + (X3 − X2 )2 + · · · + (Xp − Xp−1 )2
→ Define Y1 = (X1 − µ)
Y2 = (X2 − µ) − (X1 − µ)
Y3 = (X3 − µ) − (X2 − µ)

ga
..
.
Yp = (Xp − µ) − (Xp−1 − µ)
(X1 − µ) = Y1
(X2 − µ) = Y2 + Y1
(X3 − µ) = Y1 + Y2 + Y3
..
n .
(Xp − µ) = Y1 + Y2 + · · · + Yp


X1 − µ


1 0 0 ...

0  
 Y1
ra

  1 1 0 ... 0
 X2 − µ   Y2 
  

 ..  =  1 1 1 ... 0 

 .. 
  
 .    ... .. .. ..   . 
. . .
Xp − µ Yp
 
1 1 1 ... 1
(X − µ) = cY ⇒ Y = c−1 (X − µ)
Ta

e e e e e e
X
J e = |c| = 1 and (X − µ) ∼ Np (0, Σ)
Y e e e
e
    
1 0 ... 0 1 1 ... 1 1 1 ... 1
1 1 . . . 0 0 1 . . . 1 1 2 . . . 2
    
cc0 = 
 .. .. ..   .. ..

..  =  .. ..
 
..  = Σ

. . . . . . . . .
1 1 ... 1 0 0 ... 1 1 2 ... p
⇒|Σ| = |cc0 | = |c|2 = 1

The joint pdf of X1 , X2 , . . . , Xp is:-


1 − 12 (x−µ)0 Σ−1 (x−µ)
f (x) = e
(2π)p/2 |Σ|1/2
e e e e
e
1 − 1 (x−µ)0 (cc0 )−1 (x−µ) 1 − 1 (c−1 (x−µ))0 (c−1 (x−µ))
= p/2 1/2
e 2e e e e =
p/2 1/2
e 2
(2π) |Σ| (2π) |Σ|
e e e e

41
The joint pdf of Y1 , Y2 , . . . , Yp is given by

1 − 12 y 0 y
fY (y ) = e
(2π)p/2
ee
e e

iid
Therefore, Y1 , Y2 , . . . , Yp ∼ N (0, 1)
⇒ Y22 + Y32 + · · · + Yp2 ∼ χ2p−1

Ex. x ∼ N4 (µ, Σ)
e    
1 1 1 1 µ
e
   
1 2 2 2 µ
Σ=
  µ= 
1 2 3 3
µ

ga

e  
1 2 3 4 µ
obtain the distribution of (x1 − x2 )2 + (x3 − x4 )2

→ y1 = (x1 − µ) − (x2 − µ)
y2 = (x3 − µ) − (x4 − µ)
n y1
y2
!
=
1 −1 0
0 0
!

1 −1  x
 3 −

µ

 x2 − µ 
0 

x1 − µ

ra

x4 − µ
y = B(x − µ)
e e e

(x − µ) ∼ N4 (0, Σ)
e e
B(x − µ) ∼ N4 (0, BΣB 0 )
Ta

e e
  
1 1 1 1 1 0
!  
1 −1 0 0 1 2 2 2 −1 0 
BΣB 0 =  
0 0 1 −1 1 2 3 3

0 1 

1 2 3 4 0 −1
!
1 0
= =I
0 1
+
y1 ∼ N (0, 1)
ii∂
y2 ∼ N (0, 1)
i.e., y12 + y22 ∼ x22

42
Ex. Let X and Y are p-variate random vectors such that
e e ! !!
X A B
Z = e ∼ N2p 0,
e Y e B0 C
e
Show that (X + Y ) and (X − Y ) are independent if A = C, B = B 0
e e e e

Solution:

→ U = X + Y,  V = X − Y,

(U', V')' = [ I  I  ] (X', Y')' = DZ ∼ N_{2p}(0, DΣD')
            [ I −I  ]

DΣD' = [ I  I ] [ A  B ] [ I  I ]
       [ I −I ] [ B' C ] [ I −I ]
     = [ A + B'   B + C ] [ I  I ]
       [ A − B'   B − C ] [ I −I ]
     = [ A + B' + B + C    A + B' − B − C ]
       [ A − B' + B − C    A − B' − B + C ]

If A = C and B = B', this reduces to

Disp( (U', V')' ) = [ 2(A + B)      0      ]
                    [    0       2(A − B)  ]

Since (U', V')' is multivariate normal and the covariance block between U and V is zero, U and V are independent.
Ex. X = (X1, X2, X3)' ∼ N3( 0, [ 1    ρ12  ρ13 ] )
                               [ ρ12  1    ρ23 ]
                               [ ρ13  ρ23  1   ]

Show that

P[X1 > 0, X2 > 0, X3 > 0] = 1/8 + (sin^{-1}ρ12 + sin^{-1}ρ13 + sin^{-1}ρ23)/(4π)

→ Define 3 events, A1 = {X1 > 0}


A2 = {X2 > 0}
A3 = {X3 > 0}

P(A1 ∩ A2 ∩ A3) = P[(A1^c ∪ A2^c ∪ A3^c)^c]
= 1 − P(A1^c ∪ A2^c ∪ A3^c)
= 1 − [P(A1^c) + P(A2^c) + P(A3^c) − P(A1^c ∩ A2^c) − P(A2^c ∩ A3^c) − P(A1^c ∩ A3^c)
  + P(A1^c ∩ A2^c ∩ A3^c)]
= 1 − P (X1 < 0) − P (X2 < 0) − P (X3 < 0) + P (X1 < 0, X2 < 0)
+ P (X2 < 0, X3 < 0) + P (X1 < 0, X3 < 0) − P (X1 < 0, X2 < 0, X3 < 0)

Now P(Xi < 0) = 1/2, P(Xi < 0, Xj < 0) = 1/4 + (1/(2π)) sin^{-1}ρij (the bivariate normal orthant probability), and by the symmetry X → −X, P(X1 < 0, X2 < 0, X3 < 0) = P(X1 > 0, X2 > 0, X3 > 0). Hence

⇒ 2P(X1 > 0, X2 > 0, X3 > 0) = 1 − (1/2 + 1/2 + 1/2) + 3/4 + (1/(2π))(sin^{-1}ρ12 + sin^{-1}ρ13 + sin^{-1}ρ23)
⇒ P(X1 > 0, X2 > 0, X3 > 0) = 1/8 + (1/(4π))(sin^{-1}ρ12 + sin^{-1}ρ13 + sin^{-1}ρ23)    (proved)
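A Monte Carlo check of the orthant-probability formula (not part of the notes; assumes numpy and uses arbitrary correlations that keep the matrix positive definite):

import numpy as np

rng = np.random.default_rng(2)
r12, r13, r23 = 0.3, -0.2, 0.5
R = np.array([[1, r12, r13],
              [r12, 1, r23],
              [r13, r23, 1]])
X = rng.multivariate_normal(np.zeros(3), R, size=500_000)
print((X > 0).all(axis=1).mean())  # simulated P(X1 > 0, X2 > 0, X3 > 0)
print(1/8 + (np.arcsin(r12) + np.arcsin(r13) + np.arcsin(r23)) / (4 * np.pi))  # formula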
Ex. Let X ∼ Np(0, Σ) where

Σ = [ Σ1   Σ12 ],   X = (X1', X2')',   β = Σ12 Σ2^{-1},   Σ11.2 = Σ1 − Σ12 Σ2^{-1} Σ21.
    [ Σ21  Σ2  ]

Find the distribution of U = (X1 − βX2)' Σ11.2^{-1} (X1 − βX2).
e e e e
Solution:
→ Define Y1 = X1 + BX2 where B = −Σ12Σ2^{-1} = −β, and Y2 = X2.
Here Y1 ∼ Nm(0, Σ1 − Σ12Σ2^{-1}Σ21) ← (previous result), where m is the dimension of X1.
⇒ (X1 − βX2) ∼ Nm(0, Σ11.2)
∴ (X1 − βX2)' Σ11.2^{-1} (X1 − βX2) ∼ χ²_m
Ex. X ∼ Np(µ, Σ). Let l and λ be two non-random vectors such that l'µ = λ'µ = l'Σλ = 0.
If Y1 = l'X and Y2 = λ'X, find P(Y1 > 0, Y2 > 0).
Solution: → Let Y = (Y1, Y2)' = (l'X, λ'X)' = cX, where c is the 2×p matrix with rows l' and λ'.

cX ∼ N2(cµ, cΣc')

Now, cµ = (l'µ, λ'µ)' = 0.
Again, cov(Y1, Y2) = cov(l'X, λ'X) = E[(l'X − l'µ)(λ'X − λ'µ)] = l' E[(X − µ)(X − µ)'] λ = l'Σλ = 0.

∴ Y1 and Y2 are independent (jointly normal with zero covariance).

P[Y1 > 0, Y2 > 0] = P(Y1 > 0)P(Y2 > 0)
= (1/2) × (1/2)   [∵ Y1 and Y2 are normal with mean 0, hence symmetric about zero]
= 1/4
Ex. Suppose Z ∼ N(0, 1), Y|Z ∼ N(1 − Z, 1) and (X|Y, Z) ∼ N(1 − Y, 1). (a) Find the joint distribution of (X, Y, Z). (b) Find the joint distribution of (U, V)' = (1 + Z, 1 − Y). (c) Find E(Y|U = 1.7).
a) fZ(z) = (1/√(2π)) e^{−z²/2}

fY|Z(y|z) = (1/√(2π)) e^{−(1/2)[y−(1−z)]²}

fY,Z(y, z) = fY|Z(y|z) × fZ(z) = (1/(2π)) e^{−(1/2){z² + (y−1+z)²}}
           = (1/(2π)) e^{−(1/2){2z² + 2(y−1)z + (y−1)²}}

fX|Y,Z(x|y, z) = (1/√(2π)) e^{−(1/2)(x−1+y)²}

∴ f(x, y, z) = fX|Y,Z(x|y, z) × fY,Z(y, z)
             = (1/(2π)^{3/2}) e^{−(1/2)[(x−1+y)² + 2(y−1)z + (y−1)² + 2z²]},   (x, y, z) ∈ R³

b) Consider the transformation U = 1 + Z ⇒ Z = U − 1 and V = 1 − Y ⇒ Y = 1 − V.

J( (y, z) / (u, v) ) = | 0  −1 |,   so |J| = 1.
                       | 1   0 |

fU,V(u, v) = (1/(2π)) e^{−(1/2)[2(1−v−1)(u−1) + v² + 2(u−1)²]}
           = (1/(2π)) e^{−(1/2)[2u² + v² − 4u − 2uv + 2v + 2]},   u ∈ R, v ∈ R
c)

E[Y|U = 1.7] = E[Y|1 + Z = 1.7] = E[Y|Z = 0.7]
= ∫_{−∞}^{∞} y fY|Z(y|z = 0.7) dy = ∫_{−∞}^{∞} y (1/√(2π)) e^{−(1/2)(y−0.3)²} dy
= 0.3   [the integrand is the N(0.3, 1) density, so the integral is its mean]
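A simulation sketch for part (c) (not in the notes; assumes numpy): draw from the hierarchy Z ∼ N(0,1), Y|Z ∼ N(1−Z, 1) and average Y over draws with Z near 0.7, i.e. U near 1.7.

import numpy as np

rng = np.random.default_rng(3)
N = 2_000_000
Z = rng.normal(0.0, 1.0, N)
Y = rng.normal(1.0 - Z, 1.0)
near = np.abs(Z - 0.7) < 0.01   # crude conditioning on U = 1 + Z ≈ 1.7
print(Y[near].mean())           # close to 0.3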
 
Ex: Let X ∼ Nn(µ, Σ) where µ = (µ, µ, …, µ)' and

Σ = σ² [ 1 ρ … ρ ],   with −1/(n−1) < ρ < 1.
       [ ρ 1 … ρ ]
       [ ⋮       ]
       [ ρ ρ … 1 ]

Define
X̄ = (1/n) Σ_{i=1}^n Xi   and   S² = (1/(n−1)) Σ_{i=1}^n (Xi − X̄)².

Find the distribution of

T = (√n(X̄ − µ)/S) √[(1 − ρ)/(1 + (n−1)ρ)].

Consider the (Helmert) transformation (X1, X2, …, Xn) → (Y1, Y2, …, Yn) such that

(Y1, Y2, Y3, …, Yn)' = P (X1 − µ, X2 − µ, …, Xn − µ)',

where the rows of P are
(1/√n, 1/√n, 1/√n, …, 1/√n),
(1/√2, −1/√2, 0, …, 0),
(1/√6, 1/√6, −2/√6, …, 0),
⋮
(1/√(n(n−1)), 1/√(n(n−1)), …, −(n−1)/√(n(n−1))).

⇒ Y = P(X − µ) ⇒ Y = PZ,  writing Z = X − µ.
e e e e e
Now, Z ∼ Nn(0, Σ) ⇒ PZ ∼ Nn(0, PΣP').

Σ = σ² [ 1 ρ … ρ ] = σ² [ (1−ρ)In + ρ 1 1' ],   where 1 = (1, 1, …, 1)'.
       [ ρ 1 … ρ ]
       [ ⋮       ]
       [ ρ ρ … 1 ]

PΣP' = P σ²[(1−ρ)In + ρ11']P' = σ²[(1−ρ)PInP' + ρP11'P'] = σ²[(1−ρ)In + ρ(P1)(P1)'],

using PP' = In.
P1 = (√n, 0, 0, …, 0)',

since each of rows 2, …, n of P sums to zero. Hence (P1)(P1)' is the n×n matrix with n in the (1,1) position and zeros elsewhere, so

PΣP' = σ² [ diag(1−ρ, 1−ρ, …, 1−ρ) + diag(nρ, 0, …, 0) ]

⇒ PΣP' = σ² [ 1 + (n−1)ρ   0      …   0     ] = Σ* (say)
            [ 0            1−ρ    …   0     ]
            [ ⋮                              ]
            [ 0            0      …   1−ρ   ]
∴ Y ∼ Nn(0, Σ*),
i.e., Y1, Y2, …, Yn are independent as Σ* is a diagonal matrix.

∴ Y1 ∼ N(0, σ²(1 + (n−1)ρ))  and  Yi ∼ N(0, (1−ρ)σ²) for all i = 2, 3, …, n.

Now, Y1 = (1/√n) Σ_{i=1}^n (Xi − µ) = √n(X̄ − µ).
Note that (n−1)S² = Σ_{i=1}^n (Xi − X̄)² = Σ_{i=1}^n (Xi − µ)² − n(X̄ − µ)².

Again, Y'Y = (X − µ)'(P'P)(X − µ) = (X − µ)'(X − µ)   [∵ P'P = I],
so Σ_{i=1}^n Yi² = Σ_{i=1}^n (Xi − µ)².

∴ (n−1)S² = Σ_{i=1}^n Yi² − Y1² = Σ_{i=2}^n Yi²  ⇒  Σ_{i=2}^n Yi² / (σ²(1−ρ)) ∼ χ²_{n−1},

i.e., (n−1)S²/(σ²(1−ρ)) ∼ χ²_{n−1}, and this is independent of Y1 = √n(X̄ − µ).

And finally,

[ √n(X̄ − µ) / (σ√(1 + (n−1)ρ)) ] / √[ ((n−1)S²/(σ²(1−ρ))) · 1/(n−1) ]
= (√n(X̄ − µ)/S) √[(1−ρ)/(1 + (n−1)ρ)] = T ∼ t_{n−1}.

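A simulation sketch of this result (not in the notes; assumes numpy and scipy, with illustrative n, σ², ρ): the rescaled statistic T should follow the t distribution with n − 1 degrees of freedom.

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, mu, sigma2, rho, reps = 8, 5.0, 2.0, 0.3, 50_000
Sigma = sigma2 * ((1 - rho) * np.eye(n) + rho * np.ones((n, n)))
X = rng.multivariate_normal(np.full(n, mu), Sigma, size=reps)
xbar, S = X.mean(axis=1), X.std(axis=1, ddof=1)
T = np.sqrt(n) * (xbar - mu) / S * np.sqrt((1 - rho) / (1 + (n - 1) * rho))
for q in (0.1, 0.5, 0.95):
    print(np.quantile(T, q), stats.t.ppf(q, df=n - 1))  # empirical vs t_{n-1} quantiles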
Question: X ∼ Np(0, Σ). Show that for ai > 0, i = 1(1)p,

P[|X1| > a1, |X2| > a2, …, |Xp| > ap] ≤ √(2/π) ( Σ_{i=1}^p √σii ) / ( Σ_{i=1}^p ai ).

Solution:

P[|X1| > a1, |X2| > a2, …, |Xp| > ap] ≤ P[ Σ_{i=1}^p |Xi| > Σ_{i=1}^p ai ]
(since the event on the left implies Σ|Xi| > Σai)
≤ E[ Σ_{i=1}^p |Xi| ] / Σ_{i=1}^p ai = ( Σ_{i=1}^p E|Xi| ) / ( Σ_{i=1}^p ai )   [Markov's inequality]

Now Xi ∼ N(0, σii) ⇒ E|Xi − 0| = √(2/π) √σii   [mean deviation about the mean]
Therefore,

P[|X1| > a1, |X2| > a2, …, |Xp| > ap] ≤ √(2/π) ( Σ_{i=1}^p √σii ) / ( Σ_{i=1}^p ai )    (Proved)

Question:

(X1, X2, X3)' ∼ N3( 0, [ 1    ρ12  ρ13 ] )
                       [ ρ12  1    ρ23 ]
                       [ ρ13  ρ23  1   ]

Show that 1 + 2ρ12ρ13ρ23 ≥ ρ12² + ρ13² + ρ23².

Solution: As |Σ| ≥ 0,

| 1    ρ12  ρ13 |
| ρ12  1    ρ23 | ≥ 0
| ρ13  ρ23  1   |

⇒ 1 − ρ23² − ρ12(ρ12 − ρ13ρ23) + ρ13(ρ12ρ23 − ρ13) ≥ 0
⇒ 1 + 2ρ12ρ13ρ23 ≥ ρ12² + ρ13² + ρ23²    (Proved)
ra
9 Multinomial distribution
Important remark:

1. When the true regression of X1 on X2, X3, …, Xp is linear, i.e., E[X1|X(2)] = X1.23...p (the multiple linear regression equation of X1 on X2, X3, …, Xp), the multiple correlation coefficient of X1 on X2, X3, …, Xp is given by

ρ²_{1.23...p} = Var(E(X1|X(2))) / Var(X1).

2. When the regression of X1 on X2, X3, …, Xp is linear, the partial correlation coefficient between X1 and X2 eliminating the effect of X3, X4, …, Xp is given by

ρ_{12.34...p} = E[cov(X1, X2|X(3))] / { √(E[Var(X1|X(3))]) √(E[Var(X2|X(3))]) }.

9.1 Introduction
Let us consider that we have k categories 1,2,. . ., k. Probability of falling into the i-th category
is pi , ∀ i = 1, 2, . . . , k. Suppose we have n samples which can belong to any of these categories

(these categories are disjoint). We define the following random variables,
Xi : the number of samples falling into the i-th category, where Σ_{i=1}^k Xi = n and Σ_{i=1}^k pi = 1.
So the joint pmf of X1, X2, …, Xk is given by

f_{X1,X2,...,Xk}(x1, x2, …, xk) = (n!/(x1! x2! … xk!)) p1^{x1} p2^{x2} … pk^{xk},

where Σ_{i=1}^k pi = 1, Σ_{i=1}^k xi = n and 0 < pi < 1.
X = (X1, X2, …, Xk)' is said to follow the multinomial distribution with parameters n, p1, p2, …, pk.

9.2 Multinomial theorem


(a1 + a2 + · · · + ak)^n = Σ (n!/(i1! i2! … ik!)) a1^{i1} a2^{i2} … ak^{ik},

the sum being over all non-negative integers i1, i2, …, ik with i1 + i2 + · · · + ik = n.

9.3 Moment Generating Function


M_X(t) = E(e^{t'X}) = E(e^{t1X1 + t2X2 + ··· + tkXk})
= Σ e^{t1x1 + t2x2 + ··· + tkxk} (n!/(x1! x2! … xk!)) p1^{x1} p2^{x2} … pk^{xk}
= Σ (n!/(x1! … xk!)) (p1e^{t1})^{x1} (p2e^{t2})^{x2} … (pke^{tk})^{xk}
= (p1e^{t1} + p2e^{t2} + · · · + pke^{tk})^n,

the sums running over all x1, …, xk ≥ 0 with x1 + · · · + xk = n.

Mean, Variance and Covariance:

∂M/∂ti = n(p1e^{t1} + p2e^{t2} + · · · + pke^{tk})^{n−1} pi e^{ti},  so  E(Xi) = ∂M/∂ti |_{t=0} = npi.

∂²M/∂ti² = n(n−1)(p1e^{t1} + · · · + pke^{tk})^{n−2} pi² e^{2ti} + n(p1e^{t1} + · · · + pke^{tk})^{n−1} pi e^{ti},
so E(Xi²) = n(n−1)pi² + npi.

Var(Xi) = E(Xi²) − E²(Xi) = n(n−1)pi² + npi − n²pi² = npi(1 − pi).

∂²M/∂ti∂tj = n(n−1)(p1e^{t1} + · · · + pke^{tk})^{n−2} pi pj e^{ti} e^{tj},  so  E(XiXj) = n(n−1)pipj  (i ≠ j).

cov(Xi, Xj) = E(XiXj) − E(Xi)E(Xj) = n(n−1)pipj − n²pipj = −npipj.

Dispersion matrix:

Σ = [ np1(1−p1)   −np1p2      …   −np1pk     ]
    [ −np1p2      np2(1−p2)   …   −np2pk     ]
    [ ⋮                                      ]
    [ −npkp1      −npkp2      …   npk(1−pk)  ]

|Σ| = n^k (p1p2…pk) · det[ (1−p1)  −p2     …   −pk    ] = 0,
                          [ −p1     (1−p2)  …   −pk    ]
                          [ ⋮                          ]
                          [ −p1     −p2     …   (1−pk) ]

since adding all the other columns to the first column makes every entry of that column equal to 1 − Σ_{i=1}^k pi = 0.
Thus it is clear that X is a degenerate random vector.
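A numerical illustration (not part of the notes; assumes numpy, with arbitrary n and p): the empirical moments match npi, npi(1 − pi) and −npipj, and the full k × k dispersion matrix is singular (rank k − 1).

import numpy as np

rng = np.random.default_rng(5)
n, p = 20, np.array([0.2, 0.3, 0.5])
X = rng.multinomial(n, p, size=200_000)
print(X.mean(axis=0))                       # ≈ n p
print(np.cov(X, rowvar=False))              # ≈ n (diag(p) - p p')
Sigma = n * (np.diag(p) - np.outer(p, p))
print(np.linalg.matrix_rank(Sigma))         # k - 1 = 2, so Sigma is singular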

9.4 Modified multinomial distribution:


In order to circumstance the problem we consider the point distribution of x1 , x2 , . . . , xk is
given by
n n! xk−1
fx (x1 , x2 , . . . , xk ) = px1 1 px2 2 . . . pk−1
x1 ! . . . xt1 ! (u − x1 − · · · − xk−1 )!
Pk−1
(1 − p1 − · · · − px−1 )n− i=1 x1
ra
9.4.1 Modified multinomial theorem:
( a1 + a2 + · · · + a_{k−1} + (1 − Σ_{i=1}^{k−1} ai) )^n
= Σ ( n! / (i1! i2! … i_{k−1}! (n − Σ_{j=1}^{k−1} ij)!) ) a1^{i1} a2^{i2} … a_{k−1}^{i_{k−1}} (1 − Σ_{i=1}^{k−1} ai)^{n − Σ_{j=1}^{k−1} ij},

the sum being over all non-negative integers i1, …, i_{k−1} with Σ_{j=1}^{k−1} ij ≤ n.

9.4.2 Moment Generating Function

E[ e^{Σ_{i=1}^{k−1} tiXi} ]
= Σ e^{t1x1 + ··· + t_{k−1}x_{k−1}} ( n!/(x1! … x_{k−1}! (n − Σ_{i=1}^{k−1} xi)!) ) p1^{x1} … p_{k−1}^{x_{k−1}} (1 − Σ_{i=1}^{k−1} pi)^{n − Σ_{i=1}^{k−1} xi}
= Σ ( n!/(x1! … x_{k−1}! (n − Σ xi)!) ) (p1e^{t1})^{x1} … (p_{k−1}e^{t_{k−1}})^{x_{k−1}} (1 − Σ_{i=1}^{k−1} pi)^{n − Σ xi}
= ( p1e^{t1} + p2e^{t2} + · · · + p_{k−1}e^{t_{k−1}} + (1 − Σ_{i=1}^{k−1} pi) )^n

∂M/∂ti |_{t=0} = n( p1e^{t1} + · · · + p_{k−1}e^{t_{k−1}} + (1 − Σ_{i=1}^{k−1} pi) )^{n−1} pi e^{ti} |_{t=0} = npi

Similarly one can check that Var(Xi) = npi(1 − pi) and cov(Xi, Xj) = −npipj.
Also note that now |Σ| ≠ 0, so the modified multinomial distribution is not degenerate.

9.4.3 Distribution of subset:


 
X = (X1, X2, …, Xk)' ∼ MN(n, p1, p2, …, pk)

M_X(t) = (p1e^{t1} + · · · + pke^{tk})^n = ( Σ_{i=1}^k pi e^{ti} )^n

With t = (t1, t2, …, tk)', take t_{r+1} = t_{r+2} = · · · = tk = 0:

M_X(t1, t2, …, tr, 0, 0, …, 0) = ( Σ_{i=1}^r pi e^{ti} + Σ_{i=r+1}^k pi )^n

Note that Σ_{i=1}^k pi = 1 ⇒ Σ_{i=r+1}^k pi = 1 − Σ_{i=1}^r pi.

Thus

M_X(t1, t2, …, tr, 0, …, 0) = ( Σ_{i=1}^r pi e^{ti} + 1 − Σ_{i=1}^r pi )^n.

So a subvector of a multinomial vector is again multinomial, of lower dimension.

Ex. Let (X1, X2, …, Xk) ∼ MN(n, p1, p2, …, pk) with Σ_{i=1}^k pi = 1.
Obtain the distribution of Σ_{i=1}^r Xi, r < k.

Solution: Let Z = Σ_{i=1}^r Xi.

M_Z(t) = E(e^{tZ}) = E(e^{tX1 + tX2 + ··· + tXr}) = ( e^t Σ_{i=1}^r pi + Σ_{i=r+1}^k pi )^n = (q + pe^t)^n,

where q = Σ_{i=r+1}^k pi and p = 1 − q = Σ_{i=1}^r pi.

So Z = Σ_{i=1}^r Xi ∼ Bin( n, Σ_{i=1}^r pi ).
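A quick simulation of this fact (not in the notes; assumes numpy and scipy): the sum of the first r multinomial counts should behave like Bin(n, p1 + · · · + pr).

import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, p, r = 15, np.array([0.1, 0.2, 0.3, 0.4]), 2
Z = rng.multinomial(n, p, size=100_000)[:, :r].sum(axis=1)
psum = p[:r].sum()
print(Z.mean(), n * psum)                            # binomial mean
print(Z.var(), n * psum * (1 - psum))                # binomial variance
print((Z == 5).mean(), stats.binom.pmf(5, n, psum))  # one pmf value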

9.5 Conditional distribution:


(X1, X2, …, X_{k−1}) ∼ MN(n, p1, p2, …, p_{k−1}), where Σ_{i=1}^{k−1} pi < 1.

The conditional distribution of X1, X2, …, Xr given X_{r+1}, …, X_{k−1} is given by

P[X1 = x1, X2 = x2, …, Xr = xr | X_{r+1} = x_{r+1}, …, X_{k−1} = x_{k−1}]
= P[X1 = x1, …, Xr = xr, X_{r+1} = x_{r+1}, …, X_{k−1} = x_{k−1}] / P[X_{r+1} = x_{r+1}, …, X_{k−1} = x_{k−1}]

The numerator is
( n!/(x1! x2! … x_{k−1}! (n − x1 − · · · − x_{k−1})!) ) p1^{x1} … p_{k−1}^{x_{k−1}} (1 − p1 − · · · − p_{k−1})^{n − Σ_{i=1}^{k−1} xi},
and, since a subvector is again multinomial, the denominator is
( n!/(x_{r+1}! … x_{k−1}! (n − Σ_{i=r+1}^{k−1} xi)!) ) p_{r+1}^{x_{r+1}} … p_{k−1}^{x_{k−1}} (1 − Σ_{i=r+1}^{k−1} pi)^{n − Σ_{i=r+1}^{k−1} xi}.

Let n₀ = n − Σ_{i=r+1}^{k−1} xi and p₀ = 1 − Σ_{i=r+1}^{k−1} pi. Then the ratio equals

( n₀!/(x1! x2! … xr! (n₀ − Σ_{i=1}^r xi)!) ) p1^{x1} … pr^{xr} (1 − Σ_{i=1}^{k−1} pi)^{n₀ − Σ_{i=1}^r xi} / p₀^{n₀}
= ( n₀!/(x1! x2! … xr! (n₀ − Σ_{i=1}^r xi)!) ) (p1/p₀)^{x1} … (pr/p₀)^{xr} ( 1 − Σ_{i=1}^r pi/p₀ )^{n₀ − Σ_{i=1}^r xi},

since (1 − Σ_{i=1}^{k−1} pi)/p₀ = 1 − Σ_{i=1}^r (pi/p₀).

∴ (X1, X2, …, Xr | X_{r+1}, …, X_{k−1}) ∼ MN( n₀, p1/p₀, p2/p₀, …, pr/p₀ ).
Ex. (X1, X2, …, X_{k−1}) ∼ MN(n, p1, p2, …, p_{k−1}).

Show that  ρ²_{1.23...k−1} = p1(p2 + p3 + · · · + p_{k−1}) / [ (1 − p1)(1 − p2 − p3 − · · · − p_{k−1}) ].

Solution: X1 | X2, X3, …, X_{k−1} ∼ Bin( n − Σ_{i=2}^{k−1} xi, p1/(1 − Σ_{i=2}^{k−1} pi) )   (previous result)

E(X1 | X2, …, X_{k−1}) = ( n − Σ_{i=2}^{k−1} xi ) p1/(1 − p2 − p3 − · · · − p_{k−1})

∴ The regression of X1 on X2, …, X_{k−1} is linear.


ρ²_{1.23...k−1} = Var(E(X1|X(2))) / Var(X1)

⇒ 1 − ρ²_{1.23...k−1} = [Var(X1) − Var(E(X1|X(2)))] / Var(X1) = E[Var(X1|X(2))] / Var(X1),

using Var(X1) = E[Var(X1|X(2))] + Var(E(X1|X(2))).

Now, Var(X1|X(2)) = ( n − Σ_{i=2}^{k−1} xi ) [ p1/(1 − p2 − · · · − p_{k−1}) ] [ 1 − p1/(1 − p2 − · · · − p_{k−1}) ]

∴ E[Var(X1|X(2))] = ( n − Σ_{i=2}^{k−1} E(Xi) ) [ p1/(1 − p2 − · · · − p_{k−1}) ] [ 1 − p1/(1 − p2 − · · · − p_{k−1}) ]
= n(1 − Σ_{i=2}^{k−1} pi) [ p1/(1 − Σ_{i=2}^{k−1} pi) ] [ 1 − p1/(1 − p2 − · · · − p_{k−1}) ]
= np1 [ 1 − p1/(1 − p2 − · · · − p_{k−1}) ]

∴ 1 − ρ²_{1.23...k−1} = np1[ 1 − p1/(1 − Σ_{i=2}^{k−1} pi) ] / [ np1(1 − p1) ]
= (1 − p1 − p2 − · · · − p_{k−1}) / [ (1 − p1)(1 − p2 − · · · − p_{k−1}) ]

⇒ ρ²_{1.23...k−1} = [ (1 − p1)(1 − Σ_{i=2}^{k−1} pi) − (1 − Σ_{i=1}^{k−1} pi) ] / [ (1 − p1)(1 − p2 − · · · − p_{k−1}) ]
= p1(p2 + p3 + · · · + p_{k−1}) / [ (1 − p1)(1 − p2 − · · · − p_{k−1}) ]    (Proved)
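A cross-check of this formula (not part of the notes; assumes numpy): since the regression is linear, ρ²_{1.23...k−1} also equals the linear-projection quantity σ₁₂'Σ₂₂⁻¹σ₁₂/σ₁₁ computed from the nonsingular dispersion matrix of (X1, …, X_{k−1}); the two expressions should agree.

import numpy as np

n = 10
p = np.array([0.15, 0.25, 0.2, 0.1])        # p1, ..., p_{k-1}; the remaining mass is p_k
Sigma = n * (np.diag(p) - np.outer(p, p))   # dispersion matrix of (X1, ..., X_{k-1})
s11, s12, S22 = Sigma[0, 0], Sigma[0, 1:], Sigma[1:, 1:]
rho2_matrix = s12 @ np.linalg.solve(S22, s12) / s11
rho2_formula = p[0] * p[1:].sum() / ((1 - p[0]) * (1 - p[1:].sum()))
print(rho2_matrix, rho2_formula)            # the two values coincide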

Problem: Xi ∼ Poi(λi) independently, i = 1(1)k.
Find the conditional distribution of (X1, X2, …, Xk | Σ_{i=1}^k Xi = n) and of
(X1, X2, …, Xr | X_{r+1}, …, X_{k−1}, Σ_{i=1}^k Xi = n).

Solution: Xi ∼ Poi(λi) independently.

(i) P[X1 = x1, X2 = x2, …, Xk = xk | Σ_{i=1}^k Xi = n]

We know that Σ_{i=1}^k Xi ∼ Poi(Σ_{i=1}^k λi). For Σ_{i=1}^k xi = n,

= P[X1 = x1, X2 = x2, …, Xk = xk, Σ_{i=1}^k Xi = n] / P[Σ_{i=1}^k Xi = n]
= P[X1 = x1] P[X2 = x2] … P[X_{k−1} = x_{k−1}] P[Xk = n − Σ_{i=1}^{k−1} xi] / P[Σ_{i=1}^k Xi = n]
= { Π_{i=1}^{k−1} e^{−λi} λi^{xi}/xi! } · e^{−λk} λk^{n − Σ_{i=1}^{k−1} xi} / (n − Σ_{i=1}^{k−1} xi)!
  ÷ [ e^{−Σ_{i=1}^k λi} (Σ_{i=1}^k λi)^n / n! ]
= ( n! / (x1! x2! … x_{k−1}! (n − Σ_{i=1}^{k−1} xi)!) ) (λ1/Σλi)^{x1} … (λ_{k−1}/Σλi)^{x_{k−1}} (λk/Σλi)^{n − Σ_{i=1}^{k−1} xi},

where Σλi = Σ_{i=1}^k λi. That is, (X1, …, Xk | Σ Xi = n) is multinomial with cell probabilities λi / Σ_{j=1}^k λj.
(ii) P[X1 = x1, …, Xr = xr | X_{r+1} = x_{r+1}, …, X_{k−1} = x_{k−1}, Σ_{i=1}^k Xi = n]

= P[X1 = x1, …, X_{k−1} = x_{k−1}, Xk = n − Σ_{i=1}^{k−1} xi] / P[X_{r+1} = x_{r+1}, …, X_{k−1} = x_{k−1}, Σ_{i=1}^k Xi = n]
= { Π_{i=1}^{k−1} P[Xi = xi] } P[Xk = n − Σ_{i=1}^{k−1} xi] / ( { Π_{i=r+1}^{k−1} P[Xi = xi] } P[X1 + X2 + · · · + Xr + Xk = n − Σ_{i=r+1}^{k−1} xi] ),

where X1 + · · · + Xr + Xk ∼ Poi(λ1 + · · · + λr + λk).

Writing xk = n − Σ_{i=1}^{k−1} xi, so that x1 + x2 + · · · + xr + xk = n − Σ_{i=r+1}^{k−1} xi, this equals

[ Π_{i=1}^r e^{−λi} λi^{xi}/xi! ] · e^{−λk} λk^{xk}/xk!
÷ [ e^{−(λ1+···+λr+λk)} (λ1 + · · · + λr + λk)^{x1+···+xr+xk} / (x1 + · · · + xr + xk)! ]

= ( (x1 + x2 + · · · + xr + xk)! / (x1! x2! … xr! xk!) ) ( λ1/(λk + Σ_{i=1}^r λi) )^{x1} … ( λr/(λk + Σ_{i=1}^r λi) )^{xr} ( λk/(λk + Σ_{i=1}^r λi) )^{xk},

again a multinomial distribution.
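A simulation sketch of part (i) (not in the notes; assumes numpy): conditioning independent Poisson counts on their total reproduces multinomial behaviour with cell probabilities λi/Σλj.

import numpy as np

rng = np.random.default_rng(7)
lam = np.array([1.0, 2.0, 3.0])
n_target, draws = 6, 1_000_000
X = rng.poisson(lam, size=(draws, lam.size))
cond = X[X.sum(axis=1) == n_target]          # keep only samples with total equal to 6
print(cond.mean(axis=0))                     # ≈ n_target * lam / lam.sum()
print(n_target * lam / lam.sum())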

Question: Find the partial correlation coefficient between X1 and X2 neglecting the effects of
X3 , . . . , Xk−1 .

Solution: Note that,


(X1, X2 | X3, X4, …, X_{k−1}) ∼ MN( n − Σ_{i=3}^{k−1} xi, p1/(1 − Σ_{i=3}^{k−1} pi), p2/(1 − Σ_{i=3}^{k−1} pi) ),

so that

X1 | X(3) ∼ Bin( n − Σ_{i=3}^{k−1} xi, p1/(1 − Σ_{i=3}^{k−1} pi) )   and   X2 | X(3) ∼ Bin( n − Σ_{i=3}^{k−1} xi, p2/(1 − Σ_{i=3}^{k−1} pi) ).

So, cov(X1, X2 | X(3)) = −( n − Σ_{i=3}^{k−1} xi ) p1p2 / (1 − Σ_{i=3}^{k−1} pi)².
  
Var(Xj | X(3)) = ( n − Σ_{i=3}^{k−1} xi ) [ pj/(1 − Σ_{i=3}^{k−1} pi) ] [ 1 − pj/(1 − Σ_{i=3}^{k−1} pi) ],   j = 1, 2

∴ E[Var(Xj | X(3))] = [ pj/(1 − Σ_{i=3}^{k−1} pi) ] [ 1 − pj/(1 − Σ_{i=3}^{k−1} pi) ] ( n − Σ_{i=3}^{k−1} E(Xi) )
= [ pj/(1 − Σ_{i=3}^{k−1} pi) ] [ 1 − pj/(1 − Σ_{i=3}^{k−1} pi) ] n (1 − Σ_{i=3}^{k−1} pi)

⇒ E[Var(Xj | X(3))] = npj [ 1 − pj/(1 − Σ_{i=3}^{k−1} pi) ],   j = 1, 2
 
k−1
! k−1
!



n e



⇒ E cov X1 , X2 X 3 = − 

∴ E cov X1 , X2 X 3 = −  


1−
p1 p2
k−1
P

np1 p2
pi

2  n −


X

i=3
E(Xi ) = − 

i=3
1−
p1 p 2
k−1
P
pi
2 n 1 −
X

i=3
pi

i=3
ra
k−1

P
1− pi
e
i=3
 2
 
 np1 p2 

 − k−1
 

P
1−
 
 pi 
Ta

 i=3
 p 1 p2
∴ ρ212.34...p =
 uv    = 

k−1
 k−1

P P
 u
 u

 1 − p1 − pi 1− pi
 u  p  p  i=3 i=2
 nup p 1 − 1 2
 1 −
 
 u 1 2 k−1 k−1

 P   P 
1− pi 1− pi
 t
i=3 i=3

10 Ellipsoid Of Concentration
Let X be a random vector with mean vector µ and dispersion matrix Σ. We assume that Σ is positive definite.
By the ellipsoid of concentration of X we mean the ellipsoid

(y − a)'B(y − a) ≤ 1 . . . (i)

where a ∈ R^p and B is PD.
Here a and B are so chosen that a random vector Y distributed uniformly over the ellipsoidal region (i) has the same mean vector µ and dispersion matrix Σ as X.
The pdf of Y (a multivariate uniform over the ellipsoid) is given by

fY(y) = k  if (y − a)'B(y − a) ≤ 1,   and 0 otherwise.

We first obtain k from ∫ fY(y) dy = 1, i.e. k ∫_{(y−a)'B(y−a) ≤ 1} dy = 1.

Now, B is PD, so there exists a non-singular matrix C such that B = C'C.
Consider the transformation y → z with z = C(y − a), i.e. y = C^{-1}z + a.
The Jacobian is |∂y/∂z| = 1/|C|, and |B| = |C||C'| ⇒ |C| = √|B|.

Then k ∫_{(y−a)'B(y−a) ≤ 1} dy = (k/|C|) ∫_{z'z ≤ 1} dz = 1
⇒ k π^{p/2} / ( √|B| Γ(p/2 + 1) ) = 1
⇒ k = √|B| Γ(p/2 + 1) / π^{p/2},

using the volume π^{p/2}/Γ(p/2 + 1) of the unit ball in R^p.

∴ fY(y) = √|B| Γ(p/2 + 1) / π^{p/2}  if (y − a)'B(y − a) ≤ 1,   and 0 otherwise.

Now consider the transformation Y → Z with Z = C(Y − a), where C'C = B.
The Jacobian of the inverse map is 1/|C| = 1/√|B|, so

fZ(z) = Γ(p/2 + 1) / π^{p/2}  if z'z ≤ 1,   and 0 otherwise,

i.e. Z is uniform over the unit ball in R^p.

By symmetry (the integrand is odd over a symmetric region), E(Zi) = 0 and cov(Zi, Zj) = 0 for i ≠ j. For the variance, the radius R = ‖Z‖ has density proportional to r^{p−1} on [0, 1], so E(‖Z‖²) = E(R²) = p/(p + 2), and by symmetry Var(Zi) = E(Zi²) = 1/(p + 2) for all i = 1, 2, …, p.

∴ E(Z) = 0 ⇒ E(C(Y − a)) = 0 ⇒ E(Y) = a.
Also, Disp(Z) = Ip/(p + 2) ⇒ C Disp(Y) C' = Ip/(p + 2)
⇒ Disp(Y) = C^{-1}(C')^{-1}/(p + 2) = (C'C)^{-1}/(p + 2) = B^{-1}/(p + 2).

Now, a and B are such that

E(X) = E(Y) ⇒ µ = a,
Disp(X) = Disp(Y) ⇒ Σ = B^{-1}/(p + 2) ⇒ B = Σ^{-1}/(p + 2).

So (y − µ)' [Σ^{-1}/(p + 2)] (y − µ) ≤ 1 ⇒ (y − µ)'Σ^{-1}(y − µ) ≤ (p + 2).

Thus the ellipsoid of concentration of X is given by

{ y : (y − µ)'Σ^{-1}(y − µ) ≤ (p + 2) }.

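A simulation sketch of the final statement (not part of the notes; assumes numpy): sampling uniformly from {y : (y − µ)'Σ^{-1}(y − µ) ≤ p + 2}, via a Cholesky factor of Σ and the usual uniform-in-a-ball construction, should reproduce mean µ and dispersion Σ.

import numpy as np

rng = np.random.default_rng(8)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
p, N = len(mu), 500_000
L = np.linalg.cholesky(Sigma)                          # L L' = Sigma
W = rng.normal(size=(N, p))
U = rng.uniform(size=(N, 1)) ** (1 / p)
Z = W / np.linalg.norm(W, axis=1, keepdims=True) * U   # uniform points in the unit ball
Y = mu + np.sqrt(p + 2) * Z @ L.T                      # uniform points on the ellipsoid of concentration
print(Y.mean(axis=0))                                  # ≈ mu
print(np.cov(Y, rowvar=False))                         # ≈ Sigma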