Multiple Random Variables and Random Vectors

Table of Contents

Multivariate Joint Probability Mass Function
Marginal PMF
Degree of Freedom
Multivariate Joint Cumulative Distribution Functions
Marginal CDF
Well-Known Multivariate Discrete Random Variables
Multivariate Hypergeometric Distributions
Multinomial Distributions
Multivariate Continuous Random Variables
Multivariate Joint Cumulative Distribution Functions
Multivariate Joint Probability Density Functions
Random Vectors
Distribution Functions
Joint Distribution Functions
Expected Value Vectors
Expected Value Matrices
Covariance
Correlation
Covariance and Correlation of Two Random Vectors


Consider a scenario where a single random experiment has multiple random variables associated with it. We will only be considering the cases where all the random variables are from discrete sets or all of them are from continuous sets. All of the random variables must be from the same random experiment.

Say we have a set of discrete random variables, $X_1, X_2, \ldots, X_n$, each of which is associated with a set of values, $S_{X_1}, S_{X_2}, \ldots, S_{X_n}$ respectively. For a single instance of the experiment, the random variables take the values $x_1, x_2, \ldots, x_n$ respectively.

Multivariate Joint Probability Mass Function

In this scenario, the probability model we will be using is the multivariate joint PMF, or MJPMF.

$$P_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = P[X_1 = x_1, \ldots, X_n = x_n]$$

Essentially, this gives us the probability of getting a single point in an $n$-dimensional space.

Of course, to be a PMF at all, a few conditions have to be met:

 $P_{X_1,\ldots,X_n}(x_1,\ldots,x_n) \ge 0$
 $\sum_{x_1 \in S_{X_1}} \cdots \sum_{x_n \in S_{X_n}} P_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = 1$
 $P_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = 0$ whenever $x_i \notin S_{X_i}$ for some $1 \le i \le n$

Marginal PMF

For MJPMFs of $n$ random variables, we can define marginal PMFs for $k$ random variables, where $1 \le k \le n-1$. If $k > 1$, the marginal PMF will be a marginal joint PMF.

Say $n = 4$. Thus, the MJPMF is $P_{X_1,\ldots,X_4}(x_1,\ldots,x_4)$.

$$P_{X_1}(x_1) = P[X_1 = x_1] = \sum_{x_2 \in S_{X_2}} \sum_{x_3 \in S_{X_3}} \sum_{x_4 \in S_{X_4}} P_{X_1,\ldots,X_4}(x_1,\ldots,x_4)$$

$$P_{X_1 X_2}(x_1,x_2) = P[X_1 = x_1, X_2 = x_2] = \sum_{x_3 \in S_{X_3}} \sum_{x_4 \in S_{X_4}} P_{X_1,\ldots,X_4}(x_1,\ldots,x_4)$$

Of course, all other combinations are possible, such as $P_{X_1 X_3}(x_1,x_3)$. We just need to take the sum of the MJPMF over the remaining random variables.
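
As a quick illustration of marginalization, here is a minimal NumPy sketch (not from the original notes; the joint PMF values below are made up purely for demonstration). The joint PMF is stored as an array, and marginal PMFs are obtained by summing over the axes of the remaining variables.

```python
import numpy as np

# Hypothetical joint PMF of (X1, X2, X3), each taking the values 0 or 1.
# Entry P[i, j, k] = P[X1 = i, X2 = j, X3 = k]; the numbers are made up.
P = np.array([[[0.10, 0.05],
               [0.15, 0.10]],
              [[0.20, 0.10],
               [0.05, 0.25]]])

assert np.all(P >= 0)            # PMF condition: non-negativity
assert np.isclose(P.sum(), 1.0)  # PMF condition: sums to 1 over all points

# Marginal PMF of X1: sum out X2 and X3 (axes 1 and 2).
P_X1 = P.sum(axis=(1, 2))
# Marginal joint PMF of (X1, X2): sum out X3 only (axis 2).
P_X1X2 = P.sum(axis=2)

print(P_X1)     # [0.4 0.6]
print(P_X1X2)   # 2x2 table of P[X1 = i, X2 = j]
```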

Example

Say an access point (AP) forwards data from many different devices to a particular

router. Say the AP forwards 23 data packets, of which

 5 are lost due to channel errors

 8 are dropped

 10 are delivered successfully

From the 23 packets, we observe 7 random packets.

We can have three random variables here,

X = number of packets lost

Y = number of packets dropped

Z = number of packets delivered

From here, we want to find the MJPMF and the marginal PMF or marginal joint PMF.

There are two implied conditions here that we must abide by:

 $x + y + z = 7$
 $x \ge 0,\; y \ge 0,\; z \ge 0$

$$P_{XYZ}(x,y,z) = P[X = x, Y = y, Z = z]$$

This is just going to be the number of ways we can pick x lost packets, y dropped

packets and z delivered packets divided by the number of ways 7 packets can be

randomly observed from 23 packets.

$$P_{XYZ}(x,y,z) = \begin{cases} \dfrac{^{5}C_x \times {}^{8}C_y \times {}^{10}C_z}{^{23}C_7} & x+y+z = 7,\; x,y,z \ge 0,\; x \le 5,\; y \le 7,\; z \le 7 \\ 0 & \text{otherwise} \end{cases}$$

Notice the last condition. Essentially, since we are only picking 7 packets, the values

of y and z are limited to 7. Additionally, since the maximum value of x is 5, x is limited

to 5.

Degree of Freedom

Say we want to find the marginal PMF for $X$.

$$P_X(x) = P[X = x]$$

Consider that x=5 . Then, y cannot be more than 2, since the total number of packets

is limited to 7 . If y=2, then z has to be 0. It is said that the degree of freedom is


n−1=2 in this case. This means that, even though we have three random variables,

we cannot independently choose values for all three.

Since we have defined a particular value for $X$, the next question is to ask for the value of $Y$. Once we have a value for $Y$ as well, we can no longer choose the value of $Z$. Even if we had chosen $X = 5$ and $Y = 1$, the value of $Z$ would have to be 1. It is fixed.

Thus, we could only independently choose values for two of the random variables.

Because the value of Z is fixed once we have picked a value for X and one for Y , we

just need to loop over all the values of Y .

$$\therefore P_X(x) = \sum_{y=0}^{7-x} \frac{^{5}C_x \times {}^{8}C_y \times {}^{10}C_{(7-x-y)}}{^{23}C_7} = \frac{^{5}C_x}{^{23}C_7} \times \sum_{y=0}^{7-x} {}^{8}C_y \times {}^{10}C_{(7-x-y)}$$

$\dfrac{^{5}C_x}{^{23}C_7}$ has been brought outside since it is not related to $y$.

Thus, the second part of the equation, $\sum_{y=0}^{7-x} {}^{8}C_y \times {}^{10}C_{(7-x-y)}$, essentially tells us the number of ways to pick $7-x$ dropped and/or delivered packets; we do not care which. However, that is just $^{18}C_{(7-x)}$. Thus,

$$P_X(x) = \frac{^{5}C_x \times {}^{18}C_{(7-x)}}{^{23}C_7} \qquad 0 \le x \le 5,\; y,z \ge 0,\; x+y+z = 7$$

Similarly,

$$P_Y(y) = \frac{^{8}C_y \times {}^{15}C_{(7-y)}}{^{23}C_7} \qquad 0 \le y \le 7,\; x,z \ge 0,\; x+y+z = 7$$

$$P_Z(z) = \frac{^{10}C_z \times {}^{13}C_{(7-z)}}{^{23}C_7} \qquad 0 \le z \le 7,\; x,y \ge 0,\; x+y+z = 7$$
For a marginal joint PMF of two of the variables, we again cannot choose any value for the third random variable. Thus, we do not even need to sum anything.

$$P_{XY}(x,y) = \frac{^{5}C_x \times {}^{8}C_y \times {}^{10}C_{(7-x-y)}}{^{23}C_7}$$
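
The packet example can be checked numerically. The sketch below is an illustration added for this write-up; it uses Python's math.comb to evaluate the MJPMF and confirms that summing over $y$ reproduces the simplified marginal derived above.

```python
from math import comb

def p_xyz(x, y, z):
    """MJPMF of the packet example: 5 lost, 8 dropped, 10 delivered, 7 observed."""
    if x + y + z != 7 or min(x, y, z) < 0 or x > 5:
        return 0.0
    return comb(5, x) * comb(8, y) * comb(10, z) / comb(23, 7)

def p_x_by_sum(x):
    # Marginal PMF of X: sum over y, with z fixed at 7 - x - y.
    return sum(p_xyz(x, y, 7 - x - y) for y in range(0, 7 - x + 1))

def p_x_closed(x):
    # Closed form derived in the text: 5Cx * 18C(7-x) / 23C7.
    return comb(5, x) * comb(18, 7 - x) / comb(23, 7)

for x in range(0, 6):
    assert abs(p_x_by_sum(x) - p_x_closed(x)) < 1e-12

# The full MJPMF sums to 1 over all feasible (x, y, z).
total = sum(p_xyz(x, y, 7 - x - y) for x in range(8) for y in range(8 - x))
print(total)   # 1.0 (up to floating-point error)
```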

Multivariate Joint Cumulative Distribution Functions

The MJCDF of n random variables is given by

$$F_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = P[X_1 \le x_1, \ldots, X_n \le x_n]$$

Marginal CDF

$$F_{X_1}(x_1) = P[X_1 \le x_1] = F_{X_1,\ldots,X_n}(x_1, \infty, \ldots, \infty) = P[X_1 \le x_1, X_2 \le \infty, \ldots, X_n \le \infty]$$
Well-Known Multivariate Discrete Random Variables

Multivariate Hypergeometric Distributions

With multivariate hypergeometric distributions, we deal with $n$ items of $k$ types mixed together. $n_i$ is the number of items of the $i$-th type, where $i$ is between 1 and $k$. Thus,

$$n_1 + \cdots + n_k = n$$

From here, we will pick r items randomly and without replacement.

$X_i$ = number of items picked of the $i$-th type

Finally, we need to find $P_{X_1,\ldots,X_k}(x_1,\ldots,x_k)$. This is given by

$$P_{X_1,\ldots,X_k}(x_1,\ldots,x_k) = \begin{cases} \dfrac{^{n_1}C_{x_1} \times \cdots \times {}^{n_k}C_{x_k}}{^{n}C_r} & 0 \le x_i \le n_i,\; \sum_i x_i = r \\ 0 & \text{otherwise} \end{cases}$$

Looking at this, it should be obvious that this is what we were working with in the

previous lecture.
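
As an aside (not part of the original notes), recent SciPy releases ship this distribution as scipy.stats.multivariate_hypergeom, so the formula above can be cross-checked directly; the packet numbers are reused for the comparison.

```python
from math import comb
from scipy.stats import multivariate_hypergeom  # available in recent SciPy releases

# Packet example: n = 23 items of k = 3 types (5 lost, 8 dropped, 10 delivered),
# and r = 7 items picked without replacement.
m, r = [5, 8, 10], 7

def pmf(xs):
    # Direct evaluation of the general multivariate hypergeometric formula.
    if sum(xs) != r or any(x < 0 or x > ni for x, ni in zip(xs, m)):
        return 0.0
    num = 1
    for ni, xi in zip(m, xs):
        num *= comb(ni, xi)
    return num / comb(sum(m), r)

print(pmf([2, 2, 3]))
print(multivariate_hypergeom(m=m, n=r).pmf([2, 2, 3]))   # should match
```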
Multinomial Distributions

Multinomial distributions are the same as multivariate hypergeometric distributions, except that we replace the items picked. Thus, the probability of picking an item of the $i$-th type, $p_i$, remains constant at $\dfrac{n_i}{n}$, and $\sum_{i=1}^{k} p_i = 1$.

$$P_{X_1,\ldots,X_k}(x_1,\ldots,x_k) = \begin{cases} ^{r}C_{(x_1,\ldots,x_k)}\, p_1^{x_1} \cdots p_k^{x_k} & 0 \le x_i \le r,\; \sum_i x_i = r \\ 0 & \text{otherwise} \end{cases}$$

This can directly be related to binomial random variables.

Here, the term $^{r}C_{(x_1,\ldots,x_k)}$ is called the multinomial coefficient, similar to the binomial coefficient.

$$^{r}C_{(x_1,\ldots,x_k)} = \frac{r!}{x_1! \cdots x_k!}$$

Say $k = 2$ and we are picking $r$ items. Thus,

$$P_{X_1 X_2}(x_1,x_2) = \frac{r!}{x_1!\, x_2!} p_1^{x_1} p_2^{x_2} = \frac{r!}{x_1!\,(r-x_1)!} p_1^{x_1} (1-p_1)^{r-x_1}$$

Of course, this is exactly the binomial distribution.

Remember that a Bernoulli experiment was one in which there were only two possible outcomes, and that a binomial random variable counts the outcomes of repeated Bernoulli trials. If instead each trial has multiple possible outcomes, one of the possible distributions is the multinomial distribution. Such experiments are sometimes called multinoulli or categorical experiments, since we have multiple categories.

We can compare Bernoulli experiments to tossing a coin and counting how many heads and how many tails we get. Multinoulli experiments would then be like rolling a die and counting how many outcomes of each category we get.
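
For a quick numerical illustration (an added sketch, assuming SciPy is available), scipy.stats.multinomial evaluates this PMF directly, and for $k = 2$ it agrees with the binomial PMF as derived above; the die and coin probabilities below are arbitrary examples.

```python
from scipy.stats import multinomial, binom

# Rolling a fair die r = 10 times and counting each face: multinomial with k = 6.
r = 10
print(multinomial(n=r, p=[1/6] * 6).pmf([2, 2, 1, 1, 2, 2]))

# With k = 2 the multinomial collapses to the binomial, as derived above.
p = [0.3, 0.7]
x1 = 4
print(multinomial(n=r, p=p).pmf([x1, r - x1]))   # multinomial form
print(binom(n=r, p=p[0]).pmf(x1))                # binomial form, same value
```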


Multivariate Continuous Random Variables

With multivariate continuous random variables, we have $n$ continuous random variables, $X_1$ to $X_n$, defined on the same sample space.

Multivariate Joint Cumulative Distribution Functions

MJCDFs are defined as

$$F_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = P[X_1 \le x_1, \ldots, X_n \le x_n]$$

Multivariate Joint Probability Density Functions

MJPDFs can be defined with the help of MJCDFs. The n-th order partial derivative of

the MJCDF for a multivariate continuous random variable is its MJPDF.

$$f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \frac{\partial^n F_{X_1,\ldots,X_n}(x_1,\ldots,x_n)}{\partial x_1 \cdots \partial x_n}$$

MJPDFs have a few conditions they need to abide by:

1. $f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) \ge 0$

2. $\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)\, dx_1 \cdots dx_n = 1$

3. $F_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \int_{-\infty}^{x_n} \cdots \int_{-\infty}^{x_1} f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)\, dx_1 \cdots dx_n$

Marginal PDFs

The marginal PDF of a single random variable or the marginal joint PDF of two random

variables or the marginal multivariate joint PDF of k random variables, where 2<k < n,

can be found.

$$f_{X_1}(x_1) = \underbrace{\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty}}_{n-1} f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)\, dx_2 \cdots dx_n$$

$$f_{X_1 X_2}(x_1,x_2) = \underbrace{\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty}}_{n-2} f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)\, dx_3 \cdots dx_n$$

Example

Say we have three random variables, $X_1$, $X_2$ and $X_3$, for which the MJCDF is given as

$$F_{X_1,X_2,X_3}(x_1,x_2,x_3) = \begin{cases} \left(1-e^{-x_1}\right)\left(1-e^{-2x_2}\right)\left(1-e^{-3x_3}\right) & x_1, x_2, x_3 \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

From this, we need to find the MJPDF and the marginal PDF.

$$f_{X_1,X_2,X_3}(x_1,x_2,x_3) = \frac{\partial^3 F_{X_1,X_2,X_3}(x_1,x_2,x_3)}{\partial x_1\, \partial x_2\, \partial x_3} = \frac{\partial^3 \left(1-e^{-x_1}\right)\left(1-e^{-2x_2}\right)\left(1-e^{-3x_3}\right)}{\partial x_1\, \partial x_2\, \partial x_3}$$
$$= e^{-x_1} \cdot 2e^{-2x_2} \cdot 3e^{-3x_3}$$
$$= 6e^{-(x_1 + 2x_2 + 3x_3)}$$

Now say we want to find the marginal joint PDF of $X_1$ and $X_3$.

$$f_{X_1 X_3}(x_1,x_3) = \int_{-\infty}^{+\infty} f_{X_1,X_2,X_3}(x_1,x_2,x_3)\, dx_2$$
$$= \int_{0}^{+\infty} 6 e^{-x_1} e^{-2x_2} e^{-3x_3}\, dx_2$$
$$= 6 e^{-x_1} e^{-3x_3} \int_{0}^{+\infty} e^{-2x_2}\, dx_2$$
$$= 6 e^{-x_1} e^{-3x_3} \left[ \frac{e^{-2x_2}}{-2} \right]_{0}^{+\infty}$$
$$= 3 e^{-(x_1 + 3x_3)}$$

If we wanted to find the marginal PDF of $X_1$, we could do this from the original PDF, or from the marginal joint PDF of $X_1$ and $X_3$.

$$f_{X_1}(x_1) = \int_{-\infty}^{+\infty} f_{X_1 X_3}(x_1,x_3)\, dx_3 = \int_{0}^{+\infty} 3 e^{-x_1} e^{-3x_3}\, dx_3 = e^{-x_1}$$
We can even find the marginal CDFs by integrating the marginal PDFs.
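
These manipulations can be reproduced symbolically. The following SymPy sketch (added for illustration, not part of the original notes) differentiates the MJCDF from the example to get the MJPDF and then integrates variables back out to recover the marginals.

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3', nonnegative=True)

# MJCDF from the example (valid for x1, x2, x3 >= 0).
F = (1 - sp.exp(-x1)) * (1 - sp.exp(-2 * x2)) * (1 - sp.exp(-3 * x3))

# MJPDF: third-order mixed partial derivative of the MJCDF.
f = sp.diff(F, x1, x2, x3)
print(sp.simplify(f))                     # 6*exp(-x1 - 2*x2 - 3*x3)

# Marginal joint PDF of X1 and X3: integrate x2 out over [0, oo).
f_13 = sp.integrate(f, (x2, 0, sp.oo))
print(sp.simplify(f_13))                  # 3*exp(-x1 - 3*x3)

# Marginal PDF of X1: integrate x3 out of the previous result.
f_1 = sp.integrate(f_13, (x3, 0, sp.oo))
print(sp.simplify(f_1))                   # exp(-x1)
```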
Random Vectors

Say we have a set of random variables which are

 either all discrete or all continuous

 defined on a single sample space

 jointly distributed

In this scenario, the random variable can be represented by a vector. This vector is

called a random vector.

Random vectors are denoted using bold, capital letters. If bold letters are

inconvenient, for example when writing on physical paper, double lines may be used.

$X$: random variable
$x$: value of a random variable
$\mathbf{X}$ or $\mathbb{X}$: random vector

$$\mathbf{X} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix} = [X_1\; X_2\; \cdots\; X_n]^T$$

$\mathbf{x}$: value of a random vector

$$\mathbf{x} = [x_1\; x_2\; \cdots\; x_n]^T$$

Random vectors are column vectors by default, unless otherwise mentioned.

However, since writing the vectors as columns would take a huge amount of space,

they are also frequently written in their transposed form.


Distribution Functions

$$F_{\mathbf{X}}(\mathbf{x}) = P[\mathbf{X} \le \mathbf{x}] = P[X_1 \le x_1, \ldots, X_n \le x_n] = F_{X_1,\ldots,X_n}(x_1,\ldots,x_n)$$

Just by looking at the CDF of a random vector, we can tell that we can represent a

multivariate random variable in a much more concise manner, which is exactly why it

is used. Additionally, we can use matrix algebra with random vectors.

Similarly,

$$P_{\mathbf{X}}(\mathbf{x}) = P[\mathbf{X} = \mathbf{x}] = P[X_1 = x_1, \ldots, X_n = x_n] = P_{X_1,\ldots,X_n}(x_1,\ldots,x_n)$$

$$f_{\mathbf{X}}(\mathbf{x}) = f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)$$

Example

$$f_{\mathbf{X}}(\mathbf{x}) = \begin{cases} 6 e^{-\mathbf{a}^T \mathbf{x}} & \mathbf{x} \ge \mathbf{0} \\ 0 & \text{otherwise} \end{cases} \qquad \mathbf{a} = [1\; 2\; 3]^T$$

Since there are three different values in $\mathbf{a}$, there must also be three different random variables involved here. Thus,

$$\mathbf{a}^T \mathbf{x} = [1\; 2\; 3] \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = x_1 + 2x_2 + 3x_3$$

Thus,

$$f_{\mathbf{X}}(\mathbf{x}) = 6 e^{-(x_1 + 2x_2 + 3x_3)}$$

If we want to find the CDF,

$$F_{\mathbf{X}}(\mathbf{x}) = \int_0^{x_3} \int_0^{x_2} \int_0^{x_1} 6 e^{-(x_1 + 2x_2 + 3x_3)}\, dx_1\, dx_2\, dx_3$$
$$= \frac{6}{2 \times 3} \left(1-e^{-x_1}\right)\left(1-e^{-2x_2}\right)\left(1-e^{-3x_3}\right)$$
$$= \left(1-e^{-x_1}\right)\left(1-e^{-2x_2}\right)\left(1-e^{-3x_3}\right)$$
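
The same example can be verified in the other direction with SymPy (an added sketch, not part of the original notes): integrating the vector-form PDF $6e^{-\mathbf{a}^T \mathbf{u}}$ from $\mathbf{0}$ up to $\mathbf{x}$, with $\mathbf{u}$ as the integration variable, reproduces the factored CDF.

```python
import sympy as sp

u1, u2, u3 = sp.symbols('u1 u2 u3', nonnegative=True)   # integration variables
x1, x2, x3 = sp.symbols('x1 x2 x3', nonnegative=True)   # upper limits

a = sp.Matrix([1, 2, 3])
u = sp.Matrix([u1, u2, u3])

# f_X(u) = 6 * exp(-a^T u) for u >= 0.
f = 6 * sp.exp(-(a.T * u)[0])

# F_X(x): integrate the PDF from 0 up to each component of x.
F = sp.integrate(f, (u1, 0, x1), (u2, 0, x2), (u3, 0, x3))
print(sp.factor(F))   # equal to (1 - exp(-x1))*(1 - exp(-2*x2))*(1 - exp(-3*x3))
```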

Joint Distribution Functions

If we have a pair of random vectors, $\mathbf{X} = [X_1, \ldots, X_n]^T$ and $\mathbf{Y} = [Y_1, \ldots, Y_m]^T$, and we know that they are either both discrete or both continuous, the distribution functions are represented as:

$$P_{\mathbf{X}\mathbf{Y}}(\mathbf{x},\mathbf{y}) = P_{X_1,\ldots,X_n,Y_1,\ldots,Y_m}(x_1,\ldots,x_n,y_1,\ldots,y_m)$$

$$F_{\mathbf{X}\mathbf{Y}}(\mathbf{x},\mathbf{y}) = F_{X_1,\ldots,X_n,Y_1,\ldots,Y_m}(x_1,\ldots,x_n,y_1,\ldots,y_m)$$

$$f_{\mathbf{X}\mathbf{Y}}(\mathbf{x},\mathbf{y}) = f_{X_1,\ldots,X_n,Y_1,\ldots,Y_m}(x_1,\ldots,x_n,y_1,\ldots,y_m)$$

Another way of looking at this is as though $\mathbf{W}$ is a random vector such that $\mathbf{W} = [X_1, \ldots, X_n, Y_1, \ldots, Y_m]^T$, and we are just finding the distribution functions for $\mathbf{W}$.
Expected Value Vectors

If $\mathbf{X}$ is a random vector such that $\mathbf{X} = [X_1, \ldots, X_n]^T$, then the individual expectations of each of the random variables in $\mathbf{X}$ can be expressed as a vector. This vector is called the expected value vector.

$$E[\mathbf{X}] = \left[ E[X_1]\; E[X_2]\; \cdots\; E[X_n] \right]^T$$

The expected value vector can also be expressed as $\mu_{\mathbf{X}}$.

Expected Value Matrices

Similar to random vectors, we can also have random matrices, which are matrices of

random variables. We will not discuss those in depth here. For random matrices, we

can have corresponding expected value matrices.

$$\mathbf{A} = \begin{bmatrix} A_{11} & \cdots & A_{1n} \\ \vdots & \ddots & \vdots \\ A_{n1} & \cdots & A_{nn} \end{bmatrix} \qquad E[\mathbf{A}] = \begin{bmatrix} E[A_{11}] & \cdots & E[A_{1n}] \\ \vdots & \ddots & \vdots \\ E[A_{n1}] & \cdots & E[A_{nn}] \end{bmatrix}$$

Covariance

We have previously seen covariances when discussing joint random variables. For the joint random variables $X$ and $Y$, the covariance was defined as

$$\operatorname{Cov}[X,Y] = E[(X-\mu_X)(Y-\mu_Y)]$$

This was also alternatively written as

$$\operatorname{Cov}[X,Y] = E[XY] - \mu_X \mu_Y$$

In this second form, the value $E[XY]$ was given a special name, the correlation, $R_{XY}$. This special name was given since the correlation will crop up in many different places. Thus,

$$\operatorname{Cov}[X,Y] = R_{XY} - \mu_X \mu_Y$$

From this, it would make sense that the covariance of a random vector is just the

covariance of the random variables that make up the vector. However, by definition,

covariance refers to just two random variables. As such, the covariance of a random

vector is the covariance of all possible pairs of random variables in the vector.

If a vector has n random variables, each of those random variables can make pairs

with n random variables, including with themselves. Thus, there are n × n pairs.

$$\operatorname{Cov}[\mathbf{X}] = \mathbf{C}_{\mathbf{X}} = \Sigma = \begin{bmatrix} \operatorname{Cov}[X_1,X_1] & \operatorname{Cov}[X_1,X_2] & \cdots & \operatorname{Cov}[X_1,X_n] \\ \operatorname{Cov}[X_2,X_1] & \operatorname{Cov}[X_2,X_2] & \cdots & \operatorname{Cov}[X_2,X_n] \\ \vdots & \vdots & \ddots & \vdots \\ \operatorname{Cov}[X_n,X_1] & \operatorname{Cov}[X_n,X_2] & \cdots & \operatorname{Cov}[X_n,X_n] \end{bmatrix}$$

There are two points to notice about the covariance of a random vector. Firstly,

notice the covariance of pairs with the same random variables.

$$\operatorname{Cov}[X_1,X_1] = E[(X_1-\mu_{X_1})(X_1-\mu_{X_1})] = E\left[(X_1-\mu_{X_1})^2\right] = \operatorname{Var}[X_1]$$
Thus, the covariance of pairs with the same random variables is just the variance of

that random variable. This means every element of the matrix along the diagonal is

going to be a variance.

Secondly, notice that the matrix for the covariance of the random vector is

symmetric, since $\operatorname{Cov}[X_1,X_2] = \operatorname{Cov}[X_2,X_1]$.

From these two points, we can make another observation. The product of a column vector and its transpose is a symmetric matrix. Thus, if we consider the following column vector and its transpose,

$$\begin{bmatrix} X_1-\mu_{X_1} \\ X_2-\mu_{X_2} \\ \vdots \\ X_n-\mu_{X_n} \end{bmatrix} \begin{bmatrix} X_1-\mu_{X_1} & X_2-\mu_{X_2} & \cdots & X_n-\mu_{X_n} \end{bmatrix}$$

This product can also be written as $(\mathbf{X}-\mu_{\mathbf{X}})(\mathbf{X}-\mu_{\mathbf{X}})^T$. The result is an $n \times n$ matrix. The expectation of this matrix is $\operatorname{Cov}[\mathbf{X}]$.

Thus,

$$\operatorname{Cov}[\mathbf{X}] = E\left[(\mathbf{X}-\mu_{\mathbf{X}})(\mathbf{X}-\mu_{\mathbf{X}})^T\right]$$

Similar to normal covariances, we also have alternative formulae for this.

$$\operatorname{Cov}[\mathbf{X}] = E[\mathbf{X}\mathbf{X}^T] - \mu_{\mathbf{X}}\mu_{\mathbf{X}}^T = \mathbf{R}_{\mathbf{X}} - \mu_{\mathbf{X}}\mu_{\mathbf{X}}^T$$

Here, $\mathbf{R}_{\mathbf{X}}$ is the correlation of $\mathbf{X}$. It is also a matrix.
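
As a sanity check of these matrix formulas, here is a NumPy sketch (added for illustration; the mixing matrix is arbitrary made-up data) that estimates $\mu_{\mathbf{X}}$, $\mathbf{R}_{\mathbf{X}}$ and $\mathbf{C}_{\mathbf{X}}$ from samples and compares the result with np.cov.

```python
import numpy as np

rng = np.random.default_rng(0)

# 10,000 samples of a 3-dimensional random vector; the mixing matrix is arbitrary
# and only serves to make the components correlated.
n = 10_000
A_mix = np.array([[1.0, 0.0, 0.0],
                  [0.5, 1.0, 0.0],
                  [0.2, 0.3, 1.0]])
X = rng.standard_normal((n, 3)) @ A_mix.T   # each row is one sample of X

mu = X.mean(axis=0)            # estimate of the expected value vector E[X]
R = (X.T @ X) / n              # estimate of R_X = E[X X^T]
C = R - np.outer(mu, mu)       # C_X = R_X - mu mu^T

# np.cov (with bias=True, i.e. a 1/n normalisation) gives the same covariance.
print(np.allclose(C, np.cov(X.T, bias=True)))   # True
```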


Correlation

We know that $\mathbf{R}_{\mathbf{X}} = E[\mathbf{X}\mathbf{X}^T]$.

$$\mathbf{X}\mathbf{X}^T = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix} \begin{bmatrix} X_1 & X_2 & \cdots & X_n \end{bmatrix} = \begin{bmatrix} X_1 X_1 & X_1 X_2 & \cdots & X_1 X_n \\ X_2 X_1 & X_2 X_2 & \cdots & X_2 X_n \\ \vdots & \vdots & \ddots & \vdots \\ X_n X_1 & X_n X_2 & \cdots & X_n X_n \end{bmatrix}$$

Notice that all the values along the diagonal are just the squares of the random

variables.

$$\mathbf{R}_{\mathbf{X}} = E[\mathbf{X}\mathbf{X}^T] = \begin{bmatrix} E[X_1^2] & E[X_1 X_2] & \cdots & E[X_1 X_n] \\ E[X_2 X_1] & E[X_2^2] & \cdots & E[X_2 X_n] \\ \vdots & \vdots & \ddots & \vdots \\ E[X_n X_1] & E[X_n X_2] & \cdots & E[X_n^2] \end{bmatrix}$$

Thus, the correlation matrix is also a symmetric matrix. The values along the diagonal

are the expected values of the squares of the component random variables, and all

other values are the expected values of all possible pairs. Note that each of these is

a correlation as well, i.e. $E[X_1 X_2] = R_{X_1 X_2}$.

Example

Let $\mathbf{X} = [X_1\; X_2]^T$ and $f_{\mathbf{X}}(\mathbf{x}) = 2$ for $0 \le x_1 \le x_2 \le 1$. We need to find $E[\mathbf{X}]$, $\mathbf{R}_{\mathbf{X}}$ and $\mathbf{C}_{\mathbf{X}}$.

To be able to find the expectation, we first need to find the PDFs of the individual

random variables.
$$f_{X_1}(x_1) = \int_{-\infty}^{+\infty} f_{X_1 X_2}(x_1,x_2)\, dx_2 = \int_{x_1}^{1} 2\, dx_2 = 2(1-x_1) \qquad 0 \le x_1 \le 1$$

$$f_{X_2}(x_2) = \int_{-\infty}^{+\infty} f_{X_1 X_2}(x_1,x_2)\, dx_1 = \int_{0}^{x_2} 2\, dx_1 = 2 x_2 \qquad 0 \le x_2 \le 1$$

$$E[X_1] = \int_0^1 x_1 \cdot 2(1-x_1)\, dx_1 = \left[2 \cdot \frac{x_1^2}{2}\right]_0^1 - \left[2 \cdot \frac{x_1^3}{3}\right]_0^1 = \frac{1}{3}$$

$$E[X_2] = \int_0^1 x_2 \cdot 2 x_2\, dx_2 = \left[2 \cdot \frac{x_2^3}{3}\right]_0^1 = \frac{2}{3}$$

$$E[\mathbf{X}] = \left[\frac{1}{3}\;\; \frac{2}{3}\right]^T$$

$$\mathbf{R}_{\mathbf{X}} = \begin{bmatrix} E[X_1^2] & E[X_1 X_2] \\ E[X_2 X_1] & E[X_2^2] \end{bmatrix}$$

$$E[X_1^2] = \int_0^1 x_1^2 \cdot 2(1-x_1)\, dx_1 = \frac{1}{6}$$

$$E[X_2^2] = \int_0^1 x_2^2 \cdot 2 x_2\, dx_2 = \frac{1}{2}$$

$$E[X_1 X_2] = E[X_2 X_1] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} x_1 x_2 f_{X_1 X_2}(x_1,x_2)\, dx_1\, dx_2 = \int_0^1 \int_0^{x_2} x_1 x_2 \cdot 2\, dx_1\, dx_2 = \int_0^1 2 x_2 \left[\frac{x_1^2}{2}\right]_0^{x_2} dx_2 = \left[\frac{x_2^4}{4}\right]_0^1 = \frac{1}{4}$$

$$\mathbf{R}_{\mathbf{X}} = \begin{bmatrix} \dfrac{1}{6} & \dfrac{1}{4} \\ \dfrac{1}{4} & \dfrac{1}{2} \end{bmatrix}$$

$$\mathbf{C}_{\mathbf{X}} = \mathbf{R}_{\mathbf{X}} - \mu_{\mathbf{X}}\mu_{\mathbf{X}}^T = \begin{bmatrix} \dfrac{1}{6} & \dfrac{1}{4} \\ \dfrac{1}{4} & \dfrac{1}{2} \end{bmatrix} - \begin{bmatrix} \dfrac{1}{3} \\ \dfrac{2}{3} \end{bmatrix} \begin{bmatrix} \dfrac{1}{3} & \dfrac{2}{3} \end{bmatrix} = \begin{bmatrix} \dfrac{1}{18} & \dfrac{1}{36} \\ \dfrac{1}{36} & \dfrac{1}{18} \end{bmatrix}$$
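
These exact values can be checked with a quick Monte Carlo sketch (added for illustration, not part of the original notes). Sampling two independent Uniform(0, 1) values and sorting them gives exactly the density $f_{\mathbf{X}}(\mathbf{x}) = 2$ on the triangle $0 \le x_1 \le x_2 \le 1$.

```python
import numpy as np

rng = np.random.default_rng(1)

# f_X(x) = 2 on the triangle 0 <= x1 <= x2 <= 1: sorting two independent
# Uniform(0, 1) samples gives exactly this joint density.
U = rng.random((1_000_000, 2))
X = np.sort(U, axis=1)          # column 0 is X1 (the minimum), column 1 is X2

mu = X.mean(axis=0)
R = (X.T @ X) / len(X)
C = R - np.outer(mu, mu)

print(mu)   # approx [1/3, 2/3]
print(R)    # approx [[1/6, 1/4], [1/4, 1/2]]
print(C)    # approx [[1/18, 1/36], [1/36, 1/18]]
```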

Covariance and Correlation of Two Random Vectors

$$\mathbf{R}_{\mathbf{X}\mathbf{Y}} = E[\mathbf{X}\mathbf{Y}^T]$$

$$\operatorname{Cov}[\mathbf{X},\mathbf{Y}] = \mathbf{R}_{\mathbf{X}\mathbf{Y}} - \mu_{\mathbf{X}}\mu_{\mathbf{Y}}^T = E\left[(\mathbf{X}-\mu_{\mathbf{X}})(\mathbf{Y}-\mu_{\mathbf{Y}})^T\right]$$

Here, we can also define a derived random vector from the random vector $\mathbf{X}$ as

$$\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}$$

where $\mathbf{A}$ is a matrix, $\mathbf{X}$ is a vector and $\mathbf{b}$ is a vector.

Further,

$$\mu_{\mathbf{Y}} = \mathbf{A}\mu_{\mathbf{X}} + \mathbf{b}$$
$$\mathbf{R}_{\mathbf{Y}} = \mathbf{A}\mathbf{R}_{\mathbf{X}}\mathbf{A}^T + (\mathbf{A}\mu_{\mathbf{X}})\mathbf{b}^T + \mathbf{b}(\mathbf{A}\mu_{\mathbf{X}})^T + \mathbf{b}\mathbf{b}^T$$
$$\mathbf{C}_{\mathbf{Y}} = \mathbf{A}\mathbf{C}_{\mathbf{X}}\mathbf{A}^T$$
For example, we found values for $\mu_{\mathbf{X}}$, $\mathbf{R}_{\mathbf{X}}$ and $\mathbf{C}_{\mathbf{X}}$ in the previous example. Thus, provided a specific $\mathbf{A}$ and $\mathbf{b}$, we can easily calculate all of these values.
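
For instance, here is a small NumPy sketch (added for illustration) that plugs the exact $\mu_{\mathbf{X}}$, $\mathbf{R}_{\mathbf{X}}$ and $\mathbf{C}_{\mathbf{X}}$ from the previous example into these formulas; the matrix $\mathbf{A}$ and vector $\mathbf{b}$ are arbitrary, made-up choices.

```python
import numpy as np

# Exact results from the triangle example above.
mu_X = np.array([1/3, 2/3])
R_X = np.array([[1/6, 1/4],
                [1/4, 1/2]])
C_X = np.array([[1/18, 1/36],
                [1/36, 1/18]])

# A hypothetical linear transformation Y = A X + b (A and b chosen arbitrarily).
A = np.array([[1.0, -1.0],
              [2.0,  3.0]])
b = np.array([0.5, 1.0])

mu_Y = A @ mu_X + b
C_Y = A @ C_X @ A.T
R_Y = A @ R_X @ A.T + np.outer(A @ mu_X, b) + np.outer(b, A @ mu_X) + np.outer(b, b)

print(mu_Y)
print(C_Y)
print(R_Y - np.outer(mu_Y, mu_Y))   # equals C_Y, since C_Y = R_Y - mu_Y mu_Y^T
```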
