Multiple Random Variables and Random Vectors
Table of Contents
Marginal PMF
Degree of Freedom
Marginal CDF
Multinomial Distributions
Distribution Functions
Covariance
Correlation
An experiment may have multiple random variables associated with it. We will only be considering the cases where all the random variables are from discrete sets or all of them are from continuous sets. All of the random variables are defined over the same sample space.

In this scenario, the probability model we will be using is the multivariate joint PMF, MJPMF. It has the following properties:
1. $P_{X_1,\ldots,X_n}(x_1,\ldots,x_n) \ge 0$

2. $\sum_{x_1 \in S_{X_1}} \cdots \sum_{x_n \in S_{X_n}} P_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = 1$

3. If $x_i \notin S_{X_i}$ for any $1 \le i \le n$, then $P_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = 0$
Marginal PMF
For MJPMFs of $n$ random variables, we can define marginal PMFs for $k$ random variables, where $1 \le k \le n-1$. If $k > 1$, the marginal PMF will be a marginal joint PMF. For example, with $n = 4$:
$P_{X_1}(x_1) = P[X_1 = x_1] = \sum_{x_2 \in S_{X_2}} \sum_{x_3 \in S_{X_3}} \sum_{x_4 \in S_{X_4}} P_{X_1,\ldots,X_4}(x_1,\ldots,x_4)$

$P_{X_1 X_2}(x_1, x_2) = P[X_1 = x_1, X_2 = x_2] = \sum_{x_3 \in S_{X_3}} \sum_{x_4 \in S_{X_4}} P_{X_1,\ldots,X_4}(x_1,\ldots,x_4)$
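In code, marginalization is just summation over the unwanted axes of the joint PMF. Here is a minimal NumPy sketch; the 2x2x2 joint PMF values are made up purely for illustration.

import numpy as np

# Hypothetical joint PMF P_{X1,X2,X3}(x1, x2, x3) on {0,1}^3, stored as a 3-D array.
# The values are arbitrary but sum to 1, as any valid joint PMF must.
P = np.array([[[0.10, 0.05],
               [0.15, 0.10]],
              [[0.20, 0.05],
               [0.25, 0.10]]])
assert np.isclose(P.sum(), 1.0)

# Marginal PMF of X1: sum out x2 and x3 (axes 1 and 2).
P_X1 = P.sum(axis=(1, 2))

# Marginal joint PMF of (X1, X2): sum out x3 only (axis 2).
P_X1X2 = P.sum(axis=2)

print(P_X1)     # [0.4 0.6]
print(P_X1X2)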
Example
Say an access point (AP) forwards data from many different devices to a particular destination. Out of 23 packets, 5 are lost, 8 are dropped and 10 are delivered. We pick 7 of these packets at random; let $X$, $Y$ and $Z$ be the number of lost, dropped and delivered packets among those picked.
From here, we want to find the MJPMF and the marginal PMF or marginal joint PMF. There are two implied conditions that we must abide by:

$x + y + z = 7$

$x \ge 0,\; y \ge 0,\; z \ge 0$
This is just going to be the number of ways we can pick $x$ lost packets, $y$ dropped packets and $z$ delivered packets, divided by the number of ways 7 packets can be picked out of the 23.

$P_{XYZ}(x, y, z) = \begin{cases} \dfrac{{}^{5}C_x \times {}^{8}C_y \times {}^{10}C_z}{{}^{23}C_7} & x + y + z = 7,\; x, y, z \ge 0,\; x \le 5,\; y \le 7,\; z \le 7 \\ 0 & \text{otherwise} \end{cases}$
Notice the last condition. Essentially, since we are only picking 7 packets, the values of $y$ and $z$ cannot exceed 7 even though 8 dropped and 10 delivered packets are available, while $x$ can only go up to 5.
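As a quick numerical sanity check (a sketch, not part of the original derivation), the PMF above can be evaluated with Python's math.comb and shown to sum to 1 over all valid outcomes.

from math import comb

def p_xyz(x, y, z):
    # PMF for the packet example: 5 lost, 8 dropped, 10 delivered, 7 picked out of 23.
    # math.comb(n, k) returns 0 when k > n, so the x <= 5 condition is handled automatically.
    if x + y + z != 7 or min(x, y, z) < 0:
        return 0.0
    return comb(5, x) * comb(8, y) * comb(10, z) / comb(23, 7)

total = sum(p_xyz(x, y, 7 - x - y) for x in range(8) for y in range(8 - x))
print(total)           # 1.0 (up to floating-point rounding)
print(p_xyz(2, 2, 3))  # probability of 2 lost, 2 dropped and 3 delivered packets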
Degree of Freedom
Say we now want the marginal PMF $P_X(x) = P[X = x]$.

Consider that $x = 5$. Then, $y$ cannot be more than 2, since the total number of packets picked is 7. Since we have defined a particular value for $X$, the next question is to ask for the value of $Y$. Once we have a value for $Y$ as well, we can no longer choose the value of $Z$. Even if we had chosen $X = 5$ and $Y = 1$, the value of $Z$ would have to be 1. It is fixed. Thus, we could only independently choose values for two of the random variables. Because the value of $Z$ is fixed once we have picked a value for $X$ and one for $Y$, we say there are only two degrees of freedom, and we only need to sum over $y$:
$\therefore P_X(x) = \sum_{y=0}^{7-x} \frac{{}^{5}C_x \times {}^{8}C_y \times {}^{10}C_{7-x-y}}{{}^{23}C_7} = \frac{{}^{5}C_x}{{}^{23}C_7} \times \sum_{y=0}^{7-x} {}^{8}C_y \times {}^{10}C_{7-x-y}$
$\dfrac{{}^{5}C_x}{{}^{23}C_7}$ has been brought outside since it is not related to $y$.
7 −x
Thus, the second part of the equation, ∑ 8C y × 10C( 7−x− y ) , essentially tells us the
y=0
number of ways to pick 7−x dropped and/or delivered packets, we do not care which.
18
However, that is just C( 7−x ). Thus,
$P_X(x) = \dfrac{{}^{5}C_x \times {}^{18}C_{7-x}}{{}^{23}C_7}, \quad 0 \le x \le 5$
Similarly,
$P_Y(y) = \dfrac{{}^{8}C_y \times {}^{15}C_{7-y}}{{}^{23}C_7}, \quad 0 \le y \le 7$

$P_Z(z) = \dfrac{{}^{10}C_z \times {}^{13}C_{7-z}}{{}^{23}C_7}, \quad 0 \le z \le 7$
For a marginal joint PMF of two variables, we again cannot choose any value for the third random variable; it is already fixed at $z = 7 - x - y$, so no summation is needed:

$P_{XY}(x, y) = \dfrac{{}^{5}C_x \times {}^{8}C_y \times {}^{10}C_{7-x-y}}{{}^{23}C_7}$
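The combinatorial identity used above (collapsing the sum over $y$ into a single binomial coefficient) can be verified numerically; this is a small sketch, not part of the original notes.

from math import comb

# Check that sum_y C(8, y) * C(10, 7 - x - y) equals C(18, 7 - x) for every valid x.
for x in range(6):
    lhs = sum(comb(8, y) * comb(10, 7 - x - y) for y in range(8 - x))
    rhs = comb(18, 7 - x)
    assert lhs == rhs
    print(x, lhs, rhs)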
The multivariate joint CDF, MJCDF, is defined as

$F_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = P[X_1 \le x_1, \ldots, X_n \le x_n]$
Marginal CDF
$F_{X_1}(x_1) = P[X_1 \le x_1] = F_{X_1,\ldots,X_n}(x_1, \infty, \ldots, \infty) = P[X_1 \le x_1, X_2 \le \infty, \ldots, X_n \le \infty]$
Well-Known Multivariate Discrete Random Variables
Say we have $n$ items of $k$ different types mixed together. $n_i$ is the number of items of the $i$-th type, where $i$ is between 1 and $k$. Thus,

$n_1 + \ldots + n_k = n$

If we pick $r$ of these items without replacement and let $X_i$ be the number of items of the $i$-th type that we pick, then
$P_{X_1,\ldots,X_k}(x_1,\ldots,x_k) = \begin{cases} \dfrac{{}^{n_1}C_{x_1} \times \cdots \times {}^{n_k}C_{x_k}}{{}^{n}C_r} & 0 \le x_i \le n_i,\; \sum_i x_i = r \\ 0 & \text{otherwise} \end{cases}$
Looking at this, it should be obvious that this is what we were working with in the
previous lecture.
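This is the multivariate hypergeometric distribution. If your SciPy version provides scipy.stats.multivariate_hypergeom (present in recent releases), it evaluates the same PMF directly; the sketch below reuses the packet example's parameters as an assumed test case.

from math import comb
from scipy.stats import multivariate_hypergeom

# Packet example: n_1 = 5 lost, n_2 = 8 dropped, n_3 = 10 delivered, r = 7 picked.
library = multivariate_hypergeom.pmf(x=[2, 2, 3], m=[5, 8, 10], n=7)

# Compare the library value against the formula above for the same outcome.
manual = comb(5, 2) * comb(8, 2) * comb(10, 3) / comb(23, 7)
print(library, manual)  # the two values should agree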
Multinomial Distributions
The multinomial distribution arises from the same setup, except that we replace each item after it is picked. Thus, the probability of picking an item of the $i$-th type, $p_i$, remains constant at $\dfrac{n_i}{n}$, and $\sum_{i=1}^{k} p_i = 1$.
$P_{X_1,\ldots,X_k}(x_1,\ldots,x_k) = \begin{cases} {}^{r}C_{(x_1,\ldots,x_k)}\; p_1^{x_1} \cdots p_k^{x_k} & 0 \le x_i \le r,\; \sum_i x_i = r \\ 0 & \text{otherwise} \end{cases}$
Here, the term ${}^{r}C_{(x_1,\ldots,x_k)}$ is called the multinomial coefficient, similar to the binomial coefficient.

${}^{r}C_{(x_1,\ldots,x_k)} = \dfrac{r!}{x_1! \cdots x_k!}$
For $k = 2$, since $x_2 = r - x_1$ and $p_2 = 1 - p_1$, this reduces to the binomial PMF:

$P_{X_1 X_2}(x_1, x_2) = \dfrac{r!}{x_1!\, x_2!}\, p_1^{x_1} p_2^{x_2} = \dfrac{r!}{x_1!\,(r - x_1)!}\, p_1^{x_1} (1 - p_1)^{r - x_1}$
Remember that a Bernoulli experiment was one in which there were only two possible outcomes; a multinoulli experiment is one with $k$ possible outcomes. We can compare Bernoulli experiments to tossing a coin and counting how many heads and how many tails we get. Multinoulli experiments would then be like rolling a die and counting how many times each face comes up.
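As a sketch (the counts and probabilities below are illustrative only), the multinomial PMF can be computed straight from the formula and cross-checked against scipy.stats.multinomial.

from math import factorial
from scipy.stats import multinomial

def multinomial_pmf(xs, ps):
    # r! / (x_1! ... x_k!) * p_1^x_1 * ... * p_k^x_k
    r = sum(xs)
    coeff = factorial(r)
    for x in xs:
        coeff //= factorial(x)
    prob = 1.0
    for x, p in zip(xs, ps):
        prob *= p ** x
    return coeff * prob

xs = [2, 2, 3]                 # counts of each of k = 3 types, summing to r = 7
ps = [5/23, 8/23, 10/23]       # p_i = n_i / n from the packet example, now with replacement
print(multinomial_pmf(xs, ps))
print(multinomial.pmf(xs, n=7, p=ps))  # SciPy gives the same value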
For continuous random variables, the multivariate joint CDF is defined in the same way:

$F_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = P[X_1 \le x_1, \ldots, X_n \le x_n]$

MJPDFs can be defined with the help of MJCDFs. The $n$-th order partial derivative of the MJCDF with respect to each of the variables gives the MJPDF:

$f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \dfrac{\partial^n F_{X_1,\ldots,X_n}(x_1,\ldots,x_n)}{\partial x_1 \cdots \partial x_n}$
1. $f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) \ge 0$

2. $\underbrace{\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty}}_{n} f_{X_1 \ldots X_n}(x_1,\ldots,x_n)\, dx_1 \cdots dx_n = 1$

3. $F_{X_1 \ldots X_n}(x_1,\ldots,x_n) = \underbrace{\int_{-\infty}^{x_n} \cdots \int_{-\infty}^{x_1}}_{n} f_{X_1 \ldots X_n}(x_1,\ldots,x_n)\, dx_1 \cdots dx_n$
Marginal PDFs
The marginal PDF of a single random variable, the marginal joint PDF of two random variables, or the marginal multivariate joint PDF of $k$ random variables, where $2 < k < n$, can all be found.
$f_{X_1}(x_1) = \underbrace{\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty}}_{n-1} f_{X_1 \ldots X_n}(x_1,\ldots,x_n)\, dx_2 \cdots dx_n$

$f_{X_1 X_2}(x_1, x_2) = \underbrace{\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty}}_{n-2} f_{X_1 \ldots X_n}(x_1,\ldots,x_n)\, dx_3 \cdots dx_n$
Example
Say we have three random variables, $X_1$, $X_2$ and $X_3$, for which the MJCDF is given as

$F_{X_1, X_2, X_3}(x_1, x_2, x_3) = \begin{cases} \left(1 - e^{-x_1}\right)\left(1 - e^{-2 x_2}\right)\left(1 - e^{-3 x_3}\right) & x_1, x_2, x_3 \ge 0 \\ 0 & \text{otherwise} \end{cases}$
From this, we need to find the MJPDF and the marginal PDF.
$f_{X_1, X_2, X_3}(x_1, x_2, x_3) = \dfrac{\partial^3 F_{X_1, X_2, X_3}(x_1, x_2, x_3)}{\partial x_1\, \partial x_2\, \partial x_3} = e^{-x_1} \cdot 2 e^{-2 x_2} \cdot 3 e^{-3 x_3} = 6 e^{-(x_1 + 2 x_2 + 3 x_3)}$
$f_{X_1 X_3}(x_1, x_3) = \int_{-\infty}^{+\infty} f_{X_1, X_2, X_3}(x_1, x_2, x_3)\, dx_2$
$= \int_{0}^{+\infty} 6 e^{-x_1} e^{-2 x_2} e^{-3 x_3}\, dx_2$
$= 6 e^{-x_1} e^{-3 x_3} \int_{0}^{+\infty} e^{-2 x_2}\, dx_2$
$= 6 e^{-x_1} e^{-3 x_3} \left[\dfrac{1}{-2} e^{-2 x_2}\right]_{0}^{+\infty}$
$= 3 e^{-(x_1 + 3 x_3)}$
If we wanted to find the marginal PDF of $X_1$, we could do this from the original MJPDF, or from the marginal joint PDF we just found:

$f_{X_1}(x_1) = \int_{-\infty}^{+\infty} f_{X_1 X_3}(x_1, x_3)\, dx_3$
$= \int_{0}^{+\infty} 3 e^{-x_1} e^{-3 x_3}\, dx_3$
$= e^{-x_1}$
We can even find the marginal CDFs by integrating the marginal PDFs.
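The differentiation and integration steps in this example can be reproduced symbolically; the following is a minimal SymPy sketch of the same calculation.

import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3', nonnegative=True)

# MJCDF for x1, x2, x3 >= 0
F = (1 - sp.exp(-x1)) * (1 - sp.exp(-2*x2)) * (1 - sp.exp(-3*x3))

# MJPDF: mixed third-order partial derivative of the MJCDF
f = sp.diff(F, x1, x2, x3)
print(sp.simplify(f))                    # 6*exp(-x1 - 2*x2 - 3*x3)

# Marginal joint PDF of X1 and X3: integrate out x2 over [0, oo)
f13 = sp.integrate(f, (x2, 0, sp.oo))
print(sp.simplify(f13))                  # 3*exp(-x1 - 3*x3)

# Marginal PDF of X1: integrate out x3 as well
f1 = sp.integrate(f13, (x3, 0, sp.oo))
print(sp.simplify(f1))                   # exp(-x1)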
Random Vectors
Say an experiment has several random variables that are jointly distributed. In this scenario, the random variables can be represented by a vector. This vector is called a random vector.

Random vectors are denoted using bold, capital letters. If bold letters are inconvenient, for example when writing on physical paper, double-struck letters may be used.
$X$ : Random Variable
$x$ : Value of a random variable
$\mathbf{X}$ or $\mathbb{X}$ : Random Vector
$\mathbf{X} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix} = \begin{bmatrix} X_1 & X_2 & \cdots & X_n \end{bmatrix}^T$

$\mathbf{x} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}^T$
However, since writing the vectors as columns would take a huge amount of space, we usually write them as transposed row vectors.

$F_{\mathbf{X}}(\mathbf{x}) = F_{X_1,\ldots,X_n}(x_1,\ldots,x_n)$

Just by looking at the CDF of a random vector, we can tell that we can represent a multivariate random variable in a much more concise manner, which is exactly why this notation is used. Similarly,

$f_{\mathbf{X}}(\mathbf{x}) = f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)$
Example
Say the PDF of a random vector $\mathbf{X}$ is given as

$f_{\mathbf{X}}(\mathbf{x}) = \begin{cases} 6\, e^{-\mathbf{a}^T \mathbf{x}} & \mathbf{x} \ge 0 \\ 0 & \text{otherwise} \end{cases} \qquad \mathbf{a} = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}^T$
Since there are three different values in $\mathbf{a}$, there must also be three different random variables in $\mathbf{X}$.
$\mathbf{a}^T \mathbf{x} = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = x_1 + 2 x_2 + 3 x_3$
Thus,
$f_{\mathbf{X}}(\mathbf{x}) = 6\, e^{-(x_1 + 2 x_2 + 3 x_3)}$
$F_{\mathbf{X}}(\mathbf{x}) = \int_{0}^{x_3} \int_{0}^{x_2} \int_{0}^{x_1} 6\, e^{-(x_1 + 2 x_2 + 3 x_3)}\, dx_1\, dx_2\, dx_3$
$= \dfrac{6}{2 \times 3} \left(1 - e^{-x_1}\right)\left(1 - e^{-2 x_2}\right)\left(1 - e^{-3 x_3}\right)$
$= \left(1 - e^{-x_1}\right)\left(1 - e^{-2 x_2}\right)\left(1 - e^{-3 x_3}\right)$
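As a numerical check (a sketch only, with an arbitrarily chosen evaluation point), the closed-form CDF can be compared against a direct numerical integration of the PDF.

import numpy as np
from scipy import integrate

# PDF of the random vector from the example (valid for x >= 0)
f = lambda u1, u2, u3: 6.0 * np.exp(-(u1 + 2*u2 + 3*u3))

# Evaluate F_X(x) at x = (1.0, 0.5, 0.2) by integrating the PDF over the box [0, x].
x = (1.0, 0.5, 0.2)
numeric, _ = integrate.tplquad(
    lambda u3, u2, u1: f(u1, u2, u3),       # tplquad expects the innermost variable first
    0, x[0],                                # limits for u1 (outermost)
    lambda u1: 0, lambda u1: x[1],          # limits for u2
    lambda u1, u2: 0, lambda u1, u2: x[2],  # limits for u3 (innermost)
)
closed_form = (1 - np.exp(-x[0])) * (1 - np.exp(-2*x[1])) * (1 - np.exp(-3*x[2]))
print(numeric, closed_form)  # the two values should agree closely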
If we have two random vectors, $\mathbf{X}$ with $n$ components and $\mathbf{Y}$ with $m$ components, and provided that they are either both discrete or both continuous, the distribution functions are represented as:

$P_{\mathbf{X}\mathbf{Y}}(\mathbf{x}, \mathbf{y}) = P_{X_1,\ldots,X_n,Y_1,\ldots,Y_m}(x_1,\ldots,x_n, y_1,\ldots,y_m)$

$F_{\mathbf{X}\mathbf{Y}}(\mathbf{x}, \mathbf{y}) = F_{X_1,\ldots,X_n,Y_1,\ldots,Y_m}(x_1,\ldots,x_n, y_1,\ldots,y_m)$

$f_{\mathbf{X}\mathbf{Y}}(\mathbf{x}, \mathbf{y}) = f_{X_1,\ldots,X_n,Y_1,\ldots,Y_m}(x_1,\ldots,x_n, y_1,\ldots,y_m)$
The expected value of each of the random variables in $\mathbf{X}$ can be collected into a vector. This vector is called the expected value vector, or mean vector, $\boldsymbol{\mu}_{\mathbf{X}}$.

$E[\mathbf{X}] = \begin{bmatrix} E[X_1] & E[X_2] & \cdots & E[X_n] \end{bmatrix}^T$
Similar to random vectors, we can also have random matrices, which are matrices of random variables. We will not discuss those in depth here. For random matrices, we again take the expected value element by element:

$\mathbf{A} = \begin{bmatrix} A_{11} & \cdots & A_{1n} \\ \vdots & \ddots & \vdots \\ A_{n1} & \cdots & A_{nn} \end{bmatrix} \qquad E[\mathbf{A}] = \begin{bmatrix} E[A_{11}] & \cdots & E[A_{1n}] \\ \vdots & \ddots & \vdots \\ E[A_{n1}] & \cdots & E[A_{nn}] \end{bmatrix}$
Covariance
We have previously seen covariances when discussing joint random variables. For two random variables $X$ and $Y$,

$\mathrm{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y$

In this second form, the value $E[XY]$ was given a special name, the correlation, $R_{XY}$. This special name was given since the correlation will crop up in many different places. Thus,

$\mathrm{Cov}[X, Y] = R_{XY} - \mu_X \mu_Y$
From this, it would make sense that the covariance of a random vector is just the
covariance of the random variables that make up the vector. However, by definition,
covariance refers to just two random variables. As such, the covariance of a random
vector is the covariance of all possible pairs of random variables in the vector.
If a vector has n random variables, each of those random variables can make pairs
with n random variables, including with themselves. Thus, there are n × n pairs.
$\mathrm{Cov}[\mathbf{X}] = \mathbf{C}_{\mathbf{X}} = \Sigma = \begin{bmatrix} \mathrm{Cov}[X_1, X_1] & \mathrm{Cov}[X_1, X_2] & \cdots & \mathrm{Cov}[X_1, X_n] \\ \mathrm{Cov}[X_2, X_1] & \mathrm{Cov}[X_2, X_2] & \cdots & \mathrm{Cov}[X_2, X_n] \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}[X_n, X_1] & \mathrm{Cov}[X_n, X_2] & \cdots & \mathrm{Cov}[X_n, X_n] \end{bmatrix}$
There are two points to notice about the covariance of a random vector. Firstly,
$\mathrm{Cov}[X_1, X_1] = E[(X_1 - \mu_{X_1})(X_1 - \mu_{X_1})] = E\left[(X_1 - \mu_{X_1})^2\right] = \mathrm{Var}[X_1]$
Thus, the covariance of pairs with the same random variables is just the variance of
that random variable. This means every element of the matrix along the diagonal is
going to be a variance.
Secondly, notice that the matrix for the covariance of the random vector is symmetric, since $\mathrm{Cov}[X_i, X_j] = \mathrm{Cov}[X_j, X_i]$.

From these two points, we can make another observation. The product of a column vector with its own transpose is a symmetric matrix. Thus, consider the following product:
$\begin{bmatrix} X_1 - \mu_{X_1} \\ X_2 - \mu_{X_2} \\ \vdots \\ X_n - \mu_{X_n} \end{bmatrix} \begin{bmatrix} X_1 - \mu_{X_1} & X_2 - \mu_{X_2} & \cdots & X_n - \mu_{X_n} \end{bmatrix}$
This product can also be written as $(\mathbf{X} - \boldsymbol{\mu}_{\mathbf{X}})(\mathbf{X} - \boldsymbol{\mu}_{\mathbf{X}})^T$. The result is an $n \times n$ matrix whose $(i, j)$-th element is $(X_i - \mu_{X_i})(X_j - \mu_{X_j})$, and the expected value of that element is exactly $\mathrm{Cov}[X_i, X_j]$. Thus,

$\mathrm{Cov}[\mathbf{X}] = E\left[(\mathbf{X} - \boldsymbol{\mu}_{\mathbf{X}})(\mathbf{X} - \boldsymbol{\mu}_{\mathbf{X}})^T\right]$

We know that $\mathbf{R}_{\mathbf{X}} = E[\mathbf{X} \mathbf{X}^T]$.
$\mathbf{X} \mathbf{X}^T = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix} \begin{bmatrix} X_1 & X_2 & \cdots & X_n \end{bmatrix} = \begin{bmatrix} X_1 X_1 & X_1 X_2 & \cdots & X_1 X_n \\ X_2 X_1 & X_2 X_2 & \cdots & X_2 X_n \\ \vdots & \vdots & \ddots & \vdots \\ X_n X_1 & X_n X_2 & \cdots & X_n X_n \end{bmatrix}$
Notice that all the values along the diagonal are just the squares of the random
variables.
$\mathbf{R}_{\mathbf{X}} = E[\mathbf{X} \mathbf{X}^T] = \begin{bmatrix} E[X_1^2] & E[X_1 X_2] & \cdots & E[X_1 X_n] \\ E[X_2 X_1] & E[X_2^2] & \cdots & E[X_2 X_n] \\ \vdots & \vdots & \ddots & \vdots \\ E[X_n X_1] & E[X_n X_2] & \cdots & E[X_n^2] \end{bmatrix}$
Thus, the correlation matrix is also a symmetric matrix. The values along the diagonal
are the expected values of the squares of the component random variables, and all
other values are the expected values of all possible pairs. Note that each of these off-diagonal entries is the correlation $R_{X_i X_j} = E[X_i X_j]$ of a pair of components.
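Both matrix definitions can be checked numerically on sampled data; the sketch below uses an arbitrary multivariate normal distribution purely as a source of samples.

import numpy as np

rng = np.random.default_rng(0)
# 10000 samples of a 3-component random vector (each row is one sample of X).
samples = rng.multivariate_normal(mean=[1.0, 2.0, 3.0],
                                  cov=[[2.0, 0.5, 0.0],
                                       [0.5, 1.0, 0.3],
                                       [0.0, 0.3, 1.5]],
                                  size=10_000)

mu = samples.mean(axis=0)                                      # estimate of E[X]
R = (samples[:, :, None] * samples[:, None, :]).mean(axis=0)   # estimate of E[X X^T]
C = R - np.outer(mu, mu)                                       # C_X = R_X - mu mu^T

# np.cov (with bias=True) computes the same covariance matrix directly.
print(np.allclose(C, np.cov(samples, rowvar=False, bias=True)))
print(np.round(C, 3))  # diagonal entries are the variances; the matrix is symmetric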
Example
Let $\mathbf{X} = \begin{bmatrix} X_1 & X_2 \end{bmatrix}^T$ and $f_{\mathbf{X}}(\mathbf{x}) = 2$ for $0 \le x_1 \le x_2 \le 1$. We need to find $E[\mathbf{X}]$, $\mathbf{R}_{\mathbf{X}}$ and $\mathbf{C}_{\mathbf{X}}$.
To be able to find the expectation, we first need to find the PDFs of the individual
random variables.
$f_{X_1}(x_1) = \int_{-\infty}^{+\infty} f_{X_1 X_2}(x_1, x_2)\, dx_2 = \int_{x_1}^{1} 2\, dx_2 = 2(1 - x_1), \quad 0 \le x_1 \le 1$

$f_{X_2}(x_2) = \int_{-\infty}^{+\infty} f_{X_1 X_2}(x_1, x_2)\, dx_1 = \int_{0}^{x_2} 2\, dx_1 = 2 x_2, \quad 0 \le x_2 \le 1$
$E[X_1] = \int_{0}^{1} x_1 \cdot 2(1 - x_1)\, dx_1 = \left[2 \cdot \dfrac{x_1^2}{2}\right]_0^1 - \left[2 \cdot \dfrac{x_1^3}{3}\right]_0^1 = \dfrac{1}{3}$

$E[X_2] = \int_{0}^{1} x_2 \cdot 2 x_2\, dx_2 = \left[2 \cdot \dfrac{x_2^3}{3}\right]_0^1 = \dfrac{2}{3}$
$E[\mathbf{X}] = \begin{bmatrix} \dfrac{1}{3} & \dfrac{2}{3} \end{bmatrix}^T$

$\mathbf{R}_{\mathbf{X}} = \begin{bmatrix} E[X_1^2] & E[X_1 X_2] \\ E[X_2 X_1] & E[X_2^2] \end{bmatrix}$
$E[X_1^2] = \int_{0}^{1} x_1^2 \cdot 2(1 - x_1)\, dx_1 = \dfrac{1}{6}$

$E[X_2^2] = \int_{0}^{1} x_2^2 \cdot 2 x_2\, dx_2 = \dfrac{1}{2}$
$E[X_1 X_2] = E[X_2 X_1] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} x_1 x_2\, f_{X_1 X_2}(x_1, x_2)\, dx_1\, dx_2$
$= \int_{0}^{1} \int_{0}^{x_2} 2 x_1 x_2\, dx_1\, dx_2$
$= \int_{0}^{1} 2 x_2 \left[\dfrac{x_1^2}{2}\right]_0^{x_2} dx_2$
$= \left[\dfrac{x_2^4}{4}\right]_0^1$
$= \dfrac{1}{4}$
$\mathbf{R}_{\mathbf{X}} = \begin{bmatrix} \dfrac{1}{6} & \dfrac{1}{4} \\ \dfrac{1}{4} & \dfrac{1}{2} \end{bmatrix}$
$\mathbf{C}_{\mathbf{X}} = \mathbf{R}_{\mathbf{X}} - \boldsymbol{\mu}_{\mathbf{X}} \boldsymbol{\mu}_{\mathbf{X}}^T = \begin{bmatrix} \dfrac{1}{6} & \dfrac{1}{4} \\ \dfrac{1}{4} & \dfrac{1}{2} \end{bmatrix} - \begin{bmatrix} \dfrac{1}{3} \\ \dfrac{2}{3} \end{bmatrix} \begin{bmatrix} \dfrac{1}{3} & \dfrac{2}{3} \end{bmatrix} = \begin{bmatrix} \dfrac{1}{18} & \dfrac{1}{36} \\ \dfrac{1}{36} & \dfrac{1}{18} \end{bmatrix}$
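The whole example can be verified symbolically; the following short SymPy sketch reproduces $E[\mathbf{X}]$, $\mathbf{R}_{\mathbf{X}}$ and $\mathbf{C}_{\mathbf{X}}$ directly from the joint PDF.

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = 2  # joint PDF, nonzero on the triangle 0 <= x1 <= x2 <= 1

def expect(g):
    # E[g(X1, X2)] = double integral of g * f over 0 <= x1 <= x2 <= 1
    return sp.integrate(sp.integrate(g * f, (x1, 0, x2)), (x2, 0, 1))

mu = sp.Matrix([expect(x1), expect(x2)])           # [1/3, 2/3]
R = sp.Matrix([[expect(x1*x1), expect(x1*x2)],
               [expect(x2*x1), expect(x2*x2)]])    # [[1/6, 1/4], [1/4, 1/2]]
C = R - mu * mu.T                                  # [[1/18, 1/36], [1/36, 1/18]]
print(mu, R, C, sep="\n")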
For two random vectors $\mathbf{X}$ and $\mathbf{Y}$, we can similarly define

$\mathbf{R}_{\mathbf{X}\mathbf{Y}} = E[\mathbf{X} \mathbf{Y}^T]$

$\mathrm{Cov}[\mathbf{X}, \mathbf{Y}] = \mathbf{R}_{\mathbf{X}\mathbf{Y}} - \boldsymbol{\mu}_{\mathbf{X}} \boldsymbol{\mu}_{\mathbf{Y}}^T = E\left[(\mathbf{X} - \boldsymbol{\mu}_{\mathbf{X}})(\mathbf{Y} - \boldsymbol{\mu}_{\mathbf{Y}})^T\right]$
Here, we can also define a derived random vector from the random vector $\mathbf{X}$ as

$\mathbf{Y} = \mathbf{A} \mathbf{X} + \mathbf{b}$
Further,
$\boldsymbol{\mu}_{\mathbf{Y}} = \mathbf{A} \boldsymbol{\mu}_{\mathbf{X}} + \mathbf{b}$

$\mathbf{R}_{\mathbf{Y}} = \mathbf{A} \mathbf{R}_{\mathbf{X}} \mathbf{A}^T + (\mathbf{A} \boldsymbol{\mu}_{\mathbf{X}}) \mathbf{b}^T + \mathbf{b} (\mathbf{A} \boldsymbol{\mu}_{\mathbf{X}})^T + \mathbf{b} \mathbf{b}^T$

$\mathbf{C}_{\mathbf{Y}} = \mathbf{A} \mathbf{C}_{\mathbf{X}} \mathbf{A}^T$
For example, we found values for $\boldsymbol{\mu}_{\mathbf{X}}$, $\mathbf{R}_{\mathbf{X}}$ and $\mathbf{C}_{\mathbf{X}}$ in the previous example. Thus, for any particular choice of $\mathbf{A}$ and $\mathbf{b}$, we could compute $\boldsymbol{\mu}_{\mathbf{Y}}$, $\mathbf{R}_{\mathbf{Y}}$ and $\mathbf{C}_{\mathbf{Y}}$ directly from these formulas.
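As a sketch (the matrix A and vector b below are hypothetical, chosen only for illustration), these formulas can be applied to the values from the previous example.

import numpy as np

# Results from the previous example
mu_X = np.array([[1/3], [2/3]])
R_X = np.array([[1/6, 1/4], [1/4, 1/2]])
C_X = np.array([[1/18, 1/36], [1/36, 1/18]])

# Hypothetical A and b defining the derived random vector Y = A X + b
A = np.array([[1.0, 1.0], [2.0, -1.0]])
b = np.array([[1.0], [0.0]])

mu_Y = A @ mu_X + b
R_Y = A @ R_X @ A.T + (A @ mu_X) @ b.T + b @ (A @ mu_X).T + b @ b.T
C_Y = A @ C_X @ A.T

# Consistency check: C_Y should equal R_Y - mu_Y mu_Y^T
print(np.allclose(C_Y, R_Y - mu_Y @ mu_Y.T))
print(mu_Y, C_Y, sep="\n")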