
Probability Lecture II (August, 2006)

1 More on Named Distributions


1.1 Normal distribution
A random variable X has a normal(µ, σ²) distribution if the probability density function of X is

    f(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)},   −∞ < x < ∞.   (1)

In the case where µ = 0 and σ = 1, the distribution is called the standard normal distribution. It can be shown that if X has a normal(µ, σ²) distribution, then E(X) = µ and Var(X) = σ².

Properties The normal density curve has the following properties:

1. It is symmetric about µ, with a bell-shaped curve whose inflection points lie at µ ± σ.

2. The areas under the curve within 1, 2, and 3 σ of µ are approximately 68%, 95%, and 99.7%, respectively (see the numerical check after Figure 1).

3. X = σZ + µ has a normal(µ, σ²) distribution, where Z is a standard normal, i.e., normal(0, 1), variable and σ > 0.

Figure 1: The normal(µ, σ²) density curve.
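Properties 2 and 3 are easy to check numerically; one possible sketch in Python (using scipy; the values of µ and σ below are arbitrary) is the following.

    # Numerical check of normal-curve properties 2 and 3 (mu and sigma chosen arbitrarily).
    from scipy.stats import norm

    mu, sigma = 2.0, 3.0                        # illustrative parameters only
    X = norm(loc=mu, scale=sigma)               # the normal(mu, sigma^2) distribution

    # Property 2: areas within 1, 2, 3 standard deviations of mu.
    for k in (1, 2, 3):
        area = X.cdf(mu + k * sigma) - X.cdf(mu - k * sigma)
        print(k, round(area, 4))                # 0.6827, 0.9545, 0.9973

    # Property 3: X = sigma*Z + mu, so P(X <= x) = P(Z <= (x - mu)/sigma).
    x = 4.5
    print(X.cdf(x), norm.cdf((x - mu) / sigma)) # the two values agree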

2 The Moment-Generating Function

Definition 1 The moment-generating function (mgf) of a random variable X is M(t) = E(e^{tX}), provided the expectation is defined.

Remark 2 Note that the expectation, and therefore the moment-generating function, might not exist
for all values of t.

Remark 3 If X1, ···, Xn are independent random variables with moment-generating functions M_{X1}, ···, M_{Xn}, then X1 + ··· + Xn has moment-generating function given by

    M_{X1+···+Xn}(t) = ∏_{i=1}^{n} M_{Xi}(t).   (2)

Theorem 4 If the mgf exists for t in an open interval containing zero, it uniquely determines the
probability distribution.

Theorem 5 If the mgf exists in an open interval containing zero, then M^{(r)}(0) = E(X^r), where M^{(r)}(0) is the rth derivative of M at 0.

The advantage of Theorem 5 is that when a moment (which requires integration) is difficult to compute directly, we can instead differentiate the mgf and obtain the same result; differentiation is purely mechanical.

Example 6 (Gamma Distribution) The gamma(α, λ) distribution depends on two parameters, α > 0 and λ > 0, and has density function

    g_{α,λ}(t) = (λ^α / Γ(α)) t^{α−1} e^{−λt} for t ≥ 0, and g_{α,λ}(t) = 0 for t < 0,

where Γ(x) = ∫_0^∞ u^{x−1} e^{−u} du, x > 0.

Remark 7 It follows by integration by parts that, for all p > 0, Γ(p + 1) = pΓ(p), and hence that Γ(k) = (k − 1)! for positive integers k.

Remark 8 If α = 1, the gamma density coincides with the exponential density. The parameter α
is called a shape parameter for the gamma density, and λ is called a scale parameter. Varying α
changes the shape of the density, whereas varying λ changes the scale of the density.

We will find E(X) and Var(X) for a gamma variable X. The mgf of a gamma distribution is

    M(t) = ∫_0^∞ e^{tx} (λ^α / Γ(α)) x^{α−1} e^{−λx} dx
         = (λ^α / Γ(α)) ∫_0^∞ x^{α−1} e^{−(λ−t)x} dx
         = (λ^α / Γ(α)) · Γ(α)/(λ − t)^α   (the integral converges for t < λ and can be evaluated by relating it to the gamma density with parameters α and λ − t)
         = (λ/(λ − t))^α.   (3)

Therefore,

    E(X) = M′(0) = α/λ,
    E(X²) = M″(0) = α(α + 1)/λ²,

and

    Var(X) = E(X²) − [E(X)]² = α(α + 1)/λ² − α²/λ² = α/λ².

□
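As an aside, the differentiation in this example can also be done symbolically, illustrating Theorem 5; a minimal sketch using sympy (the variable names are my own) is:

    # Recover E(X) and E(X^2) for gamma(alpha, lambda) by differentiating the mgf (3) at t = 0.
    import sympy as sp

    t, lam, alpha = sp.symbols('t lambda alpha', positive=True)
    M = (lam / (lam - t)) ** alpha                    # mgf of gamma(alpha, lambda), valid for t < lambda

    EX  = sp.simplify(sp.diff(M, t, 1).subs(t, 0))    # alpha/lambda
    EX2 = sp.simplify(sp.diff(M, t, 2).subs(t, 0))    # alpha*(alpha + 1)/lambda**2
    var = sp.simplify(EX2 - EX ** 2)                  # alpha/lambda**2

    print(EX, EX2, var)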

Example 9 Suppose X has a gamma(α1, λ) distribution, and independently Y has a gamma(α2, λ) distribution. By (2), the mgf of X + Y is

    (λ/(λ − t))^{α1} (λ/(λ − t))^{α2} = (λ/(λ − t))^{α1+α2},   t < λ.

Since (λ/(λ − t))^{α1+α2} is the mgf of a gamma distribution with parameters α1 + α2 and λ, X + Y has a gamma(α1 + α2, λ) distribution. In particular, the sum of n independent exponential(λ) random variables (exponential(λ) being the special case gamma(1, λ)) follows a gamma distribution with parameters n and λ. Thus, the waiting time until the nth event of a Poisson process with rate λ follows a gamma(n, λ) distribution. □
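A simulation is consistent with this conclusion; the sketch below (with arbitrary values of n and λ) compares a simulated sum of exponentials with the gamma(n, λ) distribution, using scipy's scale = 1/λ convention.

    # Sum of n independent exponential(lam) variables compared with the gamma(n, lam) distribution.
    import numpy as np
    from scipy.stats import gamma

    rng = np.random.default_rng(0)
    n, lam, reps = 5, 2.0, 100_000                       # arbitrary illustrative values
    sums = rng.exponential(scale=1 / lam, size=(reps, n)).sum(axis=1)

    # Compare a few empirical quantities with the gamma(n, lam) distribution (scale = 1/lam).
    G = gamma(a=n, scale=1 / lam)
    print(sums.mean(), G.mean())                         # both near n/lam = 2.5
    print(sums.var(),  G.var())                          # both near n/lam^2 = 1.25
    print((sums <= 3.0).mean(), G.cdf(3.0))              # empirical vs exact cdf at 3.0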

3 Joint Distribution
3.1 For Random Variables
The table below summarizes the formulae for calculating some quantities related to joint distributions. However, the table serves only as an outline. When doing actual calculations, one should rely not on the formulae alone but on reasoning and the basic conditional probability concepts developed in the first lecture. The examples below demonstrate some of these skills.

                              for discrete X and Y                                  for continuous X and Y
Probability on a set B        P((X, Y) ∈ B) = Σ_{(x,y)∈B} P(x, y)                   P((X, Y) ∈ B) = ∫∫_B f(x, y) dx dy
Marginals                     P(X = x) = Σ_{all y} P(x, y)                          f_X(x) = ∫_{−∞}^{∞} f(x, y) dy
                              P(Y = y) = Σ_{all x} P(x, y)                          f_Y(y) = ∫_{−∞}^{∞} f(x, y) dx
For independent X, Y          P(x, y) = P(X = x) P(Y = y)                           f(x, y) = f_X(x) f_Y(y)
Expectation of g(X, Y)        E(g(X, Y)) = Σ_{all x} Σ_{all y} g(x, y) P(x, y)      E(g(X, Y)) = ∫∫ g(x, y) f(x, y) dx dy

Table 1: Table for joint distribution formulae

3.1.1 For Independent Variables


Example 10 Let X1, X2, ..., Xn be a collection of independent random variables with cdfs F1, F2, ..., Fn, respectively. The cdf of either the maximum or the minimum of the X's can be found easily, as follows.

Fmax (x) = P (Xmax ≤ x)


= P (X1 ≤ x, X2 ≤ x, ..., Xn ≤ x)
= P (X1 ≤ x)P (X2 ≤ x)...P (Xn ≤ x) since X1 , X2 , ..., Xn are independent
= F1 (x)F2 (x)...Fn (x)

and

Fmin (x) = P (Xmin ≤ x)


= 1 − P (Xmin > x)
= 1 − P (X1 > x, X2 > x, ..., Xn > x)
= 1 − [1 − F1 (x)][1 − F2 (x)]...[1 − Fn (x)].

Example 11 The minimum of independent exponential variables is exponential.

Let X1 , X2 , ..., Xn be independent random variables, and Xi has exponential distribution with
rate λi , i = 1, 2, ..., n. Find the distribution of Xmin .

For i = 1, 2, ..., n, the cdf of Xi is

    Fi(x) = 0 for x < 0, and Fi(x) = 1 − e^{−λi x} for x ≥ 0.

Since the X's are non-negative, so is their minimum. So Xmin has cdf Fmin(x) = 0 for x < 0.
For x ≥ 0,

    Fmin(x) = 1 − e^{−λ1 x} e^{−λ2 x} ··· e^{−λn x} = 1 − e^{−(λ1 + λ2 + ··· + λn)x},

which is the cdf of the exponential distribution with rate λ1 + λ2 + ··· + λn. □
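A quick numpy simulation with arbitrary rates agrees with this result:

    # Minimum of independent exponentials with rates lam_i is exponential with rate sum(lam_i).
    import numpy as np

    rng = np.random.default_rng(1)
    rates = np.array([0.5, 1.0, 2.5])                    # arbitrary rates lambda_1..lambda_3
    reps = 200_000
    samples = rng.exponential(scale=1 / rates, size=(reps, rates.size))
    mins = samples.min(axis=1)

    total = rates.sum()                                  # 4.0
    print(mins.mean(), 1 / total)                        # both near 0.25
    x = 0.3
    print((mins <= x).mean(), 1 - np.exp(-total * x))    # empirical vs exact cdf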

Example 12 Suppose X and Y are independent uniform(0, 1) random variables.

a) P(X² + Y² ≤ 1) = π/4

b) P(X² + Y² ≤ 1 | X + Y ≥ 1) = P(X² + Y² ≤ 1, X + Y ≥ 1) / P(X + Y ≥ 1) = (π/4 − 1/2)/(1/2) = π/2 − 1

c) P(Y ≤ X²) = ∫_0^1 x² dx = (1/3)x³ |_0^1 = 1/3

d) P(|X − Y| ≤ 0.5) = 1 − 1/4 = 3/4

e) P(|X/Y − 1| ≤ 0.5) = P((2/3)X ≤ Y ≤ 2X) = 1 − (1/4 + 1/3) = 5/12

f) P(Y ≥ X | Y ≥ 1/2) = (1/2 − 1/8)/(1/2) = 3/4.

□
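These probabilities can be spot-checked by Monte Carlo; a short sketch (the sample size is arbitrary):

    # Monte Carlo spot-check of parts (a), (b), (d) and (e) of Example 12.
    import numpy as np

    rng = np.random.default_rng(2)
    N = 1_000_000
    X, Y = rng.uniform(size=N), rng.uniform(size=N)

    print((X**2 + Y**2 <= 1).mean(), np.pi / 4)                      # (a)
    print((np.abs(X - Y) <= 0.5).mean(), 3 / 4)                      # (d)
    in_band = (Y >= 2 * X / 3) & (Y <= 2 * X)
    print(in_band.mean(), 5 / 12)                                    # (e)

    # (b): a conditional probability estimated as a ratio of indicator means.
    cond = X + Y >= 1
    print(((X**2 + Y**2 <= 1) & cond).mean() / cond.mean(), np.pi / 2 - 1)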

Example 13 Let X and Y be independent exponentially distributed random variables with parameters λ and µ, respectively. Find P(X < Y).

Since X and Y are independent,

    f(x, y) = (λe^{−λx})(µe^{−µy}) = λµ e^{−λx−µy}.

Then,

    P(X < Y) = ∫∫_{x<y} λµ e^{−λx−µy} dx dy
             = ∫_{x=0}^{∞} λe^{−λx} ( ∫_{y=x}^{∞} µe^{−µy} dy ) dx
             = ∫_{x=0}^{∞} λ e^{−(λ+µ)x} dx
             = λ/(λ + µ).

□
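A simulation with arbitrary rates λ and µ gives the same answer:

    # Check P(X < Y) = lambda/(lambda + mu) for independent exponentials (arbitrary rates).
    import numpy as np

    rng = np.random.default_rng(3)
    lam, mu, N = 1.5, 0.5, 500_000
    X = rng.exponential(scale=1 / lam, size=N)
    Y = rng.exponential(scale=1 / mu, size=N)
    print((X < Y).mean(), lam / (lam + mu))              # both near 0.75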

Exercise 14 Suppose U(1) < U(2) < ... < U(5) are the order statistics of 5 independent uniform(0, 1) variables U1, U2, ..., U5, so U(i) is the ith smallest of U1, U2, ..., U5. (See Pitman [1993], p. 352, Example 3.)

a) Find the joint density of U(2) and U(4) .


b) Find P (U(2) > 1/4 and U(4) > 1/2).

Independent Normal Variables

Theorem 15 Linear combinations of independent normal variables are always normally distributed. In addition, if X and Y are independent with normal(λ, σ²) and normal(µ, τ²) distributions, then X + Y has a normal(λ + µ, σ² + τ²) distribution.

The proof of the theorem makes use of the rotational symmetry of the joint distribution of independent standard normal random variables X and Y. See Pitman [1993].

Example 16 For σ = 1, 2, 3, suppose Xσ has a normal(0, σ²) distribution, and these three random variables are independent.

a) Find P(X1 + X2 + X3 < 4).

Let S = X1 + X2 + X3. Then S has a normal(0, 1² + 2² + 3²) = normal(0, 14) distribution, and with Z = S/√14 the standardized version of S,

    P(S < 4) = P(Z < (4 − 0)/√(1² + 2² + 3²)) = P(Z < 4/√14) ≈ 0.857.

b) Find P(4X1 − 10 < X2 + X3).

P(4X1 − 10 < X2 + X3) = P(4X1 − X2 − X3 < 10) = P(L < 10), where L = 4X1 − X2 − X3. Then L has a normal distribution with mean 0 and variance 4²×1² + (−1)²×2² + (−1)²×3² = 29, so

    P(L < 10) = P(Z < 10/√29) ≈ 0.968. □
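The two normal probabilities can be computed directly from the standard normal cdf, for instance with scipy:

    # Exact computation of the two probabilities in Example 16 via the standard normal cdf.
    from math import sqrt
    from scipy.stats import norm

    # (a) S = X1 + X2 + X3 ~ normal(0, 14)
    print(norm.cdf(4 / sqrt(14)))        # ~0.857

    # (b) L = 4*X1 - X2 - X3 ~ normal(0, 29)
    print(norm.cdf(10 / sqrt(29)))       # ~0.968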

Remark 17 If X and Y are independent with density functions f_X(x) and f_Y(y), then the density function f_{X+Y}(z) of Z = X + Y is

    f_{X+Y}(z) = ∫_{−∞}^{∞} f_X(x) f_Y(z − x) dx.

This is the convolution formula.
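The convolution formula can be checked numerically on a grid; the sketch below convolves two standard normal densities and compares the result with the normal(0, 2) density given by Theorem 15 (grid limits, step size, and the evaluation point are arbitrary).

    # Numerical check of the convolution formula: the N(0,1) * N(0,1) convolution is the N(0,2) density.
    import numpy as np
    from scipy.stats import norm

    dx = 0.01
    x = np.arange(-10, 10, dx)                      # truncation of the infinite integral
    fX = norm.pdf(x)                                # density of X ~ N(0,1)

    # f_{X+Y}(z) ~= sum_x fX(x) fY(z - x) dx, evaluated at z = 1.0 here.
    z = 1.0
    approx = np.sum(fX * norm.pdf(z - x)) * dx
    print(approx, norm.pdf(z, scale=np.sqrt(2)))    # the two values agree closely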

Exercise 18 Suppose that X and Y are independent and normally distributed with mean 0 and variance 1. Find the distribution of X/Y. (See discussion and solution in Pitman [1993].)

χ², t, and F Distributions

Definition 19 If Z1, ..., Zn are independent standard normal random variables, the distribution of U = Σ_{i=1}^{n} Z_i² is called the chi-square distribution with n degrees of freedom, denoted χ²_n.

It can be shown that χ²_1 is a special case of the gamma distribution with parameters 1/2 and 1/2. In Example 9, we saw that the sum of independent gamma random variables sharing the same value of λ follows a gamma distribution. Thus, χ²_n is a gamma distribution with α = n/2 and λ = 1/2.
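This identification is easy to confirm numerically, e.g. with scipy (which parameterizes the gamma by shape a and scale = 1/λ, so λ = 1/2 corresponds to scale = 2):

    # chi2 with n degrees of freedom equals gamma with shape n/2 and rate 1/2 (scale = 2).
    from scipy.stats import chi2, gamma

    n = 7                                                          # arbitrary degrees of freedom
    for x in (0.5, 2.0, 5.0, 10.0):
        print(chi2.cdf(x, df=n), gamma.cdf(x, a=n / 2, scale=2))   # identical values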

Definition 20 If Z ∼ N(0, 1) and U ∼ χ²_n, and Z and U are independent, then the distribution of Z/√(U/n) is called the t distribution with n degrees of freedom.

The density function of the t distribution with n degrees of freedom is

    f(t) = (Γ[(n + 1)/2] / (√(nπ) Γ(n/2))) (1 + t²/n)^{−(n+1)/2},

which can be obtained by a method similar to that of Exercise 18 above for the density of a quotient of two independent variables.

The shape of the density function for the t distribution is very similar to that of the normal distribution, except that the t density has heavier tails. However, as n increases the density curve for the t distribution gets closer and closer to that of the normal. When n = 30, the density curves for the t and the normal distribution are almost indistinguishable.
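The heavier tails, and the convergence to the normal as n grows, can be seen numerically; for instance, with scipy:

    # Tail comparison: the t distribution has heavier tails, approaching the normal as n grows.
    from scipy.stats import t, norm

    x = 2.5
    print(norm.sf(x))                     # P(Z > 2.5) ~ 0.0062
    for n in (2, 5, 30, 100):
        print(n, t.sf(x, df=n))           # decreases toward the normal tail as n increases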

Theorem 21 If X1, X2, ..., Xn are independent N(µ, σ²) variables, then X̄ and S² are independent, X̄ is N(µ, σ²/n), and (n − 1)S²/σ² is χ²_{n−1}, where X̄ = (1/n) Σ_{i=1}^{n} Xi and S² = (1/(n − 1)) Σ_{i=1}^{n} (Xi − X̄)².

Proof. (From Rice [1995]) The proof is built on the fact that X̄ and the vector of random variables (X1 − X̄, X2 − X̄, ..., Xn − X̄) are independent. We will not prove this fact here; interested readers are referred to Rice [1995] for a treatment using the moment-generating function.

Since S² is a function of (X1 − X̄, X2 − X̄, ..., Xn − X̄), and functions of independent vectors are also independent, we conclude that X̄ and S² are independent.

Since X1, X2, ..., Xn are independent N(µ, σ²), an extension of Theorem 15 shows that Σ_{i=1}^{n} Xi is N(nµ, nσ²). Thus, dividing by the constant n makes X̄ = (1/n) Σ_{i=1}^{n} Xi a normal variable with mean E(X̄) = nµ/n = µ and variance Var(X̄) = nσ²/n² = σ²/n. In addition, ((X̄ − µ)/(σ/√n))² follows χ²_1 by Definition 19.

To see that (n − 1)S 2 /σ 2 is χ2n−1 , note that

    (1/σ²) Σ_{i=1}^{n} (Xi − µ)² = Σ_{i=1}^{n} ((Xi − µ)/σ)² ∼ χ²_n,   since (Xi − µ)/σ ∼ N(0, 1),

and

    (1/σ²) Σ_{i=1}^{n} (Xi − µ)² = (1/σ²) Σ_{i=1}^{n} [(Xi − X̄) + (X̄ − µ)]²
        = (1/σ²) { Σ_{i=1}^{n} (Xi − X̄)² + 2(X̄ − µ) Σ_{i=1}^{n} (Xi − X̄) + n(X̄ − µ)² }
        = (1/σ²) Σ_{i=1}^{n} (Xi − X̄)² + ((X̄ − µ)/(σ/√n))²,   (4)

where the cross term vanishes because Σ_{i=1}^{n} (Xi − X̄) = 0.

Let W = (1/σ²) Σ_{i=1}^{n} (Xi − µ)², U = (1/σ²) Σ_{i=1}^{n} (Xi − X̄)², and V = ((X̄ − µ)/(σ/√n))²; then (4) says that W = U + V. Since U is a function of (X1 − X̄, X2 − X̄, ..., Xn − X̄) and V is a function of X̄, U and V are independent by the fact mentioned at the beginning of the proof.

So far, we have shown that W ∼ χ²_n and V ∼ χ²_1. Let M_W(t) be the mgf of W, and so on. Since W = U + V with U and V independent, M_W(t) = M_U(t) M_V(t), so

    M_U(t) = M_W(t)/M_V(t) = (1 − 2t)^{−n/2} / (1 − 2t)^{−1/2} = (1 − 2t)^{−(n−1)/2},

which is the mgf of a random variable with a χ²_{n−1} distribution. Since U = (n − 1)S²/σ², this completes the proof. □
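A simulation is consistent with the theorem; the sketch below (sample size, µ, and σ arbitrary) checks the first two moments of (n − 1)S²/σ² against those of χ²_{n−1}, and the near-zero correlation between X̄ and S².

    # Simulation check that (n-1) S^2 / sigma^2 behaves like chi-square with n-1 df.
    import numpy as np

    rng = np.random.default_rng(4)
    n, mu, sigma, reps = 10, 3.0, 2.0, 100_000            # arbitrary illustrative values
    X = rng.normal(mu, sigma, size=(reps, n))
    S2 = X.var(axis=1, ddof=1)                            # sample variance with divisor n-1
    U = (n - 1) * S2 / sigma**2

    print(U.mean(), n - 1)                                # chi2(n-1) has mean n-1
    print(U.var(), 2 * (n - 1))                           # and variance 2(n-1)

    # X-bar and S^2 should be (approximately) uncorrelated for normal data.
    print(np.corrcoef(X.mean(axis=1), S2)[0, 1])          # close to 0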

Definition 22 Let U and V be independent chi-square random variables with m and n degrees of freedom, respectively. The distribution of

    W = (U/m) / (V/n)

is called the F distribution with m and n degrees of freedom and is denoted by F_{m,n}.

3.1.2 For Dependent Variables

Example 23 (Gamma and uniform) Suppose X has a gamma(2, λ) distribution, and that given X = x, Y has a uniform(0, x) distribution. Find the joint density of X and Y.

By the definition of the gamma distribution,

    f_X(x) = λ²x e^{−λx} for x > 0, and f_X(x) = 0 for x ≤ 0,

and from the uniform(0, x) distribution of Y given X = x,

    f_Y(y | X = x) = 1/x for 0 < y < x, and 0 otherwise.

So by the multiplication rule for densities,

    f(x, y) = f_X(x) f_Y(y | X = x) = λ² e^{−λx} for 0 < y < x, and 0 otherwise.
Example 24 Find the marginal density of Y.

Integrating out x in the joint density gives the marginal density of Y: for y > 0,

    f_Y(y) = ∫_0^∞ f(x, y) dx = ∫_y^∞ λ² e^{−λx} dx = λ e^{−λy}.

The density is of course 0 for y ≤ 0. That is to say, Y has an exponential(λ) distribution.
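Simulating the two-stage model directly reproduces this exponential marginal; a short numpy sketch (λ arbitrary):

    # Simulate X ~ gamma(2, lam), then Y | X = x ~ uniform(0, x); Y should be exponential(lam).
    import numpy as np

    rng = np.random.default_rng(5)
    lam, reps = 1.5, 300_000                              # arbitrary rate
    X = rng.gamma(shape=2, scale=1 / lam, size=reps)
    Y = rng.uniform(0, X)                                 # uniform on (0, x) given X = x

    print(Y.mean(), 1 / lam)                              # exponential(lam) mean
    y = 0.8
    print((Y <= y).mean(), 1 - np.exp(-lam * y))          # empirical vs exponential cdf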

                              Discrete Case                                           Continuous Case
Multiplication rule           P(X = x, Y = y) = P(X = x) P(Y = y | X = x)             f(x, y) = f_X(x) f_Y(y | X = x)
Cond. dist. of (Y | X = x)    P(Y ∈ B | X = x) = Σ_{y∈B} P(Y = y | X = x)             P(Y ∈ B | X = x) = ∫_B f_Y(y | X = x) dy
Average cond. expectation     E(Y) = Σ_{all x} E(Y | X = x) P(X = x)                  E(Y) = ∫ E(Y | X = x) f_X(x) dx

Table 2: Table for conditioning formulae

4 Conditioning by a Random Vector


4.1 Discrete Case
If Y and Z are discrete random vectors possibly of different dimensions, we want to study the
conditional probability structure of Y given that Z has taken on a particular value z.
Definition 25 Define the conditional probability mass function p(· | z) of Y given Z = z by

    p(y | z) = P[Y = y | Z = z] = p(y, z)/p_Z(z),   (5)

where p and p_Z are the probability mass functions of (Y, Z) and Z. The conditional probability mass function p(· | z) is defined only for values of z such that p_Z(z) > 0. With this definition it is clear that p(· | z) is the mass function of a probability distribution because

    Σ_y p(y | z) = Σ_y p(y, z)/p_Z(z) = p_Z(z)/p_Z(z) = 1.

This probability distribution is called the conditional distribution of Y given that Z = z.


Example 26 Let Y = (Y1, ···, Yn), where the Yi are the indicators of a set of n Bernoulli trials with success probability p. Let Z = Σ_{i=1}^{n} Yi, the total number of successes. Then Z has a binomial, B(n, p), distribution and, for y with Σ_i y_i = z,

    p(y | z) = P[Y = y, Z = z] / P[Z = z] = p^z (1 − p)^{n−z} / ( C(n, z) p^z (1 − p)^{n−z} ) = 1/C(n, z),

where C(n, z) denotes the binomial coefficient. Thus, if we are told we obtained z successes in n binomial trials, these successes are as likely to occur on one set of trials as on any other. □

4.1.1 Bayes’ Rule

Let q(z | y) denote the conditional probability mass function of Z given Y = y. Then, since p(y, z) = q(z | y) p_Y(y) and p_Z(z) = Σ_y q(z | y) p_Y(y),

    p(y | z) = q(z | y) p_Y(y) / Σ_y q(z | y) p_Y(y)   (Bayes' Rule)

whenever the denominator of the right-hand side is positive.

4.1.2 Conditional Expectation for Discrete Variables


Definition 27 Suppose that Y is a random variable with E(|Y|) < ∞. Define the conditional expectation of Y given Z = z, written E(Y | Z = z), by

    E(Y | Z = z) = Σ_y y p(y | z).

Note that if p_Z(z) > 0,

    E(|Y| | Z = z) = Σ_y |y| p_Y(y | z) ≤ Σ_y |y| p_Y(y)/p_Z(z) = E(|Y|)/p_Z(z).

The inequality holds because {Y = y, Z = z} ⊆ {Y = y}. Thus, when p_Z(z) > 0, the conditional expected value of Y is finite whenever the expected value is finite.

Definition 28 Let g(z) = E(Y | Z = z). The random variable g(Z) is written E(Y | Z) and is
called the conditional expectation of Y given Z.

Example 29 As an example we calculate E(Y1 | Z), where Y1 and Z are given in Example 26. We have

    E(Y1 | Z = i) = P[Y1 = 1 | Z = i] = C(n − 1, i − 1)/C(n, i) = i/n.

The first of these equalities holds because Y1 is an indicator. The second follows from the equation in Example 26 because C(n − 1, i − 1) is just the number of ways i successes can occur in n Bernoulli trials with the first trial being a success. Therefore,

    E(Y1 | Z) = Z/n.
Exercise 30 Let X1 and X2 be the numbers on two independent fair-die rolls. Let X be the minimum and Y the maximum of X1 and X2. Calculate E(Y | X = x) and E(X | Y = y).

Exercise 31 Repeat the last exercise with X1 and X2 two draws without replacement from {1, 2, ..., n}.

Properties of Conditional Expected Values In the context of the previous lecture, the conditional distribution of a random vector Y given Z = z corresponds to a single probability function Pz on (Ω, A). Specifically, define for A ∈ A,

    Pz(A) = P(A | [Z = z]) if p_Z(z) > 0.

This Pz is just the conditional probability function on (Ω, A) mentioned before. Now the conditional distribution of Y given Z = z is the same as the distribution of Y if Pz is the probability function on (Ω, A). Therefore, the conditional expectation is an ordinary expectation with respect to the probability function Pz.

Properties. It follows that all the properties of the expectation given before hold for the conditional expectation given Z = z. Thus, for any real-valued function r(Y) with E|r(Y)| < ∞,

1. E(r(Y) | Z = z) = Σ_y r(y) p(y | z) identically in z.

2. E(αY1 + βY2 | Z) = αE(Y1 | Z) + βE(Y2 | Z) for any Y1, Y2 such that E(|Y1|), E(|Y2|) are finite, since E(αY1 + βY2 | Z = z) = αE(Y1 | Z = z) + βE(Y2 | Z = z) holds for all z.

3. E(Y | Z) = E(Y) if Y and Z are independent.

4. E(h(Z) | Z) = h(Z).

5. E(q(Y, Z) | Z = z) = E(q(Y, z) | Z = z) (substitution theorem for conditional expectation).

6. E(E(Y | Z)) = E(Y).

Property 6 is true because

    E(E(Y | Z)) = Σ_z p_Z(z) [ Σ_y y p(y | z) ] = Σ_{y,z} y p(y | z) p_Z(z) = Σ_{y,z} y p(y, z) = E(Y).

The interchange of summation used is valid because the finiteness of E(|Y|) implies that all sums converge absolutely.

As an illustration, we check E(E(Y | Z)) = E(Y) for E(Y1 | Z) = Z/n given before. In this case,

    E(E(Y1 | Z)) = E(Z/n) = np/n = p = E(Y1).

4.2 Continuous Case


Definition 32 Suppose (Y, Z) is a continuous random vector having coordinates that are themselves
vectors and having density function p(y, z). In analogy to (5), the conditional density function of Y
given Z = z is

    p(y | z) = p(y, z)/p_Z(z)   if p_Z(z) > 0.

4.2.1 Bayes’ Rule

Bayes' rule for the continuous case is given by

    p(y | z) = p_Y(y) q(z | y) / ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} p_Y(t) q(z | t) dt_1 ··· dt_n,

where q is the conditional density of Z given Y = y.

Remark 33 If Y and Z are independent, the conditional distributions equal the marginals as in the
discrete case.
Remark 34 If E(|Y|) < ∞, we define the conditional expectation of Y given Z = z, in analogy to the discrete case, as the expected value of a random variable with density p(y | z). More generally, if E(|r(Y)|) < ∞, the conditional expectation of r(Y) given Z = z can be obtained from

    E(r(Y) | Z = z) = ∫_{−∞}^{∞} r(y) p(y | z) dy.

5 Transformation of a Random Vector


We have encountered the change of variable formula for a single random variable. In this section, we will see a more general case: the transformation of a random vector.
Let h = (h1, ..., hk)^T, where each hi is a real-valued function on R^k. Thus, h is a transformation from R^k to R^k. Recall that the Jacobian J_h(t) of h evaluated at t = (t1, ···, tk)^T is by definition the determinant of the k × k matrix whose (i, j) entry is ∂h_j(t)/∂t_i:

    J_h(t) = det[ ∂h_j(t)/∂t_i ]_{i,j=1,...,k}.

Theorem 35 Let X be continuous and let S be an open subset of R^k such that P(X ∈ S) = 1. Suppose g = (g1, ···, gk)^T is a transformation from S to R^k such that g and S satisfy the conditions:

1. g is one-to-one on S.

2. g has continuous first partial derivatives on S.

3. The Jacobian of g does not vanish on S.

Then the density of Y = g(X) is given by

    p_Y(y) = p_X(g^{−1}(y)) |J_{g^{−1}}(y)|   (6)

for y ∈ g(S).
Example 36 Suppose X = (X1, X2)^T, where X1 and X2 are independent with N(0, 1) and N(0, 4) distributions, respectively. What is the joint distribution of Y1 = X1 + X2 and Y2 = X1 − X2? Here,

    p_X(x1, x2) = (1/(4π)) exp( −(1/2)[ x1² + x2²/4 ] ).

In this case, S = R². Also note that g1(x) = x1 + x2, g2(x) = x1 − x2, g1^{−1}(y) = (1/2)(y1 + y2), g2^{−1}(y) = (1/2)(y1 − y2), that the range g(S) is R², and that

    J_{g^{−1}}(y) = det[ [1/2, 1/2], [1/2, −1/2] ] = −1/2.

Upon substituting these quantities in (6), we obtain

    p_Y(y1, y2) = (1/2) p_X( (1/2)(y1 + y2), (1/2)(y1 − y2) )
                = (1/(8π)) exp( −(1/2)[ (1/4)(y1 + y2)² + (1/16)(y1 − y2)² ] )
                = (1/(8π)) exp( −(1/32)[ 5y1² + 5y2² + 6y1y2 ] ).

This is an example of a bivariate normal density.
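As a sanity check, Y1 and Y2 should each have variance 1 + 4 = 5 and covariance 1 − 4 = −3, which is exactly the covariance structure encoded by the exponent above; a quick simulation agrees.

    # Simulation check of Example 36: Y1 = X1 + X2, Y2 = X1 - X2 with X1 ~ N(0,1), X2 ~ N(0,4).
    import numpy as np

    rng = np.random.default_rng(6)
    N = 500_000
    X1 = rng.normal(0, 1, size=N)
    X2 = rng.normal(0, 2, size=N)      # standard deviation 2, variance 4
    Y1, Y2 = X1 + X2, X1 - X2

    print(np.cov(Y1, Y2))              # approximately [[5, -3], [-3, 5]], matching the density above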

Gamma and Beta Distributions A random variable X has a Beta(r, s) distribution if it has density

    b_{r,s}(x) = x^{r−1}(1 − x)^{s−1} / B(r, s),   for 0 < x < 1,

where B(r, s) = Γ(r)Γ(s)/Γ(r + s) is the beta function and Γ(x) = ∫_0^∞ u^{x−1} e^{−u} du as in the gamma distribution.

Example 37 If X1 and X2 are independent random variables with gamma(p, λ) and gamma(q, λ) distributions, respectively, then Y1 = X1 + X2 and Y2 = X1/(X1 + X2) are independent and have, respectively, gamma(p + q, λ) and Beta(p, q) distributions.

If λ = 1, the joint density of X1 and X2 is

    p(x1, x2) = e^{−(x1+x2)} x1^{p−1} x2^{q−1} / (Γ(p)Γ(q)),   for x1 > 0, x2 > 0.

Let

    (y1, y2)^T = g(x1, x2) = ( x1 + x2, x1/(x1 + x2) )^T.

Then g is one-to-one on S = {(x1, x2)^T : x1 > 0, x2 > 0} and its range is S1 = {(y1, y2)^T : y1 > 0, 0 < y2 < 1}. We note that on S1

    g^{−1}(y1, y2) = (y1 y2, y1 − y1 y2)^T.   (7)

Therefore,

    J_{g^{−1}}(y1, y2) = det[ [y2, 1 − y2], [y1, −y1] ] = −y1.   (8)

Substituting (7) and (8) into (6), the density of (Y1, Y2)^T = g(X1, X2) is

    p_Y(y1, y2) = e^{−y1} (y1 y2)^{p−1} (y1 − y1 y2)^{q−1} y1 / (Γ(p)Γ(q)),   for y1 > 0, 0 < y2 < 1.   (9)

Simplifying (9) gives

    p_Y(y1, y2) = g_{p+q,1}(y1) b_{p,q}(y2).

Thus, the statement is proved for λ = 1. If λ ≠ 1, define X1′ = λX1 and X2′ = λX2. Now X1′ and X2′ are independent gamma(p, 1) and gamma(q, 1) variables, respectively. Because X1′ + X2′ = λ(X1 + X2) and X1′(X1′ + X2′)^{−1} = X1(X1 + X2)^{−1}, the statement follows. □
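A simulation with arbitrary values of p, q, and λ is consistent with the statement:

    # Simulation check of Example 37: X1 + X2 is gamma(p+q, lam), X1/(X1+X2) is Beta(p, q),
    # and the two are (approximately) uncorrelated.
    import numpy as np
    from scipy.stats import gamma, beta

    rng = np.random.default_rng(7)
    p, q, lam, reps = 2.0, 3.0, 1.5, 200_000              # arbitrary illustrative parameters
    X1 = rng.gamma(shape=p, scale=1 / lam, size=reps)
    X2 = rng.gamma(shape=q, scale=1 / lam, size=reps)
    Y1, Y2 = X1 + X2, X1 / (X1 + X2)

    print(Y1.mean(), gamma(a=p + q, scale=1 / lam).mean())   # (p+q)/lam
    print(Y2.mean(), beta(p, q).mean())                      # p/(p+q)
    print(np.corrcoef(Y1, Y2)[0, 1])                         # close to 0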

6 Markov Chain

6.1 Discrete case


In this section, we consider a stochastic process {Xn, n = 0, 1, 2, ...} that takes on a finite or countable number of possible values. Unless otherwise mentioned, this set of possible values of the process will be denoted by the set of non-negative integers {0, 1, 2, ...}. If Xn = i, then the process is said to be in state i at time n. We suppose that whenever the process is in state i, there is a fixed probability Pij that it will next be in state j. That is, we suppose that

P {Xn+1 = j|Xn = i, Xn−1 = in−1 , ..., X1 = i1 , X0 = i0 } = Pij


for all states i0 , i1 , ..., in−1 , i, j and all n ≥ 0. Such a stochastic process is known as a Markov
Chain.

Example 38 (A random walk model) A Markov chain whose state space is given by the integers i = 0, ±1, ±2, ... is said to be a random walk if, for some number 0 < p < 1,

    Pi,i+1 = p = 1 − Pi,i−1,   i = 0, ±1, ±2, ...

The preceding Markov chain is called a random walk because we may think of it as a model for an individual walking on a straight line who at each point of time either takes one step to the right with probability p or one step to the left with probability 1 − p.

Chapman-Kolmogorov Equations
We have defined the one-step transition probability Pij. We now define the n-step transition probabilities P^n_{ij} to be the probability that a process in state i will be in state j after n additional transitions. That is,

    P^n_{ij} = P{X_{n+k} = j | X_k = i},   n ≥ 0, i, j ≥ 0.

The Chapman-Kolmogorov equations provide a method for computing these n-step transition probabilities. These equations are

    P^{n+m}_{ij} = Σ_{k=0}^{∞} P^n_{ik} P^m_{kj}   for all n, m ≥ 0, all i, j,

and are most easily understood by noting that P^n_{ik} P^m_{kj} represents the probability that, starting in i, the process will go to state j in n + m transitions through a path which takes it into state k at the nth transition. Hence, summing over all intermediate states k yields the probability that the process will be in state j after n + m transitions. Formally, we have

    P^{n+m}_{ij} = P{X_{n+m} = j | X_0 = i}
                 = Σ_{k=0}^{∞} P{X_{n+m} = j, X_n = k | X_0 = i}
                 = Σ_{k=0}^{∞} P{X_{n+m} = j | X_n = k, X_0 = i} P{X_n = k | X_0 = i}
                 = Σ_{k=0}^{∞} P^n_{ik} P^m_{kj}.
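For a chain with finitely many states, the n-step transition probabilities form the nth power of the one-step transition matrix, so the Chapman-Kolmogorov equations reduce to P^(n+m) = P^n P^m; a small numpy sketch with a made-up two-state chain:

    # Chapman-Kolmogorov for a finite chain: P^(n+m) = P^n @ P^m (matrix multiplication).
    import numpy as np

    P = np.array([[0.9, 0.1],      # a made-up two-state one-step transition matrix
                  [0.4, 0.6]])

    n, m = 3, 5
    Pn  = np.linalg.matrix_power(P, n)
    Pm  = np.linalg.matrix_power(P, m)
    Pnm = np.linalg.matrix_power(P, n + m)

    print(np.allclose(Pnm, Pn @ Pm))        # True
    print(Pnm[0, 1], (Pn @ Pm)[0, 1])       # the (i=0, j=1) entry computed both ways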

6.2 Continuous case
Suppose we have a continuous-time stochastic process {X(t), t ≥ 0} taking on values in the set of non-negative integers. In analogy with the definition of a discrete-time Markov chain, we say that the process {X(t), t ≥ 0} is a continuous-time Markov chain if, for all s, t ≥ 0, non-negative integers i, j, and functions x(u), 0 ≤ u < s,

    P{X(t + s) = j | X(s) = i, X(u) = x(u), 0 ≤ u < s} = P{X(t + s) = j | X(s) = i}.

In other words, a continuous-time Markov chain is a stochastic process having the Markovian property that the conditional distribution of the future X(t + s), given the present X(s) and the past X(u), 0 ≤ u < s, depends only on the present and is independent of the past. If, in addition,

    P{X(t + s) = j | X(s) = i}

is independent of s, then the continuous-time Markov chain is said to have stationary or homogeneous transition probabilities.

Example 39 Suppose the service in a particular barber shop consists of two steps: a customer upon arrival goes initially to chair 1, where his/her hair will be washed by an assistant; after this is done, the customer moves on to chair 2, where his/her hair will be cut by the stylist. The service times at the two steps are assumed to be independent random variables that are exponentially distributed with respective rates µ1 and µ2. Suppose that potential customers arrive in accordance with a Poisson process having rate λ, and that a potential customer will enter the system only if both chairs are empty.

This problem can be modeled as a continuous-time Markov chain. Since a potential customer will enter the shop only if there are no other customers in the shop, there will always be either 0 or 1 customers in the shop. If there is 1 customer in the shop, then we need to know which chair the customer is in. Therefore, an appropriate state space consists of three states: 0 = shop is empty, 1 = chair 1 is occupied, and 2 = chair 2 is occupied.

7 Delta Method
(The following is taken from Rice [1995].)
Suppose that we know the expectation and the variance of a random variable X, but not the
entire distribution and that we are interested in the mean and variance of Y = g(X) for some fixed
function g. For example, we might be able to measure X and determine its mean and variance, but
really be interested in Y, which is related to X in a known way. We might wish to know Var(Y), at least approximately, in order to assess the accuracy of the indirect measurement process. From the results given in this section, we cannot in general find E(Y) = µY and Var(Y) = σ²_Y from E(X) = µX and Var(X) = σ²_X, unless the function g is linear. However, if g is nearly linear in a range in which X has high probability, it can be approximated by a linear function and approximate moments of Y can be found.
In proceeding as just described, we follow a tack often taken in applied mathematics: When
confronted with a nonlinear problem that we cannot solve, we linearize. In probability and statistics,
this method is called propagation of error, or the δ method. Linearization is carried out through
a Taylor series expansion of g about µX . To the first order,

    Y = g(X) ≈ g(µX) + (X − µX) g′(µX).

We have expressed Y as approximately equal to a linear function of X. Recalling that if U = a + bV, then E(U) = a + bE(V) and Var(U) = b² Var(V), we find

    µY ≈ g(µX),
    σ²_Y ≈ σ²_X [g′(µX)]².

We know that in general E(Y) ≠ g(E(X)), even though the first-order approximation suggests this. In fact, we can carry out the Taylor series expansion to the second order to get an improved approximation of µY:

    Y = g(X) ≈ g(µX) + (X − µX) g′(µX) + (1/2)(X − µX)² g″(µX).

Taking the expectation of the right-hand side, we have, since E(X − µX) = 0,

    E(Y) ≈ g(µX) + (1/2) σ²_X g″(µX).
How good such approximations are depends on how nonlinear g is in a neighborhood of µX and
on the size of σX . From Chebyshev’s inequality we know that X is unlikely to be many standard
deviations away from µX ; if g can be reasonably well approximated in this range by a linear function,
the approximations for the moments will be reasonable as well.

Example 40 The relation of voltage, current, and resistance is V = IR. Suppose that the voltage is held constant at a value V0 across a medium whose resistance fluctuates randomly as a result, say, of random fluctuations at the molecular level. The current therefore also varies randomly. Suppose that it can be determined experimentally to have mean µI and variance σ²_I. We wish to find the mean and variance of the resistance, R, and since we do not know the distribution of I, we must resort to an approximation. We have

    R = g(I) = V0/I,
    g′(µI) = −V0/µ²_I,
    g″(µI) = 2V0/µ³_I.

Thus,

    µR ≈ V0/µI + (V0/µ³_I) σ²_I,
    σ²_R ≈ (V0²/µ⁴_I) σ²_I.

We see that the variability of R depends on both the mean level of I and the variance of I. This makes
sense, since if I is quite small, small variations in I will result in large variations in R = V0 /I,
whereas if I is large, small variations will not affect R as much. The second-order correction factor
for µR also depends on µI and is large if µI is small. In fact, when I is near zero, the function
g(I) = V0 /I is quite nonlinear, and the linearization is not a good approximation.
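The quality of these approximations can be probed by simulation; the sketch below assumes, purely for illustration, that I is normal with µI = 1.0, σI = 0.05, and that V0 = 10.

    # Delta-method approximations for R = V0 / I versus a Monte Carlo estimate.
    # Purely illustrative assumption: I ~ N(mu_I, sigma_I^2) with mu_I = 1.0, sigma_I = 0.05.
    import numpy as np

    rng = np.random.default_rng(8)
    V0, mu_I, sigma_I, N = 10.0, 1.0, 0.05, 1_000_000
    I = rng.normal(mu_I, sigma_I, size=N)
    R = V0 / I

    mu_R_approx  = V0 / mu_I + (V0 / mu_I**3) * sigma_I**2       # second-order mean approximation
    var_R_approx = (V0**2 / mu_I**4) * sigma_I**2                # first-order variance approximation

    print(R.mean(), mu_R_approx)       # both near 10.025
    print(R.var(),  var_R_approx)      # both near 0.25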

References
[1] Peter J. Bickel and Kjell A. Doksum (2001). Mathematical Statistics: Basic Ideas and Selected Topics, Vol. I, 2nd Edition. Prentice Hall.

[2] Geoffrey Grimmett and David Stirzaker (2002). Probability and Random Processes, 3rd Edition. Oxford University Press.

[3] Jim Pitman (1993). Probability. Springer-Verlag New York, Inc.

[4] John A. Rice (1995). Mathematical Statistics and Data Analysis, 2nd Edition. Duxbury Press.

[5] Sheldon M. Ross (2003). Introduction to Probability Models. Academic Press.
