Understanding Normal and Gamma Distributions
In the case where µ = 0 and σ = 1, the distribution is called the standard normal distribution. It
can be shown that if X has a normal(µ, σ²) distribution, then E(X) = µ and Var(X) = σ².
3. X = σZ + µ has a normal(µ, σ²) distribution, where Z is a standard normal, i.e., normal(0, 1),
variable and σ ≥ 0.
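As a quick numerical check of property 3, here is a minimal Python sketch (assuming numpy is available; the values of µ and σ below are arbitrary illustrations):

import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 3.0                    # illustrative parameter values
z = rng.standard_normal(1_000_000)      # Z ~ normal(0, 1)
x = sigma * z + mu                      # X = sigma*Z + mu should be normal(mu, sigma^2)
print(x.mean(), x.var())                # close to mu = 2 and sigma^2 = 9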
Remark 2 Note that the expectation, and therefore the moment-generating function, might not exist
for all values of t.
Remark 3 If X1 , · · · , Xn are independent random variables with moment generating functions
MX1 , · · · , MXn , then X1 + · · · + Xn has moment generating function given by
M_{X_1 + ··· + X_n}(t) = ∏_{i=1}^{n} M_{X_i}(t).    (2)
Theorem 4 If the mgf exists for t in an open interval containing zero, it uniquely determines the
probability distribution.
Theorem 5 If the mgf exists in an open interval containing zero, then M^{(r)}(0) = E(X^r), where
M^{(r)}(0) is the rth derivative of M at 0.
The advantage of Theorem 5 is that when a moment of a variable is difficult to calculate directly
(the calculation involves integration), we can instead differentiate the mgf to obtain the same result,
and differentiation is purely mechanical.
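To illustrate Theorem 5, here is a small symbolic sketch (assuming sympy is available) that recovers the first two moments of an exponential(λ) variable from its mgf M(t) = λ/(λ − t):

import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)
M = lam / (lam - t)                       # mgf of the exponential(lambda) distribution
EX = sp.diff(M, t, 1).subs(t, 0)          # first moment M'(0)
EX2 = sp.diff(M, t, 2).subs(t, 0)         # second moment M''(0)
print(sp.simplify(EX), sp.simplify(EX2))  # 1/lambda and 2/lambda**2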
Example 6 (Gamma Distribution) The gamma(α, λ) density function depends on two parameters,
α > 0 and λ > 0, and has density function
g_{α,λ}(t) = (λ^α / Γ(α)) t^{α−1} e^{−λt} for t ≥ 0, and g_{α,λ}(t) = 0 for t < 0,

where Γ(x) = ∫_0^∞ u^{x−1} e^{−u} du for x > 0.
Remark 7 It follows by integration by parts that, for all p > 0, Γ(p + 1) = pΓ(p), and that Γ(k) = (k − 1)!
for positive integers k.
Remark 8 If α = 1, the gamma density coincides with the exponential density. The parameter α
is called a shape parameter for the gamma density, and λ is called a scale parameter. Varying α
changes the shape of the density, whereas varying λ changes the scale of the density.
We will find E(X) and Var(X) for a gamma variable X. The mgf of the gamma distribution is

M(t) = ∫_0^∞ e^{tx} (λ^α / Γ(α)) x^{α−1} e^{−λx} dx
     = (λ^α / Γ(α)) ∫_0^∞ x^{α−1} e^{(t−λ)x} dx
     = (λ^α / Γ(α)) · (Γ(α) / (λ − t)^α)
       (since ∫_0^∞ x^{α−1} e^{(t−λ)x} dx converges for t < λ and can be calculated
        by relating it to the gamma density with parameters α and λ − t)
     = (λ / (λ − t))^α.    (3)
Therefore,

EX = M^{(1)}(0) = α/λ,
EX² = M^{(2)}(0) = α(α + 1)/λ²,

and

Var(X) = EX² − [EX]²
       = α(α + 1)/λ² − α²/λ²
       = α/λ².
□
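A simulation sketch of these results (assuming numpy; note that numpy parameterizes the gamma distribution by shape α and scale 1/λ, and the parameter values below are arbitrary):

import numpy as np

rng = np.random.default_rng(1)
alpha, lam = 3.0, 2.0                                        # illustrative shape and rate
x = rng.gamma(shape=alpha, scale=1.0 / lam, size=1_000_000)  # gamma(alpha, lambda) samples
print(x.mean(), x.var())                                     # close to alpha/lam = 1.5 and alpha/lam**2 = 0.75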
3 Joint Distribution
3.1 For Random Variables
The table below summarizes the formulae for calculating some quantities related to joint distributions.
However, the table serves only as an outline. When doing actual calculations, one should, instead of
relying on the formulae, use reasoning and the basic conditional probability concepts we developed in
the first lecture. The examples below demonstrate some of these skills.
Let X1, X2, ..., Xn be independent random variables, where Xi has an exponential distribution with
rate λi, i = 1, 2, ..., n. Find the distribution of Xmin = min(X1, ..., Xn).
Each Xi has cdf

Fi(x) = 0 for x < 0, and Fi(x) = 1 − e^{−λi x} for x ≥ 0.

Since the Xi's are non-negative, so is their minimum, so Xmin has cdf Fmin(x) = 0 for x < 0.
For x ≥ 0,

Fmin(x) = 1 − P(Xmin > x) = 1 − P(X1 > x, ..., Xn > x) = 1 − ∏_{i=1}^{n} e^{−λi x} = 1 − e^{−(λ1 + ··· + λn)x}.

That is, Xmin has an exponential distribution with rate λ1 + ··· + λn.
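A short simulation sketch (assuming numpy, with three arbitrary illustrative rates) consistent with this conclusion:

import numpy as np

rng = np.random.default_rng(2)
rates = np.array([1.0, 2.0, 3.0])                  # illustrative rates lambda_i
n = 1_000_000
samples = rng.exponential(size=(n, 3)) / rates     # column i is exponential with rate rates[i]
xmin = samples.min(axis=1)
print(xmin.mean())                                 # close to 1/(1 + 2 + 3) = 1/6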
For X and Y independent uniform(0, 1) random variables, we compute several probabilities.

a) P(X² + Y² ≤ 1) = π/4

b) P(X² + Y² ≤ 1 | X + Y ≥ 1) = P(X² + Y² ≤ 1, X + Y ≥ 1) / P(X + Y ≥ 1)
   = (π/4 − 1/2) / (1/2)
   = π/2 − 1

c) P(Y ≤ X²) = ∫_0^1 x² dx = (1/3) x³ |_0^1 = 1/3

d) P(|X − Y| ≤ 0.5) = 1 − 1/4 = 3/4

e) P(|X/Y − 1| ≤ 0.5) = P((2/3)X ≤ Y ≤ 2X) = 1 − (1/2 · 1/2 + 1/2 · 2/3) = 5/12

f) P(Y ≥ X | Y ≥ 1/2) = (1/2 − 1/8) / (1/2) = 3/4.
□
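A Monte Carlo sketch (assuming numpy) that checks parts a)–f):

import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
x, y = rng.uniform(size=n), rng.uniform(size=n)

a = (x**2 + y**2 <= 1).mean()                    # ~ pi/4
b = (x**2 + y**2 <= 1)[x + y >= 1].mean()        # ~ pi/2 - 1
c = (y <= x**2).mean()                           # ~ 1/3
d = (np.abs(x - y) <= 0.5).mean()                # ~ 3/4
e = (np.abs(x / y - 1) <= 0.5).mean()            # ~ 5/12
f = (y >= x)[y >= 0.5].mean()                    # ~ 3/4
print(a, b, c, d, e, f)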
Example 13 Let X and Y be independent exponentially distributed random variables with parameters
λ and µ, respectively. Find P(X < Y).
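The standard answer is P(X < Y) = λ/(λ + µ), obtained by conditioning on Y. A quick simulation sketch (assuming numpy, with arbitrary illustrative rates):

import numpy as np

rng = np.random.default_rng(4)
lam, mu = 2.0, 5.0
n = 1_000_000
x = rng.exponential(scale=1.0 / lam, size=n)   # X ~ exponential(lambda)
y = rng.exponential(scale=1.0 / mu, size=n)    # Y ~ exponential(mu)
print((x < y).mean(), lam / (lam + mu))        # both close to 2/7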
Exercise 14 Suppose U(1) < U(2) < ... < U(5) are the order statistics of 5 independent uniform(0, 1)
variables U1, U2, ..., U5, so U(i) is the ith smallest of U1, U2, ..., U5. (See Pitman [1993], p. 352,
Example 3.)
Theorem 15 Linear combinations of independent normal variables are always normally distributed.
In addition, if X and Y are independent with normal(λ, σ²) and normal(µ, τ²) distributions, then
X + Y has a normal(λ + µ, σ² + τ²) distribution.
The proof of the theorem makes use of the rotational symmetry of the joint distribution of
independent standard normal random variables X and Y. See Pitman [1993].
Example 16 For σ = 1, 2, 3 suppose Xσ has normal (0, σ 2 ) distribution, and these three random
variables are independent.
Remark 17 If X and Y are independent with density functions fX(x) and fY(y), so that their joint
density is fX(x)fY(y) on the plane R², then the density function fX+Y(z) of Z = X + Y is given by
the convolution formula

fX+Y(z) = ∫_{−∞}^{∞} fX(x) fY(z − x) dx.
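A numerical sketch of the convolution formula (assuming numpy and scipy): convolving two standard normal densities on a grid reproduces the normal(0, 2) density, consistent with Theorem 15.

import numpy as np
from scipy.stats import norm

grid = np.linspace(-10, 10, 2001)            # symmetric grid with spacing 0.01
dx = grid[1] - grid[0]
fx = norm.pdf(grid)                          # density of X ~ N(0, 1)
fy = norm.pdf(grid)                          # density of Y ~ N(0, 1)
fz = np.convolve(fx, fy, mode='same') * dx   # discretized convolution integral
target = norm.pdf(grid, scale=np.sqrt(2))    # N(0, 2) density
print(np.max(np.abs(fz - target)))           # close to zero (up to discretization error)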
Exercise 18 Suppose that X and Y are independent and normally distributed with mean 0 and
variance 1. Find the distribution of X/Y. (See discussion and solution in Pitman [1993].)
χ², t, and F Distributions
Definition 19 If Z1, Z2, ..., Zn are independent standard normal random variables, the distribution
of U = ∑_{i=1}^{n} Zi² is called the chi-square distribution with n degrees of freedom, denoted χ²_n.
It can be shown that χ²_1 is a special case of the gamma distribution with parameters 1/2 and 1/2.
In Example 9, we see that the sum of independent gamma random variables sharing the same value
of λ follows a gamma distribution. Thus, χ²_n is a gamma distribution with α = n/2 and λ = 1/2.
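A quick check of this identification (assuming scipy), comparing the χ²_n density with the gamma density of shape n/2 and rate 1/2 (scale 2 in scipy's parameterization):

import numpy as np
from scipy.stats import chi2, gamma

n = 5
x = np.linspace(0.01, 20, 200)
# the two densities agree pointwise
print(np.max(np.abs(chi2.pdf(x, n) - gamma.pdf(x, n / 2, scale=2.0))))  # ~ 0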
Definition 20 If Z ∼ N(0, 1) and U ∼ χ²_n, and Z and U are independent, then the distribution of
Z/√(U/n) is called the t distribution with n degrees of freedom.
Theorem 21 If X1, X2, ..., Xn are independent N(µ, σ²) variables, then X̄ and S² are independent,
X̄ is N(µ, σ²/n), and (n − 1)S²/σ² is χ²_{n−1}, where X̄ = (1/n) ∑_{i=1}^{n} Xi and
S² = (1/(n − 1)) ∑_{i=1}^{n} (Xi − X̄)².
Proof. (From Rice [1995]) The proof of the statement is built on the fact that X̄ and the vector
of random variables (X1 − X̄, X2 − X̄, ..., Xn − X̄) are independent. We will not prove this fact
here; the interested reader is referred to Rice [1995] for a treatment using the moment-generating
function.
Since S² is a function of (X1 − X̄, X2 − X̄, ..., Xn − X̄), and functions of independent vectors are
also independent, we can conclude that X̄ and S² are independent.
Since X1, X2, ..., Xn are independent N(µ, σ²), an extension of Theorem 15 shows that ∑_{i=1}^{n} Xi
is N(nµ, nσ²). Thus, dividing by the constant n makes X̄ = (1/n) ∑_{i=1}^{n} Xi a normal variable with
mean E(X̄) = nµ/n = µ and variance Var(X̄) = nσ²/n² = σ²/n. In addition, ((X̄ − µ)/(σ/√n))²
follows χ²_1 by Definition 19.
Now,

(1/σ²) ∑_{i=1}^{n} (Xi − µ)² = ∑_{i=1}^{n} ((Xi − µ)/σ)² ∼ χ²_n, since (Xi − µ)/σ ∼ N(0, 1),

and

(1/σ²) ∑_{i=1}^{n} (Xi − µ)² = (1/σ²) ∑_{i=1}^{n} [(Xi − X̄) + (X̄ − µ)]²
  = (1/σ²) { ∑_{i=1}^{n} (Xi − X̄)² + 2 ∑_{i=1}^{n} (Xi − X̄)(X̄ − µ) + ∑_{i=1}^{n} (X̄ − µ)² }
  = (1/σ²) ∑_{i=1}^{n} (Xi − X̄)² + ∑_{i=1}^{n} ((X̄ − µ)/σ)²   (the cross term vanishes since ∑_{i=1}^{n} (Xi − X̄) = 0)
  = (1/σ²) ∑_{i=1}^{n} (Xi − X̄)² + ((X̄ − µ)/(σ/√n))².    (4)

Let W = (1/σ²) ∑_{i=1}^{n} (Xi − µ)², U = (1/σ²) ∑_{i=1}^{n} (Xi − X̄)², and V = ((X̄ − µ)/(σ/√n))²;
then (4) says that W = U + V.
Since U is a function of (X1 − X̄, X2 − X̄, ..., Xn − X̄), and V is a function of X̄, U and V are
independent by the fact we mentioned at the beginning of the proof.
So far, we have shown that W ∼ χ²_n and V ∼ χ²_1. Let MW(t) be the mgf for W, and so on. Since
W = U + V with U and V independent, MW(t) = MU(t) MV(t), so

MU(t) = MW(t) / MV(t) = (1 − 2t)^{−n/2} / (1 − 2t)^{−1/2} = (1 − 2t)^{−(n−1)/2},

which is the mgf of the χ²_{n−1} distribution. Since U = (1/σ²) ∑_{i=1}^{n} (Xi − X̄)² = (n − 1)S²/σ²,
Theorem 4 gives (n − 1)S²/σ² ∼ χ²_{n−1}. □
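A simulation sketch (assuming numpy) consistent with Theorem 21: (n − 1)S²/σ² should have mean n − 1 and variance 2(n − 1), and X̄ and S² should be uncorrelated.

import numpy as np

rng = np.random.default_rng(5)
mu, sigma, n, reps = 1.0, 2.0, 10, 200_000
x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)            # sample variance S^2
u = (n - 1) * s2 / sigma**2           # should be chi-square with n - 1 degrees of freedom
print(u.mean(), u.var())              # close to 9 and 18
print(np.corrcoef(xbar, s2)[0, 1])    # close to 0, consistent with independence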
Definition 22 Let U and V be independent chi-square random variables with m and n degrees of
freedom, respectively. The distribution of

W = (U/m) / (V/n)

is called the F distribution with m and n degrees of freedom and is denoted by F_{m,n}.
Example 23 (Gamma and uniform) Suppose X has a gamma(2, λ) distribution, and that given
X = x, Y has a uniform(0, x) distribution. Find the joint density of X and Y.
By the definition of the gamma distribution,

fX(x) = λ² x e^{−λx} for x > 0, and fX(x) = 0 for x ≤ 0,

and from the uniform(0, x) distribution of Y given X = x,

fY(y | X = x) = 1/x for 0 < y < x, and 0 otherwise.

So by the multiplication rule for densities,

f(x, y) = fX(x) fY(y | X = x) = λ² e^{−λx} for 0 < y < x, and 0 otherwise.
Example 24 Find the marginal density of Y.
Integrating out x in the joint density gives the marginal density of Y: for y > 0,

fY(y) = ∫_0^∞ f(x, y) dx = ∫_y^∞ λ² e^{−λx} dx = λ e^{−λy}.

The density is of course 0 for y ≤ 0. That is to say, Y has an exponential(λ) distribution.
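A simulation sketch of Examples 23–24 (assuming numpy, with an arbitrary λ): draw X from gamma(2, λ), then Y uniformly on (0, X), and check that Y behaves like an exponential(λ) variable.

import numpy as np

rng = np.random.default_rng(6)
lam, n = 2.0, 1_000_000
x = rng.gamma(shape=2.0, scale=1.0 / lam, size=n)  # X ~ gamma(2, lambda)
y = x * rng.uniform(size=n)                        # Y | X = x ~ uniform(0, x)
print(y.mean(), y.var())                           # close to 1/lambda = 0.5 and 1/lambda^2 = 0.25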
4.1.1 Bayes’ Rule
Let q(z | y) denote the conditional probability mass function of Z given Y = y. Then,

p(y | z) = pY(y) q(z | y) / ∑_t pY(t) q(z | t),

whenever the denominator is positive.
The conditional expectation E(Y | Z = z) = ∑_y y p(y | z) is well defined here: since {Y = y, Z = z} ⊆ {Y = y},
we have p(y, z) ≤ pY(y), and hence ∑_y |y| p(y | z) ≤ E(|Y|)/pZ(z). Thus, when pZ(z) > 0, the conditional
expected value of Y is finite whenever the expected value of Y is finite.
Definition 28 Let g(z) = E(Y | Z = z). The random variable g(Z) is written E(Y | Z) and is
called the conditional expectation of Y given Z.
Example 29 As an example we calculate E(Y1 | Z), where Y1 and Z are given in Example 26. We
have

E(Y1 | Z = i) = P(Y1 = 1 | Z = i) = C(n − 1, i − 1) / C(n, i) = i/n,

where C(n, i) denotes the binomial coefficient "n choose i". The first of these equalities holds because
Y1 is an indicator. The second follows from the equation in Example 26, because C(n − 1, i − 1) is just
the number of ways i successes can occur in n Bernoulli trials with the first trial being a success.
Therefore,

E(Y1 | Z) = Z/n.
Exercise 30 Let X1 and X2 be the numbers on two independent fair-die rolls. Let X be the mini-
mum and Y the maximum of X1 and X2 . Calculate: E(Y |X = x) and E(X|Y = y).
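A brute-force enumeration sketch for Exercise 30 (plain Python, no libraries), tabulating E(Y | X = x) and E(X | Y = y) over the 36 equally likely outcomes:

from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely rolls (X1, X2)
for x in range(1, 7):
    ys = [max(a, b) for a, b in outcomes if min(a, b) == x]
    print(f"E(Y | X = {x}) = {sum(ys) / len(ys)}")
for y in range(1, 7):
    xs = [min(a, b) for a, b in outcomes if max(a, b) == y]
    print(f"E(X | Y = {y}) = {sum(xs) / len(xs)}")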
Exercise 31 Repeat the last exercise with X1 and X2 two draws without replacement from {1, 2, ..., n}.
Properties of Conditional Expected Values In the context of the previous lecture, the conditional
distribution of a random vector Y given Z = z corresponds to a single probability function Pz on
(Ω, A), defined for A ∈ A by Pz(A) = P(A | Z = z). A key consequence is that

E(E(Y | Z)) = ∑_z pZ(z) [ ∑_y y p(y | z) ] = ∑_{y,z} y p(y | z) pZ(z) = ∑_{y,z} y p(y, z) = E(Y).

The interchange of summation used is valid because the finiteness of E(|Y|) implies that all sums
converge absolutely.
As an illustration, we check E(E(Y | Z)) = E(Y) for E(Y1 | Z) = Z/n given before. In this case,

E(E(Y1 | Z)) = E(Z/n) = np/n = p = E(Y1).
In general, the conditional distribution of Y given Z = z is given by

p(y | z) = p(y, z) / pZ(z),  if pZ(z) > 0.
4.2.1 Bayes’ Rule
p(y | z) = pY(y) q(z | y) / ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} pY(t) q(z | t) dt1 ··· dtn,

where q is the conditional density of Z given Y = y.
Remark 33 If Y and Z are independent, the conditional distributions equal the marginals as in the
discrete case.
Remark 34 If E(|Y |) < ∞, we denote the conditional expectation of Y given Z = z in analogy to
the discrete case as the expected value of a random variable with density p(y | z). More generally, if
E(|r(Y)|) < ∞, the conditional expectation of r(Y) given Z = z can be obtained from
E(r(Y) | Z = z) = ∫_{−∞}^{∞} r(y) p(y | z) dy.
pY(y1, y2) = (1/2) pX((y1 + y2)/2, (y1 − y2)/2)
           = (1/(8π)) exp{ −(1/2) [ (1/4)(y1 + y2)² + (1/16)(y1 − y2)² ] }
           = (1/(8π)) exp[ −(1/32)(5y1² + 5y2² + 6y1y2) ].

This is an example of a bivariate normal density.
Gamma and Beta Distribution A random variable X has a Beta(r, s) distribution if it has density

b_{r,s}(x) = x^{r−1}(1 − x)^{s−1} / B(r, s),  for 0 < x < 1,

where B(r, s) = Γ(r)Γ(s)/Γ(r + s) is the beta function and Γ(x) = ∫_0^∞ u^{x−1} e^{−u} du as in the gamma
distribution.
Example 37 If X1 and X2 are independent random variables with gamma(p, λ) and gamma(q, λ)
distributions, respectively, then Y1 = X1 + X2 and Y2 = X1/(X1 + X2) are independent and have,
respectively, gamma(p + q, λ) and Beta(p, q) distributions.
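A simulation sketch of Example 37 (assuming numpy, with arbitrary p, q, λ):

import numpy as np

rng = np.random.default_rng(7)
p, q, lam, n = 2.0, 3.0, 1.5, 1_000_000
x1 = rng.gamma(shape=p, scale=1.0 / lam, size=n)
x2 = rng.gamma(shape=q, scale=1.0 / lam, size=n)
y1, y2 = x1 + x2, x1 / (x1 + x2)
print(y1.mean(), (p + q) / lam)        # Y1 behaves like gamma(p + q, lambda)
print(y2.mean(), p / (p + q))          # Y2 behaves like Beta(p, q)
print(np.corrcoef(y1, y2)[0, 1])       # close to 0, consistent with independence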
6 Markov Chain
Example 38 (A random walk model) A Markov chain whose state space is given by the integers
i = 0, ±1, ±2, ... is said to be a random walk if, for some number 0 < p < 1,

P_{i,i+1} = p = 1 − P_{i,i−1},  i = 0, ±1, ±2, ....
Chapman-Kolmogorov Equations
We have defined the one-step transition probability Pij. We now define the n-step transition
probabilities P^n_{ij} to be the probability that a process in state i will be in state j after n additional
transitions. That is,

P^n_{ij} = P{X_{n+k} = j | X_k = i},  n ≥ 0, i, j ≥ 0.

The Chapman-Kolmogorov equations state that

P^{n+m}_{ij} = ∑_k P^n_{ik} P^m_{kj}  for all n, m ≥ 0;

in matrix form, the n-step transition matrix is the nth power of the one-step transition matrix.
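The matrix form is easy to exercise numerically; a sketch (assuming numpy) with an arbitrary illustrative 3-state transition matrix:

import numpy as np

P = np.array([[0.5, 0.3, 0.2],         # an illustrative one-step transition matrix
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
P4 = np.linalg.matrix_power(P, 4)       # 4-step transition probabilities
P22 = np.linalg.matrix_power(P, 2) @ np.linalg.matrix_power(P, 2)
print(np.allclose(P4, P22))             # Chapman-Kolmogorov: P^(2+2) = P^(2) P^(2)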
6.2 Continuous case
Suppose we have a continuous-time stochastic process {X(t), t ≥ 0} taking on values in the set
of non-negative integers. In analogy with the definition of a discrete-time Markov chain, we say
that the process {X(t), t ≥ 0} is a continuous-time Markov chain if for all s, t ≥ 0 and non-negative
integers i, j, x(u), 0 ≤ u < s,

P{X(t + s) = j | X(s) = i, X(u) = x(u), 0 ≤ u < s} = P{X(t + s) = j | X(s) = i}.

In other words, a continuous-time Markov chain is a stochastic process having the Markovian
property that the conditional distribution of the future X(t + s) given the present X(s) and the past
X(u), 0 ≤ u < s, depends only on the present and is independent of the past. If, in addition,

P{X(t + s) = j | X(s) = i}

is independent of s, then the continuous-time Markov chain is said to have stationary or homogeneous
transition probabilities.
Example 39 Suppose the service in a particular barber shop consists of two procedures: a customer
upon arrival goes initially to chair 1, where his/her hair will be washed by an assistant; after this is
done the customer moves on to chair 2, where his/her hair will be cut by the stylist. The service times
at the two steps are assumed to be independent random variables that are exponentially distributed
with respective rates µ1 and µ2. Suppose that potential customers arrive in accordance with a
Poisson process having rate λ, and that a potential customer will enter the system only if both chairs
are empty.
This problem can be modeled as a continuous-time Markov chain. Since a potential customer
will enter the shop only if there are no other customers in the shop, there will always be either 0 or 1
customers in the shop. If there is 1 customer in the shop, then we need to know which chair the
customer is in. Therefore, an appropriate state space consists of three states: 0 = shop is empty,
1 = chair 1 is occupied, and 2 = chair 2 is occupied.
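A small numerical sketch of the resulting chain (assuming numpy, and assuming the natural transition rates read off from the description: 0 → 1 at rate λ, 1 → 2 at rate µ1, 2 → 0 at rate µ2; the numerical rates below are illustrative). It builds the generator matrix and solves for the stationary distribution:

import numpy as np

lam, mu1, mu2 = 2.0, 3.0, 4.0                 # illustrative rates
Q = np.array([[-lam,  lam,  0.0],             # state 0: shop empty
              [ 0.0, -mu1,  mu1],             # state 1: customer in chair 1
              [ mu2,  0.0, -mu2]])            # state 2: customer in chair 2
# the stationary distribution pi solves pi Q = 0 with the entries summing to 1
A = np.vstack([Q.T, np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi = np.linalg.lstsq(A, b, rcond=None)[0]
print(pi)                                     # long-run fraction of time in each state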
7 Delta Method
(The following is taken from Rice [1995].)
Suppose that we know the expectation and the variance of a random variable X, but not the
entire distribution and that we are interested in the mean and variance of Y = g(X) for some fixed
function g. For example, we might be able to measure X and determine its mean and variance, but
really be interested in Y , which is related to X in a known way. We might wish to know V ar(Y ),
at least approximately, in order to assess the accuracy of the indirect measurement process. From
the results given in this section, we cannot in general find E(Y) = µY and Var(Y) = σY² from
E(X) = µX and Var(X) = σX², unless the function g is linear. However, if g is nearly linear in a
range in which X has high probability, it can be approximated by a linear function and approximate
moments of Y can be found.
In proceeding as just described, we follow a tack often taken in applied mathematics: When
confronted with a nonlinear problem that we cannot solve, we linearize. In probability and statistics,
this method is called propagation of error, or the δ method. Linearization is carried out through
a Taylor series expansion of g about µX. To the first order,

Y = g(X) ≈ g(µX) + (X − µX) g′(µX).

We have expressed Y as approximately equal to a linear function of X. Recalling that if U = a + bV,
then E(U) = a + bE(V) and Var(U) = b² Var(V), we find

µY ≈ g(µX),
σY² ≈ σX² [g′(µX)]².
We know that in general E(Y) ≠ g(E(X)), even though the first-order approximation gives µY ≈ g(µX).
In fact, we can carry out the Taylor series expansion to the second order to get an improved
approximation of µY:

Y = g(X) ≈ g(µX) + (X − µX) g′(µX) + (1/2)(X − µX)² g″(µX).

Taking the expectation of the right-hand side, we have, since E(X − µX) = 0,

E(Y) ≈ g(µX) + (1/2) σX² g″(µX).
How good such approximations are depends on how nonlinear g is in a neighborhood of µX and
on the size of σX . From Chebyshev’s inequality we know that X is unlikely to be many standard
deviations away from µX ; if g can be reasonably well approximated in this range by a linear function,
the approximations for the moments will be reasonable as well.
Example 40 The relation of voltage, current, and resistance is V = IR. Suppose that the voltage
is held constant at a value V0 across a medium whose resistance fluctuates randomly as a result, say,
of random fluctuations at the molecular level. The current therefore also varies randomly. Suppose
that it can be determined experimentally to have mean µI and variance σI². We wish to find the
mean and variance of the resistance, R, and since we do not know the distribution of I, we must
resort to an approximation. We have
R = g(I) = V0/I,
g′(µI) = −V0/µI²,
g″(µI) = 2V0/µI³.

Thus,

µR ≈ V0/µI + (V0/µI³) σI²,
σR² ≈ (V0²/µI⁴) σI².
We see that the variability of R depends on both the mean level of I and the variance of I. This makes
sense, since if I is quite small, small variations in I will result in large variations in R = V0 /I,
whereas if I is large, small variations will not affect R as much. The second-order correction factor
for µR also depends on µI and is large if µI is small. In fact, when I is near zero, the function
g(I) = V0 /I is quite nonlinear, and the linearization is not a good approximation.
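A sketch comparing the delta-method approximations for R = V0/I with Monte Carlo estimates (assuming numpy, and assuming for illustration only that I is normal with mean µI and standard deviation σI, since the text does not specify the distribution of I):

import numpy as np

rng = np.random.default_rng(8)
v0, mu_i, sigma_i = 10.0, 5.0, 0.2            # illustrative values; sigma_i small relative to mu_i
i_samples = rng.normal(mu_i, sigma_i, size=1_000_000)
r = v0 / i_samples                            # Monte Carlo draws of R = V0 / I

mu_r_approx = v0 / mu_i + (v0 / mu_i**3) * sigma_i**2   # second-order delta-method mean
var_r_approx = (v0**2 / mu_i**4) * sigma_i**2           # first-order delta-method variance
print(r.mean(), mu_r_approx)                  # both near 2.0032
print(r.var(), var_r_approx)                  # both near 0.0064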
References
[1] Peter J. Bickel and Kjell A. Doksum (2001) Mathematical Statistics: Basic Ideas and Selected
Topics, Vol. I, 2nd Edition. Prentice Hall.
[2] Geoffrey Grimmett and David Stirzaker (2002) Probability and Random Processes, 3rd Edition,
Oxford University Press.
[3] Jim Pitman (1993) Probability, Springer-Verlag New York, Inc.
[4] John A. Rice (1995) Mathematical Statistics and Data Analysis, 2nd Edition, Duxbury Press.
[5] Sheldon M. Ross (2003) Introduction to Probability Models, Academic Press.