
SMA 240: Probability and Statistics I

Lecture Notes, February 2023


Dr. Davis Bundi - [email protected]
Department of Mathematics, University of Nairobi
Comments: in case of any typos, please inform the author

Reference Books
1. An Introduction to Probability and Statistics. Rohatgi, V.K. and Saleh, A.K. Second
Edition, Wiley Eastern Limited, 2011.

2. Introduction to the Theory of Statistics. Mood, A., Graybill, F., and Boes, D.C.
Third Edition, London.

Contents

1 Review of Random Variables
  1.1 Random Variable
  1.2 Distribution Function
  1.3 Expectation
  1.4 Variance
  1.5 Chapter Problems

2 Moments and Moment Generating Functions
  2.1 Moments
  2.2 Moment Generating Functions
  2.3 Property of Moment Generating Function
  2.4 Markov and Chebyshev's Inequality
  2.5 Chapter Problems

3 Bivariate Probability Distribution
  3.1 Joint distributions
  3.2 Bivariate Functions
  3.3 Marginal distributions
  3.4 Conditional Distributions
  3.5 Independent Random Variables
  3.6 Bivariate Expectations
  3.7 Covariance and Correlation
  3.8 Chapter Problems

4 Linear Regression and Correlation Analysis
  4.1 Correlation
  4.2 Regression
  4.3 Chapter Problems

5 Distribution of Functions of Random Variables
  5.1 Cumulative distribution function technique
  5.2 Change of Variable Technique
  5.3 Chapter Problems

6 Derived Distributions
  6.1 Gamma Function
  6.2 Gamma Distribution

1 Review of Random Variables


1.1 Random Variable
A RV X is said to be discrete if, with probability one, it can take only a finite or countably
infinite number of possible values, that is,

Σ_{k=1}^{∞} P(X = x_k) = 1

X is a continuous RV if there exists a function f_X : ℜ → [0, ∞) such that

P(X ≤ x) = ∫_{−∞}^{x} f_X(s) ds

The probability density function (pdf) must satisfy

(a) f_X(x) ≥ 0 for all x ∈ ℜ

(b) ∫_{−∞}^{∞} f_X(x) dx = 1

(c) P(a ≤ X ≤ b) = ∫_a^b f_X(x) dx for any a ≤ b

Exercise 1

1. Let X be a continuous RV with pdf

   f(x) = e^{−x},  x > 0
   f(x) = 0,       x ≤ 0

   Find P(X > a) for a > 0.

2. X has a probability mass function (pmf) given by

   f(x) = 1/4,  x = 0
   f(x) = 1/2,  x = 1
   f(x) = 1/4,  x = 2
   f(x) = 0     elsewhere

   Find (a) Σ_x f(x) and (b) show that P(X = 1) = 1/2.
1.2 Distribution Function
Let X be a RV defined on a sample space S. Consider the event E that X satisfies
−∞ < X ≤ x, where x is any real number. Then

P(E) = P[−∞ < X ≤ x] = P[X ≤ x] = F(x)

The function F(x) is called the Distribution Function or the Cumulative Distribution
Function (cdf) of the RV X. For a continuous RV X with pdf f(x),

F(x) = ∫_{−∞}^{x} f(t) dt

For a discrete RV X with pmf f(x),

F(x) = Σ_{t ≤ x} f(t)

1.2.1 Properties of cumulative distribution function

(a) 0 ≤ F(x) ≤ 1, since 0 ≤ P(X ≤ x) ≤ 1

(b) If a and b are any real numbers such that a ≤ b, then P(a < X ≤ b) = F(b) − F(a)

(c) F(x) is a non-decreasing function of x

(d) lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1

1.3 Expectation
Let X be a discrete RV which takes the values x_1, x_2, x_3, . . . and whose pmf is defined by
f(x_i) = P(X = x_i), i = 1, 2, 3, . . .
The expected value of X, denoted by E(X), is defined by

E(X) = Σ_{i=1}^{∞} x_i f(x_i)

If X is a continuous RV with pdf f(x), the expected value of X, E(X), is defined as

E(X) = ∫_{−∞}^{∞} x f(x) dx

More generally, if g(x) is a function of x, then

E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx

1.3.1 Properties of Expectation

1. If g(X) = aX + b, where a, b ∈ ℜ, then
   E[g(X)] = E[aX + b] = aE(X) + b

2. Let g(X) and h(X) be any real-valued functions of X. Then
   E[a g(X) + b h(X)] = a E[g(X)] + b E[h(X)], where a, b ∈ ℜ
1.4 Variance
The variance of a RV X is expressed in terms of expected values as

Var(X) = E(X²) − [E(X)]²

where E(X²) = Σ_{i=1}^{∞} x_i² f(x_i) for a discrete RV, and

E(X²) = ∫_{−∞}^{∞} x² f(x) dx for a continuous RV.

Example 1 Given the pdf, find the expected value and the variance

f(x) = x/2,  0 ≤ x ≤ 2
f(x) = 0     elsewhere

Solution 1

(a) E(X) = ∫_0^2 x · (x/2) dx = ∫_0^2 (x²/2) dx = 4/3

(b) E(X²) = ∫_0^2 x² · (x/2) dx = ∫_0^2 (x³/2) dx = 2

(c) Var(X) = E(X²) − [E(X)]² = 2 − 16/9 = 2/9
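
As a quick numerical check of Example 1 (an illustrative sketch, not part of the original notes, assuming SciPy is available), the moments of f(x) = x/2 on [0, 2] can be approximated by numerical integration:

    from scipy.integrate import quad

    # pdf of Example 1: f(x) = x/2 on [0, 2], 0 elsewhere
    pdf = lambda x: x / 2

    ex,  _ = quad(lambda x: x * pdf(x), 0, 2)       # E(X)   -> 4/3
    ex2, _ = quad(lambda x: x**2 * pdf(x), 0, 2)    # E(X^2) -> 2
    var = ex2 - ex**2                               # Var(X) -> 2/9

    print(ex, ex2, var)   # approximately 1.3333, 2.0, 0.2222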

1.5 Chapter Problems


1. Find the value of the constant c such that the function is a density function

   f(x) = cx²,  0 ≤ x ≤ 3
   f(x) = 0     otherwise

   Then compute the following

   (a) P(1 < X < 2)
   (b) F(x)
   (c) E(X) and Var(X)

2. A continuous RV X has a pdf f(x) given by

   f(x) = k/4,  2 ≤ x ≤ 4
   f(x) = 0     elsewhere

   Find

   (a) The cdf of X, that is F(x)
   (b) If Y = X², find the cdf of Y, that is F(y)
   (c) P[X ≥ 1]

3. Suppose that X has a moment generating function

   M_X(t) = (1 − 2t)^{−1/2},  for t < 1/2

   Compute the standard deviation of the random variable X.

4. The cdf of a RV X is

   F(x) = 0,          x < −1
   F(x) = (x + 1)/2,  −1 ≤ x ≤ 1
   F(x) = 1,          x ≥ 1

   (a) Find P[X ≥ 1/2] and P[−1/2 < X ≤ 3/4]
   (b) What is the value of a such that P(X ≤ a) = 0.65?

5. A RV X has a pdf

   f(x) = cx,  1 ≤ x ≤ 3
   f(x) = 0    elsewhere

   Find the constant c and P[0 ≤ X ≤ 1].

6. The random variable X has a pmf given by

   f(x) = k(x + 3),  x = 0, 1, 2, 3
   f(x) = 0          elsewhere

   (a) Find the value of the constant k such that the distribution is a mass function
   (b) P(X = 3, 4)
   (c) Find the cdf, that is F(x)

2 Moments and Moment Generating Functions


The parameters µ and σ are important parameters that describe the center and the
spread of a random variable X. They do not, however, provide a unique characterization
of the distribution of X.

2.1 Moments

Definition 1 The k-th moment of a random variable X taken about the origin is defined
to be

E[X^k]

and is denoted by µ'_k.

From the above definition we can easily verify that the first moment about the origin is

E(X) = µ'_1 = µ,

the second moment about the origin is

E[X²] = µ'_2

and so on. In addition to taking moments about the origin, a moment of a random
variable can also be taken about the mean µ.

Definition 2 The k-th moment of a random variable X taken about its mean, or the k-th
central moment of X, is defined to be

E[(X − µ)^k]

and is denoted by µ_k.

The major use of moments is to approximate the probability distribution of a random
variable (usually an estimator or a decision maker). This further means that the
moments µ'_k, where k = 0, 1, 2, . . ., are primarily of theoretical value for k > 3.

2.2 Moment Generating Functions

A moment generating function is an interesting type of expectation of a random variable
which, figuratively speaking, packages all the moments of a random variable into one
simple expression.

Definition 3 The moment generating function M_X(t) of a random variable X is defined
to be

M_X(t) = E[e^{tX}]

We consider the moment generating functions of some probability distributions:

2.2.1 Bernoulli Distribution

Consider a Bernoulli distributed random variable X, with probability mass function

f(x) = p^x (1 − p)^{1−x};  x = 0, 1

The moment generating function is given by

M_X(t) = E[e^{tX}] = Σ_{x=0}^{1} e^{tx} p^x (1 − p)^{1−x}
       = (1 − p) + pe^t = pe^t + 1 − p = pe^t + q,  where q = 1 − p

2.2.2 Binomial Distribution

Consider a Binomially distributed random variable X, that is X ∼ Bin(n, p), with
probability mass function

f(x) = (n choose x) p^x (1 − p)^{n−x};  x = 0, 1, 2, ..., n

From the definition of the moment generating function, M_X(t) = E[e^{tX}], the moment
generating function of the Binomial random variable X is given by

M_X(t) = E[e^{tX}] = Σ_{x=0}^{n} e^{tx} f(x),  where f(x) is the probability mass function of X
       = Σ_{x=0}^{n} (n choose x) e^{tx} p^x (1 − p)^{n−x}
       = Σ_{x=0}^{n} (n choose x) (pe^t)^x (1 − p)^{n−x}
       = [pe^t + (1 − p)]^n = [pe^t + q]^n,  where q = 1 − p

2.2.3 Poisson Distribution

Consider a Poisson distributed random variable X with mean λ, that is, X ∼ Pois(λ), with
probability mass function

f(x) = e^{−λ} λ^x / x!;  x = 0, 1, 2, ...

The moment generating function M_X(t) of this random variable is given by

M_X(t) = E[e^{tX}] = Σ_{x=0}^{∞} e^{tx} e^{−λ} λ^x / x!
       = e^{−λ} Σ_{x=0}^{∞} (λe^t)^x / x!
       = e^{−λ} e^{λe^t} = e^{λ(e^t − 1)}

2.2.4 Exponential Distribution

Consider an Exponentially distributed random variable X with probability density function

f(x) = λe^{−λx};  x > 0

The moment generating function is given by

M_X(t) = E[e^{tX}] = ∫_0^∞ e^{tx} λe^{−λx} dx = λ ∫_0^∞ e^{−(λ−t)x} dx
       = [λ/(t − λ)] e^{−(λ−t)x} |_0^∞ = [λ/(t − λ)](0 − 1) = λ/(λ − t),  for t < λ

2.2.5 Gamma Distribution

Consider a Gamma distributed random variable X, with probability density function

f(x) = [1 / (Γ(α) β^α)] x^{α−1} e^{−x/β};  0 < x < ∞, α > 0, β > 0

If λ = 1/β, then

f(x) = [λ^α / Γ(α)] x^{α−1} e^{−λx};  0 < x < ∞, α > 0, λ > 0

We use the two listed properties of the Gamma function. For any positive real number α:

(i) Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx

(ii) ∫_0^∞ x^{α−1} e^{−λx} dx = Γ(α)/λ^α, for λ > 0

The moment generating function of the Gamma distribution can be expressed as:

M_X(t) = E[e^{tX}] = ∫_0^∞ e^{tx} [λ^α / Γ(α)] x^{α−1} e^{−λx} dx
       = [λ^α / Γ(α)] ∫_0^∞ x^{α−1} e^{−x(λ−t)} dx

Let y = (λ − t)x, so that dx = dy/(λ − t) and x = y/(λ − t). We substitute to get:

M_X(t) = [λ^α / Γ(α)] ∫_0^∞ [y^{α−1}/(λ − t)^{α−1}] e^{−y} dy/(λ − t)
       = [λ^α / Γ(α)] [1/(λ − t)^α] ∫_0^∞ y^{α−1} e^{−y} dy
       = [λ^α / Γ(α)] [Γ(α)/(λ − t)^α]      (since ∫_0^∞ y^{α−1} e^{−y} dy = Γ(α))

M_X(t) = λ^α / (λ − t)^α

2.2.6 Normal Distribution

Consider a Normally distributed random variable X, that is X ∼ N(µ, σ²), with
probability density function

f(x) = [1/√(2πσ²)] exp{−(x − µ)²/(2σ²)};  −∞ < x < ∞

The moment generating function is calculated as follows

M_X(t) = E[e^{tX}] = ∫_{−∞}^{∞} e^{tx} [1/(√(2π)σ)] e^{−(x−µ)²/(2σ²)} dx

Using the completing-the-square identity given below,

M_X(t) = e^{µt + σ²t²/2} ∫_{−∞}^{∞} [1/(√(2π)σ)] e^{−[x − (µ + σ²t)]²/(2σ²)} dx

M_X(t) = e^{µt + σ²t²/2}

We note the following:

• ∫_{−∞}^{∞} [1/(√(2π)σ)] e^{−[x − (µ + σ²t)]²/(2σ²)} dx = 1, since the integrand is the N(µ + σ²t, σ²) density

• Completing the square in the exponent:

  tx − (1/(2σ²))(x² + µ² − 2xµ) = −(1/(2σ²))(x² + µ² − 2xµ − 2txσ²)
  = −(1/(2σ²))[x² + µ² + 2µtσ² + t²σ⁴ − 2xµ − 2xtσ²] + [tµ + t²σ²/2]
  = [tµ + t²σ²/2] − (1/(2σ²))[x − (µ + tσ²)]²
2.3 Property of Moment Generating Function

One of the most important properties of the moment generating function is that, if we can
find E[e^{tX}], we can find any of the moments of X.

Theorem 1 If M_X(t) exists, then for any positive integer k,

d^k M_X(t)/dt^k |_{t=0} = M_X^{(k)}(0) = µ'_k

That is, if you find the k-th derivative of M_X(t) with respect to t and then set t = 0, the
result will be µ'_k.

In the section below, we show how to calculate the mean and variance using moment
generating functions.
• Discrete Case: In the discrete case the moment generating function is in general
  given by

  M_X(t) = E[e^{tX}] = Σ_x e^{tx} f(x)

  Taking the first derivative we have

  dM_X(t)/dt = Σ_x x e^{tx} f(x)

  The derivative at t = 0 is

  dM_X(t)/dt |_{t=0} = Σ_x x f(x) = E(X)

  Taking the second derivative we have

  d²M_X(t)/dt² = Σ_x x² e^{tx} f(x)

  The derivative at t = 0 is

  d²M_X(t)/dt² |_{t=0} = Σ_x x² f(x) = E(X²)

• Continuous Case: In the continuous case the moment generating function is in
  general given by

  M_X(t) = E[e^{tX}] = ∫ e^{tx} f(x) dx

  Taking the first derivative we have

  dM_X(t)/dt = ∫ x e^{tx} f(x) dx

  The derivative at t = 0 is

  dM_X(t)/dt |_{t=0} = ∫ x f(x) dx = E(X)

  Taking the second derivative we have

  d²M_X(t)/dt² = ∫ x² e^{tx} f(x) dx

  The derivative at t = 0 is

  d²M_X(t)/dt² |_{t=0} = ∫ x² f(x) dx = E(X²)

Hence

M'(0) = E(X)  and  M''(0) = E(X²)

and

Var(X) = E(X²) − [E(X)]² = M''(0) − [M'(0)]²

Therefore

E(X) = M'(0)
Var(X) = M''(0) − [M'(0)]²

The examples below show how to calculate the mean and variance for three distributions.
Note that calculations for the other distributions are left as exercises.
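
The differentiation in Theorem 1 can also be carried out symbolically. The short sketch below (an illustration using SymPy, not part of the original notes) recovers the mean and variance of the exponential distribution from the mgf derived in Section 2.2.4:

    import sympy as sp

    t = sp.symbols('t')
    lam = sp.symbols('lambda', positive=True)

    # mgf of the exponential distribution (Section 2.2.4): lambda / (lambda - t)
    M = lam / (lam - t)

    mean = sp.diff(M, t).subs(t, 0)           # M'(0)  = 1/lambda
    second = sp.diff(M, t, 2).subs(t, 0)      # M''(0) = 2/lambda^2
    variance = sp.simplify(second - mean**2)  # 1/lambda^2

    print(mean, variance)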

Example 2 Consider a Bernoulli distribution with mass function

f(x) = p^x (1 − p)^{1−x},  x = 0, 1

Use the moment generating function to find the mean and variance.

Solution 2

M_X(t) = (1 − p) + pe^t

E(X) = dM_X(t)/dt |_{t=0} = M'(0) = pe^t |_{t=0} = p

E(X²) = d²M_X(t)/dt² |_{t=0} = M''(0) = pe^t |_{t=0} = p

Var(X) = E(X²) − [E(X)]² = p − p² = p(1 − p)

Example 3 Consider a Geometric distribution with mass function

f(x) = q^x p,  x = 0, 1, 2, ...

Find the moment generating function and the mean and variance.

Solution 3

M_X(t) = E[e^{tX}] = Σ_{x=0}^{∞} e^{tx} q^x p = p Σ_{x=0}^{∞} (qe^t)^x = p / (1 − qe^t)

E(X) = dM_X(t)/dt |_{t=0} = pqe^t / (1 − qe^t)² |_{t=0} = pq / (1 − q)² = pq/p² = q/p

E(X²) = d²M_X(t)/dt² |_{t=0} = [pqe^t(1 − qe^t)² + 2pq²e^{2t}(1 − qe^t)] / (1 − qe^t)^4 |_{t=0} = (pq + pq²)/p³

Var(X) = M''(0) − [M'(0)]² = (pq + pq²)/p³ − q²/p² = (q + q² − q²)/p² = q/p²

Example 4 Let X and Y be independent random variables. Show that E(XY) = E(X)E(Y).

Solution 4

E(XY) = ∫_y ∫_x xy f(x, y) dx dy = ∫_y ∫_x xy f(x) f(y) dx dy
      = ∫_x x [∫_y y f(y) dy] f(x) dx = ∫_x x f(x) dx ∫_y y f(y) dy = E(X)E(Y)

Example 5 Let X and Y be two independent random variables with mgf's M_X(t) and
M_Y(t) respectively. Obtain the mgf of Z = X + Y, that is, show that M_Z(t) = M_X(t)M_Y(t).

Solution 5

M_Z(t) = E[e^{tZ}] = E[e^{t(X+Y)}] = E[e^{tX + tY}] = E[e^{tX} e^{tY}]

Due to independence,

M_Z(t) = E[e^{tX}] E[e^{tY}] = M_X(t) M_Y(t)

2.4 Markov and Chebyshev's Inequality

First, we consider the Markov inequality and then use it to derive Chebyshev's inequality.

Markov Inequality
Let X be any non-negative continuous random variable. We can write:

E(X) = ∫_{−∞}^{∞} x f(x) dx
     = ∫_0^∞ x f(x) dx          (since X is non-negative)
     ≥ ∫_a^∞ x f(x) dx          (for any a > 0)
     ≥ ∫_a^∞ a f(x) dx          (since x ≥ a in the region of integration)
     = a ∫_a^∞ f(x) dx
     = a P(X ≥ a)

We write the Markov inequality as:

P(X ≥ a) ≤ E(X)/a,  for any a > 0

Chebyshev's Inequality
Let X be any random variable. If we define Y = [X − E(X)]², then Y is a non-negative
random variable. We apply Markov's inequality to Y, for any positive real number b:

P[Y ≥ b²] ≤ E(Y)/b²

But E(Y) = E[X − E(X)]² = Var(X), and

P[Y ≥ b²] = P([X − E(X)]² ≥ b²) = P(|X − E(X)| ≥ b)

So we can write Chebyshev's inequality as

P(|X − E(X)| ≥ b) ≤ Var(X)/b²,  for any b > 0

• Remark: If the variance is small, then X is unlikely to be too far from the mean.
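
As an informal illustration (a simulation sketch, not part of the original notes, assuming NumPy is available), one can compare the empirical tail probability P(|X − E(X)| ≥ b) with the Chebyshev bound Var(X)/b² for an exponential sample:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.exponential(scale=2.0, size=100_000)     # E(X) = 2, Var(X) = 4

    b = 3.0
    empirical = np.mean(np.abs(x - x.mean()) >= b)   # empirical tail probability
    bound = x.var() / b**2                           # Chebyshev upper bound

    print(empirical, bound)   # the empirical value should not exceed the bound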

2.5 Chapter Problems


1. The moment generating function for the Gaussian distribution is given by M_X(t) =
   exp(µt + σ²t²/2). Find the expectation and variance of this distribution.

2. Each of 3 boxes contains 4 bolts, 3 square in shape and 1 circular in shape. A bolt
is chosen at random from each box. Find the probability that 3 circular bolts are
chosen.

3. If f(x) = λe^{−λx} for x ≥ 0 and zero elsewhere, find the moment generating function
   of f(x) and E(X).

4. If X = 1, 2, 3, . . . has the geometric distribution f(x) = pq^{x−1}, where q = 1 − p, show
   that the moment generating function is

   M(t) = pe^t / (1 − qe^t)

   Find E(X).

5. Find the moment generating function of f(x) = 1, where 0 < x < 1, and thereby
   confirm that E(X) = 1/2 and Var(X) = 1/12.

6. Find the moment generating function of the point binomial

   f(x) = p^x (1 − p)^{1−x}

   where x = 0, 1. What is the relationship between this and the moment generating
   function of the binomial distribution?

7. Calculate the E(x) and V ar(x) for the Gamma distribution in section (2.2.5) and
the Normal distribution in section (2.2.6)

3 Bivariate Probability Distribution


In real life, we are often interested in two (or more) random variables at the same time.
For example, we might measure the height and weight of people, or the income and food
expenditure of a group of workers, or the frequency of exercise and the rate of heart
disease in adults, or the level of air pollution and the rate of respiratory illness in cities. In
such situations the random variables have a joint distribution that allows us to compute
probabilities of events involving both variables and to understand the relationship between
the variables.
In this chapter, we focus on bivariate analysis, where exactly two measurements are
made on each observation. The two measurements will be called X and Y . Since X and
Y are obtained for each observation, the data for one observation is the pair (X, Y ). Let
X and Y be two random variables defined on the same sample space. Then, the ordered
pair (X, Y ) is called a two dimensional random variable.

Definition 4 Suppose that X and Y are random variables. The joint distribution, or bi-
variate distribution of X and Y is the collection of all probabilities of the form P [(X, Y ) ∈
C] for all sets C ⊂ ℜ2 such that {(X, Y ) ∈ C} is an event.

3.1 Joint distributions


3.1.1 Continuous Case

If (X, Y) is a continuous random vector, it can take any value in a rectangle

{(x, y) : a < x < b, c < y < d}

in the plane. The joint probability density function f(x, y) is a non-negative real valued
function defined on ℜ² such that

(i) P[(X, Y) ∈ ℜ²] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1

(ii) P[a < X < b, c < Y < d] = ∫_c^d ∫_a^b f(x, y) dx dy

(iii) P[(X, Y) ∈ R] = ∫∫_R f(x, y) dx dy, for any region R ⊂ ℜ²
Example 6 Consider the joint probability density function defined by

f(x, y) = k(6 − x − y),  0 < x < 2, 2 < y < 4
f(x, y) = 0              elsewhere

Find

(i) the value of the constant k
(ii) P(X < 1, Y < 3)
(iii) P(X + Y < 3)
Solution 6

(i) ∫_2^4 ∫_0^2 k(6 − x − y) dx dy = 1

    k ∫_2^4 [6x − x²/2 − xy]_0^2 dy = k ∫_2^4 (12 − 2 − 2y) dy
    = k[10y − y²]_2^4 = k[40 − 16 − 20 + 4] = 8k = 1

    k = 1/8

    f(x, y) = (1/8)(6 − x − y),  0 < x < 2, 2 < y < 4
    f(x, y) = 0                  elsewhere

(ii) P[X < 1, Y < 3]

    (1/8) ∫_2^3 ∫_0^1 (6 − x − y) dx dy = (1/8) ∫_2^3 [6x − x²/2 − xy]_0^1 dy
    = (1/8) ∫_2^3 (11/2 − y) dy = (1/8)[11y/2 − y²/2]_2^3
    = (1/8)[33/2 − 9/2 − 11 + 2] = (1/16)[33 − 9 − 22 + 4] = 3/8

(iii) P[X + Y < 3]

    = (1/8) ∫_2^3 ∫_0^{3−y} (6 − x − y) dx dy = (1/8) ∫_2^3 [6x − x²/2 − xy]_0^{3−y} dy
    = (1/8) ∫_2^3 (18 − 6y − 9/2 + 3y − y²/2 − 3y + y²) dy
    = (1/8) ∫_2^3 (27/2 − 6y + y²/2) dy = (1/8)[27y/2 − 3y² + y³/6]_2^3 = 5/24
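
As a numerical sanity check (an illustrative sketch, not part of the original notes, assuming SciPy is available), the total probability and part (iii) of Example 6 can be approximated with dblquad:

    from scipy.integrate import dblquad

    # joint density of Example 6 (k = 1/8 from the solution above)
    f = lambda x, y: (6 - x - y) / 8

    # total probability over 0 < x < 2, 2 < y < 4 -> should be 1
    # dblquad integrates f(x, y) with x as the inner variable here
    total, _ = dblquad(f, 2, 4, lambda y: 0, lambda y: 2)

    # P(X + Y < 3): for y in (2, 3) the inner x ranges over (0, 3 - y) -> 5/24
    p, _ = dblquad(f, 2, 3, lambda y: 0, lambda y: 3 - y)

    print(total, p)   # approximately 1.0 and 0.2083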
3.1.2 Discrete Case
The random vector (X, Y) is discrete if each of its components X and Y is discrete.
The joint probability mass function of X and Y is defined as:

f(x, y) = P(X = x and Y = y), satisfying

(i) f(x, y) ≥ 0 for all (x, y) ∈ ℜ²

(ii) Σ_x Σ_y f(x, y) = 1

Suppose that X can assume any one of m values x_1, x_2, . . . , x_m and Y can assume any
one of n values y_1, y_2, . . . , y_n. Then, the probability of the event that X = x_j and Y = y_k
is given by

P(X = x_j, Y = y_k) = f(x_j, y_k)

A joint probability function for X and Y can be represented by a joint probability table
as in Table 3.1.2. The probability that X = x_j is obtained by adding all entries in the
row corresponding to x_j and is given by

P(X = x_j) = f_1(x_j) = Σ_{k=1}^{n} f(x_j, y_k)

X \ Y     y_1           y_2           ...   y_n           Totals
x_1       f(x_1, y_1)   f(x_1, y_2)   ...   f(x_1, y_n)   f_1(x_1)
x_2       f(x_2, y_1)   f(x_2, y_2)   ...   f(x_2, y_n)   f_1(x_2)
...       ...           ...           ...   ...           ...
x_m       f(x_m, y_1)   f(x_m, y_2)   ...   f(x_m, y_n)   f_1(x_m)
Totals    f_2(y_1)      f_2(y_2)      ...   f_2(y_n)      1

For j = 1, 2, . . . , m, these are indicated by the entry totals in the extreme right-hand
column or margin of Table 3.1.2. Similarly, the probability that Y = y_k is obtained by
adding all entries in the column corresponding to y_k and is given by

P(Y = y_k) = f_2(y_k) = Σ_{j=1}^{m} f(x_j, y_k)

For k = 1, 2, . . . , n, these are indicated by the entry totals in the bottom row or margin
of Table 3.1.2. The probabilities f_1(x_j) and f_2(y_k) are the marginal probability functions
of X and Y, respectively.
It should also be noted that

Σ_{j=1}^{m} f_1(x_j) = 1,   Σ_{k=1}^{n} f_2(y_k) = 1

which can be written as

Σ_{j=1}^{m} Σ_{k=1}^{n} f(x_j, y_k) = 1

The joint distribution function of X and Y is defined by

F(x, y) = P(X ≤ x, Y ≤ y) = Σ_{u ≤ x} Σ_{v ≤ y} f(u, v)

In Table 3.1.2, F(x, y) is the sum of all entries for which x_j ≤ x and y_k ≤ y.

Example 7 The table shows the promotional status of police officers during the past two
years.
Promoted Not Promoted Total
Men 288 672 960
Women 36 204 240
Total 324 876 1200

• Let M be the event that an officer is a man

• Let W be the event that an officer is a woman

• Let A be the event that an officer is promoted

• Let A^c be the event that an officer is not promoted

Required

(a) Probability that an officer is a man and is promoted

(b) Probability that an officer is a woman and is not promoted

(c) Probability that an officer is promoted

Solution 7

(a) P(M ∩ A) = 288/1200 = 0.24

(b) P(W ∩ A^c) = 204/1200 = 0.17

(c) P(A) = 324/1200 = 0.27
3.2 Bivariate Functions

The joint distribution function (cdf), F(x, y), of two random variables X and Y defined
on the same sample space is given by

F(x, y) = P(X ≤ x and Y ≤ y),  −∞ < x < ∞, −∞ < y < ∞

If X and Y are continuous, then

F(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(u, v) dv du

If F(x, y) is differentiable, then the joint probability density function of X and Y is

f(x, y) = ∂²F(x, y) / ∂y∂x

Example 8 If F(x, y) = x + y + 2xy, find f(x, y).

Solution 8

∂F(x, y)/∂x = 1 + 2y

and

∂²F(x, y)/∂y∂x = 2 = f(x, y)

3.3 Marginal distributions


Let X and Y be two continuous random variables with joint probability density function
f(x, y). Then

P[a < X < b] = ∫_a^b ∫_{−∞}^{∞} f(x, y) dy dx

The integral

f_1(x) = ∫_{−∞}^{∞} f(x, y) dy

is the marginal probability density function of X, while the integral

f_2(y) = ∫_{−∞}^{∞} f(x, y) dx

is the marginal probability density function of Y.

If X and Y are discrete, then

f_1(x) = Σ_y f(x, y)

The marginal distribution function of X is given by

F_1(x) = P(X ≤ x) = ∫_{−∞}^{x} f_1(t) dt, if X is continuous

and

F_1(x) = Σ_{t ≤ x} f_1(t), if X is discrete
Example 9 Let the joint probability density function of X and Y be given by

f(x, y) = 2,  0 < x < y < 1
f(x, y) = 0,  elsewhere

Find the marginal probability density functions of X and Y, that is f_1(x) and f_2(y).

Solution 9

(a) f_1(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_x^1 2 dy = 2(1 − x)

    f_1(x) = 2(1 − x),  0 ≤ x ≤ 1
    f_1(x) = 0,         elsewhere

(b) f_2(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_0^y 2 dx = 2y

    f_2(y) = 2y,  0 ≤ y ≤ 1
    f_2(y) = 0,   elsewhere
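
The marginals in Example 9 can be checked symbolically (an illustrative sketch using SymPy, not part of the original notes):

    import sympy as sp

    x, y = sp.symbols('x y')
    f = sp.Integer(2)                       # joint pdf of Example 9 on 0 < x < y < 1

    f1 = sp.integrate(f, (y, x, 1))         # marginal of X: 2*(1 - x)
    f2 = sp.integrate(f, (x, 0, y))         # marginal of Y: 2*y

    print(f1, f2, sp.integrate(f1, (x, 0, 1)))   # 2 - 2*x, 2*y, 1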

3.4 Conditional Distributions


Definition 5 Let X and Y be discrete random variables with joint probability mass func-
tion f(x, y). The probability that X takes the value x given that Y = y has been observed is
denoted by f(x/y) and is defined by

P[X = x / Y = y] = P[X = x and Y = y] / P(Y = y) = f(x, y) / f_2(y)

where f_2(y) is the marginal probability mass function of Y. The function

f(x/y) = f(x, y)/f_2(y),  if f_2(y) > 0
f(x/y) = 0,               if f_2(y) = 0

is called the conditional probability function of X given Y.

Example 10 Suppose X and Y are two discrete random variables with joint probability
mass function

f(x, y) = (1/54)(x + y),  x = 1, 2, 3; y = 1, 2, 3, 4
f(x, y) = 0               elsewhere

Calculate the following

(a) f(y/x), that is the conditional distribution of Y given X = x

(b) P(Y = 1 / X = 1)

(c) P(Y = 4 / X = 3)

(d) E[Y | X = x]
Solution 10

(a) We express the conditional distribution of Y given X = x as

    f(y/x) = f(x, y) / f_1(x)

    But f_1(x) = Σ_{y=1}^{4} (1/54)(x + y) = (1/54)(4x + 10) = (1/27)(2x + 5)

    f(y/x) = [(1/54)(x + y)] / [(1/54)(4x + 10)] = (x + y)/(4x + 10),  x = 1, 2, 3; y = 1, 2, 3, 4

(b) P(Y = 1 / X = 1) = f(1/1) = (1 + 1)/(4 + 10) = 2/14 = 1/7

(c) P(Y = 4 / X = 3) = f(4/3) = 7/22

(d) E[Y|X = x] = Σ_y y(x + y)/(4x + 10) = (x + 1 + 2x + 4 + 3x + 9 + 4x + 16)/(4x + 10) = (5x + 15)/(2x + 5)

3.5 Independent Random Variables

Two random variables X and Y are said to be statistically independent if for any two
sets A and B of real numbers

P[X ∈ A and Y ∈ B] = P[X ∈ A] P[Y ∈ B]

It follows that if X and Y are independent random variables with joint probability density
function f(x, y), then

f(x, y) = f_1(x) f_2(y)

Example 11 Let X and Y have the joint probability density function

f(x, y) = 2e^{−(x+2y)},  x ≥ 0, y ≥ 0
f(x, y) = 0              elsewhere

Show that X and Y are independent.

Solution 11 The two random variables are independent if

f(x, y) = f_1(x) f_2(y) = 2e^{−(x+2y)}

The marginal density function of X is given by

f_1(x) = ∫_0^∞ 2e^{−(x+2y)} dy = 2 [e^{−(x+2y)}/(−2)]_0^∞ = −[e^{−∞} − e^{−x}] = e^{−x}

The marginal density function of Y is given by

f_2(y) = ∫_0^∞ 2e^{−(x+2y)} dx = 2 [−e^{−(x+2y)}]_0^∞ = −2[e^{−∞} − e^{−2y}] = 2e^{−2y}

Therefore the random variables X and Y are independent, since

f(x, y) = f_1(x) f_2(y) = e^{−x} · 2e^{−2y} = 2e^{−(x+2y)}
3.6 Bivariate Expectations

Let X and Y have joint probability density function f(x, y) and let U(X, Y) be a function
of X and Y. The expected value of U(X, Y) is expressed as

E[U(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} U(x, y) f(x, y) dx dy

If X and Y are discrete, then

E[U(X, Y)] = Σ_y Σ_x U(x, y) f(x, y)

The moments of X about X = 0 are

E(X^r) = ∫_{−∞}^{∞} x^r f_1(x) dx

where f_1(x) is the marginal probability density function of X. The central moments of
X are

E[(X − µ_X)^r] = ∫_{−∞}^{∞} (x − µ_X)^r f_1(x) dx

The joint product moments of X and Y about the point (0, 0) are

E[X^r Y^s] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x^r y^s f(x, y) dx dy

3.6.1 Conditional Expectation

Definition 6 If X and Y have joint density function f(x, y), the conditional density
function of Y given X is f(y|x) = f(x, y)/f_1(x), where f_1(x) is the marginal density
function of X. We define the conditional expectation, or conditional mean, of Y
given X by

E(Y | X = x) = ∫_{−∞}^{∞} y f(y|x) dy

and

E(X | Y = y) = ∫_{−∞}^{∞} x f(x|y) dx

We note the following properties

(a) E(Y | X = x) = E(Y) when X and Y are independent

(b) E(Y) = ∫_{−∞}^{∞} E(Y | X = x) f_1(x) dx  (the law of total expectation)

Example 12 A miner is trapped in a mine containing 3 doors. The first door leads to a
tunnel which takes him to safety after 2 hours of travel. The second door leads to a tunnel
which returns him to the mine after 3 hours of travel. The third door leads to a tunnel
which returns him to the mine after 5 hours. Assuming he is at all times equally likely
to choose any of the doors, what is the expected length of time until the miner reaches
safety?

Solution 12 Let X be the time to reach safety (hours) and Y be the door (1, 2 or 3)
initially chosen. Then,

E(X) = E(X|Y = 1)P(Y = 1) + E(X|Y = 2)P(Y = 2) + E(X|Y = 3)P(Y = 3)
     = (1/3){E(X|Y = 1) + E(X|Y = 2) + E(X|Y = 3)}

Now

E(X|Y = 1) = 2
E(X|Y = 2) = 3 + E(X)
E(X|Y = 3) = 5 + E(X)

So

E(X) = (1/3){2 + 3 + E(X) + 5 + E(X)}, which gives E(X) = 10

3.7 Covariance and Correlation


Let X and Y be two jointly distributed random variables with means µ_X and µ_Y. Then
the covariance of (X, Y), written Cov(X, Y), is expressed as

Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)] = σ_{XY}
          = E[XY] − µ_X E(Y) − µ_Y E(X) + µ_X µ_Y
          = E(XY) − µ_X µ_Y = E(XY) − E(X)E(Y)

Covariance measures the degree of association between X and Y. Suppose that X and Y
are jointly distributed random variables with finite variances. The correlation coefficient
between X and Y is denoted by

ρ(X, Y) = Cov(X, Y) / (σ_X σ_Y) = Cov(X, Y) / [std(X) std(Y)]

where −1 ≤ ρ(X, Y) ≤ 1.
If the random variables X and Y are independent, then

Cov(X, Y) = E(XY) − E(X)E(Y) = E(X)E(Y) − E(X)E(Y) = 0

and this implies that the correlation coefficient is also zero, since

ρ(X, Y) = Cov(X, Y) / (σ_X σ_Y) = 0 / (σ_X σ_Y) = 0

That is, independent random variables are uncorrelated.
3.7.1 Properties of Covariance
Let X and Y be two random variables, then

(a) Cov(X, Y ) = Cov(Y, X)

(b) Cov(X, X) = E(X 2 ) − E(X)E(X) = Var(X)

(c) Cov(aX + b, Y ) = aCov(X, Y )

Where a and b are arbitrary constants.

Example 13 Suppose X and Y are jointly distributed with joint probability density function

f(x, y) = (1/8)(x + y),  0 < x < 2, 0 < y < 2
f(x, y) = 0              otherwise

Find the correlation coefficient between X and Y.

Solution 13 First, find E(XY), E(X), E(Y), Var(X) and Var(Y).

E(X) = ∫_0^2 x f_1(x) dx = 7/6

and

E(Y) = ∫_0^2 y f_2(y) dy = 7/6

By symmetry, we also find that

E(X²) = E(Y²) = (1/8) ∫_0^2 ∫_0^2 x²(x + y) dy dx = 5/3

and so

Var(X) = Var(Y) = 5/3 − 49/36 = 11/36

The expected value of XY is given by

E(XY) = (1/8) ∫_0^2 ∫_0^2 xy(x + y) dx dy = 4/3

while the covariance between X and Y is

Cov(X, Y) = 4/3 − 49/36 = −1/36

For the correlation coefficient between X and Y,

ρ(X, Y) = (−1/36) / [√(11/36) √(11/36)] = −1/11

Therefore, X and Y are negatively correlated.
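
The moments in Example 13 can also be obtained symbolically (an illustrative sketch using SymPy, not part of the original notes):

    import sympy as sp

    x, y = sp.symbols('x y')
    f = (x + y) / 8                       # joint pdf of Example 13 on (0,2) x (0,2)

    def E(g):
        """Exact expectation of g(X, Y) under f."""
        return sp.integrate(g * f, (x, 0, 2), (y, 0, 2))

    ex, ey, exy = E(x), E(y), E(x * y)
    var_x, var_y = E(x**2) - ex**2, E(y**2) - ey**2
    rho = (exy - ex * ey) / sp.sqrt(var_x * var_y)

    print(rho)                            # -1/11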
3.8 Chapter Problems
1. Show that if X and Y are independent, then, E[X/Y = y] = E[X] for all y.

2. Suppose that X and Y are jointly distributed with joint probability density function

   f(x, y) = k(x + y),  0 < x < 2, 0 < y < 2
   f(x, y) = 0          elsewhere

   (a) Find the value of the constant k
   (b) Find the marginal density functions of X and Y
   (c) Find f(x/y) and f(y/x)
   (d) Are X and Y independent?
   (e) Calculate P[X < 1, Y > 3]
   (f) Calculate E(XY), E(X) and E(Y)

3. Suppose that X and Y are random variables whose jpdf has a moment generating
   function

   M(t_1, t_2) = (1/4)e^{t_1} + (3/8)e^{t_2} + 3/8

   for all real t_1 and t_2. Find Cov(X, Y).

4. The joint density function of two continuous random variables X and Y is

   f(x, y) = cxy,  0 < x < 4, 1 < y < 5
   f(x, y) = 0     otherwise

   Find

   (a) The value of the constant c
   (b) P(X ≥ 3, Y ≤ 2) and P(1 < X < 2, 2 < Y < 3)
   (c) f_1(x) and f_2(y)
   (d) E(XY), F(x, y) and P(X < 2 / Y = 2)

5. The cumulative distribution function for the joint distribution of the continuous
   random variables X and Y is

   F(x, y) = (1/5)(3x³y + 2x²y²),  0 < x < 1, 0 < y < 1

   (a) Find f(x, y) and f(0, 0.5)
   (b) Show that f(x, y) is a complete probability density function
   (c) Find the marginal probability density functions of X and Y
   (d) Find P(0 < X < 1) and P(0 < Y < 1)
6. Let X and Y be two random variables with joint probability mass function

   f(x, y) = k(x + 2y),  x = 1, 2; y = 1, 2, 3
   f(x, y) = 0           otherwise

   Find
   (a) The value of the constant k
   (b) E(XY²) and Var(X)
   (d) P(X = 1, Y = 1, 2) and P(X = 1 / Y = 1, 2)
   (f) f(y/x), E(Y) and E(Y/x)

7. Let X and Y be jointly distributed random variables with joint probability density
   function

   f(x, y) = p^{x+y}(1 − p)^{2−x−y},  x = 0, 1; y = 0, 1
   f(x, y) = 0                        otherwise

   Find the covariance and correlation coefficient between X and Y.

8. A die and a coin are each tossed once. Write the possible outcomes in a joint
   probability table.

   (a) What is the probability of a head and a number greater than 3 from the die?
   (b) What is the probability of a tail and an even number from the die?

9. Suppose X and Y are two independent random variables having probability density
   functions of the form

   f(x) = 2(1 − x),  0 ≤ x ≤ 1
   f(x) = 0          otherwise

   and

   f(y) = 2(1 − y),  0 ≤ y ≤ 1
   f(y) = 0          otherwise

   (a) Find P[X + Y ≤ 1] and P[X ≤ 1/2, Y ≤ 1]
   (b) What is the relationship between E(XY), E(X) and E(Y)?

4 Linear Regression and Correlation Analysis


Correlation is a statistical method used to determine whether a relationship between
variables exists. Regression is a statistical method used to describe the nature of the
relationship between variables. Some of the questions answered by correlation and
regression are:

• Is there a relationship between the number of hours a student studies and the
  student's score on a particular exam?

• Is there a relationship between a person's age and his or her blood pressure?

• Is caffeine related to heart damage?

4.1 Correlation

To test for correlation, we use the correlation coefficient (r) to determine the strength
of the linear relationship between two variables. We use the Pearson product moment
correlation coefficient.

• The range of the correlation coefficient is from −1 to +1.

• If there is a strong positive linear relationship between the variables, the value of r
  will be close to +1.

• If there is a strong negative linear relationship between the variables, the value of r
  will be close to −1.

• When there is no linear relationship between the variables or only a weak relationship,
  the value of r will be close to 0.

Formula for the Correlation Coefficient, r

r = [n(Σxy) − (Σx)(Σy)] / √{[n(Σx²) − (Σx)²][n(Σy²) − (Σy)²]}

where n is the number of data pairs. We refer to y as the dependent variable and x as
the independent variable. A scatter plot can be used to visualize the relationship even
before the calculation of the correlation coefficient.

Example
The data shows the number of cars rental companies have and their respective annual
income. Use the data to calculate the correlation coefficient and interpret its meaning.

Company   Cars (in ten thousands)   Revenue (in billions $)
A         63.0                      7.0
B         29.0                      3.9
C         20.8                      2.1
D         19.1                      2.8
E         13.4                      1.4
F         8.5                       1.5

Find the values of xy, x² and y². Then find the sum of each column. Since Revenue
depends on the number of cars the company has, Revenue is the dependent variable
(y) and Cars is the independent variable (x).

Company   Cars (x)   Revenue (y)   xy        x²        y²
A         63.0       7.0           441.00    3969.00   49.00
B         29.0       3.9           113.10    841.00    15.21
C         20.8       2.1           43.68     432.64    4.41
D         19.1       2.8           53.48     364.81    7.84
E         13.4       1.4           18.76     179.56    1.96
F         8.5        1.5           12.75     72.25     2.25
Totals    Σx = 153.8  Σy = 18.7    Σxy = 682.77  Σx² = 5859.26  Σy² = 80.67

We substitute the values in the table into the formula and solve for r:

r = [n(Σxy) − (Σx)(Σy)] / √{[n(Σx²) − (Σx)²][n(Σy²) − (Σy)²]}
  = [6(682.77) − (153.8)(18.7)] / √{[6(5859.26) − (153.8)²][6(80.67) − (18.7)²]}
  = 0.982

• The correlation coefficient suggests a strong positive relationship between the number of
  cars a rental company has and its annual income.

• Therefore, as the number of cars increases (decreases), the annual income increases
  (decreases).

• Remark

  – For a negative correlation coefficient, an increase in x leads to a decrease in y
  – For a positive correlation coefficient, an increase in x leads to an increase in y
  – For a zero correlation coefficient, an increase/decrease in x has no influence on y

4.2 Regression

If there is a negative or positive correlation coefficient, the next step is to determine the
equation of the regression line, which is the line of best fit. The purpose of the regression
line is to enable the researcher to see the trend and make predictions based on the data.

Regression Line Equation

The regression line is given by y = a + bx, where a is the y intercept and b is the slope of
the line. To calculate the values of a and b:

a = [(Σy)(Σx²) − (Σx)(Σxy)] / [n(Σx²) − (Σx)²]

b = [n(Σxy) − (Σx)(Σy)] / [n(Σx²) − (Σx)²]
Example
We revisit the example of the number of cars the rental company has and the annual
income made by the company. To estimate the regression line, we compute the values of
a and b.
The values needed for the equations are n = 6, Σx = 153.8, Σy = 18.7, Σxy = 682.77 and
Σx² = 5859.26. We substitute the values into the formulas to get:

a = [(Σy)(Σx²) − (Σx)(Σxy)] / [n(Σx²) − (Σx)²]
  = [(18.7)(5859.26) − (153.8)(682.77)] / [(6)(5859.26) − (153.8)²] = 0.396

b = [n(Σxy) − (Σx)(Σy)] / [n(Σx²) − (Σx)²]
  = [6(682.77) − (153.8)(18.7)] / [(6)(5859.26) − (153.8)²] = 0.106

The equation of the regression line y = a + bx is

y = 0.396 + 0.106x

We can use the regression line to predict the values of y given the values of x, that is,
predict the annual income given the number of cars. For example, let x = 40 (cars, in ten
thousands):

y = 0.396 + 0.106(40) = 4.636

Thus, with 400,000 cars, the company makes 4.636 billion dollars per year, written as
(40, 4.636).
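
These hand calculations can be cross-checked with a least-squares fit (an illustrative sketch, not part of the original notes, assuming NumPy is available); numpy.polyfit returns the same slope and intercept to rounding:

    import numpy as np

    cars = np.array([63.0, 29.0, 20.8, 19.1, 13.4, 8.5])      # x, in ten thousands
    revenue = np.array([7.0, 3.9, 2.1, 2.8, 1.4, 1.5])        # y, in billions of dollars

    b, a = np.polyfit(cars, revenue, 1)        # slope first, then intercept
    r = np.corrcoef(cars, revenue)[0, 1]       # Pearson correlation coefficient

    print(round(a, 3), round(b, 3), round(r, 3))   # 0.396, 0.106, 0.982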

4.3 Chapter Problems


1. Calculate the value of the correlation coefficient for the number of hours a person
exercises and the amount of milk a person consumes per week.

Subject Hours of Exercise (X) Amount of Milk Consumed (Y)


A 3 48
B 0 8
C 2 32
D 5 64
E 8 10
F 5 32
G 10 56
H 2 72
I 3 48

2. The number of calories and the number of milligrams of cholesterol for a random
sample of fast-food chicken sandwiches from seven restaurants are shown here. Is
there a relationship between the variables?

Calories x 390 535 720 300 430 500 440


Cholesterol y 43 45 80 50 55 52 60

3. The number of forest fires and the number of acres burned are as follows:

Fires x 72 69 58 47 84 62 57 45
Acres y 62 41 19 26 51 15 30 15

Find y when x = 60

4. In Question 2 above, find y when x = 600 calories.

5 Distribution of Functions of Random Variables


In this section, we will learn how to find the probability distribution of functions of random
variables. For example, we might know the probability density function of X, but want to
know the probability density function of u(X) = X 2 . The two techniques to be discussed
are:

(a) Cumulative Distribution function (cdf) technique

(b) Change of variable (Jacobian) technique

Suppose that a random variable X has a discrete distribution and U = Φ(X) is another
random variable which is a function of X. Then, for any value u, we have

g(u) = P(U = u) = P(Φ(X) = u) = Σ_{x : Φ(x) = u} f(x)

If X has a continuous distribution and U = Φ(X) is another random variable which
is a function of X, then, for any value u, we have

G(u) = P(U ≤ u) = P(Φ(X) ≤ u) = ∫_{x : Φ(x) ≤ u} f(x) dx

5.1 Cumulative distribution function technique

The cumulative distribution function (cdf) technique is applied to a univariate distribution.
If X is continuous and G(u) is the cdf of U = Φ(X), then

G(u) = P(U ≤ u) = P(Φ(X) ≤ u) = ∫_{x : Φ(x) ≤ u} f(x) dx

If G(u) is differentiable, then the pdf of U is

g(u) = (d/du) G(u)

This method of getting the pdf of U is called the cdf technique. If Φ(X) is a continuous
and strictly increasing or decreasing function of X over the interval (a, b), then U will
vary over some interval (α, β) as X varies over the interval (a, b), and the inverse function
W will be strictly increasing or decreasing over the interval (α, β). The pdf of U is given by

g(u) = f(W(u)) |(d/du) W(u)|

where U = Φ(X) if and only if X = W(U).

Example 14 Let X be a random variable with pdf

f(x) = (1/2)x,  0 < x < 2
f(x) = 0        elsewhere

Find the pdf of U = 1 − X².

Solution 14 We consider the five steps in using the cdf method:

• Step 1: Find the interval for the random variable U
  We transform the interval of x, 0 < x < 2, to obtain the interval of u.
  u is continuous and strictly decreasing over the interval (0, 2). That is,

  x = 0 ⟹ u = 1  and  x = 2 ⟹ u = −3

  As x varies over (0, 2) = (a, b), u varies over (−3, 1) = (α, β).

• Step 2: Find the inverse function
  The inverse function is

  x = w(u) = (1 − u)^{1/2},  −3 < u < 1

• Step 3: Find the derivative of w(u)
  Differentiating with respect to u,

  (d/du) w(u) = −(1/2)(1 − u)^{−1/2},  so |(d/du) w(u)| = 1 / [2(1 − u)^{1/2}]

• Step 4: Find f(w(u))

  f(w(u)) = (1/2)(1 − u)^{1/2}

• Step 5: Find g(u) = f(w(u)) |(d/du) w(u)|
  The pdf of u is expressed as

  g(u) = (1/2)(1 − u)^{1/2} · 1/[2(1 − u)^{1/2}] = 1/4

  g(u) = 1/4,  −3 < u < 1
  g(u) = 0     elsewhere
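
A quick Monte Carlo check of Example 14 (an illustrative sketch, not part of the original notes, assuming NumPy is available): sampling X from f(x) = x/2 by the inverse-transform method and transforming to U = 1 − X² should give a flat histogram of height 1/4 on (−3, 1).

    import numpy as np

    rng = np.random.default_rng(1)

    # inverse-transform sampling: F(x) = x^2/4 on (0, 2), so X = 2*sqrt(V) with V ~ U(0, 1)
    v = rng.uniform(size=200_000)
    x = 2 * np.sqrt(v)

    u = 1 - x**2                                    # transformed variable of Example 14
    hist, edges = np.histogram(u, bins=20, range=(-3, 1), density=True)

    print(hist.round(3))   # each entry should be close to 0.25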
5.2 Change of Variable Technique
This is another method of obtaining distribution functions of functions of random variables.
Let X be a continuous random variable with pdf f(x).
Let Y = µ(X) be some function of X, for X ∈ A.
This implies that X = µ^{−1}(Y) = w(Y).
By the change of variable technique, the pdf of Y is given as:

g(y) = f(w(y)) |J|,  y ∈ B
g(y) = 0,            elsewhere

where J is the Jacobian of the transformation from A to B, J = dx/dy.
A is the domain of the random variable X and B is the domain of the random variable Y.

Example 15 Let X be a random variable with pdf

f(x) = k(x + 1),  0 < x < 2
f(x) = 0          elsewhere

where k is a constant. Find the distribution of Y = X².

Solution 15 We first need to obtain k.

We solve the integral ∫_0^2 f(x) dx = 1:

k[x²/2 + x]_0^2 = 1  ⟹  k(2 + 2) = 1  ⟹  k = 1/4

The pdf of X is expressed as:

f(x) = (1/4)(x + 1),  0 < x < 2
f(x) = 0,             otherwise

There are five steps to follow while using the change of variable technique.

• Step 1: Find the interval for y
  We know the interval for x, which is 0 < x < 2, and we can find the interval for y.
  Given that Y = X², then

  X = 0 ⟹ Y = 0  and  X = 2 ⟹ Y = 4

  Now

  Y = X² = µ(X)  ⟹  B = {y : 0 < y < 4}

• Step 2: Find the inverse function
  The inverse transformation is given by

  X = µ^{−1}(Y) = √Y

  We write w(y) := µ^{−1}(y), that is, w(y) = √y.

• Step 3: Find the Jacobian
  The Jacobian of the transformation is obtained as

  J = dx/dy = (d/dy)√y = (1/2)y^{−1/2} = 1/(2√y)  ⟹  |J| = 1/(2√y)

• Step 4: Find f(w(y))

  f(w(y)) = (1/4)[w(y) + 1] = (1/4)(√y + 1)

• Step 5: The distribution of Y is

  g(y) = f(w(y)) |J|,  y ∈ B

  g(y) = (1/4)(√y + 1) · 1/(2√y),  0 < y < 4

  g(y) = (1/8)(1 + 1/√y),  0 < y < 4
  g(y) = 0,                elsewhere

Example 16 Let X be a discrete random variable with probability mass function

f(x) = 2(1/3)^x,  x = 1, 2, 3, . . .
f(x) = 0          elsewhere

Find the distribution of Y = X³ + 2.

Solution 16 We consider the five steps in the case of a pmf:

• Step 1: Find the values of the random variable Y

  A = {x : x = 1, 2, 3, ...}  and  B = {y : y = 3, 10, 29, ...}

• Step 2: Find the inverse function

  w(y) = x = (y − 2)^{1/3}

• Step 3: Find the Jacobian
  Since X is discrete, we do not have to compute the Jacobian of the transformation.

• Step 4: Find f(w(y))

  f(w(y)) = 2(1/3)^{w(y)}

• Step 5: Find the distribution of Y
  The pmf of Y is given by

  g(y) = f(w(y)) = 2(1/3)^{w(y)},  y = 3, 10, 29, ...;  y ∈ B

  Thus, the pmf of Y is given as

  g(y) = 2(1/3)^{(y−2)^{1/3}},  y = 3, 10, 29, . . .
  g(y) = 0                      elsewhere
5.2.1 Bivariate Distribution Cases

Suppose X and Y are jointly distributed random variables with a joint pdf f(x, y). We define
two new random variables with the transformation

U = Φ(X, Y)  and  V = Φ_1(X, Y)

which is one-to-one and maps the set S of (X, Y) onto B, where B is the set of (U, V). This
transformation is invertible and hence

X = W_1(U, V)  and  Y = W_2(U, V)

S is a subset of the (X, Y) plane, while B is a subset of the (U, V) plane. The Jacobian of x, y
with respect to u and v is shorthand for the 2 × 2 determinant

J = det [ ∂x/∂u  ∂x/∂v ]
        [ ∂y/∂u  ∂y/∂v ]

and is called the Jacobian of the transformation. The joint pdf g(u, v) is given by

g(u, v) = f[W_1(u, v), W_2(u, v)] |J|,  (u, v) ∈ B
g(u, v) = 0                             elsewhere

The Jacobian determinant is used when making a change of variables and evaluating
a multiple integral of a function over a region within its domain. To accommodate the
change of coordinates, the magnitude of the Jacobian determinant appears as a multiplicative
factor within the integral.
Suppose that X_1, X_2, ..., X_n are jointly distributed random variables with a joint pdf

f(x_1, x_2, ..., x_n);  (X_1, X_2, ..., X_n) ∈ A.

Let Y_1, Y_2, ..., Y_n define a one-to-one transformation

y_1 = µ_1(x_1, x_2, ..., x_n)
y_2 = µ_2(x_1, x_2, ..., x_n)
...
y_n = µ_n(x_1, x_2, ..., x_n)

We find the joint pdf of Y_1, Y_2, ..., Y_n as follows:

g(y_1, y_2, ..., y_n) = f(w_1(y_1, ..., y_n), ..., w_n(y_1, ..., y_n)) |J|,  (Y_1, ..., Y_n) ∈ B
g(y_1, y_2, ..., y_n) = 0,                                                  otherwise

where |J| is the absolute value of the n × n determinant

J = det [ ∂x_1/∂y_1  ∂x_1/∂y_2  ...  ∂x_1/∂y_n ]
        [ ∂x_2/∂y_1  ∂x_2/∂y_2  ...  ∂x_2/∂y_n ]
        [    ...         ...    ...     ...    ]
        [ ∂x_n/∂y_1  ∂x_n/∂y_2  ...  ∂x_n/∂y_n ]

Note: If X_1, X_2, ..., X_n were discrete random variables, we would omit |J| in the joint
pmf g(·) of Y_i, i = 1, 2, ..., n.

Example 17 Let X and Y have a joint pdf given by

f(x, y) = 1,  0 ≤ x ≤ 1, 0 ≤ y ≤ 1
f(x, y) = 0   elsewhere

Find the distribution of U = X + Y and V = Y − X.

Solution 17 We consider the five steps in the case of a joint pdf:

• Step 1: Find the inverse functions
  U = X + Y and V = Y − X. We solve the two to get X = (U − V)/2 and Y = (U + V)/2.

• Step 2: Find the intervals
  When

  X = 0 ⟹ U − V = 0  and  X = 1 ⟹ U − V = 2

  while

  Y = 0 ⟹ U + V = 0  and  Y = 1 ⟹ U + V = 2

  Thus, 0 < u − v < 2 and 0 < u + v < 2.

• Step 3: Find the Jacobian
  The Jacobian of the transformation is given by

  J = det [ ∂x/∂u  ∂x/∂v ] = det [ 1/2  −1/2 ] = (1/2)(1/2) − (−1/2)(1/2) = 1/2
          [ ∂y/∂u  ∂y/∂v ]       [ 1/2   1/2 ]

  This means that |J| = 1/2.

• Step 4: Find f(w_1(u, v), w_2(u, v))

  f(w_1(u, v), w_2(u, v)) = 1

• Step 5: Find g(u, v)

  g(u, v) = f(w_1(u, v), w_2(u, v)) |J| = 1/2

  The joint pdf of U and V is given by

  g(u, v) = 1/2,  0 < u − v < 2, 0 < u + v < 2
  g(u, v) = 0     elsewhere
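
The Jacobian in Step 3 can also be computed symbolically (an illustrative sketch using SymPy, not part of the original notes):

    import sympy as sp

    u, v = sp.symbols('u v')
    x = (u - v) / 2            # inverse transformation from Step 1
    y = (u + v) / 2

    J = sp.Matrix([[sp.diff(x, u), sp.diff(x, v)],
                   [sp.diff(y, u), sp.diff(y, v)]]).det()

    print(abs(J))              # 1/2, so g(u, v) = f(x, y) * |J| = 1/2 on the image region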

5.3 Chapter Problems


1. The probability density function of the random variable X is given by

   f(x) = x²/81,  −3 < x < 6
   f(x) = 0       elsewhere

   Find the probability density function of the random variable U = (1/3)(12 − X).

2. If the random variables X and Y have joint density function

   f(x, y) = xy/96,  0 < x < 4, 1 < y < 5
   f(x, y) = 0       elsewhere

   (a) Find the joint density function of U = X + Y and V = Y − X

3. The probability function of a random variable X is

   f(x) = 2^{−x},  x = 1, 2, 3, . . .
   f(x) = 0        elsewhere

   Find the probability function of the random variable U = X⁴ + 1.

4. Let X and Y have joint density function

   f(x, y) = e^{−(x+y)},  x ≥ 0, y ≥ 0
   f(x, y) = 0            elsewhere

   If U = X/Y and V = X + Y, find the joint density function of U and V.

5. Let X have the density function

   f(x) = e^{−x},  x > 0
   f(x) = 0,       x ≤ 0

   Find the density function of Y = X².

6. Let f(x, y) be the joint density function of X and Y,

   f(x, y) = 1,  0 ≤ x ≤ 1, 0 ≤ y ≤ 1
   f(x, y) = 0   otherwise

   Find the density function of Z = XY.

7. Let the joint probability density function of X and Y be given by

   f(x, y) = 2e^{−x−2y},  x > 0, y > 0
   f(x, y) = 0            otherwise

   Find the probability density of Z = X + Y, given that Z is a linear function of X
   and Y.

6 Derived Distributions

6.1 Gamma Function

The Gamma function Γ(x) is an extension of the factorial function to real (and complex)
numbers. If n ∈ {1, 2, 3, . . .}, then

Γ(n) = (n − 1)!

More generally, for any positive real number α, Γ(α) is defined as

Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx,  for α > 0

Note that for α = 1, we can write

Γ(1) = ∫_0^∞ e^{−x} dx = 1

Using integration by parts, it can be shown that

Γ(α + 1) = αΓ(α),  for α > 0

Using the change of variable x = λy, we can show the following equation, which is useful
when working with the gamma distribution:

Γ(α) = λ^α ∫_0^∞ y^{α−1} e^{−λy} dy,  for α, λ > 0

6.1.1 Properties of the Gamma Function

For any positive real number α:

1. Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx

2. ∫_0^∞ x^{α−1} e^{−λx} dx = Γ(α)/λ^α, for λ > 0

3. Γ(α + 1) = αΓ(α)

4. Γ(n) = (n − 1)!, for n = 1, 2, 3, . . .

5. Γ(1/2) = √π

Example 18 Find the value of

1. Γ(7/2) = (5/2)Γ(5/2) = (5/2)(3/2)Γ(3/2) = (5/2)(3/2)(1/2)Γ(1/2) = (15/8)√π

2. I = ∫_0^∞ x⁶ e^{−5x} dx
   Since α = 7 and λ = 5, we obtain I = Γ(7)/5⁷ = 6!/5⁷ = 0.0092
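
Both values in Example 18 can be verified numerically (an illustrative sketch, not part of the original notes, assuming SciPy is available):

    import math
    from scipy.integrate import quad
    from scipy.special import gamma

    print(gamma(3.5), 15 * math.sqrt(math.pi) / 8)        # both about 3.3234

    integral, _ = quad(lambda x: x**6 * math.exp(-5 * x), 0, math.inf)
    print(integral, math.factorial(6) / 5**7)             # both about 0.009216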

6.2 Gamma Distribution

A continuous random variable X is said to have a gamma distribution with parameters
α > 0 and λ > 0, written X ∼ Gamma(α, λ), if its probability density function is given by

f(x) = [λ^α x^{α−1} e^{−λx}] / Γ(α),  x > 0
f(x) = 0                              otherwise

Exercise

• Find E(X) and Var(X) for the gamma distribution.

• Prove that the Gamma distribution is a probability density function, that is:

  (1/Γ(α)) ∫_0^∞ λ^α x^{α−1} e^{−λx} dx = 1
