Stochastic Differential Equations Overview
1. Probability Spaces
1.1 σ-algebras and information
We begin with some notation and terminology. The symbol Ω denotes a generic non-empty set; the power set of Ω, denoted by 2^Ω, is the set of all subsets of Ω. If the number of elements in the set Ω is M ∈ ℕ, we say that Ω is finite. If Ω contains an infinite number of elements and there exists a bijection Ω → ℕ, we say that Ω is countably infinite. If Ω is neither finite nor countably infinite, we say that it is uncountable. An example of an uncountable set is the set ℝ of real numbers. When Ω is finite we write Ω = {ω₁, ω₂, ..., ω_M}, or Ω = {ω_k}_{k=1,...,M}. If Ω is countably infinite we write Ω = {ω_k}_{k∈ℕ}. Note that for a finite set Ω with M elements, the power set contains 2^M elements. For instance, if Ω = {★, 1, $} is a set with three elements, then

2^Ω = {∅, {★}, {1}, {$}, {★, 1}, {★, $}, {1, $}, {★, 1, $}},

which contains 2³ = 8 elements. Here ∅ denotes the empty set, which by definition is a subset of all sets.
Within the applications in probability theory, the elements ω ∈ Ω are called sample points and represent the possible outcomes of a given experiment (or trial), while the subsets of Ω correspond to events which may occur in the experiment. For instance, if the experiment consists in throwing a die, then Ω = {1, 2, 3, 4, 5, 6} and A = {2, 4, 6} identifies the event that the result of the experiment is an even number. Now let N ∈ ℕ,

Ω_N = {(γ₁, ..., γ_N) : γ_k ∈ {H, T}} = {H, T}^N,   (1.1)

where H stands for "head" and T stands for "tail". Each element ω = (γ₁, ..., γ_N) ∈ Ω_N is called an N-toss and represents a possible outcome for the experiment "tossing a coin N consecutive times". Evidently, Ω_N contains 2^N elements and so 2^{Ω_N} contains 2^{2^N} elements. We show later that Ω_∞, the sample space for the experiment "tossing a coin infinitely many times", is uncountable.
A collection of events, e.g., {A₁, A₂, ...} ⊆ 2^Ω, is also called information. To understand the meaning of this terminology, suppose that the experiment has been performed and we observe that the events A₁, A₂, ... have occurred. We may then use this information to restrict the possible outcomes of the experiment. For instance, if we are told that in a 5-toss the following two events have occurred:
1. there are more heads than tails,
2. the first toss is a tail,
then we may conclude that the result of the 5-toss is one of

(T, H, H, H, H), (T, T, H, H, H), (T, H, T, H, H), (T, H, H, T, H), (T, H, H, H, T).

If in addition we are given the information that
3. the last toss is a tail,
then we conclude that the result of the 5-toss is (T, H, H, H, T).
The power set of the sample space provides the total accessible information and represents the collection of all the events that can be resolved (i.e., whose occurrence can be inferred) by knowing the outcome of the experiment. For an uncountable sample space, the total accessible information is huge and it is typically replaced by a subclass of events F ⊆ 2^Ω, which is required to form a σ-algebra.
Definition 1.1.1 A collection F ⊆ 2^Ω of subsets of Ω is called a σ-algebra (or σ-field) on Ω if
(i) Ω ∈ F;
(ii) A ∈ F ⟹ A^c := {ω ∈ Ω : ω ∉ A} ∈ F;
(iii) ∪_{k∈ℕ} A_k ∈ F, for all {A_k}_{k∈ℕ} ⊆ F.
If G is another σ-algebra on Ω and G ⊆ F, we say that G is a sub-σ-algebra of F.
Remark 1.1 (Notation). The letter A is used to denote a generic event in the σ-algebra F. If we need to consider two such events, we denote them by A, B, while N generic events are denoted by A₁, ..., A_N.
Let us comment on Definition 1.1.1. The empty set ∅ represents the "nothing happens" event, while A^c represents the "A does not occur" event. Given a finite number A₁, ..., A_N of events, their union is the event that at least one of the events A₁, ..., A_N occurs, while their intersection is the event that all events A₁, ..., A_N occur. The reason to include the countable union/intersection of events in our analysis is to make it possible to "take limits" without crossing the boundaries of the theory. Of course, unions and intersections of infinitely many sets only matter when Ω is not finite.
The smallest σ-algebra on Ω is F = {∅, Ω}, which is called the trivial σ-algebra. There is no relevant information contained in the trivial σ-algebra. The largest possible σ-algebra is F = 2^Ω, which contains the full amount of accessible information. When Ω is countable, it is common to pick F = 2^Ω as the σ-algebra of events. However, as already mentioned, when Ω is uncountable this choice is unwise. A useful procedure to construct a σ-algebra of events when Ω is uncountable is the following. First we select a collection of events (i.e., subsets of Ω) which for some reason we regard as fundamental. Let O denote this collection of events. Then we introduce the smallest σ-algebra containing O, which is formally defined as follows.
Definition 1.1.2 Let O ⊆ 2^Ω. The σ-algebra generated by O is

F_O := ∩ {F ⊆ 2^Ω : F is a σ-algebra and O ⊆ F},

i.e., F_O is the smallest σ-algebra on Ω containing O.
The intersection of any number of σ-algebras is still a σ-algebra (see Exercise 3), hence F_O is a well-defined σ-algebra. For example, let Ω = ℝ^d and let O be the collection of all open balls:

O = {B_x(R) : R > 0, x ∈ ℝ^d}, where B_x(R) = {y ∈ ℝ^d : |x − y| < R}.

The σ-algebra generated by O is called the Borel σ-algebra and denoted B(ℝ^d). The elements of B(ℝ^d) are called Borel sets.
Remark 1.2 (Notation). The Borel σ-algebra B(ℝ) plays an important role in these notes, so we shall use a specific notation for its elements. A generic event in the σ-algebra B(ℝ) will be denoted by U; if we need to consider two such events we denote them by U, V, while N generic Borel sets of ℝ will be denoted by U₁, ..., U_N. Recall that for general σ-algebras, the notation used is the one indicated in Remark 1.1.
The σ-algebra generated by O has a particularly simple form when O is a partition of Ω.
Definition 1.1.3 Let I ⊆ ℕ. A collection O = {A_k}_{k∈I} of non-empty subsets of Ω is called a partition of Ω if
(i) the events {A_k}_{k∈I} are disjoint, i.e., A_j ∩ A_k = ∅, for j ≠ k;
(ii) ∪_{k∈I} A_k = Ω.
If I is a finite set we call O a finite partition of Ω.
Note that any countable sample space Ω = {ω_k}_{k∈ℕ} is partitioned by the atomic events A_k = {ω_k}, where {ω_k} identifies the event that the result of the experiment is exactly ω_k.
The quantity P(A) is called the probability of the event A; if P(A) = 1 we say that the event A occurs almost surely, which is sometimes shortened to a.s.; if P(A) = 0 we say that A is a null set. In general, the elements of F with probability zero or one will be called trivial events (as trivial is the information that they provide). For instance, P(Ω) = 1, i.e., the probability that "something happens" is one, and P(∅) = P(Ω^c) = 1 − P(Ω) = 0, i.e., the probability that "nothing happens" is zero.
Let us see some examples of probability spaces.
There is only one probability measure defined on the trivial σ-algebra, namely P(∅) = 0 and P(Ω) = 1.
In this example we describe the general procedure to construct a probability space on a countable sample space Ω = {ω_k}_{k∈ℕ}. We pick F = 2^Ω and let 0 ≤ p_k ≤ 1, k ∈ ℕ, be real numbers such that

∑_{k≥1} p_k = 1.

We then set P({ω_k}) = p_k and, for a generic event A ∈ F, P(A) = ∑_{k : ω_k ∈ A} p_k, while P(∅) = 0.
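As a quick illustration of this construction, here is a minimal Python sketch (the weights and the event below are made-up examples, not taken from the notes) that stores the numbers p_k for a finite sample space and computes P(A) by summing the weights of the sample points in A.

```python
# A toy probability measure on a finite sample space, following the
# construction above: P(A) = sum of p_k over the sample points in A.

def make_measure(weights):
    """weights: dict mapping sample points to p_k, with sum(p_k) = 1."""
    total = sum(weights.values())
    assert abs(total - 1.0) < 1e-12, "the p_k must sum to 1"
    def P(event):
        # event: any collection of sample points (a subset of the sample space)
        return sum(weights[w] for w in event)
    return P

# Hypothetical example: a biased die.
weights = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.2, 5: 0.2, 6: 0.3}
P = make_measure(weights)
print(P({2, 4, 6}))   # probability of rolling an even number -> 0.6
print(P(set()))       # P(empty set) = 0
```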
As a special case of the previous example we now introduce a probability measure on the sample space Ω_N of the N coin tosses experiment. Given 0 < p < 1 and ω ∈ Ω_N, we define the probability of the atomic event {ω} as

P({ω}) = p^{N_H(ω)} (1 − p)^{N_T(ω)},   (1.2)

where N_H(ω) is the number of H in ω and N_T(ω) is the number of T in ω (so N_H(ω) + N_T(ω) = N). We say that the coin is fair if p = 1/2. The probability of a generic event A ∈ 2^{Ω_N} is obtained by adding up the probabilities of the atomic events whose disjoint union forms the event A. For instance, assume N = 3 and consider the event

"The first and the second toss are equal".

Denote by A the set corresponding to this event. Then clearly A is the (disjoint) union of the atomic events
{(H, H, H)}, {(H, H, T)}, {(T, T, T)}, {(T, T, H)}.

Hence,

P(A) = P({(H, H, H)}) + P({(H, H, T)}) + P({(T, T, T)}) + P({(T, T, H)})
     = p³ + p²(1 − p) + (1 − p)³ + (1 − p)²p = 2p² − 2p + 1.
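The same number can be checked by brute force. The following sketch (a made-up illustration, not part of the notes) enumerates Ω_N, assigns each atomic event the probability (1.2), and sums over the outcomes in the event "first toss equals second toss".

```python
from itertools import product

def atomic_prob(omega, p):
    """Probability (1.2) of a single N-toss omega, a tuple of 'H'/'T'."""
    n_heads = omega.count('H')
    return p**n_heads * (1 - p)**(len(omega) - n_heads)

def event_prob(N, p, condition):
    """Add up the atomic probabilities of all N-tosses satisfying `condition`."""
    return sum(atomic_prob(w, p) for w in product('HT', repeat=N) if condition(w))

p = 0.3
prob = event_prob(3, p, lambda w: w[0] == w[1])
print(prob, 2*p**2 - 2*p + 1)   # both print 0.58
```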
Let f : ℝ → [0, ∞) be a measurable function such that

∫_ℝ f(x) dx = 1.

Then

P(U) = ∫_U f(x) dx, U ∈ B(ℝ),   (1.3)

defines a probability measure on (ℝ, B(ℝ)).
Remark 1.3 (Riemann vs. Lebesgue integral). The integral in (1.3) must be understood in the Lebesgue sense, since we are integrating a general measurable function over a general Borel set. If f is a sufficiently regular (say, continuous) function, and U = (a, b) ⊂ ℝ is an interval, then the integral in (1.3) can be understood in the Riemann sense. Although this last case is sufficient for most applications in finance, all integrals in these notes should be understood in the Lebesgue sense, unless otherwise stated. Knowledge of Lebesgue integration theory is however not required for our purposes.
Figure 1.1: The Bertrand paradox. The event A is that the length of the chord pq is greater than L. (a) P(A) = 1/3; (b) P(A) = 1/4.
Whenever two probabilities are defined for the same experiment, we shall require them to be equivalent, in the following sense.

Definition 1.2.2 Given two probability spaces (Ω, F, P) and (Ω, F, P̃), the probability measures P and P̃ are said to be equivalent if P(A) = 0 ⟺ P̃(A) = 0.
Conditional probability
It might be that the occurrence of an event B makes the occurrence of another event A more or less likely. For instance, the probability of the event A = {the first two tosses of a fair coin are both heads} is 1/4; however, if we know that the first toss is a tail, then P(A) = 0, while P(A) = 1/2 if we know that the first toss is a head. This leads to the important definition of conditional probability.

Definition 1.2.3 Given two events A, B such that P(B) > 0, the conditional probability of A given B is defined as

P(A|B) := P(A ∩ B) / P(B).
the restriction of the probability space (Ω, F, P) when B has occurred.
If P(A|B) = P(A), the two events are said to be independent. The interpretation is the following: if two events A, B are independent, then the occurrence of the event B does not change the probability that A occurs. By Definition 1.2.3 we obtain the following equivalent characterisation of independent events.
Definition 1.2.4 Two events A, B are said to be independent if P(A ∩ B) = P(A)P(B). In general, the events A₁, ..., A_N (N ≥ 2) are said to be independent if, for all 1 ≤ k₁ < k₂ < ... < k_m ≤ N, we have

P(A_{k₁} ∩ ... ∩ A_{k_m}) = ∏_{j=1}^m P(A_{k_j}).

Two σ-algebras F, G are said to be independent if A and B are independent, for all A ∈ G and B ∈ F. In general, the σ-algebras F₁, ..., F_N (N ≥ 2) are said to be independent if A₁, A₂, ..., A_N are independent events, for all A₁ ∈ F₁, ..., A_N ∈ F_N.
In our applications t stands for the time variable and filtrations are associated with experiments in which "information accumulates with time". For instance, in the example given above, the more times we toss the coin, the higher is the number of events which are resolved by the experiment, i.e., the more information becomes accessible.
Finally, consider the toss ω* which is obtained by changing each single "diagonal" toss, that is to say

ω* = (γ*_m)_{m∈ℕ}, where γ*_m = H if the m-th toss of ω_m is T, and γ*_m = T if the m-th toss of ω_m is H, for all m ∈ ℕ.

It is clear that the toss ω* does not belong to the set (1.5). In fact, by construction, the first toss of ω* is different from the first toss of ω₁, the second toss of ω* is different from the second toss of ω₂, ..., the n-th toss of ω* is different from the n-th toss of ω_n, and so on, so that each toss in (1.5) is different from ω*. We conclude that the elements of Ω_∞ cannot be listed as if they comprised a countable set.
Now, let N ∈ ℕ and recall that the sample space Ω_N for the N tosses experiment is given by (1.1). For each ω = (γ₁, ..., γ_N) ∈ Ω_N we define the event A_ω ⊆ Ω_∞ by

A_ω = {ω̃ = (γ̃_n)_{n∈ℕ} ∈ Ω_∞ : γ̃_j = γ_j, j = 1, ..., N},

i.e., the event that the first N tosses in an ∞-toss equal γ₁, ..., γ_N. Define the probability of this event as the probability of the N-toss ω, that is

P₀(A_ω) = p^{N_H(ω)} (1 − p)^{N_T(ω)},

where 0 < p < 1, N_H(ω) is the number of heads in the N-toss ω and N_T(ω) = N − N_H(ω) is the number of tails in ω, see (1.2). Next consider the family of events

U_N = {A_ω}_{ω∈Ω_N} ⊂ 2^{Ω_∞}.

It is clear that U_N is, for each fixed N ∈ ℕ, a partition of Ω_∞. Hence the σ-algebra F_N = F_{U_N} is generated according to Exercise 1.4. Note that F_N contains all events of Ω_∞ that are resolved by the first N tosses. Moreover F_N ⊂ F_{N+1}, that is to say, {F_N}_{N∈ℕ} is a filtration. Since P₀ is defined for all A_ω ∈ U_N, it can be extended uniquely to the entire F_N, because each element A ∈ F_N is the disjoint union of events of U_N (see again Exercise 1.4) and therefore the probability of A can be inferred from property (ii) in the definition of probability measure, see Definition 1.1.4. But then P₀ extends uniquely to

∪_{N∈ℕ} F_N.
O (plus the empty set, of course).
Exercise 1.5 Find the partition of Ω = {1, 2, 3, 4, 5, 6} that generates the σ-algebra F₂ in Exercise 1.2.
Exercise 1.6 Prove the following properties:
1. P(A^c) = 1 − P(A);
2. P(A ∪ B) = P(A) + P(B) − P(A ∩ B);
3. if A ⊆ B, then P(A) ≤ P(B).
Exercise 1.7 (Continuity of probability measures (*)). Let {A_k}_{k∈ℕ} ⊂ F be such that A_k ⊆ A_{k+1}, for all k ∈ ℕ. Let A = ∪_k A_k. Show that

lim_{k→∞} P(A_k) = P(A).

Similarly, if {A_k}_{k∈ℕ} ⊂ F is such that A_{k+1} ⊆ A_k, for all k ∈ ℕ, and A = ∩_k A_k, show that

lim_{k→∞} P(A_k) = P(A).
Exercise 1.8 Prove that ∑_{ω∈Ω_N} P({ω}) = 1, where P({ω}) is given by (1.2).

Exercise 1.9 Given a fair coin and assuming N is odd, consider the following two events A, B ⊆ Ω_N:

A = "the number of heads is greater than the number of tails",
B = "the first toss is a head".

Use your intuition to guess whether the two events are independent.
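For readers who want to check their guess numerically, the following sketch (an illustrative aid, not part of the exercise) enumerates Ω_N for a small odd N and compares P(A ∩ B) with P(A)P(B) for a fair coin.

```python
from itertools import product

def prob(N, condition):
    # Fair coin: every N-toss has probability 2**(-N).
    outcomes = product('HT', repeat=N)
    return sum(1 for w in outcomes if condition(w)) / 2**N

N = 5
A = lambda w: w.count('H') > w.count('T')   # more heads than tails
B = lambda w: w[0] == 'H'                   # first toss is a head
pA, pB = prob(N, A), prob(N, B)
pAB = prob(N, lambda w: A(w) and B(w))
print(pA, pB, pAB, pA * pB)   # compare P(A ∩ B) with P(A)P(B)
```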
A vector-valued random variable X = (X₁, ..., X_N) : Ω → ℝ^N can be defined by simply requiring that each component X_j : Ω → ℝ is a random variable in the sense of Definition 2.1.1.

Remark 2.1 (Notation). A generic real-valued random variable will be denoted by X. If we need to consider two such random variables we will denote them by X, Y, while N real-valued random variables will be denoted by X₁, ..., X_N. Note that (X₁, ..., X_N) : Ω → ℝ^N is a vector-valued random variable. The letter Z is used for complex-valued random variables.
Remark 2.2. Equality among random variables is always understood to hold up to a null set. That is to say, X = Y always means X = Y a.s., for all random variables X, Y : Ω → ℝ.
Random variables are also called measurable functions, but we prefer to use this terminology only when Ω = ℝ and F = B(ℝ). Measurable functions will be denoted by small Latin letters (e.g., f, g, ...). If X is a random variable and Y = f(X) for some measurable function f, then Y is also a random variable. We denote by

P(X ∈ U) := P({X ∈ U})

the probability that X takes values in U ∈ B(ℝ). Moreover, given two random variables X, Y : Ω → ℝ and the Borel sets U, V, we denote

P(X ∈ U, Y ∈ V) := P({X ∈ U} ∩ {Y ∈ V}),

which is the probability that the random variable X takes values in U and Y takes values in V. The generalization to an arbitrary number of random variables is straightforward.
As the value attained by X depends on the result of the experiment, random variables carry information, i.e., upon knowing the value attained by X we know something about the outcome ω of the experiment. For instance, if X(ω) = I_{{2,4,6}}(ω), where ω is the result of throwing a die, and if we are told that X takes the value 1, then we infer immediately that the die roll is even. The information carried by a random variable X forms the σ-algebra generated by X, whose precise definition is the following.
Definition 2.1.2 Let X : Ω → ℝ be a random variable. The σ-algebra generated by X is the collection σ(X) of events given by

σ(X) = {A ∈ F : A = {X ∈ U}, for some U ∈ B(ℝ)}.

If G is another σ-algebra of subsets of Ω and σ(X) ⊆ G, we say that X is G-measurable. If Y : Ω → ℝ is another random variable and σ(Y) ⊆ σ(X), we say that Y is X-measurable.
Thus σ(X) contains all the events that are resolved by knowing the value of X. The interpretation of X being G-measurable is that the information contained in G suffices to determine the value taken by X in the experiment. Note that the σ-algebra generated by a deterministic constant consists of trivial events only.
Definition 2.1.3 The σ-algebra σ(X, Y) generated by two random variables X, Y : Ω → ℝ is the smallest σ-algebra containing σ(X) ∪ σ(Y), that is to say σ(X, Y) = F_O, where O = σ(X) ∪ σ(Y), and similarly for any number of random variables.
If Y is X-measurable then σ(X, Y) = σ(X), i.e., the random variable Y does not add any new information to the one already contained in X. Clearly, if Y = f(X) for some measurable function f, then Y is X-measurable. It can be shown that the opposite is also true: if σ(Y) ⊆ σ(X), then there exists a measurable function f such that Y = f(X). The other extreme is when X and Y carry distinct information, i.e., when σ(X) ∩ σ(Y) consists of trivial events only. This occurs in particular when the two random variables are independent.
Definition 2.1.4 Let X : Ω → ℝ be a random variable and G ⊆ F be a sub-σ-algebra. We say that X is independent of G if σ(X) and G are independent. Two random variables X, Y : Ω → ℝ are said to be independent random variables if the σ-algebras σ(X) and σ(Y) are independent. More generally, the random variables X₁, ..., X_N are independent if σ(X₁), ..., σ(X_N) are independent σ-algebras.
X = ∑_{k=1}^N a_k I_{A_k}.
Thus a simple random variable X can attain only a finite number of values, while a discrete random variable X attains countably infinitely many values. In both cases we have

P(X = x) = P(A_k) if x = a_k, and P(X = x) = 0 if x ∉ Im(X),

where Im(X) = {x ∈ ℝ : X(ω) = x, for some ω ∈ Ω} is the image of X. We remark that most references do not assume, in the definition of simple random variable, that the sets A₁, ..., A_N should be disjoint. We do so, however, because all simple random variables considered in these notes satisfy this property and because the sets A₁, ..., A_N can always be re-defined in such a way that they are disjoint, without modifying the image of the simple random variable, see Exercise 2.5. Similarly the condition that a₁, ..., a_N should be distinct can be removed from the definition of simple random variable. Let us see two examples of simple/discrete random variables that appear in financial mathematics (and in many other applications). A simple random variable X is called a binomial random variable if
• Range(X) = {0, 1, ..., N};
• there exists p ∈ (0, 1) such that P(X = k) = (N choose k) p^k (1 − p)^{N−k}, for k = 0, 1, ..., N.

For instance, if we let X be the number of heads in an N-toss, then X is binomial.
A widely used model for the evolution of stock prices in financial mathematics assumes that the price of the stock at any time is a binomial random variable (binomial asset pricing model). A discrete random variable X is called a Poisson variable if

• Range(X) = {0, 1, 2, ...};
• there exists µ > 0 such that P(X = k) = (µ^k e^{−µ})/k!, for k = 0, 1, 2, ....

We denote by P(µ) the set of all Poisson random variables with parameter µ > 0.
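As a small numerical companion (an illustrative sketch, not from the notes), the functions below evaluate the two probability mass functions just defined and check that each sums to 1 (the Poisson sum being truncated).

```python
from math import comb, exp, factorial

def binomial_pmf(k, N, p):
    """P(X = k) for a binomial random variable with parameters N, p."""
    return comb(N, k) * p**k * (1 - p)**(N - k)

def poisson_pmf(k, mu):
    """P(X = k) for a Poisson random variable with parameter mu."""
    return mu**k * exp(-mu) / factorial(k)

N, p, mu = 10, 0.3, 2.5
print(sum(binomial_pmf(k, N, p) for k in range(N + 1)))   # 1.0
print(sum(poisson_pmf(k, mu) for k in range(200)))        # ~1.0 (truncated series)
```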
The following important theorem shows that all non-negative random variables can be approximated by a sequence of simple random variables.

Theorem 2.1.7 Let X : Ω → [0, ∞) be a random variable and let n ∈ ℕ be given. For k = 0, 1, ..., n2^n − 1, consider the sets

A_{k,n} := {k/2^n ≤ X < (k + 1)/2^n},

and for k = n2^n let

A_{n2^n, n} := {X ≥ n}.

Note that {A_{k,n}}_{k=0,...,n2^n} is a partition of Ω, for all fixed n ∈ ℕ. Define the simple random variables

s_n^X(ω) = ∑_{k=0}^{n2^n} (k/2^n) I_{A_{k,n}}(ω).

Then s_n^X(ω) → X(ω) as n → ∞, for every ω ∈ Ω.
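The construction in Theorem 2.1.7 is easy to implement. The sketch below (an illustrative example with a made-up random variable) evaluates s_n^X pointwise and shows that the approximation error shrinks like 2^{−n} wherever X < n.

```python
import numpy as np

def dyadic_approximation(x, n):
    """Value of the simple random variable s_n^X when X takes the value x >= 0."""
    x = np.asarray(x, dtype=float)
    # On {k/2^n <= X < (k+1)/2^n} the approximation equals k/2^n; on {X >= n} it equals n.
    return np.minimum(np.floor(x * 2**n) / 2**n, n)

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=10)   # sample values of a non-negative X
for n in (1, 2, 4, 8):
    err = np.max(np.abs(dyadic_approximation(x, n) - x))
    print(n, err)   # the error is at most 2**(-n) once n exceeds max(x)
```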
All probability density functions considered in these notes are continuous, and therefore the integral in (2.1) can be understood in the Riemann sense. Moreover, in this case F_X is differentiable and we have

f_X = dF_X/dx.

If the integral in (2.1) is understood in the Lebesgue sense, then the density f_X can be a quite irregular function. In this case, the fundamental theorem of calculus for the Lebesgue integral entails that the distribution F_X satisfying (2.1) is absolutely continuous, and so in particular it is continuous. Conversely, if F_X is absolutely continuous, then X admits a density function. We remark that, regardless of the notion of integral being used, a simple (or discrete) random variable X cannot admit a density in the sense of Definition 2.2.2, unless it is a deterministic constant.
N
Suppose in fact that X k1 a k I A k is not a deterministic constant. Assume that
a 1 maxa 1 , , a N . Then
lim F X x PA 2 PA N 1,
xa 1
while
14
lim F X x 1 F X a 1 .
xa 1
It follows that F X x is not continuous, and so in particular it cannot be written in the
form (2.1). To define the pdf of simple random variables, let
N
X akIA , k
k1
where without loss of generality we assume that the real numbers a 1 , , a N are
distinct and the sets A 1 , , A N are disjoint (see Exercise 2.5). The distribution function
of X is
F X x PX x PX a k . 2. 2
a k x
which extends (2.1) to simple random variables. We remark that it is possible to unify the definition of pdf for continuous and discrete random variables by writing the sum (2.4) as an integral with respect to the Dirac measure, but we shall not do so.
We shall see that when a random variable X admits a density f_X, all the relevant statistical information on X can be deduced from f_X. We also remark that often one can prove the existence of the pdf f_X without being able to derive an explicit formula for it. For instance, f_X is often given as the solution of a partial differential equation, or through its (inverse) Fourier transform, which is called the characteristic function of X, see Section 3.3. Some examples of density functions, which have important applications in financial mathematics, are the following.
Examples of probability density functions
A random variable X : Ω → ℝ is said to be a normal (or normally distributed) random variable if it admits the density

f_X(x) = (1/√(2πσ²)) e^{−(x−m)²/(2σ²)},

for some m ∈ ℝ and σ > 0, which are called respectively the expectation (or mean) and the standard deviation of the normal random variable X, while σ² is called the variance of X. We denote by N(m, σ²) the set of all normal random variables with expectation m and variance σ². If m = 0 and σ² = 1, X ∈ N(0, 1) is said to be a standard normal variable. The density function of standard normal random variables is denoted by φ, while their distribution is denoted by Φ, i.e.,

φ(x) = (1/√(2π)) e^{−x²/2},   Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−y²/2} dy.
A random variable X : Ω → ℝ is said to be an exponential (or exponentially distributed) random variable if it admits the density

f_X(x) = λ e^{−λx} I_{x>0},

for some λ > 0, which is called the intensity of the exponential random variable X. We denote by Exp(λ) the set of all exponential random variables with intensity λ > 0. The distribution function of an exponential random variable X with intensity λ is given by

F_X(x) = ∫_{−∞}^x f_X(y) dy = λ ∫_0^x e^{−λy} dy = 1 − e^{−λx}, for x > 0 (and F_X(x) = 0 for x ≤ 0).
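The closed form 1 − e^{−λx} is easy to verify numerically. The sketch below (illustrative only, with arbitrary parameters) integrates the exponential density with a Riemann sum and compares the result with the formula above.

```python
import numpy as np

lam, x = 0.7, 2.0
dy = 1e-5
y = np.arange(0.0, x, dy)                  # left Riemann sum over [0, x]
numeric = np.sum(lam * np.exp(-lam * y)) * dy
exact = 1.0 - np.exp(-lam * x)
print(numeric, exact)                      # both ~0.7534
```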
A random variable X : Ω → ℝ is said to be chi-squared distributed if it admits the density

f_X(x) = (x^{δ/2 − 1} e^{−x/2}) / (2^{δ/2} Γ(δ/2)) I_{x>0},

for some δ > 0, which is called the degree of the chi-squared distributed random variable. Here Γ(t) = ∫_0^∞ z^{t−1} e^{−z} dz, t > 0, is the Gamma function. Recall the relation Γ(n) = (n − 1)! for n ∈ ℕ. We denote by χ²(δ) the set of all chi-squared distributed random variables with degree δ.
A random variable X : Ω → ℝ is said to be non-central chi-squared distributed with degree δ > 0 and non-centrality parameter β > 0 if it admits the density

f_X(x) = (1/2) e^{−(x+β)/2} (x/β)^{δ/4 − 1/2} I_{δ/2−1}(√(βx)) I_{x>0},   (2.5)

where I_ν(y) denotes the modified Bessel function of the first kind. We denote by χ²(δ, β) the set of all random variables with density (2.5). It can be shown that χ²(δ, 0) = χ²(δ).
A random variable X : Ω → ℝ is said to be Cauchy distributed if it admits the density

f_X(x) = γ / (π[(x − x₀)² + γ²]),

for x₀ ∈ ℝ and γ > 0, called respectively the location and the scale of the Cauchy pdf.

A random variable X : Ω → ℝ is said to be Lévy distributed if it admits the density

f_X(x) = √(c/(2π)) e^{−c/(2(x−x₀))} / (x − x₀)^{3/2} I_{x>x₀},

for x₀ ∈ ℝ and c > 0, called respectively the location and the scale of the Lévy pdf.
If a random variable X admits a density f_X, then for all (possibly unbounded) intervals I ⊆ ℝ we have

P(X ∈ I) = ∫_I f_X(y) dy.   (2.6)
where p₀ = P(X = 0) and H(x) is the Heaviside function, i.e., H(x) = 1 if x ≥ 0 and H(x) = 0 if x < 0. By introducing the delta distribution through the formal identity

H′(x) = δ(x),   (2.8)

we obtain, again formally, the following expression for the density function:

f_X(x) = dF_X(x)/dx = p₀ δ(x) + f̂_X(x).   (2.9)

The formal identities (2.8)-(2.9) become rigorous mathematical expressions when they are understood in the sense of distributions. We shall refer to the term p₀ δ(x) as the discrete part of the density. The function f̂_X is also called the defective density of the random variable X. Note that

∫_0^∞ f̂_X(x) dx = 1 − p₀.
The defective density is the actual pdf of X if and only if p₀ = 0.
The typical example of a financial random variable whose pdf may have a discrete part is the stock price S(t) at time t. For simple models (such as the geometric Brownian motion (2.13) defined in Section 2.4 below), the stock price is strictly positive a.s. at all finite times and the density has no discrete part. However, for more sophisticated models the stock price can reach zero with positive probability at any finite time, and so the pdf of the stock price admits the discrete part P(S(t) = 0) δ(x). Hence these models take into account the risk of default of the stock.
Moreover, if two random variables X, Y admit a joint density f_{X,Y}, then each of them admits a density (called the marginal density in this context), which is given by

f_X(x) = ∫_ℝ f_{X,Y}(x, y) dy,   f_Y(y) = ∫_ℝ f_{X,Y}(x, y) dx.

To see this we write

P(X ≤ x) = P(X ≤ x, Y ∈ ℝ) = ∫_{−∞}^x ∫_ℝ f_{X,Y}(η, ξ) dξ dη = ∫_{−∞}^x f_X(η) dη,

and similarly for the random variable Y:

P(Y ≤ y) = P(Y ≤ y, X ∈ ℝ) = ∫_{−∞}^y ∫_ℝ f_{X,Y}(η, ξ) dη dξ = ∫_{−∞}^y f_Y(ξ) dξ.

If W = g(X, Y), for some measurable function g, and I is an interval, the analogue of (2.7) in two dimensions holds, namely:

P(g(X, Y) ∈ I) = ∫∫_{{(x,y) : g(x,y)∈I}} f_{X,Y}(x, y) dx dy.
As an example of joint pdf, let m = (m₁, m₂) ∈ ℝ² and let C = (C_{ij})_{i,j=1,2} be a 2 × 2 positive definite, symmetric matrix. Two random variables X, Y : Ω → ℝ are said to be jointly normally distributed with mean m and covariance matrix C if they admit the joint density

f_{X,Y}(x, y) = (1/(2π√(det C))) exp(−(1/2)(z − m) · C⁻¹ · (z − m)^T),   (2.11)

where z = (x, y), "·" denotes the row by column product, C⁻¹ is the inverse matrix of C and v^T is the transpose of the vector v.
In the next theorem we establish a simple condition for the independence of two random variables which admit a joint density.

Theorem 2.2.4 The following holds.
(i) If two random variables X, Y admit the densities f_X, f_Y and are independent, then they admit the joint density

f_{X,Y}(x, y) = f_X(x) f_Y(y).

(ii) If two random variables X, Y admit a joint density f_{X,Y} of the form

f_{X,Y}(x, y) = u(x) v(y),

for some functions u, v : ℝ → [0, ∞), then X, Y are independent and admit the densities f_X, f_Y given by

f_X(x) = c u(x),   f_Y(y) = (1/c) v(y),

where

c = ∫_ℝ v(x) dx = (∫_ℝ u(y) dy)⁻¹.
then c̃ = 1/c. It remains to prove that X, Y are independent. This follows from

P(X ∈ U, Y ∈ V) = ∫_U ∫_V f_{X,Y}(x, y) dx dy = ∫_U u(x) dx ∫_V v(y) dy
= ∫_U c u(x) dx ∫_V (1/c) v(y) dy
= ∫_U f_X(x) dx ∫_V f_Y(y) dy
= P(X ∈ U) P(Y ∈ V), for all U, V ∈ B(ℝ).
Remark 2.3 By Theorem 2.2.4 and the result of Exercise 12, we have that two jointly normally distributed random variables are independent if and only if ρ = 0 in the formula (2.12).
The parameter t will be referred to as the time parameter, since this is what it represents in the applications in financial mathematics. Examples of stochastic processes in financial mathematics are given in the next section.

Definition 2.3.2 Two stochastic processes {X(t)}_{t≥0}, {Y(t)}_{t≥0} are said to be independent if for all m, n ∈ ℕ and 0 ≤ t₁ < t₂ < ... < t_n, 0 ≤ s₁ < s₂ < ... < s_m, the σ-algebras σ(X(t₁), ..., X(t_n)), σ(Y(s₁), ..., Y(s_m)) are independent.

Hence two stochastic processes {X(t)}_{t≥0}, {Y(t)}_{t≥0} are independent if the information obtained by "looking" at the process {X(t)}_{t≥0} up to time T is independent of the information obtained by "looking" at the process {Y(t)}_{t≥0} up to time S, for all S, T > 0. Similarly one defines the notion of several independent stochastic processes.
Remark 2.4 (Notation). If t runs over a countable set, i.e., t ∈ {t_k}_{k∈ℕ}, then a stochastic process is equivalent to a sequence of random variables X₁, X₂, ..., where X_k = X(t_k). In this case we say that the stochastic process is discrete and we denote it by {X_k}_{k∈ℕ}. An example of a discrete stochastic process is the random walk defined below.
X(t, ω) = ∑_{k≥0} X_k(ω) I_{[t_k, t_{k+1})}(t).

A typical path of a step process is depicted in Figure 2.3. Note that the paths of a step process are right-continuous, but not left-continuous. Moreover, since X_k(ω) = X(t_k, ω), we can rewrite X(t) as

X(t) = ∑_{k≥0} X(t_k) I_{[t_k, t_{k+1})}(t).
Hence F_t^X is the smallest σ-algebra containing σ(X(s)), for all 0 ≤ s ≤ t, see Definition 1.1.2. Similarly one defines the filtration {F_t^{X,Y}}_{t≥0} generated by two stochastic processes {X(t)}_{t≥0}, {Y(t)}_{t≥0}, as well as the filtration generated by any number of stochastic processes.

Definition 2.3.4 If {F_t}_{t≥0} is a filtration and F_t^X ⊆ F_t, for all t ≥ 0, we say that the stochastic process {X(t)}_{t≥0} is adapted to the filtration {F_t}_{t≥0}.
The property of {X(t)}_{t≥0} being adapted to {F_t}_{t≥0} means that the information contained in F_t suffices to determine the value attained by the random variable X(s), for all s ∈ [0, t]. Clearly, {X(t)}_{t≥0} is adapted to its own generated filtration {F_t^X}_{t≥0}. Moreover, if {X(t)}_{t≥0} is adapted to {F_t}_{t≥0} and Y(t) = f(X(t)), for some measurable function f, then {Y(t)}_{t≥0} is also adapted to {F_t}_{t≥0}.
Next we give an example of a (discrete) stochastic process. Let {X_t}_{t∈ℕ} be a sequence of independent random variables satisfying

P(X_t = 1) = 1/2,   P(X_t = −1) = 1/2,

for all t ∈ ℕ. For a concrete realization of these random variables, we may think of X_t as being defined on the sample space Ω_∞ of the ∞-coin tosses experiment (see Section 1.4). In fact, letting ω = (γ_j)_{j∈ℕ}, we may set

X_t(ω) = 1, if γ_t = H;   X_t(ω) = −1, if γ_t = T.

Hence X_t : Ω_∞ → {−1, 1} is the simple random variable X_t(ω) = I_{A_t} − I_{A_t^c}, where A_t = {ω ∈ Ω_∞ : γ_t = H}. Clearly, F_t^X is the collection of all the events that are resolved by the first t tosses, which is given as indicated at the beginning of Section 1.3.
Definition 2.3.5 The stochastic process {M_t}_{t∈ℕ} given by

M₀ = 0,   M_t = ∑_{k=1}^t X_k,

is called the symmetric random walk.
To understand the meaning of the term "random walk", consider a particle moving on the real line in the following way: if X_t = 1 (i.e., if toss number t is a head), at time t the particle moves one unit of length to the right; if X_t = −1 (i.e., if toss number t is a tail) it moves one unit of length to the left. Then M_t gives the total number of units of length that the particle has travelled to the right or to the left up to time t.
The increments of the random walk are defined as follows. If k₁, ..., k_N ∈ ℕ are such that 1 ≤ k₁ < k₂ < ... < k_N, we set

Δ₁ = M_{k₁} − M₀ = M_{k₁},   Δ₂ = M_{k₂} − M_{k₁},   ...,   Δ_N = M_{k_N} − M_{k_{N−1}}.

Hence Δ_j is the total displacement of the particle from time k_{j−1} to time k_j.
Theorem 2.3.6 The increments Δ₁, ..., Δ_N of the random walk are independent random variables.

Proof. Since

Δ₁ = X₁ + ··· + X_{k₁} = g₁(X₁, ..., X_{k₁}),
Δ₂ = X_{k₁+1} + ··· + X_{k₂} = g₂(X_{k₁+1}, ..., X_{k₂}),
...
Δ_N = X_{k_{N−1}+1} + ··· + X_{k_N} = g_N(X_{k_{N−1}+1}, ..., X_{k_N}),

the increments are measurable functions g₁, ..., g_N of disjoint groups of the independent random variables X₁, X₂, ..., and such functions of disjoint groups of independent random variables are themselves independent. Hence Δ₁, ..., Δ_N are independent.
The interpretation of this result is that the particle has no memory of past movements:
the distance travelled by the particle in a given interval of time is not affected by the
motion
of the particle at earlier times.
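The random walk and its increments are straightforward to simulate. The sketch below (illustrative only) generates paths of the symmetric random walk and estimates the sample correlation between two increments over disjoint time windows, which should be close to zero.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_walk_paths(n_paths, n_steps):
    """Each row is one path (M_1, ..., M_n) of the symmetric random walk."""
    steps = rng.choice([-1, 1], size=(n_paths, n_steps))
    return np.cumsum(steps, axis=1)

M = random_walk_paths(100_000, 20)
inc1 = M[:, 9]              # M_10 - M_0
inc2 = M[:, 19] - M[:, 9]   # M_20 - M_10
print(np.corrcoef(inc1, inc2)[0, 1])   # close to 0, as expected for independent increments
```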
It can be shown that Brownian motions exist. In particular, it can be shown that the sequence of stochastic processes {W_n(t)}_{t≥0}, n ∈ ℕ, defined by

W_n(t) = (1/√n) M_{⌊nt⌋},   (2.12)

where M_t is the symmetric random walk and ⌊z⌋ denotes the integer part of z, converges to a Brownian motion. Therefore one may think of a Brownian motion as a time-continuum version of a symmetric random walk which runs for an infinite number of "infinitesimal time steps". In fact, provided the number of time steps is sufficiently large, the process {W_n(t)}_{t≥0} gives a very good approximation of a Brownian motion, which is useful for numerical computations. Notice that there exist many Brownian motions and each of them may have some specific properties besides those listed in Definition 2.3.7. However, as long as we use only the properties (i)-(iii), we do not need to work with a specific example of Brownian motion.
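The scaling (2.12) can be checked directly. The following sketch (illustrative, with arbitrary parameters) samples W_n(t) = M(⌊nt⌋)/√n and verifies that the sample variance of W_n(1) is close to 1, as it should be for a Brownian motion at time 1.

```python
import numpy as np

rng = np.random.default_rng(2)

def scaled_walk_at(t, n, n_paths):
    """Samples of W_n(t) = M(floor(n t)) / sqrt(n) for the symmetric random walk M."""
    k = int(np.floor(n * t))
    steps = rng.choice([-1, 1], size=(n_paths, k))
    return steps.sum(axis=1) / np.sqrt(n)

samples = scaled_walk_at(t=1.0, n=1000, n_paths=50_000)
print(samples.mean(), samples.var())   # mean ~0, variance ~1 = Var(W(1))
```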
Once a Brownian motion is introduced, it is natural to require that the filtration {F_t}_{t≥0} should be somehow related to it. For our future financial applications, the following class of filtrations will play a fundamental role.

Definition 2.3.8 Let {W(t)}_{t≥0} be a Brownian motion and denote by σ(W₊(t)) the σ-algebra generated by the increments {W(s) − W(t); s ≥ t}, that is

σ(W₊(t)) = F_{O_t},   O_t = ∪_{s≥t} σ(W(s) − W(t)).

A filtration {F_t}_{t≥0} is said to be a non-anticipating filtration for the Brownian motion {W(t)}_{t≥0} if {W(t)}_{t≥0} is adapted to {F_t}_{t≥0} and if the σ-algebras σ(W₊(t)), F_t are independent, for all t ≥ 0.

The meaning is the following: the increments of the Brownian motion after time t are independent of the information available at time t in the σ-algebra F_t. It is clear from the previous definition that {F_t^W}_{t≥0} is a non-anticipating filtration for {W(t)}_{t≥0}. We shall see later that many properties of Brownian motions that depend on {F_t^W}_{t≥0} also hold with respect to any non-anticipating filtration (e.g., the martingale property).
Another important example of a stochastic process applied in financial mathematics is the following.

Definition 2.3.9 A Poisson process with rate λ > 0 is a stochastic process {N(t)}_{t≥0} such that
(i) N(0) = 0 a.s.;
(ii) the increments over disjoint time intervals are independent;
(iii) for all s < t, the increment N(t) − N(s) belongs to P(λ(t − s)).

Note in particular that N(t) is a discrete random variable, for all t > 0, and that, in contrast to the Brownian motion, the paths of a Poisson process are not continuous. The Poisson process is the building block to construct more general stochastic processes with jumps, which are very popular nowadays as models for the price of certain financial assets.
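One standard way to simulate a Poisson process is to count exponential inter-arrival times; the sketch below (illustrative only, relying on the usual equivalence between Definition 2.3.9 and Exp(λ) waiting times between jumps) samples N(t) and checks that its mean and variance are both close to λt.

```python
import numpy as np

rng = np.random.default_rng(3)

def poisson_process_value(t, lam, n_paths):
    """Samples of N(t) obtained by counting Exp(lam) inter-arrival times up to t."""
    counts = np.zeros(n_paths, dtype=int)
    for i in range(n_paths):
        arrival, n = rng.exponential(1 / lam), 0
        while arrival <= t:
            n += 1
            arrival += rng.exponential(1 / lam)
        counts[i] = n
    return counts

samples = poisson_process_value(t=2.0, lam=1.5, n_paths=20_000)
print(samples.mean(), samples.var())   # both ~ lam * t = 3.0
```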
If X(t) is a stochastic process modelling a financial variable, then X(0) is a deterministic constant.
The most popular model for the price of a stock is the geometric Brownian motion stochastic process, which is given by

S(t) = S(0) exp(αt + σW(t)).   (2.13)

Here {W(t)}_{t≥0} is a Brownian motion, α ∈ ℝ is the instantaneous mean of log-return, σ > 0 is the instantaneous volatility, while σ² is the instantaneous variance of the stock. Note that α and σ are constant in this model. Moreover, S(0) is the price at time t = 0 of the stock, which, according to Remark 2.5, is a deterministic constant.
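A sketch of how one could simulate paths of (2.13) on a uniform time grid (the parameters below are arbitrary), using the fact that Brownian increments over a step dt are independent N(0, dt) random variables:

```python
import numpy as np

rng = np.random.default_rng(4)

def gbm_paths(S0, alpha, sigma, T, n_steps, n_paths):
    """Paths of S(t) = S0 * exp(alpha*t + sigma*W(t)) on a uniform grid of [0, T]."""
    dt = T / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    W = np.cumsum(dW, axis=1)
    t = np.linspace(dt, T, n_steps)
    return S0 * np.exp(alpha * t + sigma * W)

S = gbm_paths(S0=100.0, alpha=0.05, sigma=0.2, T=1.0, n_steps=252, n_paths=5)
print(S[:, -1])   # simulated prices S(T) for five paths
```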
Risk-free assets
A money market is a market in which the object of trading is money. More precisely, a money market is a type of financial market where investors can borrow and lend money at a given interest rate and for a period of time T ≤ 1 year. Assets in the money market (i.e., short-term loans) are assumed to be risk-free, which means that their value is always increasing in time. Examples of risk-free assets in the money market are repurchase agreements (repos), certificates of deposit, treasury bills, etc.
The stochastic process corresponding to the price per share of a generic risk-free asset will be denoted by {B(t)}_{t∈[0,T]}. The instantaneous interest rate of a risk-free asset is a stochastic process {R(t)}_{t∈[0,T]} such that R(t) > 0, for all t ∈ [0, T], and such that the value of the asset at time t is given by

B(t) = B(0) exp(∫_0^t R(s) ds),   t ∈ [0, T].   (2.14)

This corresponds to the investor's debit/credit with the money market at time t if the amount B(0) is borrowed/lent by the investor at time t = 0. An investor lending (resp. borrowing) money has a long (resp. short) position on the risk-free asset (more precisely, on its interest rate). We remark that the integral on the right-hand side of (2.14) is to be evaluated path by path, i.e.,

B(t, ω) = B(0) exp(∫_0^t R(s, ω) ds),
for all fixed ω ∈ Ω. Although in the real world different risk-free assets have different interest rates, throughout these notes we make the simplifying assumption that all assets in the money market have the same instantaneous interest rate {R(t)}_{t∈[0,T]}, which we call the interest rate of the money market. For the applications in options pricing theory it is common to assume that the interest rate of the money market is a deterministic constant: R(t) = r, for all t ∈ [0, T]. This assumption can be justified by the relatively short time to maturity of options, see below.
Remark 2.6 The (average) interest rate of the money market is sometimes referred to as "the cost of money", and the ratio B(t)/B(0) is said to express the "time-value of money". This terminology is meant to emphasise that one reason for the "time-devaluation" of money (in the sense that the purchasing power of money decreases with time) is precisely the fact that money can earn interest through the purchase of risk-free assets.
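A sketch of how (2.14) could be evaluated path by path (the rate path below is a made-up deterministic example; for constant R(t) = r the result reduces to B(0)e^{rt}):

```python
import numpy as np

def money_market_value(B0, rate_path, dt):
    """B(t) = B0 * exp(integral of R(s) ds), the integral approximated by a Riemann sum."""
    integral = np.cumsum(rate_path) * dt
    return B0 * np.exp(integral)

T, n_steps = 1.0, 1000
dt = T / n_steps
t = np.linspace(dt, T, n_steps)
rate_path = 0.03 + 0.01 * t           # hypothetical time-dependent rate R(t)
B = money_market_value(100.0, rate_path, dt)
print(B[-1])                          # ~100 * exp(0.035) = 103.56
```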
Financial derivative
A financial derivative (or derivative security) is a contract whose value depends on
the performance of one (or more) other asset(s), which is called the underlying asset.
There exist various types of financial derivatives, the most common being options,
futures, forwards and swaps. Financial derivatives can be traded over the counter (OTC), or in a regulated market. In the former case, the contract is stipulated between two individual investors, who agree upon the conditions and the price of the contract. In particular, the same derivative (on the same asset, with the same parameters) can have two different prices over the counter. Derivatives traded in the market, on the contrary, are standardized contracts. Anyone, after a proper authorisation, can make offers to buy or sell derivatives in the market, in a way much similar to how stocks are traded. Let us see some examples of financial derivatives.
A call option is a contract between two parties, the buyer (or owner) of the call and the seller (or writer) of the call. The contract gives to the buyer the right, but not the obligation, to buy the underlying asset at some future time for a price agreed upon today, which is called the strike price of the call. If the buyer can exercise this option only at some given time t = T > 0 (where t = 0 corresponds to the time at which the contract is stipulated), then the call option is called European, while if the option can be exercised at any time in the interval (0, T], then the option is called American. The time T > 0 is called the maturity time, or expiration date, of the call. The seller of the call is obliged to sell the asset to the buyer if the latter decides to exercise the option. If the option to buy in the definition of a call is replaced by the option to sell, then the option is called a put option.
In exchange for the option, the buyer must pay a premium to the seller. Suppose that the option is a European option with strike price K, maturity time T and premium Π(0) on a stock with price S(t) at time t. In which case is it then convenient for the buyer to exercise the call? Let us define the pay-off of a European call as

Y = (S(T) − K)₊ := max(0, S(T) − K),

i.e., Y > 0 if the stock price at the expiration date is higher than the strike price of the call and it is zero otherwise; similarly, for a European put we set

Y = (K − S(T))₊.

Note that Y is a random variable, because it depends on the random variable S(T). Clearly, if Y > 0 it is more convenient for the buyer to exercise the option rather than buying/selling the asset on the market. Note however that the real profit for the buyer is given by N(Y − Π(0)), where N is the number of option contracts owned by the buyer.
Typically, options are sold in lots of 100 shares, that is to say, the minimum number of options that one can buy is 100, covering 100 shares of the underlying asset. One reason why investors buy calls in the market is to protect a short position on the underlying asset. In fact, suppose that an investor short-sells 100 shares of a stock at time t = 0 with the agreement to return them to the original owner at time t₀ > 0. The investor believes that the price of the stock will go down in the future, but of course the price may go up instead. To avoid possible large losses, at time t = 0 the investor buys 100 shares of an American call option on the stock expiring at T ≥ t₀, and with strike price K = S(0). If the price of the stock at time t₀ has not fallen below S(0) as the investor expected, then the investor will exercise the call, i.e., will buy 100 shares of the stock at the price K = S(0). In this way the investor can return the shares to the lender with minimal losses. In the same fashion, investors buy put options to protect a long position on the underlying asset. The reason why investors write options is mostly to get liquidity (cash) to invest in other assets.
Let us introduce some further terminology. A European call (resp. put) is said to be in the money at time t if S(t) > K (resp. S(t) < K). The call (resp. put) is said to be out of the money if S(t) < K (resp. S(t) > K). If S(t) = K, the (call or put) option is said to be at the money at time t. The meaning of this terminology is self-explanatory.
The premium that the buyer has to pay to the seller for the option is the price (or value) of the option. It depends on time (in particular, on the time left to expiration). Clearly, the deeper in the money the option is, the higher its price will be. Therefore the holder of the long position on the option is the buyer, while the seller holds the short position on the option.
European call and put options are examples of more general contracts called European derivatives. Given a function g : [0, ∞) → ℝ, a standard European derivative with pay-off Y = g(S(T)) and maturity time T > 0 is a contract that pays to its owner the amount Y at time T. Here S(T) is the price of the underlying asset (which we take to be a stock) at time T. The function g is called the pay-off function of the derivative. The term "European" refers to the fact that the contract cannot be exercised before time T, while the term "standard" refers to the fact that the pay-off depends only on the price of the underlying at time T. The pay-off of a non-standard European derivative depends on the path of the asset price during the interval [0, T]. For example, the pay-off of an Asian call is given by Y = (∫_0^T S(t) dt − K)₊.
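The pay-offs just described translate directly into code. The sketch below (illustrative only; the Asian pay-off uses a Riemann-sum approximation of the integral on a discrete price path) evaluates the European call, European put and Asian call pay-offs.

```python
import numpy as np

def european_call_payoff(S_T, K):
    return max(S_T - K, 0.0)          # (S(T) - K)_+

def european_put_payoff(S_T, K):
    return max(K - S_T, 0.0)          # (K - S(T))_+

def asian_call_payoff(S_path, dt, K):
    """(integral_0^T S(t) dt - K)_+ with the integral approximated by a Riemann sum."""
    return max(np.sum(S_path) * dt - K, 0.0)

K = 100.0
print(european_call_payoff(112.0, K))            # 12.0
print(european_put_payoff(112.0, K))             # 0.0
S_path = np.full(252, 105.0)                     # hypothetical flat price path over T = 1
print(asian_call_payoff(S_path, dt=1/252, K=K))  # ~5.0
```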
The price at time t of a European derivative (standard or not) with pay-off Y and expiration date T will be denoted by Π_Y(t). Hence {Π_Y(t)}_{t∈[0,T]} is a stochastic process. In addition, we now show that Π_Y(T) = Y must hold, i.e., there exist no offers to buy (sell) a derivative for less (more) than Y at the time of maturity. In fact, suppose that a derivative is sold for Π_Y(T) < Y "just before" it expires at time T. In this way the buyer would make the sure profit Y − Π_Y(T) at time T, which means that the seller would lose the same amount. On the contrary, upon buying a derivative "just before" maturity for more than Y, the buyer would lose Π_Y(T) − Y. Thus in a rational market, Π_Y(T) = Y (or, more precisely, Π_Y(t) → Y, as t → T⁻).
Portfolio
The portfolio of an investor is the set of all assets in which the investor is trading. Mathematically it is described by a collection of N stochastic processes

{h₁(t)}_{t≥0}, {h₂(t)}_{t≥0}, ..., {h_N(t)}_{t≥0},

where h_k(t) represents the number of shares of the asset k held at time t in the investor's portfolio. If h_k(t) is positive, resp. negative, the investor has a long, resp. short, position on the asset k at time t. If Π_k(t) denotes the value of the asset k at time t, then {Π_k(t)}_{t≥0} is a stochastic process; the portfolio value is the stochastic process {V(t)}_{t≥0} given by

V(t) = ∑_{k=1}^N h_k(t) Π_k(t).

The investor makes a profit in the time interval [t₀, t₁] if V(t₁) > V(t₀); the investor incurs a loss in the interval [t₀, t₁] if V(t₁) < V(t₀). We now introduce the important definition of an arbitrage portfolio.
Definition 2.4.1 An arbitrage portfolio is a portfolio whose value {V(t)}_{t≥0} satisfies the following properties, for some T > 0:
(i) V(0) = 0 almost surely;
(ii) V(T) ≥ 0 almost surely;
(iii) P(V(T) > 0) > 0.
Hence an arbitrage portfolio is a risk-free investment in the interval [0, T] which requires no initial wealth and has a positive probability of giving a profit. We remark that the arbitrage property depends on the probability measure P. However, it is clear that if two measures P and P̃ are equivalent, then the arbitrage property is satisfied with respect to P if and only if it is satisfied with respect to P̃. The guiding principle when devising theoretical models for asset prices in financial mathematics is to ensure that one cannot set up an arbitrage portfolio by investing in these assets (arbitrage-free principle).
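As a small illustration (a made-up two-asset example, not from the notes), the snippet below computes the portfolio value V(t) = ∑_k h_k(t)Π_k(t) on a time grid from given position and price paths.

```python
import numpy as np

def portfolio_value(positions, prices):
    """V(t) = sum_k h_k(t) * Pi_k(t); positions and prices have shape (n_assets, n_times)."""
    return np.sum(positions * prices, axis=0)

# Hypothetical paths: 2 assets observed at 4 times.
positions = np.array([[ 10.0,  10.0,   5.0,   5.0],     # long position on a stock
                      [-20.0, -20.0, -10.0, -10.0]])    # short position on the risk-free asset
prices = np.array([[100.0, 102.0, 101.0, 104.0],
                   [  1.0,   1.001, 1.002, 1.003]])
print(portfolio_value(positions, prices))   # V(t) at the four observation times
```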
Markets
A market in which the objects of trading are N risky assets (e.g., stocks) and M risk-free assets in the money market is said to be "N + M dimensional". Most of these notes focus on the case of 1+1 dimensional markets, in which we assume that the risky asset is a stock. A portfolio invested in this market is a pair {(h_S(t), h_B(t))}_{t≥0} of stochastic processes, where h_S(t) is the number of shares of the stock and h_B(t) the number of shares of the risk-free asset in the portfolio at time t. The value of such a portfolio is given by

V(t) = h_S(t)S(t) + h_B(t)B(t),

where S(t) is the price of the stock (given for instance by (2.13)), while B(t) is the value at time t of the risk-free asset, which is given by (2.14).
Exercises
1. Prove that σ(X) is a σ-algebra.
2. Show that when X, Y are independent random variables, then σ(X) ∩ σ(Y) consists of trivial events only. Show that two deterministic constants are always independent. Finally, assume Y = g(X) and show that in this case the two random variables are independent if and only if Y is a deterministic constant.
3. Prove Theorem 2.1.5 for the case N = 2.
4. Let a random variable X have the form

X = ∑_{k=1}^M b_k I_{B_k},

where the sets B₁, ..., B_M are not necessarily disjoint and the numbers b₁, ..., b_M are not necessarily distinct. Show that X can be rewritten in the form

X = ∑_{k=1}^N a_k I_{A_k},

with a₁, ..., a_N distinct and A₁, ..., A_N disjoint.
5. Show that

P(a < X ≤ b) = F_X(b) − F_X(a).

Show also that F_X is (1) right-continuous, (2) increasing and (3) lim_{x→+∞} F_X(x) = 1.
6. Let X ∈ N(0, 1) and Y = X². Show that Y ∈ χ²(1).
7. Let X ∈ N(0, 1) and Y ∈ Exp(1) be independent. Compute P(X < Y).
8. Derive the density of the geometric Brownian motion (2.13) and use the result to show that P(S(t) = 0) = 0, i.e., a stock whose price is described by a geometric Brownian motion cannot default.
3. Expectation
Throughout this chapter we assume that (Ω, F, {F(t)}_{t≥0}, P) is a given filtered probability space.
3.1 Expectation and variance of random variables
Suppose that we want to estimate the value of a random variable X before the experiment has been performed. What is a reasonable definition for our "estimate" of X? Let us first assume that X is a simple random variable of the form

X = ∑_{k=1}^N a_k I_{A_k},

for some finite partition {A_k}_{k=1,...,N} of Ω and real distinct numbers a₁, ..., a_N. In this case, it is natural to define the expected value (or expectation) of X as

E[X] = ∑_{k=1}^N a_k P(A_k) = ∑_{k=1}^N a_k P(X = a_k).

That is to say, E[X] is a weighted average of all the possible values attainable by X, in which each value is weighted by its probability of occurrence. This definition applies also for N = ∞ (i.e., for discrete random variables), provided of course the infinite series converges. For instance, if X ∈ P(µ) we have

E[X] = ∑_{k≥0} k P(X = k) = ∑_{k≥0} k (µ^k/k!) e^{−µ} = e^{−µ} ∑_{k≥1} µ^k/(k − 1)! = µ e^{−µ} ∑_{r≥0} µ^r/r! = µ e^{−µ} e^µ = µ.
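A truncated version of this series is easy to check numerically (an illustrative sketch, not part of the notes):

```python
from math import exp, factorial

def poisson_mean(mu, n_terms=200):
    """Truncated series sum_k k * P(X = k) for X Poisson with parameter mu."""
    return sum(k * mu**k * exp(-mu) / factorial(k) for k in range(n_terms))

print(poisson_mean(2.5))   # ~2.5, as computed above
```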
Now let X be a non-negative random variable and consider the sequence {s_n^X}_{n∈ℕ} of simple functions defined in Theorem 2.1.7. Recall that s_n^X converges pointwise to X as n → ∞, i.e., s_n^X(ω) → X(ω), for all ω ∈ Ω. Since

E[s_n^X] = ∑_{k=0}^{n2^n − 1} (k/2^n) P(k/2^n ≤ X < (k + 1)/2^n) + n P(X ≥ n)   (3.1)

is well defined for every n ∈ ℕ, it is natural to define the expectation of X as

E[X] := lim_{n→∞} E[s_n^X],   (3.2)

where s₁^X, s₂^X, ... is the sequence of simple functions converging pointwise to X.
We remark that the limit in (3.2) exists, because (3.1) is an increasing sequence, although this limit could be infinity. When the limit is finite we say that X has finite expectation. This happens for instance when X is bounded, i.e., 0 ≤ X ≤ C a.s., for some positive constant C.
Remark 3.1 (Monotone convergence theorem). It can be shown that the limit (3.2) is the same along any non-decreasing sequence of non-negative random variables that converges pointwise to X, hence we can use any such sequence to define the expectation of a non-negative random variable. This follows from the monotone convergence theorem, whose precise statement is the following: if X₁, X₂, ... is a non-decreasing sequence of non-negative random variables such that X_n → X pointwise a.s., then E[X_n] → E[X].

Remark 3.2 (Dominated convergence theorem). The sequence of simple random variables used to define the expectation of a non-negative random variable need not be non-decreasing either. This follows from the dominated convergence theorem, whose precise statement is the following: if X₁, X₂, ... is a sequence of non-negative random variables such that X_n → X, as n → ∞, pointwise a.s., and sup_n X_n ≤ Y for some non-negative random variable Y with finite expectation, then lim_{n→∞} E[X_n] = E[X].
Next we extend the definition of expectation to general random variables. For this purpose we use that every random variable X : Ω → ℝ can be written as

X = X₊ − X₋,

where

X₊ = max(0, X),   X₋ = −min(X, 0)

are respectively the positive and negative part of X. Since X₊, X₋ are non-negative random variables, their expectation is given as in Definition 3.1.1.

Definition 3.1.2 Let X : Ω → ℝ be a random variable and assume that at least one of the random variables X₊, X₋ has finite expectation. Then we define the expectation of X as

E[X] = E[X₊] − E[X₋].

If X₊, X₋ both have finite expectation, we say that X has finite expectation or that it is an integrable random variable. The set of all integrable random variables on Ω will be denoted by L¹(Ω), or by L¹(Ω, P) if we want to specify the probability measure.
Remark 3.3 (Notation). Of course the expectation of a random variable depends on the probability measure. If another probability measure P̃ is defined on the σ-algebra F of events (not necessarily equivalent to P), we denote the expectation of X with respect to P̃ by Ẽ[X].

Remark 3.4 (Expectation = Lebesgue integral). The expectation of a random variable X with respect to the probability measure P is also called the Lebesgue integral of X over Ω with respect to the measure P, and it is also denoted by

E[X] = ∫_Ω X(ω) dP(ω).

We shall not use this notation.
E[X + Y] = E[(X₊ − X₋) + (Y₊ − Y₋)]
= E[(X₊ + Y₊) − (X₋ + Y₋)]
= E[X₊ + Y₊] − E[X₋ + Y₋]
= E[X₊] − E[X₋] + E[Y₊] − E[Y₋]
= E[X] + E[Y].

Hence, it suffices to prove the claim for non-negative random variables. Next assume that X, Y are simple functions and write

X = ∑_{j=1}^N a_j I_{A_j},   Y = ∑_{k=1}^M b_k I_{B_k},

where {A_j}_{j=1,...,N} and {B_k}_{k=1,...,M} are finite partitions of Ω. Since {A_j ∩ B_k}_{j,k} is again a finite partition of Ω and X + Y = ∑_{j,k} (a_j + b_k) I_{A_j ∩ B_k}, we have

E[X + Y] = ∑_{j,k} (a_j + b_k) P(A_j ∩ B_k) = ∑_j a_j P(A_j) + ∑_k b_k P(B_k) = E[X] + E[Y].
(ii) Let X₊ = f(X), X₋ = g(X), and similarly for Y, where f(s) = max(0, s), g(s) = −min(0, s). Using X = X₊ − X₋, Y = Y₊ − Y₋, and noting that X ≤ Y implies X₊ ≤ Y₊ and X₋ ≥ Y₋, we have

E[X] = E[X₊ − X₋]
= E[X₊] − E[X₋]
≤ E[Y₊] − E[Y₋]
= E[Y₊ − Y₋]
= E[Y].

Hence, it suffices to prove the claim for non-negative random variables. Next assume that X, Y are simple functions and write

X = ∑_{j=1}^N a_j I_{A_j},   Y = ∑_{k=1}^M b_k I_{B_k}.

Since X ≤ Y, we have a_j ≤ b_k whenever A_j ∩ B_k ≠ ∅, and therefore

E[X] = ∑_{j,k} a_j P(A_j ∩ B_k) ≤ ∑_{j,k} b_k P(A_j ∩ B_k) = E[Y].
(iii) Since X ≥ 0 a.s., we have X = X₊ and X₋ = 0 a.s., so it suffices to prove the claim for non-negative random variables. Assume first that X is a simple function and write

X = ∑_{j=1}^N a_j I_{A_j}, with a_j ≥ 0.

Then

E[X] = 0 ⟹ ∑_{j=1}^N a_j P(A_j) = 0 ⟹ for every j, either a_j = 0 or P(A_j) = 0 ⟹ X = 0 a.s.

For a general non-negative random variable X with E[X] = 0, it follows that

E[s_n^X] = 0, for all n ∈ ℕ,

because the sequence {E[s_n^X]}_{n∈ℕ} is non-decreasing with limit E[X] = 0. Hence s_n^X = 0 a.s. for all n, and letting n → ∞ we obtain X = 0 a.s.
(iv) Let X₊ = f(X), X₋ = g(X), and similarly for Y, where f(s) = max(0, s), g(s) = −min(0, s). Each of (X₊, Y₊), (X₊, Y₋), (X₋, Y₊) and (X₋, Y₋) is a pair of independent (non-negative) random variables. Then, using X = X₊ − X₋, Y = Y₊ − Y₋ and the linearity of the expectation, we find

E[XY] = E[(X₊ − X₋)(Y₊ − Y₋)]
= E[X₊Y₊] − E[X₊Y₋] − E[X₋Y₊] + E[X₋Y₋]
= E[X₊]E[Y₊] − E[X₊]E[Y₋] − E[X₋]E[Y₊] + E[X₋]E[Y₋]
= (E[X₊] − E[X₋])(E[Y₊] − E[Y₋]) = E[X]E[Y].

Hence it suffices to prove the claim for non-negative random variables. Next assume that X, Y are independent simple functions and write

X = ∑_{j=1}^N a_j I_{A_j},   Y = ∑_{k=1}^M b_k I_{B_k}.

We have

XY = ∑_{j=1}^N ∑_{k=1}^M a_j b_k I_{A_j} I_{B_k} = ∑_{j=1}^N ∑_{k=1}^M a_j b_k I_{A_j ∩ B_k}.

Thus by linearity of the expectation, and since the events A_j, B_k are independent, for all j, k, we have

E[XY] = ∑_{j,k} a_j b_k E[I_{A_j ∩ B_k}] = ∑_{j,k} a_j b_k P(A_j ∩ B_k)
= ∑_{j,k} a_j b_k P(A_j) P(B_k) = (∑_j a_j P(A_j))(∑_k b_k P(B_k)) = E[X]E[Y].
Letting Y = 1 in the Schwarz inequality

E[XY] ≤ √(E[X²] E[Y²]),   (3.3)

with X, Y ∈ L²(Ω), we find

L²(Ω) ⊂ L¹(Ω).

The covariance Cov(X, Y) of two random variables X, Y ∈ L²(Ω) is defined as

Cov(X, Y) = E[XY] − E[X]E[Y].

Two random variables are said to be uncorrelated if Cov(X, Y) = 0. By Theorem 3.1.3 (iv), if X, Y are independent then they are uncorrelated, but the converse is not true in general. Consider for example the simple random variables

X = −1 with probability 1/3,   X = 0 with probability 1/3,   X = 1 with probability 1/3,

and

Y = X², so that Y = 0 with probability 1/3 and Y = 1 with probability 2/3.

Then X and Y are clearly not independent, but

Cov(X, Y) = E[XY] − E[X]E[Y] = E[X³] − 0 = 0,

since E[X³] = E[X] = 0.
Definition 3.1.4 The variance of a random variable X ∈ L²(Ω) is given by

Var(X) = E[(X − E[X])²].

Using the linearity of the expectation we can rewrite the definition of variance as

Var(X) = E[X²] − 2E[X E[X]] + E[X]² = E[X²] − E[X]² = Cov(X, X).

Note that a random variable has zero variance if and only if X = E[X] a.s.; hence we may view Var(X) as a measure of the "randomness" of X. By way of example, let us compute the variance of X ∈ P(µ). We have

E[X²] = ∑_{k≥0} k² P(X = k) = ∑_{k≥0} k² (µ^k/k!) e^{−µ} = e^{−µ} ∑_{k≥1} k µ^k/(k − 1)!
= e^{−µ} ∑_{r≥0} (r + 1) µ^{r+1}/r! = µ ∑_{r≥0} r (µ^r/r!) e^{−µ} + µ e^{−µ} ∑_{r≥0} µ^r/r! = µ ∑_{r≥0} r P(X = r) + µ = µ E[X] + µ = µ² + µ.

Hence

Var(X) = E[X²] − E[X]² = µ + µ² − µ² = µ.
Var(X + Y) = Var(X) + Var(Y) holds if and only if X, Y are uncorrelated. Moreover, if we define the correlation of X, Y as

Cor(X, Y) = Cov(X, Y) / √(Var(X)Var(Y)),

then Cor(X, Y) ∈ [−1, 1] and |Cor(X, Y)| = 1 if and only if Y is a linear function of X. The interpretation is the following: the closer Cor(X, Y) is to 1 (resp. −1), the more the variables X and Y have a tendency to move in the same (resp. opposite) direction (for instance, Cor(X, 2X) = 1, Cor(X, −2X) = −1). An important problem in quantitative finance is to find correlations between the prices of different assets.
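In practice correlations are estimated from samples. The sketch below (an illustrative example with simulated data) estimates Cov(X, Y) and Cor(X, Y) and recovers the signs discussed above.

```python
import numpy as np

rng = np.random.default_rng(5)

def correlation(a, b):
    cov = np.mean(a * b) - np.mean(a) * np.mean(b)
    return cov / np.sqrt(np.var(a) * np.var(b))

x = rng.normal(size=100_000)
y_linear = 2 * x + 1                              # exact linear function of x
y_noisy = -0.5 * x + rng.normal(size=x.size)      # negatively related, with noise

print(correlation(x, y_linear))   # ~1.0
print(correlation(x, y_noisy))    # negative, strictly between -1 and 0
```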
Remark 3.5 (L²-norm). The norm ‖Z‖₂ = √(E[Z²]) in L²(Ω) is called the L² norm. It can be shown that it is a complete norm, i.e., if {X_n}_{n∈ℕ} ⊂ L²(Ω) is a Cauchy sequence of random variables in the norm ‖·‖₂, then there exists a random variable X ∈ L²(Ω) such that ‖X_n − X‖₂ → 0 as n → ∞.
Next we want to present a first application in finance of the theory outlined above. In particular, we establish a sufficient condition which ensures that a portfolio is not an arbitrage.

Theorem 3.1.5 Let a portfolio be given with value {V(t)}_{t≥0} and let V*(t) = D(t)V(t) be the discounted portfolio value. If there exists a measure P̃ equivalent to P such that Ẽ[V*(t)] is constant (independent of t), then the portfolio is not an arbitrage.
Proof. Assume that the portfolio is an arbitrage. Then V(0) = 0 almost surely; as V*(0) = V(0), the assumption of constant expectation in the probability measure P̃ gives

Ẽ[V*(t)] = 0, for all t ≥ 0.   (3.4)

Let T > 0 be such that P(V(T) ≥ 0) = 1 and P(V(T) > 0) > 0. Since P and P̃ are equivalent, we also have P̃(V(T) ≥ 0) = 1 and P̃(V(T) > 0) > 0. Since the discounting process is positive, we also have P̃(V*(T) ≥ 0) = 1 and P̃(V*(T) > 0) > 0. However this contradicts (3.4), due to Theorem 3.1.3 (iii). Hence our original hypothesis that the portfolio is an arbitrage portfolio is false.
Theorem 3.1.6 (Radon-Nikodym). Let P and P̃ be equivalent probability measures defined on (Ω, F). Then there exists an almost surely positive random variable Z such that E[Z] = 1 and

P̃(A) = ∫_A Z dP, for every A ∈ F.
(ii) There exists a unique (up to null sets) random variable Z : Ω → ℝ such that Z > 0 almost surely, E[Z] = 1 and P̃(A) = E[Z I_A], for all A ∈ F.
Moreover, assuming any of these two equivalent conditions, for all random variables X such that XZ ∈ L¹(Ω, P), we have X ∈ L¹(Ω, P̃) and

Ẽ[X] = E[ZX].   (3.5)
Proof. The implication (i) ⟹ (ii) follows from the Radon-Nikodym theorem. To prove (ii) ⟹ (i), we first observe that P̃(Ω) = E[Z I_Ω] = E[Z] = 1. Hence, to prove that P̃ is a probability measure, it remains to show that it satisfies the countable additivity property: for all families {A_k}_{k∈ℕ} of disjoint events, P̃(∪_k A_k) = ∑_k P̃(A_k). To prove this let

B_n = ∪_{k=1}^n A_k.

Clearly, {Z I_{B_n}}_{n∈ℕ} is an increasing sequence of random variables. Hence, by the monotone convergence theorem we have

lim_{n→∞} E[Z I_{B_n}] = E[Z I_{B_∞}],   B_∞ = ∪_{k≥1} A_k,

i.e.,

lim_{n→∞} P̃(B_n) = P̃(B_∞).   (3.6)

On the other hand, by linearity of the expectation,

P̃(B_n) = E[Z I_{B_n}] = E[Z I_{∪_{k=1}^n A_k}] = E[Z(I_{A_1} + ··· + I_{A_n})] = ∑_{k=1}^n E[Z I_{A_k}] = ∑_{k=1}^n P̃(A_k).

This proves that P̃ is a probability measure. To show that P and P̃ are equivalent, let A ∈ F. Since Z I_A ≥ 0 almost surely, P̃(A) = E[Z I_A] = 0 is equivalent, by Theorem 3.1.3 (iii), to Z I_A = 0 almost surely. Since Z > 0 almost surely, this is equivalent to I_A = 0 a.s., i.e., P(A) = 0. Thus P̃(A) = 0 if and only if P(A) = 0, i.e., the probability measures P and P̃ are equivalent. It remains to prove the identity (3.5).
(3.5).
If Xis the simple random variable X k a k I A k , then the proof is straightforward:
For a general non-negative random variable X the result follows by applying (3.5) to an
increasing sequence of simple random variables converging to X and then passing to
the
limit (using the monotone convergence theorem). The result for a general random
variable
39
X : follows by applying (3.5) to the positive and negative part of X and using the
linearity of the expectation. Hence, the proof.
40