
THE COPPERBELT UNIVERSITY

SCHOOL OF GRADUATE STUDIES


DEPARTMENT OF MATHEMATICS
M564 – STOCHASTIC DIFFERENTIAL EQUATIONS
2024/25 ACADEMIC YEAR

1. Probability Spaces

1.1 σ-algebras and information

We begin with some notation and terminology. The symbol Ω denotes a generic non-empty set; the power set of Ω, denoted by 2^Ω, is the set of all subsets of Ω. If the number of elements in the set Ω is M ∈ ℕ, we say that Ω is finite. If Ω contains an infinite number of elements and there exists a bijection Ω → ℕ, we say that Ω is countably infinite. If Ω is neither finite nor countably infinite, we say that it is uncountable. An example of an uncountable set is the set ℝ of real numbers. When Ω is finite we write Ω = {ω_1, ω_2, …, ω_M}, or Ω = {ω_k}_{k=1,…,M}. If Ω is countably infinite we write Ω = {ω_k}_{k∈ℕ}. Note that for a finite set Ω with M elements, the power set contains 2^M elements. For instance, if Ω = {a, 1, $}, then

2^Ω = {∅, {a}, {1}, {$}, {a, 1}, {a, $}, {1, $}, {a, 1, $}},

which contains 2^3 = 8 elements. Here ∅ denotes the empty set, which by definition is a subset of all sets.
Within the applications in probability theory, the elements ω ∈ Ω are called sample points and represent the possible outcomes of a given experiment (or trial), while the subsets of Ω correspond to events which may occur in the experiment. For instance, if the experiment consists in throwing a die, then Ω = {1, 2, 3, 4, 5, 6} and A = {2, 4, 6} identifies the event that the result of the experiment is an even number. Now let Ω = Ω_N,

Ω_N = {(γ_1, …, γ_N) : γ_k ∈ {H, T}} = {H, T}^N,   (1.1)

where H stands for "head" and T stands for "tail". Each element ω = (γ_1, …, γ_N) ∈ Ω_N is called an N-toss and represents a possible outcome for the experiment "tossing a coin N consecutive times". Evidently, Ω_N contains 2^N elements and so 2^(Ω_N) contains 2^(2^N) elements. We show later that Ω_∞, the sample space for the experiment "tossing a coin infinitely many times", is uncountable.
A collection of events, e.g., A_1, A_2, … ∈ 2^Ω, is also called information. To understand the meaning of this terminology, suppose that the experiment has been performed and we observe that the events A_1, A_2, … have occurred. We may then use this information to restrict the possible outcomes of the experiment. For instance, if we are told that in a 5-toss the following two events have occurred:

1. there are more heads than tails,
2. the first toss is a tail,

then we may conclude that the result of the 5-toss is one of

(T, H, H, H, H), (T, T, H, H, H), (T, H, T, H, H), (T, H, H, T, H), (T, H, H, H, T).

If in addition we are given the information that

3. the last toss is a tail,

then we conclude that the result of the 5-toss is (T, H, H, H, T).
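The restriction of the possible outcomes by observed events can be checked by brute force. The following sketch (not part of the notes) enumerates the 5-toss sample space and filters it by the two events, then by the third:

```python
# Enumerate the 5-toss sample space and keep the outcomes consistent with
# the observed events. A 5-toss is a tuple of 'H'/'T' symbols.
from itertools import product

omega_5 = list(product("HT", repeat=5))  # the 2^5 = 32 possible 5-tosses

# Event 1: more heads than tails; Event 2: the first toss is a tail.
consistent = [w for w in omega_5
              if w.count("H") > w.count("T") and w[0] == "T"]
print(len(consistent))  # 5 outcomes remain, as listed above

# Event 3: the last toss is also a tail; a single outcome survives.
consistent_3 = [w for w in consistent if w[4] == "T"]
print(consistent_3)  # [('T', 'H', 'H', 'H', 'T')]
```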
The power set of the sample space provides the total accessible information and represents the collection of all the events that can be resolved (i.e., whose occurrence can be inferred) by knowing the outcome of the experiment. For an uncountable sample space, the total accessible information is huge and it is typically replaced by a subclass of events F ⊆ 2^Ω, which is required to form a σ-algebra.

Definition 1.1.1 A collection F ⊆ 2^Ω of subsets of Ω is called a σ-algebra (or σ-field) on Ω if

(i) Ω ∈ F;
(ii) A ∈ F ⇒ A^c := {ω ∈ Ω : ω ∉ A} ∈ F;
(iii) ⋃_{k≥1} A_k ∈ F, for all {A_k}_{k∈ℕ} ⊆ F.

If G is another σ-algebra on Ω and G ⊆ F, we say that G is a sub-σ-algebra of F.

Remark 1.1 (Notation). The letter A is used to denote a generic event in the σ-algebra. If we need to consider two such events, we denote them by A, B, while N generic events are denoted A_1, …, A_N.

Let us comment on Definition 1.1.1. The empty set represents the "nothing happens" event, while A^c represents the "A does not occur" event. Given a finite number A_1, …, A_N of events, their union is the event that at least one of the events A_1, …, A_N occurs, while their intersection is the event that all events A_1, …, A_N occur. The reason to include the countable union/intersection of events in our analysis is to make it possible to "take limits" without crossing the boundaries of the theory. Of course, unions and intersections of infinitely many sets only matter when Ω is not finite.

The smallest σ-algebra on Ω is F = {∅, Ω}, which is called the trivial σ-algebra. There is no relevant information contained in the trivial σ-algebra. The largest possible σ-algebra is F = 2^Ω, which contains the full amount of accessible information. When Ω is countable, it is common to pick 2^Ω as the σ-algebra of events. However, as already mentioned, when Ω is uncountable this choice is unwise. A useful procedure to construct a σ-algebra of events when Ω is uncountable is the following. First we select a collection of events (i.e., subsets of Ω), which for some reason we regard as fundamental. Let O denote this collection of events. Then we introduce the smallest σ-algebra containing O, which is formally defined as follows.

Definition 1.1.2 Let O ⊆ 2^Ω. The σ-algebra generated by O is

F(O) = ⋂ {F : F ⊆ 2^Ω is a σ-algebra and O ⊆ F},

i.e., F(O) is the smallest σ-algebra on Ω containing O.

The intersection of any number of σ-algebras is still a σ-algebra, see Exercise 1.3, hence F(O) is a well-defined σ-algebra. For example, let Ω = ℝ^d and let O be the collection of all open balls:

O = {B_x(R)}_{R>0, x∈ℝ^d}, where B_x(R) = {y ∈ ℝ^d : |x − y| < R}.

The σ-algebra generated by O is called the Borel σ-algebra and denoted B(ℝ^d). The elements of B(ℝ^d) are called Borel sets.
Remark 1.2 (Notation). The Borel σ-algebra B(ℝ) plays an important role in these notes, so we shall use a specific notation for its elements. A generic event in the σ-algebra B(ℝ) will be denoted U; if we need to consider two such events we denote them by U, V, while N generic Borel sets of ℝ will be denoted U_1, …, U_N. Recall that for general σ-algebras, the notation used is the one indicated in Remark 1.1.

The σ-algebra generated by O has a particularly simple form when O is a partition of Ω.

Definition 1.1.3 Let I ⊆ ℕ. A collection O = {A_k}_{k∈I} of non-empty subsets of Ω is called a partition of Ω if

(i) the events {A_k}_{k∈I} are disjoint, i.e., A_j ∩ A_k = ∅, for j ≠ k;
(ii) ⋃_{k∈I} A_k = Ω.

If I is a finite set we call O a finite partition of Ω.

Note that any countable sample space Ω = {ω_k}_{k∈ℕ} is partitioned by the atomic events A_k = {ω_k}, where {ω_k} identifies the event that the result of the experiment is exactly ω_k.

1.2 Probability measure

To any event A ∈ F we want to associate a probability that A occurs.

Definition 1.2.1 Let F be a σ-algebra on Ω. A probability measure is a function

P : F → [0, 1]

such that

(i) P(Ω) = 1;
(ii) for any countable collection of disjoint events {A_k}_{k∈ℕ} ⊆ F, we have

P(⋃_{k≥1} A_k) = Σ_{k≥1} P(A_k).

A triple (Ω, F, P) is called a probability space.

The quantity PA is called probability of the event A; if PA  1 we say that the
event Aoccurs almost surely, which is sometimes shortened by a.s.; if PA  0 we say
that A is a null set. In general, the elements of  with probability zero or one will be
called trivial events (as trivial is the information that they provide). For instance,
P  1, i.e., the probability that “something happens” is one, and
P  P c   1  P  0, i.e., the probability the “nothing happens” is zero.
Let us see some examples of probability space.
 There is only one probability measure defined on the trivial σ-algebra, namely
P  0 and P  1.
 In this example we describe the general procedure to construct a probability
space on a countable sample space   ω k  k . We pick   2  and let
0  p k  1, k  N, be real numbers such that

 p k  1.
k1

 We introduce a probability measure on  by first defining the probability of


the atomic events ω 1 , ω 2 ,  as
Pω k   p k , k  .
 Since every (non-empty) subset of  can be written as the disjoint union of
atomic events, then the probability of any event can be inferred using the
property (ii) in the definition of probability measure, e.g.,
Pω 1 , ω 3 , ω 5   Pω 1   ω 3   ω 5 
 Pω 1   Pω 3   Pω 5   p 1  p 3  p 5 .
 In general we define
PA   pk, A  2,
k:ω k A

 while P  0.
• As a special case of the previous example we now introduce a probability measure on the sample space Ω_N of the N-coin-tosses experiment. Given 0 < p < 1 and ω ∈ Ω_N, we define the probability of the atomic event {ω} as

P({ω}) = p^(N_H(ω)) (1 − p)^(N_T(ω)),   (1.2)

where N_H(ω) is the number of H in ω and N_T(ω) is the number of T in ω (N_H(ω) + N_T(ω) = N). We say that the coin is fair if p = 1/2. The probability of a generic event A ∈ F = 2^(Ω_N) is obtained by adding up the probabilities of the atomic events whose disjoint union forms the event A. For instance, assume N = 3 and consider the event

"The first and the second toss are equal".

Denote by A ∈ F the set corresponding to this event. Then clearly A is the (disjoint) union of the atomic events

(H, H, H), (H, H, T), (T, T, T), (T, T, H).

Hence,

P(A) = P({(H, H, H)}) + P({(H, H, T)}) + P({(T, T, T)}) + P({(T, T, H)})
     = p^3 + p^2(1 − p) + (1 − p)^3 + (1 − p)^2 p = 2p^2 − 2p + 1.
• Let f : ℝ → [0, ∞) be a measurable function such that

∫_ℝ f(x) dx = 1.

Then

P(U) = ∫_U f(x) dx   (1.3)

defines a probability measure on B(ℝ).

Remark 1.3 (Riemann vs. Lebesgue integral). The integral in (1.3) must be understood in the Lebesgue sense, since we are integrating a general measurable function over a general Borel set. If f is a sufficiently regular (say, continuous) function, and U = (a, b) ⊂ ℝ is an interval, then the integral in (1.3) can be understood in the Riemann sense. Although this last case is sufficient for most applications in finance, all integrals in these notes should be understood in the Lebesgue sense, unless otherwise stated. Knowledge of Lebesgue integration theory is, however, not required for our purposes.

Equivalent probability measures

A probability space is a triple (Ω, F, P), and if we change one element of this triple we get a different probability space. The most interesting case is when a new probability measure is introduced. Let us first show with an example (known as Bertrand's paradox) that there might not be just one "reasonable" definition of the probability measure associated to a given experiment. We perform an experiment whose result is a pair of points (p, q) on the unit circle C (e.g., throw two balls in a roulette). The sample space for this experiment is Ω = {(p, q) : p, q ∈ C}. Let T be the length of the chord joining p and q. Now let L be the length of the side of an equilateral triangle inscribed in the circle C. Note that all such triangles are obtained one from another by a rotation around the center of the circle and all have the same side length L. Consider the event A = {(p, q) ∈ Ω : T > L}. What is a reasonable definition for P(A)? On the one hand, we can suppose that one vertex of the triangle is p, and thus T will be greater than L if and only if the point q lies on the arc of the circle between the two vertices of the triangle different from p, see Figure 1.1(a). Since the length of such an arc is 1/3 of the perimeter of the circle, it is reasonable to define P(A) = 1/3. On the other hand, it is simple to see that T > L whenever the midpoint m of the chord lies within a circle of radius 1/2 concentric to C, see Figure 1.1(b). Since the area of the interior circle is 1/4 of the area of C, we are led to define P(A) = 1/4.

(a) P(A) = 1/3   (b) P(A) = 1/4
Figure 1.1: The Bertrand paradox. The length T of the chord pq is greater than L.
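The two answers can be observed numerically. The Monte Carlo sketch below (not part of the notes) mirrors the two sampling schemes described above and estimates P(A) under each:

```python
# Bertrand's paradox by simulation: two "reasonable" ways of drawing a random
# chord of the unit circle give different probabilities for the event T > L.
import math
import random

random.seed(1)
L = math.sqrt(3)   # side length of an equilateral triangle in the unit circle
n = 100_000

# Scheme (a): choose the chord's endpoints uniformly on the circle.
# The chord length is 2|sin((t1 - t2)/2)| for endpoint angles t1, t2.
hits_a = sum(
    2*abs(math.sin((random.uniform(0, 2*math.pi)
                    - random.uniform(0, 2*math.pi))/2)) > L
    for _ in range(n)
)

# Scheme (b): choose the chord's midpoint uniformly in the disk (by rejection);
# T > L exactly when the midpoint lies in the concentric circle of radius 1/2.
hits_b, accepted = 0, 0
while accepted < n:
    x, y = random.uniform(-1, 1), random.uniform(-1, 1)
    if x*x + y*y <= 1:
        accepted += 1
        hits_b += x*x + y*y < 0.25

print(hits_a / n)  # close to 1/3
print(hits_b / n)  # close to 1/4
```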

Whenever two probabilities are defined for the same experiment, we shall require them to be equivalent, in the following sense.

Definition 1.2.2 Given two probability spaces (Ω, F, P) and (Ω, F, P̃), the probability measures P and P̃ are said to be equivalent if P(A) = 0 ⇔ P̃(A) = 0.

A complete characterisation of the probability measures P̃ equivalent to a given P will be given in Theorem 3.3.3.

Conditional probability

It might be that the occurrence of an event B makes the occurrence of another event A more or less likely. For instance, the probability of the event A = {the first two tosses of a fair coin are both heads} is 1/4; however, if we know that the first toss is a tail, then P(A) = 0, while P(A) = 1/2 if we know that the first toss is a head. This leads to the important definition of conditional probability.

Definition 1.2.3 Given two events A, B such that P(B) > 0, the conditional probability of A given B is defined as

P(A|B) = P(A ∩ B) / P(B).

To justify this definition, let F_B = {A ∩ B : A ∈ F}, and set

P_B = P(·|B).   (1.4)

Then (B, F_B, P_B) is a probability space in which the events that cannot occur simultaneously with B are null events. Therefore it is natural to regard (B, F_B, P_B) as the restriction of the probability space (Ω, F, P) when B has occurred.

If P(A|B) = P(A), the two events are said to be independent. The interpretation is the following: if two events A, B are independent, then the occurrence of the event B does not change the probability that A occurs. By Definition 1.2.3 we obtain the following equivalent characterisation of independent events.
Definition 1.2.4 Two events A, B are said to be independent if P(A ∩ B) = P(A)P(B). In general, the events A_1, …, A_N (N ≥ 2) are said to be independent if, for all 1 ≤ k_1 < k_2 < ⋯ < k_m ≤ N, we have

P(A_{k_1} ∩ ⋯ ∩ A_{k_m}) = Π_{j=1}^m P(A_{k_j}).

Two σ-algebras F, G are said to be independent if A and B are independent, for all A ∈ G and B ∈ F. In general, the σ-algebras F_1, …, F_N (N ≥ 2) are said to be independent if A_1, A_2, …, A_N are independent events, for all A_1 ∈ F_1, …, A_N ∈ F_N.

Note that if F, G are two independent σ-algebras and A ∈ F ∩ G, then A is trivial. In fact, if A ∈ F ∩ G, then P(A) = P(A ∩ A) = P(A)^2. Hence P(A) = 0 or 1. The interpretation of this simple remark is that independent σ-algebras carry distinct information.

1.3 Filtered probability spaces

Consider again the N-coin-tosses probability space. Let A_H be the event that the first toss is a head and A_T the event that it is a tail. Clearly A_T = A_H^c, and the σ-algebra F_1 generated by the partition {A_H, A_T} is F_1 = {A_H, A_T, ∅, Ω}. Now let A_HH be the event that the first 2 tosses are heads, and similarly define A_HT, A_TH, A_TT. These four events form a partition of Ω_N and they generate a σ-algebra F_2 as indicated in Exercise 1.4. Clearly, F_1 ⊂ F_2. Going on with three tosses, four tosses, and so on, until we complete the N-toss, we construct a sequence

F_1 ⊂ F_2 ⊂ ⋯ ⊂ F_N ⊆ 2^(Ω_N)

of σ-algebras. The σ-algebra F_k contains all the events of the experiment that depend on (i.e., which are resolved by) the first k tosses. The family {F_k}_{k=1,…,N} of σ-algebras is an example of filtration.

Definition 1.3.1 A filtration is a one-parameter family {F_t}_{t≥0} of σ-algebras such that F_t ⊆ F for all t ≥ 0 and F_s ⊆ F_t for all s ≤ t. A quadruple (Ω, F, {F_t}_{t≥0}, P) is called a filtered probability space.

In our applications t stands for the time variable, and filtrations are associated to experiments in which "information accumulates with time". For instance, in the example given above, the more times we toss the coin, the higher is the number of events which are resolved by the experiment, i.e., the more information becomes accessible.

1.4 The ∞-coin-tosses probability space

In this section we outline the construction of the probability space for the ∞-coin-tosses experiment. The sample space is

Ω_∞ = {ω = (γ_n)_{n∈ℕ} : γ_n ∈ {H, T}}.

Let us show first that Ω_∞ is uncountable. We use the well-known Cantor diagonal argument. Suppose that Ω_∞ is countable and write

Ω_∞ = {ω^(k)}_{k∈ℕ}.   (1.5)

Each ω^(k) ∈ Ω_∞ is a sequence of infinite tosses, which we write as ω^(k) = (γ_j^(k))_{j∈ℕ}, where γ_j^(k) is either H or T, for all j ∈ ℕ and for each fixed k ∈ ℕ. Note that (γ_j^(k))_{j,k∈ℕ} is an "∞ × ∞" matrix. Now consider the ∞-toss corresponding to the diagonal of this matrix, that is

ω̄ = (γ̄_m)_{m∈ℕ}, γ̄_m = γ_m^(m), for all m ∈ ℕ.

Finally consider the ∞-toss ω* which is obtained by changing each single toss of ω̄, that is to say

ω* = (γ*_m)_{m∈ℕ}, where γ*_m = H if γ̄_m = T, and γ*_m = T if γ̄_m = H, for all m ∈ ℕ.

It is clear that the ∞-toss ω* does not belong to the set (1.5). In fact, by construction, the first toss of ω* is different from the first toss of ω^(1), the second toss of ω* is different from the second toss of ω^(2), …, the n-th toss of ω* is different from the n-th toss of ω^(n), and so on, so that each ∞-toss in (1.5) is different from ω*. We conclude that the elements of Ω_∞ cannot be listed as if they comprised a countable set.
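The diagonal construction can be illustrated on a finite list of finite toss sequences (a sketch, not part of the notes): flipping the diagonal always produces a sequence that differs from the k-th listed sequence in its k-th toss.

```python
# Finite illustration of the Cantor diagonal argument: build a toss sequence
# not contained in any given list by flipping the diagonal entries.
def diagonal_flip(listed):
    flip = {"H": "T", "T": "H"}
    return [flip[listed[k][k]] for k in range(len(listed))]

listed = [list("HHTH"), list("THTT"), list("HTHH"), list("TTTH")]
new = diagonal_flip(listed)
print(new)
for k, omega in enumerate(listed):
    assert new[k] != omega[k]   # differs from omega^(k) in position k
```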
Now, let N ∈ ℕ and recall that the sample space Ω_N for the N-tosses experiment is given by (1.1). For each ω = (γ_1, …, γ_N) ∈ Ω_N we define the event A_ω ⊆ Ω_∞ by

A_ω = {ω′ = (γ′_n)_{n∈ℕ} : γ′_j = γ_j, j = 1, …, N},

i.e., the event that the first N tosses in an ∞-toss be equal to γ_1, …, γ_N. Define the probability of this event as the probability of the N-toss ω, that is

P_0(A_ω) = p^(N_H(ω)) (1 − p)^(N_T(ω)),

where 0 < p < 1, N_H(ω) is the number of heads in the N-toss ω and N_T(ω) = N − N_H(ω) is the number of tails in ω, see (1.2). Next consider the family of events

U_N = {A_ω}_{ω∈Ω_N} ⊆ 2^(Ω_∞).

It is clear that U_N is, for each fixed N ∈ ℕ, a partition of Ω_∞. Hence the σ-algebra F_N = F(U_N) is generated according to Exercise 1.4. Note that F_N contains all events of Ω_∞ that are resolved by the first N tosses. Moreover F_N ⊂ F_{N+1}, that is to say, {F_N}_{N∈ℕ} is a filtration. Since P_0 is defined for all A_ω ∈ U_N, it can be extended uniquely to the entire F_N, because each element A ∈ F_N is the disjoint union of events of U_N (see again Exercise 1.4), and therefore the probability of A can be inferred by property (ii) in the definition of probability measure, see Definition 1.2.1. But then P_0 extends uniquely to

F_∞ = ⋃_{N∈ℕ} F_N.

Hence we have constructed a triple (Ω_∞, F_∞, P_0). Is this triple a probability space? The answer is no, because F_∞ is not a σ-algebra. To see this, let A_k be the event that the k-th toss in an infinite sequence of tosses is a head. Clearly A_k ∈ F_k for all k, and therefore {A_k}_{k∈ℕ} ⊂ F_∞. Now assume that F_∞ is a σ-algebra. Then the event A = ⋃_k A_k would belong to F_∞ and therefore also A^c ∈ F_∞. The latter holds if and only if there exists N ∈ ℕ such that A^c ∈ F_N. But A^c is the event that all tosses are tails, which of course cannot be resolved by the information F_N accumulated after just N tosses. We conclude that F_∞ is not a σ-algebra. In particular, we have shown that F_∞ is not in general closed with respect to the countable union of its elements. However, it is easy to show that F_∞ is closed with respect to the finite union of its elements, and in addition satisfies properties (i), (ii) in Definition 1.1.1. This set of properties makes F_∞ an algebra. To complete the construction of the probability space for the ∞-coin-tosses experiment, we need the following deep result.
Theorem 1.4.1 (Carathéodory's theorem). Let U be an algebra of subsets of Ω and P_0 : U → [0, 1] a map satisfying P_0(Ω) = 1 and P_0(⋃_{i=1}^N A_i) = Σ_{i=1}^N P_0(A_i), for every finite collection A_1, …, A_N ∈ U of disjoint sets. Then there exists a unique probability measure P on F(U) such that P(A) = P_0(A), for all A ∈ U.

Remark 1.4: P_0 is called a pre-measure.

Hence the map P_0 : F_∞ → [0, 1] defined above extends uniquely to a probability measure P defined on F = F(F_∞). The resulting triple (Ω_∞, F, P) defines the probability space for the ∞-tosses experiment.
Exercises

Exercise 1.1 Let F be a σ-algebra. Show that ∅ ∈ F and that ⋂_k A_k ∈ F, for all countable families {A_k}_{k∈ℕ} ⊂ F of events.
Exercise 1.2 Let Ω = {1, 2, 3, 4, 5, 6} be the sample space of a die roll. Which of the following sets of events are σ-algebras on Ω?

1. {∅, {1, 2}, {3, 4, 5, 6}, Ω},
2. {∅, {1}, {2}, {1, 2}, {1, 3, 4, 5, 6}, {2, 3, 4, 5, 6}, {3, 4, 5, 6}, Ω},
3. {∅, {2}, {1, 3, 4, 5, 6}, Ω}.
Exercise 1.3 () Prove that the intersection of any number of σ algebras (including
uncountably many) is a σ algebra. Show with a counterexample that the union of two
σ algebras is not necessarily a σ algebra.
Exercise 1.4 Show that when O is a partition, the σ algebra generated by O is given
by the set of all subsets of  which can be written as the union of sets in the partition

9
O (plus the empty set, of course).
Exercise 1.5 Find the partition of   1, 2, 3, 4, 5, 6 that generates the σ algebra 2 in
Exercise 1.2 .
Exercise 1.6 () Prove the following properties:
1. PA c   1  PA;
2. PA  B  PA  PB  PA  B;
3. If A  B, then PA  PB.
Exercise 1.7 (Continuity of probability measures (*)). Let A k  k   such that
A k  A k1 , for all k  . Let A   k A k . Show that
lim PA k   PA.
k

Similarly, if now A k  k   such that A k1  A k , for all k   and A   k A k , show
that
lim PA k   PA.
k

Exercise 1.8 () Prove that  ω N Pω  1, where Pω is given by (1.2).
Exercise 1.9 () Given a fair coin and assuming N is odd, consider the following two
events A, B   N :
A  “the number of heads is greater than the number of tails”,
B  “The first toss is a head”.
Use your intuition to guess whether the two events are independent.

2. Random variables and stochastic processes

Throughout this chapter we assume that (Ω, F, {F_t}_{t≥0}, P) is a given filtered probability space.

2.1 Random variables

In many applications of probability theory, and in financial mathematics in particular, one is more interested in knowing the value attained by quantities that depend on the outcome of the experiment, rather than knowing which specific events have occurred. Such quantities are called random variables.

Definition 2.1.1 A map X : Ω → ℝ is called a (real-valued) random variable if X^(−1)(U) ∈ F, for all U ∈ B(ℝ), where

X^(−1)(U) = {ω ∈ Ω : X(ω) ∈ U}

is the pre-image of the Borel set U. If there exists c ∈ ℝ such that X(ω) = c almost surely, we say that X is a deterministic constant.

Occasionally we shall also need to consider complex-valued random variables. These are defined as maps Z : Ω → ℂ of the form Z = X + iY, where X, Y are real-valued random variables and i is the imaginary unit (i^2 = −1). Similarly, a vector-valued random variable X = (X_1, …, X_N) : Ω → ℝ^N can be defined by simply requiring that each component X_j : Ω → ℝ is a random variable in the sense of Definition 2.1.1.
Remark 2.1 (Notation). A generic real-valued random variable will be denoted by X. If we need to consider two such random variables we will denote them by X, Y, while N real-valued random variables will be denoted by X_1, …, X_N. Note that (X_1, …, X_N) : Ω → ℝ^N is a vector-valued random variable. The letter Z is used for complex-valued random variables.

Remark 2.2. Equality among random variables is always understood to hold up to a null set. That is to say, X = Y always means X = Y a.s., for all random variables X, Y : Ω → ℝ.

Random variables are also called measurable functions, but we prefer to use this terminology only when Ω = ℝ and F = B(ℝ). Measurable functions will be denoted by small Latin letters (e.g., f, g, …). If X is a random variable and Y = f(X) for some measurable function f, then Y is also a random variable. We denote by P(X ∈ U) = P(X^(−1)(U)) the probability that X takes value in U ∈ B(ℝ). Moreover, given two random variables X, Y : Ω → ℝ and the Borel sets U, V, we denote

P(X ∈ U, Y ∈ V) = P(X^(−1)(U) ∩ Y^(−1)(V)),

which is the probability that the random variable X takes value in U and Y takes value in V. The generalization to an arbitrary number of random variables is straightforward.

As the value attained by X depends on the result of the experiment, random variables carry information, i.e., upon knowing the value attained by X we know something about the outcome ω of the experiment. For instance, if X(ω) = (−1)^ω, where ω is the result of rolling a die, and if we are told that X takes value 1, then we infer immediately that the die roll is even. The information carried by a random variable X forms the σ-algebra generated by X, whose precise definition is the following.
Definition 2.1.2 Let X : Ω → ℝ be a random variable. The σ-algebra generated by X is the collection σ(X) ⊆ F of events given by

σ(X) = {A ∈ F : A = X^(−1)(U), for some U ∈ B(ℝ)}.

If G ⊆ F is another σ-algebra of subsets of Ω and σ(X) ⊆ G, we say that X is G-measurable. If Y : Ω → ℝ is another random variable and σ(Y) ⊆ σ(X), we say that Y is X-measurable.

Thus σ(X) contains all the events that are resolved by knowing the value of X. The interpretation of X being G-measurable is that the information contained in G suffices to determine the value taken by X in the experiment. Note that the σ-algebra generated by a deterministic constant consists of trivial events only.

Definition 2.1.3 The σ-algebra σ(X, Y) generated by two random variables X, Y : Ω → ℝ is the smallest σ-algebra containing σ(X) ∪ σ(Y), that is to say σ(X, Y) = F(O), where O = σ(X) ∪ σ(Y), and similarly for any number of random variables.

If Y is X-measurable then σ(X, Y) = σ(X), i.e., the random variable Y does not add any new information to the one already contained in X. Clearly, if Y = f(X) for some measurable function f, then Y is X-measurable. It can be shown that the opposite is also true: if σ(Y) ⊆ σ(X), then there exists a measurable function f such that Y = f(X). The other extreme is when X and Y carry distinct information, i.e., when σ(X) ∩ σ(Y) consists of trivial events only. This occurs in particular when the two random variables are independent.

Definition 2.1.4 Let X : Ω → ℝ be a random variable and G ⊆ F be a sub-σ-algebra. We say that X is independent of G if σ(X) and G are independent. Two random variables X, Y : Ω → ℝ are said to be independent random variables if the σ-algebras σ(X) and σ(Y) are independent. More generally, the random variables X_1, …, X_N are independent if σ(X_1), …, σ(X_N) are independent σ-algebras.

In the intermediate case, i.e., when Y is neither X-measurable nor independent of X, it is expected that knowledge of the value attained by X helps to derive information on the values attainable by Y. We shall study this case in the next chapter.

Theorem 2.1.5 Let X_1, …, X_N be independent random variables. Let us divide the set {X_1, …, X_N} into m separate groups of random variables, namely, let

{X_1, …, X_N} = {X_{k_1}}_{k_1∈I_1} ∪ {X_{k_2}}_{k_2∈I_2} ∪ ⋯ ∪ {X_{k_m}}_{k_m∈I_m},

where {I_1, I_2, …, I_m} is a partition of {1, …, N}. Let n_i be the number of elements in the set I_i, so that n_1 + n_2 + ⋯ + n_m = N. Let g_1, …, g_m be measurable functions such that g_i : ℝ^(n_i) → ℝ. Then the random variables

Y_1 = g_1({X_{k_1}}_{k_1∈I_1}), Y_2 = g_2({X_{k_2}}_{k_2∈I_2}), …, Y_m = g_m({X_{k_m}}_{k_m∈I_m})

are independent.

For instance, in the case of N = 2 independent random variables X_1, X_2, Theorem 2.1.5 asserts that Y_1 = g(X_1) and Y_2 = f(X_2) are independent random variables, for all measurable functions f, g : ℝ → ℝ.

Simple and discrete random variables

A special role is played by simple random variables. The simplest possible one is the indicator function of an event: given A ∈ F, the indicator function of A is the random variable that takes value 1 if ω ∈ A and 0 otherwise, i.e.,

I_A(ω) = 1, if ω ∈ A;  0, if ω ∈ A^c.

Obviously, σ(I_A) = {A, A^c, ∅, Ω}.
Definition 2.1.6 Let {A_k}_{k=1,…,N} ⊂ F be a family of disjoint events and a_1, …, a_N be distinct real numbers. The random variable

X = Σ_{k=1}^N a_k I_{A_k}

is called a simple random variable. If N = ∞ in this definition, we call X a discrete random variable.
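On a finite sample space a simple random variable is easy to realize directly as a combination of indicator functions. A minimal sketch (the events and values below are illustrative, not from the notes):

```python
# A simple random variable as a finite linear combination of indicator
# functions of disjoint events, evaluated pointwise on the sample space.
def indicator(A):
    return lambda w: 1 if w in A else 0

def simple_rv(coeffs):
    """coeffs: list of (a_k, A_k) pairs with disjoint events A_k."""
    return lambda w: sum(a * indicator(A)(w) for a, A in coeffs)

# X takes value 1 on even die rolls and -1 on odd die rolls.
X = simple_rv([(1, {2, 4, 6}), (-1, {1, 3, 5})])
print([X(w) for w in range(1, 7)])  # [-1, 1, -1, 1, -1, 1]
```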

Thus a simple random variable X can attain only a finite number of values, while a discrete random variable X attains countably infinitely many values. In both cases we have

P(X = x) = 0, if x ∉ Im(X);  P(X = x) = P(A_k), if x = a_k,

where Im(X) = {x ∈ ℝ : X(ω) = x, for some ω ∈ Ω} is the image of X. We remark that most references do not assume, in the definition of simple random variable, that the sets A_1, …, A_N should be disjoint. We do so, however, because all simple random variables considered in these notes satisfy this property and because the sets A_1, …, A_N can always be re-defined in such a way that they are disjoint, without modifying the image of the simple random variable, see Exercise 2.5. Similarly, the condition that a_1, …, a_N should be distinct can be removed from the definition of simple random variable. Let us see two examples of simple/discrete random variables that appear in financial mathematics (and in many other applications). A simple random variable X is called a binomial random variable if

• Range(X) = {0, 1, …, N};
• There exists p ∈ (0, 1) such that P(X = k) = C(N, k) p^k (1 − p)^(N−k), k = 0, 1, …, N,

where C(N, k) = N!/(k!(N − k)!) is the binomial coefficient. For instance, if we let X be the number of heads in an N-toss, then X is binomial. A widely used model for the evolution of stock prices in financial mathematics assumes that the price of the stock at any time is a binomial random variable (binomial asset pricing model). A discrete random variable X is called a Poisson variable if

• Range(X) = ℕ ∪ {0};
• There exists μ > 0 such that P(X = k) = μ^k e^(−μ)/k!, k = 0, 1, 2, ….

We denote by P(μ) the set of all Poisson random variables with parameter μ > 0.

The following important theorem shows that every non-negative random variable can be approximated by a sequence of simple random variables.

Theorem 2.1.7 Let X : Ω → [0, ∞) be a random variable and let n ∈ ℕ be given. For k = 0, 1, …, n2^n − 1, consider the sets

A_{k,n} := X^(−1)([k/2^n, (k+1)/2^n)),

and for k = n2^n let

A_{n2^n, n} = X^(−1)([n, ∞)).

Note that {A_{k,n}}_{k=0,…,n2^n} is a partition of Ω, for all fixed n ∈ ℕ. Define the simple random variables

s_n^X(ω) = Σ_{k=0}^{n2^n} (k/2^n) I_{A_{k,n}}(ω).

Then 0 ≤ s_1^X(ω) ≤ s_2^X(ω) ≤ ⋯ ≤ s_n^X(ω) ≤ s_{n+1}^X(ω) ≤ ⋯ ≤ X(ω), for all ω ∈ Ω (i.e., the sequence {s_n^X}_{n∈ℕ} is non-decreasing) and

lim_{n→∞} s_n^X(ω) = X(ω), for all ω ∈ Ω.

2.2 Distribution and probability density functions


Definition 2.2.1 The (cumulative) distribution function of the random variable
X :    is the non-negative function F X :   0, 1 given by F X x  PX  x. Two
random variables X, Y are said to be identically distributed if F X  F Y .
Definition 2.2.2 A random variable X :    is said to admit the probability density
function (pdf) f X :   0,  if f X is integrable on  and
x
F X x    f X ydy. 2. 1

Note that if f X is the pdf of a random variable, then necessarily


 f X xdx  x
lim F X x  1.

All probability density functions considered in these notes are continuous, and therefore
the integral in (2.1) can be understood in the Riemann sense. Moreover in this case F X
is differentiable and we have
f X  dF X .
dx
If the integral in (2.1) is understood in the Lebesgue sense, then the density f X can be a
quite irregular function. In this case, the fundamental theorem of calculus for the
Lebesgue integral entails that the distribution F X x satisfying (2.1) is absolutely
continuous, and so in particular it is continuous. Conversely, if F X is absolutely
continuous, then X admits a density function. We remark that, regardless of the
notion of integral being used, a simple (or discrete) random variable X cannot admit
a density in the sense of Definition 2.2.2, unless it is a deterministic constant.
Suppose in fact that X = Σ_{k=1}^N a_k I_{A_k} is not a deterministic constant. Assume that
a_1 = max{a_1, …, a_N}. Then
lim_{x→a_1^−} F_X(x) = P(A_2) + ⋯ + P(A_N) < 1,
while
lim_{x→a_1^+} F_X(x) = 1 = F_X(a_1).

It follows that F_X(x) is not continuous, and so in particular it cannot be written in the
form (2.1). To define the pdf of simple random variables, let
X = Σ_{k=1}^N a_k I_{A_k},
where without loss of generality we assume that the real numbers a_1, …, a_N are
distinct and the sets A_1, …, A_N are disjoint (see Exercise 2.5). The distribution function
of X is
F_X(x) = P(X ≤ x) = Σ_{a_k ≤ x} P(X = a_k).    (2.2)
In this case the probability density function f_X(x) is defined as
f_X(x) = P(X = x) if x = a_k for some k, and f_X(x) = 0 otherwise.    (2.3)
Thus, with a slight abuse of notation, we can rewrite (2.2) as
F_X(x) = Σ_{y ≤ x} f_X(y),    (2.4)
which extends (2.1) to simple random variables. We remark that it is possible to unify the
definition of pdf for continuous and discrete random variables by writing the sum (2.4) as
an integral with respect to the Dirac measure, but we shall not do so.
We shall see that when a random variable X admits a density f_X, all the relevant
statistical information on X can be deduced from f_X. We also remark that often one can
prove the existence of the pdf f X without however being able to derive an explicit formula
for it. For instance, f X is often given as the solution of a partial differential equation,
or through its (inverse) Fourier transform, which is called the characteristic function of
X, see Section 3.3. Some examples of density functions, which have important
applications in financial mathematics, are the following.
Examples of probability density functions
• A random variable X : Ω → ℝ is said to be a normal (or normally distributed)
random variable if it admits the density
f_X(x) = (1/√(2πσ²)) e^{−(x−m)²/(2σ²)},
for some m ∈ ℝ and σ > 0, which are called respectively the expectation (or
mean) and the standard deviation of the normal random variable X, while σ² is
called the variance of X. We denote by N(m, σ²) the set of all normal random
variables with expectation m and variance σ². If m = 0 and σ² = 1, X ∈ N(0, 1) is
said to be a standard normal variable. The density function of standard normal
variables is denoted by φ, while their distribution is denoted by Φ, i.e.,
φ(x) = (1/√(2π)) e^{−x²/2},  Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−y²/2} dy.
• A random variable X : Ω → ℝ is said to be an exponential (or exponentially
distributed) random variable if it admits the density
f_X(x) = λe^{−λx} I_{x>0},
for some λ > 0, which is called the intensity of the exponential random variable X.
We denote by ℰ(λ) the set of all exponential random variables with intensity λ > 0.
The distribution function of an exponential random variable X with intensity λ is
given by
F_X(x) = ∫_{−∞}^x f_X(y) dy = λ ∫_0^x e^{−λy} dy = 1 − e^{−λx},  for x > 0.
• A random variable X : Ω → ℝ is said to be chi-squared distributed if it admits the
density
f_X(x) = (x^{δ/2−1} e^{−x/2})/(2^{δ/2} Γ(δ/2)) I_{x>0},
for some δ > 0, which is called the degree of the chi-squared distributed random
variable. Here Γ(t) = ∫_0^∞ z^{t−1} e^{−z} dz, t > 0, is the Gamma function. Recall the
relation
Γ(n) = (n − 1)!,
for n ∈ ℕ. We denote by χ²(δ) the set of all chi-squared distributed random
variables with degree δ.
• A random variable X : Ω → ℝ is said to be non-central chi-squared distributed
with degree δ > 0 and non-centrality parameter β > 0 if it admits the density
f_X(x) = (1/2) e^{−(x+β)/2} (x/β)^{δ/4 − 1/2} I_{δ/2−1}(√(βx)) I_{x>0},    (2.5)
where I_ν(y) denotes the modified Bessel function of the first kind. We denote by
χ²(δ, β) the set of random variables with density (2.5). It can be shown that
χ²(δ, 0) = χ²(δ).
• A random variable X : Ω → ℝ is said to be Cauchy distributed if it admits the
density
f_X(x) = γ/(π((x − x_0)² + γ²)),
for x_0 ∈ ℝ and γ > 0, called the location and the scale of the Cauchy pdf.
• A random variable X : Ω → ℝ is said to be Levy distributed if it admits the
density
f_X(x) = √(c/(2π)) (e^{−c/(2(x−x_0))}/(x − x_0)^{3/2}) I_{x>x_0},
for x_0 ∈ ℝ and c > 0, called the location and the scale of the Levy pdf.
If a random variable X admits a density f_X, then for all (possibly unbounded) intervals
I ⊆ ℝ we have
P(X ∈ I) = ∫_I f_X(y) dy.    (2.6)
It can be shown that (2.6) extends to
P(g(X) ∈ I) = ∫_{{x : g(x) ∈ I}} f_X(x) dx,    (2.7)
for all measurable functions g : ℝ → ℝ. For example, if X ∈ N(0, 1),
P(X² ≤ 1) = P(−1 ≤ X ≤ 1) = ∫_{−1}^1 φ(x) dx ≈ 0.683,
which means that a standard normal random variable has about 68.3% chance to take
value in the interval [−1, 1].
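This probability can be checked numerically. Since Φ(x) = (1 + erf(x/√2))/2, the integral above equals Φ(1) − Φ(−1); a quick sketch using only the standard library (the helper name `Phi` is ours):

```python
import math

def Phi(x: float) -> float:
    """Standard normal cdf via the error function: Phi(x) = (1 + erf(x/sqrt 2))/2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

p = Phi(1.0) - Phi(-1.0)   # P(-1 <= X <= 1) for X in N(0, 1)
print(round(p, 4))         # ≈ 0.6827
```

The value 0.6827 is the familiar "one sigma" probability for normal random variables.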

2.2.1 Random variables with boundary values


Random variables in mathematical finance do not always admit a density in the classical
sense described above (or in any other sense), and the purpose of this section is to
present an example when one has to consider a generalized notion of density function.
Suppose that X takes values in the semi-open interval [0, ∞). Then clearly F_X(x) = 0
for x < 0 and F_X(0) = P(X = 0), while for x > 0 we can write
F_X(x) = P(X ≤ x) = P(X = 0) + P(0 < X ≤ x).
Now assume that F_X is differentiable on the open set {x > 0} = (0, ∞). Then there exists
a function f̃_X(x), x > 0, such that F_X(x) = F_X(0) + ∫_0^x f̃_X(t) dt. Hence, for all x ∈ ℝ
we find
F_X(x) = p_0 H(x) + ∫_{−∞}^x f̃_X(t) I_{t>0} dt,
where p_0 = P(X = 0) and H(x) is the Heaviside function, i.e., H(x) = 1 if x ≥ 0, H(x) = 0
if x < 0. By introducing the delta-distribution through the formal identity
H′(x) = δ(x),    (2.8)
we obtain, again formally, the following expression for the density function:
f_X(x) = dF_X(x)/dx = p_0 δ(x) + f̃_X(x).    (2.9)
The formal identities (2.8)–(2.9) become rigorous mathematical expressions when
they are understood in the sense of distributions. We shall refer to the term p_0 δ(x) as
the discrete part of the density. The function f̃_X is also called the defective density
of the random variable X. Note that
∫_0^∞ f̃_X(x) dx = 1 − p_0.
The defective density is the actual pdf of X if and only if p_0 = 0.
The typical example of a financial random variable whose pdf may have a discrete part
is the stock price S(t) at time t. For simple models (such as the geometric Brownian
motion (2.13) defined in Section 2.4 below), the stock price is strictly positive a.s. at all
finite times and the density has no discrete part. However, for more sophisticated models
the stock price can reach zero with positive probability at any finite time, and so the pdf
of the stock price admits a discrete part P(S(t) = 0)δ(x). Hence these models take into
account the risk of default of the stock.

2.2.2 Joint distribution


If two random variables X, Y :    are given, how can we verify whether or not they
are independent? This problem has a simple solution when X, Y admit a joint distribution
density.
Definition 2.2.3 The joint (cumulative) distribution F_{X,Y} : ℝ² → [0, 1] of two random
variables X, Y : Ω → ℝ is defined as
F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y).
The random variables X, Y are said to admit the joint (probability) density function
f_{X,Y} : ℝ² → [0, ∞) if f_{X,Y} is integrable on ℝ² and
F_{X,Y}(x, y) = ∫_{−∞}^x ∫_{−∞}^y f_{X,Y}(η, ξ) dη dξ.    (2.10)
Note the formal identities
f_{X,Y} = ∂²F_{X,Y}/(∂x∂y),  ∫_{ℝ²} f_{X,Y}(x, y) dx dy = 1.
Moreover, if two random variables X, Y admit a joint density f_{X,Y}, then each of them
admits a density (called marginal density in this context), which is given by
f_X(x) = ∫_ℝ f_{X,Y}(x, y) dy,  f_Y(y) = ∫_ℝ f_{X,Y}(x, y) dx.
To see this we write
P(X ≤ x) = P(X ≤ x, Y ∈ ℝ) = ∫_{−∞}^x ( ∫_ℝ f_{X,Y}(η, ξ) dξ ) dη = ∫_{−∞}^x f_X(η) dη,
and similarly for the random variable Y:
P(Y ≤ y) = P(Y ≤ y, X ∈ ℝ) = ∫_{−∞}^y ( ∫_ℝ f_{X,Y}(η, ξ) dη ) dξ = ∫_{−∞}^y f_Y(ξ) dξ.
If W = g(X, Y), for some measurable function g, and I ⊆ ℝ is an interval, the analogue
of (2.7) in two dimensions holds, namely:
P(g(X, Y) ∈ I) = ∫_{{(x,y) : g(x,y) ∈ I}} f_{X,Y}(x, y) dx dy.
As an example of joint pdf, let m = (m_1, m_2) ∈ ℝ² and C = (C_{ij})_{i,j=1,2} be a 2 × 2
positive definite, symmetric matrix. Two random variables X, Y : Ω → ℝ are said to be
jointly normally distributed with mean m and covariance matrix C if they admit the joint
density
f_{X,Y}(x, y) = (1/(2π√(det C))) exp(−(1/2)(z − m) · C^{−1} · (z − m)^T),    (2.11)
where z = (x, y), “·” denotes the row-by-column product, C^{−1} is the inverse matrix of C
and v^T is the transpose of the vector v.
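A small numerical sketch of formula (2.11) (the function names are ours, not from the notes): when C is diagonal, the joint normal density factorizes into the product of the two one-dimensional normal densities, a special case of the factorization criterion in Theorem 2.2.4.

```python
import math

def joint_normal_pdf(x, y, m, C):
    """Bivariate normal density (2.11); C is a 2x2 symmetric positive definite
    matrix given as ((C11, C12), (C12, C22))."""
    (c11, c12), (_, c22) = C
    det = c11 * c22 - c12 * c12
    # inverse of a 2x2 symmetric matrix
    i11, i12, i22 = c22 / det, -c12 / det, c11 / det
    dx, dy = x - m[0], y - m[1]
    quad = dx * dx * i11 + 2 * dx * dy * i12 + dy * dy * i22
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

def normal_pdf(x, mean, var):
    """One-dimensional normal density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# With a diagonal covariance matrix the joint density factorizes:
m, C = (1.0, -2.0), ((4.0, 0.0), (0.0, 9.0))
lhs = joint_normal_pdf(0.5, 1.0, m, C)
rhs = normal_pdf(0.5, 1.0, 4.0) * normal_pdf(1.0, -2.0, 9.0)
assert abs(lhs - rhs) < 1e-12
```

For a non-diagonal C the density no longer factorizes, which is consistent with the characterization of independence via ρ = 0 in Remark 2.3.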
In the next theorem we establish a simple condition for the independence of two random
variables which admit a joint density.
Theorem 2.2.4 The following holds.
(i) If two random variables X, Y admit the densities f_X, f_Y and are independent, then
they admit the joint density
f_{X,Y}(x, y) = f_X(x)f_Y(y).
(ii) If two random variables X, Y admit a joint density f_{X,Y} of the form
f_{X,Y}(x, y) = u(x)v(y),
for some functions u, v : ℝ → [0, ∞), then X, Y are independent and admit the densities
f_X, f_Y given by
f_X(x) = c u(x),  f_Y(y) = (1/c) v(y),
where
c = ∫_ℝ v(x) dx = ( ∫_ℝ u(y) dy )^{−1}.

Proof. (i) We have
F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y) = P(X ≤ x)P(Y ≤ y)
  = ( ∫_{−∞}^x f_X(η) dη )( ∫_{−∞}^y f_Y(ξ) dξ )
  = ∫_{−∞}^x ∫_{−∞}^y f_X(η)f_Y(ξ) dη dξ.
(ii) We first write
{X ≤ x} = {X ≤ x} ∩ Ω = {X ≤ x} ∩ {Y ∈ ℝ} = {X ≤ x, Y ∈ ℝ}.
Hence,
P(X ≤ x) = ∫_{−∞}^x ( ∫_ℝ f_{X,Y}(η, y) dy ) dη = ∫_{−∞}^x u(η) dη ∫_ℝ v(y) dy = ∫_{−∞}^x c u(η) dη,
where c = ∫_ℝ v(y) dy. Thus X admits the density f_X(x) = c u(x).
Similarly,
{Y ≤ y} = {Y ≤ y} ∩ Ω = {Y ≤ y} ∩ {X ∈ ℝ} = {X ∈ ℝ, Y ≤ y}.
Hence,
P(Y ≤ y) = ∫_{−∞}^y ( ∫_ℝ f_{X,Y}(x, ξ) dx ) dξ = ∫_{−∞}^y v(ξ) dξ ∫_ℝ u(x) dx = ∫_{−∞}^y c′ v(ξ) dξ,
where c′ = ∫_ℝ u(x) dx. Thus Y admits the density f_Y(y) = c′ v(y). Since
1 = ∫_{ℝ²} f_{X,Y}(x, y) dx dy = ∫_ℝ u(x) dx ∫_ℝ v(y) dy = c′ c,
then c′ = 1/c. It remains to prove that X, Y are independent. This follows by
P(X ∈ U, Y ∈ V) = ∫_U ∫_V f_{X,Y}(x, y) dx dy = ∫_U u(x) dx ∫_V v(y) dy
  = ∫_U c u(x) dx ∫_V (1/c) v(y) dy = ∫_U f_X(x) dx ∫_V f_Y(y) dy
  = P(X ∈ U)P(Y ∈ V),  for all U, V ∈ B(ℝ).
Remark 2.3 By Theorem 2.2.4 and the result of Exercise 12, we have that two jointly
normally distributed random variables are independent if and only if ρ  0 in the formula
(2.12).

2.3 Stochastic processes


Definition 2.3.1 A stochastic process is a one-parameter family of random variables,
which we denote by {X(t)}_{t≥0}, or by {X(t)}_{t∈[0,T]} if the parameter t is restricted to the
interval [0, T], T > 0. Hence, for each t ≥ 0, X(t) : Ω → ℝ is a random variable. We
denote by X(t, ω) the value of X(t) on the sample point ω ∈ Ω, i.e., X(t, ω) = X(t)(ω).
For each fixed ω ∈ Ω, the curve γ_ω^X : [0, ∞) → ℝ, γ_ω^X(t) = X(t, ω), is called the ω-path
of the stochastic process and is assumed to be a measurable function. If the paths of a
stochastic process are almost surely equal (i.e., independent of ω), we say that the
stochastic process is a deterministic function of time.

The parameter t will be referred to as time parameter, since this is what it represents
in the applications in financial mathematics. Examples of stochastic processes in
financial mathematics are given in the next section.
Definition 2.3.2 Two stochastic processes {X(t)}_{t≥0}, {Y(t)}_{t≥0} are said to be
independent if for all m, n ∈ ℕ and 0 ≤ t_1 < t_2 < ⋯ < t_n, 0 ≤ s_1 < s_2 < ⋯ < s_m, the
σ-algebras σ(X(t_1), …, X(t_n)), σ(Y(s_1), …, Y(s_m)) are independent.

Hence two stochastic processes {X(t)}_{t≥0}, {Y(t)}_{t≥0} are independent if the information
obtained by “looking” at the process {X(t)}_{t≥0} up to time T is independent of the
information obtained by “looking” at the process {Y(t)}_{t≥0} up to time S, for all
S, T > 0. Similarly one defines the notion of several independent stochastic processes.
Remark 2.4 (Notation). If t runs over a countable set, i.e., t ∈ {t_k}_{k∈ℕ}, then a stochastic
process is equivalent to a sequence of random variables X_1, X_2, …, where X_k = X(t_k). In
this case we say that the stochastic process is discrete and we denote it by {X_k}_{k∈ℕ}. An
example of a discrete stochastic process is the random walk defined below.

A special role is played by step processes: given 0 = t_0 < t_1 < t_2 < ⋯, a step process
is a stochastic process {Δ(t)}_{t≥0} of the form
Δ(t, ω) = Σ_{k≥0} X_k(ω) I_{[t_k, t_{k+1})}(t).

Figure 2.1: The path t ↦ Δ(t, ω) of a step process

A typical path of a step process is depicted in Figure 2.1. Note that the paths of a step
process are right-continuous, but not left-continuous. Moreover, since X_k(ω) = Δ(t_k, ω),
we can rewrite Δ(t) as
Δ(t) = Σ_k Δ(t_k) I_{[t_k, t_{k+1})}(t).

In the same way as a random variable generates a σ-algebra, a stochastic process
generates a filtration. Informally, the filtration generated by a stochastic process
{X(t)}_{t≥0} contains the information accumulated by looking at the process for longer
and longer periods of time.
Definition 2.3.3 The filtration generated by the stochastic process {X(t)}_{t≥0} is given
by {F_X(t)}_{t≥0}, where
F_X(t) = σ(O_t),  O_t = ∪_{0≤s≤t} σ(X(s)).

Hence F_X(t) is the smallest σ-algebra containing σ(X(s)), for all 0 ≤ s ≤ t, see
Definition 1.1.2. Similarly one defines the filtration {F_{X,Y}(t)}_{t≥0} generated by two
stochastic processes {X(t)}_{t≥0}, {Y(t)}_{t≥0}, as well as the filtration generated by
any number of stochastic processes.
Definition 2.3.4 If {F(t)}_{t≥0} is a filtration and F_X(t) ⊆ F(t), for all t ≥ 0, we
say that the stochastic process {X(t)}_{t≥0} is adapted to the filtration {F(t)}_{t≥0}.

The property of {X(t)}_{t≥0} being adapted to {F(t)}_{t≥0} means that the information
contained in F(t) suffices to determine the value attained by the random variable
X(s), for all s ∈ [0, t]. Clearly, {X(t)}_{t≥0} is adapted to its own generated filtration
{F_X(t)}_{t≥0}. Moreover, if {X(t)}_{t≥0} is adapted to {F(t)}_{t≥0} and Y(t) = f(X(t)), for
some measurable function f, then {Y(t)}_{t≥0} is also adapted to {F(t)}_{t≥0}.

Next we give an example of a (discrete) stochastic process. Let {X_t}_{t∈ℕ} be a sequence
of independent random variables satisfying
X_t = 1 with probability 1/2,  X_t = −1 with probability 1/2,
for all t ∈ ℕ. For a concrete realization of these random variables, we may think of X_t
as being defined on the sample space Ω_∞ of the ∞-coin tosses experiment (see
Section 1.4). In fact, letting ω = (γ_j)_{j∈ℕ} ∈ Ω_∞, we may set
X_t(ω) = 1, if γ_t = H;  X_t(ω) = −1, if γ_t = T.
Hence X_t : Ω_∞ → {−1, 1} is the simple random variable X_t(ω) = I_{A_t} − I_{A_t^c}, where
A_t = {ω ∈ Ω_∞ : γ_t = H}. Clearly, F_X(t) is the collection of all the events that are
resolved by the first t tosses, which is given as indicated at the beginning of
Section 1.3.
Definition 2.3.5 The stochastic process {M_t}_{t∈ℕ} given by
M_0 = 0,  M_t = Σ_{k=1}^t X_k,
is called the symmetric random walk.

To understand the meaning of the term “random walk”, consider a particle moving on
the real line in the following way: if X_t = 1 (i.e., if toss number t is a head), at time t
the particle moves one unit of length to the right; if X_t = −1 (i.e., if toss number t is
a tail), it moves one unit of length to the left. Then M_t gives the net displacement (in
units of length) of the particle, to the right or to the left, up to time t.
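A short simulation sketch of Definition 2.3.5 (variable names are ours): each coin toss contributes ±1, and M_t is the running sum of the first t tosses.

```python
import random

def random_walk(T: int, rng: random.Random) -> list[int]:
    """Path (M_0, M_1, ..., M_T) of the symmetric random walk:
    M_0 = 0 and M_t = X_1 + ... + X_t, with X_k = +/-1 each with probability 1/2."""
    path = [0]
    for _ in range(T):
        step = 1 if rng.random() < 0.5 else -1   # heads -> +1, tails -> -1
        path.append(path[-1] + step)
    return path

rng = random.Random(0)
path = random_walk(10, rng)
# Each increment is +1 or -1, and the walk starts from 0.
assert path[0] == 0
assert all(abs(b - a) == 1 for a, b in zip(path, path[1:]))
```

The list `path` records the particle's position after each toss, which is exactly the picture described in the paragraph above.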
The increments of the random walk are defined as follows. If (k_1, …, k_N) ∈ ℕ^N is such
that 1 ≤ k_1 < k_2 < ⋯ < k_N, we set
Δ_1 = M_{k_1} − M_0 = M_{k_1},  Δ_2 = M_{k_2} − M_{k_1},  …,  Δ_N = M_{k_N} − M_{k_{N−1}}.
Hence Δ_j is the total displacement of the particle from time k_{j−1} to time k_j.
Theorem 2.3.6 The increments Δ_1, …, Δ_N of the random walk are independent random
variables.
Proof. Since
Δ_1 = X_1 + ⋯ + X_{k_1} = g_1(X_1, …, X_{k_1}),
Δ_2 = X_{k_1+1} + ⋯ + X_{k_2} = g_2(X_{k_1+1}, …, X_{k_2}),
⋮
Δ_N = X_{k_{N−1}+1} + ⋯ + X_{k_N} = g_N(X_{k_{N−1}+1}, …, X_{k_N}),
for measurable functions g_1, …, g_N, the result follows from the fact that measurable
functions of disjoint groups of independent random variables are themselves
independent random variables.

The interpretation of this result is that the particle has no memory of past movements:
the distance travelled by the particle in a given interval of time is not affected by the
motion
of the particle at earlier times.

We may now define the most important of all stochastic processes.


Definition 2.3.7 A Brownian motion (or Wiener process) is a stochastic process
{W(t)}_{t≥0} such that
(i) The paths are continuous and start from 0 almost surely, i.e., the sample points
ω ∈ Ω such that γ_ω^W(0) = 0 and γ_ω^W is a continuous function comprise a set of
probability 1;
(ii) The increments over disjoint time intervals are independent, i.e., for all
0 ≤ t_0 < t_1 < ⋯ < t_m, the random variables
W(t_1) − W(t_0), W(t_2) − W(t_1), …, W(t_m) − W(t_{m−1})
are independent;
(iii) For all s < t, the increment W(t) − W(s) belongs to N(0, t − s).

It can be shown that Brownian motions exist. In particular, it can be shown that the
sequence of stochastic processes {W_n(t)}_{t≥0}, n ∈ ℕ, defined by
W_n(t) = (1/√n) M_{[nt]},    (2.12)
where M_t is the symmetric random walk and [z] denotes the integer part of z,
converges to a Brownian motion. Therefore one may think of a Brownian motion as
a time-continuum version of a symmetric random walk which runs for an infinite
number of “infinitesimal time steps”. In fact, provided the number of time steps is
sufficiently large, the process {W_n(t)}_{t≥0} gives a very good approximation of a
Brownian motion, which is useful for numerical computations. Notice that there exist
many Brownian motions and each of them may have some specific properties besides
those listed in Definition 2.3.7. However, as long as we use only the properties (i)–(iii),
we do not need to work with a specific example of Brownian motion.
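The approximation (2.12) is easy to simulate (a sketch with our own variable names): for large n, W_n(1) = M_n/√n should be approximately N(0, 1), so over many simulated paths its empirical mean and variance should be close to 0 and 1, consistent with property (iii).

```python
import random

def W_n_at(t: float, n: int, rng: random.Random) -> float:
    """W_n(t) = M_[nt] / sqrt(n), with M the symmetric random walk."""
    steps = int(n * t)           # [nt], the integer part
    m = sum(1 if rng.random() < 0.5 else -1 for _ in range(steps))
    return m / n ** 0.5

rng = random.Random(1)
samples = [W_n_at(1.0, 100, rng) for _ in range(20_000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
# By (2.12) and property (iii), W(1) is N(0, 1): mean ~ 0, variance ~ 1.
```

With n = 100 the variance of M_n/√n is exactly 1 at t = 1, so the empirical variance over 20,000 samples lands within a few percent of 1.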
Once a Brownian motion is introduced it is natural to require that the filtration Ft t0
should be somehow related to it. For our future financial applications, the following
class of filtrations will play a fundamental role.
Definition 2.3.8 Let {W(t)}_{t≥0} be a Brownian motion and denote by σ_∞(W(t)) the
σ-algebra generated by the increments {W(s) − W(t); s ≥ t}, that is,
σ_∞(W(t)) = σ(O_t^∞),  O_t^∞ = ∪_{s≥t} σ(W(s) − W(t)).
A filtration {F(t)}_{t≥0} is said to be a non-anticipating filtration for the Brownian motion
{W(t)}_{t≥0} if {W(t)}_{t≥0} is adapted to {F(t)}_{t≥0} and if the σ-algebras σ_∞(W(t)), F(t)
are independent for all t ≥ 0.

The meaning is the following: the increments of the Brownian motion after time t are
independent of the information available at time t in the σ-algebra F(t). It is clear from
the previous definition that {F_W(t)}_{t≥0} is a non-anticipating filtration for {W(t)}_{t≥0}.
We shall see later that many properties of Brownian motions that depend on {F_W(t)}_{t≥0}
also hold with respect to any non-anticipating filtration (e.g., the martingale property).
Another important example of stochastic process applied in financial mathematics is the
following.
Definition 2.3.9 A Poisson process with rate λ is a stochastic process {N(t)}_{t≥0} such
that
(i) N(0) = 0 a.s.;
(ii) The increments over disjoint time intervals are independent;
(iii) For all s < t, the increment N(t) − N(s) belongs to P(λ(t − s)).

Note in particular that Nt is a discrete random variable, for all t  0, and that, in
contrast to the Brownian motion, the paths of a Poisson process are not continuous.
The Poisson process is the building block to construct more general stochastic
processes with jumps, which are very popular nowadays as models for the price of
certain financial assets.

2.4 Stochastic processes in financial mathematics


All variables in financial mathematics are represented by stochastic processes.
The most obvious example is the price (or value) of financial assets. The stochastic
process representing the price per share of a generic asset at different times will be
denoted by {Π(t)}_{t≥0}. Depending on the type of asset considered, we use a different
specific notation for the stochastic process modeling its price.
Remark 2.5 We always assume that t = 0 is earlier than or equal to the present time. In
particular, the value of all financial variables is known at time t = 0. Hence, if {X(t)}_{t≥0}
is a stochastic process modelling a financial variable, then X(0) is a deterministic
constant.

Before presenting various examples of stochastic processes in financial mathematics,


let us introduce an important piece of terminology. An investor is said to have a short
position on an asset if the investor profits from a decrease of its price, and a long
position if the investor profits from an increase of the price of the asset. The specific
trading strategy that leads to a short or long position on an asset depends on the type of
asset considered, as we are now ready to describe in more detail.
Stock price
The price per share at time t of a stock will be denoted by S(t). Typically S(t) > 0, for
all t ≥ 0; however, as discussed in Section 2.2.1, some models allow for the
possibility that S(t) = 0 with positive probability at finite times t > 0 (risk of default).
Clearly {S(t)}_{t≥0} is a stochastic process. If we have several stocks, we shall denote
their prices by {S_1(t)}_{t≥0}, {S_2(t)}_{t≥0}, etc. Investors who own shares of a stock are
those having a long position on the stock, while investors short-selling the stock
hold a short position. We recall that short-selling a stock is the practice of selling the
stock without actually owning it. Concretely, an investor is short-selling N shares of
a stock if the investor borrows the shares from a third party and then sells them
immediately on the market. The reason for short-selling assets is the expectation
that the price of the asset will decrease. If this is the case, then upon re-purchasing
the N shares in the future, and returning them to the lender, the short-seller
will profit from the lower re-purchase price of the asset compared to the price at the
time of short-selling.

The most popular model for the price of a stock is the geometric Brownian motion
stochastic process, which is given by
S(t) = S(0) exp(αt + σW(t)).    (2.13)
Here {W(t)}_{t≥0} is a Brownian motion, α ∈ ℝ is the instantaneous mean of log-return,
σ > 0 is the instantaneous volatility, while σ² is the instantaneous variance of the
stock. Note that α and σ are constant in this model. Moreover, S(0) is the price at time
t = 0 of the stock, which, according to Remark 2.5, is a deterministic constant.

Risk-free assets
A money market is a market in which the object of trading is money. More precisely,
a money market is a type of financial market where investors can borrow and lend
money at a given interest rate and for a period of time T ≤ 1 year. Assets in the
money market (i.e., short-term loans) are assumed to be risk-free, which means that
their value is always increasing in time. Examples of risk-free assets in the money
market are repurchase agreements (repos), certificates of deposit, treasury bills, etc.

The stochastic process corresponding to the price per share of a generic risk-free
asset will be denoted by {B(t)}_{t∈[0,T]}. The instantaneous interest rate of a
risk-free asset is a stochastic process {R(t)}_{t∈[0,T]} such that R(t) > 0, for all t ∈ [0, T],
and such that the value of the asset at time t is given by
B(t) = B(0) exp(∫_0^t R(s) ds),  t ∈ [0, T].    (2.14)
This corresponds to the investor's debit/credit with the money market at time t if the
amount B(0) is borrowed/lent by the investor at time t = 0. An investor lending
(resp. borrowing) money has a long (resp. short) position on the risk-free asset
(more precisely, on its interest rate). We remark that the integral in the right-hand side
of (2.14) is to be evaluated path by path, i.e.,
B(t, ω) = B(0) exp(∫_0^t R(s, ω) ds),

for all fixed ω ∈ Ω. Although in the real world different risk-free assets have different
interest rates, throughout these notes we make the simplifying assumption that all
assets in the money market have the same instantaneous interest rate {R(t)}_{t∈[0,T]},
which we call the interest rate of the money market. For the applications in options
pricing theory it is common to assume that the interest rate of the money market is
a deterministic constant: R(t) = r, for all t ∈ [0, T]. This assumption can be justified
by the relatively short time to maturity of options, see below.
Remark 2.6 The (average) interest rate of the money market is sometimes referred to
as “the cost of money”, and the ratio Bt/B0 is said to express the “time-value of
money”. This terminology is meant to emphasise that one reason for the
“time-devaluation” of money—in the sense that the purchasing power of money
decreases with time—is precisely the fact that money can earn interest through the
purchase of risk-free assets.

The discounting process


The stochastic process {D(t)}_{t≥0} given by
D(t) = exp(−∫_0^t R(s) ds) = B(0)/B(t)
is called the discounting process. In general, if an asset price is multiplied by D(t),
the new stochastic process is called the discounted price of the asset. We denote
the discounted price by adding a subscript * to the asset price. For instance, the
discounted price of a stock with price S(t) at time t is given by
S*(t) = D(t)S(t).
Its meaning is the following: S*(t) is the amount that should be invested in the money
market at time t = 0 in order that the value of this investment at time t replicates the
value of the stock at time t. Notice that S*(t) ≤ S(t). The discounted price of the stock
measures, roughly speaking, the loss in the stock value due to the “time-devaluation”
of money discussed above, see Remark 2.6.
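With a constant money-market rate R(t) = r, the formulas above reduce to B(t) = B(0)e^{rt} and D(t) = e^{−rt}. A minimal sketch (parameter values are illustrative only):

```python
import math

def discount_factor(r: float, t: float) -> float:
    """D(t) = exp(-r t) = B(0)/B(t) for a constant instantaneous rate r."""
    return math.exp(-r * t)

r, t = 0.03, 2.0
B0 = 100.0
Bt = B0 * math.exp(r * t)          # money-market account value, cf. (2.14)
Dt = discount_factor(r, t)

# D(t) = B(0)/B(t), so discounting exactly undoes the money-market growth.
assert abs(Dt - B0 / Bt) < 1e-12
assert abs(Dt * Bt - B0) < 1e-9
```

Multiplying any asset price by `Dt` produces its discounted price, e.g. `S_star = Dt * S_t` for a stock price `S_t`.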

Financial derivative
A financial derivative (or derivative security) is a contract whose value depends on
the performance of one (or more) other asset(s), which is called the underlying asset.
There exist various types of financial derivatives, the most common being options,
futures, forwards and swaps. Financial derivatives can be traded over the counter
(OTC), or in a regularised market. In the former case, the contract is stipulated
between two individual investors, who agree upon the conditions and the price of
the contract. In particular, the same derivative (on the same asset, with the same
parameters) can have two different prices over the counter. Derivatives traded in
the market, on the contrary, are standardized contracts. Anyone, after a proper
authorisation, can make offers to buy or sell derivatives in the market, in a way
much similar to how stocks are traded. Let us now see some examples of financial
derivatives.

A call option is a contract between two parties, the buyer (or owner) of the call
and the seller (or writer) of the call. The contract gives to the buyer the right, but
not the obligation, to buy the underlying asset at some future time for a price agreed
upon today, which is called the strike price of the call. If the buyer can exercise this
option only at a given time t = T > 0 (where t = 0 corresponds to the time at
which the contract is stipulated), then the call option is called European, while if the
option can be exercised at any time in the interval (0, T], then the option is called
American. The time T > 0 is called the maturity time, or expiration date, of the call.
The seller of the call is obliged to sell the asset to the buyer if the latter decides to
exercise the option. If the option to buy in the definition of a call is replaced by the
option to sell, then the option is called a put option.

In exchange for the option, the buyer must pay a premium to the seller. Suppose
that the option is a European option with strike price K, maturity time T and premium
Π_0, on a stock with price S(t) at time t. In which case is it then convenient for the buyer
to exercise the call? Let us define the payoff of a European call as
Y = (S(T) − K)^+ := max(0, S(T) − K),
i.e., Y > 0 if the stock price at the expiration date is higher than the strike price of the
call and Y = 0 otherwise; similarly, for a European put we set
Y = (K − S(T))^+.
Note that Y is a random variable, because it depends on the random variable S(T).
Clearly, if Y > 0 it is more convenient for the buyer to exercise the option rather than
buying/selling the asset on the market. Note however that the real profit for the buyer
is given by N(Y − Π_0), where N is the number of option contracts owned by the buyer.
Typically, options are sold in lots of 100 shares, that is to say, the minimum number
of options that one can buy is 100, which cover 100 shares of the underlying asset.
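The payoff formulas above translate directly into code (a sketch; the helper names are ours, not from the notes):

```python
def call_payoff(S_T: float, K: float) -> float:
    """European call payoff Y = (S(T) - K)^+ = max(0, S(T) - K)."""
    return max(0.0, S_T - K)

def put_payoff(S_T: float, K: float) -> float:
    """European put payoff Y = (K - S(T))^+ = max(0, K - S(T))."""
    return max(0.0, K - S_T)

# A call with strike 100 pays only when the stock finishes above the strike.
assert call_payoff(110.0, 100.0) == 10.0
assert call_payoff(90.0, 100.0) == 0.0
assert put_payoff(90.0, 100.0) == 10.0
```

Note that the two payoffs are mirror images of each other around the strike, which is why calls protect short positions and puts protect long positions, as described below.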

One reason why investors buy calls in the market is to protect a short position on
the underlying asset. In fact, suppose that an investor short-sells 100 shares of a
stock at time t = 0 with the agreement to return them to the original owner at time
t_0 > 0. The investor believes that the price of the stock will go down in the future,
but of course the price may go up instead. To avoid possible large losses, at time
t = 0 the investor buys 100 shares of an American call option on the stock expiring
at T ≥ t_0, and with strike price K = S(0). If, contrary to the investor's expectation,
the price of the stock at time t_0 is not lower than S(0), then the investor will exercise
the call, i.e., will buy 100 shares of the stock at the price K = S(0). In this way the
investor can return the shares to the lender with minimal losses. In the same
fashion, investors buy put options to protect a long position on the underlying asset.
The reason why investors write options is mostly to get liquidity (cash) to invest in
other assets.

Let us introduce some further terminology. A European call (resp. put) is said to be
in the money at time t if S(t) > K (resp. S(t) < K). The call (resp. put) is said to be
out of the money if S(t) < K (resp. S(t) > K). If S(t) = K, the (call or put) option is said
to be at the money at time t. The meaning of this terminology is self-explanatory.

The premium that the buyer has to pay to the seller for the option is the price (or value)
of the option. It depends on time (in particular, on the time left to expiration). Clearly,
the deeper in the money the option is, the higher its price will be. Therefore the holder of
the long position on the option is the buyer, while the seller holds the short position on
the option.
European call and put options are examples of more general contracts called European
derivatives. Given a function g : [0, ∞) → ℝ, a standard European derivative with
pay-off Y = g(S(T)) and maturity time T > 0 is a contract that pays to its owner the
amount Y at time T > 0. Here S(T) is the price of the underlying asset (which we take
to be a stock) at time T. The function g is called the pay-off function of the derivative.
The term “European” refers to the fact that the contract cannot be exercised before
time T, while the term “standard” refers to the fact that the pay-off depends only on
the price of the underlying at time T. The pay-off of a non-standard European derivative
depends on the path of the asset price during the interval [0, T]. For example, the
pay-off of an Asian call is given by Y = (∫_0^T S(t) dt − K)^+.

The price at time t of a European derivative (standard or not) with pay-off Y and
expiration date T will be denoted by Π_Y(t). Hence {Π_Y(t)}_{t∈[0,T]} is a stochastic
process. In addition, we now show that Π_Y(T) = Y must hold, i.e., there exist no offers
to buy (sell) a derivative for less (more) than Y at the time of maturity. In fact,
suppose that a derivative is sold for Π_Y(T) < Y “just before” it expires at time T.
In this way the buyer would make the sure profit Y − Π_Y(T) at time T, which
means that the seller would lose the same amount. On the contrary, upon
buying a derivative “just before” maturity for more than Y, the buyer would
lose Π_Y(T) − Y. Thus in a rational market, Π_Y(T) = Y (or, more precisely,
Π_Y(t) → Y, as t → T).

A standard American derivative with pay-off function g is a contract which
can be exercised at any time t ∈ [0, T] prior to or at its maturity and that,
upon exercise, pays the amount g(S(t)) to the holder of the derivative.

Portfolio
The portfolio of an investor is the set of all assets in which the investor is
trading. Mathematically it is described by a collection of N stochastic processes

    {h_1(t)}_{t≥0}, {h_2(t)}_{t≥0}, …, {h_N(t)}_{t≥0},

where h_k(t) represents the number of shares of the asset k at time t in the
investor's portfolio. If h_k(t) is positive, resp. negative, the investor has a long,
resp. short, position on the asset k at time t. If Π_k(t) denotes the value of the
asset k at time t, then {Π_k(t)}_{t≥0} is a stochastic process; the portfolio value is
the stochastic process {V(t)}_{t≥0} given by

    V(t) = Σ_{k=1}^N h_k(t) Π_k(t).

Remark 2.7 For modeling purposes, it is convenient to assume that an investor
can trade any fraction of a share of an asset, i.e., h_k(t) : Ω → ℝ, rather than
h_k(t) : Ω → ℤ.

The investor makes a profit in the time interval [t_0, t_1] if V(t_1) > V(t_0); the
investor incurs a loss in the interval [t_0, t_1] if V(t_1) < V(t_0). We now introduce
the important definition of arbitrage portfolio.
Definition 2.4.1 An arbitrage portfolio is a portfolio whose value {V(t)}_{t≥0}
satisfies the following properties, for some T > 0:
(i) V(0) = 0 almost surely;
(ii) V(T) ≥ 0 almost surely;
(iii) P(V(T) > 0) > 0.

Hence an arbitrage portfolio is a risk-free investment in the interval [0, T] which
requires no initial wealth and has a positive probability of giving a profit. We remark
that the arbitrage property depends on the probability measure P. However, it is
clear that if two measures P and P̃ are equivalent, then the arbitrage property
is satisfied with respect to P if and only if it is satisfied with respect to P̃. The
guiding principle in devising theoretical models for asset prices in financial
mathematics is to ensure that one cannot set up an arbitrage portfolio by investing
in these assets (arbitrage-free principle).

Markets
A market in which the objects of trading are N risky assets (e.g., stocks) and M
risk-free assets in the money market is said to be “N + M dimensional”. Most of
these notes focus on the case of 1 + 1 dimensional markets in which we assume
that the risky asset is a stock. A portfolio invested in this market is a pair
{(h_S(t), h_B(t))}_{t≥0} of stochastic processes, where h_S(t) is the number of shares of
the stock and h_B(t) the number of shares of the risk-free asset in the portfolio at time
t. The value of such a portfolio is given by

    V(t) = h_S(t)S(t) + h_B(t)B(t),

where S(t) is the price of the stock (given for instance by (2.13)), while B(t) is the value
at time t of the risk-free asset, which is given by (2.14).
Exercises
1. Prove that σ(X) is a σ-algebra.
2. Show that when X, Y are independent random variables, then σ(X) ∩ σ(Y)
consists of trivial events only. Show that two deterministic constants are always
independent. Finally, assume Y = g(X) and show that in this case the two random
variables are independent if and only if Y is a deterministic constant.
3. Prove Theorem 2.1.5 for the case N = 2.
4. Let a random variable X have the form

    X = Σ_{k=1}^M b_k I_{B_k},

for some non-zero b_1, …, b_M ∈ ℝ and B_1, …, B_M ∈ F. Show that there exist
distinct a_1, …, a_N ∈ ℝ and disjoint sets A_1, …, A_N ∈ F such that

    X = Σ_{k=1}^N a_k I_{A_k}.

5. Show that

    P(a < X ≤ b) = F_X(b) − F_X(a).

Show also that F_X is (1) right-continuous, (2) increasing and (3) lim_{x→∞} F_X(x) = 1.
6. Let X ∼ N(0, 1) and Y = X². Show that Y ∼ χ²(1).
7. Let X ∼ N(0, 1) and Y = 1 be independent. Compute P(X ≤ Y).
8. Derive the density of the geometric Brownian motion (2.14) and use the result to
show that P(S(t) ≤ 0) = 0, i.e., a stock whose price is described by a geometric
Brownian motion cannot default.

3. Expectation
Throughout this chapter we assume that (Ω, F, {F_t}_{t≥0}, P) is a given filtered
probability space.
3.1 Expectation and variance of random variables
Suppose that we want to estimate the value of a random variable X before the
experiment has been performed. What is a reasonable definition for our “estimate”
of X? Let us first assume that X is a simple random variable of the form

    X = Σ_{k=1}^N a_k I_{A_k},

for some finite partition {A_k}_{k=1,…,N} of Ω and real distinct numbers a_1, …, a_N. In this
case, it is natural to define the expected value (or expectation) of X as

    E[X] = Σ_{k=1}^N a_k P(A_k) = Σ_{k=1}^N a_k P(X = a_k).

That is to say, E[X] is a weighted average of all the possible values attainable by
X, in which each value is weighted by its probability of occurrence. This definition
applies also for N = ∞ (i.e., for discrete random variables), provided of course the
infinite series converges. For instance, if X ∼ P(µ) we have

    E[X] = Σ_{k=0}^∞ k P(X = k) = Σ_{k=0}^∞ k (µ^k/k!) e^{−µ}
         = e^{−µ} Σ_{k=1}^∞ µ^k/(k − 1)! = e^{−µ} Σ_{r=0}^∞ µ^{r+1}/r!
         = e^{−µ} µ Σ_{r=0}^∞ µ^r/r! = e^{−µ} µ e^µ = µ.

Now let X be a non-negative random variable and consider the sequence {s_n^X}_n
of simple functions defined in Theorem 2.1.7. Recall that s_n^X converges pointwise to
X as n → ∞, i.e., s_n^X(ω) → X(ω), for all ω ∈ Ω. Since

    E[s_n^X] = Σ_{k=1}^{n2^n − 1} (k/2^n) P(k/2^n ≤ X < (k + 1)/2^n) + n P(X ≥ n),   (3.1)

it is natural to introduce the following definition.


Definition 3.1.1 Let X : Ω → [0, ∞) be a non-negative random variable. We define
the expectation of X as

    E[X] = lim_{n→∞} ( Σ_{k=1}^{n2^n − 1} (k/2^n) P(k/2^n ≤ X < (k + 1)/2^n) + n P(X ≥ n) ),   (3.2)

i.e., E[X] = lim_{n→∞} E[s_n^X], where s_1^X, s_2^X, … is the sequence of simple functions
converging pointwise to X.

We remark that the limit in (3.2) exists, because (3.1) is an increasing sequence,
although this limit could be infinite. When the limit is finite we say that X has
finite expectation. This happens for instance when X is bounded, i.e., 0 ≤ X ≤ C a.s., for
some positive constant C.
Remark 3.1 (Monotone convergence theorem). It can be shown that the limit (3.2)
is the same along any non-decreasing sequence of non-negative simple random
variables that converges pointwise to X, hence we can use any such sequence to
define the expectation of a non-negative random variable. This follows by the
monotone convergence theorem, whose precise statement is the following: if
X_1, X_2, … is a non-decreasing sequence of non-negative random variables such
that X_n → X pointwise a.s., then E[X_n] → E[X].
Remark 3.2 (Dominated convergence theorem). In fact, the sequence of simple random
variables used to define the expectation of a non-negative random variable need not
even be non-decreasing. This follows by the dominated convergence theorem, whose
precise statement is the following: if X_1, X_2, … is a sequence of non-negative random
variables such that X_n → X, as n → ∞, pointwise a.s., and sup_n X_n ≤ Y for some
non-negative random variable Y with finite expectation, then lim_{n→∞} E[X_n] = E[X].
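The dyadic approximation behind (3.1)–(3.2) sets s_n^X(ω) = k/2^n when k/2^n ≤ X(ω) < (k + 1)/2^n, capped at n when X(ω) ≥ n. Assuming this standard construction is the one of Theorem 2.1.7, it can be sketched pointwise as follows:

```python
import math

def s_n(x, n):
    """Dyadic approximation of a non-negative value x:
    floor(2^n * x) / 2^n on {x < n}, capped at n on {x >= n}."""
    if x >= n:
        return float(n)
    return math.floor(2**n * x) / 2**n

x = math.pi  # any fixed non-negative value of X(omega)
approximations = [s_n(x, n) for n in range(1, 8)]
# The sequence is non-decreasing and converges to x from below:
print(approximations)
```

Averaging s_n(X(ω), n) over the sample points reproduces (3.1), and letting n → ∞ gives the limit (3.2).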

Next we extend the definition of expectation to general random variables. For this
purpose we use the fact that every random variable X : Ω → ℝ can be written as

    X = X⁺ − X⁻,

where

    X⁺ = max(0, X),  X⁻ = −min(X, 0)

are respectively the positive and negative parts of X. Since X⁺, X⁻ are non-negative
random variables, their expectations are given as in Definition 3.1.1.
Definition 3.1.2 Let X : Ω → ℝ be a random variable and assume that at least one of
the random variables X⁺, X⁻ has finite expectation. Then we define the expectation of X as

    E[X] = E[X⁺] − E[X⁻].

If X⁺, X⁻ both have finite expectation, we say that X has finite expectation or that it is an
integrable random variable. The set of all integrable random variables on Ω will be
denoted by L¹(Ω), or by L¹(Ω, P) if we want to specify the probability measure.

Remark 3.3 (Notation). Of course the expectation of a random variable depends on the
probability measure. If another probability measure P̃ is defined on the σ-algebra of
events (not necessarily equivalent to P), we denote the expectation of X in the measure
P̃ by Ẽ[X].
Remark 3.4 (Expectation = Lebesgue integral). The expectation of a random variable X
with respect to the probability measure P is also called the Lebesgue integral of X over Ω
in the measure P and it is also denoted by

    E[X] = ∫_Ω X(ω) dP(ω).

We shall not use this notation.

The following theorem collects some useful properties of the expectation:

Theorem 3.1.3 Let X, Y : Ω → ℝ be integrable random variables. Then the following
holds:
(i) Linearity: For all α, β ∈ ℝ, E[αX + βY] = αE[X] + βE[Y];
(ii) If X ≤ Y a.s., then E[X] ≤ E[Y];
(iii) If X ≥ 0 a.s., then E[X] = 0 if and only if X = 0 a.s.;
(iv) If X, Y are independent, then E[XY] = E[X]E[Y].
Proof For all claims, the argument of the proof is divided in three steps:
STEP 1: Show that it suffices to prove the claim for non-negative random variables.
STEP 2: Prove the claim for simple random variables.
STEP 3: Take the limit along the sequences {s_n^X}_n, {s_n^Y}_n of simple functions
converging to X, Y.
(i) We prove additivity; the proof of homogeneity, E[αX] = αE[X], is similar.
Using X = X⁺ − X⁻ and Y = Y⁺ − Y⁻, we have

    E[X + Y] = E[(X⁺ + Y⁺) − (X⁻ + Y⁻)] = E[X⁺ + Y⁺] − E[X⁻ + Y⁻],

hence it suffices to prove the claim for non-negative random variables.
Next assume that X, Y are non-negative simple random variables and write

    X = Σ_{j=1}^N a_j I_{A_j},  Y = Σ_{k=1}^M b_k I_{B_k},

where {A_j}_j, {B_k}_k are finite partitions of Ω. On the event A_j ∩ B_k we have
X + Y = a_j + b_k, hence

    E[X + Y] = Σ_{j=1}^N Σ_{k=1}^M (a_j + b_k) P(A_j ∩ B_k)
             = Σ_{j=1}^N a_j P(A_j) + Σ_{k=1}^M b_k P(B_k) = E[X] + E[Y],

where we used that Σ_k P(A_j ∩ B_k) = P(A_j) and Σ_j P(A_j ∩ B_k) = P(B_k).
Hence, the claim holds for simple random variables. It follows that

    E[s_n^X + s_n^Y] = E[s_n^X] + E[s_n^Y].

As n → ∞, the sequence s_n^X + s_n^Y is non-decreasing and converges pointwise to
X + Y, so by the monotone convergence theorem E[s_n^X + s_n^Y] → E[X + Y], while
E[s_n^X] + E[s_n^Y] → E[X] + E[Y]. Hence

    E[X + Y] = E[X] + E[Y].

(ii) If X ≤ Y, then X⁺ ≤ Y⁺ and X⁻ ≥ Y⁻. Hence, assuming the claim holds for
non-negative random variables, we have

    E[X] = E[X⁺] − E[X⁻] ≤ E[Y⁺] − E[Y⁻] = E[Y],

so it suffices to prove the claim for non-negative random variables.
Next assume that 0 ≤ X ≤ Y are simple random variables and write

    X = Σ_{j=1}^N a_j I_{A_j},  Y = Σ_{k=1}^M b_k I_{B_k}.

Then a_j ≤ b_k whenever P(A_j ∩ B_k) > 0, hence

    E[X] = Σ_{j,k} a_j P(A_j ∩ B_k) ≤ Σ_{j,k} b_k P(A_j ∩ B_k) = E[Y].

Hence, the claim holds for simple random variables. Finally, for general
0 ≤ X ≤ Y, the construction of Theorem 2.1.7 gives s_n^X ≤ s_n^Y, so that

    E[s_n^X] ≤ E[s_n^Y], since X ≤ Y.

Letting n → ∞, E[s_n^X] → E[X] and E[s_n^Y] → E[Y], so that

    E[X] ≤ E[Y].

(iii) If X = 0 a.s., then E[X] = 0 follows directly from Definition 3.1.1, so we only
need to prove the converse implication; since X ≥ 0 a.s., no reduction to the
non-negative case is required here.
Assume first that X is a non-negative simple random variable and write
X = Σ_{j=1}^N a_j I_{A_j} with a_j ≥ 0. Then

    E[X] = Σ_{j=1}^N a_j P(A_j) = 0  ⟺  for each j, either a_j = 0 or P(A_j) = 0  ⟺  X = 0 a.s.

For a general non-negative X with E[X] = 0, we have 0 ≤ s_n^X ≤ X, hence by
the definition of E[X] as an increasing limit,

    E[s_n^X] = 0 for every n,

so each s_n^X = 0 a.s. by the simple case. Letting n → ∞, s_n^X → X, which implies
X = 0 a.s.

(iv) Let X⁺ = f(X), X⁻ = g(X), and similarly for Y, where f(s) = max(0, s),
g(s) = −min(0, s). Each of (X⁺, Y⁺), (X⁺, Y⁻), (X⁻, Y⁺) and (X⁻, Y⁻) is a pair of
independent (non-negative) random variables. Then, using X = X⁺ − X⁻,
Y = Y⁺ − Y⁻ and the linearity of the expectation, we find

    E[XY] = E[(X⁺ − X⁻)(Y⁺ − Y⁻)]
          = E[X⁺Y⁺] − E[X⁺Y⁻] − E[X⁻Y⁺] + E[X⁻Y⁻]
          = E[X⁺]E[Y⁺] − E[X⁺]E[Y⁻] − E[X⁻]E[Y⁺] + E[X⁻]E[Y⁻]
          = (E[X⁺] − E[X⁻])(E[Y⁺] − E[Y⁻]) = E[X]E[Y],

provided the claim holds for independent non-negative random variables. Hence it
suffices to prove the claim for non-negative random variables. Next
assume that X, Y are independent simple random variables and write

    X = Σ_{j=1}^N a_j I_{A_j},  Y = Σ_{k=1}^M b_k I_{B_k}.

We have

    XY = Σ_{j=1}^N Σ_{k=1}^M a_j b_k I_{A_j} I_{B_k} = Σ_{j=1}^N Σ_{k=1}^M a_j b_k I_{A_j ∩ B_k}.

Thus by linearity of the expectation, and since the events A_j, B_k are independent,
for all j, k, we have

    E[XY] = Σ_{j=1}^N Σ_{k=1}^M a_j b_k E[I_{A_j ∩ B_k}] = Σ_{j=1}^N Σ_{k=1}^M a_j b_k P(A_j ∩ B_k)
          = Σ_{j=1}^N Σ_{k=1}^M a_j b_k P(A_j)P(B_k)
          = ( Σ_{j=1}^N a_j P(A_j) )( Σ_{k=1}^M b_k P(B_k) ) = E[X]E[Y].

Hence the claim holds for simple random variables. It follows that

    E[s_n^X s_n^Y] = E[s_n^X]E[s_n^Y].

Letting n → ∞, the right-hand side converges to E[X]E[Y]. To complete the proof we
have to show that the left-hand side converges to E[XY]. This follows by applying
the monotone convergence theorem to the sequence Z_n = s_n^X s_n^Y. □

As |X| X   X  , a random variable X is integrable if and only if E|X|  . Hence


we have
X  L 1   EX    E|X|  .
The set of random variables X :    such that |X| 2 is integrable, i.e., E|X| 2   ,
will be denoted by L 2  or L 2 , P.

Letting Y  1 in the Schwarz inequality
EXY  EX 2 EY 2  3. 3
with X, Y  L 2 , we find
L 1   L 2 .
The covariance CovX, Y of two random variables X, Y  L 2  is defined as
CovX, Y  EXY  EXEY.
Two random variables are said to be uncorrelated if CovX, Y  0. By
Theorem 3.1.3 (iv), if X, Y are independent then they are uncorrelated, but the
opposite is not true in general. Consider for example the simple random variables
1 with probability 1/3
X 0 with probability 1/3
1 with probability 1/3
and
0 with probability 1/3
Y  X2 
1 with probability 2/3
Then X and Y are clearly not independent, but
CovX, Y  EXY  EXEY  EX 3   0  0,
since EX 3   EX  0.
Definition 3.1.4 The variance of a random variable X ∈ L²(Ω) is given by

    Var(X) = E[(X − E[X])²].

Using the linearity of the expectation we can rewrite the definition of variance as

    Var(X) = E[X²] − 2E[X]E[X] + E[X]² = E[X²] − E[X]² = Cov(X, X).

Note that a random variable has zero variance if and only if X = E[X] a.s., hence
we may view Var(X) as a measure of the “randomness” of X. As a way of example,
let us compute the variance of X ∼ P(µ). We have

    E[X²] = Σ_{k=0}^∞ k² P(X = k) = Σ_{k=0}^∞ k² (µ^k/k!) e^{−µ} = e^{−µ} Σ_{k=1}^∞ k µ^k/(k − 1)!
          = Σ_{r=0}^∞ (r + 1) µ^{r+1} e^{−µ}/r! = µ Σ_{r=0}^∞ (r + 1) P(X = r)
          = µ(E[X] + 1) = µ + µ².

Hence

    Var(X) = E[X²] − E[X]² = µ + µ² − µ² = µ.

VarX  Y VarX VarY holds if and only if X, Y are uncorrelated. Moreover,
if we define the correlation of X, Y as
CovX, Y
CorX, Y  ,
VarXVarY
then CorX, Y  1, 1 and |CorX, Y| 1 if and only if Y is a linear function of X.
The interpretation is the following: the closer is CorX, Y to 1 (resp. 1), the more
the variable X and Y have tendency to move in the same (resp. opposite) direction
(for instance, (CorX, 2X  1, CorX, 2X  1. An important problem in quantitative
finance is to find correlations between the price of different assets.
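The claims Cor(X, 2X + 1) = 1 and Cor(X, −2X + 1) = −1 can be verified on any sample of X, with expectations replaced by sample averages; the sample values and helper names below are our own choices:

```python
import math

def correlation(xs, ys):
    """Sample version of Cor(X, Y) = Cov(X, Y) / sqrt(Var(X) Var(Y)),
    with every expectation replaced by a sample average."""
    n = len(xs)
    ex, ey = sum(xs) / n, sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - ex * ey
    var_x = sum(x * x for x in xs) / n - ex**2
    var_y = sum(y * y for y in ys) / n - ey**2
    return cov / math.sqrt(var_x * var_y)

xs = [0.3, -1.2, 2.5, 0.9, -0.4]
print(round(correlation(xs, [2 * x + 1 for x in xs]), 10))   # 1.0
print(round(correlation(xs, [-2 * x + 1 for x in xs]), 10))  # -1.0
```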
Remark 3.5 (L²-norm). The norm ‖Z‖₂ = √(E[Z²]) in L²(Ω) is called the L²-norm. It
can be shown that it is a complete norm, i.e., if {X_n}_n ⊂ L²(Ω) is a Cauchy
sequence of random variables in the norm ‖·‖₂, then there exists a random variable
X ∈ L²(Ω) such that ‖X_n − X‖₂ → 0 as n → ∞.

Next we want to present a first application in finance of the theory outlined above.
In particular we establish a sufficient condition which ensures that a portfolio is not
an arbitrage.
Theorem 3.1.5 Let a portfolio be given with value {V(t)}_{t≥0}. Let V*(t) = D(t)V(t) be
the discounted portfolio value. If there exists a measure P̃ equivalent to P such that
Ẽ[V*(t)] is constant (independent of t), then the portfolio is not an arbitrage.
Proof. Assume that the portfolio is an arbitrage. Then V(0) = 0 almost surely; as
V*(0) = V(0), the assumption of constant expectation in the probability measure
P̃ gives

    Ẽ[V*(t)] = 0, for all t ≥ 0.   (3.4)

Let T > 0 be such that P(V(T) ≥ 0) = 1 and P(V(T) > 0) > 0. Since P and P̃ are
equivalent, we also have P̃(V(T) ≥ 0) = 1 and P̃(V(T) > 0) > 0. Since the discounting
process is positive, we also have P̃(V*(T) ≥ 0) = 1 and P̃(V*(T) > 0) > 0. However,
this contradicts (3.4), due to Theorem 3.1.3 (iii). Hence our original hypothesis that
the portfolio is an arbitrage portfolio is false. □
Theorem 3.1.6 (Radon–Nikodym). Let P and P̃ be equivalent probability measures
defined on (Ω, F). Then there exists an almost surely positive random variable Z such
that E[Z] = 1 and

    P̃(A) = ∫_A Z dP for every A ∈ F.
Theorem 3.1.7 The following statements are equivalent:

(i) P and P̃ are equivalent probability measures;
(ii) There exists a unique (up to null sets) random variable Z : Ω → ℝ such that
Z > 0 almost surely, E[Z] = 1 and P̃(A) = E[Z I_A], for all A ∈ F.
Moreover, assuming either of these two equivalent conditions, for all random variables
X such that XZ ∈ L¹(Ω, P), we have X ∈ L¹(Ω, P̃) and

    Ẽ[X] = E[ZX].   (3.5)
Proof The implication (i) ⇒ (ii) follows by the Radon–Nikodym theorem.
(ii) ⇒ (i): we first observe that P̃(Ω) = E[Z I_Ω] = E[Z] = 1. Hence, to prove that P̃ is
a probability measure, it remains to show that it satisfies the countable additivity
property: for all families {A_k}_k of disjoint events, P̃(∪_k A_k) = Σ_k P̃(A_k). To prove this,
let

    B_n = ∪_{k=1}^n A_k.

Clearly, Z I_{B_n} is an increasing sequence of random variables. Hence, by the monotone
convergence theorem we have

    lim_{n→∞} E[Z I_{B_n}] = E[Z I_{B_∞}],  B_∞ = ∪_{k=1}^∞ A_k,

i.e.,

    lim_{n→∞} P̃(B_n) = P̃(B_∞).   (3.6)

On the other hand, by linearity of the expectation,

    P̃(B_n) = E[Z I_{B_n}] = E[Z I_{∪_{k=1}^n A_k}] = E[Z(I_{A_1} + ⋯ + I_{A_n})] = Σ_{k=1}^n E[Z I_{A_k}] = Σ_{k=1}^n P̃(A_k).

Hence (3.6) becomes

    Σ_{k=1}^∞ P̃(A_k) = P̃(∪_{k=1}^∞ A_k).

This proves that P̃ is a probability measure. To show that P and P̃ are equivalent, let
A be such that P̃(A) = 0. Since Z I_A ≥ 0 almost surely, P̃(A) = E[Z I_A] = 0 is
equivalent, by Theorem 3.1.3 (iii), to Z I_A = 0 almost surely. Since Z > 0 almost surely,
this is equivalent to I_A = 0 a.s., i.e., P(A) = 0. Thus P̃(A) = 0 if and only if P(A) = 0,
i.e., the probability measures P and P̃ are equivalent. It remains to prove the identity
(3.5).
If X is the simple random variable X = Σ_k a_k I_{A_k}, then the proof is straightforward:

    Ẽ[X] = Σ_k a_k P̃(A_k) = Σ_k a_k E[Z I_{A_k}] = E[Z Σ_k a_k I_{A_k}] = E[ZX].

For a general non-negative random variable X the result follows by applying (3.5) to an
increasing sequence of simple random variables converging to X and then passing to the
limit (using the monotone convergence theorem). The result for a general random variable
X : Ω → ℝ follows by applying (3.5) to the positive and negative parts of X and using the
linearity of the expectation. This completes the proof. □

Remark 3.6 (Radon–Nikodym derivative). Using the Lebesgue integral notation
(see Remark 3.4), we can write (3.5) as

    ∫_Ω X(ω) dP̃(ω) = ∫_Ω X(ω)Z(ω) dP(ω).

This leads to the formal identity dP̃(ω) = Z(ω)dP(ω), or Z(ω) = dP̃(ω)/dP(ω), which
explains why Z is also called the Radon–Nikodym derivative of P̃ with respect to P.
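On a finite sample space, (3.5) reduces to reweighting each outcome: P̃({ω}) = Z(ω)P({ω}). A sketch on a three-point Ω, with weights of our own choosing (note Z > 0 and E[Z] = 1, as required by Theorem 3.1.7):

```python
from fractions import Fraction as F

# Finite Omega = {0, 1, 2} with measure P and density Z, where E[Z] = 1.
P = [F(1, 2), F(1, 4), F(1, 4)]
Z = [F(1, 2), F(1), F(2)]      # Z > 0 and sum(Z[w] * P[w]) = 1
X = [F(4), F(8), F(-2)]        # any random variable on Omega

E_Z = sum(z * p for z, p in zip(Z, P))
assert E_Z == 1

P_tilde = [z * p for z, p in zip(Z, P)]          # P~({w}) = Z(w) * P({w})
E_tilde_X = sum(x * q for x, q in zip(X, P_tilde))
E_ZX = sum(z * x * p for z, x, p in zip(Z, X, P))
print(E_tilde_X == E_ZX)  # True: E~[X] = E[Z X], identity (3.5)
```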
