LectureNotes Clean
September 5, 2016
Contents
Preface
1 Preference
1.1 Preference relations
1.2 Preference over real vectors
2 Utility
2.1 Utility functions
2.2 From preference to utility: finite or countable sets
2.3 Preference, but no utility
2.4 In no-man’s-land: A necessary and sufficient condition for utility representation
2.5 Continuous utility
2.6 Some special functional forms
3 Choice
3.1 Existence of most preferred elements
3.2 Revealed preference
3.3 Exercises
6 General equilibrium
6.1 What is an equilibrium?
6.2 Pure exchange economies
6.3 Welfare analysis
6.4 Private ownership economies
6.5 Exercises
8 Risk attitudes
8.1 In for a gamble?
8.2 Certainty equivalent and risk premium
8.3 Arrow-Pratt measure of absolute risk aversion
8.4 A derivation of the Arrow-Pratt measure
10 Time preference
10.1 Stationarity and exponential discounting
10.2 Preference reversal and hyperbolic discounting
10.3 Limit-of-means and overtaking
10.4 Better may be worse
11 Probabilistic choice
11.1 The Luce model
11.2 The logit model
11.3 The linear probability model
11.4 Exercises
Notation
References
Suggested solutions
Preface
Overview
The purpose of these notes is to introduce you to some mathematical foundations of economic theory.
These are building blocks of economics that hopefully contribute to your understanding of formal
modeling in your other courses and in the research papers you will read and — eventually — write.
The typical model of the behavior of an economic agent requires careful answers to the following
questions:
(Q1) What can the agent choose from, i.e., what is the set of feasible alternatives?
(Q2) What does the agent like, i.e., what are the preferences over alternatives?
(Q3) How are the former two combined to make a choice, i.e., to select among alternatives?
Although we make some brief excursions into bounded rationality, the main building block of traditional
economics is “rational” choice: choose from your set of feasible alternatives a most preferred one. This
raises important related questions:
(Q4) When do most preferred elements exist?
(Q5) How are they affected when the agent’s environment changes?
The fourth question is extremely important: you’d be surprised about how many people simply skip
over the existence issue and write papers about how solutions to economic problems are affected
by parameter changes, without ever wondering whether there even is a solution. The fifth question
concerns things like how a consumer’s demand is affected by price changes, wage increases, etc.
Try to keep this in mind, because this is what will occupy us most of the time and constitutes the common
thread of the course: regardless of the setting, we first have to answer (Q1) to (Q3) to provide a meaningful
“microfounded” model of an economic agent’s behavior. Sections 1 to 3 provide a general framework
for modeling preferences over and choice from a feasible set of alternatives.
This general framework is then applied to a number of specific cases: traditional models of consumer
choice (Section 4), producer choice (Section 5), choice over outcomes that are no longer deterministic,
but occur with certain probabilities (Section 7), choice over outcomes occurring over time
(Section 10), and even the modeling of seemingly suboptimal choices (Section 11).
Special features
The material covered here is pretty standard for a first PhD course in microeconomic theory, but I want
to stress four things that might be slightly different from what you are used to if you used older texts.
FOCUS ON PREFERENCES: In line with recent trends in advanced microeconomics, the notes have a
relatively strong focus on preferences, rather than utility functions. Utility functions are practical in the
sense that they allow you to use standard calculus tools, but this tends to blur the picture by making
economics into an exercise in advanced differentiation. I try to avoid this. Although people make
statements like “I like coffee more than tea”, you hardly ever see them in a supermarket with a calculator
and their utility function written on a piece of paper.
This allows us to give a much more general answer — with a remarkably simple proof — to the
question when most preferred elements exist; see Proposition 3.1.
FROM PREFERENCES TO UTILITY:
Not all preferences can be represented by a utility function. Graduate texts typically give exactly
one example, lexicographic preferences, as if it concerns an exotic phenomenon. These notes
try to give some counterweight by providing several economically relevant examples, all arising
from the same general principle; see Section 2.3.
So the question remains, when does a utility function exist? Section 2.4 provides necessary and
sufficient conditions.
As an important special case, when does a continuous utility function exist? Proposition 2.6 provides
a detailed proof. Remarkably, not even Fishburn (1970a), the standard reference on utility
theory, contains such a proof, and neither do any of the standard textbooks in microeconomic
theory.
I don’t actually expect you to know the proof; I just wanted to fill a gap and make sure you have
access to it.
SOLUTIONS MANUAL: Like any textbook, these notes contain exercises. They also contain solutions
to the exercises, in the hope of facilitating self-study: if you have time to do some exercises, you can
immediately check your solutions. If you’re pressed for time, you can treat the worked exercises as a
collection of a few dozen (cleverly disguised) examples and applications.
Recommended reading
The lecture notes are the reading material for the course. You may omit the proofs of Propositions 2.5
and 2.6, as well as the more mathematical exercises in Section 10.3. For the interested reader, the
following table refers to related material in Mas-Colell, Whinston, and Green (1995, MWG), which is
by no means obligatory.
Terminology
In economics, there is little consensus on terminology. For instance, following Arrow (1959) and
Fishburn (1970b), I refer to a complete transitive binary relation that models an economic agent’s
preferences as a ‘weak order’. Other names include ‘rational preference relation’ (Mas-Colell et al., 1995),
a very loaded term, simply ‘preference relation’ (Rubinstein, 2006), ‘complete preordering’ (Debreu,
1959), ‘complete weak order’ (Fishburn, 1979), and ‘complete ordering’ (Debreu, 1954). The Micro I
course and its exam use the definitions from these lecture notes.
1 Preference
1.1 Preference relations
Rational choice essentially means choosing from a set of feasible options a most preferred alternative.
Let X be a set of alternatives. A preference relation ≿ is a binary relation on X, allowing the comparison
between pairs of alternatives. For each x, y ∈ X, read x ≿ y as “x is at least as good as y”.
Exercise 1.1 Are the following binary relations ≿ necessarily complete, transitive?
(a) X consists of the items in an English dictionary, ≿ is the alphabetical order in which they are listed.
(b) X is a group of people and for x, y ∈ X: x ≿ y if and only if x knows y.
From preference relation ≿, one can derive two other binary relations: strict preference ≻, with
x ≻ y if x ≿ y and not y ≿ x, and indifference ∼, with x ∼ y if both x ≿ y and y ≿ x.
This involves a slight but common abuse of notation: although this was not stated explicitly, the
notation above is taken to suggest, for instance, that also a ≻_1 c. Define a new preference relation ≻
via majority rule voting: a ≻ b, because a majority (namely the agents 1 and 2) strictly prefers a over
b. Similarly, b ≻ c and c ≻ a, in violation of transitivity. This example is sometimes referred to as the
Condorcet paradox.
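The cycle is easy to reproduce on a computer. The sketch below uses a hypothetical three-agent profile (an assumption of mine, chosen only so that the majorities match the ones described above) and checks that pairwise majority voting is not transitive.

```python
# Hypothetical profile (an assumption): each list ranks the alternatives
# from best to worst for one agent.
rankings = {
    1: ["a", "b", "c"],
    2: ["c", "a", "b"],
    3: ["b", "c", "a"],
}
alternatives = ["a", "b", "c"]

def majority_prefers(x, y):
    """True if a strict majority of agents ranks x above y."""
    votes = sum(1 for r in rankings.values() if r.index(x) < r.index(y))
    return votes > len(rankings) / 2

def is_transitive(relation, items):
    """Check that x R y and y R z imply x R z for all triples."""
    return all(
        relation(x, z)
        for x in items for y in items for z in items
        if relation(x, y) and relation(y, z)
    )

# All ordered pairs (x, y) where a majority strictly prefers x over y.
majority_pairs = [
    (x, y) for x in alternatives for y in alternatives
    if x != y and majority_prefers(x, y)
]
print(majority_pairs)                                  # the three pairs of the cycle
print(is_transitive(majority_prefers, alternatives))   # False
```

Each individual ranking is transitive; only the aggregate relation fails.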
NONPERCEIVABLE DIFFERENCES AND SIMILARITY: The human body cannot perceive differences in
stimuli unless they exceed a certain threshold. For instance, you will typically not sense the difference
between a cup of tea with n ∈ ℕ grains of sugar and n + 1 grains of sugar. Therefore, you will be
indifferent between them. If preferences are transitive, you will be indifferent between a cup of tea
with 1 grain of sugar, 2 grains of sugar, 3 grains of sugar, ..., one kilo of sugar. Are you? This example
is related to the more general issue of similarity: nearby alternatives may be perceived similar and
therefore equally good. But with a long chain of nearby alternatives, you can create a huge change
between alternatives, so that you may no longer be indifferent between them.
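A minimal sketch of the sugar example, assuming a hypothetical perception threshold of 5 grains (the number is mine, purely for illustration): neighbouring cups are always indifferent, yet the two ends of the chain are not.

```python
THRESHOLD = 5  # hypothetical: differences of fewer than 5 grains go unnoticed

def indifferent(n, m):
    """Two cups are indifferent if their sugar contents differ by less than the threshold."""
    return abs(n - m) < THRESHOLD

# Every step n -> n + 1 along the chain is an indifference...
chain_ok = all(indifferent(n, n + 1) for n in range(1, 1000))
# ...but the first and the last cup of the chain are clearly distinguishable.
ends_indifferent = indifferent(1, 1000)

print(chain_ok, ends_indifferent)  # True False
```

So this threshold-based indifference relation is reflexive and symmetric, but not transitive.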
Properties of ≿ imply some properties of the indifference relation ∼ and the strict preference relation
≻. The proofs involve only simple manipulations of the definitions of ∼ and ≻; check that you can do
this. I only prove part (d).
Proposition 1.1
Let ≿ be a weak order on X.
(a) The indifference relation ∼ is an equivalence relation, i.e., it satisfies:
– reflexivity: ∀x ∈ X: x ∼ x.
– symmetry: ∀x, y ∈ X: if x ∼ y, then y ∼ x.
– transitivity: ∀x, y, z ∈ X: if x ∼ y and y ∼ z, then x ∼ z.
(b) The strict preference relation ≻ satisfies:
– irreflexivity: ∀x ∈ X: not x ≻ x.
– asymmetry: ∀x, y ∈ X: if x ≻ y, then not y ≻ x.
– transitivity: ∀x, y, z ∈ X: if x ≻ y and y ≻ z, then x ≻ z.
X equals ℝ^L_+ or ℝ^L. These properties are typically illustrated using indifference curves. The indifference
curve containing x ∈ X is the set {y ∈ X : x ∼ y} of points equivalent with x.
Recall that the (Euclidean) distance between vectors x, y ∈ ℝ^L is defined as
\|x - y\| = \sqrt{\sum_{\ell=1}^{L} (x_\ell - y_\ell)^2}.
The preference relation ≿ over X satisfies local nonsatiation if, for every alternative x, there is an
alternative arbitrarily close to x that is better: for each x ∈ X and each ε > 0 there is a y ∈ X with
‖x − y‖ < ε and y ≻ x.
Monotonicity properties come in different varieties, all reflecting the intuition that “more is better”.
Let k ∈ {1, ..., L} and let e_k ∈ ℝ^L denote the k-th standard basis vector with k-th coordinate equal to one
and all other coordinates equal to zero. For x, y ∈ ℝ^L, write
The relations between the three monotonicity properties and local nonsatiation are:
[Diagram: implication arrows between the monotonicity properties and “locally nonsatiated”.]
An arrow from strong monotonicity to monotonicity means that the former implies the latter; the
absence of an arrow in the opposite direction means that the converse is not true.
Local nonsatiation has no implications for monotonicity: the
preference relation ≿ on ℝ^2_+ with
(x_1, x_2) ≿ (y_1, y_2) ⇔ (x_1 − x_2)^2 + x_1 ≥ (y_1 − y_2)^2 + y_1
Proposition 1.2
Let ≿ be a weak order on X. The following properties are equivalent:
(a) ≿ is continuous, i.e., for every y ∈ X, the sets {x ∈ X : x ≿ y} and {x ∈ X : x ≾ y} are closed.
(b) For every y ∈ X, the sets {x ∈ X : x ≻ y} and {x ∈ X : x ≺ y} are open.
(c) The graph {(x, y) ∈ X × X : x ≿ y} of ≿ is closed in X × X.
(d) For all sequences (x_n)_{n∈ℕ} and (y_n)_{n∈ℕ} in X, if x_n → x, y_n → y, and x_n ≿ y_n for all n ∈ ℕ,
then also x ≿ y.
(e) For all x, y ∈ X, if x ≻ y, then there is a neighborhood U_x of x (i.e., an open set U_x containing
x) and a neighborhood U_y of y such that x′ ≻ y′ for all x′ ∈ U_x, y′ ∈ U_y.
Proof: Statements (a) and (b) are equivalent, since the complement of an open set is closed, and vice
versa. Also the equivalence of (c) and (d) is a matter of definition: (x_n, y_n) ∈ X × X is an element of the
graph of ≿ if and only if x_n ≿ y_n. Proving three implications suffices to close the circle and make sure
that all five statements are equivalent:
[(b) implies (e):] Assume (b) holds. Let x, y ∈ X with x ≻ y. Distinguish two cases:
CASE 1: There is an m ∈ X with x ≻ m ≻ y.
Define U_x = {z ∈ X : z ≻ m} and U_y = {z ∈ X : m ≻ z}. These sets are open by (b). Moreover, x ∈ U_x
and y ∈ U_y by assumption. Let x′ ∈ U_x, y′ ∈ U_y. Then x′ ≻ m and m ≻ y′. By Proposition 1.1, ≻ is
transitive, so x′ ≻ y′, as we had to show.
CASE 2: There is no m ∈ X with x ≻ m ≻ y.
Define U_x = {z ∈ X : z ≻ y} and U_y = {z ∈ X : x ≻ z}. These sets are open by (b). Moreover, x ∈ U_x
and y ∈ U_y by assumption. Let x′ ∈ U_x, y′ ∈ U_y. Then x′ ≻ y. It cannot be that x ≻ x′, otherwise we
would have x ≻ x′ ≻ y. By completeness, x′ ≿ x. Similarly, y ≿ y′. So x′ ≿ x ≻ y ≿ y′. By Proposition 1.1,
x′ ≻ y′, as we had to show.
Conclude from cases 1 and 2 that (e) holds.
[(e) implies (c):] Assume (e) holds. To establish (c), we need to show that the complement of the graph
{(x, y) ∈ X × X : x ≿ y} is open. By completeness of ≿, this complement is the set
S = {(x, y) ∈ X × X : x ≺ y}.
For each (x, y) ∈ S, fix, using (e), neighborhoods U_x of x and U_y of y such that x′ ≺ y′ for all x′ ∈ U_x, y′ ∈
U_y. Conclude that
∀(x, y) ∈ S: (x, y) ∈ U_x × U_y ⊆ S.
Taking the union over all (x, y) ∈ S, one obtains
S = ∪_{(x,y)∈S} U_x × U_y.
there is an ε > 0 such that for all x ∈ X with ‖x − y‖ < ε: x ∈ Y. (1)
Many people overlook a slight subtlety, namely the statement “. . . for all x ∈ X . . . ” in (1). This looks
innocuous: if you want to define whether a subset of X is open, then obviously you’re not interested in
stuff that is outside of X . But it does matter in identifying open subsets! Notice, for instance, that as
subsets of X = ℝ^2_+, but not as subsets of X = ℝ^2, sets like
are open. You might want to draw their pictures. In topological language, X = ℝ^L_+ is endowed with the
relative topology that it inherits from the larger set ℝ^L: a set Y ⊆ X is open if and only if Y = X ∩ O,
where O is an open set in the larger space ℝ^L. This provides quick proofs that the sets Y_1, Y_2, Y_3 are
open subsets of X = ℝ^2_+:
are open in ℝ^2.
The next two properties are related to other changes, namely shifts in or rescaling of the coordinates.
The preference relation ≿ is:
quasilinear in coordinate k if, for all x, y ∈ X and all ε > 0, x ≿ y implies that x + εe_k ≿
y + εe_k: the preference relation is insensitive to parallel shifts in the sense that adding the
same positive amount of commodity k to both alternatives does not affect the preference
over them.
homothetic if rescaling the coordinates does not affect the preferences: for all x, y ∈ X and
all α > 0, if x ≿ y, then αx ≿ αy.
For instance, any preference relation where only the difference between the first coordinates matters,
like
(x_1, x_2) ≿ (y_1, y_2) ⇔ 3x_1 + exp x_2 ≥ 3y_1 + exp y_2,
is quasilinear in the first coordinate. Often, such a coordinate is referred to as “numeraire” or “money”
and the economic idea is that not the exact amounts of money associated with two alternatives matter,
but the difference between them. A simple example of homothetic preferences arises in most linear
production processes: let alternatives x and y denote vectors of ingredients and let x be weakly
preferable to y if the ingredients of x suffice to make at least as much of your favorite cake as y. Then
also αx yields at least as much cake as αy. More generally, any preference relation defined in terms of
a homogeneous function is homothetic. Recall that a function f : ℝ^L_+ → ℝ is homogeneous of degree
k ∈ ℝ if for each x ∈ ℝ^L_+ and each α > 0: f(αx) = α^k f(x). Suppose that x ≿ y if and only if f(x) ≥ f(y).
Then ≿ is homothetic: if x ≿ y and α > 0, then f(x) ≥ f(y), so f(αx) = α^k f(x) ≥ α^k f(y) = f(αy), i.e., αx ≿ αy.
A preference relation ≿ is convex if for each y ∈ X, the set {x ∈ X : x ≿ y} of weakly better alternatives is
convex.
Proposition 1.3
Let ≿ be a weak order on X. Then ≿ is convex if and only if for all x, y ∈ X with x ≿ y and all
α ∈ [0, 1], also αx + (1 − α)y ≿ y. Informally, if x is at least as good as y, just walking part of the
way from y to x is a weak improvement.
Exercise 1.7
(a) Prove this proposition.
(b) Give an example to show that the proposition is false if ≿ is not a weak order.
A somewhat stronger version: a preference relation ≿ is strictly convex if for all x, y ∈ X with x ≠ y and
x ≿ y and all α ∈ (0, 1), it holds that αx + (1 − α)y ≻ y.
This property implies that if you are indifferent between two distinct alternatives x, y ∈ X, you can
still improve upon them: by strict convexity, the alternative (1/2)x + (1/2)y is strictly better.
2 Utility
2.1 Utility functions
In many cases, preferences over alternatives can be evaluated by some numerical assessment: “I prefer
the alternative with the higher percentage of alcohol” or “I prefer the alternative yielding the higher
profit”. In that case, we say that these functions — in the latter case the function assigning to each
alternative its associated profit — represent the decision maker’s preferences. Formally, a function
u : X → ℝ is a utility function representing ≿ if for all x, y ∈ X:
x ≿ y ⇔ u(x) ≥ u(y). (2)
One often uses the following simple result to verify that u represents a complete preference relation ≿.
Proposition 2.1
Let ≿ be a complete preference relation on a set X and let u : X → ℝ be a function. The following
two claims are equivalent:
(a) u represents ≿;
(b) for all x, y ∈ X: if x ≻ y, then u(x) > u(y), and if x ∼ y, then u(x) = u(y).
Proof: (a) ⇒ (b): Assume (a) holds. Let x, y ∈ X. If x ≻ y, by definition of ≻: x ≿ y and not y ≿ x. Hence,
by definition of a utility function, u(x) ≥ u(y) and not u(y) ≥ u(x). Conclude that u(x) > u(y). Similarly,
if x ∼ y, u(x) = u(y).
(b) ⇒ (a): Assume (b) holds. Let x, y ∈ X. To show:
x ≿ y ⇔ u(x) ≥ u(y).
One direction is easy: if x ≿ y, then x ≻ y or x ∼ y, so by (b), either u(x) > u(y) or u(x) = u(y). Hence
u(x) ≥ u(y). Conversely, assume that u(x) ≥ u(y). By completeness, x ≿ y or y ≿ x. Suppose x ≿ y is
not true. Then y ≻ x, so by (b), u(y) > u(x), a contradiction.
Exercise 2.1 The completeness condition in Proposition 2.1 cannot be omitted. Indeed, consider the preference
relation ≿ on ℝ with
∀x, y ∈ ℝ: x ≿ y ⇔ x ≥ y + 1
and the function u : ℝ → ℝ with u(x) = x for all x ∈ ℝ. Show that:
If one function represents a preference relation, then many others do as well: if preferences are
represented by a profit function, then also “twice the profit” or “profit to the power three” represent the
same preference relation. In general, any order-preserving transformation will do:
Proposition 2.2
If u : X → ℝ represents ≿ and real-valued function f is strictly increasing on the range u(X) =
{u(x) : x ∈ X} of u, then also the function v : X → ℝ defined by v(x) = f(u(x)) represents ≿.
Proof: By (2) and the definition of strictly increasing, we find for all x, y ∈ X:
x ≿ y ⇔ u(x) ≥ u(y) ⇔ f(u(x)) ≥ f(u(y)) ⇔ v(x) ≥ v(y),
so v represents ≿.
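To see the role of “strictly increasing on the range of u”, here is a small numerical sketch. The profit numbers are hypothetical. Doubling and cubing are strictly increasing everywhere, so they preserve the ranking; squaring is not strictly increasing on a range that contains negative profits, and the ranking breaks.

```python
from itertools import product

# Hypothetical profits of four alternatives (made-up numbers).
profits = {"A": -2.0, "B": 0.5, "C": 3.0, "D": 0.5}
items = list(profits)

def u(x):
    return profits[x]

def same_ranking(u, v, items):
    """True if u(x) >= u(y) exactly when v(x) >= v(y), for all pairs."""
    return all(
        (u(x) >= u(y)) == (v(x) >= v(y))
        for x, y in product(items, repeat=2)
    )

def double(x):   # f(t) = 2t, strictly increasing
    return 2 * u(x)

def cube(x):     # f(t) = t^3, strictly increasing
    return u(x) ** 3

def square(x):   # f(t) = t^2, NOT increasing on negative numbers
    return u(x) ** 2

print(same_ranking(u, double, items))  # True
print(same_ranking(u, cube, items))    # True
print(same_ranking(u, square, items))  # False: A has profit -2 but square 4
```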
Since the ≥ ordering of the real numbers is complete and transitive, a preference relation that can be
represented by a utility function is necessarily complete and transitive: it must be a weak order. But is
being a weak order enough to guarantee the existence of a utility function? The answer is positive for
finite or countable sets.
Proposition 2.3
Assume:
X is finite,
≿ is a weak order on X.
Then there is a utility function representing ≿.
Proof: For each x ∈ X, define u(x) = |{z ∈ X : x ≿ z}|. Then u : X → ℝ represents ≿: let x, y ∈ X. If x ∼ y,
then for each z ∈ X with y ≿ z, Proposition 1.1(c) gives that x ≿ z. So {z ∈ X : y ≿ z} ⊆ {z ∈ X : x ≿ z}.
Similarly, the converse inclusion holds, so
{z ∈ X : x ≿ z} = {z ∈ X : y ≿ z}. (3)
Hence u(x) = u(y). If x ≻ y, Proposition 1.1(d) and the fact that x lies in the former set, but not in the
latter, imply:
{z ∈ X : x ≿ z} ⊃ {z ∈ X : y ≿ z}. (4)
Hence u(x) > u(y).
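The counting construction in this proof translates directly into code. In the sketch below, the set X and the rule “longer words are better” are hypothetical stand-ins for an arbitrary finite set with a weak order.

```python
def counting_utility(X, weakly_prefers):
    """u(x) = number of alternatives z in X with x weakly preferred to z."""
    return {x: sum(1 for z in X if weakly_prefers(x, z)) for x in X}

# Hypothetical example: a weak order in which longer words are better
# and words of equal length are indifferent.
X = ["tea", "coffee", "milk", "water", "juice"]

def weakly_prefers(x, y):
    return len(x) >= len(y)

u = counting_utility(X, weakly_prefers)
print(u)  # {'tea': 1, 'coffee': 5, 'milk': 2, 'water': 4, 'juice': 4}

# u represents the weak order: x is weakly preferred to y iff u(x) >= u(y).
represents = all(
    weakly_prefers(x, y) == (u[x] >= u[y]) for x in X for y in X
)
print(represents)  # True
```

Note that the indifferent alternatives “water” and “juice” receive the same utility, as in (3).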
If X is countable, simply counting the number of weakly worse alternatives does not work: there may
be infinitely many of them. But we can give each element a positive weight, make sure that the weights
have a well-defined sum even if we add infinitely many of them, and use the total weight of the elements
weakly worse than x as a measure of the utility of x. For instance, label X = {x_1, x_2, ...} and divide a bar
of chocolate by giving half (weight 2^{-1}) to x_1, then half of the remainder (weight 2^{-2}) to x_2, then half of
the remainder (weight 2^{-3}) to x_3, and so on.
Proposition 2.4
Assume:
X is countable;
≿ is a weak order on X.
Then there is a utility function representing ≿.
Proof: Since X is countable, there is an injective function n : X → ℕ. For each x ∈ X, define
u(x) = \sum_{z \in X : x \succsim z} 2^{-n(z)}.
The sequence (2^{-n})_{n∈ℕ} has a finite sum \sum_{n \in ℕ} 2^{-n} = 1, so u is well-defined. To see that u represents ≿,
let x, y ∈ X. If x ∼ y, (3) holds, so u(x) = u(y). If x ≻ y, (4) holds, so u(x) − u(y) ≥ 2^{-n(x)} > 0.
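A hedged sketch of the weighting idea: take X to be the integers, with the hypothetical preference “larger absolute value is better”, and a hypothetical injective labelling n. A computer can only handle a finite sample of X, but in this particular example every alternative has finitely many weakly worse alternatives, all inside the sample, so the computed utilities are exact.

```python
from fractions import Fraction

def n(z):
    """Injective map from the integers into {1, 2, 3, ...}: 0->1, 1->2, -1->3, 2->4, ..."""
    return 2 * z if z > 0 else 1 - 2 * z

def weakly_prefers(x, y):
    """Hypothetical weak order on the integers: larger absolute value is better."""
    return abs(x) >= abs(y)

def weighted_utility(sample):
    """u(x) = sum of the weights 2^(-n(z)) over all z in the sample weakly worse than x."""
    return {
        x: sum(Fraction(1, 2 ** n(z)) for z in sample if weakly_prefers(x, z))
        for x in sample
    }

sample = list(range(-3, 4))   # finite sample of the countable set
u = weighted_utility(sample)

print(u[0], u[1], u[-1])      # 1/2 7/8 7/8
represents = all(
    weakly_prefers(x, y) == (u[x] >= u[y]) for x in sample for y in sample
)
print(represents)             # True
```

Exact fractions are used instead of floating-point numbers so that the comparisons are not blurred by rounding.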
Secondly, if z < z′, then the good alternative associated with z is worse than the bad alternative
associated with z′:
∀z, z′ ∈ I: z < z′ ⇒ b(z′) ≻ g(z). (6)
Combining (5) and (6), representing such preferences by a utility function requires, for z < z′:
u(b(z)) < u(g(z)) < u(b(z′)) < u(g(z′)).
So for each z ∈ I, the interval [u(b(z)), u(g(z))] has positive length and if z, z′ ∈ I have z ≠ z′, the intervals
[u(b(z)), u(g(z))] and [u(b(z′)), u(g(z′))] are disjoint: one of them lies entirely to the left of the other
on the real axis. So uncountably many intervals [u(b(z)), u(g(z))] of positive length must somehow be
placed on the real line without any two of them intersecting. This is impossible: we simply run out
of space! Formally, since ℚ is dense in ℝ, each interval [u(b(z)), u(g(z))] contains a rational number
r(z) ∈ ℚ. Since the intervals associated with different values of z are disjoint: z ≠ z′ implies r(z) ≠ r(z′),
i.e., the function r : I → ℚ is injective. But I is uncountable and ℚ is countable, a contradiction. Some
examples:
LEXICOGRAPHIC PREFERENCES. (Debreu, 1954) Let X = ℝ^2. Define ≿ as follows:
(x_1, x_2) ≿ (y_1, y_2) ⇔ x_1 > y_1 or (x_1 = y_1 and x_2 ≥ y_2).
Alternatives are compared according to their first coordinates; if these happen to be equal, they are
compared according to their second coordinates. Think of the way words are ordered in a dictionary.
For each z ∈ ℝ, let b(z) = (z, 0) and g(z) = (z, 1). Then g(z) ≻ b(z) and, if z, z′ ∈ ℝ, z < z′, then g(z) =
(z, 1) ≺ (z′, 0) = b(z′). So (5) and (6) hold: this preference relation cannot be represented by a utility
function.
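The two conditions for the lexicographic order can be checked numerically on a handful of sample thresholds (the sample values are arbitrary choices of mine). Of course, this only illustrates the two inequalities; the impossibility itself comes from having uncountably many thresholds.

```python
def weakly_prefers(x, y):
    """Lexicographic weak order on pairs of real numbers."""
    return x[0] > y[0] or (x[0] == y[0] and x[1] >= y[1])

def strictly_prefers(x, y):
    return weakly_prefers(x, y) and not weakly_prefers(y, x)

def b(z):
    """The 'bad' alternative associated with z."""
    return (z, 0)

def g(z):
    """The 'good' alternative associated with z."""
    return (z, 1)

thresholds = [-1.0, 0.0, 0.25, 0.5, 2.0]   # arbitrary sample values

# g(z) is strictly better than b(z) for every z ...
good_beats_bad = all(strictly_prefers(g(z), b(z)) for z in thresholds)
# ... and b(z') is strictly better than g(z) whenever z < z'.
higher_bad_beats_lower_good = all(
    strictly_prefers(b(z2), g(z1))
    for z1 in thresholds for z2 in thresholds if z1 < z2
)
print(good_beats_bad, higher_bad_beats_lower_good)  # True True
```

Python compares tuples lexicographically, so on pairs of numbers `weakly_prefers(x, y)` agrees with the built-in comparison `x >= y`.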
PREFERENCES OVER INFORMATION. (Dubra and Echenique, 2001) It is common in economics to model
information by means of partitions of a state space. Let z ∈ ℝ be a certain threshold. Suppose you get
the following information about a number x ∈ ℝ: you are told the exact value of x if x < z, otherwise
you are told that x lies in the interval [z, ∞). That means you can perfectly distinguish between all real
numbers x with x < z, but cannot distinguish between the numbers in the interval [z, ∞). Therefore,
information is summarized by the partition
b(z) = {{x} : x < z} ∪ {[z, ∞)}
of ℝ. Similarly, define the information partition
g(z) = {{x} : x ≤ z} ∪ {(z, ∞)}
that arises if you are told the exact value of x also in the case where x = z: all numbers x ≤ z can be
perfectly distinguished, but larger ones not. Assume it is preferable to have more precise information,
i.e., finer information partitions (partition P is finer than partition Q if every set from P is contained in a
set from Q). Partition g(z) is finer than partition b(z), so g(z) ≻ b(z). Also if z < z′, partition b(z′) is finer
than partition g(z), so b(z′) ≻ g(z). So (5) and (6) hold: this preference relation cannot be represented
by a utility function.
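The definition of “finer” is easy to test on a finite stand-in for the real line. The sketch below is an illustration under assumptions of my own: the state space is the grid 0, 0.5, 1, ..., 5.5 and thresholds are the integers 1 to 4, so that between any two thresholds there is at least one grid point (on the real line this is automatic).

```python
STATES = [k / 2 for k in range(12)]   # hypothetical finite grid replacing the real line

def is_finer(P, Q):
    """P is finer than Q if every set from P is contained in a set from Q."""
    return all(any(p <= q for q in Q) for p in P)

def strictly_finer(P, Q):
    return is_finer(P, Q) and not is_finer(Q, P)

def b(z):
    """Exact value of x is revealed only if x < z; states >= z are lumped together."""
    cells = {frozenset([x]) for x in STATES if x < z}
    rest = frozenset(x for x in STATES if x >= z)
    return cells | ({rest} if rest else set())

def g(z):
    """Exact value of x is revealed if x <= z; states > z are lumped together."""
    cells = {frozenset([x]) for x in STATES if x <= z}
    rest = frozenset(x for x in STATES if x > z)
    return cells | ({rest} if rest else set())

thresholds = [1, 2, 3, 4]
cond_good_vs_bad = all(strictly_finer(g(z), b(z)) for z in thresholds)
cond_across = all(
    strictly_finer(b(z2), g(z1))
    for z1 in thresholds for z2 in thresholds if z1 < z2
)
print(cond_good_vs_bad, cond_across)  # True True
```

On a finite grid there are only finitely many thresholds, so a utility function does exist there; the example only illustrates the refinement relation itself.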
PREFERENCES OVER UTILITY FLOWS. At every moment in time t ∈ [0, ∞), an agent receives payoff zero or
one: an alternative x is simply a function x : [0, ∞) → {0, 1}. Suppose preferences satisfy the following
monotonicity condition: if x(t) ≥ y(t) at all times t, with strict inequality for at least one time period,
then x ≻ y. Define, for each z ∈ [0, ∞), the alternative b(z) giving payoff one before time z and payoff
zero afterwards:
b(z)(t) = \begin{cases} 1 & \text{if } t < z, \\ 0 & \text{otherwise.} \end{cases}
Similarly, alternative g(z) gives payoff one at/before time z and payoff zero afterwards:
g(z)(t) = \begin{cases} 1 & \text{if } t \le z, \\ 0 & \text{otherwise.} \end{cases}
By the monotonicity requirement, g(z) ≻ b(z) and if z < z′: b(z′) ≻ g(z). So (5) and (6) hold: this
preference relation cannot be represented by a utility function.
Proposition 2.5
Let ≿ be a weak order on a set X. There is a utility function representing ≿ if and only if X is
Jaffray order-separable.
Exercise 2.2 This exercise guides you through the steps of the proof. Assume that u represents ≿. Let U = {u(x) :
x ∈ X} be the range of u. A jump in U is a pair (u_1, u_2) ∈ U × U where u_1 < u_2 and the open interval (u_1, u_2)
contains no elements of U: (u_1, u_2) ∩ U = ∅.
(a) Prove that U contains at most countably many jumps. (Suppose not. Use the idea behind (5) and (6) to find
a contradiction.)
For each jump (u_1, u_2), fix a point x(u_1, u_2) with utility u_1 and a point y(u_1, u_2) with utility u_2. Let
J = ∪ {x(u_1, u_2), y(u_1, u_2)}
Conversely, assume X is Jaffray order-separable via the set C. Let n : C → ℕ be injective. Define u by
u(x) = \sum_{c \in C : c \precsim x} 2^{-n(c)}.
For finite or countable sets X , simply let C = X to show that X is Jaffray order-separable. For preferences
over uncountable sets, Jaffray order-separability is sometimes derived from additional assumptions
on preferences. We will see in Proposition 2.8, for instance, that on ℝ^L_+, adding continuity to our list
of requirements works. Or the whole discussion about order-separability is avoided by deriving the
existence of a utility function straight from the assumptions that are imposed on preferences. That will
be our approach, for instance, in the discussion about expected utility theory.
Proposition 2.6
Assume:
≿ is a weak order on X;
X is Jaffray order-separable;
X is endowed with a topology where, for all y ∈ X, the sets {x ∈ X : x ≻ y} and {x ∈ X : x ≺ y}
are open, i.e., ≿ is continuous.
Then there exists a continuous utility function representing ≿.
Proof: Let C ⊆ X make X Jaffray order-separable. Omitting redundant elements from C if necessary,
one may assume that no two distinct elements of C are equivalent: for all c, c′ ∈ C with c ≠ c′, either
c ≻ c′ or c′ ≻ c.
[Define utility on C:] Since C is countable, label C = {c_1, c_2, ...}. Since the set Q = (0, 1) ∩ ℚ of rationals
in (0, 1) is countable, label Q = {q_1, q_2, ...}. Define a utility function f : C → Q by induction: f(c_1) := q_1.
Let n ∈ ℕ, n ≥ 2, and assume f was defined on {c_1, ..., c_{n−1}}. To extend the utility function to {c_1, ..., c_n},
define f(c_n) to be the first element of Q (defined³ as the element q_ℓ ∈ Q with smallest index ℓ) among
those elements q_ℓ that give the desired extension:
∀k ∈ {1, ..., n − 1}: q_ℓ > f(c_k) ⇔ c_n ≻ c_k. (7)
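The inductive definition can be sketched in a few lines. Everything concrete below is an assumption for illustration: the particular enumeration of the rationals in (0, 1) by increasing denominator, the finite list standing in for C, and my reading of (7) with strict inequalities in both directions (the new value lies strictly above the values of worse elements and strictly below the values of better ones).

```python
from fractions import Fraction
from itertools import count
from math import gcd

def rationals_in_unit_interval():
    """Enumerate the rationals in (0, 1) without repetition: 1/2, 1/3, 2/3, 1/4, 3/4, ..."""
    for den in count(2):
        for num in range(1, den):
            if gcd(num, den) == 1:
                yield Fraction(num, den)

def build_utility(C, strictly_prefers):
    """Assign to each c (in the order listed) the first rational, in the chosen
    enumeration, that is consistent with the values assigned so far."""
    f = {}
    for c in C:
        for q in rationals_in_unit_interval():
            # Assumes no two distinct elements of C are equivalent.
            if all((q > f[d]) if strictly_prefers(c, d) else (q < f[d]) for d in f):
                f[c] = q
                break
    return f

# Hypothetical enumeration c_1, c_2, ... of C; larger numbers are strictly better.
C = [3, 1, 4, 5, 9, 2, 6]
f = build_utility(C, lambda x, y: x > y)

print(f[3], f[1], f[4])   # 1/2 1/3 2/3
order_preserved = all((x > y) == (f[x] > f[y]) for x in C for y in C)
print(order_preserved)    # True
```

The inner loop always terminates because the rationals in (0, 1) are dense and have no smallest or largest element, so there is always room above, below, or between the values already used.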
A useful implication: let a, b ∈ C with a ≺ b. If the set of points in C between a and b,
β(a, b) = {c ∈ C : a ≺ c ≺ b},
is nonempty, it has a first element (Why?), say c_m. By construction, c_m is the first element in β(a, b) to
be assigned its value by f and therefore its image f(c_m) is the first element in (f(a), f(b)) ∩ Q.
[Extend utility to X:] For each x ∈ X, define u(x) = sup {f(c) : c ∈ C, c ≾ x}. The set over which the
supremum is taken is nonempty (it contains x) and bounded from above (by 1), so this supremum
exists. Moreover, u represents ≿. Let x, y ∈ X. If x ∼ y, the supremum is taken over the same set, so
u(x) = u(y). If x ≻ y, there exist, by Jaffray order-separability, elements a, b ∈ C with x ≿ a ≻ b ≿ y, so
that u(x) ≥ f(a) > f(b) ≥ u(y).
[Establish continuity of utility:] The usual topology on ℝ is generated by the intervals (−∞, r) and
(r, ∞), with r rational. Therefore, it suffices to prove that u^{-1}((−∞, r)) and u^{-1}((r, ∞)) are open for all
r ∈ ℚ. Let’s do the former; the latter is similar.
Now u^{-1}((−∞, r)) equals (i) ∅ if r ≤ inf f(C), (ii) X if r > sup f(C) or if r = sup f(C) and r ∉ f(C),
(iii) {x ∈ X : x ≺ f^{-1}(r)} if r ∈ f(C). By assumption, all these sets are open.
The only remaining case is when r ∉ f(C) and inf f(C) < r < sup f(C). We show that r belongs to a
jump of f(C). Recall from Exercise 2.2 that a jump in f(C) is a pair of points (f_1, f_2) ∈ f(C) × f(C) with
f_1 < f_2 and (f_1, f_2) ∩ f(C) = ∅.
Suppose not. Since inf f(C) < r < sup f(C), there exist a, b ∈ C with f(a) < r < f(b). Let m ∈ ℕ be
the maximum of the indices of f(a), r, f(b) ∈ Q. Then {q_1, ..., q_m} contains r and elements p, p′ ∈ f(C)
with p < r < p′. Let n ∈ ℕ be the smallest index for which {q_1, ..., q_n} has this property. Let
p_1 = max f(C) ∩ {q_1, ..., q_n} ∩ (−∞, r), so (p_1, r) ∩ {q_1, ..., q_n} = ∅,
³ Caveat: ‘first element’ is defined in terms of the chosen enumerations of C and Q. This allows us to speak, for instance, of the
first element in (0, 1), which makes absolutely no sense if one were, mistakenly, to believe it was defined in terms of the usual
≥ order on ℝ.
Y is a connected subset of X .
The following two results hold:
Proof: (a): Suppose not: all elements of Y are strictly better/worse than x. That is, each element of Y
belongs to exactly one of the sets A = {z ∈ X : z ≺ x} and B = {z ∈ X : z ≻ x}. The former contains y′, the
latter y. As A and B are open by continuity, they separate the connected set Y, a contradiction.
(b): Suppose not. Then each element of Y belongs to exactly one of the sets A = {z ∈ X : y ≻ z} and
B = {z ∈ X : z ≻ y′}. The former contains y′, the latter y. As A and B are open by continuity, they
separate the connected set Y, a contradiction.
In typical applications of this proposition, one takes Y to be equal to the entire set X, as in Proposition
2.8, or to a suitably chosen convex set like the diagonal {x ∈ ℝ^L_+ : x_1 = · · · = x_L} in Proposition 2.9.
Proposition 2.8
Assume:
X = ℝ^L_+ for some L ∈ ℕ;
≿ is a continuous weak order on X.
Then there is a continuous utility function representing ≿.
Proof: The countable set C = ℚ^L_+ makes X Jaffray order-separable: let x, y ∈ X with x ≻ y. By Proposition
2.7, there is a z ∈ X with x ≻ z ≻ y. By continuity, the set
{a ∈ X : x ≻ a ≻ z} = {a ∈ X : x ≻ a} ∩ {a ∈ X : a ≻ z}
is the intersection of two open sets, hence open itself. It is nonempty by Proposition 2.7. The set C
is dense in X: every nonempty, open set in X has a nonempty intersection with C. Hence, there is
a c_1 ∈ C with x ≻ c_1 ≻ z. Similarly, there is a c_2 ∈ C with z ≻ c_2 ≻ y. Conclude that x ≻ c_1 ≻ c_2 ≻ y, in
correspondence with the requirement for Jaffray order-separability. Now all conditions of Proposition
2.6 are satisfied.
Below we present a special case of Proposition 2.8 with a particularly simple proof.
Proposition 2.9
Assume:
X = R^L_+ for some L ∈ N;
% is a continuous, monotonic weak order on X .
Then there is a continuous utility function representing %.
being connected, contains an element equivalent to x: there is an α_x ≥ 0 with x ∼ α_x e. Uniqueness follows
from monotonicity: increasing α_x gives better alternatives, decreasing it gives worse ones.
Step 2: Define u(x) = αx . Then u represents %.
Let x, y ∈ X . Then x % y ⇔ αx e % α y e ⇔ u(x) = αx ≥ α y = u(y).
Step 3: u is continuous.
It suffices to show that the preimage u −1 ((α, β)) of every open interval (α, β) is open. Now
As a simple application, suppose that preferences are also homothetic. Then x ∼ αx e and β ≥ 0 implies
that βx ∼ βαx e, so u(βx) = βαx = βu(x). This proves that if — in addition to the assumptions in
Proposition 2.9 — the preference relation % is homothetic, there is a utility function homogeneous of
degree one representing %.
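The construction u(x) = α_x can be sketched numerically. The snippet below is only an illustration under stated assumptions: the preference is supplied through a hypothetical continuous, monotonic function `w` (any utility representing it), and the helper `diagonal_utility` (a name introduced here, not in the notes) locates α_x with x ∼ α_x e by bisection along the diagonal. The example function is homothetic, so scaling x by 3 should scale the computed utility by 3.

```python
def diagonal_utility(w, x, tol=1e-10):
    """Return alpha >= 0 with w(alpha * e) == w(x), where e = (1, ..., 1).

    Assumes w is continuous and monotonic on the nonnegative orthant,
    so w(0 * e) <= w(x) <= w(max(x) * e) and bisection applies.
    """
    target = w(x)
    lo, hi = 0.0, max(x)              # x lies between 0*e and max(x)*e
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if w([mid] * len(x)) < target:
            lo = mid                  # mid*e is strictly worse than x
        else:
            hi = mid                  # mid*e is at least as good as x
    return (lo + hi) / 2


# Hypothetical homothetic example: w(x) = sqrt(x1 * x2).
def w(x):
    return (x[0] * x[1]) ** 0.5


x = [1.0, 4.0]
alpha = diagonal_utility(w, x)                          # x ~ alpha * e
alpha_scaled = diagonal_utility(w, [3 * xi for xi in x])  # (3x) ~ alpha_scaled * e
```

Any other monotonic, continuous `w` representing the same preferences would give the same `alpha`, since only the ranking along the diagonal is used.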
The next exercise asks you to think about the connection between continuous preferences and
continuous utility functions. Statement (a) in that exercise is true, which is useful and important
for future reference: continuous utility implies continuous preferences! It is usually easier to recognize
continuity of a function than continuity of a binary relation, so this result can often be used as a quick
answer to show that preferences are continuous.
Exercise 2.3 Consider a weak order % on topological space X represented by utility function u : X → R. Are the
following claims true or false?
(a) If u is continuous, then % is continuous.
(b) If % is continuous, then u is continuous.
Proposition 2.10
Assume:
X = R^L_+ for some L ∈ N;
% is a weak order on X ;
% is quasilinear and strongly monotonic in the first coordinate;
“Getting something is at least as good as getting nothing”: x % (0, . . . , 0) for every x ∈ X ;
“Any difference can be compensated for by money”: ∀x, y ∈ X : if x % y, there is a v ≥ 0 s.t.
x ∼ (y 1 + v, y 2 , . . . , y L ).
Then there is a utility function of the form u(x) = x 1 + v(x 2 , . . . , x L ) representing %.
This number is unique, since % is strongly monotonic in the first coordinate. Adding x 1 ≥ 0 to the first
coordinate, quasilinearity implies that
(x 1 , x 2 , . . . , x L ) ∼ (x 1 + v(x 2 , . . . , x L ), 0, . . . , 0) .
⇔ x 1 + v(x 2 , . . . , x L ) ≥ y 1 + v(y 2 , . . . , y L ),
where the second equivalence follows from strong monotonicity of % in the first coordinate.
The proof establishes that each alternative is equivalent to receiving a sufficiently large amount of
just the first commodity: utility can be measured in units of commodity 1. This explains the frequent
use of quasilinear preferences: only if they are measured on the same scale can one do meaningful
comparisons between, say, your utility and mine.
Exercise 2.4 Is the final property
Such m, m ∗ exist by (a) and the function v is independent of the particular choices of m, m ∗ by (b), so this function
is well-defined.
(c) Show that the function u : X → R with u(a, m) = v(a) + m is a utility function representing %.
Also convexity and strict convexity of preferences have implications for the form of the utility function.
Recall that a real-valued function u on a convex domain X (Why convex?) is
Proposition 2.11
Assume:
X = R^L_+ for some L ∈ N;
% is a convex weak order on X ;
u : X → R represents %.
Then u is quasiconcave. If % is strictly convex, u is strictly quasiconcave.
Proof: Let x, y ∈ X and α ∈ (0, 1). Assume without loss of generality that x % y. Then u(x) ≥ u(y), so
min{u(x), u(y)} = u(y). By convexity of %: αx + (1 − α)y % y, so
u(αx + (1 − α)y) ≥ u(y) = min{u(x), u(y)},
as we had to show. The proof for strict quasiconcavity is analogous.
Exercise 2.6
(a) An equivalent way of defining a quasiconcave function u on a convex domain X is that for all r ∈ R, the
upper contour set X u (r ) = {x ∈ X : u(x) ≥ r } is convex. Provide a second proof of Proposition 2.11, using this
definition.
(b) As a converse to Proposition 2.11, prove that if u : X → R is a (strictly) quasiconcave utility function on a
convex set X , the corresponding preference relation % is (strictly) convex.
(c) Give an example of a convex weak order on R that can be represented by a utility function, but not by a
concave one.
Next, we provide conditions for a weak order to be representable by a linear utility function. Although
we go into more detail, the proof follows Diecidue and Wakker (2002). A convenient mathematical tool
is treated in the following exercise.
Exercise 2.7 CAUCHY'S FUNCTIONAL EQUATION: On two domains, we show that, under mild assumptions, additive
functions are linear. Let f : R → R be additive: f (x + y) = f (x) + f (y) for all x, y ∈ R.
(a) Let u ∈ R. Show that f (xu) = x f (u) for all rational x. Hint: First establish the claim for x ∈ N, then for x ∈ Z,
then for x ∈ Q.
Setting u = 1 and c = f (1), it follows that f (x) = cx for all rational x, i.e., f is linear on the field Q. Approximating
real numbers by rational ones and taking limits, it follows that continuous additive functions f : R → R are linear.
But much weaker conditions than continuity suffice:
(b) Suppose f is not linear on R. Show that its graph {(x, y) ∈ R^2 : y = f(x)} is dense.
So any assumption that prevents the graph of f being dense implies that f must be linear! Such conditions include
continuity in a single point, boundedness/sign restrictions on small intervals, monotonicity, etc.
We now extend the domain to n-dimensional real vectors. Let F : R^n → R be additive: F(x + y) = F(x) + F(y)
for all x, y ∈ R^n.
(c) Reduce this to the previously solved case by showing that there exist additive functions f_i : R → R for
i = 1, . . . , n such that, for all x ∈ R^n, F(x) = f_1(x_1) + · · · + f_n(x_n).
With this tool in our baggage, we can prove the linear representation result:
Proposition 2.12
Assume:
X = RL for some L ∈ N;
% is a weak order on X ;
% is strongly monotonic;
% is additive: for all x, y, z ∈ X , if x % y, then x + z % y + z;
For each x ∈ X there is a constant α ∈ R such that x ∼ α(1, . . . , 1).
Then there are α1 , . . . , αL ∈ R++ such that the function u : X → R with u(x) = α1 x 1 + · · · + αL x L
represents %.
Proof: By assumption, there is, for each x ∈ X , a number u(x) ∈ R such that x ∼ u(x)e. By strong
monotonicity, this number is unique. So the function u : RL → R is well-defined and represents
preferences %.
Moreover, u is additive. Let x, y ∈ X . Using additivity of % twice (for % and -), x ∼ u(x)e implies
that x + y ∼ u(x)e + y. Similarly, y ∼ u(y)e implies that u(x)e + y ∼ u(x)e + u(y)e = (u(x) + u(y))e. By
transitivity, x + y ∼ (u(x) + u(y))e. Hence u(x + y) = u(x) + u(y).
As u : R^L → R satisfies Cauchy's functional equation, Exercise 2.7 implies that there are additive
functions u_i : R → R (i = 1, . . . , L) with u(x) = \sum_{i=1}^{L} u_i(x_i). By strong monotonicity, each u_i is strictly
increasing: its graph cannot be dense. Hence, each u_i is linear: there are α_1, . . . , α_L ∈ R such that
u(x) = \sum_{i=1}^{L} α_i x_i. The constants α_1, . . . , α_L are positive by strong monotonicity.
Most assumptions are familiar. Strong monotonicity assures that all the αi are positive; with milder
monotonicity requirements, one can only assure that some of them are. If you don’t like the final
assumption, recall from Proposition 2.9 that it can be replaced by continuity. Additivity of preferences
is obviously the key assumption. It essentially states that in evaluating two alternatives x, y ∈ X , only
their difference x − y matters: preferences are insensitive to translations.
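A small numerical illustration of this translation invariance, offered as a sketch only: the weights `alpha` and the vectors `x`, `y`, `z` are hypothetical, and the check merely confirms that, for a linear utility, the ranking of x and y agrees with the ranking of x + z and y + z.

```python
def linear_utility(x, alpha):
    """u(x) = alpha_1 x_1 + ... + alpha_L x_L."""
    return sum(ai * xi for ai, xi in zip(alpha, x))


alpha = [1.0, 2.0, 0.5]            # hypothetical positive weights
x = [1.0, 0.0, 4.0]
y = [0.0, 1.0, 1.0]
z = [-3.0, 5.0, 2.0]               # arbitrary translation; X = R^L allows negatives

x_shifted = [xi + zi for xi, zi in zip(x, z)]
y_shifted = [yi + zi for yi, zi in zip(y, z)]

# Ranking before and after translating both alternatives by z.
before = linear_utility(x, alpha) >= linear_utility(y, alpha)
after = linear_utility(x_shifted, alpha) >= linear_utility(y_shifted, alpha)

# The utility difference itself is unchanged by the translation.
gap_before = linear_utility(x, alpha) - linear_utility(y, alpha)
gap_after = linear_utility(x_shifted, alpha) - linear_utility(y_shifted, alpha)
```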
With later applications in mind (see Proposition 2.13), there is no nonnegativity assumption on the
vectors over which preferences were defined: X = R^L, not R^L_+. If this makes you nervous, notice that
the proof hinges on the linearity of the function satisfying Cauchy’s functional equation. Fortunately,
linearity can be derived even if additivity holds only on the nonnegative orthant.
The remainder of this section is based on Voorneveld (2008), which contains more general results.
Due to its analytical tractability, the Cobb-Douglas utility function
u : R^L_+ → R with u(x) = x_1^{a_1} · · · x_L^{a_L} = \prod_{i=1}^{L} x_i^{a_i}    (L ∈ N, a_1, . . . , a_L > 0)
is among the most commonly used in economics; see also Exercise 4.11. Its name credits Cobb
and Douglas (1928), who used it in the context of production theory. What properties of an agent’s
preferences assure that they can be represented by a Cobb-Douglas utility function?
Part of the trick is in exploiting the fact that this function also goes under the name of log-linear
utility: taking logarithms, we have that for all x, y ∈ R^L_{++}:
x ≿ y ⇔ \sum_{i=1}^{L} a_i ln x_i ≥ \sum_{i=1}^{L} a_i ln y_i.
This reduces preferences to a linear utility function in the logarithm of the variables, allowing us to
exploit Proposition 2.12. Of course, this trick goes only part of the way, as one cannot take logarithms
on the boundary of R^L_+, where some coordinates equal zero.
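The log-linear reformulation can be checked on a few strictly positive bundles. The sketch below is illustrative only: the exponents `a` and the bundles are hypothetical, and the function names are introduced here. It ranks the bundles once by Cobb-Douglas utility and once by the weighted sum of logarithms; the two rankings should coincide because the logarithm is strictly increasing.

```python
import math


def cobb_douglas(x, a):
    """u(x) = x_1^{a_1} * ... * x_L^{a_L}."""
    return math.prod(xi ** ai for xi, ai in zip(x, a))


def log_linear(x, a):
    """Sum of a_i * ln(x_i); only defined when every x_i > 0."""
    return sum(ai * math.log(xi) for xi, ai in zip(x, a))


a = [0.5, 1.5]                                   # hypothetical exponents
bundles = [(1.0, 2.0), (2.0, 1.0), (3.0, 0.5), (0.5, 3.0)]

rank_cd = sorted(bundles, key=lambda x: cobb_douglas(x, a))
rank_log = sorted(bundles, key=lambda x: log_linear(x, a))
```

On the boundary, where some coordinate is zero, `log_linear` is undefined (`math.log(0)` raises an error), which mirrors the caveat in the text.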
Proposition 2.13
Assume:
X = R^L_+ for some L ∈ N;
% is a weak order on X ;
% is strongly monotonic;
% is homothetic in each coordinate: for each i ∈ {1, . . . , L}, all x, y ∈ X , and each t > 0: if
x % y, then (x 1 , . . . , x i −1 , t x i , x i +1 , . . . , x L ) % (y 1 , . . . , y i −1 , t y i , y i +1 , . . . , y L ).
For each x ∈ X there is a constant α ∈ R+ such that x ∼ α(1, . . . , 1).
Then % can be represented by a Cobb-Douglas utility function.
Proof: We use Proposition 2.12 to show that ≿ can be represented by a Cobb-Douglas utility function
on R^L_{++}. The domain is then extended to R^L_+.
Step 1, domain R^L_{++}: Define f : R^L → R^L_{++} for each x ∈ R^L by f(x) = (exp x_1, . . . , exp x_L). Notice that f
and its inverse f^{−1} : R^L_{++} → R^L with f^{−1}(y) = (ln y_1, . . . , ln y_L) are continuous. Given the weak order ≿
on R^L_{++}, define a weak order ≿_f on R^L as follows:
The exponential function is strictly increasing, so by substitution in (9), properties imposed on ≿ carry
over in a straightforward way to properties of ≿_f: one easily verifies that it is a weak order satisfying
strong monotonicity, and there exists, for each x ∈ R^L, a scalar α such that x ∼_f α(1, . . . , 1). Applying
coordinatewise homotheticity L times, it follows that
x ≿ y ⇔ (ln x_1, . . . , ln x_L) ≿_f (ln y_1, . . . , ln y_L) ⇔ \sum_{i=1}^{L} a_i ln x_i ≥ \sum_{i=1}^{L} a_i ln y_i.
Taking exponentials, ≿ is represented by utility function u with u(x) = \prod_{i=1}^{L} x_i^{a_i} on R^L_{++}.
Step 2, domain R^L_+: To see that u represents ≿ on the entire domain R^L_+, we must establish that
x ∼ (0, . . . , 0) for each x ∈ R^L_+ with some, but not all, coordinates equal to zero. Pick such an x. As
x + (1/n)e ∈ R^L_{++} for each n ∈ N, strong monotonicity implies (0, . . . , 0) ≺ x + (1/n)e. Hence, there is an
ε_n > 0 with x + (1/n)e ∼ ε_n e. As at least one coordinate of x + (1/n)e goes to zero:
0 = \lim_{n→∞} u(x + (1/n)e) = \lim_{n→∞} u(ε_n e) = \lim_{n→∞} ε_n^{a_1+···+a_L}.
3 Choice
3.1 Existence of most preferred elements
Hitherto, we discussed how microeconomists usually model what economic agents want. The obvious
next step is to consider what they actually do. The rationality paradigm underlying the classical
microeconomic theory requires that given (1) a set of mutually exclusive alternatives and (2) a nicely
behaved preference relation/utility function over the alternatives, the agent will choose a most preferred
alternative. This sounds pretty obvious, but an abundance of economic terminology sometimes blurs
the picture: most of traditional microeconomics is plain and simple constrained optimization.
This raises the question: when do most preferred alternatives exist? This is not straightforward: if
you have strongly monotonic preferences over apples and face no consumption constraints whatsoever,
there is no optimal amount of apples. Here is a very general existence result:
Proposition 3.1
Assume:
% is a weak order on a set X ;
% is upper semicontinuous: for all x ∈ X , the lower contour set L(x) = {y ∈ X | y ≺ x} is
open;
Y is a nonempty, compact subset of X .
Then Y contains a most preferred element:
∃y ∗ ∈ Y : y ∗ % y for all y ∈ Y .
Proof: Suppose not: for every y ∈ Y there is a y′ ∈ Y with y′ ≻ y. Then the lower contour sets {L(y) :
y ∈ Y} are an open covering of the compact set Y. By compactness, there is a finite subcovering, i.e., a
finite subset Y′ ⊆ Y such that {L(y′) : y′ ∈ Y′} covers Y. Since Y′ is finite, it contains a most preferred
element y*. But then L(y*) covers Y, i.e., y* is a best element of Y, contradicting our assumption.
APPLICATION TO CONSUMER MODEL: Let X = R^L_+. Suppose a consumer has a continuous (or upper
semicontinuous) weak order ≿ on X reflecting his preferences and an amount of money w > 0 in his
pocket (w for “wealth”). Suppose the price vector is p ∈ R^L_{++}. The budget set B(p, w) at prices p and
wealth w consists of all affordable feasible commodity bundles:
Since B (p, w) is nonempty and compact and % is assumed to be an upper semicontinuous weak order,
the budget set contains at least one most preferred alternative.
Exercise 3.1 A decision maker has lexicographic preferences % over R2 :
(x 1 , x 2 ) % (y 1 , y 2 ) ⇔ x 1 > y 1 or (x 1 = y 1 and x 2 ≥ y 2 ).
The idea behind WARP is this: in both choice problems A and B , alternatives x and y are available. If
x ∈ C(A), this reveals x to be at least as good as y; otherwise x wouldn’t be acceptable. Similarly, if y ∈ C(B),
then y must be at least as good as x. But then x and y ought to be equivalent and you should find x
acceptable also in B .
Independence of irrelevant alternatives (IIA) The choice structure (X , B,C ) satisfies IIA if
Intuitively, suppose that some items on “menu” B are not feasible after all and choice is restricted to A.
If A still contains some acceptable elements from B , choice should remain unaffected: an element is
acceptable in the smaller set A if and only if it was acceptable in the larger set B .
Proposition 3.2
Consider a choice structure (X , B,C ).
(a) If it satisfies WARP, then it satisfies IIA.
(b) If it satisfies IIA and all choice sets with at most three elements are contained in B, then
(X , B,C ) is rationalizable.
Proof: (a): Assume WARP holds. Let A, B be as in the definition of IIA. Let a ∈ C (A) and b ∈ C (B ) ∩ A.
To show: a ∈ C (B ) ∩ A, b ∈ C (A). Since C (A) ⊆ A ⊆ B , we have
a, b ∈ A ∩ B,
a ∈ C (A),
b ∈ C (B ).
By WARP, a ∈ C (B ), b ∈ C (A).
(b): For all x, y ∈ X , the set {x, y} lies in B by the assumption on B. Hence, we may define x % y if
x ∈ C ({x, y}). We need to check three things:
[% is complete:] Let x, y ∈ X . By nonemptiness, either x ∈ C ({x, y}) or y ∈ C ({x, y}), i.e., x % y or y % x.
[% is transitive:] Let x, y, z ∈ X and assume that x % y and y % z. By definition of %: x ∈ C ({x, y}) and
y ∈ C ({y, z}). To show: x % z, i.e., x ∈ C ({x, z}).
If x = y or y = z, this follows immediately. If x = z, then x % z is the same as x % x, which follows
from completeness. So let x, y, z be distinct and consider the set {x, y, z} ∈ B. It suffices to show that
x ∈ C ({x, y, z}), because then x ∈ C ({x, z}) by IIA.
Suppose, to the contrary, that x ∉ C({x, y, z}). By nonemptiness of C, C({x, y, z}) ∩ {y, z} ≠ ∅. By IIA
and y ≿ z: y ∈ C({y, z}) = C({x, y, z}) ∩ {y, z}. So C({x, y, z}) ∩ {x, y} ≠ ∅. By IIA and x ≿ y: x ∈ C({x, y}) =
C({x, y, z}) ∩ {x, y}, contradicting the assumption that x ∉ C({x, y, z}).
[% rationalizes (X , B,C ):] To show that (11) holds, let B ∈ B.
Firstly, let z ∈ C(B). To show: z ≿ y for all y ∈ B. So let y ∈ B. Then {y, z} ∈ B, {y, z} ⊆ B, and
z ∈ C(B) ∩ {y, z} ≠ ∅. By IIA, z ∈ C({y, z}). So z ≿ y.
Secondly, let z ∈ B satisfy z ≿ y for all y ∈ B. To show: z ∈ C(B). By nonemptiness, there is a y ∈ C(B).
Then {y, z} ∈ B, {y, z} ⊆ B, and y ∈ C(B) ∩ {y, z} ≠ ∅. By z ≿ y and IIA: z ∈ C({y, z}) = C(B) ∩ {y, z}, so
z ∈ C(B).
Exercise 3.4 investigates the other relations between rationalizability, WARP, and IIA.
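For finite choice structures, both axioms and the construction in the proof of Proposition 3.2(b) can be checked mechanically. The sketch below rests on assumptions that should be compared with the formal definitions: WARP is read as "if x, y ∈ A ∩ B, x ∈ C(A) and y ∈ C(B), then x ∈ C(B)", and IIA as "if A ⊆ B and C(B) ∩ A ≠ ∅, then C(A) = C(B) ∩ A". The dictionary `choice` (menus to chosen sets), the function names, and the example rule "pick the largest number" are all hypothetical.

```python
from itertools import combinations


def satisfies_warp(choice):
    """choice maps each menu (frozenset) to its nonempty chosen subset."""
    for A, chosen_A in choice.items():
        for B, chosen_B in choice.items():
            common = A & B
            # If some y in A∩B is chosen from B, every x in C(A)∩B
            # must be chosen from B as well.
            if (chosen_B & common) and not ((chosen_A & common) <= chosen_B):
                return False
    return True


def satisfies_iia(choice):
    for A, chosen_A in choice.items():
        for B, chosen_B in choice.items():
            if A <= B and (chosen_B & A) and chosen_A != (chosen_B & A):
                return False
    return True


def pairwise_relation(choice, X):
    """x is weakly preferred to y iff x in C({x, y}), as in the proof of (b)."""
    return {(x, y) for x in X for y in X if x in choice[frozenset({x, y})]}


X = [1, 2, 3]
menus = [frozenset(c) for r in (1, 2, 3) for c in combinations(X, r)]
choice = {B: frozenset({max(B)}) for B in menus}   # hypothetical rule: pick the largest

relation = pairwise_relation(choice, X)
# Rationalization: C(B) equals the set of elements weakly preferred to all of B.
rationalized = all(
    choice[B] == frozenset(z for z in B if all((z, y) in relation for y in B))
    for B in menus
)
```

Changing a single entry, for instance letting the choice from {1, 2, 3} be {1} while {1, 2} still yields {2}, makes both checks fail.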
3.3 Exercises
Exercise 3.3 WEIERSTRASS' MAXIMUM THEOREM: Use Proposition 3.1 to prove that a continuous function f : X →
R on a nonempty, compact set X achieves a maximum and a minimum.
Exercise 3.4
(a) Show that if (X , B,C ) is rationalizable, it satisfies WARP.
(b) Does IIA imply WARP?
(c) Can the restriction on B in Proposition 3.2(b) be omitted? I.e., does IIA imply rationalizability?
(d) Does WARP imply rationalizability?
Exercise 3.5 Let X = {1, 2, . . . , n} for some n ∈ N, n ≥ 3, and let B consist of all nonempty subsets of X . For each of
the following choice rules C , prove whether the choice structure (X , B,C ) satisfies WARP and/or IIA. If possible,
construct a weak order % rationalizing it.
(a) SATISFICING (SIMON, 1955): A function v : X → R assigns to each alternative x ∈ X a value v(x) ∈ R. Those
with a value at/above a given threshold r ∈ R are deemed ‘satisfactory’. For each B ∈ B, the choice C (B ) is
defined as follows: go through the elements of B in increasing order and choose the first satisfactory one. If
no such element exists, choose the final (i.e., largest) element of B .
(b) MADLY IN LOVE: Assume your partner has a weak order ≿ on X in which no two distinct elements are
equivalent. For each choice set B ∈ B with two or more elements, you politely abstain from choosing your
partner’s favorite: C(B) = {x ∈ B | ∃y ∈ B : y ≻ x}.
Exercise 3.6 A TASTE FOR PRECIOUS METALS: A consumer faces two luxury goods, the first is gold, the second
platinum, and spends the entire wealth on the good with the higher price. If prices are equal, half of the wealth is
spent on each good. To investigate the rationality of such behavior, consider a choice structure (X, B, C), where
X = R^2_+, the commodity space, and B consists of two choice sets: B_1 = B((2, 1), 2), the budget set at prices p = (2, 1)
and wealth w = 2, and B_2 = B((1, 2), 2).
(a) Draw the choice sets B 1 and B 2 in the same figure. Given the assumptions above, find C (B 1 ) and C (B 2 ) and
also draw these in your figure.
(b) Does the choice structure (X , B,C ) satisfy IIA?
(c) Does the choice structure (X , B,C ) satisfy WARP?
(d) Is the choice structure (X , B,C ) rationalizable?
(e) Give an example of a utility function depending both on the commodity bundle x and the price vector p,
denoted u(x, p), that makes the consumer’s behavior utility maximizing for every (p, w) ∈ R^3_{++}.
4 Choices of a consumer: classical demand theory
4.1 The preference/utility maximization problem
Section 3.1 set the stage for the classical model of consumer behavior. This model consists of a
specification of: (i) what the consumer wants: a preference relation or utility function; (ii) what the
consumer finds feasible: a budget set indicating the commodity bundles that he can choose from; (iii)
what the consumer — putting these two together — finds the most preferable commodity bundles.
Formally:
there are L ∈ N commodities that can be consumed in nonnegative quantities, so the commodity
space is X = R^L_+;
a price vector p ∈ R^L_{++} assigns to each commodity i ∈ {1, . . . , L} a price p_i > 0;
the consumer has a given income/‘wealth’ w > 0, i.e., an amount of money to spend on buying a
commodity bundle;
the consumer has a preference relation % on X or even a utility function u : X → R representing
these preferences.
Typically, no additional restrictions are imposed on consumption, so the budget set
B(p, w) = {x ∈ R^L_+ : p · x ≤ w}
specifies the commodity bundles the consumer can afford. At this stage, it would be a good idea to
look back at Section 3.1 to recapitulate some properties of this budget set. The consumer solves the
following preference maximization problem (%-MP):
%-MP: Find the set of most preferable commodity bundles according to % in the budget
set B (p, w).
Given utility function u, this yields the utility maximization problem (UMP):
It is common economic practice to assign special names to the set of solutions and — in case a utility
function is given — the corresponding optimal value of such optimization problems. The (Walrasian)
demand correspondence assigns to each price vector p ∈ RL++ and wealth w > 0 the associated set
x(p, w) of optimal commodity bundles:
is the utility of an arbitrary vector in the demand at (p, w). This is independent of the particular choice
of x ∗ ∈ x(p, w): since all such vectors are utility maximizers, their utility is the same.
Remark 1 If the utility function u is a C 1 -function (its partial derivatives exist and are continuous on
an open set containing X ), the UMP
max u(x)
s.t. p · x ≤ w,
     x_1 ≥ 0, . . . , x_L ≥ 0,
is usually solved using the associated Kuhn-Tucker conditions. /
Remark 2 If the Walrasian demand correspondence is single-valued, i.e., if x(p, w) consists of a single
element for each (p, w) ∈ R^{L+1}_{++}, it is common to treat demand as a function, rather than a
correspondence. /
Let us conclude this subsection with an example involving a well-known type of utility function.
LEONTIEV UTILITY: Baking your favorite cake requires fixed proportions of its L ≥ 2 ingredients: one
unit of cake takes a vector (a_1, . . . , a_L) ∈ R^L_{++} of ingredients. Given ingredient vector x ∈ R^L_+, how much
cake can you produce? Well, looking at the i -th ingredient, your guess will be at most x i /a i units. What
constrains you are those ingredients i where this fraction is the smallest. Therefore, a suitable utility
function would be
u(x) = min{x 1 /a 1 , . . . , x L /a L }, (12)
specifying how many units of cake you can make from x. Voorneveld (2014) characterizes preferences
that can be represented by a Leontief utility function, but uses some mathematical tools that are outside
the scope of this course. This utility function is not differentiable, so the Kuhn-Tucker conditions are
not applicable.
Exercise 4.1 Check that the associated preference relation is continuous, monotonic (but not strongly), convex
(but not strictly), and homothetic.
Let prices and wealth be (p, w) ∈ R^{L+1}_{++}. Since preferences are continuous and the budget set B(p, w)
nonempty and compact, there is at least one solution to the UMP (see Section 3.1): x(p, w) ≠ ∅. Let’s
compute it. Firstly, if x* solves the UMP, it must be that
x*_1/a_1 = · · · = x*_L/a_L. (13)
Why? Well, suppose this were not true: min{x*_1/a_1, . . . , x*_L/a_L} < max{x*_1/a_1, . . . , x*_L/a_L}. Then you’re
using the ingredients in the wrong proportions: you can only make u(x*) = min{x*_1/a_1, . . . , x*_L/a_L} units
of cake, but there are commodities i where you have enough for x*_i/a_i = max{x*_1/a_1, . . . , x*_L/a_L} units,
an utter waste. If you were to trade a small amount of these wasted ingredients for the non-wasted
ones, you would still be in your budget set, but able to make more cake. Hurray!
Secondly, preferences are monotonic, so you will use your entire budget on ingredients: p · x* = w.
Combining this with (13) gives us that there is a unique solution to the UMP at (p, w), namely
x* = ( a_1 w / \sum_{i=1}^{L} a_i p_i , . . . , a_L w / \sum_{i=1}^{L} a_i p_i ).
instead of a single-valued demand correspondence:
∀(p, w) ∈ R^{L+1}_{++} : x(p, w) = { ( a_1 w / \sum_{i=1}^{L} a_i p_i , . . . , a_L w / \sum_{i=1}^{L} a_i p_i ) }.
Substituting the demand vector in the utility function, we find the indirect utility function:
∀(p, w) ∈ R^{L+1}_{++} : v(p, w) = u( a_1 w / \sum_{i=1}^{L} a_i p_i , . . . , a_L w / \sum_{i=1}^{L} a_i p_i ) = w / \sum_{i=1}^{L} a_i p_i.
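The Leontiev demand and indirect utility formulas are easy to evaluate and to sanity-check. The sketch below uses hypothetical numbers (recipe `a`, prices `p`, wealth `w`) and hypothetical function names; the brute-force step only compares against a coarse grid of affordable bundles, so it illustrates the result rather than proving it.

```python
def leontief_utility(x, a):
    """u(x) = min_i x_i / a_i: units of cake that bundle x allows."""
    return min(xi / ai for xi, ai in zip(x, a))


def leontief_demand(p, w, a):
    """x_i = a_i * w / sum_j a_j p_j, the unique UMP solution."""
    cost_per_unit = sum(ai * pi for ai, pi in zip(a, p))   # cost of one unit of cake
    return [ai * w / cost_per_unit for ai in a]


def leontief_indirect_utility(p, w, a):
    """v(p, w) = w / sum_i a_i p_i."""
    return w / sum(ai * pi for ai, pi in zip(a, p))


a = [2.0, 1.0]        # hypothetical recipe
p = [3.0, 4.0]        # hypothetical prices
w = 20.0              # hypothetical wealth

x_star = leontief_demand(p, w, a)
spent = sum(pi * xi for pi, xi in zip(p, x_star))       # Walras' law: equals w

# Brute-force comparison on a coarse grid of affordable bundles.
grid = [(i / 4, j / 4) for i in range(28) for j in range(21)]
best_on_grid = max(
    leontief_utility(x, a) for x in grid if p[0] * x[0] + p[1] * x[1] <= w
)
```

With these numbers one unit of cake costs 10, so the consumer buys ingredients for exactly two units.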
Exercise 4.2 Our definition of the budget set is standard, but other realistic restrictions can be modeled just as
easily. In the commodity space X = R^2_+, let the price vector be p = (8, 4). The consumer has wealth w = 40 and an
upper semicontinuous weak order ≿ on X. In each of the following cases separately, specify the budget set given
the additional information. Does the new budget set necessarily contain at least one most preferred bundle?
(a) INDIVISIBILITIES: The commodities cannot be cut into ever smaller pieces. Only integer quantities are
feasible.
(b) RATIONING: The consumer is not allowed to buy more than three units of the first commodity.
(c) REBATES 1: If the consumer buys more than five units of the second commodity, these additional units in
excess of the first five have a lower price, namely two.
(d) REBATES 2: If the consumer buys more than five units of the second commodity, the price of this commodity
(also the first five units) is decreased to two.
(e) INITIAL ENDOWMENT: Instead of having wealth w, suppose the consumer has an initial endowment ω = (1, 1)
of one unit of both commodities. He can sell (parts of) his initial endowment to generate income to purchase
other commodity bundles.
(f) PACKAGE DEAL: The consumer has to buy the same quantity of both commodities.
(g) GIFT CERTIFICATE: The consumer has received a gift certificate of one monetary unit, which he can spend in
its entirety on commodity one.
Proposition 4.1
Let X = R^L_+ for some L ∈ N and let ≿ be a weak order on X. The Walrasian demand correspondence
has the following properties:
(a) If ≿ is upper semicontinuous, then x(p, w) is nonempty for all (p, w) ∈ R^{L+1}_{++}.
(b) If ≿ is continuous, the Walrasian demand correspondence has a closed graph: for each
sequence (p^n, w^n, x^n)_{n∈N} in R^{L+1}_{++} × X with limit (p, w, x) ∈ R^{L+1}_{++} × X: if x^n ∈ x(p^n, w^n) for
all n ∈ N, then also x ∈ x(p, w).
(d) If ≿ is convex, or equivalently, if u is quasiconcave, then x(p, w) is a convex set for all
(p, w) ∈ R^{L+1}_{++}.
(f) Walras’ law: “All money is spent”: If ≿ is locally nonsatiated, then p · x = w for all (p, w) ∈
R^{L+1}_{++} and x ∈ x(p, w).
To formulate properties of indirect utility, we will need to assume (Surprise!) that preferences are
represented by a utility function and that the demand correspondence is non-empty valued: otherwise,
indirect utility is undefined.
Proposition 4.2
Assume:
X = R^L_+ for some L ∈ N;
The consumer’s preference relation ≿ is represented by utility function u : X → R;
Walrasian demand is nonempty-valued: ∀(p, w) ∈ R^{L+1}_{++}, x(p, w) ≠ ∅.
Then the indirect utility function has the following properties:
(b) For each commodity i , v is nonincreasing in the price of i (higher prices cannot make you
better off).
(d) v is quasiconvex: ∀r ∈ R : {(p, w) ∈ R^{L+1}_{++} : v(p, w) ≤ r} is a convex set.
for all x ∈ B(p″, w″).
Let x ∈ B(p″, w″). Then x ∈ R^L_+ and α(p · x) + (1 − α)(p′ · x) ≤ αw + (1 − α)w′. Therefore, p · x ≤ w or
p′ · x ≤ w′ (or both). W.l.o.g., p · x ≤ w. Then x ∈ B(p, w), so u(x) ≤ v(p, w) ≤ r.
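Quasiconvexity of indirect utility can be illustrated with the Leontiev indirect utility v(p, w) = w / Σ a_i p_i from Section 4.1. The sketch below uses hypothetical numbers and names; it checks, along one line segment between two price-wealth pairs, that indirect utility never exceeds the larger of the two endpoint values, which is what quasiconvexity requires.

```python
def leontief_indirect_utility(p, w, a):
    """v(p, w) = w / sum_i a_i p_i for Leontiev utility with recipe a."""
    return w / sum(ai * pi for ai, pi in zip(a, p))


a = [2.0, 1.0]                                  # hypothetical recipe
p, w = [3.0, 4.0], 20.0                         # v(p, w)   = 2
p_alt, w_alt = [1.0, 1.0], 9.0                  # v(p', w') = 3

bound = max(leontief_indirect_utility(p, w, a),
            leontief_indirect_utility(p_alt, w_alt, a))

mixtures = []
for k in range(11):
    t = k / 10                                   # convex weight on (p, w)
    p_mix = [t * pi + (1 - t) * qi for pi, qi in zip(p, p_alt)]
    w_mix = t * w + (1 - t) * w_alt
    mixtures.append(leontief_indirect_utility(p_mix, w_mix, a))

quasiconvex_on_segment = all(v <= bound + 1e-12 for v in mixtures)
```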
min p · x
s.t. x ∈ R^L_+,
     u(x) ≥ u.
The Hicksian or compensated demand correspondence assigns to each price vector p ∈ R^L_{++} and each
utility level u the associated set h(p, u) of solutions to the EMP:
h(p, u) = {x ∈ R^L_+ : u(x) ≥ u and p · x ≤ p · y for all y ∈ R^L_+ with u(y) ≥ u}.
The Hicksian demand correspondence specifies the set of consumption bundles solving the EMP, the
expenditure function e(p, u) indicates its value:
Similar to our earlier approach to Walrasian demand and indirect utility, one can derive properties of
Hicksian demand and the expenditure function. To make the proposition at all sensible, one needs
to restrict attention to utility levels that are actually reachable; therefore, let U = {u(x) : x ∈ RL+ } be the
range of the utility function u.
Proposition 4.3
Let X = R^L_+ for some L ∈ N and let u : X → R represent a consumer’s weak order ≿. The Hicksian
demand correspondence has the following properties:
(a) If ≿ is upper semicontinuous, then h(p, u) is nonempty for all (p, u) ∈ R^L_{++} × U.
(b) Homogeneity of degree zero in prices: ∀(p, u) ∈ R^L_{++} × U, ∀α > 0 : h(αp, u) = h(p, u).
(c) If ≿ is convex, or equivalently, if utility is quasiconcave, then h(p, u) is a convex set for all
(p, u) ∈ R^L_{++} × U.
(d) If utility is continuous and ≿ is strictly convex, or equivalently, if utility is strictly quasiconcave,
then h(p, u) contains at most one element for all (p, u) ∈ R^L_{++} × U.
(e) “No excess utility”: If utility is continuous, then u(x) = u for all (p, u) ∈ R^L_{++} × U with
u ≥ u(0, . . . , 0) and all x ∈ h(p, u).
(f) Compensated law of demand: let p′, p″ ∈ R^L_{++} and u ∈ U. If x′ ∈ h(p′, u) and x″ ∈ h(p″, u),
then (p′ − p″) · (x′ − x″) ≤ 0.
Do you see why we have the restriction u ≥ u(0, . . . , 0) in the “no excess utility” part? Well, suppose
that u < u(0, . . . , 0). Since p · x ≥ 0 for all x ∈ RL+ , it follows that h(p, u) = {(0, . . . , 0)}: expenditure is not
minimal at utility u, because the zero vector, with higher utility, is the cheapest option. Under suitable
monotonicity restrictions, however, this will turn out to be an exotic case: the zero vector will often give
you the lowest utility in RL+ , so that this footnote becomes irrelevant.
Proof: (a): Let (p, u) ∈ R^L_{++} × U. By feasibility, u(y) = u for some y ∈ X. By upper semicontinuity of
preferences, the set {x ∈ X : u(x) ≥ u} = {x ∈ X : x ≿ y} is closed. Therefore, any solution of the EMP lies
in the nonempty set {x ∈ R^L_+ : u(x) ≥ u} ∩ {x ∈ R^L_+ : p · x ≤ p · y}, which is the intersection of a closed and
a compact set and therefore compact. The goal function x ↦ p · x is continuous. A continuous function
on a nonempty, compact set achieves a minimum; see Section 3.1.
(b): Minimizing x ↦ (αp) · x gives the same solutions as minimizing x ↦ p · x.
(c): Let (p, u) ∈ R^L_{++} × U. If h(p, u) = ∅, it is convex. If h(p, u) ≠ ∅, let y ∈ h(p, u). By definition,
(f): Since x′ is optimal and x″ feasible in the EMP at (p′, u), it follows that
p′ · x′ ≤ p′ · x″.
Similarly,
p″ · x″ ≤ p″ · x′.
Adding these inequalities and rewriting gives the compensated law of demand.
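The argument for (f) uses only optimality and feasibility, so it can be illustrated on any finite menu of bundles. The sketch below is an assumption-laden toy, not the model itself: the consumption set is replaced by a hypothetical integer grid, the utility function and the two price vectors are made up, and `hicksian_finite` is a name introduced here for the cheapest bundle reaching the target utility.

```python
def hicksian_finite(p, u_bar, bundles, utility):
    """Cheapest bundle among those reaching utility level u_bar."""
    feasible = [x for x in bundles if utility(x) >= u_bar]
    return min(feasible, key=lambda x: sum(pi * xi for pi, xi in zip(p, x)))


def utility(x):                       # hypothetical utility; any function works
    return x[0] * x[1]


bundles = [(i, j) for i in range(1, 11) for j in range(1, 11)]
u_bar = 12

p1, p2 = (1.0, 3.0), (4.0, 1.0)       # two hypothetical price vectors
x1 = hicksian_finite(p1, u_bar, bundles, utility)
x2 = hicksian_finite(p2, u_bar, bundles, utility)

# (p' - p'') . (x' - x'') should be nonpositive.
law = sum((a - b) * (y - z) for a, b, y, z in zip(p1, p2, x1, x2))
```

When the first good becomes relatively expensive, the cost-minimizing bundle shifts away from it, and the inner product is negative.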
If h is single-valued, we will treat it as a function, rather than a correspondence, just as we did for
Walrasian demand (see Remark 2). The compensated law of demand implies that if you raise the price
of one of the goods, then the Hicksian demand for this good will not increase.
The next proposition states some properties of the expenditure function. Given the similarity with
earlier results, proofs are left as an exercise.
Proposition 4.4
Assume:
X = R^L_+ for some L ∈ N;
The consumer’s preference relation ≿ is represented by utility function u : X → R;
Hicksian demand is nonempty-valued: ∀(p, u) ∈ R^L_{++} × U : h(p, u) ≠ ∅.
Then the expenditure function e : R^L_{++} × U → R has the following properties:
(a) Homogeneity of degree one in prices: ∀(p, u) ∈ R^L_{++} × U, ∀α > 0 : e(αp, u) = αe(p, u).
(b) Monotonicity in u: If utility is continuous, then for all p ∈ R^L_{++} and all u′, u″ ∈ U with
u(0, . . . , 0) ≤ u′ < u″:
e(p, u′) < e(p, u″).
Remark 3 (see footnote 5) Establishing continuity properties for Hicksian demand and expenditure is less
straightforward than for Walrasian demand and indirect utility. Concave functions are continuous, so
Proposition 4.4(d) implies that expenditure is continuous in prices. The utility function u : R_+ → R with
u(x) = max{0, x − 1} shows that expenditure is not necessarily continuous in utility levels. Letting p > 0
be the price of the only commodity, one finds
e(p, u) = 0           if u = 0,
e(p, u) = p(u + 1)    if u > 0.
Since p > 0, e(p, ·) has a discontinuity at u = 0. However, if the utility function is both continuous and
locally nonsatiated, continuity of the expenditure function e : RL++ ×U → R can be established using a
result known as Berge’s Maximum Theorem. Contrary to what most textbooks (which do not provide the
proof) suggest, the proof is not straightforward. To establish continuity at an arbitrary (p 0 , u 0 ) ∈ RL++ ×U ,
local nonsatiation is used to establish existence of a y ∈ RL+ with u(y) > u 0 . Next, on a neighborhood
of (p 0 , u 0 ), the EMP reduces to minimizing p · x subject to x ∈ {z ∈ RL+ : u(z) ≥ u, p · z ≤ p · y}. This final
condition assures that the conditions of the Maximum Theorem are satisfied. /
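The jump in the one-good example of Remark 3 is easy to see numerically. This is a minimal sketch with a hypothetical price; `expenditure` simply encodes the piecewise formula for u(x) = max{0, x − 1}.

```python
def expenditure(p, u):
    """e(p, u) for the one-good utility u(x) = max(0, x - 1), with u >= 0."""
    return 0.0 if u == 0 else p * (u + 1)


p = 2.0                                               # hypothetical price
values = [expenditure(p, u) for u in (0.0, 1e-3, 1e-6, 1e-9)]
# At u = 0 expenditure is 0, but for arbitrarily small u > 0 it stays close to p.
```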
5 Requires some knowledge of topology. Can be omitted.
Let us proceed with the example on Leontiev utility functions.
LEONTIEV UTILITY (CONTINUED): The Leontiev utility function in (12) has range U = R_+. In order
not to waste resources, a solution x* to the EMP at (p, u) ∈ R^L_{++} × U must satisfy (13). Moreover, by
continuity, it satisfies u(x*) = u. Combining these two conditions gives us that there is a unique solution
to the EMP at (p, u), namely x* = (a_1 u, . . . , a_L u). Since the solution is unique, it is common to write
the result as a Hicksian demand function, rather than a correspondence: h(p, u) = (a_1 u, . . . , a_L u) and
e(p, u) = p · (a_1 u, . . . , a_L u) = u \sum_{i=1}^{L} a_i p_i.
The following result gives a relation between h(p, u) and e(p, u) in a particularly simple case.
Proposition 4.5
Assume the utility function $u : \mathbb{R}^L_+ \to \mathbb{R}$ is continuous and represents locally nonsatiated, strictly
convex preferences. Then for all $p \in \mathbb{R}^L_{++}$ and all $u > u(0, \ldots, 0)$, Hicksian demand for each good
$\ell = 1, \ldots, L$ can be found by differentiating the expenditure function with respect to the price $p_\ell$:
\[ \forall \ell = 1, \ldots, L : \quad h_\ell(p, u) = \frac{\partial e(p, u)}{\partial p_\ell}. \tag{14} \]
Proof: We will not prove that the expenditure function is differentiable.6 The remainder of the proof
proceeds as follows. By strict convexity of preferences, Hicksian demand is single-valued, so we treat
h(·) as a function. Fix $p \in \mathbb{R}^L_{++}$ and $u \in U$ and let $x = h(p, u)$ denote Hicksian demand at prices $p$ and
utility level $u$. For every price vector $p' \in \mathbb{R}^L_{++}$,
\[ e(p', u) = \min_{x' \in \mathbb{R}^L_+,\; u(x') \ge u} p' \cdot x' \le p' \cdot x, \]
Assume the utility function $u : \mathbb{R}^L_+ \to \mathbb{R}$ is continuous and represents locally nonsatiated, strictly
convex preferences. Assume that the indirect utility function v(·) is differentiable at a point $(p, w)$
with $p \in \mathbb{R}^L_{++}$ and $w > 0$. Then the Walrasian demand for each good $\ell = 1, \ldots, L$ can be found as
follows:
\[ \forall \ell = 1, \ldots, L : \quad x_\ell(p, w) = -\frac{\partial v(p, w)/\partial p_\ell}{\partial v(p, w)/\partial w}. \]
Do this by showing that the function $f : \mathbb{R}^L_{++} \to \mathbb{R}$ with $f(p') = v(p', p' \cdot x)$, where $x = x(p, w)$, achieves
its minimum at $p' = p$.
6 It follows from a duality result in convex analysis: for fixed u, e(·, u) is the support function of the strictly convex set
{x ∈ X : u(x) ≥ u}.
Proposition 4.6
Assume the utility function u : RL+ → R is continuous and represents locally nonsatiated prefer-
ences. Fix a price vector p ∈ RL++ . Then:
(a) If $x^*$ is optimal in the UMP with wealth $w > 0$, then $x^*$ is optimal in the EMP with utility
level $u = u(x^*)$. Moreover, the expenditure level in this EMP is exactly $p \cdot x^* = w$, i.e., $e(p, u(x^*)) = w$.
(b) If $x^*$ is optimal in the EMP with utility level $u \in U$, $u > u(0, \ldots, 0)$, then $x^*$ is optimal in the
UMP with wealth $w = p \cdot x^*$. Moreover, the indirect utility level in this UMP is exactly $u$, i.e., $v(p, p \cdot x^*) = u$.
Proof: (a): Let x ∗ ∈ x(p, w). By Walras’ law, p · x ∗ = w. Bundle x ∗ is feasible in the EMP with prices p
and utility level u(x ∗ ). Let x ∈ h(p, u(x ∗ )). By definition,
The first inequality means that x ∈ B (p, w). But then its utility cannot exceed that of the utility maxi-
mizing bundle x ∗ ∈ x(p, w). So u(x) = u(x ∗ ) and by Walras’ law:
e(p, u(x ∗ )) = p · x = p · x ∗ = w.
The first claim shows that x is feasible in the EMP at (p, u). But then the inequality in the second claim
cannot be strict: p · x = p · x ∗ . By Proposition 4.3(e),
Proof: (15): Let x ∗ ∈ x(p, w). By definition, v(p, w) = u(x ∗ ). By Proposition 4.6(a), e(p, v(p, w)) =
e(p, u(x ∗ )) = w.
(17): We first show that x(p, w) ⊆ h(p, v(p, w)). Let x ∈ x(p, w). Then u(x) = v(p, w), so x ∈ h(p, u(x)) =
h(p, v(p, w)) by Proposition 4.6(a). Secondly, we show that h(p, v(p, w)) ⊆ x(p, w). Let x ∈ h(p, v(p, w)).
By Proposition 4.6(b), x ∈ x(p, p ·x). Moreover, x ∈ h(p, v(p, w)) and (15) imply that p ·x = e(p, v(p, w)) =
w. Conclude that x ∈ x(p, p · x) = x(p, w).
(16), (18): Similar.
These results give convenient ways to find solutions to the UMP from those of the EMP and vice versa.
Let us illustrate this in our Leontiev example.
LEONTIEV UTILITY (CONTINUED): Recall that
\[ v(p, w) = \frac{w}{\sum_{i=1}^{L} a_i p_i} \quad \text{and} \quad x(p, w) = \left( \frac{a_1 w}{\sum_{i=1}^{L} a_i p_i}, \ldots, \frac{a_L w}{\sum_{i=1}^{L} a_i p_i} \right). \]
By (16), expenditure solves $u = v(p, e(p, u)) = e(p, u) / \sum_{i=1}^{L} a_i p_i$, so $e(p, u) = u \sum_{i=1}^{L} a_i p_i$, exactly
(Good news, isn’t it!) as we saw before. Hicksian demand can now be found in different ways. Firstly, using
Proposition 4.5:
\[ \forall \ell = 1, \ldots, L : \quad h_\ell(p, u) = \frac{\partial e(p, u)}{\partial p_\ell} = a_\ell u, \]
and, secondly, using (18): h(p, u) solves
\[ h(p, u) = x(p, e(p, u)) = \left( \frac{a_1 e(p, u)}{\sum_{i=1}^{L} a_i p_i}, \ldots, \frac{a_L e(p, u)}{\sum_{i=1}^{L} a_i p_i} \right) = (a_1 u, \ldots, a_L u). \]
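The duality relations used in this example can be checked numerically. The following Python sketch is not part of the original notes: the function names and the numbers are chosen purely for illustration, and it takes the closed forms for v, x, e and h from the example as given. It checks that e(p, v(p, w)) = w, that h(p, v(p, w)) = x(p, w), and that a finite-difference derivative of the expenditure function reproduces Hicksian demand, as Proposition 4.5 states.

```python
import math

def weighted_price(p, a):
    """Cost of one unit of utility: sum_i a_i * p_i."""
    return sum(ai * pi for ai, pi in zip(a, p))

def indirect_utility(p, w, a):
    return w / weighted_price(p, a)

def walrasian_demand(p, w, a):
    s = weighted_price(p, a)
    return [ai * w / s for ai in a]

def expenditure(p, u, a):
    return u * weighted_price(p, a)

def hicksian_demand(p, u, a):
    return [ai * u for ai in a]

def expenditure_gradient(p, u, a, step=1e-6):
    """Central finite differences of e(., u) with respect to each price."""
    grad = []
    for l in range(len(p)):
        up, down = list(p), list(p)
        up[l] += step
        down[l] -= step
        grad.append((expenditure(up, u, a) - expenditure(down, u, a)) / (2 * step))
    return grad

# Illustrative numbers (assumptions, not taken from the notes).
a, p, w = [1.0, 2.0], [3.0, 4.0], 22.0
u = indirect_utility(p, w, a)

print("e(p, v(p, w)) =", expenditure(p, u, a))
print("x(p, w)       =", walrasian_demand(p, w, a))
print("h(p, v(p, w)) =", hicksian_demand(p, u, a))
print("grad_p e      =", expenditure_gradient(p, u, a))
```

Because the Leontiev expenditure function is linear in prices, the finite-difference gradient agrees with (a_1 u, ..., a_L u) up to rounding error.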
Exercise 4.7 For Leontiev utility, use (15) and (17) to find Walrasian demand and indirect utility from the solutions
of the EMP.
Exercise 4.8 SLUTSKY EQUATION: The so-called Slutsky equation provides a relation between the sensitivity to
price changes of the Walrasian and Hicksian demand functions.
Assume the utility function $u : \mathbb{R}^L_+ \to \mathbb{R}$ is continuous and represents locally nonsatiated, strictly convex
preferences. We know that in this case there are unique solutions to the UMP and EMP: we can consider Walrasian
and Hicksian demand functions. If these functions are differentiable, the following holds. Fix $(p, w) \in \mathbb{R}^{L+1}_{++}$ and
utility level $u = v(p, w) > u(0, \ldots, 0)$.⁷ Then for all commodities $k, \ell \in \{1, \ldots, L\}$:
\[ \frac{\partial h_\ell(p, u)}{\partial p_k} = \frac{\partial x_\ell(p, w)}{\partial p_k} + \frac{\partial x_\ell(p, w)}{\partial w}\, x_k(p, w). \tag{19} \]
Prove (19) as follows: You know from (18) that $h_\ell(p, u) = x_\ell(p, e(p, u))$. Differentiate this equation w.r.t. $p_k$, using
the chain rule. Continue by substituting (14), (15), and (18).
Whereas the above describes the idea behind welfare analysis in its full generality and simplicity,
economic textbooks tend to restrict attention to changes only in prices and wealth. The initial vector
of prices and wealth is denoted $(p^0, w^0) \in \mathbb{R}^{L+1}_{++}$ and the vector of prices and wealth after the change is
denoted $(p^1, w^1) \in \mathbb{R}^{L+1}_{++}$. This allows changes in prices only, keeping wealth constant ($p^0 \neq p^1$, $w^0 = w^1$),
changes in wealth only, keeping prices constant ($p^0 = p^1$, $w^0 \neq w^1$), or simultaneous changes in prices
and wealth ($p^0 \neq p^1$, $w^0 \neq w^1$).
7 This inequality holds because the zero vector cannot solve the utility maximization problem: by local nonsatiation and strict
positivity of prices and wealth, there is an affordable bundle preferred to the zero vector.
Exercise 4.10 Let $\succsim$ be a locally nonsatiated weak order on $\mathbb{R}^L_+$. Consider a change from $(p^0, w^0)$ to $(p^1, w^1)$. Let
$x^0 \in x(p^0, w^0)$. Show that if $p^1 \cdot x^0 < w^1$, the consumer is strictly better off under $(p^1, w^1)$ than under $(p^0, w^0)$.
Assume that the consumer’s continuous, locally nonsatiated preference relation $\succsim$ can be represented
by a utility function. We can derive the consumer’s indirect utility function $v$ and conclude that the
consumer is better off after the change if and only if $v(p^1, w^1) > v(p^0, w^0)$.
However, since the indirect utility function depends on which utility function is chosen to represent
$\succsim$, this does not tell us how much better off the consumer is. To express welfare changes unambiguously
in monetary units, one constructs a so-called money metric indirect utility function using the
expenditure function. Fix an arbitrary price vector $\bar{p} \in \mathbb{R}^L_{++}$. Consider the real-valued function $e(\bar{p}, \cdot)$.
By Proposition 4.4, this function is strictly increasing, so
\[ e(\bar{p}, v(p^1, w^1)) - e(\bar{p}, v(p^0, w^0)) \tag{20} \]
can be used as a monetary measure of welfare change: if it is positive, the welfare of the consumer
increases as a consequence of the change from $(p^0, w^0)$ to $(p^1, w^1)$, if it is negative, the welfare of the
consumer has decreased. It remains to prove that this money metric does not depend on the choice of
utility function representing the consumer’s preferences. This follows from the fact that expenditure
can be expressed in a form independent of the utility function: for all $(p, u) \in \mathbb{R}^L_{++} \times U$, there is a $y \in \mathbb{R}^L_+$
with $u(y) = u$, so
\[ e(p, u) = \min \{ p \cdot x : x \in \mathbb{R}^L_+,\ u(x) \ge u \} = \min \{ p \cdot x : x \in \mathbb{R}^L_+,\ x \succsim y \}. \]
In (20), two natural choices for p̄ would be the initial vector of prices p 0 and the new vector of prices
p 1 . These choices give rise to two well-known measures of welfare change: equivalent variation (EV)
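and compensating variation (CV). Let $u^0 = v(p^0, w^0)$ and $u^1 = v(p^1, w^1)$. Notice that $e(p^0, u^0) = w^0$
and $e(p^1, u^1) = w^1$ by local nonsatiation. We define
\[ EV((p^0, w^0), (p^1, w^1)) = e(p^0, u^1) - e(p^0, u^0), \]
\[ CV((p^0, w^0), (p^1, w^1)) = e(p^1, u^1) - e(p^1, u^0). \]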
There is no obvious way to say that one of the measures is better than the other, although the equivalent
variation has an advantage when comparing alternative changes: suppose (p 0 , w 0 ) changes either to
(p 1 , w 1 ) or (p 2 , w 2 ). Both EV ((p 0 , w 0 ), (p 1 , w 1 )) and EV ((p 0 , w 0 ), (p 2 , w 2 )) are expressed in terms of
wealth at prices p 0 and can consequently be compared. However, CV ((p 0 , w 0 ), (p 1 , w 1 )) is expressed
in wealth at prices p 1 and CV ((p 0 , w 0 ), (p 2 , w 2 )) in wealth at prices p 2 , so they are incomparable.
LEONTIEV UTILITY (CONTINUED): The equivalent and compensating variation for Leontiev utility follow
immediately from the indirect utility function and expenditure function computed earlier:
\[ u^0 = v(p^0, w^0) = \frac{w^0}{\sum_{i=1}^{L} a_i p_i^0} \quad \text{and} \quad u^1 = v(p^1, w^1) = \frac{w^1}{\sum_{i=1}^{L} a_i p_i^1}, \]
so
\[ EV((p^0, w^0), (p^1, w^1)) = e(p^0, u^1) - e(p^0, u^0) = w^1 \left( \frac{\sum_{i=1}^{L} a_i p_i^0}{\sum_{i=1}^{L} a_i p_i^1} \right) - w^0, \]
\[ CV((p^0, w^0), (p^1, w^1)) = e(p^1, u^1) - e(p^1, u^0) = w^1 - w^0 \left( \frac{\sum_{i=1}^{L} a_i p_i^1}{\sum_{i=1}^{L} a_i p_i^0} \right). \]
LUMP-SUM TAX: Given initial prices and wealth $(p^0, w^0)$, suppose that the government levies a lump-sum
tax $T \in (0, w^0)$ on the consumer’s wealth, keeping prices unchanged. Then $(p^1, w^1) = (p^0, w^0 - T)$.
Hence $e(p^0, u^0) = e(p^1, u^0) = w^0$ and $e(p^1, u^1) = e(p^0, u^1) = w^1 = w^0 - T$, so $EV((p^0, w^0), (p^1, w^1)) =
CV((p^0, w^0), (p^1, w^1)) = -T$. This is intuitive: since the prices remain unchanged, the monetary measure
of welfare change as a consequence of a decrease of $T$ in the consumer’s wealth should equal $-T$.
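The Leontiev formulas for EV and CV and the lump-sum tax example can be checked with a few lines of Python. This is only a sketch under stated assumptions: the function names and all numbers are hypothetical, and the closed forms for v and e are taken from the Leontiev example as given.

```python
import math

def weighted_price(p, a):
    """Cost of one unit of Leontiev utility: sum_i a_i * p_i."""
    return sum(ai * pi for ai, pi in zip(a, p))

def indirect_utility(p, w, a):
    return w / weighted_price(p, a)

def expenditure(p, u, a):
    return u * weighted_price(p, a)

def equivalent_variation(p0, w0, p1, w1, a):
    """EV = e(p0, u1) - e(p0, u0): welfare change valued at the initial prices."""
    u0, u1 = indirect_utility(p0, w0, a), indirect_utility(p1, w1, a)
    return expenditure(p0, u1, a) - expenditure(p0, u0, a)

def compensating_variation(p0, w0, p1, w1, a):
    """CV = e(p1, u1) - e(p1, u0): welfare change valued at the new prices."""
    u0, u1 = indirect_utility(p0, w0, a), indirect_utility(p1, w1, a)
    return expenditure(p1, u1, a) - expenditure(p1, u0, a)

# Illustrative numbers (assumptions): the price of good 2 doubles, wealth is unchanged.
a = [1.0, 2.0]
p0, w0 = [3.0, 4.0], 22.0
p1, w1 = [3.0, 8.0], 22.0
ev = equivalent_variation(p0, w0, p1, w1, a)
cv = compensating_variation(p0, w0, p1, w1, a)
print("price increase: EV =", ev, " CV =", cv)

# Lump-sum tax T with unchanged prices: both measures should equal -T.
T = 5.0
ev_tax = equivalent_variation(p0, w0, p0, w0 - T, a)
cv_tax = compensating_variation(p0, w0, p0, w0 - T, a)
print("lump-sum tax:   EV =", ev_tax, " CV =", cv_tax)
```

With a price change the two measures differ because they value the same utility change at different price vectors; with a pure wealth change they coincide.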
DEADWEIGHT LOSS: Let the preference relation $\succsim$ be a continuous, locally nonsatiated, strictly convex
weak order on $\mathbb{R}^L_+$. Fix a price vector $p^0 \in \mathbb{R}^L_{++}$ and wealth $w > 0$. Suppose the government levies
a commodity tax $t > 0$ on the price of good $\ell$. Thus, the new price vector is $p^1 = p^0 + t e_\ell$, where
$e_\ell = (0, \ldots, 0, 1, 0, \ldots, 0)$ is the $\ell$-th standard basis vector of $\mathbb{R}^L$ with $\ell$-th coordinate 1 and all other
coordinates 0. The total tax revenue is $T = t\, x_\ell(p^1, w)$ and the equivalent variation of the tax is
\[ EV((p^0, w), (p^1, w)) = e(p^0, u^1) - w, \]
where $u^1 = v(p^1, w)$ as before. Alternatively, to raise the same amount, the government can levy a
lump-sum tax $T$ directly on the wealth of the consumer, keeping prices fixed, yielding an equivalent
variation $-T$.
The consumer is at least weakly better off under lump-sum taxation. Let $x^*$ solve the UMP under
commodity taxation. Then $x^* \in B(p^0 + t e_\ell, w)$, so $p^0 \cdot x^* + t x^*_\ell \le w$, i.e., $p^0 \cdot x^* \le w - t x^*_\ell = w - T$. So
$x^* \in B(p^0, w - T)$, i.e., $x^*$ is feasible in the UMP under lump-sum taxation: the consumer cannot be
worse off under lump-sum taxation than under commodity taxation.
In terms of utility, $v(p^0, w - T) \ge u^1$. Since $e$ is increasing in $u$, it follows that
$e(p^0, u^1) \le e(p^0, v(p^0, w - T)) = w - T$. Therefore, $e(p^0, u^1) - w \le -T$. The difference
\[ w - T - e(p^0, u^1) \ge 0 \]
is the deadweight loss of commodity taxation.
Similarly,
\[ CV((p^0, w), (p^1, w)) = \int_{p_\ell^1}^{p_\ell^0} h_\ell(p_\ell, p^0_{-\ell}, u^0)\, dp_\ell. \tag{22} \]
This means that the equivalent and compensating variation due to such a simple price change can be
represented by areas “to the left of” the Hicksian demand curve.
NORMAL GOODS: Suppose good $\ell$ is a normal good (i.e., its Walrasian demand is weakly increasing in income)
and that its price is decreased: $p^0_\ell > p^1_\ell$. We claim that $EV((p^0, w), (p^1, w)) \ge CV((p^0, w), (p^1, w))$.
To see this, write $u^0 = v(p^0, w)$ and $u^1 = v(p^1, w)$. Since $v$ is nonincreasing in $p_\ell$, $u^0 \le u^1$. Since $e$ is
increasing in $u$, this implies that $e(p_\ell, p^0_{-\ell}, u^0) \le e(p_\ell, p^0_{-\ell}, u^1)$ for all $p_\ell > 0$. Since good $\ell$ is normal
and $x_\ell(p, e(p, u)) = h_\ell(p, u)$, it follows that
\[ h_\ell(p_\ell, p^0_{-\ell}, u^0) = x_\ell(p_\ell, p^0_{-\ell}, e(p_\ell, p^0_{-\ell}, u^0)) \le x_\ell(p_\ell, p^0_{-\ell}, e(p_\ell, p^0_{-\ell}, u^1)) = h_\ell(p_\ell, p^0_{-\ell}, u^1) \]
for all $p_\ell > 0$. Combining this with (21) and (22), it follows that
\[ \begin{aligned}
EV((p^0, w), (p^1, w)) - CV((p^0, w), (p^1, w))
  &= \int_{p_\ell^1}^{p_\ell^0} h_\ell(p_\ell, p^0_{-\ell}, u^1)\, dp_\ell - \int_{p_\ell^1}^{p_\ell^0} h_\ell(p_\ell, p^0_{-\ell}, u^0)\, dp_\ell \\
  &= \int_{p_\ell^1}^{p_\ell^0} \left[ h_\ell(p_\ell, p^0_{-\ell}, u^1) - h_\ell(p_\ell, p^0_{-\ell}, u^0) \right] dp_\ell \\
  &\ge 0.
\end{aligned} \]
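The inequality EV ≥ CV for a normal good can be illustrated numerically. The sketch below is not from the notes: it assumes a two-good Cobb-Douglas utility function u(x) = x_1^a x_2^(1-a), for which good 1 is normal, uses the standard closed forms for its indirect utility and expenditure functions, and checks (22) with a simple midpoint rule. All names and numbers are hypothetical.

```python
import math

def unit_utility(p, a):
    """Utility per unit of wealth for Cobb-Douglas: v(p, w) = w * unit_utility(p, a)."""
    return (a / p[0]) ** a * ((1 - a) / p[1]) ** (1 - a)

def indirect_utility(p, w, a):
    return w * unit_utility(p, a)

def expenditure(p, u, a):
    return u / unit_utility(p, a)

def hicksian_demand_good1(p, u, a):
    """h_1 = de/dp_1, which equals a * e(p, u) / p_1 for Cobb-Douglas utility."""
    return a * expenditure(p, u, a) / p[0]

def integrate_midpoint(func, lower, upper, steps=10000):
    """Midpoint-rule approximation of the integral of func over [lower, upper]."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (k + 0.5) * width) for k in range(steps))

# Illustrative numbers (assumptions): the price of good 1 falls from 2 to 1.
a, w = 0.5, 10.0
p0, p1 = (2.0, 1.0), (1.0, 1.0)
u0, u1 = indirect_utility(p0, w, a), indirect_utility(p1, w, a)

ev = expenditure(p0, u1, a) - w   # e(p0, u1) - e(p0, u0), with e(p0, u0) = w
cv = w - expenditure(p1, u0, a)   # e(p1, u1) - e(p1, u0), with e(p1, u1) = w

# Area to the left of the Hicksian demand curve at utility level u0, as in (22).
cv_area = integrate_midpoint(
    lambda price: hicksian_demand_good1((price, p0[1]), u0, a), p1[0], p0[0])

print("EV =", ev, " CV =", cv, " area under h_1(., u0) =", cv_area)
```

In this example EV exceeds CV, and the integral of the Hicksian demand curve at the initial utility level reproduces CV up to discretization error.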
5 Choices of a producer: classical supply theory
5.1 Production sets
Having treated the demand side of the economy in detail, we now turn to the supply side. The supply
side consists of firms that use a technology to convert one set of commodities (inputs) to another
(outputs). Just as for consumers, it is assumed that firms take prices as given and that all commodities
are traded at the market at publicly quoted prices. Consider an economy with L ∈ N commodities.
The firm’s production can be described by a production vector or production plan $y = (y_1, \ldots, y_L) \in \mathbb{R}^L$
which gives the net amount produced of each of the L commodities. If $y_\ell < 0$, we say that good $\ell$ is used
as an input in the production plan y; if $y_\ell > 0$, we say that good $\ell$ is used as an output in y. For instance,
if L = 2, the production plan y = (−2, 6) indicates that two units of the first commodity are used as an
input to produce an output of 6 units of the second commodity. The production set of technologically
feasible production vectors is denoted by Y ⊂ RL . This general description allows that a commodity
is used as an input in some production vectors, but as an output in others. You may come across the
following special cases:
TRANSFORMATION FUNCTIONS: Sometimes the production set can conveniently be described using a
function $F : \mathbb{R}^L \to \mathbb{R}$ called the transformation function as follows:
\[ Y = \{ y \in \mathbb{R}^L : F(y) \le 0 \}. \]
Consider, for instance, a Cobb-Douglas production function $f : \mathbb{R}^2_+ \to \mathbb{R}$ given by $f(z) = z_1^\alpha z_2^\beta$, where
$\alpha, \beta > 0$. Then
\[ Y = \{ (-z_1, -z_2, q) \in \mathbb{R}^3 : q \le z_1^\alpha z_2^\beta \text{ and } z_1, z_2 \ge 0 \}. \]
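The Cobb-Douglas production set above can be turned into a simple membership test. The Python sketch below is an illustration only: the function names, the tolerance, and the numbers are assumptions, not part of the notes. It also shows how one can probe whether a feasible plan remains feasible after scaling.

```python
def cobb_douglas(z1, z2, alpha, beta):
    """Production function f(z) = z1**alpha * z2**beta."""
    return z1 ** alpha * z2 ** beta

def is_feasible(y, alpha, beta, tol=1e-12):
    """Check whether y = (-z1, -z2, q) lies in the Cobb-Douglas production set Y."""
    z1, z2, q = -y[0], -y[1], y[2]
    if z1 < 0 or z2 < 0:
        return False  # the first two goods can only be used as inputs
    return q <= cobb_douglas(z1, z2, alpha, beta) + tol

def scale(y, factor):
    """Scale a production plan by a nonnegative factor."""
    return tuple(factor * coordinate for coordinate in y)

# Illustrative plans (assumptions).
plan = (-4.0, -9.0, 6.0)
print(is_feasible(plan, 0.5, 0.5))                      # output 6 = sqrt(4) * sqrt(9)
print(is_feasible((-4.0, -9.0, 7.0), 0.5, 0.5))         # asks for too much output
print(is_feasible(scale(plan, 2.0), 0.5, 0.5))          # alpha + beta = 1
print(is_feasible(scale((-2.0, -3.0, 6.0), 0.5), 1.0, 1.0))  # alpha + beta = 2
```

With alpha + beta = 1 the doubled plan stays feasible, while with alpha = beta = 1 halving a plan on the frontier takes it outside Y.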
Nonincreasing returns to scale: if y ∈ Y and α ∈ [0, 1], then αy ∈ Y . This means that
feasible production plans can be scaled down.
Nondecreasing returns to scale: if y ∈ Y and α ≥ 1, then αy ∈ Y . This means that feasible
production plans can be scaled up.
Constant returns to scale (CRS): if y ∈ Y and α ≥ 0, then αy ∈ Y . This is the conjunction of
the previous two properties.
Additivity/free entry: if $y, y' \in Y$, then $y + y' \in Y$. If both $y$ and $y'$ are feasible, then it is
feasible to set up two independent plants, one producing $y$, the other $y'$, together yielding
$y + y'$.
Y is convex: if $y, y' \in Y$ and $\alpha \in [0, 1]$, then $\alpha y + (1 - \alpha) y' \in Y$.
Y is a convex cone: if $y, y' \in Y$ and $\alpha, \beta \ge 0$, then $\alpha y + \beta y' \in Y$.
One easily establishes relations between these properties. Possibility of inaction implies nonemptiness.
Nondecreasing and nonincreasing returns to scale imply constant returns to scale. Less trivial are:
Proposition 5.1
(b) Y is a convex cone if and only if Y is convex and has constant returns to scale.
(c) Y is a convex cone if and only if Y is additive and has nonincreasing returns to scale.
(d) If Y satisfies no free lunch and for all $x, y \in Y$ and $\alpha \in (0, 1)$, there is a $z \in Y$ with
$z \ge \alpha x + (1 - \alpha) y$, $z \neq \alpha x + (1 - \alpha) y$, then Y satisfies irreversibility.
show that Y is a convex cone, let $y, y' \in Y$ and $\alpha, \beta \ge 0$. By CRS, $2\alpha y \in Y$ and $2\beta y' \in Y$. By convexity,
$\tfrac{1}{2}(2\alpha y) + \tfrac{1}{2}(2\beta y') = \alpha y + \beta y' \in Y$.
(c): If Y is a convex cone, it is additive (take α = β = 1) and has nonincreasing returns to scale (similar to
the proof of CRS above). Conversely, assume that Y is additive and has nonincreasing returns to scale.
Let $y, y' \in Y$ and $\alpha, \beta \ge 0$. By additivity, $k y \in Y$ and $k y' \in Y$ for all $k \in \mathbb{N}$. Choose $k \in \mathbb{N}$ such that $\alpha/k \le 1$
and $\beta/k \le 1$. Since Y has nonincreasing returns to scale, $(\alpha/k) y \in Y$ and $(\beta/k) y' \in Y$. By additivity:
$\alpha y = k(\alpha/k) y \in Y$ and $\beta y' \in Y$. Again by additivity $\alpha y + \beta y' \in Y$.
(d): Let $y \in Y$, $y \neq 0$, and suppose $-y \in Y$. By assumption, there is a $z \in Y$ such that $z \ge \tfrac{1}{2} y + \tfrac{1}{2}(-y) = 0$,
$z \neq 0$, contradicting no free lunch.
Properties of the production set are related to properties of the production function:
Proposition 5.2
(a) Y has constant returns to scale if and only if f is homogeneous of degree one.
(b) Y is convex if and only if f is concave.
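Proof: (a): First, assume that Y satisfies constant returns to scale. We show that for each $z \in \mathbb{R}^{L-1}_+$ and
$\alpha > 0$: $\alpha f(z) \le f(\alpha z)$. So let $z \in \mathbb{R}^{L-1}_+$ and $\alpha > 0$.
• By definition of the production set Y, $(-\alpha z, \alpha f(z)) \in Y$ means that $\alpha f(z) \le f(\alpha z)$.
Applying this inequality to the input vector $\alpha z$ and the scalar $1/\alpha$ gives $(1/\alpha) f(\alpha z) \le f(z)$, i.e., the reverse
inequality $f(\alpha z) \le \alpha f(z)$. Using the above, it follows that if Y has CRS, then for each $z \in \mathbb{R}^{L-1}_+$ and each
$\alpha > 0$: $\alpha f(z) = f(\alpha z)$. So f is homogeneous of degree one.
Conversely, assume that f is homogeneous of degree one: for each $z \in \mathbb{R}^{L-1}_+$ and each $\alpha > 0$: $\alpha f(z) =
f(\alpha z)$. To show: Y has CRS, i.e., if $(-z, q) \in Y$ and $\alpha \ge 0$, then $(-\alpha z, \alpha q) \in Y$. This follows from the
assumption that $f(0, \ldots, 0) = 0$ if $\alpha = 0$. So let $(-z, q) \in Y$ and $\alpha > 0$.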
is convex. Multiplying the first L − 1 coordinates with −1 maintains convexity, so this is equivalent with
being convex.
max p·y
s.t. y ∈ Y.
The profit function π assigns to every price vector p ∈ RL++ the maximal profit
π(p) = max{p · y : y ∈ Y }.
The supply correspondence y(·) assigns to every price vector p ∈ RL++ the set of profit-maximizing
production vectors:
y(p) = {y ∈ Y : p · y = π(p)}.
As opposed to the utility maximization problem, which has a solution under mild conditions (like
continuity of the utility function), there may not be a solution to the PMP: profits may be unbounded.
In that case, we set π(p) = +∞. Indeed, we may have the following:
Proposition 5.3
Let Y ⊂ RL be nonempty and satisfy nondecreasing returns to scale. For each price vector p ∈ RL++ ,
either p · y ≤ 0 for all y ∈ Y , which means that no positive profit can be made, or π(p) = +∞.
Proof: Consider a price vector p ∈ RL++ . Suppose that p · y > 0 for some y ∈ Y . Since Y has nondecreas-
ing returns to scale, αy ∈ Y for all α ≥ 1, so p · (αy) = α(p · y) can be made arbitrarily large by letting α
go to infinity.
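The argument in this proof is easy to see in numbers. The following Python sketch uses a hypothetical single-input technology f(z) = z², which is an assumption made only for illustration: it satisfies nondecreasing returns to scale because q ≤ z² and α ≥ 1 imply αq ≤ (αz)². Scaling up a plan with positive profit keeps it feasible and makes profit grow without bound.

```python
def production(z):
    """Hypothetical technology with increasing returns: f(z) = z**2."""
    return z ** 2

def is_feasible(y):
    """y = (-z, q) is feasible if z >= 0 and q <= f(z)."""
    z, q = -y[0], y[1]
    return z >= 0 and q <= production(z)

def profit(p, y):
    return sum(price * quantity for price, quantity in zip(p, y))

def scale(y, factor):
    return tuple(factor * coordinate for coordinate in y)

p = (1.0, 1.0)    # illustrative price vector (assumption)
y = (-2.0, 4.0)   # feasible plan with profit p . y = 2 > 0

profits = []
for factor in (1, 10, 100, 1000):
    scaled = scale(y, factor)
    assert is_feasible(scaled)          # nondecreasing returns: alpha * y stays in Y
    profits.append(profit(p, scaled))
    print(factor, profits[-1])
```

Profit at the scaled plan equals α times the original profit, so no maximum exists at these prices.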
This makes the existence of solutions to the PMP a nontrivial issue. The following two results provide
sufficient conditions.
Proposition 5.4
closed,
bounded above: there is an r ∈ R such that y ` ≤ r for all y ∈ Y and all ` ∈ {1, . . . , L}.
Then the profit maximization problem has at least one solution for each price vector p ∈ RL++ .
Proof: Let p ∈ RL++ . By nonemptiness, there is a y 0 ∈ Y . A solution to the PMP must lie in the set
P = Y ∩ {y ∈ RL : p · y ≥ p · y 0 }.
P is closed: Y is closed by assumption and the second set in the intersection is closed, since it is the
upper contour set of a continuous function. The intersection of two closed sets is closed.
P is bounded: By assumption, the coordinates of vectors in P are bounded above by r . Moreover, all
coordinates are bounded from below as well: let y ∈ P and consider an arbitrary coordinate ` ∈ {1, . . . , L}.
Since p · y ≥ p · y 0 , it follows that
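\[ p_\ell y_\ell \ge p \cdot y^0 - \sum_{k \neq \ell} p_k y_k \ge p \cdot y^0 - \sum_{k \neq \ell} p_k r, \]
so, since $p_\ell > 0$, coordinate $y_\ell$ is bounded from below.
Hence, P is compact. Since we maximize the continuous function $y \mapsto p \cdot y$ over the compact set P, there
is at least one solution.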
The following result establishes existence of solutions to the profit maximization problem under
resource constraints.
Proposition 5.5
Exercise 5.1 This exercise guides you through the proof of Proposition 5.5.
To show that $Y' = Y \cap \{ y \in \mathbb{R}^L : y \ge -\omega \}$ is bounded, suppose it were not: there is a sequence $(y^n)_{n \in \mathbb{N}}$ of vectors in
$Y'$ whose increasing length $\|y^n\|$ diverges to infinity. Define $z^n = y^n / \|y^n\|$.
(b) Show that for $n \in \mathbb{N}$ large enough, $z^n$ lies in Y and satisfies $z^n + \omega / \|y^n\| \ge 0$.
(c) Show that $(z^n)_{n \in \mathbb{N}}$ has a convergent subsequence with limit $z \neq 0$ in Y.
(d) Combine this with (b) to derive a contradiction.
As $Y'$ is nonempty and compact and the profit function is continuous, a maximum exists!
Thus, whenever we talk about properties of the profit function and the supply correspondence, we
implicitly assume that the PMP has a solution, so that $y(p) \neq \emptyset$ and $\pi(p) < \infty$.
Proposition 5.6
(d) Hotelling’s lemma: Let $p \in \mathbb{R}^L_{++}$. If $y(p)$ consists of a single point $y$, then the profit function
is differentiable at $p$ and $\partial \pi(p) / \partial p_\ell = y_\ell$ for all goods $\ell = 1, \ldots, L$.
(e) Law of supply: for all $p, p' \in \mathbb{R}^L_{++}$ and all $y \in y(p)$ and $y' \in y(p')$:
\[ (p - p') \cdot (y - y') \ge 0. \]
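Second proof: we show that for all $p^1, p^2 \in \mathbb{R}^L_{++}$ and all $\alpha \in [0, 1]$: $\pi(\alpha p^1 + (1 - \alpha) p^2) \le \alpha \pi(p^1) + (1 - \alpha) \pi(p^2)$.
So let $p^1, p^2 \in \mathbb{R}^L_{++}$ and $\alpha \in [0, 1]$. Let $y \in y(\alpha p^1 + (1 - \alpha) p^2)$. Then $p^i \cdot y \le \pi(p^i)$ for both $i = 1, 2$,
so
\[ \pi(\alpha p^1 + (1 - \alpha) p^2) = \alpha p^1 \cdot y + (1 - \alpha) p^2 \cdot y \le \alpha \pi(p^1) + (1 - \alpha) \pi(p^2). \]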
(c): Let p ∈ RL++ . Then y(p) = Y ∩ {y ∈ RL : p · y = π(p)} is the intersection of Y and a hyperplane. Since
both are convex, so is y(p).
(d): We prove Hotelling’s lemma, assuming that $\pi$ is differentiable at $p$. By definition of the profit
function we know that for all $p' \in \mathbb{R}^L_{++}$: $p' \cdot y \le \pi(p')$, with equality if $p' = p$. So the function $h : \mathbb{R}^L_{++} \to \mathbb{R}$
with $h(p') = \pi(p') - p' \cdot y$ achieves its minimum at $p$. But then its partial derivatives at $p$ must be zero:
\[ \forall \ell = 1, \ldots, L : \quad \partial h(p) / \partial p_\ell = \partial \pi(p) / \partial p_\ell - y_\ell = 0, \]
Y = {y ∈ RL : F (y) ≤ 0},
where F is continuously differentiable and the price vector is p ∈ RL++ , a necessary first order condition
for y ∗ ∈ Y to be a solution to the PMP
max p · y
s.t. F (y) ≤ 0
is that there exists a Lagrange multiplier $\lambda \ge 0$ such that for each good $\ell = 1, \ldots, L$:
\[ p_\ell = \lambda \frac{\partial F(y^*)}{\partial y_\ell}. \tag{23} \]
If we divide the first order condition for good $\ell$ by that for good $k$, we find that for all pairs of goods
$\ell, k$:
\[ \frac{p_\ell}{p_k} = \frac{\partial F(y^*) / \partial y_\ell}{\partial F(y^*) / \partial y_k}, \]
i.e., in an optimal production plan $y^*$, the price ratio between two goods equals their so-called marginal
rate of transformation. If the set Y is convex, the first order conditions in (23) are also sufficient for a
solution to the PMP.
In the single-output case, assume the production function f is differentiable and that the price of
input ` = 1, . . . , L − 1 equals w ` > 0 and the price of the output equals p > 0.
Remark 1 I don’t know the reason for this sudden change of notation from a price vector p to an
output-input price vector (p, w). Do not confuse the vector of input prices w with the wealth level w of
the consumer. This choice of notation is unfortunate, but widespread in economics. /
The PMP can be rewritten as
\[ \max\ p f(z) - w \cdot z \quad \text{s.t. } z \in \mathbb{R}^{L-1}_+. \]
If $z^*$ is optimal, the Kuhn-Tucker conditions imply the existence of Lagrange multipliers $\lambda_\ell \le 0$ for each
of the conditions $z_\ell \ge 0$ such that for all inputs $\ell = 1, \ldots, L - 1$:
\[ p \frac{\partial f(z^*)}{\partial z_\ell} - w_\ell = \lambda_\ell \quad \text{and} \quad \lambda_\ell z^*_\ell = 0. \tag{24} \]
Assuming an interior solution ($z^*_\ell > 0$ for all $\ell$), this implies that $\lambda_\ell = 0$ for all $\ell$, so the first order
conditions become
\[ \forall \ell = 1, \ldots, L - 1 : \quad p \frac{\partial f(z^*)}{\partial z_\ell} = w_\ell, \]
so that for all inputs $\ell, k$:
\[ \frac{w_\ell}{w_k} = \frac{\partial f(z^*) / \partial z_\ell}{\partial f(z^*) / \partial z_k}, \tag{25} \]
which has the interpretation that the price ratio between two inputs has to equal their so-called marginal
rate of technical substitution. Again, if the set Y is convex, the first order conditions in (24) are also
sufficient for a solution to the PMP.
\[ \min\ w \cdot z \quad \text{s.t. } z \in \mathbb{R}^{L-1}_+,\ f(z) \ge q. \]
The conditional factor demand correspondence specifies the set of input vectors solving the CMP, the
cost function c(w, q) indicates its value:
\[ \min\ p \cdot x \quad \text{s.t. } x \in \mathbb{R}^L_+,\ u(x) \ge u \]
are identical, up to a relabeling of the involved functions. Therefore, rewriting Propositions 4.3, 4.4, and
4.5 provides a long list of properties for conditional factor demand and the cost function.
If the production function f is continuously differentiable, the Kuhn-Tucker conditions can be used
to show that at a solution $z^*$ of the CMP, there must be a Lagrange multiplier $\lambda \ge 0$ associated with the
condition $q - f(z) \le 0$ and Lagrange multipliers $\lambda_\ell \ge 0$ associated with the conditions $-z_\ell \le 0$ such that
for all $\ell = 1, \ldots, L - 1$:
\[ w_\ell = \lambda \frac{\partial f(z^*)}{\partial z_\ell} + \lambda_\ell \quad \text{and} \quad \lambda_\ell z^*_\ell = 0. \]
If the solution uses positive amounts of all inputs ($z^*_\ell > 0$ for all $\ell$), this implies that $\lambda_\ell = 0$ for all $\ell$, so
\[ w_\ell = \lambda \frac{\partial f(z^*)}{\partial z_\ell} \]
for all $\ell$ and consequently
\[ \frac{w_\ell}{w_k} = \frac{\partial f(z^*) / \partial z_\ell}{\partial f(z^*) / \partial z_k}, \]
as in (25)!
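The interior first order conditions of the CMP can be checked on a concrete technology. The sketch below assumes a two-input Cobb-Douglas production function f(z) = z_1^α z_2^β, which is chosen here only for illustration, solves the conditions in closed form, and then verifies that the input price ratio equals the ratio of marginal products and that other input combinations producing the same output are not cheaper. Function names and numbers are hypothetical.

```python
import math

def production(z, alpha, beta):
    return z[0] ** alpha * z[1] ** beta

def marginal_products(z, alpha, beta):
    q = production(z, alpha, beta)
    return (alpha * q / z[0], beta * q / z[1])

def conditional_factor_demand(w, q, alpha, beta):
    """Interior solution of: min w1*z1 + w2*z2 subject to z1**alpha * z2**beta >= q."""
    ratio = (beta * w[0]) / (alpha * w[1])   # z2 / z1 implied by w1/w2 = MP1/MP2
    z1 = (q / ratio ** beta) ** (1.0 / (alpha + beta))
    return (z1, ratio * z1)

def cost_on_isoquant(z1, w, q, alpha, beta):
    """Cost of producing exactly q when the first input is fixed at z1 > 0."""
    z2 = (q / z1 ** alpha) ** (1.0 / beta)
    return w[0] * z1 + w[1] * z2

# Illustrative numbers (assumptions).
alpha, beta = 0.5, 0.5
w, q = (1.0, 4.0), 2.0

z_star = conditional_factor_demand(w, q, alpha, beta)
mp1, mp2 = marginal_products(z_star, alpha, beta)
min_cost = w[0] * z_star[0] + w[1] * z_star[1]

print("z* =", z_star, " cost =", min_cost)
print("w1/w2 =", w[0] / w[1], " MP1/MP2 =", mp1 / mp2)
print("multiplier from input 1:", w[0] / mp1, " from input 2:", w[1] / mp2)
```

Both inputs imply the same value of the multiplier λ = w_ℓ / (∂f/∂z_ℓ), which is what the first order conditions require at an interior solution.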
The set of solutions is commonly denoted as y(p, w) and the maximal profit as π(p, w). In a solution
(z, q), positivity of the output price (p > 0) implies that q = f (z), otherwise the profit can be increased:
pq − w · z < p f (z) − w · z.
Consequently, the PMP simplifies to
\[ \max_{z \in \mathbb{R}^{L-1}_+}\ p f(z) - w \cdot z. \]
Moreover, production has to be as cheap as possible, so there is a link with the CMP:
Proposition 5.7
(c) If one of the problems (P1) and (P2) has a solution, so does the other and the corresponding
maximum values coincide:
Exercise 5.2 Prove Proposition 5.7.
The PMP as formulated in (P2) is particularly easy: given the cost function, the PMP reduces to a
single-variable maximization problem. In practice, this is often the easiest way to solve the PMP. Under
suitable differentiability assumptions, the necessary Kuhn-Tucker condition at an optimum q ∗ is that
there exists a Lagrange multiplier λ ≥ 0 associated with the condition −q ≤ 0 such that:
\[ p - \frac{\partial c(w, q^*)}{\partial q} = -\lambda \quad \text{and} \quad -\lambda q^* = 0. \]
Assuming q ∗ > 0, this means that λ = 0 and hence that price equals marginal costs at a profit maximizing
quantity. If the cost function is convex in q, this condition is also sufficient.
EXAMPLE: SOME CALCULATIONS IN A SINGLE-OUTPUT ECONOMY: Consider a technology using a single
input to produce a single output via the production function $f : \mathbb{R}_+ \to \mathbb{R}$ with $f(z) = \sqrt{z}$ for all $z \ge 0$.
The production set is
\[ Y = \{ (-z, q) \in \mathbb{R}^2 : q \le f(z),\ z \ge 0 \} = \{ y \in \mathbb{R}^2 : y_1 \le 0,\ y_2 \le \sqrt{-y_1} \}. \]
Assume that the input price is $w > 0$ and the output price is $p > 0$. The profit maximization problem
(P1) becomes
\[ \max_{z \ge 0}\ p \sqrt{z} - w z. \]
At $z = 0$, the profit is zero. At an interior solution $z^* > 0$, the following first order condition must be
satisfied:
\[ \frac{p}{2 \sqrt{z^*}} - w = 0, \]
so $z^* = \left( \frac{p}{2w} \right)^2$, yielding output $\sqrt{z^*} = \frac{p}{2w}$ and profit $\frac{p^2}{2w} - \frac{p^2}{4w} = \frac{p^2}{4w} > 0$. Conclude that the supply function
is
\[ y(p, w) = \left( -\left( \frac{p}{2w} \right)^2, \frac{p}{2w} \right) \in Y \tag{26} \]
and the profit function $\pi(p, w) = \frac{p^2}{4w}$. The cost minimization problem for production level $q$ is
\[ \min\ w z \quad \text{s.t. } z \ge 0,\ \sqrt{z} \ge q. \]
At an optimum $z^*$, it is clear that $\sqrt{z^*} = q$: no inputs are wasted. Hence the conditional factor demand
is $z^* = z(w, q) = q^2$ and the cost function is $c(w, q) = w q^2$. This allows us to rewrite the profit
maximization problem as in (P2):
\[ \max_{q \ge 0}\ p q - w q^2. \]
Solving this optimization problem yields an optimal output quantity $q^* = \frac{p}{2w}$ as in (26).
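The calculations in this example are easy to verify numerically. The Python sketch below is an illustration with hypothetical function names and arbitrarily chosen prices: it compares the profit obtained directly from the supply function with the value of the cost-based formulation, and it checks Hotelling’s lemma (Proposition 5.6(d)) by finite differences, using the price vector (w, p) for the pair (input, output).

```python
import math

def optimal_input(p, w):
    return (p / (2 * w)) ** 2

def supply(p, w):
    """Supply function (26): net input use and output at prices (p, w)."""
    z = optimal_input(p, w)
    return (-z, math.sqrt(z))

def profit_function(p, w):
    return p ** 2 / (4 * w)

def cost(w, q):
    return w * q ** 2

def central_difference(func, x, step=1e-6):
    return (func(x + step) - func(x - step)) / (2 * step)

p, w = 4.0, 1.0   # illustrative prices (assumptions)
y = supply(p, w)

# Profit computed three ways: from the plan, from the cost function, from pi.
direct_profit = w * y[0] + p * y[1]
q_star = p / (2 * w)
cost_based_profit = p * q_star - cost(w, q_star)
print(direct_profit, cost_based_profit, profit_function(p, w))

# Hotelling's lemma: the price derivatives of pi recover the supply vector.
dpi_dp = central_difference(lambda price: profit_function(price, w), p)
dpi_dw = central_difference(lambda wage: profit_function(p, wage), w)
print("d pi / d p =", dpi_dp, " output      =", y[1])
print("d pi / d w =", dpi_dw, " net input   =", y[0])
```

The derivative with respect to the input price is negative, matching the sign convention that inputs enter the production plan with a minus sign.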
5.7 Efficiency
A production plan $y \in Y$ is efficient if there is no $y' \in Y$ with $y' \ge y$ and $y' \neq y$. In words, there is no
different production plan producing at least as much output while using at most as much input. There
is a close connection between profit maximization and efficiency:
Proposition 5.8
(b) If Y is convex, then for every efficient y ∗ ∈ Y there is a nonzero price vector p ∈ RL+ such
that y ∗ is profit maximizing at prices p.
Proof: (a): Suppose $y$ is not efficient: there is a $y' \in Y$ with $y' \ge y$, $y' \neq y$. Then $p \cdot y' > p \cdot y$: the profit
from $y'$ exceeds that from the profit-maximizing $y$, a contradiction.
(b): Let $Z = \{ y' \in \mathbb{R}^L : y' > y^* \}$. Since $y^*$ is efficient: $Z \cap Y = \emptyset$. By the separating hyperplane theorem,
there is a vector $p \in \mathbb{R}^L$, $p \neq 0$ such that $p \cdot y' \ge p \cdot y$ for all $y' \in Z$ and $y \in Y$. Two things remain to be
shown:
Firstly, that $p \in \mathbb{R}^L_+$. Suppose, to the contrary, that $p_\ell < 0$ for some coordinate $\ell$. Then $p \cdot y' < p \cdot y^*$
for some $y' \in Z$ with $y'_\ell - y^*_\ell > 0$ sufficiently large. A contradiction.
Secondly, that $y^*$ is profit maximizing at prices $p$. Let $y \in Y$. To show: $p \cdot y^* \ge p \cdot y$. For each $n \in \mathbb{N}$,
define the vector $y^n = (y^*_1 + 1/n, \ldots, y^*_L + 1/n) \in Z$. Then $p \cdot y^n \ge p \cdot y$. Since $y^n \to y^*$, it follows that also
in the limit $p \cdot y^* \ge p \cdot y$.
Exercise 5.3 This exercise investigates the need for the different assumptions in Proposition 5.8.
(a) Give an example of a production set $Y \subset \mathbb{R}^2$, a point $y \in Y$ and price vector $p \in \mathbb{R}^2_+$, $p \neq (0, 0)$, such that $y$
maximizes profits at prices $p$, but $y$ is not efficient.
(b) Give an example of a convex production set Y ⊂ R2 and a point y ∈ Y which is efficient but not profit
maximizing for any p ∈ R2++ .
(c) Give an example of a production set Y ⊂ R2 which is not convex and a point y ∈ Y which is efficient, but not
profit maximizing for any nonzero price vector p ∈ R2+ .
6 General equilibrium
6.1 What is an equilibrium?
Earlier, we studied how consumers choose optimal consumption bundles given their preferences,
wealth, and the price vector and how firms choose optimal production plans given their technology
and the price vector. Are there price vectors where all these optimal choices are actually feasible?
You don’t, for instance, want people demanding ten apples if there only are five. Such a price vector
and the corresponding demand and supply constitute a Walrasian equilibrium. Its definition follows
the central idea behind any economic equilibrium concept with decent micro-foundations — it is a
description of:
something feasible, where
each involved agent — taking as given those things beyond his control — makes a choice that
makes him as happy as possible.
Notice, in particular, that it involves no statements like “markets clear” or “supply equals demand”.
Economic agents — quite frankly — couldn’t care less: they have their preferences, some constraints,
and all they wish for is to choose optimally. Nevertheless, some people become very nervous when one
doesn’t assume that markets clear (“excess demand equal to zero”) in equilibrium. I want to take this
concern seriously, so let me briefly explain this.
Market clearing is an assumption about aggregate behavior that is not in line with the microeco-
nomic idea behind equilibrium that combines feasibility with optimal behavior of individual
agents; Kreps (1990, p. 6), for instance, states:
Sometimes, it is downright silly to insist on market clearing. Suppose agents in an economy are
endowed with a positive quantity of a commodity that is undesirable and of no use whatsoever
as an input. Why would you insist on supply and demand for this commodity being equal? What
are you going to do? Stuff the good down people’s throat?
Or what if agents only want to consume gloves in matching pairs? If there happen to be more left-
than right-hand gloves, simply leave excess gloves to gather dust somewhere.
Consequently, market clearing is often not a part of the definition of equilibrium. See, for instance,
Arrow and Hahn (1971, p. 107), Kreps (1990, p. 190), Mas-Colell (1985, p. 169), and Varian (1992,
p. 316).
Market clearing in equilibrium, however, turns out to be a consequence of commonly imposed
restrictions. You may find Exercise 6.2 helpful.
To illustrate the main ideas behind general equilibrium analysis, we start by studying a pure exchange
economy where there is no production, but where consumers are initially endowed with certain
amounts of the different goods. This entails no real loss of generality: our main tool will be to study
excess demand, regardless of whether it involves producers or not. Walrasian equilibrium is defined
and shown to exist in a particularly simple case. Also, we study some of its welfare properties. After
introducing producers into the model, a more general existence result is provided in Section 6.4.
H is a nonempty, finite set of consumers/households,
and each consumer h ∈ H has
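a weak order $\succsim^h$ over $\mathbb{R}^L_+$, where $L \in \mathbb{N}$,
an initial endowment $\omega^h \in \mathbb{R}^L_+$ of the L commodities.
The total endowment is denoted $\omega = \sum_{h \in H} \omega^h$. An allocation $x = (x^h)_{h \in H}$ assigns to each consumer
$h \in H$ a consumption bundle $x^h \in \mathbb{R}^L_+$. An allocation is
feasible if $\sum_{h \in H} x^h \le \omega$,
nonwasteful if $\sum_{h \in H} x^h = \omega$.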
If the price vector is $p$, the initial endowment of consumer $h \in H$ is worth $p \cdot \omega^h$, so consumer $h$ can
afford bundles $x \in \mathbb{R}^L_+$ with $p \cdot x \le p \cdot \omega^h$, i.e., consumer $h$’s budget set is $B^h(p, p \cdot \omega^h)$. Let $x^h(\cdot)$ denote
this consumer’s demand correspondence.
The basic idea behind equilibria (feasibility and optimal choices) leads to the following definition.
A Walrasian equilibrium of a pure exchange economy $E = (\succsim^h, \omega^h)_{h \in H}$ is a pair $(p, x)$, where:
$p \in \mathbb{R}^L_+$, $p \neq (0, \ldots, 0)$, is a price vector,
$x = (x^h)_{h \in H}$ is a feasible allocation,
for each consumer $h \in H$, $x^h$ is a most preferred bundle at prices $p$, i.e., $x^h \in x^h(p, p \cdot \omega^h)$.
Properties of Walrasian equilibrium are often studied using the excess demand correspondence $z$
assigning to each price vector $p$ the difference between total demand for and the total availability of
the commodities:
\[ z(p) = \sum_{h \in H} \bigl( x^h(p, p \cdot \omega^h) - \{\omega^h\} \bigr) = \sum_{h \in H} x^h(p, p \cdot \omega^h) - \{\omega\}. \]
By definition of Walrasian equilibrium, $p$ is an equilibrium price vector if and only if there is a corresponding
excess demand vector $z \in z(p)$ where no commodity has positive excess demand, i.e., a
$z \in z(p) \cap \mathbb{R}^L_-$.
Budget sets are homogeneous of degree zero in prices: for all $\alpha > 0$,
\[ B^h(\alpha p, \alpha p \cdot \omega^h) = B^h(p, p \cdot \omega^h). \]
Therefore, if $p^*$ is an equilibrium price vector, then so is $\alpha p^*$ for all $\alpha > 0$. In the computation of
Walrasian equilibria, this allows some simplifications, for instance by assuming that the equilibrium
price of one of the goods is equal to one, or that the sum of the prices is equal to one, i.e., they lie in
the unit simplex $\Delta = \{p \in \mathbb{R}^L_+ : \sum_{\ell=1}^L p_\ell = 1\}$ (also denoted $\Delta_L$ if we want to stress the dimension of the
vectors).
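The effect of rescaling prices can be checked numerically. The sketch below is only an illustration under an assumption that is not part of these notes: a single consumer with Cobb-Douglas utility $u(x_1, x_2) = x_1^{\alpha} x_2^{1-\alpha}$, whose demand is $x_1 = \alpha m / p_1$ and $x_2 = (1-\alpha) m / p_2$ at income $m = p \cdot \omega$. The function names and the numbers are hypothetical.

```python
from fractions import Fraction

def cobb_douglas_demand(alpha, prices, endowment):
    """Demand of a consumer with u(x1, x2) = x1**alpha * x2**(1 - alpha) (assumed form)."""
    income = sum(p * w for p, w in zip(prices, endowment))
    return (alpha * income / prices[0], (1 - alpha) * income / prices[1])

def normalize_to_simplex(prices):
    """Rescale a nonzero, nonnegative price vector so that its coordinates sum to one."""
    total = sum(prices)
    return tuple(p / total for p in prices)

alpha = Fraction(1, 3)                      # hypothetical preference parameter
endowment = (Fraction(2), Fraction(1))      # hypothetical endowment
p = (Fraction(3), Fraction(5))              # arbitrary positive prices
scaled = tuple(7 * q for q in p)            # the same prices multiplied by 7
on_simplex = normalize_to_simplex(p)        # the same prices, normalized

demand_p = cobb_douglas_demand(alpha, p, endowment)
demand_scaled = cobb_douglas_demand(alpha, scaled, endowment)
demand_simplex = cobb_douglas_demand(alpha, on_simplex, endowment)
```

All three demand vectors coincide, which is why nothing is lost by restricting attention to prices in the unit simplex.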
To illustrate the idea behind existence proofs of Walrasian equilibria, the next result makes a lot of
simplifications.
Proposition 6.1
Assume that excess demand $z$:
is a well-defined function (rather than a correspondence) $z : \Delta \to \mathbb{R}^L$,
is continuous,
satisfies Walras' Law: $p \cdot z(p) = 0$ for all $p \in \Delta$.
Then there is a price vector $p \in \Delta$ with $z(p) \le (0, \dots, 0)$.
Proof: The idea is to change prices by making goods in excess demand relatively more expensive and
hope that demand for them goes down. If there are no more changes, there is no excess demand, and
we have found an equilibrium price vector. Define $f : \Delta \to \Delta$ by
\[ f(p) = \left( \frac{p_i + \max\{z_i(p), 0\}}{1 + \sum_{j=1}^L \max\{z_j(p), 0\}} \right)_{i = 1, \dots, L}. \]
Function f increases the price of commodities for which excess demand is positive and then rescales
the resulting price vector so that its coordinates add up to one. As the composition of continuous
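functions, $f$ is continuous. By Brouwer's fixed point theorem, there is a $p \in \Delta$ with $f(p) = p$. We show
that $z(p) \le (0, \dots, 0)$. Since $f(p) = p$, each coordinate satisfies $p_i \sum_{j=1}^L \max\{z_j(p), 0\} = \max\{z_i(p), 0\}$.
Multiplying by $z_i(p)$ and summing over $i$ gives
\[ \Bigl( \sum_{j=1}^L \max\{z_j(p), 0\} \Bigr) \, p \cdot z(p) = \sum_{i=1}^L \max\{z_i(p), 0\} z_i(p). \]
By Walras' Law, the left-hand side is zero. Therefore,
\[ \sum_{i=1}^L \max\{z_i(p), 0\} z_i(p) = 0. \tag{27} \]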
Notice:
\[ \max\{z_i(p), 0\} z_i(p) = \begin{cases} 0 & \text{if } z_i(p) \le 0, \\ z_i(p)^2 > 0 & \text{if } z_i(p) > 0. \end{cases} \]
So (27) is a sum of nonnegative terms. The only way in which it can be zero is if all its terms are zero,
i.e., if $z_i(p) \le 0$ for all $i$, as we had to show.
The price vector $p \in \Delta$ with $z(p) \le (0, \dots, 0)$ together with the allocation $x = (x^h(p, p \cdot \omega^h))_{h \in H}$ is a
Walrasian equilibrium. Using $z(p) \le (0, \dots, 0)$ and Walras' Law ($p \cdot z(p) = 0$), it follows that excess
demand is zero for commodities $i$ with $p_i > 0$: a good can be in excess supply in equilibrium, but only
if its price equals zero. The desired properties of excess demand are usually derived from conditions on
consumer preferences, using Proposition 4.1.
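A small numerical sketch may help to see the objects in the proof at work. Everything specific here is an assumption made for illustration and not taken from these notes: two goods, two consumers with Cobb-Douglas preferences (expenditure shares `alphas`), and hypothetical endowments. Brouwer's theorem only guarantees that a fixed point of $f$ exists; it does not say that iterating $f$ converges. The sketch therefore finds the price with $z_1(p) = 0$ by bisection, which works in this example because $z_1$ is decreasing in $p_1$, and then checks that this price is a fixed point of $f$.

```python
def excess_demand(p, alphas, endowments):
    """Aggregate excess demand z(p) for Cobb-Douglas consumers (assumed preferences)."""
    z = [0.0, 0.0]
    for alpha, omega in zip(alphas, endowments):
        income = p[0] * omega[0] + p[1] * omega[1]
        z[0] += alpha * income / p[0] - omega[0]
        z[1] += (1 - alpha) * income / p[1] - omega[1]
    return z

def price_adjustment(p, z):
    """The map f from the proof: raise prices of goods in excess demand, then rescale."""
    bumps = [max(zi, 0.0) for zi in z]
    scale = 1.0 + sum(bumps)
    return [(pi + bi) / scale for pi, bi in zip(p, bumps)]

def find_equilibrium_price(alphas, endowments, tol=1e-12):
    """Bisection on p1 in (0, 1), with p2 = 1 - p1, for the root of z_1."""
    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        z = excess_demand([mid, 1.0 - mid], alphas, endowments)
        if z[0] > 0:      # good 1 in excess demand: its price must rise
            lo = mid
        else:
            hi = mid
    p1 = 0.5 * (lo + hi)
    return [p1, 1.0 - p1]

alphas = [0.3, 0.6]                          # hypothetical expenditure shares on good 1
endowments = [(1.0, 0.0), (0.0, 1.0)]        # hypothetical endowments
p_star = find_equilibrium_price(alphas, endowments)
z_star = excess_demand(p_star, alphas, endowments)
f_star = price_adjustment(p_star, z_star)
```

In this example the market for good 1 clears when $p_2 / p_1 = 7/6$, so the normalized equilibrium price is $(6/13, 7/13)$; at that price both coordinates of `z_star` are numerically zero and `f_star` agrees with `p_star`.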
A feasible allocation $x$ is:
Pareto dominated if there is another feasible allocation $\hat{x}$ with $\hat{x}^h \succsim_h x^h$ for all $h \in H$ and
$\hat{x}^h \succ_h x^h$ for some $h \in H$, i.e., if all consumers are at least as well off in $\hat{x}$ as in $x$ and at least
one of them is strictly better off.
Pareto optimal if it is not Pareto dominated.
Call a nonempty collection S ⊆ H of consumers a coalition. Coalition S can improve upon a feasible
allocation $x$ if there are commodity bundles $\hat{x}^h$ for all $h \in S$ such that
these bundles simply redistribute initial endowments: $\sum_{h \in S} \hat{x}^h = \sum_{h \in S} \omega^h$,
Proposition 6.2
If (p, x) is a Walrasian equilibrium of E , then x lies in the core.
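Proof: Suppose coalition $S \subseteq H$ can improve upon $x$ via commodity bundles $(\hat{x}^h)_{h \in S}$. Then $\hat{x}^h \succ_h x^h$
for each $h \in S$. By definition, $x^h$ is a most preferred bundle at prices $p$, so $\hat{x}^h$ cannot lie in the budget set
$B^h(p, p \cdot \omega^h)$, i.e., $p \cdot \hat{x}^h > p \cdot \omega^h$. Summing over all $h \in S$ gives $p \cdot \sum_{h \in S} \hat{x}^h > p \cdot \sum_{h \in S} \omega^h$. This contradicts
$\sum_{h \in S} \hat{x}^h = \sum_{h \in S} \omega^h$.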
Proof: Suppose $x$ is Pareto dominated by feasible allocation $\hat{x}$: $\hat{x}^h \succsim_h x^h$ for all $h \in H$, $\hat{x}^k \succ_k x^k$ for some
$k \in H$. As $x^k$ is most preferred in $k$'s budget set, $\hat{x}^k$ isn't affordable: $p \cdot \hat{x}^k > p \cdot \omega^k$. By local nonsatiation,
$p \cdot \hat{x}^h \ge p \cdot \omega^h$ for all $h \in H$. (Otherwise, if $p \cdot \hat{x}^h < p \cdot \omega^h$ for some $h$, local nonsatiation implies that
$h$'s budget set contains a strictly better bundle $\tilde{x}^h$. But $\tilde{x}^h \succ_h \hat{x}^h \succsim_h x^h$ contradicts that $x^h$ is $h$'s most
preferred bundle.) So $p \cdot \sum_{h \in H} \hat{x}^h > p \cdot \sum_{h \in H} \omega^h$, contradicting feasibility of $\hat{x}$: $\sum_{h \in H} \hat{x}^h \le \sum_{h \in H} \omega^h$.
As a partial converse to the previous result, some additional assumptions guarantee that anything that
is Pareto optimal can be sustained as a Walrasian equilibrium allocation — at least if initial endowments
can somehow be redistributed.
Proof: By assumption, the resulting pure exchange economy has a Walrasian equilibrium $(\hat{p}, \hat{x})$. For
each $h \in H$, $\hat{x}^h$ is optimal and $x^h$ is feasible in the budget set $B^h(\hat{p}, \hat{p} \cdot x^h)$, so $\hat{x}^h \succsim_h x^h$. By Pareto
optimality of $x$, none of these preferences can be strict, so $\hat{x}^h \sim_h x^h$ for all $h \in H$. To see that $\hat{x}^h = x^h$
for all $h \in H$, suppose there is an $h \in H$ with $\hat{x}^h \ne x^h$. Consumer $h$ can afford $(\hat{x}^h + x^h)/2$. By strict
convexity of preferences, this bundle is strictly preferred to $\hat{x}^h$, contradicting that $\hat{x}^h$ is an optimal
bundle for the consumer in the Walrasian equilibrium.
where:
H is a nonempty, finite set of consumers/households, F a nonempty, finite set of firms,
each firm $f \in F$ has a production set $Y^f \subset \mathbb{R}^L$, where $L \in \mathbb{N}$,
and each consumer $h \in H$ has
a weak order $\succsim_h$ over $\mathbb{R}^L_+$,
an initial endowment $\omega^h \in \mathbb{R}^L_+$ of the $L$ commodities,
a claim to a share $\theta^{hf} \in [0, 1]$ of the profit of firm $f \in F$ (where $\sum_{h \in H} \theta^{hf} = 1$ for all $f \in F$).
An allocation $(x, y) = ((x^h)_{h \in H}, (y^f)_{f \in F})$ assigns to each consumer $h \in H$ a commodity bundle $x^h \in \mathbb{R}^L_+$
and to each firm $f \in F$ a production plan $y^f \in Y^f$. Allocation $(x, y)$ is feasible if
\[ \sum_{h \in H} x^h \le \sum_{h \in H} \omega^h + \sum_{f \in F} y^f. \]
If the price vector is $p$ and firms decide on production plans $(y^f)_{f \in F}$, consumer $h \in H$ has budget set
\[ \Bigl\{ x \in \mathbb{R}^L_+ : p \cdot x \le p \cdot \Bigl( \omega^h + \sum_{f \in F} \theta^{hf} y^f \Bigr) \Bigr\}, \]
because the initial endowment is worth $p \cdot \omega^h$ and $h$ receives share $\theta^{hf}$ of the profit $p \cdot y^f$ of firm $f \in F$.
Let $x^h(\cdot)$ denote the demand correspondence of consumer $h \in H$, $y^f(\cdot)$ the supply correspondence
of firm $f \in F$, and $\pi^f(\cdot)$ its profit function. The basic idea behind equilibria (feasibility and optimal
choices) leads to the following definition. A Walrasian equilibrium of a private ownership economy $E$
is a triple $(p, x, y)$, where
$p \in \mathbb{R}^L_+$, $p \ne (0, \dots, 0)$, is a price vector,
$(x, y) = ((x^h)_{h \in H}, (y^f)_{f \in F})$ is a feasible allocation,
for each consumer $h \in H$, $x^h$ is a most preferred bundle at prices $p$:
\[ x^h \in x^h\Bigl( p, \, p \cdot \Bigl( \omega^h + \sum_{f \in F} \theta^{hf} y^f \Bigr) \Bigr), \]
and the goal is to find a price vector $p$ where $z(p) \cap \mathbb{R}^L_- \ne \emptyset$. The following result (Debreu, 1959, Section
5.6) establishes existence of such a price vector in the unit simplex $\Delta$.
Proposition 6.5
Assume that excess demand z:
takes values in some convex, compact set $Z \subset \mathbb{R}^L$: $z(p) \subseteq Z$ for all $p \in \Delta$,
Then there is a price vector $p \in \Delta$ with $z(p) \cap \mathbb{R}^L_- \ne \emptyset$.
Proof: Once again, the idea is to make goods with large excess demand expensive in the hope of
decreasing it. This is achieved by maximizing, for a given excess demand vector $z$, the expression $p \cdot z$,
which requires putting all weight of $p \in \Delta$ on the largest coordinate(s) of $z$. Define the correspondence $\mu$
from $Z$ to $\Delta$ by
\[ \mu(z) = \{ p \in \Delta : p \cdot z = \max_{p' \in \Delta} p' \cdot z \}. \]
is nonempty-valued, convex-valued, and has a closed graph because µ and z have these properties.
By Kakutani's fixed point theorem, there is a $(p, z) \in \Delta \times Z$ with $(p, z) \in \varphi(p, z) = \mu(z) \times z(p)$. As
$z \in z(p)$, the weak Walras' Law implies that $p \cdot z \le 0$. As $p \in \mu(z)$, $p \cdot z \ge p' \cdot z$ for all $p' \in \Delta$. For each
$\ell \in \{1, \dots, L\}$, taking $p' = e_\ell \in \Delta$ gives that $z_\ell = p' \cdot z \le p \cdot z \le 0$, so $z \le (0, \dots, 0)$.
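The claim that maximizing $p \cdot z$ over the simplex puts all weight on the largest coordinate(s) of $z$ is easy to check by hand or by machine. The following sketch lists the vertices of the simplex that belong to $\mu(z)$; the set $\mu(z)$ itself is their convex hull. The function names and the vector `z` are hypothetical and chosen only for illustration.

```python
def price_best_responses(z, tol=1e-12):
    """Vertices e_l of the unit simplex that maximize p . z (extreme points of mu(z))."""
    top = max(z)
    vertices = []
    for l, zl in enumerate(z):
        if top - zl <= tol:
            vertices.append([1.0 if j == l else 0.0 for j in range(len(z))])
    return vertices

def dot(p, z):
    """Inner product p . z."""
    return sum(pi * zi for pi, zi in zip(p, z))

z = [-1.0, 2.0, 2.0]                 # hypothetical excess demand vector
vertices = price_best_responses(z)   # unit vectors on goods 2 and 3
mixture = [0.0, 0.25, 0.75]          # a convex combination of these vertices
uniform = [1.0 / 3.0] * 3            # spreads weight over all goods
```

Here `dot(mixture, z)` equals the maximum `max(z)`, while the uniform price vector, which puts weight on the good in excess supply, yields a strictly smaller value.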
The trick, of course, is to derive the desired properties of the excess demand correspondence by
imposing properties on the components of the private ownership economy E . Given the results of
Sections 4 and 5, most of them should not come as a surprise. Only the first is somewhat complicated:
what allows us to restrict attention to such a convex, compact set Z ? Convexity of Z is not the issue: if
you can find a compact set containing all the images z(p), they also lie in a sufficiently large (convex) ball.
Without going into details, compactness of Z is established by realizing that the relevant production
plans, by feasibility, must satisfy $\sum_{f \in F} y^f + \omega \ge (0, \dots, 0)$. Following the lines of Proposition 5.5, this set
6.5 Exercises
Exercise 6.1
(a) What is wrong with the following argument: “Proposition 6.2 implies Proposition 6.3: if x lies in the core,
the coalition S = H of all consumers cannot improve upon it. So x is Pareto optimal.”
(b) Give an example of a pure exchange economy E and a Walrasian equilibrium (p, x) such that x lies in the
core, but is not Pareto optimal.
Exercise 6.2 MARKET CLEARING: Consider a (pure exchange/private ownership) economy $E$ where, in equilibrium,
Walras' Law holds: $p \cdot z = 0$ for all price vectors $p \ne (0, \dots, 0)$ and all $z \in z(p) \cap \mathbb{R}^L_-$. Prove:
(b) If prices are positive and $L - 1$ markets "clear", then so does the final one:
Markets clear in most standard applications:
(c) Consider an equilibrium. Suppose (c1) or (c2) is true for at least one consumer h ∈ H :
(a) Why do you think Pareto dominance is defined in terms of consumer preferences, ignoring those of
producers?
(b) Prove the FIRST FUNDAMENTAL WELFARE THEOREM: If $(p, x, y)$ is a Walrasian equilibrium of $E$ and consumers
have locally nonsatiated preferences, then $(x, y)$ is Pareto optimal.
Exercise 6.4 Restricting attention to prices in the unit simplex ∆ (to avoid trivialities), give an example of a pure
exchange economy E with two consumers, two commodities, and
(a) no Walrasian equilibrium.
(b) exactly one Walrasian equilibrium.
(c) exactly two Walrasian equilibria.
(d) infinitely many Walrasian equilibria.
Answer the same question for a private ownership economy by adding 714 producers (yes, seven hundred and
fourteen. . . You don’t seriously believe I’d ask this if the answer weren’t trivial, do you?).
Exercise 6.5 KING SOLOMON'S PROBLEM: In a well-known parable, King Solomon settles a dispute between two
women, each claiming that a certain baby is hers, by suggesting to cut it in two with his sword: the true mother is
revealed as she is willing to give up her child to the liar, rather than have it killed. Swords make babies divisible
commodities, so consider a pure exchange economy with two consumers (the two women), one commodity (the
baby). Let $x \in [0, 1]$ be a share of a baby. The true mother has utility function $u^T : [0, 1] \to \mathbb{R}$ with $u^T(x) = x$ if
$x \in \{0, 1\}$ and $u^T(x) = -1$ otherwise. The liar has utility function $u^L : [0, 1] \to \mathbb{R}$ with $u^L(x) = x$. Determine for each
initial allocation $(\omega^T, \omega^L) \in \{z \in \mathbb{R}^2_+ : z_1 + z_2 = 1\}$ the set of feasible allocations, the set of Pareto optimal allocations,
the core, and the set of Walrasian equilibria.
Remark: This is not something to chalk up on my insensitivity account, but a perfectly legitimate economic
problem. There is — for instance — a substantial mechanism design literature on King Solomon’s problem; see for
instance Motty Perry and Phil Reny’s “A general solution to King Solomon’s dilemma” in a 1999 issue of Games and
Economic Behavior.
7 Expected utility theory
Hitherto, we assumed that decision makers act in a world of absolute certainty; typically, however, the
consequences of decisions entail some stochastic elements. This section treats the development of
expected utility theory, using the axiomatic approach of von Neumann and Morgenstern.
For instance, when tossing a coin, the outcome will be heads $H$ or tails $T$, so $A = \{H, T\}$. A fair coin
corresponds with the simple gamble $(\tfrac{1}{2} \circ H, \tfrac{1}{2} \circ T)$. Some notational conventions:
one often omits outcomes with probability zero from the notation of a simple gamble: $(\tfrac{1}{2} \circ a_1, \tfrac{1}{2} \circ a_n)$
is an abbreviation for the simple gamble
\[ \left( \tfrac{1}{2} \circ a_1, \, 0 \circ a_2, \, \dots, \, 0 \circ a_{n-1}, \, \tfrac{1}{2} \circ a_n \right). \]
one often writes $a_i$ for the simple gamble $(1 \circ a_i)$ whose outcome is $a_i$ with probability one.
Not all gambles are simple. Perhaps you decided to bet one dollar on your favorite number in a roulette
game, but toss a coin to decide which of two roulette wheels you want to play in a casino: the outcome
of the first gamble (the coin toss) is another gamble (the roulette game). This is an example of a
compound gamble. In principle, we can have any level of compound gambles. For convenience, we
will assume that a compound gamble ends in a deterministic outcome after only finitely many steps.
Formally, the set of compound gambles is defined as follows. Let $G_0 = A$ and, inductively, for each
$m \in \mathbb{N}$, let $G_m$ be the set of gambles whose outcomes are gambles from the lower levels $G_0, \dots, G_{m-1}$:
\[ G_m = \Bigl\{ (p_1 \circ g_1, \dots, p_k \circ g_k) : k \in \mathbb{N}, \ p_1, \dots, p_k \ge 0, \ \sum_{i=1}^k p_i = 1, \text{ and } g_1, \dots, g_k \in \cup_{\ell=0}^{m-1} G_\ell \Bigr\}. \]
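The inductive definition translates directly into a recursive data structure. The sketch below is an illustration only: it represents a gamble as a list of (probability, prize) pairs, where a prize is either a deterministic outcome or another such list, and computes the effective probabilities over deterministic outcomes, i.e., the simple gamble that a compound gamble induces. The representation, the function name, and the example gamble are assumptions made for this sketch.

```python
from fractions import Fraction

def induced_simple_gamble(gamble):
    """Effective probabilities over deterministic outcomes.

    A gamble is a list of (probability, prize) pairs; a prize is either a
    deterministic outcome (any non-list value) or another gamble (a list).
    """
    if not isinstance(gamble, list):          # level G_0: a deterministic outcome
        return {gamble: Fraction(1)}
    effective = {}
    for prob, prize in gamble:
        # Multiply probabilities along the path to each deterministic outcome.
        for outcome, q in induced_simple_gamble(prize).items():
            effective[outcome] = effective.get(outcome, Fraction(0)) + prob * q
    return effective

half = Fraction(1, 2)
fair_coin = [(half, "H"), (half, "T")]            # a simple gamble in G_1
compound = [(half, fair_coin), (half, "H")]       # a gamble in G_2
reduced = induced_simple_gamble(compound)
```

For this compound gamble, heads has effective probability $\tfrac{1}{2} \cdot \tfrac{1}{2} + \tfrac{1}{2} = \tfrac{3}{4}$ and tails $\tfrac{1}{4}$. Because the recursion only terminates for gambles of finite depth, the sketch relies on the convention above that a compound gamble ends in a deterministic outcome after finitely many steps.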
7.2 Preferences over gambles
Assume the DM has a preference relation $\succsim$ over the set $G$ of compound gambles. Impose the following
properties:
(G1) $\succsim$ is a weak order.
Given the set of deterministic outcomes $A = \{a_1, \dots, a_n\}$, every simple gamble $g \in G_1$ is fully described
by its vector $(p_1, \dots, p_n) \in \mathbb{R}^n$ of probabilities, i.e., we can interpret $G_1$ simply as the unit simplex
$\Delta_n = \{p \in \mathbb{R}^n_+ : \sum_i p_i = 1\}$. And in $\mathbb{R}^n$, we know what continuity means, so we can state:
Proposition 7.1
Assume the preference relation $\succsim$ on $G$ satisfies (G1) to (G4).
(a) There is a best element $\bar{g}$ and a worst element $\underline{g}$ in $G_1$, i.e., for all $g \in G_1$: $\bar{g} \succsim g \succsim \underline{g}$.
$g \sim (\alpha_g \circ \bar{g}, (1 - \alpha_g) \circ \underline{g})$.
$(p_1 \circ g_1, \dots, p_k \circ g_k) \sim (p_1 \circ h_1, \dots, p_k \circ h_k)$.
(d) Monotonicity: for all $\alpha, \beta \in [0, 1]$, if $\alpha > \beta$, then
\[ (\alpha \circ \bar{g}, (1 - \alpha) \circ \underline{g}) \succ (\beta \circ \bar{g}, (1 - \beta) \circ \underline{g}). \]
Proof: (a): Immediate from continuity (G2) of the weak order (G1) $\succsim$ on the compact unit simplex $\Delta_n$.
(b): Let $g \in G$ and let $g_s \in G_1$ be its reduced simple gamble. Since $g \sim g_s$ by (G3) and $\bar{g} \succsim g_s \succsim \underline{g}$, it
follows from transitivity (G1) that $\bar{g} \succsim g \succsim \underline{g}$.
Let $\underline{p}, \bar{p} \in \Delta_n$ be the associated probabilities of $\underline{g}$ and $\bar{g}$. By connectedness of the set of convex
combinations of these best and worst gambles in the unit simplex, Proposition 2.7 implies that there is
a gamble with probabilities
\[ \alpha_g \bar{p} + (1 - \alpha_g) \underline{p} \]
equivalent with $g$. By reduction to simple gambles (G3), this means
\[ g \sim (\alpha_g \circ \bar{g}, (1 - \alpha_g) \circ \underline{g}). \]
(c): By induction on $k \in \mathbb{N}$. The claim is trivially true if $k = 1$. Let $k \in \mathbb{N}$, $k \ge 2$, and suppose the claim is
true for mixtures of less than $k$ gambles. To prove the case with mixtures of $k$ gambles, notice that
\begin{align*}
(p_1 \circ g_1, \dots, p_k \circ g_k) &\sim \bigl( p_1 \circ g_1, (1 - p_1) \circ ( \tfrac{p_2}{1 - p_1} \circ g_2, \dots, \tfrac{p_k}{1 - p_1} \circ g_k ) \bigr) && \text{by (G1) and (G3)} \\
&\sim \bigl( p_1 \circ h_1, (1 - p_1) \circ ( \tfrac{p_2}{1 - p_1} \circ h_2, \dots, \tfrac{p_k}{1 - p_1} \circ h_k ) \bigr) && \text{by induction} \\
&\sim (p_1 \circ h_1, \dots, p_k \circ h_k) && \text{by (G1) and (G3)}
\end{align*}
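\begin{align*}
(\alpha \circ \bar{g}, (1 - \alpha) \circ \underline{g}) &\succ (\alpha \circ \underline{g}, (1 - \alpha) \circ \underline{g}) && \text{by (G4)} \\
&\sim \underline{g} && \text{by (G1) and (G3).}
\end{align*}
\begin{align*}
(\alpha \circ \bar{g}, (1 - \alpha) \circ \underline{g}) = \hat{g} &\sim ( \tfrac{\beta}{\alpha} \circ \hat{g}, (1 - \tfrac{\beta}{\alpha}) \circ \hat{g} ) && \text{by (G1) and (G3)} \\
&\succ ( \tfrac{\beta}{\alpha} \circ \hat{g}, (1 - \tfrac{\beta}{\alpha}) \circ \underline{g} ) && \text{by (G4)}
\end{align*}
\[ (\alpha \circ \bar{g}, (1 - \alpha) \circ \underline{g}) \succ (\beta \circ \bar{g}, (1 - \beta) \circ \underline{g}), \]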
as we had to show.
7.3 von Neumann-Morgenstern utility functions
Equipped with these results, one can show that properties (G1) to (G4) imply the existence of a utility
function $u : G \to \mathbb{R}$ that is linear in the effective probabilities over the outcomes. Formally, a von
Neumann-Morgenstern (vNM) utility function is a function $u : G \to \mathbb{R}$ that
represents the preference relation $\succsim$ on $G$:
\[ \forall g, h \in G : g \succsim h \Leftrightarrow u(g) \ge u(h), \]
Proposition 7.2
If $\succsim$ is a preference relation over $G$ satisfying (G1) to (G4), there exists a vNM utility function
representing $\succsim$.
Proof: By Proposition 7.1(a), there exist a best gamble $\bar{g}$ and a worst gamble $\underline{g}$ in $G_1$. In the trivial case
where $\bar{g} \sim \underline{g}$, any constant function is a vNM utility function. So assume, w.l.o.g., that $\bar{g} \succ \underline{g}$.
For each $g \in G$, Proposition 7.1 implies the existence of a unique number $\alpha_g \in [0, 1]$ such that
$g \sim (\alpha_g \circ \bar{g}, (1 - \alpha_g) \circ \underline{g})$. Define
\[ u(g) = \alpha_g. \tag{29} \]
This utility function represents $\succsim$: let $g, h \in G$. Then
\begin{align*}
g \succsim h &\Leftrightarrow (\alpha_g \circ \bar{g}, (1 - \alpha_g) \circ \underline{g}) \succsim (\alpha_h \circ \bar{g}, (1 - \alpha_h) \circ \underline{g}) \\
&\Leftrightarrow u(g) = \alpha_g \ge \alpha_h = u(h),
\end{align*}
where the first equivalence follows from transitivity (G1) of $\succsim$ and the second equivalence from monotonicity
and the definition of $u$.
To obtain the expected utility expression, let $g \in G$ and let $g_s = (p_1 \circ a_1, \dots, p_n \circ a_n)$ be the simple
gamble induced by $g$. By (G3), $g \sim g_s$, so $u(g) = u(g_s)$. For each $a_i \in A$, we know from Proposition 7.1
and the definition of $u(a_i)$ that
\[ a_i \sim (u(a_i) \circ \bar{g}, (1 - u(a_i)) \circ \underline{g}). \]
For each $i = 1, \dots, n$, define $h_i = (u(a_i) \circ \bar{g}, (1 - u(a_i)) \circ \underline{g})$. By substitution:
\[ g_s = (p_1 \circ a_1, \dots, p_n \circ a_n) \sim (p_1 \circ h_1, \dots, p_n \circ h_n). \]
Notice that $h_1, \dots, h_n$ are gambles over the best and worst gambles only. By computing the probability
for the best gamble $\bar{g}$ and using reduction to simple gambles (G3), one finds that $(p_1 \circ h_1, \dots, p_n \circ h_n)$ is
equivalent with
\[ \Bigl( \Bigl( \sum_{i=1}^n p_i u(a_i) \Bigr) \circ \bar{g}, \ \Bigl( 1 - \sum_{i=1}^n p_i u(a_i) \Bigr) \circ \underline{g} \Bigr). \]
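Combining the above with transitivity of $\succsim$ we find:
\[ g \sim g_s \sim (p_1 \circ h_1, \dots, p_n \circ h_n) \sim \Bigl( \Bigl( \sum_{i=1}^n p_i u(a_i) \Bigr) \circ \bar{g}, \ \Bigl( 1 - \sum_{i=1}^n p_i u(a_i) \Bigr) \circ \underline{g} \Bigr). \tag{30} \]
By definition of $u$,
\[ g \sim (u(g) \circ \bar{g}, (1 - u(g)) \circ \underline{g}). \]
Combining this with (30) yields $u(g) = \sum_{i=1}^n p_i u(a_i)$.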
Remark 1 Conversely, it is straightforward to verify that if a preference relation $\succsim$ on $G$ can be represented
by a vNM utility function, it must satisfy properties (G1) to (G4).
The linearity requirement on vNM utility implies that the earlier result from utility theory — any strictly
increasing transformation of the utility function of the consumer still represents the same preferences
— no longer holds. Indeed, the only transformations of a vNM utility function that remain vNM utility
functions, are positive affine transformations:
Proposition 7.3
Consider the vNM utility function $u : G \to \mathbb{R}$ defined in (29). For all $a, b \in \mathbb{R}$ with $a > 0$, also
$au + b$ is a vNM utility function representing $\succsim$. Conversely, if $v : G \to \mathbb{R}$ is a vNM utility function
representing $\succsim$ on $G$, there exist $a, b \in \mathbb{R}$ with $a > 0$ such that $v = au + b$.
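A small numerical sketch may make the contrast between positive affine and merely increasing transformations concrete. The outcomes, utility values, and gambles below are hypothetical. The transformation $x \mapsto x^3$ is strictly increasing, so it ranks the deterministic outcomes exactly as $u$ does, but taking expectations of the transformed values can reverse the ranking of gambles.

```python
def expected_utility(simple_gamble, utility):
    """Sum of p_i * u(a_i) over a simple gamble given as {outcome: probability}."""
    return sum(prob * utility[outcome] for outcome, prob in simple_gamble.items())

u = {"a1": 0.0, "a2": 0.6, "a3": 1.0}     # hypothetical utilities of the outcomes
g = {"a2": 1.0}                           # outcome a2 for sure
h = {"a1": 0.5, "a3": 0.5}                # 50-50 between a1 and a3

affine = {a: 2.0 * x + 3.0 for a, x in u.items()}   # positive affine transformation
cubed = {a: x ** 3 for a, x in u.items()}           # strictly increasing, not affine

prefers_g_u = expected_utility(g, u) > expected_utility(h, u)
prefers_g_affine = expected_utility(g, affine) > expected_utility(h, affine)
prefers_g_cubed = expected_utility(g, cubed) > expected_utility(h, cubed)
```

Under $u$ and under $2u + 3$ the sure outcome $g$ has the higher expected utility (0.6 against 0.5, respectively 4.2 against 4.0), whereas the expectation of the cubed values ranks $h$ above $g$ (0.5 against roughly 0.216). The cubed function therefore cannot serve as a vNM utility function for the same preferences.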
Proof: To avoid trivialities, assume that $\bar{g} \succ \underline{g}$. The first claim is simple. To establish the second claim,
let $a > 0$ and $b$ be the unique solution (do you understand why a solution exists and why it is unique?)
to
\begin{align*}
v(\bar{g}) &= a u(\bar{g}) + b, \\
v(\underline{g}) &= a u(\underline{g}) + b.
\end{align*}
and, similarly,
7.4 Exercises
Exercise 7.1 Throughout this exercise, let $G = \cup_{n=0}^{\infty} G_n$ be the set of compound gambles over a finite set $\{a_1, \dots, a_k\}$
of $k \ge 2$ different deterministic outcomes in $\mathbb{R}$. Recall: $G_n$ is the set of $n$-th level gambles. For each of the preference
relations $\succsim$ over $G$ defined below, answer the following questions:
If possible, find the best and the worst elements of $G$.
For each of the four properties (G1) to (G4) guaranteeing the existence of a vNM utility function, check
whether $\succsim$ satisfies it.
If (G1) to (G4) are satisfied, find a vNM utility function representing $\succsim$.
(a) MOST LIKELY OUTCOMES: A decision maker bases preferences on the average of the deterministic outcomes
that are most likely to occur. Let $g \in G$ and let $(p_1 \circ a_1, \dots, p_k \circ a_k)$ be its induced simple gamble. Let
\[ L(g) = \{ a_i : p_i \ge p_j \text{ for all } j = 1, \dots, k \} \]
be the set of most likely deterministic outcomes and $|L(g)|$ its number of elements. The preference relation
$\succsim$ on $G$ is defined as follows: for all $g, h \in G$:
\[ g \succsim h \Leftrightarrow \frac{1}{|L(g)|} \sum_{a_i \in L(g)} a_i \ge \frac{1}{|L(h)|} \sum_{a_i \in L(h)} a_i. \]
(b) KEEPING IT SIMPLE: A decision maker dislikes complex alternatives and has preferences $\succsim$ over $G$ represented
by the following utility function: for each $g \in G$, there is a unique $n$ with $g \in G_n$. Let $(p_1 \circ a_1, \dots, p_k \circ a_k)$
be its induced simple gamble. Then $u(g) = \sum_{m=1}^k p_m a_m - n$.
(c) SATISFICING: A decision maker is content with all deterministic outcomes larger than 5. The preference
relation $\succsim$ on $G$ is represented by the following utility function: for each $g \in G$, let $(p_1 \circ a_1, \dots, p_k \circ a_k)$ be its
induced simple gamble. Then $u(g) = \sum_{i : a_i > 5} p_i$.
8 Risk attitudes
8.1 In for a gamble?
Let us confine attention to cases where the outcomes of the gambles are amounts of money: A is a
convex set in R. Despite the fact that we now allow an infinite set of outcomes, we will assume that
every gamble assigns positive probability to only finitely many outcomes. The existence theorem of
vNM utility functions can be adjusted to this case by modifying the properties (G1) to (G4) to infinite
sets. We assume that the vNM utility function u is increasing in money and investigate the relation
between this function and the DM’s attitude towards risk.
Consider a nontrivial (i.e., at least two different deterministic outcomes have positive probability)
simple gamble $g = (p_1 \circ w_1, \dots, p_n \circ w_n)$ and suppose the DM is offered two scenarios:
1. Accept the gamble; this yields utility $u(g) = \sum_{i=1}^n p_i u(w_i)$.
2. Accept the outcome that gives the expected value of the gamble with certainty (this is where
we need convexity of $A$!). The expected value of the gamble is equal to $E(g) = \sum_{i=1}^n p_i w_i$. This
alternative has utility $u(E(g)) = u\bigl( \sum_{i=1}^n p_i w_i \bigr)$.
Proposition 8.1
Let A ⊆ R be nonempty and convex. Assume the DM has a vNM utility function u. Then the DM
is:
(a) risk averse if and only if u is strictly concave on A,
Proof: We only prove the first claim; the others are similar. Risk aversion means that for every nontrivial
gamble $g = (p_1 \circ w_1, \dots, p_n \circ w_n)$,
\[ u(p_1 \circ w_1, \dots, p_n \circ w_n) = \sum_{i=1}^n p_i u(w_i) < u(E(g)) = u\Bigl( \sum_{i=1}^n p_i w_i \Bigr). \]
But this is equivalent with strict concavity: by induction it follows that the function $u$ is strictly concave
on $A$ if and only if for all different $w_1, \dots, w_n \in A$ and all $p_1, \dots, p_n > 0$ with $\sum_{i=1}^n p_i = 1$:
$\sum_{i=1}^n p_i u(w_i) < u\bigl( \sum_{i=1}^n p_i w_i \bigr)$.
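The comparison between $u(E(g))$ and $\sum_i p_i u(w_i)$ is easy to carry out numerically. The sketch below does so for one hypothetical gamble and three hypothetical utility functions (a strictly concave, a linear, and a strictly convex one). It checks the inequality for that single gamble only, whereas the definitions of risk attitudes require it for every nontrivial gamble.

```python
import math

def risk_attitude(gamble, u, tol=1e-12):
    """Compare u(E(g)) with the expected utility of g = [(p_i, w_i), ...]."""
    expected_value = sum(p * w for p, w in gamble)
    expected_utility = sum(p * u(w) for p, w in gamble)
    gap = u(expected_value) - expected_utility
    if gap > tol:
        return "risk averse"      # the sure expected value is strictly preferred
    if gap < -tol:
        return "risk loving"      # the gamble is strictly preferred
    return "risk neutral"

gamble = [(0.5, 0.0), (0.5, 100.0)]                     # hypothetical 50-50 gamble
attitude_sqrt = risk_attitude(gamble, math.sqrt)        # strictly concave u
attitude_linear = risk_attitude(gamble, lambda w: w)    # linear u
attitude_square = risk_attitude(gamble, lambda w: w * w)  # strictly convex u
```

With the square root, the sure amount 50 yields utility $\sqrt{50} \approx 7.07$ against an expected utility of 5, in line with the strict concavity argument in the proof.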
8.2 Certainty equivalent and risk premium
The certainty equivalent of a simple gamble $g$ is an amount of money $CE(g)$ offered with certainty
such that the DM is indifferent between the gamble $g$ and accepting $CE(g)$:
\[ u(g) = u(CE(g)). \]
Remark 1 For topologists (can be omitted): generalizing the continuity requirement (G2) to the case
of an infinite set $A \subseteq \mathbb{R}$ of deterministic outcomes entails in particular that preferences on $A$ are
continuous. So for each simple gamble $g$, there is a $\bar{w} \in A$ (say, weight one on the best deterministic
outcome in $g$) with $\bar{w} \succsim g$ and a $\underline{w} \in A$ with $g \succsim \underline{w}$. By the Intermediate value theorem for preferences,
Proposition 2.7, there is a $CE(g) \in A$ with $g \sim CE(g)$. By monotonicity of preferences in money, $CE(g)$
is unique: the certainty equivalent is a well-defined notion.
The risk premium of a simple gamble $g$ is an amount of money $P(g)$ such that $u(g) = u(E(g) - P(g))$.
Clearly,
\[ P(g) = E(g) - CE(g). \]
Intuitively, a risk averse DM prefers E (g ) with certainty over the gamble g . But there will be some
amount that makes him indifferent between accepting that amount with certainty and accepting the
gamble g . This amount is called the certainty equivalent. It is easy to show (see below) that for a risk
averse DM who strictly prefers more money to less, the certainty equivalent is less than the expected
value E (g ) of the gamble: a risk averse person is willing to pay a positive amount of money to avoid the
gamble’s inherent risk. This willingness to pay is the risk premium.
Proposition 8.2
Consider a DM with vNM utility function u which is increasing in wealth. The following three
statements are equivalent:
1. DM is risk averse,
As a simple exercise, try to formulate similar characterizations of risk neutral and risk loving behavior.
EXAMPLE. Take $A = \mathbb{R}_{++}$ and assume that $u(w) = \ln(w)$ for all $w \in A$. This DM is risk averse, since $u$
is strictly concave. Assume DM's initial wealth is $w_0$ and DM faces a gamble $g$ offering 50-50 odds of
winning or losing an amount $h \in (0, w_0)$:
\[ g = \bigl( \tfrac{1}{2} \circ (w_0 - h), \ \tfrac{1}{2} \circ (w_0 + h) \bigr). \]
Hence $E(g) = \tfrac{1}{2}(w_0 - h) + \tfrac{1}{2}(w_0 + h) = w_0$.
The certainty equivalent $CE(g)$ must satisfy
\[ u(CE(g)) = u(g) = \tfrac{1}{2} \ln(w_0 - h) + \tfrac{1}{2} \ln(w_0 + h) = \ln \sqrt{w_0^2 - h^2}, \]
where the final equation follows from the properties of the natural logarithm. Hence
\[ CE(g) = \sqrt{w_0^2 - h^2} < w_0 = E(g) \quad \text{and} \quad P(g) = w_0 - \sqrt{w_0^2 - h^2} > 0. \]
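The example can be reproduced numerically. The sketch below computes the certainty equivalent as $u^{-1}$ applied to the expected utility, which presupposes that $u$ is strictly increasing with a known inverse (here $\ln$ and $\exp$). The function names and the numbers $w_0 = 100$, $h = 50$ are hypothetical choices for illustration.

```python
import math

def certainty_equivalent(gamble, u, u_inverse):
    """Sure amount with the same utility as the gamble [(p_i, w_i), ...]."""
    expected_utility = sum(p * u(w) for p, w in gamble)
    return u_inverse(expected_utility)

def risk_premium(gamble, u, u_inverse):
    """Expected value of the gamble minus its certainty equivalent."""
    expected_value = sum(p * w for p, w in gamble)
    return expected_value - certainty_equivalent(gamble, u, u_inverse)

w0, h = 100.0, 50.0                                   # hypothetical wealth and stake
g = [(0.5, w0 - h), (0.5, w0 + h)]
ce = certainty_equivalent(g, math.log, math.exp)
premium = risk_premium(g, math.log, math.exp)
closed_form_ce = math.sqrt(w0 ** 2 - h ** 2)          # formula from the example
```

The computed certainty equivalent agrees with $\sqrt{w_0^2 - h^2} = \sqrt{7500} \approx 86.6$, so this DM would give up roughly 13.4 to avoid the gamble.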
8.3 Arrow-Pratt measure of absolute risk aversion
Arrow and Pratt considered the problem of measuring the extent of risk aversion. They assumed that
the vNM utility function u is an increasing, strictly concave function of wealth levels that is twice
differentiable. In particular, they assume:
Using this, the Arrow-Pratt measure of absolute risk aversion at wealth $w$ is defined as
\[ R_a(w) = - \frac{u''(w)}{u'(w)}. \]
Why is this a sensible measure of risk aversion? A heuristic derivation is provided in the next subsection.
The intuition is as follows: the more risk averse a DM is, the more he is willing to pay to avoid certain
gambles. Thus, the size of the risk premium in some way measures risk aversion. It turns out that the
Arrow-Pratt measure of absolute risk aversion is roughly proportional to the risk premium the DM is
willing to pay to avoid actuarially fair bets (a bet is actuarially fair if its expected value equals initial
wealth: the expected loss/gain is zero). Thus, if DM 1 is more risk averse than DM 2, his risk premium
for every nontrivial gamble exceeds that of DM 2, so the same should hold (due to proportionality) for
the Arrow-Pratt measures of absolute risk aversion. The actual proof is somewhat more complicated;
we omit it.
Proposition 8.3
Consider two DMs with vNM utility functions $u$ and $v$ respectively, both satisfying (32). The
following two claims are equivalent:
1. $R_a^1(w) = - \frac{u''(w)}{u'(w)} > - \frac{v''(w)}{v'(w)} = R_a^2(w)$ for all wealth levels $w$,
2. The risk premium $P^1(g)$ of the DM with utility function $u$ is strictly larger than the risk
premium $P^2(g)$ of the DM with utility function $v$ for every nontrivial gamble $g \in S$.
Notice that positive affine transformations of the utility functions do not affect R a (w): it does not
depend on the choice of vNM utility function.
It is common in the literature on, for instance, portfolio choice to assume that risk aversion decreases
with wealth. This is the DARA assumption (Decreasing Absolute Risk Aversion):
\[ u(g) = \tfrac{1}{2} u(w_0 - h) + \tfrac{1}{2} u(w_0 + h) = u(E(g) - P) = u(w_0 - P). \tag{33} \]
Take a first order Taylor approximation of $u(w_0 - P)$ around $w_0$:
\[ u(w_0 - P) \approx u(w_0) - u'(w_0) P. \tag{34} \]
Take a second order Taylor approximation of $u(w_0 - h)$ and $u(w_0 + h)$ around $w_0$:
\begin{align*}
u(w_0 - h) &\approx u(w_0) - u'(w_0) h + \tfrac{1}{2} u''(w_0) h^2, \\
u(w_0 + h) &\approx u(w_0) + u'(w_0) h + \tfrac{1}{2} u''(w_0) h^2.
\end{align*}
Consequently,
\[ \tfrac{1}{2} u(w_0 - h) + \tfrac{1}{2} u(w_0 + h) \approx u(w_0) + \tfrac{1}{2} u''(w_0) h^2. \tag{35} \]
Using (33), (34), and (35), it follows that
\[ u(w_0) + \tfrac{1}{2} u''(w_0) h^2 \approx u(w_0) - u'(w_0) P. \]
Rearranging terms, one finds
\[ P \approx \tfrac{1}{2} h^2 \, \frac{-u''(w_0)}{u'(w_0)}. \]
Conclude that the Arrow-Pratt measure of absolute risk aversion is approximately proportional to the
risk premium $P$, the willingness to pay in order to avoid the 50-50 odds of winning or losing an amount
$h$.
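To get a feeling for the quality of this approximation, one can compare it with an exact risk premium. The sketch below does this for the logarithmic utility function $u = \ln$, for which $R_a(w) = 1/w$ and the exact premium of the 50-50 gamble is $w_0 - \sqrt{w_0^2 - h^2}$. The wealth level and stakes are hypothetical, and the function names are chosen for this illustration only.

```python
import math

def exact_premium_log(w0, h):
    """Exact risk premium of the 50-50 gamble over w0 - h and w0 + h under u = ln."""
    return w0 - math.sqrt(w0 ** 2 - h ** 2)

def approx_premium(w0, h, absolute_risk_aversion):
    """Second-order approximation P ~ (1/2) * h**2 * R_a(w0)."""
    return 0.5 * h ** 2 * absolute_risk_aversion(w0)

def r_a_log(w):
    """Arrow-Pratt measure -u''(w)/u'(w) for u = ln, which equals 1/w."""
    return 1.0 / w

w0 = 100.0                                           # hypothetical initial wealth
rows = [(h, exact_premium_log(w0, h), approx_premium(w0, h, r_a_log))
        for h in (1.0, 10.0, 50.0)]
relative_errors = [abs(exact - approx) / exact for _, exact, approx in rows]
```

For small stakes the two numbers are very close (about 0.005 in both cases for $h = 1$), while for $h = 50$ the approximation gives 12.5 against an exact premium of about 13.4. This is what one expects from a Taylor argument: the approximation is local.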
9 Some critique on expected utility theory
Expected utility theory is the main tool in economic models involving uncertainty. Nevertheless,
expected utility theory has been under constant attack from behavioral economists and psychologists
who show that subjects in experiments or real-life situations systematically violate the properties (G1)
to (G4) or that mindless application of expected utility theory leads to counterintuitive conclusions. For this
reason, many alternative models for decision making under risk and uncertainty have been developed.
Perhaps the most well-known, especially since Daniel Kahneman was awarded the 2002 Nobel Prize
in economics, is Kahneman and Tversky's prospect theory (Kahneman and Tversky, 1979). Although
we lack time to go into such alternative models, we pause for a while and consider a number of
blows to the expected utility model.
9.1 Problems with unbounded utility: a variant of the St. Petersburg paradox
Nothing in the development of our expected utility model required the utility function to be bounded.
Unbounded utility functions, however, make decision-makers susceptible to cunning exploitation.
Suppose a DM with initial wealth $w_0 > 0$ has a vNM utility function $u$ over money which is not bounded
from above.
By assumption, there is some wealth $w_1$ with $u(w_0) < \tfrac{1}{2}(u(0) + u(w_1))$. Smile and offer your victim
the gamble $(\tfrac{1}{2} \circ 0, \tfrac{1}{2} \circ w_1)$, which he will accept by construction.
If he loses, he ends up with wealth zero. If he wins, hand him $w_1$ and, just before he takes the money
from your hand, retract it, turn your smile back on, and offer him a gamble $(\tfrac{1}{2} \circ 0, \tfrac{1}{2} \circ w_2)$, where $w_2$ is
chosen such that $u(w_1) < \tfrac{1}{2}(u(0) + u(w_2))$. Again, by construction, the DM will accept.
As long as the DM goes on winning, keep offering such 50-50 odds gambles... The DM will end up
with wealth zero with probability one!
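The construction can be made concrete for a specific unbounded utility function. The sketch below assumes $u(w) = \sqrt{w}$, which is increasing, unbounded from above, and has $u(0) = 0$; this choice, the doubling search in `next_stake`, and the initial wealth are illustrative assumptions and not part of the argument above. Each new prize is chosen so that the DM strictly prefers the 50-50 gamble over keeping his current wealth.

```python
import math

def next_stake(w_current, u, u_zero):
    """Double a candidate prize until the 50-50 gamble over 0 and the prize beats w_current."""
    prize = max(2.0 * w_current, 1.0)
    while not u(w_current) < 0.5 * (u_zero + u(prize)):
        prize *= 2.0          # terminates because u is unbounded from above
    return prize

u = math.sqrt                 # assumed utility: increasing, unbounded, u(0) = 0
wealth = [1.0]                # w_0, hypothetical initial wealth
for _ in range(10):
    wealth.append(next_stake(wealth[-1], u, u(0.0)))

# Every gamble in the chain is accepted by construction.
accepted = [u(wealth[k]) < 0.5 * (u(0.0) + u(wealth[k + 1])) for k in range(10)]

# The DM still holds positive wealth after ten rounds only by winning all ten gambles.
survival_probability = 0.5 ** 10
```

After $n$ rounds the DM has positive wealth with probability $(1/2)^n$, which tends to zero as $n$ grows, even though he accepted each gamble willingly.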
It turns out that in different experiments, most people prefer $g_1$ to $g_2$, but $g_4$ to $g_3$. This violates
expected utility theory. Suppose a DM has vNM utility function $u$. Then
\[ g_1 \succ g_2 \Leftrightarrow u(\$1{,}000{,}000) > 0.10 \, u(\$5{,}000{,}000) + 0.89 \, u(\$1{,}000{,}000) + 0.01 \, u(\$0). \]
where the last equivalence follows from computing the expected utility of $g_3$ and $g_4$.
9.3 Probability matching
You are paid $1 each time you guess correctly whether a red or a green light will flash. The lights flash
randomly, but the red is set to turn on three times as often as the green. It has been found that many
subjects in experiments of this type try to imitate the chance mechanism: they choose red about three
quarters of the time and green one quarter. Obviously it would be more profitable to always choose red.
Formally, choosing red with probability 3/4 is a compound lottery that gives you a
one dollar payoff with probability $(3/4)^2 + (1/4)^2 = 10/16$, corresponding with the simple gamble
\[ \bigl( \tfrac{10}{16} \circ \$1, \ \tfrac{6}{16} \circ \$0 \bigr), \]
while choosing red with probability one corresponds with the simple gamble
\[ \bigl( \tfrac{3}{4} \circ \$1, \ \tfrac{1}{4} \circ \$0 \bigr). \]
Since $3/4 > 10/16$, the second gamble should be strictly preferred over the first.
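The two success probabilities are special cases of one formula: if you guess red with probability $q$ and the red light flashes with probability 3/4, independently of your guess, you are correct with probability $q \cdot \tfrac{3}{4} + (1 - q) \cdot \tfrac{1}{4}$. The following sketch evaluates this expression with exact fractions; the function name and the independence of guess and light are assumptions of the illustration.

```python
from fractions import Fraction

def success_probability(q_red, p_red=Fraction(3, 4)):
    """Chance of a correct guess when guessing red with probability q_red."""
    return q_red * p_red + (1 - q_red) * (1 - p_red)

matching = success_probability(Fraction(3, 4))     # imitate the chance mechanism
always_red = success_probability(Fraction(1))      # always guess the likelier color
never_red = success_probability(Fraction(0))       # always guess green
```

Since the expression is linear and increasing in $q$, it is maximized at $q = 1$: always guessing red does strictly better than matching.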
This type of matching behavior has been frequently observed in real life, as well as laboratory
experiments, using both humans and animals as subjects. In an experiment with animals, for instance,
foraging behavior of pigeons was studied, using two food patches (call them red and green, as above)
with food being dispatched at the red location three quarters of the time and at the green location one
quarter of the time. The pigeons tried to match this probability distribution.
A small personal anecdote: jointly with two colleagues, I published two papers on a game theoretic
model of bounded rationality in which players are assumed to display matching behavior. To explain
the type of behavior to laymen and motivate that it is observed in real life, we used different examples,
among them the pigeon example mentioned above. This led the Dutch Foundation for Mathematical
Research, which at that time was financing my work, to publish a press statement proudly proclaiming:
“People behave like pigeons when dealing with probability”, a press statement that gave us extensive
media coverage but where we desperately tried to qualify our employers’ overzealous interpretation. So
in case you sometimes wonder what you are doing. . . you may just be behaving like a pigeon!
1. For each level of wealth, the DM will reject the lottery with a 50 percent chance of losing 100
dollars and a 50 percent chance of gaining 150 dollars.
2. For each level of wealth, the DM will reject the lottery with a 50 percent chance of losing 100
dollars and a 50 percent chance of gaining 1,500 dollars.
3. For each level of wealth, the DM will reject the lottery with a 50 percent chance of losing 100
dollars and a 50 percent chance of gaining 1,000,000 dollars.
4. For each level of wealth, the DM will reject the lottery with a 50 percent chance of losing 100
dollars and a 50 percent chance of gaining 1,000,000,000,000,000,000 dollars.
5. You can proceed with gains $G$ as high as you want, but the DM will always reject the lottery with
a 50 percent chance of losing 100 dollars and a 50 percent chance of gaining $G$.
Which of these statements are true? The first and the second may perhaps not be so surprising and I
probably wouldn’t be asking you if the question was trivial, so even the third could be true. On the other
hand, one would certainly doubt the sanity of a DM rejecting the bet in the fourth claim and lingering
doubt turns to certainty in the fifth case. Yet this is exactly what the DM will do: no amount of money
in the world will make him accept a gamble with a 50 percent chance of losing 100 dollars. Clearly,
such behavior is absurd.
Let us try to establish some intuition. The fact that the DM at each wealth level w rejects the gamble
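\[ \left( \tfrac{1}{2} \circ (w - 10),\ \tfrac{1}{2} \circ (w + 11) \right) \]
implies that
\[ \forall w:\quad \tfrac{1}{2} u(w - 10) + \tfrac{1}{2} u(w + 11) < u(w), \]
or, rewriting the expression, that
\[ \forall w:\quad u(w + 11) - u(w) < u(w) - u(w - 10). \]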
Hence, on average, the DM values each dollar between w and w + 11 by at most 10/11 times as much
as he, on average, values each dollar between w − 10 and w :
\[ \forall w:\quad u'(w + 11) < \tfrac{10}{11}\, u'(w - 10). \qquad (36) \]
Repeated application of (36) implies an enormous decrease in marginal utility of money: the marginal
utility of dollar w + 32 is at most 10/11 times the marginal utility of dollar w + 11, which is at most
10/11 times the marginal utility of dollar w − 10, so the marginal utility of dollar w + 32 is at most
$(10/11)^2 \approx 0.83$ times the value of dollar w − 10. Similarly, the DM values dollar w + 53 by at most
$(10/11)^3 \approx 0.75$ times the value of dollar w − 10. More generally, the DM values dollar w + 11 + 21k,
where k ∈ N, by at most $(10/11)^{k+1}$ times the value of dollar w − 10, which is an extremely high rate of
deterioration for the value of money.
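To get a feeling for how fast this bound shrinks, the following sketch tabulates the factor $(10/11)^{k+1}$ for a few values of k. The function names are hypothetical; only the formula comes from the text.

```python
def dollar_offset(k):
    """Dollar w + 11 + 21k, expressed as an offset from current wealth w."""
    return 11 + 21 * k

def marginal_utility_bound(k):
    """Upper bound on the value of dollar w + 11 + 21k relative to dollar w - 10."""
    return (10 / 11) ** (k + 1)

for k in (0, 1, 2, 10, 50, 100):
    print(f"dollar w+{dollar_offset(k)}: at most "
          f"{marginal_utility_bound(k):.6f} times the value of dollar w-10")
```

Already a few thousand dollars above current wealth, an extra dollar is worth almost nothing relative to dollar w − 10.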
10 Time preference
Discounting essentially means that a given benefit is valued higher when it is received immediately
than when it is received with a delay. A common economic motivation for discounting is that, say, one
dollar today is worth more than one dollar next year, as the immediate reward can be put into a bank at
an annual interest rate r > 0, making the dollar today worth 1 + r dollars next year. Another motivation,
common in evolutionary models, is the risk that a delayed benefit may not be realized: you may die
before receiving it (or be interrupted in achieving it, or be cheated in the promise of receiving it).
In addition to the question of how to model discounting in an appropriate way, decision theory in the
presence of time involves a number of careful considerations:
Choice of horizon: should one look finitely or infinitely far into the future? Keynes’ famous
quote “In the long run, we are all dead” could be an argument in favor of a finite horizon. Many
economic models involve just two time periods as an abstraction of “now” and “the future”. On
the other hand, many decisions have no clearly defined final period: you — or in an evolutionary
sense as in overlapping generations models, your genes — may live to see another day. In such
cases, an infinite horizon makes sense.
Choice of time as a discrete or continuous variable: here too, common sense, the appropriate
level of abstraction, and (not rarely) the modeler’s choice of mathematical tools are decisive.
Unless specified otherwise, this section takes time as being discrete and uses an infinite horizon. We
derive the standard exponential discounting model from a stationarity assumption on preferences and
briefly discuss a violation of stationarity and hyperbolic discounting. Section 10.3, based on Osborne
and Rubinstein (1994, Sec. 8.3), considers two criteria for evaluating outcomes over time without
discounting. The final section, based on Voorneveld (2010), illustrates the somewhat paradoxical
statement that a sequence of utility-maximizing choices can minimize utility.
with δ(0) = 1.
The function in (37) is often interpreted as a sum of discounted instantaneous utilities: the outcome
$c_t$ at time t ∈ N gives utility $u(c_t)$, but is discounted by a factor δ(t) ∈ (0, 1) as it lies in the future. The
discount factor δ(0) = 1 for current outcomes is mostly cosmetic, facilitating the notation involving an
infinite sum.
Exercise 10.1 The expression in (37) involves an infinite sum, which may not be well-defined.
(a) Give an example to show this.
(b) Prove that the sum is well-defined if the sequence of discount factors is summable ($\sum_{t=0}^{\infty} \delta(t) < \infty$) and the
Recall the earlier motivation for discounting of money: given a fixed interest rate r > 0 per period, one
dollar tomorrow is worth only $(1 + r)^{-1}$ dollars today, so it makes sense to discount future money by
powers of $\delta = (1 + r)^{-1}$. Following Koopmans (1960), exponential discounting can also be derived by
imposing a stationarity requirement on preferences.
Preferences ≿ satisfy stationarity if they are not affected if a common first outcome is dropped,
and the timing of all other outcomes is advanced by one period. By repeated application, it implies that
for a comparison between two sequences all initial periods with common outcomes can be dropped,
and the first period of different outcomes can be taken as the initial period. Formally, the preference
relation ≿ is stationary if, for all sequences with a common first outcome $c_0 = d_0$,
\[ (c_0, c_1, c_2, \ldots) \succsim (d_0, d_1, d_2, \ldots) \iff (c_1, c_2, \ldots) \succsim (d_1, d_2, \ldots). \]
Proposition 10.1
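For notational convenience, let 0 be a feasible outcome. Assume that:
preferences ≿ can be represented by a utility function as in (37),
preferences satisfy stationarity,
the decision-maker is indifferent between:
option 1: getting α today and α′ tomorrow (i.e., the sequence (α, α′, 0, 0, . . .)),
option 2: getting β today and β′ tomorrow (i.e., the sequence (β, β′, 0, 0, . . .)).
Then, provided u(β′) ≠ u(α′), the discount factors are exponential:
\[ \delta(t) = \left( \frac{u(\alpha) - u(\beta)}{u(\beta') - u(\alpha')} \right)^{t} \quad \text{for all } t \in \mathbb{N}. \]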
Proof: By induction on t. The result is trivial if t = 0. Let t ∈ N and assume that
\[ \delta(\tau) = \left( \frac{u(\alpha) - u(\beta)}{u(\beta') - u(\alpha')} \right)^{\tau} \]
for all τ < t. Repeated application of stationarity implies that
\[ (\underbrace{0, \ldots, 0}_{t-1 \text{ times}}, \alpha, \alpha', 0, 0, \ldots) \sim (\underbrace{0, \ldots, 0}_{t-1 \text{ times}}, \beta, \beta', 0, 0, \ldots), \]
so
\[ \delta(t) = \left( \frac{u(\alpha) - u(\beta)}{u(\beta') - u(\alpha')} \right) \delta(t - 1) = \left( \frac{u(\alpha) - u(\beta)}{u(\beta') - u(\alpha')} \right)^{t}, \]
where the final equality uses the induction hypothesis.
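The induction step can be illustrated numerically. The sketch below assumes a hypothetical linear utility u(x) = x and hypothetical outcomes α = 1, α′ = 0, β = 0, β′ = 2, so that the ratio in the proof equals 1/2. It then checks that with δ(t) = (1/2)^t the two options stay equally good when both are delayed by the same number of periods. The zero outcomes in all other periods are common to both streams and are left out of the comparison.

```python
def implied_ratio(u, alpha, alpha_next, beta, beta_next):
    """The ratio (u(alpha) - u(beta)) / (u(beta') - u(alpha')) from the proof."""
    return (u(alpha) - u(beta)) / (u(beta_next) - u(alpha_next))

def delayed_value(u, ratio, first, second, shift):
    """Discounted utility of getting `first` at time `shift` and `second` at
    time shift + 1, using exponential discount factors delta(t) = ratio ** t.
    Periods with outcome 0 are omitted: they are identical across options."""
    return ratio ** shift * u(first) + ratio ** (shift + 1) * u(second)

def u(x):
    return x  # hypothetical linear utility, chosen only for illustration

alpha, alpha_next, beta, beta_next = 1.0, 0.0, 0.0, 2.0
ratio = implied_ratio(u, alpha, alpha_next, beta, beta_next)

gaps = [delayed_value(u, ratio, alpha, alpha_next, s)
        - delayed_value(u, ratio, beta, beta_next, s)
        for s in range(10)]
print("implied discount factor:", ratio)
print("utility gaps after shifting:", gaps)
```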
Exercise 10.2 RATIONAL SUICIDE: A decision maker (DM) lives for at most two periods, t = 0 and t = 1. At each
time t ∈ {0, 1} that he is alive, he must decide, depending on his mood, whether or not to commit suicide. Regardless
of his initial mood, at time t = 1 he will be depressed with probability 1/2 or happy with probability 1/2. His
instantaneous utility is state-dependent, i.e., it depends not only on his action but also on the state of the world at
time t. The set of states is S = {h, d, D}, where h is “alive and happy”, d is “alive and depressed”, and D is “dead”. The
set of actions is A = {k, ℓ}, where k is “commit suicide” and ℓ is “go on living”. The instantaneous utility function
u : S × A → R is defined as follows:
\[ u(s, a) = \begin{cases} 1 & \text{if } (s, a) = (h, \ell), \\ -1 & \text{if } (s, a) = (h, k), \\ -\alpha & \text{if } (s, a) = (d, \ell), \\ 0 & \text{otherwise,} \end{cases} \]
where α > 0 is the intensity of the depression. Thus, given that you’re happy, killing yourself appears silly, but if
you’re depressed, it may seem less so. State D is irreversible: should the DM decide to kill himself at time t = 0,
then he receives utility 0 at time t = 1. The DM discounts the future exponentially at rate 0 < δ < 1, and maximizes
expected lifetime utility (of the standard additive form). We solve the decision problem by backward induction,
starting with optimal behavior in the final period t = 1. Assume the DM is alive at time t = 1.
(a) What is the optimal action and the resulting instantaneous utility if the DM at t = 1 is (a1) happy? (a2)
depressed?
Now consider the initial period: assume the DM is depressed at time t = 0.
(b) Assuming optimal behavior at time t = 1, what is the optimal action at time t = 0? Note: (i) if the DM does
not kill himself immediately, there is uncertainty about his mood at time t = 1; (ii) the answer depends on α
and δ.
(c) A psychologist claims that the option of future suicide might prevent depressed people from killing
themselves straight away. Explain this claim using the answers above.
Remark: I am not just venting my morbidity here. This is a legitimate topic of economic study. The exercise was
inspired by reading “An economic theory of suicide” by Hamermesh and Soss in a 1974 issue of the Journal of
Political Economy.
To see that this model can explain the preference reversal for the apples, assume that utility u satisfies
u(0) = 0 and is strictly increasing in apples. Preferring one apple today over two apples tomorrow
means that
u(1) > βδu(2). (39)
Preferring two apples one year and a day from now to one apple a year from now (and assuming we’re
not in a leap year) means that
\[ \beta\delta^{366} u(2) > \beta\delta^{365} u(1). \qquad (40) \]
For (39) and (40) to hold simultaneously, we simply need (β, δ) ∈ (0, 1) × (0, 1) to satisfy
\[ \beta\delta < \frac{u(1)}{u(2)} < \delta. \]
Taking δ sufficiently close to one and β sufficiently close to zero will do the trick.
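A quick numerical check of the reversal, as a sketch only. It assumes, in line with (39) and (40), that an outcome today has weight 1 and an outcome t ≥ 1 days ahead has weight βδ^t, and it uses the hypothetical choices u(x) = x, β = 0.4 and δ = 0.99, which satisfy βδ < u(1)/u(2) < δ.

```python
def weight(t, beta, delta):
    """Assumed discount weight: 1 today, beta * delta**t for t >= 1 days ahead."""
    return 1.0 if t == 0 else beta * delta ** t

def prefers_one_apple_sooner(u, beta, delta, t):
    """True if one apple on day t is strictly preferred to two apples on day t + 1."""
    return weight(t, beta, delta) * u(1) > weight(t + 1, beta, delta) * u(2)

def u(apples):
    return float(apples)  # hypothetical utility: increasing, with u(0) = 0

beta, delta = 0.4, 0.99
today = prefers_one_apple_sooner(u, beta, delta, 0)
in_a_year = prefers_one_apple_sooner(u, beta, delta, 365)
print("one apple today over two tomorrow:", today)
print("one apple in 365 days over two in 366 days:", in_a_year)
```

The first comparison favors the single apple, the second favors waiting one more day for two apples, which is the preference reversal.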
Exercise 10.3 (Loewenstein and Prelec, 1992) Discount factors $\delta(t) = (1 + \alpha t)^{-\gamma/\alpha}$, with α, γ > 0, fit experimental
data well. Show that this model also captures the preference reversal described above.
Exercise 10.4 (Wärneryd, 2007) SEX AND TIME PREFERENCE: In some evolutionary models of intertemporal
consumption, time periods represent generations and people care about future consumption to the extent that
it is exercised by their offspring (children, grandchildren, etc.). To simplify matters, assume that (i) a DM cares
about consumption of its offspring only if it has a specific gene; (ii) mates are selected at random and have the
relevant gene with probability α ∈ [0, 1], (iii) offspring gets in expectation half of its genes from each parent, (iv) we
consider one unit of offspring per time period.
(a) Let t ∈ N. Show, for instance by conditioning on the giver of the gene, that the probability $p_t$ of the DM’s
t-th period offspring carrying the gene satisfies the recurrence relation $p_t = \tfrac{1}{2} p_{t-1} + \tfrac{1}{2} \alpha$.
(b) Set $p_0 = 1$ and show that the solution to the recurrence relation is $p_t = \alpha + (1 - \alpha) \left( \tfrac{1}{2} \right)^t$ for all t = 0, 1, 2, . . .
With these kinship parameters $(p_t)_{t=0}^{\infty}$ in the place of discount factors, the standard separable utility function
becomes $U(c) = \sum_{t=0}^{\infty} p_t u(c_t)$. Assume that consumption is in units of apples, u is strictly increasing, and u(0) = 0.
Let’s investigate the possibility of preference reversal.
(c) Let α and u be such that the DM prefers 1 apple now (t = 0) to 2 apples next generation (t = 1). Prove: for
T ∈ N sufficiently large, the DM prefers 2 apples at time T + 1 to 1 apple at time T .
\[ \lim_{T \to \infty} \frac{x_0 + x_1 + \cdots + x_{T-1}}{T}. \]
However, even if the sequence is bounded, this limit may not exist: the average may continue to
oscillate. We verify this statement with a binary (zero-one) sequence. The idea is to append enough
ones to increase the average until it achieves a fixed high value, then to append enough zeroes to
decrease the average until it reaches a fixed low value, and continue this process.
AN OSCILLATING AVERAGE: Consider the binary sequence
(0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, . . .)
obtained by starting with a zero and two ones and then, after each block of zeroes or ones, doubling the
length of the sequence obtained so far by appending a block of the other number: after the first block of ones,
we have three coordinates, so we double the length to six coordinates by appending three zeroes. Then
we double the length to twelve coordinates by adding six ones, etc. A simple inductive proof shows
that after the k-th block of ones, the sequence has $3 \cdot 2^{2k-2}$ coordinates, $2^{2k-1}$ of them equal to one, and
therefore an average of 2/3. Doubling the length to $3 \cdot 2^{2k-1}$ coordinates by appending zeroes decreases
the average by a factor 1/2 to 1/3. As appending zeroes decreases, and appending ones increases the
average, it follows that the average continues to oscillate between 1/3 and 2/3.
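The construction is easy to reproduce. The sketch below builds the sequence by repeated doubling and records the exact average after each completed block; the function name is hypothetical.

```python
from fractions import Fraction

def oscillating_averages(num_appended_blocks):
    """Start from (0, 1, 1) and repeatedly double the length by appending a
    block of zeroes, then ones, and so on. Returns the sequence and the
    average after each completed block."""
    sequence = [0, 1, 1]
    averages = [Fraction(sum(sequence), len(sequence))]
    symbol = 0  # the first appended block consists of zeroes
    for _ in range(num_appended_blocks):
        sequence.extend([symbol] * len(sequence))
        averages.append(Fraction(sum(sequence), len(sequence)))
        symbol = 1 - symbol
    return sequence, averages

sequence, averages = oscillating_averages(5)
print(sequence[:12])
print([str(a) for a in averages])
```

The recorded averages alternate between 2/3 and 1/3, so the running average has no limit.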
Taking, instead, a pessimistic view of how the average utility changes over time will give us a
well-defined criterion. This requires some mathematical preliminaries. Consider a bounded sequence
$(x_t)_{t=0}^{\infty}$ of real numbers. For each t, $s_t = \inf\{x_s : s \geq t\}$ indicates the infimum (in somewhat colloquial
terms, the “worst” value) of the tail of the sequence from time t onwards. This infimum is well-defined,
as the sequence is bounded. Notice also that the sequence $(s_t)_{t=0}^{\infty}$ is weakly increasing: increasing
t implies taking the infimum over a smaller set. As $(s_t)_{t=0}^{\infty}$ is a monotonic, bounded sequence, it
converges. Its limit is called the lower limit or limes inferior (liminf) of the original sequence $(x_t)_{t=0}^{\infty}$:
\[ \liminf_{t \to \infty} x_t = \lim_{t \to \infty} s_t = \lim_{t \to \infty} \inf\{x_s : s \geq t\}. \]
By convention, $\liminf_{t \to \infty} x_t = -\infty$ if $(x_t)_{t=0}^{\infty}$ is not bounded from below. If it is bounded from below,
but not from above, the sequence of infima may diverge, in which case one sets $\liminf_{t \to \infty} x_t = +\infty$.
The following characterization of the lower limit may come in handy. Let $(x_t)_{t=0}^{\infty}$ be a sequence and
let c ∈ R. Then $\liminf_{t \to \infty} x_t = c$ if and only if:
[L1] for each ε > 0, there is a T ∈ N such that $c - \varepsilon < x_t$ for all t ≥ T,
[L2] for each ε > 0 and each T′ ∈ N, there is a t ≥ T′ with $x_t < c + \varepsilon$.
In words, the sequence eventually remains above c − ε, but dives below c + ε infinitely often, no matter
how small ε > 0.
Exercise 10.5 Prove this.
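The limit-of-means criterion evaluates utility streams by means of the lower limit of the average utility:
a sequence x is strictly preferred to y, written $x \succ_L y$, if
\[ \liminf_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} (x_t - y_t) > 0. \qquad (41) \]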
Inequality (41) is equivalent with the statement that for some ε > 0, the average difference between
sequences x and y eventually exceeds ε:
\[ \frac{1}{T} \sum_{t=0}^{T-1} (x_t - y_t) > \varepsilon \quad \text{for all but finitely many periods } T. \]
Changes in a single coordinate of a sequence become negligible once the average is taken over a long
time, so under the limit-of-means criterion, changes in any finite number of periods do not matter. In
particular, these preferences are stationary.
Exercise 10.7 Some authors refer to the limit-of-means criterion as the preference relation represented by the
utility function assigning to each bounded sequence $x = (x_t)_{t=0}^{\infty}$ the number $U(x) = \liminf_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} x_t$.
(a) Why must the sequences be bounded?
(b) Aside from this, are the two definitions really the same?
The following criterion, the overtaking criterion, also assigns equal weight to periods, but remains
sensitive to changes in single coordinates: $x \succ_O y$ if
\[ \liminf_{T \to \infty} \sum_{t=0}^{T} (x_t - y_t) > 0. \]
Let us compare exponential discounting, and the limit-of-means and overtaking criteria. The latter
were defined in terms of strict preferences. Define the corresponding indifference relation $\sim_L$ as follows:
$x \sim_L y$ if neither $x \succ_L y$ nor $y \succ_L x$. Of course, $\sim_O$ is defined similarly.
COMPARISON:
The sequence (1, −1, 0, 0, . . .) is preferred to the sequence (0, 0, . . .) under exponential discounting
for all δ ∈ (0, 1). Under the other two criteria, they are equivalent.
The sequence (−1, 2, 0, 0, . . .) is preferred to the sequence (0, 0, . . .) under the overtaking criterion.
Under the limit-of-means criterion, they are equivalent.
For every n ∈ N, the sequence
\[ (\underbrace{0, \ldots, 0}_{n \text{ times}}, 1, 1, \ldots) \]
is preferred to (1, 0, 0, . . .) under the limit-of-means criterion. However, for each δ ∈ (0, 1), a large
enough delay in a constant stream of ones makes the instant gratification of getting 1 immediately
the preferable option.
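The three comparisons can be explored numerically. The sketch below is not a proof: it truncates the infinite sequences at a finite horizon, so the printed partial sums and running means only suggest the behavior of the lower limits. Sequences are represented as functions of t, and all helper names are hypothetical.

```python
from itertools import accumulate

def partial_sums(x, y, horizon):
    """Sums of x_t - y_t over t = 0..T, for T = 0, ..., horizon - 1 (overtaking)."""
    return list(accumulate(x(t) - y(t) for t in range(horizon)))

def running_means(x, y, horizon):
    """Averages of x_t - y_t over the first T periods, T = 1, ..., horizon."""
    return [s / (T + 1) for T, s in enumerate(partial_sums(x, y, horizon))]

def discounted_difference(x, y, delta, horizon):
    """Truncated exponentially discounted sum of x_t - y_t."""
    return sum(delta ** t * (x(t) - y(t)) for t in range(horizon))

def finite(*values):
    """The sequence that starts with `values` and continues with zeroes."""
    return lambda t: values[t] if t < len(values) else 0

def delayed_ones(n):
    """The sequence of n zeroes followed by ones forever."""
    return lambda t: 1 if t >= n else 0

zero = finite()

print(partial_sums(finite(1, -1), zero, 5))               # settles at 0
print(partial_sums(finite(-1, 2), zero, 5))               # settles at 1
print(running_means(finite(-1, 2), zero, 1000)[-1])       # shrinks towards 0
print(running_means(delayed_ones(10), finite(1), 10000)[-1])
print(discounted_difference(delayed_ones(10), finite(1), 0.9, 2000))
print(discounted_difference(delayed_ones(50), finite(1), 0.9, 2000))
```

With δ = 0.9, a delay of 10 periods still leaves the stream of ones preferable, while a delay of 50 periods does not.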
HEALTH CONCERNS: nevertheless, the best thing is never to drink and the worst thing is to drink
at all times: (0, 0, . . .) maximizes, and (1, 1, . . .) minimizes U .
This sounds paradoxical and is indeed impossible under a finite horizon: suppose there are only T ∈ N
periods. Start with an arbitrary drinking pattern and switch, one period at a time, any abstention (0)
to drinking (1). By temptation, each such switch weakly increases the utility function. So drinking at
all times maximizes utility, in conflict with health concerns, which would require that all these weak
increases in utility eventually lead to a plunge in utility: it is like climbing a stairway, but ending up
lower than before (Figure 1).
The next example shows that temptation and health concerns can be reconciled under an infinite
horizon.
DRINKING PARADOX: Define the utility for each drinking pattern x as follows:
\[ U(x) = \begin{cases} 3 & \text{if } x_t = 1 \text{ for only finitely many } t \text{ (a rare drinker)}, \\ 0 & \text{if } x_t = 0 \text{ for only finitely many } t \text{ (a heavy addict)}, \\ \sum_t x_t \cdot 2^{-t} & \text{otherwise.} \end{cases} \]
Figure 1: An impossible stairway
As a switch from 0 to 1 at time t leaves the utility unaffected in the first two cases and increases it by
$2^{-t} > 0$ otherwise, the temptation assumption is satisfied. However,
11 Probabilistic choice
Consider a DM with a finite set A of alternatives. Earlier, we saw that if the DM has a weak order ≿ over
these alternatives, there is a utility function u : A → R representing these preferences and making an
optimal choice reduces to choosing an alternative $a \in \arg\max_{b \in A} u(b)$, a utility maximizing alternative.
However, in numerous experiments, it turns out that DMs:
do not always make the same choice under seemingly identical circumstances,
sometimes choose seemingly suboptimal alternatives.
Such apparently irrational behavior has led to the development of so-called probabilistic choice
models, where the main idea is that:
each alternative is chosen with some probability,
if a and b are feasible choices and a ≿ b, then the probability of choosing a should be at least as
large as the probability of choosing b.
This section gives a very short introduction to three probabilistic choice models: the Luce model,
the logit model, and the linear probability model. Often, probabilistic choice models are derived
in a random utility framework, where the true utility of each alternative consists of a deterministic
component plus a random component. Depending on the realization of the random utility component,
a feasible choice will look good under some circumstances and bad under others, thus motivating
that observed choice is probabilistic: an alternative is only chosen in circumstances where it looks
optimal. We will not consider such random utility models: they are (or should be) treated in detail in
the econometrics courses. The development of these models was one of the main causes for awarding
Daniel McFadden the Nobel Prize in 2000. Instead, we derive the models either axiomatically or via the
introduction of control costs: DMs want to choose optimally, but incur costs to precisely implement
their choices.
A good introduction to probabilistic choice models can be found in Anderson et al. (1992, Ch.
2) and Ben-Akiva and Lerman (1985, Ch. 3). On the content of this section: The Luce model is due
to Luce (1959). The derivation of the logit choice probabilities using the entropy cost function can
be found in Mattsson and Weibull (2002). The derivation of the linear probability model using the
Euclidean distance as cost function is due to Voorneveld (2006). It is based on an early contribution to
the literature on bounded rationality in games by Rosenthal (1989).
With this notation, the following two properties should be intuitive. The first property states that if some
alternative a ∈ T is never chosen in a pairwise comparison with some other b ∈ T, i.e., $P_{\{a,b\}}(a) = 0$,
then a can be deleted from T without affecting the choice probabilities of the remaining alternatives:
for all S ⊂ T .
When making a choice from a set T ⊆ A, (L1) allows us to restrict attention to the alternatives for which
there is “imperfect discriminatory power”: $P_{\{a,b\}}(a) \notin \{0, 1\}$ for all a, b ∈ T, a ≠ b. The path independence
condition then yields the following result:
Proposition 11.1
Assume that $P_{\{a,b\}}(a) \notin \{0, 1\}$ for all different a, b ∈ A. Path independence (L2) holds if and only if
there is a function $u : A \to \mathbb{R}_{++}$ such that
\[ P_S(a) = \frac{u(a)}{\sum_{b \in S} u(b)} \qquad (42) \]
Proof: STEP 1: Assume path independence (L2) holds. We first prove that $P_A(a) > 0$ for all a ∈ A.
Suppose, to the contrary, that $P_A(a) = 0$ for some a ∈ A. By (L2), we know that for every b ∈ A \ {a}:
\[ P_A(a) = P_A(\{a, b\}) \, P_{\{a,b\}}(a). \]
Since $P_{\{a,b\}}(a) \neq 0$, it follows that $P_A(\{a, b\}) = P_A(a) + P_A(b) = 0$ for all b ∈ A \ {a}. Probabilities are
nonnegative, so it must be that
\[ \forall b \in A:\quad P_A(b) = 0, \]
contradicting $\sum_{b \in A} P_A(b) = 1$. Having shown that $P_A(a) > 0$ for all a ∈ A, define $u(a) = P_A(a)$. Path
independence (L2) implies that for every S ⊆ A:
\[ P_S(a) = \frac{u(a)}{\sum_{b \in S} u(b)} \]
for every S ⊆ A. To show: (L2) holds. So let S ⊂ T ⊆ A and a ∈ S. Then
\[ P_T(a) = \frac{u(a)}{\sum_{b \in T} u(b)} = \frac{\sum_{b \in S} u(b)}{\sum_{b \in T} u(b)} \cdot \frac{u(a)}{\sum_{b \in S} u(b)} = P_T(S) \, P_S(a). \]
STEP 3: To show that the function u in (42) is unique up to multiplication with a positive constant,
suppose there are two such functions u and u′. It follows that for every a ∈ A:
\[ P_A(a) = \frac{u(a)}{\sum_{b \in A} u(b)} = \frac{u'(a)}{\sum_{b \in A} u'(b)}. \]
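The two directions used in the proof can be illustrated on a small example. The sketch below computes the choice probabilities in (42) for hypothetical utilities, checks the identity $P_T(a) = P_T(S) P_S(a)$ exactly, and confirms that rescaling u by a positive constant leaves the probabilities unchanged. The function name and the numbers are assumptions made for illustration.

```python
from fractions import Fraction

def luce_probabilities(u, S):
    """Choice probabilities P_S(a) = u(a) / sum of u(b) over b in S, as in (42)."""
    total = sum(u[b] for b in S)
    return {a: u[a] / total for a in S}

# Hypothetical positive utilities on A = {a, b, c}.
u = {"a": Fraction(1), "b": Fraction(2), "c": Fraction(3)}
T = {"a", "b", "c"}
S = {"a", "b"}

P_T = luce_probabilities(u, T)
P_S = luce_probabilities(u, S)
P_T_of_S = sum(P_T[b] for b in S)

# Path independence: P_T(a) = P_T(S) * P_S(a) for a in S.
path_independent = all(P_T[a] == P_T_of_S * P_S[a] for a in S)

# Uniqueness up to scale: multiplying u by 5 gives the same probabilities.
scaled = {a: 5 * value for a, value in u.items()}
scale_invariant = luce_probabilities(scaled, T) == P_T

print(P_T, path_independent, scale_invariant)
```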
Suppose now that two buses can be used, which are completely identical, except in their colors: one of
them is red, the other is blue. So the choice set is A = {car, blue bus, red bus}. Assume that the DM pays
no attention to color:
\[ P_{\{\text{blue bus},\, \text{red bus}\}}(\text{blue bus}) = P_{\{\text{blue bus},\, \text{red bus}\}}(\text{red bus}). \qquad (44) \]
Intuitively, since the DM according to (43) doesn’t seem to care whether he goes by car or by bus, it
would seem reasonable to expect that he will choose to go by car with probability 1/2 and to go by bus
with probability 1/2, choosing randomly between the blue and the red bus:
or — at least — that the probability of taking the car should be larger than the probability of taking any
of the two buses. However, path independence (L2) implies
11.2 The logit model
Again, consider a choice set A = {1, . . . , n} with at least two distinct elements. Assume that each
alternative i ∈ A gives some utility or payoff π(i). In the logit model with parameter δ > 0, the probability of
choosing alternative i from A is equal to
\[ P_A(i) = \frac{e^{\pi(i)/\delta}}{\sum_{j \in A} e^{\pi(j)/\delta}} = \frac{\exp(\pi(i)/\delta)}{\sum_{j \in A} \exp(\pi(j)/\delta)}. \qquad (45) \]
Notice from (42) that this is just a special case of Luce’s model, where the utility assigned to each
alternative i ∈ A is equal to u(i ) = exp(π(i )/δ) > 0. Our goal will be two-fold:
1. motivating these choice probabilities by introducing control costs,
2. studying the role of the parameter δ > 0.
CONTROL COSTS. We allow the DM to choose each of the alternatives with a certain probability, so the
DM chooses a probability distribution from
\[ \Delta_n = \left\{ p \in \mathbb{R}^n_+ : \sum_{i=1}^{n} p_i = 1 \right\}. \]
Of course, if the DM is faced with choice set A and has preferences ≿ over the outcomes such that i ≿ j
if and only if π(i) ≥ π(j), the optimal thing to do is to choose only elements from the set $\arg\max_{i \in A} \pi(i)$
with positive probability. In most real-life situations, the DM cannot guarantee the exact implementation
of his choices: a careless driver may drive off the road, an absentminded shopper may by mistake
buy the wrong item. To model this, we assume that it requires effort to implement choices: associated
with each choice $p \in \Delta_n$ will be a disutility or control cost c(p) ∈ R.
The (expected total) utility associated with each choice $p \in \Delta_n$ is defined as the difference between
the expected payoff $\sum_{i=1}^{n} p_i \pi(i)$ and δ > 0 times the control cost c(p), where δ is a positive scalar
representing the relative weight assigned to the effort of implementing choice p. Hence, the DM aims
to solve
\[ \max_{p \in \Delta_n} \sum_{i=1}^{n} p_i \pi(i) - \delta c(p). \]
Different cost functions give rise to different choice probabilities. A common control cost function that
appears in many branches of science (physics, chemistry, information science, to name but a few) is
the following entropy function:
\[ c(p) = \sum_{i=1}^{n} p_i \ln(p_i), \qquad (46) \]
where we use the convention that 0 ln 0 = 0. One can show (we will not do so) that this is a strictly
convex function achieving its minimum at the vector (1/n, . . . , 1/n), where all alternatives are chosen
with equal probability.
Proposition 11.2
The optimization problem
\[ \max_{p \in \Delta_n} \sum_{i=1}^{n} p_i \pi(i) - \delta c(p), \qquad (47) \]
with the control cost function from (46) has a unique maximum location with
\[ \forall i \in A:\quad p_i = \frac{\exp(\pi(i)/\delta)}{\sum_{j \in A} \exp(\pi(j)/\delta)}, \]
the logit choice probabilities from (45).
Proof: The cost function c is strictly convex, so the function $\sum_{i=1}^{n} p_i \pi(i) - \delta c(p)$ is strictly concave.
Since we maximize a strictly concave, continuous function over a compact set, a maximum exists and
is unique. Since the feasible set is entirely defined by linear (in)equalities, the Kuhn-Tucker conditions
give necessary and sufficient conditions for a solution to be a maximum. The condition for an interior
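solution $p \in \Delta_n$, i.e., a solution where $p_i > 0$ for all i, is that there exists a Lagrange multiplier λ ∈ R
associated with the constraint $\sum_{i=1}^{n} p_i = 1$, such that
\[ \forall i = 1, \ldots, n:\quad \pi(i) - \delta(\ln p_i + 1) = \lambda, \qquad (48) \]
since the gradient at p of the goal function $\sum_{i=1}^{n} p_i \pi(i) - \delta c(p)$ has i-th coordinate
\[ \pi(i) - \delta \frac{\partial c(p)}{\partial p_i} = \pi(i) - \delta(\ln p_i + 1). \]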
Rewriting (48) gives, for each i = 1, . . . , n:
\[ \forall i = 1, \ldots, n:\quad p_i = \frac{\exp(\pi(i)/\delta)}{\sum_{j \in A} \exp(\pi(j)/\delta)}, \]
as we had to show.
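The proposition can be sanity-checked numerically. The sketch below uses hypothetical payoffs and a hypothetical value of δ, evaluates the objective in (47) at the logit probabilities, and compares it with the objective at randomly drawn points of the simplex. It also compares the optimal value with $\delta \ln \sum_j \exp(\pi(j)/\delta)$, which follows from substituting the logit probabilities into the objective. This is a spot check, not a proof.

```python
import math
import random

def logit_probabilities(payoffs, delta):
    """Logit choice probabilities (45); the maximum payoff is subtracted
    before exponentiating, which leaves the ratios unchanged."""
    top = max(payoffs)
    weights = [math.exp((x - top) / delta) for x in payoffs]
    total = sum(weights)
    return [w / total for w in weights]

def objective(p, payoffs, delta):
    """Expected payoff minus delta times the entropy cost (46), with 0 ln 0 = 0."""
    cost = sum(q * math.log(q) for q in p if q > 0)
    return sum(q * x for q, x in zip(p, payoffs)) - delta * cost

payoffs, delta = [1.0, 2.0, 4.0], 1.5   # hypothetical numbers
p_star = logit_probabilities(payoffs, delta)
best = objective(p_star, payoffs, delta)
closed_form = delta * math.log(sum(math.exp(x / delta) for x in payoffs))

random.seed(0)
worst_gap = float("inf")
for _ in range(1000):
    raw = [random.random() for _ in payoffs]
    q = [r / sum(raw) for r in raw]          # a random point of the simplex
    worst_gap = min(worst_gap, best - objective(q, payoffs, delta))

print(p_star, best, closed_form, worst_gap)
```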
THE ROLE OF δ. Let us investigate what happens with the logit choice probabilities in (45) as δ → 0 and
as δ → ∞. Consider two alternatives i, j ∈ A, i ≠ j. Notice that the ratio of their logit choice probabilities
equals
\[ \frac{P_A(i)}{P_A(j)} = \frac{\exp(\pi(i)/\delta)}{\exp(\pi(j)/\delta)} = \exp\left( \frac{\pi(i) - \pi(j)}{\delta} \right), \qquad (49) \]
which converges to one as δ → ∞. But if the ratios of any two choice probabilities converge to one,
their limits must be equal; together with the fact that probabilities add up to one, we conclude that the
choice probabilities converge to 1/n as δ → ∞.
To consider the limit behavior as δ → 0, suppose that π(i) > π(j). Then the ratio (49) goes to infinity
as δ → 0. Since we are dealing with probabilities here, which are bounded below by zero and above by
one, it must be that $P_A(j) \to 0$. If we let i be the alternative with maximal payoff π(i), it follows that
the probability of choosing an alternative with less than maximal payoff converges to zero. So in the
limit, all probability is restricted to optimal alternatives and it is clear from the definition of the choice
probabilities that all of these will be chosen with equal probability.
In summary, the parameter δ can be interpreted as a measure of irrationality of the DM: for large
values of δ, the DM chooses by more or less blindly picking any of the alternatives, while for small
values of δ, the choice of the DM is more or less optimal.
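Both limits are easy to see numerically. The sketch below uses hypothetical payoffs (3, 1, 3), so that two alternatives are optimal, and evaluates the logit probabilities for a very large and a very small δ.

```python
import math

def logit_probabilities(payoffs, delta):
    """Logit choice probabilities (45), computed after subtracting the
    maximum payoff so that the exponentials cannot overflow."""
    top = max(payoffs)
    weights = [math.exp((x - top) / delta) for x in payoffs]
    total = sum(weights)
    return [w / total for w in weights]

payoffs = [3.0, 1.0, 3.0]  # hypothetical payoffs with two optimal alternatives

nearly_uniform = logit_probabilities(payoffs, delta=1e6)
nearly_optimal = logit_probabilities(payoffs, delta=1e-2)

print("large delta:", nearly_uniform)
print("small delta:", nearly_optimal)
```

For large δ every alternative gets probability close to 1/3; for small δ the two optimal alternatives share almost all probability equally.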
The adjective linear indicates that the difference between these two probabilities should be linear in
the payoff difference: for a parameter δ > 0, we require that
Unfortunately, it is not always possible to combine these two properties for large values of δ. Let’s
consider a simple example with two alternatives: A = {1, 2} and respective payoffs π(1) = 4, π(2) = 0. By
(50), we want P A (1) ≥ P A (2) and by (51), we want P A (1) − P A (2) = δ(π(1) − π(2)) = 4δ. If we take δ = 1/8,
this gives P A (1) − P A (2) = 1/2. The probabilities have to add up to one, so the unique solution is that
P A (1) = 3/4 and P A (2) = 1/4. So far, so good. Now take δ = 100: P A (1) − P A (2) = 4δ = 400. Since P A (1)
and P A (2) are probabilities between zero and one, making their difference equal to 400 (or — for that
matter — any number larger than 1) is simply impossible.
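The two-alternative example can be written out as a small computation. The sketch below only implements the unrelaxed linearity requirement for two alternatives: it solves $P_A(1) + P_A(2) = 1$ together with $P_A(1) - P_A(2) = \delta(\pi(1) - \pi(2))$ and reports when the solution fails to be a probability distribution. The function name is hypothetical, and a nonnegative payoff gap is assumed.

```python
def two_alternative_probabilities(payoff_gap, delta):
    """Solve P1 + P2 = 1 and P1 - P2 = delta * payoff_gap (payoff_gap >= 0).
    Returns (P1, P2), or None if the solution is not a probability vector."""
    difference = delta * payoff_gap
    p1 = (1 + difference) / 2
    p2 = (1 - difference) / 2
    if p2 < 0 or p1 > 1:
        return None
    return p1, p2

print(two_alternative_probabilities(4, 1 / 8))  # the feasible case from the text
print(two_alternative_probabilities(4, 100))    # the infeasible case from the text
```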
So we have to relax our requirements (50) and (51) somewhat. Unwilling to change (50), let us adapt
(51). Indeed, we require the linearity condition whenever possible, but when we run into problems
like the one in the example above, we simply require that alternatives with low payoff are chosen
with probability zero. Formally, choice probabilities P A (i ) for all alternatives i ∈ A satisfy the linear
probability model with parameter δ > 0 if the following holds:
This implies
P A (i ) − P A ( j ) = δ(π(i ) − π( j )),
in correspondence with the linearity requirement (51).
The choice probabilities also satisfy (50): take i , j ∈ A with π(i ) ≥ π( j ). We need to show that the
choice probabilities in the linear probability model satisfy P A (i ) ≥ P A ( j ). Discern two cases. First,
if P A ( j ) = 0, it automatically follows that P A (i ) ≥ 0 = P A ( j ). If P A ( j ) > 0, application of (52) yields
\[ P_A(j) - P_A(i) \leq \delta(\pi(j) - \pi(i)) \leq 0, \]
Proposition 11.3
For each δ > 0, there is a unique solution to the maximization problem
\[ \max_{p \in \Delta_n} \sum_{i=1}^{n} p_i \pi(i) - \frac{1}{2\delta} c(p) \qquad (54) \]
with the cost function given in (53). The solution coincides with the choice probabilities in the
linear probability model with parameter δ.
THE ROLE OF δ. Comparing the parameter δ in the two optimization problems with control costs in
(47) and (54), you will notice that they switched roles: large values of δ correspond with a large weight
assigned to the control cost function in the logit model, but with a small weight assigned to the control
cost function in the linear probability model. This change was necessary because I wanted to follow
the standard definition of the linear probability model in (52). But the intuition remains the same: δ
measures (ir)rationality. In the case of the linear probability model: for large values, (52) indicates that
the difference in the probability of choosing an optimal alternative (highest π(i )) and a suboptimal
alternative must be large. In the limit, this forces the probability of choosing suboptimal alternatives to
zero.
Conversely, for small values of δ, (52) indicates that the difference in the probability of choosing
any two alternatives must be small. Combining this with the fact that probabilities add up to one, this
implies that in the limit, all alternatives will be chosen with equal probability.
11.4 Exercises
Exercise 11.1 Prove Proposition 11.3.
Exercise 11.2 Let A = {1, 2}, π(1) = 4, π(2) = 0.
(a) Compute for every δ > 0 the choice probabilities satisfying the linear probability model.
(b) What happens with the choice probabilities as δ → 0? Interpret.
(c) What happens with the choice probabilities as δ → ∞? Interpret.
Exercise 11.3 Let A = {1, 2, 3}, π(1) = 0, π(2) = 2, π(3) = 8.
(a) Compute for each δ > 0 the choice probabilities in the logit model. Do these choice probabilities, for each
δ > 0, satisfy path independence? What happens with the choice probabilities as δ → ∞?
(b) Answer the same questions for the linear probability model.
Exercise 11.4 THE PENALTY FUNCTION APPROACH: Two of the probabilistic choice models considered above could
be rationalized using control cost functions giving a penalty to deviations from uniform randomization. This
exercise gives the general argument behind such rationalizations.
A penalty function on $\mathbb{R}^n$ is a function $c : \mathbb{R}^n \to \mathbb{R}_+$. A symmetric penalty function is independent of
rearranging the coordinates: for each bijection r : {1, . . . , n} → {1, . . . , n} and each $x \in \mathbb{R}^n$, it follows that
$c(x_1, \ldots, x_n) = c(x_{r(1)}, \ldots, x_{r(n)})$.
Consider a probabilistic choice model over a finite set A = {1, . . . , n} with n ≥ 2 elements and payoff function
π : A → R. Suppose a decision maker’s choice probabilities can be rationalized using a symmetric penalty function:
given parameter δ ≥ 0, they solve the problem
\[ P(\delta):\quad \max_{p \in \Delta_n} \sum_{i=1}^{n} p_i \pi(i) - \delta c(p - (1/n, \ldots, 1/n)). \]
Show that the resulting choice probabilities satisfy the desired monotonicity requirement: if p solves P (δ) and
π(i ) > π( j ), then p i ≥ p j .
Full circle
To make sure you get the big picture, let us — at the end of this course — turn back to where we started:
the overview of the course goals in the preface, and briefly summarize how we achieved them.
APPLICATION 1: CONSUMER FACING BUDGET CONSTRAINT.
Feasible alternatives: commodity bundles $x \in \mathbb{R}^L_+$ in a budget set B(p, w).
Preferences: an arbitrary weak order ≿ over the commodity space $X = \mathbb{R}^L_+$.
Changes in agent’s environment: see Sections 4.2 and 4.5.
1. represented by a utility function of the form U(c) = Σ_{t=0}^{∞} δ(t) u(c_t),
2. in terms of the limit of means criterion,
3. in terms of the overtaking criterion.
APPLICATION 7: PROBABILISTIC CHOICE. Although slightly outside the general framework, in some
probabilistic choice models like the logit and linear probability model, agents choose probabilities as if
they maximize expected payoffs subject to implementation costs:
Feasible alternatives: choice probabilities assigned to a finite set A of alternatives.
Preferences: represented by a utility function of the form “expected payoff minus control costs”;
see Propositions 11.2 and 11.3.
Beyond these notes
Applications of the general framework abound also in other branches of economics. In macroeconomics,
a government may evaluate alternative policies in terms of some social welfare function
summarizing the well-being of its citizens. In game theory — the mathematical toolbox used to study
interaction between agents, used in many branches of microeconomics, industrial organization, and
political economics — players have different strategies to choose from and evaluate them in terms of a
preference relation that incorporates the uncertainty they face about, for instance, the choices of the
other players.
And what if we leave the realm of rational decision making? Parts of these notes (see, for instance,
Exercises 3.5, 3.6, and Section 11) illustrate that as long as we can write down formal postulates about
agents’ behavior, our mathematical tools allow us to study their consequences in a rigorous and
consistent way. This is just the right amount of “rationality” we need:
Behavior is procedurally rational if there is a procedure — a recipe, if you wish — that translates a
decision problem to a well-defined choice. Procedurally rational decision makers are not wild maniacs
choosing without any logic whatsoever: paraphrasing Shakespeare, there is method to their madness.
I hope that the tools you acquired during this course will help you to address also other economic
problems in a structured way.
Notation
If X is a finite set, |X | denotes its cardinality, i.e., its number of elements.
Weak set inclusion (each element of A is also an element of B ): A ⊆ B .
Strict/proper set inclusion (A ⊆ B, but A ≠ B): A ⊂ B.
Set of positive integers: N = {1, 2, 3, …}.
Set of integers: Z = {…, −2, −1, 0, 1, 2, …}.
Set of rational numbers: Q = {p/q : p, q ∈ Z, q ≠ 0}.
Set of real numbers: R.
For arbitrary L ∈ N:
Set of vectors in R^L with nonnegative coordinates: R^L_+ = {x ∈ R^L : x_1, …, x_L ≥ 0}.
Set of vectors in R^L with positive coordinates: R^L_++ = {x ∈ R^L : x_1, …, x_L > 0}.
Sets like Q^L_++ are defined analogously.
For two vectors x, y ∈ R^L, their inner product is denoted by x · y = x_1 y_1 + · · · + x_L y_L.
Moreover, write e_k = (0, …, 0, 1, 0, …, 0) for the vector whose k-th coordinate equals 1 and whose
other coordinates equal 0.
References
Anderson, S.P., de Palma, A., Thisse, J.-F., 1992. Discrete choice theory of product differentiation. MIT
Press.
Arrow, K.J., 1959. Rational choice functions and orderings. Economica 26, 121-126.
Arrow, K.J., Hahn, F.H., 1971. General competitive analysis. Amsterdam: North-Holland.
Ben-Akiva, M., Lerman, S.R., 1985. Discrete choice analysis. MIT Press.
Cobb, C.W., Douglas, P.H., 1928. A theory of production. American Economic Review (supplement) 18,
139-165.
Debreu, G., 1954. Representation of a preference ordering by a numerical function. In: Decision
Processes. Thrall, Davis, Coombs (eds.), John Wiley, pp. 159-165.
Debreu, G., 1959. Theory of value. Yale University Press.
Debreu, G., 1960. Review of R.D. Luce, Individual Choice Behavior: A Theoretical Analysis. American
Economic Review 50, 186-188.
Debreu, G., 1964. Continuity properties of Paretian utility. International Economic Review 5, 285-293.
Diecidue, E., Wakker, P.P., 2002. Dutch books: avoiding strategic and dynamic complications, and a
comonotonic extension. Mathematical Social Sciences 43, 135-149.
Dubra, J., Echenique, F., 2001. Monotone preferences over information. Topics in Theoretical
Economics 1, article 1.
Fishburn, P.C., 1970a. Utility theory for decision making. New York: John Wiley & Sons.
Fishburn, P.C., 1970b. Intransitive individual indifference and transitive majorities. Econometrica 38,
482-489.
Fishburn, P.C., 1979. Transitivity. Review of Economic Studies 46, 163-173.
Hildenbrand, W., Kirman, A.P., 1988. Equilibrium analysis. North-Holland.
Jaffray, J.-Y., 1975. Existence of a continuous utility function: An elementary proof. Econometrica 43,
981-983.
Kahneman, D., Tversky, A., 1979. Prospect theory: an analysis of decision under risk. Econometrica 47,
263-291.
Kamke, E., 1950. Theory of sets. New York: Dover Publications.
Kaneko, M., 1976. Note on transferable utility. International Journal of Game Theory 5, 183-185.
Koopmans, T.C., 1960. Stationary ordinal utility and impatience. Econometrica 28, 287-309.
Kreps, D.M., 1990. A course in microeconomic theory. Hertfordshire: Harvester Wheatsheaf.
Loewenstein, G., Prelec, D., 1992. Anomalies in intertemporal choice: evidence and interpretation.
Quarterly Journal of Economics 107, 573-597.
Luce, R.D., 1959. Individual choice behavior: A theoretical analysis. Wiley.
Mas-Colell, A., 1985. The theory of general economic equilibrium: A differentiable approach. Cambridge:
Cambridge University Press.
Mas-Colell, A., Whinston, M.D., Green, J.R., 1995. Microeconomic theory. Oxford: Oxford University
Press.
Mattsson, L.-G., Weibull, J.W., 2002. Probabilistic choice and procedurally bounded rationality. Games
and Economic Behavior 41, 61-78.
Osborne, M.J., Rubinstein, A., 1994. A course in game theory. Cambridge, MA: MIT Press.
Phelps, E.S., Pollak, R.A., 1968. On second-best national saving and game-equilibrium growth. Review
of Economic Studies 35, 201-208.
Pratt, J.W., 1964. Risk aversion in the small and in the large. Econometrica 32, 122-136.
Rabin, M., 2000. Risk aversion and expected-utility theory: a calibration theorem. Econometrica 68,
1281-1292.
Rosenthal, R.W., 1989. A bounded-rationality approach to the study of noncooperative games.
International Journal of Game Theory 18, 273-292.
Rubinstein, A., 2006. Lecture notes in microeconomic theory. Princeton NJ: Princeton University Press.
Simon, H., 1955. A behavioral model of rational choice. Quarterly Journal of Economics 69, 99-118.
Simon, H.A., 1976. From substantive to procedural rationality. In: Method and Appraisal in Economics.
Latsis, S.J. (ed.), Cambridge University Press, pp. 129-146.
Starr, R.M., 1997. General equilibrium theory. Cambridge University Press.
Thaler, R., 1981. Some empirical evidence on dynamic inconsistency. Economics Letters 8, 201-207.
Varian, H.R., 1992. Microeconomic analysis. New York: W.W. Norton & Company, 3rd edition.
Voorneveld, M., 2006. Probabilistic choice in games: properties of Rosenthal’s t -solutions. International
Journal of Game Theory 34, 105-121.
Voorneveld, M., 2008. From preferences to Cobb-Douglas utility. SSE/EFI Working Paper Series in
Economics and Finance, No. 701.
Voorneveld, M., 2010. The possibility of impossible stairways: Tail events and countable player sets.
Games and Economic Behavior 68, 403-410.
Voorneveld, M., 2014. From preferences to Leontief utility. Economic Theory Bulletin, forthcoming.
Wärneryd, K., 2007. Sexual reproduction and time-inconsistent preferences. Economics Letters 95,
14-16.
Suggested solutions
These are (sometimes short) solutions to most exercises in the lecture notes. In solutions to the home
assignments and exam questions, you are expected to start from relevant definitions, and clearly deduce
and motivate your answers. Suggestions for improvements (and corrections of potential mistakes?) are
welcome!
1.1 (a) Each pair of words can be arranged in alphabetical order, so ≿ is complete. Moreover, if word x is found
before or at the same place as (in case the words are identical) word y in the dictionary, and word y is found
before or at the same place as word z in the dictionary, then word x is found before or at the same place as
word z in the dictionary. Conclude that ≿ is transitive.
(b) The binary relation ≿ defined by “knows” is not necessarily complete or transitive. A violation of
completeness occurs if there exist people who are unfamiliar with each other. Also violations of transitivity are
common: I know my wife, my wife knows her boss, but I do not know my wife’s boss.
1.2 (a) [Reflexivity of ∼] Let x ∈ X. By completeness of ≿: x ≿ x and (simply changing the order of writing) x ≾ x.
By definition of ∼: x ∼ x. Conclude that ∼ is reflexive.
[Symmetry of ∼] Let x, y ∈ X with x ∼ y. By definition of ∼, x ≿ y and y ≿ x. But this is also the definition of
y ∼ x. Conclude that ∼ is symmetric.
[Transitivity of ∼] Let x, y, z ∈ X have x ∼ y and y ∼ z. By definition of ∼, this means that x ≿ y, y ≿ x, y ≿ z,
z ≿ y. By transitivity of ≿, x ≿ y and y ≿ z give x ≿ z. Similarly, z ≿ y and y ≿ x give z ≿ x. Since x ≿ z and
z ≿ x: x ∼ z. Conclude that ∼ is transitive.
(b) [Irreflexivity of ≻] Let x ∈ X. By definition of ≻, x ≻ x would require that x ≿ x but not x ≿ x, a contradiction.
Conclude that ≻ is irreflexive.
[Asymmetry of ≻] Let x, y ∈ X with x ≻ y. By definition of ≻, x ≿ y but not y ≿ x. Since y ≻ x would require
y ≿ x, it follows that not y ≻ x. Conclude that ≻ is asymmetric.
[Transitivity of ≻] Let x, y, z ∈ X have x ≻ y and y ≻ z. By definition of ≻, this means that x ≿ y but not
y ≿ x and that y ≿ z, but not z ≿ y. By transitivity of ≿, x ≿ y and y ≿ z give x ≿ z. It is not true that z ≿ x.
If it were, transitivity of ≿ with z ≿ x and x ≿ y would imply z ≿ y, contradicting y ≻ z. Since x ≿ z, but not
z ≿ x: x ≻ z. Conclude that ≻ is transitive.
(c) Let x, y, z ∈ X have x ∼ y and y ≿ z. By definition of ∼, this implies that x ≿ y. As x ≿ y and y ≿ z, transitivity
of ≿ gives x ≿ z.
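On a finite set, the properties established in Exercise 1.2 can also be verified mechanically. The following sketch uses a hypothetical four-element set and a hypothetical ranking (neither comes from the notes) to define a weak order, derives indifference and strict preference from it, and tests the claims of parts (a) to (c) by enumeration.

```python
from itertools import product

# Hypothetical finite example: a weak order induced by a numerical ranking.
X = ["a", "b", "c", "d"]
rank = {"a": 2, "b": 2, "c": 1, "d": 0}

def weakly_prefers(x, y):
    return rank[x] >= rank[y]

def indifferent(x, y):
    # x ~ y: weak preference in both directions
    return weakly_prefers(x, y) and weakly_prefers(y, x)

def strictly_prefers(x, y):
    # x is strictly preferred to y: weak preference in one direction only
    return weakly_prefers(x, y) and not weakly_prefers(y, x)

pairs = list(product(X, repeat=2))
triples = list(product(X, repeat=3))

# (a) indifference is reflexive, symmetric, and transitive
indiff_is_equivalence = (
    all(indifferent(x, x) for x in X)
    and all(indifferent(y, x) for x, y in pairs if indifferent(x, y))
    and all(indifferent(x, z) for x, y, z in triples
            if indifferent(x, y) and indifferent(y, z))
)
# (b) strict preference is irreflexive, asymmetric, and transitive
strict_is_strict_order = (
    all(not strictly_prefers(x, x) for x in X)
    and all(not strictly_prefers(y, x) for x, y in pairs if strictly_prefers(x, y))
    and all(strictly_prefers(x, z) for x, y, z in triples
            if strictly_prefers(x, y) and strictly_prefers(y, z))
)
# (c) x ~ y and y weakly preferred to z imply x weakly preferred to z
mixed_transitivity = all(weakly_prefers(x, z) for x, y, z in triples
                         if indifferent(x, y) and weakly_prefers(y, z))
```

Such an enumeration only confirms the statements for one example; the proofs above cover every weak order.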
1.3 (a) Assume ≿ is strongly monotonic. Let k ∈ {1, …, L} be one of the coordinates, let x ∈ X, and ε > 0. Then
x + εe_k ≥ x and x + εe_k ≠ x, so by strong monotonicity, x + εe_k ≻ x. Conclude that ≿ is strongly monotonic
in coordinate k.
Now assume that ≿ is strongly monotonic in each of its coordinates and transitive. Let x, y ∈ X with x ≥ y
and x ≠ y. To show: x ≻ y.
Starting with x, change the coordinates one by one to those of y. Formally, let z(0) = x and, for each
k ∈ {1, …, L}, define z(k) = x + Σ_{ℓ=1}^{k} (y_ℓ − x_ℓ)e_ℓ. Then either z(k) = z(k − 1) if the k-th coordinates of x and y
are the same, or z(k − 1) ≻ z(k) by strong monotonicity in the k-th coordinate. Since x ≠ y, the latter case
occurs for at least one k. By transitivity, we find that
x = z(0) ≻ z(L) = y.
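The chain z(0), …, z(L) used in this argument is easy to write out for a concrete pair of vectors. The sketch below is purely illustrative: the function name and the two vectors are hypothetical and only show how the coordinates of x are replaced one at a time by those of y.

```python
def coordinate_path(x, y):
    """Return [z(0), ..., z(L)], where z(k) has the first k coordinates of y
    and the remaining coordinates of x."""
    path = [tuple(x)]
    current = list(x)
    for k in range(len(x)):
        current[k] = y[k]          # switch coordinate k from x to y
        path.append(tuple(current))
    return path

x = (3, 2, 5)   # hypothetical vectors with x >= y and x != y
y = (1, 2, 4)
path = coordinate_path(x, y)
print(path)
```

In this example the second step leaves the vector unchanged, which corresponds to the case z(k) = z(k − 1) in the proof.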
(b) The preference relation ≿ on R^2_+ with
is strongly monotonic in coordinate k for both k = 1 and k = 2, but not strongly monotonic: (2, 2) is not
strictly preferred to (1, 1). Notice: in line with (a), relation ≿ is not transitive.
(c) No. The point (0, …, 0) ∈ R^L_+ cannot be improved upon: since “less is better”, (0, …, 0) ≻ x for every x ∈ R^L_+
with x ≠ (0, …, 0).
(d) Yes. Notice that the issue above, that improvements beyond the zero vector are impossible if one is
constrained to vectors with nonnegative coordinates, disappears. Let x ∈ R^L and ε > 0. Define y = x − (ε/2)e_1 ∈
R^L. Then ‖x − y‖ = ε/2 < ε and y ≤ x, with strict inequality in the first coordinate. Since “less is better”, y ≻ x.
Conclude that ≿ is locally nonsatiated.
x ≿ y ⇔ x_1 x_2 ≥ y_1 y_2
is a weak order (check this yourself; in Section 2 you will learn that preferences representable by a utility
function, here u(x) = x_1 x_2, are always a weak order). It is monotonic:
If x, y ∈ R^2_+ have x ≥ y, write ε_i = x_i − y_i ≥ 0 for i = 1, 2. Then
\[ x_1 x_2 = (y_1 + \varepsilon_1)(y_2 + \varepsilon_2) = y_1 y_2 + \underbrace{\varepsilon_1 y_2}_{\geq 0} + \underbrace{\varepsilon_2 y_1}_{\geq 0} + \underbrace{\varepsilon_1 \varepsilon_2}_{\geq 0} \geq y_1 y_2, \]
so x ≿ y.
If x, y ∈ R^2_+ have x > y, write ε_i = x_i − y_i > 0 for i = 1, 2. Then
\[ x_1 x_2 = (y_1 + \varepsilon_1)(y_2 + \varepsilon_2) = y_1 y_2 + \underbrace{\varepsilon_1 y_2}_{\geq 0} + \underbrace{\varepsilon_2 y_1}_{\geq 0} + \underbrace{\varepsilon_1 \varepsilon_2}_{> 0} > y_1 y_2, \]
so x ≻ y.
It is not strongly monotonic: (1, 0) ≥ (0, 0) and (1, 0) ≠ (0, 0), but (1, 0) ∼ (0, 0), since 1 · 0 = 0 · 0 = 0.
(b) Reasoning as above, preference relation ≿ over R^2_+ with
x ≿ y ⇔ x_2 ≥ y_2
x ≿ y ⇔ x_1 − x_2 ≥ y_1 − y_2
is a weak order. It is strongly monotonic in coordinate 1, since the function x ↦ x_1 − x_2 is strictly increasing
in coordinate 1. It is not monotonic: (3, 3) > (2, 1), but (2, 1) ≻ (3, 3), since 2 − 1 = 1 > 0 = 3 − 3.
is strongly monotonic in coordinate 1, but not quasilinear in coordinate 1: let x = (1, 2) and y = (2, 1). Then
(x_1 + 1)(x_2 + 1) = (y_1 + 1)(y_2 + 1) = 6, so x ∼ y.
(d) The preference relation on R^2_+ with
x ≿ y ⇔ 4x_1 + 3x_2^2 ≥ 4y_1 + 3y_2^2
satisfies all three monotonicity properties, but is not homothetic. For instance, (1, 0) ≻ (0, 1), as 4 · 1 + 3 · 0^2 >
4 · 0 + 3 · 1^2, but 2(1, 0) ≺ 2(0, 1), as 4 · 2 + 3 · 0^2 < 4 · 0 + 3 · 2^2.
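The failure of homotheticity in (d) amounts to a ranking that flips when both bundles are doubled. A minimal check, with helper names chosen only for this illustration:

```python
def u(x):
    # utility representing the preference relation in (d)
    return 4 * x[0] + 3 * x[1] ** 2

def scale(t, x):
    return tuple(t * xi for xi in x)

x, y = (1, 0), (0, 1)
x_preferred_before = u(x) > u(y)                      # compares 4 with 3
x_preferred_after = u(scale(2, x)) > u(scale(2, y))   # compares 8 with 12
print(x_preferred_before, x_preferred_after)
```

Homotheticity would require the two comparisons to agree.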
1.6 (a) Let x, y ∈ R^L_+ have x ≥ y. For each n ∈ N, x^n = x + (1/n, …, 1/n) ∈ R^L_+ satisfies x^n > y, so x^n ≿ y (in fact, even
x^n ≻ y). Letting n → ∞, continuity implies that lim_{n→∞} x^n = x ≿ y.
(b) Let x, y ∈ R^L_+ have x > y. Then min{x_1, x_2} > min{y_1, y_2}, so x ≿ y, but not y ≿ x, i.e., x ≻ y. Let x = (2, 1) and
y = (1, 1). In both cases, you can only mix one unit of drink, but x wastes one unit of the first ingredient, so
even though x ≥ y, x ≺ y.
1.7 (a) Assume the first definition of convexity holds. Let y ∈ X. To show: {x ∈ X : x ≿ y} is a convex set.
Let z, z′ ∈ {x ∈ X : x ≿ y} and α ∈ [0, 1]. Using completeness of ≿, we may assume w.l.o.g. that z ≿ z′. By
convexity, αz + (1 − α)z′ ≿ z′ ≿ y, so αz + (1 − α)z′ ≿ y by transitivity of ≿.
Conversely, assume the second definition of convexity holds. Let x, y ∈ X with x ≿ y and α ∈ [0, 1]. To show:
αx + (1 − α)y ≿ y.
Elements x and (by completeness) y both lie in the set {x ∈ X : x ≿ y}, which is convex by assumption, so it
also contains αx + (1 − α)y. Conclude that αx + (1 − α)y ≿ y.
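(b) Consider the preference relation ≿ on R with
∀x, y ∈ R : x ≿ y ⇔ x ≥ 0 > y.
For each y ∈ R, the upper contour set
\[ \{x \in \mathbb{R} : x \succsim y\} = \begin{cases} \emptyset & \text{if } y \geq 0, \\ \mathbb{R}_+ & \text{if } 0 > y, \end{cases} \]
is convex. Therefore, ≿ satisfies the convexity definition in terms of upper contour sets. However, if x = 1,
y = −3, α = 1/2, then x ≿ y, but not αx + (1 − α)y = −1 ≿ y, in violation of the other convexity definition
(x ≿ y implies αx + (1 − α)y ≿ y).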
2.1 (a) [Transitivity] Let x, y, z ∈ R satisfy x ≿ y, y ≿ z. By definition, x ≥ y + 1 and y ≥ z + 1, so x ≥ y + 1 ≥ z + 2 ≥ z + 1,
so x ≿ z.
[Violation of completeness] Completeness requires in particular that for each x ∈ R: x ≿ x, i.e., that x ≥ x + 1.
Clearly, this is not true.
(b) [Prop. 2.1(b) satisfied] Let x, y ∈ R. If x ≻ y, then x ≿ y, so x ≥ y + 1. Therefore, u(x) = x ≥ y + 1 > y = u(y).
Moreover, there are no x, y ∈ X with x ∼ y (as this would require x ≥ y + 1 and y ≥ x + 1), so the second
condition is vacuous.
[Prop. 2.1(a) violated] u does not represent ≿, since ≿ is not complete and the order induced by u is.
2.2 (a) Suppose the collection of jumps in U is uncountable. Consider two distinct jumps (u_1, u_2) and (v_1, v_2). The
intervals (u_1, u_2) and (v_1, v_2) are disjoint by definition of a jump. Moreover, each such interval contains a
rational number, necessarily distinct from the one in the other interval, since these intervals are disjoint.
Therefore, there is an injective function from the uncountable set of jumps to the countable set of rational
numbers, a contradiction.
(b) C is the union of two countable sets J and R and therefore countable itself. Let x, y ∈ X with x ≻ y. To show:
there are c_1, c_2 ∈ C with x ≿ c_1 ≻ c_2 ≿ y.
Case 1: (u(y), u(x)) is a jump in U. By definition of J, there are points c_1, c_2 ∈ J ⊆ C with utility u(c_1) = u(x),
u(c_2) = u(y). Hence x ∼ c_1 ≻ c_2 ∼ y, as in the requirement for Jaffray order-separability.
Case 2: (u(y), u(x)) is not a jump in U. Then (u(y), u(x)) ∩ U ≠ ∅. By definition of R, there is a c ∈ R ⊆ C with
u(c) ∈ (u(y), u(x)). Now apply the reasoning so far to (u(c), u(x)). If it is a jump in U, Case 1 says that there
are c_1, c_2 ∈ C with x ∼ c_1 ≻ c_2 ∼ c ≻ y, as in the requirement for Jaffray order-separability. If it is not a jump,
repeating the construction of Case 2 says that there is a c′ ∈ C with u(c′) ∈ (u(c), u(x)), so that x ≻ c′ ≻ c ≻ y,
as in the requirement for Jaffray order-separability.
(c) Let x, y ∈ X. If x ≻ y, there exist, by Jaffray order-separability, c_1, c_2 ∈ C with x ≿ c_1 ≻ c_2 ≿ y. Therefore,
{c ∈ C : c ≾ x} ⊃ {c ∈ C : c ≾ y}, as the former set includes c_1, whereas the latter doesn’t. Conclude that
u(x) − u(y) ≥ 2^{−n(c_1)} > 0. If x ∼ y, then {c ∈ C : c ≾ x} = {c ∈ C : c ≾ y}, so u(x) = u(y).
2.3 (a) True. By definition of a continuous function, pre-images of open sets are open sets. Consequently, for each
x ∈ X, the sets
\[ \{y \in X : y \prec x\} = u^{-1}\bigl((-\infty, u(x))\bigr) \quad \text{and} \quad \{y \in X : y \succ x\} = u^{-1}\bigl((u(x), \infty)\bigr) \]
are pre-images of open intervals and therefore open sets.
(b) False. The usual “greater than or equal to” order ≥ on R is represented by the continuous utility function
u : R → R with u(x) = x and hence, by (a), continuous. However, any strictly increasing function u : R → R
represents ≥, including the discontinuous function
\[ u(x) = \begin{cases} x & \text{if } x < 0, \\ x + 1 & \text{if } x \geq 0. \end{cases} \]
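A quick numerical sanity check of (b): on a sample of points, the discontinuous function orders numbers exactly as ≥ does, and yet it jumps at zero. The sample grid below is an arbitrary choice made for this sketch.

```python
def u(x):
    # strictly increasing, with a jump of size one at x = 0
    return x if x < 0 else x + 1

points = [k / 4 for k in range(-12, 13)]   # sample points in [-3, 3]

# u represents the usual order on the sample: x >= y exactly when u(x) >= u(y)
represents = all((x >= y) == (u(x) >= u(y)) for x in points for y in points)

# size of the jump when approaching 0 from the left
jump = u(0.0) - u(-1e-9)
print(represents, jump)
```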
2.4 No. Lexicographic preferences (modified in such a way that you start comparing the second coordinates,
then the first) on R2+ constitute an example where preferences cannot even be represented by a utility
function. Let x, y ∈ R2+ have x 2 > y 2 . The modified lexicographic preference started by looking at these
second coordinates, so no matter how much money you add to the first coordinate of y, you will strictly
prefer x.
Here is an example where preferences can be represented by a utility function. It makes having a second
coordinate below one so bad, that you can never compensate this with money and make it look as nice as
an alternative whose second coordinate is at least one. The preference relation ≿ on R^2_+ represented by the
utility function
\[ u(x) = \begin{cases} \Phi(x_1) + 1 & \text{if } x_2 \geq 1, \\ \Phi(x_1) & \text{if } x_2 < 1, \end{cases} \]
where Φ : R → (0, 1) is strictly increasing (like the cdf of a standard normal distribution), satisfies all properties
in Proposition 2.11, except (8).
Under additional assumptions (like continuity, monotonicity), the answer is yes. See Rubinstein (2006,
Lecture 4).
2.5 (a) Consider (a, 0) and (a′, 0) in X. Either (a, 0) ∼ (a′, 0), in which case we take m = m′ = 0, or one of the
alternatives is strictly preferred over the other, w.l.o.g. (a, 0) ≻ (a′, 0). In the latter case, invoke the first
property to conclude that there is an amount of money m* such that (a, 0) ∼ (a′, m*). Take m = 0, m′ = m*.
(b) W.l.o.g., m ≤ w. By the third property with c = w − m:
Therefore,
(a, m) ≿ (a′, m′) ⇔ (a*, m_1) ≿ (a*, m_2)
⇔ m_1 ≥ m_2
⇔ u(a, m) ≥ u(a′, m′),
where the first equivalence follows from the fact that (a, m) ∼ (a*, m_1) and (a′, m′) ∼ (a*, m_2), the second
equivalence from strong monotonicity in money, and the final one from (55).
2.6 (a) Let r ∈ R. If X_u(r) contains at most one element, it is convex. If it contains two or more, let x, y ∈ X_u(r) and
let α ∈ (0, 1). To show: αx + (1 − α)y ∈ X_u(r).
Without loss of generality, assume that x ≿ y, so that u(x) ≥ u(y) ≥ r. By convexity of ≿: αx + (1 − α)y ≿ y,
so u(αx + (1 − α)y) ≥ u(y) ≥ r, i.e., αx + (1 − α)y ∈ X_u(r).
(b) Let’s do the quasiconcavity part; strict quasiconcavity proceeds similarly. Assume u : X → R is quasiconcave.
Let y ∈ X. To show: {x ∈ X : x ≿ y} is a convex set.
By definition, {x ∈ X : x ≿ y} = {x ∈ X : u(x) ≥ u(y)} = X_u(r), with r = u(y). The latter set is convex by the
definition of a quasiconcave function under (a).
(c) A function u on a convex domain X is concave if its subgraph
\[ \operatorname{subgraph}(u) = \{(x, r) \in X \times \mathbb{R} : r \leq u(x)\} \]
is a convex set. Consider the weak order ≿ on X = R represented by the utility function
\[ u(x) = \begin{cases} 0 & \text{if } x \leq 0, \\ 1 & \text{if } x > 0. \end{cases} \]
This preference relation is convex, as, for each y ∈ X, the upper contour sets are convex:
\[ \{x \in X : x \succsim y\} = \begin{cases} \mathbb{R} & \text{if } y \leq 0, \\ (0, \infty) & \text{if } y > 0. \end{cases} \]
Suppose v : X → R were a concave utility function representing ≿. By definition, (−1, v(−1)) and (1, v(1)) are
elements of subgraph(v). Take α = 1/2 and consider the convex combination
\[ \tfrac{1}{2}(-1, v(-1)) + \tfrac{1}{2}(1, v(1)) = \bigl(0, \tfrac{1}{2} v(-1) + \tfrac{1}{2} v(1)\bigr). \]
Since v(−1) < v(1), this point does not lie in the subgraph of v:
\[ v(0) = v(-1) < \tfrac{1}{2} v(-1) + \tfrac{1}{2} v(1). \]
(αx + βy, f (αx + βy)) = (αx + βy, f (αx) + f (βy)) = (αx + βy, α f (x) + β f (y)) = αa + βb.
∀x i ∈ R : f i (x i ) = F (x i e i ).
To see that each f i must be additive, let x i , y i ∈ R. By additivity of F :
f i (x i + y i ) = F (x i e i + y i e i ) = F (x i e i ) + F (y i e i ) = f i (x i ) + f i (y i ).
[Figure omitted; axis label: y_1.]
(b) Yes. Let Y ⊂ R^2 be nonempty, compact. Since the function y ↦ y_1 is continuous, the set
3.2 (a) [Figure omitted: graph of u(x) against x.]
is the pre-image of an open interval. By Proposition 3.1, X contains a best element. By definition of ≿, this best
element is a maximum of f. Existence of a minimum can be established by applying the proposition to the weak
order ≿* with
∀x, y ∈ X : x ≿* y ⇔ f(x) ≤ f(y).
3.4 (a) Assume (X, B, C) is rationalizable by the weak order ≿ on X. Let A, B ∈ B, x, y ∈ A ∩ B, x ∈ C(A), y ∈ C(B).
To show: x ∈ C(B) = {z ∈ B : z ≿ z′ for all z′ ∈ B}.
Since y ∈ A and x ∈ C(A) = {z ∈ A : z ≿ z′ for all z′ ∈ A}: x ≿ y. Let z′ ∈ B. Since y ∈ C(B): y ≿ z′. Using x ≿ y
and transitivity of ≿: x ≿ z′. So x ≿ z′ for all z′ ∈ B, i.e., x ∈ C(B).
(b) No. Consider the choice structure with
X = {a, b, c, d}, B = {{a, b, c}, {b, c, d}}, C({a, b, c}) = {b}, C({b, c, d}) = {c}.
It trivially satisfies IIA: there are no distinct sets A, B ∈ B with A ⊆ B. It does not satisfy WARP: in the first
problem, b is revealed at least as good as c, in the second c is revealed at least as good as b. So b should
have been contained in C({b, c, d}).
(c) No. The choice structure in (b) satisfies IIA, but is not rationalizable. Suppose, to the contrary, that ≿
rationalizes it. Since C({a, b, c}) = {b}, we must have that b ≿ c and b ≿ a. Since C({b, c, d}) = {c}, we must
have that c ≿ b and c ≿ d. But then b ∼ c, so c ∼ b ≿ a implies c ≿ a. But then c ≿ y for all y ∈ {a, b, c}, so c
should have been included in C({a, b, c}).
(d) No. Consider the choice structure with X = {a, b, c}, B = {{a, b}, {b, c}, {a, c}}, C({a, b}) = {a}, C({b, c}) =
{b}, C({a, c}) = {c}. As distinct choice sets have only one point in common, WARP is trivially satisfied.
It is not rationalizable, as a rationalizing ≿ should satisfy a ≻ b, b ≻ c, c ≻ a, in violation of transitivity.
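For small choice structures, the claims in (b) and (d) can be checked by enumeration. The sketch below is an illustration under stated assumptions: the WARP test follows the statement used in (a); the IIA test assumes the formulation "if A ⊆ B, then every element of C(B) that lies in A belongs to C(A)", which is only implicit in the solutions; and rationalizability is tested by trying every assignment of ranks to the elements, using that each weak order on a finite set is induced by some ranking. All function names are hypothetical.

```python
from itertools import product

def satisfies_iia(choice):
    # Assumed IIA: if A is a subset of B, chosen elements of B lying in A are chosen from A.
    return all(choice[B] & A <= choice[A]
               for A in choice for B in choice if A <= B)

def satisfies_warp(choice):
    # WARP as in (a): x, y in both A and B, x in C(A), y in C(B) imply x in C(B).
    return all(x in choice[B]
               for A in choice for B in choice
               for x in choice[A] if x in B
               for y in choice[B] if y in A)

def is_rationalizable(elements, choice):
    # Try every ranking; a weak order on a finite set is induced by some ranking.
    n = len(elements)
    for ranks in product(range(n), repeat=n):
        rank = dict(zip(elements, ranks))
        if all(choice[A] == frozenset(x for x in A
                                      if rank[x] == max(rank[z] for z in A))
               for A in choice):
            return True
    return False

# Choice structure from (b)
structure_b = {frozenset("abc"): frozenset("b"), frozenset("bcd"): frozenset("c")}
# Choice structure from (d)
structure_d = {frozenset("ab"): frozenset("a"),
               frozenset("bc"): frozenset("b"),
               frozenset("ac"): frozenset("c")}

print(satisfies_iia(structure_b), satisfies_warp(structure_b),
      is_rationalizable("abcd", structure_b))
print(satisfies_warp(structure_d), is_rationalizable("abc", structure_d))
```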
3.6 (a) In B 1 , the first commodity has the highest price p 1 = 2, so spending wealth w = 2 on the first commodity
gives C (B 1 ) = {(1, 0)}. Similarly, C (B 2 ) = {(0, 1)}.
(b) Yes, there is no set-inclusion between the two choice sets, so IIA holds vacuously.
(c) No, bundles x = (1, 0) and y = (0, 1) lie in B_1 ∩ B_2. Since x ∈ C(B_1) and y ∈ C(B_2), WARP would require
x ∈ C(B_2).
(d) No: C(B_1) = {x} would require x ≻ y, whereas C(B_2) = {y} would require y ≻ x.
(e) For instance:
\[ u(x, p) = \begin{cases} x_1 & \text{if } p_1 > p_2, \\ x_2 & \text{if } p_2 > p_1, \\ x_1 x_2 & \text{if } p_1 = p_2. \end{cases} \]
4.1 [Continuity:] As ≿ is represented by the continuous utility function u, it is continuous. Formally, for each y ∈ X,
the set {x ∈ X : x ≿ y} = {x ∈ X : u(x) ≥ u(y)} = u^{−1}([u(y), ∞))
is the preimage of a closed set under the continuous function u and therefore closed. Similarly, the set {x ∈ X : x ≾ y}
is closed.
[Monotonicity, but not strong:] Take x, y ∈ RL+ with x ≥ y. There is an i ∈ {1, . . . , L} such that
u(x) = min{x 1 /a 1 , . . . , x L /a L } = x i /a i .
As x ≥ y, it follows that
u(0, . . . , 0) = u(1, 0, . . . , 0) = 0,
i.e., if you start with nothing, but get one unit of the first ingredient, you still cannot bake a cake due to lack of all
the other ingredients!
[Convexity, but not strict:] Let y ∈ R^L_+ and let u(y) = α. Then
{x ∈ R^L_+ : x ≿ y} = {x ∈ R^L_+ : x_i ≥ α a_i for all i = 1, …, L}
is the intersection of convex halfspaces and therefore convex. For a violation of strict convexity, take x = (a_1 +
1, a_2, …, a_L), y = (a_1, …, a_L). Both vectors (and any convex combination) suffice to make one cake: for each
α ∈ (0, 1):
x ∼ y ∼ αx + (1 − α)y,
in contradiction with strict convexity.
[Homotheticity:] u is homogeneous of degree one.
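The properties of the utility function u(x) = min{x_1/a_1, …, x_L/a_L} discussed in this solution can be illustrated numerically. In the sketch below the recipe a = (2, 1, 3) is a hypothetical choice; the code evaluates the failure of strong monotonicity, the failure of strict convexity, and homogeneity of degree one on concrete bundles.

```python
def leontief(x, a):
    # number of cakes that can be baked from bundle x with recipe a
    return min(xi / ai for xi, ai in zip(x, a))

a = (2.0, 1.0, 3.0)                # hypothetical recipe
zero = (0.0, 0.0, 0.0)
e1 = (1.0, 0.0, 0.0)               # one unit of the first ingredient only

# Not strongly monotonic: the extra ingredient does not help
no_gain = leontief(zero, a) == leontief(e1, a) == 0.0

# Not strictly convex: x, y and their midpoint all yield exactly one cake
x = (a[0] + 1, a[1], a[2])
y = a
mix = tuple(0.5 * xi + 0.5 * yi for xi, yi in zip(x, y))
utilities = (leontief(x, a), leontief(y, a), leontief(mix, a))

# Homogeneous of degree one: tripling the bundle triples the utility
tripled = leontief(tuple(3 * xi for xi in x), a)
print(no_gain, utilities, tripled)
```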
4.3 Walrasian demand is homogeneous of degree one in wealth: for all (p, w) ∈ R^{L+1}_++ and all α > 0, if x ∈ x(p, w), then
αx ∈ x(p, αw).
Proof: Suppose not: there is a z ∈ B(p, αw) with z ≻ αx. Then y := (1/α)z ∈ B(p, w). As x ∈ x(p, w), x ≿ y. As ≿ is
homothetic, also αx ≿ αy = z, contradicting that z ≻ αx.
4.4 (a) Consider a consumer with utility function u(x) = x 1 + x 2 . Local nonsatiation is obvious. If p 1 > p 2 , the
consumer spends the entire income on the second commodity, so v(p, w) = w/p 2 if p 1 > p 2 . Increasing
p 1 even further does not affect indirect utility, i.e., indirect utility is not strictly decreasing in the price of
commodity 1.
(b) To show: for each sequence (p^n, w^n)_{n∈N} in R^{L+1}_++ with limit (p, w) ∈ R^{L+1}_++, v(p^n, w^n) → v(p, w).
Proof: For each n ∈ N, let x^n ∈ x(p^n, w^n), which is possible by the assumptions in Proposition 4.2. As
x^n ∈ B(p^n, w^n) for all n and (p^n, w^n) → (p, w), the sequence (x^n)_{n∈N} eventually lies in the slightly enhanced
budget set B(p, w + 1), which is compact: taking a subsequence if necessary, we may assume w.l.o.g. that the
sequence (x^n)_{n∈N} is convergent, with limit x ∈ X. The sequence (p^n, w^n, x^n)_{n∈N} satisfies the properties of
Proposition 4.1(b). In particular, x ∈ x(p, w), i.e., lim_{n→∞} v(p^n, w^n) = lim_{n→∞} u(x^n) = u(x) = v(p, w), by continuity
of u.
(c) Roughly speaking, because continuous preferences may be represented by discontinuous utility functions,
which may cause jumps in the indirect utility function as well.
For instance, suppose a consumer has continuous utility function U : R_+ → R with U(x) = min{x, 1} and
hence continuous preferences. These preferences can also be represented by the discontinuous utility
function u : R_+ → R with
\[ u(x) = \begin{cases} x & \text{if } x < 1, \\ 2 & \text{if } x \geq 1. \end{cases} \]
Notice that
\[ x(p, w) = \begin{cases} \{w/p\} & \text{if } w \leq p, \\ [1, w/p] & \text{if } w > p. \end{cases} \]
The indirect utility function given u is
\[ v(p, w) = \begin{cases} w/p & \text{if } w < p, \\ 2 & \text{if } w \geq p, \end{cases} \]
with discontinuities at all points where p = w.
e(p, u) = p ·x
= α(p 1 · x) + (1 − α)(p 2 · x)
≥ αe(p 1 , u) + (1 − α)e(p 2 , u)
≥ αr 1 + (1 − α)r 2
= r.
4.6 Let (p, w) ∈ R^{L+1}_++ and x = x(p, w). By Walras’ Law, p · x = w. For each p′ ∈ R^L_++, x is feasible in the UMP at prices p′
and wealth p′ · x:
v(p′, p′ · x) ≥ u(x) = v(p, w) = v(p, p · x).
So the function f : R^L_++ → R with f(p′) = v(p′, p′ · x) achieves its minimum at p′ = p. By the first order conditions,
its partial derivatives must be zero at p:
4.7 By (15), indirect utility solves w = e(p, v(p, w)) = v(p, w) Σ_{i=1}^{L} a_i p_i, so v(p, w) = w / Σ_{i=1}^{L} a_i p_i. By (17),
\[ x(p, w) = h(p, v(p, w)) = \bigl(a_1 v(p, w), \ldots, a_L v(p, w)\bigr) = \left( \frac{a_1 w}{\sum_{i=1}^{L} a_i p_i}, \ldots, \frac{a_L w}{\sum_{i=1}^{L} a_i p_i} \right). \]
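4.8 We know from (18) that h_ℓ(p, u) = x_ℓ(p, e(p, u)). Differentiating this equation w.r.t. p_k and using the chain rule
gives
\[ \frac{\partial h_\ell(p, u)}{\partial p_k} = \frac{\partial x_\ell(p, e(p, u))}{\partial p_k} + \frac{\partial x_\ell(p, e(p, u))}{\partial w} \, \frac{\partial e(p, u)}{\partial p_k}. \]
Recall from (14) that ∂e(p, u)/∂p_k = h_k(p, u):
\[ \frac{\partial h_\ell(p, u)}{\partial p_k} = \frac{\partial x_\ell(p, e(p, u))}{\partial p_k} + \frac{\partial x_\ell(p, e(p, u))}{\partial w} \, h_k(p, u). \]
It follows from (15) and u = v(p, w) that e(p, u) = e(p, v(p, w)) = w and it follows from (18) and u = v(p, w) that
h(p, u) = h(p, v(p, w)) = x(p, w), so:
\[ \frac{\partial h_\ell(p, u)}{\partial p_k} = \frac{\partial x_\ell(p, w)}{\partial p_k} + \frac{\partial x_\ell(p, w)}{\partial w} \, x_k(p, w). \]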
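The relation obtained from these substitutions, ∂h_ℓ/∂p_k = ∂x_ℓ/∂p_k + (∂x_ℓ/∂w) x_k evaluated at u = v(p, w), can be checked by finite differences for a specific utility function. The sketch below assumes a two-good Cobb-Douglas utility with exponents summing to one (a hypothetical choice, with demand and expenditure functions as in the standard Cobb-Douglas calculations); the prices, wealth, and step size are arbitrary.

```python
import math

a = (0.3, 0.7)   # hypothetical Cobb-Douglas exponents, summing to one

def walrasian(p, w):
    return [ai * w / pi for ai, pi in zip(a, p)]

def indirect(p, w):
    return math.prod((ai * w / pi) ** ai for ai, pi in zip(a, p))

def expenditure(p, u):
    return u * math.prod((pi / ai) ** ai for ai, pi in zip(a, p))

def hicksian(p, u):
    # h(p, u) = x(p, e(p, u))
    return walrasian(p, expenditure(p, u))

def price_derivative(f, p, k, h=1e-6):
    """Central finite difference of f with respect to the k-th price."""
    up, down = list(p), list(p)
    up[k] += h
    down[k] -= h
    return (f(up) - f(down)) / (2 * h)

p, w = [2.0, 5.0], 10.0
u = indirect(p, w)
ell, k = 0, 1    # effect of the price of good 2 on demand for good 1

lhs = price_derivative(lambda q: hicksian(q, u)[ell], p, k)
dx_dp = price_derivative(lambda q: walrasian(q, w)[ell], p, k)
h_w = 1e-6
dx_dw = (walrasian(p, w + h_w)[ell] - walrasian(p, w - h_w)[ell]) / (2 * h_w)
rhs = dx_dp + dx_dw * walrasian(p, w)[k]
print(lhs, rhs)
```

For Cobb-Douglas demand the uncompensated cross-price effect is zero, so the whole compensated effect comes from the wealth term.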
4.9 Indivisibilities, rationing, package deals, as well as the specific initial endowment ω = (1, 1) imply smaller budget
sets and therefore a (weakly) lower welfare. Rebates 1 and 2 and the gift certificate imply a larger budget set and
therefore a (weakly) higher welfare.
4.10 As p^1 · x^0 < w^1, x^0 ∈ B(p^1, w^1). As x^0 does not exhaust the budget, p^1 · y ≤ w^1 for all y with ‖x^0 − y‖ sufficiently
close to zero. By local nonsatiation, this neighborhood contains a y strictly preferred to x^0.
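4.11 Write A = Σ_{i=1}^{L} a_i. Standard calculations give:
\[ x(p, w) = \left( \frac{a_1 w}{A p_1}, \ldots, \frac{a_L w}{A p_L} \right), \]
\[ v(p, w) = \left( \frac{w}{A} \right)^{A} \prod_{i=1}^{L} \left( \frac{a_i}{p_i} \right)^{a_i}, \]
\[ h(p, u) = u^{1/A} \prod_{i=1}^{L} \left( \frac{p_i}{a_i} \right)^{a_i/A} \left( \frac{a_1}{p_1}, \ldots, \frac{a_L}{p_L} \right), \]
\[ e(p, u) = A u^{1/A} \prod_{i=1}^{L} \left( \frac{p_i}{a_i} \right)^{a_i/A}. \]
Using u^1 = v(p^1, w^1) = (w^1/A)^A Π_{i=1}^{L} (a_i/p_i^1)^{a_i}, local nonsatiation gives
\[ EV((p^0, w^0), (p^1, w^1)) = e(p^0, u^1) - w^0 = w^1 \prod_{i=1}^{L} \left( \frac{p_i^0}{p_i^1} \right)^{a_i/A} - w^0. \]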
Likewise,
\[ CV((p^0, w^0), (p^1, w^1)) = w^1 - w^0 \prod_{i=1}^{L} \left( \frac{p_i^1}{p_i^0} \right)^{a_i/A}. \]
It is commonly assumed (w.l.o.g., as this is just a monotonic transformation of the utility) that A = 1, which yields
slightly simpler expressions.
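The closed forms in this solution can be cross-checked numerically. The sketch below assumes the Cobb-Douglas utility u(x) = x_1^{a_1} · · · x_L^{a_L}; the exponents, prices, and wealth levels are hypothetical numbers. It verifies that demand attains the indirect utility, that expenditure inverts indirect utility, and that EV and CV computed from the expenditure function agree with the product formulas.

```python
import math

def demand(a, p, w):
    A = sum(a)
    return [ai * w / (A * pi) for ai, pi in zip(a, p)]

def utility(a, x):
    return math.prod(xi ** ai for ai, xi in zip(a, x))

def indirect(a, p, w):
    A = sum(a)
    return (w / A) ** A * math.prod((ai / pi) ** ai for ai, pi in zip(a, p))

def expenditure(a, p, u):
    A = sum(a)
    return A * u ** (1 / A) * math.prod((pi / ai) ** (ai / A) for ai, pi in zip(a, p))

a = (1.0, 2.0)                      # hypothetical exponents, A = 3
p0, w0 = (1.0, 2.0), 12.0           # initial prices and wealth
p1, w1 = (2.0, 1.0), 15.0           # new prices and wealth
A = sum(a)

u0, u1 = indirect(a, p0, w0), indirect(a, p1, w1)

# EV and CV from the expenditure function
ev = expenditure(a, p0, u1) - w0
cv = w1 - expenditure(a, p1, u0)

# EV and CV from the closed-form product expressions
ev_closed = w1 * math.prod((q0 / q1) ** (ai / A) for ai, q0, q1 in zip(a, p0, p1)) - w0
cv_closed = w1 - w0 * math.prod((q1 / q0) ** (ai / A) for ai, q0, q1 in zip(a, p0, p1))
print(ev, ev_closed, cv, cv_closed)
```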
5.1 (a) Y ∩ {y ∈ RL : y ≥ −ω} is the intersection of closed sets, hence closed. It contains the zero vector 0.
(b) As the length of the vectors (y^n)_{n∈N} diverges to infinity, ‖y^n‖ ≥ 1 for n sufficiently large. By convexity and
possibility of inaction,
\[ z^n = \frac{1}{\|y^n\|}\, y^n + \left( 1 - \frac{1}{\|y^n\|} \right) 0 \in Y. \]
By assumption, y^n + ω ≥ 0, so dividing by ‖y^n‖ gives z^n + ω/‖y^n‖ ≥ 0.
(c) All vectors z^n have length one. A bounded sequence contains a convergent subsequence. Let z be its limit.
Firstly, z ≠ 0, as it is the limit of a sequence of vectors of length one. Secondly, as z^n lies in Y for n large, and
Y is closed, also the limit z lies in Y.
(d) Letting n → ∞, and realizing that ω/‖y^n‖ → 0, (b) implies that z ≥ 0. As z ≠ 0, this contradicts no free lunch.
5.2 Reasoning as in the EMP, the assumptions on f guarantee that the CMP is solvable.
(a) Define q z = f (z) ≥ 0. The CMP at (w, q z ) has a solution and z is feasible in this CMP, so c(w, q z ) ≤ w · z.
Conclude that p f (z) − w · z ≤ pq z − c(w, q z ).
(b) Let z^q solve the CMP at (w, q): z^q ∈ R^{L−1}_+, f(z^q) ≥ q, and c(w, q) = w · z^q. Hence: p f(z^q) − w · z^q ≥
pq − c(w, q).
(c) Assume (P1) has a solution z (the case where (P2) has a solution is similar). By (a), there is a feasible q z in
(P2) with equal or higher profit. It cannot be higher. Otherwise, by (b), there is a feasible z q z in (P1) yielding
a higher profit than q z and therefore higher than the profit maximizing z, a contradiction. Conclude that q z
solves (P2) and yields the same profit as z in (P1).
5.3 (a),(b) Consider the convex production set Y = {y ∈ R^2 : y_1 ≤ 0, y_2 ≤ √(−y_1)}. The point (0, −1) ∈ Y maximizes profit
at price vector p = (1, 0) ∈ R^2_+, but is not efficient, as also (0, 0) ∈ Y. The point (0, 0) ∈ Y is efficient, but does
not maximize profit at strictly positive prices.
(c) Consider the production set Y = {y ∈ R^2 : y ≤ (1, 1), (y_1 − 1)^2 + (y_2 − 1)^2 ≥ 2}. The point (0, 0) ∈ Y is efficient,
but not profit maximizing for any nonzero vector p ∈ R^2_+: if p_1 ≥ p_2, then (1, 1 − √2) ∈ Y yields a positive
profit, and if p_1 ≤ p_2, then (1 − √2, 1) ∈ Y yields a positive profit, whereas (0, 0) ∈ Y yields only zero profit.
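The example in (c) can be checked on a grid of prices. The sketch below assumes the production set Y = {y ≤ (1, 1), (y_1 − 1)^2 + (y_2 − 1)^2 ≥ 2} and the two production plans (1, 1 − √2) and (1 − √2, 1); the price grid and the numerical tolerance are arbitrary choices for this illustration.

```python
import math

def in_production_set(y, tol=1e-12):
    # y <= (1, 1) and outside (or on) the circle of radius sqrt(2) around (1, 1)
    return (y[0] <= 1 + tol and y[1] <= 1 + tol
            and (y[0] - 1) ** 2 + (y[1] - 1) ** 2 >= 2 - tol)

def profit(p, y):
    return p[0] * y[0] + p[1] * y[1]

origin = (0.0, 0.0)
candidates = [(1.0, 1.0 - math.sqrt(2)), (1.0 - math.sqrt(2), 1.0)]

# Nonzero, nonnegative price vectors on a grid
prices = [(i / 10, j / 10) for i in range(11) for j in range(11) if (i, j) != (0, 0)]

# At every such price, one of the two candidate plans earns more than the origin
origin_beaten = all(
    max(profit(p, y) for y in candidates) > profit(p, origin)
    for p in prices
)
print(origin_beaten)
```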
6.1 (a) Look at the definitions of improvements and Pareto optimality: the fact that the coalition S = H of all
consumers cannot improve upon x means that there is nothing feasible that makes everybody better off.
But there may still be room for improvement for some if not all consumers: it may still be Pareto dominated.
(b) Consider a pure exchange economy with two consumers and two commodities. The first consumer’s
preferences are represented by the utility function u 1 (x) = x 1 x 2 , the second consumer’s preferences by
a constant utility function: he is indifferent between all commodity bundles. If ω1 = ω2 = (1, 1), then
(p, x) = ((1, 1), (1, 1), (1, 1)) (i.e., prices are equal and each consumer sticks to the initial endowment) is a
Walrasian equilibrium. By Proposition 6.2, the allocation lies in the core. But the allocation is not Pareto
optimal: giving the total endowment to the first consumer makes him better off, while not affecting the
happiness of the second consumer.
6.2 (a) Let p ∈ R^L_+, z ∈ z(p) ∩ R^L_−. By Walras’ Law, p · z = Σ_{ℓ: p_ℓ>0} p_ℓ z_ℓ = 0. As the sum of nonpositive terms (p_ℓ > 0
and z_ℓ ≤ 0 give p_ℓ z_ℓ ≤ 0), it can be zero only if z_ℓ = 0 whenever p_ℓ > 0.
(b) Let p, z, ℓ be as in the statement of the exercise. As z_k = 0 for k ≠ ℓ, Walras’ Law implies p · z = p_ℓ z_ℓ = 0. As
p_ℓ > 0, this implies z_ℓ = 0.
(c) If in equilibrium the market for good $\ell \in \{1, \ldots, L\}$ does not clear, its price is zero by (a). So consumer $h$ is
not constrained in his consumption of $\ell$. In equilibrium, $h$ must choose a most preferred bundle from the
budget set, but there is none: under (c1), each bundle can be improved upon by adding more of good $\ell$;
under (c2), a most preferred bundle cannot lie on the axes, as $h$ can afford a better alternative in $\mathbb{R}^L_{++}$; the
latter can be improved upon by adding more of good $\ell$.
6.3 (a) Pareto dominance tries to compare allocations regardless of prices. Preferences of firms (profit) are functions
of prices. Observe: this doesn’t mean we do not care about firms. It’s just that firms are owned by the
consumers. Technically, in the private ownership economy, there aren’t other agents than the consumers
that matter.
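(b) Let $(p, x, y)$ be a Walrasian equilibrium of $\mathcal{E}$. Suppose there is a feasible allocation $(\hat{x}, \hat{y})$ Pareto dominating
$(x, y)$. Local nonsatiation implies, for all $h \in H$:
\[
\begin{aligned}
\hat{x}^h \succsim_h x^h &\;\Rightarrow\; p \cdot \hat{x}^h \ge p \cdot x^h = p \cdot \Bigl(\omega^h + \sum_{f \in F} \theta^{hf} y^f\Bigr), \\
\hat{x}^h \succ_h x^h &\;\Rightarrow\; p \cdot \hat{x}^h > p \cdot x^h = p \cdot \Bigl(\omega^h + \sum_{f \in F} \theta^{hf} y^f\Bigr).
\end{aligned}
\]
By Pareto dominance, such a weak preference holds for all, and strict preference for some $h \in H$. Summing
over $h \in H$ and using that equilibrium production plans $(y^f)_{f \in F}$ are profit maximizing at prices $p$ gives
\[
\begin{aligned}
p \cdot \sum_{h \in H} \hat{x}^h &> p \cdot \sum_{h \in H} x^h \\
&= \sum_{h \in H} p \cdot \Bigl(\omega^h + \sum_{f \in F} \theta^{hf} y^f\Bigr) \\
&= p \cdot \omega + p \cdot \sum_{f \in F} y^f \\
&\ge p \cdot \omega + p \cdot \sum_{f \in F} \hat{y}^f.
\end{aligned}
\]
This contradicts feasibility of $(\hat{x}, \hat{y})$, which implies $p \cdot \sum_{h \in H} \hat{x}^h \le p \cdot \omega + p \cdot \sum_{f \in F} \hat{y}^f$.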
6.4 PURE EXCHANGE ECONOMIES: You may verify that the following pure exchange economies $\mathcal{E} = (\succsim_1, \succsim_2, \omega^1, \omega^2)$
have the desired property:
(a) Let $\succsim_1$ and $\succsim_2$ be lexicographic preferences over $\mathbb{R}^2_+$, $\omega^1 = (1, 0)$, and $\omega^2 = (0, 1)$. There is no Walrasian
equilibrium:
if $p \in \Delta$ has both prices positive, then consumer 1 demands $\omega^1$ and consumer 2 demands $(p_2/p_1, 0)$,
so there is excess demand for the first commodity;
if one of the commodities has price zero, demand for this commodity is unbounded.
if one of the commodities has price zero, demand for this commodity is unbounded: there are no
Walrasian equilibria at such prices;
if both prices are positive and $p_1 > p_2$, the first consumer demands a bundle with $2x_1 = x_2$, i.e., the
bundle $(p \cdot \omega^1/(p_1 + 2p_2),\ 2p \cdot \omega^1/(p_1 + 2p_2))$, and the second consumer spends the entire income on
the second commodity, i.e., demands the bundle $(0, p \cdot \omega^2/p_2)$. In particular, demand for the second
commodity is at least twice the demand for the first commodity. As the total endowment of both
commodities is equal, not both markets can clear at the same time, contradicting the fact that (given
local nonsatiation) markets with a positive price must clear. There are no Walrasian equilibria at such
prices;
similarly, Walrasian equilibria with positive prices and $p_2 > p_1$ are ruled out;
if both prices are positive and equal, $p = (1/2, 1/2)$, the first consumer’s demand is $\{(2/3, 4/3), (4/3, 2/3)\}$
and the second consumer’s demand is $\{x \in \mathbb{R}^2_+ : x_1 + x_2 = 2\}$. There are two (equilibrium/market
clearing) allocations: $((2/3, 4/3), (4/3, 2/3))$ and $((4/3, 2/3), (2/3, 4/3))$.
(d) Preferences $\succsim_1$, $\succsim_2$ are such that the consumers are indifferent between all commodity bundles; $\omega^1 = \omega^2 = (1, 1)$.
Every $(p, x)$ with $p \in \Delta$ and $x = (x^1, x^2) \in \mathbb{R}^2_+ \times \mathbb{R}^2_+$ with $x^h \in B^h(p, p \cdot \omega^h)$ for both $h = 1, 2$ is a Walrasian
equilibrium.
PRIVATE OWNERSHIP ECONOMIES: Take the examples above and give the producers the trivial production set $\{0\}$
consisting of the remarkable feat of producing absolutely nothing using absolutely nothing. If you prefer slightly
larger production sets, you may want to choose them equal to $\mathbb{R}^2_-$, containing all production plans producing
absolutely nothing, possibly using something.
The coalition of both women can improve upon any feasible allocation with x T ∈ (0, 1) by giving the liar the
entire baby, so x T ∈ {0, 1} in the core.
Combining the above gives that the core is
{(p, x T , x L ) ∈ R3 : p > 0, x T = 1, x L = 0} if ωT = 1.
7.1 (a) Best elements of G: those whose reduced simple gambles put largest probability on max{a 1 , . . . , a k }.
Worst elements of G: those whose reduced simple gambles put largest probability on min{a 1 , . . . , a k }.
(G1) SATISFIED: preferences represented by utility function $u(g) = \frac{1}{|L(g)|} \sum_{a_i \in L(g)} a_i$.
(G2) VIOLATED: assume w.l.o.g. that $a_1 > a_2$ and consider the gambles $a_1$ and $(p \circ a_1, (1 - p) \circ a_2)$.
If $p > 1/2$, $a_1$ is the most likely outcome in both gambles, so the DM is indifferent between them.
Continuity would require $a_1 \sim (\tfrac{1}{2} \circ a_1, \tfrac{1}{2} \circ a_2)$. However, at $p = 1/2$, the DM assigns value $\frac{a_1 + a_2}{2} < a_1$
to the second gamble, so he strictly prefers the gamble giving $a_1$ for sure.
(G3) SATISFIED: preferences are defined in terms of reduced simple gambles: $u(g) = u(g_s)$.
(G4) VIOLATED: assume w.l.o.g. that $a_1 > a_2$. Then the DM strictly prefers $g = a_1$ to $g' = a_2$. Independence
requires that also
\[ (\alpha \circ g, (1 - \alpha) \circ a_1) \succ (\alpha \circ g', (1 - \alpha) \circ a_1) \]
for all $\alpha \in (0, 1)$. However, for $\alpha$ close to zero, $a_1$ is the most likely outcome in both gambles, so the
DM is indifferent between them.
As (G2) and (G4) are violated, Remark 1 implies that % cannot be represented by a vNM utility function.
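The failures of (G2) and (G4) in part (a) can be reproduced with a small sketch of the "average of the most likely outcomes" utility. The outcomes 10 and 4 and all function names are illustrative assumptions, not taken from the exercise.

```python
from fractions import Fraction

def most_likely_utility(simple_gamble):
    """Average of the most likely outcomes of a reduced simple gamble.

    simple_gamble maps each (integer) outcome to its probability.
    """
    top = max(simple_gamble.values())
    likely = [a for a, prob in simple_gamble.items() if prob == top]
    return Fraction(sum(likely), len(likely))

a1, a2 = 10, 4  # hypothetical outcomes with a1 > a2

def mix(p):
    """Reduced simple gamble (p o a1, (1 - p) o a2)."""
    return {a1: p, a2: 1 - p}

sure_a1 = {a1: Fraction(1)}

# (G2): utility jumps from a1 to (a1 + a2)/2 as p falls to 1/2.
u_above_half = most_likely_utility(mix(Fraction(3, 4)))
u_at_half = most_likely_utility(mix(Fraction(1, 2)))

# (G4): mixing g = a1 and g' = a2 with a1 at weight alpha = 1/10 on g, g'.
alpha = Fraction(1, 10)
u_mix_g = most_likely_utility(sure_a1)                # (alpha o a1, (1 - alpha) o a1)
u_mix_g_prime = most_likely_utility(mix(1 - alpha))   # (alpha o a2, (1 - alpha) o a1)
```

With these numbers the utility is 10 for every p above 1/2 but 7 at p = 1/2, and the two mixtures in the independence check receive the same utility even though a1 is strictly preferred to a2.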
(b) Best element of $G$: deterministic outcome $\max\{a_1, \ldots, a_k\}$. Worst elements of $G$ do not exist: for each
$g \in G$, the gamble $(\tfrac{1}{2} \circ g, \tfrac{1}{2} \circ g)$ has higher complexity and is therefore worse than $g$.
(G1) SATISFIED: preferences represented by a utility function.
(G2) SATISFIED: on $G_1$, the DM’s utility function $u(g) = \sum_{m=1}^{k} p_m a_m - 1$ is continuous.
(G3) VIOLATED: the gambles $a_1 \in G_0$ and $(\tfrac{1}{2} \circ a_1, \tfrac{1}{2} \circ a_1) \in G_1$ both have reduced simple gamble $(1 \circ a_1)$,
yet the former lies in $G_0$ and is therefore strictly preferred to the latter in $G_1$.
(G4) VIOLATED: Let
\[
\begin{aligned}
g &= a_1 \in G_0, \\
g' &= (\tfrac{1}{2} \circ a_1, \tfrac{1}{2} \circ a_1) \in G_1, \\
g'' &= (\tfrac{1}{2} \circ g', \tfrac{1}{2} \circ g') \in G_2.
\end{aligned}
\]
Let $\alpha \in (0, 1)$. By construction,
\[ (\alpha \circ g, (1 - \alpha) \circ g''),\ (\alpha \circ g', (1 - \alpha) \circ g'') \in G_3. \]
Hence
\[
\begin{aligned}
u(g) &= a_1 - 0, \\
u(g') &= a_1 - 1, \\
u(\alpha \circ g, (1 - \alpha) \circ g'') &= a_1 - 3, \\
u(\alpha \circ g', (1 - \alpha) \circ g'') &= a_1 - 3,
\end{aligned}
\]
in violation of (G4).
As (G3) and (G4) are violated, Remark 1 implies that % cannot be represented by a vNM utility function.
(c) To characterize the best and worst elements of G, distinguish two cases:
1. min{a 1 , . . . , a k } ≤ 5 < max{a 1 , . . . , a k }.
Best elements of G: those putting probability one on outcomes a m > 5 (utility equal to its
maximum, one).
Worst elements of G: those putting probability one on outcomes a m ≤ 5 (utility equal to its
minimum, zero).
2. Otherwise, if all a k exceed 5 or all a k are at most five, the utility function is constant (one in the
former case, zero in the latter), so all gambles are equivalent (and hence both best and worst
elements of G).
Shortcut: for each $i = 1, \ldots, k$, define $u(a_i) = 0$ if $a_i \le 5$ and $u(a_i) = 1$ otherwise. Then for every $g \in G$
with reduced simple gamble $(p_1 \circ a_1, \ldots, p_k \circ a_k)$, we have $u(g) = \sum_{i : a_i > 5} p_i = \sum_{i=1}^{k} p_i u(a_i)$, i.e., this
10.1 (a) If $u$ has no upper bound, construct a sequence of instantaneous utilities $(u(c_t))_{t=0}^{\infty}$ with $u(c_t) > 1/\delta(t)$ for
each time $t$. Then $\delta(t) u(c_t) > 1$ at each time $t$ and $\sum_{t=0}^{\infty} \delta(t) u(c_t)$ diverges.
(b) Let $u$ be bounded by $B \in \mathbb{R}$ and let $c = (c_t)_{t=0}^{\infty}$ be an arbitrary stream of choices. For each $t$, $|\delta(t) u(c_t)| \le B \delta(t)$
and $\sum_{t=0}^{\infty} B \delta(t) = B \sum_{t=0}^{\infty} \delta(t)$ converges. By the comparison test for summable sequences, also
$\sum_{t=0}^{\infty} \delta(t) u(c_t)$ converges.
10.2 (a1) $k$ gives instantaneous utility $u(h, k) = -1$, $\ell$ gives instantaneous utility $u(h, \ell) = 1$, so the optimal action is $\ell$
with instantaneous utility 1.
(a2) $k$ gives instantaneous utility $u(d, k) = 0$, $\ell$ gives instantaneous utility $u(d, \ell) = -\alpha$, so the optimal action is $k$
with instantaneous utility 0.
(b) $k$ gives expected discounted utility $u(d, k) + \delta \cdot 0 = 0$, $\ell$ gives expected discounted utility
$u(d, \ell) + \tfrac{1}{2} \delta (u(h, \ell) + u(d, k)) = -\alpha + \tfrac{1}{2} \delta (1 + 0) = \tfrac{1}{2} \delta - \alpha$, so the optimal action is
\[
\begin{cases}
k & \text{if } \tfrac{1}{2} \delta - \alpha < 0, \\
k \text{ and } \ell & \text{if } \tfrac{1}{2} \delta - \alpha = 0, \\
\ell & \text{if } \tfrac{1}{2} \delta - \alpha > 0.
\end{cases}
\]
(c) If the severity of the depression is relatively small ($\tfrac{1}{2} \delta - \alpha > 0$), an initially depressed person may decide
not to take his life in the hope of becoming happy later while still having the option of suicide in case of
continued depression.
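The case distinction in 10.2 (b) amounts to comparing 0 with δ/2 − α. A minimal sketch, with a hypothetical function name and the labels "k" and "l" standing for the two actions:

```python
from fractions import Fraction

def optimal_actions(alpha, delta):
    """Optimal first-period actions of an initially depressed decision maker.

    Action k yields 0; action l yields delta/2 - alpha, as computed in part (b).
    """
    value_k = Fraction(0)
    value_l = Fraction(1, 2) * delta - alpha
    if value_l < value_k:
        return {"k"}
    if value_l > value_k:
        return {"l"}
    return {"k", "l"}  # indifference: both actions are optimal
```

For example, `optimal_actions(Fraction(1, 4), Fraction(9, 10))` returns `{"l"}`, the mild-depression case discussed in (c).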
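10.3 Preferring one apple today over two apples tomorrow means that
\[ u(1) > (1 + \alpha)^{-\gamma/\alpha} u(2). \]
Preferring two apples one year and a day from now to one apple a year from now (and assuming we’re not in a leap
year) means that
\[ (1 + 366\alpha)^{-\gamma/\alpha} u(2) > (1 + 365\alpha)^{-\gamma/\alpha} u(1). \]
These two inequalities hold simultaneously if
\[ \left( \frac{1}{1 + \alpha} \right)^{\gamma/\alpha} < \frac{u(1)}{u(2)} < \left( \frac{1 + 365\alpha}{1 + 366\alpha} \right)^{\gamma/\alpha}. \]
Given $\alpha$, it remains possible to choose the exponent $\gamma/\alpha$ arbitrarily: having it equal to $\beta$ simply means choosing
$\gamma = \alpha\beta$. So we can simplify the problem and show that there are $\alpha, \beta > 0$ solving
\[ \left( \frac{1}{1 + \alpha} \right)^{\beta} < \frac{u(1)}{u(2)} < \left( \frac{1 + 365\alpha}{1 + 366\alpha} \right)^{\beta}, \]
or similarly
\[ \frac{1}{1 + \alpha} < \left( \frac{u(1)}{u(2)} \right)^{1/\beta} < \frac{1 + 365\alpha}{1 + 366\alpha}. \]
Notice that $\alpha > 0$ implies that
\[ 0 < \frac{1}{1 + \alpha} < \frac{1 + 365\alpha}{1 + 366\alpha} < 1. \]
The expression $(u(1)/u(2))^{1/\beta}$ is a continuous function of $\beta > 0$. As $u(1)/u(2) \in (0, 1)$, it goes to zero as $\beta \to 0$ and
to one as $\beta \to \infty$. By the Intermediate Value Theorem, there exists, for each $\alpha > 0$, a $\beta > 0$ such that $(u(1)/u(2))^{1/\beta}$
lies between the two desired bounds.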
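The existence argument in 10.3 can be made concrete. The sketch below picks β so that (u(1)/u(2))^(1/β) hits the midpoint of the two bounds; the midpoint, the values α = 0.1, u(1) = 1, u(2) = 2, and the function names are all illustrative assumptions.

```python
import math

def discount(t, alpha, gamma):
    """Hyperbolic discount factor (1 + alpha * t) ** (-gamma / alpha)."""
    return (1 + alpha * t) ** (-gamma / alpha)

def find_beta(alpha, ratio):
    """Return beta > 0 with 1/(1+alpha) < ratio**(1/beta) < (1+365*alpha)/(1+366*alpha).

    ratio = u(1)/u(2) must lie in (0, 1). The target is the midpoint of the
    two bounds, so beta = log(ratio) / log(target).
    """
    lower = 1 / (1 + alpha)
    upper = (1 + 365 * alpha) / (1 + 366 * alpha)
    target = (lower + upper) / 2
    return math.log(ratio) / math.log(target)

alpha, u1, u2 = 0.1, 1.0, 2.0      # hypothetical parameter and utility values
beta = find_beta(alpha, u1 / u2)
gamma = alpha * beta

prefers_one_today = u1 > discount(1, alpha, gamma) * u2
prefers_two_later = discount(366, alpha, gamma) * u2 > discount(365, alpha, gamma) * u1
```

Both comparisons come out true for these values, which is the preference reversal the exercise asks for.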
10.4 (a) By (iii), $t$-th period offspring gets the relevant gene with probability $\tfrac{1}{2}$ from:
the DM’s lineage, who by assumption carries it with probability $p_{t-1}$,
the randomly selected mate, who by (ii) carries it with probability $\alpha$.
So $p_t = \tfrac{1}{2} p_{t-1} + \tfrac{1}{2} \alpha$.
(b) By induction on $t$. The claim is true for $t = 0$: $\alpha + (1 - \alpha) \left( \tfrac{1}{2} \right)^0 = 1 = p_0$. Now let $t \in \mathbb{N}$ and assume
$p_\tau = \alpha + (1 - \alpha) \left( \tfrac{1}{2} \right)^\tau$ for all $\tau < t$. By (a),
\[ p_t = \frac{1}{2} p_{t-1} + \frac{1}{2} \alpha = \frac{1}{2} \left( \alpha + (1 - \alpha) \left( \frac{1}{2} \right)^{t-1} \right) + \frac{1}{2} \alpha = \alpha + (1 - \alpha) \left( \frac{1}{2} \right)^{t}. \]
Hence the claim is true for $t$. By induction, $p_t = \alpha + (1 - \alpha) \left( \tfrac{1}{2} \right)^t$ for all $t = 0, 1, \ldots$.
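(c) Using (a), one obtains
\[ p_T u(1) < p_{T+1} u(2) \iff \frac{u(1)}{u(2)} < \frac{p_{T+1}}{p_T} = \frac{\tfrac{1}{2} p_T + \tfrac{1}{2} \alpha}{p_T} = \frac{1}{2} + \frac{1}{2} \frac{\alpha}{p_T}. \]
As $u$ is strictly increasing: $u(1)/u(2) < 1$. Moreover, (b) implies that $\lim_{T \to \infty} \left( \tfrac{1}{2} + \tfrac{1}{2} \frac{\alpha}{p_T} \right) = 1$. So for $T \in \mathbb{N}$
sufficiently large, two apples at time $T + 1$ is preferred to one apple at time $T$.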
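The recursion in (a), the closed form in (b), and the threshold in (c) can be cross-checked with exact arithmetic. In this sketch the values α = 1/4 and u(1)/u(2) = 3/4 are hypothetical, as are the function names; the search in the last function assumes α > 0 and a ratio below one.

```python
from fractions import Fraction

def gene_probabilities(alpha, horizon):
    """Iterate p_t = p_{t-1}/2 + alpha/2 from p_0 = 1 for t = 0, ..., horizon."""
    p = [Fraction(1)]
    for _ in range(horizon):
        p.append(p[-1] / 2 + alpha / 2)
    return p

def closed_form(alpha, t):
    """Closed form p_t = alpha + (1 - alpha) * (1/2)^t from part (b)."""
    return alpha + (1 - alpha) * Fraction(1, 2) ** t

def first_patient_period(alpha, ratio):
    """Smallest T >= 0 with ratio < p_{T+1} / p_T, where ratio = u(1)/u(2) < 1.

    Terminates only if alpha > 0, since then p_{T+1} / p_T tends to one.
    """
    T = 0
    while not ratio < closed_form(alpha, T + 1) / closed_form(alpha, T):
        T += 1
    return T

alpha = Fraction(1, 4)
path = gene_probabilities(alpha, 10)
threshold = first_patient_period(alpha, Fraction(3, 4))
```

Because p_T is decreasing in T, the ratio p_{T+1}/p_T is increasing, so the comparison in (c) holds for every T from `threshold` onward.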
10.5 lim inft →∞ x t = c implies [L1] and [L2]: Let ε > 0. As limt →∞ inf{x s : s ≥ t } = c, there is a T ∈ N such that
c − ε < inf{x s : s ≥ T },
c − ε/2 < x t
i.e.,
inf{x s : s ≥ t } ≤ c + ε/2 < c + ε. (58)
Combining (57) and (58) gives that for each ε > 0 there is a T ∈ N such that
$\liminf_{t \to \infty} x_t > 0 \iff \exists \varepsilon > 0 : x_t > \varepsilon$ for all but finitely many $t$.
(⇒): Assume lim inft →∞ x t > 0. If the liminf is infinite, the weakly increasing sequence of infima inf{x s : s ≥ t }
diverges, so there is a T ∈ N with inf{x s : s ≥ T } ≥ 1. In particular, x t ≥ 1 for all t ≥ T . If the liminf is finite, [L1] with
ε = c/2 implies that there is a T ∈ N with x t > c − ε = c/2 for all t ≥ T .
(⇐): Assume there is an ε > 0 such that x t > ε for all but finitely many t : there is a T ∈ N such that x t > ε for t ≥ T .
Then inf{x s : s ≥ t } ≥ ε for t ≥ T , so also the limit of the infima exceeds ε: it must be positive!
10.7 (a) If a sequence is unbounded, the liminf of average payoffs need not be finite. For instance, the unbounded
sequence $x = (x_t)_{t=0}^{\infty}$ defined recursively by $x_0 = 1$ and, for all $t \in \mathbb{N}$, $x_t = (t + 1)^2 - \sum_{k=0}^{t-1} x_k$, has time average
$\frac{1}{T} \sum_{t=0}^{T-1} x_t = T$, so its liminf diverges to infinity.
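A short sketch confirms the time averages of the sequence in (a); the function name and the length 8 are illustrative choices.

```python
def unbounded_sequence(length):
    """x_0 = 1 and x_t = (t + 1)^2 - sum_{k < t} x_k for t >= 1."""
    xs = [1]
    for t in range(1, length):
        xs.append((t + 1) ** 2 - sum(xs))
    return xs

xs = unbounded_sequence(8)
# Average of the first T terms, for T = 1, ..., 8.
averages = [sum(xs[:T]) / T for T in range(1, len(xs) + 1)]
```

The sequence turns out to be the odd numbers 1, 3, 5, ..., whose first T terms sum to T squared, so the T-th average equals T.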
(b) Let $x = (x_t)_{t=0}^{\infty}$ and $y = (y_t)_{t=0}^{\infty}$ be two bounded sequences. We need to investigate whether
\[ \liminf_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} (x_t - y_t) > 0 \iff \liminf_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} x_t > \liminf_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} y_t. \tag{59} \]
To see that this is not the case, let $x = (0, 0, \ldots)$ be the zero sequence. Substitution in (59) and using, for
any sequence $z = (z_t)_{t=0}^{\infty}$, that $\liminf_{t \to \infty} -z_t = -\limsup_{t \to \infty} z_t$ (where the limes superior is defined
analogously to liminf as $\limsup_{t \to \infty} z_t = \lim_{t \to \infty} (\sup\{z_s : s \ge t\})$) yields
\[ \limsup_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} y_t < 0 \iff \liminf_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} y_t < 0. \]
This is obviously false. For an explicit example, take the sequence from page 70 with the oscillating average
and subtract 1/2 from each entry to obtain a sequence of averages with liminf equal to 1/3 − 1/2 = −1/6 < 0,
but limsup equal to 2/3 − 1/2 = 1/6 > 0.
11.1 The cost function $c$ is strictly convex, so the function $\sum_{i=1}^{n} p_i \pi(i) - \frac{1}{2\delta} c(p)$ is strictly concave. Since we maximize a
strictly concave, continuous function over a compact set, a maximum exists and is unique. Notice that the gradient
of the goal function has $i$-th coordinate
\[ \pi(i) - \frac{1}{2\delta} \frac{\partial c(p)}{\partial p_i} = \pi(i) - \frac{1}{2\delta} \, 2 \left( p_i - \frac{1}{n} \right) = \pi(i) - \frac{1}{\delta} \left( p_i - \frac{1}{n} \right). \]
Since the feasible set is entirely defined by linear (in)equalities, the Kuhn-Tucker conditions give necessary and
sufficient conditions for a solution to be a maximum. So $p^* \in \Delta$ solves the maximization problem if and only if
there are Lagrange multipliers $\lambda_i \ge 0$ associated with the inequality constraints $p_i^* \ge 0$ and $\mu \in \mathbb{R}$ associated with
the equality constraint $\sum_{i=1}^{n} p_i^* = 1$ such that for each $i = 1, \ldots, n$:
\[ \pi(i) - \frac{1}{\delta} \left( p_i^* - \frac{1}{n} \right) + \lambda_i + \mu = 0 \quad \text{and} \quad \lambda_i p_i^* = 0. \tag{60} \]
Rewriting we find
\[ \forall i = 1, \ldots, n : \quad p_i^* = \delta \pi(i) + \delta (\lambda_i + \mu) + \frac{1}{n}. \]
Assume that $p^*$ solves the maximization problem. We check that it satisfies the linear probability model with
parameter $\delta$. If $p_i^* > 0$, then $\lambda_i = 0$ by complementary slackness. Hence for every $j \in A$, we find, using (60):
\[
\begin{aligned}
p_i^* - p_j^* &= \left[ \delta \pi(i) + \delta \mu + \frac{1}{n} \right] - \left[ \delta \pi(j) + \delta (\lambda_j + \mu) + \frac{1}{n} \right] \\
&= \delta (\pi(i) - \pi(j)) - \delta \lambda_j \\
&\le \delta (\pi(i) - \pi(j)),
\end{aligned}
\]
where the inequality follows from the fact that $\delta > 0$ and $\lambda_j \ge 0$. This is exactly requirement (52).
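Conversely, if $p^* \in \Delta$ satisfies requirement (52), one can easily show that it satisfies the Kuhn-Tucker conditions.
Recall that if $p_i^* > 0$ and $p_j^* > 0$, then
\[ p_i^* - p_j^* = \delta (\pi(i) - \pi(j)), \]
so
\[ \frac{1}{\delta} \left( p_i^* - \frac{1}{n} \right) - \pi(i) = \frac{1}{\delta} \left( p_j^* - \frac{1}{n} \right) - \pi(j). \tag{61} \]
Hence if we choose $i \in \{1, \ldots, n\}$ with $p_i^* > 0$ and define
\[ \mu = \frac{1}{\delta} \left( p_i^* - \frac{1}{n} \right) - \pi(i) \in \mathbb{R}, \]
we have from (61) that
\[ \mu = \frac{1}{\delta} \left( p_j^* - \frac{1}{n} \right) - \pi(j) \]
for all $j$ with $p_j^* > 0$. Now define for each $k$:
\[
\lambda_k =
\begin{cases}
0 & \text{if } p_k^* > 0, \\
\frac{1}{\delta} \left( p_k^* - \frac{1}{n} \right) - \pi(k) - \mu & \text{if } p_k^* = 0.
\end{cases}
\]
To see that $\lambda_k \ge 0$ if $p_k^* = 0$, choose an alternative $j$ with $p_j^* > 0$. By definition of the linear probability model,
\[ p_j^* - p_k^* \le \delta (\pi(j) - \pi(k)), \]
which implies
\[ (\pi(j) - \pi(k)) - \frac{1}{\delta} \left( p_j^* - p_k^* \right) \ge 0. \]
Hence
\[
\begin{aligned}
\lambda_k &= \frac{1}{\delta} \left( p_k^* - \frac{1}{n} \right) - \pi(k) - \mu \\
&= \frac{1}{\delta} \left( p_k^* - \frac{1}{n} \right) - \pi(k) - \frac{1}{\delta} \left( p_j^* - \frac{1}{n} \right) + \pi(j) \\
&= (\pi(j) - \pi(k)) - \frac{1}{\delta} \left( p_j^* - p_k^* \right) \\
&\ge 0,
\end{aligned}
\]
as we had to show. Substituting the definition of the Lagrange multipliers in (60) shows that the Kuhn-Tucker
conditions are satisfied.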
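The maximizer characterized in 11.1 can also be computed directly. The sketch below does not follow the Kuhn-Tucker route of the text: it assumes $c(p) = \sum_i (p_i - 1/n)^2$ (which is what the gradient in 11.1 implies) and uses the observation, obtained by completing the square, that the problem is then the Euclidean projection of $q_i = 1/n + \delta\pi(i)$ onto the simplex, solved with the standard sort-based algorithm. Function names are hypothetical, and the payoffs (0, 2, 8) are borrowed from 11.3 purely for illustration.

```python
from fractions import Fraction

def linear_probability_choice(payoffs, delta):
    """Maximize sum_i p_i*pi(i) - (1/(2*delta)) * sum_i (p_i - 1/n)^2 over the simplex.

    Implemented as the projection of q_i = 1/n + delta*pi(i) onto the simplex.
    Pass delta as a Fraction to keep the arithmetic exact.
    """
    n = len(payoffs)
    q = [Fraction(1, n) + delta * Fraction(x) for x in payoffs]
    # Find the shift theta with sum_i max(q_i - theta, 0) = 1.
    running, theta = Fraction(0), Fraction(0)
    for k, value in enumerate(sorted(q, reverse=True), start=1):
        running += value
        candidate = (running - 1) / k
        if value - candidate > 0:  # the k largest entries stay positive
            theta = candidate
    return [max(x - theta, Fraction(0)) for x in q]

def satisfies_requirement(p, payoffs, delta):
    """Check: p_i > 0 implies p_i - p_j <= delta*(pi(i) - pi(j)) for all j."""
    n = len(p)
    return all(p[i] - p[j] <= delta * (payoffs[i] - payoffs[j])
               for i in range(n) if p[i] > 0
               for j in range(n))

p_small = linear_probability_choice([0, 2, 8], Fraction(1, 20))
p_large = linear_probability_choice([0, 2, 8], Fraction(1, 6))
```

Here `p_small` has full support, while `p_large` puts all probability on the highest payoff; both pass `satisfies_requirement`, as the equivalence shown above predicts.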
11.2 (a) Choice probabilities are weakly increasing in payoffs, so the probability of choosing 1 must be positive. If
also the probability of choosing 2 is positive, the linearity requirement implies
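11.3 (a) In the logit model with parameter $\delta > 0$, the choice probability for each alternative $i \in A$ is
\[ P_A(i) = \frac{\exp(\pi(i)/\delta)}{\sum_{j \in A} \exp(\pi(j)/\delta)}. \tag{64} \]
With payoffs $\pi(1) = 0$, $\pi(2) = 2$, and $\pi(3) = 8$, this gives
\[
\begin{aligned}
P_A(1) &= \frac{\exp(0/\delta)}{\exp(0/\delta) + \exp(2/\delta) + \exp(8/\delta)} = \frac{1}{1 + \exp(2/\delta) + \exp(8/\delta)}, \\
P_A(2) &= \frac{\exp(2/\delta)}{1 + \exp(2/\delta) + \exp(8/\delta)}, \\
P_A(3) &= \frac{\exp(8/\delta)}{1 + \exp(2/\delta) + \exp(8/\delta)}.
\end{aligned}
\]
Since the exponential function takes strictly positive values, all choice probabilities lie in (0, 1).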
The logit model is a special case of Luce’s choice model (see (42) and (45)), which satisfies path
independence. Hence the logit model satisfies path independence.
As δ → ∞, the choice probabilities converge to 1/3. See the motivation in Section 11.2.
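The logit probabilities of part (a) are easy to evaluate numerically. A minimal sketch, assuming payoffs (0, 2, 8) as in the formulas for this exercise; the function name and the values of δ are illustrative.

```python
import math

def logit_probabilities(payoffs, delta):
    """Logit choice probabilities exp(pi(i)/delta) / sum_j exp(pi(j)/delta)."""
    top = max(payoffs)
    # Subtracting the largest payoff leaves all ratios unchanged and avoids overflow.
    weights = [math.exp((x - top) / delta) for x in payoffs]
    total = sum(weights)
    return [w / total for w in weights]

probs = logit_probabilities([0, 2, 8], 2.0)
nearly_uniform = logit_probabilities([0, 2, 8], 1e6)
```

For a very large δ the three probabilities are close to 1/3 each, matching the limit stated above.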
(b) Choice probabilities P A (i ) for all alternatives i ∈ A satisfy the linear probability model with parameter
δ > 0 if the following holds:
if P A (i ) > 0, then P A (i ) − P A ( j ) ≤ δ(π(i ) − π( j )) for all j ∈ A. (65)
Since choice probabilities are weakly increasing in payoffs and π(3) > π(2) > π(1), there are three cases
to consider:
– Case 1: P A (i ) > 0 for all i ∈ A.
– Case 2: P A (3), P A (2) > 0, P A (1) = 0.
– Case 3: P A (3) > 0, P A (2) = P A (1) = 0, or equivalently, P A (3) = 1.
Using (65), the first case requires:
Using (65), the third case requires:
So choice probabilities P A (1) = P A (2) = 0, P A (3) = 1 satisfy the linear probability model as long as
δ ≥ 1/6.
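The linear probability model does not satisfy path independence for every $\delta > 0$. In particular, we will
show that for a specific value of $\delta > 0$, $P_A(1) \ne P_A(\{1, 2\}) P_{\{1,2\}}(1)$. This means that we have to consider
choice probabilities in the smaller problem with only alternatives 1 and 2. Let us assume that both
$P_{\{1,2\}}(1)$ and $P_{\{1,2\}}(2)$ are positive. This requires that
\[ P_{\{1,2\}}(2) - P_{\{1,2\}}(1) = \delta (\pi(2) - \pi(1)) = 2\delta, \]
so, together with $P_{\{1,2\}}(1) + P_{\{1,2\}}(2) = 1$,
\[ P_{\{1,2\}}(1) = \frac{1 - 2\delta}{2}, \qquad P_{\{1,2\}}(2) = \frac{1 + 2\delta}{2}. \]
These choice probabilities satisfy the linear probability model as long as $\delta \in (0, 1/2)$. Now let us choose
$\delta = 1/20$. Then
\[ P_A(1) = \frac{1 - 10\delta}{3} = \frac{1}{6} \]
but
\[
\begin{aligned}
P_A(\{1, 2\}) P_{\{1,2\}}(1) &= \left( \frac{1 - 10\delta}{3} + \frac{1 - 4\delta}{3} \right) \frac{1 - 2\delta}{2} \\
&= \frac{2 - 14\delta}{3} \cdot \frac{1 - 2\delta}{2} \\
&= \frac{(1 - 7\delta)(1 - 2\delta)}{3} \\
&= \frac{39}{200} \\
&\ne \frac{1}{6}.
\end{aligned}
\]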
As δ → ∞, it follows from our earlier analysis that Case 3 is the only feasible one: the decision maker
rationally chooses alternative 3 with probability one.
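The arithmetic behind the failure of path independence at δ = 1/20 can be verified exactly. The sketch takes the Case 1 probabilities $(1 - 10\delta)/3$ and $(1 - 4\delta)/3$ and the two-alternative probabilities $(1 \mp 2\delta)/2$ as given; the variable names are illustrative, and the third probability is obtained from adding up to one.

```python
from fractions import Fraction

delta = Fraction(1, 20)

# Full-support probabilities on A = {1, 2, 3}.
p_A = {1: (1 - 10 * delta) / 3, 2: (1 - 4 * delta) / 3}
p_A[3] = 1 - p_A[1] - p_A[2]

# Interior probabilities on the two-alternative menu {1, 2}.
p_12 = {1: (1 - 2 * delta) / 2, 2: (1 + 2 * delta) / 2}

direct = p_A[1]                            # choose 1 from A in one step
two_stage = (p_A[1] + p_A[2]) * p_12[1]    # first pick {1, 2}, then 1
```

The one-step probability is 1/6, whereas the two-stage product is 39/200, so the two routes disagree.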
11.4 Suppose $p_i < p_j$. Exchange the probabilities assigned to the $i$-th and $j$-th alternative to obtain a vector $p'$. By
construction, $\sum_{i=1}^{n} p'_i \pi(i) > \sum_{i=1}^{n} p_i \pi(i)$, and by symmetry, the control cost term is unaffected, contradicting that
$p$ solves $P(\delta)$.