Lecture 3: Random Variables and Expectation

Rajat Mittal

IIT Kanpur

We have looked at the basics of probability (and conditional probability). For a random experiment, the three most important objects were the sample space, the σ-field and the probability distribution function. More often than not, the σ-field is the power set of the sample space and is not explicitly specified. From our definition, the sample space is just a set; its elements could be anything. They can be colors, persons, faces of a die, numbers and so on.
Many times, we are interested in a numerical value associated with the outcomes of the experiment (sample space). For example, the number of heads in a sequence of tosses, the payout of a lottery, the number of casualties after a hurricane, etc. These functions, which assign a numerical value to the outcomes of the experiment, are called random variables. In this lecture, we will look at random variables and their properties.

1 Random variable

Given the sample space Ω of an experiment, a random variable is a function X : Ω → R. That means, a
random variable assigns a real value X(ω) to every element ω of the sample space.

Note 1. In general, the range of a random variable X need not be R. It could be any other set with more structure (like the real numbers, which are ordered and can be added, multiplied, etc.). When the range is the real numbers, such random variables are called real-valued. In this course, we will mostly be interested in real-valued random variables, and we will simply call them random variables.
If the range of X is countable, then X is called a discrete random variable. We will initially focus on discrete random variables.
A random variable gives rise to events of the type X = x, which is associated with the subset X^{-1}(x) of the sample space Ω. Given a probability function P on Ω, it can be naturally extended to the probability of the random variable,
\[ P_X(x) := P(X = x) = \sum_{\omega : X(\omega) = x} P(\omega). \]

This is called the probability mass function of a random variable. Let’s look at some examples of random
variables and their probability mass function.

– Suppose you toss a fair coin 10 times. The sample space Ω is the set of all sequences of length 10 made up of H and T. Define a random variable X : Ω → R to be the number of heads in the sequence. That is, X(ω) is the number of H's in ω.

Exercise 1. Show that the probability of getting a sequence with k heads in a length-10 sequence is
\[ \binom{10}{k} \left(\frac{1}{2}\right)^{k} \left(\frac{1}{2}\right)^{10-k}. \]

The probability mass function of the random variable, for k between 0 and 10, becomes
\[ P(X = k) = \left(\frac{1}{2}\right)^{10} \binom{10}{k}. \]

Exercise 2. Generalize the probability mass function if the probability of obtaining head is p.
– For an experiment, we ask the birthdays of students in a class one by one (picking the next student randomly). We stop as soon as we find two people with matching birthdays. What is the probability mass function of the random variable X which counts the number of students queried? What is the probability that we queried k people?
If we had to query k people, then the first k − 1 birthdays are distinct and the last one matches at least one of the first k − 1. The first birthday does not match any earlier one with probability 1. The second person's birthday does not match the first with probability 364/365, and so on. The birthday of the last person needs to match, which happens with probability (k − 1)/365.
Hence,
\[ \Pr(X = k) = \binom{365}{k-1}\,(k-1)!\,\frac{k-1}{365^{k}}. \]

Exercise 3. Calculate these numbers using a calculator for k up to 23.

– In a set of 1000 balls, 150 balls have some defect. Say we choose 50 balls and inspect them; let X be the random variable which denotes the number of defective balls found.
The probability mass function of X is non-zero for k = 0 to 50. It is given by
\[ \Pr(X = k) = \frac{\binom{150}{k}\binom{850}{50-k}}{\binom{1000}{50}}. \]

Exercise 4. What if we take out the balls one by one and return them to the set before the next pick.
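As a quick numerical sanity check of the three probability mass functions above, here is a minimal Python sketch (the function names are my own, not part of the notes); each PMF should sum to 1 over its support.

```python
from math import comb

def coin_pmf(k, n=10, p=0.5):
    # P(X = k): number of heads in n independent tosses with head probability p
    return comb(n, k) * p**k * (1 - p)**(n - k)

def birthday_pmf(k, days=365):
    # P(X = k): first k - 1 birthdays are distinct, the k-th matches one of them
    prob_distinct = 1.0
    for i in range(k - 1):
        prob_distinct *= (days - i) / days
    return prob_distinct * (k - 1) / days

def defective_pmf(k, defective=150, total=1000, drawn=50):
    # P(X = k): k defective balls in a sample of `drawn` balls, without replacement
    return comb(defective, k) * comb(total - defective, drawn - k) / comb(total, drawn)

print(coin_pmf(5))                                    # ~0.246
print(sum(coin_pmf(k) for k in range(11)))            # 1.0
print(sum(birthday_pmf(k) for k in range(2, 367)))    # ~1.0
print(sum(defective_pmf(k) for k in range(51)))       # ~1.0
```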

Functions of random variables: We can have many random variables for a random experiment. To start with,
we can take any function g : R → R and a random variable X to create another random variable Y = g(X)
on the same sample space.

Exercise 5. Notice that Y can be thought of as a function from Ω to R, how? What will be the range of Y
in terms of range of X?

It is not difficult to come up with an expression for the probability mass function (PMF) of the new
random variable Y .
\[ P_Y(y) = P(Y = y) = P(g(X) = y) = \sum_{x : g(x) = y} P(X = x) = \sum_{x : g(x) = y} P_X(x). \qquad (1) \]

Let us take an example, suppose X is the random variable which counts the number of heads in a sequence
of 10 coin tosses (where probability of head is p), remember
 
\[ P(X = k) = \binom{10}{k} p^{k} (1-p)^{10-k}. \]

Define Y = X², then the PMF of Y would be
\[ P(Y = k) = \binom{10}{\sqrt{k}}\, p^{\sqrt{k}}\, (1-p)^{10-\sqrt{k}}, \]
if 0 ≤ k ≤ 100 is a perfect square, and 0 otherwise.
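To make equation (1) concrete, here is a small sketch (function names are my own) that builds the PMF of Y = g(X) from the PMF of X by summing P_X(x) over all x with g(x) = y.

```python
from math import comb
from collections import defaultdict

def binomial_pmf(n, p):
    return {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

def pmf_of_function(pmf_x, g):
    # Equation (1): P_Y(y) = sum of P_X(x) over all x with g(x) = y
    pmf_y = defaultdict(float)
    for x, px in pmf_x.items():
        pmf_y[g(x)] += px
    return dict(pmf_y)

pmf_x = binomial_pmf(10, 0.3)
pmf_y = pmf_of_function(pmf_x, lambda x: x**2)   # Y = X^2
print(pmf_y[9] == pmf_x[3])                      # True: y = 9 is hit only by x = 3
```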

On a similar note, we can have functions of multiple random variables, like X + Y, XY, etc. Let us consider X + Y, where X, Y are two random variables on the same sample space. What would be the PMF of X + Y, given the PMFs of X and Y? Intuitively,
\[ P(X + Y = k) = \sum_{x, y : x + y = k} P(X = x, Y = y). \]

What should P(X = x, Y = y) be in terms of P(X = x) and P(Y = y)? To illustrate the difficulty, consider rolling a die twice; let X be the outcome of the first roll and Y the outcome of the second roll. You would guess that P(X = x, Y = y) = P(X = x)P(Y = y). That is fine for this case, but what if X = Y? Would you still want to say P(X = x, Y = y) = P(X = x)P(Y = y)? Figuring out P(X = x, Y = y) in terms of P(X = x) and P(Y = y) is a deep question, and we will spend a lot of time in the next lecture discussing it (conditional probability and independence).
For now, let us skip this issue and just define joint probability mass function. The joint probability mass
function of X, Y is defined to be
\[ P_{X,Y}(x, y) := P(X = x \text{ and } Y = y). \]
Notice that a valid joint distribution will satisfy \sum_{x,y} P(X = x, Y = y) = 1. If the individual PMFs are given, then P(X = x) = \sum_y P(X = x, Y = y), and a similar equation for P(Y = y), should also be satisfied. Conversely, if only the joint PMF is given, these equations give us the individual PMFs.
If we want to compute the PMF of X + Y (or any function of X, Y ), we need the joint probability mass
function of X, Y .
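As a tiny worked instance (the joint PMF here is my own choice: two independent fair dice), the following sketch computes the PMF of X + Y from a joint PMF.

```python
from collections import defaultdict
from fractions import Fraction

# Joint PMF assumed for illustration: two independent fair dice
joint = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}

def pmf_of_sum(joint_pmf):
    # P(X + Y = k) = sum of P(X = x, Y = y) over all pairs with x + y = k
    pmf = defaultdict(Fraction)
    for (x, y), p in joint_pmf.items():
        pmf[x + y] += p
    return dict(pmf)

print(pmf_of_sum(joint)[7])   # 1/6, the most likely sum
```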

2 Expectation
Remember that random variables were introduced because of our interest in numerical values associated
with the set of outcomes. These real values allow us to talk about their cumulative behavior (like average
and deviation). For example, if you get 1 Rs. for a head and 2 Rs. for a tail, you might want to estimate your earnings in a sequence of 10 tosses.
The most basic cumulative quantity is called the expectation of a random variable. It is easy to define if
all outcomes are equally likely (known as average). In this case, if Ω = {ω1 , · · · , ωn }, we would expect to
get the average
\[ \frac{X(\omega_1) + X(\omega_2) + \cdots + X(\omega_n)}{n}. \]
Taking this idea further, the expected value of a random variable X is defined as
\[ E[X] := \sum_{x \in \mathbb{R}} \Pr(X = x)\, x. \]
When random variables are discrete, the range of X might be much smaller than \mathbb{R} (by the range of X, we mean the values attained by X with non-zero probability). Letting R denote the range of X, simply
\[ E[X] := \sum_{x \in R} \Pr(X = x)\, x. \]

It is a common misinterpretation, probably because of the name, that X will attain the value E[X] with high probability. It is easy to construct cases where E[X] is not even in the range R.
Exercise 6. Construct a random variable such that E[X] is not in its range R.
The correct intuition is: if we independently repeat the experiment multiple times, then with high prob-
ability the average outcome will be close to expectation. This intuition will be formalized in later parts of
the course.
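As a minimal simulation sketch of this intuition (my own illustration, not from the notes): take X equal to −2 or 2 with equal probability, so E[X] = 0 even though X never equals 0; the empirical average of many independent repetitions still drifts towards 0.

```python
import random

def sample_x():
    # X takes values -2 and 2 with equal probability, so E[X] = 0
    return random.choice([-2, 2])

for n in (10, 1000, 100000):
    avg = sum(sample_x() for _ in range(n)) / n
    print(n, avg)   # the average approaches 0 as n grows
```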
To start, solve a simple problem on expectation.

Exercise 7. In a probabilistic experiment, you get 100 Rs. every time an odd number shows up on a die. You lose 100 Rs. every time an even number shows up. What is your expected earning?
How about some more examples of expectation?
– Your friend is ready to give you 100 Rs. if, on a throw of a die, an odd prime turns up. What amount can you agree to give him if the number is not an odd prime?

Exercise 8. What is the random variable in this case?

You cannot be certain of winning this game for any positive amount you bet. We can call the bet profitable if your expected profit is greater than zero (that is, on average you come out ahead when the experiment is repeated many times).
If you bet x Rs., then the expected earning should be greater than zero: 1/3 × 100 + 2/3 × (−x) > 0. So you can agree to pay any amount less than 50. This example suggests that in a fair bet, the expected profit/loss should be zero.

Exercise 9. Suppose the expected value of a random variable X is zero. What is the expected value of
−X? What is the expected value of αX for a constant α?

– You toss a coin till you get a head (a biased coin with head probability p). What is the expected number of tosses? Clearly, the random variable is the number of tosses.

Exercise 10. What was the sample space though? Show that \Pr(X = k) = (1-p)^{k-1}\, p.

The expected number of tosses is
\[ E[X] = \sum_k k\,(1-p)^{k-1}\, p. \]
Let S = \sum_k k\,(1-p)^{k-1}\, p; then (1-p)S = \sum_k k\,(1-p)^{k}\, p. Subtracting these two equations (after shifting the index in the second sum) gives pS = \sum_k (1-p)^{k-1}\, p = 1, so S = E[X] = 1/p.
– Your friend asks you to bet on the rise/fall of the stock market. Both of you put 100 Rs. in a pot and guess. If the guesses are the same then both get 100 Rs. back; otherwise the one with the correct guess gets all the money. Is this bet fair, assuming that neither of you has any clue about the share market?
Suddenly, your friend wants to include her brother's share too. She proposes to put 200 Rs. in the pot and will guess for both her and her brother. Should you take the bet?
Suppose your friend always puts opposite guesses for herself and her brother. If your guess is correct then you will get 150 Rs., and if your guess is incorrect then you get 0 Rs. The expected value of your pot earning is 75 Rs., which is less than the amount you put in the pot. So it is not an advisable bet.
One point of caution is that the expectation need not always be defined. Let X be a random variable such that \Pr(X = k) = \frac{6}{\pi^2 k^2} for k = 1, 2, \dots. It can be shown that \sum_k \Pr(X = k) = 1, but \sum_k k\, \Pr(X = k) is not convergent.
We saw that if X is a random variable then so is Y = g(X) where g is any function from R to R. Then,
\[ E[Y] = E[g(X)] = \sum_{x \in R} g(x)\, \Pr(X = x). \]
Note 2. We need to assume that \sum_x |g(x)|\, \Pr(X = x) converges.

2.1 Linearity of expectation


One of the most important properties of expectation is that it is linear. This might seem like a straightforward property of expectation, but its applications are huge, and hence it deserves a separate section.
Given two random variables X and Y on the same sample space Ω,

E[X + Y ] = E[X] + E[Y ].

Note 3. Here, say X : Ω → R and Y : Ω → R; then the random variable X + Y : Ω → R is defined as (X + Y)(ω) = X(ω) + Y(ω) for all ω ∈ Ω.
This property is known as linearity of expectation. To get some intuition, think about averages. If you know the average marks of students in each subject, can you find the average total marks (sum of marks over subjects) over all students?
Proof. The expectation E[X + Y] is computed using the joint probability mass function \Pr(X = x, Y = y):
\[
\begin{aligned}
E[X + Y] &= \sum_{x,y} (x + y)\, \Pr(X = x, Y = y) \\
&= \sum_{x,y} x\, \Pr(X = x, Y = y) + \sum_{x,y} y\, \Pr(X = x, Y = y) \\
&= \sum_x x \sum_y \Pr(X = x, Y = y) + \sum_y y \sum_x \Pr(X = x, Y = y) \qquad (2) \\
&= \sum_x x\, \Pr(X = x) + \sum_y y\, \Pr(Y = y) \\
&= E[X] + E[Y].
\end{aligned}
\]

The linearity of expectation can be extended to more than two random variables using induction.
Exercise 11. What will be the statement?
The property seems almost obvious, and even the proof is pretty straightforward. In spite of that, linearity of expectation is one of the most important tools and is used very frequently. One of the reasons is that nothing is assumed about the relationship between X and Y. The random variables need not depend on each other in any way; linearity of expectation still holds true. In contrast, E[XY] = E[X]E[Y] is not true in general; more will be said about it in future lectures. You will see applications of linearity of expectation throughout this course; below we look at some examples.
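As a small sketch (with names of my own choosing) of this last point, take one fair die roll, let X be the face value and Y = X, so X and Y are as dependent as possible; E[X + Y] still equals E[X] + E[Y], while E[XY] differs from E[X]E[Y].

```python
from fractions import Fraction

omega = range(1, 7)                        # sample space: one fair die roll
prob = {w: Fraction(1, 6) for w in omega}

X = {w: w for w in omega}                  # X(w) = face value
Y = X                                      # Y = X: fully dependent on X

def expectation(rv):
    return sum(prob[w] * rv[w] for w in omega)

lhs = sum(prob[w] * (X[w] + Y[w]) for w in omega)    # E[X + Y]
rhs = expectation(X) + expectation(Y)                # E[X] + E[Y]
exy = sum(prob[w] * X[w] * Y[w] for w in omega)      # E[XY]
print(lhs, rhs)                                      # both 7
print(exy, expectation(X) * expectation(Y))          # 91/6 vs 49/4
```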

A simple example: Remember the n envelope and letter problem. We have n letters and n envelopes,
numbered from 1 to n. If we assign the letters to envelopes randomly, how many of the letters do we expect
to get into the correct envelope?
Let C be the random variable which counts the number of letters that go into their correct envelopes. The expected value of C is
\[ E[C] = \sum_{0 \le k \le n} k\, P(C = k). \]

We have calculated P (C = k) before (inclusion-exclusion), it is a mess and it will be a further mess to put
it in the summation. Linearity of expectation gives us a way out. Suppose Ci denotes the random variable
which is 1 when i-th letter goes to the correct envelope and 0 otherwise.
Note 4. Such random variables are called indicator random variables which take value 0 or 1. They indicate
if a certain event happens or not.
Exercise 12. What is C in terms of Ci ’s?
Some thought shows that C can be expressed as summation of Ci ’s. The expectation of C can be computed
easily now.
\[
\begin{aligned}
E[C] &= E[C_1 + C_2 + \cdots + C_n] \\
&= E[C_1] + E[C_2] + \cdots + E[C_n] \qquad (3) \\
&= n\, E[C_1] \\
&= 1.
\end{aligned}
\]
For the last two equations, you need the following exercise.

Exercise 13. Show that E[Ci ] = P [i-th letter goes in correct envelope] = 1/n.
It is not just that the calculation is simpler; the same calculation goes through with much weaker assumptions on the letter-to-envelope problem. We only need each letter to go to its correct envelope with probability 1/n; it does not matter whether all permutations are possible or equally likely.
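A brief check of E[C] = 1 (my own sketch): brute force over all permutations for a small n, and a simulation for a larger n.

```python
import random
from math import factorial
from itertools import permutations

def exact_expected_correct(n):
    # Average number of fixed points over all n! permutations
    total = sum(sum(1 for i, v in enumerate(p) if i == v) for p in permutations(range(n)))
    return total / factorial(n)

def simulated_expected_correct(n, trials=100000):
    total = 0
    for _ in range(trials):
        p = list(range(n))
        random.shuffle(p)
        total += sum(1 for i, v in enumerate(p) if i == v)
    return total / trials

print(exact_expected_correct(5))        # exactly 1.0
print(simulated_expected_correct(50))   # close to 1.0
```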

Another example: Suppose you want to collect stickers which accompany your favorite chewing-gum. There
are n different stickers, numbered 1 to n. Every time you buy a chewing-gum, one of the n stickers comes out, each with equal probability. What is the expected number of chewing-gums you need to buy to collect all the stickers?
Let T be the random variable which counts the number of packets to be bought to collect all the different
n stickers. We will define T as sum of random variables, these random variables need to be defined carefully.
Let S1 be the number of chewing-gums needed to get the first distinct sticker (clearly S1 = 1). Let S2 be the extra number of chewing-gums needed to get the second distinct sticker, and similarly define Sk. Intuitively, you might want to define S1 as the number of purchases needed to collect sticker number 1, and so on; you will show in the assignment that this strategy does not work. So, Si is defined as the number of purchases needed to get the i-th distinct sticker (its number could be anything) after we have got i − 1 distinct stickers.
We need to calculate E[T ] = E[S1 + S2 + · · · + Sn ]. By linearity of expectation, we only need to worry
about E[Sk ]. The probability that Sk = r is,
 
\[ \Pr(S_k = r) = \left(\frac{k-1}{n}\right)^{r-1} \left(1 - \frac{k-1}{n}\right). \]
Exercise 14. Show that E[S_k] = \frac{n}{n-(k-1)}.

This implies that the expected number of purchases needed to collect all stickers is E[T] = \sum_{k=1}^{n} \frac{n}{n-(k-1)} = n\left(1 + \frac{1}{2} + \cdots + \frac{1}{n}\right). Since 1 + 1/2 + \cdots + 1/n is approximately \ln n, E[T] is around n \ln n.
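A quick numerical sketch (illustrative values of n) comparing the exact sum with the n ln n approximation:

```python
from math import log

def expected_purchases(n):
    # E[T] = sum over k of n / (n - (k - 1)) = n * (1 + 1/2 + ... + 1/n)
    return sum(n / (n - (k - 1)) for k in range(1, n + 1))

for n in (10, 100, 1000):
    print(n, round(expected_purchases(n), 1), round(n * log(n), 1))
# e.g. n = 100: about 518.7 exactly, versus about 460.5 for n ln n
```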

2.2 Conditional expectation

The scope of a random variable X could be restricted to an event B, denoted by X|B . This means that we
can define conditional distribution of X|B .

Exercise 15. What should be PX|B (x)?

The conditional probability distribution P (X = x|B) should give the probability of X = x given that
event B has happened. Intuitively,

\[ P_{X|B}(x) = P(X = x \mid B) = \frac{P(B \cap X^{-1}(x))}{P(B)}. \]

We can also talk about conditional expectation, analogous to the ordinary expectation of a random variable.

Exercise 16. What should it mean? How should you define it?

It seems that the conditional expectation E[X|A] should be the average value of the random variable X given that event A has happened. Our first guess would be E[X|A] = \sum_{\omega \in A} P(\omega) X(\omega). The only thing to notice is that it should be P(\omega|A) and not just P(\omega). This gives us the correct formula,
\[ E[X \mid A] = \sum_{\omega \in A} P(\omega \mid A)\, X(\omega) = \frac{1}{P(A)} \sum_{\omega \in A} P(\omega)\, X(\omega) = \sum_{x \in \mathrm{Range}(X)} x\, P(X = x \mid A). \]

Exercise 17. Why did we get the second equality?

We can also generalize the partition formula: given disjoint events B_1, B_2, \dots, B_k which cover the sample space,
\[ E[X] = \sum_{i=1}^{k} P(B_i)\, E[X \mid B_i]. \]

Let us look at a problem. Suppose a miner, stuck in a mine, has to choose one door out of three. One door takes him back to where he is, but wastes 7 hours of his time. The second door takes him out in 3 hours. The last one again gets him back to the starting point, but takes 5 hours. If he chooses a door uniformly at random, what is the expected time in which he will get out of the mine?
Let time taken to get out be the random variable X. He chooses door D1 , D2 , D3 (use the same names
for corresponding events) with equal probability. Then,
\[ E[X] = \sum_{i=1}^{3} P(D_i)\, E[X \mid D_i]. \]

We know that P (Di ) = 1/3 and E[X|D2 ] = 3.

Exercise 18. What is E[X|D1 ] and E[X|D3 ]?

Substituting the values in the equation,
\[ 3E[X] = 7 + E[X] + 3 + 5 + E[X]. \]
Solving, E[X] = 15: we expect the miner to get out in 15 hours.
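A quick simulation sketch (my own, not part of the notes) of the miner's escape time; the empirical average should hover around the 15 hours computed above.

```python
import random

def escape_time():
    # Doors: 7 hours back to the start, 3 hours out, 5 hours back to the start
    total = 0
    while True:
        door = random.choice([1, 2, 3])
        if door == 1:
            total += 7
        elif door == 2:
            return total + 3
        else:
            total += 5

trials = 100000
print(sum(escape_time() for _ in range(trials)) / trials)   # approximately 15
```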


The concept of conditional probability allows us to have an expression for joint probability distribution
function too. Remember that the joint distribution for two random variables X, Y was defined as pX,Y (x, y) =
P (X = x, Y = y). The right hand side is the probability that both events, X = x and Y = y, happen. Using
conditional probability,

P (X = x, Y = y) = P (X = x|Y = y)P (Y = y).

Exercise 19. Extend this expression to calculate P (X1 = x1 , X2 = x2 , · · · , Xn = xn ).

It seems like the joint distribution is easy to calculate if X, Y are independent. What should it mean for
two random variables to be independent?
Remember, a random variable X is a mapping from sample space to real numbers, X : Ω → R. Intuitively,
all events of kind X = x and Y = y should become independent.
Formally, two random variables X and Y are independent iff,

P (X = x ∩ Y = y) = P (X = x)P (Y = y) ∀x, y.

Here x, y are in the range of X, Y respectively.


When X, Y are independent, P(X = x | Y = y) = P(X = x) and we get P(X = x, Y = y) = P(X = x)P(Y = y). Otherwise, we need the joint distribution explicitly, or the relation between X and Y.

2.3 Variance
We have already seen one aggregate measure of a random variable, expectation. Clearly, expectation captures
only a very small amount of information about the random variable. As emphasized before, expectation does not imply that we get E[X] with high probability. In other words, a random variable which always takes the value 0 has the same expectation as one which takes the values −2 and 2 with equal probability (or even one which takes the values 1000 and −1000 with equal probability). Expectation does not distinguish between these cases.

This gives rise to another measure of interest, called the variance of a random variable. The idea is to measure how far X can be from E[X].
Your first guess might be E[X − E[X]], but that is always zero; we need to take care of the sign of the random variable X − E[X], so we square it. The variance of a random variable X is defined as
\[ \mathrm{Var}[X] := E[(X - E[X])^2]. \]

Exercise 20. Show that Var[X] = E[X^2] − (E[X])^2.

You can easily calculate the variance of a random variable taking the values α and −α with equal probability: it is α^2, which increases rapidly as α increases, even though the expectation remains the same.
A closely related quantity, the standard deviation, is defined as the square root of the variance,
\[ \sigma(X) := \sqrt{\mathrm{Var}[X]} = \sqrt{E[X^2] - (E[X])^2}. \]

Define a random variable X to be 0 for tails and 1 for head, where coin gets head with probability p.
This is called a Bernoulli random variable with parameter p.

Exercise 21. What is the expectation of this random variable? What is the expectation of X 2 ?

From this exercise, it is easy to find the standard deviation of X,
\[ \sigma(X) = \sqrt{p - p^2} = \sqrt{p(1-p)}. \]

Exercise 22. For what value of p is this maximum? Does it agree with your intuition?

3 Some important random variables and their distribution

Some well-known random variables and their distributions occur quite frequently in practice (many times in disguised form). We look at them below and, to get more experience, also calculate their expectations.

Bernoulli random variable This is one of the simplest random variables; you have already seen it. Think of a biased coin which lands heads with probability p (sometimes called the success probability). The Bernoulli random variable X is obtained by assigning 1 to heads and 0 to tails.

Exercise 23. What is the sample space Ω? Write the random variable and its probability mass functions
explicitly (as a function).

Now, the expectation can be calculated,

E[X] = 1 · P (X = 1) + 0 · P (X = 0) = p.

We calculated its variance in the previous section.

Binomial random variable The coin from the previous example can be tossed multiple times. Define X
to be the random variable which counts the number of heads when the coin is tossed n times. Again, we can
write the probability mass function,
\[ P(X = k) = \binom{n}{k} p^{k} (1-p)^{n-k}. \]

Note 5. This expression works even for k larger than n, since the binomial coefficient (and hence the probability) is then zero.

Exercise 24. Calculate the expected value of X.

You probably applied the formula and, using your knowledge of binomial coefficients, found the expectation to be np. How about a totally different way? We can express the binomial random variable X as the sum of n Bernoulli random variables X_1, X_2, \dots, X_n. So,
\[ E[X] = E[X_1 + X_2 + \cdots + X_n] = \sum_{i=1}^{n} E[X_i] = np. \]

We see that linearity of expectation makes it easier to calculate the expectation. The variance can be calculated similarly by realizing that X_1, X_2, \dots are independent of each other. We will see the formal property of variance for independent terms later. For now, you will calculate the variance directly (assignment question).
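A brief numeric check (my own sketch, with illustrative parameters) that the direct sum \sum_k k\, P(X = k) agrees with the linearity shortcut np:

```python
from math import comb

def binomial_expectation_direct(n, p):
    # Direct computation: sum of k * P(X = k) over k = 0, ..., n
    return sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

n, p = 10, 0.3
print(binomial_expectation_direct(n, p), n * p)   # both equal 3.0 (up to floating point)
```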

Geometric random variable Again, we have a biased coin with probability p of getting head. Suppose,
we toss the coin till we get a head. Geometric random variable, X, counts the number of trials for such an
experiment.

Exercise 25. What is the sample space and probability distribution function for such an experiment?

With some thought,
\[ P(X = k) = (1-p)^{k-1}\, p. \]
This allows us to calculate the expectation,
\[ E[X] = \sum_{k=1}^{\infty} k\, p\, (1-p)^{k-1} = p \sum_{k=1}^{\infty} k\, (1-p)^{k-1} = \frac{1}{p}. \]

Exercise 26. How do you derive the last equality?

You can calculate the variance of the geometric random variable by computing E[X(X − 1)]; it comes out to be (1 − p)/p^2.
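A short numeric sketch (my own, truncating the infinite series) to check E[X] ≈ 1/p and Var[X] ≈ (1 − p)/p^2:

```python
p = 0.25
pmf = {k: (1 - p)**(k - 1) * p for k in range(1, 2000)}   # truncated support

mean = sum(k * q for k, q in pmf.items())
second_moment = sum(k * k * q for k, q in pmf.items())
var = second_moment - mean**2

print(mean, 1 / p)            # ~4.0 vs 4.0
print(var, (1 - p) / p**2)    # ~12.0 vs 12.0
```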
We can also stop the experiment only after getting n heads. The random variable which counts the
number of trials in this case, is called a negative binomial random variable.

Exercise 27. Calculate the expected value of a negative binomial random variable.

3.1 Continuous random variables


We can also define a continuous random variable, when the sample space is uncountable. For instance, Ω = R
could be the sample space and then the random variable will be a function X : Ω → R. What would be the
probability mass function? We discussed that P (X = r) for any real r should be zero.
In the case of continuous random variables, we define probabilities of intervals. Specifically, the events of interest are X ≤ r := {ω ∈ Ω : X(ω) ≤ r}. For this definition to make sense, the event X ≤ r should be in the σ-field. This comes from conditions imposed through measure theory, and that discussion is beyond the scope of this course. For us, we will assume that the events X ≤ r are part of our σ-field.
Now we can define the probability distribution function to be F(x) := P(X ≤ x) in general. Throughout this course we assume that F is continuous and that there exists a non-negative function f : R → R such that
\[ F(a) = \int_{-\infty}^{a} f(x)\, dx \quad \text{and in particular} \quad \int_{-\infty}^{\infty} f(x)\, dx = 1. \]

The function f is called the probability density function (pdf) of X and is the analog of the probability mass function of a discrete random variable. It is not correct to say that f(x) is the probability of getting x (we know that to be zero); instead, intuitively, we can think of f(x) dx as the probability of the interval (x, x + dx). Integrating the pdf of X gives the probability of an interval,
\[ P(a \le X \le b) = \int_{a}^{b} f(x)\, dx. \]

The expectation of X can be defined by
\[ E[X] = \int_{-\infty}^{\infty} x\, f(x)\, dx. \]

Like before, the variance is defined as in the discrete case, Var[X] = E[(X − E[X])^2]. Let us see some examples.

Exponential random variable An exponential random variable with parameter λ is defined by the pdf,

\[ f(x) = \lambda e^{-\lambda x} \quad \text{for } x \ge 0. \]

We can calculate the expectation,
\[ E[X] = \int_{0}^{\infty} x\, \lambda e^{-\lambda x}\, dx = \frac{1}{\lambda}. \]
By calculating E[X^2], we get that Var[X] = 1/\lambda^2.
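A quick Monte Carlo sketch (my own) to check E[X] = 1/λ and Var[X] = 1/λ^2 for the exponential distribution:

```python
import random

lam = 2.0
n = 200000
samples = [random.expovariate(lam) for _ in range(n)]

mean = sum(samples) / n
var = sum((s - mean) ** 2 for s in samples) / n

print(mean, 1 / lam)       # ~0.5
print(var, 1 / lam**2)     # ~0.25
```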

Normal/Gaussian random variable Consider a Bernoulli random variable with success probability p. We call a draw of n values from this distribution a sample; in other words, our sample size is n. If we get X_1, X_2, \dots, X_n as draws, say that (X_1 + X_2 + \cdots + X_n)/n is the value of the sample.

Exercise 28. If we draw n such samples, how will the value of these samples be distributed?

This is nothing but the binomial random variable divided by n (for n tosses); we know that its expectation is p. Let us plot this distribution for increasing values of n.
You see that the shape looks more and more like a bell. It can be formally shown that the distribution
(after some shifting and scaling) approaches the continuous random variable,
\[ f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}. \]

Such a distribution is called a standard normal or Gaussian distribution. Surprisingly, you can start with any distribution of your choice: if you look at the distribution of the mean of a sample of size n, it looks like a normal random variable (after appropriate shifting and scaling) as n tends to infinity. This is known as the central limit theorem; I strongly urge you to read more about it.
Another way to state the central limit theorem is: we might not know the distribution coming from nature, but if we look at the aggregate behavior (the mean of the draws), it looks like a normal distribution. Hence, this distribution is hugely popular in statistics.

Exercise 29. What is the expected value of normal random variable? Can you see the role of symmetry?

By a more difficult integration, you can show that E[X 2 ] = 1, where X is standard normal random
variable. This shows that the variance (and standard deviation) for a standard normal random variable is 1.
We can shift and scale the Gaussian random variable (changing the expectation and standard deviation); the Gaussian distribution with expectation µ and standard deviation σ is defined by the pdf
\[ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}. \]
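A small simulation sketch (my own, with illustrative parameters) of the effect described above: sample means of Bernoulli draws concentrate at p with spread \sqrt{p(1-p)/n}, and their histogram is bell-shaped.

```python
import random
from statistics import mean, stdev

def sample_mean(n, p):
    # Mean of n Bernoulli(p) draws
    return sum(1 if random.random() < p else 0 for _ in range(n)) / n

p, n, reps = 0.3, 400, 20000
means = [sample_mean(n, p) for _ in range(reps)]

print(mean(means), p)                            # centered near p
print(stdev(means), (p * (1 - p) / n) ** 0.5)    # spread near sqrt(p(1-p)/n)
```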

Bayesian parameter estimation This example is taken from Notes on Bayesian Learning by Padhraic Smyth
(University of California, Irvine) [1].
The problem of parameter estimation is: given that the data is supposed to follow a known distribution with parameter θ (for instance, the mean and variance of a normal distribution), and given that we observe some data, how should our belief about θ change? There can be multiple ways to formulate this problem concretely. In the case of Bayesian learning, we assume that θ itself has some known distribution (it is a random variable). Notice that the distribution of θ need not be the same as the distribution of the data (the prior distribution of the parameters, before observing data, is completely known). We need to find the new belief (distribution of θ) after observing the data.
To set up the notation, we will be given the prior distribution of the parameter θ, say p(θ). Then we observe some data D, and our task is to find the new distribution p(θ|D). You can guess that it is a standard application of Bayes' theorem:
\[ p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}. \]
p(θ) is called the prior distribution, and p(θ|D) is called the posterior distribution. Our task is to update the prior to get the posterior, given the data. The normalization factor p(D) can (most of the time) be calculated by observing that p(θ|D) is a probability distribution.
Let us take an example. Suppose our data D ∈ {0, 1}^n comes from a Bernoulli distribution with parameter θ, i.e., the probability of success is θ. The data can be interpreted as a string of successes and failures. How should we find the distribution of θ?

Exercise 30. What should be the prior p(θ)?

One natural choice is the uniform distribution over [0, 1]. We will take a slightly more general form; the motivation will become clear later. We assume that θ is distributed as a Beta distribution,
\[ p(\theta) = \frac{1}{B(\alpha, \beta)}\, \theta^{\alpha-1} (1-\theta)^{\beta-1}. \]

Here α, β are the parameters of Beta distribution and are known before we observe the data; B(α, β) is
the normalization constant.

Exercise 31. If we initially believed that θ was uniformly distributed in [0, 1], what should be α, β?

Now, it is not difficult to find the posterior distribution, but first notice that
\[ p(D \mid \theta) = \theta^{s} (1-\theta)^{n-s}, \]
where s is the number of successes in D. Then,
\[ p(\theta \mid D) \propto \theta^{s} (1-\theta)^{n-s}\, \frac{1}{B(\alpha, \beta)}\, \theta^{\alpha-1} (1-\theta)^{\beta-1}. \]
Exercise 32. Show that the posterior also has a Beta distribution with parameters α + s, β + n − s.
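A small sketch (with illustrative parameter values of my own) of this conjugate update: count the successes in the data and report the posterior Beta parameters and posterior mean.

```python
import random

def beta_bernoulli_update(data, alpha, beta):
    # Posterior of theta given Bernoulli data and a Beta(alpha, beta) prior:
    # Beta(alpha + s, beta + n - s), where s is the number of successes
    s, n = sum(data), len(data)
    return alpha + s, beta + (n - s)

random.seed(0)
true_theta = 0.7
data = [1 if random.random() < true_theta else 0 for _ in range(100)]

a, b = beta_bernoulli_update(data, alpha=1, beta=1)   # assuming a Beta(1, 1) prior
print(a, b, a / (a + b))   # the posterior mean should be close to 0.7
```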

4 Buffon’s needle problem

Let us consider another problem in a continuous domain. Suppose you are given a board with parallel horizontal lines drawn at a distance of 1 unit from each other. If we drop a needle of unit length on this board, what is the expected number of intersections of the needle with the lines on the board?
Again, what is the sample space, the random variable and its probability density function? To avoid worrying about boundary conditions, assume that the board is infinite. Clearly, the random variable is the number of intersections.

Fig. 1. Dropping a needle on an infinite board.

We can think of dropping the needle as choosing, uniformly, the distance of its mid-point from the nearest line (between 0 and 1/2) and, uniformly, its angle (between −π/2 and π/2).
In other words, we choose a point uniformly in the rectangle of area 1/2 × π = π/2. This defines our
sample space. What is the pdf? Since we are dropping the needle uniformly, define the pdf to be,
\[ f(y, \theta) = \frac{2}{\pi}, \qquad 0 \le y \le 1/2,\ -\pi/2 \le \theta \le \pi/2. \]
Exercise 33. Evaluate the double integral \int_{0}^{1/2} \int_{-\pi/2}^{\pi/2} f(y, \theta)\, d\theta\, dy and show that f is a valid pdf.

Notice that the needle intersects one line if y ≤ (1/2) cos θ. We can ignore the case when the mid-point is at a distance of exactly 1/2 (why?).
Calculating the expectation,
\[ E[\text{number of intersections}] = \int_{-\pi/2}^{\pi/2} \int_{0}^{\frac{1}{2}\cos\theta} f(y, \theta)\, dy\, d\theta. \]

This is an elementary integral and the value comes out to be 2/π. We can make the problem much more
difficult. What if the needle is not of unit length? How about the case when needle is circular or bent or of
shape W?
Seems like it is going to be a very difficult integration, even if we can express the expectation as an
integral. Here, we will see the power of linearity of expectation.
Divide the needle into very small parts. Clearly, the total number of intersections (say, random variable Y) can be expressed as the sum of the intersections of each individual part (say, random variable X_i). By linearity of expectation,
\[ E[Y] = \sum_i E[X_i]. \]
Dividing the needle into smaller and smaller parts, this shows that the expectation is proportional to the length of the needle,
\[ E[\text{number of intersections}] = cL \qquad (L \text{ is the length of the needle}). \]
It might take some time to convince yourself of the above statement; take your time. Linearity of expectation seemed natural and easy. Still, if someone said that the expected number of intersections of the needle does not depend on its shape, it would not sound believable. Remember that the needle could be circular, bent, or of any given shape.
We have one more job to finish: how to find the absolute constant c? If we know the expected number of intersections of any one needle and its length, it will give us the constant c.

Exercise 34. Can you think of a case where you know the expectation and length of the needle?

We already know one case: the expectation for a straight needle of length 1 is 2/π, giving c = 2/π. There is an easier case where we do not even need to evaluate the first integral: take a circular needle of diameter 1.

Exercise 35. What is the expected number of intersections in this case?

This gives us the final formula,
\[ E[\text{number of intersections}] = \frac{2}{\pi}\, L \qquad (L \text{ is the length of the needle}). \]
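A Monte Carlo sketch (my own) of the straight unit-needle case, estimating the expected number of intersections and hence 2/π:

```python
import random
from math import cos, pi

def crossings(trials=1000000, length=1.0):
    # Drop a needle of the given length (<= 1) on lines spaced 1 unit apart:
    # y is the mid-point's distance to the nearest line, theta the needle's angle
    total = 0
    for _ in range(trials):
        y = random.uniform(0.0, 0.5)
        theta = random.uniform(-pi / 2, pi / 2)
        if y <= (length / 2) * cos(theta):
            total += 1
    return total / trials

print(crossings(), 2 / pi)   # both around 0.6366 for a unit needle
```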
π

5 Assignment
Exercise 36. In a group of 23 people, we ask everyone's birthday. Define the random variable X to be the number of pairs whose birthdays match. What is Pr(X ≥ 1)?

Exercise 37. Suppose you pick two cards from a deck with cards numbered from 1 to 1000. What is the
expected value of the greater number?

Exercise 38. Suppose you roll a die twice; let X be the number on the first roll and Y the number on the second roll. Assuming P(X = x, Y = y) = P(X = x)P(Y = y), what is the k for which P(X + Y = k) is maximized?

Exercise 39. Let X be a random variable with P r(X = 1) = p and P r(X = −1) = 1 − p. Find E[X n ].

Exercise 40. For the sticker collection problem in the linearity of expectation section, let us look at a different solution. Say T_i is the random variable which counts the packets needed to collect the i-th sticker. Then E[T_i] = n, and E[T] = \sum_i E[T_i] = n^2. Is this argument correct? If not, what is wrong with it?

Exercise 41. Find the variance of binomial distribution directly by finding out E[X(X − 1)].

References
1. P. Smyth. Notes on Bayesian Learning, 2019.
