Random Variables
Kaustav Banerjee
IIM Lucknow
July 2021
K Banerjee (IIML) Random Variables July 2021 1 / 18
Random variable
Flip a coin three times: Ω = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}
If the coin is fair, each outcome has probability 1/8. If our
interest is in the number of heads, we define a random variable
X: number of heads in 3 flips of a coin ⇒ Ω_X = {0, 1, 2, 3}
x       P(X = x)
0       1/8
1       3/8
2       3/8
3       1/8
Total   1
X numerically describes the outcomes of the random experiment. An
experiment is random if its possible outcomes are known but the result
of any single run is unpredictable.
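The table above can be reproduced by listing the sample space and counting heads. The following is a minimal sketch in Python, assuming a fair coin so that all eight outcomes are equally likely; the variable names are illustrative and not from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of three flips: HHH, HHT, ..., TTT (8 equally likely outcomes).
outcomes = ["".join(flips) for flips in product("HT", repeat=3)]

# X maps each outcome to its number of heads.
head_counts = Counter(outcome.count("H") for outcome in outcomes)

# P(X = x) = (number of outcomes with x heads) / (total number of outcomes).
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(head_counts.items())}

for x, p in pmf.items():
    print(x, p)
# prints:
# 0 1/8
# 1 3/8
# 2 3/8
# 3 1/8
```

Exact fractions are used instead of floats so the probabilities match the table digit for digit.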
Discrete random variable
X : number of calls a student gets from ABCL
ΩX = {0, 1, 2, 3, 4}
Y : number of trials necessary to develop a vaccine for COVID
Ω_Y = {1, 2, 3, 4, ...}
A discrete random variable has either a finite number of values or
infinitely many values that can be arranged in a sequence: it can
only take countably many values.
Number of road accidents in a month in a city
Number of defective products in a batch of size 20
The story of a discrete random variable X is summarized in its
probability mass function P(X = x), where \( \sum_{x \in A} P(X = x) = 1 \)
Figure: probability mass function of the number of defectives (x-axis: defectives, 0 to 20; y-axis: probability, 0.00 to 0.20)
\( P(X = x),\ x = 0(1)20; \quad \sum_{x=0}^{20} P(X = x) = 1 \)
Continuous random variable
X : survival time of a patient following a heart attack ∈ (0, ∞)
Y : random number picked from the interval (0, 1)
A random variable representing some measurement on a
continuous scale is capable of assuming all values in an interval.
This is a continuous random variable: it takes values in a continuum.
Any measuring device has limited accuracy and, therefore, a
continuous scale must be interpreted as an abstraction.
Waiting time for a machine to break down after repair
Blood pressure of a student while writing CAT
The story of a continuous random variable X is summarized in its
probability density function f(x), where \( \int_{x \in A} f(x)\,dx = 1 \)
Figure: probability density function of a waiting time (x-axis: waiting time, 0 to 40)
\[ P(X \in A) = \int_A f(x)\,dx; \quad f(x) \ge 0\ \ \forall x \in \Omega \quad \text{and} \quad \int_\Omega f(x)\,dx = 1 \]
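The two properties of a density, total area 1 and P(X ∈ A) equal to the area over A, can be checked numerically. The sketch below assumes an exponential density with rate 0.1 purely for illustration; the slides do not say which distribution the waiting-time plot shows, and `RATE`, `density` and `integrate` are hypothetical names.

```python
import math

RATE = 0.1  # hypothetical rate (mean waiting time 10); not taken from the slides


def density(x: float) -> float:
    """Assumed exponential density f(x) = RATE * exp(-RATE * x) for x >= 0."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0


def integrate(f, lower: float, upper: float, steps: int = 100_000) -> float:
    """Approximate the integral of f over [lower, upper] with the midpoint rule."""
    width = (upper - lower) / steps
    return sum(f(lower + (i + 0.5) * width) for i in range(steps)) * width


# The upper limit 400 stands in for infinity; the tail beyond it is negligible.
total_area = integrate(density, 0.0, 400.0)

# P(5 < X < 15) as the area under f over A = (5, 15).
prob_5_to_15 = integrate(density, 5.0, 15.0)

# Closed form for this assumed density, used only as a cross-check.
exact_5_to_15 = math.exp(-RATE * 5.0) - math.exp(-RATE * 15.0)

print(round(total_area, 6), round(prob_5_to_15, 6), round(exact_5_to_15, 6))
```

Note that P(X = a) for any single point a is an integral over an interval of zero width, hence 0 for a continuous variable.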
Cumulative distribution function
\[ F(a) = P(X \le a) = \begin{cases} \sum_{x \le a} P(X = x), & \text{if } X \text{ is a discrete variable;} \\ \int_{x \le a} f(x)\,dx, & \text{if } X \text{ is a continuous variable.} \end{cases} \]
For a discrete variable, F(a) is a step function
For a continuous variable, F(a) is a continuous curve
\[ \lim_{h \to 0} \frac{P(x < X < x + h)}{h} = \lim_{h \to 0} \frac{F(x + h) - F(x)}{h} = \frac{d}{dx} F(x) = f(x) \]
The probability density function is the rate of change of the cumulative
distribution function: it is not to be confused with a probability.
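The limit above can be watched numerically: as h shrinks, the difference quotient of F approaches f(x). The sketch assumes an exponential distribution with rate 2, chosen only because its density exceeds 1 near zero, which makes it plain that a density value is not a probability. The rate and function names are assumptions for illustration.

```python
import math

RATE = 2.0  # hypothetical rate, chosen so that the density exceeds 1 near zero


def cdf(x: float) -> float:
    """F(x) = P(X <= x) for the assumed exponential distribution."""
    return 1.0 - math.exp(-RATE * x) if x > 0 else 0.0


def pdf(x: float) -> float:
    """f(x), the derivative of F(x)."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0


def difference_quotient(x: float, h: float) -> float:
    """(F(x + h) - F(x)) / h, i.e. P(x < X <= x + h) per unit length."""
    return (cdf(x + h) - cdf(x)) / h


x = 0.1
for h in (0.1, 0.01, 0.001, 1e-6):
    print(h, difference_quotient(x, h), pdf(x))
```

The first column shrinks while the second column settles on the third; the limiting value here is larger than 1, whereas F itself never exceeds 1.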
Figure: cumulative distribution function of a discrete distribution (x-axis: defectives, 0 to 20; y-axis: cumulative probability, 0 to 1)
Figure: cumulative distribution function of a continuous distribution (x-axis: waiting time, 0 to 40; y-axis: cumulative probability, 0 to 1)
Measure of centre
X: number of heads in 3 flips of a fair coin
x       P(X = x)    x × P(X = x)
0       1/8         0
1       3/8         3/8
2       3/8         6/8
3       1/8         3/8
Total   1           12/8 = 1.5 = µ
The mean of a probability distribution, also called the population mean
of the variable X or its expected value, is denoted E(X) and written
symbolically as µ. It plays the role of the centre of mass.
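\[ E(X) = \mu = \begin{cases} \sum_{x \in A} x \times P(X = x), & \text{if } X \text{ is discrete;} \\ \int_{x \in A} x \times f(x)\,dx, & \text{if } X \text{ is continuous.} \end{cases} \]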
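For the discrete case, the formula is a probability-weighted sum. A minimal Python sketch for the three-flip example follows; the helper name `expected_value` is illustrative.

```python
from fractions import Fraction

# P(X = x) for X = number of heads in 3 flips of a fair coin.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}


def expected_value(pmf: dict) -> Fraction:
    """E(X) = sum over the support of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())


mu = expected_value(pmf)
print(mu, float(mu))  # prints: 3/2 1.5
```

The mean 1.5 is not itself a possible value of X, which is consistent with its reading as a centre of mass rather than a typical outcome.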
Setting premium
A trip insurance policy pays ₹2000 to the customer in case of a loss
due to theft or damage. If the risk of such a loss is 1 in 200, what is
the expected cost, per customer, to cover?
Payment   Probability
₹0        0.995
₹2000     0.005
The company’s expected cost per customer is 0 × 0.995 + 2000 × 0.005 = ₹10.
A premium equal to this amount is viewed as fair. If this premium is
charged and no other costs are involved, the company will neither make
a profit nor lose money in the long run. In practice, the premium is set
at a higher price because it must include administrative costs and
intended profit.
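The "long run" claim can be illustrated with a small simulation: the average payout per customer drifts toward the expected cost as the number of customers grows. This is a sketch under stated assumptions; the customer count and the seed are arbitrary choices, not from the slides, and customers are treated as independent.

```python
import random
from fractions import Fraction

PAYOUT = 2000                       # amount paid on a loss
LOSS_PROBABILITY = Fraction(1, 200)  # risk of a loss: 1 in 200

# Exact expected cost per customer: 0 * P(no loss) + PAYOUT * P(loss).
expected_cost = 0 * (1 - LOSS_PROBABILITY) + PAYOUT * LOSS_PROBABILITY


def average_payout(customers: int, seed: int = 2021) -> float:
    """Simulate independent customers and return the mean payout per customer."""
    rng = random.Random(seed)  # fixed seed (arbitrary) for reproducibility
    p = float(LOSS_PROBABILITY)
    total = sum(PAYOUT for _ in range(customers) if rng.random() < p)
    return total / customers


simulated = average_payout(200_000)
print(expected_cost, simulated)
```

With a premium equal to `expected_cost`, the simulated average profit per customer is close to zero; any loading for administration or profit would be added on top of that figure.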
Gambler’s expectation
Consider a simple bet on the red of a roulette wheel that has 18 red,
18 black, and 2 green slots. This bet is at even money: a $10 wager
on red has an expected profit
\[ E(\text{Profit}) = 10 \times \frac{18}{38} + (-10) \times \frac{20}{38} = -0.526 \]
The negative expected profit says we expect to lose an average of
52.6¢ on every $10 bet. Over a long series of bets, the relative
frequency of winning will approach the probability 18/38 and that of
losing will approach 20/38, so a player will lose a substantial amount
of money. Other bets against the house have a similar negative
expected profit. How else could a casino stay in business?
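The arithmetic for the bet on red, and what it implies over many bets, can be written out as follows. This is a small sketch; the figure of 1000 bets is an arbitrary illustration, not from the slides.

```python
from fractions import Fraction

RED, BLACK, GREEN = 18, 18, 2
SLOTS = RED + BLACK + GREEN   # 38 slots in total
STAKE = 10                    # dollars wagered on red at even money

p_win = Fraction(RED, SLOTS)  # 18/38
p_lose = 1 - p_win            # 20/38

# E(Profit) = (+STAKE) * P(win) + (-STAKE) * P(lose)
expected_profit = STAKE * p_win + (-STAKE) * p_lose
print(expected_profit, round(float(expected_profit), 3))  # prints: -10/19 -0.526

# Expected total loss over a hypothetical run of 1000 such bets.
expected_loss_1000 = -1000 * expected_profit
print(round(float(expected_loss_1000), 2))  # prints: 526.32
```

The two green slots are what tilt an otherwise even bet: without them the win and loss probabilities would both be 1/2 and the expected profit would be zero.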
Expectation rules
If P(X = c) = 1, then E(X) = c
For any constants a and b, E(a + bX) = a + bE(X)
For any two random variables X and Y, E(X + Y) = E(X) + E(Y)
Expectation of a function of a random variable X, say g(X), is
\[ E\{g(X)\} = \begin{cases} \sum_{x \in A} g(x)\, P(X = x), & \text{if } X \text{ is discrete,} \\ \int_{x \in A} g(x)\, f(x)\,dx, & \text{if } X \text{ is continuous} \end{cases} \]
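\[ V(X) = E(X - \mu)^2 = \begin{cases} \sum_{x \in A} (x - \mu)^2\, P(X = x), & \text{if } X \text{ is discrete,} \\ \int_{x \in A} (x - \mu)^2\, f(x)\,dx, & \text{if } X \text{ is continuous} \end{cases} \]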
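The rules above can be checked on the three-flip example with a single helper that computes E{g(X)} for any function g. This is a minimal sketch; the constants a = 5 and b = 2 are arbitrary illustrations, not from the slides.

```python
from fractions import Fraction

# P(X = x) for X = number of heads in 3 flips of a fair coin.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}


def expect(g, pmf: dict) -> Fraction:
    """E{g(X)} = sum over the support of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in pmf.items())


mu = expect(lambda x: x, pmf)

a, b = 5, 2  # hypothetical constants for the linearity check
lhs = expect(lambda x: a + b * x, pmf)  # E(a + bX) computed directly
rhs = a + b * mu                        # a + b E(X)

# Variance as the expectation of g(X) = (X - mu)^2.
variance = expect(lambda x: (x - mu) ** 2, pmf)

print(mu, lhs, rhs, variance)  # prints: 3/2 8 8 3/4
```

Passing g as a function keeps the mean, the linearity check and the variance as three uses of one definition, mirroring how the slide derives V(X) from E{g(X)}.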
Flaw of the average
Measure of spread
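\[ X = 0 \text{ with probability } 1 \]
\[ Y = \begin{cases} -1 & \text{with probability } 1/2; \\ 1 & \text{with probability } 1/2. \end{cases} \]
\[ Z = \begin{cases} -100 & \text{with probability } 1/2; \\ 100 & \text{with probability } 1/2. \end{cases} \]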
All these distributions have the same expected value 0. Y is more spread
out than X, a constant; and Z is more spread out than Y.
\( \sigma^2 = E(X - \mu)^2 = E(X^2) - \mu^2 = V(X) \), or population variance, captures
the spread of a distribution. As it is in squared units, an alternative
measure is the standard deviation: \( \sigma = \sqrt{V(X)} \).
Note: \( \min_a E(X - a)^2 = E(X - \mu)^2 \) and \( V(a + bX) = b^2 V(X) \)
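The three distributions share a mean of 0 but differ in variance, and the scaling rule V(a + bX) = b² V(X) can be verified directly. The sketch below uses the shortcut E(X²) − µ²; the constants a = 7 and b = 100 are arbitrary illustrations.

```python
from fractions import Fraction

half = Fraction(1, 2)
distributions = {
    "X": {0: Fraction(1)},
    "Y": {-1: half, 1: half},
    "Z": {-100: half, 100: half},
}


def mean(pmf: dict) -> Fraction:
    """E(X) as a probability-weighted sum."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf: dict) -> Fraction:
    """V(X) = E(X^2) - mu^2."""
    mu = mean(pmf)
    return sum(x * x * p for x, p in pmf.items()) - mu ** 2


def affine(pmf: dict, a: int, b: int) -> dict:
    """Distribution of a + bX (assumes b != 0 so the values stay distinct)."""
    return {a + b * x: p for x, p in pmf.items()}


for name, pmf in distributions.items():
    print(name, mean(pmf), variance(pmf))

a, b = 7, 100  # hypothetical shift and scale
shifted_scaled = affine(distributions["Y"], a, b)
print(variance(shifted_scaled), b ** 2 * variance(distributions["Y"]))
```

The shift a moves the mean but leaves the variance untouched, while the scale b enters squared, which is why Z = 100Y has variance 10000 times that of Y.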
Risk assessment
As a manager, which one of the following funds would you prefer?
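\[ \text{Profit}_1 = \text{₹}400 \text{ with probability } 1 \]
\[ \text{Profit}_2 = \begin{cases} \text{₹}10000 & \text{with probability } 0.15; \\ -\text{₹}1000 & \text{with probability } 0.85. \end{cases} \]
\[ \text{Profit}_3 = \begin{cases} \text{₹}1000 & \text{with probability } 0.50; \\ \text{₹}500 & \text{with probability } 0.30; \\ -\text{₹}500 & \text{with probability } 0.20. \end{cases} \]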
One way to assess volatility is to consider the coefficient of variation
\[ CV(X) = \frac{SD(X)}{E(X)} = \frac{\sigma}{\mu} \]
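Applying the coefficient of variation to the three funds gives a concrete comparison. The following is a sketch of that calculation; it reports the numbers only and does not settle which fund a manager should prefer, since that also depends on attitude to risk.

```python
import math
from fractions import Fraction

# Profit distributions of the three funds: {profit: probability}.
funds = {
    "Profit1": {400: Fraction(1)},
    "Profit2": {10000: Fraction(15, 100), -1000: Fraction(85, 100)},
    "Profit3": {1000: Fraction(50, 100), 500: Fraction(30, 100), -500: Fraction(20, 100)},
}


def summarize(pmf: dict) -> tuple:
    """Return (mean, standard deviation, coefficient of variation) as floats."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    sd = math.sqrt(float(var))
    return float(mu), sd, sd / float(mu)  # CV = sigma / mu; mu is nonzero here


summary = {name: summarize(pmf) for name, pmf in funds.items()}
for name, (mu, sd, cv) in summary.items():
    print(name, round(mu, 2), round(sd, 2), round(cv, 3))
```

The second fund has the highest expected profit (650) but also by far the largest CV, the third sits in between, and the first carries no variability at all.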
Figure: probability density curve with the point x marked on the horizontal axis, area p to its left and area 1 − p to its right (y-axis: probability density)
x is the 100p’th percentile if P(X ≤ x) ≥ p and P(X ≥ x) ≥ 1 − p
Some alternative measures
Alternatively, x is the 100p’th percentile if p ≤ F(x) ≤ p + P(X = x). For
a continuous distribution, the 100p’th percentile is a solution of F(x) = p.
The median (50th percentile) is a robust measure of centre. A robust
measure of spread is IQR = 75th percentile − 25th percentile
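For a discrete distribution the percentile definition can be applied point by point. The sketch below checks each support value of the three-flip distribution against the two inequalities P(X ≤ x) ≥ p and P(X ≥ x) ≥ 1 − p; the function name `percentiles` is illustrative, and only support points are tested.

```python
from fractions import Fraction

# P(X = x) for X = number of heads in 3 flips of a fair coin.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}


def percentiles(pmf: dict, p: Fraction) -> list:
    """Support points x with P(X <= x) >= p and P(X >= x) >= 1 - p."""
    result = []
    for x in sorted(pmf):
        at_most = sum(q for value, q in pmf.items() if value <= x)
        at_least = sum(q for value, q in pmf.items() if value >= x)
        if at_most >= p and at_least >= 1 - p:
            result.append(x)
    return result


medians = percentiles(pmf, Fraction(1, 2))
q1 = percentiles(pmf, Fraction(1, 4))
q3 = percentiles(pmf, Fraction(3, 4))
iqr = q3[0] - q1[0]

print(medians, q1, q3, iqr)  # prints: [1, 2] [1] [2] 1
```

The median is not unique here: both 1 and 2 satisfy the definition (as does any value between them), which is why the function returns a list rather than a single number.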