Discrete Random Variables
Table of Contents
Random Variables
Distribution Functions
Expectations
Variances
Bernoulli Trials
Geometric Distributions
Binomial Distributions
Uniform Distributions
Hypergeometric Distributions
Poisson Distributions
Random Variables
Random variables are key to this course, and understanding them is a core part of it.
We use random variables to define the probability models associated with a random experiment. We have already seen simple probability models, but if we look at these probability models a little more closely, we will notice some limitations. The limitations are not severe, but they still exist:
o The sample space does not necessarily consist of numbers, which makes further processing difficult
o The representation is not compact
Random variables can be discrete or continuous. They can also be mixed, but we will not be studying mixed random variables here. Before we can define the two types of random variables, we need to see what discrete sets and continuous sets are.
In a discrete set, the number of elements is either finite or countably infinite. For example, a random experiment that tests whether a data packet has been sent successfully or not has a finite number of elements in its sample space. Another example could be the number of attempts needed to send a data packet successfully. There are an infinite number of elements in that sample space, but they can still be counted (1, 2, 3, …).
In a continuous set, the number of elements is infinite and uncountable. For example, consider a line from 0 to 1, on which we have to choose a point. There are an infinite number of points we can choose, and the values can get infinitely more precise, making the set uncountable. A continuous set like this one is represented as S = [0, 1], which signifies all the values between 0 and 1. There are a few other ways of writing continuous sets, but interval notation like this is the most common one.
Obviously, the random variables that apply to each of these types of sets are called discrete random variables and continuous random variables, respectively.
The first limitation of simple probability models was the fact that the sample space did not necessarily consist of numbers, which made further processing difficult. For example, suppose our observation of interest is the "sum" of two consecutive coin tosses (say, the number of heads). The raw outcomes are ordered pairs such as (H, T), which are not numbers and are awkward to process. Thus, we need a way to convert the outcomes of an experiment into real numbers.
A random variable is a real-valued function that converts each element of a sample space S into a real number. If ω is an outcome in the sample space S, then X is a function that, when given the parameter ω, produces a numeric result x, i.e. X(ω) = x. This function is called a random variable. For simplicity, the function X(ω) is simply written as X.
The function X can be one-to-one, mapping each element in the sample space to a distinct number, or many-to-one, mapping several elements to the same number. In either case, X inherits the randomness of the experiment, capturing the uncertain variability (the fact that the value can change in an uncertain way from one run of the experiment to the next).
As an example, consider sending three data packets, where D denotes a successful delivery and F denotes a failure.
Sample Space: S = {FFF, FFD, FDF, FDD, DFF, DFD, DDF, DDD}
The observation we are interested in determines the function. Thus, we can have multiple random variables for a single experiment if we are interested in different observations. Here, let X = number of packets delivered successfully.
Thus,
E0 = {FFF}
E1 = {FFD, FDF, DFF}
E2 = {FDD, DFD, DDF}
E3 = {DDD}
E = {E0, E1, E2, E3}
and the random variable is

X(\omega) = \begin{cases} 0, & \omega \in E_0 \\ 1, & \omega \in E_1 \\ 2, & \omega \in E_2 \\ 3, & \omega \in E_3 \end{cases}
Thus, P[X < 2] represents the probability of an event occurring, where the elements of the event are those of E0 and E1, i.e. FFF, FFD, FDF and DFF. Essentially, it means that ω is in E0 ∪ E1.
If p = 1/2, then P[X < 2] = 1/2.
To complete the probability model, we still need to identify the set of values that X can take:
S_X = {0, 1, 2, 3}.
As another example, consider an experiment whose outcomes are pairs of numbers (x1, x2), with event space E = {E1, E2, E3, E4, E5, E6, E7, E8}, and define the random variable as the sum:
X(x1, x2) = x1 + x2 = x
S_X = {2, 3, 4, 5, 6, 7, 8}
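As a small sketch of the mapping idea (the helper names are mine), the following Python snippet revisits the three-packet example: it defines X as an ordinary function over the sample space, collects S_X and computes P[X < 2] for p = 1/2:

from itertools import product
from fractions import Fraction

# Minimal sketch: the three-packet example, where D = delivered, F = failed,
# and X(omega) = number of D's in the outcome string.
p = Fraction(1, 2)                      # probability a single packet is delivered

sample_space = [''.join(t) for t in product('FD', repeat=3)]   # 'FFF', 'FFD', ...

def X(omega):
    """Random variable: maps an outcome string to a real number."""
    return omega.count('D')

def prob(omega):
    """Probability of one outcome, assuming independent deliveries."""
    d = omega.count('D')
    return p**d * (1 - p)**(3 - d)

S_X = sorted({X(omega) for omega in sample_space})             # [0, 1, 2, 3]
P_X_less_than_2 = sum(prob(w) for w in sample_space if X(w) < 2)
print(S_X, P_X_less_than_2)             # [0, 1, 2, 3] 1/2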
Distribution Functions
We have previously seen how taking any set of elements from the sample space
creates an event. Now let’s look at how random variables can be used to define
events.
To do this, we need to define X in such a way that each value x it can take defines an event: the set of elements from the sample space that map to the number x. For example, if E2 = {FDD, DFD, DDF}, then each of these outcomes gives X = 2. Thus, {X = 2} defines that event.
The probability that the random variable X has a value x is given by P[X = x]. This means that when we ran the experiment, we got an outcome ω from S that is mapped by the function X(ω) to the number x. Thus, not only is each value of S_X mapped from an element or event of S, the probabilities are mapped as well.
For the previous example where we send three data packets, the sample space is S = {FFF, FFD, FDF, DFF, FDD, DFD, DDF, DDD}, creating the set S_X = {0, 1, 2, 3}.
Here (with p = 1/2), P[X = 0] = 1/8, P[X = 1] = 3/8, P[X = 2] = 3/8 and P[X = 3] = 1/8.
P[X = x] gives us the amount of probability assigned to the number x. Using this, we have assigned a probability to every possible number. For example, P[X = 2.5] = 0.
The function P_X(x) = P[X = x] is called the probability mass function (PMF), and it is one of the distribution functions associated with random variables. It represents the probability of the event {X = x} occurring, where x is a value of the random variable. The PMF describes the distribution of probability over the number line. It is presented as a list of all its possible values and their probabilities.
P_X(x) = \begin{cases} 1/8, & x = 0, 3 \\ 3/8, & x = 1, 2 \\ 0, & \text{otherwise} \end{cases}
Notice that using this representation, we are able to represent the whole probability
model in a compact manner, and have the sample space and the probabilities
together. Thus, we have solved the second limitation of simple probability models.
Every PMF satisfies the following properties:
o For x ∈ S_X, P_X(x) > 0
o For x ∉ S_X, P_X(x) = 0
o \sum_{x \in S_X} P_X(x) = 1
Finally, the reason this is called a probability mass function is because the probability can be viewed as a total mass of 1 distributed along the number line, with each value x ∈ S_X carrying a point mass of size P_X(x).
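A minimal sketch (the names are mine) of how a PMF can be stored as a dictionary and the properties above checked programmatically:

from fractions import Fraction

# PMF of the three-packet example, stored as {value: probability}.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def is_valid_pmf(pmf):
    """Check the PMF properties: positive masses that sum to 1."""
    return all(p > 0 for p in pmf.values()) and sum(pmf.values()) == 1

def P(pmf, x):
    """P[X = x]: zero for any x outside S_X (e.g. x = 2.5)."""
    return pmf.get(x, Fraction(0))

print(is_valid_pmf(pmf), P(pmf, 2.5))   # True 0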
Consider a gambling game. Say we need to pay $1 to play. The game is that three coins are tossed one after another and the outcomes are observed. There are four possible values of the net gain: −1, +1, +2 and +3.
If the number of heads is 0, we get nothing back, making the net gain −1. If the number of heads is 1, 2 or 3, the net gain is +1, +2 or +3 respectively.
Given that the coins are biased and P[T] = 0.85, we need to develop the probability model.
P_X(x) = \begin{cases} 0.61, & x = -1 \\ 0.32, & x = 1 \\ 0.06, & x = 2 \\ 0.01, & x = 3 \\ 0, & \text{otherwise} \end{cases}
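As a sketch, this PMF can be derived directly from the coin bias; the values below are approximately the rounded ones listed above (the helper names and the rounding are mine):

from math import comb

p_heads = 0.15                          # P[T] = 0.85, so P[H] = 0.15
gain = {0: -1, 1: 1, 2: 2, 3: 3}        # net gain for each possible number of heads

pmf = {}
for heads, x in gain.items():
    # P[heads in 3 tosses] = C(3, heads) * p^heads * (1 - p)^(3 - heads)
    pmf[x] = comb(3, heads) * p_heads**heads * (1 - p_heads)**(3 - heads)

for x, prob in pmf.items():
    print(x, round(prob, 3))            # -1: 0.614, 1: 0.325, 2: 0.057, 3: 0.003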
The second distribution function is the cumulative distribution function (CDF). The word 'cumulative' means accumulating everything from the very first possible value up to the current one: the CDF, F_X(x), gives us the probability of the event {X ≤ x} = {ω ∈ S | X(ω) ≤ x} occurring. The PMF deals only with the values x ∈ S_X, while the CDF deals with all possible values of x. Thus, a value of x with x ∉ S_X can still have a non-zero probability under the CDF.
If x is less than the least possible outcome in the given situation, meaning x ∈ (−∞, x_min), the probability will be 0. Otherwise, the exact value depends on where in the range x lies. If x ∈ (x1, x2) such that x1 ∈ S_X and x2 ∈ S_X with no member of S_X between them, meaning x sits between two valid members of S_X, then F_X(x) = F_X(x1).
F_X(x) = \begin{cases} 0, & x < 0 \\ 1/8, & 0 \le x < 1 \\ 4/8, & 1 \le x < 2 \\ 7/8, & 2 \le x < 3 \\ 1, & x \ge 3 \end{cases}
Comparing the plot of the CDF to the one for the PMF, we notice that the value of P_X(x) for each x ∈ S_X appears in the CDF as the size of the jump at x. In general,
F_X(x) = P[X \le x] = \sum_{y \in S_X,\, y \le x} P_X(y)
Between any two consecutive numbers x1 ∈ S_X and x2 ∈ S_X, the curve is flat. If x ∉ S_X, the curve is flat at x. If x ∈ S_X, the curve has a vertical jump at x, and the height of this jump is P_X(x).
Thus,
P[X ≤ x2] = P[X ≤ x1] + P[x1 < X ≤ x2]
P[x1 < X ≤ x2] = F_X(x2) − F_X(x1)
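A minimal sketch (dictionary and helper names are mine) of how the CDF can be computed from a PMF by summing, and how F_X(x2) − F_X(x1) gives an interval probability:

from fractions import Fraction

# PMF of the three-packet example.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def cdf(pmf, x):
    """F_X(x) = P[X <= x] = sum of P_X(y) over y in S_X with y <= x."""
    return sum(p for y, p in pmf.items() if y <= x)

print(cdf(pmf, 1.5))                    # 1/2  (same as F_X(1), since 1.5 is not in S_X)
print(cdf(pmf, 3) - cdf(pmf, 1))        # 1/2  = P[1 < X <= 3]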
A derived random variable is defined from another random variable. Thus, the derived random variable Y is obtained by applying a function g: X → Y, i.e. Y = g(X). Its PMF is
P_Y(y) = P[Y = y] = \sum_{x \in S_X:\, g(x) = y} P_X(x)
The same formula could be used for a one-to-one function, but of course in that case each y comes from exactly one x, so P_Y(y) = P_X(x) where y = g(x).
Thus, for the gambling example with Y = X² (note that both x = −1 and x = 1 map to y = 1), we have

P_X(x) = \begin{cases} 0.61, & x = -1 \\ 0.32, & x = 1 \\ 0.06, & x = 2 \\ 0.01, & x = 3 \\ 0, & \text{otherwise} \end{cases}
\qquad
P_Y(y) = \begin{cases} 0.93, & y = 1 \\ 0.06, & y = 4 \\ 0.01, & y = 9 \\ 0, & \text{otherwise} \end{cases}
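A short sketch of how the PMF of a derived random variable can be computed by grouping; the squaring function below is my assumption, chosen because it matches the values listed above:

from collections import defaultdict

# Sketch: derive the PMF of Y = g(X) by summing P_X(x) over all x with g(x) = y.
p_x = {-1: 0.61, 1: 0.32, 2: 0.06, 3: 0.01}
g = lambda x: x * x                     # assumed mapping, consistent with the listed P_Y

p_y = defaultdict(float)
for x, p in p_x.items():
    p_y[g(x)] += p

print({y: round(p, 2) for y, p in p_y.items()})   # {1: 0.93, 4: 0.06, 9: 0.01}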
Expectations
One of the criticisms we made of the simple probability model is that it does not allow for further processing. We mentioned that once the sample space had been converted to real numbers and a more sophisticated probability model, like the PMF, was being used, further processing would be possible. One such piece of processing is computing the expected value of a random variable.
Say someone gambles in a game n times. One scenario is where n is very small, like 1, 2 or 3. The net gain in this case depends entirely on luck. A lucky person may play the game once and win a large amount, while an unlucky person could play a few times and lose every time.
However, if n is very large, to the extent that n extends to ∞ , the net gain no longer
depends on luck, but rather on the laws of probability. There will be a certain number
of outcomes of each possibility, and the net gain can be found by summing all of
those outcomes.
The number of times each outcome occurs is related to the PMF. If we play the game n times and a certain outcome has probability (long-run relative frequency) P_X(x), that outcome will appear roughly n × P_X(x) times. For the gambling game this gives:

Net gain x | Approximate count | Contribution to total
-1         | n × 0.61          | −0.61 n
+1         | n × 0.32          | +0.32 n
+2         | n × 0.06          | +0.12 n
+3         | n × 0.01          | +0.03 n

Grand total: −0.14 n, i.e. −0.14 per game.
The value of the grand total that we just found is an important quantity, which is called the expected value. Regardless of our luck, if we play the game a large number of times, on average this will be the net gain, or rather net loss, of −$0.14 per game. If this number were 0, this would be a fair game, since on average a player would neither gain nor lose anything.
This is the strategy used by casinos to ensure that they do not lose money. They do not bother about winning every single game, but the games are rigged so that, on average, the casino always comes out ahead.
The expected value depends on two things: the values of the random variable X and the probability of each of those values, i.e. the PMF. It is computed as
E[X] = \sum_{x \in S_X} x \cdot P_X(x)
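As a quick sketch, the expected value is just a probability-weighted sum, which reproduces the −$0.14 figure for the gambling game (variable names are mine):

# Expected value as a probability-weighted sum over S_X.
p_x = {-1: 0.61, 1: 0.32, 2: 0.06, 3: 0.01}

expected_value = sum(x * p for x, p in p_x.items())
print(round(expected_value, 2))         # -0.14, the per-game average net gain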
If we use the analogy of mass that we used with the PMF, the expected value represents the centre of mass of the distribution along the number line.

Variances
The expected value is a useful summary, but there are still situations where the expectation alone does not describe a random variable well. The variance measures how far the values of the random variable tend to lie from the expected value. The deviation of a value from the expected value, x − E[X], could be positive or negative, so we average the squared deviation:
Var[X] = \sum_{x \in S_X} (x - E[X])^{2} \cdot P_X(x)
If we were computing this programmatically, the definition above would require us to pass over all the values in S_X two times: once to compute E[X] and once more to average the squared deviations. The equivalent formula Var[X] = E[X²] − (E[X])² needs only a single pass.
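A minimal sketch of the two ways of computing the variance mentioned above, applied to the gambling PMF (helper names are mine):

# Variance computed two ways for the gambling PMF.
p_x = {-1: 0.61, 1: 0.32, 2: 0.06, 3: 0.01}

mean = sum(x * p for x, p in p_x.items())                        # first pass: E[X]
var_two_pass = sum((x - mean) ** 2 * p for x, p in p_x.items())  # second pass

# Single pass using Var[X] = E[X^2] - (E[X])^2.
ex, ex2 = 0.0, 0.0
for x, p in p_x.items():
    ex += x * p
    ex2 += x * x * p
var_one_pass = ex2 - ex * ex

print(round(var_two_pass, 4), round(var_one_pass, 4))            # both ≈ 1.2404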
Finding the probability model for a random experiment in real life can be tedious in
some cases, even involving multiple random variables sometimes. However, most
random experiments actually follow the same general model. For example, tossing a
coin repeatedly until we get a head is the same as sending a data packet repeatedly
until it is successfully delivered. There are a few well-known random variables, and we
should always try to fit a given random experiment into one of these random
variables first, before going through the manual process of defining the probability
model. We will generally find most random experiments can be fit into one of them.
Well-known random variables can be divided into two categories. Uniform random variables, Poisson random variables and Hypergeometric random variables fall under one category, while Bernoulli random variables, Geometric random variables, Binomial random variables and Negative Binomial random variables, also known as Pascal random variables, fall under the other. The second group are all built on top of Bernoulli experiments.
Bernoulli Trials
Bernoulli random variables are the simplest random variables, but they are also one
of the most important. They are associated with Bernoulli experiments. Bernoulli
experiments have two outcomes and the outcomes are classified as success/failure
or on/off. Essentially, the outcomes are opposites of each other. Tossing a coin falls
into this category, as does sending a packet of data. If the underlying experiment has more than two outcomes, they can still be divided into two groups, one counted as success and the other as failure. For example, rolling a die and trying to get an even number is a Bernoulli experiment, since the six outcomes can be divided into two groups.
Bernoulli random variables are defined such that any outcome related to success maps to 1 and any outcome related to failure maps to 0. Thus, S_X = {0, 1}.
P_X(x) = \begin{cases} 1 - p, & x = 0 \\ p, & x = 1 \\ 0, & \text{otherwise} \end{cases}
It is also possible to write this PMF in a single line as P_X(x) = p^{x} (1 - p)^{1-x}, x = 0, 1.
Similarly, the CDF can be expressed as
F_X(x) = \begin{cases} 0, & x < 0 \\ 1 - p, & 0 \le x < 1 \\ 1, & x \ge 1 \end{cases}
We write X ~ bernoulli(p), where ~ is used to express 'distributed as'. In this case, instead of using the term 'distribution function', the PMF and CDF above are called a family of distributions, one member for each value of the parameter p.
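A small sketch of the Bernoulli PMF in its one-line form, plus a simple way of drawing samples (function names are my own):

import random

def bernoulli_pmf(x, p):
    """P_X(x) = p^x * (1 - p)^(1 - x) for x in {0, 1}, and 0 otherwise."""
    return p**x * (1 - p)**(1 - x) if x in (0, 1) else 0.0

def bernoulli_sample(p):
    """Return 1 (success) with probability p, else 0 (failure)."""
    return 1 if random.random() < p else 0

p = 0.3
print(bernoulli_pmf(0, p), bernoulli_pmf(1, p))                  # 0.7 0.3
print(sum(bernoulli_sample(p) for _ in range(10_000)) / 10_000)  # ≈ 0.3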
Bernoulli experiments are actually building blocks for a wide range of random variables. A Bernoulli trial is formed when the same Bernoulli experiment is repeated multiple times. For example, tossing a coin repeatedly is a Bernoulli trial. For the repetitions to form a Bernoulli trial, we require:
o The probability of success p to be the same for every repetition
o Checking that the outcome of one experiment does not affect the outcome of another (the repetitions are independent)
A typical question about such a trial is how many repetitions are needed until, say, a head occurs.
Example 1
Let’s say we have a random variable X =number of attempts until first success . Thus,
S X ={ 1 , 2, 3 , … } .
Now we need to find P [ X=x ] for all values of x ≥ 1. Once we find this, the probability
model is complete.
Notice how we do not have just two outcomes here. This is because X is not a Bernoulli random variable, even though it is built on a Bernoulli trial.
Example 2
Here, let X = number of successes in a fixed number of repetitions (three, in this case). Thus, S_X = {0, 1, 2, 3}. Here, we do not care about the order in which the successes occur, only how many there are.
Example 3
Here, X = number of packets sent until we get, say, k = 3 successful deliveries, and S_X = {3, 4, 5, …}. Thus, we need to find P[X = x] for x ∈ {3, 4, 5, …}.
We can consider all three examples above together. We have a Bernoulli trial,
meaning the probability p is fixed and the number of repetitions is either fixed, or
conditional.
Case | Number of repetitions                 | Random variable observed | Parameters                                           | Distribution                | Notation
1    | Conditional: until the first success  | Number of repetitions    | Probability of success, p                            | Geometric                   | X ~ geom(p)
2    | Fixed, n                              | Number of successes      | Number of repetitions, n; probability of success, p  | Binomial                    | X ~ binom(n, p)
3    | Conditional: until k successes        | Number of repetitions    | Number of successes, k; probability of success, p    | Negative Binomial (Pascal)  | X ~ pascal(k, p)
Thus,
o The geometric distribution deals with the number of trials required for a single success
o The binomial distribution deals with the number of successes in a fixed number of trials
o The negative binomial (Pascal) distribution deals with the number of trials required for k successes
Keep in mind that the three distributions given above are not single distributions, but families of distributions, one for each choice of the parameters.
Geometric Distributions
Consider a Bernoulli trial in which we repeat the experiment (say, sending a data packet) until we get a success. The sample space is thus S = {D, FD, FFD, …}. We can define the random variable as X = number of repetitions needed to get the first success.
Our goal is to find the PMF for this random variable. The first success occurs on repetition x exactly when the first x − 1 repetitions fail and repetition x succeeds, so P[X = x] = (1 − p)^{x−1} p. Notice that, as x increases, these probabilities follow the pattern of a geometric series. This is exactly why this is called a geometric distribution, and the random variables associated with it are called Geometric Random Variables. They are also a special case of negative binomial (Pascal) random variables, with k = 1.
Thus,
P_X(x) = \begin{cases} p (1 - p)^{x-1}, & x = 1, 2, 3, \ldots \\ 0, & \text{otherwise} \end{cases}
The CDF will be given by F_X(x) = 1 − (1 − p)^{x} when x ≥ 1.
Sometimes the geometric random variable is instead defined as the number of failures before the first success, in which case S_X = {0, 1, 2, …} and
P_X(x) = \begin{cases} p (1 - p)^{x}, & x = 0, 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}
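A brief sketch checking the geometric PMF against its CDF by direct summation (the names and the chosen p are mine):

def geom_pmf(x, p):
    """P[X = x] = p * (1 - p)^(x - 1) for x = 1, 2, 3, ..."""
    return p * (1 - p) ** (x - 1) if x >= 1 else 0.0

p = 0.25
x = 6
cdf_by_summing = sum(geom_pmf(k, p) for k in range(1, x + 1))
print(round(cdf_by_summing, 6), round(1 - (1 - p) ** x, 6))   # both ≈ 0.822021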
Binomial Distributions
A binomial random variable defines the number of successes in n attempts. For n=3,
S= { FFF , FFD , FDF , … DDD }. Thus, S X ={ 0 , 1 ,2 , … n }.
Here, P[X = x] = {}^{n}C_{x} \, p^{x} (1 - p)^{n-x}. Thus,
P_X(x) = \begin{cases} {}^{n}C_{x} \, p^{x} (1 - p)^{n-x}, & x = 0, 1, \ldots, n \\ 0, & \text{otherwise} \end{cases}
Notice how this formula follows the pattern of the binomial formula. Thus, this is called a binomial distribution, and the random variables associated with it are called binomial random variables.
The CDF for binomial random variables has no simple closed form; it must be found by summing up the individual PMF values.
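A short sketch of the binomial PMF and a CDF obtained by summing PMF values, checked against the three-packet example with p = 1/2 (helper names are mine):

from math import comb

def binom_pmf(x, n, p):
    """P[X = x] = C(n, x) * p^x * (1 - p)^(n - x) for x = 0, 1, ..., n."""
    return comb(n, x) * p**x * (1 - p)**(n - x) if 0 <= x <= n else 0.0

def binom_cdf(x, n, p):
    """F_X(x) = P[X <= x], found by summing the individual PMF values."""
    return sum(binom_pmf(k, n, p) for k in range(0, min(x, n) + 1))

print(binom_pmf(2, 3, 0.5))     # 0.375  (matches P[X = 2] = 3/8 in the packet example)
print(binom_cdf(1, 3, 0.5))     # 0.5    (matches F_X(1) = 4/8)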
For negative binomial (Pascal) random variables, X = number of repetitions needed to get k successes, so S_X = {k, k+1, k+2, …}. For exactly x repetitions to be needed, the k-th success must occur on repetition x, with the other k − 1 successes spread over the first x − 1 repetitions. Here,
P[X = x] = {}^{x-1}C_{k-1} \, p^{k} (1 - p)^{x-k}
Notice the condition that the last delivery has to be a success. Thus,
P_X(x) = \begin{cases} {}^{x-1}C_{k-1} \, p^{k} (1 - p)^{x-k}, & x = k, k+1, \ldots \\ 0, & \text{otherwise} \end{cases}
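A minimal sketch of the Pascal PMF; with k = 1 it reduces to the geometric PMF, as noted earlier (function names and example values are mine):

from math import comb

def pascal_pmf(x, k, p):
    """P[X = x] = C(x - 1, k - 1) * p^k * (1 - p)^(x - k) for x >= k."""
    return comb(x - 1, k - 1) * p**k * (1 - p)**(x - k) if x >= k else 0.0

print(pascal_pmf(4, 1, 0.25), 0.25 * 0.75**3)   # both 0.10546875 (k = 1 is geometric)
print(pascal_pmf(5, 3, 0.5))                    # C(4, 2) * 0.5^5 = 0.1875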
Uniform Distributions
Uniform distributions are also a family of distributions, since they have parameters that determine the possible values of X.
We can say that X is a random variable for a uniform distribution if all of the possible
values of X are equally likely, meaning they have the same probability.
Here, we will study a slightly different variant that assumes that the values of X are successive integers. They can start from any integer, but they should lie within an interval, say [l, m], where l < m. Thus, S_X = {l, l+1, l+2, …, m}. Since there are m − l + 1 values, the probability of each will be 1/(m − l + 1).
P_X(x) = \begin{cases} \dfrac{1}{m - l + 1}, & x \in S_X \\ 0, & \text{otherwise} \end{cases}
F_X(x) = \begin{cases} 0, & x < l \\ \dfrac{x - l + 1}{m - l + 1}, & l \le x < m \\ 1, & x \ge m \end{cases}
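A small sketch of the discrete uniform PMF and CDF, using a six-sided die as the example (the names and example are mine):

def uniform_pmf(x, l, m):
    """Each of the m - l + 1 integers l..m is equally likely."""
    return 1 / (m - l + 1) if isinstance(x, int) and l <= x <= m else 0.0

def uniform_cdf(x, l, m):
    """F_X(x) counts how many members of S_X are <= x."""
    if x < l:
        return 0.0
    if x >= m:
        return 1.0
    return (int(x) - l + 1) / (m - l + 1)

print(uniform_pmf(3, 1, 6), uniform_cdf(4, 1, 6))   # 0.166... 0.666...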
Hypergeometric Distributions
Say we have a box with some ICs, a of which are in good condition and b of which are defective. We now pick ICs from the box under two conditions: first, with replacement, and second, without replacement.
Say in the first scenario we pick n ICs with replacement, putting each IC back after we check it.
Here, the probability of picking a good IC is p = a/(a + b) and the probability of picking a defective IC is (1 − p) = b/(a + b). Since there are only two possible outcomes for each pick, each pick is a Bernoulli experiment, and the entire process of picking n ICs with replacement is a Bernoulli trial. More specifically, we are counting the number of good ICs among the n picks, so X is a binomial random variable with
P_X(x) = {}^{n}C_{x} \left(\dfrac{a}{a+b}\right)^{x} \left(\dfrac{b}{a+b}\right)^{n-x}
Now consider the scenario where we pick n ICs, but without replacement. In this scenario, every time we pick an item, the number of items remaining in the box changes, so the probability of picking a good IC changes from pick to pick and the picks no longer form a Bernoulli trial.
This case is a hypergeometric distribution, and we will solve this by counting the
number of ways we can do the operations. First, consider how many ways we can pick n ICs from a set of (a + b) ICs. This is obviously {}^{(a+b)}C_{n}. Next, consider how many ways we can pick exactly x good and (n − x) defective ICs. This is {}^{a}C_{x} \times {}^{b}C_{n-x}. Thus,
P[X = x] = \dfrac{{}^{a}C_{x} \times {}^{b}C_{n-x}}{{}^{(a+b)}C_{n}}
P_X(x) = \begin{cases} \dfrac{{}^{a}C_{x} \times {}^{b}C_{n-x}}{{}^{(a+b)}C_{n}}, & x \le a \text{ and } (n-x) \le b \\ 0, & \text{otherwise} \end{cases}
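A brief sketch of the hypergeometric PMF built directly from the counting argument above (the function name and the example numbers are mine):

from math import comb

def hypergeom_pmf(x, n, a, b):
    """P[X = x] = C(a, x) * C(b, n - x) / C(a + b, n), picking n ICs without replacement."""
    if x < 0 or x > a or (n - x) < 0 or (n - x) > b:
        return 0.0
    return comb(a, x) * comb(b, n - x) / comb(a + b, n)

# Example: a = 7 good, b = 3 defective, pick n = 4 ICs.
print(round(hypergeom_pmf(3, 4, 7, 3), 4))   # C(7,3)*C(3,1)/C(10,4) = 0.5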
Poisson Distributions
In Poisson distributions, some event occurs repeatedly over time: for example, a car arriving at its destination, or customers arriving at a shop. In such scenarios, the average arrival rate will be given to us, i.e. the number of arrivals per unit time, denoted by λ. This unit of time could be anything from a second to months.
We will not derive the formula for the PMF of Poisson distributions, due to the complexity of the derivation; we will also be provided the formula during any examinations, so it is not necessary to memorise it.
The motivation is that, for a very large n and a value of x like x = 2, {}^{n}C_{x} can give us huge numbers. In such a case, calculations can become very cumbersome. Consider the case of a shop owner who knows that, on average, 4.5 customers arrive per hour.
If we divide the one hour into seconds, then the average number of customers arriving per second is 4.5/3600. From this, we can treat this value as the probability of a customer arriving in a particular second, and the probability of a customer not arriving in a particular second as 1 minus this value. Then, in each second, we have a Bernoulli experiment. Thus, the repetition of this experiment over the 3600 × T seconds in T hours is a Bernoulli trial, and the number of arrivals is a binomial random variable with a very large n and a very small p. In the limit of finer and finer time slices, this binomial random variable becomes a Poisson random variable.
P_X(x) = \begin{cases} \dfrac{e^{-\alpha} \alpha^{x}}{x!}, & x = 0, 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}
Thus, Poisson random variables are defined only for non-negative integer values of x. Here, α = λ × T, where T is the length of the time interval we are observing. For the example above, with T = 2 hours, α = 4.5 × 2 = 9.
Example
Say the average number of misdialled calls per unit time is λ = 4. We want to find the probability of getting at least 2 misdialled calls over T = 2 units of time.
α = 4 × 2 = 8
P[X ≥ 2] = 1 − P[X < 2]
         = 1 − P_X(0) − P_X(1)
         = 1 − \dfrac{e^{-8} \cdot 8^{0}}{0!} − \dfrac{e^{-8} \cdot 8^{1}}{1!}
         = 1 − e^{-8} − 8 e^{-8}
         = 1 − 9 e^{-8}
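A quick numeric check of this example using the Poisson PMF (function names are mine):

from math import exp, factorial

def poisson_pmf(x, alpha):
    """P[X = x] = e^(-alpha) * alpha^x / x! for x = 0, 1, 2, ..."""
    return exp(-alpha) * alpha**x / factorial(x) if x >= 0 else 0.0

alpha = 4 * 2                                   # lambda = 4, T = 2
p_at_least_2 = 1 - poisson_pmf(0, alpha) - poisson_pmf(1, alpha)
print(round(p_at_least_2, 6), round(1 - 9 * exp(-8), 6))   # both ≈ 0.996981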
Truncated Geometric Distribution
When we calculated geometric random variables, we did not place an upper limit on
X. Thus, S_X = {1, 2, 3, …}. For example, if we are sending data packets and we do not get a message telling us a particular packet was delivered, we assume there was a failure and we send the packet again.
However, how many times do we keep doing this? What if the device we are trying to send the packet to is offline? If we just keep sending the packet, it will be a huge waste of network resources. Thus, we need to put a limit on the number of attempts. This limit is denoted by R.
For the first attempt, P[D] = p and P[F] = 1 − p. For the second attempt, P[D] = p(1 − p) and P[F] = (1 − p)²; here P[D] means the packet is first delivered on that attempt, and P[F] means every attempt so far has failed. For the third attempt, P[D] = p(1 − p)² and P[F] = (1 − p)³. If we truncate the geometric distribution at R = 3, we cannot have any more attempts after this. Geometric distributions always end in a success, but here we have a situation where the process can end without a success ever occurring.
Remember that our goal originally was to find the probability that x attempts are needed, not the probability that the packet is sent successfully. Up to x = 2 we have no problems, but at x = 3 we will not try again, whatever the outcome. As such, the probability that 3 attempts are used is simply the probability that the first two attempts fail, (1 − p)².
P_X(x) = \begin{cases} p (1 - p)^{x-1}, & x = 1, 2, \ldots, R-1 \\ (1 - p)^{R-1}, & x = R \\ 0, & \text{otherwise} \end{cases}
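A minimal sketch of the truncated geometric PMF, verifying that the probabilities sum to 1 (names and example values are mine):

def trunc_geom_pmf(x, p, R):
    """p*(1-p)^(x-1) for x = 1..R-1, (1-p)^(R-1) for x = R, else 0."""
    if 1 <= x <= R - 1:
        return p * (1 - p) ** (x - 1)
    if x == R:
        return (1 - p) ** (R - 1)
    return 0.0

p, R = 0.4, 3
print(round(sum(trunc_geom_pmf(x, p, R) for x in range(1, R + 1)), 10))   # 1.0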
Expected Values of Well-Known Discrete Random Variables
E[X] = \sum_{x \in S_X} x \cdot P_X(x)
Var[X] = \sum_{x \in S_X} (x - E[X])^{2} \cdot P_X(x)
For Bernoulli random variables,
P_X(x) = \begin{cases} 1 - p, & x = 0 \\ p, & x = 1 \\ 0, & \text{otherwise} \end{cases}
so E[X] = 0 · (1 − p) + 1 · p = p.
For binomial random variables, E[X] = np: a binomial random variable counts the successes in n independent Bernoulli repetitions, and each repetition contributes p to the expected count.
Similarly, for geometric random variables: E[X] = 1/p
For negative binomial (Pascal) random variables: E[X] = k/p
For Poisson random variables: E[X] = α (which equals λ when T = 1)
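As a closing sketch, the geometric random variable's expected value E[X] = 1/p can be checked numerically by summing x · P_X(x) over a long range of x (the names and the chosen p are mine):

# Numerically check E[X] = 1/p for a geometric random variable
# by summing x * P_X(x) over a long (truncated) range of x.
p = 0.2
approx_mean = sum(x * p * (1 - p) ** (x - 1) for x in range(1, 2000))
print(round(approx_mean, 6), 1 / p)     # ≈ 5.0  5.0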