MSBD 5001
Foundations of Data Analytics
Fall 2025
Topic 2: Basic Statistical Analysis
Cecia Chan
Department of Computer Science and Engineering
The Hong Kong University of
Science and Technology
MSBD5001 Fall 2025 1
Basic Statistical Analysis
The lecture notes are prepared based on various sources on the Internet.
Statistical Analysis
• Statistical analysis is the science of collecting, exploring and presenting large
amounts of data (a.k.a. dataset) to discover underlying patterns and trends.
• It is used extensively in science, from physics to social sciences.
• The following are the major tasks in statistical analysis:
• Describing and summarizing the data
• Identifying the relationship between variables
• Forecasting the outcomes
Why is statistical analysis important?
Describe the data
Q: What will my salary be when I graduate?
A: The average salary of our graduates is $80,000.
Why is statistical analysis important?
Decide the proper method: new way or old way?
Part 1
Describing and
Summarizing the Data
Data Sets
• Data sets can be thought of as a collection of numbers or a list of things.
• Examples:
• Suppose we ask twenty students their weights and then record them as:
122 146 65 162 148 155 136 151 151 153
201 156 235 157 160 171 178 197 142 131
• This is a data set of 20 observations.
• Note: The number of items in a sample is called the sample size, denoted as n.
• Suppose we ask the students their hair color and get the responses:
Red Blond Blond Brown Brown Red Blond Blond Brown Black
Blond Red Red Brown Black Brown Red Black Brown Blond
• Data come in two types:
• Discrete, taking separate values such as categories (Example: Hair color data set)
• Continuous (Example: Weight data set)
Describing and Summarizing Data
• There are many ways to describe and summarize our data. We discuss a few
below.
1. Table
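A frequency table simply counts how often each value occurs. As a minimal sketch, the following Python counts the hair color data from the earlier slide with the standard-library `collections.Counter` and adds relative frequencies (frequency divided by the sample size):

```python
from collections import Counter

# Hair color responses of the twenty students (data set from the earlier slide)
hair = ("Red Blond Blond Brown Brown Red Blond Blond Brown Black "
        "Blond Red Red Brown Black Brown Red Black Brown Blond").split()

counts = Counter(hair)
n = len(hair)  # sample size

print(f"{'Color':<6} {'Freq':>4} {'Rel. freq':>9}")
for color, freq in counts.most_common():
    # Relative frequency = frequency / sample size
    print(f"{color:<6} {freq:>4} {freq / n:>9.2f}")
```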
Describing and Summarizing Data
2. Bar chart 3. Pie chart
Describing and Summarizing Data
4. Stem-and-leaf plot
• Assume we have the maximum ozone readings (in parts per billion (ppb)) taken on 80 summer days in a large city.
• A stem-and-leaf plot can be constructed using
• the first digit of the two-digit numbers and the first two digits of the three-digit numbers as the stem number, and
• the remaining (last) digit as the leaf number.
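The ozone readings themselves are not listed in these notes, so this sketch applies the same stem/leaf rule to the student weight data instead. It assumes non-negative integers, so that `divmod(value, 10)` splits each value into stem (all but the last digit) and leaf (the last digit). Empty stems are printed as well, so that gaps in the data stay visible.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (value // 10) to its sorted list of leaves (value % 10)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

# Student weights from the earlier data set slide
weights = [122, 146, 65, 162, 148, 155, 136, 151, 151, 153,
           201, 156, 235, 157, 160, 171, 178, 197, 142, 131]

plot = stem_and_leaf(weights)
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>3} | {leaves}")
```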
Advantage of using Stem-and-leaf Plot
• The plot can be constructed quickly using pencil and paper.
• The values of each individual data point can be recovered from the plot.
• The data is arranged compactly since the stem is not repeated in multiple data
points.
Describing and Summarizing Data
• Apart from describing data graphically, data can also be described using numerical measures.
• The following are common numerical descriptive measures of data.
• Describing central tendency: Mean (arithmetic mean), Min, Max, Median, Mode
• Describing variability: Range, pth percentile, Interquartile range (IQR), Variance, Standard deviation
• Before elaborating on the numerical descriptive measures above, we will first define a few basic concepts of statistics.
• They are population, sample, and sampling error.
Population, Sample and Sampling Error
• Population: The collection of all individuals or items under consideration in a statistical study.
• Sample: Part of the population from which information is collected.
• Sampling error: Reflects the fact that the result we get from our sample is not going to be exactly
equal to the result we would have got if we had been able to measure the entire population.
Population: 1 2 3 4 5 6 7 8 9 10 11 12
Sample (every 3rd): 2 5 8 11
A sampling method is a procedure for selecting sample elements from a population.
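The short Python sketch below applies two sampling methods to the 12-element population above and reports the sampling error of the mean for each, taken as the sample mean minus the population mean. The systematic sample is the "every 3rd" one from the slide. The random sample uses `random.sample` with an arbitrary seed (5001, chosen only for reproducibility), so its exact error depends on that seed.

```python
import random
import statistics

population = list(range(1, 13))          # population {1, 2, ..., 12}
systematic = population[1::3]            # every 3rd element starting at 2 -> [2, 5, 8, 11]

random.seed(5001)                        # arbitrary seed so the run is reproducible
simple_random = random.sample(population, k=4)

pop_mean = statistics.mean(population)   # 6.5
for name, sample in [("systematic", systematic), ("random", simple_random)]:
    # Sampling error of the mean: sample mean minus population mean
    error = statistics.mean(sample) - pop_mean
    print(f"{name:>10}: {sample}  sampling error = {error:+.2f}")
```

This systematic sample happens to have zero sampling error for the mean; a different sample, or a different statistic, generally would not.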
Example
• A school takes a poll to find out what students want to eat at lunch.
• 70 students are randomly chosen to answer the poll questions.
• What are the population and the sample of this study?
Answer
• Population: All students at the school.
• Sample: The 70 students polled.
Describing and Summarizing Data
• Suppose we have a sample of size n, denoted as $x_1, x_2, \dots, x_i, \dots, x_n$. The following are the formal definitions of all the descriptive measures mentioned earlier.
• Mean of a sample: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$
• Minimum of a sample: Minimum of $\{x_1, x_2, \dots, x_i, \dots, x_n\}$
• Maximum of a sample: Maximum of $\{x_1, x_2, \dots, x_i, \dots, x_n\}$
• Median of a sample:
The middle number of the sorted list of $\{x_1, x_2, \dots, x_i, \dots, x_n\}$.
If n is even, the median is the simple average of the middle two numbers.
• Mode of a sample: The value that appears most often in $\{x_1, x_2, \dots, x_i, \dots, x_n\}$
• Range of a sample: Maximum − Minimum
Describing and Summarizing Data
• pth percentile of a sample: The value so that roughly p% of the sample are smaller and
(100 - p)% of the sample are larger
• Interquartile range (IQR) of a sample: Third quartile - First quartile
• First quartile: Median of the first half of the data
• Third quartile: Median of the second half of the data
• Variance of a sample: $\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{X})^2$
• Standard deviation of a sample: $\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{X})^2}$
Practice
• Suppose we ask twenty students their weights and record them as:
65 122 131 136 142 146 148 151 151 153
155 156 157 160 162 171 178 197 201 235
• Mean of a sample:
• Minimum of a sample:
• Maximum of a sample:
• Median of a sample:
• Mode of a sample:
• Range of a sample:
• Interquartile range (IQR) of a sample:
• Third quartile:
• First quartile:
• Variance of a sample:
• Standard deviation of a sample:
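One way to check your answers to the practice exercise is the Python sketch below, which uses the standard-library `statistics` module. The quartiles are computed by hand as medians of the lower and upper halves, following the definition given earlier. For odd n, leaving the middle value out of both halves is an assumption of this sketch. `statistics.quantiles` interpolates differently and can give slightly different quartiles. `statistics.variance` and `statistics.stdev` use the n − 1 divisor from the sample definitions above.

```python
import statistics

# Weights of the twenty students (sorted), from the practice slide
weights = [65, 122, 131, 136, 142, 146, 148, 151, 151, 153,
           155, 156, 157, 160, 162, 171, 178, 197, 201, 235]

def quartiles_by_halves(values):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data.

    Assumption: when n is odd, the middle value is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + len(data) % 2:]
    return statistics.median(lower), statistics.median(upper)

q1, q3 = quartiles_by_halves(weights)
summary = {
    "mean": statistics.mean(weights),
    "min": min(weights),
    "max": max(weights),
    "median": statistics.median(weights),
    "mode": statistics.mode(weights),
    "range": max(weights) - min(weights),
    "Q1": q1,
    "Q3": q3,
    "IQR": q3 - q1,
    "variance": statistics.variance(weights),  # sample variance, divides by n - 1
    "std dev": statistics.stdev(weights),      # square root of the sample variance
}
for name, value in summary.items():
    print(f"{name:>8}: {value:g}")
```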
More about Graphic Displays of Basic
Statistical Descriptions
• Boxplot
• graphic display of five-number summary
• Histogram
• x-axis represents values, y-axis represents frequencies
• Quantile plot
• each value xi is paired with fi indicating that approximately 100 fi % of data are ≤ xi
• Quantile-quantile (q-q) plot
• graphs the quantiles of one univariate distribution against the corresponding quantiles of another
• Scatter plot
• each pair of values is a pair of coordinates and plotted as points in the plane
Measuring the Dispersion of Data:
Quartiles & Boxplots
• Quartiles: Q1 (25th percentile), Q3 (75th percentile)
• Inter-quartile range: IQR = Q3 – Q1
• Five number summary: min, Q1, median, Q3, max
• Boxplot: Data is represented with a box
• Q1, Q3, IQR: The ends of the box are at the first and third quartiles, i.e., the height of
the box is IQR
• Median (Q2) is marked by a line within the box
• Whiskers: two lines outside the box extended to the Minimum and Maximum of the values that are not outliers
• Outliers: points beyond a specified outlier threshold, plotted individually
• Outlier: usually, a value more than 1.5 × IQR above Q3 or more than 1.5 × IQR below Q1
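The sketch below applies the 1.5 × IQR rule to the student weight data. It computes the five-number summary (quartiles taken as medians of the lower and upper halves), the two outlier thresholds or "fences", the points that would be plotted individually, and the values the whiskers reach.

```python
import statistics

weights = [65, 122, 131, 136, 142, 146, 148, 151, 151, 153,
           155, 156, 157, 160, 162, 171, 178, 197, 201, 235]

data = sorted(weights)
half = len(data) // 2
q1 = statistics.median(data[:half])                  # median of the lower half
q3 = statistics.median(data[half + len(data) % 2:])  # median of the upper half
iqr = q3 - q1

# 1.5 x IQR rule: outliers lie below Q1 - 1.5*IQR or above Q3 + 1.5*IQR
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]
inside = [x for x in data if lower_fence <= x <= upper_fence]

print("five-number summary:", data[0], q1, statistics.median(data), q3, data[-1])
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
print("whiskers reach:", inside[0], inside[-1])
```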
Histogram Analysis
• Histogram: Graph display of tabulated frequencies, shown as bars
• Differences between histograms and bar charts
• Histograms are used to show distributions of variables while bar charts are used to compare variables
• Histograms plot binned quantitative data while bar charts plot categorical data
• Bars can be reordered in bar charts but not in histograms
• In a histogram, it is the area of the bar that denotes the value, not the height as in bar charts; this is a crucial distinction when the bins are not of uniform width
[Figure: a histogram (x-axis from 10000 to 90000, frequencies from 0 to 40) shown next to a bar chart]
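Binning is what turns raw quantitative data into a histogram. This minimal sketch counts the student weights in equal-width bins; the start value 60 and width 20 are arbitrary illustrative choices. It also rescales the counts to densities, count / (n × width), so that the bar areas, not the heights, sum to 1.

```python
def histogram(values, start, width, n_bins):
    """Count the values falling in each bin [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_bins:            # values outside the binned range are skipped
            counts[k] += 1
    return counts

weights = [122, 146, 65, 162, 148, 155, 136, 151, 151, 153,
           201, 156, 235, 157, 160, 171, 178, 197, 142, 131]

start, width, n_bins = 60, 20, 9       # bins [60, 80), [80, 100), ..., [220, 240)
counts = histogram(weights, start, width, n_bins)

# Density = count / (n * width): with this scaling the bar AREAS sum to 1,
# which keeps bars comparable even if the bin widths were unequal
density = [c / (len(weights) * width) for c in counts]

for k, c in enumerate(counts):
    lo = start + k * width
    print(f"[{lo:>3}, {lo + width:>3})  {'#' * c}")
```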
Histograms Often Tell More than Boxplots
• The two histograms shown on the
left may have the same boxplot
representation
• The same values for:
• min, Q1, median, Q3, max
• But they have rather different data
distributions
Quantile Plot
• Displays all of the data (allowing the user to assess both the overall behavior and
unusual occurrences)
• Plots quantile information
• Let xi for i = 1 to N, be the data sorted in increasing order.
• xi is paired with fi, which indicates that approximately 100 fi% of the data are
below or equal to the value xi
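The pairing of each xi with fi can be computed directly. This sketch assumes the common convention fi = (i − 0.5)/N, which the slide does not state explicitly. It uses the student weight data; plotting fi on the horizontal axis against xi on the vertical axis gives the quantile plot.

```python
weights = [122, 146, 65, 162, 148, 155, 136, 151, 151, 153,
           201, 156, 235, 157, 160, 171, 178, 197, 142, 131]

data = sorted(weights)          # x_1 <= x_2 <= ... <= x_N
N = len(data)

# Assumed convention: f_i = (i - 0.5) / N, so about 100*f_i % of the data are <= x_i
pairs = [((i - 0.5) / N, x) for i, x in enumerate(data, start=1)]

for f, x in pairs:
    print(f"{f:.3f}  {x}")
```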
Quantile-Quantile (Q-Q) Plot
• Graphs the quantiles of one univariate distribution against the corresponding
quantiles of another
• View: Is there a shift in going from one distribution to another?
• Example shows unit price of items sold at Branch 1 vs. Branch 2 for each quantile.
Unit prices of items sold at Branch 1 tend to be lower than those at Branch 2
Scatterplot
• It uses Cartesian coordinates to display values for two variables for a set of data.
• The following shows the height and weight of 57 baseball players.
• (height, weight)
Things to think about when looking at scatterplots.
Form: Does it have a shape?
Direction: Does the data have a direction?
Strength: Are the points close together or
scattered?
Part II
Identifying Relationship
between Variables
Relationship between Variables
• We often collect data from several different variables on a subject.
• A simple example is a form, such as an application form, which is collected from
a group of people.
• Each item on the form corresponds to a variable.
• Example: Suppose the form is one that students fill out at a university. Items might include
the GPA, major, weight, height, gender, etc.
• We may describe each variable separately using the descriptive statistics, but
often we also want to investigate the relationship between the variables.
• Example: weight and height of students, denoted as Y and X, respectively.
Relationship between Variables
• Plot data with Y (weight) on the vertical axis and X (height) on the horizontal axis of a scatterplot.
Observations
• On the basis of the plot, a linear model is certainly worthy of a first try.
Relationship between Variables
• Linear model
𝑌 = 𝑎 + 𝑏𝑋 + 𝑒
where a is the y-intercept, b is the slope, and e is the random error
(i.e., if there were no error Y would be a deterministic linear function of X).
• Note: X is called independent variable and Y is called dependent variable.
Relationship between Variables
1. Eyeball Fit – Pick two points on the plot so that the line passing through them gives a “fairly” good fit.
• To estimate the slope, take two points, say (X1, Y1) and (X2, Y2), then
$b = \frac{Y_2 - Y_1}{X_2 - X_1}$
For the student data, we chose the points (69, 160) and (78, 225).
Hence the estimate of slope is
$b = \frac{225 - 160}{78 - 69} = \frac{65}{9} \approx 7.2$
• To estimate the y-intercept, simply take one of the points, say, (X1, Y1), then estimate the intercept by solving the linear equation for a, i.e., $\hat{a} = Y_1 - bX_1$
For the student data, we chose the point (69, 160),
$\hat{a} = 160 - 7.2(69) = -336.8$
• Thus, the predicted equation is
Y = −336.8 + 7.2X
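The two-point calculation above is easy to script. In this sketch, `eyeball_fit` and its `slope_decimals` option are hypothetical names introduced for illustration. The rounding option reproduces the slide's step of rounding 65/9 to 7.2 before solving for the intercept; without it, the intercept would come out slightly different.

```python
def eyeball_fit(p1, p2, slope_decimals=None):
    """Line through two hand-picked points; returns (intercept a, slope b)."""
    (x1, y1), (x2, y2) = p1, p2
    b = (y2 - y1) / (x2 - x1)
    if slope_decimals is not None:
        b = round(b, slope_decimals)   # the slide rounds 65/9 to 7.2 before finding a
    a = y1 - b * x1                    # solve Y1 = a + b*X1 for a
    return a, b

a, b = eyeball_fit((69, 160), (78, 225), slope_decimals=1)
print(f"Y = {a:.1f} + {b:.1f}X")       # Y = -336.8 + 7.2X
```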
Eyeball Fit
Relationship between Variables
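2. Least Squares Fit
• Fit a line 𝑌 = 𝑎 + 𝑏𝑋 such that it minimizes the error $S = \sum_{i=1}^{n}(y_i - a - b x_i)^2$, the sum of squared vertical distances from the points to the line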
Relationship between Variables
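• Least Squares Fit (Cont'd)
• Alternatively, a and b can be found using the following:
$b = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} x_i^2 - n\bar{X}^2}, \qquad a = \bar{Y} - b\bar{X}$
Can you prove the above? Hint: $n\bar{X} = \sum_{i=1}^{n} x_i$ and $n\bar{Y} = \sum_{i=1}^{n} y_i$
• For the student data, we got
b = 5.530918
a = −215.861
Y = −215.861 + 5.530918X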
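The closed-form least-squares line can be computed in a few lines of Python. This sketch uses the mean-deviation form b = Σ(xi − X̄)(yi − Ȳ) / Σ(xi − X̄)², a = Ȳ − bX̄. That form is algebraically equivalent to the Σxiyi − nX̄Ȳ form, because Σ(xi − X̄)(yi − Ȳ) = Σxiyi − nX̄Ȳ. The student height/weight data are not listed in the notes, so the points below are hypothetical toy data. The last line checks that nudging either coefficient increases S.

```python
def least_squares(xs, ys):
    """Closed-form line Y = a + bX minimizing S = sum of (y_i - a - b*x_i)^2."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def sse(xs, ys, a, b):
    """The error S for a candidate line Y = a + bX."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))

# Hypothetical toy data (the student data are not listed in the notes)
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 4]
a, b = least_squares(xs, ys)
print(f"Y = {a:.2f} + {b:.2f}X")   # Y = 2.00 + 0.70X
# Moving away from the fitted coefficients makes S larger
print(sse(xs, ys, a, b), sse(xs, ys, a + 0.1, b), sse(xs, ys, a, b + 0.1))
```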
Least Squares Fit
Correlation Coefficient
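• Correlation coefficient, denoted as r, measures the degree to which two variables' movements are associated:
$r = \frac{\sum_{i=1}^{n}(x_i - \bar{X})(y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{X})^2 \sum_{i=1}^{n}(y_i - \bar{Y})^2}}$
• where −1 ≤ r ≤ 1.
• r = 1 means a perfect positive linear relationship,
i.e., all points lie exactly on a straight line with positive slope, so whenever one variable increases, the other increases by a proportional amount.
• r = −1 means a perfect negative linear relationship,
i.e., all points lie exactly on a straight line with negative slope, so whenever one variable increases, the other decreases by a proportional amount.
• r close to zero indicates little or no linear relationship,
i.e., an increase in one variable is not associated with a consistent linear increase or decrease in the other.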
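The sketch below computes the Pearson correlation coefficient r = Σ(xi − X̄)(yi − Ȳ) / √(Σ(xi − X̄)² Σ(yi − Ȳ)²). It applies r to three small hypothetical data sets: an imperfect positive trend, points exactly on an increasing line, and points exactly on a decreasing line.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired samples xs and ys."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4]                      # hypothetical toy data
print(pearson_r(xs, [2, 4, 5, 4]))     # about 0.718: positive but imperfect
print(pearson_r(xs, [5, 7, 9, 11]))    # exactly on an increasing line: 1.0
print(pearson_r(xs, [8, 6, 4, 2]))     # exactly on a decreasing line: -1.0
```

Note that r = 1 for y = 5, 7, 9, 11 even though y rises by 2 per unit of x. A perfect correlation needs an exact linear relationship, not a slope of 1.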
Correlation Coefficient
• For the student data, r = 0.704583.
Part III
Forecasting Outcomes
Experiment, Sample Space, Event
• An experiment is an action where the result is uncertain.
• A sample space is all the possible outcomes of an experiment, denoted as S.
Examples:
1. Flip a coin: S = {H, T}
2. Roll a six-sided die: S = {1, 2, 3, 4, 5, 6}
3. Roll a pair of six-sided dice: S = {(1, 1), (1, 2), (1, 3), …, (6, 6)}.
S consists of 36 pairs of integers.
• An event is a subset of S, denoted by A, B, C, etc.
Examples:
1. Flip a coin: A = {H}
2. Roll a six-sided die: B = {1, 2}
3. Roll a pair of six-sided dice: A = {sum of up-faces is 7 or 11}
Probabilities
• Probability is the measure of how likely an event is to occur out of the number of
possible outcomes.
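• In other words, when all outcomes are equally likely, it is a ratio comparing how many outcomes make up the event with the number of all possible outcomes, i.e.,
$P(A) = \frac{\text{number of outcomes in } A}{\text{total number of possible outcomes}}$
• The following are some facts about probability.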
1. The probability of an event A is a number between 0 and 1.
2. The probability of the sample space is 1.
3. If two events cannot occur at the same time, the probability that one or the
other occurs is the sum of the probabilities of the individual events.
• We denote the probability of event A by P(A).
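Assuming equally likely outcomes (fair dice), the counting rule can be checked by listing the whole sample space. This Python sketch enumerates the 36 outcomes for a pair of dice and counts those in the event A = {sum of up-faces is 7 or 11} from the earlier examples.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling a pair of fair six-sided dice: 36 equally likely ordered pairs
S = list(product(range(1, 7), repeat=2))
# Event A: the sum of the up-faces is 7 or 11
A = [(d1, d2) for d1, d2 in S if d1 + d2 in (7, 11)]

p_A = Fraction(len(A), len(S))   # favorable outcomes / all possible outcomes
print(len(S), len(A), p_A)       # 36 8 2/9
```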
Examples
• Suppose we roll a six-sided die
• The sample space is S = {1, 2, 3, 4, 5, 6}.
• Let A be the event A = {1, 2}.
• The following are different probabilities on S and
the resulting probability of A = {1, 2}.
• p1 = p2 = p3 = p4 = p5 = p6 = 1/6,
P(A) = 2/6 = 1/3
• p1 = p2 = 0.25, p3 = p4 = 0.15, p5 = p6 = 0.1,
P(A) = 0.5
Determination of Probabilities: Tree
Diagrams
• Calculating probabilities can be hard. Sometimes we add them, sometimes we
multiply them, and often it is hard to figure out what to do.
• To remedy this, we often construct a tree diagram.
• Here is a tree diagram for the toss of a coin:
• The probability of each branch is written on the branch.
• The outcome is written at the end of the branch.
Determination of Probabilities: Tree
Diagrams
• We can extend the tree diagram to two tosses of a coin:
Determination of Probabilities: Tree Diagrams
• How to calculate the overall probabilities?
• Multiply probabilities along the branches.
• Add probabilities down columns.
• Results:
• The probability of “Head, Head” is 0.5 x 0.5 = 0.25.
• All probabilities add to 1.0.
• The probability of getting at least one Head from two tosses is 0.25 + 0.25 + 0.25 = 0.75.
Example
Problem:
• Suppose we have an urn with 30 blue balls and 50 red balls in it and that these balls are identical
except for color.
• Suppose further the balls are well mixed and that we draw 3 balls, without replacement.
• Determine the probability that the balls are all of the same color.
Solution:
• Step 1: Trace a branch up for blue, putting the probability of the first ball being blue, 30/80, on it and a “B” at the end. Likewise, trace a branch down for red with 50/80 on it and an “R” at the end.
• Step 2: The second ball is either blue or red, i.e., at the “B”, draw one branch up for second ball blue with the probability of 29/79 on it, and end it with a “B”. Next draw one branch down for second ball red with the probability 50/79 and end it with an “R”. Likewise, at the first “R”, draw a branch up for blue with 30/79 and a branch down for red with 49/79.
• Step 3: The third ball can be blue or red, so there will be two branches at the end of each of the four second-step branches.
Determination of Probabilities: Tree
Diagrams
Determination of Probabilities: Tree
Diagrams
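• The probability of three blue balls is $\frac{30}{80} \times \frac{29}{79} \times \frac{28}{78} \approx 0.0494$
• The probability of three red balls is $\frac{50}{80} \times \frac{49}{79} \times \frac{48}{78} \approx 0.2386$
• Finally, the probability that the balls are all the same color is the probability of either 3 reds or 3 blues: 0.0494 + 0.2386 = 0.2880.
• (Note: These events cannot happen at the same time.)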
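The urn calculation can be reproduced exactly with `fractions.Fraction`. Multiplying along a branch of the tree means multiplying conditional probabilities, each with one fewer ball of that color and one fewer ball overall. The two "all one color" events are mutually exclusive, so their probabilities add. `prob_all_same` is a hypothetical helper name used only for this sketch.

```python
from fractions import Fraction

def prob_all_same(color_count, other_count, draws):
    """P(all `draws` balls are of the first color), drawing without replacement."""
    total = color_count + other_count
    p = Fraction(1)
    for k in range(draws):
        # Multiply along the branch: one fewer ball of this color and overall each time
        p *= Fraction(color_count - k, total - k)
    return p

p_blue = prob_all_same(30, 50, 3)   # (30/80)(29/79)(28/78)
p_red = prob_all_same(50, 30, 3)    # (50/80)(49/79)(48/78)
p_same = p_blue + p_red             # mutually exclusive events, so add
print(f"{float(p_blue):.4f} {float(p_red):.4f} {float(p_same):.4f}")   # 0.0494 0.2386 0.2880
```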
Independence
• Referring to the final tree diagram of the last example, the probabilities on the
branches are called conditional probabilities.
• Example
• Let B2 denote the event that the second ball is blue and
• Let B1 denote the event that the first ball is blue
• Then the probability on the upward branch that follows the first “B” is the probability that B2 occurs given that B1 has occurred, i.e., 29/79.
This is called conditional probability of B2 given B1 and we will denote it by
P(B2 | B1). The bar is pronounced “given”.
• In general, for two events A and B, if P(B|A) = P(B),
i.e., knowledge of A did not change the prediction of B,
then we say that A and B are independent events.
Independence
Question
What is P(B2)? Look at all the end nodes for which the second ball is
blue in the final tree diagram.
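• Answer: $P(B_2) = \frac{30}{80} \times \frac{29}{79} + \frac{50}{80} \times \frac{30}{79} = \frac{2370}{6320} = \frac{30}{80} = 0.375$
Observation
P(B2 | B1) = 29/79 ≠ P(B2) = 30/80, which means B1 and B2 are not independent events. In fact, they are dependent events.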
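As a brute-force check of the comparison above, this sketch lists every ordered (first, second) draw of two distinct balls from the urn, 80 × 79 equally likely pairs. It then counts the pairs to get P(B2) and P(B2 | B1). The numbers differ, which confirms that the two draws are dependent.

```python
from fractions import Fraction
from itertools import permutations

balls = ["B"] * 30 + ["R"] * 50
# All ordered (first, second) draws without replacement: 80 * 79 equally likely pairs
pairs = list(permutations(range(len(balls)), 2))

b1 = [(i, j) for i, j in pairs if balls[i] == "B"]          # first ball blue
b1_and_b2 = [(i, j) for i, j in b1 if balls[j] == "B"]      # first and second blue
b2 = [(i, j) for i, j in pairs if balls[j] == "B"]          # second ball blue

p_b2 = Fraction(len(b2), len(pairs))
p_b2_given_b1 = Fraction(len(b1_and_b2), len(b1))
print(p_b2, p_b2_given_b1)    # 3/8 29/79
```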
Conditional Probability
• Let A and B be arbitrary events and we want to determine P(B|A).
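• Formula of conditional probability: $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$, provided P(A) > 0
Assume we repeat the experiment many, many times.
How to compute P(B|A)?
Count how many times A occurs and, among those times, how many times B has also occurred. The ratio of the second count to the first approximates P(B|A).
• If A and B are independent events, we get $P(A \cap B) = P(A)P(B)$.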
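The counting recipe can be applied exactly when the outcomes are equally likely. This sketch uses two fair dice, and the events are chosen only for illustration. Knowing that the sum is at least 10 changes the chance that the first die shows 6, so those events are dependent. The parity of the second die is unaffected by the first die, so those events are independent.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))       # two fair dice, 36 outcomes

def prob(event):
    """P(event) as (outcomes in the event) / (all outcomes)."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def cond_prob(event_b, event_a):
    """P(B | A): among outcomes where A occurs, the fraction where B also occurs."""
    a_outcomes = [s for s in S if event_a(s)]
    both = [s for s in a_outcomes if event_b(s)]
    return Fraction(len(both), len(a_outcomes))

def first_is_6(s):
    return s[0] == 6

def sum_at_least_10(s):
    return s[0] + s[1] >= 10

def second_is_odd(s):
    return s[1] % 2 == 1

# Dependent: knowing the sum is at least 10 changes the chance the first die shows 6
print(prob(first_is_6), cond_prob(first_is_6, sum_at_least_10))     # 1/6 1/2
# Independent: P(second odd | first is 6) equals P(second odd)
print(prob(second_is_odd), cond_prob(second_is_odd, first_is_6))    # 1/2 1/2
```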
Example
• Problem:
• A jet airplane has 3 engines which function independently of one another.
The probability that an engine fails in flight is 0.0001. Furthermore, the plane
can fly if at least one engine is functioning. Determine the probability that the
airplane has a successful flight.
• Solution:
• The event we want to consider is A = at least one engine operates throughout
the flight.
• Consider the complement of A, A^c, which is the event that all three engines fail.
• Let B1 be the event that engine one fails, B2 be the event that engine two fails,
and B3 be the event that engine three fails. Hence, A^c is the event that B1, B2 and B3 all
occur. Thus
$P(A^c) = P(B_1 \cap B_2 \cap B_3)$
Example (Cont’d)
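• Solution (Cont'd):
• As the engines function independently of one another, B1, B2 and B3 are
independent events. So,
$P(A^c) = P(B_1)P(B_2)P(B_3) = (0.0001)^3 = 10^{-12}$
• Therefore,
$P(A) = 1 - P(A^c) = 1 - 10^{-12}$
• Hence, the probability of a successful flight is 0.999999999999.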
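The complement-plus-independence argument takes only a few lines of Python. `math.prod` (Python 3.8+) multiplies the independent engine-failure probabilities. With floating point, the printed values may show tiny rounding noise around 1e-12.

```python
import math

p_fail = 0.0001                       # probability that a single engine fails in flight
n_engines = 3

# Independence: P(all engines fail) = product of the individual failure probabilities
p_all_fail = math.prod([p_fail] * n_engines)
p_success = 1 - p_all_fail            # complement: at least one engine keeps working
print(p_all_fail, p_success)
```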
Random Variables
• In many problems there are only a few events of interest and, furthermore, they
can often be characterized in terms of a variable.
• In statistics, a random variable, usually written X, is a variable whose possible
values are numerical outcomes of an experiment.
• Example:
• Rolling a pair of dice, the events of interest are: the sum of up-faces is 2, or 3, or 4, ..., or 12.
Hence, there are only 11 events of interest.
• If we let X = the sum of the up-faces, then the events of interest can be expressed as: X = 2, X
= 3, ..., or X = 12. Hence X characterizes the events of interest. We call X a random variable.
• Random variables come in two types
• Discrete random variables
• Continuous random variables
Discrete Probability Models
• Using the example on the previous page, assume that the dice are fair.
• P(X = 3) means the probability that a (1, 2) or a (2, 1) comes up, which is 1/36 + 1/36 = 2/36.
• Using the same reasoning for the other values of X, we obtain the probability model for X:
x:     2     3     4     5     6     7     8     9     10    11    12
p(x):  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
• For a discrete random variable X, p(x) denotes the probability that X assumes the value x; p is called the probability mass function (distribution).
• For example, p(7) = 6/36.
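The probability mass function of X = sum of two fair dice can be built by tallying each sum over the 36 equally likely outcomes. `Fraction` prints values in lowest terms, so 6/36 appears as 1/6.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# X = sum of the up-faces of two fair dice; tally each sum over the 36 outcomes
tally = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))
pmf = {x: Fraction(tally[x], 36) for x in sorted(tally)}

for x, p in pmf.items():
    print(x, p)
```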
Parameters
• A sample can be generated by a probability model, where parameters are characteristics of
the model.
• Suppose 100 observations are drawn from the probability model of a number spinner
with values from 1 to 3.
22112212111111112313
13112233212323312121
32133122213311111111
12132212223232333311
11221132321132133112
• The frequency and relative frequency of each number are shown in the table below.
• This sample distribution is an estimate of the probability model for X,
i.e. 0.43 is our estimate of p(1), 0.31 is our estimate of p(2), and 0.26 is our estimate of p(3).
Parameters – Mean, Expected Value, or
Expectation
• The mean, expected value, or expectation of a random variable X is written as
E(X) or µ.
• If we observe n random values of X, i.e., x1, x2, … , xn, then the mean of the n values
will be approximately equal to E(X) for large n, where E(X) is defined as follows:
$E(X) = \sum_{x} x f(x)$
• where f(x) is the probability function of X.
Parameters – Mean, Expected Value, or
Expectation
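• Referring to the spinner example, the sample mean $\bar{x} = 183/100 = 1.83$, which
can be calculated in one of the following ways:
$\bar{x} = \frac{1 \times 43 + 2 \times 31 + 3 \times 26}{100} = \frac{183}{100}$
or
$\bar{x} = 1 \times 0.43 + 2 \times 0.31 + 3 \times 0.26 = 1.83$
• From the last line, $\bar{x}$ is estimating $\mu = 1 \times p(1) + 2 \times p(2) + 3 \times p(3)$
• where µ is called the mean (or the parameter) of the probability model.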
Parameters – Variance
• Variance is another parameter of probability model.
• The variance of a random variable X is written as Var(X) or 𝜎 2 .
• It is a measure of how spread out the distribution is.
• Are the values of X clustered tightly around their mean?
• The variance measures the average squared distance of the values of X from their mean.
• Variance of X is
$\sigma^2 = Var(X) = E[(X - \mu)^2] = \sum_{x}(x - \mu)^2 f(x)$
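For a discrete random variable, E(X) = Σ x f(x) and Var(X) = Σ (x − µ)² f(x) are finite sums. This sketch plugs in the spinner's relative frequencies (0.43, 0.31, 0.26) as estimates of p(1), p(2), p(3). Because this weights by relative frequency, the variance it produces divides by n rather than by n − 1, so it is slightly smaller than the sample variance defined earlier.

```python
# Relative frequencies from the spinner sample, used as estimates of p(1), p(2), p(3)
pmf = {1: 0.43, 2: 0.31, 3: 0.26}

mean = sum(x * p for x, p in pmf.items())                # E(X) = sum of x * p(x)
var = sum((x - mean) ** 2 * p for x, p in pmf.items())   # Var(X) = sum of (x - mu)^2 * p(x)
print(round(mean, 4), round(var, 4))   # 1.83 0.6611
```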
Variance
• Variance of the spinner sample is calculated by
Binomial Probability Model
• A binomial model is characterized by trials (called Bernoulli trials) which either
end in success or failure.
• Suppose we have n Bernoulli trials and p is the probability of success on a trial.
Then this is a binomial model if
• The Bernoulli trials are independent of one another.
• The probability of success, p, remains the same from trial to trial.
• The binomial random variable, X, is the number of successes in the n trials.
• Over the n trials, there could be one success, two successes, etc., up to n successes.
• So, the range of X is the set {0, 1, 2, …, n}.
• The probability of observing x successes out of n trials is given by
$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$
where x = 0, 1, …, n.
• If the probabilities of X are distributed in this way, we write X ∼ bin(n, p).
Example
• Suppose we want the probability of getting 7 heads in ten flips of an unfair coin
for which the probability of getting a head is 2/3 and the probability of a tail is
1/3.
• In other words, X is bin(10, 2/3) and we want to compute P(X = 7).
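• One possible way of obtaining 7 heads is if we observe the pattern HHHHHHHTTT, and the
probability of obtaining this pattern is $(2/3)^7 (1/3)^3$.
• There are $\binom{10}{7} = 120$ patterns that contain 7 heads.
• So, P(X = 7) can be computed by
$P(X = 7) = \binom{10}{7} \left(\frac{2}{3}\right)^7 \left(\frac{1}{3}\right)^3 = 120 \times \frac{2^7}{3^{10}} \approx 0.2601$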
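The binomial formula can be evaluated exactly with `math.comb` (Python 3.8+) and `fractions.Fraction`. This sketch reproduces the 7-heads-in-10-flips calculation. The helper name `binom_pmf` is introduced here for illustration.

```python
from fractions import Fraction
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ bin(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

p_head = Fraction(2, 3)
one_pattern = p_head**7 * (1 - p_head) ** 3   # probability of one pattern, e.g. HHHHHHHTTT
print(comb(10, 7))                            # 120 patterns with 7 heads
print(binom_pmf(7, 10, p_head), float(binom_pmf(7, 10, p_head)))
```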
Mean and Standard Deviation of Binomial
Probability Model
• Suppose we have an unfair coin for which the probability of getting a head is 2/3,
and the probability of a tail is 1/3.
• Consider tossing the coin five times in a row and counting the number of times we
observe a head.
• We denote this number as X = No. of heads in 5-coin tosses, where 0 ≤ X ≤ 5.
• Consider the example of the Binomial distribution below
• The mean value of the distribution can be calculated as
$\mu = \sum_{x=0}^{5} x P(X = x) = \frac{10}{3} \approx 3.3333$
Mean and Standard Deviation of Binomial
Probability Model
• In general, the mean of a binomial distribution is µ = np, and its standard deviation is $\sigma = \sqrt{np(1 - p)}$.
• In the example above, X is bin(5, 2/3) and so the mean and standard deviation are
given by
$\mu = np = 5 \times \frac{2}{3} = 3.3333$
$\sigma = \sqrt{np(1 - p)} = \sqrt{5 \times \frac{2}{3} \times \left(1 - \frac{2}{3}\right)} = \sqrt{1.1111} \approx 1.0541$
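A quick numerical check of the shortcut formulas: this sketch computes the mean and standard deviation of bin(5, 2/3) directly from the full distribution, then compares them with np and √(np(1 − p)).

```python
import math

n, p = 5, 2 / 3
pmf = [math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

# Mean and standard deviation computed directly from the distribution...
mu = sum(x * px for x, px in enumerate(pmf))
sigma = math.sqrt(sum((x - mu) ** 2 * px for x, px in enumerate(pmf)))
# ...agree with the shortcut formulas np and sqrt(np(1 - p))
print(round(mu, 4), round(n * p, 4))                            # 3.3333 3.3333
print(round(sigma, 4), round(math.sqrt(n * p * (1 - p)), 4))    # 1.0541 1.0541
```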
Shape of Binomial Distribution
• Different values of n and p lead to different distributions with different shapes.
Observations
• In general, the probabilities of a binomial increase until approximately np and then decrease.
• The probability distribution will be symmetric if p = 1/2, skewed right if p < 1/2, and
skewed left if p > 1/2.
Poisson Probability Model
• A Poisson distribution is a discrete probability distribution for the counts of
events that occur randomly in a given interval of time (or space).
• Let X be the number of events in a given interval. If the mean number of
events per interval is λ, the probability of observing x events in a given interval is
given by
$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}$
• where x = 0, 1, 2, 3, 4, …, and e ≈ 2.718282.
• If the probabilities of X are distributed in this way, we write X ∼ Po(λ).
Example
• Births in a hospital occur randomly at an average rate of 1.8 births per hour. What
is the probability of observing 4 births in a given hour at the hospital?
• Let X = No. of births in a given hour. The probability of observing exactly 4 births in a given
hour can be calculated as
$P(X = 4) = \frac{e^{-1.8}\,1.8^4}{4!} \approx 0.0723$
• What about the probability of observing at least 2 births in a given
hour at the hospital?
• We want P(X ≥ 2), i.e.,
$P(X \ge 2) = 1 - P(X = 0) - P(X = 1) = 1 - e^{-1.8}(1 + 1.8) \approx 0.5372$
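The hospital example can be reproduced with the standard `math` module. P(X ≥ 2) is computed through its complement, 1 − P(X = 0) − P(X = 1), which avoids summing an infinite tail. The helper name `poisson_pmf` is introduced here for illustration.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam): e^(-lam) * lam^x / x!."""
    return math.exp(-lam) * lam**x / math.factorial(x)

lam = 1.8                                       # average births per hour
p4 = poisson_pmf(4, lam)
p_at_least_2 = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)   # complement of X <= 1
print(f"{p4:.4f} {p_at_least_2:.4f}")           # 0.0723 0.5372
```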
Mean and Standard Deviation of Poisson
Probability Model
• In general, the mean of a Poisson distribution is µ = λ, and its standard deviation is
$\sigma = \sqrt{\lambda}$.
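A Poisson distribution's mean and variance are both λ. The sketch below checks this numerically for λ = 1.8 by summing the probability mass function far enough into the tail that the leftover probability is negligible. Each term is built from the previous one using p(x) = p(x − 1) · λ / x. The cut-off at x = 59 is an arbitrary but ample choice for this λ.

```python
import math

lam = 1.8
# Build p(0), p(1), ..., p(59) with the recurrence p(x) = p(x - 1) * lam / x
pmf = [math.exp(-lam)]
for x in range(1, 60):
    pmf.append(pmf[-1] * lam / x)

mean = sum(x * p for x, p in enumerate(pmf))
var = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))
print(round(mean, 6), round(var, 6), round(math.sqrt(var), 6))   # 1.8 1.8 1.341641
```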
Shape of Poisson Distribution
Observations
• Unimodal
• Skewed right, becoming more symmetric as λ increases
• Centred roughly on λ
• The variance (spread)
increases as λ increases
Probability Models for Continuous Data
• So far, we have considered discrete data and discrete probability distributions.
• In practice, much of the data that we collect from experiments consists of continuous
measurements.
• So, we need to study probability models for continuous data.
• For continuous data, we do not have equally spaced discrete values, so instead we
use a curve or function that describes the probability density over the range of
the distribution.
• The curve is chosen so that the area under the curve is equal to 1.
• If we observe a sample of data from such a distribution, we should see that most
values occur in regions where the density is highest.
Expectation and Variance of Continuous
Random Variables
• The expectation is defined differently for continuous and discrete random
variables.
• Let X be a continuous random variable with probability density function fX(x).
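• The expected value of X is
$E(X) = \mu = \int_{-\infty}^{\infty} x f_X(x)\,dx$
• Similarly, variance is also defined differently:
$Var(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x)\,dx$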
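For a continuous random variable, the sums become integrals. The sketch below approximates E(X) = ∫ x f(x) dx and Var(X) = ∫ (x − µ)² f(x) dx with a midpoint-rule integrator. It uses a hypothetical density f(x) = 2x on [0, 1], chosen only because its exact answers are easy to check: E(X) = 2/3 and Var(X) = 1/18.

```python
def integrate(g, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (k + 0.5) * h) for k in range(steps))

def f(x):
    """Hypothetical density for illustration: 2x on [0, 1] (zero elsewhere)."""
    return 2 * x

total = integrate(f, 0, 1)                              # area under the density, should be 1
mu = integrate(lambda x: x * f(x), 0, 1)                # E(X) = integral of x f(x)
var = integrate(lambda x: (x - mu) ** 2 * f(x), 0, 1)   # Var(X) = integral of (x - mu)^2 f(x)
print(round(total, 6), round(mu, 6), round(var, 6))     # 1.0 0.666667 0.055556
```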
Normal Probability Model
(a.k.a Gaussian or Gauss or Laplace-Gauss
Distribution)
• There will be many possible probability density functions over a continuous range
of values.
• The normal distribution describes a special class of such distributions that are
symmetric and can be described by two parameters.
• µ = The mean of the distribution.
• 𝜎 = The standard deviation of the distribution.
• Changing the values of µ and 𝜎 alters the positions and shapes of the
distributions.
Normal Probability Model
(a.k.a Gaussian or Gauss or Laplace-Gauss
Distribution)
Normal Probability Model
(a.k.a Gaussian or Gauss or Laplace-Gauss Distribution)
• If X is normally distributed with mean µ and standard deviation 𝜎, we write X ∼ N(µ, σ²).
• The probability density function of normal distribution is given by
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
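The normal density formula can be written as a short Python function. This sketch checks three properties: the density is symmetric about µ, its peak height at µ is 1/(σ√(2π)), and the area under the curve is essentially 1. The parameters µ = 0, σ = 2 are an arbitrary illustrative choice, and the area is approximated over µ ± 8σ with the midpoint rule.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 0, 2   # arbitrary illustrative parameters
# Symmetric about mu: equal density one standard deviation either side
print(normal_pdf(mu - sigma, mu, sigma), normal_pdf(mu + sigma, mu, sigma))

# Midpoint-rule check that the area under the curve (mu +/- 8 sigma) is essentially 1
steps = 10_000
lo, hi = mu - 8 * sigma, mu + 8 * sigma
h = (hi - lo) / steps
area = h * sum(normal_pdf(lo + (k + 0.5) * h, mu, sigma) for k in range(steps))
print(round(area, 6))   # 1.0
```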
Standard Normal Distribution
• The standard normal distribution has a mean of zero and a variance of one.
• The following shows the graph of the standard normal distribution, which has probability
density function
$\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$
• If the behavior of a continuous random variable X is described by the distribution N(µ, σ²),
• then the behavior of the random variable $Z = \frac{X - \mu}{\sigma}$ is described by the standard
normal distribution N(0, 1).
• We call Z the standardized normal variable.
Example
1. If the random variable X is described by the distribution
N(45, 0.000625) then what is the transformation to obtain the standardized
normal variable?
• Given µ = 45, σ² = 0.000625, so that σ = 0.025,
hence Z = (X - 45)/0.025 is the required transformation.
2. When the random variable X takes value between 44.95 and 45.05, between
which values does the random variable Z lie?
• When X = 45.05, Z = (45.05 - 45) / 0.025 = 2.
• When X = 44.95, Z = (44.95 - 45) / 0.025 = -2.
• Hence Z lies between -2 and 2.
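The standardization step is a one-line transformation. This sketch applies Z = (X − µ)/σ to the two endpoints from the example above. The results are rounded because decimal values like 45.05 are not stored exactly in floating point.

```python
mu, sigma2 = 45, 0.000625
sigma = sigma2 ** 0.5                  # standard deviation = sqrt(variance) = 0.025

def standardize(x):
    """Z = (X - mu) / sigma maps N(mu, sigma^2) to N(0, 1)."""
    return (x - mu) / sigma

print(round(standardize(44.95), 6), round(standardize(45.05), 6))   # -2.0 2.0
```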
Probabilities and the Standard Normal
Distribution
• Because the standard normal distribution is used so frequently, a table has been produced to
help calculate probabilities (see the next page).
• It is based upon the following diagram:
• Since the total area under the curve is equal to 1, it follows from the
symmetry of the curve that the area under the curve in the region z > 0 is
equal to 0.5.
• The shaded area in the diagram above is the probability that Z takes values
between 0 and z1.
• When we look up a value in the table, we obtain the value of the shaded
area.
Examples
What is the probability that Z takes values between 0 and 1.9?
• The second column headed ‘0’ is the one to choose and
its entry in the row beginning ‘1.9’ is 4713.
• This is to be read as 0.4713 (we omitted the 0 in each entry for clarity).
• So, the probability that Z takes values between 0 and 1.9 is 0.4713.
Probabilities and the Standard Normal
Distribution
• Now, let's see how to calculate probabilities represented by areas other than those we have
shown earlier.
• The following shows what we do if both Z values are positive.
Example: Find the probability that Z takes
values between 1 and 2.
P(0 < Z < z2), i.e. P(0 < Z < 2) is 0.4772.
P(0 < Z < z1), i.e. P(0 < Z < 1) is 0.3413.
Hence,
P(1 < Z < 2) = 0.4772 – 0.3413 = 0.1359.
• We can compute the probability that Z takes values between z1 and z2 by subtracting the area
between 0 and z1 from the area between 0 and z2.
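In place of the printed table, the area between 0 and z can be computed from the error function: P(0 < Z < z) = ½ erf(z/√2), available as `math.erf` in Python's standard library. This sketch reproduces the table entry for z = 1.9 and the area-difference example P(1 < Z < 2). Because erf is an odd function, the same difference also works when z1 is negative, for example P(−1.96 < Z < 1.96).

```python
import math

def area_0_to(z):
    """P(0 < Z < z) for standard normal Z, via the error function (negative for z < 0)."""
    return 0.5 * math.erf(z / math.sqrt(2))

def prob_between(z1, z2):
    """P(z1 < Z < z2) as a difference of areas measured from 0."""
    return area_0_to(z2) - area_0_to(z1)

print(f"{area_0_to(1.9):.4f}")              # 0.4713, matching the table entry
print(f"{prob_between(1, 2):.4f}")          # 0.1359
print(f"{prob_between(-1.96, 1.96):.4f}")   # 0.9500
```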
Confidence Intervals
• Taking a random sample from a population and computing a statistic, such
as the mean, from the data is a way to approximate the corresponding value for the population.
• But how well the sample statistic estimates the underlying population value is
always an issue.
• A confidence interval addresses this issue because it provides a range of values
which is likely to contain the population parameter of interest.
• Confidence intervals are constructed at a confidence level selected by the user,
such as 95%.
• What does it mean? It means that if the same population is sampled on
numerous occasions and interval estimates are made on each occasion, the
resulting intervals would bracket the true population parameter in approximately
95% of the cases.
Confidence Intervals
• The shaded area is 95% of the total area. If we look at the entry in the table
shown earlier corresponding to z = 1.96, we see that the value is 4750, which
means the probability of Z taking a value between 0 and 1.96 is 0.475. By
symmetry, the probability Z takes a value between -1.96 and 0 is also 0.475.
Combining these results, we see that
P(-1.96 < Z < 1.96) = 0.95 or 95%
• We say that the confidence interval for Z (about its mean of 0) is (-1.96, 1.96). It
follows that there is a 5% chance that Z lies outside this interval.
Example
• Suppose we measure the heights of 40 randomly chosen men, and get
• Mean height of 175cm
• Standard deviation of 20cm
• For 95% confidence, Z lies between −1.96 and 1.96.
• Use Z = 1.96 in the following formula for the Confidence Interval:
$\bar{x} \pm Z\frac{\sigma}{\sqrt{n}} = 175 \pm 1.96 \times \frac{20}{\sqrt{40}} = 175 \pm 6.2$
• where $\bar{x}$ is the sample mean, Z is the chosen Z-value, σ is the standard deviation, and n is the
number of observations in the sample.
• In other words, the true mean of ALL men (if we could measure their heights) is likely to
be between 168.8 cm and 181.2 cm.
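The interval arithmetic for the height example can be written as a short Python calculation of x̄ ± Z·σ/√n. It uses the sample's standard deviation in place of σ, exactly as the example does.

```python
import math

x_bar, s, n = 175, 20, 40      # sample mean (cm), standard deviation (cm), sample size
z = 1.96                       # Z-value for 95% confidence

margin = z * s / math.sqrt(n)  # Z * sigma / sqrt(n)
print(f"{x_bar - margin:.1f} cm to {x_bar + margin:.1f} cm")   # 168.8 cm to 181.2 cm
```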
Uniform Distribution
• The uniform distribution of random variable X is restricted to a finite interval [a, b],
and fX(x) has constant density over the interval. We write X ∼ U(a, b).
• The probability density function of uniform distribution is given by
$f_X(x) = \frac{1}{b - a}$ for a ≤ x ≤ b, and $f_X(x) = 0$ otherwise
Mean and Variance of a Uniform Distribution
• Mean of a uniform distribution: $E(X) = \frac{a + b}{2}$
• Variance of a uniform distribution: $Var(X) = \frac{(b - a)^2}{12}$
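The uniform mean and variance formulas can be checked by simulation. This sketch uses a hypothetical interval [2, 8] and draws from it with the standard-library `random.uniform`. The seed is fixed so the run is reproducible, and the simulated values should land close to, not exactly on, the formula values.

```python
import random
import statistics

a, b = 2, 8                          # hypothetical interval [a, b]
mean_formula = (a + b) / 2           # E(X) = (a + b) / 2
var_formula = (b - a) ** 2 / 12      # Var(X) = (b - a)^2 / 12

random.seed(0)                       # reproducible simulation
draws = [random.uniform(a, b) for _ in range(50_000)]
m = statistics.fmean(draws)
v = statistics.fmean([(d - m) ** 2 for d in draws])   # average squared deviation

print(mean_formula, var_formula)     # 5.0 3.0
print(round(m, 2), round(v, 2))      # close to the formula values
```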
Example
• Consider the random variable X which is distributed uniformly with probability
density function
Find E(X) and Var (X).
• Solution:
Exponential Distribution
• The exponential distribution of random variable X is written as X ∼ Exp(λ),
• where the probability density function is given by
$f_X(x) = \lambda e^{-\lambda x}$ for x ≥ 0, and $f_X(x) = 0$ for x < 0,
• and λ > 0 is called the rate of the distribution.
Mean and Variance of an Exponential
Distribution
• Mean of an exponential distribution: $E(X) = \frac{1}{\lambda}$
• Variance of an exponential distribution: $Var(X) = \frac{1}{\lambda^2}$
Example
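• Consider the random variable X which is exponentially distributed with rate λ = 5.
Find E(X) and Var(X).
• Solution:
$E(X) = \dfrac{1}{\lambda} = \dfrac{1}{5} = 0.2, \quad \operatorname{Var}(X) = \dfrac{1}{\lambda^2} = \dfrac{1}{25} = 0.04$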
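The answer for λ = 5 can be cross-checked numerically. The sketch below evaluates the closed forms 1/λ and 1/λ² and compares them with a midpoint-rule integration over [0, 40/λ]. The integration cutoff is our own assumption; the probability beyond it, e^(−40), is negligible.

```python
import math

lam = 5.0  # rate from the example


def exponential_pdf(x: float) -> float:
    # lambda * e^(-lambda x) for x >= 0, zero otherwise
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


# Closed-form results
mean_closed = 1 / lam        # E(X) = 1/lambda
var_closed = 1 / lam ** 2    # Var(X) = 1/lambda^2

# Midpoint-rule check on [0, 40/lambda]; the neglected tail has probability e^(-40)
steps = 200_000
upper = 40 / lam
h = upper / steps
xs = [(i + 0.5) * h for i in range(steps)]
mean_num = sum(x * exponential_pdf(x) for x in xs) * h
var_num = sum(x * x * exponential_pdf(x) for x in xs) * h - mean_num ** 2

print(f"E(X):   closed {mean_closed:.4f}, numerical {mean_num:.4f}")
print(f"Var(X): closed {var_closed:.4f}, numerical {var_num:.4f}")
```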
Cumulative Distribution Function
• The cumulative distribution function of random variable X is defined as
$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(u)\,du$
• where fX(u) is the probability density function of random variable X.
• Graphical interpretations
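The definition of F_X(x) as the area under the density up to x can be illustrated numerically. This sketch reuses the exponential density with λ = 5 (an assumption carried over from the earlier example) and integrates it with the midpoint rule. It then compares the result with the well-known closed form 1 − e^(−λx). The helper `cdf_by_integration` is hypothetical. Its finite `lower` bound stands in for −∞, which is only valid because this density is zero below 0.

```python
import math

lam = 5.0  # exponential rate, reused from the earlier example


def exponential_pdf(u: float) -> float:
    return lam * math.exp(-lam * u) if u >= 0 else 0.0


def cdf_by_integration(pdf, x: float, lower: float = 0.0, steps: int = 100_000) -> float:
    """F_X(x) = integral of pdf(u) du from `lower` to x, via the midpoint rule.
    `lower` replaces -infinity, which is valid only if pdf(u) = 0 for u < lower."""
    if x <= lower:
        return 0.0
    h = (x - lower) / steps
    return sum(pdf(lower + (i + 0.5) * h) for i in range(steps)) * h


for x in (0.1, 0.2, 0.5):
    numeric = cdf_by_integration(exponential_pdf, x)
    closed = 1 - math.exp(-lam * x)  # standard closed form of the exponential CDF
    print(f"F_X({x}) numeric = {numeric:.6f}, closed form = {closed:.6f}")
```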
Joint Probability Mass Function
• Discrete case
• Suppose X and Y are two discrete random variables, where X takes values in {x1, x2, …, xm}
and Y takes values in {y1, y2, …, yn}.
• We define the joint probability mass function of X and Y by
$f(x, y) = P(X = x, Y = y)$
where f(x, y) ≥ 0 and $\sum_{x}\sum_{y} f(x, y) = 1$
• The probability of the event that X = xi and Y = yj is given by
$P(X = x_i, Y = y_j) = f(x_i, y_j)$
Joint Probability Mass Function
• A joint probability mass function for X and Y can be represented by a joint
probability table, with one row for each xi and one column for each yj.
• The probability that X = xi is obtained by adding all entries in the row corresponding to xi and
is given by
$P(X = x_i) = f_X(x_i) = \sum_{j=1}^{n} f(x_i, y_j)$
• Similarly, the probability Y = yj is obtained by adding all entries in the column corresponding to yj and is
given by
$P(Y = y_j) = f_Y(y_j) = \sum_{i=1}^{m} f(x_i, y_j)$
Example
• The joint probability function of two discrete variables X and Y is given by
f(x, y) = c(2x + y),
where x and y can assume all integers such that 0 ≤ x ≤ 2, 0 ≤ y ≤ 3, and f(x, y) = 0 otherwise.
• Find the value of the constant c.
• The probabilities c(2x + y) at the sample points (x, y), with row and column totals, are:

X \ Y    0     1     2     3     Total
0        0     c     2c    3c    6c
1        2c    3c    4c    5c    14c
2        4c    5c    6c    7c    22c
Total    6c    9c    12c   15c   42c

• Since the grand total 42c must equal 1, we have c = 1/42.
Example (Cont’d)
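• Find P(X = 2, Y = 1).
• From the table, we see that P(X = 2, Y = 1) = 5c = 5/42.
• Find P(X ≥ 1, Y ≤ 2).
$P(X \ge 1, Y \le 2) = \sum_{x=1}^{2}\sum_{y=0}^{2} f(x, y) = (2c + 3c + 4c) + (4c + 5c + 6c) = 24c = \dfrac{24}{42} = \dfrac{4}{7}$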
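The whole example can be reproduced by enumerating the support exactly. The sketch below uses Python's `fractions.Fraction` so that c, the requested probabilities, and the row/column marginals come out as exact ratios. The dictionary layout is simply one convenient representation of the joint probability table; it is not part of the original notes.

```python
from fractions import Fraction

# Support: integers 0 <= x <= 2, 0 <= y <= 3; f(x, y) = c(2x + y)
xs, ys = range(3), range(4)
weights = {(x, y): 2 * x + y for x in xs for y in ys}

c = Fraction(1, sum(weights.values()))        # grand total 42c must equal 1
f = {pt: c * w for pt, w in weights.items()}  # joint pmf f(x, y)

# Marginals: add across a row (fixed x) or down a column (fixed y)
f_X = {x: sum(f[(x, y)] for y in ys) for x in xs}
f_Y = {y: sum(f[(x, y)] for x in xs) for y in ys}

p_x2_y1 = f[(2, 1)]
p_xge1_yle2 = sum(p for (x, y), p in f.items() if x >= 1 and y <= 2)

print("c =", c)
print("P(X=2, Y=1) =", p_x2_y1)
print("P(X>=1, Y<=2) =", p_xge1_yle2)
print("f_X:", {k: str(v) for k, v in f_X.items()})
print("f_Y:", {k: str(v) for k, v in f_Y.items()})
```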
Joint Probability Density Function
• Continuous case
• Suppose X and Y are two continuous random variables.
• We define the joint probability density function of X and Y by f(x, y),
where f(x, y) ≥ 0 and $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$
Example
• The joint probability density function of two continuous random variables X and Y is
• Find the value of the constant c.
• We must have the total probability equal to 1, i.e., $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$.
• Using the definition of f(x, y), the integral has the value
• Then c = 1/96.
Example (Cont’d)
• Find P(1 < X < 2, 2 < Y < 3).
• Using the value c found, we have
Independent Random Variables
• Suppose that X and Y are discrete random variables.
• If the events X = x and Y = y are independent events for all x and y, then we say
that X and Y are independent random variables. In such a case
P(X = x, Y = y) = P(X = x)P(Y = y)
• or equivalently
f(x, y) = fX(x) fY(y)
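The factorisation condition can be tested mechanically on a joint probability table. As an illustration of our own (not stated in the notes), the sketch below applies it to the earlier example f(x, y) = (2x + y)/42. That pmf is not independent: f(0, 0) = 0, while f_X(0)·f_Y(0) = (1/7)(1/7) = 1/49. For contrast, a table built as the product of the same marginals passes the check. `is_independent` is a hypothetical helper name.

```python
from fractions import Fraction

# Joint pmf from the earlier example: f(x, y) = (2x + y)/42
xs, ys = range(3), range(4)
f = {(x, y): Fraction(2 * x + y, 42) for x in xs for y in ys}
f_X = {x: sum(f[(x, y)] for y in ys) for x in xs}
f_Y = {y: sum(f[(x, y)] for x in xs) for y in ys}


def is_independent(joint, marg_x, marg_y) -> bool:
    """True iff joint(x, y) == marg_x(x) * marg_y(y) at every point of the grid."""
    return all(p == marg_x[x] * marg_y[y] for (x, y), p in joint.items())


print(is_independent(f, f_X, f_Y))   # X and Y from the example
print(f[(0, 0)], f_X[0] * f_Y[0])    # a point where the factorisation fails

# A joint pmf built as the product of the same marginals is independent by construction
g = {(x, y): f_X[x] * f_Y[y] for x in xs for y in ys}
print(is_independent(g, f_X, f_Y))
```

Exact fractions are used so that the equality test is not disturbed by floating-point rounding.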
Multivariate Distributions
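• All the results derived for the univariate case can be generalized to k random
variables.
• The joint probability distribution function of X1, X2, …, Xk will have the form
$f(x_1, x_2, \ldots, x_k) = P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k)$
• when the random variables are discrete, and
$P\big((X_1, \ldots, X_k) \in A\big) = \int \cdots \int_A f(x_1, \ldots, x_k)\,dx_1 \cdots dx_k$
• when the random variables are continuous.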
Multivariate Normal Distribution
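• Recall the univariate normal distribution
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right)$
• The k-variate normal distribution is given by
$f(\mathbf{x}) = \dfrac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\left(-\dfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$
• where $\mathbf{x} = (x_1, \ldots, x_k)^T$, $\boldsymbol{\mu}$ is the k-dimensional mean vector, and $\Sigma$ is the k × k positive definite covariance matrix with determinant $|\Sigma|$.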
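To make the k-variate formula concrete, the sketch below specialises it to k = 2. In that case the determinant and inverse of Σ have simple closed forms, so only the standard library is needed. It then checks one known property: with a diagonal covariance matrix, the joint density factorises into a product of univariate normal densities. The function names and the numeric inputs are illustrative assumptions. For general k, a linear-algebra library would normally be used instead.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Univariate normal density."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def bivariate_normal_pdf(x, mu, cov) -> float:
    """k-variate normal density specialised to k = 2.
    x, mu: length-2 sequences; cov: 2x2 covariance matrix as nested tuples."""
    k = 2
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    if s11 <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    inv = ((s22 / det, -s12 / det), (-s21 / det, s11 / det))  # 2x2 inverse
    d = (x[0] - mu[0], x[1] - mu[1])                          # x - mu
    quad = sum(d[i] * inv[i][j] * d[j] for i in range(k) for j in range(k))
    return math.exp(-0.5 * quad) / ((2 * math.pi) ** (k / 2) * math.sqrt(det))


# With a diagonal covariance (variances 4 and 1) the density factorises
joint = bivariate_normal_pdf((1.0, -0.5), (0.0, 0.0), ((4.0, 0.0), (0.0, 1.0)))
product = normal_pdf(1.0, 0.0, 2.0) * normal_pdf(-0.5, 0.0, 1.0)
print(joint, product)
```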
Bayes Theorem
• In many situations, you will know one conditional distribution P(x|y) and the distribution P(y), but
you are really interested in the other conditional distribution P(y|x).
• Let A1, A2, …, An be a set of mutually exclusive events that together form the
sample space S.
Let B be any event from the same sample space, such that P(B) > 0. Then
$P(A_k \mid B) = \dfrac{P(B \mid A_k)\,P(A_k)}{\sum_{i=1}^{n} P(B \mid A_i)\,P(A_i)}$
Example
• For a magazine, the probability that a reader is male, given that the reader is at least 35 years
old, is 0.3. The probability that a reader is male, given that the reader is under 35, is 0.65. If 75% of
the readers are under 35, what is the probability that a randomly chosen reader is
a) male?
b) female?
c) under 35, given that the reader is female?
Example
• Solution:
• (a) Let A1 be the event that the reader is at least 35 years old,
A2 the event that the reader is under 35 years old,
M the event that the reader is male, and
F the event that the reader is female.
$P(A_2) = 0.75, \quad P(A_1) = 1 - 0.75 = 0.25$
$P(M \mid A_1) = 0.3, \quad P(M \mid A_2) = 0.65$
$P(F \mid A_1) = 0.7, \quad P(F \mid A_2) = 0.35$
$P(M) = P(A_1, M) + P(A_2, M)$
$\quad = P(A_1)P(M \mid A_1) + P(A_2)P(M \mid A_2) = 0.25 \times 0.3 + 0.75 \times 0.65 = 0.5625$
• (b) $P(F) = 1 - P(M) = 1 - 0.5625 = 0.4375$
• (c) $P(A_2 \mid F) = \dfrac{P(F \mid A_2)P(A_2)}{P(F \mid A_1)P(A_1) + P(F \mid A_2)P(A_2)} = \dfrac{0.35 \times 0.75}{0.7 \times 0.25 + 0.35 \times 0.75} = \dfrac{0.2625}{0.4375} = 0.6$
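The three answers can be reproduced with a small Bayes'-theorem helper over a partition A1, …, An. This is a sketch: the function `bayes_posterior` and its argument layout are our own choices. It computes P(B) by the law of total probability, which is the denominator of Bayes' theorem. As a result, P(F) is obtained directly rather than as 1 − P(M); both routes give 0.4375.

```python
def bayes_posterior(priors, likelihoods):
    """Bayes' theorem over a partition A_1, ..., A_n of S.
    priors[i] = P(A_i), likelihoods[i] = P(B | A_i).
    Returns ([P(A_1 | B), ..., P(A_n | B)], P(B))."""
    p_b = sum(p * lik for p, lik in zip(priors, likelihoods))  # law of total probability
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return [p * lik / p_b for p, lik in zip(priors, likelihoods)], p_b


priors = [0.25, 0.75]                           # P(A1) (at least 35), P(A2) (under 35)
p_male_given = [0.3, 0.65]                      # P(M | A1), P(M | A2)
p_female_given = [1 - p for p in p_male_given]  # P(F | A1), P(F | A2)

_, p_male = bayes_posterior(priors, p_male_given)
posterior_given_female, p_female = bayes_posterior(priors, p_female_given)

print(f"(a) P(M)      = {p_male:.4f}")
print(f"(b) P(F)      = {p_female:.4f}")
print(f"(c) P(A2 | F) = {posterior_given_female[1]:.4f}")
```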