
Intro To Stat Module 1

Uploaded by

lusungutchuwa53
Copyright
© © All Rights Reserved

LILONGWE UNIVERSITY OF AGRICULTURE AND NATURAL RESOURCES

Course Name: Introduction to statistics

Module Number: 1

Course Code: Mat 211

Module Writer: Alfred Ngwira and Steve Makungwa

OWNERSHIP (COPYRIGHTS)

LILONGWE UNIVERSITY OF AGRICULTURE AND NATURAL RESOURCES

June, 2016

Table of contents

Module Overview
Unit 1: Descriptive Statistics
Unit 2: Correlation and regression
Unit 3: Elementary Probability Theory
Unit 4: Random Variable and Probability Distribution
Unit 5: Statistical inference
Method of Assessment
Learning tasks in this module
Activities, self-help questions and case studies

Unit 1: Descriptive Statistics
1.0 Introduction
1.1 Unit Objectives
1.2 Basic statistical terms
1.3 Summary of data using tables
1.4 Graphical summary of data
1.4.1 Histogram
1.4.2 Frequency polygon
1.4.3 Ogive
1.4.4 Stem and leaf diagram
1.4.5 Bar graph
1.4.6 Pie chart
1.5 Numerical summaries of data
1.5.1 Measures of central tendency
1.5.1.1 The mode
1.5.1.2 The median
1.5.1.3 The mean
1.5.2 Measures of location
1.5.2.1 Quartiles
1.5.2.2 Percentiles
1.5.2.3 Deciles
1.5.3 Measures of data variability
1.5.3.1 The range
1.5.3.3 Variance and standard deviation
1.5.3.4 Standard Deviation (s, sd)
1.5.4 The Five Number Summary
1.7 Reflection
Unit summary

Unit 2: Correlation and linear regression
2.0 Introduction
2.1 Unit Objectives
2.2 Scatter diagrams and correlations
2.3 Pearson and Spearman rank correlation coefficient
2.3.1 Pearson correlation coefficient
2.3.2 Spearman’s rank correlation coefficient
2.4 Linear regression
2.6 Reflection
Unit Summary

Unit 3: Elementary probability theory
3.0 Introduction
3.1 Unit Objectives
3.2 Basic probability terms
3.3 Basic probability rules
3.4 Intersection and union of events
3.5 Conditional Probabilities
3.6 Law of total probability and Bayes rule
3.6.1 Law of total probability
3.8 Reflection
Summary

Unit 4: Random variables and probability distribution
4.0 Introduction
4.1 Unit Objectives
4.2 Random variable and probability distribution
4.3 Discrete probability distributions
4.3.1 Properties of discrete distributions
4.3.2 Mean and variance of a discrete probability distribution
4.3.3 Examples of discrete random variables and their distributions
4.3.3.1 Discrete uniform random variable
4.3.3.2 Binomial random variable
4.3.3.3 Bernoulli random variable and probability distribution
4.3.3.4 Poisson random variable and distribution
4.3.3.5 Multinomial random variable
4.3.3.6 Hypergeometric random variable
4.4 Property of continuous random variable and distribution
4.5 Some continuous random variables and distributions
4.5.1 Normal random variable and distribution
4.5.2 Standard Normal Distribution
4.5.2.1 Using standard normal to find probability of normal random variable
4.5.3 The student t-distribution
4.5.4 The Chi-square (χ²) random variable and distribution
4.5.5 The F random variable and distribution
4.8 Sampling Distributions
4.8.1 Distribution of sample mean (X̄)
4.8.2 The sampling distribution of (X̄ - μ)/(s/√n)
4.9 Reflection
Summary

Unit 5: Inferential Statistics
5.0 Introduction
5.1 Unit Objectives
5.2 Hypothesis Testing
5.3 Hypothesis test about population mean
5.3.1 Hypothesis about one population mean
5.3.2 Hypothesis about difference between two population means
5.4 Testing hypothesis about population proportion
5.4.1 Hypothesis about single population proportion
5.4.2 Hypothesis about the difference between two population proportions
5.5 Testing for equality of multiple populations means using F-test
5.6 Testing for association in contingency tables using the Pearson chi-square
5.7 Point and interval estimation of population mean and proportion
5.8 Reflection
Unit Summary
References

Module Overview

The aim of this module is to introduce you to organizing and summarizing data. It
also introduces the basics of correlation and regression, and the introductory
probability theory that forms the basis for statistical inference. The module has been
written to equip you with basic statistical knowledge and skills so that you can organize
and summarize data and form a first impression of it. The module will also give you
basic skills in statistical inference, particularly hypothesis testing and interval
estimation. Specifically, it will enable you to set up a research hypothesis, select an
appropriate sample statistic (e.g. the sample mean) and then draw a conclusion about the
research hypothesis. The knowledge and skills learnt in this module are a foundation for
later courses such as experimental design and research project courses, and for real
life situations, particularly research projects. In a nutshell, this module will enable
you to:

• summarize data using tables, graphs and numerical measures


• compute correlation coefficient between two sets of data
• describe correlation between two sets of data
• compute the simple regression equation between two sets of data
• apply probability rules to solve real life problems
• carry out hypothesis testing about population mean and proportion

Specifically you will learn the following in each unit.

Unit 1: Descriptive Statistics

In this unit you will learn introductory issues regarding descriptive statistics. First you
will learn how to organize and present data using frequency tables and graphs. You
will also learn to describe data using numerical summaries such as measures of the centre
and spread. Apart from measures of the centre, you will look at measures of location
that are not necessarily measures of the centre, such as quartiles and percentiles.

Unit 2: Correlation and regression

In this unit you will learn about assessing correlation using scatter plots. You will then
learn how to compute and interpret the Pearson and Spearman correlation coefficients as
numerical measures of correlation. Finally, you will learn about the simple linear
regression model as a means of investigating the linear relationship between two variables.

Unit 3: Elementary Probability Theory

In this unit you will learn elementary probability theory. You will specifically learn
elementary probability rules such as the addition and multiplication rules. This will be
followed by conditional probability, the law of total probability and Bayes' rule. What
you learn in this unit will help you work out basic probability problems, and it
will give you a foundation for Bayesian inference, which stems from Bayes'
theorem.

Unit 4: Random Variable and Probability Distribution

The unit continues from the previous unit by looking at random variables and their
probability distributions. In the unit you will learn about two types of random variables:
discrete and continuous. For discrete random variables, you will look at the
uniform, Poisson, binomial and hypergeometric, and for continuous random variables
you will learn about the normal, t, chi-square and F random variables. Finally in this
unit, you will learn about the sampling distribution of the sample mean and related
quantities. What is presented in this unit is important for the statistical inference
procedures presented in unit 5.

Unit 5: Statistical inference

This unit introduces you to statistical inference in terms of hypothesis testing and
interval estimation. In the unit you will learn about hypothesis tests about the mean and
the proportion. You will further learn about hypothesis tests about more than two means
using the F test in ANOVA. In the unit you will also learn about hypothesis tests
about independence or association in a cross table. Finally you will learn about interval
estimation of the population mean and proportion.

Method of Assessment

This module is divided into a number of units. Each unit addresses some of the learning
outcomes. In each unit there are a number of concepts to be learnt and corresponding
activities to help you internalize the concepts discussed therein. The activities vary in
nature to address the three levels of Bloom's learning taxonomy. The module details will
help you to succeed in all your lesson tasks.

Your work in this module will be assessed in a number of ways:

All the unit tasks, which are self-reflection, self-assessment, group work or unit tests,
will account for 40% of your final grade. As part of continuous assessment, and to promote
your learning, you will also do activities throughout this module that will help you
prepare for your major assignment or for the final examination. Depending on the way your
course lecturer wishes to assess you, you might be given a written examination set
by the university. The module test at the end of the module will account for 60% of your
final grade.

Learning tasks in this module

This module gives you a unit-by-unit guide to the course you are studying. Each unit
includes information, case studies, activities, self-help questions and readings for you
to comprehend the unit. These are all designed to help you achieve the learning
outcomes that are stated at the beginning of the module.

Activities, self-help questions and case studies

The activities, self-help questions and case studies are part of the course activities.
They help you make your learning more active, enjoyable and effective. The tasks
will help you to engage with concepts being discussed and help you check your own
understanding. Where you are not sure, work with your peers or if possible consult the
course lecturer.

Please note that the activities may be reflective exercises designed to get you thinking
about the issues raised in the unit. They may be practical tasks to undertake on your own
or with fellow students. The self-help questions are usually more specific and require a
brief written response. The answers are given at the end of each unit; however, the
answers are not the only acceptable ones, and you are free to think of your own. If you
wish, you can also record your answers to the self-help questions in your learning
journal, or you may use a separate notebook.

Unit 1: Descriptive Statistics
1.0 Introduction

In this unit you will learn about descriptive statistics, that is, how to organize and
summarize data. Specifically, you will learn to organize and summarize data using tables
and graphs, and to summarize data using numerical summaries such as measures of centre,
location and dispersion. The unit is important because it provides you with preliminary
data analysis tools to use before carrying out inferential statistics.

1.1 Unit Objectives

By the end of the unit you should be able to


a) define basic statistical terms
b) organize data using frequency tables
c) present data using graphs
d) summarize data using numerical summaries like mean

Key terms

• Frequency distribution
• Measure of centre
• Measure of location
• Measure of dispersion

1.2 Basic statistical terms

• Data consists of information coming from observations, counts, measurements,
or responses. The singular of data is datum or data point.
• Statistics is the science of collecting, organizing, analyzing and interpreting
data in order to make decisions.
• A population is a collection of all items of interest in the study. For example,
if you want to find the mean height of the statistics class, then the
population includes all students in the statistics class.
• A sample is a subset of a population. For example, to measure the heights of
students in the statistics class, we may not be able to measure everyone, perhaps
because of time, so we take a subset of students. This subset is the sample.
• A parameter is a numerical description of a population characteristic. Examples
include the population mean, which describes the population center, and the population
standard deviation, which describes population variation.
• A statistic is a numerical description of a sample characteristic. Examples
include the sample mean, which describes the sample centre, and the sample standard
deviation, which describes how the sample values vary.
• Descriptive statistics is the branch of statistics that involves the organization,
summarization, and display of data.
• Inferential statistics is the branch of statistics that involves using a sample to
draw conclusions about a population. Some of the problems addressed in
inferential statistics are those of estimation and hypothesis testing.
• Data types: There are two types of data, qualitative and quantitative.
o Qualitative data is non-numeric data and is usually treated as categorical
data. This means the observed data values can be put into categories. Examples of
qualitative/categorical data include gender (male/female) and college
(poly/chanco/bunda).
o Quantitative data is numeric data that arises from frequencies or measurements.

There are two types of quantitative data, discrete and continuous.

o Discrete data: the data are said to be discrete if the measurements are integers
(e.g. number of employees of a company, number of incorrect answers on a
test, number of participants in a program, etc.)
o Continuous data: the data are said to be continuous if the measurements can
take on any value, usually within some range (e.g. weight, age, income, etc.).

Activity 1.1
State the type of the following data
a) Fertilizer type
b) Fish weight

1.3 Summary of data using tables

A frequency table is a list of possible values and their frequencies. It is used to
summarize categorical data so that the frequency distribution across the categories
can be seen.

Example
Frequency distribution table for political affiliation

Political     Frequency   Relative        Cumulative   Cumulative relative
affiliation               frequency       frequency    frequency
PP            70          70/300=0.23     70           0.23
AFORD         30          30/300=0.10     100          0.33
DPP           100         100/300=0.33    200          0.67
UDF           20          20/300=0.07     220          0.73
MCP           80          80/300=0.27     300          1.00
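
The columns of the table above can be reproduced with a few lines of Python. This is only a sketch for checking the arithmetic; the variable names are illustrative and not part of the module.

```python
from itertools import accumulate

# Frequencies taken from the political affiliation table above.
counts = {"PP": 70, "AFORD": 30, "DPP": 100, "UDF": 20, "MCP": 80}

n = sum(counts.values())                      # total number of observations
cum_freq = list(accumulate(counts.values()))  # running totals of the frequencies

print(f"{'Party':<8}{'Freq':>6}{'Rel':>8}{'Cum':>6}{'CumRel':>8}")
for (party, f), cf in zip(counts.items(), cum_freq):
    # relative frequency = f / n; cumulative relative frequency = cf / n
    print(f"{party:<8}{f:>6}{f / n:>8.2f}{cf:>6}{cf / n:>8.2f}")
```

Dividing the cumulative frequency by n gives 200/300 = 0.67 for DPP. Adding up relative frequencies that have already been rounded (0.23 + 0.10 + 0.33) would give 0.66, so it is safer to compute the cumulative relative frequency from the cumulative counts.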

If you have continuous data and want to have a frequency table, put the data into groups
or categories.

Example
Group the internet use data below into a grouped frequency table.
7,7,11,17,17,18,19,20,21,22,23,28,29,29,30,30,31,31,33,34,36,37,39,39,39,40,41,41,
42,44,44,46,50,51,53,54,54,56,56,56,59,62,67,69,72,73,77,78,80,88

Step 1: Find Range = Largest observation - Smallest observation
= 88 - 7
= 81.

Step 2: Find the class width by dividing the range by the number of classes and rounding
up. With 7 classes, Class width = 81/7 ≈ 11.6, which is rounded up to 12.

Step 3: Find the class boundaries by starting from the lowest observation and adding the
class width to find each upper limit. This gives boundaries of 7, 19, 31, 43, 55, 67, 79,
and 91.

NOTE: We will have the classes running from 7 to 19, etc., where the upper bound is
exclusive, i.e. the class 7 to 19 does not include 19.

Class      Freq   Rel freq      Cum freq   Cum rel freq
7 - 19     6      6/50=0.12     6          0.12
19 - 31    10     10/50=0.20    16         0.32
31 - 43    13     0.26          29         0.58
43 - 55    8      0.16          37         0.74
55 - 67    5      0.10          42         0.84
67 - 79    6      0.12          48         0.96
79 - 91    2      0.04          50         1.00
Step 4: For each class, count the number of observations that fall in that class. This
number is called the class frequency.
Step 5: The relative frequency of a class is calculated by f/n, where f is the frequency
of the class and n is the number of observations in the data set.
Step 6: Find the cumulative frequency and the cumulative relative frequency.
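
The steps above can be sketched in Python as a check on the grouped table. The number of classes (`k = 7`) is an assumption read off the table rather than stated in the text, and the variable names are illustrative.

```python
import math

# Internet use data from the example above (50 observations).
data = [7, 7, 11, 17, 17, 18, 19, 20, 21, 22, 23, 28, 29, 29, 30, 30, 31, 31,
        33, 34, 36, 37, 39, 39, 39, 40, 41, 41, 42, 44, 44, 46, 50, 51, 53, 54,
        54, 56, 56, 56, 59, 62, 67, 69, 72, 73, 77, 78, 80, 88]

k = 7                                # number of classes (assumed from the table)
data_range = max(data) - min(data)   # largest minus smallest observation
width = math.ceil(data_range / k)    # class width, rounded up to a whole number

# Lower class boundaries; each class is [lower, lower + width), upper bound exclusive.
lowers = [min(data) + i * width for i in range(k)]
freqs = [sum(lo <= x < lo + width for x in data) for lo in lowers]

n = len(data)
cum = 0
for lo, f in zip(lowers, freqs):
    cum += f                         # cumulative frequency up to this class
    print(f"{lo:>2} - {lo + width:<3} {f:>3} {f / n:>6.2f} {cum:>3} {cum / n:>6.2f}")
```

Treating the upper boundary as exclusive in the comparison `lo <= x < lo + width` is what places the observation 19 in the class 19 - 31 rather than in 7 - 19.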

Activity 1.2
A sample of the variable x assumes the following values:
1 2 6 7 12 13 2 6 9 5
18 7 3 15 15 4 17 1 14 5
4 16 4 5 8 6 5 18 5 2
Generate a frequency distribution indicating group of x and frequency of x.

1.4 Graphical summary of data

1.4.1 Histogram
A histogram displays the frequency distribution of a quantitative variable by showing
the frequency (count) or percent of the values that fall in various classes.
The classes are typically intervals of numbers that cover the full range of the variable.
Histograms are used to assess the distribution of a quantitative variable.
To construct a histogram, group the data into a frequency table, i.e. put the data into
classes and determine the frequency for each class.
Then plot class frequency against class intervals to obtain the histogram.


Shapes of the histograms

(a) Normal (symmetric)
The histogram has a bell shape.

Fig 1.2: Normal shape of a histogram

The normal shape of a histogram means that most of the data is clustered in the centre.

(b) Skewed (asymmetric)
The distribution’s peak is off center and a tail stretches away from it.
These distributions are called right-skewed or left-skewed according to the direction of
the tail.

In a right-skewed distribution most of the data is to the left, and in a left-skewed
distribution most of the data is to the right.

(c) Bimodal
The bimodal distribution looks like the back of a two-humped camel.


[Figure: histogram with Frequency on the vertical axis and Temperature on the horizontal axis]
Fig 1.5: Bimodal shape of histogram

A bimodal distribution often indicates that the data come from two distinct populations.

1.4.2 Frequency polygon

A frequency polygon is a line graph obtained by plotting frequency against class interval.
It shows the frequency distribution of the classes as a line graph. For example, the
frequency polygon for the internet use data is as shown below:

1.4.3 Ogive

An ogive is a plot of cumulative frequency against class intervals. The diagram below
shows an ogive for the internet use data.

1.4.4 Stem and leaf diagram
It is used to display the frequency distribution of quantitative data by showing the
actual data values rather than frequencies.

Example
Draw a stem and leaf diagram for the students' scores:
35, 36, 38, 40, 42, 42, 44, 45, 45, 47, 48, 49, 50, 50, 50.

Solution
Range: 35 to 50

Stem | Leaf
   3 | 5 6 8
   4 | 0 2 2 4 5 5 7 8 9
   5 | 0 0 0

A stem-and-leaf plot shows the shape and distribution of data. It can be clearly seen in
the diagram above that the data clusters around the row with a stem of 4. The distribution
is roughly normal (i.e. most of the data is at the centre and a few values lie towards the
ends).

Example
A teacher asked 10 of her students how many books they had read in the last 12 months.
Their answers were as follows:

12, 23, 19, 6, 10, 7, 15, 25, 21, 12

Prepare a stem and leaf plot for these data. Tip: The number 6 can be written as 06,
which means that it has a stem of 0 and a leaf of 6. The stem and leaf plot should look
like this:

Stem | Leaf
   0 | 6 7
   1 | 2 9 0 5 2
   2 | 3 5 1

Using the data from the plot above, we make the ordered stem and leaf plot shown below:

Stem | Leaf
   0 | 6 7
   1 | 0 2 2 5 9
   2 | 1 3 5

Example

The weights (to the nearest tenth of a kilogram) of 30 students were measured and
recorded as follows:
59.2, 61.5, 62.3, 61.4, 60.9, 59.8, 60.5, 59.0, 61.1, 60.7, 61.6, 56.3, 61.9, 65.7, 60.4,
58.9, 59.0, 61.2, 62.1, 61.4, 58.4, 60.8, 60.2, 62.7, 60.0, 59.3, 61.9, 61.7, 58.4, 62.2
Prepare an ordered stem and leaf plot for the data. Briefly comment on what the
analysis shows.

Solution
In this case, the stems will be the whole number values and the leaves will be the
decimal values. The data range from 56.3 to 65.7, so the stems should start at 56 and
finish at 65
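
As a rough sketch, an ordered stem and leaf plot for these weights could be built in Python as follows. The function name `stem_and_leaf` and its `scale` parameter are illustrative choices, not part of the module.

```python
from collections import defaultdict

def stem_and_leaf(values, scale=10):
    """Return ordered stem-and-leaf rows as (stem, [leaves]) pairs.

    Each value is multiplied by `scale` and rounded, so scale=10 uses the first
    decimal place as the leaf and scale=1 uses the units digit as the leaf.
    """
    rows = defaultdict(list)
    for v in values:
        stem, leaf = divmod(round(v * scale), 10)
        rows[stem].append(leaf)
    lo, hi = min(rows), max(rows)
    # Include every stem between the smallest and largest, even if it has no leaves.
    return [(s, sorted(rows[s])) for s in range(lo, hi + 1)]

weights = [59.2, 61.5, 62.3, 61.4, 60.9, 59.8, 60.5, 59.0, 61.1, 60.7,
           61.6, 56.3, 61.9, 65.7, 60.4, 58.9, 59.0, 61.2, 62.1, 61.4,
           58.4, 60.8, 60.2, 62.7, 60.0, 59.3, 61.9, 61.7, 58.4, 62.2]

for stem, leaves in stem_and_leaf(weights):
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")
```

Stems with no observations (57, 63 and 64 here) are kept as empty rows so that gaps in the data, and isolated values such as 56.3 and 65.7, remain visible. Calling `stem_and_leaf(books, scale=1)` on whole-number data, such as the books example, uses the tens digit as the stem.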

1.4.5 Bar graph
A bar graph is used to show the frequency distribution of a categorical variable
graphically.

Example
Draw the bar graph for the data below:

Causes of Shrinkage      $ million
Employee Theft           15.6
Shoplifting              14.7
Administrative Error     7.8
Vendor Fraud             2.9

1.4.6 Pie chart
A pie chart is an alternative to the bar graph for displaying categorical data.

Example
Draw the pie chart for the data in the example above.

A pie chart is a circle with slices proportional to the relative frequency of each
category. A pie chart is made by representing the relative frequency of each category by
an angle of the circle, determined by:

Angle of a category = Relative frequency of the category × 360°

Relative frequency = f/n
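
A short sketch of the angle calculation for the shrinkage data, using the rule that a sector's angle is its relative frequency multiplied by 360°. The variable names are illustrative.

```python
# Shrinkage data ($ million) from the bar graph example.
shrinkage = {
    "Employee Theft": 15.6,
    "Shoplifting": 14.7,
    "Administrative Error": 7.8,
    "Vendor Fraud": 2.9,
}

total = sum(shrinkage.values())
# Angle of a sector = relative frequency (value / total) x 360 degrees.
angles = {cause: value / total * 360 for cause, value in shrinkage.items()}

for cause, angle in angles.items():
    print(f"{cause:<22}{angle:7.1f} degrees")
```

Before rounding, the angles always add up to 360°, which is a useful check on the calculation.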

Example
A shop sells different sizes of gloves. The table shows the percentage of gloves sold in
a year that were each size.

As the values are percentages, the total must be 100% (but check to make sure).
Note: Sometimes rounding the angles leads to a total angle of more or less than 360º.
If this happens, adjust the angle of the largest sector so that the total is correct, e.g.
if the total comes to 361º, take 1º away from the largest angle.

Activity 1.3

What is the difference between a histogram and a bar graph?

1.5 Numerical summaries of data

1.5.1 Measures of central tendency


Definition: Given a series of observations measured on a quantitative variable,
there is a general tendency among the values to cluster around a central value. Such
clustering is called central tendency, and the measures used to describe this
tendency are called measures of central tendency or averages.

Measures of the centre indicate where the center, or the most typical value, of the
variable lies in the collected set of measurements.

1.5.1.1 The mode


The sample mode of a qualitative or a discrete quantitative variable is that value of the
variable which occurs with the greatest frequency in a data set.

Example
(a)
Student    Mid-term score
1          94
2          90
3          90
4          90
5          81
6          70
7          65
8          56
9          30

(b)

Find the mode in (a) and (b)?

Solution
(a) The mode is 90. (b) The mode is the category "no anaemia".
Note that for a categorical variable you look for the category with the highest frequency,
and for a quantitative (numeric) variable you look for the number that occurs most often.
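
A minimal sketch of finding the mode in Python with the standard `statistics` module. Because the data for part (b) are not reproduced here, the categorical case reuses the political affiliation frequencies from section 1.3 purely as an illustration, and the list passed to `multimode` is made up.

```python
from statistics import mode, multimode

scores = [94, 90, 90, 90, 81, 70, 65, 56, 30]   # mid-term scores from example (a)
print(mode(scores))                              # the most frequent score

# Categorical variable: the mode is the category with the highest frequency.
# Illustration only, using the political affiliation table from section 1.3.
affiliation = {"PP": 70, "AFORD": 30, "DPP": 100, "UDF": 20, "MCP": 80}
modal_category = max(affiliation, key=affiliation.get)
print(modal_category)

# A data set can have more than one mode; multimode returns all of them.
# The list below is hypothetical.
print(multimode([1, 2, 2, 3, 3]))
```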

1.5.1.2 The median


The sample median of a quantitative variable is the value in a data set
that divides the set of observed values in half, so that the observed values in one half
are less than or equal to the median value and the observed values in the other half are
greater than or equal to the median value. To obtain the median of the variable, we arrange
the observed values in a data set in increasing order and then determine the middle value
in the ordered list.

If the number of observations is odd, then the sample median is the observed value
exactly in the middle of the ordered list. If the number of observations is even, then the
sample median is the number halfway between the two middle observed values in the
ordered list. In both cases, if we let n denote the number of observations in a data set,
then the sample median is at position (n+1)/2 in the ordered list. When n is even, this
position falls halfway between the two middle observations.

Example
Seven participants in bike race had the following finishing times in minutes:
28,22,26,29,21,23,24. What is the median?

Solution
The median is the middle observation in the ordered list 21,22,23,24,26,28,29, i.e. 24.
This means that half of the observations are less than or equal to 24 and half of the
observations are greater than or equal to 24 (check).

Example
Eight participants in bike race had the following finishing times in minutes:
28,22,26,29,21,23,24,50. What is the median?

Solution
Ordered list: 21,22,23,24,26,28,29,50. The median is the sum of the two middle observed
values divided by 2, i.e. (24+26)/2 = 50/2 = 25.
In both cases the median is at position (n+1)/2 in the ordered list (check).

1.5.1.3 The mean


The most commonly used measure of center for a quantitative (numeric) variable is the
(arithmetic) sample mean. When people speak of taking an average, it is the mean that they
are most often referring to.

Definition (Mean). The sample mean of the variable is the sum of the observed values in
a data set divided by the number of observations.

Example
Seven participants in bike race had the following finishing times in minutes:
28,22,26,29,21,23,24. What is the mean?

Solution
Mean = (28+22+26+29+21+23+24)/7 = 173/7 ≈ 24.7

Which measure to choose?


The mode should be used as the measure of center for a qualitative variable
(i.e. which category is the center?). When the variable is quantitative (numeric)
with a symmetric distribution, the mean is the proper measure of center.
For such a symmetric distribution, mean = mode = median (see Fig 11).

In the case of a quantitative variable with a skewed distribution, the median is a good
choice for the measure of center. This is related to the fact that the mean can be highly
influenced by an observation that falls far from the rest of the data, called an outlier.

Fig 1.12: Skewed distribution (the median is an appropriate measure of the center)
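
A small check of how an outlier moves the mean more than the median, using the bike race times from the examples above. This is a sketch with Python's standard statistics module; the variable names are illustrative only.

```python
from statistics import mean, median

times_7 = [28, 22, 26, 29, 21, 23, 24]   # seven-rider example
times_8 = times_7 + [50]                 # eight-rider example with one slow finisher

for label, times in [("7 riders", times_7), ("8 riders", times_8)]:
    # The mean uses every value, the median only the middle of the ordered list
    print(label, "mean =", round(mean(times), 2), "median =", median(times))
```

Adding the single time of 50 minutes raises the mean by about three minutes, while the median moves by only one minute.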

Activity 1.4

The median is less sensitive than the mean to:


A. the skewness of a variable.
B. the sum of squares.
C. dichotomous variables.
D. absolute error

1.5.2 Measures of location

1.5.2.1 Quartiles
The quartiles of the variable divide the observed values into quarters, or 4 equal parts.
The variable has 4 quartiles, denoted by Q1, Q2, Q3 and Q4.

Roughly speaking, the first quartile, Q1, is the number that divides the bottom 25% of
the observed values from the top 75%; second quartile, Q2, is the median, which is the
number that divides the bottom 50% of the observed values from the top 50%; and the
third quartile, Q3, is the number that divides the bottom 75% of the observed values
from the top 25%. The fourth quartile (Q4) is the maximum value in the data set.

Definition (Quartiles). Let n denote the number of observations in a data set. Arrange
the observed values of the variable in increasing order.

1. The first quartile Q1 is at position (n+1)/4.
2. The second quartile Q2 (the median) is at position 2(n+1)/4 = (n+1)/2.
3. The third quartile Q3 is at position 3(n+1)/4 in the ordered list.

1.5.2.2 Percentiles
Percentiles divide the data into 100 parts, i.e. 1st, 2nd, 3rd, 4th, …, 100th
percentile.

If data is divided into a hundred parts by percentiles, 1% of the data is below the
1st percentile, 2% below the 2nd percentile, 3% below the 3rd, 50% below the 50th
percentile, and 90% below the 90th percentile. To find percentiles, sort the data
from low to high, then apply the formula for percentiles to locate the percentile you are
interested in:

position = p(n+1)/100

where p = desired percentile and n = number of values in the data set.

Example
Find
a) the 50th percentile
b) 20th percentile in this ordered data: 5 7 9 10 11 13 14 15 16 17 18 18 20 21 37

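Solution
a) The 50th percentile is at position 50(15+1)/100 = 8, and the 8th value in the ordered
list is 15.
b) The 20th percentile is at position 20(15+1)/100 = 3.2.
Decimal positions of percentiles do not make sense, so interpolate between the actual
values, that is, at position 4 there is 10 and at position 3 there is 9, therefore the
20th percentile is 9+0.2*(10-9)=9.2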
Relationship between quartiles and percentiles
1st quartile → 25th percentile
2nd quartile → 50th percentile (MEDIAN)
3rd quartile → 75th percentile
4th quartile → 100th percentile

Notice that the 2nd quartile, or 50th percentile is the same as the median. Thus quartiles
operate with the same formula for percentiles.
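
The position rule and the interpolation step can be sketched as one small function. This is a minimal illustration, assuming the position rule p(n+1)/100 that is consistent with the quartile positions and the worked example; the function name percentile and the clamping of positions that fall outside the list are assumptions of this sketch, not part of the module.

```python
def percentile(sorted_values, p):
    """Value at position p(n+1)/100, interpolating between neighbours."""
    n = len(sorted_values)
    position = p * (n + 1) / 100          # 1-based position in the ordered list
    position = min(max(position, 1), n)   # assumption: keep the position inside the list
    lower = int(position)                 # whole part of the position
    fraction = position - lower           # decimal part used for interpolation
    if lower == n:                        # already at the last value
        return sorted_values[-1]
    low_value = sorted_values[lower - 1]
    high_value = sorted_values[lower]
    return low_value + fraction * (high_value - low_value)

data = [5, 7, 9, 10, 11, 13, 14, 15, 16, 17, 18, 18, 20, 21, 37]
print(percentile(data, 20))                        # 20th percentile, about 9.2
print(percentile(data, 50))                        # median
print(percentile(data, 25), percentile(data, 75))  # Q1 and Q3
```

Because quartiles are the 25th, 50th and 75th percentiles, the same function gives Q1, Q2 and Q3 without a separate rule.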

1.5.2.3 Deciles
Deciles are those values which divide the data series into ten equal parts. There are nine
deciles i.e. D1, D2, D3, …, D9 in a series and 5th decile(D5) is same as median and 2nd
quartile, because those values divide the series in two equal parts.

The calculation of deciles is done as follows:

1.5.3. Measures of data variability

1.5.3.1 The range


The sample range is obtained by computing the difference between the largest observed
value of the variable in a data set and the smallest one.
Definition (Range). The sample range of the variable is the difference between its
maximum and minimum values in a data set: Range = Max − Min.
The sample range of the variable is quite easy to compute. However, in using the range,
a great deal of information is ignored, that is, only the largest and smallest values of the
variable are considered; the other observed values are disregarded.

Example
Seven participants in bike race had the following finishing times in minutes:
28,22,26,29,21,23,24. What is the range?

Solution
Range=Max-Min=29-21=8

Example
Eight participants in bike race had the following finishing times in minutes:
28,22,26,29,21,23,24,50. What is the range?

Solution
Range=Max-Min=50-21=29

1.5.3.2 Interquartile range


Definition(Interquartile range). The sample interquartile range of the variable, denoted
by IQR, is the difference between the first and third quartiles of the variable, that is,
IQR = Q3 − Q1.

Example
Seven participants in bike race had the following finishing times in minutes:
28,22,26,29,21,23,24. What is the interquartile range?

Solution
Ordered list: 21,22,23,24,26,28,29

Q1 is at position (n+1)/4 = (7+1)/4 = 2, i.e. 22
Q3 is at position 3(n+1)/4 = (3×8)/4 = 6, i.e. 28

Interquartile range=Q3-Q1=28-22=6

1.5.3.3 Variance and standard deviation


Variance and standard deviation are common measures of data variability. They measure
how the data vary around the mean. The sample variance is

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where $\bar{x}$ is the sample mean and n − 1 is the degrees of freedom.

Note, the degrees of freedom is the number of pieces of information remaining
that enter in the estimation of the parameter given that other parameters have been
estimated using the same sample data. The assumption is that as parameters are being
estimated from the same sample data, some information is lost, and the information lost
is assumed to be equal to the number of parameters estimated. Now the smaller the
variance, the closer the individual scores are to the mean, and vice versa. The
variance is in squared units, unlike the original data, which is not in squares.
The standard deviation brings the measure of variation back to the units of the
original data. It is thus the recommended measure of variation, unlike the variance.

1.5.3.4 Standard Deviation (s, sd)

The standard deviation is the positive square root of the variance,

$$s = \sqrt{s^2}$$

Like the variance, the larger the sd, the larger the distance the measurements have from
the mean.

Example
Find the variance and standard deviation of the following data: 2,3,5

Solution
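
One way to work through this example is to follow the definition step by step: find the mean, square the deviations, divide by the degrees of freedom n − 1, and take the square root. The sketch below does exactly that in Python; the variable names are illustrative.

```python
from math import sqrt

data = [2, 3, 5]
n = len(data)
mean = sum(data) / n                                  # 10/3
squared_deviations = [(x - mean) ** 2 for x in data]  # (x - mean)^2 for each value
variance = sum(squared_deviations) / (n - 1)          # divide by degrees of freedom
sd = sqrt(variance)                                   # positive square root

print(round(mean, 3), round(variance, 3), round(sd, 3))  # 3.333 2.333 1.528
```

So the variance is 7/3 ≈ 2.33 and the standard deviation is about 1.53.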

Example (comparing data with different standard deviations)

Distribution   Values               Mean   SD
A              35 35 35 35 35 35    35     0
B              28 40 35 37 29 41    35     5.47
C              1 4 5 67 68 65       35     34.72

Measurements in A are closer to the mean than measurements in C, which are far from
the mean. The standard deviation is used instead of the variance because the variance has
squared units, whereas the standard deviation has the same units as the original data.
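
The three standard deviations in the table can be checked with Python's statistics.stdev, which uses the n − 1 divisor. This is a quick sketch; rounding to two decimals gives 5.48 and 34.73, so the table values 5.47 and 34.72 appear to be truncated rather than rounded.

```python
from statistics import mean, stdev

distributions = {
    "A": [35, 35, 35, 35, 35, 35],
    "B": [28, 40, 35, 37, 29, 41],
    "C": [1, 4, 5, 67, 68, 65],
}
for name, values in distributions.items():
    # Same mean for all three, very different spread
    print(name, mean(values), round(stdev(values), 2))
```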

1.5.4 The Five Number Summary


The five number summary of a data set consists of the minimum, the quartiles Q1, Q2
(the median) and Q3, and the maximum.

Boxplots are basically a way of graphing the five-number summary. The simplest type
of boxplot is of the form:

The box spans the interquartile range, that is, the distance between the 1st quartile
and the 3rd quartile.
The line between Q1 and Q3 denotes the median, or 2nd quartile, or 5th decile.

Activity 1.5

1. The variance measures deviation around the:


A. mode
B. median
C. mean
D. sum of squares
2. In a box-and-whiskers diagram, the standard deviations of the
independent variable scores are indicated by:
A. the lines in the centers of boxes.
B. the length of whiskers.
C. the size of the dots used for data points.
D. the lengths of boxes

Unit Activity 1.6


The data below show the weight gain in grams of children 6 months after a nutritionist
gave them diet plan A. Draw a stem and leaf diagram for the
data and comment on the distribution of weight gain.

1.7 Reflection
An exam is given to students in an introductory statistics course. What is likely to be
true of the shape of the histogram of scores if the exam is quite easy?

Unit summary

In this unit you have learnt about ways of describing the population using the
sample. You have learnt about tabular, graphical and numerical measures for describing
the population. What you have learnt in this unit is important for getting a first
impression of the data before carrying out the statistical significance tests which are
presented in unit 4.

End of unit test


1. Parameters and statistics…
A. Are both used to make inferences about x
B. Describe the population and the sample, respectively.
C. Describe different groups of individuals.
D. Describe the same group of individuals.

2. Why do we use inferential statistics?


A. to help explain the outcomes of random phenomena
B. to make informed predictions about parameters we don’t know
C. to describe samples that are normal and large enough (n>30)
D. to generate samples of random data for a more reliable analysis

3. Which of these statements is false?
A. A parameter, in practice, is an unknown number describing the
population.
B. A statistic is used to estimate an unknown parameter.
C. A parameter is used to estimate an unknown statistic.
D. Statistics can change from sample to sample.

4. Which of the following is/are true about a skewed right distribution with
extreme outliers?
I) The mean is greater than the median
II) The median should be used as the measure of center because
it is more resistant to extreme observations than the mean
III) The standard deviation should be used as the measure of spread because
it is more resistant to extreme observations than the range or
interquartile range
A. I and II only
B. I and III only
C. II only
D. I, II, and III

5. The lengths (to the nearest tenth of a cm) of 30 fish in a pond were measured
and recorded as follows: 59.2, 61.5, 62.3, 61.4, 60.9, 59.8, 60.5, 59.0, 61.1,
60.7, 61.6, 56.3, 61.9, 65.7, 60.4, 58.9, 59.0, 61.2, 62.1, 61.4, 58.4, 60.8, 60.2,
62.7, 60.0, 59.3, 61.9, 61.7, 58.4, 62.2. Prepare an ordered stem and leaf plot
for the data. Briefly describe the distribution of lengths.

Answers to unit activities


Answers to unit 1 activities
Answer to activity 1.1
a) Categorical
b) Continuous

Answer to activity 1.2

Range = 18 − 1 = 17
Number of classes = 5
Class width = 17/5 = 3.4, rounded up to 4
Class boundaries: 1 to 5, 5 to 9, 9 to 13, 13 to 17, 17 to 21.
Class Limits     Frequency
______________________________________
1 – 5            9
5 – 9            11
9 – 13           2
13 – 17          5
17 – 21          3
______________________________________
Answer to activity 1.3
A histogram displays continuous data while a bar graph displays categorical data.
Furthermore, a histogram has no gaps between bars while a bar graph has gaps between
bars.

Answer to activity 1.4


A
Answer to activity 1.5
1.C
2.D

Answer to unit activity

Answers to Unit test


1.B
2.B
3.C
4.A
5.

Unit 2:

Correlation and linear regression


2.0 Introduction

In this unit you will learn about correlation using scatter plots, Pearson, and Spearman
rank correlation coefficient. Correlation is related to linear regression as both look at
relationship between variables. Linear regression forms the basis for most modeling
techniques and hence it will help you to easily understand them.

2.1 Unit Objectives

By the end of this unit you should be able to:


a) interpret correlation in a scatter plot
b) calculate Pearson and Spearman rank correlation coefficient
c) interpret Pearson or Spearman correlation coefficient
d) calculate linear regression equation given the two sets of data

Key terms

• Correlation
• Regression

2.2 Scatter diagrams and correlations

Scatter plots are plots of two quantitative variables, one on vertical and another on the
horizontal axis. Patterns of scatter diagrams are used to show relationship between two
variables. Scatter diagram correlations are shown and described below:

Fig 2.1: Scatter patterns and correlations

Strong Positive: If one variable increases at the same time the other variable increases,
they are said to be positively correlated.

Strong Negative: If one variable decreases at the same time the other variable
increases, or vice versa, they are said to be negatively correlated.

Complex: The data points are scattered in a curved pattern. The shape may look like
a rainbow or an arch. The two variables are correlated, though not linearly. As X
increases, Y first increases, then it decreases (or vice versa).

Weak Relationships: A weak correlation does not necessarily mean that the factor
being studied is not a cause. It may simply be a weak cause or a cause that requires the
presence of another contributing factor to bring about the effect. In this latter case, both
the factor under study and the contributing factor are perfectly good causes; you just
need them both to be active simultaneously to get the effect.

No Relationship: The data points are scattered in a shapeless pattern. You can
conclude that the two variables are not correlated over the ranges for which the data
was collected.
Activity 2.1
A scatter plot allows one to see:
A. whether there is any relationship between two variables
B. what type of relationship there is between two variables
C. both a and b
D. neither a nor b

2.3 Pearson and Spearman rank correlation coefficient

2.3.1 Pearson correlation coefficient


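The Pearson correlation coefficient assumes that the two sets of measurements are
symmetrically (normally) distributed; otherwise use other measures like the Spearman rank
correlation coefficient. The following is the Pearson correlation coefficient:

$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}}$$

where N is the number of (X, Y) pairs.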

Note that -1 ≤ r ≤ 1, i.e. r is between -1 and 1. If r>0, there is positive
correlation between the two data sets, and if r<0, there is negative correlation. If r=1
there is perfect positive correlation and if r= -1 there is perfect negative
correlation. If r is close to 1, there is strong positive correlation, and if r is close
to -1, there is strong negative correlation.

Example
Find Pearson correlation coefficient between Kips’ fat and calories using data below:

Solution

How can you describe the correlation between X and Y data?

Solution
There is a strong positive linear relationship, i.e. as x increases, y increases as well,
as the scatter plot of y against x confirms.

Fig 2.2: Scatter plot of calories and grams of fat

Note, the Pearson correlation coefficient is a measure of linear relationship. The closer
to 0 the Pearson correlation coefficient is, the weaker the linear correlation, and vice versa.

2.3.2 Spearman’s rank correlation coefficient


Instead of using the values of the variables x and y, we rank them in order of size,
using the numbers 1, 2, 3, …, n. A Spearman correlation coefficient is then determined
on the basis of these ranks. For each pair of ranks we calculate the difference d between
the ranks.

Spearman’s coefficient of rank correlation is then given by:

$$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

Note that this value is particularly useful when the exact data values are not known but
have already been ranked (e.g. positions in a class or competition). Also, the Spearman
rank correlation coefficient is used when the two sets of measurements are
asymmetrically distributed (non-normal).

Example
These are the marks obtained by 8 pupils in Mathematics and Physics. Calculate
Spearman’s coefficient of rank correlation.

Use the Spearman correlation coefficient to assess the correlation between Mathematics
and Physics.

Note that one may rank the observations by assigning the largest data point the smallest
rank (1) instead of assigning the largest rank to the largest data point. See next example.

Dealing with tied ranks


Tied scores are each assigned the average of the consecutive ranks they occupy, and the
next data point is assigned the rank that follows those consecutive ranks.

Example
The marks of 12 pupils in geography and history essays are as follows:

Calculate Spearman’s rank correlation coefficient.

Solution
First we must rank the data.

Geography

19 = 1
18 = ⅓(2 + 3 + 4) = 3
17 = ½(5 + 6) = 5.5
16 = ½(7 + 8) = 7.5
15 = ½(9 + 10) = 9.5
14 = 11
10 = 12

History
13 = ½(1 + 2) = 1.5
12 = ⅓(3 + 4 + 5) = 4
11 = ⅓(6 + 7 + 8) = 7
10 = 9
9 = 10
8 = 11
7 = 12

There is some positive correlation between the geography and history results.
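
The averaging of tied ranks shown in the working can be sketched as a small function. This is an illustration only: the function name average_ranks is an assumption, and the marks are listed in descending order as they appear in the rank working, not in the pupils' original pairing, so the sketch reproduces the ranks but not the final coefficient.

```python
def average_ranks(marks):
    """Rank marks with 1 for the largest; tied marks share the average rank."""
    ordered = sorted(marks, reverse=True)
    rank_of = {}
    for mark in set(ordered):
        first = ordered.index(mark) + 1          # first rank occupied by this mark
        count = ordered.count(mark)              # how many pupils share the mark
        rank_of[mark] = first + (count - 1) / 2  # average of the occupied ranks
    return [rank_of[mark] for mark in marks]

geography = [19, 18, 18, 18, 17, 17, 16, 16, 15, 15, 14, 10]
history = [13, 13, 12, 12, 12, 11, 11, 11, 10, 9, 8, 7]
print(average_ranks(geography))
print(average_ranks(history))
```

The three pupils with 18 in geography occupy ranks 2, 3 and 4, so each receives rank 3, and the next mark starts at rank 5.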

Activity 2.2
At the Agriculture show 10 countries sheep were ranked by the qualified
judge and by a trainee judge. Their rankings are shown below:

Calculate the Spearman rank correlation coefficient between the ranks for the
qualified judge and the trainee judge, and comment on the correlation.

2.4 Linear regression

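The central purpose of linear regression is to create a linear equation relating the
independent variable X to the dependent variable Y. The linear equation can be used to
predict Y given X.

The linear equation relating the independent variable X to the dependent variable Y is
defined as

$$Y = b_0 + b_1 X$$

where the slope $b_1$ and the intercept $b_0$ are

$$b_1 = \frac{\sum_{i=1}^{n} Y_i X_i - n\bar{Y}\bar{X}}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}, \qquad b_0 = \bar{Y} - b_1\bar{X}$$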
Example

Solution
(a)

(b) Yes there is a linear relationship between calories(Y) and fat(X)

(c)

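X      Y      XY       X²
28     530    14840    784

$\bar{X} = 20.6$, $\bar{Y} = 416$, $\sum XY = 46950$, $\sum X^2 = 2421$

$$b_1 = \frac{\sum_{i=1}^{n} Y_i X_i - n\bar{Y}\bar{X}}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2} = 13.70989$$

$$b_0 = \bar{Y} - b_1\bar{X} = 416 - 13.70989 \times 20.6 = 133.5762$$

Thus the linear regression equation/model between Y and X is
$$Y = 133.57 + 13.7X$$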
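
The slope and intercept can be reproduced from the summary values alone. In this sketch n = 5 is an assumption: the number of observations is not stated in the surviving text, but it is the value that reproduces the slope 13.70989 from the given means and sums. The function name predict is also illustrative.

```python
# Summary values from the worked example; n = 5 is inferred, not stated in the text
n = 5
mean_x, mean_y = 20.6, 416
sum_xy, sum_x2 = 46950, 2421

b1 = (sum_xy - n * mean_y * mean_x) / (sum_x2 - n * mean_x ** 2)  # slope
b0 = mean_y - b1 * mean_x                                         # intercept

def predict(x):
    """Predicted Y for a given X on the fitted line."""
    return b0 + b1 * x

print(round(b1, 5), round(b0, 4), round(predict(2), 2))
```

With unrounded coefficients the prediction at X = 2 is about 161.0; using the rounded equation Y = 133.57 + 13.7X gives 160.97.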

(c) To predict Y given X = 2 we substitute the X value into the regression equation:
$$Y = 133.57 + 13.7X = 133.57 + 13.7 \times 2 = 160.97$$
Note that the assumption to fit a linear regression line to the given data is that the
response (Y) must be normally distributed. Now suppose the scatter plot between
two variables is not linear, that is, complex like quadratic. Then the model between
response (Y) and predictor (X) has to be modified by including the term in the model to
take into account the complexity, e.g. a quadratic relationship between Y and X would
be represented by the model:
$$Y = b_0 + b_1 X^2$$

Activity 2.3

In a large class, the instructor ran a regression with the independent
variable (X) as the grade on the first exam and the dependent variable (Y)
being the grade on the second exam. She found the equation to be:
Y = 29.25 + 0.57X. Knowing that your friend got a 78 on the first exam,
what value would you predict she got on the second exam?

2.5 Practice Activity
Eight test areas were given different concentrations of new fertilizer and the
resulting crop was weighed.

Concentration g/L (X)   1   2      3    4      5    6      7    8
Crop yield kg (Y)       7   11.1   14   16.2   20   23.9   27   29

a) Draw a scatter diagram for the data, and interpret the kind of correlation.
b) Calculate the equation of the regression line of Y on X.
c) What would be the crop yield if the concentration is 12?

2.6 Reflection
When is the Spearman correlation coefficient used instead of the Pearson correlation
coefficient between two sets of data?

Unit Summary
In this unit you have learnt about relationships between variables using correlation and
simple linear regression. The unit has provided the starting point in the process of
investigating relationships between quantitative variables. Linear regression is the
starting point in most of the regression techniques.

End of unit test


Find the Pearson correlation coefficient between students' time of study and grade, and
interpret the correlation between students' study time and grade.
Student Hours Score
A 1 1
B 1 3
C 3 2
D 4 5
E 6 4
F 7 5
G 8 7
H 8 8

Answers to unit activities

Answer to activity 2.1
C

Answer to activity 2.2

Qualified judge   1   2   3    4    5    6    7    8    9    10
Trainee judge     1   2   5    6    7    8    10   4    3    9
d                 0   0   -2   -2   -2   -2   -3   4    6    1
d²                0   0   4    4    4    4    9    16   36   1

$$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 78}{10(10^2 - 1)} = 1 - 0.47 = 0.53$$

Comment: there is positive correlation between the rankings of the qualified judge and
the trainee judge.

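
The rank correlation for the two judges can be checked with a few lines of Python, using r = 1 − 6Σd²/(n(n² − 1)). This is a sketch that assumes untied ranks, as in this activity; the function name spearman is illustrative.

```python
qualified = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
trainee = [1, 2, 5, 6, 7, 8, 10, 4, 3, 9]

def spearman(ranks_x, ranks_y):
    """Spearman coefficient from two lists of untied ranks."""
    n = len(ranks_x)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))  # sum of squared rank differences
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(round(spearman(qualified, trainee), 2))  # 0.53
```

Here Σd² = 78, so the fraction 6Σd²/(n(n² − 1)) is about 0.47 and the coefficient is 1 − 0.47 ≈ 0.53.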
Answer to activity 2.3
Y=29.25+0.57×78=73.71
Answer to Unit practice activity
a) Based on the scatter plot, there is a strong positive linear correlation between
yield and fertilizer concentration.

[Figure: scatter plot of crop yield against fertilizer concentration]

(b) y = 4.22 + 3.18x (roughly y = 4 + 3x)
(c) y = 4.22 + 3.18(12) ≈ 42.4 (roughly 4 + 3(12) = 40)
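
The regression line for the fertilizer data can be computed directly from the slope and intercept formulas of section 2.4. The sketch below uses plain Python and illustrative variable names.

```python
concentration = [1, 2, 3, 4, 5, 6, 7, 8]             # X, g/L
crop_yield = [7, 11.1, 14, 16.2, 20, 23.9, 27, 29]   # Y, kg

n = len(concentration)
mean_x = sum(concentration) / n
mean_y = sum(crop_yield) / n
sum_xy = sum(x * y for x, y in zip(concentration, crop_yield))
sum_x2 = sum(x ** 2 for x in concentration)

b1 = (sum_xy - n * mean_y * mean_x) / (sum_x2 - n * mean_x ** 2)  # slope
b0 = mean_y - b1 * mean_x                                         # intercept

print(round(b0, 2), round(b1, 2), round(b0 + b1 * 12, 1))  # 4.22 3.18 42.4
```

Note that a concentration of 12 lies outside the observed range of 1 to 8, so the predicted yield is an extrapolation and should be treated with caution.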
Answer to Unit test
Student   Hours (X)   Score (Y)   X²    Y²    XY
A         1           1           1     1     1
B         1           3           1     9     3
C         3           2           9     4     6
D         4           5           16    25    20
E         6           4           36    16    24
F         7           5           49    25    35
G         8           7           64    49    56
H         8           8           64    64    64

ΣX = 38   ΣY = 35   ΣX² = 240   ΣY² = 193   ΣXY = 209

$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}}$$

$$= \frac{8(209) - (38)(35)}{\sqrt{\left[8(240) - (38)^2\right]\left[8(193) - (35)^2\right]}} = \frac{342}{\sqrt{151844}} = \frac{342}{389.6717} = 0.878$$
Interpretation: There is a strong positive linear correlation between students' study
time and grade.

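The arithmetic for the Pearson coefficient can be checked with a short Python sketch that follows the same computational formula. The function name pearson is illustrative.

```python
from math import sqrt

hours = [1, 1, 3, 4, 6, 7, 8, 8]   # X
score = [1, 3, 2, 5, 4, 5, 7, 8]   # Y

def pearson(xs, ys):
    """Pearson r from the sums N, ΣX, ΣY, ΣXY, ΣX² and ΣY²."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

print(round(pearson(hours, score), 3))  # 0.878
```
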
Unit 3:

Elementary probability theory


3.0 Introduction

Statistical inference is based on probability theory. The aim of this unit is to
introduce you to the basic elements of probability. In the unit you will learn about
elementary probability rules, in particular the multiplication and addition rules. This
will be followed by the law of total probability and the Bayes theorem. The Bayes rule
provides the starting point for Bayesian inference, where prior knowledge is included
in the estimation of parameters.

3.1 Unit Objectives

By the end of this unit you should be able to:

a) define basic probability terms
b) apply basic probability rules, e.g. the complement of an event rule, to find the
probability of an event
c) apply the multiplication rule to find the probability of an event
d) use the addition rule to find the probability of an event
e) use the Bayes rule to find the probability of an event

Key terms

• Probability
• Conditional probability
• Bayes rule

3.2 Basic probability terms

Random Experiment: Is the process of observing the outcome of a chance event.
Elementary Outcomes: Are all possible results of the random experiment.
Sample Space: Is the set or collection of all the elementary outcomes.

Example
In throwing a coin, sample space is the set s={H,T} where H means head and T means
tail.
In throwing a die sample space is the set s={1,2,3,4,5,6}

Example
A family has three children. Write the sample space.

Solution
We may be helped by the tree diagram:

Fig 3.1: Sample space of three children

The sample space consists of eight possibilities: {BBB, BBG, BGB, BGG, GBB, GBG,
GGB, GGG}. The possibility BGB, for example, indicates that the first born is a boy,
the second born a girl, and the third a boy.

Example
Two dice are rolled. Write the sample space.

Solution
We assume one of the dice is red, and the other green. We have the following 36
possibilities.

            Green
Red   1        2        3        4        5        6
1     (1, 1)   (1, 2)   (1, 3)   (1, 4)   (1, 5)   (1, 6)
2     (2, 1)   (2, 2)   (2, 3)   (2, 4)   (2, 5)   (2, 6)
3     (3, 1)   (3, 2)   (3, 3)   (3, 4)   (3, 5)   (3, 6)
4     (4, 1)   (4, 2)   (4, 3)   (4, 4)   (4, 5)   (4, 6)
5     (5, 1)   (5, 2)   (5, 3)   (5, 4)   (5, 5)   (5, 6)
6     (6, 1)   (6, 2)   (6, 3)   (6, 4)   (6, 5)   (6, 6)

The entry (2, 5), for example, indicates that the red die shows a two, and the green
a 5.

Event: An event is a subset of a sample space, or it is an outcome of interest. For
example, if you have thrown a coin twice, and you are looking at the probability that you
have two heads (HH), then your event is the two heads (HH).

Probability:
$$P(E) = \frac{n(E)}{n(S)}$$
where n(E) is the number of outcomes in the event E and n(S) is the number of outcomes
in the sample space S.

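The sample spaces above and the rule P(E) = n(E)/n(S) can be sketched with Python's itertools.product, which lists every ordered combination of elementary outcomes. This is an illustration that assumes all outcomes are equally likely; the function name probability is an assumption of the sketch.

```python
from fractions import Fraction
from itertools import product

# Sample spaces built as all ordered combinations of elementary outcomes
three_children = ["".join(c) for c in product("BG", repeat=3)]
two_dice = list(product(range(1, 7), repeat=2))        # (red, green) pairs
two_coins = ["".join(c) for c in product("HT", repeat=2)]

def probability(event, sample_space):
    """P(E) = n(E) / n(S) for equally likely outcomes."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

print(three_children)                                  # the eight birth orders
print(len(two_dice))                                   # 36
print(probability(lambda o: o == "HH", two_coins))     # 1/4
```

Using Fraction keeps the answer as an exact ratio such as 1/4 instead of a rounded decimal.
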
Activity 3.1
A coin and a die are thrown, write the sample space.

3.3 Basic probability rules
Suppose P(A) is the probability of some event defined by A. P(A) is always in the
interval [0, 1], i.e. 0 ≤ P(A) ≤ 1. Now P(A) = 1 means the event occurs with certainty
and P(A) = 0 means that it will certainly not occur. The probability of all events put
together must add up to 1, so long as we don’t double-count by including events that
overlap. For example, consider a sample space of an experiment of throwing a coin,
S = {H, T}. Now p(H)+p(T)=1. The complement of event A, also known as not event A, is
denoted by ~A (or in other books A’). Its probability, P(~A), is the probability that A
will not occur. P(~A) = 1 – P(A), and P(A) + P(~A) = 1.

Activity 3.2
The meteorological department says that the probability that it will rain in the central
region today is 0.8. What do you think is the probability that it will not rain today?

3.4 Intersection and union of events

The intersection of two events is when two events both happen.

Fig 3.2: Union and intersection of events


We denote the union of events A and B by A ∪ B, sometimes A or B, and we denote the
intersection by A ∩ B, sometimes A and B. The probability of the union can be found using
this formula:
P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
Why do we subtract P(A ∩ B)? To keep us from double-counting. What if we have the union
of three events as follows (shaded area):

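Fig 3.3: Union of three events

P(A ∪ B ∪ C) = p(A) + p(B) + p(C) − p(A ∩ B) − p(A ∩ C) − p(B ∩ C) + p(A ∩ B ∩ C).
In general, if we have events $E_1, E_2, \dots, E_n$, then
$$P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i} P(E_i) - \sum_{i<j} P(E_i \cap E_j) + \sum_{i<j<k} P(E_i \cap E_j \cap E_k) - \dots + (-1)^{n+1} P(E_1 \cap \dots \cap E_n)$$
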
If the two events A and B are mutually exclusive, meaning they cannot happen at the same
time, then the formula P(A ∪ B) = P(A) + P(B) – P(A ∩ B) simplifies to
P(A ∪ B) = P(A) + P(B).
Note that if events A and B are mutually exclusive their intersection is empty, i.e.
P(A ∩ B) = 0, see below

Fig 3.4: Mutually exclusive events.

In a more general case, if the events $E_i$ for i = 1, 2, 3, …, n are mutually exclusive,
i.e. their intersections are empty as shown below:

Fig 3.5: Mutually exclusive events

then, since the intersections are impossible,
$$P(E_1 \cup E_2 \cup \dots \cup E_n) = P(E_1) + P(E_2) + \dots + P(E_n) = \sum_{i=1}^{n} P(E_i)$$
in compact form.
This is a general form of the addition rule of probability.
Example
In a class there are 170 students: 60 take Mathematics and Statistics, 40 take Mathematics
only, 30 take Statistics only, and 40 take neither Mathematics nor Statistics. Find
(a) the probability of picking a Mathematics or Statistics student at random
(b) the probability of picking a Mathematics student or a student who takes neither
Mathematics nor Statistics
Solution
Sketch of events:
From the counts, 40 + 60 = 100 students take Mathematics and 30 + 60 = 90 take Statistics.
(a) Let the event of picking a Mathematics student be M and that of picking a Statistics
student be S. Then the probability of picking a Mathematics or Statistics student is denoted
as p(M or S), sometimes p(M ∪ S), which is defined as
p(M ∪ S)=p(M)+p(S)–p(M and S)
=100/170+90/170–60/170
=130/170=13/17
(b) Let the event of neither M nor S be N. We need p(M or N), also denoted as p(M ∪ N),
defined as p(M ∪ N)=p(M)+p(N)–p(M and N)
=p(M)+p(N), because the event M and N is impossible
=100/170+40/170
=140/170=14/17
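The two addition-rule calculations can be repeated with exact fractions. This is a minimal sketch: the counts are the ones used in the solution (100, 90, 60 and 40 out of 170), and the variable names and the helper p are assumptions made for illustration.

```python
from fractions import Fraction

total = 170        # students in the class
n_M = 100          # take Mathematics (with or without Statistics)
n_S = 90           # take Statistics (with or without Mathematics)
n_M_and_S = 60     # take both subjects
n_N = 40           # take neither subject

def p(count):
    """Probability of picking one of `count` students at random from the class."""
    return Fraction(count, total)

# (a) General addition rule: subtract the overlap so it is not counted twice.
p_M_or_S = p(n_M) + p(n_S) - p(n_M_and_S)

# (b) M and N are mutually exclusive, so the overlap term is zero.
p_M_or_N = p(n_M) + p(n_N)

print(p_M_or_S, p_M_or_N)
```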
Activity 3.3
Let E1 be the event that a cow gives birth to a male calf and E2 be the event that a cow
gives birth to a female calf. How can we find p(E1 ∪ E2)?
3.5 Conditional Probabilities
A conditional probability tells you the probability of some event given that you already
know that another event has occurred. The formula for the conditional probability of A
given B is:
P(A | B) = P(A ∩ B) / P(B), provided P(B) > 0.
What about the conditional probability of B given A, denoted as P(B | A)? We have
P(B | A) = P(B ∩ A) / P(A), provided P(A) > 0.

Example
Consider the situation above (the class of 170 students).
Find the probability of picking a Mathematics student given that he or she also takes
Statistics.
Solution
We need P(M | S) = P(M ∩ S) / P(S)
= (60/170) ÷ (90/170)
= 60/90 = 2/3
Example
The following table shows the distribution by gender of students at Bunda Campus who use
public transport and the ones who drive to school.

                             Male (M)   Female (F)   Total
Public Transportation (P)        8          13         21
Drive (D)                       39          40         79
Total                           47          53        100

The events M, F, P, and D are self-explanatory. Find the following probabilities.
a. P(D | M)    b. P(F | D)    c. P(M | P)
Solution
We use the conditional probability formula P(E | F) = P(E ∩ F) / P(F).

a. P(D | M) = P(D ∩ M) / P(M) = (39/100) / (47/100) = 39/47.
b. P(F | D) = P(F ∩ D) / P(D) = (40/100) / (79/100) = 40/79.
c. P(M | P) = P(M ∩ P) / P(P) = (8/100) / (21/100) = 8/21.
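The same table lookups can be sketched in code. The dictionary layout and the function name conditional are assumptions made for this illustration; only the four cell counts come from the table.

```python
from fractions import Fraction

# Cell counts from the table: key = (mode of travel, gender).
counts = {
    ("P", "M"): 8,  ("P", "F"): 13,
    ("D", "M"): 39, ("D", "F"): 40,
}

def conditional(counts, target, given):
    """P(target | given), where each argument is one label such as 'D' or 'M'."""
    # Students in the 'given' row or column.
    given_total = sum(n for key, n in counts.items() if given in key)
    # Students who are in both the 'given' and the 'target' groups.
    both = sum(n for key, n in counts.items() if given in key and target in key)
    return Fraction(both, given_total)

p_D_given_M = conditional(counts, "D", "M")
p_F_given_D = conditional(counts, "F", "D")
p_M_given_P = conditional(counts, "M", "P")
print(p_D_given_M, p_F_given_D, p_M_given_P)
```

Dividing the joint count by the count of the conditioning group gives the same answer as dividing the two probabilities, because the common factor 1/100 cancels.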
Multiplication Rule
Events are called independent if the occurrence of one does not affect the probability of
the other. That is, P(A|B) = P(A).
So P(A | B) = P(A ∩ B) / P(B) = P(A).
This would mean P(A ∩ B) = P(A) ∙ P(B). Thus the multiplication rule says that if A and B
are independent events, then you can multiply the probabilities together to get the joint
probability (the probability of the intersection). That is, P(A ∩ B) = P(A) ∙ P(B).
In general, if events Ai for i = 1, 2, 3, …, n are independent, then
P(A1 ∩ A2 ∩ ... ∩ An) = P(A1) ∙ P(A2) ∙ ... ∙ P(An).
Example
A die and a coin are thrown together once. What is the probability of getting an even
number on the die and a head on the coin?
Solution
Let the event of an even number on the die be denoted as E and a head on the coin as H.
Then we need the event E and H, also denoted as E ∩ H. But these two events are
independent, that is, they cannot influence each other since they occur on two different
objects. Thus
P(E ∩ H)=P(E)×P(H)
=3/6×1/2
=1/4
Example
The probability that Jaime will visit his aunt in Mzuzu this year is 0.30, and the
probability that he will go river rafting on Rukuru river is 0.50. If the two events are
independent, what is the probability that Jaime will do both?

Solution:
Let A be the event that Jaime will visit his aunt this year, and R be the event that he will
go river rafting. We are given P(A) = 0.30 and P(R) = 0.50, and we want to find P(A ∩ R).
Since we are told that the events A and R are independent,
P(A ∩ R) = P(A) × P(R) = (0.30)(0.50) = 0.15.
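Independence in the die and coin example can be checked by listing every outcome. The sketch below assumes a fair die and a fair coin; the names outcomes, prob, is_even and is_head are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Sample space for throwing one die and one coin together: 12 equally likely pairs.
outcomes = list(product(range(1, 7), ["H", "T"]))

def prob(event):
    """Probability of an event given as a true/false test on a (die, coin) pair."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

def is_even(o):
    return o[0] % 2 == 0      # even number on the die

def is_head(o):
    return o[1] == "H"        # head on the coin

p_E = prob(is_even)
p_H = prob(is_head)
p_E_and_H = prob(lambda o: is_even(o) and is_head(o))

# For independent events the joint probability equals the product of the two.
print(p_E_and_H, p_E * p_H)
```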

Example
Given P(B | A) = 0.4. If A and B are independent, find P(B).
Solution
If A and B are independent, then by definition P(B | A) = P(B). Therefore, P(B) = 0.4
Activity 3.4
A cow gives birth twice in ten years. How can you get the probability of a female birth at
the second birth given a male birth at the first birth?
3.6 Law of total probability and Bayes rule
3.6.1 Law of total probability
Let the sample space S be partitioned by events A1, A2, ..., An, i.e.
S = A1 ∪ A2 ∪ ... ∪ An and Ai ∩ Aj = ∅ for i ≠ j. Let B be an event in S, so that
B ⊆ A1 ∪ A2 ∪ ... ∪ An, that is,

Fig 3.6: Partition of S by events

We have p(B)=p(B|A1)p(A1)+p(B|A2)p(A2)+……+p(B|An)p(An)
Proof:
Note B = B ∩ S = (B ∩ A1) ∪ (B ∩ A2) ∪ ... ∪ (B ∩ An), so
P(B) = p(B ∩ A1) + p(B ∩ A2) + ... + p(B ∩ An), by the addition rule of probability. Note
the intersection terms of the addition rule are 0, as the events B ∩ Ai for
i = 1, 2, 3, ..., n are disjoint.
= p(B|A1)p(A1) + p(B|A2)p(A2) + …… + p(B|An)p(An), since P(B|A) = P(B ∩ A)/P(A),
which means P(B ∩ A) = P(B|A)P(A)
= Σ p(B|Ai)p(Ai), summed over i = 1, ..., n, in compact form.
In the law of total probability, the total probability of an event B depends on the
probabilities of all the other events that result in its occurrence.
Example
The following contingency table gives the results of operations in a hospital according to
the complexity of the operation.

                 Simple   Complex   Total
Successful        1990      950     2940
Unsuccessful        10       50       60
Total             2000     1000     3000

Solution
Let A be the event that an operation is simple and B the event that an operation is
successful.
P(A) = P(A|B)×P(B) + P(A|B^C)×P(B^C), because for A to occur it is either due to the
occurrence of B or of not B.
= (1990/2940)×(2940/3000) + (10/60)×(60/3000)
= 1990/3000 + 10/3000
= 2000/3000 = 2/3
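The law of total probability for the hospital table can be sketched as follows. The variable names are assumptions made for illustration; the four cell counts are the ones in the table, and the last lines compare the result with the proportion of simple operations read directly from the column total.

```python
from fractions import Fraction

# Cell counts from the contingency table.
simple_successful, complex_successful = 1990, 950
simple_unsuccessful, complex_unsuccessful = 10, 50

successful = simple_successful + complex_successful        # row total for B
unsuccessful = simple_unsuccessful + complex_unsuccessful  # row total for not B
total = successful + unsuccessful

# B = operation is successful; A = operation is simple.
p_B = Fraction(successful, total)
p_not_B = Fraction(unsuccessful, total)
p_A_given_B = Fraction(simple_successful, successful)
p_A_given_not_B = Fraction(simple_unsuccessful, unsuccessful)

# Law of total probability over the partition {B, not B}.
p_A = p_A_given_B * p_B + p_A_given_not_B * p_not_B

# Direct check from the column total of simple operations.
p_A_direct = Fraction(simple_successful + simple_unsuccessful, total)

print(p_A, p_A_direct)
```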
3.6.2 Bayes rule
The Bayes rule of probability was coined by Thomas Bayes (1761).
The Bayes theorem is stated as follows: suppose A1, A2, ..., An again partition S, i.e.
S = A1 ∪ A2 ∪ ... ∪ An and Ai ∩ Aj = ∅ for i ≠ j. Let B be an event in S, so that
B ⊆ A1 ∪ A2 ∪ ... ∪ An. Now B may occur after the occurrence of A1, A2, A3, ..., or An,
and thus by the law of total probability
P(B) = p(B|A1)p(A1) + p(B|A2)p(A2) + …… + p(B|An)p(An).
Now suppose B has occurred after Ak occurred. We want the probability that any one of the
events Ak, for k = 1, 2, 3, …, n, made B occur given that B has occurred, denoted as
p(Ak|B). This probability is found by the Bayes Theorem:
p(Ak|B) = p(B|Ak)p(Ak) / p(B)
        = p(B|Ak)p(Ak) / Σ p(B|Ai)p(Ai), where the sum runs over i = 1, ..., n.
Note p(B) has to be the total probability of B. The Bayes theorem seeks to find the
probability of the cause (Ak) given the effect (B).

Example
In a binary communication system a zero and a one are transmitted with probability 0.6
and 0.4 respectively. Due to error in the communication system, a zero becomes a one
with probability 0.1 and a one becomes a zero with probability 0.08. Determine the
probability
(i) of receiving a one and
(ii) that a one was transmitted when the received message is one.
Solution
Let S be the sample space corresponding to binary communication. Let T0 be the event of
transmitting 0 and T1 the event of transmitting 1, and let R0 and R1 be the corresponding
events of receiving 0 and 1 respectively.
Given P(T0) = 0.6, P(T1) = 0.4, P(R1 | T0) = 0.1 and P(R0 | T1) = 0.08, so that
P(R1 | T1) = 1 – 0.08 = 0.92.
(i) P(R1) = Probability of receiving 'one'
= P(T1) P(R1 | T1) + P(T0) P(R1 | T0)
= 0.4 × 0.92 + 0.6 × 0.1
= 0.428
(ii) Using Bayes' rule
P(T1 | R1) = P(T1) P(R1 | T1) / P(R1)
= P(T1) P(R1 | T1) / [P(T1) P(R1 | T1) + P(T0) P(R1 | T0)]
= (0.4 × 0.92) / (0.4 × 0.92 + 0.6 × 0.1)
= 0.8598
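One way to organise this kind of calculation is to build the joint probability of each (transmitted, received) pair first. The sketch below takes that route; the dictionaries prior, flip and joint are names assumed for illustration, and the inputs are the four probabilities given in the example.

```python
from fractions import Fraction

# Probability of transmitting each bit.
prior = {0: Fraction(6, 10), 1: Fraction(4, 10)}
# Probability that the channel flips the transmitted bit.
flip = {0: Fraction(10, 100), 1: Fraction(8, 100)}

# joint[(t, r)] = P(bit t transmitted and bit r received)
joint = {}
for t in (0, 1):
    for r in (0, 1):
        p_r_given_t = flip[t] if r != t else 1 - flip[t]
        joint[(t, r)] = prior[t] * p_r_given_t

# (i) Law of total probability: add the two ways of receiving a one.
p_R1 = joint[(0, 1)] + joint[(1, 1)]

# (ii) Bayes' rule: share of "received one" that came from "transmitted one".
p_T1_given_R1 = joint[(1, 1)] / p_R1

print(float(p_R1), float(p_T1_given_R1))
```

The four joint probabilities add up to one, which is a quick check that the inputs were entered consistently.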
Example
In an electronics laboratory, there are identical-looking capacitors of three makes
A1, A2 and A3 in the ratio 2:3:4. It is known that 1% of A1, 1.5% of A2 and 2% of A3 are
defective. What percentage of capacitors in the laboratory are defective? If a capacitor
picked at random is found to be defective, what is the probability it is of make A3?
Solution
Let D be the event that the item is defective. Here we have to find P(D) and P(A3 | D).
Here P(A1) = 2/9, P(A2) = 3/9 = 1/3 and P(A3) = 4/9.
The conditional probabilities are:
P(D | A1) = 0.01, P(D | A2) = 0.015 and P(D | A3) = 0.02.

P(D) = P(A1) P(D | A1) + P(A2) P(D | A2) + P(A3) P(D | A3)
= (2/9) × 0.01 + (1/3) × 0.015 + (4/9) × 0.02
= 0.0161
so about 1.61% of the capacitors are defective, and
P(A3 | D) = P(A3) P(D | A3) / P(D)
= (4/9) × 0.02 / 0.0161
= 0.552
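The law of total probability and the Bayes rule for any number of causes can be wrapped in two small functions. This is a sketch under stated assumptions: the function names total_probability and posterior are hypothetical, the priors must come from a partition of the sample space, and the inputs used here are the ratios and defect rates of the capacitor example.

```python
def total_probability(priors, likelihoods):
    """P(B) = sum of P(A_i) * P(B | A_i) over a partition A_1, ..., A_n."""
    return sum(p * l for p, l in zip(priors, likelihoods))

def posterior(priors, likelihoods, k):
    """P(A_k | B) by Bayes' rule; k is a zero-based position in the lists."""
    return priors[k] * likelihoods[k] / total_probability(priors, likelihoods)

# Makes A1, A2, A3 in the ratio 2:3:4, with their defect rates.
priors = [2 / 9, 3 / 9, 4 / 9]
likelihoods = [0.01, 0.015, 0.02]

p_D = total_probability(priors, likelihoods)
p_A3_given_D = posterior(priors, likelihoods, 2)   # index 2 is make A3

print(round(p_D, 4), round(p_A3_given_D, 3))
```

Keeping the priors and the conditional probabilities in parallel lists means the same two functions work unchanged for two causes, three causes, or more.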

Activity 3.5
A chicken has bird flu in Malawi if it is from Madagascar or Indonesia.
It is known that 1% of chickens are from Madagascar and 15% of chickens
are from Indonesia. The probability that a chicken has flu given that it is from
Madagascar is 0.2 and that it has flu given it is from Indonesia is 0.06.
Find the probability that a chicken has flu.

3.7 Practice activity


In a recent survey in a Statistics class, it was determined that only 60%
of the students attend class on Thursday. From past data it was noted that
98% of those who went to class on Thursday passed the course, while only
20% of those who did not go to class on Thursday passed the course.
(a) What percentage of students is expected to pass the course?
(b) Given that a student passes the course, what is the probability that
he/she attended classes on Thursday?

3.8 Reflection
Which of the two rules, the Bayes rule or the law of total probability, is used to find the
probability of a cause given the effect?

Unit Summary

In this unit you have learnt basic elements of probability theory. You have learnt that
the probability of an event is never negative and that it is less than or equal to one. You
have also learnt about basic rules of probability like the multiplication and addition rules,
the law of total probability and the Bayes theorem. What you have learnt in this unit forms
the foundation for further probability theory and statistical inference, such as Bayesian
inference, which stems from Bayes rule.
End of unit test

1. Consider two events: E and F. We know that P(E)=P(F)=0.7.

a) Are the two events E and F disjoint?
b) If E and F were known to be independent, what is the probability that
they occur simultaneously?
c) If E and F were known to be independent, what is the probability that at
least one of them occurs?
d) Suppose we know that P(F|E)=0.9. What is the probability that at least
one of them occurs?
e) Suppose we know that P(F|E)=0.9. What is P(E|F)?

2. A prisoner has just escaped from jail. There are three roads leading away from
the jail. If the prisoner selects road A to make good her escape, the probability
that she succeeds is 0.25. If she selects road B, the probability that she succeeds
is 0.2. If she selects road C, the probability that she succeeds is (1/6).
Furthermore, the probability that she selects each of these roads is the same. It
is (1/3). If the prisoner succeeds in her escape, what is the probability that she
made good escape by using road B?

Answers to Unit 3 Activities
Answer to Activity 3.1
                         die
             1     2     3     4     5     6
coin   H    H,1   H,2   H,3   H,4   H,5   H,6
       T    T,1   T,2   T,3   T,4   T,5   T,6
The sample space is the set
S={(H,1),(H,2),(H,3),(H,4),(H,5),(H,6),(T,1),(T,2),(T,3),(T,4),(T,5),(T,6)}
Answer to Activity 3.2
Probability of not learning today is 1-0.8=0.2
Answer to Activity 3.3
p(E1 ∪ E2) = p(E1) + p(E2), since the intersection part is 0, i.e. we cannot have a birth
that is male and at the same time female.
Answer to Activity 3.4
Let F be the event of a female birth at the second birth and M the event of a male birth at
the first birth. We need P(F | M), which is P(F | M) = P(F), since these two events are
independent.
Answer to Activity 3.5
Let B be the event that a chicken has flu, and let A1 be the event that the chicken is from
Madagascar, and A2 be the event that the chicken is from Indonesia. We need p(B).
Now p(B)=p(B|A1)p(A1)+p(B|A2)p(A2)
=0.2×0.01+0.06×0.15
=0.011
Answer to Unit Activity
A1: the students attend class on Thursday
A2: the students do not attend class on Thursday
B1: the students pass the course
B2: the students do not pass the course
(a) P(A1) = 0.6, P(A2) = 1 – P(A1) = 0.4, P(B1 | A1) = 0.98, P(B1 | A2) = 0.2
P(B1) = P(A1) P(B1 | A1) + P(A2) P(B1 | A2)
= 0.6 × 0.98 + 0.4 × 0.2
= 0.668
b) By Bayes’ theorem,

64 50
50
50
(b) By Bayes' theorem,
$P(A_1 \mid B_1) = \frac{P(A_1) P(B_1 \mid A_1)}{P(A_1) P(B_1 \mid A_1) + P(A_2) P(B_1 \mid A_2)}$
$= \frac{0.6 \times 0.98}{0.6 \times 0.98 + 0.4 \times 0.2} = \frac{0.588}{0.668}$
$\approx 0.880$

Answers to Unit test


1. (a) No; P(E)+P(F)=1.4;
(b) P(E and F)=P(E|F)×P(F)=P(E)×P(F)=0.7×0.7=0.49
(c) P(E or F)=P(E)+P(F)-P(E and F)=0.7+0.7-0.49=0.91;
(d) P(E or F)=P(E) + P(F) - P(E and F) = P(E) + P(F) - P(E|F)×P(F) = 1.4 - 0.9×0.7 =
0.77
(e) P(E|F)=P(E and F)/P(F)=P(F|E)×P(E)/P(F)=(0.9×0.7)/0.7 =0.9
2. We will use Bayes' Rule:
P(road B|succeeds) = P(succeeds|road B)×P(road B)/{P(succeeds|road A)×P(road A) + P(succeeds|road B)×P(road B) + P(succeeds|road C)×P(road C)}
= (1/5)×(1/3)/{(1/4)×(1/3) + (1/5)×(1/3) + (1/6)×(1/3)} = 12/37
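
The Bayes' Rule calculation in answer 2 can be checked with a short Python sketch. The function name posterior and the dictionaries priors and likelihoods are illustrative choices, not part of the module; the numbers (each road chosen with probability 1/3, success probabilities 1/4, 1/5 and 1/6) are taken from the answer above.

```python
from fractions import Fraction


def posterior(priors, likelihoods, target):
    """Bayes' rule: P(target | evidence) over a partition of hypotheses."""
    # Total probability of the evidence, summed over all hypotheses.
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    return priors[target] * likelihoods[target] / evidence


# P(road) for each road, and P(succeeds | road), as used in answer 2.
priors = {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)}
likelihoods = {"A": Fraction(1, 4), "B": Fraction(1, 5), "C": Fraction(1, 6)}

p_road_b = posterior(priors, likelihoods, "B")
print(p_road_b)  # 12/37
```

Exact fractions are used so that the result can be compared directly with 12/37 without rounding.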

Unit 4:

Random variables and probability distribution

4.0 Introduction

In this unit you will learn about random variables. Specifically, you will learn what a random variable is and the types of random variables. You will also learn about some discrete and continuous random variables. The discrete random variables covered are the binomial, Poisson, multinomial and hypergeometric random variables, while the continuous random variables are the normal, t, chi-square and F random variables. This is followed by the sampling distribution of the sample mean and related quantities. The unit is important because it prepares you for statistical inference, that is, hypothesis testing and interval estimation, which use the z-test, t-test, chi-square test and F-test.

4.1 Unit Objectives

a) understand the meaning of a random variable and its distribution
b) identify the name of a discrete random variable, e.g. Poisson
c) find the probability of a discrete random variable
d) apply properties of a continuous distribution to solve probability problems
e) understand the normal random variable
f) find the probability of a standard normal using standard normal tables
g) use the standard normal to find the probability of a normal random variable
h) state the sampling distribution of the sample mean under different sampling scenarios.

Key terms

• Random variable
• Probability distribution
• Sampling distribution

4.2 Random variable and probability distribution

Definition (random variable): A random variable is a numerical description of the outcome of an experiment.

Example

1. Consider an experiment of tossing a coin; the sample space is the set S = {H, T}. Now let X=1 if H appears and let X=0 if T appears; then X is a random variable.
2. Consider the number of HIV infections in a year. Let X be that number; then X=0,1,2,3,… is a random variable.
3. Consider the experiment of observing students' heights in a class. Let X be the height of a student, e.g. X=1cm, 2.5cm,…; then X is a random variable.

There are two common types of random variables. They are:

Discrete random variable: a quantity that assumes either a finite number of values or an infinite sequence of values, such as 0, 1, 2, …. Examples are the number of HIV infections in a year and the number of children a family has in a lifetime.
Continuous random variable: a quantity that assumes any numerical value in an interval, such as time, height, weight, distance, and temperature.
We will denote the name of a random variable by a capital letter, e.g. X, and a value of the random variable by a small letter, e.g. x.
Definition (probability distribution (or density) function (p.d.f)): a function that describes how probabilities are distributed over the values of the random variable.

Example
Consider an experiment of tossing a coin and let X be the outcome of the experiment. Find the probability distribution of X.
Solution
X          H      T
P(X=x)     0.5    0.5

Note we have used a table to show how probabilities are distributed over the values of the random variable X.
We can also just define the probabilities for each value of the random variable to show the distribution of probabilities:
S={H,T}
P(X=H)=0.5
P(X=T)=0.5
OR we may use a formula to define the probability distribution for X:
P(X=x)=0.5, where x=H,T
Example
Find the probability distribution when tossing a die.
Solution
X          1      2      3      4      5      6
P(X=x)     1/6    1/6    1/6    1/6    1/6    1/6
Or
P(X=x)=1/6, where x=1,2,3,4,5,6.
Now, sometimes we may present the probability distribution by using a density plot, that is, by plotting the probabilities against the values of the random variable.
Example
The following probability distribution table shows the probability of selecting 0, 1, 2, 3, or 4 red chips when 4 chips are selected. Represent the probability distribution using a histogram plot.
x      P(x)
0      0.24
1      0.412
2      0.265
3      0.076
4      0.008

Solution

Fig 4.1: Probability density plot
Thus there are three ways to display a probability distribution for a discrete random variable:
(1) through a table
(2) through a formula/equation
(3) through a density plot, e.g. a histogram
So far we have had examples of discrete probability distributions because the random variables are discrete. Note that we can also have probability distributions for continuous random variables, but at the moment we will focus on discrete probability distributions.
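
As a rough illustration of the three ways of displaying a discrete distribution, the die example can be written in Python as a table (a dictionary), as a formula (a function) and as a crude text plot. The names die_table, die_formula, is_valid_pmf and text_plot are made up for this sketch, and the text plot only stands in for a proper histogram.

```python
from fractions import Fraction

# Way 1: a table, stored as a mapping from each value x to P(X = x).
die_table = {x: Fraction(1, 6) for x in range(1, 7)}


# Way 2: a formula, written as a function of x.
def die_formula(x):
    return Fraction(1, 6) if x in range(1, 7) else Fraction(0)


def is_valid_pmf(table):
    """Every probability is non-negative and the probabilities sum to 1."""
    return all(p >= 0 for p in table.values()) and sum(table.values()) == 1


# Way 3: a crude text plot, one bar of '#' per value,
# with bar length proportional to the probability.
def text_plot(table, width=30):
    lines = []
    for x, p in sorted(table.items()):
        lines.append(f"{x}: " + "#" * round(float(p) * width))
    return "\n".join(lines)


print(is_valid_pmf(die_table))
print(text_plot(die_table))
```

The table and the formula carry the same information; the check in is_valid_pmf is what makes either of them a probability distribution.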
Activity 4.1

A coin is tossed two times. Let X denote the number of tails. Write the probability distribution for X.

4.3 Discrete probability distributions

4.3.1 Properties of discrete distributions
The following are properties of discrete random variables:
1) ∑ P(X) = 1
2) P(X) ≥ 0
3) $P(X \le x) = \sum_{t \le x} P(X = t)$

Property 3 is about cumulative probability, where probabilities are added up to the specified value of the random variable.
Example
Find the value of c so that the following function is a probability distribution: P(X=x)=c(x+2), where x=0,1,2,3, and hence find P(X≤1).
Solution
Note ∑ P(X) = 1, for all X, if P(X) is a probability distribution.
P(X=0)+P(X=1)+P(X=2)+P(X=3)=1 (note that X can only take 0, 1, 2, 3)
c(0+2)+c(1+2)+c(2+2)+c(3+2)=1, since P(X=x)=c(x+2)
2c+3c+4c+5c=1,
so c=1/14.
Now P(X≤1)=P(X=0)+P(X=1)
=1/14(0+2)+1/14(1+2)
=1/7+3/14
=5/14
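
The same steps can be traced in a few lines of Python. This is only a sketch of the worked example P(X=x)=c(x+2) on x=0,1,2,3; the names support, pmf and p_at_most_1 are illustrative.

```python
from fractions import Fraction

support = [0, 1, 2, 3]

# P(X = x) = c (x + 2); the probabilities must sum to 1,
# so c = 1 / (sum of (x + 2) over the support).
c = Fraction(1, sum(x + 2 for x in support))


def pmf(x):
    return c * (x + 2)


# Cumulative probability P(X <= 1) adds the probabilities of all values up to 1.
p_at_most_1 = sum(pmf(x) for x in support if x <= 1)

print(c, p_at_most_1)  # 1/14 5/14
```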
4.3.2 Mean and variance of a discrete probability distribution
The expected value or mean is a single average value that summarizes a probability distribution. We denote the expected value of X by E(X), and it is defined as: E(X) = ∑ XP(X). Now the variance of a discrete random variable X is defined as:
$\sigma^2 = \mathrm{Var}(X) = E[(X - E(X))^2] = \sum [X - E(X)]^2 P(X)$, this is because E(X) = ∑ XP(X).
Example
Toss a coin twice and let X = number of heads. Find E(X) and Var(X).
Solution
The probability distribution is
X          0       1      2
P(X=x)     0.25    0.5    0.25

E(X) = ∑ XP(X)
= 0×0.25 + 1×0.5 + 2×0.25
= 1 (the number of heads we can expect to get if we toss a coin twice).
$\sigma^2 = \mathrm{Var}(X) = E[(X - E(X))^2] = \sum [X - E(X)]^2 P(X)$
$= (0-1)^2 \times 0.25 + (1-1)^2 \times 0.5 + (2-1)^2 \times 0.25$
= 0.25 + 0 + 0.25
= 0.5
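
The definitions of E(X) and Var(X) translate directly into code. The sketch below assumes the distribution is given as a dictionary from values to probabilities; the function names mean and variance and the variable heads_in_two_tosses are illustrative.

```python
def mean(pmf):
    """E(X) = sum of x * P(X = x) over all values x."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Var(X) = sum of (x - E(X))^2 * P(X = x) over all values x."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


# Number of heads in two tosses of a fair coin.
heads_in_two_tosses = {0: 0.25, 1: 0.5, 2: 0.25}
print(mean(heads_in_two_tosses), variance(heads_in_two_tosses))  # 1.0 0.5
```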
Example
In a particular game, a coin is tossed. If the coin comes up a head, the player wins $100. If the coin comes up a tail, the player loses $50. What is the expected value of the game?
Solution
X (Dollar value)     $100    −$50
P(X=x)               1/2     1/2
Now E(X) = ∑ XP(X)
= $100×1/2 + (−$50)×1/2
= $25
4.3.3 Examples of discrete random variables and their distributions
4.3.3.1 Discrete uniform random variable
It is the simplest discrete random variable. The discrete uniform random variable X is when the probability of observing a particular value of X is equal for all possible values of X. Since the probabilities are the same, this random variable is called the discrete uniform and its probability distribution is called the discrete uniform distribution. If the random variable X assumes values x1, x2, …, xk with equal probabilities, then the discrete uniform distribution is given by $P(X = x; k) = \frac{1}{k}$. Note that x1, x2, …, xk are just a convenient way to list all possible values that X can take on. The following are examples of discrete uniform random variables and distributions:
1. When you throw a die, the outcome on the die is a discrete uniform random variable because the probability for every value is the same, 1/6.
Probability distribution:
X          1      2      3      4      5      6
P(X=x)     1/6    1/6    1/6    1/6    1/6    1/6
2. When you toss a coin, the outcome on the coin is discrete uniform, where the probability of each outcome is 1/2.
Probability distribution:
X          H      T
P(X=x)     0.5    0.5

4.3.3.2 Binomial random variable
The binomial random variable arises from the binomial experiment. The following explains the binomial experiment:
1. The experiment is repeated for a fixed number of trials, where each trial is independent of other trials.
2. There are only two possible outcomes of interest for each trial. The outcomes can be classified as a success (S) or as a failure (F).

Trial 1    Trial 2    Trial 3, …,    Trial n
S/F        S/F        S/F, ...,      S/F

3. The probability of a success S, denoted by p, is the same for each trial; similarly the probability of failure is denoted by 1-p.
4. The binomial random variable, call it X, is the number of successes in the n independent trials.

Now in a binomial experiment, the probability of getting exactly x successes in n trials, which we call the probability distribution of X, is
$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$, where x=0,1,2,…,n and $\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$.
This is the binomial probability distribution in formula form. The mean and variance of a binomial random variable are given as E(X) = np and V(X) = np(1-p) respectively.
Example
Assume a farmer is examining fruits to see whether they are ripe or not. The probability of a ripe fruit is 0.1. If the farmer examines 10 of the fruits, (a) what is the probability that 2 fruits are ripe? (b) What is the probability that at most 2 fruits are ripe?
Solution
Examining 10 fruits is a binomial experiment because for each examination there are two possible outcomes (ripe/not ripe). Let X be the number of ripe fruits.
(a) We need P(X=2).
Using the formula $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$, where n = 10, x = 2, p = 0.10, we have:
$P(X = 2) = \binom{10}{2} (0.10)^2 (0.90)^8 = 0.1937$
(b) What is the probability that at most 2 fruits are ripe?
It means we need P(X≤2).
Using the formula for cumulative probability P(X≤x), we have
$P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2)$
$= \binom{10}{0} (0.10)^0 (0.90)^{10} + \binom{10}{1} (0.10)^1 (0.90)^9 + \binom{10}{2} (0.10)^2 (0.90)^8$
= 0.3487 + 0.3874 + 0.1937
= 0.9298
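
The fruit example can be reproduced with a small Python sketch of the binomial formula. The function names binomial_pmf and binomial_cdf are illustrative; math.comb (Python 3.8 or later) supplies the number of ways of choosing x successes out of n trials.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_cdf(x, n, p):
    """P(X <= x), adding the probabilities of 0, 1, ..., x successes."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))


n, p = 10, 0.1  # 10 fruits examined, probability of a ripe fruit 0.1
print(round(binomial_pmf(2, n, p), 4))  # 0.1937
print(round(binomial_cdf(2, n, p), 4))  # 0.9298
```

Summing binomial_pmf over x = 0, 1, …, n gives 1, which is property 1 of a discrete distribution.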
4.3.3.3 Bernoulli random variable and probability distribution

Bernoulli experiment:
1. There is one trial
2. In the trial there are two possible outcomes, success (S) or failure (F)
3. The probability of success is denoted as p and the probability of failure is denoted as 1-p

Now the Bernoulli random variable, call it X, is the number of successes in the one trial, thus X=0,1. The probability distribution function is defined as $P(X = x) = p^x (1-p)^{1-x}$, where x=0,1.

Note that the Bernoulli probability distribution is a special case of the binomial distribution where the number (n) of trials is 1. Thus E(X)=p and V(X)=p(1-p).
Examples of Bernoulli random variables
1. When tossing a coin once, the number of times we have a head (X=0,1)
2. When undergoing HIV testing, the number of times one has a positive result (X=0,1)
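
A quick numerical check that E(X)=p and V(X)=p(1-p) for a Bernoulli random variable is sketched below. The value p = 0.3 is a hypothetical success probability chosen only for illustration, and the names bernoulli_pmf, pmf, mean and variance are not from the module.

```python
def bernoulli_pmf(x, p):
    """P(X = x) = p^x (1 - p)^(1 - x) for x in {0, 1}, and 0 otherwise."""
    if x not in (0, 1):
        return 0.0
    return p**x * (1 - p) ** (1 - x)


p = 0.3  # hypothetical success probability, for illustration only
pmf = {x: bernoulli_pmf(x, p) for x in (0, 1)}

# Mean and variance computed from the definitions for a discrete distribution.
mean = sum(x * q for x, q in pmf.items())
variance = sum((x - mean) ** 2 * q for x, q in pmf.items())
```

With these definitions mean agrees with p and variance agrees with p(1-p), up to floating-point rounding.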
4.3.3.4 Poisson random variable and distribution
The Poisson distribution is a discrete probability distribution of a random variable X that satisfies the following conditions. The experiment consists of counting the number of times an event occurs in a given continuous interval/space. The interval can be an interval of time, area, or volume. The probability of the event occurring is the same for each interval. The number of occurrences in one interval is independent of the number of occurrences in other intervals. The probability of exactly x occurrences in an interval is given by $P(X = x) = \frac{\mu^x e^{-\mu}}{x!}$, where x=0,1,2,…, $e \approx 2.71828$ and μ is the mean number of occurrences. The mean and variance for the Poisson random variable are the same, that is, E(X)=μ and V(X)=μ. The following are examples of Poisson random variables:
1. Number of HIV infections in a year
2. Number of cars passing in an hour on the road
3. Number of typographical errors on each page of a Word document
4. Number of fungal infections per cm2 on a maize plant

Example
The mean number of extension visits to a farmer in Lilongwe rural is 4 per year. (a) Find the probability that in a given year there are exactly 3 visits. (b) Find the probability that in a given year there are at most 2 visits.
Solution
The number of extension visits to a farmer is a Poisson random variable with mean μ=4, since you are looking at the number of occurrences in a continuous interval, time.
(a) We need P(X=3). In this case we use the formula $P(X = x) = \frac{\mu^x e^{-\mu}}{x!}$.
Now
$P(X = 3) = \frac{4^3 e^{-4}}{3!} \approx 0.195$
Note that $e^{-4} \approx 0.018316$, so $P(X = 3) \approx 64 \times 0.018316 / 6$.
(b) Now here we need P(X≤2), the cumulative probability up to X=2.
$P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = \frac{4^0 e^{-4}}{0!} + \frac{4^1 e^{-4}}{1!} + \frac{4^2 e^{-4}}{2!}$
= 0.018316 + 0.073263 + 0.146525
= 0.238104
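
The extension-visits example can be checked with a short Python sketch of the Poisson formula with mean μ=4. The function names poisson_pmf and poisson_cdf are illustrative.

```python
from math import exp, factorial


def poisson_pmf(x, mu):
    """P(X = x) = mu^x e^(-mu) / x! for x = 0, 1, 2, ..."""
    return mu**x * exp(-mu) / factorial(x)


def poisson_cdf(x, mu):
    """P(X <= x), adding the probabilities of 0, 1, ..., x occurrences."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1))


mu = 4  # mean number of extension visits per year
print(round(poisson_pmf(3, mu), 4))  # 0.1954
print(round(poisson_cdf(2, mu), 4))  # 0.2381
```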
4.3.3.5 Multinomial random variable
The multinomial random variable arises from the multinomial experiment. The following describes the multinomial experiment:
1. There are n independent trials
2. In each trial there are k possible kinds of outcomes, instead of two as is the case with the binomial
3. The probability of each kind of outcome in each trial, i.e. p1, p2, p3, …, pk, is constant across the n trials.
Now let X=(X1, X2, …, Xk) be the number of each kind of outcome in n trials. Then X represents a multinomial random variable.

Example 1
Suppose you are testing 10 cows for disease, and in each trial there is more than one kind of disease (success) that may be found. Assuming the probability of each kind of disease is the same in all trials, the number of each kind of disease in 10 trials is multinomial.

Example 2
Suppose you are testing 1000 children for severity of child anaemia. The outcome for each child tested may be mild, moderate or severe anaemia. Assuming that the probability of each kind of anaemia is the same across the 1000 tests, the number of each kind of anaemia in 1000 tests is multinomial.
The probability distribution of a multinomial random variable is given by
$P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k) = \frac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$, where $x_1 + x_2 + \cdots + x_k = n$.
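
The multinomial formula can be sketched in Python as follows. The figures used (5 children tested, with probabilities 0.5, 0.3 and 0.2 for mild, moderate and severe anaemia) are hypothetical and are not taken from the module; the function name multinomial_pmf is also illustrative.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """P(X1 = x1, ..., Xk = xk) for n = sum(counts) independent trials."""
    n = sum(counts)
    # Multinomial coefficient n! / (x1! x2! ... xk!), kept as an exact integer.
    coefficient = factorial(n)
    for x in counts:
        coefficient //= factorial(x)
    return coefficient * prod(p**x for p, x in zip(probs, counts))


# Hypothetical figures: 2 mild, 2 moderate and 1 severe case among 5 children,
# with P(mild) = 0.5, P(moderate) = 0.3, P(severe) = 0.2.
probability = multinomial_pmf([2, 2, 1], [0.5, 0.3, 0.2])
```

With only two kinds of outcome (k = 2) the function reduces to the binomial formula, for example multinomial_pmf([2, 8], [0.1, 0.9]) matches the probability of 2 ripe fruits out of 10.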
4.3.3.6 Hypergeometric random variable
4.3.3.6Hypergeometric
4.3.3.6 Hypergeometric random
random variable
variable
Consider
Considera population of size N consisting of m items withwitha characteristic of interest ,
Considera apopulation
populationofofsize
sizeNNconsisting
consistingofofmmitems
items witha acharacteristic
characteristicofofinterest
interest ,
thus, there
thus are N-m
are items without a characteristic of interest e.g a population with mwithHIV
thus there
there are N-m items
N-m items without
without aa characteristic
characteristic of interest
of interest e.gaapopulation
e.g population with mmHIV
and HIV
N-mand with no HIV.
and N-mN-m withwith no HIV.
no HIV.

Fig 4.2: Scenario of hyper geometric random variable


Fig 4.2: Scenario of hyper geometric random variable
Suppose one gets a sample of n items from the population without replacement; then the number of items, X, with the characteristic of interest in this sample is the hypergeometric random variable. The following are examples of hypergeometric random variables:
1. Number of people with HIV in a sample of 10 individuals from a statistics class.
2. Consider a sample of 50 students from a statistics class; the number of females in such a sample is hypergeometric.
The probability distribution of a hypergeometric random variable is defined as:
$$P(X = x) = \frac{\binom{m}{x}\binom{N-m}{n-x}}{\binom{N}{n}}, \quad \text{where } 0 \le x \le m \text{ and } 0 \le n - x \le N - m.$$
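As a rough illustration of the hypergeometric formula, the sketch below evaluates it with binomial coefficients. The function name `hypergeometric_pmf` and the class size, number of items with the characteristic, and sample size are hypothetical values, not taken from the module.

```python
from math import comb


def hypergeometric_pmf(x, N, m, n):
    """P(X = x): x items with the characteristic in a sample of n drawn
    without replacement from N items, m of which have the characteristic."""
    # Values of x outside the support have probability zero.
    if x < 0 or x > m or n - x < 0 or n - x > N - m:
        return 0.0
    return comb(m, x) * comb(N - m, n - x) / comb(N, n)


# Hypothetical class of N = 20 students, m = 5 with the characteristic,
# and a sample of n = 10 drawn without replacement.
pmf_values = [hypergeometric_pmf(x, 20, 5, 10) for x in range(0, 6)]
print(pmf_values)
print(sum(pmf_values))  # the probabilities over all possible x add up to 1
```

Summing the probabilities over every possible x is a quick check that the formula defines a proper probability distribution.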

Activity 4.2
1. Decide whether the experiment is a binomial or not.
a) You toss a coin ten times and you are looking at the number of times you will have a head.
b) You roll a die 10 times and note the number the die lands on.
2. State the name of the random variable below
a) Number of positive results when testing 100 cattle for Foot and Mouth disease
b) Number of maize stalks infected with stalk borer per hectare.

4.4 Property of continuous random variable and distribution
Recall from discrete probability that if X is a discrete random variable, then the probability that X takes a specific value is defined as P(X = x), where x is the specific value of X. Now if X is continuous, that is, it can assume any value in a given interval, e.g. a < X < b, then the probability that X takes a specific value is not P(X = x) but rather P(x < X < x + Δx) ≈ f(x)Δx, where Δx is a small change in X. Consequently we talk of X lying in an interval, so that we try to find P(a < X < b), which is the area under the curve, f(x), the probability density function.

Fig 4.3: Continuous density function


Now area under f(x) from a to b is obtained by a definite integral (see Mat 121)

Definition: If the distribution of a continuous random variable has a probability density function, f(x), then for any interval (a, b), we have
$$P(a \le X \le b) = P(a < X < b) = \int_a^b f(x)\,dx.$$

The following are properties of density functions:


(1) f(x) ≥ 0 for all possible x values. (Density functions always are on or above the
horizontal axis)

(2) $\int_{-\infty}^{\infty} f(x)\,dx = 1$. (Total area beneath the curve f(x) is exactly 1.)

(3) The cumulative distribution function (cdf), denoted by F(x), is
$P(X \le x) = F(x) = \int_{-\infty}^{x} f(t)\,dt$, which is the area under f(x) to the left of x.


Fig 4.4: Cumulative probability distribution


(4) $f(x) = \dfrac{dF(x)}{dx}$, that is, to get the density function, f(x), differentiate the cumulative distribution function, F(x).
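Example
Consider the probability density function f(x) = cx, where 0 < x < 2. Find the value of c.
Solution
Since $\int_{-\infty}^{\infty} f(x)\,dx = 1$, where f(x) = 0 outside 0 < x < 2,
$$\int_0^2 cx\,dx = \left[\frac{cx^2}{2}\right]_0^2 = 2c = 1,$$
so c = 1/2.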
Example
The delivery time X is uniformly distributed from 1 to 5 days, which gives
$$f(x) = \begin{cases} \dfrac{1}{4}, & 1 \le x \le 5 \\ 0, & \text{otherwise} \end{cases}$$
Find the probability that the delivery time is two or more days.
Solution
$$P(X \ge 2) = \int_2^5 \frac{1}{4}\,dx = \frac{1}{4}(5 - 2) = \frac{3}{4}.$$
Definition: The mean, or expected value, or expectation, of a continuous r.v. X with p.d.f. f(x) is given by
$$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx.$$
Definition: Let X be a continuous r.v. with p.d.f. f(x), and mean μ. The variance of X, or the variance of the distribution of X, is given by
$$\sigma^2 = V(X) = E[(X - E(X))^2] = E(X^2) - [E(X)]^2 = \int_{-\infty}^{\infty} (x - E(X))^2 f(x)\,dx.$$
The standard deviation of X is just the square root of the variance.
Example
For a lathe in a machine shop, let X denote the percentage of time out of a 40-hour workweek that the lathe is actually in use. Suppose X has a probability density function given by
$$f(x) = \begin{cases} 3x^2, & 0 \le x \le 1 \\ 0, & \text{elsewhere} \end{cases}$$
Find the mean and variance of X.
Solution
From the definition of expected value we have,
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^1 x(3x^2)\,dx = \int_0^1 3x^3\,dx = 3\left[\frac{x^4}{4}\right]_0^1 = \frac{3}{4} = 0.75.$$
Thus, on the average, the lathe is in use 75% of the time. To compute V(X), we first find E(X²) as follows:
$$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \int_0^1 x^2(3x^2)\,dx = \int_0^1 3x^4\,dx = 3\left[\frac{x^5}{5}\right]_0^1 = \frac{3}{5} = 0.60.$$
Then, $V(X) = E(X^2) - [E(X)]^2 = 0.60 - (0.75)^2 = 0.0375$.
Activity 4.3
1. Suppose Y, the grams of lead per liter of gasoline, has the density function f(y) = 12.5y – 1.25 for 0.1 < y < 0.5. What is the probability that the next liter of gasoline has less than 0.3 grams of lead?
2. Show that V(X) = E[(X − E(X))²] = E(X²) − [E(X)]²
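
The definitions of the mean and variance in this section can be checked numerically. The sketch below reuses the density f(x) = x/2 on 0 < x < 2 from the earlier example (where c = 1/2) and approximates each integral with a simple midpoint rule. The helper `integrate` and the number of steps are assumptions made for illustration, not part of the module.

```python
def integrate(g, a, b, steps=10_000):
    """Approximate the definite integral of g from a to b (midpoint rule)."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


def f(x):
    """Density f(x) = x/2 on 0 < x < 2 (c = 1/2 from the earlier example)."""
    return x / 2


total_area = integrate(f, 0, 2)                            # should be close to 1
mean = integrate(lambda x: x * f(x), 0, 2)                 # E(X)
second_moment = integrate(lambda x: x ** 2 * f(x), 0, 2)   # E(X^2)

# Variance computed two ways: shortcut formula and definition.
variance_shortcut = second_moment - mean ** 2
variance_definition = integrate(lambda x: (x - mean) ** 2 * f(x), 0, 2)

print(total_area, mean, variance_shortcut, variance_definition)
```

Both variance computations should agree (here with the exact value 2/9), which is the identity asked for in question 2 of the activity, illustrated for one particular density rather than proved.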

4.5 Some continuous random variables and distributions
4.5.1 Normal random variable and distribution
The probability density function for the normal distribution is given by:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty.$$

Fig 4.5: Normal density function


f(x), the height of the curve, represents the relative frequency (probability) at which the corresponding values occur. Smaller values (values to the far left) and larger values (values to the far right) have lower probability of occurrence than values in the centre. The following are examples of normal random variables:
1. People's height, that is, it is rare to see very tall and very short people and it is common to see people of medium height.
2. Students' grades, that is, most of the time it is rare for students to score low and high grades and it is common for students to score medium grades.
In the normal probability distribution, there are two parameters: μ for location and σ for shape. The following are normal curves with varying standard deviation, sigma (σ), but with the same centre (μ):
Fig 4.6: Normal distribution with different standard deviations


The following are properties of the normal density curve:
1. It is symmetric about its mean, μ; the area (probability) to the left of the mean is the same as the area to the right of the mean.
2. The mean = median = mode.
3. Total area under the curve (total probability) = 1.
4. Area under the curve to the right of μ equals the area under the curve to the left of μ, which equals ½.
5. As x increases without bound (gets larger and larger), the graph approaches, but never reaches, the horizontal axis (like approaching an asymptote). As x decreases without bound (gets larger and larger in the negative direction), the graph approaches, but never reaches, the horizontal axis.

4.5.2 Standard Normal Distribution
A normal distribution with μ = 0 and σ = 1 is known as a standard normal distribution. The letter Z is used to indicate the standard normal variable. The density function of the standard normal is defined as:
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad \text{where } -\infty < z < \infty.$$
The expected value and variance of the standard normal Z are E(Z) = 0 and V(Z) = 1 respectively.

Fig 4.7: Standard normal density function


Cumulative distribution function is given by
$$F(z) = P(Z \le z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt.$$
Such an integral is not easy to evaluate by hand, so we use a table of standard normal integrals.
Example
Find P(Z ≤ 1.53) for a standard normal variable Z.
Solution
$P(Z \le 1.53) = \int_{-\infty}^{1.53} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,dz$ is equal to the shaded area in the figure below.

Fig 4.8: P(Z ≤ 1.53)

To find this integral we use the standard normal integrals table. From the table, the value is found at the intersection of the row corresponding to 1.5 and the column corresponding to 0.03. Hence, P(Z ≤ 1.53) = 0.9370.
Example
If Z denotes a standard normal variable, find
a) P(Z ≤ 1)
b) P(Z < -1.5)
c) P(Z > 1)
d) P(-1.5 ≤ Z ≤ 0.5)
e) Find the value of Z, say z0, such that P(Z ≤ z0) = 0.99.
Solution
a) P(Z ≤ 1) = 0.8413 as shown in figure below

Fig 4.10: P(Z ≤ 1)
b) P(Z < -1.5) =P(Z>1.5)=1-P(Z<1.5)=1-0.9332= 0.0668 as shown in figure below

Fig 4.11: P(Z < -1.5)


c) P(Z > 1) = 1 – P(Z ≤ 1) = 1 – 0.8413 = 0.1587, as shown in figure below

Fig 4.12: P(Z > 1)


d) P(-1.5 ≤ Z ≤ 0.5) = P(Z ≤ 0.5) – P(Z < -1.5) = 0.6915 – 0.0668 = 0.6247 as shown in
figure below.

Fig 4.13: P(-1.5 ≤ Z ≤ 0.5)

e) To find the value of Z, say z0, such that P(Z ≤ z0) = 0.99, we must look for the given
probability 0.99 in the Z-tables. The closest we can come is 0.9901, which
corresponds to the z-value of 2.33. Hence, z0 = 2.33.

Fig 4.14: P(Z ≤ z0) = 0.99
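
The table lookups in the examples above can also be reproduced in software. The sketch below uses `NormalDist` from Python's standard `statistics` module as a stand-in for the Z-tables; the variable names are hypothetical, and the printed values may differ from the table in the last decimal place because the table rounds z to two decimals.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

p_153 = Z.cdf(1.53)              # P(Z <= 1.53)
p_a = Z.cdf(1.0)                 # a) P(Z <= 1)
p_b = Z.cdf(-1.5)                # b) P(Z < -1.5)
p_c = 1 - Z.cdf(1.0)             # c) P(Z > 1)
p_d = Z.cdf(0.5) - Z.cdf(-1.5)   # d) P(-1.5 <= Z <= 0.5)
z0 = Z.inv_cdf(0.99)             # e) value z0 with P(Z <= z0) = 0.99

for label, value in [("P(Z<=1.53)", p_153), ("a", p_a), ("b", p_b),
                     ("c", p_c), ("d", p_d), ("e", z0)]:
    print(label, round(value, 4))
```

The inverse lookup in part e) gives a value slightly below 2.33, since the table can only return the closest tabulated probability, 0.9901.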


4.5.2.1 Using standard normal to find probability of normal random variable
A standard normal random variable can be obtained from normal distribution by transforming
the normal random variable into the standard normal random variable using the formula:
$$Z = \frac{X - \mu}{\sigma}$$
This is called standardizing the data. It will result in (transformed) data with μ = 0 and σ = 1.
Because any normally distributed random variable can be transformed to the standard normal,
probabilities can be evaluated for any normal distribution by using a table of standard normal
integrals.
Example
Credit card balances are normally distributed with mean MWK2870 and standard deviation
MWK900. What is the probability that a randomly selected credit card holder has balance
less than MWK2500?
Solution
Let X be the credit card balance. We need P(X < 2500). Since X is normal, we can standardize 2500 to the standard normal as
$$z = \frac{x - \mu}{\sigma} = \frac{2500 - 2870}{900} = -0.41.$$
Now P(X < 2500) = P(Z < -0.41) = 0.3409.
Note that to find P(Z < -0.41) we use Z-tables, or in Excel type =NORMSDIST(z)
Example
Suppose that the average salary of college graduates is normally distributed with μ=$40,000
and σ=$10,000. What proportion of college graduates will earn less than $24,800?
Solution
Z = ($24,800 – $40,000) / $10,000 = -1.52
The proportion of college graduates who will earn less than $24,800 is equal to the probability that Z is less than -1.52, that is P(Z < -1.52), which is 0.064, using the z-tables.
Example
A firm that manufactures and bottles apple juice has a machine that automatically fills 16-ounce bottles. There is, however, some variation in the amount of liquid dispensed (in ounces) into each bottle by the machine. Over a long period of time, the average amount dispensed into the bottles was 16 ounces, but there is a standard deviation of 1 ounce in these measurements. If the amount filled per bottle can be assumed to be normally distributed, find the probability that the machine will dispense more than 17 ounces of liquid into any one bottle.
Solution
Let X be the amount of juice filled in a bottle, which is assumed to be normally distributed with mean 16 and standard deviation 1. Therefore
$$z = \frac{17 - 16}{1} = 1.$$
Now the probability that the machine will dispense more than 17 ounces, P(X > 17), is the same as P(Z > 1) = 1 - P(Z < 1) = 1 - 0.8413 = 0.1587
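
The standardizing step used in the three examples above can be sketched in code. The helper names `standardize` and `prob_less_than` are hypothetical, and `NormalDist` from Python's standard library replaces the Z-tables, so results may differ slightly from table values because the table rounds z to two decimal places.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def standardize(x, mu, sigma):
    """Transform a value x of a normal variable into a z-value."""
    return (x - mu) / sigma


def prob_less_than(x, mu, sigma):
    """P(X < x) for X normal with mean mu and standard deviation sigma."""
    return STANDARD_NORMAL.cdf(standardize(x, mu, sigma))


# Credit card balances: mean 2870, standard deviation 900
p_balance = prob_less_than(2500, 2870, 900)

# Salaries: mean 40000, standard deviation 10000
z_salary = standardize(24800, 40000, 10000)
p_salary = prob_less_than(24800, 40000, 10000)

# Juice bottles: mean 16, standard deviation 1, P(X > 17) = 1 - P(X < 17)
p_juice = 1 - prob_less_than(17, 16, 1)

print(round(p_balance, 4), round(z_salary, 2), round(p_salary, 4), round(p_juice, 4))
```

Note that the z-value for the salary example is negative, because $24,800 lies below the mean.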

4.5.3 The Student t-distribution
The following are the properties of the t-distribution:
1. It is bell-shaped like the normal curve.
2. It is similar to the standard normal in that it also has a mean of 0, but it has a standard deviation of more than 1.
3. Hence the t-curve is wider than the standard normal curve, but for larger sample sizes, n ≥ 30, the t-curve is closer to the standard normal curve.

Fig 4.15: t distribution and standard normal distribution


4. There is actually a family of t-distributions, each one characterized by a parameter called the degrees of freedom.
4.5.4 The Chi-square (χ²) random variable and distribution
Properties of chi-square distribution curves are:

Fig 4.16: Chi-square distribution


1. The total area under a χ²-curve is equal to 1.
2. A χ²-curve starts at 0 on the horizontal axis and extends indefinitely to the right, approaching, but never touching, the horizontal axis as it does so; that is, chi-square values are always positive.
3. A χ²-curve is right skewed; smaller values are more likely to occur than larger values.
4. As the number of degrees of freedom becomes larger, χ²-curves look increasingly like normal curves.
4.5.5 The F random variable and distribution
X has the F probability distribution if the probability density curve is also skewed to the right. Just like the chi-square distribution, smaller values have higher probability than larger values. The reason why the F-distribution is similar to the chi-square is that the F random variable is based on chi-square random variables, in that it is the ratio of two independent chi-square random variables, each divided by its degrees of freedom.

Fig 4.17: F distribution
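
The link between the chi-square and F random variables can be explored by simulation. The sketch below relies on the standard construction of a chi-square variable as a sum of squared independent standard normal variables, which the module does not state and which is an assumption here. The function names, degrees of freedom, seed and number of draws are all hypothetical.

```python
import random
from statistics import mean, median


def chi_square_draw(df, rng):
    """One chi-square draw: sum of df squared standard normal values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))


def f_draw(df1, df2, rng):
    """One F draw: ratio of two chi-square draws, each divided by its df."""
    return (chi_square_draw(df1, rng) / df1) / (chi_square_draw(df2, rng) / df2)


rng = random.Random(2016)  # fixed seed so the run is repeatable
chi_draws = [chi_square_draw(5, rng) for _ in range(20_000)]
f_draws = [f_draw(5, 20, rng) for _ in range(20_000)]

# Right skew shows up as a mean larger than the median.
print(mean(chi_draws), median(chi_draws))
print(mean(f_draws), median(f_draws))
```

All simulated values are positive, and in both cases the mean exceeds the median, in line with the right-skewed curves in Fig 4.16 and Fig 4.17.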
Activity 4.4
1. Let Z be a standard normal random variable; find z such that P(Z > z) = 0.012
2. How is the chi-square random variable related to the F random variable?
4.8 Sampling Distributions
4.8.1 Distribution of sample mean ($\bar{X}$)
Suppose we take many repeated samples (of the same size) from the same population, and each time calculate the sample mean, so that we have a set of sample means. For example, take 500 samples of size 20 as follows:
$x_1^{1}, x_2^{1}, \ldots, x_{20}^{1} \rightarrow \bar{x}_1$
$x_1^{2}, x_2^{2}, \ldots, x_{20}^{2} \rightarrow \bar{x}_2$
$\vdots$
$x_1^{500}, x_2^{500}, \ldots, x_{20}^{500} \rightarrow \bar{x}_{500}$
We may be interested in the histogram of these sample means to see the distribution of the sample means. What would that set of sample means, the $\bar{X}$ values, look like in a histogram? Consider the sampling distribution of the sample mean $\bar{X}$ when we take samples of size n from a normal population with mean µ and standard deviation σ. The sampling distribution of $\bar{X}$ is also normal, with centre (mean) µ and standard deviation $\sigma/\sqrt{n}$.

Fig 4.18: Distribution of sample mean if sample from normal

The standard error of the sample mean, $\bar{X}$, is $\sigma/\sqrt{n}$. Note that as the sample size gets larger, the spread of the sampling distribution gets smaller. When the sample size is large, the sample mean varies less across samples. Now if $\bar{X}$ is normal with mean µ and standard deviation $\sigma/\sqrt{n}$, then we can also standardize the sample mean to the standard normal as
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}.$$
Now if we take a random sample (of size n) from any population not necessarily normal e.g
Now if we take a random sample (of size n) from any population not necessarily
skewede.g
normal with mean with
skewed andmean
standard
μ anddeviation
standard , the sampling
70 deviation σ, thedistribution of ̅ is of
sampling distribution
approximately
X normal,
̅ is approximately if the sample
normal, size issize
if the sample large. This isThis
is large. called the central
is called limit limit
the central
theorem(CLT). How
theorem(CLT). How large does nn have
large does have to
to be? The rule of thumb is that when n ≥ 30, we can
be? The
apply
we canthe CLT.the
apply ForCLT.
example, the sets of
For example, thethesets
sample means
of the sample below
meanswillbelow
have will have
histograms/distributions
histograms/distributions thatthat has normal shape even though the data x_iisisnot notnormal
normalbecause
because the sample size is more than
the sample size is more than or equal to 30 or equal to 30
Set 1 of sample means, n=30
x11 , x12 ,  , x30
1
 x1
x12 , x22 ,  , x30
2
 x2
   
x1500, x2500,  , x30
500
 x500
Set 2 of sample means, n >30
x11 , x12 , , x140  x1
x12 , x22 , , x40
2
 x2
   
x1500, x2500,, x40
500
 x500
As n gets larger, the closer the sampling distribution looks like a normal distribution.
As n gets larger, the closer the sampling distribution looks like a normal distribution.
Why is the CLT important? Because when $\bar{X}$ is (approximately) normally distributed, we
can answer probability questions about the sample mean by using the standard
normal distribution.
Example
Suppose we are studying the failure time (at high stress) of a certain engine part. The
failure times have a mean of 1.4 hours and a standard deviation of 0.9 hours. If our
sample size is 40 engine parts, then what is the sampling distribution of the sample
mean? What is the probability that the sample mean will be greater than 1.5?
Solution
The distribution of the sample mean is normal with mean 1.4 and standard deviation
$0.9/\sqrt{40} \approx 0.142$. Note that although we are not told about the population distribution from
which the sample is obtained, the sample size is large, that is $n = 40 \ge 30$, so it suffices
to say that $\bar{X} \sim N(1.4,\ 0.9^2/40)$ by the CLT.
Now we need $P(\bar{X} > 1.5) = P\left(Z > \dfrac{1.5 - 1.4}{0.9/\sqrt{40}}\right) = P(Z > 0.70) = 1 - 0.7580 = 0.2420$.
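The arithmetic of this example can be checked with Python's standard library. This is a minimal sketch; the variable names are illustrative, and it relies on the CLT approximation used in the solution.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1.4, 0.9, 40            # population mean, population sd, sample size
se = sigma / sqrt(n)                   # standard error of the sample mean
xbar_dist = NormalDist(mu, se)         # approximate sampling distribution of the mean
p_greater = 1 - xbar_dist.cdf(1.5)     # P(sample mean > 1.5)
print(round(se, 4), round(p_greater, 4))
```

The result is close to the table-based value; small differences come from rounding z to two decimal places when using Z-tables.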

4.8.2 The sampling distribution of $\dfrac{\bar{X} - \mu}{s/\sqrt{n}}$

In practice, the population standard deviation $\sigma$ in $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is typically unknown, so we
estimate $\sigma$ with the sample standard deviation s, to have $\dfrac{\bar{X} - \mu}{s/\sqrt{n}}$. This quantity has
approximately a standard normal distribution if $n \ge 30$, that is, if the sample size is large
enough. If the sample size is small, that is if $n < 30$, and the population is approximately
normal, then the quantity has a t-distribution ("Student's t") with n – 1 degrees of
freedom (the parameter of the t-distribution). The t-distribution resembles the standard
normal (symmetric, centered at zero) but it is more spread out. The fewer the degrees of
freedom, the more spread out the t-distribution is, and as the d.f. increase, the t-distribution
gets closer to the standard normal.
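The extra spread of the t-distribution can be seen by simulation. The sketch below assumes a standard normal population and a hypothetical helper `tail_fraction`; it counts how often $(\bar{x} - \mu)/(s/\sqrt{n})$ falls outside ±1.96, which would happen about 5% of the time if the quantity were exactly standard normal.

```python
import random
import statistics
from math import sqrt

random.seed(2)  # fixed seed so that the run is repeatable

def tail_fraction(n, reps=5000, cutoff=1.96):
    """Fraction of samples with |(xbar - mu) / (s / sqrt(n))| > cutoff."""
    mu, sigma = 0.0, 1.0               # assumed normal population
    hits = 0
    for _ in range(reps):
        x = [random.gauss(mu, sigma) for _ in range(n)]
        t = (statistics.fmean(x) - mu) / (statistics.stdev(x) / sqrt(n))
        hits += abs(t) > cutoff
    return hits / reps

small, large = tail_fraction(5), tail_fraction(30)
print(small, large)  # the n = 5 fraction is well above 0.05; n = 30 is much closer
```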
Activity 4.5
1. Suppose extension workers' salaries have a mean of $90,000 and a standard deviation
of $30,000 (highly skewed, not normal). Given a sample of extension workers, can
we find the probability the sample mean is less than $100,000 if n = 5? If n = 30?
2. A sample of 45 maize grains was taken to test a hypothesis about a population maize
mean weight of 8.7g. The sample standard deviation was 2g. What would be the
distribution of $\dfrac{\bar{X} - \mu}{s/\sqrt{n}}$?

Unit Activity
A farmer cooperative knows that mean of earnings per year is $200,000
and needs to know the probability that in a given quarter the earnings of the
cooperative will be less than $100,000, P(X < 100,000) with a cooperative
standard deviation of $80,000. What is the probability?

4.9 Reflection
What are other examples of normal random variables in real-life situations?

Unit Summary

In this unit you have learnt about random variables and their distributions. For
example, you have learnt about the binomial and Poisson distributions for discrete
random variables, and the normal, t, chi-square and F distributions for continuous
random variables. The unit has also covered the sampling distribution for sample
means, and the related quantities. What has been covered in this unit will enable
you to do inferential statistics like hypothesis testing and interval estimation in the
subsequent unit.

End of unit test

1. Suppose there are three balls in a box. One of the balls is number 1, another is
the number 2, and the third is the number 3. You select two balls at random and
without replacement from the box and note the two numbers observed. Let X be
the sum of the two balls selected.
a. Draw the table of distribution for X.
b. What is the probability that the sum is at least 4?
c. What is the mean of X?
2. Suppose you take a sample of 25 high-school students, and measure their IQ.
Assuming that IQ is normally distributed with μ = 100 and σ = 15, what is the
probability that your sample's IQ will be 105 or greater?
Answers to activities in Unit 4

Answer to Activity 4.1
X        0     1     2
P(X=x)   0.25  0.5   0.25

Answers to Activity 4.2
1.
a) It is binomial since in each trial there are two possible outcomes, head/tail, and the
probability for each outcome is constant, which is ½.
b) This is not a binomial experiment since in each trial there are more than two possible
outcomes: 1, 2, 3, 4, 5, 6.
2. a) Binomial
b) Poisson

Answer to Activity 4.3
1. We need Now ∫
2. , expanding , entering expectation operator

Answer to Activity 4.4
1. The z such that P(Z > z) = 0.012 is the same z such that P(Z < z) = 1 - 0.012 = 0.988,
z = 2.2572
2. The F is the ratio of two independent chi-squares, each divided by its degrees of freedom

Answer to Activity 4.5
1. n = 5: we may not find the probability because we do not know the distribution of the
sample mean. n = 30: we may find the probability using the standard normal because the
distribution of the sample mean is approximately normal since n ≥ 30.
2. Standard normal, since the sample size is large enough, i.e. greater than 30.

Answers to Unit test
1. (a) The sample space consists of six equally likely outcomes {(1,2), (1,3), (2,1), (2,3),
(3,1), (3,2)}.
X = {3, 4, 3, 5, 4, 5}.
Table of distribution of X
X      3    4    5
P(X)   1/3  1/3  1/3
(b) P(the sum is at least 4) = P(4) + P(5) = 2/3
(c) Mean of X = 3×(1/3) + 4×(1/3) + 5×(1/3) = 4
2. $\bar{X} \sim N(100,\ 15^2/25)$, so
$P(\bar{X} \ge 105) = P\left(Z \ge \dfrac{105 - 100}{15/\sqrt{25}}\right) = P(Z \ge 1.67) = 1 - 0.9525 = 0.0475$
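The unit test answers can be checked by direct enumeration. This is a minimal sketch with illustrative variable names: question 1 lists every ordered draw of two balls without replacement, and question 2 uses the normal sampling distribution of the mean.

```python
from fractions import Fraction
from itertools import permutations
from math import sqrt
from statistics import NormalDist

# Question 1: all ordered draws of two balls without replacement
sums = [a + b for a, b in permutations([1, 2, 3], 2)]
dist = {s: Fraction(sums.count(s), len(sums)) for s in sorted(set(sums))}
p_at_least_4 = sum(p for s, p in dist.items() if s >= 4)
mean_x = sum(s * p for s, p in dist.items())
print(dist, p_at_least_4, mean_x)

# Question 2: P(sample mean IQ >= 105) with n = 25, mu = 100, sigma = 15
p_iq = 1 - NormalDist(100, 15 / sqrt(25)).cdf(105)
print(round(p_iq, 4))
```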
Unit 5:
Inferential Statistics
5.0 Introduction

The aim of this unit is to introduce you to hypothesis testing and interval estimation.
The unit will cover the definition of a hypothesis, testing hypothesis about the mean
and proportion using the z-test, t-test, and F-test. In the unit you will also learn about
testing for association in cross tables using the chi-square. The unit offers a background
in testing of research hypothesis.

5.1 Unit Objectives

By the end of this unit you should be able to


a) define a hypothesis
b) test the hypothesis about the population mean
c) test the hypothesis about the population proportion
d) use F-test to compare population means
e) test for association in contingency tables using the Pearson chi-square test
f) construct confidence interval of population mean or proportion

Key terms

• Hypothesis
• Significance level
• Type I error
• Type II error

5.2 Hypothesis Testing

Under hypothesis testing we will use the sample to make inferences about the claims
in the population. A statistical hypothesis is a conjecture/claim about a population
parameter which may or may not be true. Statistical hypothesis testing is a
decision-making process for evaluating claims about a population using the sample.
The following are examples of hypotheses:

(a) The mean temperature in a coastal town is less than 78°F.
(b) The mean grade point average of graduating seniors at a university
is at least 2.3.
(c) The mean income for Bunda graduates is MK45000 per month.
(d) The proportion of voters in Malawi who vote for PP is 0.53.
(e) The proportion of full-time college students younger than 26 years
of age is at most 0.90.
(f) The proportion of high school graduates in 1990 seeking full-
time employment was more than 0.30.

There are two types of statistical hypotheses: the null hypothesis and the alternative
hypothesis. The null hypothesis, symbolized by H0, is a statistical hypothesis that
states that there is no difference between a parameter and a specific value, or that there
is no difference between two parameters. The null hypothesis always contains some form of equality (=, ≤ or ≥).
The alternative hypothesis, symbolized by H1, is a statistical hypothesis that states the
existence of a difference between a parameter and a specific value, or states that there
is a difference between two parameters. The alternative hypothesis usually contains the
symbol >, <, or ≠.

The following is a classical approach to hypothesis testing:


Step 1: Identify H0 and H1
H0 will contain = (possibly in the form ≤ or ≥)
H1 will contain > , < , or ≠
Step 2: Select the test statistic and determine its sampling distribution.
Step 3: Use a given level of significance to determine the critical/rejection
region(s).
Step 4: Calculate test statistic using sample data.
Step 5: Make your decision. If the test statistic falls in the critical region,
reject H0. If the test statistic does not fall in the critical region,
do not reject H0. Interpret your decision in terms of the claim.
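Step 5 of the classical approach is a simple comparison that can be written down once and reused. The sketch below is illustrative only: the function name `decide` and the tail labels "right", "left" and "two" are assumptions made for this example, and the critical value is passed in as a positive number.

```python
def decide(statistic, critical, tail):
    """Return the decision for a test statistic and a positive critical value."""
    if tail == "right":
        reject = statistic >= critical        # rejection region: right tail
    elif tail == "left":
        reject = statistic <= -critical       # rejection region: left tail
    elif tail == "two":
        reject = abs(statistic) >= critical   # rejection regions: both tails
    else:
        raise ValueError("tail must be 'right', 'left' or 'two'")
    return "reject H0" if reject else "do not reject H0"

print(decide(2.50, 1.645, "right"))
print(decide(-1.00, 1.96, "two"))
```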

Activity 5.1

Which of the examples of hypothesis given above are null and which are
alternative?

5.3 Hypothesis test about population mean

5.3.1 Hypothesis about one population mean


The following is a possible hypothesis formulation about the mean:
$H_0: \mu = \mu_0$, to mean the population mean $\mu$ is equal to some specific mean $\mu_0$
versus
$H_1: \mu \ne \mu_0$, to mean the population mean $\mu$ is not the specific mean $\mu_0$, i.e. it is either
greater or less than the specified mean $\mu_0$.
The test of such hypotheses will be two tailed because, based on the alternative
hypothesis, either $\mu > \mu_0$ or $\mu < \mu_0$.
The following is a real life example of a two tailed hypothesis formulation:
H0: Average burley tobacco yield in 2014 was 50 000 000 kg
H1: Average burley tobacco yield in 2014 was not 50 000 000 kg
The other possible hypothesis formulation about the mean is:
$H_0: \mu \le \mu_0$ versus
$H_1: \mu > \mu_0$
The test of such hypotheses will be right tailed because, based on the alternative
hypothesis, $\mu > \mu_0$.
The real life example of a right tailed hypothesis formulation is:
H0: Average burley tobacco yield in 2014 was 50 000 000 kg or less
H1: Average burley tobacco yield in 2014 was more than 50 000 000 kg
The following is the other possible hypothesis formulation about the mean:
$H_0: \mu \ge \mu_0$ versus
$H_1: \mu < \mu_0$
This hypothesis formulation is left tailed because, based on the alternative hypothesis, H0
will be rejected when the sample statistic falls far to the left.

The following is a real life example of a left tailed hypothesis formulation:
H0: Average burley tobacco yield in 2014 was 50 000 000 kg or more
H1: Average burley tobacco yield in 2014 was less than 50 000 000 kg
Now the appropriate sample statistic to use to test the above hypotheses is the sample mean $\bar{X}$.
If the sample used to test the hypotheses is from a normal($\mu$, $\sigma^2$) population, then the sample mean $\bar{X}$ is
normal($\mu$, $\sigma^2/n$), so that we can use the z-test defined as $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$. Note the Z-test is obtained after
standardizing the sample mean. If $\sigma$ is unknown but $n \ge 30$, then the z-test $z = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ is still
used, since by the central limit theorem $\bar{X}$ is approximately normally distributed and can still be
standardized. If $\sigma$ is unknown and $n < 30$, then $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ is used. The
following are the rejection criteria. For the right tailed Z test, H0 is rejected when $Z \ge z_\alpha$.
Figure below shows the rejection region for the right tailed Z test.

Fig 5.1: Right tailed z test rejection region


For the left tailed z test, H0 is rejected when $Z \le -z_\alpha$. Figure below shows the rejection
region when you have left tailed z test:

Fig 5.5: Left tailed t test rejection region
For the two tailed t-test H0 is rejected when $|t| \ge t_{\alpha/2,\,n-1}$, i.e. when $t \ge t_{\alpha/2,\,n-1}$ or $t \le -t_{\alpha/2,\,n-1}$.


Fig 5.6: Two tailed t test rejection regions
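The z critical values that mark the rejection regions can be computed instead of read from Z-tables. This is a sketch using the standard library; the helper name `z_critical` is an assumption for the example. The standard library has no t-distribution, so t critical values still have to come from t-tables or a statistics package.

```python
from statistics import NormalDist

def z_critical(alpha, tail):
    """Positive critical value of the standard normal for a given alpha."""
    z = NormalDist()                       # standard normal: mean 0, sd 1
    if tail == "two":
        return z.inv_cdf(1 - alpha / 2)    # alpha is split between both tails
    return z.inv_cdf(1 - alpha)            # one tailed (right or left)

print(round(z_critical(0.05, "right"), 3))  # right tailed: reject when Z >= this
print(round(z_critical(0.05, "two"), 3))    # two tailed: reject when |Z| >= this
```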


Example
Full time PhD students receive an average salary of $12,837 according to the U.S.
Department of Education. The dean of graduate studies at a large state university feels that
PhD students earn more than this. He surveys 44 randomly selected students and finds their
average salary is $14,445 with a standard deviation of $1,500. With α = 0.05, is the dean
correct?
Solution
Stating null and alternative hypothesis:
$H_0: \mu \le 12837$ versus $H_1: \mu > 12837$
Thus we have one right tailed test. Now the appropriate statistic to test a claim about the population
mean ($\mu$) is the sample mean ($\bar{X}$). The sampling distribution of the sample mean must be
approximately normal($\mu$, $\sigma^2/n$) since $n = 44 \ge 30$, by the CLT. Thus our test statistic is
$Z = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$, after standardizing $\bar{X}$. Rejection criteria: we reject the null when $Z \ge z_\alpha$, i.e. when
$Z \ge z_{0.05} = 1.645$, using the Z-tables. Now
$Z = \dfrac{14445 - 12837}{1500/\sqrt{44}} = 7.11$.
Thus, since Z = 7.11 is greater than the critical value 1.645, we reject the null hypothesis,
i.e. the data support the dean's claim that the mean salary is more than $12,837.
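The test statistic for this example can be recomputed from the summary figures given in the question. A minimal sketch with illustrative variable names:

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, s, n, alpha = 14445, 12837, 1500, 44, 0.05
z = (xbar - mu0) / (s / sqrt(n))          # standardized sample mean
z_crit = NormalDist().inv_cdf(1 - alpha)  # right tailed critical value
reject = z >= z_crit                      # right tailed rejection rule
print(round(z, 2), round(z_crit, 3), reject)
```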

Example
A nutritionist believes that a 12 ounce box of breakfast cereal should contain an average of
1.2 ounces of bran. The nutritionist measures a random sample of sixty boxes of popular
cereal for bran content. Suppose the sample mean is $\bar{x} = 1.170$ and the standard deviation is s =
0.111. Do the data indicate that the mean bran content of all boxes of this brand of cereal
differs from 1.2 ounces? Use α = 0.05.
Solution
Stating null and alternative hypothesis:
$H_0: \mu = 1.2$ versus $H_1: \mu \ne 1.2$
Thus we have a two tailed test. Now since $n = 60 \ge 30$, we will use the z test statistic defined
as $Z = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$. Rejection criteria: we reject $H_0$ if $|Z| \ge z_{\alpha/2}$, i.e. when $Z \ge 1.96$
or when $Z \le -1.96$. Calculating the test statistic we have:
$Z = \dfrac{1.170 - 1.2}{0.111/\sqrt{60}} = -2.09$.
Now $|Z| = 2.09 > 1.96$, the critical value, so we reject the null
hypothesis, that is, the data indicate that the mean bran content differs from 1.2 ounces.
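For a two tailed test the same calculation is compared with ±z at α/2. The sketch below recomputes the statistic from the figures in the question and also reports a two sided p-value, which is an addition not used in the module's classical approach.

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, s, n, alpha = 1.170, 1.2, 0.111, 60, 0.05
z = (xbar - mu0) / (s / sqrt(n))                 # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # two tailed critical value
p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # area in both tails beyond |z|
print(round(z, 2), round(z_crit, 2), round(p_value, 3), abs(z) >= z_crit)
```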

Example
The average amount of rainfall during the summer months for the northeast part of the
United States is 11.52 inches. A researcher selects a random sample of 10 cities in the
northeast and finds that the average amount of rainfall for 1995 was 7.42 inches. The
standard deviation of the sample is 1.3 inches. At α = 0.05, can it be concluded that for 1995
the mean rainfall was below 11.52 inches?

Solution
Stating null and alternative hypothesis we have:
$H_0: \mu \ge 11.52$ versus $H_1: \mu < 11.52$
This means we will have one left tailed test. The appropriate statistic to test about the population
mean is the sample mean $\bar{X}$. Now since n = 10 < 30, we use $t = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$, i.e. distributed as t with
n - 1 = 9 degrees of freedom. For the rejection criteria: we reject the null hypothesis when
$t \le -t_{0.05,\,9} = -1.833$, using t-tables. Now calculating the t test statistic using sample data we
have: $t = \dfrac{7.42 - 11.52}{1.3/\sqrt{10}} = -9.97$. Thus since $t = -9.97 < -1.833$, we reject $H_0$,
that is, rainfall is likely to be below 11.52 inches.
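The t statistic for this example can be recomputed in the same way as a z statistic. In this sketch the critical value 1.833 is typed in from a standard t-table (9 degrees of freedom, one tailed α = 0.05), because Python's standard library has no t-distribution; the variable names are illustrative.

```python
from math import sqrt

xbar, mu0, s, n = 7.42, 11.52, 1.3, 10
T_CRIT_9DF_05 = 1.833                     # from t-tables: df = 9, one tailed alpha = 0.05
t = (xbar - mu0) / (s / sqrt(n))          # test statistic with n - 1 = 9 df
reject = t <= -T_CRIT_9DF_05              # left tailed rejection rule
print(round(t, 2), reject)
```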

5.3.2 Hypothesis about difference between two population means


The following is one possible hypothesis formulation when testing about difference between
two population means:
$H_0: \mu_1 = \mu_2$ or $\mu_1 - \mu_2 = 0$, null hypothesis
$H_1: \mu_1 \ne \mu_2$ or $\mu_1 - \mu_2 \ne 0$, alternative hypothesis
Of course this is a two tailed test, that is, based on the alternative $H_1$, either $\mu_1 > \mu_2$ or $\mu_1 < \mu_2$.
The other possible hypothesis formulation about the difference between two population
means is as follows:
$H_0: \mu_1 \le \mu_2$, or $\mu_1 - \mu_2 \le 0$, null hypothesis versus
$H_1: \mu_1 > \mu_2$, or $\mu_1 - \mu_2 > 0$, alternative hypothesis.
Test of these hypotheses would be right tailed, i.e. the rejection region is the right tail of the
probability distribution of the test statistic. The following is the other possible hypothesis
formulation about difference between two means:
$H_0: \mu_1 \ge \mu_2$, or $\mu_1 - \mu_2 \ge 0$, null hypothesis versus
$H_1: \mu_1 < \mu_2$, or $\mu_1 - \mu_2 < 0$, as alternative hypothesis.
The test of these hypotheses would be left tailed because the rejection region would be the left
tail of the probability distribution of the test statistic. Real life examples of hypothesis
formulations about difference between two population means are as follows:
Example 1

Example 2

Example 3

In all of the above hypothesis formulations the test statistic is based on the difference between two sample
means $\bar{X}_1 - \bar{X}_2$, that is, one may collect two samples of size $n_1$ and $n_2$ respectively and have
$\bar{X}_1$ and $\bar{X}_2$ and then see if $\bar{X}_1 - \bar{X}_2 = 0$ or $\bar{X}_1 - \bar{X}_2 \ne 0$. If one has $\bar{X}_1 - \bar{X}_2$ close to 0, then he/she
may have an idea that $H_0$ is true, otherwise he/she may say that $H_1$ is true. Now note that if we
sample from normal populations, that is, normal($\mu_1$, $\sigma_1^2$) and normal($\mu_2$, $\sigma_2^2$), then $\bar{X}_1 - \bar{X}_2$
will also be normal($\mu_1 - \mu_2$, $\sigma_1^2/n_1 + \sigma_2^2/n_2$). So we can use the z test after standardizing
$\bar{X}_1 - \bar{X}_2$, defined as
$Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
Note that if $n_1 \ge 30$ and $n_2 \ge 30$, then $\bar{X}_1 - \bar{X}_2$ is still approximately normal by the CLT, even though we do not
sample from normal populations, so that
$Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$
is still standard normal even though $\sigma_1^2$ and $\sigma_2^2$
are estimated by the sample variances. If $n_1 < 30$ and $n_2 < 30$ and $\sigma_1^2$ and $\sigma_2^2$ are estimated by the
sample variances $s_1^2$ and $s_2^2$ respectively, then we use the t-distribution, that is,
$t = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$
with $n_1 - 1$ or $n_2 - 1$ degrees of freedom. Normally to use this t-test we
use the smaller degrees of freedom of the two, $n_1 - 1$ or $n_2 - 1$. If you assume that the population
variances are equal, that is, $\sigma_1^2 = \sigma_2^2 = \sigma^2$, then you have test statistic
$Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma^2/n_1 + \sigma^2/n_2}} = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma^2 (1/n_1 + 1/n_2)}} = \dfrac{\bar{X}_1 - \bar{X}_2}{\sigma \sqrt{1/n_1 + 1/n_2}}$
since under the null hypothesis $\mu_1 = \mu_2$, that is, $\mu_1 - \mu_2 = 0$. Now if the common variance $\sigma^2$
is not known and $n_1 < 30$ and $n_2 < 30$, then we may use the sample variances to estimate $\sigma^2$
as follows:
$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ and $s_p = \sqrt{s_p^2}$. The $s_p^2$ is called the pooled sample variance because it is the
weighted average of two sample variances. Now the z-test statistic then becomes the t-test
statistic defined as
$t = \dfrac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{1/n_1 + 1/n_2}}$
with $n_1 + n_2 - 2$ degrees of freedom.
Example
Two types of fertilizers, UREA and CAN, were applied to two maize plots 1 and 2
respectively. Farmers think that there is no difference in maize yield between the two fertilizers.
A researcher takes a sample of 40 maize grains in plot 1 and 32 maize grains in plot 2. The
average weight of maize grains is 10kg in plot 1 and 7kg in plot 2. Standard deviation for
weights for plot 1 is 2kg and for plot 2 is 4kg. Test whether there is a difference in maize yield
for UREA and CAN.
Solution
The two hypotheses are:
$H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \ne \mu_2$
Since $n_1 = 40 \ge 30$ and $n_2 = 32 \ge 30$, we use
$Z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$
Rejection: we reject $H_0$ when $|z| \ge z_{\alpha/2}$, i.e. (using α = 0.05) when $z \ge 1.96$
or when $z \le -1.96$. Now
$z = \dfrac{10 - 7}{\sqrt{2^2/40 + 4^2/32}} = \dfrac{3}{\sqrt{0.6}} = 3.87$.
Since z = 3.87 > 1.96 we reject the null hypothesis, that is, there is a difference between yield under
Urea and CAN.
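The two sample z statistic only needs the summary figures of each sample. The sketch below wraps the formula in a hypothetical helper `two_sample_z` and applies it to the fertilizer data; α = 0.05 is assumed because the example does not state a significance level.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z statistic for H0: mu1 = mu2, using sample variances (large samples)."""
    return (mean1 - mean2) / sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

z = two_sample_z(10, 2, 40, 7, 4, 32)     # plot 1 (UREA) versus plot 2 (CAN)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)
print(round(z, 2), abs(z) >= z_crit)
```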
Example
A farmer thinks that local chickens lay eggs with larger weight than hybrid. She collects 10
eggs for local and 12 eggs for hybrid. The mean weight for local is found to be 5kg and that
for hybrid is found to be 12kg. The standard deviation for local weight is 2kg and that for
hybrid is 3kg. Test the claim for the farmer.
Solution
Data: $\bar{x}_1 = 5$, $\bar{x}_2 = 12$, $s_1 = 2$, $s_2 = 3$, $n_1 = 10$, $n_2 = 12$, where $\bar{x}_1$ denotes the sample
mean weight for local chicken eggs, $\bar{x}_2$ denotes the sample mean weight for hybrid chicken eggs, and $s_1$
and $s_2$ denote the sample standard deviations for weight of local chicken eggs and hybrid chicken
eggs respectively.
Hypothesis:
$H_0: \mu_1 \le \mu_2$ versus $H_1: \mu_1 > \mu_2$
This is a one tailed test. The test statistic to use is $t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$ with the smaller degrees of freedom
of $n_1 - 1$ and $n_2 - 1$, since sample sizes are less than 30. We reject $H_0$ if $t \ge t_{\alpha,\,df}$
where df is the smaller of $n_1 - 1$ and $n_2 - 1$, that is (using α = 0.05), when $t \ge t_{0.05,\,9} = 1.833$. Now
$t = \dfrac{5 - 12}{\sqrt{2^2/10 + 3^2/12}} = \dfrac{-7}{\sqrt{1.15}} = -6.53$
Thus since $t = -6.53 < 1.833$ we fail to reject the null hypothesis, that is, based on the available data
$\mu_1 \le \mu_2$, that is, eggs of local chickens are smaller or equal to those of hybrid chickens.
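The small sample version of the two sample test uses the same formula with a t critical value. In this sketch the critical value 1.833 is taken from a standard t-table (9 degrees of freedom, one tailed α = 0.05); α = 0.05 is an assumption, since the example does not state a significance level.

```python
from math import sqrt

mean1, sd1, n1 = 5, 2, 10      # local chicken eggs
mean2, sd2, n2 = 12, 3, 12     # hybrid chicken eggs
df = min(n1 - 1, n2 - 1)       # conservative choice: the smaller degrees of freedom
T_CRIT = 1.833                 # from t-tables: df = 9, one tailed alpha = 0.05

t = (mean1 - mean2) / sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
reject = t >= T_CRIT           # right tailed test of H1: mu1 > mu2
print(df, round(t, 2), reject)
```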
Example
Is there a difference in book return times between students of two universities?
Book-return times for two university bookstores (in days)
LUANAR Mzuzu
2 3
4.3 6.5
8.5 5
3 7.5
2 8
4
3

Solution
$H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \ne \mu_2$
where $\mu_1$ and $\mu_2$ denote mean return time for LUANAR and Mzuzu university
students respectively.
This is a two tailed test. The test statistic is $t = \dfrac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{1/n_1 + 1/n_2}}$ with $n_1 + n_2 - 2$ degrees of freedom,
where $s_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$. Note here we assume that the two universities have equal return
time variances, but you can also use $t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$ with the smallest degrees of freedom of
$n_1 - 1$ and $n_2 - 1$, when you assume that university book return time variances are
different. Why do we use the t-test instead of the z-test? Because $n_1, n_2 < 30$. Now,
$\bar{x}_1$ = Mean for LUANAR University
$\bar{x}_2$ = Mean for Mzuzu University
$s_1^2$ = Variance for LUANAR University
$s_2^2$ = Variance for Mzuzu University
$s_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{1/n_1 + 1/n_2}}$

Critical value: $t_{\alpha/2,\,n_1+n_2-2}$, that is, reject the null when $t \ge t_{\alpha/2,\,n_1+n_2-2}$ or when
$t \le -t_{\alpha/2,\,n_1+n_2-2}$. Now since $|t|$ is smaller than the critical value, we fail to reject the null hypothesis, that is, there is no
evidence of a difference in book return times between the two universities.
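The pooled t statistic can be computed directly from the raw data. The sketch below rests on two assumptions that the text does not settle: the two unpaired table values (4 and 3) are taken to belong to the LUANAR column, and α = 0.05 is used, with the two tailed critical value 2.228 for 10 degrees of freedom typed in from a standard t-table.

```python
from math import sqrt
from statistics import fmean, variance

luanar = [2, 4.3, 8.5, 3, 2, 4, 3]   # assumed: unpaired values belong to LUANAR
mzuzu = [3, 6.5, 5, 7.5, 8]
n1, n2 = len(luanar), len(mzuzu)

# Pooled sample variance: weighted average of the two sample variances
sp2 = ((n1 - 1) * variance(luanar) + (n2 - 1) * variance(mzuzu)) / (n1 + n2 - 2)
t = (fmean(luanar) - fmean(mzuzu)) / (sqrt(sp2) * sqrt(1 / n1 + 1 / n2))

T_CRIT = 2.228                       # from t-tables: df = 10, two tailed alpha = 0.05
reject = abs(t) >= T_CRIT
print(n1 + n2 - 2, round(sp2, 3), round(t, 3), reject)
```

Under these assumptions the statistic falls inside the non-rejection region, which agrees with the conclusion stated in the solution.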

Activity 5.3

1. A farmer wants to test the claim that farmers in Malawi spend on average 5hrs at a
farm. He collects times of 10 farmers spent at a farm and finds that the times spent
have an average of 6hrs, and standard deviation of 3hrs. Assuming that times are
approximately normal, what would be the appropriate test statistic to be used in this
test?
2. Suppose a production line operates with a mean filling weight of 16 ounces per
container. Since over- or under-filling can be dangerous, a quality control inspector
samples 30 items to determine whether or not the filling weight has to be adjusted.
The sample revealed a mean of 16.32 ounces. From past data, the standard deviation
is known to be 0.8 ounces. Using a 0.10 level of significance, can it be concluded that
the process is out of control (not equal to 16 ounces)?

5.4 Testing hypothesis about population proportion

5.4.1 Hypothesis about single population proportion
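The appropriate sample statistic is the sample proportion $\hat{p}$. Now, consider a yes/no or
success/failure survey such that the sample size is $n$ and suppose you have $X$ successes. Note
$X \sim$ Binomial($n$, $p$), with mean $np$ and variance $np(1-p)$. Now if $\hat{p} = X/n$ is the proportion of successes then its mean, $E(\hat{p})$,
and variance, $Var(\hat{p})$, are: $E(\hat{p}) = E\left(\dfrac{X}{n}\right) = \dfrac{np}{n} = p$ and $Var(\hat{p}) = Var\left(\dfrac{X}{n}\right) = \dfrac{np(1-p)}{n^2} = \dfrac{p(1-p)}{n}$.
The standard error is defined as: $se(\hat{p}) = \sqrt{\dfrac{p(1-p)}{n}}$. Now if $n \ge 30$,
the sampling distribution of $\hat{p}$ is approximately normal($p$, $p(1-p)/n$) by the CLT, so that
$Z = \dfrac{\hat{p} - p}{\sqrt{p(1-p)/n}}$ is standard normal, after
standardizing $\hat{p}$. That is, when testing a hypothesis about the population proportion $p$, we will
assume large samples so as to use the z-test statistic.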

Example
A marketing company claims that it receives 4% responses from its mailing. To test this
claim, a random sample of 500 was surveyed with 25 responses. Test at α = 0.05 significance
level.
Solution
H0 : p = 0.04
H1: p ≠ 0.04
This is a two-sided rejection region test since sample proportions that are either much smaller
than 0.04 or much larger than 0.04 would cause you to reject the null and support the
alternative. Rejection region: we reject the null hypothesis when $Z \ge z_{\alpha/2}$ or $Z \le -z_{\alpha/2}$. That is
when $Z \ge z_{0.025}$ or $Z \le -z_{0.025}$, that is, whenever Z ≥ 1.96 or Z ≤ -1.96. Now the value of the
test statistic is
$Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \dfrac{0.05 - 0.04}{\sqrt{0.04(0.96)/500}} = 1.14$.
In conclusion, since z = 1.14 < 1.96, we fail to reject the null hypothesis, that is, the claim of the
marketing company is likely to be true. Note we used the Z-test since n = 500 ≥ 30, by the CLT.
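The one sample proportion test follows the same pattern as the z test for a mean. A minimal sketch using the figures in the question (25 responses out of 500, claimed proportion 0.04); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

responses, n, p0, alpha = 25, 500, 0.04, 0.05
p_hat = responses / n                          # sample proportion
se0 = sqrt(p0 * (1 - p0) / n)                  # standard error under H0
z = (p_hat - p0) / se0                         # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two tailed critical value
print(round(p_hat, 3), round(z, 2), abs(z) >= z_crit)
```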

5.4.2 Hypothesis about the difference between two population proportions


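The following are possible hypotheses to be tested:
$H_0: p_1 = p_2$ or $p_1 - p_2 = 0$ versus
$H_1: p_1 \ne p_2$ or $p_1 - p_2 \ne 0$
This is a two tailed test.
$H_0: p_1 \le p_2$ or $p_1 - p_2 \le 0$ versus
$H_1: p_1 > p_2$ or $p_1 - p_2 > 0$
This is a right tailed test.
$H_0: p_1 \ge p_2$ or $p_1 - p_2 \ge 0$ versus
$H_1: p_1 < p_2$ or $p_1 - p_2 < 0$
This is a left tailed test.
The appropriate test statistic is based on the difference of sample proportions: $\hat{p}_1 - \hat{p}_2$. Note if
$n_1 \ge 30$ and $n_2 \ge 30$ then $\hat{p}_1 - \hat{p}_2$ is approximately
normal($p_1 - p_2$, $p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2$). This means that
$Z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}}$
is standard normal after standardising $\hat{p}_1 - \hat{p}_2$. Under the assumption of equal population proportions, that is,
$p_1 = p_2 = p$, we have
$Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
where $\hat{p} = \dfrac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$, that is, $p$ is estimated by the pooled sample proportion.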
Example
A farmer club in Mzuzu claims that the proportion of rotten ground nuts in their 50kg bag is the same as that for Mulli Brothers. A researcher collects 100 ground nuts from a bag of the farmer club and finds that 20% are rotten, and collects 80 from a Mulli bag and finds that 12% are rotten. Test the claim of the farmers club.
Solution
Data: $n_1 = 100$, $\hat{p}_1 = 0.20$, $n_2 = 80$, $\hat{p}_2 = 0.12$.
The test statistic is
$Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$ where $\hat{p} = \dfrac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$,
since the sample sizes are more than 30. Taking α = 0.05, we reject the null hypothesis when $|Z| > z_{\alpha/2}$, that is, when $Z \ge z_{0.025}$ or $Z \le -z_{0.025}$, that is, when Z ≥ 1.96 or Z ≤ -1.96. Now
$\hat{p} = \dfrac{100(0.20) + 80(0.12)}{100 + 80} = \dfrac{29.6}{180} \approx 0.164$ and
$Z = \dfrac{0.20 - 0.12}{\sqrt{0.164(0.836)\left(\dfrac{1}{100} + \dfrac{1}{80}\right)}} \approx 1.44$.
Since Z = 1.44 < 1.96 we fail to reject the null hypothesis, that is, based on the available data there is no significant difference in proportions.
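
The pooled two-proportion calculation can be sketched in a few lines. This is an illustrative check using only the standard library; the helper name `two_proportion_z_test` is an assumption, and α = 0.05 is taken from the critical value 1.96 used in the example.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(p1_hat, n1, p2_hat, n2, alpha=0.05):
    """Two-sided pooled z-test of H0: p1 = p2 (hypothetical helper)."""
    # Pooled proportion: total successes over total sample size.
    pooled = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return pooled, z, z_crit, abs(z) >= z_crit


# Ground nut example: 20% rotten out of 100 versus 12% rotten out of 80.
pooled, z, z_crit, reject = two_proportion_z_test(0.20, 100, 0.12, 80)
print(f"pooled = {pooled:.3f}, z = {z:.2f}, reject H0: {reject}")
```

The same function can be applied to other pairs of sample proportions and sample sizes, provided both samples are large.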
Activity 5.4
Two pesticides are applied to two maize plots respectively with the aim of comparing the effectiveness of the pesticides. The following shows the proportion of pests that died after the treatment of the maize plots.
Sample 1                                   Sample 2
n1 = 368                                   n2 = 405
x1 = 175                                   x2 = 182
$\hat{p}_1 = x_1/n_1 = 175/368 = 0.476$    $\hat{p}_2 = x_2/n_2 = 182/405 = 0.449$
Test whether there is a significant difference in the effectiveness of the pesticides.
5.5 Testing for equality of multiple population means using F-test
Consider the data in the table below:
Treatments/groups    Observations
1                    $x_{1,1}$  $x_{1,2}$  ….  $x_{1,n_1}$
2                    $x_{2,1}$  $x_{2,2}$  ….  $x_{2,n_2}$
.                    .          .          .   .
.                    .          .          .   .
.                    .          .          .   .
k                    $x_{k,1}$  $x_{k,2}$  ….  $x_{k,n_k}$

One may wish to test whether the treatments/groups are equal or not, e.g. testing whether crop yields under different fertilizers are equal or not. Testing equality of such treatments is the same as testing whether the treatment means are equal or not. The following is the hypothesis formulation when testing for equality of treatments/groups:
$H_0: \mu_1 = \mu_2 = \dots = \mu_k$ versus
$H_1$: not all $\mu_j$ are equal / at least two treatment means are different.
The appropriate test statistic is the F-test statistic, defined as $F = \dfrac{MST}{MSE}$, which is $F_{k-1,\,N-k}$, that is, it has an F distribution with degrees of freedom for the numerator as $k-1$ and degrees of freedom for the denominator as $N-k$, where $N = n_1 + n_2 + \dots + n_k$ is the total number of observations.
$MST = \dfrac{\sum_{j=1}^{k} n_j(\bar{x}_j - \bar{x})^2}{k-1}$
where $n_j$ is the sample size for group/treatment j, $\bar{x}_j$ is the sample mean for group j for j = 1, 2, 3, …, k, $\bar{x}$ is the grand/overall mean of all the data regardless of group, and k-1 is the degrees of freedom for between groups/treatments, which is the number of groups (k) minus 1.
$MSE = \dfrac{\sum_{j=1}^{k}(n_j - 1)s_j^2}{N-k} = \dfrac{\sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{j,i} - \bar{x}_j)^2}{N-k}$
where $s_j^2$ is the group j sample variance, $x_{j,i}$ is an observation in group j and column i, and $\bar{x}_j$ is the group j sample mean. The null hypothesis is rejected when the F-test statistic is greater than or equal to the critical F-value, i.e. when $F \ge F_{k-1,\,N-k,\,\alpha}$.

Note the F-test is always a right tailed test, that is, the rejection region for the F-test is always to the right, because F-values are never negative and only large values of F are evidence against the null hypothesis.

Fig 5.7: Rejection region for F test


Example
The following data are the number of products of 3 different production lines.
          Number of Products
Line 1    210  215  205  180  175  190
Line 2    180  160  195  190  170  155
Line 3    145  170  165  160  155  175
Let $\mu_1$, $\mu_2$ and $\mu_3$ be the mean number of products of the 3 production lines. Test the hypothesis $H_0: \mu_1 = \mu_2 = \mu_3$ against $H_1$: at least two of the means differ, at the α = 0.05 significance level.

Solution
$k = 3$, $n_1 = n_2 = n_3 = 6$, $N = 18$,
$\bar{x}_1 = 195.83$, $\bar{x}_2 = 175$, $\bar{x}_3 = 161.67$, $\bar{x} = 177.5$.
Then, writing MSB for the between-group mean square (the quantity called MST above),
$MSB = \dfrac{\sum_{j} n_j(\bar{x}_j - \bar{x})^2}{k-1} = \dfrac{6\left[(195.83 - 177.5)^2 + (175 - 177.5)^2 + (161.67 - 177.5)^2\right]}{3-1} \approx 1779$
$MSE = \dfrac{\sum_{j}\sum_{i}(x_{j,i} - \bar{x}_j)^2}{N-k} = \dfrac{(210 - 195.83)^2 + \dots + (190 - 195.83)^2 + (180 - 175)^2 + \dots + (155 - 175)^2 + (145 - 161.67)^2 + \dots + (175 - 161.67)^2}{18-3} \approx 216.9$

The F critical value is $F_{k-1,\,N-k,\,\alpha} = F_{2,\,15,\,0.05} = 3.68$, after looking up in F-tables. Now the F statistic is $F = \dfrac{MSB}{MSE} = \dfrac{1779}{216.9} \approx 8.2 > F_{2,\,15,\,0.05} = 3.68$. We therefore reject $H_0$, that is, there is a difference in the mean number of products of the three production lines. Note that the F-test for equality of treatment means, for data arranged by treatment/group as in the table at the start of this section,
is based on ANOVA (analysis of variance). In this case the variability in the observed data is split into different sources, i.e. due to differences between groups and due to error (unobserved influences). This means total variation in the observed data is split/analysed as follows:
Total variation = variation due to group + variation due to error.
Total variation is measured by the total sum of squares (TSS), error variation is measured by the error sum of squares (SSE), and between-group variation is measured by the between group sum of squares (written BSS, SSB or, for treatments, SST). Thus the summary of analysis of variance in the data becomes as follows:
TSS = SSB + SSE
Now the F-test compares the variability between groups (SSB) and the variability due to error (SSE) through the ratio $F = \dfrac{SSB/(k-1)}{SSE/(N-k)} = \dfrac{MSB}{MSE}$. If there is a difference between treatment/group means then F is large, that is, variability in the data due to groups/treatments (numerator) is larger than variability due to error (denominator). The null hypothesis is rejected when $F \ge F_{k-1,\,N-k,\,\alpha}$, the critical value, or when the p-value is less than α, where α is the significance level. The following is the analysis of variance summary table when using the F-test to compare treatment/group means.
Source of variation        Sum of squares   Degrees of freedom   Mean square        F-value
Between group/treatment    SST              k-1                  SST/(k-1) = MST    F = MST/MSE
Error                      SSE              N-k                  SSE/(N-k) = MSE
Total                      TSS              N-1                  TSS/(N-1)
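
The entries of the ANOVA summary table can be computed directly from grouped data. The sketch below reworks the production line example with the standard library only; the function name `one_way_anova` and the dictionary keys are illustrative assumptions, and the critical value 3.68 is the F-table value quoted in the example rather than something the code computes.

```python
from statistics import mean


def one_way_anova(groups):
    """Return sums of squares, mean squares and F for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: group sizes times squared mean deviations.
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Error sum of squares: squared deviations from each group's own mean.
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    msb = ssb / (k - 1)
    mse = sse / (n_total - k)
    return {"SSB": ssb, "SSE": sse, "TSS": ssb + sse,
            "MSB": msb, "MSE": mse, "F": msb / mse,
            "df": (k - 1, n_total - k)}


lines = [
    [210, 215, 205, 180, 175, 190],  # Line 1
    [180, 160, 195, 190, 170, 155],  # Line 2
    [145, 170, 165, 160, 155, 175],  # Line 3
]
result = one_way_anova(lines)
F_CRITICAL = 3.68  # F(2, 15) at alpha = 0.05, read from F-tables
print(f"F = {result['F']:.2f}, df = {result['df']}, "
      f"reject H0: {result['F'] >= F_CRITICAL}")
```

Using unrounded group means gives MSE of about 216.9, which agrees with the hand calculation up to rounding.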
Activity 5.5
The following is an ANOVA table for four normal populations with the same variance $\sigma^2$ and means $\mu_1, \mu_2, \mu_3, \mu_4$.
Source     Sum of Squares   Degrees of Freedom   Mean Square   F
Between    (1)              (3)                  237.4         (6)
Within     (2)              (4)                  (5)
Total      1909.22          22
(a) Complete the above ANOVA table.
(b) Test $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ at α = 0.05.

5.6 Testing for association in contingency tables using the Pearson chi-square
Let X1 and X2 be categorical variables with I and J categories/levels respectively in the cross table/contingency table.
                  X2
                  Level 1   Level 2   Level 3   …   …   Level J
      Level 1
      Level 2
X1    .
      .
      .
      Level I

We wish to test whether there is an association between X1 and X2. The following is the hypothesis formulation.
H0: there is no association between X1 and X2 versus
H1: there is an association between X1 and X2.
Or
H0: X1 and X2 are independent versus
H1: X1 and X2 are not independent.
One of the test statistics to use is the Pearson chi-square test statistic, defined as
$\chi^2 = \sum_{i=1}^{I}\sum_{j=1}^{J}\dfrac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2_{(I-1)(J-1)}$,
that is, $\chi^2$ has a chi-square distribution with $(I-1)(J-1)$ degrees of freedom, where $O_{ij}$ and $E_{ij}$ are the observed and expected frequencies in cell (i, j), I is the number of rows and J is the number of columns. The null hypothesis is rejected when the chi-square is large, that is, when
$\chi^2 \ge \chi^2_{(I-1)(J-1),\,\alpha}$.

Fig 5.8: Rejection region for chi-square test


Note that a chi-square test is also a right tailed test since chi-square values are positive.
Example
Test whether there is an association between heart attack status and personality type.
                       Personality type
Heart attack status    Type A    Type B
Heart Attack           O=25      O=10
No Heart Attack        O=5       O=40

Solution
H0: Personality type & heart attack status are independent in the population versus
H1: Personality type & heart attack status are dependent in the population.
Before we can compute $\chi^2$ we first need to find the expected frequencies in each of our category cells. To calculate the E for a particular “cell” in the table we use the formula:
E = (cell’s column total)(cell’s row total) / n

                   Personality type
Heart status       Type A    Type B    Row Total
Heart attack       O=25      O=10      35
No heart attack    O=5       O=40      45
Column Total       30        50        80
E: type A and heart attack: (30)(35)/80 = 13.125
E: type A and no heart attack: (30)(45)/80 = 16.875
E: type B and heart attack: (50)(35)/80 = 21.875
E: type B and no heart attack: (50)(45)/80 = 28.125
Let’s put this information in our table:
                   Personality type
Heart status       Type A      Type B      Row Total
Heart attack       O=25        O=10        35
                   E=13.125    E=21.875
No heart attack    O=5         O=40        45
                   E=16.875    E=28.125
Column Total       30          50          80
Chi-square is obtained via $\chi^2 = \sum \dfrac{(O - E)^2}{E}$. The degrees of freedom for this test are:
df = (number of rows – 1)(number of columns – 1). We have 2 rows and 2 columns, thus our degrees of freedom are: (2-1)(2-1) = 1. Now,

$\chi^2 = \dfrac{(25 - 13.125)^2}{13.125} + \dfrac{(10 - 21.875)^2}{21.875} + \dfrac{(5 - 16.875)^2}{16.875} + \dfrac{(40 - 28.125)^2}{28.125} = 30.56$

We reject H0 if $\chi^2 \ge \chi^2_{1,\,0.05} = 3.84$. Our obtained $\chi^2$ of 30.56 exceeds this value. We therefore reject H0, that is, personality type and heart attack status are associated. The Pearson chi-square test assumes that all expected frequencies are ≥ 2. The observed frequencies can be < 2. Furthermore, no more than 20% of the expected frequencies may be < 5. If the expected frequencies in the cells are “too small,” the $\chi^2$ test may not be valid; in this case you can use the Fisher exact test. You can read about the Fisher exact test.
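
The expected frequencies and the chi-square statistic in the heart attack example can be reproduced with a short sketch. It uses plain Python; the function name `chi_square_independence` is an illustrative assumption, and the critical value 3.84 is the table value quoted in the text, not computed by the code.

```python
def chi_square_independence(observed):
    """Pearson chi-square for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # E = (row total)(column total) / n for every cell.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return expected, chi2, df


# Rows: heart attack, no heart attack. Columns: Type A, Type B.
observed = [[25, 10], [5, 40]]
expected, chi2, df = chi_square_independence(observed)

# Share of cells with expected frequency below 5 (rule of thumb from the text).
cells = [e for row in expected for e in row]
share_small = sum(e < 5 for e in cells) / len(cells)

CHI2_CRITICAL = 3.84  # chi-square(1) at alpha = 0.05, from tables
print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {chi2 >= CHI2_CRITICAL}")
```

The `share_small` value gives a quick way to see whether the expected-frequency condition is met before relying on the test.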

Activity 5.6

The following data are the number of people who are in favor of, are not in favor of, and have no comment on some proposal:
          Favor    Not Favor    No Comment
Male      252      145          203
Female    148      105          147
Test if female and male differ in their opinions about the proposal.
5.7 Point and interval estimation of population mean and proportion
A point estimate is the value of a single statistic (e.g. the mean), while a confidence interval is an interval or range of numbers constructed around the point estimate. A confidence/interval estimate attaches an error to the estimate, unlike point estimation. Calculating the confidence interval for the population mean: when the population standard deviation σ is known, which is rarely the case, the 100(1-α)% confidence interval for the population mean μ is:
$\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$ or $\left(\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right)$
where $\bar{x}$ is the sample mean, µ is the population mean, $z_{\alpha/2}$ is the value of z depending upon the level of confidence desired, 1-α is the confidence level, and $\sigma/\sqrt{n}$ is the standard error of the mean.

Example
Suppose that we wanted to calculate the 95% confidence interval of the mean for the approval rating of President Joyce Banda in the population where $\bar{x}$ = 56% and σ = 12.1 and the sample size was 500. The value for z is obtained from the Z Table where 1-α = 0.95 and the value of z for 0.95/2 or 0.475 is 1.96. Then our 95% CI for the mean would be
$56 - \left(1.96 \times \dfrac{12.1}{\sqrt{500}}\right) \le \mu \le 56 + \left(1.96 \times \dfrac{12.1}{\sqrt{500}}\right)$ or
µ = 56 ± 1.06 or it is within the interval (54.94, 57.06)
When the population standard deviation is unknown, the formula is below. Note that instead of the Z Table the t Table is used in the calculations.
$\bar{x} - t_{\alpha/2,\,n-1}\dfrac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\dfrac{s}{\sqrt{n}}$ or $\left(\bar{x} - t_{\alpha/2,\,n-1}\dfrac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1}\dfrac{s}{\sqrt{n}}\right)$.
Where $\bar{x}$ is the sample mean, µ is the population mean, s is the sample standard deviation, t is the value of t depending upon the confidence level desired, α is the significance level, and n is the sample size.

Example
Suppose that we wanted to calculate the 95% confidence interval of the mean for the approval rating of President Joyce Banda in the population where $\bar{x}$ = 56%, s = 65.3, and the sample size was 1,025. With such a large sample the t value is approximately 1.96, so our confidence interval for the mean would be
$56 - \left(1.96 \times \dfrac{65.3}{\sqrt{1025}}\right) \le \mu \le 56 + \left(1.96 \times \dfrac{65.3}{\sqrt{1025}}\right)$ or
µ = 56 ± 4.0 or (52, 60)
When calculating the confidence interval for proportions, where there is a dichotomous categorical outcome, the equations are somewhat different. It is assumed that the population follows the binomial distribution and that with multiple trials the normal distribution would be approximated. The 100(1-α)% confidence interval for the population proportion is defined as:

$\hat{p} - z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$ or $\left(\hat{p} - z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\right)$

Where $\hat{p}$ is the sample proportion of one group and $(1-\hat{p})$ is the sample proportion of the other group.
Example
If in a random sample of 300 voters, 120 preferred candidate X, what is the 95% confidence interval for candidate X? Our 95% CI for candidate X would be

$0.4 - 1.96 \times \sqrt{\dfrac{0.4(0.6)}{300}} \le p \le 0.4 + 1.96 \times \sqrt{\dfrac{0.4(0.6)}{300}}$ or

p = 0.4 ± 0.055 or (0.345, 0.455)
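
The z-based intervals in the examples above can be verified with a minimal sketch. It relies only on the standard library; the helper names `mean_ci_known_sigma` and `proportion_ci` are illustrative assumptions. The t-based interval is not covered here because the standard library has no t distribution.

```python
from math import sqrt
from statistics import NormalDist


def z_value(confidence):
    """Upper alpha/2 point of the standard normal, e.g. 1.96 for 95%."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def mean_ci_known_sigma(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for a mean when sigma is known."""
    margin = z_value(confidence) * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


def proportion_ci(successes, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    margin = z_value(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Approval rating example: mean 56, sigma 12.1, n = 500.
mean_low, mean_high = mean_ci_known_sigma(56, 12.1, 500)
# Voter example: 120 out of 300 prefer candidate X.
prop_low, prop_high = proportion_ci(120, 300)
print(f"mean CI: ({mean_low:.2f}, {mean_high:.2f})")
print(f"proportion CI: ({prop_low:.3f}, {prop_high:.3f})")
```

Changing the `confidence` argument, for instance to 0.90, gives the corresponding narrower interval.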

5.8 Reflection
Suppose you are testing for association between farmer farm size and access to extension and you find that two out of six expected cell frequencies are less than five. Would you proceed with the Pearson chi-square test?

Unit Summary
In this unit you have learnt about hypothesis testing. You have learnt one sample and two sample tests of hypothesis using z and t tests. You have also learnt testing of equality of population means using the F-test. Estimation of population mean and proportion using confidence intervals has also been done. What has been introduced in this unit gives you an idea of carrying out a test of a research hypothesis, where the research hypothesis is actually the alternative hypothesis.
End of unit test

1. A beer distributor claims that a new display, featuring a life-size picture of a well-known athlete, will increase product sales in supermarkets by an average of 50 cases in a week. For a random sample of 20 supermarkets, the average sales increase was 41.3 cases and the sample standard deviation was 12.2 cases. Test at the 5% level the null hypothesis that the population mean sales increase is at least 50 cases, stating any assumption you make.

2. Of a sample of 361 owners of retail service and business firms that had gone into bankruptcy, 105 reported having no professional assistance prior to opening the business. Test the null hypothesis that at most 25% of all members of this population had no professional assistance before opening the business.

3. A drawing training procedure’s effect is to be compared with that of a sham (nonsensical) method and a placebo control (no training). A sample of 53 subjects was obtained, each drawing a picture prior to “training”. 19 subjects received the training method of interest (Edwards’ method), 18 received the sham treatment, and 16 received the placebo treatment (no training). Drawings were obtained after the training, and difference scores obtained for each subject (post training - pre training). Complete the following ANOVA table and test whether the mean change scores differ among the three conditions (α = 0.05).

4. With a sample size of 800, and a standard deviation of 4.3, what is the 90% confidence interval if the sample mean is 4.5?

Answers to unit 5 activities
Answer to Activity 5.1
(b), (e), (c) and (d) are null hypotheses because they contain an = sign, and the rest are alternative hypotheses because they contain < or >.
Answer to Activity 5.2
t-test, because the sample size is small (n < 30).
Answer to Activity 5.3
H0: μ = 16
H1: μ ≠ 16
$Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{16.32 - 16}{0.8/\sqrt{30}} = 2.19$
Reject H0 if Z < -1.65 or Z > 1.65.
Since Z = 2.19 > 1.65, reject H0, that is, there is sufficient evidence to conclude the process is out of control at α = 0.10.
Answer to Activity 5.4
Sample 1                                   Sample 2
n1 = 368                                   n2 = 405
x1 = 175                                   x2 = 182
$\hat{p}_1 = x_1/n_1 = 175/368 = 0.476$    $\hat{p}_2 = x_2/n_2 = 182/405 = 0.449$
$\bar{p} = \dfrac{x_1 + x_2}{n_1 + n_2} = \dfrac{175 + 182}{368 + 405} = \dfrac{357}{773} = 0.462$
H0: p1 - p2 = 0
H1: p1 - p2 ≠ 0
For two-tail, α/2 = 0.025 and $z_{0.025}$ = ±1.96
$z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \dfrac{(0.476 - 0.449) - 0}{\sqrt{(0.462)(0.538)\left(\dfrac{1}{368} + \dfrac{1}{405}\right)}} = 0.75$
Since the observed z = 0.75 < $z_c$ = 1.96, the decision is to fail to reject the null hypothesis, i.e. there is no significant difference in the effectiveness of the pesticides.
Answer to Activity 5.5
(a)
$k = 4$, so (3) $= k - 1 = 4 - 1 = 3$ and (4) $= 22 - (3) = 22 - 3 = 19$.
$237.4 = \dfrac{(1)}{(3)} = \dfrac{(1)}{3}$, so (1) $= 237.4 \times 3 = 712.2$
(2) $= 1909.22 - (1) = 1909.22 - 712.2 = 1197.02$
(5) $= \dfrac{(2)}{(4)} = \dfrac{1197.02}{19} = 63.0$
(6) $= \dfrac{237.4}{(5)} = \dfrac{237.4}{63.0} = 3.77$
Therefore, the ANOVA table is
Source     Sum of Squares   Degrees of Freedom   Mean Square   F
Between    712.2            3                    237.4         3.77
Within     1197.02          19                   63.0
Total      1909.22          22
(b)
Now since $F = 3.77 > F_{3,\,19,\,0.05} = 3.13$, H0 is rejected, that is, there is a difference in treatment means.
Answer to Activity 5.6
Expected frequencies, E = (row total)(column total)/n:
                Favor                     Not Favor                 No Comment                Row Total
Male            (600 × 400)/1000 = 240    (600 × 250)/1000 = 150    (600 × 350)/1000 = 210    600
Female          (400 × 400)/1000 = 160    (400 × 250)/1000 = 100    (400 × 350)/1000 = 140    400
Column Total    400                       250                       350                       1000

Thus,
$\chi^2 = \sum \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}} = \dfrac{(252 - 240)^2}{240} + \dfrac{(145 - 150)^2}{150} + \dfrac{(203 - 210)^2}{210} + \dfrac{(148 - 160)^2}{160} + \dfrac{(105 - 100)^2}{100} + \dfrac{(147 - 140)^2}{140} = 2.5$

Since $\chi^2 = 2.5 < 5.99 = \chi^2_{2,\,0.05}$, we fail to reject H0.

Answers to Unit test


1. Formulate Hypotheses:
H0: μ ≥ 50
H1: μ < 50
Decision Rule:
Note that t(19, 0.05) = 1.729
Reject H0 if t < -1.729.
Calculating Test Statistic:
$t = \dfrac{41.3 - 50}{12.2/\sqrt{20}} = -3.189$
Since t = -3.189 < -1.729, we reject H0. The assumption made is that the population of sales increases is normally distributed.
2. Formulate Hypotheses:
H0: p ≤ 0.25
H1: p > 0.25
Calculating Test Statistic:
$z = \dfrac{0.2909 - 0.25}{\sqrt{\dfrac{0.25(0.75)}{361}}} = 1.79$
The question does not state a significance level; at α = 0.05, for example, the critical value is 1.645 and H0 would be rejected.

3.

Source    df    SS          MS        F
Groups    2     291.8027    145.90    14.78
Error     50    493.3881    9.87
Total     52    785.1908

Rejection Region: $F_{obs} \ge F_{0.05,\,2,\,50} = 3.183$.

Since the calculated F > the critical value, we reject the null, i.e. there is a difference in treatment means.

4. $\bar{X} \pm Z\dfrac{S_X}{\sqrt{n}}$
$4.5 \pm 1.645 \times \dfrac{4.3}{\sqrt{800}}$ or 4.5 ± 0.25, i.e. 4.5 - 0.25 = 4.25 to 4.5 + 0.25 = 4.75. Thus the confidence interval (at 90%) is from 4.25 to 4.75.

Module Test
1. In an agricultural experiment, a large uniform field was planted with a single variety of wheat. The field was divided into many plots (each plot being 700 m²) and the yield (in kg) of grain was measured for each plot. These plot yields followed approximately a normal distribution with mean 88 kg and standard deviation 7 kg. What percentage of the plot yields were
a) 80 kg or less? (3 marks)
b) Between 75 kg and 90 kg? (6 marks)

2. A study of the effect of three feeding regimes (maize bran, broiler starter and fishmeal) on growth of fish was conducted. Maize bran was fed to fish in pond 1, broiler starter was fed to fish in pond 2, and fishmeal was fed to fish in pond 3. The following weights in grams of fish were measured after a period of 6 months to compare growth of fish under different feeding regimes.
Maize bran:       63  58  61  60  62  59
Broiler starter:  71  64  68  65  67  67
Fish meal:        49  52  47  51  48
SST = 56.00, SSE = 140.00
Fill in the ANOVA table. (10 marks)

SOURCE OF VARIATION         DF    SS    MS    F
Treatment/feeding regime    _     _     _     _
Error                       _     _     _
Total                       _     _

Does the data provide sufficient evidence to indicate a difference among the feeding regimes, testing at α = 0.10? (2 marks)

3. A popular traditional variety of Malawi cotton yields an average of μ = 425 kgs/acre. An international seed company has developed a new variety which they believe will provide a higher yield. To test the new variety, a student at Bunda College grows 6 plots of the new variety. The yields for the plots are shown below (in kgs/acre):
X = New Variety
431
460
430
425
435
450

Is the New Variety preferred over the Traditional Variety at α = 0.05? (6 marks)

4. A farmer wants to know if there is an association between fish death and feed type. She gives fish meal to 53 fish, and 39 fish are given maize bran, and after the end of 6 months she notes the number of fish dead and alive under each fish feed regime. The following table summarizes the observed data obtained after the period of six months.
         Fish meal    Maize bran    Total
Dead     50 ( )       28 ( )        78
Alive    3 ( )        11 ( )        14
Total    53           39            92
a) Copy the table into your answer book and fill in expected cell frequencies in the brackets given. (4 marks)
b) Using the chi-square test, test whether there is an association between fish feeding regime and death. (4 marks)

