Introduction To Quantitative Methods


Statistics 2 (Mount Kenya University)


MKU: Quantitative Methods 2018

BMCU002: QUANTITATIVE METHODS


Brief Course Outline
• Introduction to Statistics
• Probability
• Correlation and Regression Analysis
• Statistical Inference
• Time Series Analysis and Index Numbers

TOPIC ONE: INTRODUCTION TO STATISTICS


1.1 Introduction
Statistics
Statistics is the science of collecting, organizing, presenting, analyzing and interpreting data. This
definition points to four stages in a statistical investigation, namely:
i. Collection of data
ii. Organization and Presentation of data
iii. Analysis of data
iv. Interpretation of data

Uses of Statistics
a) To present the data in a concise and definite form: Statistics helps in classifying and tabulating
raw data for processing and further tabulation for end users.
b) To make it easy to understand complex and large data: This is done by presenting the data in
the form of tables, graphs, diagrams etc., or by condensing the data with the help of means, dispersion
etc.
c) For comparison: Tables, measures of means and dispersion can help in comparing different
sets of data.
d) In forming policies: Statistics helps in forming policies, for example those related to the education
environment.
e) Enlarging individual experiences: Complex problems can be well understood through statistics, as
the conclusions drawn by an individual using statistics are more definite and precise than mere statements of fact.
f) In measuring the magnitude of a phenomenon (occurrence): Statistics has made it possible to
express in numbers the population of a country, industrial growth, agricultural growth and the
educational level.

Definitions:
• Statistics is the art and science of collecting, analyzing, presenting and interpreting data. It is a
branch of mathematics that transforms numbers into useful information for decision
makers, and it refers to methods that help reduce the uncertainty inherent in decision
making.
• Data are the facts and figures that are collected, summarized, analyzed, and interpreted.
• Raw data refers to unprocessed or unorganized data.
• Data can be broadly classified as being qualitative or quantitative.
• Quantitative data indicate either how many or how much. They are numerical.
– Quantitative data that measure how many are discrete, i.e. they take specific values, e.g.
whole numbers.
– Quantitative data that measure how much are continuous because there is no
separation between the possible values for the data, i.e. they can take any value including
fractions.
• Qualitative data are labels or names used to identify an attribute of each element. They are
non-numerical, so arithmetic cannot be performed on them directly.
– Qualitative data use either the nominal or ordinal scale of measurement.
– The statistical analyses for qualitative data are rather limited.
• The statistical analysis that is appropriate depends on whether the data for the variable are
qualitative or quantitative.
• Attribute: A characteristic of an elementary unit that can only be observed as to its presence or
absence.
• Variable: An observable quantitative characteristic of an elementary unit that varies from unit to
unit.
• Discrete Variable: A variable whose values are restricted to integer values only, i.e. it takes whole
numbers, e.g. no. of students.
• Continuous Variable: A variable that can assume any value within some interval, i.e. it can take
even fractions, e.g. height or size of a building, measurements, weights, age.
• Population: The entire set of possible observations that may be made in the universe.
• Sample: Any portion drawn from a population. Generally a sample consists of fewer elementary
units or observations than are contained in the population. Thus a sample is a subset of a population.
• Elementary unit: The physical entity on which an observation is made.
• Survey: A planned and systematic process of collecting statistical data.
• Census: A survey in which observations are made on every elementary unit of the whole
population.
• Sample Survey: A survey in which observations are made on a sample of elementary units drawn
from the population.

Types of statistical data

Data
├── Qualitative (Categorical)
└── Quantitative (Numerical)
    ├── Discrete (takes only whole Nos.)
    └── Continuous (takes whole Nos. & fractions)

Classification of Statistics
Broadly classified into two categories:
i. Descriptive statistics: Refers to the collection, analysis and synthesis of data in order to come up
with a better description of the situation. It is the branch of statistics which is concerned with
collecting, describing and summarizing a set of data so as to derive meaningful information. It
involves classification of data, presentation of data in tabular forms, graphs and charts, and
calculation of averages.
ii. Inferential statistics: Divided into two:
a) Inductive statistics: Is concerned with the development of scientific criteria so that
values of a group may be meaningfully estimated by examining only a small portion of
that group. The whole group is known as the population or universe while the portion is
known as a sample. Values in the sample are known as statistics and values in the
population are known as parameters. Thus, inductive statistics is concerned with
estimating universe parameters from the sample statistics.
A sample is chosen instead of considering the whole population because of:
• Time limit: A census survey based on the entire universe requires a lot of time
which might not be available.
• Costs: A sample survey is much cheaper than a census survey.
• Volatility: Since a census survey is time consuming, the findings may no longer be
relevant by the time the research is finished.
b) Deductive statistics: It is concerned with establishing laws and procedures for
choosing one course of action from alternative courses of action under situations of
uncertainty. Since deductive statistics uses probability theory, it provides a rational base
for dealing with situations influenced by chance-related factors.

Types of Statistics
├── Descriptive statistics
└── Inferential statistics
    ├── Inductive inferential statistics
    └── Deductive inferential statistics

Scales of measurement
Nominal Scale
Nominal measurement consists of assigning items to groups or categories. No quantitative information
is conveyed and no ordering of the items is implied. Nominal scales are therefore qualitative rather than
quantitative e.g. Religious preference, race, and gender. Variables measured on a nominal scale are
often referred to as categorical or qualitative variables.
Ordinal Scale
Measurements with ordinal scales are ordered in the sense that higher numbers represent higher
values. However, the intervals between the numbers are not necessarily equal. For example, consider a
five-point rating scale measuring attitudes towards whether the quality of education offered in M.K.U. is of
standard. The ratings could be: I strongly agree, I agree, I am neutral, I disagree or I strongly
disagree. The difference between a rating of 2 and a rating of 3 may not represent the same difference
as the difference between a rating of 4 and a rating of 5. There is no "true" zero point for ordinal scales
since the zero point is chosen arbitrarily.

Interval Scale
On interval measurement scales, one unit on the scale represents the same magnitude on the trait or
characteristic being measured across the whole range of the scale. Interval scales do not have a "true"
zero point, however, and therefore it is not possible to make statements about how many times higher
one score is than another, e.g. on the Celsius temperature scale 20°C is not twice as hot as 10°C.

Ratio Scale
Ratio scales are like interval scales except they have true zero points, e.g. height, weight and age, so
statements such as "twice as heavy" are meaningful.
1.2 Organizing and Presenting Data
• It is hard to interpret raw data in its original form. Hence it is always important to organize the
data in a systematic way.
• Organizing data refers to arranging data:
– according to similarity or resemblance
– according to the order of importance
– in descending or ascending order
• The purposes of organizing data are:
– To make the data easily understandable
– To make comparison and draw meaningful conclusions easily
– To eliminate unnecessary data
Statistical Series: Refers to different ways of arranging data.

a. Time Series: This is arranging data according to when they occur. This can be in terms of
hours, days, months or years.
b. Spatial Series: This is arranging data according to their geographical characteristics.
c. Conditional Series: This is arranging data according to their specific characteristics, e.g.
male or female.
NOTE: Refer to class exercises for different methods of presenting data.

1.3 Measures of Central Tendency


These are statistical values which tend to occur at the centre of any well ordered set of data. These
measures are as follows:
i. The arithmetic mean
ii. The weighted average
iii. The mode
iv. The median
v. The geometric mean
vi. The harmonic mean

1. The arithmetic mean


This is commonly known as the average or mean. It is obtained by summing up the values given and
dividing the total by the number of items.

$$\bar{x} = \frac{\sum x}{n}$$

Where x = values of items
∑ = summation
n = no. of observations or items
Example
The mean of 60, 80, 90, 120:

$$\bar{x} = \frac{60 + 80 + 90 + 120}{4} = \frac{350}{4} = 87.5$$
The arithmetic mean is very useful because it represents the values of most observations in the
population.
The mean therefore describes the population quite well in terms of the values attained by most of the
members of the population.
Note: Refer to class exercises for further understanding.
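The calculation above can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are chosen here and are not part of the course material.

```python
from statistics import mean

values = [60, 80, 90, 120]

# Arithmetic mean: the sum of the values divided by the number of values.
manual_mean = sum(values) / len(values)

print(manual_mean)    # 87.5
print(mean(values))   # the standard library gives the same result
```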
Calculating Arithmetic Mean for grouped data

$$\bar{x} = \frac{\sum fx}{\sum f}$$

Where f = frequency and x = the value (or class mid point) of each class
The following statistical terms are commonly used in statistical calculations. They must therefore be
clearly understood.

i) Class limits
These are numerical values which give a lower and upper limit for any given class i.e. all the
observations in a given class are expected to fall within the interval which is bounded by the class limits.
ii) Class boundaries
These are statistical boundaries which separate one class from the next. They are usually determined
by adding the upper class limit of one class to the lower class limit of the next class and dividing by 2,
e.g. if one class ends at 19 and the next begins at 20, the class boundary between them is
$\frac{19 + 20}{2} = 19.5$.
iii) Class mid points
These are very important values which mark the center of a given class. They are obtained by adding
together the two limits of a given class and dividing the result by 2.
iv) Class interval/width
This is the difference between the upper class boundary and the lower class boundary of a class. The
value measures the length of a given class.
Note: Refer to class exercises for further understanding.
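The four terms above can be illustrated with a short Python sketch. It uses the first three bar-length classes (201 – 250, 251 – 300, 301 – 350) from the standard deviation example later in these notes. The helper name `describe_classes` is hypothetical, and the sketch assumes the classes are in ascending order with the same gap between every upper limit and the next lower limit.

```python
def describe_classes(class_limits):
    """Return (lower boundary, upper boundary, mid point, width) per class.

    Assumes ascending classes with a constant gap between one class's
    upper limit and the next class's lower limit.
    """
    gap = class_limits[1][0] - class_limits[0][1]   # e.g. 251 - 250 = 1
    summary = []
    for lower, upper in class_limits:
        lower_boundary = lower - gap / 2            # halfway to the previous class
        upper_boundary = upper + gap / 2            # halfway to the next class
        mid_point = (lower + upper) / 2             # centre of the class
        width = upper_boundary - lower_boundary     # class interval
        summary.append((lower_boundary, upper_boundary, mid_point, width))
    return summary


bar_classes = [(201, 250), (251, 300), (301, 350)]
class_summary = describe_classes(bar_classes)
for row in class_summary:
    print(row)
```

For the class 201 – 250 this gives boundaries 200.5 and 250.5, a mid point of 225.5 and a width of 50.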
2. The mode
- The mode is defined as the value which occurs more often than any other in a series.
Sometimes a single value may not exist as such, in which case we may refer to the class with the
highest frequency. Such a class is known as the modal class.
- The mode can be an important statistical value in education activities, e.g. most students finish O-level
at the age of 16 years. For ungrouped data the mode is easily determined by finding the value
with the highest frequency.
- When determining the value of the mode from grouped data we may use the following
methods:
i. The graphical method, which involves use of the histogram
ii. The computation method, which involves use of a formula

Example
In a social survey in which the main purpose was to establish the intelligence quotient (IQ) of residents in
a given area, the following results were obtained as tabulated below:

IQ           No. of residents
0 – 20       6
20 – 40      18
40 – 60      32
60 – 80      48
80 – 100     27
100 – 120    13
120 – 140    2

Required
Calculate the modal value of the IQs tabulated above using the formula method and the graphical
method.
Formula method: First identify the modal class, i.e. the class with the highest frequency. Then use the
following formula.

Where l=lower limit of the modal group
f1=frequency of the class preceding the modal class
f2=frequency of the class following the modal class
i=class interval
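A minimal sketch of the formula method for the IQ table follows. It assumes the widely used textbook interpolation formula Mode = l + (f0 − f1) / (2f0 − f1 − f2) × i, where f0, the frequency of the modal class, is an extra symbol introduced here and not in the list above. Other textbooks use variants of this formula, so treat the result as illustrative. The first class is taken as 0 – 20.

```python
def grouped_mode(lower_limit, f_modal, f_before, f_after, width):
    """Interpolated mode for grouped data (assumed textbook formula)."""
    return lower_limit + (f_modal - f_before) / (2 * f_modal - f_before - f_after) * width


# IQ table from the example: (lower limit, upper limit, frequency).
iq_table = [(0, 20, 6), (20, 40, 18), (40, 60, 32), (60, 80, 48),
            (80, 100, 27), (100, 120, 13), (120, 140, 2)]

# The modal class is the class with the highest frequency.
k = max(range(len(iq_table)), key=lambda j: iq_table[j][2])
lower, upper, f0 = iq_table[k]
f1 = iq_table[k - 1][2]   # frequency of the class preceding the modal class
f2 = iq_table[k + 1][2]   # frequency of the class following the modal class

mode = grouped_mode(lower, f0, f1, f2, upper - lower)
print(round(mode, 2))
```

Under these assumptions the modal class is 60 – 80 and the mode is about 68.65.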

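Graphical method

[Figure: histogram of the IQ distribution, with frequency (0 to 50) on the vertical axis and IQ (20 to 140) on the horizontal axis; the value of the mode is read off the IQ axis.]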
Note: Refer to class exercises for further understanding.
3. The median
- The median is the value of the middle item in a series when the data is arranged in ascending or
descending order, e.g. 14, 17, 9, 8, 20, 32, 18, 14.5, 13. When the data is ordered it becomes 8, 9, 13,
14, 14.5, 17, 18, 20, 32.
The middle number (median) is 14.5.
- The importance of the median lies in the fact that it divides the data into 2 equal halves. The numbers of
observations below and above the median are equal.
- When data is grouped the median may be determined by using the following methods:
i. Graphical method using the cumulative frequency curve (ogive)
ii. The formula
Example
Referring to the table below, determine the median using the methods above.
The graphical method
IQ           No. of residents    Cumulative frequency
0 – 20       6                   6
20 – 40      18                  24
40 – 60      32                  56
60 – 80      48                  104
80 – 100     27                  131
100 – 120    13                  144
120 – 140    2                   146
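[Figure: cumulative frequency curve (ogive) of the IQ distribution, plotting cumulative frequency (0 to 160) against the upper class limit (UCL, 20 to 160); the value of the median is read off the horizontal axis at the median position.]

Position of the median = $\frac{n + 1}{2} = \frac{146 + 1}{2} = 73.5$
Median value (read from the curve) = 67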

Using the formula:

$$\text{Median} = l + \frac{M - C}{F} \times i$$

Where l=lower limit of the middle group
M=middle item
C=cumulative frequency of the class before the middle group
F=frequency of the middle group
i=class interval of the middle group
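The formula method can be sketched in Python for the IQ table. The sketch assumes the formula Median = l + (M − C) / F × i built from the symbols listed above, with M taken as the median position (n + 1) / 2 = 73.5. The function and variable names are chosen here for illustration.

```python
def grouped_median(lower_limit, position, cum_before, freq, width):
    """Interpolated median for grouped data: l + (M - C) / F * i."""
    return lower_limit + (position - cum_before) / freq * width


lower_limits = [0, 20, 40, 60, 80, 100, 120]
frequencies = [6, 18, 32, 48, 27, 13, 2]
width = 20

n = sum(frequencies)        # 146 residents
position = (n + 1) / 2      # median position, 73.5

# Walk through the classes until the cumulative frequency reaches the position.
cumulative = 0
for lower, f in zip(lower_limits, frequencies):
    if cumulative + f >= position:
        median = grouped_median(lower, position, cumulative, f, width)
        break
    cumulative += f

print(round(median, 2))
```

The middle group is 60 – 80 (C = 56, F = 48), giving a median of about 67.29, which agrees with the value of 67 read from the cumulative frequency curve.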

Note: Refer to class exercises for further understanding.


4. Weighted mean
- It is an average in which each item is weighted according to its degree of importance.
Example
The following table shows the marks scored by a student in an exam.

Subject                   Score (x)   Weight (w)   wx
Communication             65          50           3250
Maths                     63          40           2520
Statistics                52          45           2340
Psychology                80          35           2800
Educational Management    69          55           3795
School Management         55          60           3300
Total                                 Σw = 285     Σwx = 18005

$$\text{Weighted mean} = \frac{\sum wx}{\sum w} = \frac{18005}{285} \approx 63.18$$
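A short Python sketch of the weighted mean, Σwx / Σw, for the marks table. One assumption is made loudly here: the Statistics score is taken as 52, since 52 × 45 is the value consistent with the tabulated wx of 2340 and the total of 18005.

```python
# (score x, weight w) for each subject in the table.
scores_and_weights = {
    "Communication": (65, 50),
    "Maths": (63, 40),
    "Statistics": (52, 45),   # assumed score: 52 x 45 matches the tabulated wx = 2340
    "Psychology": (80, 35),
    "Educational Management": (69, 55),
    "School Management": (55, 60),
}

total_w = sum(w for _, w in scores_and_weights.values())
total_wx = sum(x * w for x, w in scores_and_weights.values())

# Weighted mean: sum of (weight x score) divided by the sum of the weights.
weighted_mean = total_wx / total_w
print(total_w, total_wx, round(weighted_mean, 2))
```

With these figures Σw = 285, Σwx = 18005 and the weighted mean is about 63.18.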

Merits and demerits of the different measures of central tendency


The arithmetic mean (a.m)
Merits
i. It utilizes all the observations given
ii. It is a very useful statistic in terms of applications. It has several applications in business
management, e.g. hypothesis testing, quality control, etc.
iii. It is the best representative of a given set of data if such data was obtained from a normal
population
iv. The a.m. can be determined accurately using mathematical formulas

Demerits of the a.m.
i. If the data is not drawn from a ‘normal’ population, then the a.m. may give a wrong
impression about the population
ii. In some situations, the a.m. may give unrealistic values, especially when dealing with
discrete variables, e.g. when working out the average number of children in a number of families, it
may be found that the average is 4.4, which is not a possible number of children

The mode
Merits
i. It can be determined from incomplete data provided the observations with the highest
frequency are already known
ii. The mode has several applications in business, e.g. stocking the most sold good.
iii. The mode can be easily defined
iv. It can be determined easily from a graph

Demerits
i. If the data is quite large and ungrouped, determination of the mode can be quite
cumbersome
ii. Use of the formula to calculate the mode is unfamiliar to most business people
iii. The mode may sometimes be non-existent or there may be two modes for a given set of
data. In such a case therefore a single mode may not exist
The median
Merits
i. It shows the centre of a given set of data
ii. Knowledge of the determination of the median may be extended to determine the quartiles
iii. The median can easily be defined
iv. It can be obtained easily from the cumulative frequency curve
v. It can be used in determining the degree of skewness

Demerits
i. In some situations where the number of observations is even, the median is the average of
the two middle values and may not correspond to any actual observation
ii. The computation of the median using the formulas is not well understood by most people.
iii. In the education environment the median has very few applications

Other Measures of Location


Quartiles: Values that divide a series into four equal parts when the series is arranged
in ascending order.
- The quartiles normally used are three, namely:
a) The lower quartile (first quartile Q1)
b) The median (second quartile Q2)
c) The upper quartile (third quartile Q3)
Deciles: Values that divide a series into ten equal parts when the
series is arranged in ascending order.
Percentiles: Values that divide a series into 100 equal parts when the series is
arranged in ascending order.
Note: Refer to class exercises for further understanding.
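As an illustration, the quartiles of the nine values used in the median example can be computed with Python's standard library (version 3.8 or later). The sketch assumes the default "exclusive" method of `statistics.quantiles`, which locates each quartile at position i(n + 1)/4 and interpolates between neighbouring values; other textbooks and packages use slightly different conventions and may give slightly different answers.

```python
from statistics import quantiles

data = sorted([14, 17, 9, 8, 20, 32, 18, 14.5, 13])

# Three cut points divide the ordered series into four equal parts.
q1, q2, q3 = quantiles(data, n=4)
print(q1, q2, q3)

# Deciles use the same idea with ten parts (nine cut points).
deciles = quantiles(data, n=10)
print(len(deciles))
```

Here Q1 = 11.0, Q2 = 14.5 (the median found earlier) and Q3 = 19.0.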

Measures of Dispersion
- Also known as measures of variation or variability.
- They measure how much the data spread out around a central measure.
- The measures of dispersion are very useful in statistical work because they indicate whether the
data are clustered closely around the mean or scattered away from it.
- If the data are closely clustered around the mean then the measure of dispersion
obtained will be small, indicating that the mean is a good representative of the sample
data. On the other hand, if the figures are not closely located to the mean then the
measure of dispersion obtained will be relatively big, indicating that the mean does not
represent the data sufficiently.
- The measures of dispersion are expressed in two ways:
i. Absolute measure: This is when the measures are expressed using the same units of
measure as the original data.
ii. Relative measure: This is when the measures are expressed as a fraction or percentage.
Relative measures are also known as coefficients of dispersion.

Methods of Measuring Dispersion


The commonly used measures of dispersion are
a) The range
b) The quartile deviation
c) The mean deviation
d) The standard deviation
a) The range
- The range is defined as the difference between the highest and the smallest values in a
frequency distribution.

$$\text{Range} = H - S$$

- where H= Highest value
- S= Smallest value

- The main disadvantage of the range is that it only uses 2 values (highest and smallest) in a given
series. However, the smaller the value of the range, the less dispersed the observations are from
the arithmetic mean and vice versa.

b) The quartile deviation (semi inter-quartile range)

- This is a measure of dispersion which involves the use of the upper quartile (Q3) and the
lower quartile (Q1). A quartile is a value which lies at the boundary of a division
when any given set of data is divided into four equal divisions.

$$\text{Inter-quartile range} = Q_3 - Q_1 \qquad \text{Quartile deviation} = \frac{Q_3 - Q_1}{2}$$

- The semi inter-quartile range is a good measure of dispersion because it shows how the central
half of the data is spread around the median.

- The main disadvantage of quartile deviation is that it uses only two values (Q1 and Q3), ignoring
other values.
Note: Refer to class exercises for further understanding.
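The range and the quartile deviation can be sketched in Python using the ten exam marks from the mean deviation example that follows in these notes. The quartiles are computed with `statistics.quantiles` and its default "exclusive" method, which is an assumption; a different quartile convention would change Q1 and Q3 slightly.

```python
from statistics import quantiles

marks = [60, 45, 75, 70, 65, 40, 69, 64, 50, 80]

# Range = highest value - smallest value.
value_range = max(marks) - min(marks)

# Quartile deviation = (Q3 - Q1) / 2, half of the inter-quartile range.
q1, _, q3 = quantiles(sorted(marks), n=4)
quartile_deviation = (q3 - q1) / 2

print(value_range, q1, q3, quartile_deviation)
```

For these marks the range is 40, Q1 = 48.75, Q3 = 71.25 and the quartile deviation is 11.25.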
c) The mean deviation
- This is the average of the deviations taken from the mean, median or mode. The deviations are taken
as positive values. This measure of dispersion makes use of all the values given.
Finding Mean deviation for ungrouped data

$$\text{MD} = \frac{\sum |x - \bar{x}|}{n}$$

Finding Mean deviation for grouped data

$$\text{MD} = \frac{\sum f|x - \bar{x}|}{\sum f}$$

E.g. 1
In a given exam the scores for 10 students were as follows

Student   Mark (x)   |x − x̄|
A         60         1.8
B         45         16.8
C         75         13.2
D         70         8.2
E         65         3.2
F         40         21.8
G         69         7.2
H         64         2.2
I         50         11.8
J         80         18.2
Total     618        104.4

Required: Determine the absolute mean deviation

Mean, $\bar{x} = \frac{618}{10} = 61.8$

Therefore AMD $= \frac{\sum |x - \bar{x}|}{n} = \frac{104.4}{10} = 10.44$
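The same working can be reproduced in Python as a check. This is a sketch with names chosen here; it applies the mean deviation about the mean, Σ|x − x̄| / n, to the ten marks.

```python
marks = [60, 45, 75, 70, 65, 40, 69, 64, 50, 80]

n = len(marks)
mean_mark = sum(marks) / n                      # 618 / 10

# Absolute deviations: distance of each mark from the mean, sign ignored.
abs_deviations = [abs(x - mean_mark) for x in marks]

mean_deviation = sum(abs_deviations) / n
print(mean_mark, round(mean_deviation, 2))
```

The mean is 61.8 and the absolute mean deviation is 10.44, matching the table.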

E.g. 2
The following data was obtained from a given financial institution. The data refers to the loans given out
in 2013 to several students.

No. of students (f)   Amount of loan per student (x)   fx              |x − x̄|     f|x − x̄|
3                     20000                            60000           4157.9      12473.70
4                     60000                            240000          35842.1     143368.40
1                     15000                            15000           9157.9      9157.90
5                     12000                            60000           12157.9     60789.50
6                     14000                            84000           10157.9     60947.40
Σf = 19                                                Σfx = 459000                Σf|x − x̄| = 286736.90

Required
Calculate the mean deviation for the amount of loan given
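A minimal Python sketch of the required calculation, using the grouped form of the mean deviation, Σf|x − x̄| / Σf, where x̄ = Σfx / Σf. The variable names are illustrative only.

```python
# (number of students f, loan per student x) from the table.
loans = [(3, 20000), (4, 60000), (1, 15000), (5, 12000), (6, 14000)]

total_f = sum(f for f, _ in loans)                    # 19 students
mean_loan = sum(f * x for f, x in loans) / total_f    # 459000 / 19

# Each absolute deviation is weighted by the number of students in that row.
mean_deviation = sum(f * abs(x - mean_loan) for f, x in loans) / total_f

print(round(mean_loan, 2), round(mean_deviation, 2))
```

The mean loan is about 24,157.89 and the mean deviation is about 15,091, i.e. 286736.9 divided by 19.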

NB: If the absolute mean deviation is relatively small, it implies that the data is more compact and
therefore the arithmetic mean is a fair representative of the sample.
The main disadvantage of the mean deviation is that it ignores the signs of the deviations, treating them
all as positive even when some are negative.
d) The standard deviation
- This is one of the most accurate measures of dispersion. It has the following advantages;
i. It utilizes all the values given
ii. It makes use of both negative and positive values if they occur
iii. The standard deviation reflects an accurate impression of how much the sample data varies
from the mean.
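Std deviation for ungrouped data

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

Std deviation for grouped data

$$\sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}}$$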
Coefficients

x
Variance
The square of the standard deviation is called the variance.

Finding Variance for ungrouped data

$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$$

Finding variance for grouped data

$$\sigma^2 = \frac{\sum f(x - \bar{x})^2}{\sum f}$$

Example
A sample comprises the following observations: 14, 18, 17, 16, 25, 31
Determine the standard deviation of this sample

x       x − x̄     (x − x̄)²
14      -6.1      37.21
18      -2.1      4.41
17      -3.1      9.61
16      -4.1      16.81
25      4.9       24.01
31      10.9      118.81
121               210.86

Mean, $\bar{x} = \frac{121}{6} = 20.17$, taken as 20.1 in the table.

$$\text{Standard deviation, } \sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{210.86}{6}} = 5.93$$
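The result can be checked with Python's standard library. The sketch uses `statistics.pstdev` and `statistics.pvariance`, which divide by n as in the worked example (rather than by n − 1), and it uses the unrounded mean, so intermediate figures differ slightly from the table.

```python
from statistics import pstdev, pvariance

sample = [14, 18, 17, 16, 25, 31]

variance = pvariance(sample)   # sum of squared deviations divided by n
std_dev = pstdev(sample)       # square root of the variance

print(round(variance, 2), round(std_dev, 2))
```

The standard deviation comes to about 5.93, in agreement with the worked example.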

Example 2 (with frequency)

The following table shows the part-time rate per hour of a given number of labourers in the month of June
1997.

Rate per hr (x) Shs   No. of labourers (f)   fx      f(x − x̄)²
230                   7                      1610    700
400                   6                      2400    153600
350                   2                      700     24200
450                   1                      450     44100
200                   8                      1600    12800
150                   11                     1650    89100
Total                 35                     8410    324500

Calculate the standard deviation from the above table, showing how the hourly payments varied
from the mean.

Mean, $\bar{x} = \frac{8410}{35} \approx 240$ (the value used in the table)

$$\therefore \text{Standard deviation, } \sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}} = \sqrt{\frac{324500}{35}} = 96.29$$
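A short Python sketch of the grouped calculation, σ = √(Σf(x − x̄)² / Σf). It uses the unrounded mean 8410 / 35 rather than the rounded value of 240 used in the table, so the intermediate total differs slightly, but the standard deviation agrees to two decimal places.

```python
from math import sqrt

# (rate per hour x, number of labourers f) from the table.
rates = [(230, 7), (400, 6), (350, 2), (450, 1), (200, 8), (150, 11)]

total_f = sum(f for _, f in rates)
mean_rate = sum(f * x for x, f in rates) / total_f

# Squared deviations from the mean, weighted by frequency.
variance = sum(f * (x - mean_rate) ** 2 for x, f in rates) / total_f
std_dev = sqrt(variance)

print(total_f, round(mean_rate, 2), round(std_dev, 2))
```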

Example 3.1
The quality controller in a given firm had an accurate record of all the iron bars produced in May 1997.
The following data shows those records.

Bar lengths (cm)   No. of bars (f)   Class mid point (x)   fx           f(x − x̄)²
201 – 250          25                225.5                 5637.5       596756.3
251 – 300          36                275.5                 9918         393129
301 – 350          49                325.5                 15949.5      145542.3
351 – 400          80                375.5                 30040        1620
401 – 450          51                425.5                 21700.5      105582.8
451 – 500          42                475.5                 19971        383050.5
501 – 550          30                525.5                 15765        635107.5
Total              313                                     118981.50    2260788

Calculate the standard deviation of the lengths of the bars

Mean, $\bar{x} = \frac{118981.50}{313} \approx 380$ (the value used in the table)

$$\therefore \text{Standard deviation, } \sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}} = \sqrt{\frac{2260788}{313}} \approx 84.99 \text{ cm}$$

TOPIC TWO: INTRODUCTION TO PROBABILITY
Probability can be defined in several ways:

• It is the chance that something will happen.

• It is the possibility of the occurrence of an event.

• It is the ratio of the number of favorable cases to the total number of equally likely cases, i.e.

$$\text{Probability} = \frac{\text{No. of favorable cases}}{\text{Total no. of possible outcomes}}$$

Approaches to probability

i. The classical/a priori approach: The probability of an event E is given by

$$P(E) = \frac{\text{Favorable outcomes}}{\text{Total no. of outcomes}}$$

ii. The relative frequency/empirical approach: This is a statistical approach to probability through
use of observation

iii. The personalistic approach: Also known as the subjective approach. The probability here depends
on the individual’s beliefs, opinions and feelings and is based on one’s own experience.

iv. The axiomatic approach: An axiom is a rule, law or norm. In this approach, there are three
basic rules:

a) Rule one: The probability of any event is a non-negative real number. Hence the smallest
probability is zero.

b) Rule two: The probability of the entire sample space is one, i.e. P(S) = 1

c) Rule three: For mutually exclusive events, P(E1 or E2) = P(E1) + P(E2). If the mutually
exclusive events are also collectively exhaustive, their probabilities add up to one.

Probability Line

• Probability can be shown on a number line.

• Probability is always between 0 and 1.

0 (impossible event) -------- ½ (50-50 chance) -------- 1 (event is 100% sure or certain)

E.g. of an impossible event: Getting a 7 when you throw a die. P(7 in a die throw) = 0

E.g. of a sure event: The rising of the sun tomorrow. Probability here = 1

Sample space:

• Refers to all the possible outcomes of an experiment.

• It is made up of sample points.

• Examples:

– Tossing a coin: The sample space has 2 outcomes, {Head, Tail}

– Throwing a die: The sample space has 6 outcomes, {1, 2, 3, 4, 5, 6}

– Drawing a card from a deck of cards: The sample space has 52 outcomes, {all the cards}

Sample Point

• Refers to any one of the possible outcomes of an experiment

• Examples:

– Tossing a coin: Head is one of the two sample points

– Throwing a die: A four is one of the six sample points

– Drawing a card from a deck of cards: The Queen of hearts is one of the 52 sample points.

Event

• It is a single result of an experiment.

• It can include one or more sample points, e.g. getting an even number after throwing a die. The
event here is "even number". There are three even numbers (2, 4, 6), hence three sample points.

P(Even no.) = 3/6 = ½
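The classical approach can be sketched in Python by listing the sample space and counting the sample points that belong to the event. The names below are illustrative, and `fractions.Fraction` is used so the answer stays an exact fraction.

```python
from fractions import Fraction

# Sample space for one throw of a die, and the event "even number".
sample_space = {1, 2, 3, 4, 5, 6}
even_event = {point for point in sample_space if point % 2 == 0}

# Classical probability: favorable outcomes / total equally likely outcomes.
p_even = Fraction(len(even_event), len(sample_space))

print(sorted(even_event), p_even)
```

The event contains the sample points 2, 4 and 6, so P(Even no.) = 3/6 = 1/2.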

Types of Events

i. Elementary event/simple event: A single possible outcome of an experiment. It is an event
that cannot be subdivided into a combination of other events.

ii. Compound events: When two or more events occur in connection with each other. A compound
event is an aggregate of simple events.

iii. Mutually exclusive events: Events which cannot happen at the same time; either one or
the other occurs but not both. E.g. getting a head and a tail on a single toss of a coin.

iv. Collectively exhaustive events: Events which together include all possible outcomes of an experiment.

v. Complementary events: The complement of an event consists of all outcomes that are not the
event being considered. E.g. head is the complement of tail (in tossing a coin).

• Complementary law: States that the sum of the probability of an event and the
probability of its complement equals one, i.e. P(A) + P(Ac) = 1

vi. Equally likely events: When one event does not occur more often than others, i.e. each event of
an experiment has an equal chance of happening, just like any other.

vii. Independent event: An event is independent if its occurrence is not affected by
any other event. E.g. getting a head on one toss of a coin does not depend on the result of the
previous toss.

viii. Dependent event: Two events are dependent if the outcome or occurrence of the first affects
the outcome or occurrence of the second. E.g. drawing cards from a deck without replacement.
Conditional probability is based on dependent events.

Probability Laws

i. Addition rule (OR):

It is used to calculate the probability that one or another of two or more events occurs. For mutually
exclusive events the probabilities of the separate events are added.

E.g. For events A and B that are mutually exclusive: P(A or B) = P(A) + P(B)

If the two events intersect, the probability of the intersection must be subtracted,
e.g. for events A and B that are not mutually exclusive: P(A or B) = P(A) + P(B) − P(A ∩ B) (∩ is the
symbol for intersection; ∪ is the symbol for union)

ii. Multiplication rule (AND):

It is used when the events are independent.

E.g. For events A and B that are independent: P(A and B) = P(A) × P(B)

Two events A and B are independent if the occurrence or non-occurrence of event A does
not affect the occurrence or non-occurrence of event B.
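Both rules can be checked by enumerating a standard 52-card deck in Python. The events "King" and "Heart" are chosen here purely for illustration: they are not mutually exclusive (the King of hearts belongs to both), and the rank and suit of a randomly drawn card are independent.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))          # 52 equally likely sample points


def prob(event):
    """Classical probability of an event given as a true/false test on a card."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def is_king(card):
    return card[0] == "K"


def is_heart(card):
    return card[1] == "hearts"


p_king = prob(is_king)
p_heart = prob(is_heart)
p_both = prob(lambda card: is_king(card) and is_heart(card))
p_either = prob(lambda card: is_king(card) or is_heart(card))

# Addition rule for events that are not mutually exclusive.
print(p_either == p_king + p_heart - p_both)
# Multiplication rule for independent events.
print(p_both == p_king * p_heart)
```

Here P(King or Heart) = 4/52 + 13/52 − 1/52 = 4/13 and P(King and Heart) = 4/52 × 13/52 = 1/52.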

Conditional Probability

Conditional probability of an event B in relation to an event A is the probability that event B occurs
given that event A has already occurred. It is written as P(B/A) (read as "probability of B given A").

$$P(B/A) = \frac{P(A \cap B)}{P(A)}$$

Examples

1. A math teacher gave her class two tests. 25% of the class passed both tests and 42% of the class passed
the first test. What percentage of those who passed the first test also passed the second test?

2. In a certain company there are 550 employees. 380 employees have at least a college
education, 412 have attended a vocational training programme and 357 have both at least a
college education and vocational training. What is the probability of randomly choosing an
employee who has at least a college education or vocational training or both?

3. A consulting firm is bidding for two jobs, one with firm A and one with firm B. The probability of getting
the job with firm A is 0.45, while the probability of getting the job with firm B, given that it gets the job
with firm A, is 0.9. What chance does the firm have of getting both jobs?

P(A ∩ B) = P(B/A) × P(A)
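The three examples can be worked through in Python as a hedged sketch. It assumes the usual reading of each problem: example 1 asks for P(passed second / passed first), example 2 applies the addition rule for events that are not mutually exclusive, and example 3 applies P(A ∩ B) = P(B/A) × P(A). The variable names are chosen here.

```python
from fractions import Fraction

# Example 1: P(second / first) = P(both) / P(first).
p_both_tests = Fraction(25, 100)
p_first_test = Fraction(42, 100)
p_second_given_first = p_both_tests / p_first_test

# Example 2: P(college or vocational) = P(college) + P(vocational) - P(both).
employees = 550
p_college_or_vocational = Fraction(380 + 412 - 357, employees)

# Example 3: P(A and B) = P(B / A) x P(A).
p_job_a = Fraction(45, 100)
p_job_b_given_a = Fraction(9, 10)
p_both_jobs = p_job_b_given_a * p_job_a

print(float(p_second_given_first))      # about 0.595
print(float(p_college_or_vocational))   # about 0.791
print(float(p_both_jobs))               # 0.405
```

Under these readings about 60% of those who passed the first test also passed the second, the probability in example 2 is 435/550 ≈ 0.79, and the chance of getting both jobs is 0.405.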

NOTE: Refer to class exercises

TOPIC THREE: CORRELATION AND REGRESSION ANALYSIS


Correlation refers to the interrelationship or association between 2 or more variables.
The purpose of studying correlation is to establish whether a relationship exists
between 2 or more variables.
Importance of Correlation
i. To measure (using one figure) the degree of relationship between the variables.
ii. To help estimate or predict the value of one variable given the value of the other variable
iii. Used in economic forecasting and planning

Types of Correlations
Variables may be related:
i. Perfectly related: A change in the independent variable is accompanied by an exactly proportional
change in the dependent variable. Here we can have perfect +ve correlation (where an increase in the
independent variable is accompanied by a proportional increase in the dependent variable) or perfect
–ve correlation (where an increase in the independent variable is accompanied by a proportional
decrease in the dependent variable).
ii. Partly correlated
iii. Uncorrelated (where no relationship exists)

Methods of Studying Correlation


i. Scatter diagram
ii. Karl Pearson’s coefficient of correlation
iii. Spearman’s Rank correlation coefficient
iv. Method of least squares (regression line)

3.1 SCATTER DIAGRAM


- A scatter diagram is a graph consisting of plotted points which are not joined by line segments.
- The pattern of the points reveals the type of relationship existing between the variables.
- The following sketch graphs assist in the interpretation of scatter diagrams.


Perfect positive correlation


[Figure: scatter diagram showing perfect positive correlation]
The above pattern is referred to as perfect positive correlation because the points can be represented
by a single straight line, and an increase in one variable is matched by a proportional increase in the
other. For example, when measuring the relationship between the volume of sales and the profits of a
company, the more the company sells, the higher the profits.

Perfect negative correlation


[Figure: scatter diagram showing perfect negative correlation]
This represents perfect negative correlation, where an increase in one variable is matched by a
proportional decrease in the other, e.g. the quantity of goods demanded in relation to the price: the
lower the price, the higher the quantity demanded.


High degree of positive correlation


[Figure: scatter diagram showing a high degree of positive correlation]

High degree of negative correlation


[Figure: scatter diagram showing a high degree of negative correlation]


No correlation
[Figure: scatter diagram showing no correlation]
Spurious Correlations
- In some rare situations, plotting the data for x and y may give a graph showing either positive or
negative correlation, yet in real life there is no convincing evidence that such a relationship
exists. The relationship then exists only in the figures, and it is referred to as spurious or
nonsense correlation, e.g. when high student pass rates show a strong relationship with increased
accidents.
Note: Refer to class exercise for further understanding.

Correlation coefficient
- These are numerical measures of the correlation existing between the dependent and the
independent variables.
- They are better measures of correlation than the scatter diagram.
- Correlation coefficients range between +1 and -1. A correlation coefficient of +1 implies perfect
positive correlation. A value of -1 shows perfect negative correlation. A value of 0 implies no
linear correlation at all.
3.2 KARL PEARSON’S COEFFICIENT OF CORRELATION (r)

It is also known as the product moment coefficient of correlation.

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

Example
(Product moment correlation)
The following data was obtained during a social survey conducted in a given urban area regarding the
annual income of given families and the corresponding expenditures.

Family   (x) Annual income £000   (y) Annual expenditure £000   xy         x²         y²
A        420                      360                           151200     176400     129600
B        380                      390                           148200     144400     152100
C        520                      510                           265200     270400     260100
D        610                      500                           305000     372100     250000
E        400                      360                           144000     160000     129600
F        320                      290                           92800      102400     84100
G        280                      250                           70000      78400      62500
H        410                      380                           155800     168100     144400
J        380                      240                           91200      144400     57600
K        300                      270                           81000      90000      72900
Total    4020                     3550                          1504400    1706600    1342900

Required
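Calculate the product moment correlation coefficient and briefly comment on the value obtained.
The product moment correlation coefficient is
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

$$r = \frac{10(1{,}504{,}400) - (4020)(3550)}{\sqrt{\left[10(1{,}706{,}600) - 4020^2\right]\left[10(1{,}342{,}900) - 3550^2\right]}} = \frac{773{,}000}{\sqrt{(905{,}600)(826{,}500)}}$$

= 0.89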

Comment: The value obtained 0.89 suggests that the correlation between annual income and annual
expenditure is high and positive. This implies that the more one earns the more one spends.
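
The calculation can be reproduced with a few lines of Python. This is a minimal sketch that applies the product moment formula directly to the income and expenditure figures in the table; the function name `pearson_r` is an illustrative choice, not part of the course material.

```python
import math

# Annual income (x) and annual expenditure (y), both in £000, for families A to K.
x = [420, 380, 520, 610, 400, 320, 280, 410, 380, 300]
y = [360, 390, 510, 500, 360, 290, 250, 380, 240, 270]

def pearson_r(xs, ys):
    """Product moment correlation coefficient from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(x, y)
print(round(r, 2))  # 0.89
```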

Note: Refer to class exercise for further understanding.

Interpretation of r: problems in interpreting r values

NOTE:
- A high value of r (e.g. +0.9 or -0.9) only shows a strong association between the two variables; it
does not imply a causal relationship, i.e. that a change in one variable causes a change in the
other. It is possible to find two variables which produce a high calculated r yet have no causal
relationship. This is known as spurious or nonsense correlation, e.g. high pass rates in M.K.U.
and increased inflation in Asian countries.
- A low correlation coefficient does not imply the lack of any relationship between the variables,
only the lack of a linear relationship, i.e. a curvilinear relationship could still exist.
- A further problem in interpretation arises from the fact that r measures the relationship between
a single independent variable and the dependent variable, whereas a particular variable may
depend on several independent variables (e.g. goods demanded may depend on price, customers'
preferences or tastes, the prices of related goods, substitutes etc.), in which case multiple
correlation should be used instead.


3.3 THE RANK CORRELATION COEFFICIENT (R)


It is also known as Spearman's rank correlation coefficient. Its purpose is to establish whether there
is any form of association between two variables whose values are arranged in ranked form.
$$R = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)}$$
Where d = difference between the pairs of ranked values
n = number of pairs of rankings

Example
A group of 8 students are tested in Psychology and Statistics. Their rankings in the two tests were.
Student   Psychology ranking   Statistics ranking   d     d²
A         2                    3                    -1    1
B         7                    6                    1     1
C         6                    4                    2     4
D         1                    2                    -1    1
E         4                    5                    -1    1
F         3                    1                    2     4
G         5                    8                    -3    9
H         8                    7                    1     1
                                                    Σd² = 22
d = Psychology ranking - Statistics ranking
$$R = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)} = 1 - \frac{6 \times 22}{8\left(8^2 - 1\right)}$$
= 0.74
Thus we conclude that there is reasonable agreement between the students' performances in the two
types of tests.
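
A minimal Python sketch of the same calculation is given below. It takes the two lists of rankings for students A to H and applies the formula for R; the function name `spearman_rank` is illustrative only, and the sketch assumes there are no tied ranks.

```python
# Rankings of students A to H in the two tests.
psychology = [2, 7, 6, 1, 4, 3, 5, 8]
statistics = [3, 6, 4, 2, 5, 1, 8, 7]

def spearman_rank(ranks_1, ranks_2):
    """Rank correlation R = 1 - 6*sum(d^2) / (n(n^2 - 1)), assuming no ties."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

R = spearman_rank(psychology, statistics)
print(round(R, 2))  # 0.74
```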

Equal or Tied Rankings


A slight adjustment to the formula is made if some students tie and share the same ranking. The
adjustment is
$$\frac{t^3 - t}{12}$$
where t = number of tied rankings. The adjusted formula becomes

$$R = 1 - \frac{6\left(\sum d^2 + \frac{t^3 - t}{12}\right)}{n\left(n^2 - 1\right)}$$

Example
Assume that in our previous example students E and F achieved equal marks in Psychology and were
given joint 3rd place.


Solution
Student   Psychology ranking   Statistics ranking   d       d²
A         2                    3                    -1      1
B         7                    6                    1       1
C         6                    4                    2       4
D         1                    2                    -1      1
E         3½                   5                    -1½     2¼
F         3½                   1                    2½      6¼
G         5                    8                    -3      9
H         8                    7                    1       1
                                                    Σd² = 25½

$$R = 1 - \frac{6\left(\sum d^2 + \frac{t^3 - t}{12}\right)}{n\left(n^2 - 1\right)} = 1 - \frac{6\left(25\tfrac{1}{2} + \frac{2^3 - 2}{12}\right)}{8\left(8^2 - 1\right)} = 1 - \frac{6 \times 26}{504}$$

= 0.69
NOTE: It is conventional to show the shared rankings as above, i.e. E and F take up the 3rd and 4th
ranks, which are shared between the two as 3½ each.
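
The tied-rank adjustment can be sketched in Python as follows. The code sums the d² column directly from the two lists of ranks and then applies the adjusted formula with t = 2; the variable names are illustrative only.

```python
# Rankings for the tied example: E and F share ranks 3 and 4 in Psychology.
psychology = [2, 7, 6, 1, 3.5, 3.5, 5, 8]
statistics = [3, 6, 4, 2, 5, 1, 8, 7]
t = 2  # number of students sharing the tied ranking

n = len(psychology)
sum_d2 = sum((a - b) ** 2 for a, b in zip(psychology, statistics))
adjustment = (t ** 3 - t) / 12  # (t^3 - t) / 12
R_tied = 1 - 6 * (sum_d2 + adjustment) / (n * (n ** 2 - 1))

print(sum_d2, round(R_tied, 2))
```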

3.4 REGRESSION
- Regression describes the changes which occur in the dependent variable as a result of changes in
the independent variable.
- Knowledge of regression is particularly useful in business statistics, where it is necessary to
consider the corresponding changes in dependent variables whenever independent variables
change.
- Most business activities involve a dependent variable and one or more independent variables.
Knowledge of regression therefore enables a business statistician to predict or estimate the
expected value of a dependent variable given the value of an independent variable. For example,
with data on annual incomes and annual expenditures, regression techniques can be used to
estimate the expenditure of a given family if its annual income is known, and vice versa.
- The general equation used in simple regression analysis is
ŷ = a + bx
Where ŷ = estimated value of the dependent variable
a = intercept on the y axis (constant)
b = slope of the line
x = independent variable
The regression equation is normally determined using a technique known as "the method of least
squares".
Regression equation of y on x, i.e. y = a + bx


[Figure: scatter of data points about the line of best fit, with y on the vertical axis and x on the horizontal axis]
The following pair of equations, known as the normal equations, is used to determine the equation of
the above regression line from a set of data.
Σy = an + bΣx
Σxy = aΣx + bΣx²
Where Σy = sum of the y values
Σxy = sum of the products of x and y
Σx = sum of the x values
Σx² = sum of the squares of the x values
n = number of pairs of observations
a = the intercept on the y axis
b = slope (gradient) of the regression line of y on x
Example
An investment company advertised the sale of pieces of land at different prices. The following table
shows the pieces of land, their acreage and their costs.

Piece of land   (x) Acreage (hectares)   (y) Cost £000   xy            x²
A               2.3                      230             529           5.29
B               1.7                      150             255           2.89
C               4.2                      450             1890          17.64
D               3.3                      310             1023          10.89
E               5.2                      550             2860          27.04
F               6.0                      590             3540          36
G               7.3                      740             5402          53.29
H               8.4                      850             7140          70.56
J               5.6                      530             2968          31.36
Total           Σx = 44.0                Σy = 4400       Σxy = 25607   Σx² = 254.96

Required
Determine the regression equation of y on x and hence
i. Estimate the cost of a piece of land of 4.5 hectares
ii. Estimate the expected acreage if the piece of land costs £900,000
Σy = an + bΣx
Σxy = aΣx + bΣx²


By substituting the appropriate values into the above equations we have

4400 = 9a + 44b …….. (i)
25607 = 44a + 254.96b …….. (ii)
Multiplying equation (i) by 44 and equation (ii) by 9 we have
193600 = 396a + 1936b …….. (iii)
230463 = 396a + 2294.64b …….. (iv)
Subtracting equation (iii) from equation (iv) we have
36863 = 358.64b
102.78 = b
Substituting for b in (i)
4400 = 9a + 44 (102.78)
4400 - 4522.32 = 9a
-122.32 = 9a
-13.59 = a
Therefore the equation of the regression line of y on x is
ŷ = -13.59 + 102.78x
When the acreage is 4.5 hectares, the cost is
ŷ = -13.59 + (102.78 x 4.5)
= 448.92
= £448,920
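
The same line can be fitted in Python. The sketch below solves the normal equations in closed form for the land data; the function name `least_squares` is illustrative only. Because the code carries b at full precision instead of truncating it to 102.78, its intercept differs from the hand-worked figure in the second decimal place, while the predicted cost for 4.5 hectares agrees to the nearest £1,000.

```python
# Acreage in hectares (x) and cost in £000 (y) for the nine pieces of land.
x = [2.3, 1.7, 4.2, 3.3, 5.2, 6.0, 7.3, 8.4, 5.6]
y = [230, 150, 450, 310, 550, 590, 740, 850, 530]

def least_squares(xs, ys):
    """Return (a, b) for the regression line y = a + b*x of y on x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(p * q for p, q in zip(xs, ys))
    sum_x2 = sum(p * p for p in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

a, b = least_squares(x, y)
cost_at_4_5 = a + b * 4.5  # estimated cost in £000 for 4.5 hectares
print(round(a, 2), round(b, 2), round(cost_at_4_5, 1))
```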
Note that
where the regression equation is given by
ŷ = a + bx
where a is the intercept on the y axis,
b is the slope of the line (the regression coefficient) and
n is the sample size,
then

$$\text{the y-intercept} \quad a = \frac{\sum y - b\sum x}{n}$$
$$\text{Slope} \quad b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$

Prediction within the Range of Sample Data


We can use the linear regression model to predict the mean of the dependent variable for any given
value of the independent variable.
For example, if the sample model is given by
Time (min) = 5.91 + 2.66 (distance in miles)
and the distance is 4.0 miles, then our estimated mean time is
ŷ = 5.91 + 2.66 x 4.0 = 16.55 ≈ 16.6 minutes
Coefficient of Determination (r²)


This refers to the ratio of the explained variation to the total variation and is used to measure the
strength of the linear relationship. The stronger the linear relationship, the closer the ratio will be to one.

$$\text{Coefficient of determination} = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{\sum\left(\hat{y} - \bar{y}\right)^2}{\sum\left(y - \bar{y}\right)^2}$$

Where ŷ = estimate of y given by the regression equation for each value of x
ȳ = mean of the actual values of y
y = individual actual values of y
NOTE: If the coefficient of determination is low, look for non-linear relationships between the
variables or some other causes.
You can also find the coefficient of determination by first calculating Karl Pearson's coefficient of
correlation and then squaring the answer.
Note: Refer to class exercise for further understanding.
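
As an illustration, the ratio of explained to total variation can be computed for the land data used in the regression example. This is a minimal Python sketch; reusing that data set here is a choice made for illustration, and the variable names are not part of the course material.

```python
# Acreage in hectares (x) and cost in £000 (y) from the regression example.
x = [2.3, 1.7, 4.2, 3.3, 5.2, 6.0, 7.3, 8.4, 5.6]
y = [230, 150, 450, 310, 550, 590, 740, 850, 530]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(p * q for p, q in zip(x, y))
sum_x2 = sum(p * p for p in x)

# Least squares line of y on x.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

mean_y = sum_y / n
y_hat = [a + b * p for p in x]                       # estimates from the line
explained = sum((v - mean_y) ** 2 for v in y_hat)    # explained variation
total = sum((v - mean_y) ** 2 for v in y)            # total variation
r_squared = explained / total

print(round(r_squared, 3))
```

A value this close to one indicates that almost all of the variation in cost is accounted for by the straight-line relationship with acreage.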

Non Linear Relationships


If the scatter diagram and the correlation coefficient do not indicate a linear relationship, then the
relationship may be non-linear.
Two such relationships are of interest:
y = ab^x and y = ax^b
Both of these can be reduced to a linear model. Simple or multiple linear regression methods are then
used to determine the values of the coefficients.
i. Exponential model
y = ab^x
Take logs of both sides
log y = log a + log b^x
log y = log a + x log b
Let log y = Y, log a = A and log b = B
Then Y = A + Bx. This is a linear regression model.
ii. Logarithmic model
y = ax^b
Using the same technique as above
log y = log a + b log x
Y = A + bX
Where Y = log y
A = log a
X = log x
Using the linear regression technique (the method of least squares), it is possible to calculate the
values of a and b.
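
The exponential case can be sketched in Python. The data below are hypothetical, generated exactly from y = 2(1.5)^x purely so that the recovered coefficients can be checked; they do not come from these notes. The code regresses Y = log y on x by least squares and then converts A and B back to a and b.

```python
import math

# Hypothetical data generated exactly from y = 2 * 1.5**x (an assumption for illustration).
x = [0, 1, 2, 3, 4, 5]
y = [2.0, 3.0, 4.5, 6.75, 10.125, 15.1875]

Y = [math.log10(v) for v in y]  # Y = log y

n = len(x)
sum_x, sum_Y = sum(x), sum(Y)
sum_xY = sum(p * q for p, q in zip(x, Y))
sum_x2 = sum(p * p for p in x)

# Least squares fit of the linear model Y = A + Bx.
B = (n * sum_xY - sum_x * sum_Y) / (n * sum_x2 - sum_x ** 2)
A = (sum_Y - B * sum_x) / n

# Undo the logs: a = 10^A and b = 10^B.
a = 10 ** A
b = 10 ** B
print(round(a, 3), round(b, 3))  # 2.0 1.5
```

The logarithmic model y = ax^b is handled in the same way, except that x is also replaced by X = log x before fitting.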


TOPIC FOUR: STATISTICAL INFERENCE


Statistical inference is the process of drawing conclusions about attributes of a population based upon
information contained in a sample taken from the population.
It is divided into estimation of parameters and testing of hypotheses. The symbols for sample statistics
and population parameters are as follows.

Measure              Sample Statistic   Population Parameter
Arithmetic mean      x̄                  µ
Standard deviation   s                  σ
Number of items      n                  N
Proportion           P                  π

4.1 Statistical Estimation


Statistical estimation is the procedure of using a sample statistic to estimate a population parameter.
It is divided into point estimation (where the estimate of a population parameter is given by a single
number) and interval estimation (where the estimate of a population parameter is given by a range
within which the parameter may be considered to lie).
Characteristics of a Good Estimator
(i) Unbiasedness: the expected value of the statistic is equal to the population
parameter, e.g. the expected value of the sample mean is equal to the population mean.
(ii) Consistency: the estimator yields values more closely approaching the population
parameter as the sample size increases.
(iii) Efficiency: the estimator has a smaller variance on repeated sampling than competing estimators.
(iv) Sufficiency: the estimator uses all the information available in the data concerning
the parameter.
Confidence Interval
An interval estimate, or 'confidence interval', consists of a range (between a lower confidence limit
and an upper confidence limit) within which we are confident that a population parameter lies, and to
which we assign a probability that the interval contains the true population value.
The confidence limits are the outer limits of a confidence interval; the confidence interval is the
interval between the confidence limits. The higher the confidence level, the wider the confidence
interval.
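
As an illustration of how the interval widens with the confidence level, the Python sketch below computes 95% and 99% confidence intervals for a population mean. The sample figures (mean 50, standard deviation 10, n = 100) are hypothetical, and the large-sample formula x̄ ± z·s/√n is an assumption brought in for the illustration, since the notes do not state a formula at this point.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sample_sd, n, confidence):
    """Large-sample interval: sample_mean +/- z * sample_sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. about 1.96 for 95%
    margin = z * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample, not taken from the notes.
low_95, high_95 = mean_confidence_interval(50, 10, 100, 0.95)
low_99, high_99 = mean_confidence_interval(50, 10, 100, 0.99)

print(round(low_95, 2), round(high_95, 2))
print(round(low_99, 2), round(high_99, 2))
```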

4.2 NORMAL DISTRIBUTION


The normal distribution is a probability distribution which is used to determine probabilities of
continuous variables.
Examples of continuous variables are distances, times, weights, heights, capacities etc.
Continuous variables are usually those which can be measured using appropriate units of
measurement.

The following are the properties of the normal distribution:

1. The total area under the curve is 1, which is equivalent to the maximum value of
probability.


2. The curve is symmetrical about the mean: the line of symmetry divides the curve into two equal halves.
3. The two ends of the normal distribution curve continuously approach the horizontal axis
but never touch or cross it.
4. The values of the mean, mode and median are all equal.
NB: The curve is referred to as the normal probability distribution curve because if a frequency
distribution curve is plotted from measurements of a sufficiently large sample drawn from a normal
population, a graph similar to the normal curve is obtained.
- About 68% of a normally distributed population lies within one standard deviation of the mean (µ ± 1σ)
- About 95% lies within two standard deviations (µ ± 2σ)
- About 99.7% lies within three standard deviations (µ ± 3σ)

Where σ = standard deviation
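
These areas can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module to compute the area under the standard normal curve within one, two and three standard deviations of the mean; the exact areas differ slightly from the rounded percentages usually quoted.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area under the curve between -k and +k standard deviations.
areas = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(k, round(area, 4))
```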

[Figure: standard normal curve centred on Z = 0]
Standardization of Variables
- Before we use the normal distribution curve to determine probabilities of continuous
variables, we need to standardize the original units of measurement using the following
formula.
$$Z = \frac{x - \mu}{\sigma}$$
Where x = value to be standardized
Z = standardized value of x
µ = population mean
σ = population standard deviation
Example
A sample of students had a mean age of 35 years with a standard deviation of 5 years. A student was
randomly picked from a group of 200 students. Assuming that ages are normally distributed, find the
probability that the age of the student was as follows:


i. Lying between 35 and 40 years
ii. Lying between 30 and 40 years
iii. Lying between 25 and 30 years
iv. Lying beyond 45 years
v. Lying beyond 30 years
vi. Lying below 25 years

Solution
(i). The standardized value for 35 years
$$Z = \frac{x - \mu}{\sigma} = \frac{35 - 35}{5} = 0$$

The standardized value for 40 years
$$Z = \frac{x - \mu}{\sigma} = \frac{40 - 35}{5} = 1$$

From the standard normal tables, the area between the mean and Z = 0 is 0, and the area between the
mean and Z = 1 is 0.3413.
The required area is the area between Z = 0 and Z = 1
= 0.3413 - 0 = 0.3413
∴ the probability that the age lies between 35 and 40 years is 0.3413
(ii). 30 and 40 years
$$Z = \frac{x - \mu}{\sigma} = \frac{30 - 35}{5} = \frac{-5}{5} = -1$$
$$Z = \frac{x - \mu}{\sigma} = \frac{40 - 35}{5} = 1$$

∴ the area between Z = -1 and Z = 1 is
= 0.3413 (lying on the negative side of zero) + 0.3413 (lying on the positive side of zero)
P = 0.6826
∴ the probability that the age lies between 30 and 40 years is 0.6826
(iii). 25 and 30 years
$$Z = \frac{x - \mu}{\sigma} = \frac{25 - 35}{5} = \frac{-10}{5} = -2$$
$$Z = \frac{x - \mu}{\sigma} = \frac{30 - 35}{5} = -1$$


∴ the required area lies between Z = -2 and Z = -1

Area between the mean and Z = -2
= 0.4772 (the Z value to look up in the tables is 2)
Area between the mean and Z = -1
= 0.3413 (the Z value to look up in the tables is 1)
∴ the probability that the age lies between 25 and 30 years
= 0.4772 - 0.3413
P = 0.1359

(iv). P(beyond 45 years) = P(x > 45)
$$Z = \frac{x - \mu}{\sigma} = \frac{45 - 35}{5} = \frac{10}{5} = +2$$

The area corresponding to Z = 2 is 0.4772, which is the probability of an age between 35 and 45 years.

∴ P(Age > 45 years) = 0.5000 - 0.4772
= 0.0228
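
All six parts of the example can be checked with Python's standard `statistics.NormalDist`, which evaluates the normal curve directly instead of using tables. This is a sketch assuming that ages follow a normal distribution with mean 35 and standard deviation 5; small differences in the fourth decimal place against table values are due to rounding in the tables.

```python
from statistics import NormalDist

ages = NormalDist(mu=35, sigma=5)

p_35_to_40 = ages.cdf(40) - ages.cdf(35)   # part (i)
p_30_to_40 = ages.cdf(40) - ages.cdf(30)   # part (ii)
p_25_to_30 = ages.cdf(30) - ages.cdf(25)   # part (iii)
p_over_45 = 1 - ages.cdf(45)               # part (iv)
p_over_30 = 1 - ages.cdf(30)               # part (v)
p_under_25 = ages.cdf(25)                  # part (vi)

for label, p in [("35-40", p_35_to_40), ("30-40", p_30_to_40),
                 ("25-30", p_25_to_30), (">45", p_over_45),
                 (">30", p_over_30), ("<25", p_under_25)]:
    print(label, round(p, 4))
```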

4.3 Hypothesis Testing


- A hypothesis is a claim or an opinion about an item or issue. It has to be tested statistically
in order to establish whether or not it is supported by the data.
- When testing a hypothesis, one must fully understand the two basic hypotheses to be tested, namely
i. The null hypothesis (H0)
ii. The alternative hypothesis (H1 or HA)
The Null Hypothesis (H0)
This is the hypothesis being tested: the belief about a certain characteristic of the population. It takes
the signs '=', '≤' or '≥'.
The Alternative Hypothesis (H1 or HA)
The alternative hypothesis contradicts the null hypothesis. Thus when we reject the null hypothesis
we accept the alternative hypothesis. It does not contain an equal sign but takes the signs '≠', '<'
or '>'.
Acceptance and rejection regions
The possible values of a test statistic are divided into those which agree with the null hypothesis (the
acceptance region) and those which lead to the rejection of the null hypothesis (the rejection region or
critical region).
The values which separate the rejection region from the acceptance region are called critical values.
Type I and type II errors
When testing a null hypothesis (H0) and deciding whether to accept or reject it, there are four
possible outcomes.
Possible outcomes in hypothesis testing
a) Accepting a true hypothesis: correct decision
b) Rejecting a false hypothesis: correct decision
c) Rejecting a true hypothesis: incorrect decision. This is called a type I error, with probability α.
d) Accepting a false hypothesis: incorrect decision. This is called a type II error, with probability β.

Decision      Accept H0               Reject H0
H0 is true    Correct (no error)      Wrong (type I error)
H0 is false   Wrong (type II error)   Correct (no error)

Levels of significance
The level of significance is the probability of rejecting the null hypothesis when it is in fact true, i.e.
the probability of a type I error (α). The probabilities used are usually very small, e.g. 1% or 5%.

Two Tailed Test

[Figure: normal curve with a central region of acceptance for H0 between two critical values and a rejection region in each tail]

NB: The null hypothesis is rejected if the standardized value of the sample mean lies beyond
the critical values.

One Tailed Test


It is only concerned with one of the tails of the distribution.
e.g. Lower tail (left tail) test

[Figure: normal curve with the rejection region in the lower tail, to the left of the critical value, and the acceptance region to its right]


e.g. Upper tail (right tail) test

[Figure: normal curve with the rejection region in the upper tail, to the right of the critical value, and the acceptance region to its left]

HYPOTHESIS TESTING PROCEDURE


Whenever a business complaint comes up, there is a recommended procedure for conducting a statistical
test. The purpose of such a test is to establish whether the null hypothesis or the alternative
hypothesis is to be accepted.
The following steps are normally adopted:
1. State the null and alternative hypotheses.
2. State the level of significance to be used.
3. State the test statistic, i.e. what is to be tested, e.g. the sample mean, the sample proportion,
or the difference between sample means or sample proportions.
4. State the type of test, whether two tailed or one tailed.
5. State the critical values using the appropriate level of significance.
6. Standardize the test statistic.
7. State the conclusion, showing whether to accept or reject the null hypothesis.

STANDARD HYPOTHESIS TESTS


In principle, we can test the significance of any statistic related to any probability distribution.
Hypothesis tests are divided into two groups:
i. Parametric tests
ii. Non-parametric tests
The sample statistics mean, proportion and variance are related to the normal, t, F and chi-squared
distributions.
1. Normal test
Tests a sample mean (X̄) against a population mean (µ) (where the sample size n > 30 and the
population variance σ² is known) and a sample proportion, P (where np > 5 and nq > 5, since in this
case the normal distribution can be used to approximate the binomial distribution).
2. t test
Tests a sample mean (X̄) against a population mean, especially where the population
variance is unknown and n < 30.

3. Variance ratio test or F test
It is used to compare population variances and can be used with samples of any size drawn from
normal populations.


4. Chi-squared test
It can be used to test the association between attributes or the goodness of fit of an observed
frequency distribution to a standard distribution.
Example 1
A certain NGO carried out a survey in a certain community in order to establish the average age at
which girls are married. The hypothesised mean marriage age for the girls is 19 years, while the survey
sample used in the working below (n = 50) gave a mean age of 16 years with a standard deviation of
2.1 years.
Null hypothesis H0: μ (mean marital age) = 19 years
Alternative hypothesis H1: μ (mean marital age) < 19 years
1. The level of significance is 5%
2. The test statistic is the sample mean age, X̄ = 16 years
3. The critical value of the one tailed test (one tailed because the alternative hypothesis is a
strict inequality, <) at the 5% level of significance is -1.65
4. The standardized value of the sample mean is
$$Z = \frac{\bar{X} - \mu}{S_{\bar{X}}} \quad \text{where} \quad S_{\bar{X}} = \frac{S}{\sqrt{n}}$$

Where X̄ = sample mean
µ = population mean
S = sample standard deviation
n = sample size
Z = standardized value (as per computation)
The standardized value Z must fall within the acceptance region for us to accept the null hypothesis.
Thus it must be > -1.65, otherwise we accept the alternative hypothesis.
$$Z = \frac{16 - 19}{2.1/\sqrt{50}} = -10.1$$

5. Since -10.1 < -1.65, we reject the null hypothesis and accept the alternative hypothesis at the 5%
level of significance, i.e. the mean marriage age in this community is significantly lower than 19 years.
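
The steps of this lower-tail test can be sketched in Python. The function name `z_statistic` is illustrative only; the critical value is taken from `statistics.NormalDist` instead of tables, so it appears as about -1.645 where the tables give -1.65.

```python
import math
from statistics import NormalDist

def z_statistic(sample_mean, hypothesised_mean, sd, n):
    """Standardized value of a sample mean: (mean - mu) / (sd / sqrt(n))."""
    standard_error = sd / math.sqrt(n)
    return (sample_mean - hypothesised_mean) / standard_error

z = z_statistic(sample_mean=16, hypothesised_mean=19, sd=2.1, n=50)
critical = NormalDist().inv_cdf(0.05)  # lower-tail critical value at the 5% level
reject_h0 = z < critical               # reject H0 if Z falls in the lower tail

print(round(z, 1), round(critical, 3), reject_h0)  # -10.1 -1.645 True
```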

[Figure: normal curve with the rejection region to the left of the critical value -1.65 and the acceptance region to its right]

Example 2


A foreign company which manufactures electric bulbs has assured its customers that the lifespan of the
bulbs is 28 months with a standard deviation of 4 months. Recently the company embarked on quality
improvement research for its product. After the research, using new technology, a sample of 70 bulbs
was tested and gave a mean lifespan of 30.2 months.
Does this justify the research undertaken? Use a 1% level of significance to conduct a statistical test in
order to answer the question.
Testing procedure
1. Null hypothesis H0: µ = 28
Alternative hypothesis HA: µ > 28
2. The level of significance is 1% (one tailed test)
3. The test statistic is the sample mean lifespan, X̄ = 30.2
4. The critical value of the one tailed test at the 1% level of significance is +2.33
5. The standardized value of the sample mean is
$$Z = \frac{\bar{X} - \mu}{S_{\bar{X}}} = \frac{30.2 - 28}{4/\sqrt{70}} = 4.6$$

6. Since 4.6 > 2.33, we reject the null hypothesis and accept the alternative hypothesis at the 1% level
of significance, i.e. the new sample mean lifespan is significantly higher than the population mean.
Therefore the research undertaken was worthwhile or justified.

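[Figure: normal curve with an area of 0.4900 between the mean and the critical value 2.33, and the 1% (0.01) rejection region in the upper tail]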
Example 3
A construction firm placed an order for a consignment of wires with a mean length of 10.5 metres and a
standard deviation of 1.7 m.
The company which produces the wires delivered 90 wires, which had a mean length of 9.2 m. The
construction company rejected the consignment on the grounds that the wires were different from the
order placed.

Required
Conduct a statistical test at the 5% level of significance to indicate whether or not you support the
action taken by the construction company.

Solution
Null hypothesis H0: µ = 10.5 m
Alternative hypothesis HA: µ ≠ 10.5 m
The level of significance is 5%


The test statistic is the sample mean X̄ = 9.2 m

The critical values of the two tailed test at the 5% level of significance are ±1.96.

[Figure: normal curve with the acceptance region between -1.96 and +1.96]

The standardized value of the test statistic is
$$Z = \frac{\bar{X} - \mu}{S_{\bar{X}}} = \frac{9.2 - 10.5}{1.7/\sqrt{90}} = -7.25$$

Since -7.25 < -1.96, we reject the null hypothesis and accept the alternative hypothesis at the 5% level
of significance, i.e. the sample mean is significantly different from the mean length ordered by the
construction company. We therefore support the action taken by the construction company.
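
For a two tailed test the decision rule compares the size of Z, ignoring its sign, with the upper critical value. A minimal Python sketch for the wire example is shown below; the variable names are illustrative only.

```python
import math
from statistics import NormalDist

sample_mean, ordered_mean, sd, n = 9.2, 10.5, 1.7, 90
alpha = 0.05  # 5% level of significance, split equally between the two tails

z = (sample_mean - ordered_mean) / (sd / math.sqrt(n))
critical = NormalDist().inv_cdf(1 - alpha / 2)  # upper critical value, about 1.96
reject_h0 = abs(z) > critical                   # two tailed decision rule

print(round(z, 2), round(critical, 2), reject_h0)  # -7.25 1.96 True
```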

TESTING THE DIFFERENCE BETWEEN TWO SAMPLE MEANS (LARGE SAMPLES)

A large sample is defined as one which contains 30 or more items (n ≥ 30), where n is the sample size.
Let X1 and X2 be any two samples whose sizes are n1 and n2, with means X̄1 and X̄2 and standard
deviations S1 and S2 respectively. In order to test the difference between the two sample means, we
apply the following formula.
$$Z = \frac{\bar{X}_1 - \bar{X}_2}{S_{(\bar{X}_1 - \bar{X}_2)}} \quad \text{where} \quad S_{(\bar{X}_1 - \bar{X}_2)} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$
Example 1
An agronomist was interested in the effect of a particular fertilizer on yield. He planted maize on 50
equal pieces of land, and the mean harvest obtained was 60 bags per plot with a standard deviation of
1.5 bags. These crops grew under natural conditions, without the soil being treated with any
fertilizer. The same agronomist carried out a second experiment in which he picked 60 plots in the
same area and planted the same maize, but applied fertilizer to these plots. After the harvest it was
established that the mean harvest was 63 bags per plot with a standard deviation of 1.3 bags.

Required
Conduct a statistical test in order to establish whether there was a significant difference between the
mean harvests under the two types of field conditions. Use 5% level of significance.

Solution


H0: µ1 = µ2

HA: µ1 ≠ µ2

The critical values of the two tailed test at the 5% level of significance are ±1.96
The standardized value of the difference between the sample means is given by Z where

$$Z = \frac{\bar{X}_1 - \bar{X}_2}{S_{(\bar{X}_1 - \bar{X}_2)}} \quad \text{where} \quad S_{(\bar{X}_1 - \bar{X}_2)} = \sqrt{\frac{1.5^2}{50} + \frac{1.3^2}{60}}$$

$$Z = \frac{60 - 63}{\sqrt{0.045 + 0.028}} = -11.1$$

[Figure: normal curve with the acceptance region between -1.96 and +1.96]

Since -11.1 < -1.96, we reject the null hypothesis and accept the alternative hypothesis at the 5% level
of significance, i.e. the difference between the sample mean harvests is statistically significant. Because
the fertilized plots gave the higher mean, this suggests that the fertilizer had a positive effect on the
harvest of maize.
Note: You don't have to illustrate your solution with a diagram.

Example 2
An observation was made about the reading abilities of males and females. The observation led to the
conclusion that females are faster readers than males. It was based on the times taken by both females
and males when reading out a list of names during graduation ceremonies.
In order to investigate the observation and the consequent conclusion, a sample of 200 men were
given lists to read. On average each man took 63 seconds, with a standard deviation of 4 seconds.
A sample of 250 women was also taken and asked to read the same list of names. It was found that
on average they took 62 seconds, with a standard deviation of 1 second.
Required
By conducting a statistical hypothesis test at the 1% level of significance, establish whether or not the
sample data support the earlier observation.

Solution
H0: µ1 = µ2
HA: µ1 ≠ µ2
The critical values of the two tailed test at the 1% level of significance are ±2.58.


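$$Z = \frac{\bar{X}_1 - \bar{X}_2}{S_{(\bar{X}_1 - \bar{X}_2)}}$$

$$Z = \frac{63 - 62}{\sqrt{\frac{4^2}{200} + \frac{1^2}{250}}} = 3.45$$

[Figure: normal curve with the acceptance region between -2.58 and +2.58 and a rejection region in each tail; the computed value +3.45 falls in the upper rejection region]

Since 3.45 > 2.58, we reject the null hypothesis and accept the alternative hypothesis at the 1% level of
significance, i.e. there is a significant difference between the reading speeds of males and females.
Since the women's mean reading time is the lower one, the data support the observation that females
are faster readers.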
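
Both worked examples on the difference between two sample means can be reproduced with one small Python function. The name `two_sample_z` is illustrative only. In the fertilizer example the sign of Z is negative because the unfertilized mean is entered first; its size agrees with the hand calculation up to rounding.

```python
import math

def two_sample_z(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Z for the difference between two large-sample means."""
    standard_error = math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return (mean_1 - mean_2) / standard_error

# Example 1: unfertilized plots versus fertilized plots (bags per plot).
z_fertilizer = two_sample_z(60, 1.5, 50, 63, 1.3, 60)

# Example 2: men's versus women's reading times (seconds).
z_reading = two_sample_z(63, 4, 200, 62, 1, 250)

print(round(z_fertilizer, 2), abs(z_fertilizer) > 1.96)  # 5% level, two tailed
print(round(z_reading, 2), abs(z_reading) > 2.58)        # 1% level, two tailed
```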

TEST OF HYPOTHESIS ON PROPORTIONS


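This follows a similar method to the one for means, except that the standard error used in this case is
$$S_p = \sqrt{\frac{pq}{n}}$$
The Z score is calculated as
$$Z = \frac{P - \pi}{S_p}$$
Where P = proportion found in the sample
π = the hypothesised population proportion
p = π and q = 1 - π in the standard error, as in the example below
n = sample size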

Example
A member of parliament (MP) claims that in his constituency only 50% of the total youth population
lacks university education. A local media company that wanted to verify the claim conducted a survey
of a sample of 400 youths, of whom 54% lacked university education.

Required:
At the 5% level of significance, test whether the MP's claim is wrong.

Solution
Note: This is a two tailed test, since we wish to test whether the proportion is different (≠) from the
claimed value and not against a specific one-sided alternative, e.g. less than (<) or more than (>).

H0: π = 50% of all youth in the constituency.

HA: π ≠ 50% of all youth in the constituency.


$$S_p = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.5 \times 0.5}{400}} = 0.025$$
$$Z = \frac{0.54 - 0.50}{0.025} = 1.6$$

At the 5% level of significance for a two-tailed test the critical value is 1.96. Since the calculated Z value
is less than the tabulated value, i.e. 1.6 < 1.96, we accept the null hypothesis.
Thus the sample does not provide sufficient evidence that the MP's claim is wrong.
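
The calculation in this example can be checked with a short script. This is a minimal sketch, not part of the original notes: the function name `one_sample_proportion_z` is hypothetical, and the standard error is computed from the hypothesised proportion (0.5), as in the worked solution.

```python
import math

def one_sample_proportion_z(p_sample, pi_0, n):
    """Return (standard error, Z) for testing H0: pi = pi_0."""
    # Standard error computed from the hypothesised proportion, as in the worked solution.
    se = math.sqrt(pi_0 * (1 - pi_0) / n)
    z = (p_sample - pi_0) / se
    return se, z

se, z = one_sample_proportion_z(p_sample=0.54, pi_0=0.50, n=400)
critical = 1.96  # two-tailed critical value at the 5% level, from the normal tables
print(round(se, 3), round(z, 2))
print("reject H0" if abs(z) > critical else "do not reject H0")
```

Comparing the absolute value of Z with the critical value makes the same function usable when the sample proportion falls below the hypothesised one.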

HYPOTHESIS TESTING ABOUT THE DIFFERENCE BETWEEN TWO PROPORTIONS


It is used to test the difference between the proportions of a given attribute found in two random
samples.
The null hypothesis is that there is no difference between the population proportions. It means two
samples are from the same population.
Hence
H0: π1 = π2
The best estimate of the standard error of the difference between P1 and P2 is given by pooling the samples
and finding the pooled sample proportion (p), thus
$$p = \frac{p_1 n_1 + p_2 n_2}{n_1 + n_2}$$

Standard error of the difference between proportions:

$$S_{(P_1 - P_2)} = \sqrt{\frac{pq}{n_1} + \frac{pq}{n_2}} \quad \text{where } q = 1 - p$$

$$\text{And } Z = \frac{P_1 - P_2}{S_{(P_1 - P_2)}}$$

Example
In a random sample of 100 persons taken from village A, 60 are found to be consuming tea. In another
sample of 200 persons taken from a village B, 100 persons are found to be consuming tea. Do the data
reveal significant difference between the two villages so far as the habit of taking tea is concerned?

Solution
Let us take the hypothesis that there is no significant difference between the two villages as far as the
habit of taking tea is concerned i.e. π1 = π2
We are given
P1 = 0.6; n1 = 100
P2 = 0.5; n2 = 200


The appropriate statistic to be used here is given by

$$p = \frac{p_1 n_1 + p_2 n_2}{n_1 + n_2} = \frac{(0.6)(100) + (0.5)(200)}{100 + 200} = 0.53$$

$$q = 1 - 0.53 = 0.47$$

$$S_{(P_1 - P_2)} = \sqrt{\frac{pq}{n_1} + \frac{pq}{n_2}} = \sqrt{\frac{(0.53)(0.47)}{100} + \frac{(0.53)(0.47)}{200}} = 0.0611$$

$$Z = \frac{0.6 - 0.5}{0.0611} = 1.64$$

Since the computed value of Z is less than the critical value of Z = 1.96 at the 5% level of significance,
we accept the hypothesis and conclude that there is no significant difference in the habit of
taking tea in the two villages A and B.
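
The pooled-proportion test used in this example can be sketched in a few lines. The function name `two_proportion_z` is an assumption made for illustration; it applies the pooled proportion and standard error formulas given in this section to the tea-drinking data, without rounding p before the standard error is computed.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Return (pooled p, standard error, Z) for testing H0: pi1 = pi2."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    q = 1 - pooled
    # Standard error of the difference, using the pooled proportion in both terms.
    se = math.sqrt(pooled * q / n1 + pooled * q / n2)
    z = (p1 - p2) / se
    return pooled, se, z

pooled, se, z = two_proportion_z(p1=0.6, n1=100, p2=0.5, n2=200)
print(round(pooled, 2), round(se, 4), round(z, 2))
print("significant" if abs(z) > 1.96 else "not significant at the 5% level")
```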
Example
Ken industrial manufacturers have produced a perfume known as “fianchetto.” In order to test its
popularity in the market, the manufacturer carried out a random survey in Back rank city where 10,000
consumers were interviewed, of whom 7,200 showed a preference for the product. The manufacturer also moved to
Rook town where he interviewed 12,000 consumers, of whom 10,000 showed a preference for the
product.

Required
Design a statistical test and hence use it to advise the manufacturer regarding the difference in the
proportions, at the 5% level of significance.

Solution
H0: π1 = π2
HA: π1 ≠ π2

The critical value for this two tailed test at 5% level of significance = 1.96.

$$Z = \frac{P_1 - P_2}{S_{(P_1 - P_2)}}$$

Where:

                                    Sample 1                      Sample 2
Sample size                         n1 = 10,000                   n2 = 12,000
Sample proportion of success        P1 = 7,200/10,000 = 0.72      P2 = 10,000/12,000 = 0.83
Population proportion of success    π1                            π2

$$\text{Now } S_{(P_1 - P_2)} = \sqrt{\frac{pq}{n_1} + \frac{pq}{n_2}}$$
$$\text{Where } p = \frac{p_1 n_1 + p_2 n_2}{n_1 + n_2} \quad \text{and} \quad q = 1 - p$$
∴ in our case
$$p = \frac{10{,}000(0.72) + 12{,}000(0.83)}{10{,}000 + 12{,}000} = \frac{17{,}160}{22{,}000} = 0.78$$
∴ q = 0.22
$$S_{(P_1 - P_2)} = \sqrt{\frac{(0.78)(0.22)}{10{,}000} + \frac{(0.78)(0.22)}{12{,}000}} = 0.00561$$
$$Z = \frac{0.72 - 0.83}{0.00561} = -19.6$$

Since |Z| = 19.6 > 1.96, we reject the null hypothesis and accept the alternative. The difference between
the proportions is statistically significant. This implies that the perfume is more popular in
Rook town than in Back rank city.

t distribution (Student's t distribution) tests of hypothesis (tests for small samples, n < 30)
For small samples (n < 30), the method used in hypothesis testing is similar to the one for large
samples, except that t values from the t distribution at a given number of degrees of freedom v are used
instead of the Z score; the standard error statistic used is also different.
Note that v = n – 1 for a single sample and v = n1 + n2 – 2 where two samples are involved.

a) Test of hypothesis about the population mean



When the population standard deviation is not known and is estimated by the sample standard deviation (S),
the t statistic is defined as
$$t = \frac{\bar{X} - \mu}{S_{\bar{X}}} \quad \text{where} \quad S_{\bar{X}} = \frac{S}{\sqrt{n}}$$
It follows the Student's t distribution with (n – 1) d.f., where
X̄ = Sample mean
μ = Hypothesized population mean
n = sample size
and S is the standard deviation of the sample calculated by the formula
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \quad \text{for } n < 30$$
If the absolute value of the calculated t exceeds the table value of t at the specified level of
significance, the null hypothesis is rejected.

Example
Ten oil tins are taken at random from an automatic filling machine. The mean weight of the tins is 15.8
kg and the standard deviation is 0.5 kg. Does the sample mean differ significantly from the intended
weight of 16 kg? Use the 5% level of significance.
Solution
Given that n = 10; X̄ = 15.8; S = 0.50; μ = 16; v = 9
H0: μ = 16
HA: μ ≠ 16
$$S_{\bar{X}} = \frac{0.5}{\sqrt{10}} = 0.1581$$
$$t = \frac{15.8 - 16}{0.5 / \sqrt{10}} = \frac{-0.2}{0.1581} \approx -1.26$$
The table value of t for 9 d.f. at the 5% level of significance is 2.26. The absolute value of the computed t
(1.26) is smaller than the table value; therefore the difference is not significant and the null
hypothesis is accepted.
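
The same test can be reproduced from the summary figures of the example. This is an illustrative sketch only: the function name `one_sample_t` is hypothetical, and the critical value 2.26 is simply copied from the t table quoted in the solution rather than computed.

```python
import math

def one_sample_t(sample_mean, mu_0, s, n):
    """Return (standard error, t, degrees of freedom) for H0: mu = mu_0."""
    se = s / math.sqrt(n)          # standard error of the mean
    t = (sample_mean - mu_0) / se  # negative when the sample mean is below mu_0
    return se, t, n - 1

se, t, df = one_sample_t(sample_mean=15.8, mu_0=16, s=0.5, n=10)
table_value = 2.26  # two-tailed 5% value for 9 d.f., as quoted in the text
print(round(se, 4), round(t, 2), df)
print("reject H0" if abs(t) > table_value else "do not reject H0")
```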

b) Test of hypothesis about the difference between two means


The t test can be used under two assumptions when testing hypothesis concerning the difference
between the two means; that the two are normally distributed (or near normally distributed)
populations and that the standard deviation of the two is the same or at any rate not significantly
different.

Appropriate test statistic to be used is

$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{(\bar{X}_1 - \bar{X}_2)}} \quad \text{at } n_1 + n_2 - 2 \text{ d.f.}$$

The standard deviation is obtained by pooling the two sample standard deviations as shown below.

$$S_p = \sqrt{\frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}}$$
Where S1 and S2 are the standard deviations for samples 1 and 2 respectively.
$$\text{Now } S_{\bar{X}_1} = \frac{S_p}{\sqrt{n_1}} \quad \text{and} \quad S_{\bar{X}_2} = \frac{S_p}{\sqrt{n_2}}$$
$$S_{(\bar{X}_1 - \bar{X}_2)} = \sqrt{S_{\bar{X}_1}^2 + S_{\bar{X}_2}^2}$$

$$\text{Alternatively } S_{(\bar{X}_1 - \bar{X}_2)} = S_p \sqrt{\frac{n_1 + n_2}{n_1 n_2}}$$

Example
Two different types of drugs, A and B, were tried on certain patients for increasing weight; 5 persons
were given drug A and 7 persons were given drug B. The increase in weight (in pounds) is given below.
Drug A    8   12   13    9    3
Drug B   10    8   12   15    6    8   11
Do the two drugs differ significantly with regard to their effect in increasing weight? (Given that v = 10;
t0.05 = 2.23)

Solution
H0: μ1 = μ2
HA: μ1 ≠ μ2

$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{(\bar{X}_1 - \bar{X}_2)}}$$

Calculate X̄1, X̄2 and S


X1     X1 – X̄1    (X1 – X̄1)²    X2     X2 – X̄2    (X2 – X̄2)²
8      -1          1              10     0           0
12     +3          9              8      -2          4
13     +4          16             12     +2          4
9      0           0              15     +5          25
3      -6          36             6      -4          16
                                  8      -2          4
                                  11     +1          1
ΣX1 = 45   Σ(X1 – X̄1) = 0   Σ(X1 – X̄1)² = 62   ΣX2 = 70   Σ(X2 – X̄2) = 0   Σ(X2 – X̄2)² = 54

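$$\bar{X}_1 = \frac{\sum X_1}{n_1} = \frac{45}{5} = 9 \qquad \bar{X}_2 = \frac{\sum X_2}{n_2} = \frac{70}{7} = 10$$

$$S_1 = \sqrt{\frac{62}{4}} = 3.94 \qquad S_2 = \sqrt{\frac{54}{6}} = 3$$

$$S_p = \sqrt{\frac{4(15.5) + 6(9)}{10}} = 3.406$$

$$S_{(\bar{X}_1 - \bar{X}_2)} = \sqrt{\frac{11.6}{5} + \frac{11.6}{7}} \quad \text{or} \quad 3.406 \sqrt{\frac{7 + 5}{5 \times 7}} = 1.99$$

$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{(\bar{X}_1 - \bar{X}_2)}} = \frac{9 - 10}{1.99} = -0.50$$

Now t0.05 (at v = 10) = 2.23 > |t| = 0.50

Thus we accept the null hypothesis.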


Hence there is no significant difference in the efficacy of the two drugs in the matter of increasing
weight
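
The pooled two-sample t test above can be sketched directly from the raw observations. This is a hedged illustration rather than part of the original notes: `pooled_t_test` is a hypothetical helper, and the Drug A values are those tabulated in the solution (8, 12, 13, 9, 3), which sum to 45.

```python
import math

def pooled_t_test(sample_1, sample_2):
    """Return (pooled S, standard error, t, d.f.) assuming equal population variances."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1 = sum(sample_1) / n1
    mean_2 = sum(sample_2) / n2
    # Sums of squared deviations; each equals (n - 1) * S^2 for its sample.
    ss_1 = sum((x - mean_1) ** 2 for x in sample_1)
    ss_2 = sum((x - mean_2) ** 2 for x in sample_2)
    s_p = math.sqrt((ss_1 + ss_2) / (n1 + n2 - 2))
    se = s_p * math.sqrt((n1 + n2) / (n1 * n2))
    t = (mean_1 - mean_2) / se
    return s_p, se, t, n1 + n2 - 2

drug_a = [8, 12, 13, 9, 3]
drug_b = [10, 8, 12, 15, 6, 8, 11]
s_p, se, t, df = pooled_t_test(drug_a, drug_b)
print(round(s_p, 3), round(se, 2), round(t, 2), df)
print("reject H0" if abs(t) > 2.23 else "do not reject H0")  # 2.23 is the table value given
```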

Example
Two salesmen A and B are working in a certain district. From a survey conducted by the head office, the
following results were obtained. State whether there is any significant difference in the average sales
between the two salesmen at 5% level of significance.

A B
No. of sales 20 18
Average sales in $ 170 205
Standard deviation in $ 20 25


Solution
H0: μ1 = μ2
HA: μ1 ≠ μ2
Where

$$S_p = \sqrt{\frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}}$$

$$S_{(\bar{X}_1 - \bar{X}_2)} = S_p \sqrt{\frac{n_1 + n_2}{n_1 n_2}}$$

Where: X̄1 = 170, X̄2 = 205, n1 = 20, n2 = 18, S1 = 20, S2 = 25, v = 36

$$S_p = \sqrt{\frac{19(20^2) + 17(25^2)}{20 + 18 - 2}} = 22.5$$

$$S_{(\bar{X}_1 - \bar{X}_2)} = 22.5 \sqrt{\frac{38}{360}} = 7.31$$

$$t = \frac{170 - 205}{7.31} = -4.79$$

t0.05(36) = 1.96 (since d.f. > 30 we use the normal tables)

When d.f. > 30 the t distribution is approximately the same as the normal distribution, so the table value
at the 5% level of significance for 36 d.f. is taken as 1.96. Since the absolute value of the computed t (4.79)
is more than the table value, we reject the null hypothesis. Thus, we conclude that there is a significant
difference in the average sales between the two salesmen.

Chi square hypothesis tests (Non-parametric test) (χ²)


They include amongst others
i. Test for goodness of fit
ii. Test for independence of attributes
iii. Test of homogeneity
iv. Test for population variance

The Chi square test (χ²) is used when comparing an actual (observed) distribution with a hypothesized or
expected distribution.


It is given by:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where O = Observed frequency

E = Expected frequency
The computed value of χ² is compared with the tabulated χ² for a given significance level and degrees
of freedom.

i. Test for goodness of fit


This test is used when we want to determine whether an actual sample distribution matches a known
theoretical distribution
The null hypothesis usually states that the sample is drawn from the theoretical population distribution
and the alternate hypothesis usually states that it is not.

ii. Test of independence of attributes

This test discloses whether there is any association or relationship between two or more attributes or
not. The following steps are required to perform the test of hypothesis.
1. The null and alternative hypotheses are set as follows
   H0: No association exists between the attributes
   H1: An association exists between the attributes
2. Under H0 an expected frequency E corresponding to each cell in the contingency table is
   found by using the formula
   $$E = \frac{R \times C}{n}$$
   Where R = a row total, C = a column total and n = sample size
3. Based upon the observed values and corresponding expected frequencies the χ² statistic
   is obtained using the formula
   $$\chi^2 = \sum \frac{(O - E)^2}{E}$$
4. The characteristics of this distribution are defined by the number of degrees of freedom
   (d.f.), which is given by
   d.f. = (r – 1)(c – 1),
   where r is the number of rows and c is the number of columns. Corresponding to a chosen
   level of significance, the critical value is found from the chi squared table.
5. The calculated value of χ² is compared with the tabulated value of χ² for (r – 1)(c – 1) degrees
   of freedom at a certain level of significance. If the computed value of χ² is greater than
   the tabulated value, the null hypothesis of independence is rejected. Otherwise we
   accept it.
Example
A sample of 200 people with a particular disease was selected; of these, 100 were given a drug and the
others were not given any drug. The results are as follows
Drug No drug Total
Cured 65 55 120
Not cured 35 45 80
Total 100 100 200


Test whether the drug will be effective or not, at 5% level of significance.

Solution
Let us take the null hypothesis that the drug is not effective in curing the disease.
Applying the χ2 test
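The expected cell frequencies are computed as follows
$$E_{11} = \frac{R_1 C_1}{n} = \frac{120 \times 100}{200} = 60$$

$$E_{12} = \frac{R_1 C_2}{n} = \frac{120 \times 100}{200} = 60$$

$$E_{21} = \frac{R_2 C_1}{n} = \frac{80 \times 100}{200} = 40$$

$$E_{22} = \frac{R_2 C_2}{n} = \frac{80 \times 100}{200} = 40$$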

The table of expected frequencies is as follows


60 60 120
40 40 80
100 100 200

Arranging the observed frequencies with their corresponding expected frequencies in the following table we get
O      E      (O – E)²    (O – E)²/E
65     60     25          0.417
35     40     25          0.625
55     60     25          0.417
45     40     25          0.625
                          Σ(O – E)²/E = 2.084

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 2.084$$
v = (r – 1)(c – 1) = (2 – 1)(2 – 1) = 1; tabulated χ²(0.05) = 3.841

The calculated value of χ² is less than the table value, so the null hypothesis is accepted. Hence the data do
not show that the drug is effective in curing the disease.
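
The steps of the test of independence can be sketched as a small function that works for a contingency table of any size. The name `chi_square_statistic` is hypothetical; the table value 3.841 is taken from the text rather than computed.

```python
def chi_square_statistic(observed):
    """Return (chi-square, d.f.) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # E = R x C / n
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Rows: cured, not cured. Columns: drug, no drug.
chi2, df = chi_square_statistic([[65, 55], [35, 45]])
print(round(chi2, 3), df)
print("reject H0" if chi2 > 3.841 else "do not reject H0")
```

The unrounded statistic is about 2.083; the 2.084 in the worked solution comes from rounding each cell to three decimals before summing.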

iii. Test of homogeneity


It is concerned with the proposition that several populations are homogenous with respect to some
characteristic of interest e.g. one may be interested in knowing if raw material available from several
retailers are homogenous. A random sample is drawn from each of the population and the number in
each of sample falling into each category is determined. The sample data is displayed in a contingency
table
The analytical procedure is the same as that discussed for the test of independence

Example
A random sample of 400 persons was selected from three age groups and each person was
asked to specify which type of TV program they preferred. The results are shown in the following table

Type of program
Age group A B C Total
Under 30 120 30 50 200
30 – 44 10 75 15 100
45 and above 10 30 60 100
Total 140 135 125 400
Test the hypothesis that the populations are homogenous with respect to the types of television
program they prefer, at 5% level of significance.

Solution
Let us take hypothesis that the population are homogenous with respect to different types of television
program they prefer
Applying χ2 test

O       E        (O – E)²    (O – E)²/E
120     70.00    2500.00     35.7143
10      35.00    625.00      17.8571
10      35.00    625.00      17.8571
30      67.50    1406.25     20.8333
75      33.75    1701.56     50.4166
30      33.75    14.06       0.4166
50      62.50    156.25      2.5000
15      31.25    264.06      8.4499
60      31.25    826.56      26.4490
                             Σ(O – E)²/E = 180.4948

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 180.4948$$

The table value of χ² for 4 d.f. at the 5% level of significance is 9.488


The calculated value of χ² is greater than the table value. We reject the hypothesis and conclude that
the populations are not homogeneous with respect to the type of TV programs preferred; thus the
different age groups vary in their choice of TV programs.
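
The expected frequencies in the solution come from applying E = RC/n to each cell of the contingency table. As a worked illustration for the first column (program type A, column total 140), using the row totals 200, 100 and 100 and n = 400:

$$E_{\text{Under 30},\,A} = \frac{200 \times 140}{400} = 70 \qquad E_{30\text{–}44,\,A} = \frac{100 \times 140}{400} = 35 \qquad E_{45+,\,A} = \frac{100 \times 140}{400} = 35$$

The remaining cells follow in the same way with the column totals 135 and 125, and the degrees of freedom are (3 – 1)(3 – 1) = 4.
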
Summary of Formulae in Hypothesis Testing

(a) Hypothesis testing of mean

For n > 30
$$Z = \frac{\bar{X} - \mu}{S_{\bar{X}}} \quad \text{where} \quad S_{\bar{X}} = \frac{S}{\sqrt{n}}$$
at α level of significance.
For n < 30
$$t = \frac{\bar{X} - \mu}{S_{\bar{X}}} \quad \text{where} \quad S_{\bar{X}} = \frac{S}{\sqrt{n}}$$
at n – 1 d.f. and α level of significance.

(b) Difference between means

For n > 30
$$Z = \frac{\bar{X}_1 - \bar{X}_2}{S_{(\bar{X}_1 - \bar{X}_2)}} \quad \text{where} \quad S_{(\bar{X}_1 - \bar{X}_2)} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$
at α level of significance.
For n < 30
$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{(\bar{X}_1 - \bar{X}_2)}} \quad \text{at } n_1 + n_2 - 2 \text{ d.f.}$$
$$\text{where } S_{(\bar{X}_1 - \bar{X}_2)} = S_p \sqrt{\frac{n_1 + n_2}{n_1 n_2}} \quad \text{and} \quad S_p = \sqrt{\frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}}$$

(c) Hypothesis testing of proportions

$$Z = \frac{P - \pi}{S_p} \quad \text{where} \quad S_p = \sqrt{\frac{pq}{n}}$$
P = Proportion found in sample
q = 1 – p
π = hypothetical proportion

(d) Difference between proportions

$$Z = \frac{P_1 - P_2}{S_{(P_1 - P_2)}}$$
Where:
$$S_{(P_1 - P_2)} = \sqrt{\frac{pq}{n_1} + \frac{pq}{n_2}}$$
$$p = \frac{p_1 n_1 + p_2 n_2}{n_1 + n_2}$$
q = 1 – p

(e) Chi-square test

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where O = observed frequency
$$E = \frac{\text{Column total} \times \text{Row total}}{\text{Sample size}} = \text{expected frequency}$$

TOPIC FIVE: TIME SERIES ANALYSIS AND INDEX NUMBERS


5.1 TIME SERIES ANALYSIS
This is the mathematical or statistical analysis of past data arranged in a periodic sequence.

Decision making and planning in an organization involve forecasting, which is one of the applications of
time series analysis.

Impediments in time series analysis

The accuracy of historical data in reflecting the future is limited by:

a) Drastic changes, e.g. the advent of a major competitor, a period of war or a sudden change of
taste.

b) For long-term forecasting, internal and external pressures that make historical data less effective.

1. Moving Average


Periodical data e.g. monthly sales may have random fluctuation every month despite a general trend
being evident. Moving average helps in smoothing away these random changes.

A moving average forecast for a period is the average of the observations for a fixed number of immediately preceding periods.

Example:

The table below shows a company's monthly sales. Calculate the 3-monthly and 6-monthly moving averages for the data.

Months       Sales
January      1200
February     1280
March        1310
April        1270
May          1190
June         1290
July         1410
August       1360
September    1430
October      1280
November     1410
December     1390

Solution.

These are calculated as follows

$$\text{April's forecast} = \frac{\text{Jan} + \text{Feb} + \text{Mar}}{3} = \frac{1200 + 1280 + 1310}{3} \approx 1263$$

$$\text{May's forecast} = \frac{\text{Feb} + \text{Mar} + \text{Apr}}{3} = \frac{1280 + 1310 + 1270}{3} \approx 1287$$

And so on…

Similarly for the 6-monthly moving average

$$\text{July's forecast} = \frac{\text{Jan} + \text{Feb} + \text{Mar} + \text{Apr} + \text{May} + \text{Jun}}{6} = \frac{1200 + 1280 + 1310 + 1270 + 1190 + 1290}{6} \approx 1257$$

And so on…

Month        3 months moving average    6 months moving average
April        1263
May          1287
June         1257
July         1250                       1257
August       1297                       1292
September    1353                       1305
October      1400                       1325
November     1357                       1327
December     1373                       1363
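
The moving average forecasts in the table can be generated with a few lines of code. This is a minimal sketch: the helper name `moving_average_forecasts` is hypothetical, and each forecast is rounded to the nearest whole number only for display.

```python
def moving_average_forecasts(values, window):
    """Forecast for each period is the mean of the `window` periods just before it."""
    return [sum(values[i - window:i]) / window for i in range(window, len(values))]

sales = [1200, 1280, 1310, 1270, 1190, 1290, 1410, 1360, 1430, 1280, 1410, 1390]

three_month = moving_average_forecasts(sales, 3)  # forecasts for April to December
six_month = moving_average_forecasts(sales, 6)    # forecasts for July to December

print([round(f) for f in three_month])
print([round(f) for f in six_month])
```

A longer window leaves fewer forecastable periods (six instead of nine here), which is the price paid for the extra smoothing.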

Characteristics of moving average

1) The greater the number of periods in the moving average, the greater the smoothing effect.

2) Different moving averages produce different forecasts.

3) The more random the data (with the underlying trend being constant), the more periods
should be included in the moving average.


2. Exponential smoothing

This is a weighted moving average technique. It is given by:

New forecast = Old forecast + α (Latest observation – Old forecast)

Where α = Smoothing constant

This method involves automatic weighting of past data with weights that decrease exponentially with
time.

Example

Using the previous example and a smoothing constant of 0.3, generate monthly forecasts.

Months       Sales    Forecasts: α = 0.3
January      1200
February     1280     1200
March        1310     1224
April        1270     1250
May          1190     1256
June         1290     1236
July         1410     1252
August       1360     1300
September    1430     1318
October      1280     1351
November     1410     1330
December     1390     1354

Solution

Since there were no forecasts before January, we take January's sales to be the forecast for February.

∴ February forecast = 1200

For March:

March forecast = Feb forecast + 0.3 (Feb sales – Feb forecast)

= 1200 + 0.3 (1280 – 1200)

= 1224
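
The recursion used for March can be applied to every month in turn. The following is a sketch under stated assumptions: the helper name `exponential_smoothing` is hypothetical, the first forecast is seeded with January's sales as in the solution, and intermediate forecasts are carried forward unrounded.

```python
def exponential_smoothing(values, alpha):
    """Return forecasts for periods 2..n, seeding the first with the first observation."""
    forecasts = [values[0]]  # forecast for period 2 = observation of period 1
    for observed in values[1:-1]:
        previous = forecasts[-1]
        # New forecast = old forecast + alpha * (latest observation - old forecast)
        forecasts.append(previous + alpha * (observed - previous))
    return forecasts

sales = [1200, 1280, 1310, 1270, 1190, 1290, 1410, 1360, 1430, 1280, 1410, 1390]
forecasts = exponential_smoothing(sales, alpha=0.3)  # February to December

# The first four forecasts (Feb, Mar, Apr, May) round to 1200, 1224, 1250, 1256.
print([round(f) for f in forecasts])
```

Only the previous forecast and the latest observation are needed at each step, which is why the method stores so little data.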

Note:

- The value of α lies between 0 and 1.

- The higher the α value, the more sensitive the forecast is to the current status.

Characteristics of exponential smoothing

- More weight is given to the most recent data.

- All past data are incorporated, unlike in moving averages.

- Less data needs to be stored than in periodic moving averages.

5.2 INDEX NUMBERS


An index number is an attempt to summarize a whole mass of data into one figure. The single figure
shows how one year differs from another year.

It is a statistical device used to measure the change in the level of prices, wages, output and other
variables at given times, relative to their level at an earlier time which is taken as the base for
comparison purposes.

$$\text{A simple price index} = \frac{\sum P_n}{\sum P_o} \times 100 \quad \text{(an unweighted price index)}$$

$$\text{A simple quantity index} = \frac{\sum Q_n}{\sum Q_o} \times 100 \quad \text{(an unweighted quantity index)}$$

Where Pn is the price of a commodity in the current year (the year for which the price index is to be
calculated)

Where Po is the price of the same commodity in the base year (the year for comparison purposes)

Similarly Qn and Qo are defined in the same way

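AGGREGATE PRICE INDEX NUMBERS AND QUANTITY INDEX NUMBERS

                      PRICE INDEX                                        QUANTITY INDEX

LASPEYRE'S INDEX      $\dfrac{\sum p_n q_o}{\sum p_o q_o} \times 100$    $\dfrac{\sum q_n p_o}{\sum q_o p_o} \times 100$

PAASCHE'S INDEX       $\dfrac{\sum p_n q_n}{\sum p_o q_n} \times 100$    $\dfrac{\sum q_n p_n}{\sum q_o p_n} \times 100$

$$\text{Value index} = \frac{\sum p_n q_n}{\sum p_o q_o} \times 100$$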

MODIFIED FORM OF THE LASPEYRE'S PRICE INDEX NUMBER

$$\text{Laspeyre's price index} = \frac{\sum w_o \left( \dfrac{p_n}{p_o} \times 100 \right)}{\sum w_o}$$

Where w0 are the proportions of the total expenditure in the base period. This formula is frequently used
to calculate the retail price index.
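
The aggregate index formulas can be illustrated with a small calculation. The notes give no data at this point, so the prices and quantities below are hypothetical figures for two commodities, and the function names are likewise assumptions made for this sketch.

```python
def weighted_sum(prices, quantities):
    """Sum of price x quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))

def laspeyres_price_index(p0, pn, q0):
    # Base-year quantities are used as weights.
    return weighted_sum(pn, q0) / weighted_sum(p0, q0) * 100

def paasche_price_index(p0, pn, qn):
    # Current-year quantities are used as weights.
    return weighted_sum(pn, qn) / weighted_sum(p0, qn) * 100

def value_index(p0, q0, pn, qn):
    return weighted_sum(pn, qn) / weighted_sum(p0, q0) * 100

# Hypothetical data: two commodities, base year (0) and current year (n).
p0, q0 = [10, 20], [5, 2]
pn, qn = [12, 25], [6, 3]

print(round(laspeyres_price_index(p0, pn, q0), 2))
print(round(paasche_price_index(p0, pn, qn), 2))
print(round(value_index(p0, q0, pn, qn), 2))
```

The two price indices differ only in which year's quantities serve as weights; the quantity indices follow by swapping the roles of prices and quantities.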

CHANGING THE BASE OF THE INDEX

For comparison purposes if two series have different base years, it is difficult to compare them directly.
In such cases, it is necessary to change the base year of one of the series (or both) so that both have the
same base.

It is also necessary to keep the index relevant to current conditions hence the need to change the base
from time to time.

Example;

Year 1985 1986 1987 1988 1989 1990 1991 1992

Price index 100 104 108 109 112 120 125 140

Suppose we wish to change the base year to 1989

We recalculate each index by expressing it as a percentage of 1989

Year                    Previous index    Recalculated index
1985                    100               100/112 × 100 = 89.3
1986                    104               104/112 × 100 = 92.9
1987                    108               108/112 × 100 = 96.4
1988                    109               109/112 × 100 = 97.3
1989 (new base year)    112               112/112 × 100 = 100
1990                    120               120/112 × 100 = 107.1
1991                    125               125/112 × 100 = 111.6
1992                    140               140/112 × 100 = 125.0

When changing the base year, it is advisable to update the weights used in the base year.
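
The rebasing calculation can be sketched as follows. The function name `rebase_index` is hypothetical; it divides every index value by the value in the new base year and multiplies by 100, using the 1985 to 1992 series from the example.

```python
def rebase_index(index_by_year, new_base_year):
    """Express every index value as a percentage of the new base year's value."""
    base_value = index_by_year[new_base_year]
    return {year: value / base_value * 100 for year, value in index_by_year.items()}

price_index = {1985: 100, 1986: 104, 1987: 108, 1988: 109,
               1989: 112, 1990: 120, 1991: 125, 1992: 140}

rebased = rebase_index(price_index, new_base_year=1989)
for year, value in rebased.items():
    print(year, round(value, 1))
```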

CHAIN BASED INDEX NUMBERS

A chain based index is one where the index is calculated every year using the previous year as the base
year. This type of index measures rate of change from year to year.

This method is suitable where weights are changing rapidly and items are constantly being brought into
the index and unwanted items taken out. It can be a price or quantity index

Year    Previous index    Recalculated chain-based index    Fixed-based index
1985    100               100                               100 (1985 base year)
1986    104               104/100 × 100 = 104               104/100 × 100 = 104
1987    108               108/104 × 100 = 103.8             108/100 × 100 = 108
1988    109               109/108 × 100 = 100.9             109/100 × 100 = 109
1989    112               112/109 × 100 = 102.8             112/100 × 100 = 112
1990    120               120/112 × 100 = 107.1             120/100 × 100 = 120
1991    125               125/120 × 100 = 104.2             125/100 × 100 = 125
1992    140               140/125 × 100 = 112               140/100 × 100 = 140
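
A chain-based index divides each year's value by the value of the year immediately before it. The sketch below uses a hypothetical helper, `chain_based_index`, on the same 1985 to 1992 series; the first year has no predecessor and is therefore left out of the result.

```python
def chain_based_index(values):
    """Return each value as a percentage of the value one period earlier."""
    return [current / previous * 100 for previous, current in zip(values, values[1:])]

fixed_base = [100, 104, 108, 109, 112, 120, 125, 140]  # 1985 to 1992, base 1985

chain = chain_based_index(fixed_base)  # 1986 to 1992
print([round(c, 1) for c in chain])
```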
