
Statistical Methods of Financial Accounting

Classroom lecture notes presentation of financial management

Uploaded by

eskias tetemke

Statistics For Finance

ETHIO LENS COLLEGE


CENTER OF DISTANCE EDUCATION

DEPARTMENT OF ACCOUNTING AND FINANCE

Course Title:-Statistics for Finance


Course Code: - AcFn 3022
Credit Hour:- 4

Sepet-2013

Ethio lens college Page 1



CHAPTER ONE
AN OVERVIEW OF STATISTICS

1.1 INTRODUCTION

Statistics is by no means a new phenomenon to societies, whatever their economic,
social, and technological circumstances. States have long collected data on the size of
their populations, the amount of tax collected, and many other matters, in the belief
that such data allow authorities to establish policy instruments with economic,
environmental, political, and demographic implications.
Chapter Objectives
After completing this chapter, students should be able to:
- Comprehend the role of statistics in business decision-making
- Differentiate between descriptive and inferential statistics
- Explain the functions of statistics
- Know how to collect data
- Understand sampling and the techniques used in sampling

1.2 Definition of Statistics

? What do you mean by Statistics?


_________________________________________________________________________________
Though the term statistics has many definitions, they can be viewed from two simple
perspectives:
- Statistics (plural) refers to numerical statements of facts: "… aggregate of facts
affected by multiplicity of causes, numerically expressed, enumerated or estimated
according to reasonable standards of accuracy, collected in a systematic manner for a
predetermined purpose and placed in relation to each other" (Prof. Horace Secrist).
From this definition, statistics has the following features:
Statistics are aggregates of facts: a single, isolated figure is not statistics.
Example: If we say that Mr. Alex's age is 35, what sense does that make on its own?
Statements of facts must therefore allow comparison with, and relation to, other facts.
Statistics are affected by a multiplicity of causes: statistical facts and figures are never
independent; rather, they are affected by a number of operating forces, some of which
may not be measurable.
Example: The rate of transmission of the HIV pandemic in the year 199x decreased by
2.5%. You may ask why; the possibilities include attitude change, political
commitment, education, etc. Can you then measure, specifically and separately, the
contribution of each to the aggregate?
Statistics are numerically expressed: statements such as "the Ethiopian economy is
growing" or "rural-to-urban influx of population will increase" are not statistics. All
statistics are numerical statements of facts, i.e., expressed in numbers.
Reasonable standards of accuracy: facts about a given phenomenon can be obtained
either through counting and measurement or through estimation. However, in many
statistical enquiries it may be difficult to attain 100% accuracy. It is thus important to
apply some reasonable standard of accuracy.
Statistical collections are systematic: we do not collect data in just any way possible;
rather, a comprehensive and appropriate plan for data collection has to be put in place.
Statistics are purposeful: data are collected for a predetermined purpose.
Comparability: statistics must be placed in relation to each other so that comparable
figures can be drawn.
Activity
Which statement is statistics?
- Chewing habit of chat among the youth in Mekelle Town is growing
- The number of ‘chat’ chewing youth has increased by 20% in the year 199x while
in the year 199x the rate was 5% only
The above definition and basic characteristics describe only the facts. However, it is
important to know how the facts are being built. This leads us to define statistics as a
series of logical procedures ranging from data collection through interpretation. Then,
- Statistics (singular) can be defined as the collection, presentation, analysis and
interpretation of numerical data.

Data Collection
This is the first step in any statistical investigation, and care must be exercised here,
for it lays the foundation of statistical analysis. It is important to identify the sources
of data and the techniques to be applied so that the data prove useful.
Organization
This involves three jobs:
Editing – checking for omissions, inconsistencies, wrong computations, etc.
Classifying – arranging data according to some common attribute.
Tabulation – arranging data into columns and rows to ensure clarity.
Presentation
This involves representing the data in diagrams and graphs such as pie charts, bar
graphs, histograms, etc.
Analysis
Once the first three steps have been carried out, the search can begin for information
useful to decision-makers. Notice that analysis and interpretation are not one and the
same. The former deals with the data itself, while the latter goes beyond it.
Interpretation
This is about drawing meaningful conclusions from the data collected and analyzed,
on the basis of which courses of action are set out to resolve the managerial problem
at hand.
1.2.1 Classification of Statistics

? State the classification of statistics?


_________________________________________________________________________________
Statistical investigations can take either of the following forms:
Descriptive Statistics – a branch of statistics that aims at describing and analyzing a
given sample but does not attempt to draw conclusions beyond it.
For example, the average score of management students in section 001 this semester is
69.5%.

Inferential Statistics – a branch of statistics that aims at drawing conclusions and
making generalizations on the basis of the descriptive data.
For example:
1) The average age of girls in a class is 25 years. Therefore, female students in the
New Millennium College are aged 25 or so.
2) According to the 1997 E.C. government economic performance report, the economy
has grown by 7.5%. Ethiopia is, then, one of the few countries which will achieve the
Millennium Development Goals (MDGs).
1.2.2. Functions of Statistics
Generally speaking, statistics serves almost all managerial concerns, with varying
degrees of intensity, in enterprises of all forms and sizes; it functions as an instrument
for making informed judgments. Moreover, it improves the speed and quality of the
decision-making process. The question here is how? Below are some possible
functions/answers.
It allows definite presentation of facts - not all statements are equally definite in what
they say and how they say it.
Consider the following examples:
1. Compared to the year 1996, the Ethiopian economy has grown.
2. The number of voters in the 3rd round national election increased by 15% over the
2nd round, in which there were only 20 million voters.
Which of the statements is definite, then? The answer is the 2nd one. In short,
statistics presents data (or statements of facts) in a clear, precise and definite manner.
It helps in condensing voluminous data - statistics aims at drawing and displaying
meaningful overall information from a huge pool of data.
It enhances comparison - for data to be meaningful, they have to be displayed in
relation to other data of the same characteristic/attribute. If not, they would be
meaningless. If you read back the above two examples, the second statement is more
meaningful than the first, for it allows the use of comparable attributes.
It helps in formulating and testing hypotheses - with statistics it is possible to frame
an assumption (formulation) and to test whether that assumption is right or wrong. If
the test supports the assumptions, they become theories that provide direction for
managers in making decisions.
It helps in predicting the future - managers make provision for the future, which is
almost entirely unknown to them. Knowledge of future trends is therefore the only
tool at hand for visualizing the future. This is made possible by the use of statistical
methods.
It helps in policy formulation - many governmental data collections aim at
formulating appropriate policy, of whatever type, that helps in the proper
administration of human and non-human resources.
1.2.3 Limitations of Statistics
- Statistics does not deal with individual measurements. Characteristically,
statistics deals only with aggregates of facts; it is these aggregates that allow
comparison and the drawing of meaningful information. Therefore individual
measurements, for instance 'the number of students in New Millennium College
is 2,500', are meaningless on their own and hardly count as statistics.
- Statistics deals only with numerical characteristics. There are situations where
it may be difficult to express some characteristics quantitatively. In this case
the interpretation or conclusion would be subjective and not statistical.
- Statistical facts (results) are true only on average. Statistical undertakings
end in drawing a conclusion; however, the conclusion drawn cannot be universally
true.
- Statistics is only one of the methods of studying a problem. As statistics fails to
provide the best solution under all circumstances, the results must be evaluated
again using other evidence.
- Statistics can be misused.
Check point!
Make sure that you have understood very well the real sense of statistics as it applies
to many business decision schemes.
1.3 Data collection
1.3.1 Introduction
Once the purpose of the investigation is clearly spelled out, the scope well defined, the
unit of data collection decided, the sources, techniques, frame and degree of accuracy
determined, and other preliminaries put in place, it is time to take the first step: data
collection. Data set the foundation for statistical analysis. Therefore, the volume and
quality of the data collected, and how they are collected, mark the boundary between
information that is useful and information that is not.
1.3.2 Sources of data
Generally there are two types of sources of data, namely Primary sources and
secondary sources.
Primary sources: these are sources which the investigator meets in person so as
to generate first-hand data. Put differently, a primary source is one that itself collects
the data. The sources may vary depending on the nature of the enquiry.
Primary sources are used for any of the following reasons:
- Secondary sources may contain errors committed while recording the primary data.
- Primary sources show the data in greater detail.
- Primary sources also allow tracing the procedures used in collecting the data as well
as in selecting the sample.
Thus, depending on the type of source, data can be classified into two:
Primary data - data that are originally and directly generated from the words and
actions/behaviour of the respondents. For example, if a manufacturing company is
collecting data from the users of its product on how well the product satisfies their
needs compared to competitors' products, then the primary sources are the customers
and the primary data are the customers' statements and behavioural reactions.
Secondary data - unlike primary data, secondary data are collected from already
processed documents like journals, articles, government releases, and other relevant
documents. One can deduce that the use of secondary sources saves time and money
(that would otherwise have been spent in planning and executing the collection
project). Moreover, it is useful when it is impossible to collect the primary data. The
limitations typically associated with secondary data are those of 'fit' and 'accuracy'.
Factors that determine the choice between primary and secondary data
The question as to whether to use primary data or secondary data is determined by the
following factors:
Nature and scope of the enquiry
Availability of financial resources


Availability of time
Degree of accuracy required
The collecting agency: individual, organization, government body etc.
1.3.3 Methods of data collection
The techniques used in gathering data should deliver the information that the
interpretation phase requires. In fact, the choice of data collection method requires
some practice and skill. We will consider the techniques used to collect mainly
primary data:
Direct personal interview – this is used when an in-depth search for the opinions and
underlying attitudes of the informants is needed. Hence, face-to-face contact with the
respondents is a must.
Strengths: provides quality data because it allows reading the emotional reactions of
respondents; allows increased response and interaction, etc.
Limitations: it costs more time and money; it needs a high degree of skill and
experience; it is usually used for intensive but not extensive field surveys.
Observation – sometimes data for business applications are collected by observation.
Strengths: it lessens the degree of bias
Limitations: it may not always be possible to observe all the information needed
Telephone – telephone interviewing is another popular technique for gathering
information from informants.
Strengths: It is relatively inexpensive and relatively easy
It is faster
It provides a fairly high response rate
Limitations: Only simple questions can be asked
Used only for short surveys
Not all persons own a telephone
Mail questionnaire – here written questionnaires are mailed to different respondents.
This method is usually used to gather data when a mailing list exists or when
respondents are scattered over a wide area.
Strengths: Allows contacting a large number of respondents at a time
It is relatively cheap and expeditious
Allows respondents to have more time and space to think and answer
Limitations: It applies only to literate respondents
Questionnaires may not be returned
Responses may not be accurate
1.4 Sampling and Sampling Techniques
Sampling: meaning
In most business applications, it has consistently proven difficult, if not impossible, to
include every individual or item of interest in the statistical investigation, because of
the inescapable limitations associated with a complete enumeration (total count).
Sampling, that is, relying on a part of the totality, therefore grows in importance, for
the reasons mentioned below:
1. It is often not technically or economically feasible to take the entire population into
consideration.
2. Due to dynamic changes in the business, industrial and social environment, it is
necessary to make quick decisions based upon the analysis of information.
Managers, however, seldom have the time to collect and process data for the
entire population.
3. A sample, if representative, may yield more accurate results than a total census.
This is because samples can be more closely supervised and data can be
carefully selected. It also saves time.
4. The quality of some products must be tested by destroying the products.
1.4.1 Methods of Sampling
Since the entire population cannot be studied in practice, sample-based investigation
becomes a must. A sample is representative of the larger population and should thus
act as a reliable estimator: unbiased, with minimum variance, sufficient, consistent,
and asymptotically normal. An estimator fulfilling these characteristics qualifies as a
reliable estimator. The sample has to meet the following essentials for the data to be
good:
- Representativeness
- Adequacy
- Independence
- Homogeneity
Generally, there are two methods of sampling:
I. Random [probabilistic] sampling


II. Non-random [non-probabilistic] sampling
I. Random [probabilistic] sampling
If a sample is selected in such a way that every unit of the sampled population has a
non-zero chance of being included in the sample, it is called random sampling.
Advantages:
- Does not depend on the existence of detailed information about the universe
- Provides estimates that are unbiased in nature and allows measurable precision
- Enables evaluation of the relative efficiency of various sample designs
Disadvantages:
- Requires a high level of skill and experience
- Requires a lot of time to plan and execute the method
- Involves higher cost
Types of random sampling
i. Simple Random Sampling [SRS]
This is a sampling approach in which every observation has the same (equal) chance
of being selected. Some of the techniques used are:
A. The lottery method: each unit of the population is coded or given an
identification number and a draw is made, all other attributes being the same
for all the units.
B. Table of random numbers: a printed table listing numbers in random order,
from which a sample can be drawn.
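The lottery method can be imitated on a computer. The following is only a minimal sketch: the population of 500 coded units, the sample size of 25 and the function name `lottery_sample` are assumptions made for illustration, not part of the course material.

```python
import random

def lottery_sample(population_ids, sample_size, seed=None):
    """Draw a simple random sample without replacement (a computerized 'lottery')."""
    rng = random.Random(seed)          # a seed makes the draw repeatable
    return rng.sample(list(population_ids), sample_size)

# Hypothetical population: units coded with identification numbers 1..500
population = range(1, 501)
sample = lottery_sample(population, 25, seed=42)
print(sorted(sample))
```

Every unit has the same chance of being drawn, and no unit can be drawn twice, which is what the lottery draw is meant to guarantee.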
Advantages of simple random sampling
- Less affected by personal bias
- As the size of the sample increases, it becomes more representative of the population
- Accuracy of estimates can easily be assessed
Limitations of simple random sampling
- The size of the sample required to ensure statistical reliability is usually large
- Necessitates a completely cataloged population from which to draw the sample
- Selections can be widely dispersed [geographically], involving more time and cost
- It may end up selecting a non-random-looking outcome
ii. Stratified Random Sampling


When the population is heterogeneous [lacks similarity], stratified random sampling is
called for, as it gives a better chance of representativeness. Once the units are put into
strata, a representative selection can be made. In stratified random sampling the
sampling is designed so that a designated number of items is chosen from each
stratum.
Procedures in selecting a stratified random sample:
i. Select your base of stratification
ii. Decide the number of strata
iii. Decide the sample size of each stratum
The stratum sample sizes ni (i = 1, 2, …, k) add up to the total sample size,
n = n1 + n2 + … + nk, and can be allocated in the following ways:
a. Equal allocation: every stratum gets the same share; if k = 5, then each ni
will be 1/5th of n.
b. Proportional allocation: every stratum is sampled at the same fraction n/N,
where N is the population size, so each ni is proportional to the size of its
stratum. If the stratum sizes are not known, determining the ratio may be
difficult.
c. Disproportional allocation: sometimes strata differ in variability (variance),
and it is reasonable to take larger samples from more variable strata and
smaller samples from less variable strata.
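The equal and proportional allocation rules can be sketched in a few lines. This is an illustration under stated assumptions: the four strata sizes (400, 300, 200, 100) and the total sample size of 50 are hypothetical, and the function names are not from the course material. Here the total sample size is divided among the strata, and under proportional allocation each stratum receives a share equal to its fraction of the population.

```python
def equal_allocation(total_sample, num_strata):
    """Equal allocation: every stratum receives the same share of the sample."""
    return [total_sample / num_strata] * num_strata

def proportional_allocation(total_sample, stratum_sizes):
    """Proportional allocation: each stratum's share follows its population share."""
    population_total = sum(stratum_sizes)
    return [total_sample * size / population_total for size in stratum_sizes]

# Hypothetical strata (for instance, four year-groups of students)
stratum_sizes = [400, 300, 200, 100]
total_sample = 50

equal = equal_allocation(total_sample, len(stratum_sizes))
proportional = proportional_allocation(total_sample, stratum_sizes)
print(equal)         # [12.5, 12.5, 12.5, 12.5]
print(proportional)  # [20.0, 15.0, 10.0, 5.0]
```

Fractional results such as 12.5 would have to be rounded in practice so that the stratum samples still add up to the intended total.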
Merits of stratified sampling:
1. Increases representation
2. Greater accuracy
Limitations of stratified sampling:
1. Involves much cost and time
2. Needs greater skill and experience
3. Liable to bias
iii. Systematic Random Sampling
Systematic sampling involves selecting a first unit at random and then selecting
additional units at equally spaced intervals until the required number of units is
obtained.
For example, if the population has 500 units and a sample of 25 is required, then the
sampling interval is 500/25 = 20. Assume that the 5th item is the first choice; the next
selections will be the 25th, the 45th, etc. This technique is relatively simpler than
simple random sampling (SRS).
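The example above can be reproduced with a short sketch. It assumes that 500 is the population size and 25 the required sample size, which gives the interval of 20 used in the text; the function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Pick a start within the first interval, then every k-th unit after it."""
    interval = population_size // sample_size       # k = 500 // 25 = 20
    if start is None:
        start = random.Random(seed).randint(1, interval)
    return [start + i * interval for i in range(sample_size)]

picks = systematic_sample(500, 25, start=5)
print(picks[:3])   # [5, 25, 45]
print(picks[-1])   # 485
```

Only the starting point is random; once it is fixed, the rest of the sample is fully determined by the interval.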
Merits of systematic sampling:
1. It is simple and convenient to adopt
2. The time and work involved in sampling are relatively small
3. It is suitable if the population is very large
Limitations of systematic sampling:
1. The sample may be biased if the units are ordered with respect to the
characteristic the investigator is interested in.
iv. Cluster Random Sampling (CRS)
This involves dividing the population into clusters based on some criterion, for
example geography. The main objective of this method is to minimize cost and, at the
same time, increase accuracy. We can then perform a census on randomly selected
clusters or take a sample from each cluster using SRS.
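A rough sketch of the first option, a census of randomly selected clusters, is given below. The cluster names and their members are invented for illustration, and the function name `cluster_sample` is an assumption.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly select whole clusters and keep every unit in each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters formed by geography
clusters = {
    "North": ["n1", "n2", "n3"],
    "South": ["s1", "s2"],
    "East": ["e1", "e2", "e3", "e4"],
    "West": ["w1", "w2"],
}
selected = cluster_sample(clusters, 2, seed=1)
for name, units in selected.items():
    print(name, units)
```

The random draw is made over clusters rather than over individual units, which is what keeps the fieldwork concentrated in a few areas.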
v. Multistage Sampling
This is an extension of CRS involving more stages: a sample is first selected using
one method (such as SRS), and a further sample is then selected from it using another
method. Further stages can be added as required.
II. Non-Random (Non-Probabilistic) Sampling
Non-probability sampling means that the chance of including any elementary unit of
the population in the sample cannot be determined; hence such samples do not lend
themselves to accurate statistical treatment and analysis.
Some of the non-random sampling techniques often used are:
a) Judgment sampling – in this method of sampling, the choice of sample items
depends exclusively on the judgment of the investigator. It is simple and useful in
situations where there are only a small number of sampling units in the universe
and simple random sampling may miss the most important elements. It is also used
when the investigator is highly skilled, the decision has to be made under time
constraints, and taking probability samples would be highly time consuming.
For example, if a teacher wants to find out the studying habits of a class of 40
students, the teacher may pick a sample of 8 students whom he or she thinks are
representative of the class.
The most common limitation of this method is that the investigator is highly subject
to bias.
b) Convenience sampling – a convenience sample is obtained by selecting population
units that are convenient for the investigator to select. For example, you can stand
at the gates of the library and sample the first 8 students entering the library.
This method is better suited to pilot studies, before a final sampling design is
decided upon.
c) Quota sampling – this is a type of judgment sampling. Here quotas are set up
according to a given criterion, but the selection of the sample units within each
prescribed quota is made according to the personal judgment of the investigator.
For example, if you have to investigate the saving habits of 250 college students,
you may require that out of the first 100 persons 35 should be 4th year
students, 30 should be 3rd year students, 20 should be 2nd year students and 15
should be 1st year students. This is what is called quota sampling. Although the
method is easy, it is highly subject to personal bias, and consequently the sample
may not be representative of the population.
Determining the size of a sample
The following inputs must be carefully considered in order to determine the sample
size for a given study:
1. The desired precision level, that is, the magnitude of the ± term that the
researcher is willing to tolerate; for instance, you may allow a ± 5% margin of
error.
2. The desired confidence level (Zq value), that is, the degree of confidence that
the decision maker wants to have in the interval estimate.
3. An estimate of the degree of variability in the population, expressed in the form of
a standard deviation.
Given the desired precision level, H, the desired confidence level, q, and an estimate of
the standard deviation, S, we can write the following equation:
$$H = \frac{Z_q S}{\sqrt{n}}$$
Squaring both sides of the equation and solving for n, we can rewrite it as
$$n = \frac{Z_q^2 S^2}{H^2}$$
where n stands for the total number of items to be included in the sample.
Example: A marketing manager of Almeda Textiles wants to estimate the average
amount that families in a certain locality spend on local textiles per year. He
wants the estimate to be within a ± birr 10 margin of error. When such an interval
estimate is constructed, he wants to be able to have 99% confidence in it. He
estimates that the standard deviation of annual family expenditures on local textiles is
about birr 100. How many families must be chosen for this study?
Solution:
$$n = \frac{Z_q^2 S^2}{H^2} = \frac{(2.575)^2 (100)^2}{10^2} \approx 663 \text{ families}$$
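The calculation can be checked with a small sketch of the formula n = (Zq S / H)^2. The function name `sample_size` is an assumption; the inputs are those of the Almeda Textiles example.

```python
import math

def sample_size(z, s, h):
    """Sample size from the Z value, standard deviation s and precision level h."""
    return (z * s / h) ** 2

n = sample_size(z=2.575, s=100, h=10)
print(round(n, 2))    # 663.06
print(math.ceil(n))   # 664
```

The raw value is about 663.06, which the solution reports as roughly 663 families. Rounding up to 664 would be the more cautious choice if the margin of error must not be exceeded.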
Example: The executive committee of Mesfin Industrial Engineering wants 95%
confidence when dealing with the Assembly Contract Provision that it is planning to
take over. However, industry experience shows that such major takeovers may
involve a market risk of about 4.5 million. Moreover, the computed standard
deviation for the same industry amounts to 11.45 million.
The executive committee wants to know the expected return on its decision. How
many cases (in the industry) should the committee consider for analysis purposes?
Solution
In the above example the desired confidence level is 95%. The corresponding Z value
from the table is 1.96. Taking the market risk of 4.5 million as the precision level H,
$$n = \frac{Z_q^2 S^2}{H^2} = \frac{(1.96)^2 (11{,}450{,}000)^2}{(4{,}500{,}000)^2} \approx 24.9$$
so the committee should consider about 25 cases.


Review Questions

Part one: say true or false
1. Statistics (plural) is an aggregate of facts
2. 'The prevalence of HIV/AIDS is 2.5%' denotes statistics
3. Statistical estimates are end results based on which decisions are made
4. The statement '. . ., therefore general household livelihood is improving' is
descriptive statistics
5. Statistics can be either numerical or non-numerical
6. Statistical facts can be true on an individual basis
7. A sampling method where each unit in the population has a non-zero chance
of being included in the sample is random sampling
8. When the population is heterogeneous, convenience sampling is appropriate
Part two: work out
1. The New Millennium College estimates that the monthly salary of its
graduates is 2,200 birr. The college would like to investigate this assumption using
a 95% confidence level and a maximum tolerable sampling error of 550.
What should be the size of its sample?
What if the allowable sampling errors were:
- ± 420
- ± 275
2. The Finance and Admin department of the New Millennium Institute share
company would like to estimate the probable occurrence of errors while
implementing the internal control procedures. The Institute's policy is a standard
error of 7.5%, a 99% confidence level and a precision level of 2.5%. What
sample size should the department take to achieve this objective?


CHAPTER TWO
CLASSIFICATION AND PRESENTATION OF DATA
Introduction
Dear distance learner, once data are collected they will not be in a usable form
unless further value-adding activities are undertaken. Putting the bulk of the data into
an easily manageable and compact form, and ensuring that they were collected
according to the research design and in line with the objective, is an important logical
phase of statistical work. This process is called presentation. Another important
complementary task is that of organization, which deals with editing, classification
and tabulation.
Student learning objectives
At the end of this section you should be able to
- understand the meaning of data classification and presentation
- Construct the various types of tables and graphs and charts
- Apply the constructions in real world business decision

A Point for Thought!


Do you think that it is technically possible to process decisions based on the
voluminous raw data collected over a period of time?
2.1 Frequency Distribution

? What do you Mean by Frequency distribution?


_________________________________________________________________________________
Once the edited data have been put into an ordered array, they can be organized into a
frequency distribution.
Definition: A frequency distribution is a list of data classes or categories along with the
number of values that fall into each. This data display method shows the frequency, or
the number of occurrences, in each of the several classes. The distribution can be for
either grouped or ungrouped data.
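For ungrouped data, a frequency distribution is simply a count of how often each distinct value occurs. A minimal sketch, using a small made-up list of ratings (the data are hypothetical and not taken from the exercises in this chapter):

```python
from collections import Counter

# Hypothetical raw observations
ratings = ["Good", "Poor", "Good", "Excellent", "Good", "Poor"]

freq = Counter(ratings)            # category -> number of occurrences
for category, count in sorted(freq.items()):
    print(category, count)
print("Total", sum(freq.values()))
```

The total of the frequency column should always equal the number of observations, which is a quick check on any hand-built table.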
Exercise
The survey by the St. George Brewery Company on the average daily sales revenue of
20 selected bars in Mekelle town shows the following data: 525, 525, 375, 700, 1200,
375, 525, 1200, 150, 700, 700, 525, 375, 375, 700, 150, 150, 525, 150, 1200. This can be
listed in tabular form, with the frequency of occurrence assigned to each value, as
shown below:
Sales volume (birr)    Frequency
150                    4
375                    4
525                    5
700                    4
1200                   3
Total                  20
Exercise
Students in the New Millennium College have been asked to rate the college's service
program. The results show Good, Excellent, Very good, Very good, Very good, Good,
Moderate, Excellent, Very good, Good, Very good, Moderate, Excellent. The students'
reactions can be summarized as
Students' reaction     Frequency
Excellent              3
Very good              5
Good                   3
Moderate               2
Poor                   0
Total                  13
Steps to construct a frequency distribution (grouped data)
Generally, the following steps should be followed in constructing a frequency
distribution table:
a. Determine the number of classes, usually between 5 and 15.
b. Determine the size of each class. Class size, or width, is determined by finding the
difference between the largest and the smallest value in the data set and
dividing it by the number of classes desired.
c. Determine the starting point for the first class.
d. Prepare a table of the distribution using the actual counts or percentages (relative
frequencies).
Exercise
As part of the financial policy and pay system reform project, ROSE Consulting Group
has been investigating the monthly income of the employees of its client company,
CLEAR BLUE. The following results were obtained: 545, 545, 545, 675, 545, 690, 690,
675, 1450, 1200, 545, 870, 870, 375, 454, 400, 600, 900, 955, 640, 1125, 1000, 1040, 755,
790, 850, 775, 1075, 690, 650.
Construct the frequency distribution table.
Solution
Determine the number of classes that you want. Let's assume 5 classes.
Determine the size of each class:
First find the range of the data by subtracting the lowest value from the highest value;
the highest value is 1450 birr and the lowest value is 375 birr. Then the range (R) is
1450 – 375 = 1075. Second, divide the range by the number of classes chosen, i.e.,
Class size (width) = range / total number of classes assumed = 1075 / 5 = 215

Then you can start constructing the intervals by determining the lower limit of the first
class. Assume a lower class limit of 375; then the class intervals will be as follows:
Monthly income    Tally          Frequency
375 – 590         IIIIIIII       8
590 – 805         IIIIIIIIIII    11
805 – 1020        IIIIII         6
1020 – 1235       IIII           4
1235 – 1450       I              1
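The steps above can be sketched in code using the ROSE Consulting Group data. One assumption is made that the notes do not state: a value falling exactly on a shared class limit would be counted in the higher class, and the maximum value is placed in the last class. None of the 30 incomes falls on a shared limit, so this convention does not affect the counts here. The function name `grouped_frequencies` is hypothetical.

```python
incomes = [545, 545, 545, 675, 545, 690, 690, 675, 1450, 1200,
           545, 870, 870, 375, 454, 400, 600, 900, 955, 640,
           1125, 1000, 1040, 755, 790, 850, 775, 1075, 690, 650]

def grouped_frequencies(values, num_classes):
    """Return class limits and counts for equal-width classes."""
    low, high = min(values), max(values)
    width = (high - low) / num_classes            # range / number of classes
    counts = [0] * num_classes
    for v in values:
        # Assumed convention: lower limit inclusive, upper limit exclusive,
        # except that the maximum value is kept in the last class.
        index = min(int((v - low) // width), num_classes - 1)
        counts[index] += 1
    limits = [(low + i * width, low + (i + 1) * width) for i in range(num_classes)]
    return limits, counts

limits, counts = grouped_frequencies(incomes, 5)
for (lower, upper), count in zip(limits, counts):
    print(f"{lower:.0f} - {upper:.0f}: {count}")
```

With five classes the width comes out as 215, and the counts agree with the tally column of the table.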
2.2 Cumulative Frequency Distribution
Cumulative frequency distribution, unlike the simple frequency distribution, spells the
total number of items or observations that fall above or below a certain point or
juncture. Thus, if you would like to know the total number of observations that fall
below or above a given point, you can use the cumulative frequency distribution. For
example, construct a cumulative frequency distribution for the above example of ROSE
Consulting Group
Monthly income Frequency Cumulative frequency
375 – 590 8 8
590– 805 11 19
805- 1020 6 25
1020– 1235 4 29
1235 - 1450 1 30
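As a quick check, the cumulative column is just a running total of the frequency column. A minimal sketch using the standard-library `itertools.accumulate` (the variable names are illustrative):

```python
from itertools import accumulate

classes = ["375-590", "590-805", "805-1020", "1020-1235", "1235-1450"]
frequencies = [8, 11, 6, 4, 1]

# Running total of the class frequencies ("less-than" cumulative frequency).
cumulative = list(accumulate(frequencies))

for label, f, cf in zip(classes, frequencies, cumulative):
    print(f"{label:>10}  {f:>2}  {cf:>2}")
```

The last cumulative value must always equal the total number of observations (here 30).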

Exercise
The authorities of the Mekelle Museum of Martyrs report the number of visitors per day
in the classes 24-45; 45-66; 66-87; 87-108; 108-129; 129-150, with respective
frequencies (numbers of days) of 6, 6, 5, 5, 5, 3. Construct the frequency distribution and
cumulative frequency distribution tables.
Table of frequency distribution
Number of visitors Number of days
24-45 6
45-66 6
66-87 5
87-108 5
108-129 5
129-150 3
Table of cumulative frequency distribution
Number of visitors Number of days Cumulative frequency
24-45 6 6
45-66 6 12
66-87 5 17
87-108 5 22
108-129 5 27
129-150 3 30
Relative frequencies - relative frequencies are proportions calculated by dividing the actual
frequency of each class by the total number of observations being classified. The column
of relative frequencies should sum to 1.000 (apart from rounding).
Example
The following table presents the age distribution of a certain group
Class interval   Frequency   Cumulative frequency   Relative frequency
0-9              1           1                      1/88 = 0.011
10-19            6           7                      6/88 = 0.068
20-29            27          34                     27/88 = 0.307
30-39            22          56                     22/88 = 0.250
40-49            12          68                     12/88 = 0.136
50-59            16          84                     16/88 = 0.182
60-69            4           88                     4/88 = 0.045
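The relative frequency column of the age table can be reproduced with a short sketch like the following (the dictionary layout is just one convenient, assumed representation of the table):

```python
frequencies = {"0-9": 1, "10-19": 6, "20-29": 27, "30-39": 22,
               "40-49": 12, "50-59": 16, "60-69": 4}

total = sum(frequencies.values())           # total number of observations
relative = {label: f / total for label, f in frequencies.items()}

for label, r in relative.items():
    print(f"{label:>6}  {r:.3f}")
```

Before rounding, the relative frequencies add up to exactly 1; small differences in the third decimal place of the table are due to rounding only.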
Activity
Referring to the price data in the following table, organize the data into a frequency
distribution with about five classes. Determine a convenient class interval, given that
all class intervals are to be uniform in size.
Area            Price per gallon (¢)    Area            Price per gallon (¢)
Addiss Ababa 53.4 Dilla 53.5
Awassa 55.1 Debrebirhan 50.1
Agaro 53.9 Debremarkos 50.3
Alemaya 53.4 Desse 55.2
Ambo 54.8 DembiBolo 52.9
Bedele 53.3 Fiche 53.4
Butajira 53.9 Finote-Selam 52.3
Bonga 49.1 Mekelle 55.3
Bahirdar 53.7 Jimma 56.8
Bale 47.9 Woldia 52.7
Bati 49.6 Dredawa 55.2

Construct
A. Frequency distribution table
B. Cumulative frequency distribution table
C. Relative frequency distribution

2.3 Charts and Graphs


Frequency distributions are a good way to present the essential aspects of a data
collection in concise and understandable terms, but pictures can be even more effective
in displaying large volumes of data.
Charts and graphs are useful for presenting nominal and ordinal data.
2.3.1 Pie Charts
Pie charts are an effective way of displaying the percentage breakdown of data by
category. This type of chart is particularly useful if the relative sizes of the data
components are to be emphasized. A complete circle, 360°, represents the total number
of observations, and the sizes of the slices are proportional to the relative frequency of
each category. For example, suppose you want to represent the relative importance
(percentages) of the output distribution of 5 major industries in a given economy, where
industry A's share is 0.2 and the shares of industries B, C, D, and E are 0.17, 0.03, 0.375,
and 0.225 respectively. The pie chart is then given as:

[Figure: Pie chart showing the market shares of the industries (slices 1 to 5)]
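Since the full circle is 360°, the angle of each slice is simply the share multiplied by 360. A small sketch for the five market shares given above (the industry labels A to E follow the text; the variable names are illustrative):

```python
shares = {"A": 0.2, "B": 0.17, "C": 0.03, "D": 0.375, "E": 0.225}

# Angle of each slice in degrees: relative frequency x 360.
angles = {industry: share * 360 for industry, share in shares.items()}

for industry, angle in angles.items():
    print(f"Industry {industry}: {angle:.1f} degrees")
```

The angles should add up to 360°, which is a useful check that the shares sum to 1.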

Activity
Using the data on the monthly income of 30 employees, complete the relative
frequency distribution below.
Income levels Frequency Cumulative frequency Relative frequency
375-590 8 8 8/30
590-805 11 19 11/30
805-1020 6 25 6/30
1020-1235 4 29 4/30
1235-1450 1 30 1/30
Then construct the pie chart representing the percentage/relative importance of the
above income distributions
2.3.2 Bar charts
The bar chart is another common method for graphically representing nominal- and
ordinal-scaled data. The height of each bar is proportional to the number of items in
each category. The bars are separated and positioned vertically with their bases on the
x-axis. Take, for example, the above market share values of the five industries in the
economy; the bar chart would show one bar per industry, with heights 0.2, 0.17, 0.03,
0.375 and 0.225.


2.3.3 Histogram
The histogram is frequently used to graphically present interval and ratio data. In this
graphing method, the categories or classes are plotted along the horizontal axis of the
graph, and the numerical values of each class are represented by vertical bars. The bars
are not separated; the adjacent bars indicate that a numerical range is being
summarized by showing the frequencies in arbitrarily chosen classes.
For example, the histogram of a frequency distribution of voters by age group is shown below:

[Figure: Histogram showing the frequency distribution of voters by age group. Vertical axis: number of voters; horizontal axis: age group (18-25, 25-32, 32-39, 39-46, 46-53, 53-60, >60)]


2.3.4 Frequency Polygon


This is used to present interval and ratio data. To construct a frequency polygon, mark
the frequencies on the vertical axis and the values of the variable being measured on
the horizontal axis.
Next, plot each class frequency by placing a dot above the class midpoint and connect
successive dots with straight lines to form a polygon. Two new classes are added
(with frequencies of zero) at the ends of the horizontal scale so that the polygon closes on the axis.
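The points of a frequency polygon can be computed before any drawing is done. The sketch below reuses the monthly income classes from section 2.1 purely as an illustration; it assumes equal class widths, so that the two added zero-frequency classes lie one class width beyond the first and last midpoints.

```python
classes = [(375, 590), (590, 805), (805, 1020), (1020, 1235), (1235, 1450)]
frequencies = [8, 11, 6, 4, 1]

width = classes[0][1] - classes[0][0]                 # equal class width
midpoints = [(low + high) / 2 for low, high in classes]

# Add one empty class on each side so the polygon starts and ends at zero.
xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
ys = [0] + frequencies + [0]

points = list(zip(xs, ys))
print(points)
```

Joining these points in order with straight lines gives the polygon.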

2.3.5 Ogive
A graph of a cumulative frequency distribution is called an ogive. It is used when one
wants to determine how many observations lie above or below a certain value in a
distribution. A less-than ogive tells how many items in the distribution have a value
less than the upper class limit of each class.
First, a cumulative frequency distribution (CFD) is constructed. Next, the cumulative
frequencies are plotted at the upper class limit of each class. Finally, the points are
connected with straight lines to form the ogive curve.


Exercise
The following table shows the average weights of 20 heavyweight boxers
Weight     Frequency   Cumulative frequency (less-than)   Cumulative frequency (greater-than)
110-125    3           3                                  20
125-140    6           9                                  17
140-155    5           14                                 11
155-170    4           18                                 6
170-185    2           20                                 2
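Both cumulative columns for the boxers' weights can be derived from the frequency column alone. In this sketch (variable names are illustrative) the less-than totals are accumulated from the first class downwards and the greater-than totals from the last class upwards; the less-than values are paired with upper class limits and the greater-than values with lower class limits, as the ogive construction requires.

```python
from itertools import accumulate

lower_limits = [110, 125, 140, 155, 170]
upper_limits = [125, 140, 155, 170, 185]
frequencies = [3, 6, 5, 4, 2]

less_than = list(accumulate(frequencies))
# Accumulate from the last class backwards, then restore the class order.
greater_than = list(accumulate(reversed(frequencies)))[::-1]

less_than_points = list(zip(upper_limits, less_than))        # less-than ogive
greater_than_points = list(zip(lower_limits, greater_than))  # greater-than ogive
print(less_than_points)
print(greater_than_points)
```

For every class, the greater-than value equals the total minus the less-than value of the preceding class.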

Then you can draw the less-than curve by plotting the cumulative frequencies against the upper class limit of each class:
[Figure: Less-than cumulative distribution curve. Vertical axis: cumulative frequency; horizontal axis: weight (110-125 to 170-185)]

A more-than ogive shows how many items in the distribution have a value greater than
or equal to the lower limit of a particular class:
[Figure: Greater-than cumulative distribution curve. Vertical axis: cumulative frequency; horizontal axis: weight (110-125 to 170-185)]


Activity
Given the following information on the monthly apartment rental rates for 200
apartments, construct:
• Histogram of the distribution
• Frequency polygon
• Frequency curve (Ogive):
  - Greater than
  - Less than

Self test questions


1. The Claims department of Africa Insurance Company has made an assessment
of the amounts of claims the company has been settling over a period of time and
come up with monthly averages that can be presented in a frequency
distribution. The following table is the summary:
Claims (in birr) Frequency
10,000-19,999 4
20,000-29,999 8
30,000-39,999 10
40,000-49,999 12
50,000-59,999 16
60,000-69,999 9
70,000-79,999 5
80,000-89,999 3

a. Construct a histogram
b. Draw a frequency polygon
2. A librarian in the Mekelle City Public Library has been tallying the number of
college students visiting her library by their number of years in college and came
up with the following data:


College years Number of visitors


1st 40
2nd 27
3rd 34
4th 38
5th 28
6th 25
7th 22
8th 25
9th 23
10th 20

Construct the bar graph


3. The following table shows the rate of employee turnovers in the XXX
multinational company over the past few years of operation:

Turnovers     Frequency   Cumulative (less than or equal)   Cumulative (greater than or equal)
0 up to 5 6 6 98
5 up to 10 16 22 92
10 up to 15 22 44 76
15 up to 20 25 69 54
20 up to 25 18 87 29
25 up to 30 11 98 11

Construct the Ogive:


a. Less than Ogive
b. Greater than Ogive


CHAPTER THREE
MEASURES OF CENTRAL TENDENCY
Introduction
In calculating summary values for a data collection, the first consideration is to find a
central, or typical, value for the data. Three important measures of central tendency
are presented in this chapter: mean, median, and mode. With the use of these measures we
can summarize a huge volume of data with a single value characterizing the nature
of the data we have. Moreover, measures of variation or dispersion are used to assess
how closely the data are distributed around the central measures.
Chapter objectives
By the end of this chapter, students should be able to:
• Understand the use of the measures of central tendency
• Calculate the different measures of central tendency
• Understand and calculate the measures of variation
3.1 Measures of central tendency
Arithmetic mean
The arithmetic mean of a collection of numerical values is the sum of these values
divided by the number of values. The symbol for the population mean is the Greek
letter μ (mu), and the symbol for a sample mean is $\bar{x}$ (x-bar):
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{(for ungrouped data)}$$
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{N} \quad \text{(for grouped data)}$$
where $f_i$ is the frequency of the i-th class, $x_i$ is its midpoint (or value), and $N = \sum f_i$ is the total number of observations.
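The two formulas can be sketched as small Python functions. The function names are hypothetical. The grouped example reuses the monthly income classes of section 2.1 and assumes that each class is represented by its midpoint; the ungrouped values 20, 30, 40 are made up purely for illustration.

```python
def mean_ungrouped(values):
    """Arithmetic mean: sum of the values divided by their number."""
    return sum(values) / len(values)

def mean_grouped(midpoints, frequencies):
    """Grouped mean: sum of f_i * x_i divided by N = sum of f_i."""
    total = sum(frequencies)
    return sum(f * x for x, f in zip(midpoints, frequencies)) / total

# Illustrative ungrouped data (not from the text).
print(mean_ungrouped([20, 30, 40]))

# Income classes 375-590, 590-805, ... represented by their midpoints.
midpoints = [482.5, 697.5, 912.5, 1127.5, 1342.5]
frequencies = [8, 11, 6, 4, 1]
grouped = mean_grouped(midpoints, frequencies)
print(grouped)
```

The grouped mean is an approximation of the mean of the raw data, because every observation in a class is treated as if it were equal to the class midpoint.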
Weighted Mean
The simple arithmetic mean described above gives equal
importance or weight to all the data items. However, data items may
not carry equal weights and may therefore demand different treatment. The mean
obtained by taking the weight of each observation into consideration is known as the
weighted arithmetic mean (or weighted mean).
Suppose the weights are P1, P2, P3, ..., Pn and the data values are X1, X2, X3, ..., Xn; then
the weighted mean is given by
$$\bar{X}_w = \frac{\sum_{i=1}^{n} P_i X_i}{N} = \frac{P_1 X_1 + P_2 X_2 + P_3 X_3 + \dots + P_n X_n}{N}$$
Where Pi is the weight given to each observation
Xi is the data item
N is the sum of the weights
Example
Tadesse registered for 5 courses of 4, 3, 3, 3 and 3 credits respectively in the second
semester of a given year. At the end he scored A, B, A, D and C respectively. Consider
A=4, B=3, C=2, D=1, and F=0. Compute the weighted mean grade of the student.
Solution
The weights are the credits, so N = 4 + 3 + 3 + 3 + 3 = 16:
$$\bar{X}_w = \frac{P_1 X_1 + P_2 X_2 + \dots + P_n X_n}{N} = \frac{4 \times 4 + 3 \times 3 + 3 \times 4 + 3 \times 1 + 3 \times 2}{4 + 3 + 3 + 3 + 3} = \frac{16 + 9 + 12 + 3 + 6}{16} = \frac{46}{16} = 2.875$$
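A short sketch of the weighted mean, with the credits as weights and the grade points as values (the function name `weighted_mean` is illustrative). Note that the divisor is the sum of the weights, i.e., the total credits, not the sum of the grade points.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

credits = [4, 3, 3, 3, 3]          # weights P_i
grade_points = [4, 3, 4, 1, 2]     # A, B, A, D, C

gpa = weighted_mean(grade_points, credits)
print(gpa)
```

The same function works when the weights are probabilities or any other non-negative weights, because dividing by the sum of the weights normalizes them automatically.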
Example
A marketing manager of 'silas' and family Plc has come up with 5 major economic
courses of action which would yield returns on investment of 20, 21, 24, 22, and 27 (in millions).
However, due to uncertainty, the manager has assigned the relative
likelihoods of occurrence 5, 4.5, 3, 3.5 and 2.75. Compute the weighted
mean.
Solution
Since probabilities must sum to 1, we first convert the likelihoods into probabilities by
dividing each by their total, 5 + 4.5 + 3 + 3.5 + 2.75 = 18.75:
5/18.75 = 0.267
4.5/18.75 = 0.240
3/18.75 = 0.160
3.5/18.75 = 0.187
2.75/18.75 = 0.147
$$\bar{X}_w = \frac{0.267 \times 20 + 0.240 \times 21 + 0.160 \times 24 + 0.187 \times 22 + 0.147 \times 27}{1} \approx 22.28$$
(The value 22.28 is obtained with the unrounded weights, i.e., 417.75/18.75.)
Activity:
A company has earned sales of $40.000 at the end of the year 1997 from the sale of
three model bikes: MA1, MA2 and MA3. Each unit of model A1 sells for $1,750, each


unit of model A2 sells for $1,400 and each unit of model A3 sells for $1,150. The
company sold 50, 90 and 110 units of each model type respectively. Compute the
weighted mean
Geometric Mean
It is unrealistic to assume that figures (quantities) will remain the same. They
change over time, and we may be interested in finding the average rate of change over a
period of time. The measure used is called the Geometric Mean (GM):
$$GM = \sqrt[n]{x_1 \times x_2 \times \dots \times x_n}$$
Where n is the number of observations
Example
The following inflation rates were reported to have been experienced in Ethiopia over
the past five years:
Year Inflation rates
1990 0.06
1991 0.04
1992 0.035
1993 0.042
1994 0.055
Then the average rate of change over the 5-year period can be computed with n = 5:
$$GM = \sqrt[5]{0.06 \times 0.04 \times 0.035 \times 0.042 \times 0.055} = \sqrt[5]{1.9404 \times 10^{-7}} \approx 0.0455$$
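The geometric mean of the five inflation rates can be checked with a minimal sketch like this one. It uses `math.prod`, which is available in Python 3.8 and later; the function name `geometric_mean` is illustrative.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))

rates = [0.06, 0.04, 0.035, 0.042, 0.055]

product = math.prod(rates)
gm = geometric_mean(rates)
print(product)
print(round(gm, 4))
```

Because the product of small rates is a very small number, it is easy to misplace a decimal point by hand, so a check of this kind is worthwhile.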
Activity
The sales journal of Haymanot PLC shows that the unit price of its MB bicycles
has declined over the past four months of operation from the earlier retail
price of birr 300.
Month Unit price
1 275
2 250
3 240
4 225


Compute the average rate of fall of the selling price over the four months.
(ANS 6.94%)
The Median (sometimes called the counting average)
The median is the single value from the data set that measures the central item in the
data. It is the middlemost or most central item in the set of numbers:
half of the items lie above this point, and the other half lie below it.
To find the median we first arrange the data in descending or ascending order. Once
ordered, the middle value is the median (if the number of observations is odd), or
the median is the average of the two middle items (if the number of items is even).
Calculation of median
1) Ungrouped data:
$$\text{Median} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ item in a data array}$$
2) Grouped data:
$$\tilde{x} = L_m + \left(\frac{\frac{n+1}{2} - (F+1)}{f_m}\right) w$$
Where:
$\tilde{x}$ is the sample median
n is the total number of items in the distribution
F is the sum of all class frequencies up to, but not including, the median class
fm is the frequency of the median class
w is the class interval width
Lm is the lower limit of the median class
Example
The Dashen Bank Mekelle Branch has disclosed the distribution of its customers'
monthly balances as shown in the following table.
Class interval (Birr)   Frequency
0-49.99 78
50-99.99 123
100-149.99 187
150-199.99 82
200-249.99 51
250-299.99 47
300-349.99 13
350-399.99 9
400-449.99 6


450-499.99 4
Total 600

Solution
Using the first method, the $\left(\frac{n+1}{2}\right)^{\text{th}} = \left(\frac{600+1}{2}\right)^{\text{th}} = 300.5^{\text{th}}$ item is the centermost. (You can
take it as the average of the 300th and the 301st items.)
Add the frequencies to locate the class that contains this centermost item (i.e.,
78+123+187=388); this shows that the item is in the 3rd class, 100-149.99.
Lm (the lower limit of the median class) = 100
n (number of observations) = 600
F (sum of all frequencies up to the median class) = 78 + 123 = 201
w (class interval width) = 149.99 – 100 = 49.99 ≈ 50
fm (frequency of the median class) = 187
Substituting these values in the formula, we have
$$\tilde{x} = L_m + \left(\frac{\frac{n+1}{2} - (F+1)}{f_m}\right) w = 100 + \left(\frac{\frac{600+1}{2} - (201+1)}{187}\right) 50$$
$$\tilde{x} = 100 + \left(\frac{300.5 - 202}{187}\right) 50 = 100 + \left(\frac{98.5}{187}\right) 50$$
$$\tilde{x} \approx 100 + (0.5267)(50) = 100 + 26.34 = 126.34$$
Thus $\tilde{x} \approx 126.34$ birr is the sample median.
Example
The following data represent the weights of fishes caught in lake ‘Hashenge’ by a local
fisherman


Class Frequency
0-24.9 5
25-49.9 13
50-74.9 16
75-99.9 8
100-124.9 6

Compute the median


Solution - following the steps used in the above example:
The sum of all frequencies is 5+13+16+8+6 = 48, so the $\left(\frac{n+1}{2}\right)^{\text{th}}$ item is the $\frac{48+1}{2} = 24.5^{\text{th}}$ item,
showing that the centermost items are the 24th and 25th items.
Add the class frequencies up to the expected median class (i.e., 5+13+16=34), showing
that the median is in the class 50-74.9.
Now F = 18 (5+13)
w = 25 (74.9 – 50 = 24.9 ≈ 25)
Lm = 50
fm = 16
n = 48
$$\tilde{x} = L_m + \left(\frac{\frac{n+1}{2} - (F+1)}{f_m}\right) w = 50 + \left(\frac{\frac{48+1}{2} - (18+1)}{16}\right) 25$$
$$\tilde{x} = 50 + \left(\frac{\frac{49}{2} - 19}{16}\right) 25 = 50 + \left(\frac{\frac{49 - 38}{2}}{16}\right) 25 = 50 + \left(\frac{5.5}{16}\right) 25$$
$$\tilde{x} = 50 + (0.34375)(25) = 50 + 8.59 = 58.59$$
Thus $\tilde{x} = 58.59$ is the median.
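The grouped-median formula used in the two examples above can be sketched as a small function. The function and parameter names are hypothetical. The sketch follows the form of the formula given in these notes; many textbooks use the closely related variant (n/2 − F)/f_m instead, which gives slightly different values.

```python
def grouped_median(lower_limit, n, cum_before, median_freq, width):
    """Grouped median as stated in this section:
    L_m + (((n + 1) / 2 - (F + 1)) / f_m) * w
    """
    position = (n + 1) / 2
    return lower_limit + ((position - (cum_before + 1)) / median_freq) * width

# Dashen Bank example: median class 100-149.99.
bank = grouped_median(lower_limit=100, n=600, cum_before=201,
                      median_freq=187, width=50)

# Fish weights example: median class 50-74.9.
fish = grouped_median(lower_limit=50, n=48, cum_before=18,
                      median_freq=16, width=25)

print(round(bank, 2), round(fish, 2))
```

The inputs are exactly the quantities Lm, n, F, fm and w identified in each worked solution.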
Activity
AWASH Insurance S.Co has presented the following table of claims by its customers for
vehicle accidents in the last fiscal year.
Amount of claims (in birr)   Frequency
0-249.99                     52
250-499.99                   337
500-749.99                   1,066
750-999.99                   1,776
1000-1249.99                 1,492
Compute the median
The Mode (observed average), $\hat{x}$
Sometimes you may want to know the value with the
greatest number of occurrences. The value with the largest
number of occurrences is called the mode or modal value. It can also be defined
as the value about which the items are most closely concentrated.
Graphically, the mode is the value on the horizontal axis at which the frequency curve
of the distribution reaches its peak:
[Figure: Frequency curve with the mode $\hat{x}$ marked on the x-axis below the peak]
Calculation of mode


If the distribution is ungrouped, then the item with the greatest frequency is selected as
the modal value.
However, if the distribution is grouped, the following formula is used:
$$\hat{x} = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) w$$
Where $\hat{x}$ is the mode
Lmo is the lower limit of the modal class
Δ1 is the difference between the frequency of the modal class and the frequency
of the pre-modal class
Δ2 is the difference between the frequency of the modal class and the frequency
of the post-modal class
w is the class interval width of the modal class
Example Consider the following table of income distribution of 300 workers of Messebo
Center Factory
Income interval Frequency
100-149.5 12
150-199.5 14
200-249.5 27
250-299.5 58
300-349.5 72
350-399.5 63
400–449.5 36
450–499.5 18

Solution
Locate the class with the greatest frequency; in this case 300-349.5 is the modal class
(frequency 72).
Then Lmo = 300
Δ1 = 72 – 58 = 14
Δ2 = 72 – 63 = 9
w = 349.5 – 300 = 49.5 ≈ 50
$$\hat{x} = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) w = 300 + \left(\frac{14}{14 + 9}\right) 50 = 300 + \left(\frac{14}{23}\right) 50$$
= 300 + (0.6087)(50)
≈ 300 + 30.43
= 330.43
Example
The following were the grade score points of 60 students in their managerial statistics course
Score          Frequency
35 – 44.99     10
45 – 54.99     12
55 – 64.99     18
65 – 74.99     13
75 – 84.99     7
Compute the modal score point
Solution
From the table, the class with the largest frequency is 55 – 64.99 (frequency 18)
Then Lmo = 55
Δ1 = 18 – 12 = 6
Δ2 = 18 – 13 = 5
w = 64.99 – 55 = 9.99 ≈ 10
$$\hat{x} = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) w = 55 + \left(\frac{6}{6 + 5}\right) 10 = 55 + \left(\frac{6}{11}\right) 10$$
= 55 + (0.5455)(10)
= 55 + 5.45
$\hat{x} = 60.45$ is the modal value
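The grouped-mode formula applied in the two examples above can be sketched as follows. The function and parameter names are hypothetical; the inputs are the frequencies of the modal class and of its two neighbouring classes.

```python
def grouped_mode(lower_limit, modal_freq, prev_freq, next_freq, width):
    """Grouped mode: L_mo + (d1 / (d1 + d2)) * w, where
    d1 = modal frequency - frequency of the class before it,
    d2 = modal frequency - frequency of the class after it.
    """
    d1 = modal_freq - prev_freq
    d2 = modal_freq - next_freq
    return lower_limit + (d1 / (d1 + d2)) * width

# Income example: modal class 300-349.5 (72), neighbours 58 and 63.
income_mode = grouped_mode(300, 72, 58, 63, 50)

# Score example: modal class 55-64.99 (18), neighbours 12 and 13.
score_mode = grouped_mode(55, 18, 12, 13, 10)

print(round(income_mode, 2), round(score_mode, 2))
```

When the two neighbouring frequencies are equal, d1 equals d2 and the formula places the mode exactly at the middle of the modal class.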
Which method to use
Generally, when the distribution is symmetrical and contains only one mode, the values
of the mean, median and mode are the same. In this case we can use any one of the
measures. However, if the distribution is skewed, the median is the best measure.
The following guidelines can also be used to determine which measure to use:
• If the numerical data have no extreme values, the mean can be used.
• If the numerical data have extreme value(s), or if the data are non-numerical but
can be arranged in some order, then the median is the best measure.
• If the data are non-numerical and cannot be arranged in any order, the
mode is the best measure.
Attention! It is also possible to use Karl Pearson's empirical relationship to approximate
any one of the mean, the median and the mode from the other two (it holds
approximately for moderately skewed distributions):
Mode = Mean – 3 (Mean – Median)
Mode = 3 Median – 2 Mean
Median = Mode + 2/3 (Mean – Mode)
Example It has been reported by the Bureau for Trade and Investment of Mekelle town
that, out of a total of 200 certified investment projects, 51.5 percent have a capital of
20.5 million birr (so the modal investment is 20.5 million birr), and the average capital
investment of the 200 projects was 22.4 million birr. Find the median investment.
$$\tilde{x} = \hat{x} + \frac{2}{3}(\bar{x} - \hat{x})$$
$$\tilde{x} = 20.5\ \text{mil.} + \frac{2}{3}(22.4\ \text{mil.} - 20.5\ \text{mil.})$$
$$\tilde{x} = 20.5\ \text{mil.} + \frac{2}{3}(1{,}900{,}000)$$
$$\tilde{x} = 20{,}500{,}000 + 1{,}266{,}666.7 = 21{,}766{,}666.7\ \text{birr}$$
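Pearson's relationship is simple enough to check in a couple of lines. In this sketch the function name is hypothetical and the amounts are expressed in millions of birr; the result is an approximation, since the relationship itself is empirical.

```python
def median_from_mean_and_mode(mean, mode):
    """Pearson's empirical relation: median = mode + (2/3) * (mean - mode)."""
    return mode + (2 / 3) * (mean - mode)

mean_investment = 22.4   # millions of birr
modal_investment = 20.5  # millions of birr

median_investment = median_from_mean_and_mode(mean_investment, modal_investment)
print(round(median_investment, 4))

# Consistency check with the other form: mode = 3 * median - 2 * mean.
recovered_mode = 3 * median_investment - 2 * mean_investment
print(round(recovered_mode, 4))
```

The second print statement confirms that the three forms of the relationship are algebraically equivalent.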
Activity
The following table is age distribution of residents of kebele 20 in Mekelle town
Class Frequency
47 – 51.9 4
52 -56.9 9
57 – 61.9 13
62 – 66.9 42
67 – 71.9 39
72 – 76.9 20
77 - 81.9 9


Required
Compute a. The mean age of the residents
b. The median age of the residents
c. The modal age of the residents

3.2 Measures of Dispersion [Averages of Second Orders]


The measures of central location tell only part of what we need to know about
a distribution, i.e., the single central value that represents
the entire data set. These measures would be sufficient only if all the observations
had the same value. In practice, however, observations differ
from the center. Measures of dispersion therefore help us study
another important dimension in a statistical investigation: they
describe how the data are spread or, in other words, the degree of
variability of the distribution.
Importance of measuring variation (dispersion)
i. to decide on the reliability of the central value (average)
ii. to serve as the basis for controlling the variability
iii. to compare two or more distributions
Method of Measurement of Dispersion
I. The Range – this is the difference between the largest value and the
smallest value in the observations.
Consider the following examples;
ABC Company has recorded the following sales figures for the past ten months of its
operation:
Month         1    2    3     4    5     6     7      8    9      10
Sales (000)   20   19   27.2  32   17.5  26.3  41.75  40   31.15  36
Find the range of the distribution
Solution
Locate the largest and the smallest values of the sales records and find their difference.
In the above example 41.75 and 17.5 are the largest and the smallest values in the
distribution.
Then, R = 41.75 – 17.5 = 24.25; and the coefficient of range is given by
$$\frac{\text{range}}{\text{max. value} + \text{min. value}} = \frac{24.25}{59.25} = 0.4093 = 40.93\%$$
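A minimal sketch of the range and the coefficient of range for the ABC Company sales figures (variable names are illustrative):

```python
sales = [20, 19, 27.2, 32, 17.5, 26.3, 41.75, 40, 31.15, 36]  # in thousands

largest, smallest = max(sales), min(sales)
value_range = largest - smallest
coefficient_of_range = value_range / (largest + smallest)

print(value_range)
print(round(coefficient_of_range, 4))
```

Note that the denominator of the coefficient is the sum of the largest and smallest values, whereas the numerator is their difference.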
Example
The data below are the daily minimum calorie requirements of a person by age
Age   Calorie requirement
15    150-174.5
18    175-199.5
21    200-224.5
24    225-249.5
27    250-274.5
30    275-299.5
Compute the range of calorie requirements between the ages
Method I
Take the last class's upper class limit and the first class's lower class limit, 299.5 and 150
respectively.
R = 299.5 – 150 = 149.5 ≈ 150
Method II
Compute the midpoint of each class and then subtract the lowest midpoint from the
largest midpoint. The midpoint of the first class is
$$\frac{150 + 174.5}{2} = \frac{324.5}{2} = 162.25$$
and the midpoint of the last class is
$$\frac{275 + 299.5}{2} = \frac{574.5}{2} = 287.25$$
Then the range R = 287.25 – 162.25 = 125
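II. Mean Deviation (M.D.) - the mean deviation is a measure of the scatter of observations
around a central value, the mean or the median.
About the mean, it is given by:
$$M.D._{\bar{x}} = \frac{\sum |x_i - \bar{x}|}{n} \ \text{(ungrouped data)}, \quad \text{or} \quad M.D._{\bar{x}} = \frac{\sum f_i |x_i - \bar{x}|}{N} \ \text{(grouped data)}$$
About the median, it is given by:
$$M.D._{\tilde{x}} = \frac{\sum |x_i - \tilde{x}|}{n} \ \text{(ungrouped data)}, \quad \text{or} \quad M.D._{\tilde{x}} = \frac{\sum f_i |x_i - \tilde{x}|}{N} \ \text{(grouped data)}$$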
Example
The following are five student scores: 70, 50, 81, 67, 59.
Compute the mean deviation from the mean and from the median.
Solution
First compute the mean of the sample, i.e.
$$\bar{x} = \frac{\sum X_i}{n} = \frac{70 + 50 + 81 + 67 + 59}{5} = \frac{327}{5} = 65.4$$
Arranged in order, the scores are 50, 59, 67, 70, 81, so the median is $\tilde{x} = 67$.
The absolute deviations are then:
|X – x̄|                    |X – x̃|
|70 – 65.4| = 4.6          |70 – 67| = 3
|50 – 65.4| = 15.4         |50 – 67| = 17
|81 – 65.4| = 15.6         |81 – 67| = 14
|67 – 65.4| = 1.6          |67 – 67| = 0
|59 – 65.4| = 6.4          |59 – 67| = 8
Σ|X – x̄| = 43.6            Σ|X – x̃| = 42
$$M.D._{\bar{x}} = \frac{\sum |X - \bar{x}|}{n} = \frac{43.6}{5} = 8.72 \qquad M.D._{\tilde{x}} = \frac{\sum |X - \tilde{x}|}{n} = \frac{42}{5} = 8.4$$
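The two mean deviations for the five scores can be verified with a short sketch. It uses `statistics.mean` and `statistics.median` from the Python standard library; the helper name `mean_deviation` is illustrative.

```python
import statistics

def mean_deviation(values, center):
    """Average absolute deviation of the values from a chosen center."""
    return sum(abs(x - center) for x in values) / len(values)

scores = [70, 50, 81, 67, 59]

md_about_mean = mean_deviation(scores, statistics.mean(scores))
md_about_median = mean_deviation(scores, statistics.median(scores))

print(round(md_about_mean, 2), round(md_about_median, 2))
```

The deviation about the median is never larger than the deviation about the mean, which is a handy sanity check on hand calculations.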
Example
The sales records of ABC Trading show the values 9, 12, 14, 11, 8, 5.5, 15, 8.5, 9.5 and 10.75,
with respective frequencies 4, 2, 5, 3, 6, 6, 7, 4, 6, 4.
Find the mean deviation from the mean
Solution
Sales (xi)   Frequency (fi)   fixi
9            4                36
12           2                24
14           5                70
11           3                33
8            6                48
5.5          6                33
15           7                105
8.5          4                34
9.5          6                57
10.75        4                43
Total        47               483
The mean of the distribution is given by
$$\bar{x} = \frac{\sum f_i X_i}{\sum f_i} = \frac{483}{47} = 10.28$$
Then the absolute deviations are:
|Xi – x̄|                  fi    fi|Xi – x̄|
|9 – 10.28| = 1.28        4     5.12
|12 – 10.28| = 1.72       2     3.44
|14 – 10.28| = 3.72       5     18.60
|11 – 10.28| = 0.72       3     2.16
|8 – 10.28| = 2.28        6     13.68
|5.5 – 10.28| = 4.78      6     28.68
|15 – 10.28| = 4.72       7     33.04
|8.5 – 10.28| = 1.78      4     7.12
|9.5 – 10.28| = 0.78      6     4.68
|10.75 – 10.28| = 0.47    4     1.88
Total                     47    118.40
$$M.D._{\bar{x}} = \frac{\sum f_i |X_i - \bar{x}|}{N} = \frac{118.4}{47} = 2.52$$
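For data with frequencies, the same calculation weights each absolute deviation by its frequency. A minimal sketch for the ABC Trading figures (variable names are illustrative; the mean is kept unrounded, so the result may differ from a hand calculation in the third decimal place):

```python
sales = [9, 12, 14, 11, 8, 5.5, 15, 8.5, 9.5, 10.75]
frequencies = [4, 2, 5, 3, 6, 6, 7, 4, 6, 4]

n = sum(frequencies)
mean = sum(f * x for x, f in zip(sales, frequencies)) / n

# Frequency-weighted absolute deviations from the mean.
total_deviation = sum(f * abs(x - mean) for x, f in zip(sales, frequencies))
mean_deviation = total_deviation / n

print(n, round(mean, 2), round(mean_deviation, 2))
```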
Activity
Given below is the age distribution of 8 runners:
Age Frequency
15 15
45 10
29 30
40 10
48 9
32 12
42 11
65 4

Compute the mean deviation from the mean and median


Coefficient of Mean Deviation
The coefficient of mean deviation is the ratio of the mean deviation
to the average (mean or median) from which it is measured, and is given by:
$$C_{md} = \frac{M.D._{\bar{x}}}{\bar{x}} \quad \text{and} \quad C_{md,\tilde{x}} = \frac{M.D._{\tilde{x}}}{\tilde{x}}$$
Example Given a mean deviation (about the mean) of 2.52 and a mean
value of 10.28, the coefficient of mean deviation will be
$$C_{md} = \frac{M.D._{\bar{x}}}{\bar{x}} = \frac{2.52}{10.28} = 0.2451 = 24.51\%$$

Example
Weight of students (Xi)   Frequency (fi)   fiXi    |Xi – x̄|   fi|Xi – x̄|
100                       11               1100    18.21      200.31
150                       5                750     31.79      158.95
140                       6                840     21.79      130.74
120                       9                1080    1.79       16.11
105                       8                840     13.21      105.68
Total                     39               4610               611.79
Compute the mean, mean deviation, and the coefficient of mean
deviation
Solution
We first compute the mean of the distribution, i.e.
$$\bar{x} = \frac{\sum f_i X_i}{\sum f_i} = \frac{4610}{39} = 118.21$$
$$M.D._{\bar{x}} = \frac{\sum f_i |X_i - \bar{x}|}{N} = \frac{611.79}{39} = 15.69$$
$$C_{md} = \frac{M.D._{\bar{x}}}{\bar{x}} = \frac{15.69}{118.21} = 0.1327 = 13.27\%$$
Activity
The number of visitors to the Mekelle museum is given below for the
month ‘Hamle’ per day
Visitors frequency
15 6
20 4
22 12
24 5
10 3
Compute the mean, mean deviation, and the coefficient of mean
deviation.
Variance and Standard Deviation
These are other measures of dispersion which are often used in many
areas of interest, particularly in business. Variance
and standard deviation are powerful measures of dispersion which
take into account how all the observations in the data are
distributed and take into consideration each value of the data. If the
data are reasonably close to the center (to the mean), then we say
that there is little variability or dispersion in the data. On the other
hand, if the data are quite dispersed and at a considerable distance
from the center, then we say that the data are highly variable.
Their measures are given by:
(Ungrouped Data)
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{N \sum x_i^2 - \left(\sum x_i\right)^2}{N^2} \quad \text{(population variance)}$$
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)} \quad \text{(sample variance)}$$
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \quad \text{(sample standard deviation)}$$
$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} \quad \text{(population standard deviation)}$$
Where
σ² = population variance
s² = sample variance
N = total number of observations (population size)
n = total number of sample observations
μ = population mean
xi = data values or class midpoints
x̄ = sample mean
σ = population standard deviation
s = sample standard deviation
Example
Take the sample ages of 10 college students below. Find their standard
deviation and variance.
17, 17, 18, 19, 20, 20, 22, 22, 22, and 23
Solution
First compute the mean of the distribution, i.e.,
$$\bar{x} = \frac{\sum X_i}{n} = \frac{200}{10} = 20$$
Then the variance can be computed as follows:
Age (x)   (x – x̄)   (x – x̄)²
17        -3        9
17        -3        9
18        -2        4
19        -1        1
20        0         0
20        0         0
22        2         4
22        2         4
22        2         4
23        3         9
Σ(xi – x̄)² = 44
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{44}{10 - 1} = \frac{44}{9} = 4.89$$
The standard deviation of the distribution is the square root of its variance, i.e.,
$$s = \sqrt{s^2} = \sqrt{4.89} = 2.21$$
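The sample variance and standard deviation of the ten ages can be checked with the standard-library `statistics` module. In that module `variance` and `stdev` divide by n − 1 (sample formulas), while `pvariance` and `pstdev` divide by N (population formulas).

```python
import statistics

ages = [17, 17, 18, 19, 20, 20, 22, 22, 22, 23]

sample_variance = statistics.variance(ages)        # divides by n - 1
sample_sd = statistics.stdev(ages)
population_variance = statistics.pvariance(ages)   # divides by N

print(round(sample_variance, 2), round(sample_sd, 2))
print(round(population_variance, 2))
```

The population variance is always slightly smaller than the sample variance for the same data, because it divides the same sum of squares by N instead of n − 1.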


Example New Millennium College wants to hire graduates in the areas
of management, accounting, and economics. The ages of the first 10
applicants to be interviewed are 22, 21, 20, 25, 26, 24, 26, 24, 22, and
24. The college demands candidates whose ages are fairly grouped
around 23 years. Moreover, the college regards a standard
deviation of 2 years as acceptable. Does this group of candidates
qualify?
Solution
$$\bar{x} = \frac{\sum x_i}{n} = \frac{234}{10} = 23.4$$
Age of candidates (x)   (x – x̄)   (x – x̄)²
22                      -1.4      1.96
21                      -2.4      5.76
20                      -3.4      11.56
25                      1.6       2.56
26                      2.6       6.76
24                      0.6       0.36
26                      2.6       6.76
24                      0.6       0.36
22                      -1.4      1.96
24                      0.6       0.36
Σ(xi – x̄)² = 38.4
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{38.4}{10 - 1} = \frac{38.4}{9} = 4.2667$$
$$s = \sqrt{4.2667} = 2.0656$$
Since the computed standard deviation (2.0656) is greater than the desired
one (2), the group of candidates does not meet the college's requirement.
Example:-
The age of college students is given below
Age F
16 -17 4
17 -18 14
18 -19 18
19 - 20 28
20 -21 20
21 – 22 12
22 – 23 4

Compute the variance & the standard deviation of the distribution


Solution: Before you compute the values, you have to find the
class midpoints, i.e., the upper class limit plus the lower class limit, divided
by two. For grouped data, the variance and standard deviation are given by:
$$\sigma^2 = \frac{\sum f_i (x_i - \mu)^2}{\sum f_i}, \quad \text{and} \quad s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1}$$
$$\sigma = \sqrt{\frac{\sum f_i (x_i - \mu)^2}{\sum f_i}}, \quad \text{and} \quad s = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{n - 1}}$$
The mean of the distribution is
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{1948}{100} = 19.48$$
Then,
Age       Midpoint (xi)   fi    fixi    (xi – x̄)   (xi – x̄)²   fi(xi – x̄)²
16 - 17   16.5            4     66      -2.98      8.8804      35.52
17 - 18   17.5            14    245     -1.98      3.9204      54.89
18 - 19   18.5            18    333     -0.98      0.9604      17.29
19 - 20   19.5            28    546     0.02       0.0004      0.01
20 - 21   20.5            20    410     1.02       1.0404      20.81
21 - 22   21.5            12    258     2.02       4.0804      48.96
22 - 23   22.5            4     90      3.02       9.1204      36.48
Total                     100   1,948                          213.96
The variance of the distribution is
$$s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1} = \frac{213.96}{100 - 1} = \frac{213.96}{99} = 2.16$$
Standard deviation
$$s = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{n - 1}} = \sqrt{2.16} = 1.47$$
Coefficient of variation (CV)
$$C.V. = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100\%$$
From the above example, compute the coefficient of variation
$$C.V. = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100 = \frac{1.47}{19.48} \times 100 = 7.55\%$$
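The grouped calculation above can be reproduced with a short sketch. It assumes, as the solution does, that each class is represented by its midpoint and that the sample formula (division by n − 1) is used; the variable names are illustrative.

```python
import math

midpoints = [16.5, 17.5, 18.5, 19.5, 20.5, 21.5, 22.5]
frequencies = [4, 14, 18, 28, 20, 12, 4]

n = sum(frequencies)
mean = sum(f * x for x, f in zip(midpoints, frequencies)) / n

# Frequency-weighted sum of squared deviations from the mean.
sum_sq = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies))

variance = sum_sq / (n - 1)
sd = math.sqrt(variance)
cv = sd / mean * 100            # coefficient of variation, in percent

print(round(mean, 2), round(sum_sq, 2))
print(round(variance, 2), round(sd, 2), round(cv, 2))
```

Recomputing the weighted sum of squares in this way is a quick guard against an arithmetic slip in any single row of the table.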
Example
Ayele and Bogale are two ground tennis players. Both have beaten
their opponents. The following are their scores when they faced
each other in 10 tournaments:
Ayele    42  17  83  59  72  76   64  45  40  32
Bogale   28  70  31  0   59  108  82  14  3   95
Find who is the better player and who is more consistent
Solution
$$\bar{x}_A = \frac{530}{10} = 53 \qquad \bar{x}_B = \frac{490}{10} = 49$$
Ayele                        Bogale
(xi – x̄A)   (xi – x̄A)²       (xi – x̄B)   (xi – x̄B)²
-11         121              -21         441
-36         1296             21          441
30          900              -18         324
6           36               -49         2401
19          361              10          100
23          529              59          3481
11          121              33          1089
-8          64               -35         1225
-13         169              -46         2116
-21         441              46          2116
Total       4038             Total       13,734
$$s_A = \sqrt{\frac{\sum (x_i - \bar{x}_A)^2}{n - 1}} = \sqrt{\frac{4038}{10 - 1}} = \sqrt{448.67} = 21.18$$
$$C.V._A = \frac{s_A}{\bar{x}_A} \times 100 = \frac{21.18}{53} \times 100 = 39.96\%$$
$$s_B = \sqrt{\frac{\sum (x_i - \bar{x}_B)^2}{n - 1}} = \sqrt{\frac{13{,}734}{10 - 1}} = \sqrt{1526} = 39.06$$
$$C.V._B = \frac{s_B}{\bar{x}_B} \times 100 = \frac{39.06}{49} \times 100 = 79.71\%$$
Therefore, from the above computations, Ayele is the better player (higher mean score)
and the more consistent one (lower coefficient of variation).
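The comparison of the two players can be sketched with the standard-library `statistics` module. The helper name `coefficient_of_variation` is illustrative, and `statistics.stdev` uses the sample formula (division by n − 1), matching the calculation above; small differences in the second decimal place come from rounding the standard deviation by hand.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

ayele = [42, 17, 83, 59, 72, 76, 64, 45, 40, 32]
bogale = [28, 70, 31, 0, 59, 108, 82, 14, 3, 95]

cv_ayele = coefficient_of_variation(ayele)
cv_bogale = coefficient_of_variation(bogale)

print(statistics.mean(ayele), round(cv_ayele, 2))
print(statistics.mean(bogale), round(cv_bogale, 2))
```

The player with the higher mean is the better scorer, and the player with the lower coefficient of variation is the more consistent one.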
Activity
Ato Pawlos has tested samples of polythene bags from two
manufacturers, A and B, for bursting pressure and obtained the following results:
Bursting pressure   Number of bags (A)   Number of bags (B)
5.0-9.9 2 9
10.0-14.9 9 11
15.0-19.9 29 18
20.0-24.9 54 32
25.0-29.9 11 27
30.0-34.9 5 13


Required:
1. Which set of bags has the highest average bursting pressure?
2. Which bag has more uniform pressure?
3. If prices are the same, which manufacturer’s bags would be
preferred and why?
Self test questions
1. The Addis Ababa City Municipality Police Traffic Control
Department has observed the number of car accidents (per
month) to be categorized as shown in the table below:

Number of accidents Frequency


10-14 5
15-19 6
20-24 3
25-29 4
30-34 2
The department would like to know:
a. The average number of accidents
b. The median and the mode
c. The variance, standard deviation, and coefficient of
variation

2. A university professor has made an attempt to compare
the performances of her female students by taking a
record of seven years' average scores in Managerial
statistics and Marketing management. Here are the scores
by year:
1994 1995 1996 1997 1998 1999 2000

Managerial statistics 20.1 20.5 20.4 20.5 20.4 22.5 22.3

Marketing 19.2 19.3 19.2 19.2 19.1 21.9 22.0

a. Compute the variance and standard deviation


b. Compute the Coefficient of variation
c. What course are the female students good at (which
performance is least variable)?

3. The New Millennium Institute Share Company is
contemplating two new mutually exclusive investment
projects: Agro processing and opening a high school in
Humera town. An early research publication by the
investment bureau of the woreda shows that the average rates
of return on investment for the two projects are 11.9% (σ = 2.33) and
16.27% (σ = 3.45) respectively.
a. Compute the respective C.V
b. Which project is worthier or less risky?

CHAPTER FOUR
PROBABILITY
Introduction
In chapter one, we discussed the difference between descriptive and
inferential statistics. Much statistical analysis is inferential, and
probability is the cornerstone of inferential statistics. Recall that
inferential statistics involves taking a sample from a population,
computing a sample value (a statistic) from the sample, and inferring
from the statistic the value of the corresponding population value
(parameter). The reason for doing so is that it is difficult, and
sometimes impossible, to get the population parameter directly.
Chapter objectives
By the end of this chapter, students should understand the basic
principles of probability and be able to:
 Know the meaning of probability and the basic terms in
probability.
 Know important laws of probability: the law of addition,
the law of multiplication, and the law of conditional
probability.
 Select the appropriate law of probability to solve a given
problem.
Before directly dealing with probability, it would be of much help to
see some counting principles.
4.1 Counting principles

A) Principle of multiplication
This principle shows that if an action can be completed in k steps of
which the 1st step can be done in n1 ways, the 2nd in n2 ways, the 3rd
in n3 ways,…, and the kth step in nk ways, then the whole action can
be completed in ( n1)(n2)( n3)….( nk) ways.
Example
There are 4 routes from a given dormitory to a cafeteria.
There are also 3 routes from the cafeteria to a class. In how
many ways can a student go from the given dormitory to the given
class through the given cafeteria?
Solution: -The action to be completed is “Going to a class from a
dormitory through a given cafeteria.”
-The steps are 1st.Going to cafeteria
2nd. Going to class
Again, there are 4 ways of doing the 1st step and 3 ways of doing the
2nd step.
Therefore, the whole action (going to class) can be done in 4×3=12
ways.
Example
A library has 5 entrances and 6 exits. There are also 4 ways that
take from the library to a class. In how many ways can a student
take a book from the library and go to the class?
Solution: -The action to be completed is “Taking a book from the
library and then going to class.”
There are 3 steps of doing this action:
Step1. Entering the library
Step2. Leaving the library
Step3. Going to class
Number of ways of doing each step:
Step1 can be done in 5 ways
Step2 can be done in 6 ways
Step3 can be done in 4 ways
Therefore, the whole action can be done in 5 × 6× 4=120 ways.
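The principle of multiplication used in the two examples above can be sketched in Python. The function name `count_ways` and the numeric route labels are illustrative assumptions, not part of the course text. The sketch multiplies the step counts and, for the dormitory example, cross-checks the product by listing every possible route.

```python
from itertools import product
from math import prod

def count_ways(step_counts):
    """Ways to complete an action whose steps can be done in n1, n2, ..., nk ways."""
    return prod(step_counts)

# Dormitory -> cafeteria (4 routes), cafeteria -> class (3 routes)
dorm_to_class = count_ways([4, 3])

# Library: 5 entrances, 6 exits, 4 routes from the library to the class
library_to_class = count_ways([5, 6, 4])

# Cross-check the first example by enumerating every (route 1, route 2) pair.
# The labels 1..4 and 1..3 are hypothetical names for the routes.
all_routes = list(product(range(1, 5), range(1, 4)))

print(dorm_to_class, library_to_class, len(all_routes))
```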

B) Permutation.

Permutation refers to an arrangement of objects with attention given
to the order of arrangement. That is, the objects are arranged in a
definite order.
Example: In how many ways can 3 books named A, B, and C be
arranged (permuted) in a shelf?
Solution: As there are 3 books, there are 3 places to be filled:
3 2 1

The 1st place can be filled in 3 ways. That is, any one of the 3 books
can be put in the 1st place.
The 2nd place can be filled in only 2 ways, because 1 book has already
been put in the 1st place.
The 3rd place can be filled in only 1 way, because 2 of the 3 books
have already been put in the 1st and 2nd places. Therefore, by the
principle of multiplication, the books can be arranged in 3× 2× 1=6
ways.
That is the books can be put in any one of the following ways:
ABC BAC CAB, or
ACB BCA CBA

Example 2: In how many ways can 4 students named A, B, C, and D be
seated on a bench?
Solution: There are 4 places to be filled:
4 3 2 1

The 1st place can be filled in 4 ways, the 2nd in 3 ways, the 3rd in 2
ways, and the 4th in only 1 way. Therefore, the students can be
seated in 4× 3× 2× 1=24 ways. At this point, we can introduce a
notation, the Factorial notation.
The symbol n! is read as” n factorial” and is given by

n! = n (n-1) (n-2) (n-3)…2 ×1, where n is a positive whole number.


Example: 1! =1
2! =2× 1=2
3! =3× 2× 1=6
4! =4 ×3× 2× 1=24
5! =5× 4× 3× 2 ×1=120
N.B: 0! =1 by definition.
Therefore, by the factorial notation, we can arrange n objects in a
row in n! ways.
Example: 4 books can be arranged in a shelf in 4! =4× 3× 2× 1=24
ways

5 students can be seated on a bench in 5! =5× 4× 3× 2× 1
=120 ways.
N.B: n objects can be arranged in a circular place in (n-1)! ways.
Example: 6 persons can be seated around a circular table in
(6-1)! =5! =5× 4× 3× 2 × 1=120 ways.
So far, we have seen the arrangement or permutation of n objects
taken all at a time. Now we will see how n objects can be arranged
taken r at a time.
The number of arrangements of n objects taken r at a time is given
by nPr. nPr is read as “n permutation r” and is given by:
nPr = n! ÷ (n-r)!, where r ≤ n.
If r = n, nPr = nPn = n! ÷ (n-n)! = n! ÷ 0! = n! ÷ 1 = n!
Example
In how many ways can 3 books named A, B, and C be arranged in a
shelf taken 2 at a time?
Solution: n = 3
r=2
Therefore, the books can be arranged in nPr = 3P2 = 3! ÷ (3-2)! =
3! ÷ 1! = 3× 2× 1 = 6 ways. These 6 possible arrangements
are AB, AC, BA, BC, CA, and CB.
Example
How many different flags of 3 colors can be made from 6 different
colors?
Solution: n = 6
r=3
Therefore, nPr = 6P3 = 6! ÷ (6-3)! =6! ÷ 3! =6× 5× 4× 3! ÷ 3! = 6× 5× 4
=120 different flags of 3 colors can be made.

Example
From a committee of 30 members, in how many ways can a
president, a vice president and a secretary be selected?
Solution: Note that order is important here.
n = 30
r=3

Therefore, there are nPr = 30P3=30! ÷ (30-3)! =30! ÷ 27!
=30× 29× 28× 27! ÷ 27! = 30× 29× 28 = 24360 ways of
selection.
N.B: If n = n1+n2+n3+…+nk, where ni is the number of objects in the
i-th group of identical objects, then there are
n! ÷ (n1! × n2! × n3! ×…× nk!) ways of arranging these objects.
Example
In how many ways can the letters in the word STATISTICS be
arranged?
Solution: n = 10
n1 = 3 (there are 3 S’s)
n2 = 3 (there are 3 T’s)
n3 =1(there is 1 A)
n4 = 2 (there are 2 I’s)
n5 = 1 (there is 1 C)
Therefore, there are 10! ÷ (3! × 3! × 1! × 2! × 1!) = 50400 arrangements.
C) Combination: This is a third type of counting principle. It refers to
the selection or grouping of objects without regard to order. Order
does not matter here.
The number of combinations or groups of n objects taken r at a time
is given by
nCr = n! ÷ [(n-r)! × r!], where r ≤ n. nCr is read as “n combination r”.
If r = n, then nCr = nCn = n! ÷ [(n-n)! × n!] = n! ÷ (0! × n!) = n! ÷ (1 × n!)
= n! ÷ n! = 1. Note that nPr = nCr × r!
Example
In how many ways can 3 books named A, B, and C be combined or
grouped taken 2 at a time?
Solution: n = 3
r=2
Therefore, there are nCr = 3C2 = 3! ÷ [(3-2)! × 2!] = 3! ÷ (1! × 2!) =
(3 × 2!) ÷ (1 × 2!) = 3 ways of grouping the 3 books taken 2 at a time.
These 3 groups are AB, AC, and BC.

N.B: AB and BA are the same combinations (groups) but two
different permutations (arrangements).
Example
In how many ways can a committee of 5 persons be selected from 12
persons?
Solution: n = 12, r = 5
Therefore, there are nCr = 12C5 = 12! ÷ [(12-5)! × 5!] = 12! ÷ (7! × 5!)
= (12× 11× 10× 9× 8× 7!) ÷ (7! × 5!) = (12× 11× 10× 9× 8) ÷ 5!
= (12× 11× 10× 9× 8) ÷ (5× 4× 3× 2× 1) = 792 ways.
Example
Out of 4 women and 2 men, in how many ways can a committee of 3
persons be formed?
a) Consisting of 1 man and 2 women?
b) Without restriction?
Solution:
a) 1 man from the 2 men can be selected in 2C1 =2 ways.
2 women from the given 4 women can be selected in 4C2 = 6 ways.
Therefore, by the principle of multiplication, the committee can be
formed in 2× 6 =12 ways.
b) Without any restriction means selecting 3 out of 6 persons
regardless of the number of men or women. Therefore, there are 6C3
= 20 ways of forming the committee.
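The permutation and combination results in this section can be reproduced with Python's standard `math` module (`math.perm` and `math.comb` are available from Python 3.8). The helper `arrangements_with_repeats` is a hypothetical name introduced here for the n! ÷ (n1! × n2! × … × nk!) rule; it is only a sketch of that formula.

```python
from math import comb, factorial, perm

def arrangements_with_repeats(word):
    """n! / (n1! * n2! * ... * nk!) for a word with repeated letters."""
    total = factorial(len(word))
    for letter in set(word):
        # Divide by the factorial of how often each distinct letter occurs.
        total //= factorial(word.count(letter))
    return total

books_2_at_a_time = perm(3, 2)       # arrangements of 3 books taken 2 at a time
flags = perm(6, 3)                   # 3-colour flags from 6 colours
officers = perm(30, 3)               # president, vice president, secretary
statistics_word = arrangements_with_repeats("STATISTICS")
circular_table = factorial(6 - 1)    # 6 persons around a circular table

groups_of_books = comb(3, 2)         # groups of 3 books taken 2 at a time
committee_of_5 = comb(12, 5)         # 5 persons out of 12
one_man_two_women = comb(2, 1) * comb(4, 2)
no_restriction = comb(6, 3)

print(books_2_at_a_time, flags, officers, statistics_word, circular_table)
print(groups_of_books, committee_of_5, one_man_two_women, no_restriction)
```

The relation nPr = nCr × r! can be checked the same way, for example `perm(12, 5) == comb(12, 5) * factorial(5)`.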
4.2 Probability and basic terms in probability
Definition: Probability is the chance or likelihood of the occurrence of
an event or a situation. As we are living in a world of uncertainty,
the knowledge of probability is of much help in the world of business.
Basic terms in probability: The following are the most commonly
used terms in probability.
1. Experiment: - is any process of observation whose outcome or
result cannot be known in advance. An example of an experiment
could be “Tossing a coin”. If a person tosses/throws a coin
upwards, he can’t know in advance whether it lands head up or
tail up.

2. Experimental outcome: - is a possible outcome/ result of an
experiment.
E.g. in the experiment of tossing a coin, the possible outcomes
are head (H) or tail (T).
3. Sample space(S):- is the set of all possible outcomes of an
experiment.
E.g. the sample space of tossing a coin is, S= {H, T}
4. Event: - is a subset of a given sample space.
E.g. the possible events of the above sample space are,
E1 = {H}, E2= {T}, E3= {H, T}
Probability is associated with an event. When all outcomes are
equally likely, the probability of an event is found by dividing the
number of outcomes the event contains by the number of outcomes in
the sample space (total number of experimental outcomes).
That is, P(E) = n(E) ÷ n(S), where P(E) = probability of event E;
n(E) = no. of elements in event E; n(S) = no. of elements in the
sample space
E.g. Given S= {1, 2, 3, 4, 5, 6}
E1= {1, 2, 3}, and E2 = {4, 5}
So, P (E1) = n (E1) ÷ n(S) =3 ÷6 =0.5
P (E2) = n (E2) ÷ n(S) =2 ÷6 = 0.33
N.B: The probability of an event E ranges from 0 to 1. That is, 0≤ P
(E) ≤ 1.The probability of an event can never be less than 0 or greater
than 1.
4.3 Types of events
Based on the relationship they have, events can be classified as
follows.
a) Mutually exclusive events
Two events A and B are said to be mutually exclusive if they have no
element in common. That is n (A∩B) = 0, and hence P (A∩B) =0,
because P (A∩B) = n (A∩B) ÷ n(S) =0 ÷ n(S) =0
Example: Consider an experiment of rolling/throwing a die (a die is
a cube whose 6 equal faces are numbered 1 to 6).
Here the sample space is:
S= {1, 2, 3, 4, 5, 6}, the set of all possible outcomes.
Let event A= {1, 3, 5} and event B= {2, 4, 6}
Events A and B are mutually exclusive as they have no common
element, i.e., A∩B=Ø. Therefore, P (A∩B) = n (A∩B) ÷ n(S) =0 ÷ 6=0.
b) Complementary events
Two events A and B are said to be complementary to each other if
event A occurs if and only if event B doesn’t occur. The complement
of event A is denoted by A’ which is read as A complement or not A.
E.g. Let S= {1, 2, 3, 4, 5, 6}
A= {1, 4, 6} Therefore, the complement of A will be the event
that contains the elements of S which are not in A.
That is A’ = {2, 3, 5}
N.B: P (A) +P (A’) =1 always, because A and A’ are collectively
exhaustive, that is, each element of the sample space S is
found in either A or A’; no element is left out.
c) Independent events
Two events A and B are said to be independent of each other if the
occurrence or non occurrence of one doesn’t affect the occurrence or
non occurrence of the other.
E.g. consider the experiment of tossing a coin twice and let
Event A = getting a head in the 1st toss and
Event B = getting a tail in the 2nd toss
So, event A and event B are independent, because whether a head or
a tail is found in the 1st toss doesn’t affect the outcome of the 2nd toss.
4.4 Laws of probability
A. Law of addition
Let A and B be two events. The probability that at least one, i.e.,
either A or B or both A and B will occur is given by the law of
addition as follows:
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

Note that if A and B are mutually exclusive, P (A∩B) = 0 and
therefore,
$P(A \cup B) = P(A) + P(B) - 0 = P(A) + P(B)$

E.g. consider the experiment of rolling a die. What is the probability
of getting a 2 or a 4?
Solution: Sample space, S = {1, 2, 3, 4, 5, 6}
E1= {2}
E2 = {4}
Therefore, P (E1) = n (E1) ÷ n(S) = 1÷ 6 and P (E2) = n (E2) ÷ n(S) = 1÷
6. So, the probability of getting a 2 or a 4 is given by:
$P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2)$
= 1÷ 6 + 1÷ 6 – 0, because P (E1∩E2) = 0
= 2÷ 6 = 1÷ 3 = 0.33
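The law of addition can be illustrated by treating events as Python sets. This is a minimal sketch that assumes a fair die, so every outcome is equally likely; the helper name `prob` is an illustrative choice. Exact fractions are used so that the result can be compared with 1 ÷ 3 directly.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space for one roll of a fair die

def prob(event, space=S):
    """P(E) = n(E) / n(S), valid when all outcomes are equally likely."""
    return Fraction(len(event & space), len(space))

E1 = {2}
E2 = {4}

# Law of addition: P(E1 U E2) = P(E1) + P(E2) - P(E1 n E2)
p_union = prob(E1) + prob(E2) - prob(E1 & E2)

# Direct count on the union, as a cross-check.
p_union_direct = prob(E1 | E2)

print(p_union, p_union_direct)
```

For events that overlap, such as {1, 2, 3} and {2, 4, 6}, the subtraction of P(A∩B) is what prevents the shared outcome from being counted twice.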
B. Law of conditional probability
Given two dependent events A and B, the probability that A will
occur given that B has already occurred is given by:
$P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{n(A \cap B)}{n(S)} \div \frac{n(B)}{n(S)} = \frac{n(A \cap B)}{n(B)}$, where n (B) ≠ 0
And
$P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{n(A \cap B)}{n(A)}$, where n (A) ≠ 0

N.B: P (A/B) is read as “Probability of A given B”.


E.g. consider the experiment of rolling a die. What is the probability
of getting an even number if it is known that a number less than 5 is
found?
Solution: Let event A be an event that contains a number less than 5,
and B be an event that contains an even number:
S = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3, 4}
B = {2, 4, 6}
So, $P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{n(A \cap B)}{n(S)} \div \frac{n(A)}{n(S)} = \frac{2}{6} \div \frac{4}{6} = \frac{2}{6} \times \frac{6}{4} = \frac{2}{4} = 0.5$
C. Law of multiplication
Let A and B be two events. The probability that both A and B will
occur simultaneously is given by the multiplication law as follows:
$P(A \cap B) = P(A) \times P(B/A) = P(B) \times P(A/B)$

If A and B are independent, P (A/B) = P (A), and P (B/A) = P (B).
Therefore,
P (A∩ B) = P (A) × P (B) for two independent events. Similarly, for n
independent events, the probability that all n events will occur at the
same time is given by:
$P(E_1 \cap E_2 \cap E_3 \cap \dots \cap E_n) = P(E_1) \times P(E_2) \times P(E_3) \times \dots \times P(E_n)$

E.g. consider the experiment of tossing a coin twice. What is the
probability of getting a head in the 1st toss and a tail in the 2nd toss?
Solution: Let event A be the event of getting head in the 1st toss and
event B the event of getting a tail in the 2nd toss.
So, A= {H}
B= {T}
The probability of getting a head in the 1st toss is ½ and the
probability of getting a tail in the 2nd toss is also ½. It is also clear
that events A and B are independent, because whether a head or a
tail is obtained in the 1st toss doesn’t affect the outcome of the 2nd
toss.
So, P (A/B) = P (A) = ½ and P (B/A) =P (B) = ½
Therefore, P (A∩ B) = P (A) × P (B) = ½× ½ = ¼
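The laws of conditional probability and multiplication can be checked by enumerating sample spaces in Python. This is a sketch under the assumption of a fair die and a fair coin; the helper names `prob` and `prob_given` are illustrative. The die part reuses the events A = {1, 2, 3, 4} and B = {2, 4, 6} from the conditional probability example, and the coin part lists all four outcomes of two tosses.

```python
from fractions import Fraction
from itertools import product

def prob(event, space):
    """P(E) = n(E) / n(S) for equally likely outcomes."""
    return Fraction(len(event & space), len(space))

def prob_given(event, given, space):
    """P(event / given) = n(event n given) / n(given)."""
    return Fraction(len(event & given & space), len(given & space))

# Die example: A = number less than 5, B = even number
die = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3, 4}
B = {2, 4, 6}
p_b_given_a = prob_given(B, A, die)
joint_die = prob(A, die) * p_b_given_a      # P(A n B) = P(A) x P(B/A)

# Coin example: sample space of two tosses, e.g. ('H', 'T')
tosses = set(product("HT", repeat=2))
head_first = {t for t in tosses if t[0] == "H"}
tail_second = {t for t in tosses if t[1] == "T"}
joint_coin = prob(head_first & tail_second, tosses)

print(p_b_given_a, joint_die, joint_coin)
```

For the coin, the joint probability equals P(A) × P(B), which is exactly the independence condition stated above.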
4.5 Binomial distribution
The Binomial distribution is a widely known discrete probability
distribution which is constructed by determining the probabilities of
X successes in n trials. A success actually means an outcome of
interest; it doesn’t mean something favorable. For instance, consider
the production of a particular item. If a researcher is interested in
determining how many defective items are produced in every
production of 10 items, producing a defective item is taken as a
success in his case.
Note that a discrete distribution is a distribution in which discrete
variables (variables that can’t assume decimal values, e.g. the
number of students in a class) are involved.

Assumptions underlying the binomial distribution
1. The experiment under consideration involves n identical
trials.
2. Each trial has only two mutually exclusive outcomes, success
or failure.
3. Each trial is independent of the previous trial, i.e. the outcome
of one trial doesn’t affect the outcome of another trial.
4. p is the probability of getting a success on any one trial and it
is known before the experiment is conducted.
5. q is the probability of getting a failure on any one trial and q =
1-p, success and failure are two complementary events.
6. p and q remain constant throughout the experiment.
Generally, the probability of getting X successes out of n trials is
given by the formula,
$P(X) = nCx \cdot p^{x} \cdot q^{(n-x)}$
Example
If a coin is tossed 4 times, what is the probability of getting 2 heads?
Solution: Here, since we are interested in getting head, our success
will be the number of heads to be obtained.
So, success = head
n=4
X =2 and p = ½ which is the probability of getting a head
on any one toss.
Therefore, $P(X=2) = nCx \cdot p^{x} \cdot q^{(n-x)}$
$= 4C2 \cdot (1/2)^{2} \cdot (1 - 1/2)^{(4-2)}$
$= \frac{4!}{(4-2)! \, 2!} \times \frac{1}{4} \times \frac{1}{4} = 6 \times \frac{1}{16} = \frac{3}{8}$
Example
10% of the students in a class are left-handed. If 8 students are
sampled, what is the probability of getting 3 left-handed students?
Solution: success = left-handedness
n=8
X=3

p = 10% = 0.1 => q = 1-0.1=0.9
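Therefore, $P(X=3) = nCx \cdot p^{x} \cdot q^{(n-x)}$
$= 8C3 \cdot (0.1)^{3} \cdot (0.9)^{(8-3)}$
$= \frac{8!}{(8-3)! \, 3!} \times (0.1)^{3} \times (0.9)^{5}$
$= \frac{8 \times 7 \times 6 \times 5!}{5! \times 3!} \times 0.001 \times 0.59$
$= 8 \times 7 \times 0.001 \times 0.59 = 0.033$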
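The binomial formula can be written as a small Python function. This is a sketch that assumes the binomial assumptions listed above hold; the name `binomial_pmf` is an illustrative choice. It reproduces the coin and left-handed student examples.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * q^(n - x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

two_heads_in_four_tosses = binomial_pmf(2, 4, 0.5)
three_left_handed_in_eight = binomial_pmf(3, 8, 0.1)

print(two_heads_in_four_tosses)               # 3/8 as a decimal
print(round(three_left_handed_in_eight, 3))   # about 0.033

# The probabilities of all possible numbers of successes add up to 1.
total = sum(binomial_pmf(x, 8, 0.1) for x in range(9))
print(round(total, 10))
```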
4.6: Normal distribution
The concept of the normal distribution was briefly introduced in
chapter 3. It is the most widely known and used continuous
probability distribution (a continuous distribution is a distribution in
which variables can assume decimal values, e.g. students’ GPA). It is
characterized by µ, the population mean, and σ, the population
standard deviation.
Characteristics of the normal distribution
1. It is a continuous distribution.
2. It is a symmetrical distribution (it has equal (balanced) right
and left halves).
3. It is a unimodal distribution (it has only one mode).
4. It is asymptotic to the x- axis (it approaches but never touches
the x-axis).
5. It is a family of curves (each unique pair of values of µ and σ
determines a unique curve).
6. Total area under the normal curve is equal to 1 (0.5 on each
half).
N.B: the normal curve is the graphical representation of the normal
distribution.

[Figure: The normal curve, centered at µ]

Standardized normal distribution (z-distribution)
In the case of the normal distribution, we need to draw a unique
normal curve for every pair of µ and σ. Doing so is a very tedious and
time-consuming task. To alleviate this problem, a mechanism has
been developed by which all normal distributions can be converted
into a single distribution: the standardized distribution
(z-distribution).
The z-distribution is a normal distribution with µ = 0 and σ = 1. It is
obtained by converting x values to z values by the formula
$Z = \frac{x - \mu}{\sigma}$
If x < µ, z will be negative (z < 0).
If x > µ, z will be positive (z > 0).

[Figure: The normal curve (horizontal axis x, centered at µ) and the standardized (z) curve (horizontal axis z, centered at 0)]
The z- distribution is used to determine the percentage of
observations which are greater than or less than a given value. It is
also used to determine the probability that an observation is found in
a given interval.
The probability (area) under the standard normal curve is found
from a standard table (z- table) which is given in different statistics
books as an appendix. This standard table gives the probability that
a given value is between the zero (0) and z.
E.g. Consider a normal distribution with µ = 20 and σ = 2.
a) Find the probability that a measurement will be in the interval
from 20 to 23.
Solution: µ = 20
σ=2

[Figure: The normal curve, with the area A between 20 and 23 shaded]

The question is just finding the shaded area A shown in the above
figure. To find A, first convert the normal distribution to the z
distribution as follows.
$Z = \frac{x - \mu}{\sigma} = \frac{23 - 20}{2} = \frac{3}{2} = 1.5$

[Figure: The standardized (z) curve, with the area A between 0 and 1.5 shaded]

Now, one can read the value of A from the
standard table.
The standard table is given in the appendix (see appendix).
In our example, the value of z is 1.5. Therefore, we look for the value
1.5 in the column under z and then look for the value 0.00 in the row
containing z (at the top), because the 2nd decimal place of the z value
is 0 (1.5 = 1.50). Then, take the value in the body of the table that is
found at the intersection of 1.5 and 0.00 (see the sample table).
This value is 0.4332.
Therefore, our answer is A= 0.4332, which is the probability that a
measurement will be in the interval from 20 to 23.
N.B: If our z value were 1.23 the area would be 0.3907 (see the
sample table).
b) Find the probability that a measurement will be in the interval
from 16 to 18.
Solution:
X1 = 16, X2 = 18, µ = 20

[Figure: The normal curve, with the area A between 16 and 18 shaded]

We are asked to find A. So, let’s change the normal curve to the
z-curve as follows.
$Z_1 = \frac{x_1 - \mu}{\sigma} = \frac{16 - 20}{2} = \frac{-4}{2} = -2$ and $Z_2 = \frac{x_2 - \mu}{\sigma} = \frac{18 - 20}{2} = \frac{-2}{2} = -1$

[Figure: The standardized (z) curve, with the area A between -2 and -1 shaded]

A = (Area b/n -2 and 0)-(Area b/n -1 and 0)
=0.4772-0.3413
=0.1359
So, the probability that a measurement will be b/n 16 and 18 is
0.1359.
N.B :( Area b/n –a and 0) is the same as (Area b/n 0 and a)
E.g. (Area b/n -2 and 0) = (Area b/n 0 and 2) =0.4772
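c) Find the probability that a measurement will be less than 16.
Solution:

[Figure: The normal curve, with the area A to the left of 16 shaded]

We are asked to find the area A. So,
$z = \frac{x - \mu}{\sigma} = \frac{16 - 20}{2} = \frac{-4}{2} = -2$

[Figure: The standardized (z) curve, with the area A to the left of -2 shaded]

So, A= (Area to the left of 0)-(Area b/n -2 and 0)
= 0.5-0.4772
= 0.0228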
Therefore, the probability that a measurement will be less than 16 is
0.0228.
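The three areas found from the z-table can be cross-checked in Python with `statistics.NormalDist` (available from Python 3.8). This is a sketch; the helper `prob_between` is a hypothetical name, and the computed values agree with the table values only up to the table's four-decimal rounding.

```python
from statistics import NormalDist

dist = NormalDist(mu=20, sigma=2)

def prob_between(d, low, high):
    """Area under the normal curve of d between low and high."""
    return d.cdf(high) - d.cdf(low)

z = (23 - dist.mean) / dist.stdev      # standardized value of x = 23
p_a = prob_between(dist, 20, 23)       # a) between 20 and 23
p_b = prob_between(dist, 16, 18)       # b) between 16 and 18
p_c = dist.cdf(16)                     # c) less than 16

print(z, round(p_a, 4), round(p_b, 4), round(p_c, 4))
```

The same object also works in the other direction: `dist.inv_cdf(p)` returns the value below which a proportion p of the observations lies.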

Self test questions
1. There are 5 Cooperative, 3 Management, and 2 Accounting
students. In how many ways can these students be arranged
if:
a) Students from the same department must sit together?
b) No restriction?
2. There are 6 statisticians and 5 economists. It is needed to form
a committee of 2 statisticians and 3 economists. If 3 of the
statisticians and 1 of the economists cannot be included in the
committee for some reason, in how many ways can the
committee be formed?
3. In how many ways can the students in Q1 above be arranged if
only Coop students must sit together?
4. In how many ways can the letters in the word MISSISSIPPI be
arranged?
5. Given two events A and B with P (A) =0.3 and P (A∪B) =0.5.
Find i) P (B) ii) P (A∩ B) iii) P (A/B) if
a) A & B are independent
b) A & B are mutually exclusive.
6. Literature indicates that 10% of the people in a town are
unemployed. If a random sample of 5 people is selected,
what is the probability that:
a) All 5 people will be unemployed?
b) All 5 people will be employed?
c) At least 4 people will be employed?
d) At most 3 people will be unemployed?
7. Suppose a normally distributed data set has µ = 200
and σ = 47. Determine the value of X from the following
information:
a) 60% of the values are greater than X.
b) X is less than 17% of the values.

8. The income of a population of 10,000 persons was found to be
normally distributed with µ = 1750 & σ = 50. What was the
lowest income among the richest hundred?

CHAPTER FIVE
REGRESSION AND CORRELATION ANALYSIS
Introduction
Do you want to know what the effect of advertisement expenditures
will be on the sales volume of your company’s product? Maybe
yes! But how are you going to do it? Perhaps you need to know
industry experiences relating to the amount of expenditure,
frequency of the advertisement, timing of the advertisement, type of
media used and volume of reach by each media type, consumers’
information elasticity, etc. Based on this information, you can predict
the effect on the volume of sales.
Student Learning Objectives
After completing this chapter, the student is expected to
- find the regression equation of the dependent variable on the
independent variable
- interpret the slope of the regression line
- find the correlation coefficient between two variables
- interpret the correlation coefficient and the coefficient of
determination
- calculate the spearman’s rank correlation coefficient
- interpret the spearman’s rank correlation coefficient
5.1. Regression Analysis
Definition- Regression analysis is one of the statistical analysis tools
used in the prediction of the value of one variable (the dependent
variable) given the value of another variable (the independent variable),
when the two variables are related to each other. In regression
analysis we develop an estimating equation. For example, in
the above introductory note the industry experiences on amount of
expenditure, frequency of advertisement, type of media used and the
volume of reach by each media etc. are the known variable values
(independent variables) based on which we can project what the
volume of sales (dependent variable) will be in a future period. In fact,
regression analysis makes it possible to study and measure the
statistical relationship between two or more variables; however, we
will focus only on relationships involving two variables.
Example
If you want to study and measure the relationship between price
and quantity demanded,
- collect data
- present the data in an ordered array [as pairs of (x, y)]
- compute (determine) the functional linear relationship
between the variables
Methods to determine the regression line
1. The Scatter Diagram
A scatter diagram is a graph of observed points where each point
represents the two coordinate values. So, simply by looking at the
chart it is possible to determine the extent of association between the
two variables.
The wider the scatter in the chart, the less close is the relationship.
The closer the points are to each other and the closer they come to
falling on a line passing through them, the higher the degree of
relationship.
Example The following data represents the money spent on
advertising of a product and the consequent profits achieved from
each advertising period for a given product
Advertising Profit
5 8
6 7
7 9
8 10
9 13
10 12
11 13
Required: Draw the scatter diagram.
The scatter diagram is drawn by locating the X-Y points or values on
the graph as shown in the graph below (the dotted points):


From the trend in the relationship, you can see that it is increasing
even though the relationship is not perfect. In other words, profit
increases with an increase in advertisement expenditure.
Exercise
A teacher wants to study whether the number of students absent on a
given day is related to the mean temperature on that day. A random
sample of 10 days is used for the study. The following table shows
data on the number of students absent from class and the mean
temperature.
Absent students 8 7 5 4 2 3 6 8 9
Temperature 10 20 25 40 45 50 55 59 60
a) Determine which variable is dependent and which is independent
b) Draw a scatter diagram of these data
Solution
a) From the data we can understand that the number of absentee
students is affected by the change in temperature. That is,
temperature is the independent variable and absenteeism is the
dependent variable.
b) [Figure: scatter diagram of absent students against temperature]


The dots represent the scatter diagram. From the above diagram,
however, we see that temperature and the number of absentees have
little relationship, as indicated by the regression line in the diagram.
Activity
The Mekelle University Environmental Health department wants to
determine the statistical relationships between many different
variables and the common cold. The following table contains the
data on the use of facial tissues and the number of days that the
common cold symptoms were exhibited by seven people
Facial tissues 2000 1500 500 750 600 900 1000
Number of days 60 60 10 15 5 25 30
a) Determine the dependent and independent variables
b) Draw the scatter diagram
c) What is the type of the relationship
d) Interpret your graph
The Least Square Method
With this method we find the line of best fit, i.e., the most
representative line, for which the distance between the line and the
points is minimal. The least squares method is a mathematical
procedure to find the equation for the straight line that minimizes
the sum of the squared distances between the line and the data
points, as measured in the vertical (or Y) direction.

The derivation of the equations needed to compute the Y-intercept
and slope of a regression line using the method of least squares
requires the use of calculus. For simplicity, we present here the
equations only:

$b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$

Where $\sum x$ = sum of the x values
$\sum y$ = sum of the y values
$\sum x^2$ = sum of the squared x values
$\sum xy$ = sum of the product of x and y for each observation
n = number of x-y observations

$b_0 = \frac{\sum y}{n} - b_1 \frac{\sum x}{n}$ or $b_0 = \bar{y} - b_1 \bar{x}$

Where b1 = slope of the line computed using the equation above

The equation of the line is given by $Y = b_0 + b_1 x$

Let’s take the following example which was used to draw a scatter
diagram above:
Advertising(x) Sales (y) xy X2
5 8 40 25
6 7 42 36
7 9 63 49
8 10 80 64
9 13 117 81
10 12 120 100
11 13 143 121
Total 56 72 605 476

$b_1 = \frac{7(605) - (56)(72)}{7(476) - (56)^2} = \frac{203}{196} = 1.036$

And $b_0 = \bar{y} - b_1 \bar{x}$, but $\bar{y} = \frac{72}{7} = 10.29$ and $\bar{x} = \frac{56}{7} = 8$
$b_0 = 10.29 - 1.036(8) = 2.002$

$Y = 2.002 + 1.036x$ is the equation of the regression line.

Interpretation: from the equation of the line we can see that for each
one-unit increase in advertisement expense, sales increase by 1.036 birr.
b. If the advertisement expenses were 7 units, sales will be computed
as $Y = 2.002 + 1.036(7) = 9.254$ units
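The least squares equations can be sketched as a short Python function. The name `least_squares_line` is an illustrative assumption; the function simply evaluates the two formulas for b1 and b0 on the advertising data. Because the code keeps full precision, it gives an intercept of about 2.000 rather than the 2.002 obtained by hand from rounded intermediate values.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the line Y = b0 + b1 * x fitted by least squares."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

advertising = [5, 6, 7, 8, 9, 10, 11]
sales = [8, 7, 9, 10, 13, 12, 13]

b0, b1 = least_squares_line(advertising, sales)
predicted_at_7 = b0 + b1 * 7

print(round(b0, 3), round(b1, 3), round(predicted_at_7, 3))
```

The same function can be applied to any other set of paired observations, as long as the x values are not all identical (otherwise the denominator of b1 is zero).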

Example
The Maintenance Head of IVECO (Ethiopia) wants to know whether
or not there is a positive relationship between the annual
maintenance cost of their new bus assemblies and their age. He
collects the following data:
Bus    Maintenance cost (birr) (y)    Age (yrs) (x)    xy        x²     y²
1      859                            8                6,872     64     737,881
2      682                            5                3,410     25     465,124
3      471                            3                1,413     9      221,841
4      708                            9                6,372     81     501,264
5      1,094                          11               12,034    121    1,196,836
6      224                            2                448       4      50,176
7      320                            1                320       1      102,400
8      651                            8                5,208     64     423,801
9      1,049                          12               12,588    144    1,100,401
Total  6,058                          59               48,665    513    4,799,724
Required
a. Plot the scatter diagram
b. What kind of relationship exists between these two variables?
c. Determine the simple regression equation
d. Estimate the annual maintenance cost for a five-year-old bus
Solution

a.

b. As shown in the diagram, there is a positive (direct) relationship
between the age of a bus and its maintenance cost; the regression
line explains about 87.9% of the variation in maintenance cost
(R2 = 0.879). You will see how the R2 is computed as well as its
meaning.

c) $b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$

$b_1 = \frac{9 \times 48,665 - (59 \times 6,058)}{9 \times 513 - (59)^2} = \frac{437,985 - 357,422}{4,617 - 3,481} = \frac{80,563}{1,136} = 70.92$

$b_0 = \bar{y} - b_1 \bar{x} = \frac{\sum y}{n} - 70.92 \left( \frac{\sum x}{n} \right) = \frac{6,058}{9} - 70.92 \left( \frac{59}{9} \right) = 673.11 - 464.92 = 208.19$

Then $y_r = 208.19 + 70.92x$

d) When the bus is five years old,
$y_r = 208.19 + 70.92(5) = 208.19 + 354.60 = 562.79$

562.79 birr is the estimated maintenance cost at age five.


5.2. Correlation Analysis
It is desirable to measure the extent of the relationship between x
and y as well as observe it in a scatter diagram. The measurement
used for this purpose is the correlation coefficient. This is a
numerical value ranging from -1 to +1 that measures the strength of
the linear relationship between two quantitative variables. A
correlation coefficient exists for a population of data values
(ρ = rho) and for each sample selected from it (r).
Correlation coefficient characteristics
| Data collection | Correlation coefficient | Range of values |
|---|---|---|
| Population | ρ | -1 ≤ ρ ≤ +1 |
| Sample | r | -1 ≤ r ≤ +1 |
For both ρ and r:
-1: Perfect negative relationship
0: No linear relationship
+1: Perfect positive relationship
These values are rarely encountered in real-world situations, but
they are good benchmarks for evaluating the correlation coefficient of
any data collection.
Karl Pearson's Coefficient of Correlation
(Pearson Product Moment Correlation Coefficient) (r)

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\;\sqrt{n\sum y^2 - (\sum y)^2}}$$

Where
$\sum x$ = sum of the x values
$\sum y$ = sum of the y values
$\sum x^2$ = sum of the squared x values
$\sum y^2$ = sum of the squared y values
$(\sum x)^2$ = square of the sum of the x values
$(\sum y)^2$ = square of the sum of the y values
$\sum xy$ = sum of the products of x and y for each observation
n = number of x-y observations
Example

The following table gives the weight and the mileage (miles per
gallon) for a sample of 5 cars.
| Weight of a car | Mileage |
|---|---|
| 2,743 | 21.4 |
| 3,518 | 15.2 |
| 1,855 | 38.9 |
| 5,214 | 12.7 |
| 4,341 | 17.8 |
Required
a. Compute the Pearson product moment correlation coefficient
b. Interpret your answer.
Solution
First, find the squares, products, and sums of the sample data, as
below:
| Weight of a car (x) | Mileage (y) | xy | x² | y² |
|---|---|---|---|---|
| 2,743 | 21.4 | 58,700.2 | 7,524,049 | 457.96 |
| 3,518 | 15.2 | 53,473.6 | 12,376,324 | 231.04 |
| 1,855 | 38.9 | 72,159.5 | 3,441,025 | 1,513.21 |
| 5,214 | 12.7 | 66,217.8 | 27,185,796 | 161.29 |
| 4,341 | 17.8 | 77,269.8 | 18,844,281 | 316.84 |
| Total: 17,671 | 106.0 | 327,820.9 | 69,371,475 | 2,680.34 |
Then insert the totals from the table into the formula:

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\;\sqrt{n\sum y^2 - (\sum y)^2}}$$

$$= \frac{5(327{,}820.9) - (17{,}671)(106)}{\sqrt{5(69{,}371{,}475) - (17{,}671)^2}\;\sqrt{5(2{,}680.34) - (106)^2}}$$

$$= \frac{-234{,}021.5}{273{,}712.2} = -0.855 \approx -0.86$$

b) The correlation coefficient r = -0.86 indicates a rather strong
negative linear relationship between car weight and miles per
gallon in the sample. That is, cars that weigh more seem
to get fewer miles per gallon, and vice versa.
The same relationship can be seen in a scatter diagram of the data,
for which R2 = 0.731:
[Scatter diagram: mileage (y) plotted against car weight (x)]
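
The hand computation can be reproduced directly from the raw car data. The sketch below applies the Pearson formula term by term; the function name `pearson_r` and the variable names are illustrative choices, not something defined in these notes.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product moment correlation coefficient from paired observations."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Sample of 5 cars from the example
weights = [2743, 3518, 1855, 5214, 4341]
mileage = [21.4, 15.2, 38.9, 12.7, 17.8]

r = pearson_r(weights, mileage)
print(round(r, 3), round(r ** 2, 3))
```

The result is r of about -0.855, and squaring it gives about 0.731, the R2 value quoted for the diagram.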


The Coefficient of Determination

Another measure of the goodness of fit of the regression line is the
coefficient of determination, which is the square of the correlation
coefficient, that is,
Coefficient of determination = r2
The value of r2 lies between 0 and 1, inclusive.
An r2 close to 1 indicates a strong linear relationship between the
variables.
An r2 close to 0 indicates little or no linear relationship.
The total change or variation in the dependent variable can be
divided into two parts:
a. Explained variation: the change in the dependent variable (Y)
explained by changes in the independent variable (X). The proportion
of explained variation is given by
r2 × 100%
b. Unexplained variation: the variation in the dependent
variable (Y) due to chance, excluded variables, etc.
The proportion of unexplained variation is given by
(1 - r2) × 100%
For example, find the proportions of explained and unexplained
variation for the above example.
Solution
For the car weight and mileage example, r was computed to be -0.855,
so r2 = (-0.855)2 ≈ 0.731. The proportion of explained variation
is therefore 0.731 × 100% = 73.1%, and the proportion of unexplained
variation is (1 - 0.731) × 100% = 26.9%.
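
As a small illustration, the split into explained and unexplained variation follows from r alone. The helper name `variation_split` is hypothetical; the input is the correlation coefficient of the car example, r = -0.855.

```python
def variation_split(r):
    """Return (explained %, unexplained %) of variation for a correlation r."""
    r_squared = r ** 2  # coefficient of determination
    return r_squared * 100, (1 - r_squared) * 100


explained, unexplained = variation_split(-0.855)
print(round(explained, 1), round(unexplained, 1))
```

Note that it is r2, not r, that is converted to a percentage, so the sign of r plays no role here.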
Activity
AFRICA Insurance Share Company feels that the amount of time a
salesperson spends with clients should be positively related to the
size of the client's account.
The company gathers the following information to see whether
the relationship is positive:
| Client | Account size, y | Minutes spent, x |
|---|---|---|
| 1 | 1,056 | 108 |
| 2 | 825 | 123 |
| 3 | 651 | 62 |
| 4 | 748 | 95 |
| 5 | 894 | 58 |
| 6 | 1,242 | 134 |
| 7 | 1,058 | 87 |
| 8 | 1,112 | 78 |
| 9 | 1,259 | 120 |
Required
a. Compute the correlation coefficient
b. What would be the interpretation?
| Account size, y | Minutes spent, x | xy | x² | y² |
|---|---|---|---|---|
| 1,056 | 108 | | | |
| 825 | 123 | | | |
| 651 | 62 | | | |
| 748 | 95 | | | |
| 894 | 58 | | | |
| 1,242 | 134 | | | |
| 1,058 | 87 | | | |
| 1,112 | 78 | | | |
| 1,259 | 120 | | | |
| Total: 8,845 | 865 | | | |
The following data are collected on the supply and price of a certain
product.
| Price (X) | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 18 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|
| Supply (Y) | 10 | 20 | 50 | 40 | 50 | 60 | 80 | 90 | 90 | 120 |
a) Construct a regression equation of Y on X.
b) Find the value of Y (supply) when the price (X) is 11.
c) How much should the price be in order for the supply to be
100?
d) Find the correlation coefficient between X and Y.
e) Find the percentage variation in supply which is not explained
by price.
5.3. Rank Correlation Method
The method assumes that data units can be ordered (ranked), so that
one can measure the degree of correlation between the series of
ranks (often two series). The measure is called the rank correlation
coefficient, R.
R is given by

$$R = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$

Where N = the number of individuals in each series
D = the difference between the ranks of the two series for each
individual
To perform the computation, the N individuals in each series must
first be assigned ranks.
Example
In the 2000 Miss Millennium Ethiopia Beauty contest, two judges
ranked eight candidates, A, B, C, D, E, F, G, and H, in order of their
performance, as shown below.

| | A | B | C | D | E | F | G | H |
|---|---|---|---|---|---|---|---|---|
| Judge 1 | 5 | 2 | 8 | 1 | 4 | 6 | 3 | 7 |
| Judge 2 | 4 | 5 | 7 | 3 | 2 | 8 | 1 | 6 |

Find the rank correlation coefficient.
Solution
| Candidate | Judge 1 (X) | Judge 2 (Y) | D = X - Y | D² |
|---|---|---|---|---|
| A | 5 | 4 | 1 | 1 |
| B | 2 | 5 | -3 | 9 |
| C | 8 | 7 | 1 | 1 |
| D | 1 | 3 | -2 | 4 |
| E | 4 | 2 | 2 | 4 |
| F | 6 | 8 | -2 | 4 |
| G | 3 | 1 | 2 | 4 |
| H | 7 | 6 | 1 | 1 |
| | | | | ΣD² = 28 |

$$R = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6(28)}{8(8^2 - 1)} = 1 - \frac{168}{8(63)} = 1 - \frac{168}{504} = 1 - 0.33 = 0.67$$
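
When the ranks are already given, the formula needs only the rank differences. The following is a minimal sketch using the two judges' rankings; the function name `rank_correlation` is an illustrative assumption.

```python
def rank_correlation(ranks_x, ranks_y):
    """Rank correlation R = 1 - 6*sum(D^2) / (N*(N^2 - 1)) from two rank series."""
    n = len(ranks_x)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Ranks given by the two judges to candidates A..H
judge1 = [5, 2, 8, 1, 4, 6, 3, 7]
judge2 = [4, 5, 7, 3, 2, 8, 1, 6]

R = rank_correlation(judge1, judge2)
print(round(R, 2))
```

The sum of squared differences is 28, so R is 1 - 168/504, about 0.67.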
Example
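The following table presents the scores of 3rd year Management
students of New Millennium College:
| Student | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Mathematics | 55 | 75 | 40 | 50 | 65 | 74 | 69 | 80 | 41 | 43 |
| Statistics | 62 | 68 | 55 | 70 | 72 | 67 | 80 | 79 | 52 | 40 |
Compute the rank correlation coefficient.
Solution
| Student | Maths (X) | Rank | Statistics (Y) | Rank | D | D² |
|---|---|---|---|---|---|---|
| 1 | 55 | 6 | 62 | 7 | -1 | 1 |
| 2 | 75 | 2 | 68 | 5 | -3 | 9 |
| 3 | 40 | 10 | 55 | 8 | 2 | 4 |
| 4 | 50 | 7 | 70 | 4 | 3 | 9 |
| 5 | 65 | 5 | 72 | 3 | 2 | 4 |
| 6 | 74 | 3 | 67 | 6 | -3 | 9 |
| 7 | 69 | 4 | 80 | 1 | 3 | 9 |
| 8 | 80 | 1 | 79 | 2 | -1 | 1 |
| 9 | 41 | 9 | 52 | 9 | 0 | 0 |
| 10 | 43 | 8 | 40 | 10 | -2 | 4 |
| | | | | | | ΣD² = 50 |
Here D is the difference between each student's rank in Mathematics
and rank in Statistics.

$$R = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6(50)}{1000 - 10} = 1 - \frac{300}{990}$$

R = 1 - 0.303 = 0.697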
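
When raw scores rather than ranks are given, the scores must be ranked first. The sketch below ranks from highest (rank 1) to lowest and then applies the same formula. It assumes there are no tied scores, uses the scores as they appear in the solution table, and the function names are illustrative only.

```python
def ranks_descending(scores):
    """Rank 1 = highest score. Assumes no two scores are equal (no ties)."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]


def rank_correlation_from_scores(xs, ys):
    """Rank both series, then apply R = 1 - 6*sum(D^2) / (N*(N^2 - 1))."""
    rx, ry = ranks_descending(xs), ranks_descending(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Scores of the 10 students, as listed in the solution table
maths = [55, 75, 40, 50, 65, 74, 69, 80, 41, 43]
stats = [62, 68, 55, 70, 72, 67, 80, 79, 52, 40]

R = rank_correlation_from_scores(maths, stats)
print(ranks_descending(maths), round(R, 3))
```

Tied scores would need average ranks, which this simple ranking does not handle.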
Example
A company hired six computer technicians. The technicians were
given a test designed to measure their basic knowledge. After a year
of service, their boss was asked to rank each technician's job
performance. Test scores and performance rankings are given below:
| Technician | Test score | Performance ranking |
|---|---|---|
| 1 | 82 | 3 |
| 2 | 60 | 6 |
| 3 | 80 | 2 |
| 4 | 67 | 5 |
| 5 | 94 | 1 |
| 6 | 89 | 4 |
Is there any relationship between test score and job performance?
Solution
| Test score | Rank of test score | Performance ranking | D | D² |
|---|---|---|---|---|
| 82 | 3 | 3 | 0 | 0 |
| 60 | 6 | 6 | 0 | 0 |
| 80 | 4 | 2 | 2 | 4 |
| 67 | 5 | 5 | 0 | 0 |
| 94 | 1 | 1 | 0 | 0 |
| 89 | 2 | 4 | -2 | 4 |
| | | | ΣD = 0 | ΣD² = 8 |
Then the rank correlation coefficient is calculated as

$$R = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6(8)}{6(6^2 - 1)} = 1 - \frac{48}{6(35)} = 1 - \frac{48}{210} = 1 - 0.2285714 = 0.77$$

Then we can say there is a fairly strong positive relationship between
the two variables: technicians who scored higher on the test tend to
be ranked higher on job performance.

Self test questions
1. A specialist in a hospital claims that the number of full-time
employees in the hospital can be estimated by counting the
number of beds in the hospital. Now, a researcher wants to
establish a regression model so as to predict the number of
full-time employees by the number of beds. After a survey in
12 hospitals, the researcher obtained the following data:
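| Hospital ID | Beds | FTEs |
|---|---|---|
| 1 | 23 | 69 |
| 2 | 29 | 95 |
| 3 | 29 | 102 |
| 4 | 35 | 118 |
| 5 | 42 | 126 |
| 6 | 46 | 125 |
| 7 | 50 | 138 |
| 8 | 54 | 178 |
| 9 | 64 | 156 |
| 10 | 66 | 184 |
| 11 | 76 | 176 |
| 12 | 78 | 225 |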
a. Plot the scatter diagram
b. Find the estimated regression line
c. Find r
d. What type of relationship do they have?
2. The following are data on the number of students and the
annual sales turnover of fast food restaurants around major
universities:

| Restaurant ID | Number of students (X) | Sales volume (Y) |
|---|---|---|
| 1 | 2 | 58 |
| 2 | 6 | 105 |
| 3 | 8 | 88 |
| 4 | 8 | 118 |
| 5 | 12 | 117 |
| 6 | 16 | 137 |
| 7 | 20 | 157 |
| 8 | 20 | 169 |
| 9 | 22 | 149 |
| 10 | 26 | 202 |
a. Does there seem to be any relationship between the
sales volume and the number of students in these
universities?
b. Find the estimated regression equation
c. What type of relationship do they have?
d. Compute Karl Pearson's coefficient of correlation
3. The coefficient of correlation between typing speed and typing
errors was found to be 0.4. The percentage variation in typing errors
due to inattention is 4 times as great as the percentage
variation due to speed. Find r between typing errors and
inattention.
4. The coefficient of rank correlation of the marks of 10 students
in statistics and accounting was found to be 0.8. It was later
discovered that the difference in ranks in the two subjects of
one student was wrongly taken as 7 instead of 9. Find the
correct rank correlation coefficient.

CHAPTER SIX
INFERENTIAL STATISTICS
Introduction
As discussed in the previous chapters, inferential statistics consists
of taking a sample to reach a conclusion about the population from
which the sample is taken. Given the descriptive statistics values,
one can draw conclusions about the larger population which in fact
represents a major challenge in many application areas.
Inference (drawing a conclusion) has two forms: estimation and
hypothesis testing
Student learning objectives
The overall objective of this chapter is to help you understand how
to infer/conclude about a population using a sample, thereby
enabling you to:
1. Know the difference between point and interval estimation.
2. Estimate a population mean from a sample mean for
large and small sample sizes.
3. Understand the logic of hypothesis testing and know
how to establish null and alternate hypotheses.
4. Understand Type I and Type II errors.
5. Test hypotheses about a single population mean
when the sample size is large.
6. Test hypotheses about a single population mean
when the sample size is small and σ is unknown.
6.1 Estimation
Estimation involves approximating a population parameter (µ, σ, etc.)
using a sample statistic (x̄, s, etc.). There are two types of estimation:
a) Point estimation: here, a single sample statistic is used to
estimate a population parameter.
E.g. - The sample mean (x̄) is the point estimate of the population mean (µ).
- The sample standard deviation (s) is the point estimate of the
population standard deviation (σ).

b) Interval estimation: this gives an interval which contains the
true population mean with a given level of confidence (Lc). The level
of confidence to be attached depends on the researcher's need.
The level of confidence is expressed as a percentage. The most
commonly used confidence levels are 90%, 95%, and 99%.
Confidence interval for the population mean (µ):
Here, we construct an interval which is supposed to contain the
population mean with a certain level of confidence. To construct this
interval, sample data are needed to determine sample values such as
x̄ and s.
There are 3 cases to be considered.
Case 1: When σ is known.
In this case, µ is supposed to be found in the interval

$$\bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$$

That is,

$$\bar{x} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) \le \mu \le \bar{x} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$$

where α is read as alpha and is equal to 1 minus Lc (level of
confidence). Alpha (α) is the amount of risk (uncertainty) accepted
by the decision maker, the researcher. Alpha is also known as the
level of significance.
Case 2: When σ is unknown and the sample size (n) is large (n ≥ 30).
In this case, the sample standard deviation (S) can represent
the population standard deviation (σ), because n ≥ 30. Therefore,
the confidence interval for µ will be as follows:

$$\bar{x} \pm z_{\alpha/2}\left(\frac{S}{\sqrt{n}}\right)$$

That is,

$$\bar{x} - z_{\alpha/2}\left(\frac{S}{\sqrt{n}}\right) \le \mu \le \bar{x} + z_{\alpha/2}\left(\frac{S}{\sqrt{n}}\right)$$

Case 3: When σ is unknown and the sample size (n) is small (n < 30).
In cases 1 and 2 above, we use the z-table in our calculations.
In this 3rd case, we can't use the z-table, because σ is unknown and it
can't be represented by S since n < 30. Therefore, we introduce
another table, the t-table. So the confidence interval for µ will be:

$$\bar{x} \pm t_{\alpha/2,\,n-1}\left(\frac{S}{\sqrt{n}}\right)$$

That is,

$$\bar{x} - t_{\alpha/2,\,n-1}\left(\frac{S}{\sqrt{n}}\right) \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\left(\frac{S}{\sqrt{n}}\right)$$

(tα/2, n-1) is a value to be read from the t-table with n-1 degrees of
freedom, where n is the sample size. Degrees of freedom is a
statistical adjustment used along with the t-table.
How to find (zα/2) and (tα/2, n-1):
• (zα/2) is a value to be read from the z-table. It is the value
of z for which the area under the normal curve to the right of zα/2
(or to the left of -zα/2) is α/2. That is,
[Figure: standard normal curve centred at 0 with -zα/2 and zα/2 marked; each tail area is A1 = α/2 and each area between 0 and ±zα/2 is A2]
A1 + A2 = 0.5 (half of the curve), but A1 = α/2
So, A2 = 0.5 - α/2
• As mentioned above, (tα/2, n-1) is a value to be read from
the t-table with n-1 degrees of freedom. A sample of the
t-table is given below.

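Sample t-table (columns give tα/2 for the stated upper-tail area)
| Degrees of freedom (df) | t0.100 | t0.050 | t0.025 | t0.01 | t0.005 |
|---|---|---|---|---|---|
| 1 | | | | | |
| 2 | | | | | |
| 3 | | | | | |
| ... | | | | | |
| 23 | | | | | |
| 24 | | 1.711 | | | |
| 25 | | | | | |
| ... | | | | | |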
The t-value is located at the intersection of the row for the given df
and the column for the selected α/2 value in the t-table.
E.g. if the df for a given t-statistic is 24 (i.e., n = 25) and the
desired α value is 0.1, the t-value is found at (t0.1/2, 24) = (t0.05, 24) = 1.711
(see the sample t-table).
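
To show how a table value such as 1.711 is used in Case 3, the sketch below builds a small-sample confidence interval. The sample figures (n = 25, mean 50, standard deviation 10) are hypothetical and chosen only for illustration; the critical value is passed in as read from the t-table, and the function name `t_interval` is an assumption.

```python
from math import sqrt


def t_interval(mean, s, n, t_crit):
    """Confidence interval mean +/- t_crit * s / sqrt(n) for a small sample."""
    margin = t_crit * s / sqrt(n)
    return mean - margin, mean + margin


# Hypothetical sample: n = 25 (df = 24), mean = 50, s = 10.
# For a 90% interval, alpha/2 = 0.05 and the table gives t(0.05, 24) = 1.711.
low, high = t_interval(mean=50, s=10, n=25, t_crit=1.711)
print(round(low, 3), round(high, 3))
```

Here the margin is 1.711 × 10/5 = 3.422, so the interval runs from 46.578 to 53.422.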
E.g. A company's manager estimates future grain production based
on samples. A sample of 100 plots produced a mean of 35 quintals of
wheat per hectare. Assuming a sample standard deviation (s) of 4.5
quintals,
a) Calculate a 95% confidence interval for the population mean (µ).
b) Interpret the result.

Solution: n = 100
x̄ = 35
s = 4.5
a) Level of confidence (Lc) = 95% = 0.95
⇒ α = 1 - Lc = 1 - 0.95 = 0.05
Although the population standard deviation (σ) is not known, we
still use the z-table because n > 30. So the confidence interval for µ
will be:

$$\bar{x} \pm z_{\alpha/2}\left(\frac{S}{\sqrt{n}}\right)$$

$$\Rightarrow \bar{x} - z_{\alpha/2}\left(\frac{S}{\sqrt{n}}\right) \le \mu \le \bar{x} + z_{\alpha/2}\left(\frac{S}{\sqrt{n}}\right)$$

$$\Rightarrow 35 - z_{0.05/2}\left(\frac{4.5}{\sqrt{100}}\right) \le \mu \le 35 + z_{0.05/2}\left(\frac{4.5}{\sqrt{100}}\right)$$

$$\Rightarrow 35 - z_{0.025}(0.45) \le \mu \le 35 + z_{0.025}(0.45)$$

Everything except (z0.025) is known. Therefore, let's find the value of
(z0.025) as follows:
[Figure: right half of the standard normal curve, with area A2 between 0 and z0.025 and tail area A1 to the right of z0.025]
A1 = 0.025, so A2 = 0.5 - 0.025 = 0.475

Now, to find the value of z0.025, we look for the number 0.475 in the
body of the z-table. Then we read the corresponding row label (in the
column headed z) and column label (in the top row, which gives the
second decimal place of z) and add them to get the value of z0.025.
For our example, we can read this value from a sample z-table as
follows.

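Sample z-table (columns give the second decimal place in z)
| z | 0.00 | 0.01 | 0.02 | ... | 0.06 | 0.07 | ... |
|---|---|---|---|---|---|---|---|
| 0.0 | | | | | | | |
| 0.1 | | | | | | | |
| ... | | | | | | | |
| 1.8 | | | | | | | |
| 1.9 | | | | | 0.475 | | |
| ... | | | | | | | |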
So, the value of z0.025 = 1.9 + 0.06 = 1.96. Therefore,
[35 - (z0.025)(0.45)] ≤ µ ≤ [35 + (z0.025)(0.45)]
⇒ [35 - (1.96)(0.45)] ≤ µ ≤ [35 + (1.96)(0.45)]
⇒ 34.118 ≤ µ ≤ 35.882
⇒ The population mean (µ) is found in the interval from 34.118 to
35.882, inclusive.
b) Interpretation:
The interpretation of the 95% confidence interval is as follows:
We are 95% confident that the population mean is found in the
interval [34.118, 35.882].
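
The same interval can be obtained without a printed z-table. The sketch below uses `statistics.NormalDist` from the Python standard library to find zα/2; the function name `z_interval` is an illustrative assumption, and the inputs are the grain example's values (mean 35, s = 4.5, n = 100, 95% confidence).

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, s, n, confidence):
    """Large-sample confidence interval mean +/- z(alpha/2) * s / sqrt(n)."""
    alpha = 1 - confidence
    # z(alpha/2) leaves an area of alpha/2 in the upper tail
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * s / sqrt(n)
    return mean - margin, mean + margin


low, high = z_interval(mean=35, s=4.5, n=100, confidence=0.95)
print(round(low, 3), round(high, 3))
```

The computed critical value is about 1.96, so the limits agree with 34.118 and 35.882 to three decimal places.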
6.2 Hypothesis testing:
A hypothesis is one's thinking, perception, or belief about something.
Hypothesis testing is the process of using sample data to decide
whether or not a hypothesis should be rejected. In the process of
hypothesis testing, there are two hypotheses to be considered.
A) Null hypothesis (H0): the hypothesis that reflects the current
situation/perception about something. It is the hypothesis that a
researcher wants to disprove.
B) Alternate hypothesis (Ha): the hypothesis that reflects the
researcher's idea/belief about the same thing, as opposed to H0. It is
the research hypothesis that the researcher wants to prove.
N.B: H0 and Ha are complementary (collectively exhaustive) and
mutually exclusive.
Steps in hypothesis testing:
1. Establish both H0 and Ha
2. Determine alpha(α)
3. Determine the appropriate statistical test (z, t, or F test) to
use
4. Determine the critical (table) value of the statistic determined
in step 3[Zα, Zα/2, (tα, n-1), or (tα/2, n-1)] from the appropriate
table
5. Gather data to determine x̄, s, or other sample statistics
6. Calculate the value of the appropriate test statistic, i.e., z-
calculated (Zcal) or t-calculated (tcal)
7. Make a decision (decide whether to reject H0 or not) by
comparing the critical value determined in step 4 with the test
statistic calculated in step 6 above.
N.B: 1. Use Zα or (tα, n-1) when you have a one-tailed test, i.e., when
Ha contains a sign other than the “is not equal to (≠)” sign.
2. Use Zα/2 or (tα/2, n-1) when you have a two-tailed test, i.e.,
when Ha contains the “is not equal to (≠)” sign.
Hypothesis testing about the population mean (µ)

When testing a hypothesis about µ, we use:

1. The normal table (z-table) if:
a) σ is known (whatever the size of n may be), and Zcal is
calculated as follows,

$$Z_{cal} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

where µ0 is the value of µ given by H0.
b) σ is unknown and n is large (n ≥ 30), and

$$Z_{cal} = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}$$

2. The t-table if σ is unknown and n is small (n < 30), and

$$t_{cal} = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}$$
Decision rules:
There are different decision rules (whether or not to reject H0)
depending on the structure of our hypotheses and the type of table
used. A summary of the decision rules follows.

| Hypotheses | Table used | Reject H0 if |
|---|---|---|
| I. H0: µ = µ0, Ha: µ ≠ µ0 | z-table | $\lvert Z_{cal}\rvert > Z_{\alpha/2}$ |
| | t-table | $\lvert t_{cal}\rvert > t_{\alpha/2,\,n-1}$ |
| II. H0: µ ≤ µ0, Ha: µ > µ0 | z-table | $Z_{cal} > Z_{\alpha}$ |
| | t-table | $t_{cal} > t_{\alpha,\,n-1}$ |
| III. H0: µ ≥ µ0, Ha: µ < µ0 | z-table | $Z_{cal} < -Z_{\alpha}$ |
| | t-table | $t_{cal} < -t_{\alpha,\,n-1}$ |

N.B: The symbol $\lvert\;\rvert$ means absolute value.

E.g. Use the following data to test the given hypotheses.

H0: µ = 25    x̄ = 28.1    S = 8.47
Ha: µ ≠ 25    n = 57    α = 0.01
Solution:
Explanation: H0 states that the population mean of a certain
variable is 25, whereas Ha states that the population mean is not 25;
it is different from 25. This is the researcher's
perception/understanding.

So let's follow the 7 steps of testing a hypothesis:
Step 1. H0: µ = 25 and Ha: µ ≠ 25 (already given)
2. α = 0.01 (already given)
3. We use the z-table, because n > 30.
4. As the test is a two-tailed test, we find the value of Zα/2.
Zα/2 = Z0.01/2 = Z0.005
[Figure: right half of the standard normal curve, with area A1 between 0 and Z0.005 and tail area A2 to the right of Z0.005]
A2 = 0.005
Therefore, A1 = 0.5 - 0.005 = 0.495, and looking up 0.495 in the body
of the z-table gives Z0.005 ≈ 2.57.
5. The data are already gathered: x̄, S, and n have been given.
6. Zcal is computed as

$$Z_{cal} = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}$$

where µ0 is the value of µ stated in H0. So,

$$Z_{cal} = \frac{28.1 - 25}{8.47/\sqrt{57}} = 2.76$$

7. Decision:
Compare |Zcal| and Zα/2 and reject H0 if |Zcal| > Zα/2.
So, we reject H0, because 2.76 > 2.57.
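
The calculation and decision steps of this example can be sketched in Python. The function name `z_test_two_tailed` is an assumption made for illustration, and the critical value is computed with `statistics.NormalDist` from the standard library instead of being read from a printed z-table.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_tailed(sample_mean, mu0, s, n, alpha):
    """Return (z_cal, z_crit, reject) for H0: mu = mu0 against Ha: mu != mu0."""
    z_cal = (sample_mean - mu0) / (s / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z(alpha/2)
    return z_cal, z_crit, abs(z_cal) > z_crit


# Values from the example: mean 28.1, mu0 = 25, S = 8.47, n = 57, alpha = 0.01
z_cal, z_crit, reject = z_test_two_tailed(28.1, 25, 8.47, 57, 0.01)
print(round(z_cal, 2), round(z_crit, 3), reject)
```

The exact critical value is about 2.576, slightly above the 2.57 read from the table, and the decision to reject H0 is the same.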
Self test questions
1. An experiment consists of selecting a random sample of 256
middle managers for study. One item of interest is their annual
income. The sample mean is computed to be $35,420 and the sample
standard deviation is $2,050.
a) What is the point estimate of µ?
b) Determine a 90% confidence interval for µ.
c) Interpret the result.

2. A tire company guarantees that a particular brand of tire has a
mean useful lifetime of 42,000 km or more. A customers' advocate
wishes to verify this claim and takes a sample of 10 tires, finding
the following useful lifetimes (in thousands of km):
42 36 46 43 41 43 45 40 39 35
Use these data and α = 0.05 to determine whether there is
sufficient evidence to contradict the manufacturer's claim
(hypothesis testing).

CHAPTER SEVEN
ANALYSIS OF VARIANCE (ANOVA)
Introduction
ANOVA has many uses in statistics. In the case of regression
analysis, ANOVA is the technique of decomposing the total variation
in the dependent variable into its components (regression and error).
The regression component of the total variation is the variation in
the dependent variable due to the independent variable under
consideration, whereas, the error component of the total variation is
the variation in the dependent variable due to other factors, which
are not included in the regression model.
Student learning objectives
After the completion of this lesson, students must be able to:
 Understand the application of ANOVA
 Use ANOVA as a hypothesis testing instrument
7.1 The meaning of ANOVA
Analysis of variance (ANOVA) is a statistical instrument which is
used for testing whether the means of more than two
quantitative populations are equal. It helps to identify whether
differences among sample data classified in terms of a single variable
are meaningful, and it provides a meaningful comparison between sample
data which are classified according to two or more variables.
The test statistic used here follows the F-distribution
(see appendix).
7.2 Assumptions in ANOVA
ANOVA can only hold true when the following assumptions are met:
 Observations are drawn from normally distributed
populations
 Observations represent random samples from a
population
 Variance or standard deviations of the population are
equal

As a hypothesis testing tool, ANOVA involves the steps that were
used to undertake hypothesis testing. You can also refer to the
following table as a general process of ANOVA;
ANOVA table
| Source of variation | Degrees of freedom, df | Sum of squares, ss | Mean sum of squares, mss | F-statistic |
|---|---|---|---|---|
| Regression | df1 = k | $Rss = \sum(\hat{y} - \bar{y})^2$ | mRss = Rss/df1 | Fcal = mRss/mEss |
| Error | df2 = n - (1 + k) | $Ess = \sum(y - \hat{y})^2$ | mEss = Ess/df2 | |
| Total | df1 + df2 = n - 1 | $Tss = \sum(y - \bar{y})^2$ | | |
Key: k = number of independent variables under consideration
n = sample size (number of pairs of X and Y)
Rss = Regression sum of squares
Ess = Error sum of squares
Tss = Total sum of squares
mRss = mean Regression sum of squares
mEss = mean Error sum of squares
Tss = Rss + Ess
Fcal is used to test the overall significance of a regression model.
Testing the overall significance of the model means checking
whether or not X and Y are related. To do so, we simply test whether
β1 is different from zero or not. That is, we test the following
hypotheses:
H0: β1 = 0
Ha: β1 ≠ 0
• If H0 is rejected, the model is significant (X and Y are related).
• If H0 is not rejected, the model is not significant (X and Y are
not related).
N.B: H0 is rejected if Fcal > Fα(df1, df2)

Fα(df1, df2) is a value to be read from the F-table. A sample of the
F-table is given below.
α = 0.05
| df2 \ df1 | 1 | 2 | 3 | 4 | ... |
|---|---|---|---|---|---|
| 1 | 161.4 | | | | |
| 2 | 18.51 | | | | |
| 3 | 10.13 | | | | |
| 4 | 7.71 | | | | |
| ... | | | | | |
| 10 | 4.96 | | | | |
| ... | | | | | |
Example
The following data are collected on the supply and price of a certain
product.
| Price (X) | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 18 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|
| Supply (Y) | 10 | 20 | 50 | 40 | 50 | 60 | 80 | 90 | 90 | 120 |

Construct a regression equation of Y on X.

a) Test the significance of the regression model for the price-
supply data (use α = 0.05).
Solution:
H0: β1 = 0
Ha: β1 ≠ 0

ANOVA table:
| Source of variation | df | ss | mss | Fcal |
|---|---|---|---|---|
| Regression | df1 = 1 | Rss = 507 | 507 | 13.1 |
| Error | df2 = 10 | Ess = 387 | 38.7 | |
| Total | df1 + df2 = 11 | Tss = 894 | | |

Decision: Reject H0,
because Fcal = 13.1 > Fα(df1, df2) = F0.05(1, 10) = 4.96 (see the
F-table in the appendix).
N.B: The coefficient of determination (r2) can also be expressed in
terms of Rss and Tss (where Tss = Rss + Ess). That is,

$$r^2 = \frac{Rss}{Tss}$$

For example, for the given price-supply data,

$$r^2 = \frac{Rss}{Tss} = \frac{507}{894} = 0.567$$

which is the same as the value previously obtained by simply
squaring the coefficient of correlation (r).
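
The arithmetic behind the ANOVA table can be sketched as follows. The inputs are the sums of squares and degrees of freedom shown in the table above (Rss = 507, Ess = 387, df1 = 1, df2 = 10) and the table value F0.05(1, 10) = 4.96; the function name `regression_f` is an illustrative assumption.

```python
def regression_f(rss, ess, df1, df2):
    """F statistic = (Rss/df1) / (Ess/df2), i.e. mRss / mEss."""
    m_rss = rss / df1  # mean regression sum of squares
    m_ess = ess / df2  # mean error sum of squares
    return m_rss / m_ess


rss, ess = 507, 387
f_cal = regression_f(rss, ess, df1=1, df2=10)
r_squared = rss / (rss + ess)  # Tss = Rss + Ess

f_table = 4.96  # F(0.05; 1, 10) read from the F-table
print(round(f_cal, 1), round(r_squared, 3), f_cal > f_table)
```

Since Fcal of about 13.1 exceeds 4.96, H0 is rejected, and r2 = 507/894 is about 0.567.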
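The other approach of computing the F-value, used when comparing the
means of several samples of equal size, is shown below:

$$F = \frac{n\,s_{\bar{x}}^2}{s_p^2}$$

Where n is the sample size (the number of observations in each
sample), $s_{\bar{x}}^2$ is the variance of the sample means, and
$s_p^2$ is the mean of the sample variances.

The F-distribution table has two degrees of freedom values: the
degrees of freedom in the numerator and the degrees of freedom in
the denominator.

The degrees of freedom are computed as:

$$df_1 = k - 1 \quad \text{and} \quad df_2 = k(n - 1)$$

Where k is the number of samples and n is the sample size.

Before using the F-test and its critical values, it is important to
compute the following: the sample means, the grand mean, the variance
of the sample means, the sample variances, and the mean of the
sample variances.

Once we calculate these values, we may proceed with the analysis of
variance.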


Example:
AGIP Oil Company wanted to determine if the amount of oil
delivered by its trucks to customers is the same in its three sales
districts. The company obtained random sample data as given below:
Gallons delivered in one delivery
| District 1 | District 2 | District 3 |
|---|---|---|
| 81 | 100 | 295 |
| 179 | 158 | 82 |
| 142 | 272 | 155 |
| 199 | 248 | 271 |
| 124 | 62 | 212 |

Perform an ANOVA at a 0.05 level of significance to determine if the
mean amounts per delivery in the sales districts are equal.
Step-1: State the hypotheses
H0: the mean delivery amounts for the three districts are equal
Ha: the mean delivery amounts for the three districts are not all equal
Step-2: State the decision rule
With k = 3 samples of size n = 5, the degrees of freedom are
k - 1 = 2 and k(n - 1) = 12, and the table value is F0.05(2, 12) = 3.89.
Then, reject H0 if sample F > 3.89


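Step-3: Compute sample F
| District 1, x | (x - x̄1)² | District 2, x | (x - x̄2)² | District 3, x | (x - x̄3)² |
|---|---|---|---|---|---|
| 81 | 4,096 | 100 | 4,624 | 295 | 8,464 |
| 179 | 1,156 | 158 | 100 | 82 | 14,641 |
| 142 | 9 | 272 | 10,816 | 155 | 2,304 |
| 199 | 2,916 | 248 | 6,400 | 271 | 4,624 |
| 124 | 441 | 62 | 11,236 | 212 | 81 |
| Total: 725 | 8,618 | 840 | 33,176 | 1,015 | 30,114 |

The sample means are 725/5 = 145, 840/5 = 168, and 1,015/5 = 203, and
the grand mean is (145 + 168 + 203)/3 = 172. Then

$$s_{\bar{x}}^2 = \frac{(145 - 172)^2 + (168 - 172)^2 + (203 - 172)^2}{3 - 1} = \frac{1{,}706}{2} = 853$$

is the variance of sample means.
Again, compute the sample variances:
8,618/4 = 2,154.5; 33,176/4 = 8,294; 30,114/4 = 7,528.5
Then, the mean of sample variances will be given as:
(2,154.5 + 8,294 + 7,528.5)/3 = 5,992.33
so that the sample F, n times the variance of sample means divided by
the mean of sample variances, is 5(853)/5,992.33 = 0.712.
Step-4: Accept or reject H0
As the calculated F (0.712) is less than the table value of F (3.89),
we do not reject H0; the data do not show a difference in the mean
delivery amounts of the three districts.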
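
For samples of equal size, the sample F can be computed as n times the variance of the sample means, divided by the mean of the sample variances. The sketch below applies this to the three districts using only the Python standard library; the function name `one_way_f` is an illustrative assumption, and the approach assumes every sample has the same number of observations.

```python
from statistics import mean, variance


def one_way_f(samples):
    """F = n * (variance of sample means) / (mean of sample variances).

    Assumes all samples have the same size n.
    """
    n = len(samples[0])
    sample_means = [mean(s) for s in samples]
    var_of_means = variance(sample_means)  # divides by k - 1
    mean_of_vars = mean([variance(s) for s in samples])  # each divides by n - 1
    return n * var_of_means / mean_of_vars


districts = [
    [81, 179, 142, 199, 124],   # District 1
    [100, 158, 272, 248, 62],   # District 2
    [295, 82, 155, 271, 212],   # District 3
]

f_cal = one_way_f(districts)
print(round(f_cal, 3), f_cal > 3.89)
```

The sample means are 145, 168, and 203, the variance of the means is 853, and the mean of the sample variances is about 5,992.33, so the sample F is about 0.712, well below the table value of 3.89.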
Try this!
Suppose that a typewriter manufacturer has prepared three different
study manuals for use by typists learning to operate an electronic
word processing typewriter. Each manual was studied by a simple
random sample of 5 typists. The time to achieve proficiency was
recorded for each typist, and the sample mean learning times for
manuals I, II, and III were 25, 20, and 27 hours, respectively. The
manufacturer wants to know whether the variation in sample means is
large enough to show that the population mean learning times for the
manuals are different.
Perform an ANOVA test at a 0.05 level of significance to determine
whether the mean learning times are equal.

| Manual-I | Manual-II | Manual-III |
|---|---|---|
| 21 | 17 | 31 |
| 27 | 25 | 28 |
| 29 | 20 | 22 |
| 23 | 15 | 30 |
| 25 | 23 | 24 |
Self test questions:
1. A research firm has tested four random samples of types of
light bulbs by keeping the bulbs lit until they burned out. The
following table gives burning times in tens of hours:
| Type 1 | Type 2 | Type 3 | Type 4 |
|---|---|---|---|
| 78 | 65 | 77 | 76 |
| 78 | 73 | 69 | 83 |
| 72 | 75 | 68 | 77 |
| 71 | 71 | 75 | 83 |
| 77 | 67 | 77 | 82 |
| 80 | 69 | 72 | 85 |
Perform ANOVA (use α = 0.05) to determine if the mean burning
times for the bulb types are equal.
2. A company has three manufacturing plants, and company
officials want to determine whether there is a difference in the
average age of workers at the three locations. The following
data are the ages of five randomly selected workers at each
plant:

Plant (age of employees)
| Plant 1 | Plant 2 | Plant 3 |
|---|---|---|
| 29 | 32 | 25 |
| 27 | 33 | 24 |
| 30 | 31 | 24 |
| 27 | 34 | 25 |
| 28 | 30 | 26 |
Perform an ANOVA test (use α = 0.01) to determine whether
there is a significant difference in the mean ages of the
workers at the three plants.
3. Hogos wanted to build a new store in three different kebelles in
Mekelle town. In an attempt to study whether the numbers of
potential customers are different for the three kebelles,
passerby counts were made for a random sample of 11 periods
at each kebelle. The sample means and sample variances are
given below:
| Kebelle | Sample mean | Sample variance |
|---|---|---|
| 16 | 125 | 288 |
| 17 | 141 | 248 |
| 19 | 124 | 304 |
Using α = 0.01, perform an ANOVA test to determine if the mean
numbers of passersby at the three kebelles are equal.

Answer key for Activities


Chapter one
Part one:
1. True
2. False
3. False
4. True
5. False
6. False

7. True
8. False

Part two:
1. 61 students; 105 and 246 students are the sample sizes
respectively
2. 60 is the sample size

Chapter two
1.
a. [Histogram: frequency of claims (vertical axis, 0 to 18) by claim class in birr: 10,000-19,999; 20,000-29,999; 30,000-39,999; 40,000-49,999; 50,000-59,999; 60,000-69,999; 70,000-79,999; 80,000-89,999]
b. [Frequency polygon: frequency of claims (vertical axis, 0 to 18) against claims in birr, for the same classes]
2. [Bar chart: number of visitors (vertical axis) by college year (1st to 10th)]
3. a. [Less-than ogive: cumulative frequency (vertical axis, 0 to 120) against turnover rates, classes 0 up to 5, 5 up to 10, 10 up to 15, 15 up to 20, 20 up to 25, 25 up to 30]
b. [Greater-than ogive: cumulative frequency (vertical axis, 0 to 120) against rates of turnover, for the same classes]

Chapter Three
1. a) = 20;
b) = 19.16; = 16;
c) = 45.8, 6.77, & C.V. = 33.85%
2. a. (marketing) = 1.93 & managerial stat 0.993; &
0.996
b. C.V marketing = 4.98%; C.V managerial statistics = 6.63%)
c. Marketing

3. a. C.V agro processing = 19.58% and C.V high school = 21.20%


b. The agro processing business is less risky

Chapter Four
1. The number of ways are:
a. 207,360 ways = (5! ×3! ×2! ×3!)
b. 3,628,800 ways = [(5+3+2)! = 10!]
2. $^{3}C_{2} \times {}^{4}C_{3} = 12$ ways
3. 5! × 6! = 86,400
4. $\frac{11!}{1!\,4!\,4!\,2!} = 34{,}650$
5. The probabilities are:

a.
i. 0.28
ii. 0.084
iii. 0.3
b.
i. 0.2
ii. 0
iii. 0
6.
a. $^{5}C_{5}(0.1)^{5}(0.9)^{0}$
b. $^{5}C_{0}(0.1)^{0}(0.9)^{5}$
c. $[^{5}C_{0}(0.1)^{0}(0.9)^{5}] + [^{5}C_{1}(0.1)^{1}(0.9)^{4}]$
d. $[^{5}C_{0}(0.1)^{0}(0.9)^{5} + {}^{5}C_{1}(0.1)^{1}(0.9)^{4} + {}^{5}C_{2}(0.1)^{2}(0.9)^{3} + {}^{5}C_{3}(0.1)^{3}(0.9)^{2}]$

7.
a. X  188.25
b. X  244.65
8. 1866.5


Chapter Five
1. a. [Scatter diagram: FTEs (y) plotted against number of beds (x)]
b. Y = 2.231x + 30.91
c.
d. They have a positive relationship

2.
a. [Scatter diagram: sales volume (y) plotted against number of students (x)]
Yes, they have a relationship (as shown in the diagram)
b. Y = 5x + 60
c. They have a positive relationship
d.
3. 0.8
4. 0.61

Chapter Six
1.
a. x̄ = $35,420
b. 35,208.59 to 35,631.41
c. We are 90% confident that µ is found in the above
determined confidence interval
2. There is not enough evidence to contradict the manufacturer's claim.
That is, H0 should not be rejected.

Chapter Seven
1. The sample means are 76, 70, 73, and 81, and the grand mean is 75;
the variance of the sample means is 22.
The sample variances are 13.2, 14.0, 15.6, and 13.2, and their mean
is 14.
F = 6(22)/14 = 9.43; reject H0 if F > 3.10
9.43 > 3.10 is true; therefore, reject H0
2. The sample means are 28.2, 32.0, and 24.8; the variance of the
sample means is 12.97, and the mean of the sample variances is 1.63.
Then, check if F-calculated is greater than the table value.
Here, we have F-calculated equal to 39.8, which is greater than
the table value 6.93; therefore H0 is rejected.
3. The variance of the sample means is 91, and the mean of the sample
variances is 280, so F-calculated = 11(91)/280 = 3.58.
Reject H0 if F-calculated is greater than the table value.
Here we have F-calculated equal to 3.58, which is less than 5.39.
Then we do not reject H0.

