Statistical Methods of Financial Accounting
Sepet-2013
CHAPTER ONE
AN OVERVIEW OF STATISTICS
1.1 INTRODUCTION
Example If we say that Mr. Alex’s age is 35, what sense would it, then, make?
Therefore, statements of fact must allow comparison with, and relationship to, other
facts.
Statistics are affected by a multiplicity of causes: statistical facts and figures are never
independent; rather, they are affected by a number of operating forces, which may not
be measurable.
Example The rate of transmission of HIV in the year 199x decreased by 2.5%. You may
ask why; possible reasons include attitude change, political commitment, education,
etc. Can you, then, measure the contribution of each of these factors to the aggregate
specifically and separately?
Statistics are numerically expressed: statements of fact such as “the Ethiopian economy
is growing” or “rural-to-urban influx of population will increase” are not statistics.
All statistics are numerical statements of fact, i.e., they are expressed in numbers.
Reasonable standards of accuracy: facts about a given phenomenon can be obtained either
through counting and measurement or through estimation. In many statistical enquiries,
however, it may be difficult to achieve 100% accuracy. Thus it is important to apply
reasonable standards of accuracy.
Statistical collections are systematic: data are not collected haphazardly; a
comprehensive and appropriate plan for data collection must be put in place.
Statistics are purposeful: they are collected with a definite purpose in mind.
Comparability: statistics must be placed in relation to each other so that meaningful
comparisons can be drawn.
Activity
Which of the following statements is statistics?
- The habit of chewing ‘chat’ among the youth in Mekelle Town is growing
- The number of ‘chat’-chewing youth increased by 20% in the year 199x, while
in the year 199x the rate was only 5%
The above definition and basic characteristics describe only the facts. However, it is
also important to know how the facts are built. This leads us to define statistics as a
series of logical procedures ranging from data collection to interpretation:
‘Statistics can be defined as the collection, presentation, analysis and
interpretation of numerical data’ [singular sense].
Ethio lens college Page 3
Statistics For Finance
Data Collection
It is the first step in any statistical investigation, and great care must be taken here
because it lays the foundation of statistical analysis. At this stage it is important to
identify the sources of data and the techniques that need to be put into effect so that
the data collected are useful.
Organization
This involves three tasks:
Editing – checking for omissions, inconsistencies, wrong computations, etc.
Classifying – arranging data into groups according to common attributes (homogeneity).
Tabulation – arranging data in columns and rows to ensure clarity.
Presentation
This involves presenting the organized data in diagrams and graphs such as pie charts,
bar graphs, histograms, etc.
Analysis
Once the first three procedural requirements have been met, we are in a position to
search the data for information useful to decision-makers. Notice that analysis and
interpretation are not one and the same: the former deals with the data themselves,
while the latter goes beyond them.
Interpretation
It is about drawing meaningful conclusions from the data collected and analyzed, on the
basis of which implementation measures are put forward to resolve the managerial
problem at hand.
1.2.1 Classification of Statistics
It helps in predicting the future – managers must plan for the future, which is
largely unknown to them. Knowledge of future trends is therefore the main tool
available for anticipating the future, and statistical methods make this possible.
It helps in policy formulation – many governmental data collections aim at
formulating appropriate policies, of whatever type, that help in the proper
administration of human and non-human resources.
1.2.2 Limitations of Statistics
- Statistics does not deal with individual measurements. Characteristically,
statistics deals only with aggregates of facts; these aggregates allow comparison
and the drawing of meaningful information. Therefore, individual measurements,
for instance ‘the number of students in New Millennium College is 2,500’, are
meaningless on their own and are hardly considered statistics.
- Statistics deals only with numerical characteristics. There are situations where
it may be difficult to express some characteristics quantitatively. In this case
the interpretation or conclusion would be subjective and not statistical.
- Statistical facts (results) are true only on average. Statistical undertakings
end at drawing a conclusion; however, the conclusions drawn are not universally
true.
- Statistics is only one of the methods of studying a problem. Since statistics does
not provide the best solution under all circumstances, its results should be
evaluated against other evidence.
- Statistics can be misused.
Check point!
Make sure that you have thoroughly understood the real sense of statistics as it applies
to many business decisions.
1.3 Data collection
1.3.1 Introduction
Once the purpose of the investigation is clearly spelled out, the scope well defined, the
unit of data collection decided, the sources, techniques, frame and degree of accuracy
comprehensively worked out, and other preliminary activities put in place, it is time to
move to the first step, called data collection. Data set the foundation for statistical
analysis. Therefore, the volume and quality of the data collected, and the way they are
collected, determine whether the resulting information is effective or not.
1.3.2 Sources of data
Generally there are two types of sources of data, namely Primary sources and
secondary sources.
Primary sources: these are sources that the investigator meets in person so as to
generate first-hand data. In other words, a primary source is one that itself collects
the data. The sources sought may vary depending on the nature of the enquiry.
Primary sources are used for the following reasons:
Secondary sources may contain errors committed while recording the primary data.
Primary sources show the data in greater detail.
Primary sources also allow tracing the procedures used in collecting the data as well as
in selecting the sample.
Thus, depending on the type of source, data can be classified into two:
Primary data – data that are originally and directly generated from the words and
actions (behavior) of the respondents. For example, if a manufacturing company is
collecting data from the users of its product on how well the product satisfies their
needs compared with competitors' products, then the primary sources are the customers
and the primary data are the customers' spoken responses and behavioral reactions.
Secondary data – unlike primary data, these are collected from already processed
documents such as journals, articles, government releases and other relevant documents.
Using secondary sources therefore saves the time and effort that would otherwise be
spent planning and executing a collection project. Moreover, secondary data are useful
when it is impossible to collect primary data. The limitations typically associated with
secondary data are those of ‘fit’ and ‘accuracy’.
Factors that determine the choice between primary and secondary data
The question as to whether to use primary data or secondary data is determined by the
following factors:
Nature and scope of the enquiry
at the gates of the library and sample the first 8 students entering the library.
This method can, rather, be used for pilot studies, before a final sampling design is
decided upon.
c. Quota Sampling: this is a type of judgment sampling where quotas are set up
according to a given criterion, but the selection of the sample units within the
prescribed quotas is made according to the personal judgment of the investigator.
For example, if you have to investigate the saving habits of 250 college students,
you may require that, out of the first 100 persons, 35 be 4th-year students, 30 be
3rd-year students, 20 be 2nd-year students and 15 be 1st-year students. This is what is
called quota sampling. Although the method is easy, it is highly subject to personal
bias, and consequently the sample may not be representative of the population.
Determining the size of a sample
The following inputs must be considered in order to determine the sample size for
a given study:
1. The desired precision level, that is, the magnitude of error the researcher is
willing to tolerate; for instance, you may allow a 5% margin of error.
2. The desired confidence level (Zq value), that is, the degree of confidence that
the decision maker wants to have in the interval estimate.
3. An estimate of the degree of variability in the population, expressed in the form of
a standard deviation.
Given the desired precision level H, the desired confidence level (with its Zq value) and
an estimate of the standard deviation S, we can write the following equation:
H = \frac{Z_q S}{\sqrt{n}}
Squaring both sides of the equation and solving for n, we can rewrite it as
n = \frac{Z_q^2 S^2}{H^2}
where n stands for the total number of items to be included in the sample.
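As a quick numerical check of the sample-size formula, the following Python sketch (an illustration, not part of the original material; the function names are our own) evaluates n = Zq²S²/H². It also rounds up, a common convention we assume here so that the achieved margin of error does not exceed H.

```python
import math


def sample_size(z: float, s: float, h: float) -> float:
    """Unrounded sample size n = (Z_q * S / H) ** 2."""
    return (z * s / h) ** 2


def sample_size_rounded_up(z: float, s: float, h: float) -> int:
    """Round up so the margin of error does not exceed H (assumed convention)."""
    return math.ceil(sample_size(z, s, h))


# Almeda Textiles: Z_q = 2.575, S = 100 birr, H = 10 birr
print(round(sample_size(2.575, 100, 10), 2))    # 663.06
print(sample_size_rounded_up(2.575, 100, 10))   # 664

# Mesfin Industrial Engineering: Z_q = 1.96, S = 11.45 million, H = 4.5 million
print(round(sample_size(1.96, 11_450_000, 4_500_000), 2))   # 24.87
print(sample_size_rounded_up(1.96, 11_450_000, 4_500_000))  # 25
```

Rounding down instead would give a sample slightly too small for the stated margin, which is why the sketch uses `math.ceil`.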
Example A marketing manager of Almeda Textiles wants to estimate the average
annual amount that families in a certain locality spend on local textiles. He
wants the estimate to be within a margin of error of 10 birr. When such an interval
n = \frac{Z_q^2 S^2}{H^2} = \frac{(2.575)^2 (100)^2}{10^2} \approx 663 \text{ families}
Example The executive committee of Mesfin Industrial Engineering wants 95%
confidence when dealing with the Assembly Contract Provision that it is planning to
take over. However, industry experience shows that such major takeovers may
involve a market risk of about 4.5 million. Moreover, the computed standard
deviation for the same industry amounts to 11.45 million.
The executive committee wants to know the expected return on its decision. How
many cases (in the industry) should the committee consider for the analysis?
Solution
In the above example the desired confidence level is 95%; its Zq value from the normal
table is 1.96. Taking H = 4,500,000 and S = 11,450,000,
n = \frac{Z_q^2 S^2}{H^2} = \frac{(1.96)^2 (11{,}450{,}000)^2}{(4{,}500{,}000)^2} \approx 24.87
so the committee should consider about 25 cases (rounding up).
Review Questions
CHAPTER TWO
CLASSIFICATION AND PRESENTATION OF DATA
Introduction
Dear distance learner, once data are collected they are not in usable form until
further value-adding activities are undertaken. Putting the bulk of data into an easily
manageable and compact form, in line with the research structure and the objective of
the study, is an important logical statistical phase. This process is called
presentation. Another important complementary task is organization, which deals with
editing, classification and tabulation.
Student learning objectives
At the end of this section you should be able to
- Understand the meaning of data classification and presentation
- Construct the various types of tables, graphs and charts
- Apply these constructions to real-world business decisions
375, 525, 1200, 150, 700, , 700, 525, 375, 375, 700, 150, 150, 525, 150, 1200. This can be
listed in tabular form, with the frequency of occurrence assigned to each value, as
shown below:
Sales volume (birr)   Frequency
150                   4
375                   4
525                   5
700                   5
1200                  2
Total                 20
Exercise
Students in the New Millennium College have been asked to rate the college’s service
program. The results are: Good, Excellent, Very good, Very good, Very good, Good,
Moderate, Excellent, Very good, Good, Very good, Moderate, Excellent. The students’
reactions can be summarized as follows:
Students’ reaction   Frequency
Excellent            3
Very good            5
Good                 3
Moderate             2
Poor                 0
Total                13
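Counting category frequencies is easy to automate. This Python sketch (illustrative only, not part of the original material) tallies the students’ ratings with `collections.Counter`, which returns 0 for a category that never occurs, such as Poor.

```python
from collections import Counter

# Ratings exactly as listed in the exercise
ratings = ["Good", "Excellent", "Very good", "Very good", "Very good", "Good",
           "Moderate", "Excellent", "Very good", "Good", "Very good",
           "Moderate", "Excellent"]

categories = ["Excellent", "Very good", "Good", "Moderate", "Poor"]
counts = Counter(ratings)

for category in categories:
    # Counter returns 0 for a category that never appears (e.g. "Poor")
    print(f"{category:<10} {counts[category]}")
print(f"{'Total':<10} {sum(counts.values())}")
```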
Steps to construct a frequency distribution (grouped data)
Generally, the following steps should be followed in constructing a frequency
distribution table:
a. Determine the number of classes, usually between 5 and 15.
b. Determine the size (width) of each class: find the difference between the largest
and the smallest values in the data set and divide it by the number of classes
desired.
c. Determine the starting point (lower limit) of the first class.
d. Prepare a table of the distribution using the actual counts or percentages (relative
frequencies).
Exercise
As part of a financial policy and pay system reform project, ROSE Consulting Group
has been investigating the monthly income of the employees of its client company,
CLEAR BLUE. The following results were obtained: 545, 545, 545, 675, 545, 690, 690,
675, 1450, 1200, 545, 870, 870, 375, 454, 400, 600, 900, 955, 640, 1125, 1000, 1040, 755,
790, 850, 775, 1075, 690, 650.
Construct the frequency distribution table.
Solution
Determine the number of classes you want. Let’s assume 5 classes.
Determine the size of each class.
First, find the range of the data by subtracting the lowest value from the highest value:
the highest value is 1450 birr and the lowest value is 375 birr, so the range (R) is
1450 – 375 = 1075. Second, divide the range by the number of classes chosen:
\text{Class size (width)} = \frac{\text{range}}{\text{total number of classes assumed}} = \frac{1075}{5} = 215
Then you can start constructing the intervals by determining the lower limit of the first
class. Assuming a lower class limit of 375, the class intervals will be as follows:
Monthly income   Tally         Frequency
375 – 590        IIIIIIII      8
590 – 805        IIIIIIIIIII   11
805 – 1020       IIIIII        6
1020 – 1235      IIII          4
1235 – 1450      I             1
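The steps above can be sketched in Python for the CLEAR BLUE incomes. This is an illustration (function name is our own) under one assumption the text leaves open: each class includes its lower limit and excludes its upper limit, except the last class, which also includes the maximum. No income in this data set falls exactly on an inner class limit, so the choice does not affect the counts here.

```python
incomes = [545, 545, 545, 675, 545, 690, 690, 675, 1450, 1200, 545, 870, 870,
           375, 454, 400, 600, 900, 955, 640, 1125, 1000, 1040, 755, 790, 850,
           775, 1075, 690, 650]


def grouped_frequencies(data, k):
    """Split [min, max] into k equal-width classes and count the observations.

    Assumed convention: a class includes its lower limit and excludes its upper
    limit, except the last class, which also includes the maximum value.
    """
    low, high = min(data), max(data)
    width = (high - low) / k                 # class size = range / number of classes
    counts = [0] * k
    for x in data:
        index = int((x - low) // width)      # which class x falls into
        counts[min(index, k - 1)] += 1       # the maximum goes into the last class
    limits = [(low + i * width, low + (i + 1) * width) for i in range(k)]
    return width, limits, counts


width, limits, counts = grouped_frequencies(incomes, 5)
for (lower, upper), f in zip(limits, counts):
    print(f"{lower:.0f} - {upper:.0f}: {f}")
```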
2.2 Cumulative Frequency Distribution
A cumulative frequency distribution, unlike a simple frequency distribution, gives the
total number of items or observations that fall above or below a certain point. Thus, if
you would like to know the total number of observations that fall below or above a
given point, you can use the cumulative frequency distribution. For example, construct
a cumulative frequency distribution for the above example of ROSE Consulting Group:
Monthly income   Frequency   Cumulative frequency
375 – 590        8           8
590 – 805        11          19
805 – 1020       6           25
1020 – 1235      4           29
1235 – 1450      1           30
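The running totals in the cumulative column can be produced with `itertools.accumulate`, as in this short Python sketch (illustrative only) using the frequencies from the table above.

```python
from itertools import accumulate

classes = ["375-590", "590-805", "805-1020", "1020-1235", "1235-1450"]
frequencies = [8, 11, 6, 4, 1]

cumulative = list(accumulate(frequencies))  # running total of the frequencies
for label, f, cf in zip(classes, frequencies, cumulative):
    print(f"{label:<10} {f:>3} {cf:>3}")
```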
Exercise
Given the number of visitors to the Mekelle Museum of Martyrs, as reported by the
authorities, for the classes 24-45; 45-66; 66-87; 87-108; 108-129; 129-150 and their respective
Construct
A. Frequency distribution table
B. Cumulative frequency distribution table
C. Relative frequency distribution
Activity
Using the data on the monthly income of the 30 employees, construct the relative
frequency distribution and a pie chart showing the relative importance of each class:
Income levels   Frequency   Cumulative frequency   Relative frequency
375-590         8           8                      8/30
590-805         11          19                     11/30
805-1020        6           25                     6/30
1020-1235       4           29                     4/30
1235-1450       1           30                     1/30
Then construct the pie chart representing the percentage (relative importance) of each
of the above income classes.
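For the pie chart, each class’s slice angle is its relative frequency multiplied by 360 degrees. A small Python sketch (illustrative; exact fractions are kept with `fractions.Fraction`, which prints 8/30 in lowest terms as 4/15):

```python
from fractions import Fraction

classes = ["375-590", "590-805", "805-1020", "1020-1235", "1235-1450"]
frequencies = [8, 11, 6, 4, 1]
total = sum(frequencies)

relative = [Fraction(f, total) for f in frequencies]  # exact, e.g. 8/30 -> 4/15
angles = [float(r * 360) for r in relative]            # pie slice in degrees

for label, r, angle in zip(classes, relative, angles):
    print(f"{label:<10} {str(r):>6} {float(r):7.2%} {angle:6.1f} deg")
```

The angles add up to 360 degrees because the relative frequencies add up to 1.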
2.3.2 Bar charts
A bar chart is another common method for graphically representing nominal- and
ordinal-scaled data. The height of each bar is proportional to the number of items in
each category. The bars are separated and positioned vertically with their bases on the
x-axis. Take, for example, the above market share values of the five industries in the
economy; the bar chart representation will be as follows:
2.3.3 Histogram
The histogram is frequently used to graphically present interval and ratio data. In this
graphing method, the categories or classes are plotted along the horizontal axis of the
graph, and the numerical value (frequency) of each class is represented by a vertical
bar. The bars are not separated; adjacent bars indicate that a numerical range is being
summarized by showing the frequencies in arbitrarily chosen classes.
Again, take the above market share values representing each industry in the economy;
the histogram representation will be:
[Figure: histogram; vertical axis: Number of voters; horizontal axis: Age group]
2.3.5 Ogive
A graph of a cumulative frequency distribution is called an ogive. It is used when one
wants to determine how many observations lie above or below a certain value in a
distribution. A less-than ogive shows how many items in the distribution have a value
less than the upper class limit of each class.
First, a cumulative frequency distribution (CFD) is constructed. Next, the cumulative
frequencies are plotted at the upper class limit of each category. Finally, the points are
connected with straight lines to form the ogive curve.
Exercise
The following table shows the weight distribution of 20 heavyweight boxers:
Weight     Frequency   Cumulative frequency distribution
                       less-than   greater-than
110-125    3           3           20
125-140    6           9           17
140-155    5           14          11
155-170    4           18          6
170-185    2           20          2
Then you can draw the less-than curve, plotting each less-than cumulative frequency at the upper class limit of its class:
[Figure: less-than ogive; vertical axis: Cumulative frequency; horizontal axis: Weight (classes 110-125 to 170-185)]
A more-than (greater-than) ogive shows how many items in the distribution have a value
greater than or equal to the lower limit of a particular class:
[Figure: more-than ogive; vertical axis: Cumulative frequency; horizontal axis: Weight (classes 110-125 to 170-185)]
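The points for both ogives of the boxers’ table can be computed rather than worked out by hand. In this Python sketch (an illustration, not the text’s own method), the less-than counts are running totals plotted at the upper class limits, and the greater-than count of a class is the total minus all observations in earlier classes, plotted at the lower class limit. Any plotting tool could then join the points with straight lines.

```python
from itertools import accumulate

lower_limits = [110, 125, 140, 155, 170]
upper_limits = [125, 140, 155, 170, 185]
frequencies = [3, 6, 5, 4, 2]
n = sum(frequencies)

# Less-than: running totals, plotted at each class's upper limit
less_than = list(accumulate(frequencies))
# Greater-than (or equal): total minus everything in earlier classes,
# plotted at each class's lower limit
greater_than = [n - cf + f for cf, f in zip(less_than, frequencies)]

less_than_points = list(zip(upper_limits, less_than))
greater_than_points = list(zip(lower_limits, greater_than))
print(less_than_points)
print(greater_than_points)
```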
Activity
Given the following information on the monthly apartment rental rates for 200
apartments, construct:
Histogram of the distribution
Frequency polygon
Frequency curve (Ogive):
Greater than
Less than
a. Construct a histogram
b. Draw a frequency polygon
2. A librarian in the Mekelle City Public Library has been tallying the number of
college students visiting her library by their number of years in college and came
up with the following data:
CHAPTER THREE
MEASURES OF CENTRAL TENDENCY
Introduction
In calculating summary values for a data set, the first consideration is to find a
central, or typical, value for the data. Three important measures of central tendency
are presented in this section: the mean, the median and the mode. With these measures
we can summarize a large volume of data by a single value that characterizes the data.
Moreover, measures of variation (dispersion) are used to describe how widely the data
are spread around these central values.
Chapter objectives
By the end of this chapter, students should be able to:
Understand the use of the measures of central tendency
Calculate the different measures of central tendency
Understand and calculate the measures of variation
3.1 Measures of central tendency
Arithmetic mean
The arithmetic mean of a collection of numerical values is the sum of these values
divided by the number of values. The symbol for the population mean is the Greek
letter μ (mu), and the symbol for a sample mean is x̄ (x-bar):
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}   (for ungrouped data)
\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{N}, \quad N = \sum f_i   (for grouped data)
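Both forms of the mean translate directly into code. A minimal Python sketch (function names are our own); the sample data are the five test scores and the grouped student ages used later in this chapter.

```python
def mean_ungrouped(values):
    """x-bar = (sum of x_i) / n"""
    return sum(values) / len(values)


def mean_grouped(midpoints, frequencies):
    """x-bar = (sum of f_i * x_i) / N, where N is the total frequency."""
    total = sum(frequencies)
    return sum(f * x for x, f in zip(midpoints, frequencies)) / total


print(mean_ungrouped([70, 50, 81, 67, 59]))  # 65.4
print(mean_grouped([16.5, 17.5, 18.5, 19.5, 20.5, 21.5, 22.5],
                   [4, 14, 18, 28, 20, 12, 4]))  # 19.48
```

For grouped data the class midpoints stand in for the individual values, so the grouped mean is only an approximation of the mean of the raw data.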
Weighted Mean
The simple arithmetic mean described above gives equal importance, or weight, to all
data items. However, data items may not all be equally important, which calls for
different treatment. The mean obtained by taking the weight of each observation into
account is known as the weighted arithmetic mean (or weighted mean).
Suppose the weights are P1, P2, P3, …, Pn and the data values are X1, X2, X3, …, Xn;
then the weighted mean is given by
\bar{X}_w = \frac{\sum_{i=1}^{n} P_i X_i}{N} = \frac{P_1 X_1 + P_2 X_2 + P_3 X_3 + \cdots + P_n X_n}{N}
Where P_i is the weight given to each observation
X_i is the data item
N = \sum P_i is the sum of the weights
Example
Tadesse registered for 5 courses of 4, 3, 3, 3 and 3 credits, respectively, in the second
semester of a given year. At the end he scored A, B, A, D and C, respectively. Taking
A = 4, B = 3, C = 2, D = 1 and F = 0, compute his weighted mean.
Solution
The grade points are the data values and the credits are the weights, so
N = 4 + 3 + 3 + 3 + 3 = 16:
\bar{X}_w = \frac{4(4) + 3(3) + 3(4) + 3(1) + 3(2)}{4 + 3 + 3 + 3 + 3} = \frac{16 + 9 + 12 + 3 + 6}{16} = \frac{46}{16} = 2.875
Example
A marketing manager of ‘silas’ and family Plc has come up with 5 major economic
courses of action, which would result in returns on investment of 20, 21, 24, 22 and 27
(in millions). However, because of uncertainty, the manager has assigned different
probability weights of occurrence: 5, 4.5, 3, 3.5 and 2.75. Compute the weighted
mean.
Solution
Since probabilities must sum to 1, we first convert the weights into probabilities by
dividing each by their total, 5 + 4.5 + 3 + 3.5 + 2.75 = 18.75:
5/18.75 = 0.2667
4.5/18.75 = 0.24
3/18.75 = 0.16
3.5/18.75 = 0.1867
2.75/18.75 = 0.1467
\bar{X}_w = \frac{0.2667(20) + 0.24(21) + 0.16(24) + 0.1867(22) + 0.1467(27)}{1} \approx 22.28 \text{ million}
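The weighted mean divides the weighted sum by the sum of the weights, so the weights do not have to be probabilities to begin with. A Python sketch applying this to the two examples above (the function name is our own):

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    total_weight = sum(weights)
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Tadesse: grade points weighted by credit hours
grade_points = [4, 3, 4, 1, 2]   # A, B, A, D, C
credits = [4, 3, 3, 3, 3]
print(weighted_mean(grade_points, credits))  # 2.875

# 'silas' and family Plc: raw weights are normalized automatically
# because the function divides by their total (18.75)
returns = [20, 21, 24, 22, 27]
weights = [5, 4.5, 3, 3.5, 2.75]
print(round(weighted_mean(returns, weights), 2))  # 22.28
```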
Activity:
A company earned sales of $340,000 at the end of the year 1997 from the sale of
three bike models: MA1, MA2 and MA3. Each unit of model MA1 sells for $1,750, each
unit of model MA2 sells for $1,400 and each unit of model MA3 sells for $1,150. The
company sold 50, 90 and 110 units of each model, respectively. Compute the
weighted mean price.
Geometric Mean
It is unrealistic to assume that figures (quantities) will remain the same. They change
over time, and we may be interested in finding the average rate of change over a
period of time. The measure used for this is called the geometric mean (GM):
GM = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n}, \text{ i.e., the } n\text{th root of the product of all } x \text{ values}
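A minimal Python sketch of the geometric mean, assuming all values are positive (for rates of change, use growth factors such as 1 + r). The growth factors below are hypothetical, not the omitted data of the exercise that follows. Python 3.8+ also offers `statistics.geometric_mean`, which should agree.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.prod(values) ** (1 / len(values))


# Hypothetical example: a price grows 10% in one period and 20% in the next.
growth_factors = [1.10, 1.20]
average_factor = geometric_mean(growth_factors)
print(f"average growth rate per period: {average_factor - 1:.2%}")
```

For a falling price the factors are below 1 (for example 0.95 for a 5% fall), and 1 minus the geometric mean gives the average rate of fall.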
Compute the average rate of fall of the selling price over the four months.
(ANS 6.94%)
The Median (sometimes called the counting average)
The median is a single value from the data set that measures the central item in the
data: it is the middlemost, or most central, item in the set of numbers. Half of the
items lie above this point, and the other half lie below it.
To find the median, we first arrange the data in ascending or descending order. Once
ordered, the median is the middle value (if the number of observations is odd) or the
average of the two middle items (if the number of items is even).
Calculation of median
1) Ungrouped data: Median = the \left(\frac{n+1}{2}\right)^{th} item in a data array
2) Grouped data: \tilde{x} = L_m + \frac{\frac{n+1}{2} - (F + 1)}{f_m} w
Where:
x̃ is the sample median
n is the total number of items in the distribution
F is the sum of all class frequencies up to, but not including, the median class
f_m is the frequency of the median class
w is the class interval width
L_m is the lower limit of the median class
The Dashen Bank Mekelle Branch has disclosed the distribution of its customers’
monthly balances in the following table:
Class interval (Birr)   Frequency
0-49.99                 78
50-99.99                123
100-149.99              187
150-199.99              82
200-249.99              51
250-299.99              47
300-349.99              13
350-399.99              9
400-449.99              6
450-499.99              4
Total                   600
Solution
Using the first method, the \frac{n+1}{2} = \frac{600+1}{2} = 300.5^{th} item is the centermost one. (You can
take it as lying between the 300th and the 301st items.)
Add the frequencies to locate the class that contains this centermost item
(78 + 123 + 187 = 388); this shows that the item is in the 3rd class, 100-149.99.
L_m (the lower limit of the median class) = 100
n (number of observations) = 600
F (sum of all frequencies up to the median class) = 78 + 123 = 201
w (class interval width) = 149.99 – 100 = 49.99 ≈ 50
f_m (frequency of the median class) = 187
Substituting these values in the formula, we have
\tilde{x} = L_m + \frac{\frac{n+1}{2} - (F + 1)}{f_m} w = 100 + \frac{\frac{600+1}{2} - (201 + 1)}{187}(50)
\tilde{x} = 100 + \frac{300.5 - 202}{187}(50) = 100 + \frac{98.5}{187}(50) = 100 + (0.5267)(50) = 100 + 26.34
x̃ ≈ 126.34 is the sample median.
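The ungrouped rule and the grouped formula used in this text can be sketched in Python as follows (function names are our own). Note that many other textbooks use the variant L + (n/2 - F)/f × w, which gives a slightly different value for the same data (about 126.47 for the Dashen Bank table).

```python
def median_ungrouped(values):
    """Middle value of the ordered data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def median_grouped(lower_limits, width, frequencies):
    """Median = L_m + ((n + 1)/2 - (F + 1)) / f_m * w, the form used in this text.

    lower_limits: lower limit of each class; width: common class width w.
    """
    n = sum(frequencies)
    position = (n + 1) / 2          # position of the median item
    cumulative = 0                  # F: total frequency before the current class
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= position:  # the median item falls in this class
            return lower + (position - (cumulative + 1)) / f * width
        cumulative += f
    raise ValueError("no data")


# Dashen Bank balances (class width taken as 50 birr)
dashen_median = median_grouped(
    [0, 50, 100, 150, 200, 250, 300, 350, 400, 450], 50,
    [78, 123, 187, 82, 51, 47, 13, 9, 6, 4])
# Lake 'Hashenge' fish weights (class width 25)
hashenge_median = median_grouped([0, 25, 50, 75, 100], 25, [5, 13, 16, 8, 6])
print(round(dashen_median, 2), round(hashenge_median, 2))
```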
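Example
The following data represent the weights of fish caught in Lake ‘Hashenge’ by a local
fisherman:
Class        Frequency
0-24.9       5
25-49.9      13
50-74.9      16
75-99.9      8
100-124.9    6
Total        48
Here n = 48, so the median is the (48 + 1)/2 = 24.5th item. The cumulative frequencies
(5, 18, 34, 42, 48) show that it lies in the class 50-74.9, so L_m = 50, F = 5 + 13 = 18,
f_m = 16 and w = 25:
\tilde{x} = L_m + \frac{\frac{n+1}{2} - (F + 1)}{f_m} w = 50 + \frac{\frac{48+1}{2} - (18 + 1)}{16}(25)
\tilde{x} = 50 + \frac{24.5 - 19}{16}(25) = 50 + \frac{5.5}{16}(25) = 50 + (0.34375)(25) = 50 + 8.59
x̃ ≈ 58.59 is the median item.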
Activity
AWASH Insurance S.Co has presented the following table of claims by its customers for
vehicle accidents in the last fiscal year:
Amount of claims (in birr)   Frequency
0-249.99                     52
250-499.99                   337
500-749.99                   1,066
750-999.99                   1,776
1000-1249.99                 1,492
Compute the median.
The Mode (observed average) x̂
Sometimes you may want to know the value that occurs most often. The value with the
largest number of occurrences is called the mode, or modal value. It can also be
defined as the value around which the items are most closely concentrated.
Graphically, the most typical or fashionable value of a distribution can be shown as
follows:
[Figure: frequency curve (axes x and y) with the mode x̂ marked on the x-axis below the peak]
Calculation of mode
If the distribution is ungrouped, the item with the greatest frequency is selected as
the modal value.
However, if the distribution is grouped, the following formula is used:
\hat{x} = L_{mo} + \frac{\Delta_1}{\Delta_1 + \Delta_2} w
Where x̂ is the mode
L_mo is the lower limit of the modal class
Δ1 is the difference between the frequency of the modal class and the frequency
of the pre-modal class
Δ2 is the difference between the frequency of the modal class and the frequency
of the post-modal class
w is the class interval of the modal class
Example Consider the following table of the income distribution of 300 workers of
Messebo Center Factory:
Income interval   Frequency
100-149.5         12
150-199.5         14
200-249.5         27
250-299.5         58
300-349.5         72
350-399.5         63
400-449.5         36
450-499.5         18
Solution
Locate the class with the greatest frequency; in this case 300-349.5 (frequency 72) is
the modal class.
Then L_mo = 300
Δ1 = 72 – 58 = 14
Δ2 = 72 – 63 = 9
w = 349.5 – 300 = 49.5 ≈ 50
\hat{x} = L_{mo} + \frac{\Delta_1}{\Delta_1 + \Delta_2} w = 300 + \frac{14}{14 + 9}(50) = 300 + \frac{14}{23}(50)
\hat{x} = 300 + (0.6087)(50) = 300 + 30.43 = 330.43
Example
The following were the grade score points of 60 students in their managerial statistics
course:
Score          Frequency
35 – 44.99     10
45 – 54.99     12
55 – 64.99     18
65 – 74.99     13
75 – 84.99     7
Compute the modal score point.
Solution
From the table, the class with the largest frequency is 55 – 64.99 (18).
Then L_mo = 55
Δ1 = 18 – 12 = 6
Δ2 = 18 – 13 = 5
w = 64.99 – 55 = 9.99 ≈ 10
\hat{x} = L_{mo} + \frac{\Delta_1}{\Delta_1 + \Delta_2} w = 55 + \frac{6}{6 + 5}(10) = 55 + \frac{6}{11}(10)
\hat{x} = 55 + (0.5455)(10) = 55 + 5.45 = 60.45
x̂ ≈ 60.45 is the modal value.
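The grouped-mode formula as a Python sketch (the function name is our own). It assumes equal class widths and treats a missing neighbouring class (when the modal class is the first or last class) as having frequency 0, a convention the text does not state.

```python
def mode_grouped(lower_limits, width, frequencies):
    """Mode = L_mo + d1 / (d1 + d2) * w.

    d1 = f(modal) - f(previous class); d2 = f(modal) - f(next class).
    A missing neighbour (modal class at either end) is treated as frequency 0,
    which is an assumption, not something the text specifies.
    """
    i = max(range(len(frequencies)), key=frequencies.__getitem__)  # first largest
    f_modal = frequencies[i]
    f_prev = frequencies[i - 1] if i > 0 else 0
    f_next = frequencies[i + 1] if i + 1 < len(frequencies) else 0
    d1 = f_modal - f_prev
    d2 = f_modal - f_next
    if d1 + d2 == 0:
        raise ValueError("formula undefined: neighbouring classes tie with the modal class")
    return lower_limits[i] + d1 / (d1 + d2) * width


# Messebo workers' incomes (class width taken as 50)
messebo_mode = mode_grouped([100, 150, 200, 250, 300, 350, 400, 450], 50,
                            [12, 14, 27, 58, 72, 63, 36, 18])
# Managerial statistics scores (class width taken as 10)
score_mode = mode_grouped([35, 45, 55, 65, 75], 10, [10, 12, 18, 13, 7])
print(round(messebo_mode, 2), round(score_mode, 2))
```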
Which method to use
Generally, when a distribution is symmetrical with only one mode, the mean, the median
and the mode are equal, and any one of them can be used. However, if the distribution
is skewed, the median is the best measure.
The following guidelines can also be used to determine which measure to use:
If the numerical data have no extreme values, the mean can be used.
If the numerical data have extreme value(s), or if the data are non-numerical but
can be arranged in some order, then the median is the best measure.
If the data are non-numerical and cannot be ordered in any way, the mode is the
best measure.
Attention! Karl Pearson’s empirical relationship, which holds approximately for
moderately skewed distributions, can also be used to estimate any one of the mean, the
median and the mode from the other two:
Mode = Mean – 3 (Mean – Median)
i.e., Mode = 3 Median – 2 Mean
Median = Mode + 2/3 (Mean – Mode)
Example It has been reported by the Bureau for Trade and Investment of Mekelle town
that, out of the total investment certifications of 200 projects, 51.5 percent account
for 20.5 million birr (the modal investment), and that the average capital investment
of the 200 investment projects was 22.4 million birr. Find the median investment.
\tilde{x} = \hat{x} + \frac{2}{3}(\bar{x} - \hat{x}) = 20{,}500{,}000 + \frac{2}{3}(22{,}400{,}000 - 20{,}500{,}000)
\tilde{x} = 20{,}500{,}000 + \frac{2}{3}(1{,}900{,}000) = 20{,}500{,}000 + 1{,}266{,}666.7
\tilde{x} = 21{,}766{,}666.7 \text{ birr}
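Pearson’s relationships can be wrapped in two small Python helpers (names are our own); remember that they are empirical approximations, not exact identities.

```python
def median_from_mean_and_mode(mean, mode):
    """Pearson (approximate): median = mode + 2/3 * (mean - mode)."""
    return mode + 2 / 3 * (mean - mode)


def mode_from_mean_and_median(mean, median):
    """Pearson (approximate): mode = 3 * median - 2 * mean."""
    return 3 * median - 2 * mean


# Mekelle investment example: mean 22.4 million, mode 20.5 million birr
median_investment = median_from_mean_and_mode(22_400_000, 20_500_000)
print(round(median_investment, 1))  # 21766666.7
```

Feeding the estimated median back into the second helper recovers the original mode, which is a quick consistency check on the two forms.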
Activity
The following table is the age distribution of residents of kebele 20 in Mekelle town:
Class          Frequency
47 – 51.9      4
52 – 56.9      9
57 – 61.9      13
62 – 66.9      42
67 – 71.9      39
72 – 76.9      20
77 – 81.9      9
Required
Compute a. the mean age of the residents
b. the median age of the residents
c. the modal age of the residents
Age   Calorie requirement
15 150-174.5
18 175-199.5
21 200-224.5
24 225-249.5
27 250-274.5
30 275-299.5
M.D._{\tilde{x}} = \frac{\sum |x_i - \tilde{x}|}{n} \text{ (ungrouped data), or } M.D._{\tilde{x}} = \frac{\sum f_i |x_i - \tilde{x}|}{N} \text{ (for grouped data)}
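Example
The following are a student’s 5 scores: 70, 50, 81, 67, 59. Compute the mean
deviation from the mean and from the median.
Solution
First compute the mean of the sample:
\bar{x} = \frac{\sum X_i}{n} = \frac{70 + 50 + 81 + 67 + 59}{5} = \frac{327}{5} = 65.4
Arranged in order (50, 59, 67, 70, 81), the data have the median x̃ = 67. Then the
absolute deviations are:
X      |X – x̄|                |X – x̃|
70     |70 – 65.4| = 4.6      |70 – 67| = 3
50     |50 – 65.4| = 15.4     |50 – 67| = 17
81     |81 – 65.4| = 15.6     |81 – 67| = 14
67     |67 – 65.4| = 1.6      |67 – 67| = 0
59     |59 – 65.4| = 6.4      |59 – 67| = 8
Total  Σ|X – x̄| = 43.6        Σ|X – x̃| = 42
M.D._{\bar{x}} = \frac{\sum |X - \bar{x}|}{n} = \frac{43.6}{5} = 8.72, \qquad M.D._{\tilde{x}} = \frac{\sum |X - \tilde{x}|}{n} = \frac{42}{5} = 8.4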
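The mean deviation about any chosen center is the average absolute distance from it. A Python sketch (the function name is our own) using `statistics.mean` and `statistics.median` for the five scores:

```python
from statistics import mean, median


def mean_deviation(values, center):
    """Average absolute deviation of the values from a chosen center."""
    return sum(abs(x - center) for x in values) / len(values)


scores = [70, 50, 81, 67, 59]
md_mean = mean_deviation(scores, mean(scores))      # about the mean, 65.4
md_median = mean_deviation(scores, median(scores))  # about the median, 67
print(round(md_mean, 2), round(md_median, 2))       # 8.72 8.4
```

The mean deviation about the median is never larger than the mean deviation about the mean, because the median minimizes the sum of absolute deviations.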
Example
The sales records of ABC Trading show the values 9, 12, 14, 11, 8, 5.5, 15, 8.5, 9.5 and
10.75, with frequencies 4, 2, 5, 3, 6, 6, 7, 4, 6 and 4, respectively.
Find the mean deviation from the mean.
Solution
Sales (Xi)   Frequency (fi)   fiXi
9            4                36
12           2                24
14           5                70
11           3                33
8            6                48
5.5          6                33
15           7                105
8.5          4                34
9.5          6                57
10.75        4                43
Total        47               483
The mean of the distribution is given by
\bar{x} = \frac{\sum f_i X_i}{\sum f_i} = \frac{483}{47} \approx 10.28
Then the absolute deviations are:
Xi      |Xi – x̄|               fi    fi|Xi – x̄|
9       |9 – 10.28| = 1.28      4     5.12
12      |12 – 10.28| = 1.72     2     3.44
14      |14 – 10.28| = 3.72     5     18.60
11      |11 – 10.28| = 0.72     3     2.16
8       |8 – 10.28| = 2.28      6     13.68
5.5     |5.5 – 10.28| = 4.78    6     28.68
15      |15 – 10.28| = 4.72     7     33.04
8.5     |8.5 – 10.28| = 1.78    4     7.12
9.5     |9.5 – 10.28| = 0.78    6     4.68
10.75   |10.75 – 10.28| = 0.47  4     1.88
Total                           47    118.40
M.D._{\bar{x}} = \frac{\sum f_i |X_i - \bar{x}|}{N} = \frac{118.4}{47} \approx 2.52
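For grouped data the absolute deviations are weighted by the frequencies. This Python sketch uses the ABC Trading data with the unrounded mean 483/47, so its result can differ from a hand computation in the second decimal place; it also computes the coefficient of mean deviation (the mean deviation divided by the mean).

```python
sales = [9, 12, 14, 11, 8, 5.5, 15, 8.5, 9.5, 10.75]
freqs = [4, 2, 5, 3, 6, 6, 7, 4, 6, 4]

N = sum(freqs)                                            # total frequency
x_bar = sum(f * x for x, f in zip(sales, freqs)) / N      # weighted mean
md = sum(f * abs(x - x_bar) for x, f in zip(sales, freqs)) / N
coefficient = md / x_bar                                  # coefficient of mean deviation

print(N, round(x_bar, 2), round(md, 2), f"{coefficient:.2%}")
```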
Activity
Given below is the age distribution of a group of runners (8 distinct ages):
Age Frequency
15 15
45 10
29 30
40 10
48 9
32 12
42 11
65 4
Example Given a mean deviation (about the mean) of 2.52 and a mean
value of 10.28, the coefficient of mean deviation will be
C_{md} = \frac{M.D._{\bar{x}}}{\bar{x}} = \frac{2.52}{10.28} = 0.2451 = 24.51\%
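\bar{x} = \frac{\sum f_i X_i}{\sum f_i} = \frac{4610}{39} = 118.21
M.D._{\bar{x}} = \frac{\sum f_i |X_i - \bar{x}|}{N} = \frac{611.79}{39} = 15.69
C_{md} = \frac{M.D._{\bar{x}}}{\bar{x}} = \frac{15.69}{118.21} = 0.1327 = 13.27\%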
Activity
The number of visitors to the Mekelle museum per day in the month of ‘Hamle’
is given below:
Visitors   Frequency
15 6
20 4
22 12
24 5
10 3
Compute the mean, mean deviation, and the coefficient of mean
deviation.
Variance and Standard Deviation
These are other measures of dispersion that are often used in many
areas of interest, particularly in business. Variance and standard
deviation are powerful measures of dispersion which take into account
how all the observations in the data are distributed and take each
value of the data into consideration. If the data are reasonably close
to the center (the mean), we say that there is little variability or
dispersion in the data. On the other hand, if the data are widely
dispersed, at a considerable distance from the center, we say that the
data are highly variable.
Their measure is given by:
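(Ungrouped data)
Population variance:
\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}, \text{ or } \sigma^2 = \frac{N \sum x_i^2 - \left(\sum x_i\right)^2}{N^2}
Sample variance:
s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}, \text{ or } s^2 = \frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)}
Population and sample standard deviation:
\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}, \qquad s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}
Where
σ² = population variance
s² = sample variance
N = total number of observations (population size)
n = total number of sample observations
μ = population mean
x_i = data values or class midpoints
x̄ = sample mean
σ = population standard deviation
s = sample standard deviation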
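The definitional and shortcut forms of the variance can be checked against each other in Python. A minimal sketch (function names are our own); Python’s `statistics.variance` and `statistics.pvariance` compute the same sample and population variances.

```python
import math


def sample_variance(values):
    """s^2 = sum((x - x_bar)^2) / (n - 1)"""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """s^2 = (n * sum(x^2) - (sum x)^2) / (n * (n - 1))"""
    n = len(values)
    return (n * sum(x * x for x in values) - sum(values) ** 2) / (n * (n - 1))


def population_variance(values):
    """sigma^2 = sum((x - mu)^2) / N"""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


ages = [17, 17, 18, 19, 20, 20, 22, 22, 22, 23]
s2 = sample_variance(ages)
print(round(s2, 2), round(sample_variance_shortcut(ages), 2), round(math.sqrt(s2), 2))
```

The n - 1 divisor in the sample variance is what distinguishes it from the population variance, which divides by N.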
Example
Take the sample ages of 10 college students below. Find their variance and standard
deviation.
17, 17, 18, 19, 20, 20, 22, 22, 22 and 23
Solution
First compute the mean of the distribution:
\bar{x} = \frac{\sum X_i}{n} = \frac{200}{10} = 20
Then the variance can be computed as follows:
Age (x)   x – x̄   (x – x̄)²
17        -3       9
17        -3       9
18        -2       4
19        -1       1
20        0        0
20        0        0
22        2        4
22        2        4
22        2        4
23        3        9
Total              Σ(x_i – x̄)² = 44
s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{44}{10 - 1} = \frac{44}{9} \approx 4.89
The standard deviation of the distribution is the square root of its variance:
s = \sqrt{4.89} \approx 2.21
Candidates (x)   x – x̄   (x – x̄)²
22               -1.4     1.96
21               -2.4     5.76
20               -3.4     11.56
25               1.6      2.56
26               2.6      6.76
24               0.6      0.36
26               2.6      6.76
24               0.6      0.36
22               -1.4     1.96
24               0.6      0.36
Total                     Σ(x_i – x̄)² = 38.4
\bar{x} = \frac{\sum x_i}{n} = \frac{234}{10} = 23.4
s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{38.4}{10 - 1} = \frac{38.4}{9} = 4.2667, \qquad s = \sqrt{4.2667} = 2.0656
Since the computed standard deviation is greater than the desired
one, the candidates may not all qualify.
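Example:
The ages of college students are given below.
Age   Frequency (f)
16–17   4
17–18   14
18–19   18
19–20   28
20–21   20
21–22   12
22–23   4
Age   Midpoint (x_i)   f_i   f_i x_i   x_i − x̄   (x_i − x̄)²   f_i(x_i − x̄)²
16–17   16.5   4   66   −2.98   8.8804   35.52
17–18   17.5   14   245   −1.98   3.9204   54.89
18–19   18.5   18   333   −0.98   0.9604   17.29
19–20   19.5   28   546   0.02   0.0004   0.01
20–21   20.5   20   410   1.02   1.0404   20.81
21–22   21.5   12   258   2.02   4.0804   48.96
22–23   22.5   4   90   3.02   9.1204   36.48
Total      100   1,948         213.96
For grouped data, with n = Σf_i:
$$\sigma^2 = \frac{\sum f_i (x_i - \mu)^2}{\sum f_i}, \qquad s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1}, \qquad s = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{n - 1}}$$
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{1948}{100} = 19.48$$
$$s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1} = \frac{213.96}{100 - 1} = \frac{213.96}{99} \approx 2.16$$
Standard deviation:
$$s = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{n - 1}} = \sqrt{2.16} \approx 1.47$$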
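A minimal Python sketch of the grouped-data computation follows, using the class midpoints as the x_i, as in the table above. It recomputes Σf_i(x_i − x̄)² directly from the midpoints and frequencies, which gives a convenient check on the hand-computed column. The function name is illustrative.

```python
import math

def grouped_sample_stats(midpoints, freqs):
    """Return (mean, sum of squared deviations, sample variance, sample s.d.)
    for grouped data, treating each class as concentrated at its midpoint."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    ss = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs))
    var = ss / (n - 1)          # sample variance uses n - 1
    return mean, ss, var, math.sqrt(var)

mids = [16.5, 17.5, 18.5, 19.5, 20.5, 21.5, 22.5]
freqs = [4, 14, 18, 28, 20, 12, 4]
mean, ss, var, sd = grouped_sample_stats(mids, freqs)
print(f"mean={mean:.2f}  SS={ss:.2f}  s^2={var:.2f}  s={sd:.2f}")
```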
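$$s_A = \sqrt{\frac{\sum (x_i - \bar{x}_A)^2}{n - 1}} = \sqrt{\frac{4038}{10 - 1}} = \sqrt{448.67} \approx 21.18$$
$$C.V._A = \frac{s_A}{\bar{x}_A} \times 100 = \frac{21.18}{53} \times 100 \approx 39.96\%$$
$$s_B = \sqrt{\frac{\sum (x_i - \bar{x}_B)^2}{n - 1}} = \sqrt{\frac{13{,}734}{10 - 1}} = \sqrt{1526} \approx 39.06$$
$$C.V._B = \frac{s_B}{\bar{x}_B} \times 100 = \frac{39.06}{49} \times 100 \approx 79.71\%$$
Therefore, from the above computations, Ayele performs better (higher mean, 53 versus 49) and more consistently (lower coefficient of variation, 39.96% versus 79.71%) than Bogale.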
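The comparison can be reproduced with a short sketch. It starts from the summary figures quoted above (the sum of squared deviations, n = 10, and each person's mean), because the raw scores are not reproduced in this section. `coefficient_of_variation` is a hypothetical helper name. The sketch does not round s before dividing, so the last digit can differ slightly from the hand computation (about 39.97% and 79.72%).

```python
import math

def coefficient_of_variation(sum_sq_dev, n, mean):
    """C.V. (%) = sample standard deviation / mean * 100."""
    s = math.sqrt(sum_sq_dev / (n - 1))
    return s / mean * 100

cv_ayele = coefficient_of_variation(4038, 10, 53)
cv_bogale = coefficient_of_variation(13_734, 10, 49)
# The smaller C.V. indicates the more consistent performer
print(f"Ayele {cv_ayele:.2f}%, Bogale {cv_bogale:.2f}%")
print("more consistent:", "Ayele" if cv_ayele < cv_bogale else "Bogale")
```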
Activity
Ato Pawlos tested samples of polythene bags from two manufacturers, A and B, for bursting pressure and obtained the following results:
Bursting pressure   Number of bags (A)   Number of bags (B)
5.0–9.9   2   9
10.0–14.9   9   11
15.0–19.9   29   18
20.0–24.9   54   32
25.0–29.9   11   27
30.0–34.9   5   13
Required:
1. Which set of bags has the higher average bursting pressure?
2. Which set of bags has the more uniform bursting pressure?
3. If prices are the same, which manufacturer's bags would be preferred, and why?
Self test questions
1. The Addis Ababa City Municipality Police Traffic Control
Department has observed the number of car accidents (per
month) to be categorized as shown in the table below:
A) Principle of multiplication
This principle states that if an action can be completed in k steps, of which the 1st step can be done in n1 ways, the 2nd in n2 ways, the 3rd in n3 ways, …, and the kth step in nk ways, then the whole action can be completed in n1 × n2 × n3 × … × nk ways.
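A small Python sketch can check the principle by brute force. It enumerates every sequence of choices with `itertools.product` and compares the count with the product n1 × n2 × … × nk. The route labels are hypothetical; the counts 4 and 3 are taken from the dormitory, cafeteria and class example below.

```python
import itertools
import math

def count_by_enumeration(*choices):
    """Count complete actions by listing every combination of step choices."""
    return sum(1 for _ in itertools.product(*choices))

dorm_to_cafeteria = ["r1", "r2", "r3", "r4"]   # hypothetical route names (4 ways)
cafeteria_to_class = ["s1", "s2", "s3"]        # hypothetical route names (3 ways)

by_rule = math.prod([len(dorm_to_cafeteria), len(cafeteria_to_class)])
by_listing = count_by_enumeration(dorm_to_cafeteria, cafeteria_to_class)
print(by_rule, by_listing)   # both equal 4 x 3
```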
Example
There are 4 ways that take from a given dormitory to a cafeteria.
There are also 3 ways that take from the cafeteria to a class. In how
B) Permutation.
The 1st place can be filled in 4 ways, the 2nd in 3 ways, the 3rd in 2 ways, and the 4th in only 1 way. Therefore, the students can be seated in 4 × 3 × 2 × 1 = 24 ways. At this point, we can introduce the factorial notation.
The symbol n! is read as "n factorial" and is given by n! = n × (n − 1) × (n − 2) × … × 2 × 1, with 0! = 1.
Example
From a committee of 30 members, in how many ways can a
president, a vice president and a secretary be selected?
Solution: Note that order is important here.
n = 30
r=3
If r = n, then
$${}_nC_r = {}_nC_n = \frac{n!}{(n-n)!\,n!} = \frac{n!}{0!\,n!} = \frac{n!}{1 \times n!} = 1.$$
Note that
$${}_nP_r = {}_nC_r \times r!$$
Example
In how many ways can 3 books named A, B, and C be combined or
grouped taken 2 at a time?
Solution: n = 3
r=2
Therefore, there are
$${}_3C_2 = \frac{3!}{(3-2)!\,2!} = \frac{3!}{1!\,2!} = \frac{3 \times 2!}{1 \times 2!} = 3$$
ways of grouping the 3 books taken 2 at a time. These 3 groups are AB, AC, and BC.
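Python's standard library (3.8+) provides `math.perm` and `math.comb` for nPr and nCr, and `itertools.combinations` lists the groups themselves. The sketch below applies them to the examples in this section: seating 4 students, choosing a president, vice president and secretary from 30 members (order matters, so nPr applies), and grouping books A, B and C two at a time.

```python
import itertools
import math

seatings = math.perm(4)            # 4! = 4 x 3 x 2 x 1
officers = math.perm(30, 3)        # ordered selection: 30 x 29 x 28
groups = math.comb(3, 2)           # unordered selection of 2 books from 3
listed = ["".join(g) for g in itertools.combinations("ABC", 2)]

# nPr = nCr x r!  and  nCn = 1
assert math.perm(30, 3) == math.comb(30, 3) * math.factorial(3)
assert math.comb(5, 5) == 1
print(seatings, officers, groups, listed)
```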
$$P(A \cap B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$$
$$= {}_4C_2 \left(\tfrac{1}{2}\right)^2 \left(1 - \tfrac{1}{2}\right)^{4-2} = \frac{4!}{(4-2)!\,2!} \times \frac{1}{4} \times \frac{1}{4} = \frac{3}{8}$$
Example
10% of the students in a class are left-handed. If 8 students are sampled, what is the probability of getting exactly 3 left-handed students?
Solution: success = left-handedness
n = 8
p = 0.1
X = 3
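The binomial probability nCx · p^x · (1 − p)^(n − x) is easy to evaluate with `math.comb`. Below is a minimal sketch using a hypothetical helper, `binomial_pmf`. It is applied to the left-handed students example (n = 8, x = 3, p = 0.1) and, as a check, to the 4-trial case with p = 1/2 worked above.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

p_left = binomial_pmf(3, 8, 0.1)   # 8C3 (0.1)^3 (0.9)^5
p_half = binomial_pmf(2, 4, 0.5)   # 4C2 (1/2)^2 (1/2)^2 = 3/8
print(round(p_left, 4), p_half)
```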
[Figure: the normal curve (mean µ, variable x) and the standardized (z) curve (mean 0, variable z)]
The z- distribution is used to determine the percentage of
observations which are greater than or less than a given value. It is
also used to determine the probability that an observation is found in
a given interval.
The probability (area) under the standard normal curve is found from a standard table (the z-table), which is given as an appendix in most statistics books. This table gives the probability that a value lies between 0 and z.
E.g. Consider a normal distribution with µ = 20 and σ = 2.
a) Find the probability that a measurement will be in the interval
from 20 to 23.
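Solution: µ = 20, σ = 2.
First convert x = 23 to a z value: z = (x − µ)/σ = (23 − 20)/2 = 1.5. The required probability is the area A under the z-curve between 0 and 1.5, which can be read from the standard table. The standard table is given in the appendix.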
In our example, the value of z is 1.5. Therefore, we look for the value 1.5 in the column under z and then look for the value 0.00 in the row containing z (at the top), because the second decimal place of the z value is 0 (1.5 = 1.50). Then, take the value in the body of the table found at the intersection of 1.5 and 0.00 (see the sample table). This value is 0.4332.
Therefore, our answer is A= 0.4332, which is the probability that a
measurement will be in the interval from 20 to 23.
N.B: If our z value were 1.23 the area would be 0.3907 (see the
sample table).
b) Find the probability that a measurement will be in the interval
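from 16 to 18.
Solution: x1 = 16, x2 = 18, µ = 20, σ = 2.
We are asked to find the area A between 16 and 18 under the normal curve. So, let's change the normal curve to the z-curve:
z1 = (16 − 20)/2 = −2 and z2 = (18 − 20)/2 = −1,
so A is the area under the z-curve between −2 and −1.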
A = (Area between −2 and 0) − (Area between −1 and 0)
= 0.4772 − 0.3413
= 0.1359
So, the probability that a measurement will be between 16 and 18 is 0.1359.
N.B.: By the symmetry of the curve, (Area between −a and 0) is the same as (Area between 0 and a).
E.g. (Area between −2 and 0) = (Area between 0 and 2) = 0.4772
c) Find the probability that a measurement will be less than 16.
Solution:
We are asked to find the area A to the left of 16 under the normal curve. So,
z = (x − µ)/σ = (16 − 20)/2 = −4/2 = −2
So, A = (Area to the left of 0) − (Area between −2 and 0)
= 0.5 − 0.4772
= 0.0228
Therefore, the probability that a measurement will be less than 16 is
0.0228.
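The table look-ups above can be cross-checked with `statistics.NormalDist` (Python 3.8+), whose `cdf` method returns the area to the left of a value. The sketch below reproduces parts (a) to (c) for µ = 20 and σ = 2. Small differences in the fourth decimal place come from the rounding in the printed z-table.

```python
from statistics import NormalDist

X = NormalDist(mu=20, sigma=2)

p_a = X.cdf(23) - X.cdf(20)   # (a) between 20 and 23, i.e. z from 0 to 1.5
p_b = X.cdf(18) - X.cdf(16)   # (b) between 16 and 18, i.e. z from -2 to -1
p_c = X.cdf(16)               # (c) less than 16, i.e. z below -2

z = (23 - X.mean) / X.stdev   # standardizing: z = (x - mu) / sigma
print(z, round(p_a, 4), round(p_b, 4), round(p_c, 4))
```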
From the trend in the relationship, you can see that it is increasing
even though the relationship is not perfect. In other words, profit
increases with an increase in advertisement expenditure.
Exercise
A teacher wants to study whether the number of students absent on a given day is related to the mean temperature on that day. A random sample of 10 days is used for the study. The following table shows the number of students absent from class and the mean temperature on each day.
Absent students 8 7 5 4 2 3 6 8 9
Temperature 10 20 25 40 45 50 55 59 60
a) Determine which variable is dependent and which is independent.
b) Draw a scatter diagram of these data.
i. From the data, it is the number of absent students that may be affected by changes in temperature, not the other way round. That is, temperature is the independent variable and absenteeism is the dependent variable.
ii. The dots represent the scatter diagram. From the diagram, however, we see that temperature and absenteeism have little relationship, as indicated by the regression line in the diagram.
Activity
The Mekelle University Environmental Health Department wants to determine the statistical relationships between many different variables and the common cold. The following table contains data on the use of facial tissues and the number of days that common cold symptoms were exhibited by seven people:
Facial tissues 2000 1500 500 750 600 900 1000
Number of days 60 60 10 15 5 25 30
a) Determine the dependent and independent variables
b) Draw the scatter diagram
c) What is the type of the relationship
d) Interpret your graph
The Least Square Method
With this method we find the line of best fit, that is, the most representative line: the one for which the distance between the line and the points is minimal. The least squares method is a mathematical procedure for finding the equation of the straight line that minimizes the sum of the squared distances between the line and the data points, measured in the vertical (Y) direction.
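$$b_1 = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$$
$$b_0 = \frac{\sum y}{n} - b_1 \frac{\sum x}{n}, \quad \text{or} \quad b_0 = \bar{y} - b_1 \bar{x}$$
Where Σx = sum of the x values
Σy = sum of the y values
Σxy = sum of the products of x and y for each observation
Σx² = sum of the squared x values
n = number of x-y observations
The regression line is then Ŷ = b0 + b1x.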
Let’s take the following example which was used to draw a scatter
diagram above:
Advertising (x)   Sales (y)   xy   x²
5 8 40 25
6 7 42 36
7 9 63 49
8 10 80 64
9 13 117 81
10 12 120 100
11 13 143 121
Total 56 72 605 476
$$b_1 = \frac{7(605) - (56)(72)}{7(476) - (56)^2} = \frac{4235 - 4032}{3332 - 3136} = \frac{203}{196} \approx 1.036$$
And b0 = ȳ − b1x̄, where ȳ = 72/7 ≈ 10.29 and x̄ = 56/7 = 8:
$$b_0 = 10.29 - 1.036(8) \approx 2.002$$
Ŷ = 2.002 + 1.036x is the equation of the regression line.
Interpretation: from the equation of the line, we can see that for each one-unit increase in advertising expense, sales increase by about 1.036 birr.
b. If advertising expenses were 7 units, sales would be estimated as Ŷ = 2.002 + 1.036(7) ≈ 9.254 units.
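Below is a minimal least-squares sketch in plain Python that applies the b1 and b0 formulas to the advertising and sales data. The function name is illustrative. Without intermediate rounding the intercept comes out as exactly 2 (72/7 − (203/196)(8) = 2), so predictions differ slightly from the hand-rounded 2.002 + 1.036x.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the line y = b0 + b1*x using the sum formulas."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b0 = sy / n - b1 * sx / n
    return b0, b1

advertising = [5, 6, 7, 8, 9, 10, 11]
sales = [8, 7, 9, 10, 13, 12, 13]
b0, b1 = least_squares(advertising, sales)
prediction = b0 + b1 * 7          # estimated sales when advertising = 7
print(round(b0, 3), round(b1, 3), round(prediction, 3))
```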
Example
The Maintenance Head of IVECO (Ethiopia) wants to know whether
or not there is a positive relationship between the annual
maintenance cost of their new bus assemblies and their age. He
collects the following data:
Bus   Maintenance cost, birr (y)   Age, years (x)   xy   x²   y²
1   859   8   6,872   64   737,881
2   682   5   3,410   25   465,124
3   471   3   1,413   9   221,841
4   708   9   6,372   81   501,264
5   1,049   11   11,539   121   1,100,401
6   224   2   448   4   50,176
7   320   1   320   1   102,400
8   651   8   5,208   64   423,801
9   1,094   12   13,128   144   1,196,836
Total   6,058   59   48,710   513   4,799,724
Required
a. Plot the scatter diagram.
b. What kind of relationship exists between these two variables?
c. Determine the simple regression equation.
d. Estimate the annual maintenance cost for a five-year-old bus.
Solution
$$b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{9(48{,}710) - (59)(6{,}058)}{9(513) - (59)^2} = \frac{80{,}968}{1{,}136} \approx 71.27$$
$$b_0 = \frac{\sum y}{n} - b_1 \frac{\sum x}{n} = \frac{6058}{9} - \frac{80{,}968}{1{,}136}\left(\frac{59}{9}\right) \approx 673.11 - 467.24 = 205.87$$
Then Ŷ = 205.87 + 71.27x.
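The xy column is easy to mis-multiply by hand, so it helps to recompute the totals from the raw data. The short sketch below (plain Python; variable names are illustrative) rebuilds Σxy and the regression coefficients, then evaluates the fitted line for a five-year-old bus (part d).

```python
cost = [859, 682, 471, 708, 1049, 224, 320, 651, 1094]   # y, birr
age = [8, 5, 3, 9, 11, 2, 1, 8, 12]                      # x, years

n = len(age)
sum_x, sum_y = sum(age), sum(cost)
sum_xy = sum(x * y for x, y in zip(age, cost))
sum_x2 = sum(x * x for x in age)

b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n
estimate_5yr = b0 + b1 * 5     # part d: fitted cost for a five-year-old bus
print(sum_xy, round(b1, 2), round(b0, 2), round(estimate_5yr, 2))
```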
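$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
Where Σx² = sum of squared x values
Σy² = sum of squared y values
Σxy = sum of the products of x and y for each observation
n = number of x-y observations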
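The formula for r translates directly into Python. As an illustration, the sketch below applies it to the absenteeism and temperature data from the exercise earlier in this section. The near-zero result is consistent with the statement that those two variables have little relationship. The function name is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient using the computational (sum) formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator

temperature = [10, 20, 25, 40, 45, 50, 55, 59, 60]
absent = [8, 7, 5, 4, 2, 3, 6, 8, 9]
r = pearson_r(temperature, absent)
print(round(r, 3), round(r ** 2, 3))   # r and the coefficient of determination
```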
Example
$$r = \frac{-234{,}021.5}{273{,}494.4} \approx -0.856 \approx -0.86$$
c) The correlation coefficient r = −0.86 indicates a rather strong negative linear relationship between car weight and miles per gallon in the sample. That is, cars that weigh more tend to get fewer miles per gallon, and vice versa.
You may also see this same relationship in the following diagram, with an R² value of 0.731:
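         A   B   C   D   E   F   G   H
Judge 1   5   2   8   1   4   6   3   7
Judge 2   4   5   7   3   2   8   1   6
$$R = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6(28)}{8(8^2 - 1)} = 1 - \frac{168}{504} = 1 - 0.33 = 0.67 = 67\%$$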
Example
The following table presents the scores of 3rd-year Management students at New Millennium College.
Student   1   2   3   4   5   6   7   8   9   10
Mathematics   55   75   40   50   65   74   69   80   41   43
Statistics   62   68   55   70   72   67   80   79   52   40
Compute the rank correlation coefficient.
Solution
Student   Maths (X)   Rank X   Statistics (Y)   Rank Y   D = Rank X − Rank Y   D²
1   55   6   62   7   −1   1
2   75   2   68   5   −3   9
3   40   10   55   8   2   4
4   50   7   70   4   3   9
5   65   5   72   3   2   4
6   74   3   67   6   −3   9
7   69   4   80   1   3   9
8   80   1   79   2   −1   1
9   41   9   52   9   0   0
10   43   8   40   10   −2   4
ΣD² = 50
$$R = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6(50)}{10(10^2 - 1)} = 1 - \frac{300}{990} = 1 - 0.303 = 0.697 = 69.7\%$$
Example
A company hired six computer technicians. The technicians were given a test designed to measure their basic knowledge. After a year of service, their boss was asked to rank each technician's job performance. Test scores and performance rankings are given below:
Technician Test score Performance ranking
1 82 3
2 60 6
3 80 2
4 67 5
5 94 1
6 89 4
Is there any relationship between test score and job performance?
Solution
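Test score   Rank of test score   Performance ranking   D   D²
82   3   3   0   0
60   6   6   0   0
80   4   2   2   4
67   5   5   0   0
94   1   1   0   0
89   2   4   −2   4
ΣD = 0, ΣD² = 8
Then the rank correlation coefficient is calculated as
$$R = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6(8)}{6(6^2 - 1)} = 1 - \frac{48}{6(35)} = 1 - \frac{48}{210} = 1 - 0.2286 \approx 0.77$$
This indicates a fairly strong positive relationship between test score and job performance.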
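Below is a minimal Python sketch of the rank correlation coefficient R = 1 − 6ΣD²/[N(N² − 1)]. It assumes there are no tied values, since this shortcut formula is only exact without ties. As in the tables above, the highest score is ranked 1. The helper names are illustrative.

```python
def ranks_high_first(values):
    """Rank values so that the largest gets rank 1 (assumes no ties)."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman(rank_x, rank_y):
    """Spearman rank correlation from two lists of ranks."""
    n = len(rank_x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

maths = [55, 75, 40, 50, 65, 74, 69, 80, 41, 43]
stats_marks = [62, 68, 55, 70, 72, 67, 80, 79, 52, 40]
r_marks = spearman(ranks_high_first(maths), ranks_high_first(stats_marks))

test_scores = [82, 60, 80, 67, 94, 89]
performance_rank = [3, 6, 2, 5, 1, 4]      # already ranks, used as given
r_tech = spearman(ranks_high_first(test_scores), performance_rank)

judge1 = [5, 2, 8, 1, 4, 6, 3, 7]
judge2 = [4, 5, 7, 3, 2, 8, 1, 6]
r_judges = spearman(judge1, judge2)
print(round(r_marks, 3), round(r_tech, 2), round(r_judges, 2))
```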
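[Figure: standard normal curve; the tail areas beyond −zα/2 and zα/2 are A1 each, and the areas between 0 and each of ±zα/2 are A2 each]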
A1 + A2 =0.5(half of the curve), but A1= α/2
So, A2 = 0.5 - α/2
As mentioned above, (tα/2, n-1) is a value to be read from
the t-table with n-1 degrees of freedom. A sample of the
t-table is given below.
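be:
$$\bar{x} \pm z_{\alpha/2}\left(\frac{S}{\sqrt{n}}\right)$$
$$\bar{x} - z_{\alpha/2}\left(\frac{S}{\sqrt{n}}\right) \le \mu \le \bar{x} + z_{\alpha/2}\left(\frac{S}{\sqrt{n}}\right)$$
$$35 - z_{0.05/2}\left(\frac{S}{\sqrt{n}}\right) \le \mu \le 35 + z_{0.05/2}\left(\frac{S}{\sqrt{n}}\right)$$
$$35 - z_{0.025}(0.45) \le \mu \le 35 + z_{0.025}(0.45)$$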
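Everything except z0.025 is known. Therefore, let's find the value of z0.025 as follows. Since A1 = α/2 = 0.025, the area between 0 and z0.025 is A2 = 0.5 − 0.025 = 0.475.
Now, to find the value of z0.025, we look for the number 0.475 in the body of the z-table. Then, we read the corresponding values in the column under z and in the row containing z, and add them to get the value of z0.025.
For our example, 0.4750 lies in the row for 1.9 and the column for 0.06, so z0.025 = 1.9 + 0.06 = 1.96. Substituting,
35 − 1.96(0.45) ≤ µ ≤ 35 + 1.96(0.45)
34.118 ≤ µ ≤ 35.882
The 95% confidence interval therefore runs from 34.118 to 35.882 inclusive.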
b) Interpretation:
The interpretation of the 95% confidence interval is as follows:
We are 95% confident that the population mean is found in the
interval [34.118, 35.882].
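The same interval can be computed with `statistics.NormalDist().inv_cdf`, which returns z-values directly instead of requiring a table look-up. The sketch below takes x̄ = 35 and the standard error S/√n = 0.45 from the example above. The function name `z_interval` is illustrative.

```python
from statistics import NormalDist

def z_interval(mean, std_error, confidence=0.95):
    """Large-sample confidence interval: mean ± z_(alpha/2) * S/sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_(alpha/2), about 1.96 for 95%
    margin = z * std_error
    return mean - margin, mean + margin

low, high = z_interval(35, 0.45)
print(f"{low:.3f} <= mu <= {high:.3f}")
```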
6.2 Hypothesis testing:
A hypothesis is a statement of one's thinking or perception about something.
Hypothesis testing is the process of proving or disproving a
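$$Z_{cal} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}},$$ where µ0 is the value of µ given by H0.
b) σ is unknown and n is large (n ≥ 30):
$$Z_{cal} = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}$$
2. The t-table if σ is unknown and n is small (n < 30):
$$t_{cal} = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}$$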
Decision rules:
There are different decision rules (whether or not to reject H0)
depending on the structure of our hypotheses and the type of table
used. A summary of the decision rule follows.
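Ha: µ ≠ 25
[Figure: upper half of the z-curve, with area A1 between 0 and Z0.005 and tail area A2 beyond Z0.005]
A2 = 0.005
Therefore, A1 = 0.5 − 0.005 = 0.495. In the z-table, an area of 0.495 lies between z = 2.57 and z = 2.58, so Z0.005 ≈ 2.575.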
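The data have already been gathered: x̄, S, and n are given.
5. $$Z_{cal} = \frac{\bar{x} - \mu_0}{S/\sqrt{n}},$$ where µ0 is the value of µ stated in H0. So,
$$Z_{cal} = \frac{28.1 - 25}{8.47/\sqrt{57}} \approx 2.76$$
7. Decision:
Compare |Zcal| with Zα/2 and reject H0 if |Zcal| > Zα/2.
Here 2.76 > 2.575, so H0 is rejected at the 1% level of significance.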
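Below is a sketch of the same two-tailed z-test in Python. It assumes, as the A2 = 0.005 tail area indicates, that α = 0.01 is split equally between the two tails. `NormalDist().inv_cdf` supplies the critical value in place of the table look-up. The function name is illustrative.

```python
import math
from statistics import NormalDist

def two_tailed_z_test(xbar, mu0, s, n, alpha):
    """Large-sample z-test of H0: mu = mu0 against Ha: mu != mu0."""
    z_cal = (xbar - mu0) / (s / math.sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z_cal, z_crit, abs(z_cal) > z_crit   # True means reject H0

z_cal, z_crit, reject = two_tailed_z_test(28.1, 25, 8.47, 57, alpha=0.01)
print(f"Zcal = {z_cal:.2f}, critical = {z_crit:.3f}, reject H0: {reject}")
```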
Self test questions
1. An experiment consists of selecting a random sample of 256 middle managers for study. One item of interest is their annual income. The sample mean is computed to be $35,420 and the sample standard deviation is $2,050.
a) What is the point estimate of µ?
different from zero or not. That is, we test the following hypotheses:
H0: β1 = 0
Ha: β1 ≠ 0
is given below.
α = 0.05
df2 \ df1   1   2   3   4   …
1   161.4
2   18.51
3   10.13
4   7.71
⋮
10   4.96
⋮
Example
The following data are collected on the supply and price of a certain
product.
Price (X) 2 4 6 8 10 12 14 16 18 20
Supply (Y) 10 20 50 40 50 60 80 90 90 120
ANOVA table:
Source of variation   df   SS   MSS   Fcal
Regression   df1 = 1   Rss = 507   507   507/38.7 ≈ 13.10
Error   df2 = 10   Ess = 387   38.7
the appendix].
N.B.: The coefficient of determination (r²) can also be expressed in terms of Rss and Ess:
$$r^2 = \frac{Rss}{Tss}, \qquad Tss = Rss + Ess$$
For example, for the given price-supply data,
$$r^2 = \frac{Rss}{Tss} = \frac{507}{894} \approx 0.567,$$
which is the same as the value previously obtained by simply squaring the coefficient of correlation (r).
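Below is a small sketch that derives the mean squares, Fcal and r² from the sums of squares in an ANOVA table like the one above. SciPy is not assumed, so the 5% critical value for df1 = 1 and df2 = 10 (4.96, from the F-table excerpt) is passed in by hand. The function name is illustrative.

```python
def regression_anova(rss, ess, df1, df2, f_critical):
    """Return (F_cal, r_squared, reject_H0) for the test H0: beta1 = 0."""
    msr = rss / df1                 # regression mean square
    mse = ess / df2                 # error mean square
    f_cal = msr / mse
    r_squared = rss / (rss + ess)   # Rss / Tss
    return f_cal, r_squared, f_cal > f_critical

f_cal, r2, reject = regression_anova(507, 387, 1, 10, f_critical=4.96)
print(f"F = {f_cal:.2f}, r^2 = {r2:.3f}, reject H0: {reject}")
```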
The other approach to computing the F-value is shown below:
the denominator,
Example:
AGIP Oil Company wanted to determine whether the amount of oil delivered by its trucks to customers is the same in its three sales districts. The company obtained the following random sample data:
Gallons delivered in one delivery
District 1   District 2   District 3
81   100   295
179   158   82
142   272   155
199   248   271
124   62   212
, is the variance
of sample means
Next, compute the sample variances:
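Below is a sketch of the F-ratio for the AGIP data. It assumes, as the surrounding steps suggest, that F is formed as n times the variance of the sample means divided by the mean of the sample variances, with equal sample sizes (n = 5). `statistics.variance` uses the n − 1 divisor. The worked numbers for this example are not shown in this section, so the printed values are this sketch's own computation.

```python
import statistics

districts = {
    1: [81, 179, 142, 199, 124],
    2: [100, 158, 272, 248, 62],
    3: [295, 82, 155, 271, 212],
}

n = 5                                               # deliveries per district
means = [statistics.mean(v) for v in districts.values()]
variances = [statistics.variance(v) for v in districts.values()]

between = n * statistics.variance(means)            # n x variance of sample means
within = statistics.mean(variances)                 # mean of the sample variances
f_ratio = between / within
print(means, [round(v, 1) for v in variances], round(f_ratio, 3))
```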
Part two:
1. 61 students; 105 and 246 students are the sample sizes
respectively
2. 60 is the sample size
Chapter two
1.
a.
[Figure: Histogram. Vertical axis: frequency of claims; horizontal axis: claims]
b.
[Figure: Frequency polygon. Vertical axis: frequency of claims; horizontal axis: claims (birr)]
[Figure: Bar chart. Vertical axis: number of visitors; horizontal axis: college year (1st to 10th)]
3. a.
[Figure: A less-than ogive. Vertical axis: cumulative frequency; horizontal axis: turnover rates (classes 0 up to 5, 5 up to 10, …, 25 up to 30)]
b.
[Figure: A greater-than ogive. Vertical axis: cumulative frequency; horizontal axis: rates of turnover (classes 0 up to 5, 5 up to 10, …, 25 up to 30)]
Chapter Three
1. a) = 20;
b) = 19.16; = 16;
c) = 45.8, 6.77, & C.V. = 33.85%
2. a. (marketing) = 1.93 & managerial stat 0.993; &
0.996
b. C.V. marketing = 4.98%; C.V. managerial statistics = 6.63%
c. Marketing
Chapter three
1. The number of ways are:
a. 207,360 ways = (5! ×3! ×2! ×3!)
b. 3,628,800 ways = [(5+3+2)! = 10!]
2. 3C2 × 4C3 = 3 × 4 = 12 ways
3. 5! × 6! = 86,400
4. 11!/(1! 4! 4! 2!) = 34,650
5. The probabilities are:
b. 5C0 (0.1)^0 (0.9)^5 = 0.59049
7.
a. X 188.25
b. X 244.65
8. 1866.5
Chapter Four
1. a
b. Y = 2.231x + 30.91
c.
d. They have positive relationship
2.
a.
Chapter Five
1.
a. x̄ = $35,420
b. 35,208.59 to 35,631.41
c. We are 90% confident that µ lies in the confidence interval determined above.
2. There is not enough evidence to contradict the manufacturer's claim. That is, H0 should not be rejected.
Chapter Six
1. ,
;
The sample variances are:
and;
F > 3.10
9.43 > 3.10 is true; therefore, reject Ho
2. ;
;
Then, check if F-calculated is greater than the table value;
Here, we have F-calculated equal to 39.8 which is greater than
the table value 6.93; therefore the Ho is rejected.
3. ; reject if F-calculated is greater than
the table value;
;
Then, reject if F-calculated is greater than the table value.
Here, F-calculated is 3.75, which is less than 5.39; therefore, Ho is not rejected.