Statistics Notes
5) Statistics and Education:
Statistics is widely used in education. Research has become a common feature in all
branches of activity. Statistics is necessary for formulating policies on starting
new courses, assessing the facilities available for new courses, etc. These are
possible only through statistics.
6) Statistics and Planning:
Statistics is indispensable in planning. In order to achieve the goals of a plan, statistical
data relating to production, consumption, demand, supply, prices, investments, income,
expenditure, etc. are required. In India, statistics plays an important role in planning and
commissioning at both the central and state government levels.
7) Statistics and Medicine:
In medical sciences, statistical tools are widely used. To test the efficiency of a new
drug or medicine, the t-test is used; to compare the efficiency of two drugs or two
medicines, the t-test for two samples is used. More and more applications of statistics
are now used in clinical investigation.
8) Statistics and Modern applications:
Recent developments in computer technology and information technology have enabled
statistical models to be integrated into the decision-making procedures of many
organisations. Many software packages are available for solving problems in design of
experiments, forecasting, simulation, etc.
Primary data and secondary data
1.5.1 Primary data:
The data which is collected by actual observation or measurement is called
primary data.
Methods of Collection of Primary Data:
The primary data can be collected by the following five methods.
1. Direct personal interviews.
2. Indirect oral interviews.
3. Information from correspondents.
4. Mailed questionnaire method.
5. Schedules sent through enumerators.
1.5.2 Secondary Data:
Data which are compiled from the records of others are called secondary
data.
Sources of Secondary data:
The sources of secondary data can broadly be classified under two heads:
1. Published sources, and
2. Unpublished sources.
1. Published sources:
The various sources of published data are:
Reports and official publications of
(i) International bodies such as UNO, UNESCO,……
(ii) Central and State Governments .
Semi-official publication of various local bodies .
Private publications-such as the publications of –
(i) Trade and professional bodies
(ii) Financial and economic journals
(iii) Annual reports of joint stock companies.
(iv) Publications brought out by research agencies, research scholars, etc.
It should be noted that the publications mentioned above vary with regard to the
periodicity of publication.
2. Unpublished Sources
Not all statistical material is published. There are various sources of
unpublished data, such as records maintained by various Government and private
offices, and studies made by research institutions, scholars, etc. Such sources can also
be used where necessary.
Precautions in the use of Secondary data
The following are some of the points to be considered in the use of secondary
data:
1. How the data have been collected and processed
2. The accuracy of the data
3. How far the data have been summarized
4. How comparable the data are with other tabulations
5. How to interpret the data, especially when figures collected for one purpose are used
for another
Merits and Demerits of Secondary Data:
1. Secondary data is cheap to obtain. Many government publications are
relatively cheap and libraries stock quantities of secondary data produced by the
government, by companies and other organisations.
2. Large quantities of secondary data can be obtained through the internet.
3. Much of the secondary data available has been collected for many years and
therefore it can be used to plot trends.
For example, letters in the post office are classified according to their destinations,
viz., Delhi, Madurai, Bangalore, Mumbai, etc.
1.6.1 Objects of classification of data:
Classification separates similar things from dissimilar things
and thereby brings out the salient features.
This enables comparison of one class of data with another.
This helps in studying the relationship between several characteristics.
This is a means to access a data properly.
Important features can be seen at a glance.
1.6.2 Types of classification:
Statistical data are classified in respect of their characteristics. Broadly there are
four basic types of classification namely
A) Chronological classification
B) Geographical classification
C) Qualitative classification
D) Quantitative classification
A) Chronological classification:
In chronological classification the collected data are arranged according to the
order of time expressed in years, months, weeks, etc. The data are generally classified
in ascending order of time. For example, data on population, sales of a firm, and
imports and exports of a country are usually subjected to chronological classification.
B) Geographical classification:
In this type of classification the data are classified according to geographical
region or place, for instance, the production of paddy in different states of India or
the production of wheat in different countries.
C) Qualitative classification:
In this type of classification data are classified on the basis of some attribute or
quality like sex, literacy, religion, employment, etc. Such attributes cannot be
measured on a scale.
For example, if the population is to be classified in respect of one attribute, say sex,
then we can classify it into two classes, namely males and females. Similarly, it can
also be classified into 'employed' or 'unemployed' on the basis of another attribute,
'employment'. This type of classification is called simple or dichotomous
classification.
[Chart: Population, divided into Male and Female]
The classification, where two or more attributes are considered and several classes are
formed, is called a manifold classification.
The classification may be extended still further by considering other attributes like
marital status, etc. This can be explained by the following chart.
[Chart: Population, divided into Male and Female]
D) Quantitative classification:
Quantitative classification refers to the classification of data according to some
characteristic that can be measured, such as height, weight, etc. For example, the
students of a college may be classified according to weight as given below.
It presents facts in minimum possible space and unnecessary repetitions and
explanations are avoided.
Tabulated data are good for references and they make it easier to present the
information in the form of graphs and diagrams.
Classification versus Tabulation
Classification:
1. Classification is the process of arranging data into groups according to the common
characteristics possessed by individual items.
2. The purpose is to analyse data.
3. Classification of data is done after the collection process is completed.
4. It is based on similar attributes and variables of the observations.
5. It is performed with the objective of analysing data in order to draw inferences.
6. In classification, data are divided into categories and sub-categories.
7. This enables comparison of one class of data with another.
8. Important features can be seen at a glance.
Tabulation:
1. Tabulation is a process of arranging data systematically in rows and columns of a
table.
2. The purpose is to present data.
3. Tabulation follows classification.
4. It is arranged in rows and columns in a systematic way.
5. Tabulation aims at presenting data, to ensure easy comparison of various figures.
6. In tabulation, data are divided into headings and sub-headings.
7. The data are so placed in a table that proper comparison is possible and easier.
8. A table facilitates further analysis of data.
1.9.1 Diagrams:
A diagram is a visual form of presentation of statistical data, highlighting their
basic facts and relationships. It is readily intelligible and saves a considerable amount
of time and energy.
Significance of Diagrams and Graphs:
Diagrams and graphs are extremely useful because of the following reasons.
1. They are attractive and impressive.
2. They make data simple and intelligible.
3. They make comparison possible.
4. They save time and labour.
5. They have universal utility.
6. They give more information.
7. They have a great memorizing effect.
General rules for constructing diagrams:
The diagrammatic presentation of statistical facts will be advantageous provided
the following rules are observed in drawing diagrams.
1. A diagram should be neatly drawn and attractive.
2. The measurements of geometrical figures used in diagram should be accurate
and proportional.
3. The size of the diagrams should match the size of the paper.
4. Every diagram must have a suitable but short heading.
5. The scale should be mentioned in the diagram.
6. Diagrams should be neatly as well as accurately drawn with the help of drawing
instruments.
7. Index must be given for identification so that the reader can easily make out the
meaning of the diagram.
8. Footnote must be given at the bottom of the diagram.
Types of diagrams:
In practice, a very large variety of diagrams are in use and new ones are
constantly being added. For the sake of convenience and simplicity, they may be
divided under the following heads:
1. One-dimensional diagrams
2. Two-dimensional diagrams
3. Three-dimensional diagrams
4. Pictograms and Cartograms
1) One-dimensional diagrams:
In such diagrams, only a one-dimensional measurement, i.e., height, is used and the
width is not considered. These diagrams are in the form of bar or line charts and can
be classified as
1. Line Diagram
2. Simple Bar Diagram
3. Multiple Bar Diagram
4. Sub-divided Bar Diagram
5. Percentage Bar Diagram
Line Diagram:
A line diagram is used where there are many items to be shown and there is not much
difference in their values. Such a diagram is prepared by drawing a vertical line for
each item according to the scale. A line diagram makes comparison easy, but it is less
attractive.
Simple Bar Diagram:
A simple bar diagram can be drawn on either a horizontal or a vertical base, but bars on
a horizontal base are more common. Bars must be of uniform width and the intervening
space between bars must be equal. While constructing a simple bar diagram, the scale is
determined on the basis of the highest value in the series.
Multiple Bar Diagram:
Multiple bar diagram is used for comparing two or more sets of statistical data.
Bars are constructed side by side to represent the set of values for comparison. In
order to distinguish bars, they may be either differently coloured or there should be
different types of crossings or dotting, etc. An index is also prepared to identify the
meaning of different colours or dottings.
2) Two-dimensional Diagrams:
In one-dimensional diagrams, only length is taken into account. But in two-dimensional
diagrams the area represents the data, and so both the length and the breadth have to be
taken into account. Such diagrams are also called area diagrams or surface diagrams. The
important types of area diagrams are:
1. Rectangles 2. Squares 3. Pie-diagrams
Rectangles:
Rectangles are used to represent the relative magnitude of two or more values.
The areas of the rectangles are kept in proportion to the values. Rectangles are placed
side by side for comparison. When two sets of figures are to be represented by
rectangles, either of the two methods may be adopted.
Squares:
The rectangular method of diagrammatic presentation is difficult to use where the
values of items vary widely. The method of drawing a square diagram is very simple.
One has to take the square root of the values of the various items that are to be shown
in the diagram and then select a suitable scale to draw the squares.
Pie Diagram or Circular Diagram:
Another way of preparing a two-dimensional diagram is in the form of circles. In
such diagrams, both the total and the component parts or sectors can be shown. The
area of a circle is proportional to the square of its radius.
3) Three-dimensional diagrams:
Three-dimensional diagrams, also known as volume diagrams, consist of cubes,
cylinders, spheres, etc. In such diagrams three things, namely length, width and height,
have to be taken into account. Of all the figures, cubes are the easiest to make. The side
of a cube is drawn in proportion to the cube root of the magnitude of the data.
4) Pictograms and Cartograms:
Pictograms are not abstract presentations such as lines or bars; they actually depict the
kind of data we are dealing with. When pictograms are used, data are represented
through a carefully selected pictorial symbol. Cartograms or statistical maps
are used to give quantitative information on a geographical basis.
1.9.2 GRAPHS
Definition: A graph is a visual form of presentation of statistical data. A graph is
more attractive than a table of figures.
The types of graphs are
1. Histogram 2. Frequency Polygon 3. Frequency Curve 4. Ogives 5. Lorenz Curve
1) Histogram:
A histogram is a bar chart or graph showing the frequency of occurrence of each
value of the variable being analysed. In a histogram, data are plotted as a series of
rectangles. Class intervals are shown on the X-axis and the frequencies on the Y-axis.
The height of each rectangle represents the frequency of the class interval.
2) Frequency Polygon:
If we mark the midpoints of the top horizontal sides of the rectangles in a
histogram and join them by straight lines, the figure so formed is called a Frequency
Polygon. The area of the polygon is equal to the area of the histogram, because the
area left outside is just equal to the area included in it.
3) Frequency Curve:
If the middle points of the upper boundaries of the rectangles of a histogram are
connected by a smooth freehand curve, the diagram is called a frequency curve.
The curve should begin and end at the base line.
4) Ogives:
For a set of observations, we know how to construct a frequency distribution. When the
frequencies are added up successively, the accumulated frequency is called the cumulative
frequency. A table listing these cumulative frequencies is called a cumulative frequency
table. The curve obtained by plotting cumulative frequencies is called a cumulative
frequency curve or an ogive.
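The accumulation step can be sketched in Python. This is only a minimal illustration: it borrows the height distribution from the median example in these notes, and it assumes the cumulative frequencies are plotted against the upper class limits (the "less than" form of the ogive, a term not used in these notes).

```python
from itertools import accumulate

# Upper class limits and frequencies of the height distribution
# used in the median example of these notes.
upper_limits = [150, 155, 160, 165, 170, 175]
frequencies = [2, 5, 10, 8, 4, 1]

# Running totals of the frequencies are the cumulative frequencies.
cumulative = list(accumulate(frequencies))

# Points to plot for the ogive: (upper class limit, cumulative frequency).
ogive_points = list(zip(upper_limits, cumulative))

for limit, cf in ogive_points:
    print(f"less than {limit}: {cf}")
```

Joining these points by a smooth curve gives the ogive.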
5) Lorenz Curve:
The Lorenz curve is a graphical method of studying dispersion. It is also used to study
the variability in the distribution of profits, wages, revenue, etc. It is specially used to
study the degree of inequality in the distribution of income and wealth between
countries or between different periods. The cumulative percentage values of one
variable are plotted against the cumulative percentage values of the other variable, and
the Lorenz curve is drawn through these points.
Diagrams
We are well aware of the use of diagrams to explain information and facts that
are presented in the form of text. If you need to explain the parts of a machine or the
principle of its working, it is difficult to convey the concept through text alone. This is
where diagrams in the form of sketches come into play. Similarly, diagrams are heavily
used in biology, where students have to learn about different body parts and their
functions. Concepts presented visually through diagrams are more likely to be retained in
students' memory than concepts presented as text. Diagrams are used from the time a child
enters school; even the alphabet is presented in a more interesting and attractive manner
with the help of diagrams.
Graphs
Whenever there are two variables in a set of information, it is better to present the
information using graphs, as this makes the data easier to understand. For example, to
show how the prices of commodities have increased over time, a simple line graph is more
effective and interesting than putting all the information in text, which is hard to
remember; with a graph, even a layman can see how prices have gone up or down in relation
to time.
*******************
UNIT-II
MEASURES OF CENTRAL TENDENCY
2.1 INTRODUCTION:
In this chapter we deal with measures of central tendency and measures of dispersion.
The measures of central tendency concentrate on the values in the central part of the
distribution. Plainly speaking, an average of a statistical series is the value of the
variable which is representative of the entire distribution. If we know the average alone,
we cannot form a complete idea about the distribution, so for completeness we also use
measures of dispersion.
The following are the three measures of central tendency dealt with in this chapter:
Arithmetic Mean or simply Mean
Median
Mode
Individual series:
$\bar{X} = \sum X / N$
Example:
The expenditures of ten families are given below. Calculate the arithmetic mean.
30, 70, 10, 75, 500, 8, 42, 250, 40, 36.
Solution: Here N = 10
$\sum X$ = 30 + 70 + 10 + 75 + 500 + 8 + 42 + 250 + 40 + 36 = 1061
$\bar{X}$ = 1061 / 10 = 106.1
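The same calculation can be checked with a few lines of Python. This is a minimal sketch using the ten expenditures from the example; the variable names are illustrative only.

```python
# Expenditures of the ten families from the example above.
expenditures = [30, 70, 10, 75, 500, 8, 42, 250, 40, 36]

total = sum(expenditures)   # sum of X
n = len(expenditures)       # N
mean = total / n            # arithmetic mean = sum of X / N

print(total, n, mean)
```

Note how the two large values (500 and 250) pull the mean well above most of the observations.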
Discrete series:
$\bar{X} = \sum fX / \sum f$
Example:
Calculate the mean number of persons per house.
No. of persons : 2 3 4 5 6
No. of houses : 10 25 30 25 10
Solution:
X     f     fX
2    10     20
3    25     75
4    30    120
5    25    125
6    10     60
$\sum f$ = 100   $\sum fX$ = 400
$\bar{X}$ = 400 / 100 = 4.
Continuous series:
$\bar{X} = \sum fm / \sum f$
where m represents the mid value of a class.
Mid value = (UL + LL) / 2, where UL and LL are the upper and lower limits of the class.
Example:
Calculate the mean for the following.
Marks : 20-30 30-40 40-50 50-60 60-70 70-80
No. of students : 5 8 12 15 6 4
Solution:
C.I      f     m     fm
20-30    5    25    125
30-40    8    35    280
40-50   12    45    540
50-60   15    55    825
60-70    6    65    390
70-80    4    75    300
$\sum f$ = 50   $\sum fm$ = 2460
$\bar{X}$ = 2460 / 50 = 49.2.
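Both the discrete and the continuous formula are the same frequency-weighted mean, which a short Python sketch makes explicit. The helper name `grouped_mean` is hypothetical; the data are the two examples above.

```python
def grouped_mean(values, freqs):
    """Return sum(f * x) / sum(f) for paired values and frequencies."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    total_f = sum(freqs)
    if total_f <= 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(values, freqs)) / total_f

# Discrete series: number of persons per house.
persons = [2, 3, 4, 5, 6]
houses = [10, 25, 30, 25, 10]
discrete_mean = grouped_mean(persons, houses)

# Continuous series: replace each class by its mid value m = (UL + LL) / 2.
classes = [(20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
students = [5, 8, 12, 15, 6, 4]
mid_values = [(lower + upper) / 2 for lower, upper in classes]
continuous_mean = grouped_mean(mid_values, students)

print(discrete_mean, continuous_mean)
```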
2.2.2 MEDIAN:
The median is the value of the middle-most item when all the items are arranged in
order of magnitude. It is denoted by M or Me.
Individual series:
For an odd number of items, position of the median = (N+1) / 2 th item.
For an even number of items, the median is the mean of the (N/2) th and ((N/2)+1) th
items: median = [(N/2) th item + ((N/2)+1) th item] / 2
Example:
Calculate the median for the following: 22, 10, 6, 7, 12, 8, 5.
Solution:
Here N = 7
Arrange in ascending order or descending order.
5, 6, 7, 8, 10, 12, 22
Position of the median = (N+1) / 2 = (7+1) / 2 = 4th item, so the median = 8.
Discrete series:
Position of the median = (N+1) / 2 th item.
Example:
Find the median for the following.
X : 10 15 17 18 21
f : 4 16 12 5 3
Solution:
X     f    c.f.
10    4     4
15   16    20
17   12    32
18    5    37
21    3    40
N = 40
Position of the median = (N+1) / 2 = (40+1) / 2 = 20.5th item
Median = (20th item + 21st item) / 2
= (15 + 17) / 2
= 16.
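The position rules for individual and discrete series can be sketched in Python as follows. This is a minimal illustration; `median_individual`, `item_at` and `median_discrete` are hypothetical helper names, and the discrete version assumes the values of X are already in ascending order.

```python
from bisect import bisect_left
from itertools import accumulate

def median_individual(data):
    """Median of raw observations: middle item, or mean of the two middle items."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def item_at(position, values, cum_freqs):
    """Value of the item at a 1-based position, located via cumulative frequencies."""
    return values[bisect_left(cum_freqs, position)]

def median_discrete(values, freqs):
    """Median of a discrete frequency distribution (values in ascending order)."""
    cum = list(accumulate(freqs))
    n = cum[-1]
    if n % 2 == 1:
        return item_at((n + 1) // 2, values, cum)
    # Even N: mean of the (N/2)th and (N/2 + 1)th items.
    return (item_at(n // 2, values, cum) + item_at(n // 2 + 1, values, cum)) / 2

individual_median = median_individual([22, 10, 6, 7, 12, 8, 5])
discrete_median = median_discrete([10, 15, 17, 18, 21], [4, 16, 12, 5, 3])
print(individual_median, discrete_median)
```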
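Continuous series:
$M = L + \frac{(N/2) - c.f.}{f} \times i$
where L is the lower limit of the median class, c.f. is the cumulative frequency of the
class preceding the median class, f is the frequency of the median class and i is the class
width.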
Example :
Calculate the median of the heights given below.
Height : 145-150 150-155 155-160 160-165 165-170 170-175
No. of students : 2 5 10 8 4 1
Solution:
Height     No. of students   c.f.
145-150          2             2
150-155          5             7
155-160         10            17
160-165          8            25
165-170          4            29
170-175          1            30
$\sum f$ = 30
Position of the median = N/2 th item = 30 / 2 = 15, so the median class is 155-160.
M = 155 + [(15 - 7) x 5] / 10
= 155 + (40/10) = 159.
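The interpolation inside the median class can be sketched in Python. This is a minimal illustration with a hypothetical helper name, `median_continuous`; it takes the median class to be the first class whose cumulative frequency reaches N/2, as in the worked solution.

```python
def median_continuous(classes, freqs):
    """Median of a continuous series: M = L + ((N/2 - c.f.) / f) * i."""
    n = sum(freqs)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cum_before = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cum_before + f >= half:
            # This is the median class; interpolate within it.
            return lower + (half - cum_before) / f * (upper - lower)
        cum_before += f
    raise ValueError("classes and freqs do not describe a distribution")

height_classes = [(145, 150), (150, 155), (155, 160),
                  (160, 165), (165, 170), (170, 175)]
height_freqs = [2, 5, 10, 8, 4, 1]
median_height = median_continuous(height_classes, height_freqs)
print(median_height)
```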
2.2.3 Mode :
Mode is the value which has the greatest frequency density. Mode is usually
denoted by Z.
Individual series:
The value which occurs the greatest number of times is identified as the mode.
Marks : 0-10 10-20 20-30 30-40 40-50
No. of students : 5 20 35 20 12
Solution:
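A minimal Python sketch of one way to work this example. It assumes the standard interpolation formula for the mode of a continuous series with equal class widths, Z = L + [(f1 - f0) / (2 f1 - f0 - f2)] x i, which is not stated in these notes; here L is the lower limit of the modal class, f1 its frequency, f0 and f2 the frequencies of the preceding and following classes, and i the class width. `mode_continuous` is a hypothetical helper name.

```python
def mode_continuous(classes, freqs):
    """Mode by interpolation in the modal class (assumes equal class widths)."""
    # Index of the modal class: the class with the greatest frequency.
    k = max(range(len(freqs)), key=freqs.__getitem__)
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0               # preceding class
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0  # following class
    lower, upper = classes[k]
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("mode is not defined by this formula for these data")
    return lower + (f1 - f0) / denominator * (upper - lower)

mark_classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
mark_freqs = [5, 20, 35, 20, 12]
mode_marks = mode_continuous(mark_classes, mark_freqs)
print(mode_marks)
```

Under that assumption the modal class is 20-30, and because the neighbouring frequencies are equal (20 and 20) the mode falls at the middle of the class.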
Empirical Relation :
Mode= 3 median -2 mean.
2.3.1 Range:
Range is the difference between the greatest and the smallest value.
Individual series :
Example : Find the range and the coefficient of range for the following data.
8, 10, 5, 9, 12, 11
Solution:
Range = L – S = 12 - 5 = 7, where L is the largest value and S is the smallest value.
Coefficient of range = (L-S) / (L+S) = (12-5) / (12+5) = 7 / 17 = 0.4118
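The two quantities are easy to verify in Python. This is a minimal sketch using the six observations of the example; the variable names are illustrative.

```python
data = [8, 10, 5, 9, 12, 11]

largest, smallest = max(data), min(data)   # L and S
value_range = largest - smallest           # Range = L - S
coefficient_of_range = (largest - smallest) / (largest + smallest)

print(value_range, round(coefficient_of_range, 4))
```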
C.I f m
20-30 5 25
30-40 8 35
40-50 12 45
50-60 15 55
60-70 6 65
70-80 4 75
Coefficient of quartile deviation = (Q3 – Q1) / (Q3 + Q1)
Example:
The wheat production (in Kg) of 20 acres is given as: 1120, 1240, 1320, 1040,
1080, 1200, 1440, 1360, 1680, 1730, 1785, 1342, 1960, 1880, 1755, 1720, 1600,
1470, 1750, and 1885. Find the quartile deviation and coefficient of quartile deviation.
Solution:
After arranging the observations in ascending order, we get
1040, 1080, 1120, 1200, 1240, 1320, 1342, 1360, 1440, 1470, 1600, 1680, 1720,
1730, 1750, 1755, 1785, 1880, 1885, 1960.
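The remaining steps can be sketched in Python. This is a minimal illustration under a stated assumption: Q1 and Q3 are taken as the values of the (N+1)/4 th and 3(N+1)/4 th items of the ordered data, with linear interpolation for fractional positions (the same positional convention as the (N+1)/2 rule for the median). Other quartile conventions give slightly different figures. `item_value` is a hypothetical helper name.

```python
def item_value(ordered, position):
    """Value at a 1-based, possibly fractional, position (linear interpolation)."""
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])

production = [1120, 1240, 1320, 1040, 1080, 1200, 1440, 1360, 1680, 1730,
              1785, 1342, 1960, 1880, 1755, 1720, 1600, 1470, 1750, 1885]
ordered = sorted(production)
n = len(ordered)

q1 = item_value(ordered, (n + 1) / 4)        # 5.25th item
q3 = item_value(ordered, 3 * (n + 1) / 4)    # 15.75th item

quartile_deviation = (q3 - q1) / 2
coefficient_qd = (q3 - q1) / (q3 + q1)

print(q1, q3, quartile_deviation, round(coefficient_qd, 4))
```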
Quartile Deviation (Q.D.) = (Q3 – Q1) / 2
Standard deviation, individual series:
$\sigma = \sqrt{\sum x^2 / N - (\sum x / N)^2}$
Discrete series:
$\sigma = \sqrt{\sum fx^2 / \sum f - (\sum fx / \sum f)^2}$
Solution:
X    f    fx    x^2    fx^2
0    1     0     0      0
1    2     2     1      2
2    4     8     4     16
3    3     9     9     27
4    0     0    16      0
5    2    10    25     50
$\sum f$ = 12   $\sum fx$ = 29   $\sum fx^2$ = 95
Continuous series :
$\sigma = \sqrt{\sum fm^2 / \sum f - (\sum fm / \sum f)^2}$
Example:
C.I : 0-10 10-20 20-30 30-40 40-50
F : 2 5 9 3 1
Solution:
C.I      f     m     fm     m^2     fm^2
0-10     2     5     10      25       50
10-20    5    15     75     225     1125
20-30    9    25    225     625     5625
30-40    3    35    105    1225     3675
40-50    1    45     45    2025     2025
$\sum f$ = 20   $\sum fm$ = 460   $\sum fm^2$ = 12500
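The column totals can be turned into a standard deviation with a short Python sketch. It assumes the expressions for the discrete and continuous series are meant to stand under a square root (the usual standard deviation formula, with the mean of squares minus the square of the mean as the variance). `grouped_sd` is a hypothetical helper name, and the data are the two tables above.

```python
from math import sqrt

def grouped_sd(values, freqs):
    """Standard deviation: sqrt(sum(f x^2)/sum(f) - (sum(f x)/sum(f))^2)."""
    total_f = sum(freqs)
    if total_f <= 0:
        raise ValueError("total frequency must be positive")
    mean = sum(f * x for x, f in zip(values, freqs)) / total_f
    mean_of_squares = sum(f * x * x for x, f in zip(values, freqs)) / total_f
    variance = max(mean_of_squares - mean ** 2, 0.0)  # guard against rounding
    return sqrt(variance)

# Discrete series from the table with totals 12, 29 and 95.
discrete_sd = grouped_sd([0, 1, 2, 3, 4, 5], [1, 2, 4, 3, 0, 2])

# Continuous series: mid values m of the classes 0-10, ..., 40-50.
continuous_sd = grouped_sd([5, 15, 25, 35, 45], [2, 5, 9, 3, 1])

print(round(discrete_sd, 4), round(continuous_sd, 4))
```

For the continuous series the variance works out to 12500/20 - (460/20)^2 = 625 - 529 = 96.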
Coefficient of variation:
*******************
UNIT-III
Correlation Analysis:
Positive or negative : When the values of two variables change in the same direction,
there is positive correlation between the two variables. When they change in opposite
directions, there is negative correlation.
Y 23 32 37 41 46 50
Example : X 34 25 18 10 7
Y 51 49 42 33 19
Simple or partial or Multiple :
When only two variables are considered, as under positive or negative
correlation above, the correlation between them is called simple correlation. When
more than two variables are considered, the correlation between two of them when all
other variables are held constant, i.e., when the linear effects of all other variables on
them are removed, is called partial correlation. When more than two variables are
considered, the correlation between one of them and its estimate based on the group
consisting of the other variables is called multiple correlation.
Methods :
The following four methods are available under simple linear correlation;
among them, the product moment method is the best one.
Scatter Diagram
Karl Pearson’s correlation coefficient or product moment correlation
coefficient (r)
Spearman’s rank correlation coefficient (ρ)
Correlation coefficient by concurrent deviation method (rc).
Scatter Diagram :
Making a scatter diagram and drawing a line or curve is the primary
investigation to assess the type of relationship between the variables. The knowledge
gained from the scatter diagram can be used for further analysis of the data. In most of
the cases the diagrams are not as simple as in figure (a). There are quite complicated
diagrams and it is difficult to choose a proper mathematical model for representing
the original data. The scatter diagram gives an indication of the appropriate model
which should be used for further analysis with the help of method of least squares.
Figure (b) shows that the points in the scatter diagram are falling from the top left
corner to the right. This is a relation called inverse or indirect. The points are in the
neighborhood of a certain line called the regression line.
As long as the scattered points show closeness to a straight line of some
direction, we draw a straight line to represent the sample data. But when the points do
not lie around a straight line, we do not draw the regression line. Figure (c) shows that
the plotted points have a tendency to fall from left to right in the form of a curve. This
is a relation called non-linear or curvilinear. Figure (d) shows the points which
apparently do not follow any pattern. If X takes a small value, Y may take a small or
large value. There seems to be no association between X and Y. Such a diagram
suggests that there is no relationship between the two variables.
London in the year 1911. He along with his colleagues Weldon and Galton founded
the journal “Biometrika” whose object was the development of statistical theory.
The correlation between two variables X and Y, measured using Pearson’s coefficient,
takes values between +1 and -1. When measured in a population, Pearson’s coefficient is
designated by the Greek letter rho (ρ). When studying a sample, it is designated by the
letter r, and is therefore sometimes called Pearson’s r. Pearson’s coefficient reflects the
linear relationship between two variables. If the correlation coefficient is +1, there is a
perfect positive linear relationship between the variables, and if it is -1, there is a
perfect negative linear relationship between the variables. A value of 0 denotes that there
is no linear relationship between the two variables.
The values -1, +1 and 0 are theoretical results and are not generally found in normal
circumstances. The coefficient cannot be less than -1 or more than +1; these are the lower
and the upper limits.
Pearson’s Coefficient computational formula
Step 2: Multiply x and y together to fill the xy column. For example, row 1 would be
43 × 99 = 4,257
Step 3: Take the square of the numbers in the x column, and put the result in the x2
column.
Row    x     y     xy     x2
1     43    99    4257   1849
2     21    65    1365    441
3     25    79    1975    625
4     42    75    3150   1764
5     57    87    4959   3249
6     59    81    4779   3481
Step 4: Take the square of the numbers in the y column, and put the result in the y2
column.
Step 5: Add up all of the numbers in each column and put the result at the bottom of the
column. The Greek letter sigma (Σ) is a short way of saying “sum of.”
Step 6: Use the following formula to work out the correlation coefficient.
$r = \frac{N \sum xy - \sum x \sum y}{\sqrt{[N \sum x^2 - (\sum x)^2][N \sum y^2 - (\sum y)^2]}}$
For the six (x, y) pairs in this example, Σx = 247, Σy = 486, Σxy = 20485, Σx2 = 11409
and Σy2 = 40022, so r = 2868 / sqrt(7445 × 3936) ≈ 0.5298.
The range of the correlation coefficient is from -1 to 1. Since our result is about 0.53,
there is a moderate positive linear relationship between the two variables.
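Steps 2 to 6 can be sketched in Python. This is a minimal illustration assuming the usual computational form of Pearson's formula, r = [NΣxy - ΣxΣy] / sqrt([NΣx2 - (Σx)2][NΣy2 - (Σy)2]); `pearson_r` is a hypothetical function name. The data are the six (x, y) pairs of the example: row 1 from Step 2 and rows 2 to 6 from the table.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from the column sums used in the step-by-step method."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least two pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # Step 2: xy column
    sum_x2 = sum(x * x for x in xs)               # Step 3: x2 column
    sum_y2 = sum(y * y for y in ys)               # Step 4: y2 column
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when a series has no variation")
    return numerator / denominator

x = [43, 21, 25, 42, 57, 59]
y = [99, 65, 79, 75, 87, 81]
r = pearson_r(x, y)
print(round(r, 4))
```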
37 42 2 3 -1 1
25 43 4 2 2 4
Total $\sum d$ = 0, $\sum d^2$ = 38
$\rho = 1 - \frac{6 \sum d^2}{N(N^2 - 1)}$
= 1 - (6 x 38) / [5 (5^2 – 1)]
= 1 - 1.9
= -0.9
Tied Ranks :
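When one or more values are repeated, two aspects are to be considered: the ranks of
the repeated values and the change in the formula. Repeated values are each given the
average of the ranks they would otherwise occupy, and for each value repeated m times the
correction m(m^2 - 1)/12 is added to $\sum d^2$.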
Example:
Find the rank correlation coefficient for the percentage of marks secured by a
group of 8 students in Economics and Statistics.
Marks in Economics: 50 60 65 70 75 40 70 80
Marks in Statistics: 80 71 60 75 90 82 70 50
Solution:
Let X - Marks in Economics
Y - Marks in Statistics
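X     Y     Rank of X   Rank of Y    d      d^2
50    80       7           3         4      16
60    71       6           5         1       1
65    60       5           7        -2       4
70    75       3.5         4        -0.5     0.25
75    90       2           1         1       1
40    82       8           2         6      36
70    70       3.5         6        -2.5     6.25
80    50       1           8        -7      49
Total $\sum d$ = 0, $\sum d^2$ = 113.5
$\rho = 1 - \frac{6\{\sum d^2 + m(m^2 - 1)/12\}}{N(N^2 - 1)}$
Here the value 70 occurs twice in X, so m = 2 and m(m^2 - 1)/12 = 0.5.
Therefore $\rho$ = 1 - 6(113.5 + 0.5) / [8 (8^2 - 1)]
= 1 - 1.3571 = -0.3571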
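The ranking and the tie correction can be sketched in Python. This is a minimal illustration: `average_ranks` and `spearman_rho` are hypothetical helper names, rank 1 is given to the largest value as in the table, and the correction m(m^2 - 1)/12 is added once for every repeated value in either series.

```python
def average_ranks(values):
    """Rank with 1 = largest; tied values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1      # first rank occupied by v
        count = ordered.count(v)          # how many times v is repeated
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]

def spearman_rho(xs, ys):
    """Rank correlation with the tie correction added to sum of d^2."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = 0.0
    for series in (xs, ys):
        for v in set(series):
            m = series.count(v)
            if m > 1:
                correction += m * (m * m - 1) / 12
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))

economics = [50, 60, 65, 70, 75, 40, 70, 80]
statistics_marks = [80, 71, 60, 75, 90, 82, 70, 50]
rho = spearman_rho(economics, statistics_marks)
print(round(rho, 4))
```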
The line which gives the average relationship between the two variables is
known as the regression line, and its equation is the regression equation. The regression
equation is also called the estimating equation.
Uses:
1. Regression analysis is used in statistics and other disciplines.
2. Regression analysis is of practical use in determining demand curves, supply
curves, consumption functions, etc. from market surveys.
3. In Economics and Business, there are many groups of interrelated variables.
4. In social research, the relation between variables may not be known; the relation
may differ from place to place.
5. The value of the dependent variable is estimated corresponding to any value of the
independent variable using the appropriate regression equation.
Similarly, the values of a’ and b’ for the given pairs of values (xi, yi),
i = 1, 2, 3, ….., are determined using the normal equations
∑x = Na’ + b’∑y
∑xy = a’∑y + b’∑y2
Y 9 11 5 8 7 use normal equations.
Solution:
X Y XY X2 Y2
6 9 54 36 81
2 11 22 4 121
10 5 50 100 25
4 8 32 16 64
8 7 56 64 49
∑x=30 ∑y=40 ∑xy=214 ∑x2=220 ∑y2=340
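Solving the normal equations from these column totals can be sketched in Python. This is a minimal illustration: the X on Y equations are the ones given in these notes, while the Y on X pair (∑y = Na + b∑x and ∑xy = a∑x + b∑x2) is assumed by symmetry. `solve_normal_equations` is a hypothetical helper name.

```python
def solve_normal_equations(n, sum_u, sum_v, sum_uv, sum_u2):
    """Solve  sum_v = n*a + b*sum_u  and  sum_uv = a*sum_u + b*sum_u2  for (a, b)."""
    determinant = n * sum_u2 - sum_u ** 2
    if determinant == 0:
        raise ValueError("normal equations have no unique solution")
    b = (n * sum_uv - sum_u * sum_v) / determinant
    a = (sum_v - b * sum_u) / n
    return a, b

X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]
n = len(X)
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)

# Regression of Y on X: Y = a + bX (independent variable X).
a, b = solve_normal_equations(n, sum_x, sum_y, sum_xy, sum_x2)

# Regression of X on Y: X = a' + b'Y (independent variable Y).
a_dash, b_dash = solve_normal_equations(n, sum_y, sum_x, sum_xy, sum_y2)

print(f"Y = {a:.2f} + ({b:.2f}) X")
print(f"X = {a_dash:.2f} + ({b_dash:.2f}) Y")
```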
38 30 6 -8 -48 36 64
34 33 2 -5 -10 4 25
32 39 0 1 0 0 1
Total 320 380 0 0 -93 140 398
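$\bar{X}$ = 32, $\bar{Y}$ = 38, bxy = ∑xy / ∑y2 = -93/398 = -0.2337, byx = ∑xy / ∑x2 = -93/140 = -0.6643
(i) Regression equation of Y on X: (Y - $\bar{Y}$) = byx (X - $\bar{X}$)
(Y – 38) = -0.6643 (X - 32), i.e., Y = 59.26 - 0.6643 X
(ii) Regression equation of X on Y: (X - $\bar{X}$) = bxy (Y - $\bar{Y}$)
(X – 32) = -0.2337 (Y - 38) = -0.2337 Y + 8.88, i.e., X = 40.88 - 0.2337 Y
(iii) r = ± sqrt(byx x bxy) = -0.3940, the negative sign being taken because both
regression coefficients are negative.
(iv) When X = 30, Y = 59.26 - 0.6643 x 30 = 39.33, i.e., about 39.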
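The figures in this solution can be reproduced from the column totals with a short Python sketch. This is a minimal illustration; the variable and function names are hypothetical, and x and y in the totals denote deviations from the means 32 and 38.

```python
from math import copysign, sqrt

mean_x, mean_y = 32, 38
sum_xy, sum_x2, sum_y2 = -93, 140, 398   # totals of the deviation columns

b_yx = sum_xy / sum_x2   # regression coefficient of Y on X
b_xy = sum_xy / sum_y2   # regression coefficient of X on Y

# r is the geometric mean of the two coefficients and carries their common sign.
r = copysign(sqrt(b_yx * b_xy), b_yx)

def predict_y(x_value):
    """Regression of Y on X: Y - mean_y = b_yx (X - mean_x)."""
    return mean_y + b_yx * (x_value - mean_x)

def predict_x(y_value):
    """Regression of X on Y: X - mean_x = b_xy (Y - mean_y)."""
    return mean_x + b_xy * (y_value - mean_y)

print(round(b_yx, 4), round(b_xy, 4), round(r, 4), round(predict_y(30), 2))
```

Both lines pass through the point of means: `predict_y(32)` returns 38 and `predict_x(38)` returns 32.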
Properties of Regression coefficients :
The two regression equations are generally different and are not to be
interchanged in their usage.
The two regression lines intersect at ($\bar{X}$, $\bar{Y}$), the means of X and Y.
Correlation coefficient is the geometric mean of two regression coefficients.
The two regression coefficients and the correlation coefficient have the same
sign.
The two regression coefficients cannot both be numerically greater than one, and
the correlation coefficient never exceeds one numerically.
Regression coefficients are independent of change of origin but are affected
by the change of scale.
Each regression coefficient is in the unit of the measurement of the dependent
variable.
Each regression coefficient indicates the quantum of change in the dependent
variable corresponding to unit increase in the independent variable.
*******************
UNIT IV
INDEX NUMBERS
The units of measurements of commodities are different. But, a price index number
gives the percentage of changes in prices on the average. Hence, index numbers are a
special type of averages. For example, let the commodities be rice, kerosene and
cloth. The price of rice per kilogram is considered; the price of kerosene per litre and
the price of cloth per metre are considered. The average change in prices is indicated
by the index number.
2. Index numbers are percentages. The price in the current year is divided by the
price in the base year to get the ratio of change in price. It is multiplied by 100.
Interpretation of an index number is made easy by this procedure.
3. Index numbers indicate the percentage of change which is not possible
otherwise. No other statistical tool is so effective in studying such a wide variety of
situations.
4. Index numbers are meant for comparisons. Index numbers have been devised
to compare two different times. Comparisons of two different places or situations are
also possible with index numbers.
Uses
1. Index numbers provide scope for comparisons. Price, production, value, etc.
at two times are compared by index numbers.
2. Index numbers are economic barometers.
3. Index numbers serve as guides. Being economic barometers, they foretell the
direction in which the economy is likely to move. Governments, businessmen,
economists, etc. benefit by planning their activities accordingly.
4. Index numbers are the pulse of an economy. The condition of an economy is
known from the index numbers of various economic activities.
5. Index numbers measure the purchasing power of money.
6. Index numbers help to calculate real wages.
2. The base period. The period may be one year or a few years. The base
period is to be taken according to the purpose. If the impact of the Five-Year Plans on
the Indian economy is to be assessed, 1951 is to be taken as the base year. The condition
in any subsequent year in relation to 1951 shows how the country has progressed up to
that year. Generally the base period should be as follows.
(i) It should be a normal period. There should not have been natural calamities
such as famine, flood and earthquake, political upheavals, war, etc.
(ii) It should not be too short. In short periods, typical conditions might not be
there. The price of a commodity, for example, might be very high during a very short
time. The true condition is distorted if it is taken as the base period.
(iii) It should not be too distant in the past. This is to keep the index number
useful.
(iv) It may be a fixed period for all the different periods under consideration. Or,
under chain base method, link relatives in which for every year the preceding year is
the base year may be calculated first and then may be chained together to a common
base year. Link relatives may prove their use in business and industry when any year
is to be compared with the year just preceding it. Whenever different years are to be
compared among themselves (with a common base year), fixed base as well as chain
base index numbers are useful.
3. The items. Including all the items in a study is neither feasible nor useful.
Only those items which concern the people for whom the index number is intended
are to be included. For considering the living conditions of people in hill stations,
woollen clothes should be included. For people who live in hot places throughout the
year woollen clothes are not at all necessary. For students pen and paper may be
necessary. For Keralites an umbrella may be necessary. Only items essential for the
people concerned should be included.
4. The price quotations. The prices are to be properly gathered. For consumer
price index numbers, retail prices are necessary. For wholesale price indices, wholesale
prices are needed. The places from where the people concerned buy are selected.
The difficulty is all the greater when the prices vary from locality to locality in the
same town, from shop to shop in the same locality and from customer to customer in
the same shop.
5. The Average. For arriving at the average value of a group of items, the
suitable average is to be decided. In other contexts A.M. may be more useful. It is
simple to understand and easy to calculate. Nowadays calculators are available
to compute the A.M. Median and Mode may be obtained by mere inspection. But
Geometric Mean is the preferable average for the following reasons:
(i) G.M. is the appropriate average to measure relative changes. Hence, index
numbers, wherein the relative changes are expressed as percentages, give scope for
G.M.
(ii) It gives more weightage to smaller items and lesser weightage to greater items.
It is not as unduly affected as A.M. by extreme items.
(iii) It facilitates the change of the base period. The base cannot be kept the same for a
long time because the purpose and all-round changes may warrant a change in the
base period.
The quantity purchased, the amount spent, etc. show the relative importance of the
different items. Weighting may be explicit as follows.
(i) Base year quantity as in Laspeyre’s method or current year quantity as in
Paasche’s method for price index number.
(ii) Base year value (price X quantity) as in consumer price index number by
Family Budget Method.
(iii) Some fixed weight based on neither base year quantity nor current year
quantity but on some other consideration as in Kelly’s method.
7. The Formula. As seen in the following pages, many formulae are available.
Each one has its own advantages. If for a certain situation only one formula is
suitable, there is no difficulty in using the formula. For certain other situations more
than one formula may be found suitable. In such cases the purpose and the opinion of
the experts in the field are the guides in choosing a formula.
Proper decision under each of those headings is bound to lead to a good index
number.
Period is referred to as year hereafter and the following notations are used.
p0 - price of a commodity in the base year.
p1 - price of a commodity in the current year.
q0 - quantity of a commodity in the base year.
q1 - quantity of a commodity in the current year.
p - price of a commodity.
q - quantity of a commodity.
V or W - weight of a commodity.
I or P - price relative or price index number of a commodity.
Q - quantity relative or quantity index number of a commodity.
P = p1/p0 × 100    Q = q1/q0 × 100
P01 - price index number of the current year compared with the base year.
Q01 - quantity index number of the current year compared with the base year.
Formulae. All the formulae can be brought under four groups as follows. First, they
are divided into two groups, viz., Unweighted Methods and Weighted Methods, and
then each group is subdivided into two as Aggregatives Methods and Average of
Relatives Methods. Under each of the four subdivisions one or more formulae are
available.
Methods
Unweighted: (i) Aggregatives Method (ii) Average of Relatives Method
Weighted: (i) Aggregatives Method (ii) Average of Relatives Method
Simple Aggregatives Method: It is based on the aggregates or the totals as shown below.
P01 = ∑p1/∑p0 X 100
It may be noted that the current year figure is in the numerator while the base year
figure is in the denominator, as in the other methods, when the index number of the
current year as compared to the base year is calculated.
When quantity index number is required, Q01 = ∑q1/∑q0 X 100
The calculation is illustrated together with the simple averages of relatives method.
Example : From the following data construct an index for 1995 taking 1994 as
base:
Commodities A B C D E
Price in 1994 (Rs.) 50 40 80 110 20
Price in 1995 (Rs.) 70 60 90 120 20
Solution:
Commodities   Price in 1994 (p0)   Price in 1995 (p1)   P = p1/p0 X 100   log P
A             50                   70                   140.00            2.1461
B             40                   60                   150.00            2.1761
C             80                   90                   112.50            2.0512
D             110                  120                  109.09            2.0378
E             20                   20                   100.00            2.0000
Total         300                  360                  611.59            10.4112
By Aggregatives Method,
P01 = ∑p1/∑p0 X 100 = 360/300 X 100 = 120
Using A.M., P01 = ∑P/N = 611.59/5 = 122.32
Using G.M., P01 = Antilog (∑log P/N) = Antilog (10.4112/5) = Antilog (2.0822) = 120.84
Note: Although any one of them is sufficient, all the three possible indices have been
calculated for the sake of illustration.
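The three unweighted indices of this example can be checked with a short Python sketch. The variable names are illustrative only, and math.log10 takes the place of the four-figure log table, so the G.M. may differ from 120.84 in the last decimal place.

```python
import math

# Prices from the example: base year 1994 (p0) and current year 1995 (p1).
p0 = [50, 40, 80, 110, 20]
p1 = [70, 60, 90, 120, 20]

# Price relatives P = p1/p0 x 100 for each commodity.
relatives = [b / a * 100 for a, b in zip(p0, p1)]

# Simple aggregatives method: ratio of the price totals.
aggregative = sum(p1) / sum(p0) * 100

# Simple A.M. of the price relatives.
am_index = sum(relatives) / len(relatives)

# Simple G.M. of the price relatives, computed through logarithms as in the table.
gm_index = 10 ** (sum(math.log10(r) for r in relatives) / len(relatives))

print(round(aggregative, 2), round(am_index, 2), round(gm_index, 2))
```

The G.M. lies between the aggregative index and the A.M. here, which is in line with its being less affected by the two large relatives (140 and 150).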
When the index number is required by only one method as in this problem, the
preferable method is simple A.M. and the answer is P01 = 122.32
P01 = 122.32 indicates that the prices, on the average, have increased 22.32%
in the current year compared with the base year.
Whenever the price index number is less than 100, it indicates that the prices,
on the average, have decreased in the current year compared with the base year.
This method is better than the corresponding unweighted method in showing the
relative change. From the data available under this method, index numbers by
unweighted averages of relatives also could be calculated. This method provides
scope for replacing one or more items at a later stage.
Note: G.M. is the suitable average. When nothing is mentioned A.M. alone is
usually calculated.
Index numbers are constructed to study the relative changes in prices, quantities, etc.
of one time in comparison with another. Many formulae are available. They are
tested as follows.
1. Unit Test. This requires the formula to be independent of the units in which
prices and quantities are quoted.
The following examples show the different results given by the simple
aggregatives method although the price condition is the same. Laspeyre’s, Paasche’s
and Fisher’s formulae give the same result in spite of the difference in units.
Price Quantity
Item Unit P0 P1 q0 q1 p0q0 p1q0 p0q1 p1q1
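By Simple Aggregative Method, $P_{01} = \frac{\sum p_1}{\sum p_0} \times 100$
By Laspeyre’s formula, $P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$
By Paasche’s formula, $P_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$
By Fisher’s formula,
$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100 = 154.86$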
The same prices and quantities are quoted below to different units:
Price Quantity
Item Unit P0 P1 q0 q1 p0q0 p1q0 p0q1 p1q1
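The effect of a change of units can be sketched in Python. The two commodities and all the figures below are hypothetical (they are not the data of the tables above); the same transactions are quoted once with rice per kg and once with rice per quintal (100 kg).

```python
def simple_aggregative(p0, p1):
    """Simple aggregatives price index: total of current prices over total of base prices."""
    return sum(p1) / sum(p0) * 100

def laspeyres(p0, p1, q0):
    """Laspeyre's price index with base year quantities as weights."""
    cost_now = sum(p * q for p, q in zip(p1, q0))
    cost_base = sum(p * q for p, q in zip(p0, q0))
    return cost_now / cost_base * 100

# Hypothetical data: rice (per kg) and milk (per litre).
p0, p1 = [20.0, 30.0], [30.0, 36.0]
q0 = [10.0, 5.0]

# Same data with rice quoted per quintal: price x 100, quantity / 100.
p0_alt, p1_alt = [2000.0, 30.0], [3000.0, 36.0]
q0_alt = [0.1, 5.0]

print(round(simple_aggregative(p0, p1), 2), round(simple_aggregative(p0_alt, p1_alt), 2))
print(round(laspeyres(p0, p1, q0), 2), round(laspeyres(p0_alt, p1_alt, q0_alt), 2))
```

The simple aggregative index changes with the unit because prices of unlike units are added directly, whereas in Laspeyre's formula every price is multiplied by a quantity in the matching unit, so the products p × q are unaffected.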
2. Time Reversal Test (T.R. Test): This requires the formula to be such that P01 X P10 =
1, after ignoring the factor 100 in each index. In the words of Prof. Irving Fisher who
proposed that test condition, “…the formula for calculating the index number
should be such that it will give the same ratio between one point of comparison and
the other, no matter which of the two is taken as base or putting it in another way, the
index number reckoned forward should be reciprocal of the one reckoned backward”.
P10 is the index number of the base year in comparison with the current year. That is,
the base year figure will be in the numerator and the current year figure will be in the
denominator. Hence, it is expected to be the reciprocal of P01. In other words, the
product of P01 and P10 is expected to be unity.
Fisher’s formula, Marshall – Edgeworth formula, Kelly’s formula, Simple
Aggregatives Method and Weighted and Unweighted Geometric Means of Relatives
Methods satisfy this test.
The examination of a few formulae under this test is presented in the table in the next
page. From that it could be seen whether the test is satisfied or not by the concerned
formula.
3. Factor Reversal Test (F.R. Test): This requires the formula to be such that
P01 X Q01 = ∑p1q1/∑p0q0, after ignoring the factor 100 in each index. In the words of Prof.
Irving Fisher who proposed this condition also, “Just as our formula should permit
the interchanging of two times without giving inconsistent results, so it ought to
permit interchanging the prices and quantities without giving inconsistent results –
that is, the two results multiplied together should give the true value ratio, except for a
constant of proportionality”.
P01 gives the relative change in price while Q01 gives the relative change in quantity.
Hence, P01 X Q01 should give the relative change in price multiplied by quantity (i.e.,
value).
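Commodity   1990 Price   1990 Quantity   1992 Price   1992 Quantity
A           6            50              10           56
B           2            100             2            120
C           4            60              6            60
D           10           30              12           24
E           8            40              12           36
Solution:
            1990           1992
Commodity   p0     q0      p1     q1      p0q0    p1q0    p0q1    p1q1
A           6      50      10     56      300     500     336     560
B           2      100     2      120     200     200     240     240
C           4      60      6      60      240     360     240     360
D           10     30      12     24      300     360     240     288
E           8      40      12     36      320     480     288     432
Total       -----  -----   -----  -----   1360    1900    1344    1880
$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} = \sqrt{\frac{1900}{1360} \times \frac{1880}{1344}}$
$P_{10} = \sqrt{\frac{\sum p_0 q_1}{\sum p_1 q_1} \times \frac{\sum p_0 q_0}{\sum p_1 q_0}} = \sqrt{\frac{1344}{1880} \times \frac{1360}{1900}}$ and so
$P_{01} \times P_{10} = \sqrt{\frac{1900}{1360} \times \frac{1880}{1344}} \times \sqrt{\frac{1344}{1880} \times \frac{1360}{1900}}$
$= \sqrt{1} = 1$
$Q_{01} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}} = \sqrt{\frac{1344}{1360} \times \frac{1880}{1900}}$
$P_{01} \times Q_{01} = \sqrt{\frac{1900}{1360} \times \frac{1880}{1344}} \times \sqrt{\frac{1344}{1360} \times \frac{1880}{1900}}$
$= \sqrt{\frac{1880^2}{1360^2}} = \frac{1880}{1360} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$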
Using the given data, Fisher’s index is found to satisfy both time reversal and factor
reversal tests.
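The two tests can also be verified numerically for the data of this example. The following is a minimal Python sketch; the helper names total and fisher are illustrative, and the factor 100 is ignored as the tests require.

```python
import math

# Data of the example: prices and quantities for 1990 (base) and 1992 (current).
p0 = [6, 2, 4, 10, 8]
q0 = [50, 100, 60, 30, 40]
p1 = [10, 2, 6, 12, 12]
q1 = [56, 120, 60, 24, 36]

def total(prices, quantities):
    """Sum of price x quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))

def fisher(pa, qa, pb, qb):
    """Fisher's index of period b on base a, without the factor 100."""
    laspeyres = total(pb, qa) / total(pa, qa)
    paasche = total(pb, qb) / total(pa, qb)
    return math.sqrt(laspeyres * paasche)

P01 = fisher(p0, q0, p1, q1)   # price index of 1992 on 1990
P10 = fisher(p1, q1, p0, q0)   # price index of 1990 on 1992
Q01 = fisher(q0, p0, q1, p1)   # quantity index: prices and quantities interchanged
value_ratio = total(p1, q1) / total(p0, q0)

print(round(P01 * 100, 2))
print(math.isclose(P01 * P10, 1.0))           # time reversal test
print(math.isclose(P01 * Q01, value_ratio))   # factor reversal test
```

math.isclose is used instead of == because square roots and divisions in floating point arithmetic rarely reproduce 1 exactly.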
CIRCULAR TEST
Circular test is an extension of the time reversal test. If three years 0,1 and 2 are
under consideration, this requires the formula to be such that
P01 X P12 X P20 =1
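As a brief illustration, the simple aggregatives method satisfies the circular test because the totals cancel in turn. Here p2 denotes the price of a commodity in year 2 (a symbol introduced only for this illustration) and the factor 100 is ignored in each index:

```latex
\[
P_{01} \times P_{12} \times P_{20}
  = \frac{\sum p_1}{\sum p_0} \times \frac{\sum p_2}{\sum p_1} \times \frac{\sum p_0}{\sum p_2}
  = 1
\]
```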
FIXED BASE
When the data are available for more than two years, the question ‘which is the base year?’
arises. Under fixed base method, the base year is the same for all the different years
under consideration. Base year figures may be figures of any one year or the averages
of a few years or the totals of a few years or those suggested. When nothing is
indicated, the first year in the series of years in chronological order is to be taken as
the base.
If no method is suggested, the method which is suitable for the data under
consideration is to be chosen. For the given data, although index number can be
calculated by more than one method, the result is obtained by only one method unless
stated otherwise. The method is selected in the following order:
(i) Fisher’s formula
Or
(ii) Weighted A.M. method
Or
(iii) Unweighted A.M. method
Example :Calculate fixed base index number from the following prices:
I 4 5 6 6 8 10
II 5 7 8 10 13 15
III 6 9 12 12 15 15
Solution:
For each commodity, the price in a year is divided by that in 1995 and is
multiplied by 100 to get the price relative. Using A.M., the price indices are
calculated and are given in the last column of the above table.
For the first year which is the base year, fixed base index number as well as
each P is 100.
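The calculation described in this solution can be sketched in Python as follows. The prices are those of commodities I, II and III in the example, with the first figure of each row taken as the 1995 (base year) price; the variable names are illustrative.

```python
# Prices of commodities I, II and III; the first column is the base year 1995.
prices = {
    "I": [4, 5, 6, 6, 8, 10],
    "II": [5, 7, 8, 10, 13, 15],
    "III": [6, 9, 12, 12, 15, 15],
}

# Price relative of each year on the base (first) year, per commodity.
relatives = {
    name: [p / series[0] * 100 for p in series]
    for name, series in prices.items()
}

# Fixed base index of each year = simple A.M. of that year's price relatives.
n_years = len(prices["I"])
fixed_base_index = [
    sum(relatives[name][t] for name in prices) / len(prices)
    for t in range(n_years)
]

print([round(v, 2) for v in fixed_base_index])
# [100.0, 138.33, 170.0, 183.33, 236.67, 266.67]
```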
CHAIN BASE
When the data are available for more than two years, the method available
besides the fixed base method for computing index numbers is the chain base method.
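$\text{Chain Index} = \frac{\text{Link Relative of the current year} \times \text{Chain Index of the previous year}}{100}$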
Example : Construct (a) fixed base and (b) chain base index numbers from the
following data relating to the production of electricity.
Year 1981 1982 1983 1984 1985 1986 1987 1988
Production 25 27 30 24 28 29 31 35
Year 1989 1990 1991 1992 1993 1994 1995 1996
Production 40 41 36 32 37 38 39 40
Solution:
Quantities of production are given for 16 years. The production every year is
divided by that of 1981, i.e., 25 and is multiplied by 100 to get the fixed base quantity
indices (Q01) given in col. (3).
For calculating link relatives (L.R.) of col. (4), quantity of every year is
divided by that of its preceding year and is multiplied by 100.
Link relatives are converted into chain base indices (Q01) given in col. (5)
using usual formula.
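The three columns of this solution can be reproduced with a short Python sketch. It assumes the usual chaining rule, chain index = link relative × chain index of the previous year / 100, with the chain index of 1981 taken as 100; the variable names are illustrative.

```python
years = list(range(1981, 1997))
production = [25, 27, 30, 24, 28, 29, 31, 35, 40, 41, 36, 32, 37, 38, 39, 40]

# (a) Fixed base indices: production of every year on that of 1981.
fixed_base = [q / production[0] * 100 for q in production]

# Link relatives: every year on its preceding year (none for the first year).
link_relatives = [None] + [
    production[i] / production[i - 1] * 100 for i in range(1, len(production))
]

# (b) Chain base indices: chain the link relatives back to 1981 = 100.
chain_base = [100.0]
for lr in link_relatives[1:]:
    chain_base.append(lr * chain_base[-1] / 100)

for row in zip(years, production, fixed_base, link_relatives, chain_base):
    print(row)
```

For a single series such as this one, the chain base indices coincide with the fixed base indices, which serves as a check on the arithmetic.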
Cost of living index number shows the impact of changes in the prices of a number of
commodities and services on a particular class of people in the current year in
comparison with the base year. Cost of living index number is also known as
consumer price index number.
Main steps in the construction of Cost of Living Index Number.
1. The Purpose. At the outset, the class of people for whom the index number is
intended is to be identified. The knowledge of their area of living, their ways of life,
their necessities, their habits, etc. plays an important role in getting good results.
2. The Base Year. A similar survey might have been conducted earlier. The current
interest might be to study the subsequent changes. For example, the pay scales of the
employees of Tamil Nadu Govt. were revised in 1994.
3. Family Budget Enquiry. A sample survey, known as family budget enquiry, is
conducted and the items to be included, their quantity, etc. are found. It is customary
to have the items under the five heads (i) Food (ii) Clothing (iii) Fuel and Lighting
(iv) House Rent and (v) Miscellaneous. From the families of the concerned class of
people, a sample of adequate size is selected. From each such family, the details of
the different items consumed, their quality and quantity are noted.
4. The Prices. The average price paid for each item is to be gathered from the shops of
the region. The prices are retail prices. As mentioned earlier under general problems
in the construction of index numbers, it is a difficult task to gather and to arrive at an
average price of an item. The shops where many of the families buy and the most
likely prices in those shops are to be noted before finding their average.
5. The Average. Both arithmetic mean and geometric mean can be used, the
former owing to its ease of calculation and the latter owing to its
suitability.
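6. The Formula. Two formulae are available: the Aggregate Expenditure Method, P01 = ∑p1q0/∑p0q0 X 100, and the Family Budget Method, P01 = ∑PV/∑V, where P = p1/p0 X 100 and V = p0q0.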
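The two formulae usually given for the cost of living index are the aggregate expenditure method, ∑p1q0/∑p0q0 × 100, and the family budget method, ∑PV/∑V with P = p1/p0 × 100 and V = p0q0. The Python sketch below assumes these are the two intended here and uses hypothetical figures (not taken from the text) to show that they lead to the same index.

```python
# Hypothetical family budget data (not from the text).
q0 = [10, 4, 2]          # base year quantities
p0 = [5.0, 8.0, 20.0]    # base year prices
p1 = [6.0, 10.0, 25.0]   # current year prices

# Aggregate expenditure method: current cost of the base year basket
# divided by its base year cost.
aggregate_expenditure = (
    sum(p * q for p, q in zip(p1, q0)) / sum(p * q for p, q in zip(p0, q0)) * 100
)

# Family budget method: weighted A.M. of price relatives, weights V = p0 * q0.
P = [b / a * 100 for a, b in zip(p0, p1)]
V = [a * q for a, q in zip(p0, q0)]
family_budget = sum(r * v for r, v in zip(P, V)) / sum(V)

print(round(aggregate_expenditure, 2), round(family_budget, 2))
```

The agreement is not a coincidence: with V = p0q0, each product PV equals p1q0 × 100, so the two formulae are algebraically the same.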
C 6 6.00 9.00
D 4 8.00 10.00
E 2 2.00 1.80
F 1 20.00 15.00
Solution:
*******************
UNIT V
ANALYSIS OF TIME SERIES
97-98 32.4 35.0 34.7 33.5 31.5 33.1
98-99 45.8 50.2 50.4 51.3 52.0 52.4
Oct. Nov. Dec. Jan. Feb. Mar.
96-97 28.5 30.2 27.3 25.1 20.7 21.5
97-98 32.4 35.7 36.0 36.7 30.0 31.2
98-99 51.7 54.2 53.7 52.4 41.3 43.6
Uses:
Variables such as Sales, Production, Profit and Population have different
values at different points of time. Analysis of such series of values is important as
pointed out below.
(i) The analysis of time series helps to know the past conditions. The observations
at the past periods of time indicate the conditions which existed. A detailed study
enables us to know further.
(ii) It helps in assessing the present achievements. If the past conditions had
continued what would be the present position? What is the actual position now?
What are the causes for the difference? Are we satisfied with the present? Thinking
on these lines helps not only to assess the present but also to plan for the future.
(iii) It helps to predict reliably. There are many methods in Statistics to estimate
the value of a variable at a certain time in the future.
(iv) It facilitates comparison. Relevant time series could be compared and vital
inferences be drawn. For example, the production of motor cycles of two companies
can be compared over a period of time.
(v) It forewarns. As it predicts the future reliably, the future could be met with
due preparedness. If the sales in a cloth shop are likely to fall, an advertisement campaign
can be tried to increase the sales, the services of certain staff may be terminated,
unnecessary godown facilities may be surrendered, etc. On the contrary, if increased
sales are expected, stock may be increased, more sales personnel be employed, etc. In
short, losses, if any, could be minimized. Profits, if any, could be maximized.
Components:
The fluctuations in a time series are of four different natures generally. They
have been named as follows and are called components of a time series. Secular trend
is the long term effect. The other three components are called short term variations.
Long – Term Effect:
1. Secular Trend
Short – Term Variations:
2. Seasonal Fluctuations
3. Cyclical Fluctuations
4. Irregular Variations
1. Secular Trend: Secular trend is also called long – term trend or trend,
simply. The overall nature of the series is the trend. The general tendency of a series
is to increase or decrease over a period of time. Increasing trend is observed in
population, price, production, literacy, etc. There is decreasing trend in birth rate,
death rate, poverty, illiteracy, etc. It is very rare to find a time series which neither
increases nor decreases.
Mathematically, trend may be
(i) Linear or
(ii) Non – linear.
Graphically, linear trend is a straight line. The discussion in this chapter is restricted
to linear trend. Parabolic trend equation, if necessary, can be formed as explained in
‘Method of Least Squares’.
2. Seasonal Fluctuations. Season is a period which is less than one year. It may
be a period of 6 months or 4 months or 3 months or 1 month, etc. A certain nature is
observed in one season and another nature is observed in another season, every year. In
other words, the different natures recur year after year at the respective seasons.
These variations over time are called seasonal fluctuations.
The factors which cause seasonal variations are of the following two kinds:
(i) Climate and weather conditions.
(ii) Customs, traditions and habits of the people.
(i) Climate and weather conditions: Sales of ice – cream, khadi and cotton clothes,
etc. are more during summer. Sales of umbrellas are at their peak during the rainy season.
Production of paddy, wheat, etc. is more in a few months and less in other months of a
year. Climate and weather cause this kind of variation.
(ii) Customs, traditions and habits of the people: Sales of crackers and
fireworks are found to be more during Deepavali every year. Cloth shops register very
good sales during festival seasons such as Deepavali, Pongal, Ramzan and Christmas
and marriage seasons. Postmen are very busy in those days in sorting and delivering
greetings. All these variations in sales, work load, etc. are due to the customs,
traditions and habits of the people.
3. Cyclical Fluctuations. Cyclical fluctuations are similar to seasonal
variations. The difference is in the interval of recurrence. In seasonal fluctuations a
nature of the series recurs at an interval of one year. Cyclical fluctuations recur at an
interval of 3 or more years. The fitting example is the business cycle. In Economics and
Business, there are many time series which have certain wave – like movements
called business cycles. In one period, profits are easily made and are made in plenty
also. Prices are high. This period is called prosperity. After this peak condition,
things decline instead of improving. High wages, decreasing efficiency, increasing
interest rate, etc. cause the decline. This is the period of recession. After touching the
bottom, which is called depression, the condition improves. The recovery from
depression leads to prosperity. The four phases of a business cycle, namely, (i)
prosperity (ii) recession (iii) depression and (iv) recovery recur one after another
regularly.
4. Irregular Variations. Variations which do not come under the other three
components are called irregular variations. The other three components have certain
regularity. But this is irregular. Fire, floods, earthquakes, wars, lock – outs, strikes,
etc, cause irregular variations. Sometimes causes as above for irregular variations are
known. Sometimes causes may not be known. For example, there may be very poor
sales on a particular day in a leading cloth shop on the eve of Deepavali. Cause for
such a happening may not be known.
Irregular variation is also called random variation or erratic fluctuation.
Models: There exist certain relations between the components and the series of
observations. The relation between the observed value and the components is called a
model. Many models exist. In this book, only two models are considered. Let Y be the
observed data, T or Yt be the trend, S be seasonal variation, C be cyclical variation
and I be irregular variation.
(i) Additive Model
Y = T + S + C + I
When short – term variation is to be found out as per this model,
Short – term variation = Y – Yt
(ii) Multiplicative Model
Y = T × S × C × I
Many time series in Economics and Business are found to be of multiplicative
model. A few other series are found to be of additive model.
SECULAR TREND
There are four methods to estimate the secular trend.
They are
1. Graphic Method.
2. Method of Semi – Averages.
3. Method of Moving Averages.
4. Method of Least Squares.
1. Graphic Method. It is also known as free – hand method. X axis represents
time and Y axis, the observed data. Corresponding to each pair of time and observed
value, a point is marked on a graph sheet. A line is drawn such that the following
three conditions are satisfied.
(i) The number of points above the line is equal to the number of points below the
line, as far as possible.
(ii) The sum of the vertical distances of the points above the line equals that of
the points below the line.
(iii) The sum of the squares of the vertical distances of all the points from the line
is the minimum.
It is not easy to draw such a line. But the method of least squares provides such a line
mathematically.
Example : Draw the trend line by graphic method and estimate the production in
2003.
Solution: Year is represented in X axis. Production is represented in Y axis. Points
(1995, 20), (1996, 22), (1997, 25), (1998, 26), (1999, 25), (2000, 27) and (2001, 30)
are marked on a graph sheet.
A central line in the middle of those points is drawn such that the line satisfies the
three conditions.
Corresponding to X = 2003, the Y coordinate of the point on the line is found to be
32.2. Thus, the estimated production in the year 2003 is 32.2 units.
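Since the method of least squares gives mathematically the line that the free-hand method aims at, the graphic estimate can be cross-checked with a short Python sketch. The points are those marked on the graph sheet; the variable names are illustrative.

```python
years = [1995, 1996, 1997, 1998, 1999, 2000, 2001]
production = [20, 22, 25, 26, 25, 27, 30]

n = len(years)
mean_x = sum(years) / n
mean_y = sum(production) / n

# Deviations of the years from their mean, so that their sum is zero.
x = [X - mean_x for X in years]

# Least squares line Y = a + b (X - mean_x).
b = sum(xi * yi for xi, yi in zip(x, production)) / sum(xi * xi for xi in x)
a = mean_y

estimate_2003 = a + b * (2003 - mean_x)
print(round(estimate_2003, 2))  # 32.14
```

The least squares estimate of about 32.14 units is close to the 32.2 units read from the free-hand line; a small difference is to be expected because the free-hand line depends on the person drawing it.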
2. Method of Semi – Averages. The time series is divided into two halves. When there is an even
number of years, the middle-most year and the arithmetic mean of the observed
values are found out for each half. When there is an odd number of years, the middle-most
year and the corresponding observed value are omitted. The middle-most year
and the arithmetic mean of the observed values are then found out for each half.
Based on them, two points are marked and joined by a line which is extended on either side. It is the
trend line. The trend at any point of time can be found from that line. As only two
points are marked, there is no difficulty in drawing the line through the two
points.
Example : The sales in tonnes of a commodity varied from 1990 to 2001 as under:
280,300,280,280,270,240,230,230,220,200,210,200
Fit a trend line by the method of semi – averages. Estimate the sales in 2002.
Solution: Given
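A minimal Python sketch of the semi-average calculation for these sales figures is given here. It takes the twelve values in order for 1990 to 2001, splits them into two halves of six years, and places each semi-average at the mid-point of its half; the variable names are illustrative.

```python
years = list(range(1990, 2002))
sales = [280, 300, 280, 280, 270, 240, 230, 230, 220, 200, 210, 200]

half = len(years) // 2   # even number of years, so no year is omitted

# Mid-point in time and A.M. of the observed values for each half.
x1 = sum(years[:half]) / half
y1 = sum(sales[:half]) / half
x2 = sum(years[half:]) / half
y2 = sum(sales[half:]) / half

# Trend line through the two semi-average points.
slope = (y2 - y1) / (x2 - x1)

def trend(year):
    """Trend value read from the semi-average line."""
    return y1 + slope * (year - x1)

print(x1, y1, x2, y2)   # 1992.5 275.0 1998.5 215.0
print(trend(2002))
```

The semi-averages are 275 tonnes (centred between 1992 and 1993) and 215 tonnes (centred between 1998 and 1999), so the trend falls by 10 tonnes a year and the estimate for 2002 comes to 180 tonnes.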
Example : Calculate 5 yearly moving averages of the number of students studying in a
Commerce College as shown by the following figures:
Year No. of Students Year No. of Students
1987 332 1992 405
1988 311 1993 410
1989 357 1994 427
1990 392 1995 405
1991 402 1996 438
Solution:
Year No. of Students 5 Yearly Moving Totals 5 Yearly Moving Averages
1987 332 - -
1988 311 - -
1989 357 1794 358.8
1990 392 1867 373.4
1991 402 1966 393.2
1992 405 2036 407.2
1993 410 2049 409.8
1994 427 2085 417.0
1995 405 - -
1996 438 - -
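The moving totals and moving averages of this table can be reproduced with a short Python sketch. The function name moving_average is illustrative; it handles an odd period, where each average falls on the middle year of its group.

```python
years = list(range(1987, 1997))
students = [332, 311, 357, 392, 402, 405, 410, 427, 405, 438]

def moving_average(values, period):
    """Moving totals and moving averages for an odd period."""
    totals = [sum(values[i:i + period]) for i in range(len(values) - period + 1)]
    return totals, [t / period for t in totals]

period = 5
totals, averages = moving_average(students, period)

# The first average belongs to the middle year of the first five years (1989).
offset = period // 2
for year, t, a in zip(years[offset:], totals, averages):
    print(year, t, round(a, 1))
```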
Case 2. Period of Moving Average is an even number such as 4 or 6 or 8…
The mid years of the moving totals are not the given years in this case. Hence, 2
periods moving totals of the moving totals are found. The given years are found to be
the mid years of these totals. 2 periods moving totals are divided by twice the period
of moving averages to get the centered moving averages. The centered moving
averages are the trend values.
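The centring procedure for an even period can be sketched in Python as follows. As an illustration only, the student figures of the previous example are reused with a period of 4 years (a choice made here, not taken from the text); the function name is illustrative.

```python
def centred_moving_average(values, period):
    """Centred moving averages for an even period."""
    # Moving totals; their mid-points fall between two given years.
    totals = [sum(values[i:i + period]) for i in range(len(values) - period + 1)]
    # 2-period moving totals of the moving totals fall on the given years.
    paired = [totals[i] + totals[i + 1] for i in range(len(totals) - 1)]
    # Divide by twice the period to get the centred moving averages.
    return [t / (2 * period) for t in paired]

years = list(range(1987, 1997))
students = [332, 311, 357, 392, 402, 405, 410, 427, 405, 438]

period = 4
trend = centred_moving_average(students, period)

# The first centred average belongs to the year at position period // 2 (1989).
for year, value in zip(years[period // 2:], trend):
    print(year, value)
```

With a period of 4, the first and last two years have no trend value, just as two years are lost at each end with the 5 yearly moving average.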
Example : Fit a straight line trend equation to the following data by the method of
least squares and estimate the value of sales for the year 1985.
Year 1979 1980 1981 1982 1983
Sales (in Rs.) 100 120 140 160 180
Solution: Let Y = a+bX be the equation of the trend line where X – year and Y –
sales.
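As X values are large, consider x = X – X̄ = X – 1981
Let the resulting equation be y = A+Bx where y = Y
For finding the values of A and B, the normal equations are
$\sum y = NA + B \sum x$
$\sum xy = A \sum x + B \sum x^2$
Year (X)   Sales (Y)   x = X – 1981   xy     x²   Trend (Yt)
1979       100         -2             -200   4    100
1980       120         -1             -120   1    120
1981       140         0              0      0    140
1982       160         1              160    1    160
1983       180         2              360    4    180
Total      700         0              200    10   700
Since ∑x = 0, A = ∑y/N = 700/5 = 140 and B = ∑xy/∑x² = 200/10 = 20. The trend equation is
Y = 140 + 20 (X – 1981), and the estimated sales for 1985 are 140 + 20 × 4 = Rs. 220.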
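The least squares calculation of this example can be sketched in Python. Taking the origin at the mean year makes ∑x zero, so the two normal equations reduce to A = ∑y/N and B = ∑xy/∑x²; the variable names are illustrative.

```python
years = [1979, 1980, 1981, 1982, 1983]
sales = [100, 120, 140, 160, 180]

n = len(years)
origin = sum(years) / n              # 1981, the mean year
x = [X - origin for X in years]      # deviations, which sum to zero

# With sum(x) = 0 the normal equations give A and B directly.
A = sum(sales) / n
B = sum(xi * yi for xi, yi in zip(x, sales)) / sum(xi * xi for xi in x)

def trend(year):
    """Trend value from Y = A + B (X - origin)."""
    return A + B * (year - origin)

print(A, B)                        # 140.0 20.0
print([trend(X) for X in years])   # trend values for the given years
print(trend(1985))                 # 220.0
```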
The following four methods are used to estimate the seasonal variations.
1. Method of Simple Averages.
2. Method of Moving Averages
(a) Difference from Moving Averages
(b) Ratio – to – Moving Averages.
3. Ratio – to – Trend Method.
4. Method of Link Relatives.
1. Method of Simple Averages. This method assumes absence of trend in a time
series. The following are the steps:
(i) The data are arranged season – wise in chronological order.
(ii) For each season, the total of the seasonal values is found and called seasonal
total.
(iii) Each seasonal total is divided by the number of years and the seasonal average is
obtained.
(iv) The total and the average of the seasonal averages are found. The average is
called grand average.
(v) Seasonal index of every season is calculated as follows.
$\text{Seasonal Index} = \frac{\text{Seasonal Average}}{\text{Grand Average}} \times 100$
Example : Assuming no trend in the series, calculate seasonal indices for the
following data:
Quarter
Year I II III IV
1994 78 66 84 80
1995 76 74 82 78
1996 72 68 80 70
1997 74 70 84 74
1998 76 74 86 82
(C.A. Foundation, M 99)
Solution:
                   Quarter
Year               I       II      III     IV
1994               78      66      84      80
1995               76      74      82      78
1996               72      68      80      70
1997               74      70      84      74
1998               76      74      86      82
Seasonal Total     376     352     416     384     Total    Grand Average
Seasonal Average   75.2    70.4    83.2    76.8    305.6    76.4
Seasonal Index     98.4    92.2    108.9   100.5   400.0    -
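The steps of the method of simple averages can be sketched in Python for this example. The quarterly figures are those given in the question; the variable names are illustrative.

```python
# Quarterly data (quarters I to IV) for each year.
data = {
    1994: [78, 66, 84, 80],
    1995: [76, 74, 82, 78],
    1996: [72, 68, 80, 70],
    1997: [74, 70, 84, 74],
    1998: [76, 74, 86, 82],
}

n_years = len(data)

# Seasonal total and seasonal average for each quarter.
seasonal_totals = [sum(row[q] for row in data.values()) for q in range(4)]
seasonal_averages = [t / n_years for t in seasonal_totals]

# Grand average = average of the seasonal averages.
grand_average = sum(seasonal_averages) / len(seasonal_averages)

# Seasonal index = seasonal average / grand average x 100.
seasonal_indices = [a / grand_average * 100 for a in seasonal_averages]

print(seasonal_totals)
print([round(a, 1) for a in seasonal_averages], round(grand_average, 1))
print([round(i, 2) for i in seasonal_indices])
```

To two decimals the indices are 98.43, 92.15, 108.90 and 100.52. The second index is shown as 92.2 in the table so that the four indices, rounded to one decimal, add up to 400.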