
Statistics Notes

The document provides an overview of statistics, including its definition, functions, characteristics, scope, limitations, and methods of data collection. It emphasizes the importance of statistics in various fields such as industry, commerce, agriculture, economics, education, planning, and medicine, while also discussing the differences between primary and secondary data. Additionally, it covers the classification and tabulation of data, highlighting their significance in organizing and presenting information effectively.

Uploaded by

Ruth Kethsial
Copyright
© All Rights Reserved

UNIT-I

STATISTICS AND DATA COLLECTION


Definition: Statistics is the science of collection, organization, presentation, analysis
and interpretation of numerical data.

1.1 FUNCTIONS OF STATISTICS:


The following are the functions of Statistics
1. Collection
2. Numerical Presentation
3. Diagrammatic Presentation
4. Condensation
5. Comparison
6. Forecasting
7. Policy Making
8. Effect Measuring
9. Estimation
10. Test of Significance

1.2 CHARACTERISTICS OF STATISTICS:


The following are the characteristics of statistics
1. Statistics is a Quantitative Science
2. It never considers a single item
3. The values should be different
4. Inductive logic is applied
5. Statistical results are true on the average
6. Statistics is liable to be misused

1.3 SCOPE OF STATISTICS:


Statistics is not a mere device for collecting numerical data, but a means of
developing sound techniques for handling and analysing them and for drawing valid
inferences from them.
1) Statistics and Industry:
Statistics is widely used in many industries.
• In industries, control charts are widely used to maintain a certain quality level.
• In production engineering, statistical tools such as inspection plans and control
charts are of extreme importance in finding out whether a product conforms to
specifications.
2) Statistics and Commerce:
Statistics is the lifeblood of successful commerce. No businessman can afford either
to understock or to overstock his goods. At the outset he estimates the demand for
his goods and then adjusts his output or purchases accordingly. Thus statistics is
indispensable in business and commerce.
3) Statistics and Agriculture:
Analysis of variance (ANOVA), one of the statistical tools developed by Professor
R.A. Fisher, plays a prominent role in agricultural experiments. In tests of
significance based on small samples, it can be shown that statistics is adequate to
test whether the difference between two sample means is significant.
4) Statistics and Economics:
Nowadays statistics is used abundantly in every economic study. In both economic
theory and practice, statistical methods play an important role. Statistical data and
statistical techniques are also immensely useful in solving many economic problems
concerning wages, prices, production, the distribution of income and wealth, and
so on.

5) Statistics and Education:
Statistics is widely used in education. Research has become a common feature in all
branches of activity. Statistics is necessary for formulating policies on starting
new courses, for assessing the facilities available for new courses, etc. These are
possible only through statistics.
6) Statistics and Planning:
Statistics is indispensable in planning. In order to achieve the goals of a plan,
statistical data relating to production, consumption, demand, supply, prices,
investments, income, expenditure, etc. are needed. In India, statistics plays an
important role in planning and commissioning at both the central and state
government levels.
7) Statistics and Medicine:
In the medical sciences, statistical tools are widely used. To test the efficiency of
a new drug or medicine, the t-test is used; to compare the efficiency of two drugs or
two medicines, the t-test for two samples is used. More and more applications of
statistics are now being used in clinical investigation.
8) Statistics and Modern applications:
Recent developments in computer technology and information technology have made it
possible to integrate statistical models into these systems, and thus to make
statistics a part of the decision-making procedures of many organisations. Many
software packages are available for solving problems in design of experiments,
forecasting, simulation, etc.

1.4 LIMITATIONS OF STATISTICS:


Statistics with all its wide application in every sphere of human activity has its
own limitations. Some of them are given below.
1. Statistics is not suitable for the study of qualitative phenomena:
Since statistics is basically a science that deals with sets of numerical data, it is
applicable only to those subjects of enquiry which can be expressed in terms of
quantitative measurements. For example, the intelligence of a group of students can
be studied on the basis of their marks in a particular examination.
2. Statistics does not study individuals:
Statistics does not give any specific importance to individual items; it deals with
an aggregate of objects.
3. Statistical laws are not exact:
It is well known that mathematical and physical sciences are exact. But statistical
laws are not exact and statistical laws are only approximations.
4. Statistics may be misused:
Statistics must be used only by experts; otherwise, statistical methods are the most
dangerous tools in the hands of the inexpert. The use of statistical tools by
inexperienced and untrained persons might lead to wrong conclusions. Statistics can
be easily misused by quoting wrong figures.
5. Statistics is only one of the methods of studying a problem:
Statistical methods do not provide a complete solution to a problem, because
problems have to be studied taking the background of the country's culture,
philosophy or religion into consideration. Thus a statistical study should be
supplemented by other evidence.

1.5 COLLECTION OF DATA:


Based on the source, data are classified under two categories: primary data and
secondary data.
1.5.1 Primary data:
Data collected by actual observation or measurement are called primary data.
Methods of Collection of Primary Data:
The primary data can be collected by the following five methods.
1. Direct personal interviews.
2. Indirect oral interviews.
3. Information from correspondents.
4. Mailed questionnaire method.
5. Schedules sent through enumerators.

1. Direct personal interviews:


The persons from whom information is collected are known as informants.
The investigator personally meets them and asks questions to gather the necessary
information. It is a suitable method for intensive rather than extensive field
surveys.
2. Indirect Oral Interviews:
Under this method the investigator contacts witnesses or neighbours or friends or
some other third parties who are capable of supplying the necessary information. This
method is preferred if the required information concerns addiction, or the cause of
a fire, theft, murder, etc.
3. Information from correspondents:
The investigator appoints local agents or correspondents in different places and
compiles the information sent by them. Newspapers and some Government
departments receive information by this method. The advantage of this method is
that it is cheap and appropriate for extensive investigations.
4. Mailed questionnaire method:
Under this method a list of questions is prepared and is sent to all the informants
by post. The list of questions is technically called questionnaire. This method is
appropriate in those cases where the informants are literate and are spread over a
wide area.
5. Schedules sent through Enumerators:
Under this method enumerators or interviewers take the schedules, meet the
informants and fill in their replies. It is suitable for extensive surveys.

Merits and Demerits of primary data:


1. The collection of data by personal survey is possible only if the area covered
by the investigator is small. Collection of data by sending enumerators is bound to
be expensive.
2. Collection of primary data by framing schedules, or by distributing and
collecting questionnaires by post, is less expensive and can be completed in a
shorter time.
3. If the questions are embarrassing, complicated, or probe into the personal
affairs of individuals, the schedules may not be filled in with accurate and correct
information, and hence this method is unsuitable.
4. The information collected as primary data is more reliable than that obtained
from secondary data.

1.5.2 Secondary Data:
Data compiled from the records of others are called secondary data.
Sources of Secondary data:
The sources of secondary data can broadly be classified under two heads:
1. Published sources, and
2. Unpublished sources.

1. Published sources:
The various sources of published data are:
• Reports and official publications of
(i) International bodies such as UNO, UNESCO, ……
(ii) Central and State Governments.
• Semi-official publications of various local bodies.
• Private publications, such as the publications of
(i) Trade and professional bodies
(ii) Financial and economic journals
(iii) Annual reports of joint stock companies
(iv) Publications brought out by research agencies, research scholars, etc.
It should be noted that the publications mentioned above vary with regard to the
periodicity of publication.

2. Unpublished Sources
Not all statistical material is published. There are various sources of
unpublished data such as records maintained by various Government and private
offices, studies made by research institutions, scholars, etc. Such sources can also be
used where necessary.
Precautions in the use of Secondary data
The following are some of the points that are to be considered in the use of secondary
data
1. How the data has been collected and processed
2. The accuracy of the data
3. How far the data has been summarized
4. How comparable the data is with other tabulations
5. How to interpret the data, especially when figures collected for one purpose are
used for another
Merits and Demerits of Secondary Data:
1. Secondary data is cheap to obtain. Many government publications are
relatively cheap and libraries stock quantities of secondary data produced by the
government, by companies and other organisations.
2. Large quantities of secondary data can be obtained through the internet.
3. Much of the secondary data available has been collected for many years and
therefore it can be used to plot trends.

1.6 CLASSIFICATION OF DATA


The process of grouping into different classes or sub classes according to some
characteristics is known as Classification. Tabulation is concerned with the systematic
arrangement and presentation of classified data. Thus classification is the first step in
tabulation.

For example, letters in the post office are classified according to their
destinations, viz. Delhi, Madurai, Bangalore, Mumbai, etc.
1.6.1 Objects of classification of data:
• Classification separates similar things from dissimilar things and thereby
brings out the salient features.
• This enables comparison of one class of data with another.
• This helps in studying the relationship between several characteristics.
• This is a means to access the data properly.
• Important features can be seen at a glance.
1.6.2 Types of classification:
Statistical data are classified in respect of their characteristics. Broadly there are
four basic types of classification namely
A) Chronological classification
B) Geographical classification
C) Qualitative classification
D) Quantitative classification

A) Chronological classification:
In chronological classification the collected data are arranged according to the
order of time, expressed in years, months, weeks, etc. The data are generally
classified in ascending order of time. For example, data relating to population,
sales of a firm, and imports and exports of a country are always subjected to
chronological classification.

The estimates of birth rates in India during 1970 – 76 are:

Year         1970   1971   1972   1973   1974   1975   1976
Birth Rate   36.8   36.9   36.6   34.6   34.5   35.2   34.2

B) Geographical classification:
In this type of classification the data are classified according to geographical
region or place, for instance the production of paddy in different states of India
or the production of wheat in different countries.

Country          America   China   Denmark   France   India
Yield of wheat   1925      893     225       439      862

C) Qualitative classification:
In this type of classification data are classified on the basis of some attribute or
quality such as sex, literacy, religion, employment, etc. Such attributes cannot be
measured on a scale.
For example, if the population is to be classified in respect of one attribute, say
sex, then we can classify it into two classes, namely males and females. Similarly,
it can also be classified into ‘employed’ or ‘unemployed’ on the basis of another
attribute, ‘employment’. This type of classification is called simple or dichotomous
classification.

A simple classification may be shown as under:

            Population
           /          \
        Male          Female

A classification in which two or more attributes are considered and several classes
are formed is called a manifold classification. The classification may be extended
still further by considering other attributes like marital status etc. This can be
explained by the following chart:

                     Population
                   /            \
               Male              Female
             /      \          /        \
      Employed  Unemployed  Employed  Unemployed

D) Quantitative classification:
Quantitative classification refers to the classification of data according to some
characteristic that can be measured, such as height, weight, etc. For example, the
students of a college may be classified according to weight as given below.

Weight (in lbs)   No. of Students
100 – 110         200
110 – 120         260
120 – 130         360
130 – 140         90
140 – 150         40
Total             950
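
The grouping step behind such a table can be sketched in a few lines of Python. This is only an illustration: the individual weights below are hypothetical (the notes give only the class totals), and the sketch assumes the exclusive convention in which a value equal to an upper limit, such as 110, is counted in the next class (110 – 120).

```python
from collections import Counter

# Hypothetical weights in lbs; the notes list only class frequencies.
weights = [104, 118, 121, 125, 133, 109, 147, 120, 110]

def class_of(weight, start=100, width=10):
    """Return the (lower, upper) class interval containing the weight.

    Assumes exclusive upper limits: 110 falls in 110-120, not 100-110.
    """
    lower = start + ((weight - start) // width) * width
    return (lower, lower + width)

# Count how many observations fall in each class interval.
frequency_table = Counter(class_of(w) for w in weights)

for (lower, upper), count in sorted(frequency_table.items()):
    print(f"{lower} - {upper}: {count}")
```

The same counting idea applies to any measurable characteristic once the starting point and the class width are fixed.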

1.7 TABULATION OF DATA


Tabulation is the process of summarizing classified or grouped data in the
form of a table so that it is easily understood and an investigator is quickly able to
locate the desired information.
A table is a systematic arrangement of classified data in columns and rows.

1.7.1 Advantages of tabulation:

• Large and complex data can be presented in a neat and compact form.
• Nature of the data can be easily understood.
• A table is the convenient form for diagrammatic representation of data.
• It is a permanent record and enables ready reference.
• It facilitates comparison of related facts.
• It facilitates computation of various statistical measures like averages,
dispersion, correlation etc.
• It presents facts in minimum possible space and unnecessary repetitions and
explanations are avoided.
• Tabulated data are good for references and they make it easier to present the
information in the form of graphs and diagrams.

1.7.2 Preparing a Table:


The making of a compact table is itself an art. The table should contain all the
information needed within the smallest possible space. The purpose of the tabulation
and how the tabulated information is to be used are the main points to be kept in
mind while preparing a statistical table.
An ideal table should consist of the following main parts:
1. Table number
2. Title of the table
3. Captions or column headings
4. Stubs or row designation
5. Body of the table
6. Footnotes
7. Sources of data
1.7.3 Requirements of a Good Table:
While preparing a table, the following general points should be kept in mind:
1. A table should be formed in keeping with the objects of statistical enquiry.
2. A table should be carefully prepared so that it is easily understandable.
3. A table should be formed so as to suit the size of the paper. But such an adjustment
should not be at the cost of legibility.
4. If the figures in the table are large, they should be suitably rounded or
approximated. The method of approximation and units of measurements too should be
specified.
5. Rows and columns in a table should be numbered, and certain figures to be stressed
may be put in a ‘box’ or ‘circle’ or in bold letters.
6. The arrangements of rows and columns should be in a logical and systematic order.
This arrangement may be alphabetical, chronological or according to size.
7. The rows and columns are separated by single, double or thick lines to represent
various classes and sub-classes used.
8. The averages or totals of different rows should be given at the right of the table and
that of columns at the bottom of the table. Totals for every sub-class too should be
mentioned.
9. In case it is not possible to accommodate all the information in a single table, it is
better to have two or more related tables.
1.7.4 Types of Tables:
Tables can be classified according to their purpose, stage of enquiry, nature of data or
number of characteristics used. On the basis of the number of characteristics, tables
may be classified as follows:
1. Simple or one-way table 2. Two way table 3. Manifold table.

1.8 DIFFERENCE BETWEEN CLASSIFICATION AND TABULATION:

Classification:
• Classification is the process of arranging data into groups according to the
common characteristics possessed by individual items.
• The purpose is to analyse data.
• Classification of data is done after the collection process is completed.
• It is based on similar attributes and variables of the observations.
• It is performed with the objective of analysing data in order to draw inferences.
• In classification, data is divided into categories and sub-categories.
• This enables comparison of one class of data with another.
• Important features can be seen at a glance.

Tabulation:
• Tabulation is a process of arranging data systematically in rows and columns of
a table.
• The purpose is to present data.
• Tabulation follows classification.
• It is arranged in rows and columns in a systematic way.
• Tabulation aims at presenting data, to ensure easy comparison of various figures.
• In tabulation, data is divided into headings and sub-headings.
• The data are so placed in a table that proper comparison is possible and easier.
• A table facilitates further analysis of data.

1.9 DIAGRAMMATIC AND GRAPHICAL REPRESENTATION


One of the most convincing and appealing ways in which statistical results
may be presented is through diagrams and graphs.

1.9.1 Diagrams:
A diagram is a visual form for the presentation of statistical data, highlighting
their basic facts and relationships. It is readily intelligible and saves a
considerable amount of time and energy.
Significance of Diagrams and Graphs:
Diagrams and graphs are extremely useful because of the following reasons.
1. They are attractive and impressive.
2. They make data simple and intelligible.
3. They make comparison possible
4. They save time and labour.
5. They have universal utility.
6. They give more information.
7. They have a great memorizing effect.
General rules for constructing diagrams:
The diagrammatic presentation of statistical facts will be advantageous provided
the following rules are observed in drawing diagrams.
1. A diagram should be neatly drawn and attractive.
2. The measurements of geometrical figures used in diagram should be accurate
and proportional.
3. The size of the diagrams should match the size of the paper.
4. Every diagram must have a suitable but short heading.

5. The scale should be mentioned in the diagram.
6. Diagrams should be neatly as well as accurately drawn with the help of drawing
instruments.
7. Index must be given for identification so that the reader can easily make out the
meaning of the diagram.
8. Footnote must be given at the bottom of the diagram.
Types of diagrams:
In practice, a very large variety of diagrams are in use and new ones are
constantly being added. For the sake of convenience and simplicity, they may be
divided under the following heads:
1. One-dimensional diagrams
2. Two-dimensional diagrams
3. Three-dimensional diagrams
4. Pictograms and Cartograms
1) One-dimensional diagrams:
In such diagrams, only one-dimensional measurement, i.e. height, is used and the
width is not considered. These diagrams are in the form of bar or line charts and can
be classified as
1. Line Diagram
2. Simple Bar Diagram
3. Multiple Bar Diagram
4. Sub-divided Bar Diagram
5. Percentage Bar Diagram
Line Diagram:
A line diagram is used in cases where there are many items to be shown and there is
not much difference in their values. Such a diagram is prepared by drawing a vertical
line for each item according to the scale. A line diagram makes comparison easy, but
it is less attractive.
Simple Bar Diagram:
A simple bar diagram can be drawn on either a horizontal or a vertical base, but bars
on a horizontal base are more common. Bars must be of uniform width and the
intervening space between bars must be equal. While constructing a simple bar
diagram, the scale is determined on the basis of the highest value in the series.
Multiple Bar Diagram:
Multiple bar diagram is used for comparing two or more sets of statistical data.
Bars are constructed side by side to represent the set of values for comparison. In
order to distinguish bars, they may be either differently coloured or there should be
different types of crossings or dotting, etc. An index is also prepared to identify the
meaning of different colours or dottings.

2) Two-dimensional Diagrams:
In one-dimensional diagrams, only length is taken into account. But in
two-dimensional diagrams the area represents the data, and so both the length and the
breadth have to be taken into account. Such diagrams are also called area diagrams or
surface diagrams. The important types of area diagrams are:
1. Rectangles 2. Squares 3. Pie-diagrams

Rectangles:
Rectangles are used to represent the relative magnitude of two or more values.
The areas of the rectangles are kept in proportion to the values. Rectangles are
placed side by side for comparison. When two sets of figures are to be represented by
rectangles, either of two methods may be adopted.
Squares:
The rectangular method of diagrammatic presentation is difficult to use where the
values of the items vary widely. In such cases squares can be used instead, and the
method of drawing a square diagram is very simple. One takes the square root of the
value of each item to be shown in the diagram and then selects a suitable scale to
draw the squares.
Pie Diagram or Circular Diagram:
Another way of preparing a two-dimensional diagram is in the form of circles. In
such diagrams, both the total and the component parts or sectors can be shown. The
area of a circle is proportional to the square of its radius.
3) Three-dimensional diagrams:
Three-dimensional diagrams, also known as volume diagrams, consist of cubes,
cylinders, spheres, etc. In such diagrams three things, namely length, width and
height, have to be taken into account. Of all these figures, the cube is the easiest
to make. The side of a cube is drawn in proportion to the cube root of the magnitude
of the data.
4) Pictograms and Cartograms:
Pictograms are not abstract presentations such as lines or bars, but actually depict
the kind of data we are dealing with. When pictograms are used, data are represented
through a pictorial symbol that is carefully selected. Cartograms, or statistical
maps, are used to give quantitative information on a geographical basis.

1.9.2 GRAPHS
Definition: A graph is a visual form of presentation of statistical data. A graph is
more attractive than a table of figures.
The types of graphs are
1. Histogram 2. Frequency Polygon 3. Frequency Curve 4. Ogives 5. Lorenz Curve

1) Histogram:
A histogram is a bar chart or graph showing the frequency of occurrence of each
value of the variable being analysed. In histogram, data are plotted as a series of
rectangles. Class intervals are shown on the ‘X-axis’ and the frequencies on the ‘Y-
axis’ . The height of each rectangle represents the frequency of the class interval.
2) Frequency Polygon:
If we mark the midpoints of the top horizontal sides of the rectangles in a
histogram and join them by straight lines, the figure so formed is called a Frequency
Polygon. The area of the polygon is equal to the area of the histogram, because the
area left outside is just equal to the area included in it.
3) Frequency Curve:
If the middle points of the upper boundaries of the rectangles of a histogram are
connected by a smooth freehand curve, then that diagram is called a frequency curve.
The curve should begin and end at the base line.
4) Ogives:
For a set of observations, we know how to construct a frequency distribution. If
the frequencies are added up successively from the first class onwards, the
accumulated frequency for each class is called its cumulative frequency. A table
listing these cumulative frequencies is called a cumulative frequency table. The
curve obtained by plotting the cumulative frequencies is called a cumulative
frequency curve or an ogive.
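
As a rough sketch of how the points of an ogive are obtained, the Python snippet below accumulates the frequencies of the marks distribution used elsewhere in these notes (20-30, ..., 70-80 with frequencies 5, 8, 12, 15, 6, 4). Pairing each cumulative frequency with the upper boundary of its class gives a "less than" ogive; that pairing is an assumption of this sketch, since the notes do not say which boundary to plot against.

```python
from itertools import accumulate

# Marks distribution taken from the mean example in these notes.
classes = [(20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
frequencies = [5, 8, 12, 15, 6, 4]

# Running total of the frequencies = cumulative frequencies.
cumulative = list(accumulate(frequencies))

# Points of a "less than" ogive: (upper class boundary, cumulative frequency).
ogive_points = [(upper, cf) for (_, upper), cf in zip(classes, cumulative)]

print(cumulative)
print(ogive_points)
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the table.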

5) Lorenz Curve:
The Lorenz curve is a graphical method of studying dispersion. It is also used to
study the variability in the distribution of profits, wages, revenue, etc. It is
specially used to study the degree of inequality in the distribution of income and
wealth between countries or between different periods. The cumulative percentages of
one variable are plotted against the cumulative percentages of the other variable,
and the curve through these points is the Lorenz curve.
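
A minimal sketch of the cumulative-percentage step is given below. The five incomes are hypothetical and chosen only to keep the arithmetic easy to follow; each income is treated as belonging to one person, which is an assumption of this example rather than something stated in the notes.

```python
from itertools import accumulate

# Hypothetical incomes of five persons, arranged in ascending order.
incomes = sorted([40, 10, 100, 20, 30])
total_income = sum(incomes)
n = len(incomes)

# Cumulative percentage of persons and of income.
cum_persons_pct = [100 * (i + 1) / n for i in range(n)]
cum_income_pct = [100 * c / total_income for c in accumulate(incomes)]

# Points of the Lorenz curve, starting from the origin.
lorenz_points = [(0.0, 0.0)] + list(zip(cum_persons_pct, cum_income_pct))

for persons, income in lorenz_points:
    print(f"{persons:5.1f}% of persons hold {income:5.1f}% of income")
```

The further these points fall below the straight line joining (0, 0) and (100, 100), the greater the inequality in the distribution.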

1.9.3 DIFFERENCE BETWEEN DIAGRAMS AND GRAPHS

Diagrams
We are well aware of the use of diagrams to explain information and facts that
are presented in the form of text. If you need to explain the parts of a machine or
the principle of its working, it is difficult to convey the concept through text
alone. This is where diagrams in the form of sketches come into play. Similarly,
diagrams are used heavily in biology, where students have to learn about different
body parts and their functions. Concepts presented visually through diagrams are
more likely to be retained in the memory of students than concepts presented as
text. Diagrams are used right from the time a child enters school, as even the
letters of the alphabet are presented in a more interesting and attractive manner
with the help of diagrams.

Graphs
Whenever there are two variables in a set of information, it is better to present the
information using graphs, as this makes the data easier to understand. For example,
to show how the prices of commodities have changed with respect to time, a simple
line graph is more effective and interesting than text, which is hard to remember,
whereas even a layman can see from a graph how prices have gone up or down over time.

*******************

UNIT-II

MEASURES OF CENTRAL TENDENCY

2.1 INTRODUCTION:
In this chapter we deal with measures of central tendency and measures of
dispersion. The measures of central tendency concentrate on the values in the central
part of the distribution. Plainly speaking, an average of a statistical series is the
value of the variable which is representative of the entire distribution. The average
alone does not give a complete idea of the distribution, so to complete the picture
we use measures of dispersion.

2.2 MEASURES OF CENTRAL TENDENCY:


According to Professor Bowley, the measures of central tendency are
“statistical constants which enable us to comprehend in a single effort the
significance of the whole.”

The following three measures of central tendency are dealt with in this chapter:
• Arithmetic Mean or simply Mean
• Median
• Mode

2.2.1 ARITHMETIC MEAN OR SIMPLY MEAN :


Arithmetic Mean, or simply Mean, is the total of the values of the items divided by
the number of items. It is usually denoted by X̄.

Individual series:
X̄ = ΣX / N

Example:
The expenditure of ten families is given below. Calculate the arithmetic mean.
30, 70, 10, 75, 500, 8, 42, 250, 40, 36.
Solution: Here N = 10
ΣX = 30 + 70 + 10 + 75 + 500 + 8 + 42 + 250 + 40 + 36 = 1061

X̄ = 1061 / 10 = 106.1
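
The same calculation can be checked with a short Python sketch. The function name `arithmetic_mean` is chosen here for illustration only; the data are the ten expenditures from the example.

```python
# Expenditure of the ten families from the example.
expenditure = [30, 70, 10, 75, 500, 8, 42, 250, 40, 36]

def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

total = sum(expenditure)             # sum of X
mean = arithmetic_mean(expenditure)  # sum of X divided by N

print(total, mean)
```

Note how the two large values (500 and 250) pull the mean well above most of the individual expenditures.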

Discrete series:

X̄ = ΣfX / Σf
Example:
Calculate the mean number of persons per house.
No. of persons : 2 3 4 5 6
No. of houses : 10 25 30 25 10

Solution:

X    f     fX
2    10    20
3    25    75
4    30    120
5    25    125
6    10    60
Σf = 100, ΣfX = 400

X̄ = 400 / 100 = 4.
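
For a discrete series the only change is that each value is weighted by its frequency. The sketch below repeats the persons-per-house example; `weighted_mean` is an illustrative name, not a library function.

```python
# Persons per house (X) and number of houses (f) from the example.
persons = [2, 3, 4, 5, 6]
houses = [10, 25, 30, 25, 10]

def weighted_mean(values, frequencies):
    """Return sum(f * X) / sum(f) for a discrete frequency distribution."""
    sum_f = sum(frequencies)
    sum_fx = sum(f * x for x, f in zip(values, frequencies))
    return sum_fx / sum_f

sum_fx = sum(f * x for x, f in zip(persons, houses))
mean_persons = weighted_mean(persons, houses)

print(sum(houses), sum_fx, mean_persons)
```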

Continuous series:
X̄ = Σfm / Σf
where m represents the mid value.
Mid value = (UL + LL) / 2, where UL is the upper limit and LL is the lower limit of
the class.

Example:
Calculate the mean for the following.
Marks : 20-30 30-40 40-50 50-60 60-70 70-80
No. of students : 5 8 12 15 6 4
Solution:

C.I      f     m     fm
20-30    5     25    125
30-40    8     35    280
40-50    12    45    540
50-60    15    55    825
60-70    6     65    390
70-80    4     75    300
Σf = 50, Σfm = 2460

X̄ = 2460 / 50 = 49.2.
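
In a continuous series each class is represented by its mid value before the weighted mean is taken. A small sketch using the marks example (variable names are illustrative):

```python
# Class intervals (lower limit, upper limit) and frequencies from the example.
classes = [(20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
frequencies = [5, 8, 12, 15, 6, 4]

# Mid value of each class: (upper limit + lower limit) / 2.
mid_values = [(lower + upper) / 2 for lower, upper in classes]

sum_f = sum(frequencies)
sum_fm = sum(f * m for f, m in zip(frequencies, mid_values))
grouped_mean = sum_fm / sum_f

print(mid_values)
print(sum_f, sum_fm, grouped_mean)
```

Because the mid values stand in for the unknown individual marks, the result is an approximation to the mean of the original observations.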

2.2.2 MEDIAN:
The median is the value of the middle-most item when all the items are arranged in
order of magnitude. It is denoted by M or Me.

Individual series:
• For an odd number of items, position of the median = (N+1) / 2 th item.
• For an even number of items, median = [(N/2) th item + ((N/2)+1) th item] / 2.

Example:
Calculate the median for the following: 22, 10, 6, 7, 12, 8, 5.
Solution:
Here N = 7
Arrange in ascending order or descending order.
5, 6, 7, 8, 10, 12, 22
(N+1) / 2 = (7+1) / 2 = 4th item = 8
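
The rule for individual series can be written as a short function. This is a sketch with an illustrative name (`median_individual`); for an even number of items it averages the two middle items.

```python
def median_individual(values):
    """Median of an individual series: the middle item of the ordered data,
    or the average of the two middle items when their number is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]            # the (N+1)/2 th item
    return (ordered[middle - 1] + ordered[middle]) / 2

# Data from the example.
data = [22, 10, 6, 7, 12, 8, 5]
median_value = median_individual(data)

print(sorted(data), median_value)
```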
Discrete series:
Position of the median = (N+1) / 2 th item.

Example:
Find the median for the following.
X : 10 15 17 18 21
f : 4 16 12 5 3
Solution:
X     f     c.f
10    4     4
15    16    20
17    12    32
18    5     37
21    3     40
N = 40

(N+1) / 2 = (40+1) / 2 = 20.5th item
= (20th item + 21st item) / 2
= (15 + 17) / 2
= 16.
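
The cumulative frequencies tell us which value the 20th and 21st items take. The sketch below uses a hypothetical helper, `item_at`, to look up the value at a given position and follows the averaging used in the worked example.

```python
from bisect import bisect_left
from itertools import accumulate

# Values and frequencies from the example.
values = [10, 15, 17, 18, 21]
frequencies = [4, 16, 12, 5, 3]
cumulative = list(accumulate(frequencies))   # c.f column
n = cumulative[-1]

def item_at(position):
    """Value of the item at a 1-based position in the ordered series:
    the first value whose cumulative frequency reaches that position."""
    return values[bisect_left(cumulative, position)]

if n % 2 == 1:
    discrete_median = item_at((n + 1) // 2)
else:
    # (N+1)/2 is fractional, so average the two neighbouring items.
    discrete_median = (item_at(n // 2) + item_at(n // 2 + 1)) / 2

print(cumulative, item_at(20), item_at(21), discrete_median)
```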

Continuous series:
M = L + [((N/2) – c.f) × i] / f

where L = lower boundary of the median class, f = frequency of the median class,
i = size of the class interval, c.f = cumulative frequency of the class preceding
the median class.

Example:
Calculate the median height given below.
Height : 145-150 150-155 155-160 160-165 165-170 170-175
No. of students : 2 5 10 8 4 1

Solution:
Height     No. of students   c.f
145-150    2                 2
150-155    5                 7
155-160    10                17
160-165    8                 25
165-170    4                 29
170-175    1                 30
Σf = 30

Position of the median = N/2 th item = 30 / 2 = 15.
The 15th item lies in the class 155-160, so L = 155, c.f = 7, f = 10 and i = 5.
M = 155 + [(15 – 7) × 5] / 10
= 155 + (40/10) = 159.
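
The steps of this solution (find N/2, locate the median class from the cumulative frequencies, then interpolate) can be sketched as follows. The function name `grouped_median` is illustrative.

```python
def grouped_median(classes, frequencies):
    """Median of a continuous series: L + ((N/2 - c.f) * i) / f, where c.f is
    the cumulative frequency of the class before the median class."""
    half = sum(frequencies) / 2
    cf_before = 0
    for (lower, upper), f in zip(classes, frequencies):
        if cf_before + f >= half:
            # This is the median class: interpolate inside it.
            return lower + (half - cf_before) * (upper - lower) / f
        cf_before += f
    raise ValueError("empty frequency distribution")

# Heights and number of students from the example.
height_classes = [(145, 150), (150, 155), (155, 160),
                  (160, 165), (165, 170), (170, 175)]
students = [2, 5, 10, 8, 4, 1]

median_height = grouped_median(height_classes, students)
print(median_height)
```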

2.2.3 Mode :
Mode is the value which has the greatest frequency density. Mode is usually
denoted by Z.

Individual series:
The value which occurs the greatest number of times is the mode.

Example: Determine the mode of 32, 35, 42, 32, 42, 32.

Solution: 32 occurs three times, more often than any other value, so the series is
unimodal with Mode = 32.

Discrete series: Determine the mode

Size of dress   No. of sets
18              55
20              120
22              108
24              45

Here the mode is the value with the highest frequency.
Mode = 20
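
Both cases amount to picking the value with the highest frequency, which is easy to check in Python. The sketch uses `collections.Counter` from the standard library; the variable names are illustrative.

```python
from collections import Counter

# Individual series from the notes: the mode is the most frequent value.
series = [32, 35, 42, 32, 42, 32]
mode_individual, occurrences = Counter(series).most_common(1)[0]

# Discrete series from the notes: size of dress -> number of sets.
sets_by_size = {18: 55, 20: 120, 22: 108, 24: 45}
mode_discrete = max(sets_by_size, key=sets_by_size.get)

print(mode_individual, occurrences, mode_discrete)
```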
Continuous series:
Z = L + [i (f1 – f0) / (2f1 – f0 – f2)]

where L = lower boundary of the modal class, f1 = frequency of the modal class,
f0 = frequency of the class preceding the modal class, f2 = frequency of the class
succeeding the modal class, i = size of the class interval.

Example: Determine the mode

Marks    No. of students
0-10     5
10-20    20
20-30    35
30-40    20
40-50    12

Solution:
The modal class is 20-30, the class with the highest frequency, so L = 20, f1 = 35,
f0 = 20, f2 = 20 and i = 10.

Z = L + [i (f1 – f0) / (2f1 – f0 – f2)]
= 20 + [10 (35 – 20) / (2(35) – 20 – 20)] = 20 + 5
= 25.
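
A sketch of the same formula in Python is given below. Treating a missing neighbouring class as having frequency 0 (when the modal class is the first or last class) is an assumption made here for completeness; the notes do not cover that case.

```python
def grouped_mode(classes, frequencies):
    """Mode of a continuous series: L + i * (f1 - f0) / (2*f1 - f0 - f2)."""
    k = max(range(len(frequencies)), key=frequencies.__getitem__)  # modal class
    f1 = frequencies[k]
    # Assumption: a missing neighbouring class counts as frequency 0.
    f0 = frequencies[k - 1] if k > 0 else 0
    f2 = frequencies[k + 1] if k < len(frequencies) - 1 else 0
    lower, upper = classes[k]
    return lower + (upper - lower) * (f1 - f0) / (2 * f1 - f0 - f2)

# Marks and number of students from the example.
mark_classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
students = [5, 20, 35, 20, 12]

mode_marks = grouped_mode(mark_classes, students)
print(mode_marks)
```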

Empirical Relation :
Mode= 3 median -2 mean.

2.3 MEASURE OF DISPERSION :


Measures of dispersion deal mainly with the following four measures:
• Range
• Standard deviation
• Quartile deviation
• Coefficient of variation

2.3.1 Range:
Range is the difference between the greatest and the smallest value.

• Range = L – S , where L - largest value & S - smallest value
• Coefficient of range = (L-S) / (L+S)

Individual series :

Example : Find the value of range and its coefficient of range for the following data.
8 ,10, 5, 9,12,11
Solution:
Range = L – S = 12- 5 =7
coefficient of range = (L-S) / (L+S) = (12-5) / (12+5) = 7 /17 = 0.4118

Continuous series: Range = L – S , where L - mid-value of the highest class and
S - mid-value of the lowest class.

Example : Calculate the range.

Marks : 20-30 30-40 40-50 50-60 60-70 70-80
No. of students : 5 8 12 15 6 4
Solution:

C.I       f      m
20-30     5      25
30-40     8      35
40-50     12     45
50-60     15     55
60-70     6      65
70-80     4      75

Here L = 75 & S = 25

Range = L – S = 75-25 = 50

2.3.2 Quartile Deviation:


Quartile Deviation is half of the difference between the third and the first
quartiles. Hence it is called Semi Inter Quartile Range.

Quartile Deviation (Q.D.) = (Q3 – Q1) / 2

Coefficient of Quartile Deviation:

A relative measure of dispersion based on the quartile deviation is called the
coefficient of quartile deviation. It is defined as

Coefficient of quartile deviation = (Q3 – Q1) / (Q3 + Q1)
Example:
The wheat production (in Kg) of 20 acres is given as: 1120, 1240, 1320, 1040,
1080, 1200, 1440, 1360, 1680, 1730, 1785, 1342, 1960, 1880, 1755, 1720, 1600,
1470, 1750, and 1885. Find the quartile deviation and coefficient of quartile deviation.
Solution:
After arranging the observations in ascending order, we get
1040, 1080, 1120, 1200, 1240, 1320, 1342, 1360, 1440, 1470, 1600, 1680, 1720,
1730, 1750, 1755, 1785, 1880, 1885, 1960.

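Here N = 20.
Q1 = value of the (N+1)/4 th item = value of the 5.25th item
= 1240 + 0.25 (1320 – 1240) = 1260
Q3 = value of the 3(N+1)/4 th item = value of the 15.75th item
= 1750 + 0.75 (1755 – 1750) = 1753.75

Quartile Deviation (Q.D.) = (Q3 – Q1) / 2 = (1753.75 – 1260) / 2 = 246.875

Coefficient of Quartile Deviation = (Q3 – Q1) / (Q3 + Q1) = 493.75 / 3013.75 = 0.164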
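The quartiles for the wheat data can be worked out with a small Python sketch. It assumes the quartile positions are taken as the (N+1)/4 th and 3(N+1)/4 th items with linear interpolation between neighbouring items, in line with the (N+1)/2 rule used for the median earlier; the function name `quartile` is illustrative.

```python
def quartile(sorted_values, q):
    """Value at position q*(N+1)/4 (1-based), interpolating between neighbours.

    Assumes the position falls inside the data, as it does for this example.
    """
    pos = q * (len(sorted_values) + 1) / 4
    whole = int(pos)
    frac = pos - whole
    below = sorted_values[whole - 1]
    if frac == 0:
        return below
    return below + frac * (sorted_values[whole] - below)


wheat = sorted([1120, 1240, 1320, 1040, 1080, 1200, 1440, 1360, 1680, 1730,
                1785, 1342, 1960, 1880, 1755, 1720, 1600, 1470, 1750, 1885])
q1 = quartile(wheat, 1)
q3 = quartile(wheat, 3)
qd = (q3 - q1) / 2
coeff_qd = (q3 - q1) / (q3 + q1)
print(q1, q3, qd, round(coeff_qd, 3))
```

Other interpolation conventions for quartiles exist and give slightly different values, so the figures depend on the rule assumed here.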

2.3.3 Standard deviation:


The standard deviation is the root mean square deviation of the values from
the arithmetic mean. It is the positive square root of the variance. This is usually
denoted by σ.
Individual series :

σ = √[ (Σx² / N) – (Σx / N)² ]

Example: Calculate standard deviation for the following data.

40,41,45,49,50,51,55,59,60,60.
Solution:
X      X²
40     1600
41     1681
45     2025
49     2401
50     2500
51     2601
55     3025
59     3481
60     3600
60     3600
Σx = 510     Σx² = 26514

σ = √[ (Σx² / N) – (Σx / N)² ] = √[ (26514/10) – (510/10)² ] = √50.4 = 7.10

Discrete series:

σ = √[ (Σfx² / Σf) – (Σfx / Σf)² ]

Example: Calculate standard deviation for the following data.

X: 0 1 2 3 4 5
F: 1 2 4 3 0 2

Solution:
X      f      fx     x²     fx²
0      1      0      0      0
1      2      2      1      2
2      4      8      4      16
3      3      9      9      27
4      0      0      16     0
5      2      10     25     50
Σf = 12     Σfx = 29     Σfx² = 95

σ = √[ (Σfx² / Σf) – (Σfx / Σf)² ] = √[ (95/12) – (29/12)² ] = 1.44

Continuous series :

σ = √[ (Σfm² / Σf) – (Σfm / Σf)² ] , where m is the mid-value of the class.

Example: Calculate standard deviation for the following data.
C.I : 0-10 10-20 20-30 30-40 40-50
F : 2 5 9 3 1
Solution:

C.I       f      m      fm      m²      fm²
0-10      2      5      10      25      50
10-20     5      15     75      225     1125
20-30     9      25     225     625     5625
30-40     3      35     105     1225    3675
40-50     1      45     45      2025    2025
Total     20            460             12500

σ = √[ (Σfm² / Σf) – (Σfm / Σf)² ] = √[ (12500/20) – (460/20)² ] = √96 = 9.80

Coefficient of variation:

Coefficient of variation = [standard deviation / arithmetic mean ] x100

Example : Calculate the coefficient of variation .


Mean= 51, standard deviation = 7.09
Solution:
Coefficient of variation = [standard deviation / arithmetic mean] x100
= (7.09 / 51) x 100 = 13.9

*******************

UNIT-III
CORRELATION ANALYSIS

Simple Linear Correlation:


The term Correlation refers to the relationship between the variables. Simple
correlation refers to the relationship between two variables. Various types of
correlation are considered.

Positive or negative : When the values of two variables change in the same direction,
there is positive correlation between the two variables. When they change in opposite
directions, there is negative correlation between them.

Example : X 50 60 70 95 100 105

Y 23 32 37 41 46 50

Example : X 34 25 18 10 7

Y 51 49 42 33 19

Simple or partial or Multiple :
When only two variables are considered, as under positive or negative
correlation above, the correlation between them is called simple correlation. When
more than two variables are considered, the correlation between two of them when all
other variables are held constant, i.e., when the linear effects of all other variables on
them are removed, is called partial correlation. When more than two variables are
considered, the correlation between one of them and its estimate based on the group
consisting of the other variables is called multiple correlation.
Methods :
The following four methods are available under simple linear correlation and
among them, the product moment method is the best one.

• Scatter Diagram
• Karl Pearson’s correlation coefficient or product moment correlation
coefficient (r)
• Spearman’s rank correlation coefficient (ρ)
• Correlation coefficient by concurrent deviation method (rc).

Scatter Diagram :

Scatter diagram is a graphic picture of the sample data. Suppose a random
sample of n pairs of observations has the values
(X1, Y1), (X2, Y2), …, (Xn, Yn). These points are plotted on a
rectangular co-ordinate system taking independent variable on X-axis and the
dependent variable on Y-axis. Whatever be the name of the independent variable, it is
to be taken on X-axis. Suppose the plotted points are as shown in figure (a). Such a
diagram is called scatter diagram. In this figure, we see that when X has a small value,
Y is also small and when X takes a large value, Y also takes a large value. This is
called direct or positive relationship between X and Y. The plotted points cluster
around a straight line. It appears that if a straight line is drawn passing through the
points, the line will be a good approximation for representing the original data.
Suppose we draw a line AB to represent the scattered points. The line AB rises from
left to the right and has positive slope. This line can be used to establish an
approximate relation between the random variable Y and the independent variable X.
It is a non-mathematical method in the sense that different persons may draw different
lines. Such a line is called the regression line obtained by inspection or judgment.

Making a scatter diagram and drawing a line or curve is the primary
investigation to assess the type of relationship between the variables. The knowledge
gained from the scatter diagram can be used for further analysis of the data. In most of
the cases the diagrams are not as simple as in figure (a). There are quite complicated
diagrams and it is difficult to choose a proper mathematical model for representing
the original data. The scatter diagram gives an indication of the appropriate model
which should be used for further analysis with the help of method of least squares.
Figure (b) shows that the points in the scatter diagram are falling from the top left
corner to the right. This is a relation called inverse or indirect. The points are in the
neighborhood of a certain line called the regression line.
As long as the scattered points show closeness to a straight line of some
direction, we draw a straight line to represent the sample data. But when the points do
not lie around a straight line, we do not draw the regression line. Figure (c) shows that
the plotted points have a tendency to fall from left to right in the form of a curve. This
is a relation called non-linear or curvilinear. Figure (d) shows the points which
apparently do not follow any pattern. If X takes a small value, Y may take a small or
large value. There seems to be no sympathy between X and Y. Such a diagram
suggests that there is no relationship between the two variables.

Karl Pearson’s Coefficient :


Karl Pearson’s Product-Moment Correlation Coefficient or simply Pearson’s
Correlation Coefficient for short, is one of the important methods used in Statistics to
measure Correlation between two variables.
A few words about Karl Pearson. Karl Pearson was a British mathematician,
statistician, lawyer and eugenicist. He established the discipline of mathematical
statistics. He founded the world’s first statistics department at the University of
London in the year 1911. He, along with his colleagues Weldon and Galton, founded
the journal “Biometrika”, whose object was the development of statistical theory.
The correlation between two variables X and Y, measured using Pearson’s
Coefficient, takes values between +1 and -1. When measured in a population,
Pearson’s Coefficient is denoted by the Greek letter rho (ρ). When
studying a sample, it is denoted by the letter r. It is therefore sometimes called
Pearson’s r. Pearson’s coefficient reflects the linear relationship between two
variables. If the correlation coefficient is +1 then there is a
perfect positive linear relationship between the variables, and if it is -1 then there is a
perfect negative linear relationship between the variables. A value of 0 denotes that
there is no linear relationship between the two variables.
The values -1, +1 and 0 are theoretical results and are not generally found in normal
circumstances. The result can never be less than -1 or more than +1. These are the
lower and the upper limits.
Pearson’s Coefficient computational formula:
r = [NΣxy – (Σx)(Σy)] / √{ [NΣx² – (Σx)²] [NΣy² – (Σy)²] }

Sample question: compute the value of the correlation coefficient from the
following table:

Subject    Age x    Weight Level y
1          43       99
2          21       65
3          25       79
4          42       75
5          57       87
6          59       81
Step 1: Make a chart. Use the given data, and add three more columns: xy, x², and y².

Subject    Age x    Weight Level y    xy    x²    y²
1          43       99
2          21       65
3          25       79
4          42       75
5          57       87
6          59       81

Step 2: Multiply x and y together to fill the xy column. For example, row 1 would be
43 × 99 = 4,257.
Step 3: Take the square of the numbers in the x column, and put the result in the x²
column.

Subject    Age x    Weight Level y    xy      x²      y²
1          43       99                4257    1849
2          21       65                1365    441
3          25       79                1975    625
4          42       75                3150    1764
5          57       87                4959    3249
6          59       81                4779    3481

Step 4: Take the square of the numbers in the y column, and put the result in the y²
column.
Step 5: Add up all of the numbers in each column and put the result at the bottom of
the column. The Greek letter sigma (Σ) is a short way of saying “sum of.”

Subject    Age x    Weight Level y    xy       x²       y²
1          43       99                4257     1849     9801
2          21       65                1365     441      4225
3          25       79                1975     625      6241
4          42       75                3150     1764     5625
5          57       87                4959     3249     7569
6          59       81                4779     3481     6561
Σ          247      486               20485    11409    40022

Step 6: Use the computational formula to work out the correlation coefficient, with N = 6.
r = [6(20485) – (247)(486)] / √{ [6(11409) – 247²] [6(40022) – 486²] }
= 2868 / √(7445 x 3936)
= 2868 / 5413.27
= 0.5298
The range of the correlation coefficient is from -1 to 1. Our result is 0.5298, which
indicates a moderate positive correlation between the two variables.
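Steps 1 to 6 amount to forming five column totals and substituting them into the computational formula r = [NΣxy – (Σx)(Σy)] / √{[NΣx² – (Σx)²][NΣy² – (Σy)²]}. The Python sketch below does exactly that for the six subjects; the function name `pearson_r` is an illustrative choice. With the totals 247, 486, 20485, 11409 and 40022 it evaluates to about 0.53.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from the raw sums (computational formula)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


age = [43, 21, 25, 42, 57, 59]
level = [99, 65, 79, 75, 87, 81]
r = pearson_r(age, level)
print(round(r, 4))
```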

Spearman’s Rank Correlation Coefficient :


The Spearman correlation coefficient is often thought of as being the
Pearson correlation coefficient between the ranked variables. In practice, however, a
simpler procedure is normally used to calculate ρ. The n raw scores Xi, Yi are
converted to ranks xi, yi, and the differences di = xi − yi between the ranks of each
observation on the two variables are calculated.
If there are no tied ranks, then ρ is given by:
ρ = 1 – 6Σd² / [N(N² – 1)]
If tied ranks exist, Pearson’s correlation coefficient between the ranks should be used
for the calculation.
One has to assign the same rank to each of the equal values. It is the average of their
positions in the ascending order of the values.
Example :
X: 21 36 42 37 25
Y: 47 40 37 42 43
For the data given above, calculate the rank correlation coefficient.
Solution :
              RANK
X      Y      X      Y      d      d²
21     47     5      1      4      16
36     40     3      4      -1     1
42     37     1      5      -4     16
37     42     2      3      -1     1
25     43     4      2      2      4
Total                       Σd = 0 Σd² = 38

ρ = 1 – 6Σd² / [N(N² – 1)]
= 1 – (6 x 38) / [5(5² – 1)]
= 1 – 1.9
= -0.9
Tied Ranks :

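When one or more values are repeated, two aspects are to be considered: the ranks
of the repeated values and the change in the formula. Each of the equal values gets the
average of the ranks they would otherwise occupy, and m(m² – 1)/12 is added to Σd²
for each group of m tied values.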

Example:
Find the rank correlation coefficient for the percentage of marks secured by a
group of 8 students in Economics and Statistics.
Marks in Economics: 50 60 65 70 75 40 70 80
Marks in Statistics: 80 71 60 75 90 82 70 50
Solution:
Let X - Marks in Economics
Y - Marks in Statistics
              RANK
X      Y      X      Y      d       d²
50     80     7      3      4       16
60     71     6      5      1       1
65     60     5      7      -2      4
70     75     3.5    4      -0.5    0.25
75     90     2      1      1       1
40     82     8      2      6       36
70     70     3.5    6      -2.5    6.25
80     50     1      8      -7      49
Total                       Σd = 0  Σd² = 113.5

ρ = 1 – 6{Σd² + m(m² – 1)/12} / [N(N² – 1)]

The value 70 occurs twice in X, so m = 2 and m(m² – 1)/12 = 0.5

Therefore ρ = 1 – 6{113.5 + 0.5} / [8(8² – 1)]

= 1 – 1.3571 = -0.3571
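Both rank correlation examples can be reproduced with the sketch below. It assumes, as in the worked tables, that rank 1 goes to the largest value, that tied values share the average of their positions, and that m(m² – 1)/12 is added to Σd² for every group of m tied values; the function names are illustrative.

```python
def average_ranks(values):
    """Rank 1 goes to the largest value; tied values share the mean of their positions."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1   # first position occupied by v
        count = order.count(v)       # number of items tied at v
        ranks.append(first + (count - 1) / 2)
    return ranks


def spearman_rho(xs, ys):
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # tie correction: m(m^2 - 1)/12 per group of m equal values (0 when m = 1)
    correction = 0.0
    for series in (xs, ys):
        for v in set(series):
            m = series.count(v)
            correction += m * (m * m - 1) / 12
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))


rho_untied = spearman_rho([21, 36, 42, 37, 25], [47, 40, 37, 42, 43])
economics = [50, 60, 65, 70, 75, 40, 70, 80]
statistics = [80, 71, 60, 75, 90, 82, 70, 50]
rho_tied = spearman_rho(economics, statistics)
print(round(rho_untied, 4), round(rho_tied, 4))
```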

Simple Linear Regression:

The line which gives the average relationship between the two variables is
known as the regression equation. The regression equation is also called estimating
equation.

Uses:
1. Regression analysis is used in statistics and other disciplines.
2. Regression analysis is of practical use in determining demand curve, supply
curve, consumption function, etc. from market survey.
3. In Economics and Business, there are many groups of interrelated variables.
4. In social research, the relation between variables may not be known; the relation
may differ from place to place.
5. The value of dependent variable is estimated corresponding to any value of the
independent variable using the appropriate regression equation.

Method of Least Squares


From a scatter diagram, there is virtually no limit to the number of lines that can be
drawn to represent a linear relationship between the two variables.
o The objective is to find the line of BEST FIT to the data concerned.
o The criterion is called the method of least squares.
o That is, the sum of squares of the vertical deviations of the points from the
line should be a minimum (the dependent variable being plotted on the
vertical axis).
o The linear relationship between the dependent variable (Y) and the
independent variable (X) can be written as Y = a + bX, where a and b are parameters
describing the vertical intercept and the slope of the regression line.
o Similarly, the linear relationship between the dependent variable (X)
and the independent variable (Y) can be written as X = a’ + b’Y, where a’ and b’ are
the corresponding intercept and slope.
Calculating a and b:
The values of a and b for the given pairs of values (xi, yi), i = 1, 2, 3, …, are
determined using the normal equations
∑y = Na + b∑x
∑xy = a∑x + b∑x²

Similarly, the values of a’ and b’ for the given pairs of values (xi, yi),
i = 1, 2, 3, …, are determined using the normal equations
∑x = Na’ + b’∑y
∑xy = a’∑y + b’∑y²

Methods of forming the regression equations:

• Regression equations on the basis of normal equations.
• Regression equations on the basis of the means X̄ and Ȳ and the regression
coefficients bYX and bXY.
Problem:
From the following data, obtain the two regression equations. Use normal equations.
X 6 2 10 4 8
Y 9 11 5 8 7
Solution:
X        Y        XY          X²          Y²
6        9        54          36          81
2        11       22          4           121
10       5        50          100         25
4        8        32          16          64
8        7        56          64          49
∑x=30    ∑y=40    ∑xy=214     ∑x²=220     ∑y²=340

Let the regression equation of Y on X be Y = a + bX

The normal equations are,

∑y = Na + b∑x
∑xy = a∑x + b∑x²

By substituting the values from the table, we get

5a + 30b = 40 -------1
30a + 220b = 214 --------2
Solving these two equations we get,
a = 11.90 and b = -0.65
Therefore the regression equation of Y on X is Y = 11.90 - 0.65X.
Let the regression equation of X on Y be X = a’ + b’Y
The normal equations are,
∑x = Na’ + b’∑y
∑xy = a’∑y + b’∑y²

By substituting the values from the table, we get

5a’ + 40b’ = 30 -------3
40a’ + 340b’ = 214 --------4
Solving these two equations we get,
a’ = 16.40 and b’ = -1.30
Therefore the regression equation of X on Y is X = 16.40 - 1.30Y
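The pair of normal equations ∑y = Na + b∑x and ∑xy = a∑x + b∑x² can be solved in closed form, which gives a quick check on the hand calculation. The Python sketch below uses the illustrative name `fit_line`; the regression of X on Y is obtained simply by swapping the roles of the two lists.

```python
def fit_line(xs, ys):
    """Solve sum(y) = N*a + b*sum(x) and sum(xy) = a*sum(x) + b*sum(x^2) for a, b."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope, by elimination
    a = (sy - b * sx) / n                          # intercept, from the first equation
    return a, b


X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]
a, b = fit_line(X, Y)            # Y on X: Y = a + bX
a_dash, b_dash = fit_line(Y, X)  # X on Y: X = a' + b'Y
print(round(a, 2), round(b, 2), round(a_dash, 2), round(b_dash, 2))
```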

Example: From the data given below, find
(i) the two regression equations
(ii) the correlation coefficient between the variables X and Y
(iii) the value of Y when X = 30
X : 25 28 35 32 31 36 29 38 34 32
Y : 43 46 49 41 36 32 31 30 33 39
Solution :
X      Y      x = X - X̄    y = Y - Ȳ    xy      x²      y²
25     43     -7           5            -35     49      25
28     46     -4           8            -32     16      64
35     49     3            11           33      9       121
32     41     0            3            0       0       9
31     36     -1           -2           2       1       4
36     32     4            -6           -24     16      36
29     31     -3           -7           21      9       49
38     30     6            -8           -48     36      64
34     33     2            -5           -10     4       25
32     39     0            1            0       0       1
320    380    0            0            -93     140     398

X̄ = 320/10 = 32, Ȳ = 380/10 = 38
bxy = Σxy / Σy² = -93/398 = -0.2337, byx = Σxy / Σx² = -93/140 = -0.6643
(i) Regression equation of Y on X: (Y - Ȳ) = byx (X - X̄)
(Y – 38) = -0.6643 (X - 32), so Y = 59.26 - 0.6643X
Regression equation of X on Y: (X - X̄) = bxy (Y - Ȳ)
(X – 32) = -0.2337 (Y - 38), so X = 40.88 - 0.2337Y
(ii) r = ± √(byx x bxy) = -√(0.6643 x 0.2337) = -0.3940
(the negative sign is taken because both regression coefficients are negative)
(iii) When X = 30, Y = 59.26 - 0.6643 x 30 = 39.33
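The deviation method can be sketched in Python as follows. The function name `regression_summary` is illustrative; r is taken as the square root of byx x bxy with the common sign of the two regression coefficients.

```python
from math import copysign, sqrt


def regression_summary(xs, ys):
    """Means, regression coefficients byx and bxy, and r from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    byx = sxy / sxx  # slope of Y on X
    bxy = sxy / syy  # slope of X on Y
    r = copysign(sqrt(byx * bxy), byx)  # r carries the sign of the coefficients
    return mean_x, mean_y, byx, bxy, r


X = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]
Y = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]
mean_x, mean_y, byx, bxy, r = regression_summary(X, Y)
y_at_30 = mean_y + byx * (30 - mean_x)  # estimate of Y from the Y-on-X line
print(round(byx, 4), round(bxy, 4), round(r, 4), round(y_at_30, 2))
```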
Properties of Regression coefficients :
• The two regression equations are generally different and are not to be
interchanged in their usage.
• The two regression lines intersect at the point of means (X̄, Ȳ).
• Correlation coefficient is the geometric mean of the two regression coefficients.
• The two regression coefficients and the correlation coefficient have the same
sign.
• Both the regression coefficients cannot be numerically greater than one
simultaneously, and the correlation coefficient cannot be numerically greater than one.
• Regression coefficients are independent of change of origin but are affected
by the change of scale.
• Each regression coefficient is in the unit of the measurement of the dependent
variable.
• Each regression coefficient indicates the quantum of change in the dependent
variable corresponding to unit increase in the independent variable.
*******************
UNIT IV
INDEX NUMBERS

An index number is a statistical measure designed to show changes in a variable or a
group of related variables with respect to time, geographic location or other
characteristics such as income, profession, etc. (Murray R. Spiegel). An index number is
a single ratio (usually in percentage) which measures the combined (that is, averaged)
change of several variables between two different times, places or situations (A.M.
Tuttle). Only index numbers which show changes in price or quantity in one time
compared with another are discussed here. The year for which the index number is
calculated is called the current year. The year with which the current year is
compared is called the base year. A price index number is the percentage of change in
the price of one commodity or one group of commodities in the current year
compared with the base year. A similar calculation in quantity results in a quantity
index number.

Characteristics of index numbers

1. Index numbers are a special type of averages.
The units of measurement of commodities are different. But a price index number
gives the percentage of change in prices on the average. Hence, index numbers are a
special type of averages. For example, let the commodities be rice, kerosene and
cloth. The price of rice per kilogram is considered; the price of kerosene per litre and
the price of cloth per metre are considered. The average change in prices is indicated
by the index number.
2. Index numbers are percentages. The price in the current year is divided by the
price in the base year to get the ratio of change in price. It is multiplied by 100.
Interpretation of an index number is made easy by this procedure.
3. Index numbers indicate the percentage of change which is not possible
otherwise. No other statistical tool is so effective in studying such a wide variety of
situations.
4. Index numbers are meant for comparisons. Index numbers have been devised
to compare two different times. Comparisons of two different places or situations are
also possible with index numbers.

Uses

1. Index numbers provide scope for comparisons. Price, production, value, etc.
in two times are compared by index numbers.
2. Index numbers are economic barometers.
3. Index numbers serve as guides. Being economic barometers, they foretell the
direction in which the economy is likely to move. Governments, businessmen,
economists, etc. benefit by planning their activities accordingly.
4. Index numbers are the pulse of an economy. The condition of an economy is
known from the index numbers of various economic activities.
5. Index numbers measure the purchasing power of money.
6. Index numbers help to calculate real wages.

Real wage = (Money wage / Cost of living index number) x 100

7. Index numbers are deflators. A deflator is one which makes allowance for the
change in the price of commodities. Generally the cost of living index numbers are
used to revise the salaries, dearness allowance and other allowances so that wage
earners could maintain the same standard of living.
8. Index numbers are useful to formulate policies. Based on the relevant index
numbers, suitable policies are framed by businessmen and economists. Governments
and industrialists also use the prevailing conditions and benefit through planning.

General problems in the construction of index numbers

The following aspects are to be carefully considered during the construction of


an index number.
1. The purpose. The purpose of the index number is to be clearly known. For
whom it is meant, by whom it is to be used, etc. are to be spelt out. It is the purpose
which will solve the other problems, such as choosing the suitable formula among the
available formulae, deciding the reference or base period and the like. In short, it tells
what are to be done and what are not to be done.

2. The base period. The period may be one year or a few years. The base
period is to be taken according to the purpose. If the impact of the Five-Year Plans on
the Indian economy is to be assessed, the base year is to be 1951. The condition of any
subsequent year in relation to 1951 shows how the country has progressed from
1951 till that year. Generally the base period should be as follows.
(i) It should be a normal period. There should not have been natural calamities
such as famine, flood and earthquake, political upheavals, war, etc.
(ii) It should not be too short. In short periods, typical conditions might not be
there. The price of a commodity, for example, might be very high during a very short
time. The true condition is distorted if it is taken as the base period.
(iii) It should not be too distant in the past. This is to keep the index number
useful.
(iv) It may be fixed period for all the different periods under consideration. Or,
under chain base method, link relatives in which for every year the preceding year is
the base year may be calculated first and then may be chained together to a common
base year. Link relatives may prove their use in business and industry when any year
is to be compared with the year just preceding it. Whenever different years are to be
compared among themselves (with a common base year), fixed base as well as chain
base index numbers are useful.

3. The items. Including all the items in a study is neither feasible nor useful.
Only those items which concern the people for whom the index number is intended
are to be included. For considering the living conditions of people in hill stations,
woollen clothes should be included. For people who live in hot places throughout the
year woollen clothes are not at all necessary. For students pen and paper may be
necessary. For Keralites umbrella may be necessary. Only items essential for the
people concerned should be included.

4. The price quotations. The prices are to be properly gathered. For consumer
price index number, retail prices are necessary. For whole sale price indices, whole-
sale prices are needed. The places from where the people concerned buy are selected.
The difficulty is all the greater when the prices vary from locality to locality in the
same town, from shop to shop in the same locality and from customer to customer in
the same shop.

5. The Average. For arriving at the average value of a group of items, the
suitable average is to be decided. In other contexts A.M. may be more useful. It is
simple to understand and easy to calculate. Nowadays calculators are available
to find the A.M. Median and Mode may be obtained by mere inspection. But
Geometric Mean is the preferable average for the following reasons:
(i) G.M. is the appropriate average to measure relative changes. Hence, index
numbers, wherein the relative changes are expressed as percentages, give scope for
G.M.
(ii) It gives more weightage to smaller items and lesser weightage to greater items.
It is not as unduly affected as A.M. by extreme items.
(iii) It facilitates the change of the base period. The base cannot be kept the same for a
long time because the purpose and all-round changes may warrant a change in the
base period.

6. Weighting. By the unweighted method, equal weightage of unity is given to all
the items. It may not be desirable because the items may not be equally important.
The quantity purchased, the amount spent, etc. show the relative importance of the
different items. Weighting may be explicit as follows.
(i) Base year quantity as in Laspeyre’s method or current year quantity as in
Paasche’s method for price index number.
(ii) Base year value (price x quantity) as in consumer price index number by
Family Budget Method.
(iii) Some fixed weight based on neither base year quantity nor current year
quantity but on some other consideration as in Kelly’s method.
7. The Formula. As seen in the following pages, many formulae are available.
Each one has its own advantages. If for a certain situation only one formula is
suitable, there is no difficulty in using the formula. For certain other situations more
than one formula may be found suitable. In such cases the purpose and the opinion of
the experts in the field are the guides in choosing a formula.
Proper decision under each of those headings is bound to lead to a good index
number.
Period is referred to as year hereafter and the following notations are used.
p0 - price of a commodity in the base year.
p1 - price of a commodity in the current year.
q0 - quantity of a commodity in the base year.
q1 - quantity of a commodity in the current year.
p - price of a commodity.
q - quantity of a commodity.
V or W - weight of a commodity.
I or P - price relative or price index number of a commodity.
Q - quantity relative or quantity index number of a commodity.
P = p1/p0 x 100 Q = q1/q0 x 100
P01 - price index number of the current year compared with the base year.
Q01 - quantity index number of the current year compared with the base year.

Formulae. All the formulae can be brought under four groups as follows. First, they
are divided into two groups, viz., Unweighted Methods and Weighted Methods, and
then each group is subdivided into two as Aggregatives Methods and Averages of
Relatives Methods. Under each of the four subdivisions one or more formulae are
available.

Methods
    Unweighted
        Simple Aggregatives Method
        Simple Averages of Relatives Method
    Weighted
        Weighted Aggregatives Method
        Weighted Averages of Relatives Method

1. Simple or Unweighted Aggregatives Method.

It is based on the aggregates or the totals as shown below.
P01 = ∑p1/∑p0 x 100
It may be noted that the current year figure is in the numerator while the base year
figure is in the denominator, as in the other methods, when the index number of the
current year as compared to the base year is calculated.
When quantity index number is required, Q01 = ∑q1/∑q0 x 100
The calculation is illustrated together with the simple averages of relatives method.

The drawbacks of this method are.


(i) It does not satisfy even the unit test, which is explained later. The defect is due to
the fact that the unit prices are added as such even though the units of measurement
are different, such as kg, metre, litre, etc.
(ii) It does not distinguish between the commodities with regard to their relative
importance.

2. Simple or Unweighted Averages of Relatives Method.


The price relatives, P, for price index number and the quantity relatives, Q, for
quantity index number are calculated and their A.M. or G.M. is found.
Price Index (P01)

(i) Using A.M., P01 = ∑P/N

(ii) Using G.M., P01 = Antilog (∑log P/N)
Both these formulae can be found to satisfy the unit test.

Example : From the following data construct an index for 1995 taking 1994 as
base:

Commodities A B C D E
Price in 1994 (Rs.) 50 40 80 110 20
Price in 1995 (Rs.) 70 60 90 120 20
Solution:

Commodities    1994 (p0)    1995 (p1)    P = p1/p0 x 100    log P
A              50           70           140.00             2.1461
B              40           60           150.00             2.1761
C              80           90           112.50             2.0512
D              110          120          109.09             2.0378
E              20           20           100.00             2.0000
Total          300          360          611.59             10.4112

By Aggregatives Method,
P01 = ∑p1/∑p0 x 100 = 360/300 x 100 = 120
Using A.M., P01 = ∑P/N = 611.59/5 = 122.32
Using G.M., P01 = Antilog (∑log P/N) = Antilog (10.4112/5) = 120.85

Note: Although any one of them is sufficient, all the three possible indices have been
calculated for the sake of illustration.

When the index number is required by only one method as in this problem, the
preferable method is simple A.M. and the answer is P01 = 122.32.
P01 = 122.32 indicates that the prices, on the average, have increased by 22.32%
in the current year compared with the base year.
Whenever the price index number is less than 100, it indicates that the prices,
on the average, have decreased in the current year compared with the base year.
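The three unweighted indices for the 1994 and 1995 prices can be recomputed with a few lines of Python. This is only a sketch: the variable names are illustrative, and the geometric mean is computed with natural logarithms instead of a log table, so the last digit may differ slightly from a hand calculation.

```python
from math import exp, log

p0 = [50, 40, 80, 110, 20]   # prices in 1994 (base year)
p1 = [70, 60, 90, 120, 20]   # prices in 1995 (current year)

# price relatives P = p1/p0 x 100
relatives = [100 * cur / base for base, cur in zip(p0, p1)]

aggregative = 100 * sum(p1) / sum(p0)          # simple aggregatives method
am_index = sum(relatives) / len(relatives)     # A.M. of the price relatives
gm_index = exp(sum(log(r) for r in relatives) / len(relatives))  # G.M. of the relatives

print(round(aggregative, 2), round(am_index, 2), round(gm_index, 2))
```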

3. Weighted Aggregatives Method. Price indices (P01)

(i) Laspeyre’s formula: P01L = ∑p1q0/∑p0q0 x 100

(ii) Paasche’s formula : P01P = ∑p1q1/∑p0q1 x 100

(iii) Fisher’s formula : P01F = √(P01L x P01P) = √[(∑p1q0/∑p0q0) x (∑p1q1/∑p0q1)] x 100

4. Weighted Averages of Relatives Method.

Price Indices (P01)

(i) Using A.M., P01 = ∑PV/∑V

(ii) Using G.M., P01 = Antilog (∑V log P/∑V)

This method is better than the corresponding unweighted method in showing the
relative change. From the data available under this method, index numbers by
unweighted averages of relatives also could be calculated. This method provides
scope for replacing one or more items at a later stage.
Note: G.M. is the suitable average. When nothing is mentioned A.M. alone is
usually calculated.

TESTS OF CONSISTENCY AND ADEQUACY

Index numbers are constructed to study the relative changes in prices, quantities, etc.
of one time in comparison with another. Many formulae are available. They are
tested as follows.

1. Unit Test. This requires the formula to be independent of the units in which
prices and quantities are quoted.
The following examples show the different results given by the simple
aggregatives method although the price condition is the same. Laspeyre’s, Paasche’s
and Fisher’s formulae give the same result in spite of the difference in units.

                 Price          Quantity
Item    Unit     p0      p1     q0    q1    p0q0    p1q0    p0q1    p1q1
Rice    Ton      3000    4500   1     2     3000    4500    6000    9000
Cloth   metre    100     200    4     5     400     800     500     1000
Total   ----     3100    4700   ----  ----  3400    5300    6500    10000

By simple Aggregative Method,

P01 = ∑p1/∑p0 x 100 = 4700/3100 x 100 = 151.61

By Laspeyre’s formula,

P01 = ∑p1q0/∑p0q0 x 100 = 5300/3400 x 100 = 155.88

By Paasche’s formula,

P01 = ∑p1q1/∑p0q1 x 100 = 10000/6500 x 100 = 153.85

By Fisher’s formula,

P01 = √[(∑p1q0/∑p0q0) x (∑p1q1/∑p0q1)] x 100 = √[(5300/3400) x (10000/6500)] x 100

= 154.86
The same prices and quantities are quoted below in different units:

                 Price          Quantity
Item    Unit     p0      p1     q0     q1     p0q0    p1q0    p0q1    p1q1
Rice    Kg.      3       4.5    1000   2000   3000    4500    6000    9000
Cloth   cm       1       2.0    400    500    400     800     500     1000
Total   ----     4       6.5    ----   ----   3400    5300    6500    10000

Totals except those of p0 and p1 remain the same and so Laspeyre’s,
Paasche’s and Fisher’s formulae give the same results as earlier.
By simple Aggregative Method,

P01 = ∑p1/∑p0 x 100 = 6.5/4 x 100 = 162.50
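The unit test can be illustrated with a short Python sketch that computes all four indices for the same goods quoted first in tons and metres and then in kilograms and centimetres. The function name `price_indices` is illustrative. Only the simple aggregative index changes with the units.

```python
from math import sqrt


def price_indices(p0, p1, q0, q1):
    """Return (simple aggregative, Laspeyres, Paasche, Fisher) price indices."""
    def total(prices, quantities):
        return sum(p * q for p, q in zip(prices, quantities))

    simple = 100 * sum(p1) / sum(p0)
    laspeyres = 100 * total(p1, q0) / total(p0, q0)   # base year quantities as weights
    paasche = 100 * total(p1, q1) / total(p0, q1)     # current year quantities as weights
    fisher = sqrt(laspeyres * paasche)                # geometric mean of the two
    return simple, laspeyres, paasche, fisher


# rice per ton, cloth per metre
tons_metres = price_indices([3000, 100], [4500, 200], [1, 4], [2, 5])
# the same goods: rice per kg, cloth per cm
kg_cm = price_indices([3, 1], [4.5, 2.0], [1000, 400], [2000, 500])

print([round(v, 2) for v in tons_metres])
print([round(v, 2) for v in kg_cm])
```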

2. Time Reversal Test (T.R. test): This requires the formula to be such that
P01 x P10 = 1, after ignoring the factor 100 in each index. In the words of Prof. Irving
Fisher who proposed this test condition, “… the formula for calculating the index number
should be such that it will give the same ratio between one point of comparison and
the other, no matter which of the two is taken as base or, putting it in another way, the
index number reckoned forward should be reciprocal of the one reckoned backward”.
P10 is the index number of the base year in comparison with the current year. That is,
the base year figure will be in the numerator and the current year figure will be in the
denominator. Hence, it is expected to be the reciprocal of P01. In other words, the
product of P01 and P10 is expected to be unity.
Fisher’s formula, Marshall – Edgeworth formula, Kelly’s formula, Simple
Aggregatives Method and Weighted and Unweighted Geometric Means of Relatives
Methods satisfy this test.
The examination of a few formulae under this test is presented in the table in the next
page. From that it could be seen whether the test is satisfied or not by the concerned
formula.

Factor Reversal Test (F.R. Test): This requires the formula to be such that

$P_{01} \times Q_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$

after ignoring the factor 100 in each index. In the words of Prof.
Irving Fisher, who proposed this condition also, “Just as our formula should permit
the interchanging of two times without giving inconsistent results, so it ought to
permit interchanging the prices and quantities without giving inconsistent results –
that is, the two results multiplied together should give the true value ratio, except for a
constant of proportionality”.
P01 gives the relative change in price while Q01 gives the relative change in quantity.
Hence, P01 × Q01 should give the relative change in price multiplied by quantity (i.e.,
value) and so should be equal to $\frac{\sum p_1 q_1}{\sum p_0 q_0}$.


Fisher’s is the only formula which satisfies this test. The examination of a few
formulae under this test is presented in the table on the page after the next.
Fisher’s formula is found to satisfy all the three tests while no other formula does.
Hence, Fisher’s formula is called the ideal index number formula.
Example : Show that Fisher’s ideal index satisfies both the time reversal and factor
reversal tests, using the following data.

Commodity Price(1990) Qty(1990) Price(1992) Qty(1992)

A 6 50 10 56
B 2 100 2 120
C 4 60 6 60
D 10 30 12 24
E 8 40 12 36

Solution:

              1990        1992
Commodity    p0    q0    p1    q1    p0q0    p1q0    p0q1    p1q1

A             6    50    10    56     300     500     336     560
B             2   100     2   120     200     200     240     240
C             4    60     6    60     240     360     240     360
D            10    30    12    24     300     360     240     288
E             8    40    12    36     320     480     288     432

Total       ---   ---   ---   ---    1360    1900    1344    1880

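By Fisher’s formula, after ignoring the factor 100,

$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} = \sqrt{\frac{1900}{1360} \times \frac{1880}{1344}}$

$P_{10} = \sqrt{\frac{\sum p_0 q_1}{\sum p_1 q_1} \times \frac{\sum p_0 q_0}{\sum p_1 q_0}} = \sqrt{\frac{1344}{1880} \times \frac{1360}{1900}}$ and so

$P_{01} \times P_{10} = \sqrt{\frac{1900}{1360} \times \frac{1880}{1344}} \times \sqrt{\frac{1344}{1880} \times \frac{1360}{1900}}$

$= \sqrt{\frac{1900}{1360} \times \frac{1880}{1344} \times \frac{1344}{1880} \times \frac{1360}{1900}} = \sqrt{1} = 1$

$Q_{01} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}} = \sqrt{\frac{1344}{1360} \times \frac{1880}{1900}}$

$P_{01} \times Q_{01} = \sqrt{\frac{1900}{1360} \times \frac{1880}{1344}} \times \sqrt{\frac{1344}{1360} \times \frac{1880}{1900}}$

$= \sqrt{\frac{1880 \times 1880}{1360 \times 1360}} = \frac{1880}{1360} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$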
Using the given data, Fisher’s index is found to satisfy both time reversal and factor
reversal tests.
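
The two tests can also be verified numerically. The sketch below is an illustration in Python under assumed names (`aggregate`, `fisher_ratio`), not the author's working; it uses the five commodities of the example and, for contrast, applies the time reversal test to Laspeyre’s formula as well.

```python
from math import sqrt, isclose

# Prices and quantities of commodities A to E.
p0 = [6, 2, 4, 10, 8]        # prices, 1990
q0 = [50, 100, 60, 30, 40]   # quantities, 1990
p1 = [10, 2, 6, 12, 12]      # prices, 1992
q1 = [56, 120, 60, 24, 36]   # quantities, 1992

def aggregate(x, y):
    """Sum of the products of corresponding items."""
    return sum(a * b for a, b in zip(x, y))

def fisher_ratio(pa, qa, pb, qb):
    """Fisher's index of period b on base a, ignoring the factor 100."""
    laspeyres = aggregate(pb, qa) / aggregate(pa, qa)
    paasche = aggregate(pb, qb) / aggregate(pa, qb)
    return sqrt(laspeyres * paasche)

P01 = fisher_ratio(p0, q0, p1, q1)
P10 = fisher_ratio(p1, q1, p0, q0)   # base and current year interchanged
Q01 = fisher_ratio(q0, p0, q1, p1)   # prices and quantities interchanged
value_ratio = aggregate(p1, q1) / aggregate(p0, q0)

print("Time reversal:   P01 x P10 =", P01 * P10)
print("Factor reversal: P01 x Q01 =", P01 * Q01, " value ratio =", value_ratio)

# Laspeyres' formula on the same data, forward and backward.
L01 = aggregate(p1, q0) / aggregate(p0, q0)
L10 = aggregate(p0, q1) / aggregate(p1, q1)
print("Laspeyres:       L01 x L10 =", L01 * L10)
```

For these data the Laspeyres product is close to, but not equal to, 1, which is consistent with that formula not appearing in the list of formulae that satisfy the time reversal test.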

CIRCULAR TEST

Circular test is an extension of the time reversal test. If three years 0, 1 and 2 are
under consideration, this requires the formula to be such that
P01 × P12 × P20 = 1
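
As an illustration (a worked check added here, not taken from the notes), the Simple Aggregative Method can be seen to meet this condition, because each aggregate cancels with the next one when the factor 100 is ignored. Here $p_2$ denotes the prices in year 2.

```latex
P_{01} \times P_{12} \times P_{20}
  = \frac{\sum p_1}{\sum p_0} \times \frac{\sum p_2}{\sum p_1} \times \frac{\sum p_0}{\sum p_2}
  = 1
```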

FIXED BASE

When the data are available for more than two years, the question ‘which is the base
year?’ arises. Under fixed base method, the base year is the same for all the different
years under consideration. Base year figures may be figures of any one year or the
averages of a few years or the totals of a few years or those suggested. When nothing is
indicated, the first year in the series of years in chronological order is to be taken as
the base.
If no method is suggested, the method which is suitable for the data under
consideration is to be chosen. For the given data, although index number can be
calculated by more than one method, the result is obtained by only one method unless
stated otherwise. The method is selected in the following order:
(i) Fisher’s formula
Or
(ii) Weighted A.M. method
Or
(iii) Unweighted A.M. method

Example :Calculate fixed base index number from the following prices:

Commodity 1995 1996 1997 1998 1999 2000

I 4 5 6 6 8 10
II 5 7 8 10 13 15
III 6 9 12 12 15 15

Solution:

        Prices              Price Relatives [P]     Total     Index No.
Year    Commodity           Commodity               [ΣP]      [ΣP / 3]
        I     II    III     I      II     III

1995    4     5     6       100    100    100       300       100.00
1996    5     7     9       125    140    150       415       138.33
1997    6     8     12      150    160    200       510       170.00
1998    6     10    12      150    200    200       550       183.33
1999    8     13    15      200    260    250       710       236.67
2000    10    15    15      250    300    250       800       266.67

For each commodity, the price in a year is divided by that in 1995 and is
multiplied by 100 to get the price relative. Using A.M., the price indices are
calculated and are given in the last column of the above table.
For the first year which is the base year, fixed base index number as well as
each P is 100.
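
The procedure described above can be sketched in Python as follows. This is an illustrative sketch only; the function name `fixed_base_index` and the dictionary layout are assumptions made for the example.

```python
years = [1995, 1996, 1997, 1998, 1999, 2000]
prices = {
    "I":   [4, 5, 6, 6, 8, 10],
    "II":  [5, 7, 8, 10, 13, 15],
    "III": [6, 9, 12, 12, 15, 15],
}

def fixed_base_index(prices, base=0):
    """Unweighted A.M. of price relatives, every year compared with the base year."""
    n_years = len(next(iter(prices.values())))
    indices = []
    for t in range(n_years):
        # Price relative of each commodity: price in year t / base year price x 100.
        relatives = [series[t] / series[base] * 100 for series in prices.values()]
        indices.append(sum(relatives) / len(relatives))
    return indices

index_numbers = fixed_base_index(prices)
for year, value in zip(years, index_numbers):
    print(year, round(value, 2))
```

Passing a different `base` position makes another year the fixed base; the default follows the rule that the first year is the base when nothing is indicated.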

CHAIN BASE

When the data are available for more than two years, the method available
besides the fixed base method for computing index numbers is the chain base method.

$\text{Chain Index} = \frac{\text{Current year link relative} \times \text{Previous year chain index}}{100}$

$\text{Current year F.B.I.} = \frac{\text{Current year L.R.} \times \text{Previous year F.B.I.}}{100}$

Example : Construct (a) fixed base and (b) chain base index numbers from the
following data relating to the production of electricity.
Year 1981 1982 1983 1984 1985 1986 1987 1988
Production 25 27 30 24 28 29 31 35
Year 1989 1990 1991 1992 1993 1994 1995 1996
Production 40 41 36 32 37 38 39 40
Solution:
Quantities of production are given for 16 years. The production every year is
divided by that of 1981, i.e., 25 and is multiplied by 100 to get the fixed base quantity
indices (Q01) given in col. (3).
For calculating link relatives (L.R.) of col. (4), quantity of every year is
divided by that of its preceding year and is multiplied by 100.
Link relatives are converted into chain base indices (Q01) given in col. (5)
using usual formula.

Year Production F.B.I. (Q01) L.R. C.B.I. (Q01)

1981 25 100 100.00 100.00
1982 27 108 108.00 108.00
1983 30 120 111.11 120.00
1984 24 96 80.00 96.00
1985 28 112 116.67 112.00
1986 29 116 103.57 116.00
1987 31 124 106.90 124.00
1988 35 140 112.90 140.00
1989 40 160 114.29 160.01
1990 41 164 102.50 164.01
1991 36 144 87.80 144.00
1992 32 128 88.89 128.00
1993 37 148 115.63 148.01
1994 38 152 102.70 152.01
1995 39 156 102.63 156.01
1996 40 160 102.56 160.00
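
The three columns of the table can be reproduced with a short Python sketch. It is an illustration under assumed variable names, and it chains the unrounded link relatives, so the chain base indices agree with the fixed base indices exactly; small differences such as 160.01 in the table arise when link relatives rounded to two decimals are chained.

```python
years = list(range(1981, 1997))
production = [25, 27, 30, 24, 28, 29, 31, 35, 40, 41, 36, 32, 37, 38, 39, 40]

# Fixed base indices: every year compared with the first year (1981).
fbi = [q / production[0] * 100 for q in production]

# Link relatives: every year compared with its preceding year.
link = [100.0] + [cur / prev * 100 for prev, cur in zip(production, production[1:])]

# Chain base indices: current link relative x previous chain index / 100.
chain = [100.0]
for lr in link[1:]:
    chain.append(lr * chain[-1] / 100)

for row in zip(years, production, fbi, link, chain):
    print("{} {:>3} {:8.2f} {:8.2f} {:8.2f}".format(*row))
```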

COST OF LIVING INDEX

Cost of living index number shows the impact of changes in the prices of a number of
commodities and services on a particular class of people in the current year in
comparison with the base year. Cost of living index number is also known as
consumer price index number.
Main steps in the construction of Cost of Living Index Number:
1. The Purpose. At the outset, the class of people for whom the index number is
intended is to be identified. The knowledge of their area of living, their ways of life,
their necessities, their habits, etc. plays an important role in getting good results.

2. The Base Year. A similar survey might have been conducted earlier. The current
interest might be to study the subsequent changes. For example, the pay scales of the
employees of Tamil Nadu Govt. were revised in 1994.

3. Family Budget Enquiry. A sample survey, known as family budget enquiry, is
conducted and the items to be included, their quantity, etc. are found. It is customary
to have the items under the five heads (i) Food (ii) Clothing (iii) Fuel and Lighting
(iv) House Rent and (v) Miscellaneous. From the families of the concerned class of
people, a sample of adequate size is selected. From each such family, the details of
the different items consumed, their quality and quantity are noted.

4. The Prices. The average price paid for each item is to be gathered from the shops of
the region. The prices are retail prices. As mentioned earlier under general problems
in the construction of index numbers, it is a difficult task to gather and to arrive at an
average price of an item. The shops where many of the families buy and the most
likely prices in those shops are to be noted before finding their average.

5. The Average. Both arithmetic mean and geometric mean can be used, the
former owing to its ease of calculation and the latter owing to its
suitability.
6. The Formula. Two formulae are available. They are given below.

(i) Aggregate Expenditure Method or Weighted Aggregatives Method: In
the usual notations, the

$\text{Cost of Living Index Number} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$

It is the most popular method and the formula is nothing but Laspeyre’s. On the basis
of base year quantities, total expenditure in current year and base year are calculated
and the percentage of change is worked out.

(ii) Family Budget Method or Weighted Averages of Relatives Method. The
formula of this method, as given in usual notation, is

$\text{Cost of Living Index Number} = \frac{\sum PW}{\sum W}$, where $P = \frac{p_1}{p_0} \times 100$

Weights (W) are determined on the basis of the family budget enquiry wherein
the relative importance of the items within a group and the relative importance of a
group to the total are known. When W is base year value (p0q0), both the methods
become one and the same.
Instead of finding the weighted arithmetic mean of price relatives as in the
above formula, weighted geometric mean may also be calculated if required, using the
following formula:

$\text{Cost of Living Index Number} = \text{Antilog}\left(\frac{\sum W \log P}{\sum W}\right)$



Uses: 1. Cost of living index numbers are the indicators of changes in real wages.
Money wages are changing and so are prices. Cost of living index numbers help to
know whether money wages overtake the rising prices or are overpowered by them.
2. Decisions on dearness allowance are based on the cost of living indices.
3. They are further used for deflation of income and value in national accounts.
Example 7: Construct cost of living index for 2000, taking 1999 as the base year,
from the following data using ‘Aggregate Expenditure’ Method.

Article    Quantity in 1999    Price Rs. (per Kg.)
                               1999       2000

A          6                   5.75       6.00
B          1                   5.00       8.00
C          6                   6.00       9.00
D          4                   8.00       10.00
E          2                   2.00       1.80
F          1                   20.00      15.00

Solution:

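Article    Quantity     Price
           1999 (q0)    1999 (p0)    2000 (p1)    p1q0      p0q0

A          6            5.75         6.00         36.00     34.50
B          1            5.00         8.00         8.00      5.00
C          6            6.00         9.00         54.00     36.00
D          4            8.00         10.00        40.00     32.00
E          2            2.00         1.80         3.60      4.00
F          1            20.00        15.00        15.00     20.00

Total      -----        -----        -----        156.60    131.50

$\text{Cost of Living Index Number} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{156.60}{131.50} \times 100 = 119.09$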
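
The same calculation can be sketched in Python. The function names below are assumptions made for illustration; the second function applies the Family Budget Method with base year values (p0q0) as weights, to show that the two methods then give one and the same figure.

```python
q0 = [6, 1, 6, 4, 2, 1]                            # quantities in 1999
p0 = [5.75, 5.00, 6.00, 8.00, 2.00, 20.00]         # prices in 1999
p1 = [6.00, 8.00, 9.00, 10.00, 1.80, 15.00]        # prices in 2000

def aggregate_expenditure_index(p0, p1, q0):
    """Sum of p1q0 divided by sum of p0q0, multiplied by 100."""
    current = sum(p * q for p, q in zip(p1, q0))
    base = sum(p * q for p, q in zip(p0, q0))
    return current / base * 100

def family_budget_index(p0, p1, weights):
    """Weighted arithmetic mean of the price relatives P = p1 / p0 x 100."""
    relatives = [b / a * 100 for a, b in zip(p0, p1)]
    return sum(r * w for r, w in zip(relatives, weights)) / sum(weights)

index_ae = aggregate_expenditure_index(p0, p1, q0)

# Base year values p0q0 used as the weights W.
weights = [p * q for p, q in zip(p0, q0)]
index_fb = family_budget_index(p0, p1, weights)

print("Aggregate Expenditure Method:", round(index_ae, 2))
print("Family Budget Method (W = p0q0):", round(index_fb, 2))
```

On these figures the index comes to about 119.09, that is, the cost of living of this class rose by roughly 19 per cent between 1999 and 2000.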

*******************

UNIT V
ANALYSIS OF TIME SERIES

A time series is a collection of observations made sequentially in time (C. Chatfield
– The Analysis of Time Series: Theory and Practice, Page 1). The series
of values might have been observed at regular intervals of time such as daily sales,
annual profits and decennial census. Certain time series might have been recorded
at irregular intervals as and when the events happened, such as floods and earthquakes.

Example :
Year                                    1991   1992   1993   1994   1995   1996   1997
Production of Gold (in Crore ounces)    121    101    130    132    126    142    137
Example : Sales in a cloth shop are given below in Rs. Crores
Year Apr. May June July Aug. Sep.
96-97 23.9 27.1 26.3 27.2 28.4 29.2

97-98 32.4 35.0 34.7 33.5 31.5 33.1
98-99 45.8 50.2 50.4 51.3 52.0 52.4
Oct. Nov. Dec. Jan. Feb. Mar.
96-97 28.5 30.2 27.3 25.1 20.7 21.5
97-98 32.4 35.7 36.0 36.7 30.0 31.2
98-99 51.7 54.2 53.7 52.4 41.3 43.6
Uses:
Variables such as Sales, Production, Profit and Population have different
values at different points of time. Analysis of such series of values is important as
pointed out below.
(i) The analysis of time series helps to know the past conditions. The observations
at past periods of time indicate the conditions which existed. A detailed study
enables us to know further.
(ii) It helps in assessing the present achievements. If the past conditions had
continued, what would be the present position? What is the actual position now?
What are the causes for the difference? Are we satisfied with the present? Thinking
along these lines helps not only to assess the present but also to plan for the future.
(iii) It helps to predict reliably. There are many methods in Statistics to estimate
the value of a variable at a certain time in the future.
(iv) It facilitates comparison. Relevant time series could be compared and vital
inferences be drawn. For example, the production of motor cycles of two companies
can be compared over a period of time.
(v) It forewarns. As it predicts the future reliably, the future could be met with
due preparedness. If the sales in a cloth shop are likely to fall, an advertisement
campaign can be tried to increase the sales, the services of certain staff may be
terminated, unnecessary godown facilities may be surrendered, etc. On the contrary, if
increased sales are expected, stock may be increased, more sales personnel be
employed, etc. In short, losses, if any, could be minimized and profits, if any, could
be maximized.
Components:
The fluctuations in a time series are of four different natures generally. They
have been named as follows and are called components of a time series. Secular trend
is the long term effect. The other three components are called short term variations.
Long – Term Effect:
1. Secular Trend
Short – Term Variations:
2. Seasonal Fluctuations
3. Cyclical Fluctuations
4. Irregular Variations
1. Secular Trend: Secular trend is also called long – term trend or, simply,
trend. The overall nature of the series is the trend. The general tendency of a series
is to increase or decrease over a period of time. Increasing trend is observed in
population, price, production, literacy, etc. There is decreasing trend in birth rate,
death rate, poverty, illiteracy, etc. It is very rare to find a time series which neither
increases nor decreases.
Mathematically, trend may be
(i) Linear or
(ii) Non – linear.

Graphically, linear trend is a straight line. The discussion in this chapter is restricted
to linear trend. Parabolic trend equation, if necessary, can be formed as explained in
‘Method of Least Squares’.
2. Seasonal Fluctuations. Season is a period which is less than one year. It may
be a period of 6 months or 4 months or 3 months or 1 month, etc. A certain nature is
observed in one season and another nature in another season, every year. In
other words, the different natures recur year after year at the respective seasons.
These variations over time are called seasonal fluctuations.
The factors which cause seasonal variations are of the following two kinds:
(i) Climate and weather conditions.
(ii) Customs, traditions and habits of the people.
(i) Climate and weather conditions: Sales of ice – cream, khadi and cotton clothes,
etc. are more during summer. Sales of umbrellas are at their peak during the rainy
season. Production of paddy, wheat, etc. is more in a few months and less in other
months of a year. Climate and weather cause this kind of variation.
(ii) Customs, traditions and habits of the people: Sales of crackers and
fireworks are found to be more during Deepavali every year. Cloth shops register very
good sales during festival seasons such as Deepavali, Pongal, Ramzan and Christmas
and marriage seasons. Postmen are very busy in those days in sorting and delivering
greetings. All these variations in sales, work load, etc. are due to the customs,
traditions and habits of the people.
3. Cyclical Fluctuations. Cyclical fluctuations are similar to seasonal
variations. The difference is in the interval of recurrence. In seasonal fluctuations, a
nature of the series recurs at an interval of one year. Cyclical fluctuations recur at an
interval of 3 or more years. The fitting example is the business cycle. In Economics and
Business, there are many time series which have certain wave – like movements
called business cycles. In one period, profits are easily made and are made in plenty
also. Prices are high. This period is called prosperity. After this peak, conditions
decline instead of improving. High wages, decreasing efficiency, increasing
interest rates, etc. cause the decline. This is the period of recession. After touching the
bottom, which is called depression, the condition improves. The recovery from
depression leads to prosperity. The four phases of a business cycle, namely, (i)
prosperity (ii) recession (iii) depression and (iv) recovery recur one after another
regularly.

4. Irregular Variations. Variations which do not come under the other three
components are called irregular variations. The other three components have certain
regularity. But this is irregular. Fire, floods, earthquakes, wars, lock – outs, strikes,
etc. cause irregular variations. Sometimes the causes of irregular variations, as above,
are known. Sometimes the causes may not be known. For example, there may be very poor
sales on a particular day in a leading cloth shop on the eve of Deepavali. The cause for
such a happening may not be known.
Irregular variation is also called random variation or erratic fluctuation.

Models: There exist certain relations between the components and the series of
observations. The relation between the observed value and the components is called a
model. Many models exist. In this book, only two models are considered. Let Y be the
observed data, T or Yt be the trend, S be seasonal variation, C be cyclical variation
and I be irregular variation.
(i) Additive Model
Y = T + S + C + I
When short – term variation is to be found out as per this model,
Short – term variation = Y – Yt
(ii) Multiplicative Model
Y = T × S × C × I
Many time series in Economics and Business are found to be of multiplicative
model. A few other series are found to be of additive model.
SECULAR TREND
There are four methods to estimate the secular trend. They are
1. Graphic Method.
2. Method of Semi – Averages.
3. Method of Moving Averages.
4. Method of Least Squares.
1. Graphic Method. It is also known as free – hand method. X axis represents
time and Y axis, the observed data. Corresponding to each pair of time and observed
value, a point is marked on a graph sheet. A line is drawn such that the following
three conditions are satisfied.
(i) The number of points above the line is equal to the number of points below the
line, as far as possible.
(ii) The sum of the vertical distances of the points above the line equals that of
the points below the line.
(iii) The sum of the squares of the vertical distances of all the points from the line
is the minimum.
It is not easy to draw such a line. But method of least squares provides such a line
mathematically.
Example : Draw the trend line by graphic method and estimate the production in
2003.
Solution: Year is represented on the X axis. Production is represented on the Y axis.
Points (1995, 20), (1996, 22), (1997, 25), (1998, 26), (1999, 25), (2000, 27) and
(2001, 30) are marked on a graph sheet.
A central line in the middle of those points is drawn such that the line satisfies the
three conditions.
Corresponding to X = 2003, the Y coordinate of the point on the line is found to be
32.2. Thus, the estimated production in the year 2003 is 32.2 units.
2. Method of Semi – Averages. The time series is divided into two halves. When there
is an even number of years, the middle most year and the arithmetic mean of the observed
values are found out for each half. When there is an odd number of years, the middle
most year of the series and the corresponding observed value are omitted. The middle
most year and the arithmetic mean of the observed values are then found out for each
half. Based on them, two points are marked and joined by a line which is extended on
either side. It is the trend line. The trend at any point of time can be found from that
line. As only two points are marked, there is no difficulty in drawing the line through
them.
Example : The sales in tonnes of a commodity varied from 1990 to 2001 as under:
280, 300, 280, 280, 270, 240, 230, 230, 220, 200, 210, 200
Fit a trend line by the method of semi – averages. Estimate the sales in 2002.

Solution: Given

Year    Sales in tonnes    Middle Most Year    Mean Sales

1990    280
1991    300
1992    280
                           1992.5              1650/6 = 275.0
1993    280
1994    270
1995    240
1996    230
1997    230
1998    220
                           1998.5              1290/6 = 215.0
1999    200
2000    210
2001    200
Points (1992.5, 275.0) and (1998.5, 215.0) are marked on a graph sheet. A line is
drawn along them. It is the trend line. Corresponding to X = 2002, Y = 180 from the
line. Hence, the estimated sales in 2002 is 180 tonnes.
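
Instead of reading the value from the graph, the line through the two semi – average points can be worked out arithmetically. The Python sketch below is an illustration under assumed names (`semi_average_points`, `trend`); it is not the method of the notes, which uses a graph sheet.

```python
years = list(range(1990, 2002))
sales = [280, 300, 280, 280, 270, 240, 230, 230, 220, 200, 210, 200]

def semi_average_points(years, values):
    """Return the two points (middle most year, mean) of the two halves.

    With an odd number of years the middle most observation is left out.
    """
    n = len(years)
    half = n // 2
    first = slice(0, half)
    second = slice(n - half, n)
    point1 = (sum(years[first]) / half, sum(values[first]) / half)
    point2 = (sum(years[second]) / half, sum(values[second]) / half)
    return point1, point2

(x1, y1), (x2, y2) = semi_average_points(years, sales)
slope = (y2 - y1) / (x2 - x1)   # change in trend per year

def trend(year):
    """Trend value read from the line joining the two points."""
    return y1 + slope * (year - x1)

print("Points:", (x1, y1), (x2, y2))
print("Yearly change:", slope)
print("Estimated sales in 2002:", trend(2002))
```

The slope shows that the trend falls by 10 tonnes a year, which is how the reading of 180 tonnes for 2002 comes about.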
3. Method of Moving Averages. The method of moving averages is one of the most
useful methods of estimating trend. It is an algebraic method. Graph sheet is not used
for the calculation. Two cases arise:
Case 1. Period of Moving Average is an odd number such as 3 or 5 or 7……
Let a, b, c, …… be the observed values. When 3 yearly moving averages are required,
a+b+c, b+c+d, c+d+e, …… are the moving totals corresponding to second, third,
fourth, …… years. Each total is then divided by 3 to get the moving averages.

That is, $\frac{a+b+c}{3}, \frac{b+c+d}{3}, \frac{c+d+e}{3}, \ldots$ are the moving averages
corresponding to second, third, fourth, …… years. There is no moving total or
moving average corresponding to the first year and the last year.
When 5 yearly moving averages are required, a+b+c+d+e, b+c+d+e+f, c+d+e+f+g, ……
are the moving totals corresponding to third, fourth, fifth, …… years. Each total is
then divided by 5 to get the moving averages. That is,

$\frac{a+b+c+d+e}{5}, \frac{b+c+d+e+f}{5}, \frac{c+d+e+f+g}{5}, \ldots$ are the
moving averages corresponding to third, fourth, fifth, …… years. For the first
two years and the last two years, there is no moving total or moving average. 7 yearly,
9 yearly, …… moving averages are calculated in a similar manner.

Examples : Calculate 5 yearly moving average of number of students studying in a
Commerce College as shown by the following figures:
Year No. of Students Year No. of Students
1987 332 1992 405
1988 311 1993 410
1989 357 1994 427
1990 392 1995 405
1991 402 1996 438
Solution:
Year    No. of Students    5 Yearly Moving Totals    5 Yearly Moving Averages
1987    332                -                         -
1988    311                -                         -
1989    357                1794                      358.8
1990    392                1867                      373.4
1991    402                1966                      393.2
1992    405                2036                      407.2
1993    410                2049                      409.8
1994    427                2085                      417.0
1995    405                -                         -
1996    438                -                         -
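
The moving totals and averages of Case 1 can be generated with a few lines of Python. This is a minimal sketch; `moving_average` is a name assumed for the illustration, and the years with no moving average are marked as `None`.

```python
years = list(range(1987, 1997))
students = [332, 311, 357, 392, 402, 405, 410, 427, 405, 438]

def moving_average(values, period):
    """Moving averages for an odd period, placed against the middle year."""
    if period % 2 == 0:
        raise ValueError("this sketch handles odd periods only")
    half = period // 2
    result = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]   # 'period' consecutive values
        result[i] = sum(window) / period
    return result

averages = moving_average(students, 5)
for year, value, avg in zip(years, students, averages):
    print(year, value, "-" if avg is None else round(avg, 1))
```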
Case 2. Period of Moving Average is an even number such as 4 or 6 or 8……
The mid years of the moving totals are not the given years in this case. Hence,
2 – period moving totals of the moving totals are found. The given years are found to
be the mid years of these totals. The 2 – period moving totals are divided by twice the
period of moving averages to get the centered moving averages. The centered moving
averages are the trend values.
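
The notes give no worked example for Case 2, so the following Python sketch is offered only as an illustration. It assumes a 4 yearly moving average and reuses the student figures of the Commerce College example; the function name `centered_moving_average` is hypothetical.

```python
years = list(range(1987, 1997))
students = [332, 311, 357, 392, 402, 405, 410, 427, 405, 438]

def centered_moving_average(values, period):
    """Centered moving averages for an even period."""
    if period % 2 != 0:
        raise ValueError("this sketch handles even periods only")
    # Moving totals; each falls midway between two of the given years.
    totals = [sum(values[i:i + period]) for i in range(len(values) - period + 1)]
    # 2-period moving totals of the moving totals; these fall on the given years.
    paired = [a + b for a, b in zip(totals, totals[1:])]
    half = period // 2
    result = [None] * len(values)
    for k, total in enumerate(paired):
        # Divide by twice the period to get the centered moving average.
        result[k + half] = total / (2 * period)
    return result

centered = centered_moving_average(students, 4)
for year, value, avg in zip(years, students, centered):
    print(year, value, "-" if avg is None else avg)
```

With a period of 4, the first two and the last two years have no centered moving average; the first value, for 1989, is (1392 + 1462) / 8 = 356.75.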
Example : Fit a straight line trend equation to the following data by the method of
least squares and estimate the value of sales for the year 1985.
Year 1979 1980 1981 1982 1983
Sales (in Rs.) 100 120 140 160 180
Solution: Let Y = a + bX be the equation of the trend line where X – year and Y –
sales.
As X values are large, consider $x = X - \bar{X} = X - 1981$
Let the resulting equation be y = A + Bx where y = Y
For finding the values of A and B, the normal equations are

$\sum y = NA + B \sum x$

$\sum xy = A \sum x + B \sum x^2$

Year    Sales (in Rs.)    x = X – 1981    xy           x²          Trend
X       Y = y                                                      Yt

1979    100               -2              -200         4           100
1980    120               -1              -120         1           120
1981    140               0               0            0           140
1982    160               1               160          1           160
1983    180               2               360          4           180

Total   Σy = 700          Σx = 0          Σxy = 200    Σx² = 10    ΣYt = 700

By substituting the values from the table,

5A + 0B = 700, so A = 140
0A + 10B = 200, so B = 20
The trend equation is y = 140 + 20x
That is, Y = 140 + 20 (X – 1981)
Corresponding to different values of X, the right hand side gives the trend component
(Yt). Hence, the equation is written as Yt = 140 + 20 (X – 1981)
Putting X = 1979, Trend, Yt = 140 + 20 (-2) = 100
Putting X = 1980, Trend, Yt = 140 + 20 (-1) = 120
Putting X = 1981, Trend, Yt = 140 + 20 (0) = 140
Putting X = 1982, Trend, Yt = 140 + 20 (1) = 160
Putting X = 1983, Trend, Yt = 140 + 20 (2) = 180
Putting X = 1985, Trend, Yt = 140 + 20 (4) = 220
The value of sales in 1985 is estimated to be Rs. 220.

Note: Trend values Yt will be such that $\sum Y_t = \sum Y$
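
The least squares working can be checked with a short Python sketch. The names used (`fit_linear_trend`, `trend`) are assumptions for the illustration. Because the years are measured from their mean, Σx = 0 and the two normal equations reduce to A = Σy / N and B = Σxy / Σx².

```python
years = [1979, 1980, 1981, 1982, 1983]
sales = [100, 120, 140, 160, 180]

def fit_linear_trend(years, values):
    """Fit y = A + Bx by least squares, with x measured from the mean year."""
    n = len(years)
    origin = sum(years) / n                  # mean year, 1981 here
    x = [yr - origin for yr in years]        # deviations, so that sum(x) == 0
    A = sum(values) / n                      # from: sum(y) = N*A + B*sum(x)
    B = sum(xi * yi for xi, yi in zip(x, values)) / sum(xi * xi for xi in x)
    return origin, A, B

origin, A, B = fit_linear_trend(years, sales)

def trend(year):
    """Trend value Yt = A + B (X - origin)."""
    return A + B * (year - origin)

print("Yt = {} + {} (X - {})".format(A, B, origin))
print("Estimated sales in 1985:", trend(1985))
print("Sum of trend values:", sum(trend(y) for y in years))
```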

The following four methods are used to estimate the seasonal variations.
1. Method of Simple Averages.
2. Method of Moving Averages
(a) Difference from Moving Averages
(b) Ratio – to – Moving Averages.
3. Ratio – to – Trend Method.
4. Method of Link Relatives.
1. Method of Simple Averages. This method assumes absence of trend in a time
series. The following are the steps:
(i) The data are arranged season – wise in chronological order.
(ii) For each season, the total of the seasonal values is found and called seasonal
total.
(iii) Each seasonal total is divided by the number of years and the seasonal average
is obtained.
(iv) The total and the average of the seasonal averages are found. The average is
called grand average.

(v) Seasonal index of every season is calculated as follows.

$\text{Seasonal Index} = \frac{\text{Seasonal Average}}{\text{Grand Average}} \times 100$


Note: 1. Total of the seasonal indices = 100 × Number of seasons.
2. Seasonal index of each season can easily be obtained as

$\text{Seasonal Index} = \frac{\text{Seasonal Total}}{\text{Average of the Seasonal Totals}} \times 100$

Example : Assuming no trend in the series, calculate seasonal indices for the
Following data:
Quarter
Year I II III IV
1994 78 66 84 80
1995 76 74 82 78
1996 72 68 80 70
1997 74 70 84 74
1998 76 74 86 82
(C.A. Foundation, M 99)
Solution:
                    Quarter
Year                I       II      III     IV
1994                78      66      84      80
1995                76      74      82      78
1996                72      68      80      70
1997                74      70      84      74
1998                76      74      86      82
Seasonal Total      376     352     416     384     Total    Grand Average
Seasonal Average    75.2    70.4    83.2    76.8    305.6    76.4
Seasonal Index      98.4    92.2    108.9   100.5   400.0    -
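
The steps of the Method of Simple Averages can be sketched in Python as below. The layout of `data` and the variable names are assumptions made for this illustration.

```python
# Quarterly figures, one row per year (quarters I to IV).
data = {
    1994: [78, 66, 84, 80],
    1995: [76, 74, 82, 78],
    1996: [72, 68, 80, 70],
    1997: [74, 70, 84, 74],
    1998: [76, 74, 86, 82],
}

rows = list(data.values())
n_years = len(rows)

# Step (ii): seasonal totals, one per quarter.
seasonal_totals = [sum(quarter) for quarter in zip(*rows)]
# Step (iii): seasonal averages.
seasonal_averages = [total / n_years for total in seasonal_totals]
# Step (iv): grand average of the seasonal averages.
grand_average = sum(seasonal_averages) / len(seasonal_averages)
# Step (v): seasonal index = seasonal average / grand average x 100.
seasonal_indices = [avg / grand_average * 100 for avg in seasonal_averages]

for name, index in zip(["I", "II", "III", "IV"], seasonal_indices):
    print(name, round(index, 2))
print("Total of indices:", round(sum(seasonal_indices), 2))
```

The unrounded indices total 100 × 4 = 400. The index for quarter II works out to about 92.15; the table shows it as 92.2, presumably so that the figures rounded to one decimal also add up to 400.0.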
