Business Statistic Sem 2

The document discusses the concept of statistics, its definitions, importance, limitations, functions, and its application in various fields such as business, education, and medicine. It explains the distinction between population and sample, detailing types of populations and the significance of selecting a representative sample for research. Additionally, it highlights the role of statistics in decision-making and forecasting, while addressing the challenges and potential misuse of statistical data.

Uploaded by

predictora82
Copyright: © All Rights Reserved
Shri Shraddhanath PG College, Gudha, Gorji

Naresh Kumar Malav, Assistant Professor
B.Com. Sem. 2nd, Subject: ABST
Paper - Business Statistics
Business statistics: The English word "statistics" is derived from the Latin word "status", the Italian word "statista" or the German word "statistik". In each case it means "an organised political state". In the past, statistics was considered the "science of statecraft", as it was used by the governments of various states to collect data on population, births, deaths, taxes etc. Statistics has since undergone modern development. It plays a crucial role in enriching a specific domain by collecting data in that field, analysing the data with various statistical techniques and making inferences from it. For example, knowing the average height of the students enables an engineer to decide the size of a door.

DEFINITION OF STATISTICS: The definition of statistics can be expressed in two ways to cover two different concepts. They are
1. Statistics as numerical data
2. Statistics as statistical methods
1. Statistics as numerical data: When the word "statistics" is used in the plural sense, it refers to a collection of numerical data.
For example: export or import quantity, foreign direct investment etc.
According to Webster, "Statistics are classified facts representing the conditions of the people in a state, especially those facts which can be stated in numbers or in tables of numbers or in any tabular or classified arrangement."
This definition of Webster reveals that only numerical facts can be termed statistics. It is an old, narrow definition that is inadequate for modern times.
According to Bowley, "Statistics are numerical statements of facts in any department of inquiry placed in relation to each other."
Here, Bowley treats statistics as a matter of counting and ignores other aspects such as analysis and interpretation.
According to Yule and Kendall, "By statistics we mean quantitative data affected to a marked extent by multiplicity of causes."
Yule and Kendall's definition tells us that numerical data is affected by a multiplicity of causes. For example, the cost of production is affected by wage cost, exchange rate, raw material etc.
According to Professor Horace Secrist, "It is the aggregate of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to a reasonable standard of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other."
Secrist's definition of statistics is more complete. The vital points that the definition covers are
1) Aggregate of facts
2) Affected by multiplicity of causes
3) Numerically expressed
4) Enumerated or estimated according to a reasonable standard of accuracy
5) Systematic collection of data
6) Data collected for a predetermined purpose
7) Comparable
2. Statistics as statistical methods: According to Bowley, "Statistics is the science of the measurement of the social organism, regarded as a whole in all its manifestations."
This definition of Bowley is insufficient.
According to Wallis and Roberts, "Statistics is a body of methods for making wise decisions in the face of uncertainty."
This definition is modern, as it conveys that statistical methods enable us to arrive at valid decisions.
According to Croxton and Cowden, "Statistics may be defined as the science of collection, presentation, analysis and interpretation of numerical data."
This definition gives a more elaborate meaning to statistics as a set of statistical tools.
IMPORTANCE OF STATISTICS: Statistics can be applied to various areas of business operations for effective results. Some prominent areas are given below.
1) Startups - While opening a new business or acquiring one, we need to study the market from a statistical point of view to get an accurate picture of market demand and supply. A businessman must do proper research on market trends by collecting, analysing and interpreting data before starting his business.
2) Production - The production of a commodity depends upon various factors such as demand, supply of capital etc. These factors must be analysed statistically to get a precise and accurate view of them.
3) Marketing - An ideal marketing strategy requires statistical analysis of population, income of consumers, availability of the product etc.
4) Investment - Statistics plays a vital role in making decisions regarding buying shares, debentures or real estate. Using statistical data, an investor can aim to buy investments at a lower price and sell when the price increases.
5) Banking - The banking sector is highly influenced by economic and market conditions. Banks have separate research departments which collect and analyse information regarding the inflation rate, interest rates, bank rates etc.
LIMITATIONS OF STATISTICS:
1) Statistics does not analyse qualitative phenomena: As statistics is a science which deals with numerical data, it cannot be applied to data that cannot be measured quantitatively. However, statistical techniques can be used to convert qualitative data into quantitative data.
2) Statistics does not study individuals: Statistics deals with aggregate quantities and does not give importance to individual data, because a single observation taken on its own is not useful for statistical analysis.
3) Statistical laws are not exact: Statistical interpretations are based on averages, and hence only approximations can be made.
4) Statistics may be misused: Statistical data, when used by an inexperienced or untrained person, can lead to wrong interpretations. Hence it must be used only by experts.
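The second and third limitations can be seen in a small numerical sketch. The marks below are invented for illustration and do not come from the notes; the point is only that two groups with very different individual values can share exactly the same average.

```python
from statistics import mean

# Hypothetical marks for two classes of five students each (illustrative only).
class_a = [50, 50, 50, 50, 50]
class_b = [10, 30, 50, 70, 90]

# The aggregate (the average) is identical for both classes...
average_a = mean(class_a)
average_b = mean(class_b)

# ...although the individual marks differ widely, which the average cannot show.
spread_a = max(class_a) - min(class_a)
spread_b = max(class_b) - min(class_b)

print("Average of class A:", average_a, "| range:", spread_a)
print("Average of class B:", average_b, "| range:", spread_b)
```

An average therefore describes the group as a whole and says nothing reliable about any one student in it.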
FUNCTIONS OF STATISTICS
1) Consolidation: Statistics enables you to consolidate and understand huge data by providing only the significant observations.
For example, instead of observing the marks of each and every individual, the class average will enable you to know the class's performance as a whole.
2) Comparison: Classification and tabulation of data are used to compare the data. Various statistical tools such as graphs, measures of dispersion and correlation give us huge scope for comparison.
For example, the market demand for a product can be compared among the states. This enables the company to identify and analyse the target market.
3) Forecasting: Forecasting means predicting future prospects. Statistics plays a huge role in forecasting the future.
For example, with the data of the sales value for the past 10 years, we will be able to predict the sales of the coming year approximately. Time series analysis and regression analysis are important for forecasting.
4) Estimation: One of the main aims of statistics is to draw conclusions about a huge population based on the analysis of a sample group.
For example, from the heights of a sample of 10 students we will be able to estimate the average height of all the students in the class.
5) Test of hypothesis: A statistical hypothesis is a statement about a huge population that is judged from the inferences of a sample observation.
For example, if a particular fertilizer helps in increasing the crop yield in a particular area, then it will be used in other areas based on this sample.
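Three of the functions above (consolidation, estimation and forecasting) can be sketched in a few lines of Python. All figures are hypothetical and chosen only for illustration, and the straight-line least-squares trend is just one simple way of doing the regression-based forecasting mentioned above, not a method prescribed by these notes.

```python
from statistics import mean

# Hypothetical data, invented for illustration only.
class_marks = [62, 71, 55, 80, 67, 90, 48, 75, 66, 58]            # marks of a whole class
sample_heights_cm = [150, 152, 149, 155, 158, 151, 153, 150, 156, 154]  # 10 sampled students
past_sales = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]   # sales for years 1 to 10

# Consolidation: a single figure summarises the performance of the whole class.
class_average = mean(class_marks)

# Estimation: the mean of the sample serves as an estimate for all students.
estimated_height = mean(sample_heights_cm)


def linear_trend_forecast(values, steps_ahead=1):
    """Fit y = a + b*t by least squares (t = 1..n) and extrapolate the line."""
    n = len(values)
    t = range(1, n + 1)
    t_bar = sum(t) / n
    y_bar = sum(values) / n
    slope = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, values)) / sum(
        (ti - t_bar) ** 2 for ti in t
    )
    intercept = y_bar - slope * t_bar
    return intercept + slope * (n + steps_ahead)


# Forecasting: extend the 10-year trend one year into the future.
next_year_sales = linear_trend_forecast(past_sales)

print("Class average:", class_average)
print("Estimated average height (cm):", estimated_height)
print("Forecast for year 11:", next_year_sales)
```

The forecast is only as good as the assumption that the past trend continues, which is why the notes describe such predictions as approximate.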
SCOPE OF STATISTICS:
1) Statistics in Industries: Statistics is extensively used in a huge number of industries. It may be used in sales forecasting, the study of consumer preference, quality control, inventory control, risk management etc. Sampling is vital for inspection plans.
2) Statistics in Education: Statistics plays an important role in education. It helps in measuring and evaluating the progress of students and in formulating policies, and it also helps to predict the future performance of students so that they can improve.
3) Statistics in Economics: Statistics helps us to understand and analyse economic theories. Everything from the analysis of microeconomic factors, like the demand for a product and research regarding different markets, to macroeconomic concepts like inflation and unemployment can be done easily using statistics.
4) Statistics in Medicine: Statistics helps in researching and analysing medical experiments and investigations. Biostatistics enables researchers to identify whether a particular treatment or drug is working and how effective it is.
5) Statistics in Modern Application: A lot of software is developed day by day for experimentation, forecasting and estimation.
For example, SYSTAT is one such software package, which provides scientific and technical graphical options.
6) Statistics in Agriculture: Statistics can be applied in agriculture by analysing the effectiveness of fertilizers. It can be used in taking decisions regarding inputs and outputs, inventories etc.
CONCEPT OF POPULATION AND SAMPLE: Generally, inferential statistics is used in quantitative educational, psychological and sociological research. For that, research is carried out on a selected sample and the results are generalised to a large or entire group of targeted subjects. Such a group is called the population in research. The researcher has to decide and define the population accurately before starting research activities. A well-defined population helps the researcher in selecting a sample of proper size which represents the entire population. The success of research and the reliability of its results mostly depend upon the sample. How to select a sample that truly represents the entire population is discussed in this chapter. Let's start with the meaning of population.
POPULATION: Any type of research is based on objectives. Objectives clarify the subjects of study directly or indirectly. They also clarify the group to which the results of the research can be applied, or for which the findings can be generalised. Such a group is known as the population in research. Some researchers use the word "universe" in place of "population", but there is a minute difference between the two. It can be clarified by referring to the definitions and meaning of both.
Definition of Universe: Universe refers to the set of all the units which possess a variable characteristic under study.
Meaning of Universe: Referring to the definition of universe, we can say that it is a group or set of all such units that possess the variable characteristic under study. Unless a clarification is given, the universe accommodates all the units that possess the characteristic to be studied and exist in the entire universe or in the area of research. E.g. in a study of achievement motivation of the students of grade eight of Ahmedabad in the context of their study habits, the set of all the students of Ahmedabad who are studying in grade eight will be considered the universe, irrespective of their medium of instruction. Achievement motivation and study habits are the variables of study, and the students of grade eight form the universe of the study in this example. Depending upon the objectives of research, anything can be taken as a unit of study: person, object, living being, time, incident, occasion, word, sentence, place, society or institution.
Definition of Population: Population refers to the set or group of all the units to which the findings of the research are to be applied.
Meaning of Population: Referring to the definition of population, we can say that it consists of all the units to which the findings of research can be applied. In other words, population is the set of all the units which possess the variable characteristic under study and for which the findings of research can be generalised. In the earlier example, if the findings of the research are restricted to Gujarati medium students of grade eight of Ahmedabad city, the population will consist of only Gujarati medium students of grade eight of Ahmedabad city. The population may also be clearly defined in the statement of the problem. If it is clearly defined in the research title, a separate universe may not exist in the research.
See the following examples of research problems.
1) A study of achievement motivation of the students of grade eight of Ahmedabad in the context of their study habits.
2) A study of achievement motivation of Gujarati medium students of grade eight of Ahmedabad in the context of their study habits.
In the first example, the universe consists of all the students of grade eight of Ahmedabad city. If the researcher does not limit his study, this universe will also be the population. If he limits his study to Gujarati medium students, the universe will include all the students of grade eight and the population will include only Gujarati medium students. In the second example, the researcher clearly mentions the population in the title. Here universe and population will be the same. In this case also, if he limits his study to the students of grant-in-aid schools, the population will include Gujarati medium grade eight students of grant-in-aid schools of Ahmedabad city and the universe will include all Gujarati medium students of grade eight of Ahmedabad city.
On the basis of this discussion, it is clear that the researcher must finalise the population of the study well in advance, before starting research activities, so that he can plan the process properly and implement it easily and without any hindrance. We have seen that there is a noticeable difference between population and universe, but many scholars use the two terms interchangeably in practice. Besides understanding population and universe, the researcher must have clear knowledge of the different types of population.
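The relation between universe and population described above can be sketched as two successive filters on a list of units. The student records below are hypothetical and invented for illustration; only the conditions (grade eight, Ahmedabad, Gujarati medium) come from the example in the text.

```python
# Hypothetical student records, invented for illustration only.
students = [
    {"name": "A", "city": "Ahmedabad", "grade": 8, "medium": "Gujarati"},
    {"name": "B", "city": "Ahmedabad", "grade": 8, "medium": "English"},
    {"name": "C", "city": "Ahmedabad", "grade": 9, "medium": "Gujarati"},
    {"name": "D", "city": "Surat", "grade": 8, "medium": "Gujarati"},
    {"name": "E", "city": "Ahmedabad", "grade": 8, "medium": "Hindi"},
]

# Universe: every unit that possesses the characteristic under study,
# here all grade-eight students of Ahmedabad, whatever their medium.
universe = [s for s in students if s["city"] == "Ahmedabad" and s["grade"] == 8]

# Population: the units to which the findings will actually be applied,
# here restricted to Gujarati medium students.
population = [s for s in universe if s["medium"] == "Gujarati"]

print("Universe:", [s["name"] for s in universe])
print("Population:", [s["name"] for s in population])
```

If the researcher adds no restriction, the second filter is dropped and the population is the same list as the universe.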
TYPES OF POPULATION: Some important types of population are discussed here, but remember that this is not a final classification, because different scholars have classified populations on the basis of different criteria. The most common classification is given here.
Finite and Infinite Population: A population in which the number of units is finite and can be counted precisely is called a finite population. The following are some examples of the same.
... 2017.
... of Rajasthan.
A population in which the number of units is infinite and cannot be counted is called an infinite population.
Homogeneous and Heterogeneous Population: If all the units of a population are identical or similar in terms of certain characteristics, it is called a homogeneous population. Such a population is not found in the areas of social science, education and psychology, but it is found in basic and pure science. However, by applying some statistical methods, a population can be made homogeneous in social sciences, education and psychology. Blood in a living being, the crop produced in a particular farm, the DNA of persons related by blood, water in a particular vessel and atoms in a particular element are examples of homogeneous populations. Testing a single drop of blood reveals the condition of the blood of the whole body; one can predict the quality of grain produced in a farm by checking a small quantity of grain; and by matching the DNA of two persons one can decide whether they are related by blood or not. This is possible because a small part of the whole represents it completely. If the units of a population differ from one another completely or in some aspects, the population is called a heterogeneous population. All students of the same school and class differ from one another in various mental and physical abilities. This is an example of a heterogeneous population. Such populations are found in education, psychology, social sciences and humanities.
Existent and Hypothetical Population: If the units of a population have physical existence, it is called an existent population. In most cases, finite and existent populations can be treated as alternatives to each other. The students and teachers in a city, district or state are an example of this type of population. Members of a family, working women in a city, and children having autism or dyslexia in a part of a city are other examples of existent populations. A population whose units do not have physical existence, but whose existence is assumed or whose probability of existence is found by a statistical method, is called a hypothetical population. Such a population is also known as a statistical population. The probability of such a population is decided on the basis of the repetition of some incident in the past or with the help of statistical calculation. Some examples of such populations are as follows.

Lifespan of bulbs, assumed on the basis of the durability of bulbs produced and used in the past. The exact life of such items cannot be predicted precisely.
... basis of past experience. It cannot be predicted precisely.
... by tossing a coin many times. Here, one cannot say exactly how many times Head will occur if a coin is tossed 5 times, but one can assume that Head may come 0, 1, 2, ... or 5 times.
Known and Unknown Population: If the parameters of the population are known, it is called a known population. (A parameter is a statistical measure computed from the data of the entire population.) E.g. in a study of the achievement of the students who appeared in the S.S.C. examination in 2019-2020, the set of all the students who appeared in the SSC exam in that year will be called a known population, because the average marks of all these students and the standard deviation of their marks can be calculated or known easily. If the parameters of the population are unknown or cannot be calculated easily, it is known as an unknown population. E.g. in a study of the intelligence of secondary school students of Gujarat, the population will be of the unknown type, because the average and the standard deviation of the students' scores in an intelligence test are not readily available and cannot be calculated easily, as this population consists of a large number of students. Such unknown populations are targeted in most educational, psychological and sociological research.
It is a well-known fact that in most research in any field, data are collected from a sample instead of from the population, and the findings are generalised to the entire population. Therefore, in order to get precise results, the sample has to be selected with extreme care. Let's discuss how this is possible.
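The coin-tossing example of a hypothetical population above can be made concrete. The sketch below assumes a fair coin and independent tosses (assumptions added here, not stated in the notes) and computes the probability of each possible number of Heads in 5 tosses using the binomial formula.

```python
from math import comb


def heads_distribution(n_tosses, p_head=0.5):
    """Probability of k Heads in n_tosses independent tosses, for k = 0..n_tosses.

    p_head = 0.5 assumes a fair coin; this is an illustrative assumption.
    """
    return {
        k: comb(n_tosses, k) * p_head**k * (1 - p_head) ** (n_tosses - k)
        for k in range(n_tosses + 1)
    }


dist = heads_distribution(5)
for k, prob in dist.items():
    print(f"{k} Heads: probability {prob:.5f}")
```

No single run of 5 tosses can be predicted exactly, but the probabilities of all six possible outcomes add up to 1, which is what makes the population "statistical" rather than physically existent.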
DEFINITIONS OF SAMPLE:
Some definitions of sample are as follows.
... subset of population, which represents all the types of elements of population, is called sample.
... the thing it is taken from.
Now we will discuss the meaning of sample.
MEANING OF SAMPLE: A part of the population that represents it completely is known as a sample. It means that the units selected from the population as a sample must represent all kinds of characteristics of the different types of units in the population. For various reasons, in the majority of research, data are collected from the units of a sample instead of all the units of the population, and the findings are generalised to the entire population. This can be done precisely only if efforts are made to select the sample keeping in mind the characteristics of an ideal sample.
CHARACTERISTICS OF AN IDEAL / GOOD SAMPLE: Characteristics of an ideal ...
... size of sample must be in proper proportion to the number of units in population.
... units of population.
... be selected fairly without any bias. It means all the units of population must have an equal chance of being selected in the sample.
... convenient for the researcher. It means the units of the sample should be within the reach of the researcher.
Selection of such a sample makes the task of the researcher easy and precise. Therefore the sample is very important in research.
IMPORTANCE OF SAMPLE: The importance, need or utility of a sample in research can be described as discussed here.
... as the number of subjects in the sample is less than that in the population.
... subjects in the sample is comparatively less.
... collection and research process remain under the control of the researcher, as he has to collect data from, and deal with, a smaller number of subjects.
... the subjects are to be destroyed for collecting data, sample selection is the only way of conducting research. Research done to know the life span of an electric bulb, to know the blood group of a person or to test the taste of a cold drink is of this type.
... data collection process in such research consumes more time. So collecting data from the entire population becomes tedious and impractical.
... time period for research purpose becomes tedious and impractical. So, sample selection becomes essential in such research.
... population that is spread over a large geographical area becomes impractical, the researcher has to collect data from a sample.
... selection becomes inevitable.
... of infinite population, data can be collected from a sample only.
... convenient if a sample is selected for research.
... subjects can be made aware of the sensitivity of the same properly because they are comparatively few in number.
The researcher has to select the sample properly, so that accurate data can be collected for research. For that, the researcher has to keep in mind the feasibility of selecting an ideal sample. If that is impossible, he can compromise on the accuracy of sample selection in unavoidable conditions. He can apply a certain type of sampling method in a certain type of condition. How? We will discuss this now. But before that we shall clarify the concept of sampling.
MEANING OF SAMPLING: The process of selecting a sample from a population is called sampling. A method used to select a sample is called a sampling method. The researcher can choose a sampling method out of the different methods according to the objective of the research.
TYPES OF SAMPLING METHOD: The different sampling methods are categorised mainly in two groups: (i) Probability Sampling Method and (ii) Non-Probability Sampling Method.
MEANING OF PROBABILITY SAMPLING METHOD: A sampling method in which subjects are selected without any bias or prejudice, and in which all the units of the population have an equal, or predetermined and certain, probability of being selected in the sample, is known as a probability sampling method. E.g. for selecting one student out of ten, if chits with their names are prepared and one chit is drawn, all the students will have an equal chance of being selected. The probability of each student being selected will be 0.1 or 1/10. In this way, the units of the population have a certain chance or fixed probability of being selected in the sample. The subjects are selected without any bias or prejudice in this method. It is considered the best method of selecting a sample because of some specific characteristics.
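The chit example above can be simulated as a minimal sketch. The student names are hypothetical, and the fixed seed is used only so that the run is repeatable; `random.Random.choice` picks each item with equal probability, which mirrors drawing one chit out of ten.

```python
import random
from collections import Counter

# Hypothetical names written on the ten chits.
students = [f"Student {i}" for i in range(1, 11)]

rng = random.Random(42)  # fixed seed so the illustration is repeatable

# One draw: every chit has probability 1/10 of being picked.
chosen = rng.choice(students)
print("Selected:", chosen)

# Repeating the draw many times shows that each student is selected
# in roughly 10% of the draws, i.e. with probability close to 0.1.
draws = 10_000
counts = Counter(rng.choice(students) for _ in range(draws))
for name in students:
    print(name, counts[name] / draws)
```

The relative frequencies are not exactly 0.1 in any finite run, but they move closer to it as the number of repeated draws grows.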
SPECIAL FEATURES OF PROBABILITY SAMPLING METHOD: The characteristics of probability sampling are as follows.
... population has a certain probability of being selected in a sample. (In our earlier example it is 1/10 or 0.1.)
... select a sample by keeping in mind the size of the sample and by applying a suitable method of probability sampling.
... selecting a sample that represents the population completely.
... the difference between statistics and parameter.
... properly.
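The features above mention the difference between a statistic and a parameter. A minimal sketch of that difference, assuming a hypothetical population of 1,000 marks generated at random (none of these figures come from the notes): the parameter is computed from every unit, the statistic from a simple random sample only.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the illustration is repeatable

# Hypothetical population: marks (0 to 100) of 1,000 students.
population = [rng.randint(0, 100) for _ in range(1000)]

# Parameter: a measure computed from the entire population.
parameter = mean(population)

# Statistic: the same measure computed from a simple random sample,
# drawn without replacement so that every unit has an equal chance.
sample = rng.sample(population, 50)
statistic = mean(sample)

# The gap between the two is the sampling error of this particular sample.
sampling_error = statistic - parameter

print("Population mean (parameter):", parameter)
print("Sample mean (statistic):", statistic)
print("Sampling error:", sampling_error)
```

Because the sample is drawn by a probability method, the size of this gap can be studied with statistical theory, which is not possible when units are picked by personal choice.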
NON-PROBABILITY SAMPLING METHOD: This method of sample selection does not have any scientific basis, so it increases the chances of selecting a biased sample. In most cases, such a sample does not represent all the characteristics of the entire population. In this method, the units do not all have a certain or fixed probability of being selected in the sample. That is why it is known as the non-probability sampling method. E.g. the researcher selects one student out of ten according to his own wish, or selects whichever student he sees first.
SPECIAL FEATURES OF NON-PROBABILITY SAMPLING METHOD: The following are the special features (characteristics) of non-probability sampling.
Personal wish or willingness of the researcher affects the selection of subjects in a sample.
... population entirely. Some units of the population may have more chance of being selected in the sample than others.
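The risk of bias in non-probability sampling can be illustrated with a small sketch. The marks and the seating order are invented for illustration: the researcher is assumed to meet the students in the listed order, and in this made-up class the weaker students happen to come first.

```python
import random
from statistics import mean

# Hypothetical marks of ten students, listed in the order in which the
# researcher happens to meet them (an assumption made for illustration).
marks = [35, 40, 42, 48, 55, 60, 66, 72, 80, 92]
population_mean = mean(marks)

# Non-probability (convenience) sample: whoever is seen first.
convenience_sample = marks[:3]

# Probability sample: three students drawn at random, each with equal chance.
rng = random.Random(1)  # fixed seed so the illustration is repeatable
random_sample = rng.sample(marks, 3)

print("Population mean:", population_mean)
print("Convenience sample mean:", mean(convenience_sample))
print("Random sample mean:", mean(random_sample))
```

The convenience sample can only ever contain the first three students, so its error is built into the method; a random sample may also miss the true mean, but no unit is systematically excluded.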
DATA: Data are pieces of factual information that are recorded and used for analysis. Data is a tool which helps us to understand certain problems by providing us with information. Data are a set of values of qualitative and quantitative variables.
TYPES OF DATA:
Data are broadly classified into two types based upon who collected them.
Primary data: Primary data is data collected by the investigator himself for the first time for his own research and analysis. It is also known as first-hand information. Primary data is collected using methods such as personal interviews, surveys etc.
Secondary data: Secondary data is data which has already been collected and processed by another person for the purpose of his own research. Journals, internal sources, books etc. are sources of secondary data.
DATA COLLECTING TECHNIQUES
PRIMARY DATA:
1) Direct Personal Investigation: Direct personal investigation is the method in which the investigator goes directly to the source to collect information.
Merits:
(i) Information collected by this method is more authentic and accurate
(ii) There is a high degree of accuracy in qualitative information
(iii) The original opinion or data is obtained

Demerits:
(i) This is a time-consuming process
(ii) If the investigator is not able to understand the mental state of the source, it may lead to wrong interpretation
(iii) It may result in personal bias
2) Indirect Oral Investigation: Indirect oral investigation is when the investigator questions a person close to the source. This is done when the original person is reluctant to give information.
Merits
(i) It saves time and labour
(ii) It is easy and convenient
(iii) It covers a wide area
Demerits
(i) Information received may not be reliable
(ii) The person chosen for this purpose may not be suitable
(iii) It may be expensive as information is collected from various sources
3) Information collected from local agencies: In this method the investigator appoints a few agencies in various regions to cover various fields of inquiry. This method is generally used by newspaper companies to get information from various places on various topics such as sports, economics etc.
Merits
(i) A wide area can be easily covered
(ii) This is a time-saving method of collecting data
(iii) The cost of collecting data is less
Demerits
(i) Sometimes the pieces of information collected may contradict one another
(ii) The information can be less accurate
(iii) This method can be expensive, as a full-time agent is hired in each place

4) Questionnaire method: The questionnaire method is the most popular method of collecting primary data. A questionnaire is a set of questions devised for conducting a survey. The questionnaire is sent to the respondent with a request to fill it in and send it back within a specific time.

Merits
(i) This method is cheaper
(ii) The time consumed by this process is very little
(iii) This is an unbiased method of collecting data
Demerits
(i) Sometimes the respondent may provide wrong information
(ii) There is no personal motivation in this method
(iii) There are chances of no reply or a late reply from the respondents
General principles of framing a questionnaire:
1) The questionnaire must not be very long: We must try to keep the number of questions to a minimum. A long questionnaire may lead to boredom or discontent among the respondents.
2) The questions must move from general to specific
When the questions move from general to specific, respondents become more comfortable in answering them.
3) The questions should be unambiguous
The questions must be framed in such a way that the respondents are able to give clear and quick answers.
4) The questions should not contain double negatives
Words like "don't you" or "wouldn't you" must not be used in the questions, as they might tempt the respondent to give a biased answer.
5) The questions should not be leading questions
The questions should not give clues to the respondent on how they must answer.
6) The questions must not provide alternatives for the answer
For example, instead of asking "Would you like to do engineering or medicine after class 12?", the correct way of asking the question is "Would you like to do engineering?"
SECONDARY DATA:
1) Published sources
Certain government and non-government organisations publish various journals, research papers, surveys etc. which are very helpful and reliable. Some of them are mentioned below.
(i) Publications of international bodies like UNO, WTO, WHO etc.
(ii) Publications of research institutes like ISI, NCERT, ICAR etc.
(iii) Government publications
(iv) Publications of commercial and financial institutions
(v) Publications of non-governmental organisations
(vi) Newspapers, journals and periodicals
2) Unpublished sources
Unpublished sources cover all the sources where data is maintained privately by certain private agencies or companies. The data collected by universities and research institutions also come under unpublished sources.
PRESENTATION OF DATA
In the previous topic we saw how data can be collected. As the data collected is generally huge, we need to condense it and deliver it in a presentable form. Generally there are three ways of presenting data. They are
1) Textual or Descriptive Presentation
2) Tabular Presentation
3) Diagrammatic Presentation
Textual or Descriptive Presentation
When the data collected is presented in the form of text, it is called textual or descriptive presentation. Generally this method cannot be used to present large data.
For example, in the 2011 census, the population of India was 1,21,08,54,977, comprising 58,64,69,174 females and 62,37,24,248 males. The literacy rate is 74.04 percent and the density of population is 382 persons per square kilometre.
In the above example, the data is represented textually. One of the major limitations of this method is that readers must go through the entire text to get the required information.
Tabular Presentation of Data: When the data is presented in the form of rows and columns, it is called tabular presentation of data.
Example:

AREA      FEMALE    MALE      TOTAL
URBAN     90%       89%       89.5%
RURAL     87%       88%       87.5%
TOTAL     88.5%     88.5%     88.5%

The above table represents the pass percentage of an examination conducted in Tamil Nadu. It has three rows (urban, rural, total) and three columns (female, male, total). It is a 3×3 table where each small box is called a cell, which gives information regarding the pass percentage. This method is very significant as it enables us to use the data for further statistical treatment. Tabular presentation is further classified into four
(i) Qualitative Classification: Qualitative classification is when the collected information is classified by attributes such as gender, nationality etc. The table given above is an example of qualitative classification, where the information is classified by gender and location.
(ii) Quantitative Classification: When the information is classified by characteristics that can be measured quantitatively, like age, income, marks etc., the classification is called quantitative classification.
Example

MARKS FREQUENCY

0-10 5

10-20 10

20-30 20

30-40 15

40-50 10

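(iii) Temporal Classification: When the data are classified according to time, such
as days, months or years, it is called temporal classification.
Example

DAYS OF A WEEK     PRODUCTION (no. of pairs of shoes)

MONDAY             2000

TUESDAY            1750

WEDNESDAY          3000

THURSDAY           2250

FRIDAY             1550

(iv) Spatial Classification: When the data are classified according to place, such as
town, city, district, state, country, etc., it is called spatial classification.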
Example

STATE LITERACY RATE

TAMIL NADU 80.09%

ANDHRA PRADESH 67.02%

KARNATAKA 75.36%

KERALA 93.91%

Diagrammatic Presentation: In this method the data are represented diagrammatically,
which makes them very easy to understand. Generally, data are represented
diagrammatically in three ways.
1) Geometric Diagram:This category consists of bar diagrams and pie charts
(i) Bar diagram: A bar diagram represents data by equally spaced rectangular bars of
equal width, one for each class of data. The height or length of a bar shows the
magnitude of the class. Bar diagrams can easily be used for comparison of data. Both
qualitative and quantitative data can be represented in a bar diagram.
They can be further divided into two broad categories.
a) Multiple bar diagram: When there is a need to compare two sets of data, a multiple
bar diagram is used. For example, import and export, production and sale, etc.
b) Component bar diagram: Component bar diagrams, also known as sub-divided bar
diagrams, are used to compare the different components of a particular class. For
example, the various components such as rent, medicine and education on which a
monthly salary is spent can be easily understood from a component bar diagram.
(ii) Pie diagram: A pie diagram is similar to a component bar diagram, but the
components are represented as proportional sectors of a circle instead of bars. The
value of each class is converted into a percentage of the total, and each percentage
is multiplied by 3.6 degrees (360 ÷ 100, the 360 degrees of a circle divided into 100
parts). The circle is then divided into sectors with these angles.
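The percentage-to-angle conversion described above can be sketched in a few lines of Python. The function name `pie_angles` and the monthly budget figures are illustrative assumptions, not part of the original notes.

```python
def pie_angles(values):
    """Return the central angle (in degrees) of each sector of a pie diagram."""
    total = sum(values)
    angles = []
    for value in values:
        percentage = value / total * 100   # share of the whole, in percent
        angles.append(percentage * 3.6)    # 1 percent of a circle = 3.6 degrees
    return angles


# Hypothetical monthly budget: rent, medicine, education, other expenses
budget = [5000, 2500, 1500, 1000]
print(pie_angles(budget))
```

The angles always add up to 360 degrees, which is a quick check that the sectors fill the whole circle.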

2) Frequency diagram: Data in the form of a grouped frequency distribution are
usually represented by frequency diagrams. Histogram, frequency polygon,
frequency curve and ogive are the types of frequency diagram.
(i)Histogram:Histogram is a diagram which consists of rectangular bars whose area
is proportional to the frequency of a variable and whose width is equal to the class
interval.

(ii) Frequency polygon: A frequency polygon is another type of frequency
distribution graph. In a frequency polygon, the frequency of each class is marked
with a single point above the midpoint of the class interval. The points are then
joined by straight lines.
(iii) Frequency curve: The frequency curve is obtained by drawing a smooth
freehand curve that passes through the points of a frequency polygon as closely as
possible.
(iv) Ogive: An ogive, also known as a cumulative frequency curve, is of two types.
When the cumulative frequencies are plotted against the upper limits of the classes,
the curve is a less than ogive. When the cumulative frequencies are plotted against
the lower limits of the classes, the curve is a more than ogive.
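As a rough illustration, the points to be plotted for the two ogives can be computed as follows. The sketch reuses the marks distribution from the quantitative classification example (0-10: 5, 10-20: 10, 20-30: 20, 30-40: 15, 40-50: 10); the function names are assumptions made for this example.

```python
from itertools import accumulate

# Class limits and frequencies from the marks table given earlier
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 10, 20, 15, 10]


def less_than_ogive(classes, freqs):
    """Pair each upper limit with the number of items below it."""
    cumulative = list(accumulate(freqs))
    return [(upper, cf) for (_, upper), cf in zip(classes, cumulative)]


def more_than_ogive(classes, freqs):
    """Pair each lower limit with the number of items at or above it."""
    cumulative_from_end = list(accumulate(reversed(freqs)))[::-1]
    return [(lower, cf) for (lower, _), cf in zip(classes, cumulative_from_end)]


print(less_than_ogive(classes, freqs))  # points for the less than ogive
print(more_than_ogive(classes, freqs))  # points for the more than ogive
```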
3)Arithmetic line graph:An arithmetic line graph also known as time series graph is
a graph where the time ( months, years, weeks) are plotted in the x axis and their
respective values are plotted in the y axis. It helps us in analysing trends and
periodicity of data.

Unit 2
MEASURES OF CENTRAL TENDENCY: When working with a given set of data, it is
not possible to remember all the values in that set, yet we still need to draw
inferences from the data. This problem is solved by the mean, median and mode. These
measures of central tendency, also known as statistical averages, summarise all the
values of the data in a single representative figure. As a result, they help us to
draw inferences about the data and show the general trend of all the values.


An average provides a simple way of representation of all the individual data. It also
aids in the comparison of different groups of data. In addition to this, an average in
economic terms can represent the direction an economy is headed towards. Hence,
it can be easily used to formulate policies and bring about a reform for a better
economy.
MEAN:
ARITHMETIC MEAN:The arithmetic mean of a series of numbers is sum of all
observations divided by the total number of observations in the series.
Example:There are two brothers, with different heights. The height of the younger
brother is 138 cm and height of the elder brother is 154cm. The average height of
the two brother is total height divided into two equal parts,
(138+154) ÷ 2 = 292 ÷ 2 = 146 cm
So 146 cm is the average height of the brothers. Here 154 > 146
> 138. The average value lies in between the minimum value and the maximum
value.
Thus if x1, x2, ..., xn represent the values of n observations, then the arithmetic
mean (A.M.) for n observations is (direct method):
$\bar{x} = \dfrac{x_1 + x_2 + \dots + x_n}{n} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$

There are two methods for computing the arithmetic mean: (i) Direct method (ii)
Short cut method.

Direct Method:
Example:
The following data represent the number of books issued in a college library on
7 different days: 20, 39, 22, 25, 45, 40, 54. Find the mean number of books.
Solution:
$\bar{x} = \dfrac{20 + 39 + 22 + 25 + 45 + 40 + 54}{7} = \dfrac{245}{7} = 35$
Hence the mean number of books is 35.
Indirect Method:
In this method an assumed mean or an arbitrary value (A) is used as the basis for
calculating the deviations (di) of the individual values, where di = xi – A. The
mean is then
$\bar{x} = A + \dfrac{\sum d_i}{n}$

Example:
A student's marks in 5 subjects are 95, 78, 88, 72, 99. Find the average of his marks.
Solution:
Let us take the assumed mean, A = 88
xi        di = xi – 88
95           7
78         -10
88           0
72         -16
99          11
Total       -8

$\bar{x} = 88 + \dfrac{-8}{5} = 88 - 1.6 = 86.4$

The arithmetic mean of the marks is 86.4
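The two methods always give the same answer, because the assumed mean only shifts the values before averaging. A minimal Python sketch, with function names chosen purely for illustration, makes this easy to check on the marks 95, 78, 88, 72, 99:

```python
def mean_direct(values):
    """Direct method: sum of the observations divided by their number."""
    return sum(values) / len(values)


def mean_assumed(values, assumed):
    """Indirect method: assumed mean plus the average deviation from it."""
    deviations = [x - assumed for x in values]
    return assumed + sum(deviations) / len(values)


marks = [95, 78, 88, 72, 99]
print(round(mean_direct(marks), 2))       # direct method
print(round(mean_assumed(marks, 88), 2))  # indirect method with A = 88
```

Any other choice of A gives the same mean; a value close to the centre of the data simply keeps the deviations small for hand calculation.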

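Discrete Grouped data
If x1, x2, ..., xn are discrete values with the corresponding frequencies f1, f2, …, fn,
then the mean for discrete grouped data is defined as (direct method)
$\bar{x} = \dfrac{\sum f_i x_i}{N}, \qquad N = \sum f_i$
In the short cut method the formula is modified as
$\bar{x} = A + \dfrac{\sum f_i d_i}{N}, \qquad d_i = x_i - A$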

Example:
Given the following frequency distribution, calculate the arithmetic mean

Marks 64 63 62 61 60 59
No. Of. Students 8 18 12 9 7 6

Solution:
xi      fi      fi xi     di = xi – A     fi di
                          (A = 62)
64       8       512           2            16
63      18      1134           1            18
62      12       744           0             0
61       9       549          -1            -9
60       7       420          -2           -14
59       6       354          -3           -18
Total   60      3713                        -7

Direct Method
$\bar{x} = \dfrac{3713}{60} = 61.88$
Short cut method: Here A = 62
$\bar{x} = 62 + \dfrac{-7}{60} = 62 - 0.12 = 61.88$
The mean mark is 61.88
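The same calculation for a discrete frequency distribution can be sketched in Python as below. The function names are illustrative assumptions; the data are the marks and frequencies of the example.

```python
def grouped_mean_direct(values, freqs):
    """Direct method: sum of f*x divided by the total frequency N."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


def grouped_mean_shortcut(values, freqs, assumed):
    """Short cut method: A + (sum of f*d) / N, with d = x - A."""
    total_fd = sum(f * (x - assumed) for x, f in zip(values, freqs))
    return assumed + total_fd / sum(freqs)


marks = [64, 63, 62, 61, 60, 59]
students = [8, 18, 12, 9, 7, 6]
print(round(grouped_mean_direct(marks, students), 2))
print(round(grouped_mean_shortcut(marks, students, 62), 2))
```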
Mean of continuous Grouped data: Direct method
WEIGHTED ARITHMETIC MEAN
In calculating the simple mean, all the values or sizes of items in the distribution
are given equal importance. In practice this may not be so. When some items are more
important than others, a simple average is not representative of the distribution,
and proper weightage has to be given to the various items.
For example, a student may use a weighted mean to calculate the percentage grade in
a course. The student multiplies the weight of each assessment item in the course
(e.g. assignments, exams, projects) by the grade obtained in that item.
The weighted mean is the average in which each item is multiplied by a certain value
known as its "weight", and the aggregate of the products is divided by the total of
the weights.
Let x1, x2, ..., xn be a set of n values having weights w1, w2, ..., wn respectively;
then the weighted mean is
$\bar{x}_w = \dfrac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \dfrac{\sum w_i x_i}{\sum w_i}$

Example: A student obtained the marks 40, 50, 60, 80 and 45 in maths, statistics,
physics, chemistry and biology respectively. Assuming weights 5, 2, 4, 3 and 1
respectively for the above mentioned subjects, find the weighted arithmetic mean per
subject.
Solution

Components     Marks scored (xi)     Weightage (wi)     wi xi
Maths                40                   5              200
Statistics           50                   2              100
Physics              60                   4              240
Chemistry            80                   3              240
Biology              45                   1               45
Total                                    15              825

Weighted average:
$\bar{x}_w = \dfrac{\sum w_i x_i}{\sum w_i} = \dfrac{825}{15} = 55$
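For the five subjects above, the sum of the products wi xi is 825 and the sum of the weights is 15. A short Python sketch of the weighted mean, with an illustrative function name, is:

```python
def weighted_mean(values, weights):
    """Sum of weight*value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


marks = [40, 50, 60, 80, 45]   # maths, statistics, physics, chemistry, biology
weights = [5, 2, 4, 3, 1]
print(weighted_mean(marks, weights))  # 825 / 15
```

With equal weights the function reduces to the simple arithmetic mean, which for these marks is 275 ÷ 5 = 55 as well; that coincidence is specific to this data set.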

Combined Mean: If the arithmetic means and the numbers of items of two or more
related groups are known, the combined or composite mean of the entire group can be
obtained by
$\bar{x}_{12} = \dfrac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$
The advantage of the combined arithmetic mean is that we can determine the overall
mean of the combined data without going back to the original data.
Example:
If a sample of 22 items has a mean of 15 and another sample of 18 items has a mean
of 20, find the mean of the combined sample.
Solution:
$\bar{x}_{12} = \dfrac{22 \times 15 + 18 \times 20}{22 + 18} = \dfrac{330 + 360}{40} = \dfrac{690}{40} = 17.25$
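The combined mean is a weighted mean of the group means, with the group sizes as weights. A minimal sketch (the function name is an assumption) that also extends to more than two groups:

```python
def combined_mean(sizes, means):
    """Combined mean of several groups from their sizes and means."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


# Two samples: 22 items with mean 15 and 18 items with mean 20
print(combined_mean([22, 18], [15, 20]))  # (330 + 360) / 40
```

The result must lie between the smallest and the largest group mean, which is a useful sanity check on any hand calculation.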
Merits of AM
1. It can be calculated easily and is also easy to understand.
2. It is less affected by sampling fluctuations than other averages.
3. It is capable of further algebraic treatment, for example in computing a
combined mean.
4. It is rigidly defined and hence can be used for comparison.
Demerits of AM
1. It cannot be located graphically.
2. It is not applicable to qualitative data.
3. AM cannot be calculated if the class intervals have open ends.
4. It is highly influenced by extreme observations.
GEOMETRIC MEAN (GM): A geometric mean is a mean or average which shows
the central tendency of a set of numbers by using the product of their values.
The geometric mean of two numbers, say x and y, is the square root of their product
x×y. For three numbers, it is the cube root of their product, i.e., $(xyz)^{1/3}$.
The geometric mean of a series containing n observations is the nth root of the
product of the values. If x1, x2, ..., xn are the observations, then
$GM = (x_1 x_2 \cdots x_n)^{1/n} = \text{Antilog}\left(\dfrac{\sum \log x_i}{n}\right)$

Example: Calculate the geometric mean of the following annual prices of onions per
100 kg: 180, 250, 490, 1400 and 1050.

x        180      250      490      1400     1050     Total
log x    2.2553   2.3979   2.6902   3.1461   3.0212   13.5107

$GM = \text{Antilog}\left(\dfrac{13.5107}{5}\right) = \text{Antilog}(2.7021) = 503.6$
The geometric mean of the onion price is 503.6
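The logarithm method can be reproduced in Python as a rough check on table look-ups. This is a sketch under stated assumptions: the function name and the optional `frequencies` argument (which lets the same code handle a frequency distribution through its mid points) are illustrative choices.

```python
import math


def geometric_mean(values, frequencies=None):
    """Antilog of the (frequency-weighted) mean of log10 of the values."""
    if frequencies is None:
        frequencies = [1] * len(values)
    if any(x <= 0 for x in values):
        raise ValueError("geometric mean needs strictly positive values")
    total_frequency = sum(frequencies)
    sum_f_log_x = sum(f * math.log10(x) for x, f in zip(values, frequencies))
    return 10 ** (sum_f_log_x / total_frequency)


prices = [180, 250, 490, 1400, 1050]
print(round(geometric_mean(prices), 1))
```

Small differences from the hand-worked figure can appear in the last digit because log tables are rounded to four places.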
Example:
Find the geometric mean for the following distribution of students' marks:
Marks              0 – 30    30 – 50    50 – 80    80 – 100
No. of students      20        30         40         10

Solution:
Marks       No. of students f    Mid point x    f log x
0 – 30            20                 15         20 (log 15) = 20 (1.1761) = 23.5218
30 – 50           30                 40         30 (log 40) = 30 (1.6021) = 48.0618
50 – 80           40                 65         40 (log 65) = 40 (1.8129) = 72.5165
80 – 100          10                 90         10 (log 90) = 10 (1.9542) = 19.5424
Total            100                            163.6425

$GM = \text{Antilog}\left(\dfrac{\sum f \log x}{N}\right) = \text{Antilog}\left(\dfrac{163.6425}{100}\right) = \text{Antilog}(1.6364) = 43.29$
The geometric mean of the marks is 43.29
Merits of Geometric mean:
1. It is strictly defined
2. It is based on all items
3. It is very suitable for averaging ratio, rates and percentages
4. It is capable of further mathematical treatment
5. Unlike AM, it is not affected much by the presence of extreme values
Demerits of geometric mean:
1. It cannot be used when the values are negative or if any of the
observations is zero
2. It is difficult to calculate particularly when the items are very large or
when there is a frequency distribution
3. It brings out the property of the ratio of the change and not the absolute
difference of change as the case in arithmetic mean
4. The GM may not be the actual value of the series
HARMONIC MEAN
The harmonic mean of a set of observations is defined as the reciprocal of the
arithmetic average of the reciprocals of the given values.
The harmonic mean is used in averaging ratios. The most common examples of ratios
are speed and time, cost and unit of material, work and time, etc. If x1, x2, ..., xn
are n observations, the harmonic mean (H.M.) is
$HM = \dfrac{n}{\sum \frac{1}{x_i}}$
H.M. for ungrouped data

Example:
Calculate the harmonic mean of the numbers 13.2, 14.2, 14.8, 15.2 and 16.1
Solution: The harmonic mean is calculated as below:
x        1/x
13.2     0.0758
14.2     0.0704
14.8     0.0676
15.2     0.0658
16.1     0.0621
Total    0.3417

$HM = \dfrac{n}{\sum \frac{1}{x}} = \dfrac{5}{0.3417} = 14.63$
H.M. Discrete Grouped data:
For a frequency distribution
$HM = \dfrac{N}{\sum \frac{f_i}{x_i}}, \qquad N = \sum f_i$
Example:
The ages of the first year students of a particular college are given in the
following frequency distribution. Calculate the harmonic mean.

Age (years)           17    18    19    20    21
Number of students     2     5    13     7     3

Solution:
Age (years) x     Number of students f     f / x
17                       2                 0.1176
18                       5                 0.2778
19                      13                 0.6842
20                       7                 0.3500
21                       3                 0.1429
Total                   30                 1.5725

$HM = \dfrac{30}{1.5725} = 19.0779 \approx 19$ years
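Both harmonic mean formulas can be covered by one small function, sketched below. The function name and the optional `frequencies` argument are assumptions made for illustration; with no frequencies each value counts once.

```python
def harmonic_mean(values, frequencies=None):
    """Total frequency divided by the sum of f/x."""
    if frequencies is None:
        frequencies = [1] * len(values)
    if any(x <= 0 for x in values):
        raise ValueError("harmonic mean needs strictly positive values")
    reciprocal_sum = sum(f / x for x, f in zip(values, frequencies))
    return sum(frequencies) / reciprocal_sum


# Ungrouped data, using the values tabulated in the worked solution
print(round(harmonic_mean([13.2, 14.2, 14.8, 15.2, 16.1]), 2))

# Discrete grouped data: ages and numbers of students
ages = [17, 18, 19, 20, 21]
students = [2, 5, 13, 7, 3]
print(round(harmonic_mean(ages, students), 2))
```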


Merits of H.M:
1. It is strictly defined.
2. It is based on all the observations.
3. It is amenable to further algebraic treatment.
4. It is the most suitable average when it is desired to give greater weight to
smaller observations and less weight to larger observations.
Demerits of H.M:
1. It is not easily understood.
2. It is difficult to calculate.
3. It is only an abstract figure and may not be an actual item of the series.
MEDIAN: The number of students in your classroom, the money your parents earn and
the temperature in your city are all important numbers. But how can you summarise
the number of students in every class of your school, or the amounts earned by all
the citizens of your city?
The median is that value of the variable which divides the group into two equal
parts, one part comprising all values greater than the median and the other all
values less than it.
Ungrouped data
Arrange the given values in ascending or descending order. If the number of
values is odd, the median is the middle value.
For example, take the values 12, 15, 21, 27, 35. The number of values is odd, so the
median is the middle value, 21.
$\text{Median} = \left(\dfrac{n+1}{2}\right)^{\text{th}} \text{ term, if } n \text{ is odd}$
If the number of values is even, the median is the mean of the two middle values.
For example, take the values 12, 15, 21, 27, 35, 40. The number of values is even, so
the median is the mean of the two middle values.
$\text{Median} = \text{Mean of the } \left(\dfrac{n}{2}\right)^{\text{th}} \text{ and } \left(\dfrac{n}{2}+1\right)^{\text{th}} \text{ terms, if } n \text{ is even}$
In the above example, the two middle values are 21 and 27, so the median is
(21 + 27) ÷ 2 = 24.
Example:
The salaries of 8 employees who work for a small company are listed below. What is
the median salary?
40,000; 29,000; 35,500; 31,000; 43,000; 30,000; 27,000; 32,000
Solution:
Arrange the data in ascending order
27,000; 29,000; 30,000; 31,000; 32,000; 35,500; 40,000; 43,000
Since there is an even number of items in the data set (n = 8), we compute the
median by taking the mean of the two middlemost numbers, the 4th and 5th items.
$\text{Median} = \dfrac{4^{\text{th}} \text{ item} + 5^{\text{th}} \text{ item}}{2} = \dfrac{31{,}000 + 32{,}000}{2} = \dfrac{63{,}000}{2} = 31{,}500$
The median salary is 31,500

Example:
Find the median of the following set of points in a game: 15, 14, 10, 8, 12, 8, 16

Solution:
First arrange the values in ascending order: 8, 8, 10, 12, 14, 15, 16
The number of point values is 7, an odd number. Hence, the median is the value in
the middle position.
$\text{Median} = \left(\dfrac{n+1}{2}\right)^{\text{th}} \text{ term} = \left(\dfrac{7+1}{2}\right)^{\text{th}} \text{ term} = 4^{\text{th}} \text{ term}$
The median is 12
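The odd and even cases for ungrouped data can be sketched together in Python. The function name `median` is an illustrative choice; the data are taken from the two worked examples.

```python
def median(values):
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                        # ((n + 1) / 2)th term
    return (ordered[middle - 1] + ordered[middle]) / 2  # mean of (n/2)th and (n/2 + 1)th


salaries = [40000, 29000, 35500, 31000, 43000, 30000, 27000, 32000]
points = [15, 14, 10, 8, 12, 8, 16]
print(median(salaries))
print(median(points))
```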
Grouped data: In a grouped distribution, values are associated with frequencies.
Grouping can be in the form of a discrete frequency distribution or a continuous
frequency distribution. Whatever the distribution, the cumulative frequencies have
to be calculated in order to locate the middle item.
Cumulative frequency (cf): The cumulative frequency of each class is the sum of the
frequency of the class and the frequencies of the previous classes, i.e. the
frequencies are added successively, so that the last cumulative frequency gives the
total number of items.
When the data follow a discrete set of values grouped by size, we use the
((N+1)/2)th item for finding the median. First we form a cumulative frequency
distribution; the median is the value corresponding to the cumulative frequency in
which the ((N+1)/2)th item lies.
Example:
The following frequency distribution is classified according to the number of leaves
on different branches. Calculate the median number of leaves per branch.

No. of leaves        1    2    3    4    5    6    7
No. of branches      2   11   15   20   25   18   10

Solution:
No. of leaves x     No. of branches f     Cumulative frequency cf
1                         2                      2
2                        11                     13
3                        15                     28
4                        20                     48
5                        25                     73
6                        18                     91
7                        10                    101
Total                   101

$\text{Median} = \text{size of } \left(\dfrac{N+1}{2}\right)^{\text{th}} \text{ item} = \text{size of } \left(\dfrac{101+1}{2}\right)^{\text{th}} \text{ item} = 51^{\text{st}} \text{ item}$
Median = 5, because the 51st item lies in the cumulative frequency 73, which
corresponds to x = 5.
Median for continuous grouped data
When the data are given in the form of a frequency table with class intervals, the
following formula is used for calculating the median:
$\text{Median} = l + \dfrac{\frac{N}{2} - m}{f} \times c$
Where l = lower limit of the median class
m = cumulative frequency preceding the median class
c = width of the median class
f = frequency of the median class
N = total frequency
Example:
Calculate the median from the following data

Class interval     Frequency f     True class interval     Cumulative frequency cf
0-4                     5             -0.5 - 4.5                   5
5-9                     8              4.5 - 9.5                  13
10-14                  10              9.5 - 14.5                 23
15-19                  12             14.5 - 19.5                 35
20-24                   7             19.5 - 24.5                 42
25-29                   6             24.5 - 29.5                 48
30-34                   3             29.5 - 34.5                 51
35-39                   2             34.5 - 39.5                 53
Total                  53

$\dfrac{N}{2} = \dfrac{53}{2} = 26.5$
The first cumulative frequency greater than or equal to 26.5 is 35, so the median
class is 14.5 - 19.5.

l = 14.5
N/2 = 26.5
m = 23
f = 12
c = 5
$\text{Median} = 14.5 + \dfrac{26.5 - 23}{12} \times 5 = 14.5 + 1.46 = 15.96$
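The interpolation formula for a continuous series, Median = l + ((N/2 − m) / f) × c, can be sketched in Python as follows. The function name is an assumption, and the class boundaries passed in are taken to be the true (continuous) class limits.

```python
def grouped_median(boundaries, freqs):
    """Median of a continuous frequency distribution.

    boundaries: list of (lower, upper) true class limits, in ascending order.
    freqs: the frequency of each class.
    """
    total = sum(freqs)
    if total <= 0:
        raise ValueError("total frequency must be positive")
    half = total / 2
    cumulative_before = 0  # m: cumulative frequency preceding the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if cumulative_before + f >= half:  # first class whose cf reaches N/2
            return lower + (half - cumulative_before) / f * (upper - lower)
        cumulative_before += f
    raise ValueError("median class not found")


boundaries = [(-0.5, 4.5), (4.5, 9.5), (9.5, 14.5), (14.5, 19.5),
              (19.5, 24.5), (24.5, 29.5), (29.5, 34.5), (34.5, 39.5)]
freqs = [5, 8, 10, 12, 7, 6, 3, 2]
print(round(grouped_median(boundaries, freqs), 2))
```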

Merits of Median:
1. Median is not influenced by extreme values because it is a positional average.
2. Median can be calculated in case of distribution with open end intervals.
3. Median can be located even if the data are incomplete.
4. Median can be located even for qualitative factors such as ability, honesty etc.
Demerits of Median:
1. A slight change in the series may bring drastic change in median value
2. In the case of an even number of items or a continuous series, the median is an
estimated value that may not be an actual value in the series.
3. It is not suitable for further mathematical treatment except its
use in mean deviation.
4. It does not take all the observations into account.
MODE: The mode is the most frequently occurring value or score. The mode is
useful when there are a lot of repeated values. There can be no mode, one mode, or
multiple modes.
It is of great importance in marketing studies, where a manager is interested in
knowing the size which has the highest concentration of items. For example, in
placing an order for shoes or ready-made garments, the modal size helps because
this size and the sizes around it are in common demand.
Ungrouped Data:
For ungrouped values or a series of individual observation mode is often found by
mere inspection
Example:
Find the mode for the following list of values:
13,18,13,14,13,16,14,21,13
Solution:
The mode is the number that is repeated more often than any other. Therefore the
Mode = 13
In some cases the mode may be absent, while in other cases there may be more
than one mode.

Example: Ms. Rossy asked the students in her class how many siblings each of them has.
Find the mode of the data: 0,0,0,1,1,1,1,2,2,2,2,3,3,4
Solution: The values 1 and 2 each occur four times, so the modes are 1 and 2 siblings
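Finding the mode by inspection amounts to counting how often each value occurs. A minimal Python sketch is given below; the function name `modes` and the convention of returning an empty list when no value repeats are assumptions made for this illustration.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # assumed convention: no value repeats, so there is no mode
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([13, 18, 13, 14, 13, 16, 14, 21, 13]))
print(modes([0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4]))
```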
Grouped Data: For a discrete distribution, the value of X corresponding to the
highest frequency is the mode.
Continuous distribution:
$\text{Mode} = L + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$
Where L is the lower class limit of the modal class
f1 is the frequency of the modal class
f0 is the frequency of the class preceding the modal class in the frequency table
f2 is the frequency of the class succeeding the modal class in the frequency table
h is the class interval of the modal class

Example:
Calculate the mode for the following distribution

C-I    0-50   50-100   100-150   150-200   200-250   250-300   300-350   350-400   400 and above
f        5      14       40        91        150        87        60        38          15

Solution: The highest frequency is 150 and the corresponding class interval is
200 – 250, which is the modal class.
Here L = 200, f1 = 150, f0 = 91, f2 = 87, h = 50
$\text{Mode} = 200 + \dfrac{150 - 91}{2 \times 150 - 91 - 87} \times 50 = 200 + \dfrac{2950}{122} = 200 + 24.18 = 224.18$
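The grouped mode formula, Mode = L + (f1 − f0) / (2 f1 − f0 − f2) × h, can be checked with a short Python sketch. The function name and argument names are illustrative; the figures are those quoted in the solution (L = 200, f1 = 150, f0 = 91, f2 = 87, h = 50).

```python
def grouped_mode(lower_limit, width, f1, f0, f2):
    """Mode of a continuous distribution from the modal class and its neighbours."""
    return lower_limit + (f1 - f0) / (2 * f1 - f0 - f2) * width


print(round(grouped_mode(lower_limit=200, width=50, f1=150, f0=91, f2=87), 2))
```

The formula assumes classes of equal width and a modal class that has both a preceding and a succeeding class.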

Merits of mode:
1. It is easy to calculate and in some cases it can be located by mere inspection.
2. Mode is not at all affected by extreme values.
3. It can be calculated for open-end classes.
4. It is usually an actual value of an important part of the series.
5. In some circumstances it is the best representative of data.
Demerits of mode:
1. It is not based on all the observations.
2. It is not capable of further mathematical treatment.
3. Mode is ill-defined; in some cases it is not possible to find a definite mode.
4. As compared with the mean, the mode is affected to a great extent by sampling
fluctuations.
5. It is unsuitable in cases where the relative importance of items has to be
considered.

PARTITION MEASURES
QUARTILES
The quartiles divide the distribution into four equal parts. There are three
quartiles, denoted by Q1, Q2 and Q3. That is, 25% of the data lie below Q1, 50%
below Q2 and 75% below Q3. Here Q2 is the median. Quartiles are obtained in almost
the same way as the median.

Ungrouped Data:
If the data set consists of n items arranged in ascending order, then
$Q_1 = \left(\dfrac{n+1}{4}\right)^{\text{th}} \text{ item}, \qquad Q_3 = \left(\dfrac{3(n+1)}{4}\right)^{\text{th}} \text{ item}$
Continuous series:
In the case of a continuous series, find the cumulative frequencies and then use the
interpolation formula.
• Find the cumulative frequencies
• Find N / 4 and 3N / 4
• The Q1 class is the class interval corresponding to the cumulative frequency
just greater than N / 4
• The Q3 class is the class interval corresponding to the cumulative frequency
just greater than 3N / 4
$Q_1 = l_1 + \dfrac{\frac{N}{4} - m_1}{f_1} \times c_1, \qquad Q_3 = l_3 + \dfrac{\frac{3N}{4} - m_3}{f_3} \times c_3$
Where N = Σf = total of all frequency values
l1 = lower limit of the first quartile class
f1 = frequency of the first quartile class
c1 = width of the first quartile class
m1 = cumulative frequency preceding the first quartile class
l3 = lower limit of the third quartile class
f3 = frequency of the third quartile class
m3 = cumulative frequency preceding the third quartile class
c3 = width of the third quartile class
DECILES: These are the values which divide the total number of observations into 10
equal parts. There are nine deciles: D1, D2, D3, D4, D5, D6, D7, D8 and D9.
Ungrouped Data:
Example:
Compute D5 for the data: 5, 24, 36, 12, 20 and 8.
Solution:
Arranging the given data in ascending order: 5, 8, 12, 20, 24, 36
$D_5 = \left(\dfrac{5(n+1)}{10}\right)^{\text{th}} \text{ observation} = \left(\dfrac{5(6+1)}{10}\right)^{\text{th}} \text{ observation} = (3.5)^{\text{th}} \text{ observation}$
= 3rd item + ½ (4th item - 3rd item)
= 12 + ½ (20 - 12) = 12 + 4 = 16

PERCENTILE: The percentile values divide the distribution into 100 parts, each
containing 1 percent of the cases. The percentile (Pk) is that value of the variable
up to which exactly k% of the total number of observations lie.
Relationship
P25 = Q1
P50 = Median = Q2
P75 = 3rd quartile = Q3

Unit 3 MEASURES OF DISPERSION


Dispersion is the extent to which a distribution is stretched or squeezed, that is,
how far the values are scattered around the average. We can understand variation
with the help of the following example:

Series I Series II Series III

10 2 10

10 8 12

10 20 8

∑X = 30 30 30

In all three series, the value of arithmetic mean is 10. On the basis of this average,
we can say that the series are alike. If we carefully examine the composition of three
series, we find the following differences:
(i) In case of 1st series, three items are equal; but in 2nd and 3rd series, the
items are unequal and do not follow any specific order.
(ii) The magnitude of deviation, item-wise, is different for the 1st, 2nd and 3rd
series. But all these deviations cannot be ascertained if the value of simple mean is
taken into consideration.
(iii) In these three series, it is quite possible that the value of arithmetic mean is
10; but the value of median may differ from each other. This can be understood as
follows;

Series I Series II Series III

10 2 8

10 median 8 median 10 median

10 20 12

∑X = 30 30 30

The value of the median in the 1st series is 10, in the 2nd series 8 and in the 3rd
series 10. Therefore, the values of the mean and the median are not always identical.
(iv) As the average remains the same, the nature and extent of the distribution of
the size of the items may vary. In other words, the structure of the frequency
distributions may differ even though their means are identical.
PROPERTIES OF A GOOD MEASURE OF DISPERSION
There are certain pre-requisites for a good measure of dispersion:
1. It should be simple to understand.
2. It should be easy to compute.
3. It should be rigidly defined.
4. It should be based on each individual item of the distribution.
5. It should be capable of further algebraic treatment.
CHARACTERISTICS OF MEASURES OF DISPERSION
• A measure of dispersion should be rigidly defined

• It must be easy to calculate and understand
• Not affected much by the fluctuations of observations
• Based on all observations
CLASSIFICATION OF MEASURES OF DISPERSION
The measure of dispersion is categorized as:
(i) An absolute measure of dispersion: It is expressed in the units of measurement of
the observations. For example, (i) the dispersion of the salaries of employees is
expressed in rupees, and (ii) the variation in the time required by workers is
expressed in hours. Such measures are not suitable for comparing the variability of
two data sets which are expressed in different units of measurement.
(ii) A relative measure of dispersion: It is a pure number, independent of the units
of measurement. This measure is especially useful when the data sets are measured
in different units. For example, a nutritionist would like to compare the obesity of
school children in India and Africa. He collects data from some of the schools in
these two regions. Suppose the weight is measured in kilograms in India and in pounds
in Africa. It would be meaningless to compare the obesity of the students using
absolute measures, so it is sensible to compare them using relative measures.
RANGE:
Raw Data:A range is the most common and easily understandable measure of
dispersion. It is the difference between the largest and smallest observations in the
data set
Range ( R ) = L - S
Grouped Data: For a grouped frequency distribution, the range is the difference
between the upper class limit of the last class interval and the lower class limit
of the first class interval.
Coefficient of Range: The relative measure of range is called the coefficient of
range
Coefficient of range = (L - S) / (L + S)
Example: Find the value of range and its coefficient for the following data 49, 81, 36,
64, 121, 100.
Solution:
L = 121, S = 36
Range = L – S = 121 – 36 = 85
Coefficient of Range = (L - S) / (L + S) = (121 - 36) / (121 + 36)
= 85 / 157 = 0.5414
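A short Python sketch of the range and its coefficient for raw data follows. The function name is an illustrative assumption; the data are those of the example.

```python
def range_and_coefficient(values):
    """Return (range, coefficient of range) for raw data."""
    largest, smallest = max(values), min(values)
    data_range = largest - smallest
    coefficient = data_range / (largest + smallest)
    return data_range, coefficient


data_range, coefficient = range_and_coefficient([49, 81, 36, 64, 121, 100])
print(data_range, round(coefficient, 4))
```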

Example:
Calculate range and its coefficient from the following distribution
Solution: L = 30, S = 10
Range = L - S = 30 – 10 = 20
Coefficient of Range = (L - S) / (L + S) = (30 - 10) / (30 + 10)
= 20 / 40 = 0.5
Merits of Range
• It is the simplest of the measure of dispersion
• Easy to calculate
• Easy to understand
• Independent of change of origin
Demerits of Range
• It is based only on the two extreme observations and hence is affected by
fluctuations in them
• It is not a reliable measure of dispersion
• Dependent on change of scale
QUARTILE DEVIATION
The quartiles divide a data set into quarters. The first quartile, (Q1) is the middle
number between the smallest number and the median of the data. The second
quartile, (Q2) is the median of the data set. The third quartile, (Q3) is the middle
number between the median and the largest number. Quartile deviation is half of the
difference between the first and third quartiles. Hence it is also called the
semi-interquartile range.
Quartile deviation or semi-interquartile range is
Q.D = ½ × (Q3 – Q1)
Coefficient of Quartile Deviation
Coefficient of Q.D = (Q3 – Q1) / (Q3 + Q1)
Merits of Quartile Deviation
•All the drawbacks of Range are overcome by quartile deviation
•It uses half of the data
•Independent of change of origin
•The best measure of dispersion for open-end classification

Demerits of Quartile Deviation
•It ignores fifty percent of the data
•Dependent on change of scale
•Not a reliable measure of dispersion
Example:

Calculate the quartile deviation and its coefficient for the wheat production (in
kg) of 20 acres, given as: 1120, 1240, 1320, 1040,
1080, 1200, 1440, 1360, 1680, 1730, 1785, 1342, 1960, 1880, 1755,
1720, 1600, 1470, 1750 and 1885.
Solution: Arrange the observation in increasing order:
1040, 1080, 1120, 1200, 1240, 1320, 1342, 1360, 1440, 1470,
1600, 1680, 1720, 1730, 1750, 1755, 1785, 1880, 1885, 1960.
Q1 = value of (n+1) / 4 th item
= value of (20 +1) / 4 th item = value of (5.25)th item
= 5th item + 0.25 ( 6th item – 5th item)
= 1240 + 0.25 (1320 – 1240)
= 1240 + 20 = 1260
Q1 = 1260
Q3 = value of 3(n+1) / 4 th item
= value of 3(20 +1) / 4 th item = value of (15.75)th item
= 15th item + 0.75 ( 16th item –15th item)
= 1750 + 0.75 (1755 – 1750)
= 1750 + 3.75 = 1753.75
Q3 = 1753.75
Q.D = (Q3 – Q1) / 2 = (1753.75 – 1260) / 2 = 493.75 / 2
= 246.875
Coefficient of QD = (Q3 – Q1) / ( Q3 + Q1 )
= (1753.75 – 1260) / (1753.75 + 1260)
= 0.164
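The positional rule used above (take the whole-numbered item and add the fractional part of the gap to the next item) can be sketched in Python. The helper names are assumptions made for this illustration, and the sketch assumes at least three observations so that the (n+1)/4 position is not below 1. The same helper also gives deciles and percentiles when called with the corresponding position.

```python
def value_at_position(ordered, position):
    """Value of the position-th item (1-based, possibly fractional) of sorted data."""
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    # interpolate between the whole-th item and the next one
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])


def quartile_deviation(values):
    """Return Q1, Q3, the quartile deviation and its coefficient."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("this sketch assumes at least three observations")
    q1 = value_at_position(ordered, (n + 1) / 4)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q1, q3, (q3 - q1) / 2, (q3 - q1) / (q3 + q1)


wheat = [1120, 1240, 1320, 1040, 1080, 1200, 1440, 1360, 1680, 1730,
         1785, 1342, 1960, 1880, 1755, 1720, 1600, 1470, 1750, 1885]
q1, q3, qd, coefficient = quartile_deviation(wheat)
print(q1, q3, qd, round(coefficient, 3))
```

Statistical software often uses slightly different positional rules for quartiles, so library results may not match this textbook rule exactly.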

MEAN DEVIATION: The mean deviation, also called the average deviation, is defined as
the sum of the absolute deviations from an average divided by the number of items in
the distribution. The average can be the mean, median or mode. Theoretically the
median is the best average to choose, because the sum of the deviations from the
median is a minimum, provided signs are ignored. In practice, however, the arithmetic
mean is the most commonly used average for calculating mean deviation, which is
denoted by the symbol MD.
Mean deviation is calculated for three types of series:
•Individual Data Series
•Discrete Data Series
•Continuous Data Series
Individual Data Series: For individual series, the Mean Deviation can be calculated
using the following formula
$MD = \dfrac{1}{N}\sum |X - A| = \dfrac{\sum |D|}{N}$
Where
MD = Mean deviation
X = Variable values
A = Average of choice
N = Number of observations
Coefficient of Mean Deviation:
Mean deviation calculated from any measure of central tendency is an absolute
measure. For the purpose of comparing variation among different series, a relative
mean deviation is required. The relative mean deviation is obtained by dividing the
mean deviation by the average used for calculating it.
The Coefficient of Mean Deviation can be calculated using

𝐂𝐨𝐞𝐟𝐟𝐢𝐜𝐢𝐞𝐧𝐭 𝐨𝐟 𝐌𝐃 = 𝐌𝐃/𝐀
Example:
Calculate mean deviation and coefficient of mean deviation for the following
individual data:

Items 28 72 90 140 210


Solution:

$A = \dfrac{28 + 72 + 90 + 140 + 210}{5} = \dfrac{540}{5} = 108$

Item X      Deviation |D|
28              80
72              36
90              18
140             32
210            102
            Σ|D| = 268

$\text{Mean Deviation} = MD = \dfrac{1}{N}\sum |X - A| = \dfrac{\sum |D|}{N} = \dfrac{268}{5} = 53.6$

$\text{Coefficient of Mean Deviation} = \dfrac{MD}{A} = \dfrac{53.6}{108} = 0.4963$
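The calculation for an individual series can be sketched in Python as follows. The function name and the optional `average` argument (defaulting to the arithmetic mean, with the median or mode passed in explicitly if preferred) are illustrative assumptions.

```python
def mean_deviation(values, average=None):
    """Return (mean deviation, coefficient of mean deviation).

    Deviations are taken from `average`; the arithmetic mean is used by default.
    """
    if average is None:
        average = sum(values) / len(values)
    md = sum(abs(x - average) for x in values) / len(values)
    return md, md / average


md, coefficient = mean_deviation([28, 72, 90, 140, 210])
print(round(md, 2), round(coefficient, 4))
```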
Discrete Data Series
For discrete series, the Mean Deviation can be calculated using
$MD = \dfrac{\sum f\,|x - Me|}{N} = \dfrac{\sum f\,|D|}{N}$
Where,
N = Number of observations (total frequency).
f = Frequency of each item.
x = Values of the items.
Me = Median.
Coefficient of Mean Deviation
The Coefficient of Mean Deviation can be calculated using the following formula.
$\text{Coefficient of MD} = \dfrac{MD}{Me}$
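Example: Calculate the mean deviation and its coefficient for the following discrete
data

Items         42    108    135    150    210
Frequency      6     15      3      3      9

Solution

xi      Frequency fi     Cumulative frequency cf     |xi – Me|     fi |xi – Me|
42           6                   6                      66             396
108         15                  21                       0               0
135          3                  24                      27              81
150          3                  27                      42             126
210          9                  36                     102             918
         N = 36                                     Σ fi |xi – Me| = 1521

$\text{Median} = \text{size of } \left(\dfrac{N+1}{2}\right)^{\text{th}} \text{ item} = \text{size of } \left(\dfrac{36+1}{2}\right)^{\text{th}} \text{ item} = \text{size of } 18.5^{\text{th}} \text{ item}$
The 18.5th item lies in the cumulative frequency 21, which corresponds to x = 108.
Median = 108

$\text{Mean Deviation} = MD = \dfrac{\sum f\,|x - Me|}{N} = \dfrac{1521}{36} = 42.25$

$\text{Coefficient of MD} = \dfrac{MD}{Me} = \dfrac{42.25}{108} = 0.3912$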
Continuous Data Series: The method of calculating mean deviation in a continuous
series is the same as for a discrete series. In a continuous series, find the mid
points of the various classes and take the deviations of these mid points from the
average selected.
$MD = \dfrac{\sum f\,|x - Me|}{N} = \dfrac{\sum f\,|D|}{N}$
Where N = Number of observations (total frequency).
f = Frequency of each class.
x = Mid points of the classes.
Me = Median.
Coefficient of Mean Deviation: The Coefficient of Mean Deviation can be calculated
using the following formula.
Coefficient of MD = MD / Me
Example:
Find out the mean deviation from the given data

Age in years    0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
No of persons   40    50     64     80     82     70     20     16

Solution: The deviations below are taken from the arithmetic mean x̄.

Class    Midpoint xi   Frequency fi   fi·xi   |xi − x̄|   fi·|xi − x̄|
0-10     5             40             200     31.47      1258.80
10-20    15            50             750     21.47      1073.50
20-30    25            64             1600    11.47      734.08
30-40    35            80             2800    1.47       117.60
40-50    45            82             3690    8.53       699.46
50-60    55            70             3850    18.53      1297.10
60-70    65            20             1300    28.53      570.60
70-80    75            16             1200    38.53      616.48
Total                  N = 422        15390              6367.62

Mean x̄ = Ʃfixi / N = 15390/422 = 36.47

Mean Deviation: $MD = \dfrac{\sum f\,|x - \bar{x}|}{N} = \dfrac{6367.62}{422} = 15.0891$

Coefficient of MD = 15.0891/36.47 = 0.4137
Merits of Mean Deviation:
• It is simple to understand and easy to compute.
• It is based on each and every item of the data.
• MD is less affected by the values of extreme items than the Standard
deviation.
Demerits of Mean Deviation:
• The greatest drawback of this method is that algebraic signs are ignored while
taking the deviations of the items.
• It is not capable of further algebraic treatment.
• It is much less popular than the standard deviation.
STANDARD DEVIATION: The concept of Standard Deviation was introduced by
Karl Pearson in 1893. It is by far the most important and widely used measure of
dispersion. Its significance lies in the fact that it is free from those defects which
afflicted earlier methods and satisfies most of the properties of a good measure of
dispersion. Standard Deviation is also known as root-mean-square deviation, as it is
the square root of the mean of the squared deviations from the arithmetic mean.
The standard deviation is defined as the positive square root of the mean of the
squared deviations taken from the arithmetic mean of the data.
Ungrouped data
If x1, x2, x3, ..., xn are the ungrouped data, there are two methods of calculating
the standard deviation in an individual series:
• Actual mean method
• Assumed mean method
Actual Mean Method:

$\sigma = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n}}$
Example:
Calculate the standard deviation from the following data 28, 44, 18, 30, 40, 34, 24,
22.
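Solution:
Deviations from actual mean

Values (X)   X − X̄   (X − X̄)²
28           −2       4
44           14       196
18           −12      144
30           0        0
40           10       100
34           4        16
24           −6       36
22           −8       64
ƩX = 240     0        Ʃ(X − X̄)² = 560

$\bar{X} = \dfrac{240}{8} = 30$

$\sigma = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n}} = \sqrt{\dfrac{560}{8}} = \sqrt{70} = 8.3666$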
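The actual mean method translates directly into code. The following is a minimal Python sketch for the data of this example; it also compares the result with `statistics.pstdev` from the standard library, which computes the population standard deviation (division by n, as in these notes).

```python
import math
import statistics

values = [28, 44, 18, 30, 40, 34, 24, 22]

n = len(values)
mean = sum(values) / n                        # actual mean
sq_devs = [(x - mean) ** 2 for x in values]   # squared deviations from the mean
sigma = math.sqrt(sum(sq_devs) / n)           # root of the mean squared deviation

# Cross-check against the standard library's population standard deviation.
assert math.isclose(sigma, statistics.pstdev(values))

print(f"mean = {mean}, sum of squared deviations = {sum(sq_devs)}, sigma = {sigma:.4f}")
```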


Assumed Mean Method: This method is used when the arithmetic mean is a
fractional value. Taking deviations from a fractional value would be a difficult and
tedious task. To save time and labour, a short-cut method is used: deviations are
taken from an assumed mean A.

$\sigma = \sqrt{\dfrac{\sum d^2}{n} - \left(\dfrac{\sum d}{n}\right)^2}$, where d = X − A

Example: The marks obtained by college students in statistics are given below. Using the
following data, calculate the standard deviation.

Student No:  1   2   3   4   5   6   7   8   9   10
Marks        53  58  75  67  32  70  35  68  88  69

Solution: Deviations from assumed mean

Student No   Marks (X)   d = X − A (A = 67)   d²
1            53          −14                  196
2            58          −9                   81
3            75          8                    64
4            67          0                    0
5            32          −35                  1225
6            70          3                    9
7            35          −32                  1024
8            68          1                    1
9            88          21                   441
10           69          2                    4
n = 10                   Ʃd = −55             Ʃd² = 3045

$\sigma = \sqrt{\dfrac{3045}{10} - \left(\dfrac{-55}{10}\right)^2} = \sqrt{304.5 - 30.25} = \sqrt{274.25} = 16.5605$
CALCULATION OF STANDARD DEVIATION:
Discrete series: There are three methods for calculating standard deviation in
discrete series. They are

a) Actual mean method
b) Assumed mean method
c) Step deviation method
Actual mean method: Calculate the mean of the series. Find the deviations of the
various items from the mean, square the deviations, multiply each by the
respective frequency and total the products. The formula for the actual mean
method is

$\sigma = \sqrt{\dfrac{\sum f d^2}{\sum f}}$, where $d = X - \bar{X}$

If the actual mean is a fraction, the calculation takes a lot of time and labour; as
such, this method is rarely used in practice.
Assumed mean method: Here deviations are taken not from the actual mean but from
an assumed mean. This method is also used if the given variable values are not at
equal intervals.

$\sigma = \sqrt{\dfrac{\sum f d^2}{N} - \left(\dfrac{\sum f d}{N}\right)^2}$, where d = X − A, N = Ʃf
Example:
Calculate standard deviation from the following data

X 20 22 25 31 35 40 42 45

f 5 12 15 20 25 14 10 6
Solution:
Deviation from assumed mean

x     f     d = X − A (A = 31)   d²    fd     fd²
20    5     −11                  121   −55    605
22    12    −9                   81    −108   972
25    15    −6                   36    −90    540
31    20    0                    0     0      0
35    25    4                    16    100    400
40    14    9                    81    126    1134
42    10    11                   121   110    1210
45    6     14                   196   84     1176
N = 107                                Ʃfd = 167   Ʃfd² = 6037

$\sigma = \sqrt{\dfrac{\sum f d^2}{N} - \left(\dfrac{\sum f d}{N}\right)^2} = \sqrt{\dfrac{6037}{107} - \left(\dfrac{167}{107}\right)^2} = \sqrt{56.4206 - 2.4359} = \sqrt{53.9847} = 7.35$
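The assumed mean formula for a discrete series can be sketched in Python as follows. The function name `sd_assumed_mean` is hypothetical and introduced only for illustration. The second call uses a different assumed mean to show that the choice of A does not change the result.

```python
import math

xs = [20, 22, 25, 31, 35, 40, 42, 45]
fs = [5, 12, 15, 20, 25, 14, 10, 6]

def sd_assumed_mean(values, freqs, a):
    """Standard deviation of a discrete series using deviations d = X - A."""
    n = sum(freqs)
    sum_fd = sum(f * (x - a) for x, f in zip(values, freqs))
    sum_fd2 = sum(f * (x - a) ** 2 for x, f in zip(values, freqs))
    return math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)

sigma = sd_assumed_mean(xs, fs, 31)        # A = 31, as in the example
sigma_other = sd_assumed_mean(xs, fs, 35)  # any other A gives the same answer

print(f"sigma = {sigma:.4f}, with another assumed mean = {sigma_other:.4f}")
```

The correction term (Ʃfd/N)² is what removes the effect of choosing A away from the true mean.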
Step-deviation method: If the variable values are at equal intervals, then we adopt
this method.

$\sigma = \sqrt{\dfrac{\sum f d^2}{N} - \left(\dfrac{\sum f d}{N}\right)^2} \times C$, where d = (X − A)/C and C is the common interval

Example: The frequency distribution of marks in mathematics is given in the table.

Marks            30  40  50  60  70  80  90
No of students   8   12  20  10  7   3   2

Solution:

Marks x   f    d = (x − 50)/10   fd    fd²
30        8    −2                −16   32
40        12   −1                −12   12
50        20   0                 0     0
60        10   1                 10    10
70        7    2                 14    28
80        3    3                 9     27
90        2    4                 8     32
N = 62                           Ʃfd = 13   Ʃfd² = 141

$\sigma = \sqrt{\dfrac{141}{62} - \left(\dfrac{13}{62}\right)^2} \times 10 = \sqrt{2.2742 - 0.0440} \times 10 = 1.4934 \times 10 = 14.934$
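The step-deviation calculation can be reproduced with a few lines of Python. This is a sketch under the example's own choices, A = 50 and C = 10; the variable names are illustrative.

```python
import math

marks = [30, 40, 50, 60, 70, 80, 90]
students = [8, 12, 20, 10, 7, 3, 2]
A, C = 50, 10                                # assumed mean and common interval

n = sum(students)
d = [(x - A) // C for x in marks]            # step deviations (exact, since x - A is a multiple of C)
sum_fd = sum(f * di for f, di in zip(students, d))
sum_fd2 = sum(f * di ** 2 for f, di in zip(students, d))

# Same formula as the assumed mean method, scaled back by C at the end.
sigma = math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * C

print(f"N = {n}, sum fd = {sum_fd}, sum fd^2 = {sum_fd2}, sigma = {sigma:.3f}")
```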

Combined Mean and Combined Standard Deviation: The combined arithmetic mean
can be computed if we know the mean and the number of items in each group of the
data. Let x̄1, x̄2 be the means and σ1, σ2 the standard deviations of two data sets having n1
and n2 elements respectively. Then

$\bar{x}_{12} = \dfrac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$

$\sigma_{12} = \sqrt{\dfrac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}}$, where $d_1 = \bar{x}_{12} - \bar{x}_1$ and $d_2 = \bar{x}_{12} - \bar{x}_2$

Example: Particulars regarding the income of two companies are given below:

                               Company A   Company B
No. of Employees               600         500
Average income                 1500        1750
Standard deviation of income   10          9

Compute combined mean and combined standard deviation.

Solution:
Given n1 = 600; x̄1 = 1500; σ1 = 10; n2 = 500; x̄2 = 1750; σ2 = 9

Combined Mean:
x̄12 = (600 × 1500 + 500 × 1750) / (600 + 500) = (900000 + 875000) / 1100 = 1613.6364

Combined Standard Deviation:
d1 = x̄12 − x̄1 = 1613.6364 − 1500 = 113.6364
d2 = x̄12 − x̄2 = 1613.6364 − 1750 = −136.3636

$\sigma_{12} = \sqrt{\dfrac{600(100 + 12913.22) + 500(81 + 18595.04)}{600 + 500}} = \sqrt{15587.23} = 124.8488$
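The combined mean and combined standard deviation can be verified with a short Python sketch. The formula used is the standard one for pooling two groups, with d1 and d2 measured from the combined mean; the variable names are illustrative.

```python
import math

n1, mean1, sd1 = 600, 1500, 10   # company A
n2, mean2, sd2 = 500, 1750, 9    # company B

combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)

d1 = combined_mean - mean1       # distance of group 1 mean from the combined mean
d2 = combined_mean - mean2

combined_sd = math.sqrt(
    (n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / (n1 + n2)
)

print(f"combined mean = {combined_mean:.4f}, combined SD = {combined_sd:.4f}")
```

The combined standard deviation is far larger than either group's own standard deviation here because the two group means lie far apart.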
Merits of Standard Deviation:
Among all measures of dispersion, Standard Deviation is considered superior
because it possesses almost all the requisite characteristics of a good measure of
dispersion. It has the following merits:
• It is rigidly defined.
• It is based on all the observations of the series and hence it is
representative.
• It is amenable to further algebraic treatment.
• It is least affected by fluctuations of sampling.
Demerits:
• It is more affected by extreme items.
• It cannot be exactly calculated for a distribution with open-ended classes.
• It is relatively difficult to calculate and understand.

2.12 COEFFICIENT OF VARIATION


The coefficient of variation (CV) is a statistical measure of the relative dispersion of data
points in a data series around the mean. It is the ratio of the standard deviation to the
mean, usually expressed as a percentage, and it is a useful statistic for comparing
the degree of variation from one data series to another, even if the means are
drastically different from one another.
Coefficient of Variation = (Standard Deviation / Mean) × 100.

$CV = \dfrac{\sigma}{\bar{x}} \times 100$

For example, the statement "the standard deviation is 15% of the mean" expresses a CV.
The CV is particularly useful when you want to compare results
from two different surveys or tests that have different measures or values, for
example two tests that have different scoring mechanisms. If sample A has a CV of 12% and sample B has a CV
of 25%, you would say that sample B has more variation relative to its mean.
Example:
The price of a car over five years in two cities is given below:

Price in city A   Price in city B
20,00,000         10,00,000
22,00,000         20,00,000
19,00,000         18,00,000
23,00,000         12,00,000
16,00,000         15,00,000
Which city has more stable prices?
Solution:

City A                                        City B
Price X (in lakhs)   dx = X − 20   dx²        Price Y (in lakhs)   dy = Y − 15   dy²
20                   0             0          10                   −5            25
22                   2             4          20                   5             25
19                   −1            1          18                   3             9
23                   3             9          12                   −3            9
16                   −4            16         15                   0             0
ƩX = 100             Ʃdx = 0       Ʃdx² = 30  ƩY = 75              Ʃdy = 0       Ʃdy² = 68

City A: x̄ = ƩX/n = 100/5 = 20
$\sigma_x = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n}} = \sqrt{\dfrac{30}{5}} = 2.45$
C.V.(X) = (σx / x̄) × 100 = (2.45/20) × 100 = 12.25%
City B: ȳ = ƩY/n = 75/5 = 15
$\sigma_y = \sqrt{\dfrac{\sum (Y - \bar{Y})^2}{n}} = \sqrt{\dfrac{68}{5}} = 3.69$
C.V.(Y) = (σy / ȳ) × 100 = (3.69/15) × 100 = 24.6%
Since C.V.(X) is smaller than C.V.(Y), city A has the more stable prices.
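The comparison of the two cities can be sketched in Python with the standard library's `statistics` module. The function name `coefficient_of_variation` is hypothetical; prices are entered in lakhs, and the population standard deviation (division by n) is assumed, as in the notes.

```python
import statistics

city_a = [20, 22, 19, 23, 16]   # prices in lakhs
city_b = [10, 20, 18, 12, 15]

def coefficient_of_variation(data):
    """CV in percent: population standard deviation relative to the mean."""
    return statistics.pstdev(data) / statistics.mean(data) * 100

cv_a = coefficient_of_variation(city_a)
cv_b = coefficient_of_variation(city_b)

# The series with the smaller CV is the more stable (more consistent) one.
more_stable = "A" if cv_a < cv_b else "B"

print(f"CV(A) = {cv_a:.2f}%, CV(B) = {cv_b:.2f}%, more stable city: {more_stable}")
```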

CONCEPT OF SKEWNESS: Skewness means lack of symmetry. In mathematics, a
figure is called symmetric if there exists a point in it through which, if a perpendicular
is drawn on the X-axis, it divides the figure into two congruent parts, i.e. parts identical in all
respects, so that one part can be superimposed on the other as mirror images of each
other. In Statistics, a distribution is called symmetric if mean, median and mode
coincide. Otherwise, the distribution is asymmetric. If the right
tail is longer, we get a positively skewed distribution for which mean > median >
mode, while if the left tail is longer, we get a negatively skewed distribution for which
mean < median < mode.
Examples of the symmetrical curve, positively skewed curve and negatively skewed
curve are given as follows:

[Figure: frequency curves for the symmetrical, positively skewed and negatively skewed cases]
Difference between Variance and Skewness: The following two points of difference
between variance and skewness should be carefully noted.

1. Variance tells us about the amount of variability, while skewness gives the direction
of variability.

2. In business and economic series, measures of variation have greater practical
application than measures of skewness. However, in the medical and life science fields,
measures of skewness have greater practical application than the variance.
VARIOUS MEASURES OF SKEWNESS: Measures of skewness help us to know to
what degree and in which direction (positive or negative) the frequency distribution
departs from symmetry. Although positive or negative skewness can be
detected graphically depending on whether the right tail or the left tail is longer,
we do not get an idea of the magnitude. Besides, borderline cases between symmetry
and asymmetry may be difficult to detect graphically. Hence some statistical
measures are required to find the magnitude of the lack of symmetry. A good measure of
skewness should satisfy three criteria:

1. It should be a unit-free number so that the shapes of different distributions, so far
as symmetry is concerned, can be compared even if the units of the underlying
variables are different;

2. If the distribution is symmetric, the value of the measure should be zero. Similarly,
the measure should give positive or negative values according as the distribution has
positive or negative skewness respectively; and

3. As we move from extreme negative skewness to extreme positive skewness, the
value of the measure should vary accordingly.

Measures of skewness can be absolute as well as relative. Since in a
symmetrical distribution the mean, median and mode are identical, the more the mean
moves away from the mode, the larger the asymmetry or skewness. An absolute
measure of skewness cannot be used for purposes of comparison because the
same amount of skewness has different meanings in a distribution with small variation
and in a distribution with large variation.

Absolute Measures of Skewness

Following are the absolute measures of skewness:

1. Skewness (Sk) = Mean – Median

2. Skewness (Sk) = Mean – Mode

3. Skewness (Sk) = (Q3 - Q2) - (Q2 - Q1)

For comparing two series, we do not calculate these absolute measures; we calculate
the relative measures, which are called coefficients of skewness. Coefficients of
skewness are pure numbers independent of the units of measurement.

Relative Measures of Skewness: In order to make valid comparisons between the
skewness of two or more distributions, we have to eliminate the disturbing influence
of variation. Such elimination can be done by dividing the absolute skewness by the
standard deviation. The following are the important methods of measuring relative
skewness:
1. β and γ Coefficients of Skewness:

Karl Pearson defined the following β and γ coefficients of skewness, based upon
the second and third central moments:

$\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}$

β1 is used as a measure of skewness. For a symmetrical distribution, β1 is
zero. However, β1 as a measure of skewness does not tell us about the direction of
skewness, i.e. positive or negative. This is because μ3, being based on the cubes of
the deviations from the mean, may be positive or negative, but μ3² is always
positive. Also μ2, being the variance, is always positive. Hence β1 is
never negative. This drawback is removed if we calculate Karl Pearson's
gamma coefficient γ1, which is the square root of β1, i.e.

$\gamma_1 = \pm\sqrt{\beta_1} = \dfrac{\mu_3}{\mu_2^{3/2}}$

Then the sign of skewness depends upon whether μ3 is
positive or negative. It is advisable to use γ1 as the measure of skewness.

2. Karl Pearson's Coefficient of Skewness: This method is most frequently
used for measuring skewness. The formula for measuring the coefficient of
skewness is given by

$S_k = \dfrac{\text{Mean} - \text{Mode}}{\sigma}$

The value of this coefficient is zero in a symmetrical distribution. If the mean is
greater than the mode, the coefficient of skewness is positive, otherwise negative.
The value of Karl Pearson's coefficient of skewness usually lies between −1 and +1
for a moderately skewed distribution. If the mode is not well defined, we use the formula

$S_k = \dfrac{3(\text{Mean} - \text{Median})}{\sigma}$

which follows from the empirical relationship

Mode = 3 Median − 2 Mean

Here, −3 ≤ Sk ≤ 3, although in practice these limits are rarely attained.
3. Bowley's Coefficient of Skewness: This method is based on quartiles. The formula
for calculating the coefficient of skewness is given by

$S_k = \dfrac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \dfrac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1}$

The value of Sk is zero for a symmetrical distribution. If the value is greater
than zero, the distribution is positively skewed, and if the value is less than zero, it is
negatively skewed. Sk takes values between −1 and +1.

4. Kelly's Coefficient of Skewness

The coefficient of skewness proposed by Kelly is based on percentiles and deciles.
The formula for calculating the coefficient of skewness is given by

Based on Percentiles

$S_k = \dfrac{(P_{90} - P_{50}) - (P_{50} - P_{10})}{P_{90} - P_{10}} = \dfrac{P_{90} - 2P_{50} + P_{10}}{P_{90} - P_{10}}$

where P90, P50 and P10 are the 90th, 50th and 10th percentiles.

Based on Deciles

$S_k = \dfrac{D_9 - 2D_5 + D_1}{D_9 - D_1}$

where D9, D5 and D1 are the 9th, 5th and 1st deciles.

Example 1: For a distribution, Karl Pearson's coefficient
of skewness is 0.64, standard deviation is 13 and mean
is 59.2. Find the mode and median.
Solution: We are given Sk = 0.64, σ = 13 and Mean = 59.2.
Therefore, by using the formula

$S_k = \dfrac{\text{Mean} - \text{Mode}}{\sigma}$

0.64 = (59.2 − Mode) / 13

Mode = 59.20 − 8.32 = 50.88

Mode = 3 Median − 2 Mean

50.88 = 3 Median − 2(59.2)

Median = (50.88 + 118.4) / 3 = 169.28 / 3 = 56.43
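The two rearrangements used in this example can be sketched in Python. The sketch assumes Karl Pearson's formula Sk = (Mean − Mode)/σ and the empirical relation Mode = 3 Median − 2 Mean, both solved for the unknown quantity.

```python
sk, sigma, mean = 0.64, 13, 59.2

# From Sk = (Mean - Mode) / sigma, solve for the mode.
mode = mean - sk * sigma

# From Mode = 3 Median - 2 Mean, solve for the median.
median = (mode + 2 * mean) / 3

print(f"mode = {mode:.2f}, median = {median:.2f}")
```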


CONCEPT OF KURTOSIS
Even if we have knowledge of the measures of central tendency, dispersion and
skewness, we cannot get a complete idea of a distribution. In addition
to these measures, we need another measure to get a complete idea
of the shape of the distribution, which can be studied with the help of kurtosis.
Prof. Karl Pearson called it the "Convexity of a Curve". Kurtosis gives a
measure of the flatness or peakedness of a distribution.

The degree of kurtosis of a distribution is measured relative to that of a normal
curve. Curves with greater peakedness than the normal curve are called
"Leptokurtic". Curves which are flatter than the normal curve are called
"Platykurtic". The normal curve is called "Mesokurtic". Fig. 4.4 shows the
three different curves mentioned above:

[Figure]

Fig. 4.4: Platykurtic Curve, Mesokurtic Curve and Leptokurtic Curve

Measures of Kurtosis

1. Karl Pearson's Measures of Kurtosis: For calculating the kurtosis, the second
and fourth central moments of the variable are used. For this, the following formula given by
Karl Pearson is used:

$\beta_2 = \dfrac{\mu_4}{\mu_2^2}$ or $\gamma_2 = \beta_2 - 3$

where, μ2 = Second order central moment of the distribution

μ4 = Fourth order central moment of the distribution

Description:

1. If β2 = 3 or γ2 = 0, then the curve is said to be mesokurtic;
2. If β2 < 3 or γ2 < 0, then the curve is said to be platykurtic;
3. If β2 > 3 or γ2 > 0, then the curve is said to be leptokurtic.

Kelly's Measure of Kurtosis: Kelly has given a measure of kurtosis based on
percentiles. The formula is given by

$\beta_2 = \dfrac{\tfrac{1}{2}(P_{75} - P_{25})}{P_{90} - P_{10}}$

where P75, P25, P90 and P10 are the 75th, 25th, 90th and 10th percentiles of
the distribution respectively. For a normal curve this ratio equals 0.26315.

If β2 > 0.26315, then the distribution is platykurtic.

If β2 < 0.26315, then the distribution is leptokurtic.

Example 2: The first four moments about the mean of a
distribution are 0, 2.5, 0.7 and 18.75. Find the coefficients
of skewness and kurtosis.
Solution: We have μ1 = 0, μ2 = 2.5, μ3 = 0.7 and μ4 = 18.75.

Therefore, Skewness, $\beta_1 = \dfrac{\mu_3^2}{\mu_2^3} = \dfrac{(0.7)^2}{(2.5)^3} = \dfrac{0.49}{15.625} = 0.031$

Kurtosis, $\beta_2 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{18.75}{(2.5)^2} = \dfrac{18.75}{6.25} = 3$

As β2 is equal to 3, the curve is mesokurtic.
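The moment-based coefficients can be computed with a short Python sketch. It assumes the central moments are already known, as in the example, and uses the definitions β1 = μ3²/μ2³, γ1 = μ3/μ2^(3/2), β2 = μ4/μ2² and γ2 = β2 − 3; the variable names are illustrative.

```python
import math

mu2, mu3, mu4 = 2.5, 0.7, 18.75       # central moments from the example

beta1 = mu3 ** 2 / mu2 ** 3           # skewness, magnitude only
gamma1 = mu3 / mu2 ** 1.5             # skewness with sign
beta2 = mu4 / mu2 ** 2                # kurtosis
gamma2 = beta2 - 3                    # excess over the normal curve

if math.isclose(gamma2, 0.0, abs_tol=1e-12):
    shape = "mesokurtic"
elif gamma2 < 0:
    shape = "platykurtic"
else:
    shape = "leptokurtic"

print(f"beta1 = {beta1:.3f}, gamma1 = {gamma1:.3f}, beta2 = {beta2}, shape = {shape}")
```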

Unit 4

CORRELATION ANALYSIS: Correlation is a statistical technique which measures
and analyses the degree or extent to which two or more variables fluctuate with
reference to one another. It denotes the inter-dependence amongst variables. The
degree is expressed by a coefficient which ranges from -1 to +1. The
direction of change is indicated by the + or - sign; the former refers to movement
in the same direction and the latter to movement in the opposite direction. An absence of
correlation is indicated by zero. Correlation thus expresses the relationship through
a relative measure of change and it has nothing to do with the units in which the
variables are expressed.

LINEAR CORRELATION: If the amount of change in one variable tends to bear a
constant ratio to the amount of change in the other variable, then the correlation is
said to be linear. For example,

X 5 10 15 20 25

Y 90 170 230 310 420

TYPES OF CORRELATION:There are three important types of correlation. They are

1. Positive and Negative correlation

2. Simple, Partial and Multiple correlation

3. Linear and Non-Linear correlation

1. Positive and Negative correlation

Correlation is classified according to the direction of change in the two variables. In
this regard, the correlation may be either positive or negative.

Positive correlation refers to the change (movement) of variables in the same
direction. When both variables increase or decrease together, it is
called positive correlation. It is otherwise called direct correlation. For example, a
positive correlation exists between ages of husband and wife, height and weight of
a group of individuals, increase in rainfall and production of paddy, increase in the
offer and sales.

Negative correlation refers to the change (movement) of variables in the opposite
direction. In other words, when an increase (decrease) in the value of one variable is
accompanied by a decrease (increase) in the value of the other, it is said to
be negative correlation. It is otherwise called inverse correlation. For example, a
negative correlation exists between price and demand, yield of crop and price.

The following examples illustrate the concept of positive correlation and
negative correlation.

Positive correlation

X 5 7 9 11 16 20 28

y 20 26 35 37 48 50 55

Negative Correlation

X 14 17 23 35 46

y 16 12 10 9 5

2.Simple, Partial and Multiple Correlations:

Simple correlation is a measure used to determine the strength and the direction of
the relationship between two variables, X and Y. A simple correlation coefficient can
range from –1 to 1. However, maximum (or minimum) values of some simple
correlations cannot reach unity (i.e., 1 or –1).

When we study only two variables, the relationship is described as simple
correlation; for example, quantity of money and price level, demand and price, etc. But
in a multiple correlation we study more than two variables simultaneously; for example,
the relationship of price, demand and supply of a commodity.

The study of two variables while excluding the effect of some other variables is called partial
correlation. For example, we study price and demand, eliminating the supply side.

3. Linear and Non-Linear Correlation: Linear correlation is a measure of the
degree to which two variables vary together, or a measure of the intensity of the
association between two variables.

If the ratio of change between two variables is uniform, then there will be linear
correlation between them. Consider the following.

X 6 12 18 24

Y 5 10 15 20

The ratio of change between the variables is the same.

In a curvilinear or non-linear correlation, the amount of change in one variable does
not bear a constant ratio to the amount of change in the other variable. The graph of a
non-linear or curvilinear relationship will form a curve.

In the majority of cases, we find a curvilinear relationship, which is complicated to analyse, so
we generally assume that the relationship between the variables under study is
linear. In social sciences, linear correlation is rare, because the exactness is not as
perfect as in natural sciences.

SCATTER DIAGRAM: It is a simple and attractive method of diagrammatic
representation. In this method, the given data are plotted on a graph sheet in the
form of dots. The x variable is plotted on the horizontal axis and the y variable on the
vertical axis. We can then see the scatter or concentration of the various points,
which shows the type of correlation.
TWO-WAY TABLE: A two-way table (also called a contingency table) is a useful
tool for examining relationships between categorical variables. The entries in the
cells of a two-way table can be frequency counts or relative frequencies (just like a
one-way table).

         Dance   Sports   TV   Total
Men      2       10       8    20
Women    16      6        8    30
Total    18      16       16   50

The two-way table above shows the favourite leisure activities of 50 adults: 20 men
and 30 women. Because the entries in the table are frequency counts, the table is a
frequency table.

PEARSON'S CO-EFFICIENT OF CORRELATION: Karl Pearson (1857-1936), the
British biometrician, suggested this method. It is popularly known as Pearson's co-efficient
of correlation. It is a mathematical method for measuring the magnitude of the
linear relationship between two variables.

Pearson's correlation coefficient is a statistic that measures the linear
relationship, or association, between two continuous variables. It is regarded as a
leading method of measuring the association between variables of interest because it is
based on the method of covariance.

a) Arithmetic mean Method

$r = \dfrac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$, where $x = X - \bar{X}$ and $y = Y - \bar{Y}$
Example:

Find Pearson‘s Co-efficient of correlation from the following data

Sales 15 18 22 28 32 46 52

Profit 52 66 78 87 96 125 141

Solution: Let the sales be denoted by X and the profit by Y. Computation of the
coefficient of correlation:

X     x = X − X̄   x²        Y     y = Y − Ȳ   y²        xy
15    −15.43      238.08    52    −40.14      1611.22   619.36
18    −12.43      154.50    66    −26.14      683.30    324.92
22    −8.43       71.06     78    −14.14      199.94    119.20
28    −2.43       5.90      87    −5.14       26.42     12.49
32    1.57        2.46      96    3.86        14.90     6.06
46    15.57       242.42    125   32.86       1079.78   511.63
52    21.57       465.26    141   48.86       2387.30   1053.91
ƩX = 213   Ʃx = −0.01   Ʃx² = 1179.68   ƩY = 645   Ʃy = 0.02   Ʃy² = 6002.86   Ʃxy = 2647.57

X̄ = ƩX / N = 213 / 7 = 30.43

Ȳ = ƩY / N = 645 / 7 = 92.14

Ʃx² = 1179.68, Ʃy² = 6002.86, Ʃxy = 2647.57

$r = \dfrac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \dfrac{2647.57}{\sqrt{1179.68 \times 6002.86}} = \dfrac{2647.57}{34.35 \times 77.48} = \dfrac{2647.57}{2661.44} = 0.99$

Therefore, there is a high degree of positive correlation between X and Y.
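The arithmetic mean method can be sketched in Python as below. The function name `pearson_r` is hypothetical; it works with unrounded deviations, so the intermediate sums differ slightly from the two-decimal figures in the table while the coefficient agrees to two decimals.

```python
import math

sales = [15, 18, 22, 28, 32, 46, 52]
profit = [52, 66, 78, 87, 96, 125, 141]

def pearson_r(xs, ys):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), with x, y as deviations from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

r = pearson_r(sales, profit)
print(f"r = {r:.4f}")
```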

SPEARMAN'S RANK CORRELATION CO-EFFICIENT: In statistics, Spearman's
rank correlation coefficient or Spearman's rho, named after Charles Spearman and
often denoted by the Greek letter ρ (rho) or as rs, is a nonparametric measure of rank
correlation (statistical dependence between the rankings of two variables). It
assesses how well the relationship between two variables can be described using a
monotonic function.

The Spearman correlation between two variables is equal to the Pearson correlation
between the rank values of those two variables; while Pearson's correlation
assesses linear relationships, Spearman's correlation assesses monotonic
relationships (whether linear or not). If there are no repeated data values, a perfect
Spearman correlation of +1 or −1 occurs when each of the variables is a perfect
monotone function of the other.

$r_s = 1 - \dfrac{6\sum D^2}{n(n^2 - 1)}$

where D is the difference between the two ranks of each observation and n is the number of pairs.

Spearman's coefficient is appropriate for both continuous and discrete ordinal
variables. It can be formulated as a special case of a more general
correlation coefficient.

Example: Two faculty members ranked 10 candidates for scholarships. Calculate
the Spearman rank correlation coefficient.

Candidate     1   2   3   4   5   6   7   8   9   10
Professor A   8   12  6   4   9   15  8   7   16  13
Professor B   9   16  10  8   5   10  7   11  15  18

Solution

Rx    Ry    d = Rx − Ry   d²
8     9     −1            1
12    16    −4            16
6     10    −4            16
4     8     −4            16
9     5     4             16
15    10    5             25
8     7     1             1
7     11    −4            16
16    15    1             1
13    18    −5            25
                          Ʃd² = 133

$r_s = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)} = 1 - \dfrac{6(133)}{10(100 - 1)} = 1 - \dfrac{798}{990} = 1 - 0.8061 = 0.194$
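The rank-difference formula assumes that each series consists of the ranks 1 to n with no ties. As a minimal sketch of the usual workflow, the Python code below first converts raw scores into ranks and then applies the formula. The scores are hypothetical and are not taken from the example above; the helper `ranks` is likewise an illustrative name.

```python
# Hypothetical scores given by two judges to six candidates (no tied scores).
judge_a = [72, 85, 60, 91, 78, 66]
judge_b = [70, 88, 65, 80, 84, 59]

def ranks(scores):
    """Rank 1 = highest score; assumes all scores are distinct."""
    order = sorted(scores, reverse=True)
    return [order.index(s) + 1 for s in scores]

ra, rb = ranks(judge_a), ranks(judge_b)
n = len(ra)

sum_d2 = sum((a - b) ** 2 for a, b in zip(ra, rb))   # sum of squared rank differences
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(f"ranks A = {ra}, ranks B = {rb}, sum d^2 = {sum_d2}, rs = {rs:.4f}")
```

When scores are tied, average ranks and a correction term are normally used, which this sketch deliberately leaves out.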

PROPERTIES OF CORRELATION CO-EFFICIENT:

1. Coefficient of Correlation lies between -1 and +1: The coefficient of correlation cannot take
a value less than -1 or more than +1. Symbolically, -1 ≤ r ≤ +1 or |r| ≤ 1.

2. Coefficient of Correlation is independent of Change of Origin: This property reveals
that if we subtract any constant from all the values of X and Y, it will not affect the
coefficient of correlation.

3. Coefficient of Correlation possesses the property of symmetry: The degree of relationship
between two variables is symmetric, i.e. the correlation of X with Y equals that of Y with X.

4. Coefficient of Correlation is independent of Change of Scale: This property reveals
that if we divide or multiply all the values of X and Y by a positive constant, it will not affect the coefficient of
correlation.

5. When r = +1, there is perfect positive correlation between the variables.

6. When r = -1, there is perfect negative correlation between the variables.

7. When r = 0, there is no linear relationship between the variables.

The formula given above, that is

$r = \dfrac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$

is easy to calculate, and it is not necessary to calculate the standard deviations of the X and Y
series separately.

CONCURRENT DEVIATION METHOD

It is based on the direction of change in two paired variables. This method is suitable
when it is desired to study the direction of change rather than its magnitude. In other
words, when it is required to study whether the correlation is positive or negative, the
concurrent deviation method is applied.

The coefficient of correlation between two series of directions of change is
called the coefficient of concurrent deviation. It is denoted as rc. The formula for its
calculation is:

$r_c = \pm\sqrt{\pm\dfrac{2c - n}{n}}$

Here: c is the number of positive signs obtained after multiplying the directions of
change of the X-series and Y-series.

n is the number of pairs of deviations compared, which is one less than the number of pairs of observations.

Process of Calculation:

1. Find out the direction of change of each value as compared to the previous one
in the X variable. If the second value is less than the first, put a minus sign (−); if it is
more, put a plus sign (+); and if it is equal, put zero or (=). Repeat the same process for the
other values and denote the result by 'Dx'.

2. In the same way, ascertain the direction of change in the Y variable and denote it
by 'Dy'.

3. Multiply each 'Dx' by the corresponding 'Dy' and determine the number of positive signs
(this is 'c'). Here it is important to note that the product of (−) and (−) is (+), just as the product of (+) and (+) is (+).

4. Significance of the ± signs outside and inside the square root: if the value of (2c − n)/n
is negative, keep the minus sign both outside and inside the square root, so as to make the
quantity under the root positive, since we cannot take the square root of a negative value.
If (2c − n)/n is positive, then we get a positive value of the coefficient of correlation. In
case it is negative, the coefficient of correlation is negative.

Merits:

Simple to understand and easy to calculate. It is suitable for large values of n.

Limitations:

This method does not differentiate between small and big changes. It gives only an
approximate measure.

Merits:

Simple to understand and easy to calculate. It is suitable of large value of n.

Limitations :

This method does not differentiate between small and big values. It works with
approximation

Example 1: Calculate the coefficient of Concurrent Deviations from the following data:

Year 2003 2004 2005 2006 2007 2008 2009


Series X 150 154 160 172 160 165 180
Series Y 200 180 170 160 190 180 172

Solution:

Year   X     Dx   Y     Dy   Dx·Dy
2003   150        200
2004   154   +    180   −    −
2005   160   +    170   −    −
2006   172   +    160   −    −
2007   160   −    190   +    −
2008   165   +    180   −    −
2009   180   +    172   −    −
n = 6, c = 0

$r_c = \pm\sqrt{\pm\dfrac{2c - n}{n}} = -\sqrt{-\dfrac{2(0) - 6}{6}} = -\sqrt{1} = -1$

There is a perfect negative correlation between Series X and Series Y.
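The process of calculation can be sketched in Python as follows. The helper `directions` is a hypothetical name. The sign rule of step 4 is implemented by taking the square root of the absolute value of (2c − n)/n and then giving the result the sign of that ratio.

```python
import math

series_x = [150, 154, 160, 172, 160, 165, 180]
series_y = [200, 180, 170, 160, 190, 180, 172]

def directions(series):
    """+1 for a rise, -1 for a fall, 0 for no change, each versus the previous value."""
    return [(b > a) - (b < a) for a, b in zip(series, series[1:])]

dx, dy = directions(series_x), directions(series_y)
n = len(dx)                                        # pairs of deviations compared
c = sum(1 for a, b in zip(dx, dy) if a * b > 0)    # concurrent (same-direction) changes

ratio = (2 * c - n) / n
rc = math.copysign(math.sqrt(abs(ratio)), ratio)   # sign outside the root follows the ratio

print(f"n = {n}, c = {c}, rc = {rc}")
```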

COEFFICIENT OF DETERMINATION: The coefficient of determination, often denoted as R², quantifies the proportion of
variance in a dependent variable that can be predicted from an independent variable,
essentially measuring how well a model fits the data. In simple linear regression it is the square of the correlation
coefficient; it ranges from 0 to 1, with higher values indicating a stronger
relationship.

Definition:The coefficient of determination (R²) represents the proportion of variance


in the dependent variable that's explained by the independent variable in a statistical
model.

Calculation:

In simple linear regression, R² is the square of the correlation coefficient (r).

Interpretation:

R² = 0: The independent variable explains none of the variance in the dependent


variable.

R² = 1: The independent variable explains all of the variance in the dependent


variable, meaning the model perfectly predicts the outcome.

R² between 0 and 1: The independent variable explains a portion of the variance in


the dependent variable, and the higher the value, the better the model fits the data.

Example:

An R² of 0.80 means that 80% of the variance in the dependent variable is explained
by the independent variable.

Goodness of Fit:

R² is a measure of how well a regression model fits the data, with higher R² values
generally indicating a better fit.

Correlation vs. Causation:

It's important to note that a high R² value does not necessarily imply a causal
relationship between the variables; it only indicates a strong correlation.

Other Names:

R² is also sometimes referred to as "r-squared".
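The statement that R² equals the square of r in simple linear regression can be illustrated with a short Python sketch. It reuses the sales and profit figures from the Pearson example and fits a least-squares line; the use of that data set here, and all variable names, are assumptions made only for illustration.

```python
import math

sales = [15, 18, 22, 28, 32, 46, 52]
profit = [52, 66, 78, 87, 96, 125, 141]

n = len(sales)
mean_x, mean_y = sum(sales) / n, sum(profit) / n
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sales, profit))
sum_x2 = sum((x - mean_x) ** 2 for x in sales)
sum_y2 = sum((y - mean_y) ** 2 for y in profit)    # total variation in Y

r = sum_xy / math.sqrt(sum_x2 * sum_y2)            # correlation coefficient

# Least-squares line of profit on sales.
slope = sum_xy / sum_x2
intercept = mean_y - slope * mean_x

# Unexplained variation: squared residuals around the fitted line.
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(sales, profit))
r_squared = 1 - ss_res / sum_y2                    # share of variation explained

print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}, R^2 from regression = {r_squared:.4f}")
```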
