Business Statistic Sem 2
expressed, enumerated or estimated according to a reasonable standard of
accuracy, collected in a systematic manner for a predetermined purpose and
placed in relation to each other"
Secrist's definition of statistics is more complete. The vital points that the definition
covers are:
1) Aggregate of facts
2) Affected by multiplicity of causes
3) Numerically expressed
4) Estimated according to standard of accuracy
5) Systematic Collection of data
6) Data collected for a predetermined purpose
7) Comparable
2. Statistics as Statistical Methods: According to Bowley, "Statistics is the science of
measurement of the social organism, regarded as a whole in all its manifestations."
This definition of Bowley is insufficient.
According to Wallis and Roberts, "Statistics is a body of methods for making wise
decisions in the face of uncertainty."
This definition is modern, as it conveys that statistical methods enable us to arrive at valid
decisions.
According to Croxton and Cowden, "Statistics may be defined as the science of
collection, presentation, analysis and interpretation of numerical data."
This definition gives a more elaborate meaning to statistics as statistical tools.
IMPORTANCE OF STATISTICS: Statistics can be applied to various areas of business
operations for effective results. Some prominent areas are given below.
1) Startups - Before opening a new business or acquiring one, we need to study the
market from a statistical point of view to estimate market demand and
supply accurately. A businessman must do proper research by collecting, analysing and
interpreting data on market trends before starting his business.
2) Production - The production of a commodity depends upon various factors such
as demand, supply of capital etc. These factors must be analysed statistically to
get a precise and accurate view of them.
3) Marketing - An ideal marketing strategy requires statistical analysis of population,
income of consumers, availability of the product etc.
4) Investment - Statistics plays a vital role in making decisions regarding buying
shares, debentures or real estate. Using statistical data, an investor can aim to buy
investments at a lower price and sell when the price increases.
5) Banking - The banking sector is highly influenced by economic and market conditions.
Banks have separate research departments which collect and analyse information
regarding inflation rates, interest rates, bank rates etc.
LIMITATIONS OF STATISTICS:
1) Statistics does not analyse qualitative phenomena: As statistics is a science
which deals with numerical data, it cannot be applied directly to data that cannot be measured in
quantitative terms. However, qualitative data can sometimes be converted to
quantitative data, for example by assigning scores or ranks, and then analysed statistically.
2) Statistics does not study individuals: Statistics deals with aggregate quantities and
does not give importance to individual data. This is because a single observation, taken alone, is not
useful for statistical analysis.
3) Statistical laws are not exact: Statistical interpretations are based on averages,
and hence only approximations can be made.
4) Statistics may be misused: Statistical data, when used by an inexperienced
or untrained person, can lead to wrong interpretations. Hence it must be used
only by experts.
FUNCTIONS OF STATISTICS
1) Consolidation: Statistics enables you to consolidate and understand huge data by
providing only the significant observations.
For example, instead of observing the marks of each and every individual, the class
average will enable you to know the class's performance as a whole.
2) Comparison: Classification and tabulation of data are used to compare the data.
Various statistical tools such as graphs, measures of dispersion and correlation
give us huge scope for comparison.
For example, the market demand for a product can be compared among the states.
This enables the company to identify and analyse the target market.
3) Forecasting: Forecasting means predicting future prospects. Statistics plays a
huge role in forecasting the future.
For example, with the data of the sales value for the past 10 years, we will be able to
predict the sales of the coming year approximately. Time series analysis and
regression analysis are important for forecasting.
4) Estimation: One of the main aims of statistics is to draw conclusions about a huge
population based on the analysis of a sample group.
For example, from the heights of a sample of 10 students we will be able to estimate the
average height of all the students in the class.
5) Test of hypothesis: A statistical hypothesis is a statement about a huge population that is tested using the
inferences drawn from a sample observation.
For example, if a particular fertilizer helps in increasing the crop yield in a particular
area then it will be used in other areas based on this sample.
SCOPE OF STATISTICS:
1) Statistics in Industries: Statistics is extensively used in a huge number of
industries. Statistics may be used in sales forecasting, consumer preference studies, quality
control, inventory control, risk management etc. Sampling is vital for inspection
plans.
2) Statistics in Education: Statistics plays an important role in education. Statistics
helps in measuring and evaluating the progress of students, in formulating policies
and in predicting the future performance of students so that they can be helped to
improve.
3) Statistics in Economics: Statistics helps us to understand and analyse economic
theories. Everything from microeconomic factors like the demand for a
product and research on different markets to macroeconomic concepts like
inflation and unemployment can be analysed using statistics.
4) Statistics in Medicine: Statistics helps in researching and analysing medical
experiments and investigations. Biostatistics enables researchers to identify whether a
particular treatment or drug is working and how effective it is.
5) Statistics in Modern Applications: A lot of software is being developed
for experimentation, forecasting and estimation.
For example, SYSTAT is one such software package which provides scientific and
technical graphical options.
6) Statistics in Agriculture: Statistics can be applied in agriculture by analysing the
effectiveness of fertilizers. It can be used in taking decisions regarding inputs and
outputs, inventories etc.
CONCEPT OF POPULATION AND SAMPLE: Generally, inferential statistics is
used in quantitative educational, psychological and sociological research.
In such research, the study is carried out on a selected sample and the results are generalised
to a large or entire group of targeted subjects. Such a group is called the population in
research. The researcher has to decide and define the population accurately before
starting research activities. A well-defined population helps the researcher in selecting
a sample of proper size which represents the entire population. The success of research
and the reliability of results mostly depend upon the sample. How to select a sample
that truly represents the entire population is discussed in this chapter. Let's
start with the meaning of population.
POPULATION: Any research is based on objectives. The objectives
clarify the subjects of study directly or indirectly. They also clarify the group to which the results of
research can be applied, or for which the findings can be generalised.
Such a group is known as the population in research.
Some researchers use the word 'universe' in place of 'population', but
there is a minute difference between the two. It can be clarified by referring to
the definitions and meanings of both.
Definition of Universe: Universe refers to the
set of all the units which possess a variable characteristic under study.
Meaning of Universe: Referring to the definition, we can say that the universe is a group or set of
all such units that possess the variable characteristic under study. Unless
a clarification is given, the universe accommodates all the units that possess the
characteristic to be studied and exist in the area of
research. E.g. in a study of achievement motivation of the students of grade eight of
Ahmedabad in the context of their study habits, the set of all the students of
Ahmedabad who are studying in grade eight will be considered the universe,
irrespective of their medium of instruction. Achievement motivation and study habits
are the variables of the study, and the students of grade eight form the universe of the study in
this example. Depending upon the objectives of research, anything can be taken as
a unit of study: person, object, living being, time, incident, occasion,
word, sentence, place, society or institution.
Definition of Population: Population refers to the set or group of all the units to which the findings of the
research are to be applied.
Meaning of Population: Referring to the definition, we can say that the population consists of all the units to which the findings of
research can be applied. In other words, the population is the set of all the units which
possess the variable characteristic under study and for which the findings of research can
be generalised. In the earlier example, if the
findings of the research are to be applied only to Gujarati medium
students of grade eight of Ahmedabad city, the population will consist of only
Gujarati medium students of grade eight of Ahmedabad city. The population may also
be clearly defined in the statement of the problem. If it is clearly defined in the research title,
there may be no separate universe in the research.
See the following examples of research problem.
Ahmedabad in the context of their study habits,
In the first example, the universe
consists of all the students of grade eight of Ahmedabad city. If the researcher does not
limit his study, this universe will also be the population. If he limits his study to Gujarati
medium students, the universe will include all the students of grade eight and the
population will include only Gujarati medium students. In the second example, the
researcher clearly mentions the population in the title. Here the universe and the
population will be the same. In this case also, if he limits his study to the students of
grant-in-aid schools, the population will include Gujarati medium grade eight
students of grant-in-aid schools of Ahmedabad city and the universe will include
all Gujarati medium students of grade eight of Ahmedabad city. This
discussion shows that the researcher must finalise the population of the study well in
advance, before starting research activities, so that he can plan the process properly
and implement it easily and without any hindrance. We have seen that there is a
noticeable difference between population and universe, but many scholars use the two terms
interchangeably in practice. Besides understanding
population and universe, the researcher must have clear knowledge of the different types of
population.
TYPES OF POPULATION: Here some important types of population are discussed,
but remember this is not a final classification because different scholars have
classified it on the basis of different criteria. The most common classification is given
here. Finite and Infinite Population The population in which number of units is finite
and can be counted precisely, is called finite population. The following are some
examples of the same.
2017.
Lifespan of the bulbs assumed on the basis of durability period of bulbs produced
and used in past. Exact life of such items cannot be predicted precisely.
the thing, it is taken from. Now we will discuss the meaning of sample.
MEANING OF SAMPLE: A part of the population that represents it completely is known
as a sample. This means that the units selected from the population as a sample must
represent all the characteristics of the different types of units in the population. For
various reasons, in the majority of research studies data are collected from the units of a sample instead of all units of the
population, and the findings are generalised to
the entire population. This can be done precisely only if efforts are made to select
the sample by keeping in mind the characteristics of an ideal sample.
CHARACTERISTICS OF AN IDEAL / GOOD SAMPLE :Characteristics of an ideal
units of population.
Selection of such sample makes the task of researcher easy and precise. Therefore
sample is very much important in research.
IMPORTANCE OF SAMPLE Importance, need or utility of sample in research can
be described as discussed here.
the subjects are to be demolished for collecting data, sample selection is the
only way for conducting research. Research, done to know the life span of electric
bulb or to know the blood group of a person or to test the taste of cold drink are such
type of researches.
data collection process in such research consumes more time. So collecting data
from entire population becomes tedious and impractical.
time period for research purpose becomes tedious and impractical. So, sample
selection becomes essential in such researches.
pulation that spread over large geographical area
becomes impractical, the researcher has to collect data from sample.
convenient if sample is selected for research.
cts can be
made aware of the sensitivity of the same properly because they are
comparatively few in number. The researcher has to select the sample properly, so
that accurate data can be collected for research. For that, the researcher has to keep in
mind the feasibility of selecting an ideal sample. If it is impossible, he can compromise
on the accuracy of sample selection in unavoidable conditions. He can apply a certain
type of sampling method in a certain type of condition. How? We will discuss that now.
But before that we shall clarify the concept of sampling.
MEANING OF SAMPLING: The process of selecting a sample from a population is
called sampling. A method used to select a sample is called a sampling method.
The researcher can apply a suitable sampling method out of the different methods according to
the objective of research.
TYPES OF SAMPLING METHOD: Sampling methods are categorised
mainly in two groups: (i) Probability Sampling Method and (ii) Non-Probability
Sampling Method.
MEANING OF PROBABILITY SAMPLING METHOD: A sampling method in which
subjects are selected without any bias or prejudice, and in which all the units of the
population have an equal, or predetermined and known, probability of being selected in a
sample, is known as a probability sampling method. E.g. for selecting one student out
of ten, if chits with their names are prepared and one chit is drawn, all the
students will have an equal chance of being selected. The probability of each student
being selected will be 0.1, i.e. 1/10. In this way, units of the population have a certain chance
or fixed probability of being selected in a sample. The subjects are selected without any
bias or prejudice in this method. It is considered the best method of selecting
a sample because of some of its specific characteristics.
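The chit-drawing example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the student names are placeholders, and the seed is fixed purely so that the run is repeatable.

```python
import random

# Hypothetical roster of ten students (placeholder names, not from the notes).
students = [f"Student {i}" for i in range(1, 11)]


def draw_one(population, rng):
    """Draw one unit so that every unit has the same chance of selection."""
    return rng.choice(population)


rng = random.Random(42)  # fixed seed only to make the example repeatable
chosen = draw_one(students, rng)

# With equal chances, each of the ten students has probability 1/10.
probability_each = 1 / len(students)
print(chosen, probability_each)
```

Drawing without a fixed seed would behave like shuffling the chits afresh each time; the probability of 1/10 per student does not change.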
SPECIAL FEATURES OF PROBABILITY SAMPLING METHOD The characteristics
of probability sampling are as follows.
select a sample by keeping in mind the size of the sample by applying suitable
method of probability sampling.
the difference between statistics and parameter.
properly.
NON-PROBABILITY SAMPLING METHOD :This method of sample selection does
not have any scientific base, so it increases the chances of selecting biased sample.
In most of the cases, such sample does not represent all characteristics of entire
population. All units do not have certain or fixed probability to be selected in sample
in this method. That is why, this is known as non-probability sampling method. E.g.
Researcher selects one student out of ten, according to his wish or selects a student
whoever is seen first.
SPECIAL FEATURES OF NON-PROBABILITY SAMPLING METHOD: The
following are special features (Characteristics) of non-probability sampling.
population entirely. Some units of population may have more chance to be selected
in sample than others
DATA: Data are pieces of factual information that are recorded and used for
analysis. Data are a tool which helps us to understand certain problems by providing
us with information. They are a set of values of qualitative or quantitative
variables.
TYPES OF DATA:
Data are broadly classified into two types, based upon who collected the data.
Primary data: Primary data is the data collected by the investigator himself for the first
time for his own research and analysis. It is also known as first-hand information.
Primary data is collected using methods such as personal interviews, surveys etc.
Secondary data: Secondary data is data which has already been collected and
processed by some other person or agency and is used by the investigator for the purpose of his research. Journals, internal sources,
books etc. are sources of secondary data.
DATA COLLECTING TECHNIQUES
PRIMARY DATA :
1)Direct Personal Investigation:Direct personal investigation is the method in
which the investigator directly goes to the source to collect information.
Merits:
(i) Information collected in this method is more authentic and accurate
(ii) There is high degree of accuracy in qualitative information
(iii) The original opinion or data shall be obtained.
Demerits:
(i) This is a time consuming process
(ii) If the investigator is not intelligent enough to understand the mental state of
the source it may lead to wrong interpretation.
(iii) It may result in personal bias.
2) Indirect Oral Investigation : Indirect oral investigation is when the investigator
investigates a person close to the source. This is done due to the reluctance of the
original person.
Merits
(i) It saves time and labour
(ii) It is easy and convenient
(iii) It covers a wide range of area.
Demerits
(i) Information received may not be reliable
(ii) The person chosen for this purpose may not be suitable
(iii) It may be expensive as information is collected from various sources.
3) Information collected from local agencies: In this method investigator appoints
a few agencies in various regions to cover various fields of inquiry. This method is
generally used by newspaper companies to get information from various places in
various topics such as sports, economics etc..,.
Merits
(i) A wide area can be easily covered
(ii) This is a time saving method of collecting data
(iii) The cost of collecting data is less
Demerits
(i) Sometimes the information collected may contradict one another
(ii) The information can be less accurate
(iii) This method will be expensive if a full-time agent is hired in different places
4) Questionnaire method: The questionnaire method is the most popular method of
collecting primary data. A questionnaire is a set of questions devised for conducting a
survey. The questionnaire is sent to the respondent with the request to fill it in and send
it back within a specific time.
Merits
(i) This method is cheaper
(ii) The time consumed for this process is very less
(iii) This is an unbiased method of collecting data
Demerits
(i) Sometimes the respondent may provide wrong information
(ii) There is no type of personal motivation in this method
(iii) There are chances of ignorance or late reply from the respondents
General principles of framing a questionnaire:
1) The questionnaire must not be very long: We must keep the number of questions to a
minimum. A long questionnaire may lead to boredom or discontentment
among the respondents.
2) The questions must move from general to specific:
When the questions move from general to specific, respondents become more
comfortable in answering them.
3) The questions should be unambiguous:
The questions must be framed in such a way that the respondents are able to give clear and
quick answers.
4) The questions should not contain double negatives:
Phrases like "don't you" or "wouldn't you" must not be used in the questions, as they might
tempt the respondent to give a biased answer.
5) The questions should not be leading questions:
The questions should not give clues to the respondent on how they must answer.
6) The questions must not provide alternatives for the answer:
For example, instead of asking "Would you like to do engineering or medicine after
class 12?", the correct way of asking the question is "Would you like to do engineering?"
1.4.2 SECONDARY DATA
1) Published sources
Certain government and non-government organisations publish various journals,
research papers, surveys etc which are very helpful and reliable. Some of them are
mentioned below
(i) Publications of international bodies like UNO, WTO and WHO etc..,.
(ii) Publications of research institutes like ISI, NCERT, ICAR etc..,.
(iii) Government publications
(iv) Publications of commercial and financial institutions
(v) Publications of non-governmental organisations
(vi) Newspaper, journals and periodicals.
2) Unpublished sources
Unpublished sources cover all the sources where data is maintained privately by
certain private agencies or companies. The data collected by universities, research
institutions also come under unpublished sources.
PRESENTATION OF DATA
In the previous topic we saw how data can be collected. As the data collected is
generally huge, we need to condense it and deliver it in a presentable form. Generally
there are three ways of presenting data. They are
1) Textual or Descriptive Presentation
2) Tabular Presentation
3) Diagrammatic Presentation
Textual or Descriptive Presentation
When the data collected is presented in the form of a text it is called textual or
descriptive presentation. Generally this method cannot be used to present large
data.
For example, in the 2011 census, the population of India was 1,21,08,54,977,
comprising 58,64,69,174 females and 62,37,24,248 males. The literacy rate was
74.04 percent and the density of population was 382 persons per square kilometre.
From the above example, we can see that the data is represented textually. One of
the major limitations of this method is that the readers must go through the entire text
to get the required information.
Tabular Presentation of Data : When the data is presented in the form of rows and
columns it is called tabular presentation of data.
Example:
URBAN 90% 89% 89.5%
The above table represents the pass percentage of the examination conducted in
Tamil Nadu. It has three rows (urban, rural, total) and three columns (female, male,
total). It is a 3×3 table where each small box is called a cell, which gives information
regarding the pass percentage. This method is very useful as it enables us to
use the data for further statistical treatment. Tabular presentation is further classified
into four types:
(i)Qualitative Classification:Qualitative classification is when the collected
information is classified in the form of attributes such as gender, nationality etc..,.
The table given above is an example of qualitative classification where the
information is classified in the form of gender and location.
(ii) Quantitative Classification: When information can be measured quantitatively,
like age, income, marks etc., the classification is called quantitative
classification.
Example
MARKS FREQUENCY
0-10 5
10-20 10
20-30 20
30-40 15
40-50 10
Example
MONDAY 2000
TUESDAY 1750
WEDNESDAY 3000
THURSDAY 2250
FRIDAY 1550
KARNATAKA 75.36%
KERALA 93.91%
2) Frequency diagram: Data in the form of a grouped frequency distribution are
usually represented by frequency diagrams. Histogram, frequency polygon,
frequency curve and ogive are types of frequency diagram.
(i) Histogram: A histogram is a diagram which consists of rectangular bars whose area
is proportional to the frequency of a variable and whose width is equal to the class
interval.
Unit 2
MEASURES OF CENTRAL TENDENCY: When working on a given set of data, it is
not possible to remember all the values in that set. But we still need to draw inferences from the
data given to us. This problem is solved by the mean, median and mode. These measures of
central tendency represent all the values of the data. As a result, they help us to
draw an inference about, and an estimate of, all the values. They are also known as
statistical averages. Their function is to represent mathematically all the
values in a particular set of data. Hence, this representation shows the general trend
and inclination of all the values.
An average provides a simple way of representation of all the individual data. It also
aids in the comparison of different groups of data. In addition to this, an average in
economic terms can represent the direction an economy is headed towards. Hence,
it can be easily used to formulate policies and bring about a reform for a better
economy.
MEAN:
ARITHMETIC MEAN:The arithmetic mean of a series of numbers is sum of all
observations divided by the total number of observations in the series.
Example: There are two brothers with different heights. The height of the younger
brother is 138 cm and the height of the elder brother is 154 cm. The average height of
the two brothers is the total height divided into two equal parts:
(138 + 154) ÷ 2 = 292 ÷ 2 = 146 cm
So 146 cm is the average height of the brothers. Here 154 > 146
> 138. The average value lies between the minimum value and the maximum
value.
Thus if x1, x2, ..., xn represent the values of n observations, then the arithmetic mean
(A.M.) for n observations is (direct method):
$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$
There are two methods for computing the arithmetic mean: (i) Direct method (ii)
Short cut method.
Direct Method:
Example:
The following data represent the number of books issued in a college library on
7 selected days: 20, 39, 22, 25, 45, 40, 54. Find the mean number of
books.
Solution:
$\bar{x} = \frac{20 + 39 + 22 + 25 + 45 + 40 + 54}{7} = \frac{245}{7} = 35$
Hence the mean number of books is 35.
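As a quick check of the direct method, the seven values summed in the solution can be averaged in Python. This is a minimal sketch; the variable names are illustrative only.

```python
# Number of books issued on the seven selected days (values from the solution).
books = [20, 39, 22, 25, 45, 40, 54]

# Direct method: sum of all observations divided by the number of observations.
total_books = sum(books)
direct_mean = total_books / len(books)

print(total_books, direct_mean)
```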
Indirect Method:
In this method an assumed mean or an arbitrary value (A) is used as the basis for
calculating deviations (di) from the individual values. If di = xi - A, then
$\bar{x} = A + \frac{\sum_{i=1}^{n} d_i}{n}$
Example:
A student's marks in 5 subjects are 95, 78, 88, 72, 99. Find the average of his marks.
Solution:
Let us take the assumed mean, A = 88
xi      di = xi - 88
95      7
78      -10
88      0
72      -16
99      11
Total   -8
$\bar{x} = 88 + \frac{-8}{5} = 88 - 1.6 = 86.4$
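The assumed-mean calculation can be verified with a short Python sketch. It recomputes the deviations from A = 88 for the five marks in the example and confirms that the result agrees with the direct mean; the names used are illustrative.

```python
marks = [95, 78, 88, 72, 99]
assumed_mean = 88  # A, the arbitrary reference value chosen in the example

# Deviations d_i = x_i - A; their signs matter, so values below A are negative.
deviations = [x - assumed_mean for x in marks]

# Short cut formula: mean = A + (sum of deviations) / n
shortcut_mean = assumed_mean + sum(deviations) / len(marks)

# The direct method must give the same answer.
direct_mean = sum(marks) / len(marks)

print(deviations, sum(deviations))
print(f"{shortcut_mean:.1f} {direct_mean:.1f}")
```

Whatever value is chosen for A, the two methods should agree; a convenient A only makes the hand calculation lighter.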
Example:
Given the following frequency distribution, calculate the arithmetic mean
Marks 64 63 62 61 60 59
No. Of. Students 8 18 12 9 7 6
Solution:
xi      fi    fixi    di = xi - A (A = 62)   fidi
64      8     512      2                      16
63      18    1134     1                      18
62      12    744      0                      0
61      9     549     -1                     -9
60      7     420     -2                     -14
59      6     354     -3                     -18
Total   60    3713                           -7
Direct Method:
$\bar{x} = \frac{\sum f_i x_i}{N} = \frac{3713}{60} = 61.88$
Short cut method (here A = 62):
$\bar{x} = A + \frac{\sum f_i d_i}{N} = 62 - \frac{7}{60} = 61.88$
The mean mark is 61.88.
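Both routes to the mean of this frequency distribution can be reproduced in Python. The sketch below uses the marks and student counts from the example; the variable names are assumptions made for illustration.

```python
marks = [64, 63, 62, 61, 60, 59]
freqs = [8, 18, 12, 9, 7, 6]  # number of students obtaining each mark

n_students = sum(freqs)  # N, the total frequency

# Direct method: sum of f*x divided by N.
sum_fx = sum(f * x for x, f in zip(marks, freqs))
direct_mean = sum_fx / n_students

# Short cut method with assumed mean A = 62: A + sum of f*d divided by N.
assumed_mean = 62
sum_fd = sum(f * (x - assumed_mean) for x, f in zip(marks, freqs))
shortcut_mean = assumed_mean + sum_fd / n_students

print(n_students, sum_fx, sum_fd)
print(round(direct_mean, 2), round(shortcut_mean, 2))
```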
Mean of continuous Grouped data: Direct method
2 WEIGHTED ARITHMETIC MEAN
In calculating the simple mean, all the values or sizes of items in the distribution
have equal importance. But in practical life this may not be so. When some items
are more important than others, a simple average is not representative of
the distribution. Proper weightage has to be given to the various items.
For example, a student may use a weighted mean to calculate their percentage
grade in a course. Here the student would multiply the weight of each assessment
item in the course (e.g. assignments, exams, projects etc.) by the respective grade that
was obtained in each category.
The weighted mean is the average whose component items are multiplied by certain values known
as "weights", with the aggregate of the multiplied results divided by the total sum
of the weights.
Let x1, x2, ..., xn be a set of n values having weights w1, w2, ..., wn respectively;
then the weighted mean is
$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$
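Example: A student obtained the marks 40, 50, 60, 80 and 45 in math, statistics,
physics, chemistry and biology respectively. Assuming weights 5, 2, 4, 3 and 1
respectively for the above mentioned subjects, find the weighted arithmetic mean per
subject.
Solution:
Subject      Marks (x)   Weight (w)   wx
Math         40          5            200
Statistics   50          2            100
Physics      60          4            240
Chemistry    80          3            240
Biology      45          1            45
Total                    15           825
Weighted average:
$\bar{x}_w = \frac{\sum wx}{\sum w} = \frac{825}{15} = 55$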
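The weighted mean for this example can be worked through in Python as a sketch. It uses the marks and weights stated in the problem; the dictionary layout is just one convenient way to pair each subject with its values.

```python
# subject: (mark, weight), taken from the example statement
subjects = {
    "math": (40, 5),
    "statistics": (50, 2),
    "physics": (60, 4),
    "chemistry": (80, 3),
    "biology": (45, 1),
}

sum_w = sum(w for _, w in subjects.values())       # total of the weights
sum_wx = sum(x * w for x, w in subjects.values())  # total of weight * mark

weighted_mean = sum_wx / sum_w
print(sum_w, sum_wx, weighted_mean)
```

For comparison, the simple (unweighted) mean of the same five marks is also 55 only by coincidence; changing any weight would separate the two.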
Combined Mean: If the arithmetic means and the numbers of items in two or more
related groups are known, the combined or composite mean of the entire group
can be obtained by
$\bar{x}_{12} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$
The advantage of the combined arithmetic mean is that we can determine the overall
mean of the combined data without going back to the original data.
Example:
If a sample of 22 items has a mean of 15 and another sample of 18 items
has a mean of 20, find the mean of the combined sample.
Solution:
$\bar{x}_{12} = \frac{22 \times 15 + 18 \times 20}{22 + 18} = \frac{330 + 360}{40} = \frac{690}{40} = 17.25$
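A small Python sketch of the combined mean follows. The helper name combined_mean is hypothetical; it simply weights each group mean by its group size, as in the formula for two or more groups.

```python
def combined_mean(groups):
    """Combined mean of several groups given as (size, mean) pairs."""
    total_items = sum(n for n, _ in groups)
    total_sum = sum(n * m for n, m in groups)  # n * mean recovers each group's total
    return total_sum / total_items


# Two samples from the example: 22 items with mean 15, 18 items with mean 20.
result = combined_mean([(22, 15), (18, 20)])
print(result)
```

The combined mean must lie between the two group means (15 and 20), which is a useful sanity check on any hand calculation.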
Merits of AM
1. It can be calculated easily and is also easy to understand.
2. It is less affected by fluctuations of sampling than other averages.
3. It can be used for further statistical treatment, for example in computing
the combined mean.
4. It is rigidly defined and hence can be used for comparison.
Demerits of AM
1. It cannot be located graphically.
2. It is not applicable to qualitative data.
3. AM cannot be calculated if the class intervals have open ends.
4. It is highly influenced by extreme observations.
GEOMETRIC MEAN (GM): A geometric mean is a mean or average which shows
the central tendency of a set of numbers by using the product of their values.
The geometric mean of two numbers, say x and y, is the square root of their product
x × y. For three numbers, it is the cube root of their product, i.e. (xyz)^(1/3).
The geometric mean of a series containing n observations is the nth root of the
product of the values. If x1, x2, ..., xn are the observations, then
$GM = \sqrt[n]{x_1 x_2 \cdots x_n} = \text{Antilog}\left(\frac{\sum_{i=1}^{n} \log x_i}{n}\right)$
Example: Calculate the geometric mean of the following annual prices of onions per
100 kg: 180, 250, 490, 1400 and 1050.
Solution:
x       180      250      490      1400     1050     Total
log x   2.2553   2.3979   2.6902   3.1461   3.0212   13.5107
$GM = \text{Antilog}\left(\frac{13.5107}{5}\right) = \text{Antilog}(2.7021) = 503.6$
The geometric mean of the onion price is 503.6.
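The log and antilog steps of the onion-price example can be mirrored in Python with base-10 logarithms. This is a sketch using only the standard math module; it also computes the nth root of the product directly to show that both routes agree.

```python
import math

prices = [180, 250, 490, 1400, 1050]
n = len(prices)

# Log method: average the base-10 logs, then take the antilog (10 ** value).
sum_logs = sum(math.log10(x) for x in prices)
gm_from_logs = 10 ** (sum_logs / n)

# Definition: nth root of the product of the values.
gm_from_product = math.prod(prices) ** (1 / n)

print(round(sum_logs, 4))
print(round(gm_from_logs, 1), round(gm_from_product, 1))
```

Small differences from the hand-worked 503.6 can arise because log tables are rounded to four decimal places.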
Example:
Find the geometric mean for the following distribution of students' marks:
Marks             0 - 30   30 - 50   50 - 80   80 - 100
No. of students   20       30        40        10
Solution:
Marks      No. of students (f)   Mid-point (x)   f log x
0 - 30     20                    15              20 (log 15) = 20 (1.1761) = 23.5218
30 - 50    30                    40              30 (log 40) = 30 (1.6021) = 48.0618
50 - 80    40                    65              40 (log 65) = 40 (1.8129) = 72.5165
80 - 100   10                    90              10 (log 90) = 10 (1.9542) = 19.5424
Total      100                                   163.6425
$GM = \text{Antilog}\left(\frac{\sum f \log x}{N}\right) = \text{Antilog}\left(\frac{163.6425}{100}\right) = \text{Antilog}(1.6364) = 43.29$
The geometric mean of the marks is 43.29.
Merits of Geometric mean:
1. It is strictly defined
2. It is based on all items
3. It is very suitable for averaging ratio, rates and percentages
4. It is capable of further mathematical treatment
5. Unlike AM, it is not affected much by the presence of extreme values
Demerits of geometric mean:
1. It cannot be used when the values are negative or if any of the
observations is zero
2. It is difficult to calculate particularly when the items are very large or
when there is a frequency distribution
3. It brings out the property of the ratio of the change and not the absolute
difference of change as the case in arithmetic mean
4. The GM may not be the actual value of the series
3.3.3 HARMONIC MEAN
The harmonic mean of a set of observations is defined as the reciprocal of the
arithmetic average of the reciprocals of the given values.
A harmonic mean is used in averaging ratios. The most common examples
of ratios are speed and time, cost and unit of material, work and time etc. If
x1, x2, ..., xn are n observations, the harmonic mean (H.M.) of the n observations is
$HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$
Example:
Calculate the harmonic mean of the numbers 13.2, 14.2, 14.8, 15.2 and 16.1
Solution: The harmonic mean is calculated as below:
x       1/x
13.2    0.0758
14.2    0.0704
14.8    0.0676
15.2    0.0658
16.1    0.0621
Total   0.3417
$HM = \frac{n}{\sum (1/x)} = \frac{5}{0.3417} = 14.63$
H.M. Discrete Grouped data:
For a frequency distribution
H.M. = N / ∑ (f/x), where N = ∑ f
Example:
For the following frequency distribution of the ages of first year students of a
particular college, calculate the harmonic mean
Age (years)          17   18   19   20   21
Number of students   2    5    13   7    3
Solution:
Age (years) x   Number of students f   f / x
17              2                      0.1176
18              5                      0.2778
19              13                     0.6842
20              7                      0.3500
21              3                      0.1429
Total           30                     1.5725
H.M. = N / ∑ (f/x) = 30 / 1.5725 = 19.08 years
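The two harmonic mean formulas (individual values and a discrete frequency distribution) can be checked with one short function. This is a sketch only; the name `harmonic_mean` and the optional `freqs` parameter are assumptions for the illustration.

```python
def harmonic_mean(values, freqs=None):
    """H.M. = N / sum(f / x); with no frequencies every f is 1."""
    if freqs is None:
        freqs = [1] * len(values)
    if any(x == 0 for x in values):
        raise ValueError("harmonic mean is undefined when a value is zero")
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))

# Individual observations, as tabulated in the first example
print(round(harmonic_mean([13.2, 14.2, 14.8, 15.2, 16.1]), 2))

# Ages of first year students with their frequencies
ages = [17, 18, 19, 20, 21]
students = [2, 5, 13, 7, 3]
print(round(harmonic_mean(ages, students), 2))
```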
So in the above example, take the mean of 21 and 27, that is, add them and divide
the sum by 2, which will give you 24.
Example:
The salaries of 8 employees who work for a small company are listed below. What is
the median salary?
40,000; 29,000; 35,500; 31,000; 43,000; 30,000; 27,000; 32,000
Solution:
Arrange the data in ascending order
27,000; 29,000; 30,000; 31,000; 32,000; 35,500; 40,000; 43,000
Since there is an even number of items in the data set, we compute the median by
taking the mean of the two middlemost numbers
Median = (31,000 + 32,000) / 2 = 63,000 / 2 = 31,500
Example:
Find the median of the following set of points in a game: 15, 14, 10, 8, 12, 8, 16
Solution:
First arrange the values in an ascending order 8, 8, 10, 12, 14, 15, 16
The number of point values is 7, an odd number. Hence, the median is the value in
the middle position.
Median = ((n+1)/2)th term
= ((7+1)/2)th term = 4th term
The median is 12
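Both cases worked above (an even and an odd number of items) follow the same rule: sort, then take the middle item or the mean of the two middle items. A minimal Python sketch follows; the function name `median` is chosen for the illustration only.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                          # ((n+1)/2)th item, 1-based
    return (ordered[mid - 1] + ordered[mid]) / 2     # mean of the two middlemost items

salaries = [40000, 29000, 35500, 31000, 43000, 30000, 27000, 32000]
points = [15, 14, 10, 8, 12, 8, 16]
print(median(salaries), median(points))
```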
Grouped data: In a grouped distribution, values are associated with frequencies.
Grouping can be in the form of a discrete frequency distribution or a continuous
frequency distribution. Whatever may be the distribution, cumulative frequencies
have to be calculated to know the total number of items.
Cumulative frequency (cf): Cumulative frequency of each class is the sum of the
frequency of the class and the frequencies of the previous classes, i.e. adding the
frequencies successively, so that the last cumulative frequency gives the total
number of items.
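When the data follows a discrete set of values grouped by size, we use the formula
((N+1)/2)th item for finding the median. First we form a cumulative
frequency distribution, and the median is that value which corresponds to
the cumulative frequency containing the ((N+1)/2)th item.
Example:
No of Students       1    2    3    4    5    6    7
Number of Branches   2    11   15   20   25   18   10
Solution:
No of Students x   Number of Branches f   Cumulative frequency cf
1                  2                      2
2                  11                     13
3                  15                     28
4                  20                     48
5                  25                     73
6                  18                     91
7                  10                     101
Total              101
Median = size of ((N+1)/2)th item
= size of ((101+1)/2)th item
= 51st item
Median = 5 because the 51st item corresponds to 5
Median for continuous grouped data
In case the data is given in the form of a frequency table with class intervals, then
the following formula is used for calculating the median in continuous grouped data
Median = l + ((N/2 - m) / f) x c
where l = lower limit of the median class
m = cumulative frequency preceding the median class
f = frequency of the median class
c = width of the median class
Example:
Class interval   0-4   5-9   10-14   15-19   20-24   25-29   30-34   35-39
Frequency        5     8     10      12      7       6       3       2
Solution:
Class interval   Frequency f   True class interval   Cumulative frequency
0-4              5             -0.5 - 4.5            5
5-9              8             4.5 - 9.5             13
10-14            10            9.5 - 14.5            23
15-19            12            14.5 - 19.5           35
20-24            7             19.5 - 24.5           42
25-29            6             24.5 - 29.5           48
30-34            3             29.5 - 34.5           51
35-39            2             34.5 - 39.5           53
Total            53
N/2 = 53/2 = 26.5
The first cumulative frequency greater than or equal to 26.5 is 35, so the median
class is 14.5 - 19.5
l = 14.5
N/2 = 26.5
m = 23
f = 12
c = 5
Median = 14.5 + ((26.5 - 23) / 12) x 5 = 14.5 + 1.46 = 15.96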
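The interpolation step for continuous grouped data can be sketched as follows. The function `grouped_median` and its parameters are hypothetical names for this illustration; it assumes equal class widths and takes the true lower class boundaries (14.5 rather than 15) as input.

```python
def grouped_median(lower_bounds, width, freqs):
    """Median = l + ((N/2 - m) / f) * c for a continuous frequency table."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty distribution")
    half = n / 2
    cumulative = 0                      # m: cumulative frequency before the current class
    for lower, f in zip(lower_bounds, freqs):
        if cumulative + f >= half:      # first class whose cf reaches N/2 is the median class
            return lower + (half - cumulative) / f * width
        cumulative += f

# True lower boundaries of the classes 0-4, 5-9, ..., 35-39
bounds = [-0.5, 4.5, 9.5, 14.5, 19.5, 24.5, 29.5, 34.5]
freqs = [5, 8, 10, 12, 7, 6, 3, 2]
print(round(grouped_median(bounds, 5, freqs), 2))
```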
Merits of Median:
1. Median is not influenced by extreme values because it is a positional average.
2. Median can be calculated in case of distribution with open end intervals.
3. Median can be located even if the data are incomplete.
4. Median can be located even for qualitative factors such as ability, honesty etc.
Demerits of Median:
1. A slight change in the series may bring drastic change in median value
2. In case of even number of items or continuous series, median is an estimated
value other than any value in the series.
3. It is not suitable for further mathematical treatment except its
use in mean deviation.
4. It does not take all the observations into account.
MODE: The mode is the most frequently occurring value or score. The mode is
useful when there are a lot of repeated values. There can be no mode, one mode, or
multiple modes.
Its importance is very great in marketing studies where a manager is interested in
knowing about the size which has the highest concentration of items. For example,
in placing an order for shoes or ready-made garments the modal size helps
because that size and the sizes around it are in common demand.
Ungrouped Data:
For ungrouped values or a series of individual observation mode is often found by
mere inspection
Example:
Find the mode for the following list of values:
13,18,13,14,13,16,14,21,13
Solution:
The mode is the number that is repeated more often than any other. Therefore
Mode = 13
In some cases the mode may be absent while in some cases there may be more
than one mode.
Example: Ms. Rossy asked the students in her class how many siblings they each have.
Find the mode of the data: 0,0,0,1,1,1,1,2,2,2,2,3,3,4
Solution: The modes are 1 and 2 siblings
Grouped Data: For a discrete distribution, the value of X corresponding to the
highest frequency is the mode.
Continuous distribution:
Mode = L + ((f1 - f0) / (2f1 - f0 - f2)) x h
Where L is the lower class limit of the modal class
f1 is the frequency of the modal class
f0 is the frequency of the class preceding the modal class in the frequency table
f2 is the frequency of the class succeeding the modal class in the frequency table
h is the class interval of the modal class
Example: Calculate mode for the following
C-I   0-50   50-100   100-150   150-200   200-250   250-300   300-350   350-400   400 and above
f     5      14       40        91        150       87        60        38        15
Solution: The highest frequency is 150 and the corresponding class interval is 200 – 250,
which is the modal class
Here L = 200, f1 = 150, f0 = 91, f2 = 87, h = 50
Mode = 200 + ((150 - 91) / (2 x 150 - 91 - 87)) x 50
= 200 + (59 / 122) x 50 = 200 + 24.18 = 224.18
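The modal-class formula can be sketched in Python as below. The function name `grouped_mode` is an assumption for this illustration; it treats a missing neighbouring class as having frequency 0 and picks the first class with the highest frequency.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * h for the modal class."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])   # index of the modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                     # class before the modal class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0        # class after the modal class
    # Note: this raises ZeroDivisionError when f1 = f0 = f2 (no distinct peak).
    return lower_limits[i] + (f1 - f0) / (2 * f1 - f0 - f2) * width

limits = [0, 50, 100, 150, 200, 250, 300, 350, 400]
freqs = [5, 14, 40, 91, 150, 87, 60, 38, 15]
print(round(grouped_mode(limits, 50, freqs), 2))
```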
Merits of mode:
1. It is easy to calculate and in some cases it can be located by mere inspection.
2. Mode is not at all affected by extreme values
3. It can be calculated for open-end classes
4. It is usually an actual value of an important part of the series
5. In some circumstances it is the best representative of data
Demerits of mode:
1. It is not based on all observation
2. It is not capable of further mathematical treatment
3. Mode is ill-defined; in some cases it is not possible to find a mode.
4. As compared with mean, mode is affected to a great extent by sampling
fluctuations
5. It is unsuitable in cases where relative importance of items has to be considered.
PARTITION MEASURES
QUARTILES
The quartiles divide the distribution into four equal parts. There are three
quartiles, denoted by Q1, Q2 and Q3.
That is, 25% of the data will lie below Q1, 50% of the data below Q2
and 75% below Q3. Here Q2 is the median. Quartiles
are obtained in almost the same way as the median.
Ungrouped Data:
If the data set consists of n items arranged in ascending order,
then
Q1 = value of ((n+1)/4)th item
Q2 = value of ((n+1)/2)th item
Q3 = value of (3(n+1)/4)th item
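Continuous series:
In the case of continuous series, find the cumulative frequency and then use the
interpolation formula.
• Find Cumulative frequencies
• Find N / 4
• Q1 class is the class interval corresponding to the value of the cumulative
frequency just greater than N / 4
• Q3 class is the class interval corresponding to the value of the cumulative
frequency just greater than 3 N / 4
Q1 = l1 + ((N/4 - m1) / f1) x c1
Q3 = l3 + ((3N/4 - m3) / f3) x c3
where
l1 = lower limit of the 1st quartile class
f1 = frequency of the 1st quartile class
m1 = cumulative frequency preceding the 1st quartile class
c1 = width of the 1st quartile class
l3 = lower limit of the 3rd quartile class
f3 = frequency of the 3rd quartile class
m3 = cumulative frequency preceding the 3rd quartile class
c3 = width of the third quartile class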
DECILES: These are the values which divide the total number of observations into 10
equal parts. They are D1, D2, D3, D4, D5, D6, D7, D8 and D9.
Ungrouped Data:
Example:
Compute the D7 for the data: 5, 24, 36, 12, 20, and 8.
Solution:
Arranging the given data in the ascending order 5,8,12,20,24,36
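A possible way to finish this kind of calculation is sketched below. It assumes the (n+1)-based positional rule that these notes use for quartiles of ungrouped data, so that the k-th decile is the value of the (k(n+1)/10)th item with linear interpolation between neighbouring items; other textbooks use slightly different conventions. The function name `partition_value` is hypothetical.

```python
def partition_value(values, k, parts):
    """Value of the (k * (n + 1) / parts)-th item of the sorted data (1-based)."""
    ordered = sorted(values)
    n = len(ordered)
    position = k * (n + 1) / parts       # may be fractional, e.g. 4.9
    whole = int(position)
    frac = position - whole
    if whole < 1:
        return ordered[0]                # position falls before the first item
    if whole >= n:
        return ordered[-1]               # position falls at or beyond the last item
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + frac * (upper - lower)

data = [5, 24, 36, 12, 20, 8]
# D7: position 7 * (6 + 1) / 10 = 4.9, i.e. 4th item + 0.9 * (5th item - 4th item)
print(round(partition_value(data, 7, 10), 2))
```

With `parts=4` the same function gives quartiles, and with `parts=100` percentiles, under the same assumed convention.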
PERCENTILES: The percentile values divide the distribution into 100 parts each
containing 1 percent of the cases. The percentile (Pk) is that value of the variable
up to which lie exactly k% of the total number of observations. Relationship:
P25 = Q1
P50 = Median = Q2
P75 = 3rd quartile = Q3
1st series   2nd series   3rd series
10           2            10
10           8            12
10           20           8
∑X = 30      30           30
In all three series, the value of arithmetic mean is 10. On the basis of this average,
we can say that the series are alike. If we carefully examine the composition of three
series, we find the following differences:
(i) In case of 1st series, three items are equal; but in 2nd and 3rd series, the
items are unequal and do not follow any specific order.
(ii) The magnitude of deviation, item-wise, is different for the 1st, 2nd and 3rd
series. But all these deviations cannot be ascertained if the value of simple mean is
taken into consideration.
(iii) In these three series, it is quite possible that the value of arithmetic mean is
10; but the value of median may differ from each other. This can be understood as
follows;
1st series   2nd series   3rd series
10           2            8
10           8            10
10           20           12
∑X = 30      30           30
The value of the median in the 1st series is 10, in the 2nd series 8 and in the 3rd series 10.
Therefore, the values of the mean and the median are not identical in every series.
(iv) As the average remains the same, the nature and extent of the distribution of
the size of the items may vary. In other words, the structure of the frequency
distributions may differ even though their means are identical.
PROPERTIES OF A GOOD MEASURE OF DISPERSION
There are certain pre-requisites for a good measure of dispersion:
1. It should be simple to understand.
2. It should be easy to compute.
3. It should be rigidly defined.
4. It should be based on each individual item of the distribution.
5. It should be capable of further algebraic treatment.
CHARACTERISTICS OF MEASURES OF DISPERSION
• A measure of dispersion should be rigidly defined
• It must be easy to calculate and understand
• Not affected much by the fluctuations of observations
• Based on all observations
CLASSIFICATION OF MEASURES OF DISPERSION
The measure of dispersion is categorized as:
(i)An absolute measure of dispersion:It involves the units of measurements of the
observations. For example, (i) the dispersion of salary of employees is expressed in
rupees, and (ii) the variation of time required for workers is expressed in hours. Such
measures are not suitable for comparing the variability of the two data sets which are
expressed in different units of measurements
(ii)A relative measure of dispersion:It is a pure number independent of the units of
measurements. This measure is useful especially when the data sets are measured
in different units of measurement. For example, a nutritionist would like to compare
the obesity of school children in India and Africa. He collects data from some of the
schools in these two countries. The weight is normally measured in kilograms in
India and in pounds in Africa. It will be meaningless, if we compare the obesity of
students using absolute measures. So it is sensible to compare them in relative
measures.
RANGE:
Raw Data:A range is the most common and easily understandable measure of
dispersion. It is the difference between the largest and smallest observations in the
data set
Range ( R ) = L - S
Grouped Data: For a grouped frequency distribution of values in the data set, the
range is the difference between the upper class limit of the last class interval and the
lower class limit of the first class interval.
Coefficient of Range: The relative measure of range is called the coefficient of
range
Coefficient of range = (L-S) / (L + S)
Example: Find the value of range and its coefficient for the following data 49, 81, 36,
64, 121, 100.
Solution:
L = 121 : S = 36
Range : L – S = 121 – 36 = 85
Co-efficient of Range = (L-S) / (L+S) = (121 - 36) / (121 + 36)
= 85 / 157 = 0.5414
Example:
Calculate range and its coefficient from the following distribution
Solution: L = 30, S = 10
Range = L - S = 30 – 10 = 20
Coefficient of Range = (L-S) / (L+S) = (30 - 10) / (30 + 10)
= 20/ 40= 0.5
Merits of Range
• It is the simplest of the measure of dispersion
• Easy to calculate
• Easy to understand
• Independent of change of origin
Demerits of Range
• It is based on two extreme observations. Hence, it is affected by fluctuations
• A range is not a reliable measure of dispersion
• Dependent on change of scale
QUARTILE DEVIATION
The quartiles divide a data set into quarters. The first quartile, (Q1) is the middle
number between the smallest number and the median of the data. The second
quartile, (Q2) is the median of the data set. The third quartile, (Q3) is the middle
number between the median and the largest number. Quartile deviation is half of the
difference between the first and third quartiles. Hence it is called as Semi Inter
Quartile Range
Quartile deviation or semi-inter-quartile deviation is
Q = ½ × (Q3 – Q1)
Coefficient of Quartile Deviation
Coefficient of Q.D = (Q3 – Q1) / (Q3 + Q1)
Merits of Quartile Deviation
•All the drawbacks of Range are overcome by quartile deviation
•It uses half of the data
•Independent of change of origin
•The best measure of dispersion for open-end classification
Demerits of Quartile Deviation
•It ignores fifty percent of the data
•Dependent on change of scale
•Not a reliable measure of dispersion
Example:
Calculate the quartile deviation and its coefficient for the wheat production (in
Kg) of 20 acres, given as: 1120, 1240, 1320, 1040,
1080, 1200, 1440, 1360, 1680, 1730, 1785, 1342, 1960, 1880, 1755,
1720, 1600, 1470, 1750 and 1885.
Solution: Arrange the observation in increasing order:
1040, 1080, 1120, 1200, 1240, 1320, 1342, 1360, 1440, 1470,
1600, 1680, 1720, 1730, 1750, 1755, 1785, 1880, 1885, 1960.
Q1 = value of (n+1) / 4 th item
= value of (20 +1) / 4 th item = value of (5.25)th item
= 5th item + 0.25 ( 6th item – 5th item)
= 1240 + 0.25 (1320 – 1240)
= 1240 + 20 = 1260
Q1 = 1260
Q3 = value of 3(n+1) / 4 th item
= value of 3(20 +1) / 4 th item = value of (15.75)th item
= 15th item + 0.75 ( 16th item –15th item)
= 1750 + 0.75 (1755 – 1750)
= 1750 + 3.75 = 1753.75
Q3 = 1753.75
Q.D = ( Q3 – Q1 ) / 2 = (1753.75 – 1260) / 2 = 493.75 / 2
= 246.875
Coefficient of QD = (Q3 – Q1) / ( Q3 + Q1 )
= (1753.75 – 1260) / (1753.75 + 1260)
= 0.164
MEAN DEVIATION: The mean deviation, also called the average deviation, is defined
as the sum of the absolute deviations from an average divided by the number of items
in a distribution. The average can be mean, median or mode. Theoretically the median
is the best average of choice because the sum of deviations from the median is
minimum, provided signs are ignored. However, practically speaking, the arithmetic
mean is the most commonly used average for calculating mean deviation, which is
denoted by the symbol MD.
Mean deviation is calculated for three types of series:
•Individual Data Series
•Discrete Data Series
•Continuous Data Series
Individual Data Series: For individual series, the Mean Deviation can be calculated
using the following formula
MD = (1/N) ∑|X - A| = ∑|D| / N
Where
MD = Mean deviation
X = Variable values
A = Average of choice
N = Number of observations
Coefficient of Mean Deviation:
Mean deviation calculated by any measure of central tendency is an absolute
measure. For the purpose of comparing variation among different series, a relative
mean deviation is required. The relative mean deviation is obtained by dividing the
mean deviation by the average used for calculating the mean deviation
The Coefficient of Mean Deviation can be calculated using
Coefficient of MD = MD / A
Example:
Calculate mean deviation and coefficient of mean deviation for the following
individual data: 28, 72, 90, 140, 210
Solution:
Mean = (28 + 72 + 90 + 140 + 210) / 5 = 540 / 5 = 108
X      |D| = |X - 108|
28     80
72     36
90     18
140    32
210    102
∑|D| = 268
MD = ∑|D| / N = 268 / 5 = 53.6
Coefficient of MD = MD / Mean = 53.6 / 108 = 0.4963
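The individual-series calculation can be sketched in a few lines of Python. The function name `mean_deviation` and its optional `average` argument are assumptions for the illustration; by default the arithmetic mean is used as the average of choice, as in the worked example.

```python
def mean_deviation(values, average=None):
    """Return (MD, coefficient of MD) about the chosen average (default: mean)."""
    n = len(values)
    if average is None:
        average = sum(values) / n
    md = sum(abs(x - average) for x in values) / n   # sum of |D| divided by N
    return md, md / average

md, coeff = mean_deviation([28, 72, 90, 140, 210])
print(round(md, 2), round(coeff, 4))
```

Passing the median or mode as `average` gives the mean deviation about that average instead.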
Discrete Data Series
For discrete series, the Mean Deviation can be calculated using
MD = ∑ f |x - Me| / N, where Me is the median and N = ∑ f
Coefficient of MD = MD / Me
Example: Calculate the mean deviation for the following discrete data
Frequency 6 15 3 3 9
Solution
Xi     Frequency fi   fixi   |xi – Me|   fi |xi – Me|
42     6              252    93          558
135    3              405    0           0
150    3              450    15          45
MD = ∑ fi |xi – Me| / N = 1683 / 36 = 46.75
Coefficient of MD = MD / Me
Continuous Data Series: The method of calculating mean deviation in a continuous
series is the same as for a discrete series. In a continuous series, find the mid-point
of the various classes and take the deviations of these points from the average selected
Find out the mean deviation from the given data
Age in years    0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80
No of persons   40     50      64      80      82      70      20      16
Solution:
Items   Mid-point xi   Frequency fi   fixi    |xi – Mean|   fi |xi – Mean|
0-10    5              40             200     31.47         1258.80
10-20   15             50             750     21.47         1073.50
20-30   25             64             1600    11.47         734.08
30-40   35             80             2800    1.47          117.60
40-50   45             82             3690    8.53          699.46
50-60   55             70             3850    18.53         1297.10
60-70   65             20             1300    28.53         570.60
70-80   75             16             1200    38.53         616.48
Total                  N = 422        ∑ fixi = 15390        ∑ fi |xi – Mean| = 6367.62
Mean = ∑ fixi / N = 15390 / 422 = 36.47
MD = ∑ fi |xi – Mean| / N = 6367.62 / 422 = 15.09
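For a continuous series the same idea applies to class mid-points weighted by frequency. The sketch below takes the deviations from the arithmetic mean of the grouped data (the value 15390 / 422 = 36.47 used above); the function name `grouped_mean_deviation` is an assumption for the illustration.

```python
def grouped_mean_deviation(midpoints, freqs):
    """Return (mean, MD about the mean) for mid-points weighted by frequency."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    md = sum(f * abs(x - mean) for x, f in zip(midpoints, freqs)) / n
    return mean, md

midpoints = [5, 15, 25, 35, 45, 55, 65, 75]        # mid-points of 0-10, ..., 70-80
persons = [40, 50, 64, 80, 82, 70, 20, 16]
mean, md = grouped_mean_deviation(midpoints, persons)
print(round(mean, 2), round(md, 2))
```

For this age distribution the sketch gives a mean of about 36.47 and a mean deviation of about 15.09.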
Example:
Calculate the standard deviation from the following data 28, 44, 18, 30, 40, 34, 24,
22.
Solution:
Deviations from actual mean
Values (X)   X - X̄   (X - X̄)²
28           -2       4
44           14       196
18           -12      144
30           0        0
40           10       100
34           4        16
24           -6       36
22           -8       64
240                   560
X̄ = 240 / 8 = 30
σ = √(∑(X - X̄)² / N) = √(560 / 8) = √70 = 8.37
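The actual-mean method for individual values can be checked with a short sketch. The function name `std_dev` is chosen for the illustration; the result is compared with `statistics.pstdev` from the Python standard library, which also divides by N (population standard deviation).

```python
import math
import statistics

def std_dev(values):
    """Population standard deviation: sqrt(sum((x - mean)^2) / N)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

data = [28, 44, 18, 30, 40, 34, 24, 22]
print(round(std_dev(data), 2))                 # square root of 560 / 8 = 70
print(round(statistics.pstdev(data), 2))       # library cross-check
```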
Students No:   1    2    3    4    5    6    7    8    9    10
Marks          53   58   75   67   32   70   35   68   88   69
Solution: Deviations from assumed mean (A = 67)
Student No.   Marks X   d = X - 67   d²
1             53        -14          196
2             58        -9           81
3             75        8            64
4             67        0            0
5             32        -35          1225
6             70        3            9
7             35        -32          1024
8             68        1            1
9             88        21           441
10            69        2            4
n = 10                  ∑d = -55     ∑d² = 3045
σ = √(∑d²/n - (∑d/n)²)
= √(3045/10 - (-55/10)²)
= √(304.5 - 30.25) = √274.25
= 16.5605
CALCULATION OF STANDARD DEVIATION:
Discrete series: There are three methods for calculating standard deviation in
discrete series. They are
a) Actual mean method
b) Assumed mean method
c) Step deviation method
Actual mean method: Calculate the mean of the series. Find the deviations of the
various items from the mean, square the deviations, multiply them by the
respective frequencies and total the products. The formula for the actual mean
method is
σ = √(∑fd² / ∑f)
If the actual mean is a fraction, the calculation takes a lot of time and labour; and as
such this method is rarely used in practice
Assumed mean method: Here deviations are taken not from the actual mean but from
an assumed mean. This method is also used if the given variable values are not in
equal intervals.
σ = √(∑fd²/N - (∑fd/N)²) where d = X – A, N = Ʃf
Example:
Calculate standard deviation from the following data
X 20 22 25 31 35 40 42 45
f 5 12 15 20 25 14 10 6
Solution:
Deviation from assumed mean (A = 31)
x     f     d = X-A   d²     fd      fd²
20    5     -11       121    -55     605
22    12    -9        81     -108    972
25    15    -6        36     -90     540
31    20    0         0      0       0
35    25    4         16     100     400
40    14    9         81     126     1134
42    10    11        121    110     1210
45    6     14        196    84      1176
N = 107                      ∑fd = 167   ∑fd² = 6037
σ = √(∑fd²/N - (∑fd/N)²) = √(6037/107 - (167/107)²) = √(56.42 - 2.44) = √53.98 = 7.35
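The assumed mean method for a discrete series can be sketched as below. The function name `sd_assumed_mean` is hypothetical. In the last row of this example f·d² is 6 × 14² = 1176, so the column totals are ∑fd = 167 and ∑fd² = 6037. The choice of assumed mean does not change the result, which the second call illustrates.

```python
import math

def sd_assumed_mean(xs, fs, assumed):
    """sigma = sqrt(sum(f*d^2)/N - (sum(f*d)/N)^2), with d = x - assumed."""
    n = sum(fs)
    sum_fd = sum(f * (x - assumed) for x, f in zip(xs, fs))
    sum_fd2 = sum(f * (x - assumed) ** 2 for x, f in zip(xs, fs))
    return math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)

xs = [20, 22, 25, 31, 35, 40, 42, 45]
fs = [5, 12, 15, 20, 25, 14, 10, 6]
print(round(sd_assumed_mean(xs, fs, 31), 2))
print(round(sd_assumed_mean(xs, fs, 35), 2))   # a different assumed mean, same sigma
```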
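Step – deviation method: If the variable values are in equal intervals, then we adopt
this method
σ = √(∑fd'²/N - (∑fd'/N)²) x C where d' = (X – A) / C, C = common interval, N = Ʃf
Example:
Marks            30   40   50   60   70   80   90
No of students   8    12   20   10   7    3    2
Solution: (A = 50, C = 10)
Marks x   f     d' = (x - 50)/10   fd'    fd'²
30        8     -2                 -16    32
40        12    -1                 -12    12
50        20    0                  0      0
60        10    1                  10     10
70        7     2                  14     28
80        3     3                  9      27
90        2     4                  8      32
N = 62                             ∑fd' = 13   ∑fd'² = 141
σ = √(141/62 - (13/62)²) x 10 = √(2.2742 - 0.0440) x 10 = 1.4934 x 10 = 14.93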
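The step-deviation table above can be reproduced with a short sketch. It assumes the usual form of the method, where deviations from an assumed mean A are divided by the common interval C and the result is multiplied back by C; the names `sd_step_deviation`, `assumed` and `step` are chosen for the illustration.

```python
import math

def sd_step_deviation(xs, fs, assumed, step):
    """sigma = sqrt(sum(f*d'^2)/N - (sum(f*d')/N)^2) * C, with d' = (x - A) / C."""
    n = sum(fs)
    d = [(x - assumed) / step for x in xs]          # step deviations d'
    sum_fd = sum(f * di for f, di in zip(fs, d))
    sum_fd2 = sum(f * di ** 2 for f, di in zip(fs, d))
    return math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * step

marks = [30, 40, 50, 60, 70, 80, 90]
students = [8, 12, 20, 10, 7, 3, 2]
print(round(sd_step_deviation(marks, students, assumed=50, step=10), 2))
```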
Example: Particulars regarding the income of two companies are given below:
Company
A B
d1 = x̄12 - x̄1 = 1613.6363 - 1500 = 113.6363
d2 = x̄12 - x̄2 = 1613.6363 – 1750 = -136.3637
different scoring mechanisms. If sample A has a CV of 12% and sample B has a CV
of 25%, you would say that sample B has more variation, relative to its mean.
Example:
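Price of car in five years in two cities is given below:
City A       City B
20,00,000    10,00,000
22,00,000    20,00,000
19,00,000    18,00,000
23,00,000    12,00,000
16,00,000    15,00,000
Which city has more stable prices?
Solution: (prices in lakhs)
City A                            City B
X     dx = X - X̄    dx²           Y     dy = Y - Ȳ    dy²
20    0             0             10    -5            25
22    2             4             20    5             25
19    -1            1             18    3             9
23    3             9             12    -3            9
16    -4            16            15    0             0
∑X = 100            ∑dx² = 30     ∑Y = 75             ∑dy² = 68
City A: X̄ = ∑X / n = 100 / 5 = 20
σx = √(∑(X - X̄)² / n) = √(∑dx² / n)
= √(30/5) = 2.45
C.V. (X) = (σx / X̄) x 100
= (2.45 / 20) x 100 = 12.25%
City B: Ȳ = ∑Y / n = 75 / 5 = 15
σy = √(∑(Y - Ȳ)² / n)
= √(68/5) = 3.69
C.V. (Y) = (σy / Ȳ) x 100
= (3.69 / 15) x 100 = 24.6%
City A has the lower coefficient of variation, so its prices are more stable.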
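The comparison of the two cities can be sketched as follows, with prices expressed in lakhs. The function name `coefficient_of_variation` is an assumption for the illustration. Because the sketch does not round the standard deviation before dividing, its percentages can differ slightly in the second decimal place from the hand calculation.

```python
import math

def coefficient_of_variation(values):
    """C.V. = (population standard deviation / mean) * 100."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sd / mean * 100

city_a = [20, 22, 19, 23, 16]      # prices in lakhs
city_b = [10, 20, 18, 12, 15]
cv_a = coefficient_of_variation(city_a)
cv_b = coefficient_of_variation(city_b)
print(round(cv_a, 2), round(cv_b, 2))
print("City A" if cv_a < cv_b else "City B", "has the more stable prices")
```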
Difference between Variance and Skewness:The following two points of difference
between variance and skewness should be carefully noted.
1.Variance tells us about the amount of variability while skewness gives the direction
of variability.
2.In business and economic series, measures of variation have greater practical
application than measures of skewness. However, in medical and life science field
measures of skewness have greater practical applications than the variance.
VARIOUS MEASURES OF SKEWNESS: Measures of skewness help us to know to
what degree and in which direction (positive or negative) the frequency distribution
departs from symmetry. Although positive or negative skewness can be
detected graphically depending on whether the right tail or the left tail is longer,
we do not get an idea of the magnitude. Besides, borderline cases between symmetry
and asymmetry may be difficult to detect graphically. Hence some statistical
measures are required to find the magnitude of lack of symmetry. A good measure of
skewness should possess three criteria:
1.It should be a unit free number so that the shapes of different distributions, so far
as symmetry is concerned, can be compared even if the unit of the underlying
variables are different;
2.If the distribution is symmetric, the value of the measure should be zero. Similarly,
the measure should give positive or negative values according as the distribution has
positive or negative skewness respectively; and
3.As we move from extreme negative skewness to extreme positive skewness, the
value of the measure should vary accordingly.
1. β and γ Coefficients of Skewness:
Karl Pearson defined the following β and γ coefficients of skewness, based upon
the second and third central moments:
β1 = μ3² / μ2³
γ1 = √β1 = μ3 / μ2^(3/2)
Since β1 is never negative, the sign of skewness depends upon whether the value of μ3
is positive or negative. It is advisable to use γ1 as the measure of skewness
2. Karl Pearson's Coefficient of Skewness:
Sk = (Mean - Mode) / σ
Sk = 3(Mean – Median) / σ
3. Bowley's Coefficient of Skewness: This method is based on quartiles. The formula
for calculating the coefficient of skewness is given by
Sk = (Q3 + Q1 - 2Q2) / (Q3 - Q1)
Based on Percentiles
Sk = ((P90 - P50) - (P50 - P10)) / (P90 - P10)
Based on Deciles
Sk = (D9 - 2D5 + D1) / (D9 - D1)
Sk = (Mean - Mode) / σ
0.64 = (59.2 – Mode) / 13
Mode = 59.2 - (0.64 x 13) = 59.2 - 8.32 = 50.88
CONCEPT OF KURTOSIS
If we have the knowledge of the measures of central tendency, dispersion and
skewness, even then we cannot get a complete idea of a distribution. In addition
to these measures, we need to know another measure to get the complete idea
about the shape of the distribution which can be studied with the help of Kurtosis.
Prof. Karl Pearson has called it the "Convexity of a Curve". Kurtosis gives a
measure of flatness of distribution.
Measures of Kurtosis
1. Karl Pearson's Measures of Kurtosis: For calculating the kurtosis, the second
and fourth central moments of the variable are used. For this, the following formula
given by Karl Pearson is used:
β2 = μ4 / μ2²
Or γ2 = β2 - 3
Description:
2 =
75 P25 / 90 P10
=
0.72 = 0.031
2.53
Kurtosis, β2 = μ4 / μ2² = 18.75 / (2.5)²
β2 = 18.75 / 6.25 = 3.
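The moment-based coefficients of skewness and kurtosis can be computed directly from raw data. The sketch below assumes the standard definitions β1 = μ3²/μ2³, γ1 = μ3/μ2^(3/2), β2 = μ4/μ2² and γ2 = β2 - 3, with central moments taken about the mean and divided by n; the function names and the sample data are chosen only for this illustration.

```python
def central_moment(values, order):
    """r-th central moment: mean of (x - mean)^r."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** order for x in values) / n

def moment_coefficients(values):
    """Return (beta1, gamma1, beta2, gamma2); needs non-constant data (mu2 > 0)."""
    mu2 = central_moment(values, 2)
    mu3 = central_moment(values, 3)
    mu4 = central_moment(values, 4)
    beta1 = mu3 ** 2 / mu2 ** 3
    gamma1 = mu3 / mu2 ** 1.5        # keeps the sign of mu3
    beta2 = mu4 / mu2 ** 2
    gamma2 = beta2 - 3               # excess over the value 3
    return beta1, gamma1, beta2, gamma2

# Hypothetical symmetric data: the third central moment is zero
print(moment_coefficients([1, 2, 3, 4, 5]))
```

With μ4 = 18.75 and μ2 = 2.5, the same ratio gives β2 = 18.75 / 6.25 = 3 and γ2 = 0.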
Unit 4
X 5 10 15 20 25
Positive correlation refers to the movement of variables in the same
direction. If both the variables increase or decrease together, it is
called positive correlation. It is otherwise called direct correlation. For example, a
positive correlation exists between ages of husband and wife, height and weight of
a group of individuals, increase in rainfall and production of paddy, increase in the
offer and sales.
Positive correlation
X 5 7 9 11 16 20 28
y 20 26 35 37 48 50 55
Negative Correlation
X 14 17 23 35 46
y 16 12 10 9 5
Simple correlation is a measure used to determine the strength and the direction of
the relationship between two variables, X and Y. A simple correlation coefficient can
range from –1 to 1. However, maximum (or minimum) values of some simple
correlations cannot reach unity (i.e., 1 or –1).
The study of two variables excluding some other variables is called partial
correlation. For example, we study price and demand, eliminating the supply side.
If the ratio of change between two variables is uniform, then there will be linear
correlation between them. Consider the following.
X 6 12 18 24
Y 5 10 15 20
In a curvilinear or non linear correlation, the amount of change in one variable does
not bear a constant ratio of the amount of change in the other variables. The graph of
non-linear or curvilinear relationship will form a curve.
TWO-WAY TABLE :A two-way table (also called a contingency table) is a useful
tool for examining relationships between categorical variables; the entries in the
cells of two-way table can be frequency counts or relative frequencies (just like a
one-way table).
Men 2 10 8 20
Women 16 6 8 30
Total 18 16 16 50
The two-way table above shows the favourite leisure activities of 50 adults: 20 men
and 30 women. Because the entries in the table are frequency counts, the table is a
frequency table
Pearson's correlation coefficient is the test statistic that measures the statistical
relationship, or association, between two continuous variables. It is known as the
best method of measuring the association between variables of interest because it is
based on the method of covariance
r = ∑xy / √(∑x² ∑y²), where x = X - X̄ and y = Y - Ȳ
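The deviation form of Pearson's formula can be sketched in Python as below. The function name `pearson_r` is chosen for the illustration; the data are the positive, negative and linear correlation tables given earlier in these notes.

```python
import math

def pearson_r(xs, ys):
    """r = sum(x*y) / sqrt(sum(x^2) * sum(y^2)), with x, y as deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sum_x2 = sum((a - mean_x) ** 2 for a in xs)
    sum_y2 = sum((b - mean_y) ** 2 for b in ys)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

print(pearson_r([5, 7, 9, 11, 16, 20, 28], [20, 26, 35, 37, 48, 50, 55]))   # positive
print(pearson_r([14, 17, 23, 35, 46], [16, 12, 10, 9, 5]))                  # negative
print(pearson_r([6, 12, 18, 24], [5, 10, 15, 20]))                          # perfectly linear
```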
Example:
Sales 15 18 22 28 32 46 52
X    x = X - X̄    x²    Y    y = Y - Ȳ    y²    xy
X̄ = ∑X / N = 213 / 7 = 30.43
Ȳ = ∑Y / N = 645 / 7 = 92.14
r = ∑xy / √(∑x² ∑y²)
= 2647.57 / √(1179.68 x 6,002.86)
= 2647.57 / (34.35 x 77.48) = 2647.57 / 2661.44
= 0.9948
Therefore, there is a high degree positive correlation between the x and y.
The Spearman correlation between two variables is equal to the Pearson correlation
between the rank values of those two variables; while Pearson's correlation
assesses linear relationships, Spearman's correlation assesses monotonic
relationships (whether linear or not). If there are no repeated data values, a perfect
Spearman correlation of +1 or −1 occurs when each of the variables is a perfect
monotone function of the other
r = 1 - 6∑D² / (n(n² - 1))
Candidate 1 2 3 4 5 6 7 8 9 10
Professor A 8 12 6 4 9 15 8 7 16 13
Professor B 9 16 10 8 14 19 12 11 20 17
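Solution
Candidate   Professor A   Professor B   Rx     Ry    d = Rx - Ry   d²
1           8             9             6.5    9     -2.5          6.25
2           12            16            4      4     0             0
3           6             10            9      8     1             1
4           4             8             10     10    0             0
5           9             14            5      5     0             0
6           15            19            2      2     0             0
7           8             12            6.5    6     0.5           0.25
8           7             11            8      7     1             1
9           16            20            1      1     0             0
10          13            17            3      3     0             0
∑ d² = 8.5
Candidates 1 and 7 both scored 8 with Professor A, so each is given the average of
ranks 6 and 7.
rs = 1 - 6∑d² / (n(n² - 1))
= 1 – 6(8.5) / (10(100 - 1))
= 1 - 51 / 990
= 1 - 0.0515 = 0.9485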
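A sketch of the rank-correlation calculation follows: it first ranks the marks listed for Professors A and B (rank 1 for the highest mark, tied marks sharing the average of their ranks) and then applies the formula 1 - 6∑d² / (n(n² - 1)). The function names are hypothetical, and no separate correction term for ties is applied, which is an assumption of this illustration.

```python
def average_ranks(scores):
    """Rank 1 = highest score; tied scores share the average of their ranks."""
    ordered = sorted(scores, reverse=True)
    rank_of = {}
    for s in set(scores):
        first = ordered.index(s) + 1          # best rank occupied by this score
        count = ordered.count(s)              # how many items share the score
        rank_of[s] = first + (count - 1) / 2  # average of the occupied ranks
    return [rank_of[s] for s in scores]

def spearman(xs, ys):
    """Rank correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

prof_a = [8, 12, 6, 4, 9, 15, 8, 7, 16, 13]
prof_b = [9, 16, 10, 8, 14, 19, 12, 11, 20, 17]
print(average_ranks(prof_a))
print(round(spearman(prof_a, prof_b), 4))
```

Ranked this way, the marks give ∑d² = 8.5 and a coefficient of about 0.9485, which indicates close agreement between the two professors.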
1. Coefficient of Correlation lies between -1 and +1: The coefficient of correlation cannot take
a value less than -1 or more than +1. Symbolically, -1 <= r <= +1 or | r | <= 1.
3.Coefficient of Correlation is independent of Change of Scale: This property reveals
that if we divide or multiply all the values of X and Y, it will not affect the coefficient of
correlation.
4.The value of the co efficient of correlation shall always lie between +1 and -1.
It is easy to calculate, and it is not necessary to calculate the standard deviation of X and Y
series separately.
It is based on the direction of change in two paired variables. This method is suitable
when it is desired to study the direction of change rather than its quantity. In other
words when it is required to study whether the correlation is positive or negative, the
concurrent deviation method is applied.
rc = ± √(± (2C - n) / n)
Here: C is the number of positive signs after multiplying the directions of
change of the X-series and Y-series, and n is the number of pairs of deviations.
Process of Calculation:
1. Find out the direction of change in the present value as compared to the previous one
in the X variable. If the second value is less than the first, put a sign (-) minus; if it is
more put (+) plus; and if it is equal put zero or (=). Repeat the same process for
the other values and denote it with 'Dx'.
2. In the same way, ascertain the direction of change in the Y variable and denote the
same with 'Dy'
3. Multiply 'Dx' with the corresponding 'Dy' and determine the number of positive signs
(means 'C'). Here, it is important to note that the product of (-) and (-) is (+).
4. Significance of ± signs outside and inside the square root: if the value of (2C - n)/n
is negative, keep minus outside as well as inside the square root, so as to make it positive.
We cannot take the square root of a negative value. If (2C - n)/n is positive, then we get
a positive value of the coefficient of correlation. In case it is negative, the coefficient of
correlation is negative.
Merits:
Limitations :
This method does not differentiate between small and big values. It works with
approximation
Example 1: Calculate the coefficient of Concurrent Deviations from the following data:
Solution:
Year X Dx Y Dy Dx.Dy
2003 150 200
2004 154 + 180 - -
2005 160 + 170 - -
2006 172 + 160 - -
2007 160 - 190 + -
2008 165 + 180 - -
2009 180 + 172 - -
n = 6, c = 0
rc = ± √(± (2c - n) / n)
= - √(- (2 x 0 - 6) / 6)
= - √1
rc = -1
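The concurrent deviation steps can be sketched in Python as follows. The function name `concurrent_deviation` is hypothetical. The sketch counts a pair as concurrent only when both series move in the same direction; treating a pair with no change in either series as non-concurrent is an assumption of this illustration.

```python
import math

def concurrent_deviation(xs, ys):
    """rc = +/- sqrt(+/- (2c - n) / n), taking the sign of (2c - n) / n."""
    def directions(series):
        # +1 for a rise, -1 for a fall, 0 for no change against the previous value
        return [(b > a) - (b < a) for a, b in zip(series, series[1:])]

    dx, dy = directions(xs), directions(ys)
    n = len(dx)                                           # number of pairs of deviations
    c = sum(1 for a, b in zip(dx, dy) if a * b > 0)       # concurrent (positive) products
    ratio = (2 * c - n) / n
    return math.copysign(math.sqrt(abs(ratio)), ratio)

x = [150, 154, 160, 172, 160, 165, 180]
y = [200, 180, 170, 160, 190, 180, 172]
print(concurrent_deviation(x, y))
```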
The coefficient of determination, often denoted as R², quantifies the proportion of
variance in a dependent variable that can be predicted from an independent variable,
essentially measuring how well a model fits the data. It's the square of the correlation
coefficient and ranges from 0 to 1, with higher values indicating a stronger
relationship.
Calculation:
Interpretation:
Example:
An R² of 0.80 means that 80% of the variance in the dependent variable is explained
by the independent variable.
Goodness of Fit:
R² is a measure of how well a regression model fits the data, with higher R² values
generally indicating a better fit.
It's important to note that a high R² value does not necessarily imply a causal
relationship between the variables; it only indicates a strong correlation.
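For a straight line fitted by least squares with one independent variable, R² computed as 1 - (residual sum of squares / total sum of squares) equals the square of the correlation coefficient. The sketch below checks this numerically; the function names are assumptions for the illustration, and the data are the positive-correlation table used earlier in these notes.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def r_squared(xs, ys):
    """R^2 = 1 - SS_res / SS_tot for the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(xs, ys))
    ss_tot = sum((b - my) ** 2 for b in ys)
    return 1 - ss_res / ss_tot

x = [5, 7, 9, 11, 16, 20, 28]
y = [20, 26, 35, 37, 48, 50, 55]
print(round(r_squared(x, y), 4), round(pearson_r(x, y) ** 2, 4))
```

The two printed values agree, which is the sense in which R² is "the square of the correlation coefficient" for a simple linear fit.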
Other Names: