Business Statistics

Unit 1 focuses on the concepts of correlation and regression, emphasizing their applications in business operations, including multiple regression and reliability of estimates. It covers the importance of correlation coefficients, the distinction between simple and multiple correlations, and the role of regression in predicting values not present in data sets. The unit also includes practical activities and self-assessment questions to reinforce understanding of these statistical concepts.

Unit 1: Correlation and Regression

Learning Outcomes:
• Students will define the concept of multiple regression and correlation in day-to-day scenarios
in business operations.
• Students will identify and differentiate between various types of correlation being used in
various measures involving operational business processes.
• Students will demonstrate effective usage of multiple and partial correlations in the business
decision-making process.
• Students will describe the reliability of the estimate in business operations.
• Students will analyse the various findings of partial and multiple regression in decision-making
scenarios.

Structure:
1.1 Multiple regression and correlation: Linear regression equation, Regression equation in
terms of simple correlation coefficients
1.2 Reliability of the estimate
• Knowledge Check 1
• Outcome-Based Activity 1
1.3 Multiple Correlation; Partial Correlation
• Knowledge Check 2
• Outcome-Based Activity 2
1.4 Summary
1.5 Keywords
1.6 Self-Assessment Questions
1.7 References / Reference Reading
1.1 Multiple Regression and correlation: Linear regression equation, Regression equation in
terms of simple correlation coefficients
Statistics, as we all know, deals with numbers. Dealing with numbers means processing them to
generate useful insights from the data set.
One use of this information is making predictions. In statistics, the technique used for
prediction is known as regression.
Regression is thus a process for estimating values that do not appear explicitly in the data
set.
Let us take an example as to what we are discussing.
Consider the table given below:
Year Number of Cars Sold
1999 3000
2000 3500
2001 2800
2004 3456
2005 6746
2006 7468
2007 3457
2008 4568
2009 7455

The table above is a data set, that is, a set of numbers in its raw form. By applying statistics,
we can develop a regression equation to estimate the likely value for years that are not listed,
such as 2014, 2017 or 1986.
Thus, through regression, we are able to estimate values which are not in the data set.
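
As a minimal sketch of this idea, the code below fits a straight line to the car sales table by the least-squares method and uses it to estimate the three missing years. The function names fit_line and predict are illustrative choices, not part of the original material, and a straight line is only one possible model for this data.

```python
# Car sales data from the table above (years with no entry are simply absent).
years = [1999, 2000, 2001, 2004, 2005, 2006, 2007, 2008, 2009]
sales = [3000, 3500, 2800, 3456, 6746, 7468, 3457, 4568, 7455]

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(year, intercept, slope):
    """Estimate the number of cars sold in a given year from the fitted line."""
    return intercept + slope * year

b0, b1 = fit_line(years, sales)
for year in (1986, 2014, 2017):
    print(year, round(predict(year, b0, b1)))
```

Estimates for years far outside the observed range, such as 1986, may be implausible, because the fitted line is only supported by the data between 1999 and 2009.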

Now consider Figure 1.1 given below:


Figure 1.1: Depiction of Multiple Regression Equation
The above figure depicts the generalised formula for multiple regression.
The left side of the equation, represented by Y, is the output, that is, the dependent variable.
It answers a question such as what the value would be in the year 2014.
On the other hand, the right side of the equation contains the input (independent) variables,
represented by X, which contribute to the generation of the output variable. The coefficients,
represented by β, are the weights that determine how strongly each input variable influences the
output variable. These weights can be expressed in terms of correlation coefficients.
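
For reference, the description above matches the standard general form of the multiple regression equation, which can be written as follows. The number of input variables k and the intercept β0 are notation assumed here.

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon
```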
One way of determining the weights is to compute multiple correlation coefficients. The formula
is given below:

The formula for determining the multiple correlation coefficient


The symbol ε denotes the error term. This is the error which creeps into the process and which
influences the final output. It accounts for the part of the output variable that the input
variables do not explain, and it is included in every regression equation.
Refer to Figure 1.2, which depicts the regression equation generated using the multiple
correlation values among the variables.

Figure 1.2: Depiction of Regression Equation

Here it is seen that the values of β are replaced with the weights, which are derived by
mathematical calculation. Thus, factor X1 has a weight of 0.154 in determining the final value of
the output variable.
Linear Regression Equation
When the multiple regression equation represented in Figure 1.1 has only one input variable, it
reduces to simple linear regression. It is given by the equation below.

Regression equation in terms of simple correlation coefficients.


Note that it contains one input variable and one output variable, together with an associated
error component.
The value of β can now be derived in terms of the correlation coefficient, which converts the
above equation as follows.
The formula for the regression coefficients is applied and the values are substituted in the
equation.
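
As a reference sketch, the standard textbook form of the simple regression equation and of its coefficients in terms of the simple correlation coefficient r is given below. The symbols s_X and s_Y (standard deviations) and the bars (means) are notation assumed here.

```latex
Y = \beta_0 + \beta_1 X + \varepsilon, \qquad
\beta_1 = r \, \frac{s_Y}{s_X}, \qquad
\beta_0 = \bar{Y} - \beta_1 \bar{X}

\hat{Y} - \bar{Y} = r \, \frac{s_Y}{s_X} \, (X - \bar{X})
```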

1.2 Reliability of the Estimate


When we process data in statistics, we are required to develop estimates. In other words, we must
carry out calculations that allow us to state that our estimation model or process is correct and
that it will generate consistent output each time it is applied.
In other words, our model is reliable enough to withstand the test of time.
Thus, this section deals with the reliability of the estimate.
As seen in the previous section, the regression equation contains an error term, represented by
the symbol ε. When the difference between the actual output and the output produced by the model
is insignificant, we can say that our model is reliable.
Let us take an example of what we are discussing. Consider the data given below:
Figure 1.3: Depiction of the Concept of Reliability
From the figure, it is seen that a model is developed for smoke detectors in living spaces,
comprising residential, commercial and institutional properties with various uses. The sample size
chosen is 10. Reliability is estimated with a 95% confidence interval, which gives an upper and a
lower limit. Thus, for the residential apartment, we have the values of 69.9 and 68.7 at the upper
end and the lower end of the confidence interval. This means that, at the 95% confidence level,
the reliability of the model is estimated to lie between 68.7% and 69.9%.
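
To illustrate how such limits can be obtained, the sketch below computes a 95% confidence interval for the mean of a sample of 10 values. The sample values are hypothetical and are not the data behind Figure 1.3; the sketch also assumes the interval is a t-interval for a mean, for which the two-sided 95% critical value with 9 degrees of freedom is 2.262.

```python
import math

# Hypothetical reliability scores (%) from 10 sampled residential units.
sample = [69.1, 69.6, 68.9, 69.4, 69.0, 69.8, 69.2, 69.5, 68.8, 69.7]

n = len(sample)
mean = sum(sample) / n
# Sample standard deviation (n - 1 in the denominator).
std_dev = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
std_error = std_dev / math.sqrt(n)

# Two-sided 95% critical value of Student's t with n - 1 = 9 degrees of freedom.
T_CRITICAL_95_DF9 = 2.262

lower = mean - T_CRITICAL_95_DF9 * std_error
upper = mean + T_CRITICAL_95_DF9 * std_error
print(f"mean = {mean:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

A narrower interval indicates a more reliable estimate; a larger sample or less variable data would narrow it.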
It is to be noted that the reliability of a model is subject to issues and challenges. Reliability
estimates are never 100% accurate, because the different types of reliability tests differ in
nature.
In general, Figure 1.4 below depicts the various types of reliability tests.
Figure 1.4: Depiction of Various Types of Reliability Tests
From the figure it is seen that different reliability tests produce different findings. Hence we
may obtain different values for reliability.

• Knowledge Check 1
State True or False.
1. Correlation is widely used in the conduct of exams (True / False)
2. The regression is used to draw out values which are not a part of the data set (True / False)
3. The reliability is used to generate the same output again (True / False)
4. The value of R is between ±∞ (True / False)
5. The value of r is between ±∞ (True / False)

• Outcome-Based Activity 1
Prepare an Excel sheet to demonstrate the concept of the correlation coefficient. Prepare the
sheet for at least 20 events/activities. Prepare the sheet in the format given below as an example.

Sr. No.   Event / Activities / Instances   Variables used
1         Buying a car                     Price and category of car
2         Buying an incense stick          Smell and price
                                           Price and category
                                           Colour and price
3         Buying a house                   Location and Conveyance
                                           Price and Area Size
1.3 Multiple Correlation; Partial Correlation


We have discussed the concept of correlation. When only two variables are involved in the
correlation, it is a simple correlation. In reality, however, we rarely deal with just two
variables; we have more than two variables in the picture. This is a multiple correlation.
Thus, multiple correlation measures the relationship between one output variable and more than
one input variable taken together.
On the other hand, we have the concept of the partial correlation coefficient. In the case of
partial correlation, we keep one or more of the variables constant, and then we study the
relationship between the remaining variables.
Refer to Figure 1.5, which depicts the formula to be used to arrive at the multiple correlation.

Figure 1.5: Depiction of Formula for Multiple Correlation Coefficient


From the formula it is seen that the multiple correlation coefficient is represented by the letter R.
The subscripts a, b and c indicate that the coefficient relates the output variable a to the input
variables b and c.
Further, the value of R depends on the values of the simple correlation coefficients, r.
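
As a hedged sketch, the function below uses the standard textbook formula for the multiple correlation of a on b and c in terms of the three simple correlations. The function name and the input values are hypothetical and chosen for illustration only.

```python
import math

def multiple_correlation(r_ab, r_ac, r_bc):
    """Multiple correlation R of a on b and c from the three simple correlations."""
    r_squared = (r_ab**2 + r_ac**2 - 2 * r_ab * r_ac * r_bc) / (1 - r_bc**2)
    return math.sqrt(r_squared)

# Hypothetical simple correlations, for illustration only.
R = multiple_correlation(r_ab=0.6, r_ac=0.5, r_bc=0.4)
print(round(R, 4))
```

Unlike r, which lies between -1 and +1, R lies between 0 and 1, and it is never smaller than the larger of the two simple correlations with the output variable.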
The concept of the partial correlation coefficient.
In partial correlation, one or more variables are held fixed, and the correlation between the
remaining variables is then determined.
Refer to the figure given below:

Figure 1.6: Depiction of Application of Partial Correlation
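
A minimal sketch of partial correlation for three variables is given below, using the standard first-order formula for the correlation between a and b with c held constant. The function name and the input correlations are hypothetical.

```python
import math

def partial_correlation(r_ab, r_ac, r_bc):
    """Correlation between a and b with the third variable c held constant."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))

# Hypothetical simple correlations, for illustration only.
r_ab_c = partial_correlation(r_ab=0.6, r_ac=0.5, r_bc=0.4)
print(round(r_ab_c, 3))
```

With these inputs the partial correlation is smaller than the simple correlation of 0.6, which suggests that part of the association between a and b is shared with c.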

• Knowledge Check 2
Fill in the Blanks
1. The value of R lies between 1 and _______ ( 0 / 2)
2. The value of r lies between 1 and _________ (-1/2)
3. The partial correlation is when ________ of the variables is kept constant (One / Two )
4. When we get the same output again and again, it is known as____________ (Reliability/
consistency)
5. Correlation measures the degree and depth of __________ between two variables
(association/correspondence)
• Outcome-Based Activity 2
Prepare an Excel sheet to demonstrate the concept of multiple correlation. Take at least 15
instances /events /activities from real-life scenarios.
Prepare the Excel sheet in the format given below as an example.
Sr. No.   Event / Activities / Instances   Variables used
1         Buying a car                     Colour, price, social status, category of class

1.4 Summary
• In statistics, we often use the terms correlation and regression.
• Correlation refers to the degree and strength of the association between two variables which
demonstrate a linear relationship.
• The value of the correlation lies between -1 and +1.
• The correlation is denoted by the letter r.
• A value of +1 indicates a perfect positive relationship, while a value of -1 indicates a perfect
negative relationship.
• The concepts of correlation and regression are widely used in industry.
• However, in real-life situations, more than two variables may impact the event or the
phenomenon.
• In such cases, we construct a linear equation comprising all the variables, even though they
may also be linearly related amongst themselves.
• This is known as multiple correlation.
• It is represented by R.
• Closely related to multiple correlation is the concept of partial correlation.
• Partial correlation is the case wherein we keep the values of one or more of the variables
constant while we estimate the correlation between the other two variables.
• Regression, on the other hand, refers to the process of fitting a straight line using
statistical techniques, which is then used to estimate values that are not available in the data
set.
1.5 Keywords
• Regression equation: It is the equation of a line whose coefficients are obtained by applying
statistical techniques using correlation coefficients.
• Reliability: It is the ability of a model to generate consistent output when it is applied
repeatedly over a long period of time.
• Linear relationship: This is the relationship between two correlated variables which can be
expressed in the form of a mathematical equation of a straight line.
• Correlation: It is a measure used to assess the degree and the strength of the linear
relationship between two variables.
• Partial correlation: It is a measure of the degree and the strength of the linear relationship
between two variables when the other variables are kept constant.

1.6 Self-Assessment Questions


1. What is meant by multiple regression? Explain with an example.
2. What is meant by the term partial regression? Explain its importance in day-to-day operations.
3. What is meant by the term reliability of an estimate? Explain with examples.
4. What is meant by the term linear equation? Explain.
5. What is the purpose of the regression equation? Explain.

1.7 References / Reference Reading


• Fundamentals of Mathematical Statistics by S. C. Gupta and V. K. Kapoor, Sultan Chand &
Sons, September 2020.
• Fundamentals of Mathematical Statistics by Steffen Lauritzen, CRC Press, February 2023.
• Fundamentals of Statistics by B Dasgupta, August 2013.
Unit 2: Index Numbers

Learning Outcomes:
• Students will be able to define the concept of index numbers and their uses in business
operations.
• Students will be able to identify and differentiate between various types of index numbers.
• Students will be able to demonstrate effective usage of chain base index numbers and base
shifting in the decision-making process.
• Students will be able to describe the tests of adequacy for the construction of index numbers
in business operations.
• Students will be able to analyse the various problems and findings of the solution in
constructing index numbers in decision-making scenarios.

Structure:
2.1 Index Numbers: Meaning, types and uses; Methods of constructing price and quantity indices:
simple and aggregate
2.2 Test of adequacy; Chain base index numbers
• Knowledge Check 1
• Outcome-Based Activity 1
2.3 Base Shifting, splicing and deflating; Problems in constructing index numbers; Consumer
Price Index
• Knowledge Check 2
• Outcome-Based Activity 2
2.4 Summary
2.5 Keywords
2.6 Self-Assessment Questions
2.7 References / Reference Reading
2.1 Index Numbers: Meaning, types and uses; Methods of constructing price and quantity
indices: simple and aggregate
In statistics, we are required to deal with data. Dealing with data means converting it into a
form from which we can derive knowledge and make decisions that will have far-reaching
consequences.
Let us take an example of what we mean by processing data to derive useful information for making
decisions.
Consider the table given below, which represents the downtime of a particular machine.
Month Duration (In Hours)
Jan 10
Feb 24
March 52
April 17
May 24
June 36

From this table alone, we cannot derive any useful information, that is, information on which we
can base decisions. We need to do something more than merely record that the machine was down for
a certain duration in each of the last six months.
If, say, we calculate the average downtime, then we can use this information to make decisions.
For example, we can estimate the revenue loss occurring due to machine downtime and take this
point into account while preparing a proposal.
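
The average mentioned above can be computed directly from the table. The short sketch below does so; the variable names are illustrative.

```python
# Monthly downtime in hours, taken from the table above.
downtime_hours = {"Jan": 10, "Feb": 24, "March": 52, "April": 17, "May": 24, "June": 36}

# Average downtime = total hours / number of months = 163 / 6.
average_downtime = sum(downtime_hours.values()) / len(downtime_hours)
print(f"Average downtime: {average_downtime:.1f} hours per month")
```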
With this idea, we now move to the concept of index numbers. Index numbers are likewise a way of
generating useful information, in this case information that lets us compare the changes in a
variable.
Just as the average in the above table provides a central figure, index numbers express the
change in a variable with respect to a specific base point.

Refer to the table given below:


Year Value in Rupees

2018 1200
2019 1256
2020 2345
2021 6543
2022 2345
2023 6423
2024 1235

The table above gives the value of an entity in rupees. From the raw figures, it is very
difficult to make comparisons. Yet we need to compare, so that we can determine the amount of
extra effort needed to maintain our position in the market.
We can process this data to generate useful information that makes decision-making easier.
Refer to the table given below:
Year Value in Rupees Index number
2018 1200 100
2019 1256 104.67
2020 2345 186.70
2021 6543 279.02
2022 2345 35.84
2023 6423 273.90
2024 1235 19.23
The table provides useful information from which we can make decisions. For example, refer to the
figures in the extreme right column, headed Index number. The value for 2018 is 100, and the
subsequent figures are 104.67, 186.70 and so on.
To understand these figures, note that the methodology is to choose a reference point. Here, 2018
is taken as the starting point with a value of 100, and each subsequent figure expresses the value
for that year as a percentage of the value for the preceding year. With this approach, we are in a
position to measure the relative change in the value. For example, the figure of 104.67 for the
year 2019 means that the value in 2019 is 4.67% higher than in 2018. Similarly, the figure of
279.02 for the year 2021 shows a heavy relative change: the value in 2021 is 179.02% higher than
in the preceding year, 2020. Thus, we are able to measure the change.
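
The figures in the Index number column can be reproduced with a short calculation. The sketch below computes them with the preceding year as the base and, for comparison, also shows what the indices would be if 2018 were kept as a fixed base throughout. The variable names are illustrative.

```python
years = [2018, 2019, 2020, 2021, 2022, 2023, 2024]
values = [1200, 1256, 2345, 6543, 2345, 6423, 1235]

# Previous-year base: each value as a percentage of the preceding year's value.
# The first year has no predecessor, so it is set to 100.
previous_year_index = [100.0] + [
    values[i] / values[i - 1] * 100 for i in range(1, len(values))
]

# Fixed base: every value as a percentage of the 2018 value.
fixed_base_index = [v / values[0] * 100 for v in values]

for year, prev, fixed in zip(years, previous_year_index, fixed_base_index):
    print(year, round(prev, 2), round(fixed, 2))
```

The two columns agree for 2018 and 2019 and differ afterwards, which shows why the choice of base must always be stated alongside an index number.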
Meaning of Index number. The index number is thus a measurement technique which is used to
measure the relative change in a variable with respect to its value at a reference point.
Types of Index numbers.
Having understood the basic concept and application of index numbers, let us now discuss the
various types of index numbers.
The following points enumerate the types of index numbers.
1. Value Index
A value index number is the ratio of the aggregate value for a given period to the aggregate
value for the base period or the reference period. The value index is utilised for measuring
changes in inventories, production, foreign trade and the like.
2. Quantity Index
A quantity index number is used to measure the changes in the quantity of goods within a
stipulated period.
3. Price Index
A price index number is used to measure the changes in the price within the stipulated period.
4. Consumer Price Index
This index measures the change in the prices that consumers pay for goods or services in the
given period compared to the base period.
5. Wholesale Price Index
This index measures the change in wholesale prices during the stipulated period with respect to
the base period.
Uses of index numbers
We have discussed the various aspects of index numbers. Now, let us discuss the uses, advantages
and limitations of index numbers.
▪ As discussed, index numbers provide a way to measure the relative change in variables. This has
several uses, such as wage calculation, deciding employee appraisal increments, and so on.
▪ Governments use several indexes to measure performance at national and international levels.
Examples include the ease of doing business index, happiness index, satisfaction index and index
of industrial production.
Advantages of Index Number
The following points enumerate the advantages of using index numbers
▪ It provides an easy mechanism for comparing data with reference to the corresponding time
frame.
▪ It reduces the data set to a single point of comparison so that we can make decisions.
▪ Index numbers help us to remove inconsistencies and develop a plan of action to improve
processes.
▪ Index numbers eliminate the difference of units while making comparisons for decision-
making.
Limitations of Index Number
▪ Index numbers provide one side of the story. In other words, they represent the number,
which is in its absolute form and does not take into account the other environmental factors
that play an important role in the decision-making process.
▪ Index number is only a relative measurement and fails to produce the actual variation
during the stipulated period. In other words, we cannot determine the variation within the
variables. For that, we have to use the time series method.
• Methods of constructing price and quantity indices: simple and aggregate
Let us now discuss the various methods of constructing index numbers.
Refer to Figure 2.1 given below, which depicts the concepts pertaining to the construction of
index numbers.
Figure 2.1: Depiction of Construction of Index Numbers
From the figure, it can be seen that there are basically two methods for constructing index
numbers: unweighted and weighted. The unweighted index numbers are further classified into simple
aggregative and simple average of price relatives. In the weighted category, we have the weighted
aggregative and the weighted average of price relatives.
First let us discuss what is meant by the terms weighted and unweighted.
In simple parlance, the term unweighted refers to a method wherein we do not allot weights to the
commodity variable under consideration. On the other hand, when we allot weight to the variable
under consideration, it is known as weighted index numbers.
Simple index number construction
In this method, we determine the price relative of individual items. The price relative is the
value of a variable in the current year expressed as a percentage of its value in the reference
year.
We can use this formula for simple index number construction.
Price relative (R) = (P1 ÷ P0) × 100, where P1 is the price in the current year and P0 is the
price in the reference year.
We have already covered this as an example in outcome-based activities.
• Aggregate Index number
An aggregate index number is calculated by summing the values of all the items in the composite
for the given period and then dividing this result by the corresponding sum for the reference
period; the ratio is expressed as a percentage. An aggregate price index is a simple method for
constructing index numbers.
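
The two unweighted methods can be sketched as follows. The prices are hypothetical, and the names base_prices and current_prices are illustrative; the base-period price is written as P0 and the current-period price as P1 in the comments.

```python
# Hypothetical prices of three commodities (any currency unit).
base_prices = [10, 20, 50]      # P0: reference year
current_prices = [12, 25, 55]   # P1: current year

# Price relative of each item: (P1 / P0) * 100.
price_relatives = [p1 / p0 * 100 for p0, p1 in zip(base_prices, current_prices)]

# Simple average of price relatives.
average_of_relatives = sum(price_relatives) / len(price_relatives)

# Simple aggregate index: (sum of P1 / sum of P0) * 100.
simple_aggregate_index = sum(current_prices) / sum(base_prices) * 100

print([round(r, 1) for r in price_relatives])
print(round(average_of_relatives, 2), round(simple_aggregate_index, 2))
```

The two methods generally give different answers, because the aggregate method is dominated by high-priced items while the average of relatives treats every item equally.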
Refer to Figure 2.2 for the methods of constructing index numbers.

Figure 2.2: Methods for Constructing Various Index Numbers

2.2 Test of Adequacy; Chain Base Index Numbers


When dealing with index numbers, we must test whether the method being used is adequate. This
means determining whether the data and the formula used do not adversely affect the outcome; in
other words, whether the index gives consistent results.
For example, suppose the index number for the current year is calculated on the basis of the base
year, represented by P01, and the index number for the base year is then calculated on the basis
of the current year, represented by P10. The two index numbers should then be reciprocals of each
other; in other words, their product should be one.

Figure 2.3: Concept of Test of Adequacy


The figure shows that two index numbers are calculated with the base years interchanged, and that
the product of these two index numbers is one. It is to be noted that the index numbers are
aggregates of products of prices and quantities, such as P1Q0.
Types of tests of adequacy
Refer to Figure 2.4 given below, which depicts the types of tests of adequacy.

Figure 2.4: Types of Tests of Accuracy with Respect To Base Period

From the figure, it is evident that there are basically three types of tests of adequacy: the time
reversal test, the factor reversal test and the circular test. In all three cases, the variables
are interchanged to check the adequacy of the final outcome.
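
The time reversal test can be checked numerically. The sketch below is an illustration under stated assumptions: the data are hypothetical, and the Laspeyres, Paasche and Fisher formulas are standard textbook index formulas that are not named in the text above. It shows that P01 × P10 equals one for the Fisher index but not, in general, for the Laspeyres index.

```python
import math

# Hypothetical prices (p) and quantities (q) for two commodities
# in the base period (0) and the current period (1).
p0, q0 = [10, 20], [5, 3]
p1, q1 = [12, 22], [4, 4]

def weighted_sum(prices, quantities):
    """Sum of price * quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))

def laspeyres(p_base, q_base, p_cur):
    """Price index (as a ratio) weighted by base-period quantities."""
    return weighted_sum(p_cur, q_base) / weighted_sum(p_base, q_base)

def paasche(p_base, p_cur, q_cur):
    """Price index (as a ratio) weighted by current-period quantities."""
    return weighted_sum(p_cur, q_cur) / weighted_sum(p_base, q_cur)

def fisher(p_base, q_base, p_cur, q_cur):
    """Geometric mean of the Laspeyres and Paasche indices."""
    return math.sqrt(laspeyres(p_base, q_base, p_cur) * paasche(p_base, p_cur, q_cur))

# Time reversal test: P01 * P10 should equal 1.
laspeyres_product = laspeyres(p0, q0, p1) * laspeyres(p1, q1, p0)
fisher_product = fisher(p0, q0, p1, q1) * fisher(p1, q1, p0, q0)
print(round(laspeyres_product, 4), round(fisher_product, 4))
```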
Chain base index numbers
Chain base index numbers are a special type of index number that measure the change in the value
or quantity of a variable relative to the immediately preceding period rather than a fixed
reference period. They are determined by linking the index numbers of consecutive time periods,
with each period forming the new base for the subsequent period.
Refer to Figure 2.5 given below:
Figure 2.5: Depiction of Calculation of Chain-Based Numbers

From the figure, it can be seen that each recalculated index number is obtained by taking the
index of the previous year as the base. For example, for the year 1988, we have
109 / 108 × 100 = 100.9. Here, the index for the year 1987 was 108, while the index for the year
1988 was 109. The index of the previous year, 108, forms the base (the denominator), while the
current index of 109 is the numerator. In this way, we generate the chain base index numbers.
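
The calculation described above can be sketched in a few lines. The 1987 and 1988 indices come from the worked example; the values for 1989 and 1990 are hypothetical additions so that the loop has more than one step.

```python
# Fixed-base index numbers; 1987 and 1988 come from the worked example above,
# the later years are hypothetical.
fixed_base_index = {1987: 108, 1988: 109, 1989: 112, 1990: 118}

years = sorted(fixed_base_index)
chain_index = {}
for previous_year, year in zip(years, years[1:]):
    # Previous year's index is the base (denominator); current index is the numerator.
    chain_index[year] = fixed_base_index[year] / fixed_base_index[previous_year] * 100

for year, value in chain_index.items():
    print(year, round(value, 1))
```

The first year, 1987, has no chain base index in this sketch because no previous-year index is available for it.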

• Knowledge Check 1
State True or False
1. Index numbers are used to demonstrate the change of prices (True / False)
2. The index numbers are mainly used for comparison (True / False)
3. The construction of index numbers is easy (True / False)
4. The concept of chain base index numbers is used to determine the index number for future
events (True / False)
5. The tests of reliability generate the same input (True / False)

• Outcome-Based Activity 1
Prepare an Excel sheet to demonstrate the concept of aggregate index numbers. Refer to the
figure given below as an example.

Note: You have to take at least 10 commodities by taking hypothetical values and assumptions.

2.3 Base Shifting, splicing and deflating; Problems in constructing index numbers;
Consumer Price Index
We now discuss some of the techniques that are used in working with index numbers.
Base shifting
This is one of the most commonly used techniques. In essence, base shifting refers to the process
of changing the base year, or reference point, of the index number.
Base shifting may be triggered by several reasons, for instance that the base has become obsolete
and no longer serves the intended purpose. Hence, it needs to be changed so that the index
continues to meet its purpose. Other reasons for shifting the base may include that no data is
available for some of the variables considered in the index number.
Let us take an example: suppose that we have developed an index number with the base year 2010.
As the business world is highly dynamic, this index no longer suits well, and hence, it needs to
be changed to reflect modern global factors. In other words, we shift the base from 2010 to 2015;
that is, we have shifted the base by 5 years.
Refer to the figure below, which depicts the concept of base shifting.
Figure 2.6: Depiction of Base Shifting
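
A minimal sketch of shifting the base from 2010 to 2015 is given below. The index values are hypothetical and the function name shift_base is illustrative; the usual rule is to divide every index by the index of the new base year and multiply by 100.

```python
# Hypothetical index series with base year 2010 (2010 = 100).
index_base_2010 = {2010: 100, 2011: 104, 2012: 110, 2013: 115, 2014: 120, 2015: 125, 2016: 130}

def shift_base(series, new_base_year):
    """Re-express every index as a percentage of the index in the new base year."""
    base_value = series[new_base_year]
    return {year: value / base_value * 100 for year, value in series.items()}

index_base_2015 = shift_base(index_base_2010, 2015)
for year, value in index_base_2015.items():
    print(year, round(value, 1))
```

After shifting, 2015 equals 100 and the relative changes between any two years are unchanged.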

Splicing and deflating


When constructing index numbers, we apply various techniques to meet our purpose. Splicing and
deflating are two such techniques.
Splicing is the process of joining two or more index number series, each with a different base
period, into a single continuous series. It is worth mentioning that there are two forms: forward
splicing and backward splicing.
Refer to the example given below:
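
As a hedged illustration of forward splicing, the sketch below joins a discontinued series to a new series that starts in the overlap year. Both series and all years are hypothetical and are not taken from the original example.

```python
# Hypothetical old series (base 2010 = 100), discontinued after 2015.
old_series = {2013: 115, 2014: 120, 2015: 125}
# Hypothetical new series (base 2015 = 100), starting in the overlap year 2015.
new_series = {2015: 100, 2016: 104, 2017: 110}

overlap_year = 2015
# Forward splicing: express the new series on the base of the old series.
factor = old_series[overlap_year] / new_series[overlap_year]
spliced = dict(old_series)
for year, value in new_series.items():
    spliced[year] = value * factor

for year in sorted(spliced):
    print(year, round(spliced[year], 1))
```

Backward splicing works the other way round: the old series is divided by the same factor so that it is expressed on the base of the new series.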
Deflating
Deflating is the process of adjusting values expressed in current prices for changes in the price
level, so as to determine the real level of change. This is called deflating the index numbers.
Refer to the example given below to depict the concept of deflating.

Figure 2.7: Depiction of Example of Deflating


From the figure, it can be seen that the deflating procedure uses the price index for each year
to convert money wages into real wages.
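
A minimal sketch of deflating wages is given below. The wage and price index figures are hypothetical and are not those of Figure 2.7; the rule applied is real wage = money wage ÷ price index × 100.

```python
# Hypothetical money wages (rupees) and price index numbers (base year = 100).
money_wages = {2018: 10000, 2019: 11000, 2020: 12000}
price_index = {2018: 100, 2019: 110, 2020: 125}

# Real wage = money wage / price index * 100.
real_wages = {year: money_wages[year] / price_index[year] * 100 for year in money_wages}

for year, wage in real_wages.items():
    print(year, round(wage, 2))
```

In this illustration the money wage rises every year, yet the real wage in 2020 is lower than in 2018 because prices rose faster than wages.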
Problems in constructing index numbers
We have discussed the various aspects of index numbers. However, there are several problems
associated with the process of constructing index numbers.
The following points enumerate some of the problems in the construction of index numbers.
a. The objective of the index number. This is the most common problem encountered when
constructing index numbers. Often, the objective is missing, meaning that the purpose of
the index number has not been fixed. This leads to the collection of incorrect data and
hence to losses of time and money.
b. The choice of formula. Determining which formula is needed is a problem that, again,
arises from incorrect application of basic concepts.
c. The selection of variables. This problem also has its origin in the failure to fix the
purpose, that is, the basic objective.
d. The choice of base year. Another problem is determining the base year from which the
index numbers are to be calculated.
Refer to the figure given below, which depicts some of the basic problems encountered during the
construction process of index numbers.

Figure 2.8: Some of the Basic Problems Encountered in the Process of Construction of Index
Numbers
Consumer price index
The Consumer Price Index (CPI) is used to determine the overall change in consumer prices for a
commodity, which may be a product or a service, over a stipulated period of time. The consumer
price index is very important in determining economic policies pertaining to consumption and
demand. It is on the basis of the CPI that the government determines the rate of inflation and
other aspects from time to time.

Figure 2.9: Depicting the Areas or the Domains of CPI


From the figure, it is evident that the consumer price index has wide application in areas where
the consumer is active and willing to pay heavily. For example, consumers will not compromise on
food and medical care and will spend whatever it takes to obtain these two basic necessities of
life.

• Knowledge Check 2
Fill in the Blanks
1. Index number is generally represented as _______. (Fraction / Decimal / Percentage)
2. The reference point of consideration is also known as __________ year. (Base / Current )
3. The construction of index numbers is ________ and complicated. (Complex/ Simple)
4. Index numbers are widely used by __________. (Business Units / Individuals /
Government)
5. Index number is widely used to measure ________ over a reference point.
(changes/variations)
• Outcome-Based Activity 2
Prepare an Excel sheet to demonstrate the concept of aggregate index numbers. Prepare the
sheet in the format given below as an example.
Year   Quantity   Quantity Index Number   Value   Value Index Number
2018   200        100                     10000   100
2019   240        120                     12480   124.8
2020   220        110                     12320   123.2
2021   300        150                     18000   180
2022   320        160                     20480   204.8

Note: You are required to generate the index number data set for at least 10 years. Assume any
arbitrary values.

2.4 Summary
• In statistics, we are required to deal with data.
• Data processing involves several methods, such as classification, tabulation and measurement
of change.
• When we compare the value of a variable of an entity in one period with its value in another
period, we obtain a ratio. This ratio, when expressed as a percentage, is known as an index
number.
• The index number thus represents the change in a variable, such as price, with respect to its
value in the reference time period.
• The reference period is known as the base period.
• The index number is widely used in the day-to-day operations of various agencies.
• The government uses index numbers to formulate various economic policies.
• There are various types of index numbers.
• Simple index numbers use the price or the quantity of the entity and provide the measure of
the relative change over the reference period.
• Aggregate index numbers take into consideration the product of price and quantity of a
commodity relative to the product of price and quantity in the reference period.
• The process of constructing index numbers follows a well-thought-out methodology.
• It involves taking several points into consideration.
• This includes the decision on agreeing on the base or the reference year, the units of the
quantities or the price, and the factors that need to be considered.
• There are several tests which are taken into consideration when we deal with the construction
of index numbers.
• This includes the test of adequacy. The test of adequacy is executed to determine whether the
index number is adequate enough to provide the relevant knowledge for the purpose of
making decisions.
• Test of reliability is a measure of the index number to generate the same output when different
inputs are provided.
• Chain-based index numbers are a method that is used to generate index numbers when the
previous index numbers are available. The method uses the concept of a chain, wherein
we use the index number of the previous year to generate the index number of the next year,
and so on.
• Base shifting is a term used for generating index numbers by changing the reference point,
that is, the base of the index number calculation.
• Consumer price index is an index which reflects the change in the prices paid by consumers in
the purchase of goods and services.

2.5 Keywords
• Index number: An Index number is a ratio which is designed to measure the variation amongst
two variables. It is usually measured in terms of percentages.
• Price Index number: It is the ratio that depicts the rate of change of price of the same
commodity with respect to price in a particular year, known as the base year.
• Simple Index: A simple index number is the ratio of two entities which represent the same
variable, which may be quantity or price in the context of a reference time frame known as
base year.
• Linear relationship: This is the relationship between two variables that are correlated, and
their relationship can be constructed in the form of a mathematical equation of a straight line.
• Correlation: It is the measure which is used to assess the degree and the strength of
the linear relationship between two variables.
• Partial correlation: It is used to assess the degree and the strength of the linear relationship
between two variables when the effect of one or more other variables is held constant.

2.6 Self-Assessment Questions


1. What is an index number? Explain its usage in day-to-day business scenarios.
2. What are the problems encountered in the construction of Index numbers? Explain briefly with
examples.
3. What is meant by the term test of adequacy? Explain with examples.
4. What is meant by the term chain base index numbers? Explain with examples.
5. What is meant by base shifting? Explain with examples.

2.7 References / Reference Reading


• Fundamentals of Mathematical Statistics by S. C. Gupta and V. K. Kapoor, Sultan Chand &
Sons, September 2020.
• Fundamentals of Mathematical Statistics by Steffen Lauritzen, CRC Press, February 2023.
• Fundamentals of Statistics by B Dasgupta, August 2013.
Unit 3: Time Series

Learning Outcomes:
• Students will define the concept of time series and its use in business operations.
• Students will identify and differentiate between various models of time series.
• Students will demonstrate effective usage of time series in the decision-making process.
• Students will describe the various methods of constructing time series for making decisions.
• Students will analyse the estimated seasonal variations of time series for the purpose of making
decisions.

Structure:
3.1 Time Series Analysis: Components of a time series, Models of time series analysis- additive
and multiplicative
3.2 Methods of constructing seasonal index; Adjusting time series data for seasonal variations
• Knowledge Check 1
• Outcome-Based Activity 1
3.3 Estimation of seasonal variations
• Knowledge Check 2
• Outcome-Based Activity 2
3.4 Summary
3.5 Keywords
3.6 Self-Assessment Questions
3.7 References / Reference Reading

3.1 Time Series Analysis: Components of a time series, Models of time series analysis-
additive and multiplicative
Statistics is a branch of mathematics that is used for the purpose of data analysis. By analysis
of data, we mean that we deploy various tools and methods to uncover more information from the
data than what is depicted at face value.
In other words, we look beyond what is reflected in the form of tables or a data distribution.
Let us take an example of what we are discussing at the moment.
Consider the table given below:
Sr. No.    Year    Number of Customer Complaints
1          2001    367
2          2002    562
3          2003    345
4          2004    623
5          2005    286
6          2006    345
7          2007    653
8          2008    235
9          2009    642
10         2010    345

Taken as it stands, the table shows little beyond the raw figures; in its present form, the data is
just a display board. Yet it contains a wealth of information for the decision-makers.
The following points enumerate the wealth of information.
▪ The data is time series data. This is because it contains data over a period of 10
years.
▪ We need to apply some statistical measures to process the data. In other words, we need to
apply procedures to unearth the wealth of information so that we are in a position to make
some decisions.
▪ We can plot the graph of the data pertaining to customer complaints. This graph will depict the
trends and patterns of customer complaints.
▪ Alternatively, we can determine the mean and the median on the data set to determine the
central figure and location from which we can understand the time series better and thus arrive
at a decision that may be effective and efficient.
▪ Or we can apply the techniques of dispersion to understand the spread of data.
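The measures mentioned in the points above can be computed directly from the table. The following is a minimal sketch using Python's standard `statistics` module; the choice of the population standard deviation and the range as measures of dispersion is an assumption made for illustration.

```python
import statistics

# Number of customer complaints per year, taken from the table above.
complaints = {
    2001: 367, 2002: 562, 2003: 345, 2004: 623, 2005: 286,
    2006: 345, 2007: 653, 2008: 235, 2009: 642, 2010: 345,
}
values = list(complaints.values())

# Measures of central tendency.
mean_complaints = statistics.mean(values)
median_complaints = statistics.median(values)

# Measures of dispersion (spread of the data).
spread = statistics.pstdev(values)
value_range = max(values) - min(values)

print("Mean:", mean_complaints)
print("Median:", median_complaints)
print("Standard deviation:", round(spread, 2))
print("Range:", value_range)
```

For this data set the mean is 440.3 complaints per year and the median is 356, which already tells the decision-maker that a few high years pull the average upwards.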

Refer to the graphic info below based on the data set given in the table.

[Figure: line graph titled "Time Series Data", plotting the number of customer complaints
(vertical axis, 0 to 700) against the year (horizontal axis, 2001 to 2010)]

Graphic info data depicting extra information from that given in the table
The line graph depicts the irregular pattern and trend of the data presented in the table. With
this visual information, we can understand the time series data better, and thus we are in a much
better position to take decisions.
These are some examples of the information which can be uncovered from this data.
Components of time series
Time series, as discussed, contains data which is arranged chronologically. In other words, time
series data contains data over a period of time. The way in which the data varies over this period
determines the components of the time series.
Thus, when the data is collected over a periodic cycle, the time series contains a cyclic
component.
If the data is collected over a season, say summer season, winter season, festival season or
clearance season, then the component is a seasonal component of the time series.
If the data is collected for the purpose of determining the trend or pattern, then it is the trend
component.
If the purpose of the data is to determine the outlier in the process, then it is known as the outlier
component.
Refer to Figure 3.1 given below, which further elaborates on the various components of time series.

Figure 3.1: Depiction of various components of time series in descriptive form


From the figure, it can be seen that the irregular component is used to determine the noise, that
is, the outlier, which refers to the detection of an exceptional phenomenon in the general pattern.
The same components are depicted in visual form in Figure 3.2 below:

Figure 3.2: Depiction of time series components in visual form


From the figure, it is seen that the irregular component of the time series is depicted by the red
circle. This circle contains the outlier, which is the exceptional part of the data set. It is seen that
the circled part has a different pattern from the usual pattern depicted by a straight line.
In a similar manner, we can observe that the pattern of lines depicts the seasonality, that is, the data
collected over a particular season.
• Models of time series analysis - additive and multiplicative
When we analyse the time series, we are required to keep the overall objective in mind. This means
that it is the purpose which is responsible for deciding the various models or techniques needed to
start the process of achieving the purpose.
Hence, we may be required to classify the data set, allocate the data set to the defined categories,
plot the curve to the data points, and study the relationship between the variables.
The main purpose of applying these techniques is to make predictions so as to meet the objectives.
In general, two time series models are applied in the process of forecasting on the basis of the data
set given in the time series.
They are additive and multiplicative.
Let us discuss these models.
• Additive time series
In the case of additive time series, we add the components of the time series to generate the
time series. It is represented by the mathematical equation given below:
Y(t) = Level + Trend + Seasonality + Noise
Where the level refers to the average value of the time series, which may be based on the
quarterly or moving average method.
The trend is the long-term upward or downward movement depicted by the time series.
Seasonality is the seasonal variation in the time series.
Noise is the irregular variation that remains, including any outliers.
• Multiplicative time series
In the case of multiplicative time series, we multiply the various components. The mathematical
equation is given as:
Y(t) = Level * Trend * Seasonality * Noise
Refer to Figure 3.3 given below to depict the visual aspects of additive and multiplicative time
series.
Figure 3.3: Depiction of additive and multiplicative time series
From the figure, it is seen that in the additive case the graph is roughly a straight line,
whereas in the multiplicative case the graph is exponential.
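The difference between the two models can be illustrated with a small calculation. In the sketch below, every component value is hypothetical and chosen only for illustration: in the additive model the components are amounts that are added to the level, while in the multiplicative model they are factors that scale the level.

```python
# Hypothetical level shared by both models (assumed value).
level = 100.0

# Additive model: components are absolute amounts (assumed values).
trend = [0.0, 5.0, 10.0, 15.0]
seasonality = [-10.0, 20.0, -15.0, 5.0]
noise = [1.0, -2.0, 0.5, 0.5]

# Y(t) = Level + Trend + Seasonality + Noise
additive = [level + t + s + n for t, s, n in zip(trend, seasonality, noise)]

# Multiplicative model: components are factors around 1.0 (assumed values).
trend_factor = [1.00, 1.05, 1.10, 1.15]
seasonal_factor = [0.90, 1.20, 0.85, 1.05]
noise_factor = [1.01, 0.98, 1.00, 1.01]

# Y(t) = Level * Trend * Seasonality * Noise
multiplicative = [
    level * t * s * n
    for t, s, n in zip(trend_factor, seasonal_factor, noise_factor)
]

print("Additive:", additive)
print("Multiplicative:", [round(y, 2) for y in multiplicative])
```

Because the multiplicative seasonal effect is a proportion of the level and trend, its absolute size grows as the series grows, whereas the additive seasonal effect stays the same size.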

3.2 Methods of constructing seasonal index; Adjusting time series data for seasonal
variations
Before we delve further, let us understand what the term seasonal indices means. In general,
seasonal variations are the variations or fluctuations that occur within a year over its different
seasons. In other words, we have the data pertaining to the various seasons of the year.
Examples include the winter season, summer season, or festival season.
All these variations during the season are required to be studied so that we are in a position to
make decisions. The main objective for measuring the seasonal variation is to study the effect of
seasons and to isolate them from the prevailing trend.
There are four methods of constructing seasonal indices.
1. Simple averages method
2. Ratio to trend method
3. Percentage moving average method
4. Link relatives method
The following figure depicts the processing for seasonal indices.
Hence, it is observed that we first calculate the seasonal average for the individual entities. In this
case, we obtain the seasonal averages 169.82, 290.22, 498.1, and 24.18. Next, we calculate the
grand average, which is 245.58.
The next step is to calculate the seasonal index by using the formula:
Seasonal index = (Seasonal average / Grand average) * 100
Using this, we get the seasonal indices as 69, 118, 203 and 10.
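The calculation above can be checked with a few lines of Python. This is a minimal sketch of the simple averages method using the four seasonal averages quoted in the text; the variable names are illustrative only.

```python
# Seasonal averages quoted in the text above.
seasonal_averages = [169.82, 290.22, 498.1, 24.18]

# Grand average = mean of the seasonal averages.
grand_average = sum(seasonal_averages) / len(seasonal_averages)

# Seasonal index = (seasonal average / grand average) * 100.
seasonal_indices = [avg / grand_average * 100 for avg in seasonal_averages]
rounded_indices = [round(index) for index in seasonal_indices]

print("Grand average:", round(grand_average, 2))
print("Seasonal indices:", rounded_indices)
```

An index above 100 marks a season that is stronger than the average season, and an index below 100 marks a weaker one.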
Adjusting time series data for seasonal variations
When we are dealing with the process of estimations and predictions, variances and noises are
bound to occur. This means that, over time, errors may creep in due to unknown factors. For
example, in rounding off numbers, we may take incorrect values. Thus, these are required to be
adjusted. Seasonal adjustment is thus a technique that is widely used to enable the timely
interpretation of time series data so that these variances or errors may be accounted for. The main
purpose of seasonal adjustment is to detect and remove systematic variation which is associated
with seasonal effects.
Refer to the figure given below to depict the concept of seasonal variation.
Figure 3.4: Depicting seasonal variation with adjustments
From the figure, we see that we have a column that depicts the seasonal variation. This variation
is obtained from the difference of the trend and production. Hence, a variation of +15 indicates
a deviation from the trend that an executive needs to adjust for. In particular, action is needed
in those cases where the variation is negative, because more resources are needed there.
Refer to another detailed figure which is directly dealing with the various procedures in real-life
scenarios.

Figure 3.5: Depicting the adjustments in time series data for seasonal variations
From the figure, it is seen that various episodes, such as economic depression, bankers' panic,
recession, oil crises and the dot-com bubble, are treated as seasons and are isolated from the main
time schedule. These are adjusted accordingly to get the seasonal index.

• Knowledge Check 1
State True or False
1. Time series data must be available in chronological order (True / False)
2. Time series data is used to depict trends only (True / False)
3. Time series data is used in decision-making as well as in executing those decisions (True /
False)
4. Seasonal index is a component of time series (True / False)
5. Seasonal variations are used to construct time series (True / False)

• Outcome-Based Activity 1
Prepare an Excel sheet to demonstrate the concept of the time series of an FMCG company by
using the concept of cyclical trends for at least 20 FMCG products of a particular organisation.
Prepare the Excel sheet in the format given below as an example of advertising containing
cyclical data.
Time series data for various Products Sold
Month        Product Type       Number of units sold
Jan 2000     Blower             34
Feb 2000     Blower             23
Mar 2000     Fan                42
Apr 2000     Fan                22
May 2000     Cooler             45
Jun 2000     Air Conditioner    23
Jul 2000     Cooler             22
Aug 2000     Fan                22
Sept 2000    Fan                23
Oct 2000     Room Heater        12
Nov 2000     Blower             15
Dec 2000     Blower             16

3.3 Estimation of seasonal variations


We have been discussing the various aspects of seasonal indices. In essence, this involves
isolating the seasonal variation from the main trend and using it for the purpose of calculating
future trends or objectives of the business.
In order to understand the concept, let us take a specific example.

From this data, we will calculate a three-month moving average, as we can see a basic cycle that
follows a three-monthly pattern (increases January - March, drops for April, then increases April -
June, drops for July and so on). In an exam, the question will state what time period to use for this
cycle/pattern in order to calculate the averages required.
Step 1 — Create a table
Create a table with 5 columns, shown below, and list the data items given in columns one and two.
The first three rows from the data given above have been input in the table:

Step 2 — Calculate the three-month moving average.


Add together the first three sets of data; for this example, it would be January, February and March.
This gives a total of (125 + 145 + 186) = 456. Put this total in the middle of the data you are adding,
so in this case, across from February.
Then calculate the average of this total by dividing this figure by 3 (the figure you divide by will
be the same as the number of time periods you have added in your total column). Our three-month
moving average is, therefore, (456 ÷ 3) = 152.
The average needs to be calculated for each three-month period. To do this, you move your average
calculation down one month, and the next calculation will involve February, March and April. The
total for these three months would be (145 + 186 + 131) = 462, and the average would be (462 ÷ 3)
= 154.

Continue working down the data until you no longer have three items to add together. Note: you
will have fewer averages than the original observations as you will lose the beginning and end
observations in the averaging process.
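Step 2 can be sketched in Python as follows. The figures for January to April are the ones used in the calculation above; the figures for May to July are assumed values, chosen only so that the series continues the pattern described in this example. The function name `centred_moving_average` is hypothetical.

```python
# Monthly sales. January to April appear in the worked example above;
# May to July are ASSUMED values that continue the described pattern.
sales = {
    "Jan": 125, "Feb": 145, "Mar": 186, "Apr": 131,
    "May": 151, "Jun": 192, "Jul": 137,
}

def centred_moving_average(values, window=3):
    """Average each run of `window` values, centred on the middle period."""
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]

months = list(sales)
values = list(sales.values())
moving_average = centred_moving_average(values)

# The first and last months have no average, as noted in the text.
for month, average in zip(months[1:-1], moving_average):
    print(month, average)
```

Seven observations give only five averages, which is the loss of the beginning and end observations mentioned in the note above.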

Step 3 — Calculate the trend


The three-month moving average represents the trend. From our example (where the figures are in
thousands of dollars), we can see a clear trend in that each moving average is $2,000 higher than
the preceding month's moving average. This suggests that the sales revenue for the company is, on
average, growing at a rate of $2,000 per month.
This trend can now be used to predict future underlying sales values.
Step 4 — Calculate the seasonal variation
Once a trend has been established, any seasonal variation can be calculated. The seasonal variation
can be assumed to be the difference between the actual sales and the trend (three-month moving
average) value. Seasonal variations can be calculated using additive or multiplicative models.
Using the additive model:
To calculate the seasonal variation, go back to the table and, for each average calculated, subtract
the average from the actual sales figure for that period (for example, February: 145 - 152 = -7).

A negative variation means that the actual figure in that period is less than the trend, and a positive
figure means that the actual figure is more than the trend.
From the data, we can see a clear three-month cycle in the seasonal variation.
Every first month has a variation of -7, suggesting that this month is usually $7,000 below the
average. Every second month has a variation of 32 suggesting that this month is usually $32,000
above the average. In month 3, the variation suggests that every third month, the actual will be
$25,000 below the average.
It is assumed that this pattern of seasonal adjustment will be repeated for each three-month period
going forward.
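The additive calculation, and the way the repeated pattern can be carried forward, can be sketched as below. The February to April figures follow the worked example; the May and June figures are assumed values that continue the same pattern, and the `forecast` function is a hypothetical helper that simply adds the seasonal variation to the projected trend.

```python
# Actual sales and three-month moving averages (trend), February to June.
# February to April follow the worked example; May and June are ASSUMED
# values that continue the pattern described in the text.
actual = [145, 186, 131, 151, 192]
trend = [152, 154, 156, 158, 160]

# Additive model: seasonal variation = actual - trend.
variation = [a - t for a, t in zip(actual, trend)]

# The first three variations form one full three-month cycle.
cycle = variation[:3]
cycle_total = sum(cycle)

# Monthly growth in the trend, read off the first two moving averages.
growth = trend[1] - trend[0]

def forecast(months_ahead):
    """Projected trend plus the seasonal variation for that position."""
    position = (len(trend) - 1 + months_ahead) % len(cycle)
    projected_trend = trend[-1] + growth * months_ahead
    return projected_trend + cycle[position]

print("Seasonal variations:", variation)
print("Cycle total:", cycle_total)
print("Forecast for the next two months:", forecast(1), forecast(2))
```

Under these assumptions the variations repeat as -7, 32, -25, and the forecasts for the two months after June are 137 and 157 (in thousands of dollars).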
Using the multiplicative model:
If we had used the multiplicative model, the variations would have been expressed as a percentage
of the average figure, rather than an absolute. For example:

This suggests that month 1 is usually 95% of the trend, month 2 is 121% and month 3 is 84%. The
multiplicative model is a better method to use when the trend is increasing or decreasing over time,
as the seasonal variation is also likely to be increasing or decreasing.
Note that with the additive model, the three seasonal variations must add up to zero (32 - 25 - 7 = 0).
Where this is not the case, an adjustment must be made. With the multiplicative model, the three
seasonal variations add up to three (0.95 + 1.21 + 0.84 = 3). (If it was a four-month average, the
four seasonal variations would add to four, etc.) Again, if this is not the case, an adjustment must
be made.
In this simplified example, the trend shows an increase of exactly $2,000 each month, and the
pattern of seasonal variations is exactly the same in each three-month period. In reality, a time
series is unlikely to give such a perfect result.
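The multiplicative version of the same calculation can be sketched as follows, using the February to April figures from the worked example. The text does not say how the adjustment should be made when the variations do not add up to the number of periods; the proportional rescaling shown here is one simple option and is an assumption.

```python
# Actual sales and trend (three-month moving average), February to April.
actual = [145, 186, 131]
trend = [152, 154, 156]

# Multiplicative model: seasonal variation = actual / trend.
ratios = [a / t for a, t in zip(actual, trend)]
rounded_ratios = [round(r, 2) for r in ratios]

# The unrounded ratios rarely sum to exactly 3, so an adjustment is needed.
# ASSUMED adjustment: rescale the ratios proportionally so they sum to 3.
scale = len(ratios) / sum(ratios)
adjusted_ratios = [r * scale for r in ratios]

print("Rounded ratios:", rounded_ratios)
print("Sum of unrounded ratios:", round(sum(ratios), 4))
print("Sum after adjustment:", round(sum(adjusted_ratios), 4))
```

The rounded ratios correspond to the 95%, 121% and 84% quoted above.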

• Knowledge Check 2
Fill in the Blanks
1. The time series must contain data in ___________ order (chronological / structured)
2. The time series is used to determine ___________ (Trends / Variations / Decisions)
3. ___________ variations are those which occur in a fixed order (Cyclic / Seasonal)
4. The outcome of time series analysis is output which can be used for making ______
(decisions / compliances / agreements)
5. An exceptionally brilliant student is an example of __________ (Outlier / Trend)

• Outcome-Based Activity 2
Prepare an Excel sheet to demonstrate Tata Motors' quarterly moving average for the last 16
years. The data must display the information of the sale of passenger cars. You may assume
any hypothetical figures to demonstrate the concept.
Prepare the Excel sheet in the format given below as an example of hypothetical data.

Time Series Data for Sale of properties in Undri Region, Pune

Year    Property Type    Number of units sold    Quarterly moving average
2001    Residential      20
2002    Commercial       21
                                                 31.75
2003    Urban            34
2004    Semi-Urban       52
2005    Residential      12
2006    Residential      16
                                                 25
2007    Residential      27
2008    Residential      45
2009    Commercial       33
2010    Urban            35
                                                 38
2011    Rural            43
2012    Urban            41
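The averages shown in the example format can be reproduced as below. In this hypothetical data, each average equals the plain average of a block of four consecutive rows; the sketch computes those block averages and, for comparison, a moving average whose window slides one row at a time. Treating the window as four rows is an assumption read off the example figures.

```python
# Number of units sold, taken from the example format above.
units_sold = [20, 21, 34, 52, 12, 16, 27, 45, 33, 35, 43, 41]
window = 4

# Averages of consecutive, non-overlapping blocks of four rows.
block_averages = [
    sum(units_sold[i:i + window]) / window
    for i in range(0, len(units_sold), window)
]

# Moving averages: the window slides down one row at a time.
moving_averages = [
    sum(units_sold[i:i + window]) / window
    for i in range(len(units_sold) - window + 1)
]

print("Block averages:", block_averages)
print("Moving averages:", moving_averages)
```

The block averages give one figure per group of four rows, whereas the sliding window gives a smoother series with one figure for every position of the window.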

3.4 Summary
• A time series is a collection of data over a period of time.
• The collection of data may comprise cyclical variations or seasonal variations, or it may pertain
to trends and patterns of a particular event or activity.
• The time series data comprises basic components.
• These components are used to construct the time series or to analyse the time series.
• The components of a time series are cyclical, seasonal, and irregular (outlier).
• The classification of these components is done on the basis of the data which is available and
on the objective or the purpose for which the time series is required to be analysed.
• Seasonal time series data contains the data for the particular season.
• This season may be a natural season, such as summer, winter, or spring.
• Alternatively, the season may be defined by the business executives based on the business
priorities.
• This may include aspects such as festival season, stock clearance season, or it may be
customised for a grand event.
• The outlier component of a time series is a component that is used to determine the outlier, that
is, the noise in the data set.
• The noise is the data point or points which do not follow the normal pattern demonstrated by
the data set.
• Cyclical components of the time series are those which are used to study the behaviour or the
patterns based on cycles.
• These cycles may be natural cycles or manmade cycles.
• For example, we may use the cyclical data pertaining to the clearance sale cycle.
• In other words, we may have the data wherein the discount is offered based on repeated cycles
of 4 months, so that the discount is given to the customers after every 4 months.
• An index is a number, that is, a ratio, which is used to depict the comparison of two entities
over a period of time.
• An index is widely used in the decision-making process.
• The main reason for using the index is the fact that an index makes the information unit free
while providing the comparative status of two variables.
• There are challenges and issues which are encountered during the process of analysing the time
series.
• This includes aspects such as fixing the purpose for the analysis, fixing the methodology to be
adopted for the purpose of analysis, and other aspects such as timelines and cost considerations.
3.5 Keywords
• Time Series: It is the representation of data arranged in chronological order, which is used to
assist the executive decision-maker in the process of making decisions.
• Seasonal time series: It is the representation of time series data, which contains data based on
seasonal variations. For example, a representation of data which contains the number of desert
coolers sold during the summer months.
• Index number: An Index number is a ratio which is designed to measure the variation amongst
two variables. It is usually measured in terms of percentages.
• Linear relationship: This is the relationship between two variables which are correlated, and
their relationship can be constructed in the form of a mathematical equation of a straight line.
• Methods of moving average: This is a method which is deployed for analysing the time series
data wherein the data is partitioned on the basis of quarterly averages.
• Smoothing: This is a method that is deployed for analysing the time series to remove the
extreme variations that impact the decision-making process.
• Trend: It is the characteristic that is demonstrated by the data that is collected over a large
period of time.
• Cyclic variations: These are the variations which are demonstrated by the data in the time
series on the basis of the particular cycle. For example, the same trend is repeated after every
quarter.
• Lag: It is the difference in the data between a given interval.
• Forecasting: It is the technique used to make predictions or estimations based on the data
represented by time series data.

3.6 Self-Assessment Questions


1. What is a time series? Explain its importance in the decision-making process.
2. What are the various types of time series? Explain with examples.
3. What are the components of time series? Explain their importance with respect to the decision-
making processes.
4. What is a seasonal index? Explain its role in decision-making processes.
5. What are the challenges encountered in the process of analysing time series? Explain with
examples.
3.7 References / Reference Reading
• Fundamentals of Mathematical Statistics by S. C. Gupta and V. K. Kapoor, Sultan Chand &
Sons, September 2020.
• Fundamentals of Mathematical Statistics by Steffen Lauritzen, CRC Press, February 2023.
• Fundamentals of Statistics by B Dasgupta, August 2013.
Unit 4: Theory of Probability

Learning Outcomes:
• Students will be able to define the concept of the theory of probability.
• Students will be able to identify and differentiate between various approaches to defining
probability for its use in day-to-day scenarios by business units.
• Students will be able to demonstrate the Addition and multiplication laws of probability.
• Students will be able to describe the findings and interpretation of the outputs generated by
various approaches to probability in the context of business objectives.
• Students will be able to analyse the impact of methodologies when the Bayes theorem is
applied in decision-making processes.

Structure:
4.1 Theory of Probability: Probability as a concept: Approaches to defining probability
4.2 Addition and multiplication laws of probability; conditional probability
• Knowledge Check 1
• Outcome-Based Activity 1
4.3 Bayes Theorem
• Knowledge Check 2
• Outcome-Based Activity 2
4.4 Summary
4.5 Keywords
4.6 Self-Assessment Questions
4.7 References / Reference Reading

4.1 Theory of Probability: Probability as a concept: Approaches to defining probability


We have discussed the various aspects of statistics and its application in various day-to-day
operations.
One important application of statistics is the concept of probability, wherein we use the data and
statistical techniques to state, with a measured degree of assurance, whether an event will
succeed or fail.
Let us take an example as to what we are discussing.
We will take the most common example pertaining to day-to-day activities/scenarios. This
example is related to World Cup cricket being played by two countries. The match is T20, and this
is the final match. The team that wins this match wins the cup.
Now, before the start of the match, social media, coffee breaks and other meeting points are
buzzing with conversations such as:
▪ If Team A wins the toss, there is a high probability or chance that Team A will win the cup
as their batting line-up is very strong, and they can set a big score for the team batting second.
▪ If Team B wins the toss and bats first, they still have a fair chance of winning the cup,
although their batting line-up is not as strong as the other team's. They have a strong bowling
line-up, and it has worked wonders.
▪ Also, we sometimes discuss issues such as the sky being overcast last night, so there
is a high probability that the match may have to be replayed as the chances of rain are high.
Thus, these points highlight nothing but the various applications of probability in real-life
scenarios.
It is to be noted that the final results may go haywire, and the above discussions may generate
unexpected scenarios.
Thus, what we have determined is the fact that we are able to fairly generate an assurance about a
likely outcome of the final match based on taking into consideration the various conditions and
scenarios.
Further, the above discussions are generic in nature; statistics provides numbers for these generic
statements in the form of probability values. Thus, instead of saying that Team A is likely to win
if it wins the toss, we say that if Team A wins the toss, then there is a 90% chance of Team A
winning the match.
Thus, having understood the concept of probability and its applications in day-to-day scenarios,
let us now delve further.
We will cover the basic terms which are widely used in the concepts of probability.
1. Sample: A sample is the extraction of a portion from the whole part. For example, if we
have 10 cookies and we select only two of the cookies, then we have selected a sample of
size 2. That is, we have extracted some portion from a bigger portion.
2. Population: A population is a term that is used for the bigger or larger portion from where
the samples are drawn. For example, the entire collection of 10 cookies represents the
population. It is to be noted that the population may be homogenous or heterogeneous. For
example, if there are 300 cookies, of which 120 are of orange flavour, 30 of strawberry
flavour, and so on, then the population is heterogeneous.
3. Sample space: It is the collection of all the possible outcomes of an experiment. For example,
suppose we select 5 cookies from 300 cookies. Each possible combination of flavours is an
outcome in the sample space, and we may ask: what is the chance that we get 2 cookies of
strawberry flavour, 2 of orange and 1 of, say, blueberry flavour?
4. Event: This is the term used to indicate that a thing has happened and what the outcome of
this happening is. For example, we may roll a die. This is an event, and the outcome may
be 1, 2, 3, 4, 5 or 6. We are required to determine the probability of all these possible
outcomes.
5. Independent events: These are events wherein the outcome of one is not at all impacted by the
happening or non-happening of another event.
6. Exhaustive events: These are the events in which the entire sample space is used.
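The terms above can be made concrete with the die example. The sketch below is an illustration only: it assumes a fair six-sided die, so that every outcome in the sample space is equally likely, and the function name `probability` is hypothetical.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (assumed fair).
sample_space = {1, 2, 3, 4, 5, 6}

def probability(event):
    """Favourable outcomes divided by all equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

# Two events defined on the sample space.
even = {2, 4, 6}
odd = {1, 3, 5}

p_even = probability(even)
p_odd = probability(odd)

# Exhaustive events together cover the entire sample space.
is_exhaustive = (even | odd) == sample_space

print("P(even) =", p_even)
print("P(odd) =", p_odd)
print("Even and odd are exhaustive:", is_exhaustive)
```

Using `Fraction` keeps the probabilities exact instead of converting them to rounded decimals.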
Refer to Figure 4.1 below, which depicts some other definitions of probability.

Figure 4.1: Depiction of Some of the Basic Terms Used in Probability


Approaches to Defining Probability
Having understood the various concepts of probability, let us now discuss the concepts pertaining
to approaches that are adopted in the process of determining probability.

4.2 Addition and multiplication laws of probability; conditional probability


Having understood the various aspects of probability applications, let us now move to another
deeper concept of addition and multiplication laws of probability.
Laws of addition of probability
Again, there are situations in real-world applications wherein we have to choose either one team
member or another. In other words, it does not matter who is selected because both are equally
strong in their fields; hence, it is an either-or situation.
Refer to the figure given below to depict the concept of the Rule of addition.

Figure 4.2: Probability

From the figure, it is evident that we have two circles, Circle A and Circle B, with a common
portion, which is the overlapping portion.
Hence, to find the probability of selecting a member of either circle, we add the probabilities of
Circle A and Circle B and eliminate the common portion, represented by P(A ∩ B), so that it is not
counted twice: P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
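The rule of addition can be verified on a small example. The sketch below assumes one roll of a fair die, with event A as "an even number" and event B as "a number greater than 3"; both events are hypothetical and chosen because they overlap.

```python
from fractions import Fraction

# One roll of a fair six-sided die (assumed fair).
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # an even number
B = {4, 5, 6}   # a number greater than 3

def probability(event):
    """Favourable outcomes divided by all equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

# Rule of addition: P(A or B) = P(A) + P(B) - P(A and B).
p_union_by_rule = probability(A) + probability(B) - probability(A & B)

# Direct count of the outcomes that lie in A or in B.
p_union_direct = probability(A | B)

print("By the rule:", p_union_by_rule)
print("By direct count:", p_union_direct)
```

Both routes give 2/3; without subtracting the common portion, the outcomes 4 and 6 would be counted twice.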
The multiplication rule or law of probability
This is the law of probability, which is widely used when the situation demands that both events
occur together.
Let us take an example of the multiplication rule of probability.
Suppose we are required to select a team of three members for a mountaineering expedition. But
there is a condition that two specific members are required to be in the team while the third
member can be anyone.
The prime reason the two specific members must be in the team is that they are very experienced
on this mountainous trek. They have overcome several issues and challenges and have climbed the
mountain 10 times. In such a scenario, due to the nature of the task, it is imperative to include these
two experienced members in the team. Hence, we apply the multiplication rule of probability.
Refer to Figure 4.3, which depicts the formula for the application of the multiplication rule.

Figure 4.3: Depiction of Multiplication Rule of Probability

From the figure, we see that we have independent and dependent events. If the events are
independent, that is, team member A is as good as team member B or team member C, then it
makes no difference as to which two team members we select, as all the mountaineers are equally
capable in terms of skill and expertise, while the third team member can be a novice. In this case,
P(A ∩ B) = P(A) × P(B).
On the other hand, if we have one team member who is skilled and experienced in snow walking
while another team member is skilled and capable in snow climbing, we have to select both of
these on account of the nature of the job. In other words, there is dependency. Consequently, the
formula is modified to P(A ∩ B) = P(A) × P(B|A).
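The difference between the independent and the dependent form of the multiplication rule can be sketched in Python. The bag of marbles used here (3 black, 2 white) is a hypothetical example, chosen only because drawing with and without replacement gives one independent and one dependent case.

```python
from fractions import Fraction

def both_independent(p_a, p_b):
    """P(A and B) when A and B are independent: P(A) * P(B)."""
    return p_a * p_b

def both_dependent(p_a, p_b_given_a):
    """P(A and B) when B depends on A: P(A) * P(B|A)."""
    return p_a * p_b_given_a

# Hypothetical bag: 3 black marbles out of 5 in total.
black, total = 3, 5

# Drawing two black marbles WITH replacement: the second draw is unaffected by the first.
with_replacement = both_independent(Fraction(black, total), Fraction(black, total))

# Drawing two black marbles WITHOUT replacement: one black marble is already gone.
without_replacement = both_dependent(Fraction(black, total), Fraction(black - 1, total - 1))

print("With replacement:   ", with_replacement)
print("Without replacement:", without_replacement)
```

The second probability is smaller because the first draw changes the composition of the bag, which is exactly the dependency that the modified formula accounts for.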
• Knowledge Check 1
State True or False.
1. The value of probability is generally expressed in percentages. (True)
2. Probability does not work in routine work. (False)
3. Doctors widely use probability on patients who are on critical support. (True)
4. Sometimes, the calculations of the probability go haywire. (True)
5. In mission-critical operations such as spacecraft on a mission, things do not work out on
probability. (False)

• Outcome-Based Activity 1
Prepare an Excel sheet to demonstrate the concept of independent, mutually exclusive and
conditional events.
Prepare the sheet in the following format given below as an example.
You should include at least 20 examples from day-to-day scenarios.
Sr. No.   Event / Activity / day-to-day scenario   Event type
1.        Passing a board exam                     Dependent event
2.        Flipping a coin                          Independent event
3.        Rolling a die                            Independent and mutually exclusive

4.3 Bayes Theorem


When we are dealing with the concepts of probability, we are likely to encounter various conditions
based on real-life scenarios. For example, we must determine the probability of an event happening
based on the events that have already happened. In other words, we introduce the concept of
conditional probability to determine the probability of an event, given that another event has
already occurred.
Let us take an example to illustrate the concept of Bayes theorem.
There are 3 boxes, and each of the boxes contains some white marble and some black marble. A
black marble is drawn at random. What are the chances that the black marble is from the second
box?
So, it can be seen from these types of applications that we determine the probability based on
conditions.
Let us take a different, more practical example.
There is a multi-storeyed building which has 10 flats. A small child unknowingly drops the soft
toy from one of the balconies of the flats. What are the chances that the soft toy has come from the
8th floor?
Refer to the figure given below to depict the formula for applying the Bayes theorem.

Figure 4.4: Formula for Applying the Bayes Theorem


From the figure, it is seen that we have, on the left-hand side of the formula, the notation P(A|B);
the symbol | is read as "given", so P(A|B) denotes the probability of event A happening given that
event B has already happened. In other words, it links the previous probability value to the
updated value, that is, the conditional probability.
On the other hand, considering the right-hand side, we see that the formula contains the individual
probability values of A and B, together with P(B|A), the probability of B given that A has
occurred: P(A|B) = P(B|A) × P(A) / P(B).
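The three-box example can be worked through with Bayes theorem once the contents of the boxes are known. The text does not give the marble counts, so the counts below are hypothetical, as is the assumption that each box is equally likely to be chosen. This is a minimal Python sketch under those assumptions.

```python
from fractions import Fraction

def bayes_posterior(priors, likelihoods, target):
    """P(target | evidence) = P(evidence | target) * P(target) / P(evidence)."""
    # Total probability of the evidence, summed over every possible source.
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    return priors[target] * likelihoods[target] / evidence

# Assumption: each box is equally likely to be picked.
priors = {"box1": Fraction(1, 3), "box2": Fraction(1, 3), "box3": Fraction(1, 3)}

# Assumption: proportion of black marbles in each box (hypothetical counts out of 5).
likelihoods = {"box1": Fraction(2, 5), "box2": Fraction(3, 5), "box3": Fraction(1, 5)}

posterior_box2 = bayes_posterior(priors, likelihoods, "box2")
print("P(second box | black marble drawn):", posterior_box2)
```

With these assumed counts, the second box holds half of all the black marbles, so observing a black marble raises its probability from 1/3 to 1/2.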
Refer to another example, which takes a more realistic approach.
Example: Screening for a disease redux
• Knowledge Check 2
Fill in the Blanks.
1. The ___________ probability is impacted by the happening of a prior event (Conditional/
non-conditional)
2. The Bayes theorem assumes that you have the ________ of the previous events (knowledge
details/ output)
3. The output of an _________ event is not at all impacted by previous events (independent/
mutually exclusive)
4. The _____ heavily use the concept of probability while running their business (Share
broker/ marketers)
5. The Bayes theorem is related to _________ of the previous information (knowledge/ data)

• Outcome-Based Activity 2
Prepare an Excel sheet to demonstrate the various applications of the theory of probability. You
should choose at least 15 examples from day-to-day activities or events.
Prepare the Excel sheet in the following format, which is supported with an example.
Sr. No.   Event / Activity / day-to-day scenario   Probability concept
1.        Appearing in the civil services exam     The chances of success in the civil services exam
                                                   are 40% as my General Knowledge section went
                                                   haywire.
2.        Standing at the traffic light            I will be able to clear the red light in the next
                                                   cycle as the traffic in front of me is too much.
                                                   There is a 10% chance of clearing the traffic
                                                   light in this imminent moment.

4.4 Summary
• In real life, there are several instances which rely on chance.
• We use statistics to mathematically treat these chances.
• Statistics is a branch of mathematics which deals with numbers.
• The concept of probability is used to deal with the events associated with the chances.
• When we mathematically treat the concepts of chances, we are using the basic concepts of
probability.
• Probability measures the chance of an event happening or not happening.
• The happening of an event results in success.
• The non-happening of an event results in failure.
• The value of the probability lies between 0 and 1 and is often expressed as a percentage.
• Decision-makers use this percentage or probability of happening or non-happening events for
future course of action.
• When we are dealing with the mathematical concepts of probability, we encounter the terms
mutually exclusive, exhaustive, independent, and conditional events.
• These terms are taken into consideration while arriving at the values of probability.
• Independent events are those events whose outcome is not at all impacted by the earlier
happening of an event.
• Mutually exclusive events are those events whose occurrence eliminates the chances of the
other event happening.
• Laws of probability are the mathematical rules by which the probability of combined events is
determined.
• There are two laws of probability: laws of addition and multiplication.

4.5 Keywords
• Conditional Probability: It is the probability of an event happening based on the condition of
an event that has already happened.
• Bayes' theorem: It gives the probability of an event based on the information of the
previous conditions related to the event.
• Distribution: It is the extent to which the data is spread across the domain.
• Discrete data: It is the data where the transition from one value to another is not smooth.
• Continuous data: It is the data wherein the transition from one value to another value is
smooth.
• Probability: It is the chance of an event happening or not happening.
• Probability distribution: The distribution is the spread of the probability values of happening
or not happening of an event.
• Normal distribution: It is the continuous distribution wherein the mean, median and mode
coincide, and the curve is symmetric. It extends from −∞ to +∞.

4.6 Self-Assessment Questions


1. What are the various approaches to probability? Explain with examples.
2. What is a Bayes Theorem? Explain with examples in day-to-day scenarios.
3. What are the laws of addition of Probability? Explain with an example.
4. What are the laws of multiplication of Probability? Explain with an example.
5. What is a conditional Probability? Explain with examples.

4.7 References / Reference Reading


• Fundamentals of Mathematical Statistics by S. C. Gupta and V. K. Kapoor, Sultan Chand &
Sons, September 2020.
• Fundamentals of Mathematical Statistics by Steffen Lauritzen, CRC Press, February, 2023.
• Fundamentals of Statistics by B Dasgupta, August 2013.
Unit 5: Probability Distribution

Learning Outcomes:
• Students will define the concept of the probability distribution.
• Students will identify and differentiate between various types of probability distributions that
are used in day-to-day scenarios by business units.
• Students will demonstrate effective usage of binomial and Poisson distribution.
• Students will evaluate the findings and interpretation of the outputs generated by binomial,
Poisson and normal distributions in the context of the business objectives.
• Students will analyse the impact of methodologies applied in the process of determining the
findings generated by normal distribution to make decisions.

Structure:
5.1 Probability Distributions: Probability Distribution as a Concept
5.2 Binomial Distribution, Poisson Distribution
• Knowledge Check 1
• Outcome-Based Activity 1
5.3 Normal Distribution: Properties and Parameters
• Knowledge Check 2
• Outcome-Based Activity 2
5.4 Summary
5.5 Keywords
5.6 Self-Assessment Questions
5.7 References / Reference Reading

5.1 Probability Distributions: Probability Distribution as a Concept


Before we discuss the concept of the probability distribution, let us discuss the term probability.
In simple terms, the word probability refers to a numerical measure of how likely an outcome is,
arrived at by considering all the possible outcomes. The value of the probability lies between zero
and 1.
A probability of 1 signifies that it is certain that the event will happen, while zero signifies that
the event will not happen.
In other words, probability is a value that tells us how strong the tendency of an event is to occur
or not to occur.
For example, we are all familiar with a cricket match being played between two countries. We
always talk in terms of probability. We often say that the chances of the match taking place (or
not taking place) are 70% as the weather is topsy-turvy.
This means that we arrive at the figure of 70% by taking into account all the factors which lead us
to conclude that the match will happen. For example, the heavy rain has transitioned to a light
drizzle, and the cloud cover is dissipating with intermittent hide-and-seek by the sunlight.
It is to be noted that the concept of probability is widely used in day-to-day scenarios.
The following points enumerate the areas where the concept of probability is widely used.
1. Probability is widely used in the determination of the success or failure of the launch of a
new product. For example, marketing managers may say that the chances of success of a
new product are, say, 20%; hence, the launch should be postponed as the consumers have
become cautious due to geopolitical events at the global level.
2. Probability is widely used in scenarios such as clearing an exam, say, the civil services exam.
For example, the students who have appeared in the civil services exam often say that the
chances of success are slim, meaning that the probability of clearing the exam is, say, 10%.
3. On the other hand, we always explore the chances of getting good accommodation when
we travel. This may happen when we have planned our journey in the peak season.
Having understood the concept of application of probability, let us now focus on the uses and
applications of probability.
1. The concept of probability enables the executive management to prepare a proactive
backup plan. For example, if we know that the probability of the cricket match not taking
place is 10% due to geopolitical events or otherwise, the management can come up with an
alternative plan. It is worth mentioning that, due to highly disturbed geopolitical
conditions, the entire World Cup was shifted to South Africa.
2. The concept of probability enables the management to assess the risk and develop a risk
mitigation plan. For example, it is common that when the cyclone is about to strike the
coastal area, the people are evacuated to a safe place. This is an example of risk assessment
and implementing a risk mitigation plan.
Refer to Figure 5.1 given below, which depicts the application of probability in day-to-day
operations:

Figure 5.1: Depiction of Various Uses of Probability Theory in day to day Applications

• Concept of Probability Distribution


Having understood the concept of probability, let us now define a probability distribution.
In simple terms, the word distribution refers to the spread of the data set. This means that the
distribution is used to define the boundaries or the range of the data set wherein the variable's new
value is likely to fall. For example, we may flip a coin 10 times and then determine the
probability that there will be 3 heads and 7 tails. Thus, we have the limits or the boundaries; within
these boundaries, we are required to determine the probability, that is, the chance of the
requirement of 3 heads and 7 tails being met or not.
Refer to Figure 5.2, which depicts the concept of probability distribution:
Figure 5.2: Depiction of Concept of Probability Distribution

Let us discuss these diagrams individually.


As we know, the data in a data set is either discrete or continuous. Discrete data is the one
wherein the output of the event is a number, and the transition to another number is not smooth.
For example, when we flip a coin once, we may have either Head or Tail. Note that this outcome
cannot change unless we flip the coin again.
Hence, if we note the probability of these outcomes, then we have a discrete distribution.
For example, the figure on the left shows the probability of each outcome of rolling a die and plots
the distribution against the individual values on the die. For example, the value six can occur once
out of the six possible values: 1, 2, 3, 4, 5 and 6. Hence, the probability of getting six is 1/6;
similarly, we generate the probability values of the other faces of the die.
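The discrete distribution of a single roll of a fair die can be tabulated in a few lines. This is a minimal Python sketch; the assumption is that the die is fair, so each face is equally likely.

```python
from fractions import Fraction

# Faces of a fair six-sided die (fairness is an assumption of this sketch).
faces = [1, 2, 3, 4, 5, 6]

# Each face occurs once out of six equally likely values.
distribution = {face: Fraction(1, len(faces)) for face in faces}

for face, probability in distribution.items():
    print(f"P(X = {face}) = {probability}")

# The probabilities of a valid distribution always add up to 1.
total = sum(distribution.values())
print("Sum of probabilities:", total)
```

Plotting these six equal values against the faces gives the flat bar chart that is typical of this discrete distribution.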
On the other hand, we have the continuous distribution. In the case of continuous distribution, the
transition from one value to another is very smooth; for example, we may record the
weight of a person as, say, 56 kg. Now, another person measures the weight of the same person
as 56.1 kg or 56.12 kg. So we see that the transition is very smooth.
Refer to the figure on the right. This is an example of the continuous distribution. Here, the portion
shaded in red depicts the range of values of the variable, from a certain value up to infinity.
It is to be noted that the choice of the distribution depends on the objective.

5.2 Binomial Distribution, Poisson Distribution


Having understood the concept of a probability distribution, let us now move deeper into the
probability distribution.
The probability distributions that we will discuss are the binomial probability distribution and
Poisson distribution.
• Binomial distribution
In simple terms, the word binomial means two terms. Thus, the binomial distribution applies when
each trial has exactly two possible outcomes, such as success or failure.
Let us first define what exactly a binomial distribution is. In simple terms, binomial distribution
refers to the probability distribution of the number of successes in a fixed number of such trials.
For example, flipping a coin 15 times and plotting the probability distribution of the number of
heads obtained in these 15 flips.
Refer to the table given below to illustrate the concept of binomial distribution.
Sr. No. Outcome Probability
1. Head 0.5
2. Head 0.5
3. Tail 0.5
4. Head 0.5
5. Tail 0.5
6. Head 0.5
7. Head 0.5
8. Tail 0.5
9. Head 0.5
10. Head 0.5
11. Tail 0.5
12. Head 0.5
13. Head 0.5
14. Tail 0.5
15. Head 0.5

Referring to the table, there are various combinations of Heads and Tails across the flips of
the coin, with the probability being the same for every outcome.
The points to be noted in the binomial distribution are the following:
1. Each of the outcomes is independent of the other outcome.
2. The number of trials is fixed.
3. The probability does not change for the outcome; the head will have the same probability
of 0.5 for the number of trials to be conducted.
• Binomial distribution formula
As statistics is a branch of mathematics, we must use a formula to arrive at the estimates
when dealing with any distribution.
Refer to Figure 5.3 given below:

Figure 5.3: Depiction of Binomial Distribution Formula

From the figure, it is evident that we can toss the coin any number of times and find the probability
of an outcome, such as at least 6 heads, by substituting the values in the formula.
For example, when the coin is flipped 15 times, we put n = 15, with p = 1/2 and q = 1/2, together
with the desired number of successes.
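The substitution described above can be sketched in Python. The sketch assumes that Figure 5.3 shows the standard binomial formula, P(X = x) = C(n, x) × p^x × q^(n − x) with q = 1 − p; the function names are hypothetical.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials with success probability p."""
    q = 1 - p  # probability of failure in a single trial
    return comb(n, x) * p**x * q**(n - x)

def binomial_at_least(x_min, n, p):
    """Probability of x_min or more successes, found by adding the exact probabilities."""
    return sum(binomial_pmf(x, n, p) for x in range(x_min, n + 1))

# Exactly 3 heads (and therefore 7 tails) in 10 flips of a fair coin.
p_3_heads_in_10 = binomial_pmf(3, 10, 0.5)

# At least 6 heads in 15 flips of a fair coin.
p_at_least_6_in_15 = binomial_at_least(6, 15, 0.5)

print(f"P(exactly 3 heads in 10 flips)  = {p_3_heads_in_10:.4f}")
print(f"P(at least 6 heads in 15 flips) = {p_at_least_6_in_15:.4f}")
```

An "at least" question is answered by summing the formula over every qualifying number of successes, which is why the second function loops from the minimum up to n.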
• Poisson distribution
Like the binomial distribution, we have another discrete distribution known as the Poisson
distribution. In simple terms, the Poisson distribution is used to answer questions where we
know the mean or average number of occurrences, and we are required to determine the
probability of an event happening a certain number of times in a fixed interval.
Poisson distribution is widely used in call centre operations or any business unit where we have
the mean data, such as the average number of calls received, and we are required to determine the
probability of a given number of calls in a fixed interval of time.
Refer to Figure 5.4 below, which depicts the formula as well as the usage of Poisson distribution.

Figure 5.4: Depiction of Formula as well as Application of Poisson Distribution


From the figure, it is seen that the formula for the Poisson distribution has a single controlling
parameter, lambda (λ), which depicts the mean value of the distribution.
In addition, it uses the exponential constant e, whose value is approximately 2.718.
The variable x is the number of occurrences for which the probability is needed.
For example, taking x = 35, we want to know the chances of exactly 35 occurrences within a time
period of 10 minutes if the average value is 3.
Substituting these values in the formula gives the required probability, which is extremely small
because 35 lies far above the average of 3.
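The Poisson calculation can be sketched in Python. The sketch assumes that Figure 5.4 shows the standard Poisson formula, P(X = x) = e^(−λ) × λ^x / x!; the function name is hypothetical, and x = 2 is an extra illustrative value that does not come from the text.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Probability of exactly x occurrences in an interval when the mean number is lam."""
    return exp(-lam) * lam**x / factorial(x)

lam = 3  # average number of occurrences per 10-minute period, as in the example

# A count close to the mean has a sizeable probability.
p_2 = poisson_pmf(2, lam)

# A count far above the mean, such as 35, is practically impossible.
p_35 = poisson_pmf(35, lam)

print(f"P(X = 2)  with mean 3: {p_2:.4f}")
print(f"P(X = 35) with mean 3: {p_35:.3e}")
```

Comparing the two results shows how quickly Poisson probabilities fall away as x moves above the mean, which is useful when judging whether an observed count is unusual.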

• Knowledge Check 1
State True or False.
1. The normal distribution is the basic probability distribution (True / False)
2. In a normal distribution, the extent or the domain of the distribution is provided by standard
deviation (True / False)
3. In a binomial distribution, the outcome is continuous (True / False)
4. Probability distributions are widely used in marketing operations (True / False)
5. Probability distributions are difficult to understand (True / False)
• Outcome-Based Activity 1
Prepare an Excel sheet to depict the concept of probability distribution in day-to-day scenarios.
Prepare the sheet in the format given below as an example.
You should mention at least 10 scenarios.
Sr. No.   Event                                             Description of the event with respect to probability
1         A player is likely to score at least 3 goals      Binomial distribution
          in a football match.
2         A student will clear the civil services exam      Poisson distribution
          on his 4th attempt.

5.3 Normal Distribution: Properties and Parameters


Normal distribution is a continuous distribution where the data set values are clustered around the
mean, median and mode. It is a symmetric curve and is widely used in various operations involving
data analytics and the application of statistical tools to various operational issues and challenges.
To understand the concept of normal distribution, refer to Figure 5.5 below.

Figure 5.5: Depiction of Normal Distribution


From the figure, it can be seen that the normal distribution is a symmetric, bell-shaped curve.
The symmetry is about the mean, as depicted in the figure. The word symmetry means that the
left side of the curve is the mirror image of the right side about the mean position.
It is a curve wherein the mean, median and mode coincide. Further, it is seen that the curve
extends from −∞ to +∞; that is, the span of the curve lies between minus infinity and plus infinity.
The other markings on the curve, such as 1 standard deviation, 2 standard deviations, and 3
standard deviations, depict the regions wherein the data set values are assumed to lie. Thus, the
area of 34.1% on each side of the mean indicates that around 68% of the data set values are
assumed to lie within 1 standard deviation of the mean.
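The percentages attached to 1, 2 and 3 standard deviations can be reproduced with the error function from the Python standard library. This is a minimal sketch; the function name is hypothetical.

```python
from math import erf, sqrt

def share_within_k_sd(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    share = share_within_k_sd(k)
    # Half of the share lies on each side of the mean because the curve is symmetric.
    print(f"Within {k} SD: {share:.1%} in total, {share / 2:.1%} on each side")
```

For k = 1 the share on each side is about 34.1%, which is the figure marked on the curve and adds up to roughly 68% in total.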
• Properties of normal distribution
The following points enumerate the properties of normal distribution
▪ In the case of the normal distribution, the mean, median and mode coincide. This means
that mean, median and mode values are all equal. If these values are not equal, then the
distribution is not symmetric and hence not normal. It is a skewed distribution.
▪ A skewed distribution is the one wherein the data set is biased towards one side. For example,
if the students very much like a particular teacher, the ratings will be heavily biased. It can
go the other way also.
▪ In the case of normal distribution, the data set points may be heavily concentrated
around the mean.
▪ In such cases, we say that there is kurtosis in the data set.
Refer to Figure 5.6, which depicts the concepts of skewness and kurtosis.

Figure 5.6: Depiction of Kurtosis and Skewness


From Figure 5.6, it is seen that the left-hand figure depicts kurtosis relative to the original
curve of the normal distribution. For example, the green curve extends much further than the
normal curve, which is represented in orange.
On the other hand, we have another curve, represented in blue, which lies below the main curve.
In any case, kurtosis is a sign that the data set is highly concentrated around the mean in an abrupt
manner.
For example, if we are to choose a team for a project, the team must be balanced. This
balance corresponds to the values described by the normal curve. On the other hand, if only
brilliant and highly intelligent people form the team, then the project work will never be completed.
The same is the case if we have only very mediocre team members; the work will never be completed.
The figure on the right side is an example of a skewed distribution. This means that more
importance is given to one side, for instance to highly intelligent persons, in which case the
distribution is positively skewed, or to the sycophants who rejoice in leading the management in a
different direction.
The formula for the normal distribution equation is given below.

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

It is seen that the normal distribution is driven by two parameters, viz., the mean (μ) and the
standard deviation (σ).
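The role of the two parameters can be explored with a short Python sketch of the standard normal density and its cumulative form. The mean of 50 and standard deviation of 10 are hypothetical values chosen only for illustration, and the function names are assumptions of this sketch.

```python
from math import erf, exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def normal_cdf(x, mu, sigma):
    """Probability that a normal variable is less than or equal to x."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = 50.0, 10.0  # hypothetical mean and standard deviation

# Symmetry: points at equal distance on either side of the mean have the same height.
left_height = normal_pdf(mu - 5, mu, sigma)
right_height = normal_pdf(mu + 5, mu, sigma)

# Probability of a value between 40 and 60, that is, within 1 standard deviation.
p_40_to_60 = normal_cdf(60, mu, sigma) - normal_cdf(40, mu, sigma)

print(f"Height at 45: {left_height:.5f}, height at 55: {right_height:.5f}")
print(f"P(40 <= X <= 60) = {p_40_to_60:.4f}")
```

Changing mu shifts the curve left or right, while changing sigma makes it wider or narrower; the probability within one standard deviation of the mean stays at about 68% in every case.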
• Solved examples in a normal distribution
The figure below depicts the process of solving the problems based on normal distribution.
• Knowledge Check 2
Fill in the Blanks
1. The normal distribution is a _________ distribution. (symmetric / Non symmetric)
2. The binomial distribution is a __________ distribution. (Continuous / Discrete)
3. In a continuous distribution, the transition from one value to another is _____. (smooth /
easy)
4. A binomial distribution is used to determine the number of successes in ________ trials.
(finite / infinite)
5. The Poisson distribution is a __________. (Continuous / Discrete)

• Outcome-Based Activity 2
Prepare an Excel sheet to depict the concept of binomial distribution in day-to-day operations
with which you are familiar.
Prepare the Excel sheet in the format given below as an example.
Sr. No. Event Discrete variables
1 Depositing of tuition fee Yes or No
2 Return of library book Yes or No

The number of examples should be at least 15.

5.4 Summary
• Statistics is the branch of mathematics that deals with the data and the processing of the data.
• The term processing of data means bringing the data to a form wherein the decision maker is in
a position to make decisions.
• There are two techniques for data processing in statistics.
• Descriptive statistics deals with the process of providing the basic structure of the data set.
• It involves arriving at mean, median, mode, standard deviation, and so on.
• On the other hand, statistics also makes use of inferential statistics.
• This involves dealing with estimations and interpretations.
• It involves the determination of probabilistic values of the data set.
• The probabilistic values then lead to the decisions based on the findings.
• There are several types of probability distribution based on the outcome being discrete or
continuous.
• A discrete outcome is one in which the transition from one value to another value is not smooth.
• A continuous probability distribution is one wherein the transition from one value to another
value is smooth.
• A continuous distribution is defined over a range of values.
• The normal distribution is the most common example of the continuous distribution.
• It is a bell-shaped symmetric distribution wherein the mean, median and mode coincide.
• The normal distribution extends from −∞ to +∞.
• The binomial distribution is a discrete distribution where the outcome value is a discrete value.
• An example of binomial distribution is the flipping of a coin wherein the outcome is either
Head or Tail.
• A Poisson distribution is another discrete distribution wherein the outcome is discrete, and in
addition to this, it provides the happening or not happening of an event a certain number of
times.

5.5 Keywords
• Statistics: A branch of mathematics that deals with data and data processing.
• Distribution: The extent to which the data is spread across the domain.
• Discrete data: It is the data where the transition from one value to another is not smooth.
• Continuous data: It is the data wherein the transition from one value to another value is
smooth.
• Probability: It is the chance of an event happening or not happening.
• Probability distribution: The distribution is the spread of the probability values of happening
or not happening of an event.
• Normal distribution: It is the continuous distribution wherein the mean, median and mode
coincide, and the curve is symmetric. It extends from −∞ to +∞.

5.6 Self-Assessment Questions


1. What is meant by the term probability? Explain with examples.
2. What is meant by discrete probability distribution? Explain with examples.
3. What is a Normal Distribution? Explain with examples.
4. What is the importance of Poisson distribution? Explain its role in the decision-making
process.
5. What is a binomial distribution? Explain its importance.

5.7 References / Reference Reading


• Fundamentals of Mathematical Statistics by S. C. Gupta and V. K. Kapoor, Sultan Chand &
Sons, September 2020.
• Fundamentals of Mathematical Statistics by Steffen Lauritzen, CRC Press, February 2023.
• Fundamentals of Statistics by B Dasgupta, August 2013.
Multiple and Partial Correlation
Course: Business Statistics
[Link]. Previous - II Semester

under the aegis of:


Centre of Distance and Online Education
Kurukshetra University, Kurukshetra

Prepared & Delivered by
Pankaj Chaudhary
Assistant Professor
Learning Outcomes
After completing this chapter, a student will be able to:
• Understand the concept of variables and their types
• Comprehend the concept of Multiple and Partial
correlation
• Describe the relationship between several Independent
variables and a dependent variable using partial and
multiple correlation
• Applications of Multiple and Partial Correlation
Statistics – Introduction
• H.G. Wells, an English author and historian, said 100 years ago
that quantitative information will become necessary for every
walk of life.
• It would be used not only by businesses for decision making
but also in personal lives.
• In every newspaper, periodical/magazine and report, you will
find numerical information.
• One should have the basic knowledge and skills to collect, organize,
analyse and transform data into something that makes sense, which is
called information.
Statistics – Why do we study?
To express Numerical Information
Product #   No. of Units Sold
201         700
202         250
203         300
204         325
205         216
Total       1791

To make decisions using statistical techniques
Insurance companies can use statistical analysis to set rates for life and general insurance like
health, automobiles etc.

How decisions are made?
Knowledge of Statistical Methods Helps in better understanding.
Statistics – Meaning/Concept
Statistics is a science of
C Collecting
O Organizing
P Presenting
A Analysing
I Interpreting
DATA
Statistics - Examples
• The Average time waiting for technical support
is 20 minutes.
• The Average salary of graduates
• Industrial average
• Number of deaths due to Dengue
• DRS in Cricket Match taken by a particular team
– 20% successful and 80% unsuccessful
Statistics - Types

Descriptive
• Compiling and organizing data in the form of tables, charts and graphs
• Establishes relationship among data
• Descriptive measures are Tables, Diagrams, Mean, SD, Correlation Coefficient etc.

Inferential
• Methods used to draw inferences/conclusions/estimations/predictions
• Explains the general characteristics of population on the basis of sample data
• Inferential measures are Time Series Analysis, Hypothesis testing etc.
Statistics for Business/Management – Why?
Statistics plays an important role in business
decisions. These can be:
• Present and describe business data and
information
• Draw conclusions on the basis of information
collected
• For making forecasts about business activities
• To improve business processes
Variable – What it is?
A variable is a characteristic, number or quantity that can be measured or counted.
Also called a data item/data point.
Examples – Age, Gender, Business Income, Place of Birth, Class Grades, eye colour etc.
Example: Income is a variable
• Value may vary between data units or may change over time
• People or businesses may not have the same incomes, and incomes can also go up or down.
Variable – Types
• Quantitative Variable
▪ Generates numeric response
▪ Real numbers that can be manipulated using arithmetic operations
▪ Examples: the age of an employee, the price of a product, distance
travelled etc.
▪ Classified as – Discrete and Continuous
o Discrete Data: whole number or integer data
o For Example: the number of students in a class, the number of cars sold
o Continuous Data: any number that can occur in an interval
o For Example: Volume of fuel in a car tank between 10 ltr and 50 ltr;
assembly time for a part between 20 minutes and 30 minutes.
Expressed as – 43.4 ltr, 25.5 minutes etc.
Variable – Types
• Qualitative Variable
▪ Generates non-numeric response or categorical data
▪ Represented by categories only
▪ Examples: gender of employees, either male or female; qualification,
either matric, certificate, diploma or degree
▪ Numbers are assigned to represent categories
▪ E.g. 1 = male, 2 = female

• Scale of measurement
▪ Indicates the strength of the data in terms of how much arithmetic manipulation is possible
▪ Determines which statistical methods are appropriate to validate statistical results
Scale of Measurement – Types
Scale of Measurement
• Categorical/Qualitative: Nominal, Ordinal
• Numeric/Quantitative: Interval, Ratio
Scale of Measurement – Defined
• Nominal Data
▪ Associated with categorical data
▪ All categories are of equal importance
▪ Examples – Gender (1=male, 2=female, 3=transgender), Employment (1=Shopkeeper,
2=Serviceman, 3=Engineer, 4=freelancer)
▪ Can only be counted or tabulated
▪ Limited statistical methods are applied
▪ Can be discrete only
• Ordinal Data
▪ Associated with categorical data
▪ Implied ranking between different categories
▪ Next category is either more or less than the previous category of a given characteristic
▪ Examples – Income (1=lower, 2=middle, 3=high), Cloth Size (1=Small, 2=Medium, 3=Large)
▪ Provides order or rank to data
▪ Limited statistical methods are applied
▪ Can be discrete only
Scale of Measurement – Defined
• Interval Data
 associated with numeric data
 Generated from Rating Scales
 Used in questionnaires to measure attitude, perception, motivation,
preferences etc
 Possesses both rank order and distance, i.e. how much more or how
much less of a given characteristic
 No absolute zero point
 Not meaningful to compare the ratios of interval values with one another
 Wider range of statistical methods can be used.
 Examples : How satisfied are you with your current job? 1= very dissatisfied,
2=Dissatisfied, 3= Uncertain, 4=Satisfied, 5=Very Satisfied
 Can be discrete only
• Ratio Data
 associated with numeric data, i.e. real numbers
 Has order, distance and an absolute origin of zero
 Data can be manipulated using arithmetic operations
 The absolute zero means that ratios can be computed (e.g. 4 is one quarter of 16)
 Strongest data for statistical analysis compared with the other data types (nominal, ordinal, interval)
 All statistical methods can be applied
 Examples: age (in years), volume (ml), distance travelled (km), product
prices (Rs), etc.
 Can be discrete and continuous
Independent and Dependent Variables
• Independent variable
o An independent variable can be manipulated in an experiment.
o Independent variables are also termed explanatory variables or manipulated
variables.
o Independent variables are mostly used in experiments to determine their effect on
dependent variables.
o Independent variables are the variables selected to determine their possible
effects on the other variables being studied.
• Dependent variable
o A dependent variable cannot be manipulated by the experimenter.
o Dependent variables are also termed outcome variables, responding variables
or explained variables.
o A dependent variable responds to one or more independent variables.
o The effect on the dependent variable forms the basis of any experiment.
Independent Variable (Cause) → Dependent Variable (Effect)
Correlation - Concept
• As the supply of onions increases in the local market, the price
comes down. Thus, supply is related to price.
• Correlation analyses such relationships between variables.
Correlation answers the following questions:
 Is there any relationship between the variables?
 What is the direction of the relationship?
 How strong is the relationship?
Correlation:
• Studies the association between the variables
• Measures the intensity of the relationship
• Studies the strength of the relationship between variables
• Measures the direction of the relationship
Coefficient of Correlation – What is it?
• The measure of correlation is called the coefficient of correlation.
• The degree of relationship is expressed by the coefficient of correlation.
• Denoted by ‘r’
• Ranges from -1 to +1
• Can be positive or negative
• Requires interval- or ratio-scaled (continuous) data
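The properties listed above can be illustrated with a minimal Python sketch of the simple (Pearson) coefficient of correlation. The function name `pearson_r` and the supply and price figures are assumptions made up for illustration; they are not taken from the slides.

```python
import math

def pearson_r(x, y):
    """Pearson coefficient of correlation for two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: weekly onion supply (tonnes) and market price (Rs per kg)
supply = [10, 12, 15, 18, 20]
price = [40, 38, 33, 30, 26]

r = pearson_r(supply, price)
print(round(r, 3))  # negative sign: price falls as supply rises
```

The sign of `r` gives the direction of the relationship and its magnitude gives the strength, which is why the result always lies between -1 and +1.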
Correlation – Types
• Based on the direction of variables: Positive Correlation, Negative Correlation
• Based on the number of variables: Simple Correlation, Multiple Correlation, Partial Correlation
• Based on the ratio of change between variables: Linear Correlation, Non-Linear Correlation
Multiple Correlation- Concept and Examples
Multiple Correlation
• Measures the relationship among three or more variables
• Is the study of the joint effect of n independent variables on a single dependent variable
Example 1
Studying the relationship between Anxiety Level, Academic Achievement and Intelligence.
There are three variables –
Independent Variables – Anxiety Level & Intelligence
Dependent Variable – Academic Achievement
Example 2
Studying the relationship among the performance of a student in exams, IQ level, parents’ qualification and number of hours studied.
Variables are –
Independent Variables – IQ Level, Parents’ Qualification & Number of hours studied
Dependent Variable – Performance
Example 3
Studying how crime depends on illiteracy and unemployment in the nation.
Variables are –
Independent Variables – Illiteracy & Unemployment
Dependent Variable – Crime
Coefficient of Multiple Correlation
• Measures the degree and strength of relationship among variables
• Denoted by R
• Value of R ranges between 0 and 1
R1.23 denotes X1 – DV, X2 – IV, X3 – IV
R2.13 denotes X2 – DV, X1 – IV, X3 – IV
R3.12 denotes X3 – DV, X1 – IV, X2 – IV
Note: DV – Dependent Variable, IV – Independent Variable
Applications of Multiple Correlation - Notations

For computing multiple coefficients of correlation, you need the
simple coefficients of correlation:
r12 or rxy = denotes the relationship between the first and second variables
r23 or ryz = denotes the relationship between the second and third variables
r13 or rxz = denotes the relationship between the first and third variables
Also,
r12 = r21, r13 = r31, r23 = r32
These are called simple or zero-order correlation coefficients.
Applications of Multiple Correlation - Formulae

$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2}}$$
Here, 1 – Dependent Variable, 2 – Independent Variable, 3 – Independent Variable

$$R_{2.13} = \sqrt{\frac{r_{21}^2 + r_{23}^2 - 2\,r_{21}\,r_{23}\,r_{13}}{1 - r_{13}^2}}$$
Here, 2 – Dependent Variable, 1 – Independent Variable, 3 – Independent Variable

$$R_{3.12} = \sqrt{\frac{r_{31}^2 + r_{32}^2 - 2\,r_{31}\,r_{32}\,r_{12}}{1 - r_{12}^2}}$$
Here, 3 – Dependent Variable, 1 – Independent Variable, 2 – Independent Variable
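The three formulae share one pattern, which can be sketched in Python as a single function. This is a sketch only: the name `multiple_r`, its parameter names and the input coefficients below are hypothetical, chosen just to show how the arguments are rearranged for each dependent variable.

```python
import math

def multiple_r(r_dep_a, r_dep_b, r_ab):
    """Multiple correlation of a dependent variable with two independent variables a and b.

    r_dep_a, r_dep_b: zero-order correlations of the dependent variable with a and with b.
    r_ab: zero-order correlation between the two independent variables (must not be +1 or -1).
    """
    numerator = r_dep_a**2 + r_dep_b**2 - 2 * r_dep_a * r_dep_b * r_ab
    return math.sqrt(numerator / (1 - r_ab**2))

# Hypothetical zero-order coefficients, for illustration only
r12, r13, r23 = 0.5, 0.4, 0.3

R1_23 = multiple_r(r12, r13, r23)  # variable 1 dependent
R2_13 = multiple_r(r12, r23, r13)  # variable 2 dependent (r21 = r12)
R3_12 = multiple_r(r13, r23, r12)  # variable 3 dependent (r31 = r13, r32 = r23)
print(round(R1_23, 3), round(R2_13, 3), round(R3_12, 3))
```

Only the order of the arguments changes between the three calls, because r12 = r21, r13 = r31 and r23 = r32.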
Applications of Multiple Correlation – through Practical Illustrations

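Illustration-1
Compute R1.23, R2.13 and R3.12, given r12 = 0.60, r13 = 0.70, r23 = 0.65
Solution:
Substituting the values in the formulae,
$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2}} = \sqrt{\frac{(0.60)^2 + (0.70)^2 - 2(0.60)(0.70)(0.65)}{1 - (0.65)^2}} = \sqrt{\frac{0.304}{0.5775}} \approx 0.726$$

$$R_{2.13} = \sqrt{\frac{r_{21}^2 + r_{23}^2 - 2\,r_{21}\,r_{23}\,r_{13}}{1 - r_{13}^2}} = \sqrt{\frac{(0.60)^2 + (0.65)^2 - 2(0.60)(0.65)(0.70)}{1 - (0.70)^2}} = \sqrt{\frac{0.2365}{0.51}} \approx 0.681$$

$$R_{3.12} = \sqrt{\frac{r_{31}^2 + r_{32}^2 - 2\,r_{31}\,r_{32}\,r_{12}}{1 - r_{12}^2}} = \sqrt{\frac{(0.70)^2 + (0.65)^2 - 2(0.70)(0.65)(0.60)}{1 - (0.60)^2}} = \sqrt{\frac{0.3665}{0.64}} \approx 0.757$$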
Illustration-2
The values of the zero-order correlation coefficients are r12 = 0.98, r13 = 0.44 and r23 = 0.54.
Compute the multiple correlation coefficient if the first variable is dependent and the second and
third variables are independent.
Solution: As per the question, you have to compute R1.23.
Substituting the values in the formula,
$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2}} = \sqrt{\frac{(0.98)^2 + (0.44)^2 - 2(0.98)(0.44)(0.54)}{1 - (0.54)^2}} = \sqrt{\frac{0.6883}{0.7084}} \approx 0.986$$
Illustration-3
If R1.23 = 1, prove that R2.13 = 1.
Solution: You know,
$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2}} \quad \text{--------- (1)}$$
and
$$R_{2.13} = \sqrt{\frac{r_{21}^2 + r_{23}^2 - 2\,r_{21}\,r_{23}\,r_{13}}{1 - r_{13}^2}}$$
Replacing R1.23 = 1 in equation (1) and squaring both sides,
$$1 = \frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2}$$
By cross multiplication,
$$r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23} = 1 - r_{23}^2$$
Interchanging the terms r13² and r23² between the two sides,
$$r_{12}^2 + r_{23}^2 - 2\,r_{12}\,r_{13}\,r_{23} = 1 - r_{13}^2$$
Dividing both sides by 1 − r13² (assuming r13² ≠ 1),
$$\frac{r_{12}^2 + r_{23}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{13}^2} = 1$$
Since r21 = r12, the left-hand side is the square of R2.13. Taking the square root of both sides,
R2.13 = 1 (Hence Proved)
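The result of Illustration-3 can also be checked numerically. The Python sketch below is an assumption-laden illustration, not part of the original solution: it picks hypothetical values r13 = 0.6 and r23 = 0.8, solves the cross-multiplied equation for an r12 that makes R1.23 equal to 1, and then evaluates R2.13. The helper names `multiple_r` and `r12_for_perfect_fit` are invented for this example.

```python
import math

def multiple_r(r_dep_a, r_dep_b, r_ab):
    """Multiple correlation of a dependent variable with two independent variables a and b."""
    numerator = r_dep_a**2 + r_dep_b**2 - 2 * r_dep_a * r_dep_b * r_ab
    return math.sqrt(numerator / (1 - r_ab**2))

def r12_for_perfect_fit(r13, r23):
    """One root of r12^2 - 2*r13*r23*r12 + (r13^2 + r23^2 - 1) = 0, i.e. the condition R1.23 = 1."""
    return r13 * r23 + math.sqrt((1 - r13**2) * (1 - r23**2))

# Hypothetical starting values
r13, r23 = 0.6, 0.8
r12 = r12_for_perfect_fit(r13, r23)

R1_23 = multiple_r(r12, r13, r23)  # equals 1 by construction
R2_13 = multiple_r(r12, r23, r13)  # should also equal 1, as the proof claims
print(round(R1_23, 6), round(R2_13, 6))
```

A numerical check of this kind does not replace the algebraic proof; it only confirms the claim for one set of values.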
Partial Correlation- Concept and Examples
Partial Correlation
• Measures the relationship between two variables while eliminating the influence of a third variable
• Measures the association between two variables after controlling or adjusting for the effect of one or more additional variables
Example 1
Studying the relationship between Age, Hours Worked and Hours Exercised.
The relationship will be computed between hours worked and hours exercised.
Age affects both; therefore Age will remain the control variable.
Example 2
Studying the relationship among Fertilizer, Crop Yield and Temperature.
The relationship will be computed between Fertilizer and Crop Yield.
Temperature will remain the control variable (held constant).
Example 3
Studying the relationship among Anxiety Level, Academic Achievement and Intelligence.
The relationship will be computed between Anxiety Level and Academic Achievement.
Intelligence affects both; therefore Intelligence will remain the control variable.
Coefficient of Partial Correlation
• Measures the degree and strength of relationship between variables
• Denoted by r12.3, r13.2, r23.1
• Value of r ranges between -1 and 1
r12.3 denotes the partial correlation between variables 1 and 2, keeping variable 3 controlled/constant
r23.1 denotes the partial correlation between variables 2 and 3, keeping variable 1 controlled/constant
r13.2 denotes the partial correlation between variables 1 and 3, keeping variable 2 controlled/constant
Applications of Partial Correlation - Formulae

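$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}$$

$$r_{23.1} = \frac{r_{23} - r_{21}\,r_{31}}{\sqrt{1 - r_{21}^2}\,\sqrt{1 - r_{31}^2}}$$

$$r_{13.2} = \frac{r_{13} - r_{12}\,r_{32}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{32}^2}}$$

Here, r12, r13, r23 – Simple or Zero-Order Correlation Coefficients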
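As with the multiple coefficient, the three partial formulae follow one pattern. A minimal Python sketch, assuming a hypothetical helper `partial_r` and made-up zero-order coefficients, might look like this:

```python
import math

def partial_r(r_ab, r_ac, r_bc):
    """Partial correlation between variables a and b, holding variable c constant.

    r_ab, r_ac, r_bc are zero-order coefficients; r_ac and r_bc must not be +1 or -1.
    """
    return (r_ab - r_ac * r_bc) / (math.sqrt(1 - r_ac**2) * math.sqrt(1 - r_bc**2))

# Hypothetical zero-order coefficients, for illustration only
r12, r13, r23 = 0.5, 0.4, 0.3

r12_3 = partial_r(r12, r13, r23)  # variables 1 and 2, controlling for 3
r23_1 = partial_r(r23, r12, r13)  # variables 2 and 3, controlling for 1
r13_2 = partial_r(r13, r12, r23)  # variables 1 and 3, controlling for 2
print(round(r12_3, 3), round(r23_1, 3), round(r13_2, 3))
```

The first argument is always the correlation of the pair being studied; the other two are the correlations of each member of the pair with the control variable.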
Applications of Partial Correlation – through Practical Illustrations

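Illustration-4
Compute r12.3, r23.1 and r13.2, given r12 = 0.70, r13 = 0.61, r23 = 0.40
Solution:
Substituting the values in the formulae,
$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} = \frac{0.70 - (0.61)(0.40)}{\sqrt{1 - (0.61)^2}\,\sqrt{1 - (0.40)^2}} = \frac{0.456}{0.7262} \approx 0.628$$

$$r_{23.1} = \frac{r_{23} - r_{21}\,r_{31}}{\sqrt{1 - r_{21}^2}\,\sqrt{1 - r_{31}^2}} = \frac{0.40 - (0.70)(0.61)}{\sqrt{1 - (0.70)^2}\,\sqrt{1 - (0.61)^2}} = \frac{-0.027}{0.5659} \approx -0.048$$

$$r_{13.2} = \frac{r_{13} - r_{12}\,r_{32}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{32}^2}} = \frac{0.61 - (0.70)(0.40)}{\sqrt{1 - (0.70)^2}\,\sqrt{1 - (0.40)^2}} = \frac{0.33}{0.6545} \approx 0.504$$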
Illustration-5
On the basis of the following information, compute the partial correlation between Yield of Cotton and Seed
Vessels after eliminating the effect of Height: x1 = Yield of Cotton, x2 = Seed Vessels, x3 = Height, and r12 = 0.8,
r13 = 0.65, r23 = 0.7
Solution: Here you have to compute r12.3.
$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} = \frac{0.8 - (0.65)(0.7)}{\sqrt{1 - (0.65)^2}\,\sqrt{1 - (0.7)^2}} = \frac{0.345}{0.5427} \approx 0.636$$
Illustration-6
Check the consistency of the following data, where r12 = 0.6, r23 = 0.8, r31 = -0.5
Solution: If the absolute value of r12.3 is greater than 1 (or R1.23 is greater than 1), the data are inconsistent.
$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} = \frac{0.6 - (-0.5)(0.8)}{\sqrt{1 - (-0.5)^2}\,\sqrt{1 - (0.8)^2}} = 1.92$$
OR
$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2}} = \sqrt{\frac{(0.6)^2 + (-0.5)^2 - 2(0.6)(-0.5)(0.8)}{1 - (0.8)^2}} = 1.74$$
Since the value of the partial or multiple correlation coefficient is greater than 1, the given data are inconsistent.
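The consistency test used in Illustration-6 can be sketched as a small Python check. The function names are hypothetical, and the sketch assumes that no zero-order coefficient is exactly +1 or -1 (otherwise the denominator vanishes). The second data set is made up for contrast.

```python
import math

def partial_r(r_ab, r_ac, r_bc):
    """Partial correlation between a and b, holding c constant."""
    return (r_ab - r_ac * r_bc) / (math.sqrt(1 - r_ac**2) * math.sqrt(1 - r_bc**2))

def is_consistent(r12, r13, r23):
    """True when all three first-order partial correlations lie within [-1, 1]."""
    partials = (
        partial_r(r12, r13, r23),  # r12.3
        partial_r(r13, r12, r23),  # r13.2
        partial_r(r23, r12, r13),  # r23.1
    )
    return all(abs(p) <= 1 for p in partials)

print(is_consistent(0.6, -0.5, 0.8))  # data of Illustration-6
print(is_consistent(0.5, 0.4, 0.3))   # hypothetical data set
```

Checking all three partial coefficients is a cautious choice for the sketch; the illustration itself stops as soon as one coefficient falls outside the permitted range.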
Relationship between Simple, Partial and Multiple Correlation

$$1 - R_{1.23}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2)$$
$$1 - R_{2.13}^2 = (1 - r_{21}^2)(1 - r_{23.1}^2)$$
$$1 - R_{3.12}^2 = (1 - r_{31}^2)(1 - r_{32.1}^2)$$
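The first of these relationships can be verified numerically. The sketch below is only a check for one set of values (the zero-order coefficients given in Illustration-1); the helper names `multiple_r` and `partial_r` are assumptions introduced for the example.

```python
import math

def multiple_r(r_dep_a, r_dep_b, r_ab):
    """Multiple correlation of a dependent variable with two independent variables a and b."""
    numerator = r_dep_a**2 + r_dep_b**2 - 2 * r_dep_a * r_dep_b * r_ab
    return math.sqrt(numerator / (1 - r_ab**2))

def partial_r(r_ab, r_ac, r_bc):
    """Partial correlation between a and b, holding c constant."""
    return (r_ab - r_ac * r_bc) / (math.sqrt(1 - r_ac**2) * math.sqrt(1 - r_bc**2))

# Zero-order coefficients given in Illustration-1
r12, r13, r23 = 0.60, 0.70, 0.65

lhs = 1 - multiple_r(r12, r13, r23) ** 2              # 1 - R1.23^2
rhs = (1 - r12**2) * (1 - partial_r(r13, r12, r23) ** 2)  # (1 - r12^2)(1 - r13.2^2)
print(math.isclose(lhs, rhs))
```

The other two relationships can be checked the same way by permuting the arguments.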
Applications – through Practical Illustrations
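Illustration-7
x1, x2 and x3 are measured from their means, with N = 10,
$\sum x_1^2 = 90$, $\sum x_2^2 = 160$, $\sum x_3^2 = 40$, $\sum x_1 x_2 = 60$, $\sum x_2 x_3 = 60$, $\sum x_3 x_1 = 40$. Compute r12.3 and R1.23.
Solution:
$$r_{12} = \frac{\sum x_1 x_2}{\sqrt{\sum x_1^2}\,\sqrt{\sum x_2^2}} = \frac{60}{\sqrt{90}\,\sqrt{160}} = \frac{60}{120} = 0.5$$
$$r_{13} = \frac{\sum x_1 x_3}{\sqrt{\sum x_1^2}\,\sqrt{\sum x_3^2}} = \frac{40}{\sqrt{90}\,\sqrt{40}} = \frac{40}{60} \approx 0.667$$
$$r_{23} = \frac{\sum x_2 x_3}{\sqrt{\sum x_2^2}\,\sqrt{\sum x_3^2}} = \frac{60}{\sqrt{160}\,\sqrt{40}} = \frac{60}{80} = 0.75$$
$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} = \frac{0.5 - (0.667)(0.75)}{\sqrt{1 - (0.667)^2}\,\sqrt{1 - (0.75)^2}} \approx 0$$
With the exact value r13 = 2/3, the numerator is 0.5 − 0.5 = 0, so r12.3 = 0. Rounding r13 to 0.67 gives −0.0025/0.4910 = −0.0051, which differs from zero only because of rounding.
$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2}} = \sqrt{\frac{(0.5)^2 + (0.667)^2 - 2(0.5)(0.667)(0.75)}{1 - (0.75)^2}} = \sqrt{\frac{0.1944}{0.4375}} \approx 0.667$$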
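When the variables are measured from their means, each zero-order coefficient is the sum of products divided by the square root of the product of the two sums of squares. The Python sketch below applies this to the sums given in Illustration-7 and works with unrounded coefficients throughout; all function names are hypothetical.

```python
import math

def r_from_deviation_sums(sum_ab, sum_aa, sum_bb):
    """Zero-order r when both variables are measured from their means."""
    return sum_ab / math.sqrt(sum_aa * sum_bb)

def partial_r(r_ab, r_ac, r_bc):
    """Partial correlation between a and b, holding c constant."""
    return (r_ab - r_ac * r_bc) / (math.sqrt(1 - r_ac**2) * math.sqrt(1 - r_bc**2))

def multiple_r(r_dep_a, r_dep_b, r_ab):
    """Multiple correlation of a dependent variable with two independent variables a and b."""
    numerator = r_dep_a**2 + r_dep_b**2 - 2 * r_dep_a * r_dep_b * r_ab
    return math.sqrt(numerator / (1 - r_ab**2))

# Sums of squares and products given in Illustration-7
r12 = r_from_deviation_sums(60, 90, 160)
r13 = r_from_deviation_sums(40, 90, 40)
r23 = r_from_deviation_sums(60, 160, 40)

r12_3 = partial_r(r12, r13, r23)
R1_23 = multiple_r(r12, r13, r23)
print(round(r12, 4), round(r13, 4), round(r23, 4))
print(round(r12_3, 4), round(R1_23, 4))
```

Carrying full precision through the calculation avoids the small errors that appear when intermediate coefficients are rounded to two decimals.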
Other formulae

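$$r_{12} = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$$

$$r_{13} = \frac{N\sum XZ - \sum X \sum Z}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Z^2 - (\sum Z)^2}}$$

$$r_{23} = \frac{N\sum YZ - \sum Y \sum Z}{\sqrt{N\sum Y^2 - (\sum Y)^2}\,\sqrt{N\sum Z^2 - (\sum Z)^2}}$$

These are the actual-value formulae for computing zero-order correlation coefficients, used when smaller values are given.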
Illustration-8
Compute r12.3 and R1.23 from the following data:

X 3 4 5 6 7 8 9
Y 2 5 6 4 3 2 4
Z 5 6 4 5 6 5 6
Answer: r12 = -0.155, r13 = 0.306, r23 = -0.211
Answer: r12.3 = -0.097, R1.23 = 0.320
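For raw data such as the table in Illustration-8, the actual-value formula can be applied directly. The following Python sketch computes the zero-order coefficients from the tabulated X, Y and Z values and then the partial and multiple coefficients; the function names are assumptions made for this example.

```python
import math

def pearson_raw(x, y):
    """Zero-order r from actual values: (N*sum(xy) - sum(x)*sum(y)) over the two root terms."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / (math.sqrt(n * sxx - sx**2) * math.sqrt(n * syy - sy**2))

def partial_r(r_ab, r_ac, r_bc):
    """Partial correlation between a and b, holding c constant."""
    return (r_ab - r_ac * r_bc) / (math.sqrt(1 - r_ac**2) * math.sqrt(1 - r_bc**2))

def multiple_r(r_dep_a, r_dep_b, r_ab):
    """Multiple correlation of a dependent variable with two independent variables a and b."""
    numerator = r_dep_a**2 + r_dep_b**2 - 2 * r_dep_a * r_dep_b * r_ab
    return math.sqrt(numerator / (1 - r_ab**2))

# Data tabulated in Illustration-8
X = [3, 4, 5, 6, 7, 8, 9]
Y = [2, 5, 6, 4, 3, 2, 4]
Z = [5, 6, 4, 5, 6, 5, 6]

r12, r13, r23 = pearson_raw(X, Y), pearson_raw(X, Z), pearson_raw(Y, Z)
r12_3 = partial_r(r12, r13, r23)
R1_23 = multiple_r(r12, r13, r23)
print(round(r12, 3), round(r13, 3), round(r23, 3))
print(round(r12_3, 3), round(R1_23, 3))
```

Running the sketch is a quick way to cross-check hand calculations of the intermediate sums.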
Questions for Practice
1. Compute r23.1 and R1.23: r12 = 0.80, r23 = -0.40, r31 = -0.56
2. The linear correlation coefficients between x1 = Yield, x2 = Irrigation and x3 = Fertilizer are as follows: r12 =
0.81, r13 = 0.90, r23 = 0.65. Calculate the partial correlation coefficient of: (i) Yield with Irrigation (ii) Yield
with Fertilizer
3. Is it possible to get the following from a set of experimental data? (i) r12 = 0.60, r13 = 0.50, r23 = 0.80 (ii) r12 =
0.60, r13 = -0.40, r23 = 0.70
4. The following data show the correlations among three variables, x1 = Height, x2 = Weight and x3 = Diameter of Chest, for
10 randomly selected players: r12 = 0.863, r13 = 0.648, r23 = 0.709. Calculate r12.3 and R1.23.
5. If r12 = 0.90, r13 = 0.75, r23 = 0.70, find R1.23.

Answers
Ques-1: r23.1 = 0.097, R1.23 = 0.842
Ques-2: (i) r12.3 = 0.679 (ii) r13.2 = 0.838
Ques-3: (i) r12.3 = 0.385, consistent (ii) r12.3 = 1.344, inconsistent
Ques-4: r12.3 = 0.751, R1.23 = 0.865
Ques-5: R1.23 = 0.916
