MODULE Three : Correlation & Regression – 2023-24
Mathematical and
Statistical Methods
ECON F213
Dr. Rahul Arora (IC)
Assistant Professor,
Department of Economics & Finance,
BITS Pilani, Pilani Campus
[Link]@[Link]
Mob: +91 – 7607481292
Background design is taken from the presentation slides of Salvatore:
International Economics, 10th Edition © 2013 John Wiley & Sons, Inc.
Introduction
❑ Correlation analysis deals with the case of more than one variable
❑ It measures the degree of relationship between the variables
❑ Examples: the relationship between price and quantity demanded, income and expenditure, etc.
❑ The relationship between two variables can help in estimating the value of one variable given the value of the other – but that is done through regression analysis
❑ The measure of correlation is called the correlation coefficient
Correlation: Steps involved
The problem of analyzing a relationship can be studied through
three steps:
1. Determine whether a relationship exists
2. Test its significance
3. Establish the cause-and-effect relationship
Correlation and Causation
Three points distinguish correlation from causation:
1. Correlation does not imply causation; however, causation implies correlation
2. Correlation implies mutually influencing phenomena, while causation identifies which variable is the cause and which is the effect
3. Correlation can occur purely by chance; in such cases the correlation is interpreted as spurious/nonsense correlation
Types of Correlation
❑ Positive or Negative – depends upon the direction of change of the variables
❑ Positive when the variables vary in the same direction
❑ Negative when the variables vary in opposite directions
❑ Simple, Partial and Multiple – based on the number of variables studied
❑ Simple correlation studies two variables
❑ Partial and multiple correlation study the relationship between three or more variables simultaneously
Types of Correlation
❑ Simple, Partial and Multiple
❑ Multiple correlation studies the relationship between variable 1 and all the remaining variables taken together
❑ Partial correlation studies the relationship between variable 1 and variable 2, keeping the effect of the third (or other) variables constant
❑ Linear and Non-Linear
❑ A linear relationship implies a constant ratio of change between the two variables
❑ In the non-linear case, the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable
Karl Pearson’s Coefficient of
Correlation
Assumptions
❑ There is a linear relationship between the variables
❑ The variables under study are affected by a large number of independent causes
❑ The population being studied is normally distributed
❑ There is a cause-and-effect relationship between the forces affecting the distribution of items in the two series
Merits and Limitations
❑ Merit: it provides the degree as well as the direction of the relationship (positive or negative)
❑ Limitations: it assumes a linear relationship; its value is affected by extreme values
Karl Pearson’s Coefficient of
Correlation - Calculations
$$r = \frac{\sum xy}{N\,\sigma_X\,\sigma_Y}$$
➢ r = Pearson's coefficient of correlation (lies between -1 and +1)
➢ x and y are the deviations of X and Y from their respective actual arithmetic means
➢ N = number of pairs of observations
➢ $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y
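The deviation formula above can be sketched in Python; the price/quantity figures below are invented purely to illustrate the computation:

```python
# Pearson's r via the deviation formula r = (sum xy) / (N * sigma_X * sigma_Y).
# The data are made up for illustration.
from math import sqrt

X = [10, 12, 14, 16, 18]   # e.g. price
Y = [40, 38, 35, 30, 27]   # e.g. quantity demanded

N = len(X)
mean_x, mean_y = sum(X) / N, sum(Y) / N
x = [xi - mean_x for xi in X]            # deviations of X from its mean
y = [yi - mean_y for yi in Y]            # deviations of Y from its mean
sd_x = sqrt(sum(d * d for d in x) / N)   # population SD of X
sd_y = sqrt(sum(d * d for d in y) / N)   # population SD of Y

r = sum(a * b for a, b in zip(x, y)) / (N * sd_x * sd_y)
print(round(r, 4))  # strongly negative: price and quantity move in opposite directions
```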
Karl Pearson’s Coefficient of
Correlation - Interpretation
❑ When r = +1, there is a perfect positive relationship between the variables
❑ When r = -1, there is a perfect negative relationship between the variables
❑ When r = 0, there is no (linear) relationship between the variables
❑ The closer r is to +1 or -1, the closer the relationship; the closer r is to 0, the weaker the relationship
Karl Pearson’s Coefficient of
Correlation - Calculations
Direct Method
$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - \left(\sum X\right)^2}\,\sqrt{N\sum Y^2 - \left(\sum Y\right)^2}}$$
Assumed Mean Method
$$r = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{\sqrt{N\sum d_x^2 - \left(\sum d_x\right)^2}\,\sqrt{N\sum d_y^2 - \left(\sum d_y\right)^2}}$$
Where $d_x$ and $d_y$ are the deviations taken from the assumed means
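Since r is unaffected by a shift of origin, the direct and assumed-mean formulas must agree. A small check, with invented data and arbitrarily chosen assumed means:

```python
# The direct formula applied to the raw values and to deviations from
# assumed means gives the same r. Data and assumed means are illustrative.
from math import sqrt

def r_direct(X, Y):
    """Pearson's r by the direct (raw-sums) formula."""
    n = len(X)
    sx, sy = sum(X), sum(Y)
    sxy = sum(a * b for a, b in zip(X, Y))
    sxx = sum(a * a for a in X)
    syy = sum(b * b for b in Y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx**2) * (n * syy - sy**2))

X = [2, 4, 6, 8, 10]
Y = [5, 7, 6, 10, 12]

A_x, A_y = 5, 8                    # assumed means (any convenient values)
dx = [a - A_x for a in X]          # deviations from assumed mean of X
dy = [b - A_y for b in Y]          # deviations from assumed mean of Y

r_raw = r_direct(X, Y)
r_dev = r_direct(dx, dy)           # same formula on the deviations
print(round(r_raw, 4), round(r_dev, 4))  # identical values
```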
Properties of Coefficient of
Correlation
❑ The coefficient of correlation lies between -1 and +1
❑ The coefficient of correlation is independent of change of scale and origin of the variables X and Y
❑ The degree of relationship between the two variables is symmetric: $r_{xy} = r_{yx}$
❑ The coefficient of correlation is the geometric mean of the two regression coefficients: $r = \sqrt{b_{xy} \times b_{yx}}$
Coefficient of Correlation &
Probable Error
❑ Probable error is used to check the reliability of the correlation value
$$P.E. = 0.6745\,\frac{1 - r^2}{\sqrt{N}}$$
Where N is the number of pairs of observations.
❑ If r is less than the probable error, there is no evidence of correlation
❑ If r is more than six times the P.E., then r is significant
❑ One can also obtain upper and lower limits for r by adding and subtracting the value of P.E. from r. Within this range, the coefficient of correlation in the population ($\rho$) is expected to lie.
Conditions to use P.E.: the population is normally distributed and r is calculated from a sample
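The probable-error check is mechanical once r and N are known; the values r = 0.8 and N = 25 below are made up for illustration:

```python
# Probable error of r: P.E. = 0.6745 (1 - r^2) / sqrt(N).
# r and N are invented values for illustration.
from math import sqrt

r, N = 0.8, 25
pe = 0.6745 * (1 - r**2) / sqrt(N)
print(round(pe, 4))                          # 0.0486
print(r > 6 * pe)                            # True: r is significant
print(round(r - pe, 4), round(r + pe, 4))    # limits within which rho is expected to lie
```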
Coefficient of Determination
❑ Square of the correlation coefficient
❑ A useful way of interpreting the value of r
❑ It is the ratio of explained variation to total variation
❑ Ranges from 0 to +1
❑ Measures the degree of relationship, not the direction
Exercise: Given r = 0.6 and r = 0.3, calculate the coefficients of determination and interpret them
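For the exercise above, the two coefficients of determination work out as follows:

```python
# Coefficient of determination r^2 for the two given correlation values.
for r in (0.6, 0.3):
    print(r, round(r * r, 2))  # 0.6 -> 0.36, 0.3 -> 0.09
# Interpretation: r = 0.6 explains 36% of total variation, r = 0.3 only 9% --
# so r = 0.6 is four times (not twice) as strong in explained-variation terms.
```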
Rank Correlation Coefficient
❑ Appropriate when the population is not normal
❑ Based on ranks, not on actual observations
$$R = 1 - \frac{6\sum D^2}{N^3 - N}$$
➢ R = rank coefficient of correlation
➢ D is the difference of ranks between paired items in the two series
➢ N is the number of pairs of observations
Two types of problems –
❑ When ranks are given
❑ When ranks are not given
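When the ranks are given directly, applying the formula is mechanical; the two rank lists below (e.g. two judges ranking five candidates) are invented:

```python
# Spearman's rank correlation when ranks are given and there are no ties.
rank1 = [1, 2, 3, 4, 5]   # ranks by judge 1 (illustrative)
rank2 = [2, 1, 4, 3, 5]   # ranks by judge 2 (illustrative)

N = len(rank1)
D2 = sum((a - b) ** 2 for a, b in zip(rank1, rank2))  # sum of squared rank differences
R = 1 - 6 * D2 / (N**3 - N)
print(R)  # 0.8
```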
Rank Correlation Coefficient
Problem of Equal Ranks
❑ In case of two equal ranks, take the average of the given rank and the next rank and assign the average value to both
❑ The same applies when one confronts three (or more) equal ranks
Adjusted formula becomes –
$$R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}\left(m_1^3 - m_1\right) + \frac{1}{12}\left(m_2^3 - m_2\right) + \cdots\right]}{N^3 - N}$$
Where m is the number of items sharing a common rank
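A sketch of the tie adjustment: each group of m tied values contributes $(m^3 - m)/12$ inside the bracket. The data and the `avg_ranks` helper below are my own illustration, not from the slides:

```python
# Tie-corrected rank correlation. Data and helper are an illustrative sketch.

def avg_ranks(values):
    """Assign ranks (1 = largest); tied values share the average of their rank positions."""
    order = sorted(values, reverse=True)
    return [order.index(v) + (order.count(v) + 1) / 2 for v in values]

X = [50, 40, 40, 30]   # one tie: the two 40s share ranks 2 and 3 -> 2.5 each
Y = [45, 42, 38, 30]   # no ties

N = len(X)
rx, ry = avg_ranks(X), avg_ranks(Y)
D2 = sum((a - b) ** 2 for a, b in zip(rx, ry))   # sum of squared rank differences
tie_corr = (2**3 - 2) / 12                       # one tied group of m = 2 in X
R = 1 - 6 * (D2 + tie_corr) / (N**3 - N)
print(rx, R)  # [1.0, 2.5, 2.5, 4.0] and R = 0.9
```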
Features of Rank Correlation
Coefficient
1. It is non-parametric
2. It can be interpreted in the same manner as Pearson's correlation coefficient, because it is Karl Pearson's correlation coefficient computed between the ranks instead of between the actual observations
Merits & Limitations of Rank
Correlation Coefficient
Merits –
❑ Simpler to understand
❑ The answer obtained equals the one obtained using Karl Pearson's coefficient, provided no value is repeated
❑ Useful where the data are of a qualitative nature
❑ Useful when ranks are given
❑ Useful when the data are not normally distributed
Limitations –
❑ Not applicable to grouped data with frequencies
❑ Difficult to apply with large data sets
Correlation in Time-Series Data
❑ With time-series data, extra precaution should be taken while calculating correlation
❑ Two kinds of movements in time-series data – long-term and short-term
❑ The type of correlation may vary with the time period considered
❑ It is important to study both kinds of fluctuations separately
Correlation for Long Term Changes
❑ First find the trend values in the data
❑ Use the moving-average or OLS method to calculate the trend values
❑ Use the correlation coefficient formula to calculate the correlation between the two series (use the trend values of each series instead of the actual data)
An example
Correlation for Short Term Changes
❑ Calculate the trend values using the moving-average method
❑ Take deviations of the actual data from the trend values
❑ Use the following formula to arrive at r
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}$$
An example
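The three steps can be sketched with invented series; note that the two series below trend upward together, while their short-term deviations happen to move oppositely:

```python
# Short-term correlation: 3-period centred moving-average trend, then r on
# the deviations from trend. The two series are invented for illustration.
from math import sqrt

A = [10, 12, 11, 15, 14, 18, 17]
B = [20, 21, 23, 22, 26, 25, 28]

def ma3(s):
    # 3-period centred moving average (trend values exist for positions 1..len-2)
    return [sum(s[i-1:i+2]) / 3 for i in range(1, len(s) - 1)]

x = [a - t for a, t in zip(A[1:-1], ma3(A))]  # deviations of A from its trend
y = [b - t for b, t in zip(B[1:-1], ma3(B))]  # deviations of B from its trend

r = sum(p * q for p, q in zip(x, y)) / sqrt(sum(p * p for p in x) * sum(q * q for q in y))
print(round(r, 4))  # strongly negative here, despite both series rising long-term
```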
Lag and Lead in Correlation
❑ It considers the role of the time gap between a policy action and its effect
Calculations include –
❑ First adjust the pairing of items (shift one series by the lag)
❑ Then apply Karl Pearson's formula to calculate the coefficient of correlation
An example
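Adjusting the pairing can be sketched as follows; the series are constructed so that Y responds to X one period later (both series are invented):

```python
# Lagged correlation: pair X_t with Y_{t+1} before applying Pearson's formula.
from math import sqrt

def pearson(X, Y):
    """Pearson's r by the direct formula."""
    n = len(X)
    sx, sy = sum(X), sum(Y)
    num = n * sum(a * b for a, b in zip(X, Y)) - sx * sy
    den = sqrt((n * sum(a * a for a in X) - sx**2) * (n * sum(b * b for b in Y) - sy**2))
    return num / den

X = [3, 5, 4, 7, 6, 9]       # e.g. advertising outlay (illustrative)
Y = [0, 6, 10, 8, 14, 12]    # e.g. sales, responding one period later

r0 = pearson(X, Y)           # unlagged pairing
r1 = pearson(X[:-1], Y[1:])  # adjusted pairing: X_t with Y_{t+1}
print(round(r0, 3), round(r1, 3))  # the lagged pairing correlates far more strongly
```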
Partial and Multiple Correlation
❑ Partial and multiple correlation study the relationship between three or more variables simultaneously
❑ Partial correlation studies the relationship between variable 1 and variable 2, keeping the effect of the third (or other) variables constant
❑ Multiple correlation studies the relationship between variable 1 and all the remaining variables
Partial Correlation Coefficient
$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{\left(1 - r_{13}^2\right)\left(1 - r_{23}^2\right)}}$$
❑ $r_{12.3}$ is the partial correlation coefficient between variables 1 and 2, holding the effect of the 3rd variable on variables 1 & 2 constant
❑ $r_{12}$ is the simple correlation coefficient between variables 1 & 2
❑ The others are simple correlation coefficients between pairs of variables
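A quick numerical sketch of the formula, with invented simple correlations:

```python
# Partial correlation r_12.3 from three simple correlations (invented values).
from math import sqrt

r12, r13, r23 = 0.8, 0.6, 0.5
r12_3 = (r12 - r13 * r23) / sqrt((1 - r13**2) * (1 - r23**2))
print(round(r12_3, 4))  # 0.7217: slightly below r12 once variable 3 is held constant
```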
Multiple Correlation Coefficient
$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2}}$$
❑ $R_{1.23}$ is the multiple correlation coefficient between variable 1 (dependent variable) and variables 2 & 3 (independent variables)
❑ $r_{12}$ is the simple correlation coefficient between variables 1 & 2
❑ The others are simple correlation coefficients between pairs of variables
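With the same invented simple correlations, the multiple correlation coefficient works out as:

```python
# Multiple correlation R_1.23 from three simple correlations (invented values).
from math import sqrt

r12, r13, r23 = 0.8, 0.6, 0.5
R1_23 = sqrt((r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2))
print(round(R1_23, 4))  # 0.8327: at least as large as the strongest simple correlation
```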
Multiple Correlation Coefficient
❑ The value of $R_{1.23}$ is non-negative (lies between 0 and 1)
❑ Its square is known as the coefficient of multiple determination
❑ $R_{1.23}$ is the same as $R_{1.32}$
❑ If $R_{1.23} = 0$, then $r_{12} = 0$ and $r_{13} = 0$
Introduction to Regression Analysis
❑ It is the measure of the average relationship between two or more variables in terms of the original units of the data
❑ Useful in estimating the value of one variable (unknown) given the value of another (known)
❑ The variable used to predict the unknown variable is known as the independent/explanatory variable
❑ The variable which is to be predicted is known as the dependent/explained variable
❑ An example is the equation of a straight line
Regression and Correlation
1. Correlation is a tool to find out the degree of relationship between variables, whereas regression makes it possible to study the cause-and-effect relationship
2. The correlation coefficients between x and y and between y and x are symmetric, whereas the regression coefficients are not symmetric
3. Correlation can be nonsense, but regression can never be nonsense
4. Correlation is independent of changes in origin and scale, but regression is independent of a change in origin only
Regression Equations
❑ Equation of a straight line (Y = a + bX)
❑ 'a' and 'b' are numerical constants
❑ Need to determine the values of 'a' and 'b' using the OLS method
❑ The method of least squares states that the sum of squared deviations of actual Y from computed Y is the least
❑ Explain in terms of the regression line (the best-fit line obtained from the method of least squares)
Least Square Method
❑ The following two normal equations, if solved simultaneously, yield values of the parameters (a & b) such that the least-squares requirement is fulfilled [X on Y]
$$\sum X = Na + b\sum Y$$
$$\sum XY = a\sum Y + b\sum Y^2$$
• N is the number of paired observations
• X and Y are the variables on which data would be given
• a and b are the parameters of the regression equation
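Solving the two normal equations can be sketched directly from the raw sums; the data below are invented for illustration:

```python
# Solving the X-on-Y normal equations for a and b with elimination (Cramer's rule):
#   sum X  = N a + b sum Y
#   sum XY = a sum Y + b sum Y^2
# The data are illustrative.
X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]

N = len(X)
sX, sY = sum(X), sum(Y)
sYY = sum(y * y for y in Y)
sXY = sum(x * y for x, y in zip(X, Y))

det = N * sYY - sY * sY
b = (N * sXY - sX * sY) / det   # b_xy, the regression coefficient of X on Y
a = (sX - b * sY) / N           # back-substitute into the first normal equation
print(a, b)                     # regression equation: X = 16.4 - 1.3 Y
```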
Least Square Method
Derivation of the regression coefficient (b)
$$b_{xy} = \frac{N\sum XY - \left(\sum X\right)\left(\sum Y\right)}{N\sum Y^2 - \left(\sum Y\right)^2}$$
$$b_{yx} = \frac{N\sum XY - \left(\sum X\right)\left(\sum Y\right)}{N\sum X^2 - \left(\sum X\right)^2}$$
$$a_{xy} = \bar{X} - b_{xy}\,\bar{Y}$$
$$a_{yx} = \bar{Y} - b_{yx}\,\bar{X}$$
Estimation using Deviations from
Mean
Regression equation of X on Y
$$x = b_{xy}\,y$$
Where $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y}$; x and y are the deviations from their respective means
Derive –
$$b_{xy} = \frac{\sum xy}{\sum y^2}$$
Similarly derive the regression coefficient for the regression equation of Y on X
Correlation and Regression
❑ The square root of the product of the two regression coefficients gives the value of the correlation coefficient
Derive r – Class Notes
Points to Remember –
❑ Both regression coefficients cannot simultaneously be greater than one (their product, $r^2$, cannot exceed one)
❑ Both regression coefficients must have the same sign
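A numerical check of $r = \sqrt{b_{xy} \times b_{yx}}$, reusing invented data; r takes the common sign of the two coefficients:

```python
# Verify that sqrt(b_xy * b_yx) reproduces Pearson's r (with the common sign).
# The data are illustrative.
from math import sqrt

X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]

N = len(X)
sX, sY = sum(X), sum(Y)
sXX = sum(x * x for x in X)
sYY = sum(y * y for y in Y)
sXY = sum(x * y for x, y in zip(X, Y))

b_xy = (N * sXY - sX * sY) / (N * sYY - sY**2)   # regression coefficient of X on Y
b_yx = (N * sXY - sX * sY) / (N * sXX - sX**2)   # regression coefficient of Y on X

r = sqrt(b_xy * b_yx)
if b_xy < 0:                                     # both coefficients share one sign
    r = -r
print(round(b_xy, 3), round(b_yx, 3), round(r, 3))  # -1.3 -0.65 -0.919
```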
Estimation using Deviations from Assumed Mean (A)
Derivation of the regression coefficient (b)
$$b_{xy} = \frac{N\sum d_x d_y - \left(\sum d_x\right)\left(\sum d_y\right)}{N\sum d_y^2 - \left(\sum d_y\right)^2}$$
$$b_{yx} = \frac{N\sum d_x d_y - \left(\sum d_x\right)\left(\sum d_y\right)}{N\sum d_x^2 - \left(\sum d_x\right)^2}$$
Where $d_x$ and $d_y$ are the deviations taken from the assumed mean
Estimation from a Correlation Table
Derivation of the regression coefficient (b) – Not covered
$$b_{xy} = \frac{N\sum f d_x d_y - \left(\sum f d_x\right)\left(\sum f d_y\right)}{N\sum f d_y^2 - \left(\sum f d_y\right)^2} \times \frac{i_x}{i_y}$$
$$b_{yx} = \frac{N\sum f d_x d_y - \left(\sum f d_x\right)\left(\sum f d_y\right)}{N\sum f d_x^2 - \left(\sum f d_x\right)^2} \times \frac{i_y}{i_x}$$
Where $d_x$ and $d_y$ are the deviations taken from the assumed means, f is the class frequency, and $i_x$ and $i_y$ are the class intervals of variables X and Y
More on Coefficient of Determination
❑ The coefficient of determination ($r^2$) gives the proportion of the variability in the dependent variable that is explained by the independent variable
❑ Derive $r^2$ using the regression equation
❑ Being a squared term, it cannot reflect the direction of the relationship
❑ Look at the sign of the slope coefficients for the direction of the relationship between the independent and dependent variables
Multiple Regression Analysis
$$Y = a + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$$
❑ a and the b's are numerical constants
❑ Need to determine the values of the numerical constants through the application of the least-squares principle
❑ The method of least squares states that the sum of squared deviations of actual Y from computed Y is the least
❑ In other words, minimize the sum of squared errors for the model (error = actual – predicted)
Normal Equations
❑ To estimate a three-variable regression (including one dependent variable), one needs to specify the following three normal equations
❑ Normal equations for the multiple regression equation of Y on $X_1$ and $X_2$:
$$\sum Y = Na + b_1\sum X_1 + b_2\sum X_2$$
$$\sum Y X_1 = a\sum X_1 + b_1\sum X_1^2 + b_2\sum X_1 X_2$$
$$\sum Y X_2 = a\sum X_2 + b_1\sum X_1 X_2 + b_2\sum X_2^2$$
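Setting up and solving the three normal equations can be sketched with a small linear system; the data below are invented for illustration:

```python
# Build the 3x3 system of normal equations for Y on X1, X2 and solve it.
# The data are illustrative.
import numpy as np

Y  = np.array([10.0, 12, 15, 17, 20])
X1 = np.array([2.0, 3, 5, 6, 8])
X2 = np.array([1.0, 2, 2, 4, 5])
N = len(Y)

# Coefficient matrix (left-hand sides) and right-hand sides of the normal equations
A = np.array([[N,        X1.sum(),      X2.sum()],
              [X1.sum(), (X1**2).sum(), (X1*X2).sum()],
              [X2.sum(), (X1*X2).sum(), (X2**2).sum()]])
rhs = np.array([Y.sum(), (Y*X1).sum(), (Y*X2).sum()])

a, b1, b2 = np.linalg.solve(A, rhs)
print(round(a, 3), round(b1, 3), round(b2, 3))  # fitted Y = a + b1*X1 + b2*X2
```

The solved coefficients satisfy the least-squares property: the residuals sum to zero and are uncorrelated with each regressor.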
Coefficient of Multiple Determination
❑ The coefficient of multiple determination ($R^2$) is the square of the multiple correlation coefficient
❑ Analogous to the coefficient of determination discussed under multiple correlation
End of Module 3
*****Happy Learning******
Reference
Gupta, S.P., Statistical Methods, Sultan Chand and Sons, 45th Revised Edition (2017)