
Module 3 - Correlation Regression

The document outlines the concepts of correlation and regression analysis, emphasizing their importance in understanding relationships between multiple variables. It discusses various types of correlation, methods of calculation, and the distinction between correlation and causation. Additionally, it covers regression analysis, including the least squares method for estimating relationships and the interpretation of regression coefficients.


MODULE 3: Correlation & Regression – 2023-24
Mathematical and Statistical Methods (ECON F213)

Dr. Rahul Arora (IC)


Assistant Professor,
Department of Economics & Finance,
BITS Pilani, Pilani Campus
[Link]@[Link]
Mob: +91 – 7607481292

Background design is taken from the presentation slides of Salvatore: International Economics, 10th Edition © 2013 John Wiley & Sons, Inc.
Introduction

❑ Correlation analysis deals with the case of more than one variable

❑ It measures the degree of relationship between the variables

❑ Examples: the relationship between price and quantity demanded, income and expenditure, etc.

❑ The relationship between two variables can help in estimating the value of one variable given the value of the other – but this is done through regression analysis

❑ The measure of correlation is called the correlation coefficient

2
Correlation: Steps involved

The problem of analyzing a relationship can be studied in three steps:

1. Determine whether a relationship exists

2. Test its significance

3. Establish the cause-and-effect relationship

3
Correlation and Causation

Key points on the distinction between correlation and causation:

1. Correlation does not imply causation; however, causation implies correlation

2. Correlation implies a mutually influencing phenomenon, while causation identifies which variable is the cause and which the effect

3. Correlation can occur purely by chance; in such cases it is interpreted as spurious/nonsense correlation
4
Types of Correlation
❑ Positive or Negative – depends upon the direction of change of the variables

❑ Positive when the variables vary in the same direction

❑ Negative when the variables vary in opposite directions

❑ Simple, Partial and Multiple – based on the number of variables studied

❑ Simple correlation studies two variables

❑ Partial and multiple correlation study the relationship between three or more variables simultaneously

5
Types of Correlation
❑ Simple, Partial and Multiple
❑ Multiple correlation involves studying the relationship between variable 1 and all the remaining variables
❑ Partial correlation involves studying the relationship between variable 1 and variable 2, keeping the effect of the third (or other) variables constant

❑ Linear and Non-Linear
❑ A linear relationship implies a constant ratio of change between the two variables
❑ In the non-linear case, the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable

6
Karl Pearson’s Coefficient of
Correlation
Assumptions
❑ There is a linear relationship between the variables
❑ The variables under study are affected by a large number of independent causes
❑ The population being studied is normally distributed
❑ There is a cause-and-effect relationship between the forces affecting the distribution of items in the two series

Merits and Limitations
❑ Merit: it provides the degree as well as the direction of the relationship (positive or negative)
❑ Limitations: it assumes a linear relationship, and its value is affected by extreme values

7
Karl Pearson’s Coefficient of
Correlation - Calculations
r = Σxy / (N · σ_X · σ_Y)

➢ r = Pearson coefficient of correlation (lies between −1 and +1)

➢ x and y are the deviations of X and Y from their respective actual arithmetic means

➢ N = number of pairs of observations

➢ σ_X and σ_Y are the standard deviations of X and Y
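As a quick check of the formula above, here is a minimal Python sketch (the data and function name are invented for illustration) that computes r from deviations about the means:

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations: r = Σxy / (N · σ_X · σ_Y)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    dx = [v - mx for v in xs]          # deviations x = X − mean(X)
    dy = [v - my for v in ys]          # deviations y = Y − mean(Y)
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    # population standard deviations, as used in the slide formula
    sx = math.sqrt(sum(a * a for a in dx) / n)
    sy = math.sqrt(sum(b * b for b in dy) / n)
    return sum_xy / (n * sx * sy)

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # → 1.0 (perfect positive)
```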

8
Karl Pearson’s Coefficient of
Correlation - Interpretation
❑ When r = +1, there is a perfect positive relationship between the variables

❑ When r = −1, there is a perfect negative relationship between the variables

❑ When r = 0, there is no (linear) relationship between the variables

❑ The closer r is to +1 or −1, the closer the relationship; the closer r is to 0, the weaker the relationship

9
Karl Pearson’s Coefficient of
Correlation - Calculations
Direct Method

r = (N ΣXY − ΣX ΣY) / √[(N ΣX² − (ΣX)²) (N ΣY² − (ΣY)²)]

Assumed Mean Method

r = (N Σdx·dy − Σdx Σdy) / √[(N Σdx² − (Σdx)²) (N Σdy² − (Σdy)²)]

where dx and dy are the deviations taken from the assumed means
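The two computational forms can be sketched and cross-checked in Python; the data and assumed means below are invented for illustration:

```python
import math

def r_direct(X, Y):
    """Direct method: r = (N ΣXY − ΣX ΣY) / √[(N ΣX² − (ΣX)²)(N ΣY² − (ΣY)²)]."""
    n = len(X)
    sxy = sum(x * y for x, y in zip(X, Y))
    sx, sy = sum(X), sum(Y)
    sx2 = sum(x * x for x in X)
    sy2 = sum(y * y for y in Y)
    return (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

def r_assumed_mean(X, Y, ax, ay):
    """Assumed-mean method: same formula applied to dx = X − ax, dy = Y − ay."""
    dx = [x - ax for x in X]
    dy = [y - ay for y in Y]
    return r_direct(dx, dy)  # the formula is invariant to the shift

X = [10, 20, 30, 40, 50]
Y = [12, 19, 33, 38, 52]
print(r_direct(X, Y))
print(r_assumed_mean(X, Y, ax=30, ay=30))  # identical value
```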

10
Properties of Coefficient of
Correlation
❑ The coefficient of correlation lies between −1 and +1

❑ The coefficient of correlation is independent of change of scale and origin of the variables X and Y

❑ The degree of relationship between the two variables is symmetric: r_xy = r_yx

❑ The coefficient of correlation is the geometric mean of the two regression coefficients: r = √(b_xy × b_yx)

11
Coefficient of Correlation &
Probable Error
❑ Probable error is used to check the reliability of the correlation value

P.E. = 0.6745 · (1 − r²) / √N

where N is the number of pairs of observations.

❑ If r is less than the probable error, there is no evidence of correlation

❑ If r is more than six times the P.E., then r is significant

❑ One can also get upper and lower limits for r by adding and subtracting the value of P.E. from r. Within this range, the coefficient of correlation in the population (ρ) is expected to lie.

❑ Conditions to use P.E.: the population is normally distributed, and r is calculated from a sample
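A small sketch of the P.E. checks above (the r and N values are hypothetical):

```python
import math

def probable_error(r, n):
    """P.E. = 0.6745 · (1 − r²) / √N."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

r, n = 0.8, 25
pe = probable_error(r, n)
print(pe)              # ≈ 0.0486
print(r > 6 * pe)      # True → r is significant by the six-times rule
print(r - pe, r + pe)  # range within which the population ρ is expected to lie
```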
12
Coefficient of Determination
❑ The square of the correlation coefficient

❑ A useful way of interpreting the value of r

❑ It is the ratio of explained variation to total variation

❑ Ranges from 0 to +1

❑ Measures the degree of relationship, not the direction

Exercise: given r = 0.6 and r = 0.3, calculate the coefficient of determination in each case and interpret (r² = 0.36 and 0.09, i.e. 36% and 9% of the variation explained)

13
Rank Correlation Coefficient
❑ Appropriate when the population is not normal
❑ Based on ranks, not on the actual observations

R = 1 − 6 ΣD² / (N³ − N)

➢ R = rank coefficient of correlation

➢ D is the difference of ranks between paired items in the two series
➢ N is the number of pairs of observations

Two types of problems –

❑ When ranks are given
❑ When ranks are not given
14
Rank Correlation Coefficient
Problem of Equal Ranks

❑ In case of two equal ranks, take the average of the rank in question and the next rank, and assign the average value to both

❑ The same applies when one confronts three (or more) equal ranks

The adjusted formula becomes –

R = 1 − 6 [ΣD² + (1/12)(m₁³ − m₁) + (1/12)(m₂³ − m₂) + …] / (N³ − N)

where m is the number of items sharing a common rank
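A sketch of the tie-adjusted formula in Python (the data are invented; ranks are assigned in descending order, and tied values get the average of the ranks they occupy):

```python
def average_ranks(values):
    """Assign ranks; tied values get the average of the ranks they occupy."""
    sorted_vals = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = sorted_vals.index(v) + 1       # first rank this value occupies
        count = sorted_vals.count(v)           # number of ties for this value
        ranks.append(first + (count - 1) / 2)  # average of the tied ranks
    return ranks

def spearman(X, Y):
    """R = 1 − 6[ΣD² + Σ(m³ − m)/12] / (N³ − N), with tie correction."""
    rx, ry = average_ranks(X), average_ranks(Y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    n = len(X)
    correction = 0.0
    for vals in (X, Y):
        for v in set(vals):
            m = vals.count(v)
            correction += (m ** 3 - m) / 12
    return 1 - 6 * (d2 + correction) / (n ** 3 - n)

X = [80, 91, 72, 80, 65]   # 80 appears twice → tied ranks
Y = [75, 88, 60, 70, 55]
print(spearman(X, Y))      # → 0.95
```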

15
Features of Rank Correlation
Coefficient
1. It is non-parametric

2. It can be interpreted in the same manner as the Pearson correlation coefficient, because it is Karl Pearson's correlation coefficient computed between ranks instead of between actual observations

16
Merits & Limitations of Rank
Correlation Coefficient
Merits –
❑ Simpler to understand
❑ The answer obtained equals the one obtained using Karl Pearson's coefficient, provided no value is repeated
❑ Useful where the data are qualitative in nature
❑ Useful when ranks are given
❑ Useful when the data are not normally distributed

Limitations –
❑ Not applicable in the case of grouped data with frequencies
❑ Difficult to apply to large data sets

17
Correlation in Time-Series Data

❑ With time-series data, extra precaution should be taken when calculating correlation

❑ There are two kinds of movements in time-series data – long-term and short-term

❑ The type of correlation may vary with the time period considered

❑ It is important to study both kinds of fluctuation separately

18
Correlation for Long Term Changes
❑ Find the trend values in the data

❑ Use the moving-average or OLS method to calculate the trend values

❑ Use the correlation coefficient to calculate the correlation between the two series (use the trend values of each series instead of the actual data)

An example

19
Correlation for Short Term Changes
❑ Calculate the trend values using the moving-average method

❑ Take the deviations of the actual data from the trend values

❑ Use the following formula to arrive at r:

r = Σxy / √(Σx² · Σy²)

An example
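The three steps above can be sketched in Python, using an invented pair of series and a 3-period centered moving average:

```python
import math

def moving_average(series, k):
    """Centered k-period moving-average trend (k odd); shorter than the input."""
    half = k // 2
    return [sum(series[i - half:i + half + 1]) / k
            for i in range(half, len(series) - half)]

def short_term_r(A, B, k=3):
    """r between short-term fluctuations: deviations of actuals from trend,
    then r = Σxy / √(Σx² · Σy²)."""
    ta, tb = moving_average(A, k), moving_average(B, k)
    half = k // 2
    x = [A[i + half] - ta[i] for i in range(len(ta))]  # deviations from trend
    y = [B[i + half] - tb[i] for i in range(len(tb))]
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

A = [10, 14, 12, 16, 15, 19, 18]
B = [20, 25, 23, 28, 27, 32, 30]
print(short_term_r(A, B, k=3))  # close to +1: fluctuations move together
```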

20
Lag and Lead in Correlation
❑ It considers the role of the time gap between a policy action and its effect

Calculations include –

❑ First adjust the pairing of items to reflect the lag or lead

❑ Then apply Karl Pearson's formula to calculate the coefficient of correlation

An example

21
Partial and Multiple Correlation

❑ Partial and multiple correlation study the relationship between three or more variables simultaneously

❑ Partial correlation involves studying the relationship between variable 1 and variable 2, keeping the effect of the third (or other) variables constant

❑ Multiple correlation involves studying the relationship between variable 1 and all the remaining variables

22
Partial Correlation Coefficient
r12.3 = (r12 − r13 · r23) / √[(1 − r13²)(1 − r23²)]

❑ r12.3 is the partial correlation coefficient between variables 1 and 2, holding the effect of the 3rd variable on variables 1 and 2 constant

❑ r12 is the simple correlation coefficient between variables 1 and 2

❑ The others are simple correlation coefficients between the corresponding pairs of variables
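A one-line sketch of the formula, with hypothetical simple correlations plugged in:

```python
import math

def partial_r(r12, r13, r23):
    """r12.3 = (r12 − r13·r23) / √[(1 − r13²)(1 − r23²)]."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# hypothetical simple correlations, for illustration only
print(partial_r(r12=0.8, r13=0.6, r23=0.5))  # ≈ 0.72
```

Note that when r13 = r23 = 0 the third variable has no effect and r12.3 reduces to r12.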

23
Multiple Correlation Coefficient
R1.23 = √[(r12² + r13² − 2 · r12 · r13 · r23) / (1 − r23²)]

❑ R1.23 is the multiple correlation coefficient between variable 1 (dependent variable) and variables 2 and 3 (independent variables)

❑ r12 is the simple correlation coefficient between variables 1 and 2

❑ The others are simple correlation coefficients between the corresponding pairs of variables
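A sketch of the formula in Python (the correlation values are hypothetical); swapping r12 and r13 illustrates the symmetry R1.23 = R1.32 noted on the next slide:

```python
import math

def multiple_R(r12, r13, r23):
    """R1.23 = √[(r12² + r13² − 2·r12·r13·r23) / (1 − r23²)]."""
    return math.sqrt((r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2))

# hypothetical simple correlations, for illustration only
print(multiple_R(r12=0.8, r13=0.6, r23=0.5))  # ≈ 0.8327
print(multiple_R(r12=0.6, r13=0.8, r23=0.5))  # same value: R1.23 = R1.32
```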

24
Multiple Correlation Coefficient

❑ The value of R1.23 is non-negative (it lies between 0 and 1)

❑ Its square is known as the coefficient of multiple determination

❑ R1.23 is the same as R1.32

❑ If R1.23 = 0, then r12 = 0 and r13 = 0

25
Introduction to Regression Analysis
❑ It is the measure of the average relationship between two or more variables in terms of the original units of the data

❑ Useful in estimating the value of one variable (unknown) given the value of another (known)

❑ The variable used to predict the unknown variable is called the independent/explanatory variable

❑ The variable to be predicted is called the dependent/explained variable

❑ An example is the equation of a straight line

26
Regression and Correlation
1. Correlation is a tool to find the degree of relationship between variables, whereas regression makes it possible to study the cause-and-effect relationship

2. The correlation coefficients between x and y and between y and x are symmetric, whereas the regression coefficients are not symmetric

3. Correlation can be nonsense, but regression can never be nonsense

4. Correlation is independent of changes in both origin and scale, but regression is independent of change in origin only

27
Regression Equations

❑ Equation of a straight line: Y = a + bX

❑ 'a' and 'b' are numerical constants

❑ The values of 'a' and 'b' are determined using the OLS method

❑ The method of least squares states that the sum of squared deviations of actual Y from computed Y is the least

❑ This is explained in terms of the regression line (the best-fit line obtained from the method of least squares)

28
Least Square Method
❑ The following two normal equations, solved simultaneously, yield values of the parameters (a and b) such that the least-squares requirement is fulfilled [X on Y]:

ΣX = Na + b ΣY

ΣXY = a ΣY + b ΣY²

• N is the number of paired observations
• X and Y are the variables on which data are given
• a and b are the parameters of the regression equation

29
Least Square Method
Derivation of the regression coefficients (b) and intercepts (a):

b_xy = [N ΣXY − (ΣX)(ΣY)] / [N ΣY² − (ΣY)²]

b_yx = [N ΣXY − (ΣX)(ΣY)] / [N ΣX² − (ΣX)²]

a_xy = X̄ − b_xy · Ȳ

a_yx = Ȳ − b_yx · X̄
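These four formulas can be checked with a minimal Python sketch; the data below lie exactly on Y = 1 + 2X, so both regression lines are recovered exactly:

```python
def regression_coeffs(X, Y):
    """b_yx (Y on X) and b_xy (X on Y) from the least-squares formulas,
    with the matching intercepts a = mean(dependent) − b · mean(independent)."""
    n = len(X)
    sx, sy = sum(X), sum(Y)
    sxy = sum(x * y for x, y in zip(X, Y))
    sx2 = sum(x * x for x in X)
    sy2 = sum(y * y for y in Y)
    b_yx = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    b_xy = (n * sxy - sx * sy) / (n * sy2 - sy ** 2)
    a_yx = sy / n - b_yx * sx / n   # Y = a_yx + b_yx · X
    a_xy = sx / n - b_xy * sy / n   # X = a_xy + b_xy · Y
    return b_yx, a_yx, b_xy, a_xy

X = [1, 2, 3, 4, 5]
Y = [3, 5, 7, 9, 11]               # exactly Y = 1 + 2X
print(regression_coeffs(X, Y))     # → (2.0, 1.0, 0.5, -0.5)
```

Note that here b_yx · b_xy = 2 · 0.5 = 1 = r², consistent with r = √(b_xy · b_yx) for perfectly correlated data.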

30
Estimation using Deviations from
Mean
Regression equation of X on Y:

x = b_xy · y

where b_xy = r · (σ_x / σ_y); x and y are the deviations from their respective means

Derive –

b_xy = Σxy / Σy²

Similarly, derive the regression coefficient for the regression equation of Y on X
31
Correlation and Regression
❑ The square root of the product of the two regression coefficients gives the correlation coefficient: r = √(b_xy · b_yx)

Derive r – Class Notes

Points to Remember –

❑ Both regression coefficients cannot simultaneously be greater than one (otherwise their product, r², would exceed 1)

❑ Both regression coefficients must have the same sign

32
Estimation using Deviations using A
Derivation of the regression coefficients (b):

b_xy = [N Σdx·dy − (Σdx)(Σdy)] / [N Σdy² − (Σdy)²]

b_yx = [N Σdx·dy − (Σdx)(Σdy)] / [N Σdx² − (Σdx)²]

where dx and dy are the deviations taken from the assumed means

33
Estimation from Correlation Table
Derivation of the regression coefficients (b) – Not covered

b_xy = { [N Σf·dx·dy − (Σf·dx)(Σf·dy)] / [N Σf·dy² − (Σf·dy)²] } × (i_x / i_y)

b_yx = { [N Σf·dx·dy − (Σf·dx)(Σf·dy)] / [N Σf·dx² − (Σf·dx)²] } × (i_y / i_x)

where dx and dy are the deviations taken from the assumed means, and i_x and i_y are the class intervals of variables X and Y

34
More on Coefficient of Determination

❑ The coefficient of determination (r²) measures the proportion of the variability in the dependent variable that is explained by the independent variable

❑ Derive r² using the regression equation

❑ Being a squared term, it cannot reflect the direction of the relationship

❑ Look at the sign of the slope coefficient for the direction of the relationship between the independent and dependent variables

35
Multiple Regression Analysis

Y = a + b₁X₁ + b₂X₂ + … + b_kX_k

❑ a and the b's are numerical constants

❑ Their values are determined through the application of the least-squares principle

❑ The method of least squares states that the sum of squared deviations of actual Y from computed Y is the least

❑ In other words, minimize the sum of squared errors for the model (error = actual − expected)

36
Normal Equations
❑ To estimate a three-variable regression (including one dependent variable), one needs to specify the following three normal equations

❑ Normal equations for the multiple regression equation of Y on X₁ and X₂:

ΣY = Na + b₁ ΣX₁ + b₂ ΣX₂

ΣYX₁ = a ΣX₁ + b₁ ΣX₁² + b₂ ΣX₁X₂

ΣYX₂ = a ΣX₂ + b₁ ΣX₁X₂ + b₂ ΣX₂²
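A minimal sketch of solving these three normal equations in pure Python (Cramer's rule; the data are invented and generated from known coefficients, so the solution is checkable):

```python
def det3(m):
    """Determinant of a 3×3 matrix, expanded along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_two_regressors(Y, X1, X2):
    """Solve the three normal equations for Y = a + b1·X1 + b2·X2 by Cramer's rule."""
    n = len(Y)
    sx1, sx2 = sum(X1), sum(X2)
    sx1x2 = sum(p * q for p, q in zip(X1, X2))
    A = [[n,   sx1,                     sx2],
         [sx1, sum(x * x for x in X1), sx1x2],
         [sx2, sx1x2,                  sum(x * x for x in X2)]]
    rhs = [sum(Y),
           sum(y * x for y, x in zip(Y, X1)),
           sum(y * x for y, x in zip(Y, X2))]
    D = det3(A)
    sol = []
    for j in range(3):                  # replace column j with the RHS
        M = [row[:] for row in A]
        for i in range(3):
            M[i][j] = rhs[i]
        sol.append(det3(M) / D)
    return tuple(sol)                   # (a, b1, b2)

# invented data generated from Y = 2 + 3·X1 + 1·X2, so the fit recovers (2, 3, 1)
X1 = [1, 2, 3, 4, 5]
X2 = [1, 4, 2, 5, 3]
Y = [2 + 3 * x1 + x2 for x1, x2 in zip(X1, X2)]
print(fit_two_regressors(Y, X1, X2))    # → (2.0, 3.0, 1.0)
```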

37
Coefficient of Multiple Determination

Coefficient of Multiple Determination (R²)

❑ Analogous to the coefficient of determination; it was discussed under multiple correlation as the square of R1.23

End of Module 3
*****Happy Learning******

38
Reference

(Statistics) – Gupta, S.P., Statistical Methods, Sultan Chand and Sons, 45th Revised Edition (2017)

39
