Regression

Uploaded by

Deepak S

4.2. REGRESSION

The term “regression” literally means “stepping back towards the average”. It was first
used by the British biometrician Sir Francis Galton (1822-1911) in connection with the inheritance
of stature. Galton found that the offspring of abnormally tall or short parents tend to “regress”
or “step back” to the average population height. But the term “regression” as now used in
Statistics is only a convenient term without any reference to biometry.

Regression analysis is a mathematical measure of the average relationship between two
or more variables in terms of the original units of the data. In regression analysis, there are two
types of variables. The variable whose value is influenced or is to be predicted is called the
dependent variable, and the variable which influences the values or is used for prediction is
called the independent variable. The independent variable is also known as the
regressor, predictor or explanatory variable, while the dependent variable is also known as
the regressed or explained variable.

If the variables in a bivariate distribution are related, we will find that the points in the
scatter diagram will cluster round some curve called the “curve of regression”. If the curve is a
straight line, it is called the line of regression and there is said to be linear regression between
the variables, otherwise regression is said to be curvilinear.

4.2.1. Prediction using the Regression Equations

The line of regression is the line which gives the best estimate of the value of one
variable for any specific value of the other variable. Thus, the line of regression is the line of
“best fit” and is obtained by the principle of least squares. Let us suppose that in the bivariate
distribution (xi, yi); i = 1, 2, …, n; Y is the dependent variable and X is the independent variable. Let the line
of regression of Y on X be Y = a + bX. This equation represents a family of straight lines
for different values of the arbitrary constants ‘a’ and ‘b’. The problem is to determine ‘a’ and ‘b’
so that the line is the line of best fit. The term ‘best fit’ is interpreted in accordance with
Legendre’s principle of least squares, which consists in minimizing the sum of the squares of
the deviations of the actual values of Y from their estimated values as given by the line of best fit.
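
As a brief sketch of how the least-squares principle fixes ‘a’ and ‘b’, the standard derivation is outlined below. The symbol S for the sum of squared deviations is introduced here only for illustration; the remaining symbols follow the text.

```latex
% Sum of squared deviations of the observed y_i from the line a + b x_i
S(a, b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2

% Setting the partial derivatives to zero gives the normal equations
\frac{\partial S}{\partial a} = 0 \;\Rightarrow\; \sum y_i = n a + b \sum x_i
\frac{\partial S}{\partial b} = 0 \;\Rightarrow\; \sum x_i y_i = a \sum x_i + b \sum x_i^2

% Solving the two equations for b and a
b = \frac{\frac{1}{n}\sum x_i y_i - \bar{x}\,\bar{y}}{\frac{1}{n}\sum x_i^2 - \bar{x}^2}
  = \frac{\mathrm{Cov}(X, Y)}{\sigma_X^2}
  = r\,\frac{\sigma_Y}{\sigma_X},
\qquad
a = \bar{y} - b\,\bar{x}
```

Substituting a back into Y = a + bX gives the form Y - ȳ = b(X - x̄), so the fitted line always passes through the point of means (x̄, ȳ).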

For any bivariate data (X,Y), there will be two regression equations, namely i) the
regression equation of Y on X and ii) the regression equation of X on Y. The regression equation
of Y on X is used to predict or estimate the value of Y for any given value of X = x. Similarly, the
regression equation of X on Y is used to predict or estimate the value of X for any given value of
Y = y.

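i) The regression equation of Y on X is defined as

$$Y - \bar{y} = r\,\frac{\sigma_Y}{\sigma_X}\,(X - \bar{x})$$

$$Y - \bar{y} = b_{YX}\,(X - \bar{x})$$

$$\hat{Y} = b_{YX}\,(X - \bar{x}) + \bar{y}$$

where $b_{YX} = r\,\dfrac{\sigma_Y}{\sigma_X}$ is the regression coefficient of Y on X.

ii) The regression equation of X on Y is defined as

$$X - \bar{x} = r\,\frac{\sigma_X}{\sigma_Y}\,(Y - \bar{y})$$

$$X - \bar{x} = b_{XY}\,(Y - \bar{y})$$

$$\hat{X} = b_{XY}\,(Y - \bar{y}) + \bar{x}$$

where $b_{XY} = r\,\dfrac{\sigma_X}{\sigma_Y}$ is the regression coefficient of X on Y.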

The regression coefficient bYX is the slope of the line of regression of Y on X and is also called
the coefficient of regression of Y on X. It represents the increment in the value of the dependent
variable Y corresponding to a unit change in the value of the independent variable X. Similarly, the
regression coefficient bXY is the slope of the line of regression of X on Y and is also called the
coefficient of regression of X on Y. It represents the increment in the value of the dependent
variable X corresponding to a unit change in the value of the independent variable Y.
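
The two regression coefficients can be computed directly from their definitions. The following is a minimal sketch, assuming population (divide-by-n) moments as used in this section; the function name `fit_regression_lines` and the small data set are hypothetical and serve only as illustration.

```python
from math import sqrt


def fit_regression_lines(x, y):
    """Return means, r and both regression coefficients (divide-by-n moments)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    sd_x = sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    sd_y = sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    r = cov_xy / (sd_x * sd_y)
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "r": r,
        "b_yx": r * sd_y / sd_x,  # slope of the line of Y on X
        "b_xy": r * sd_x / sd_y,  # slope of the line of X on Y
    }


# Hypothetical illustrative data (not taken from the text)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fit = fit_regression_lines(x, y)

# Estimate of Y at X = 6 from the line of Y on X
y_hat = fit["mean_y"] + fit["b_yx"] * (6 - fit["mean_x"])
# Estimate of X at Y = 3 from the line of X on Y
x_hat = fit["mean_x"] + fit["b_xy"] * (3 - fit["mean_y"])
print(fit["b_yx"], fit["b_xy"], y_hat, x_hat)
```

Note that the two estimates come from two different lines: the line of Y on X is used when X is given, and the line of X on Y when Y is given.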

4.2.2. Properties of Regression Coefficients

(a) The correlation coefficient is the geometric mean of the regression coefficients, and it
takes the sign common to both of them.

(b) If one of the regression coefficients is greater than one in absolute value, the other must
be less than one in absolute value.

(c) The modulus of the arithmetic mean of the regression coefficients is not less than
the modulus of the correlation coefficient r.
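
These properties follow from the definitions of the two coefficients given in 4.2.1. A short sketch of the reasoning, using only those definitions:

```latex
% (a) The product of the two coefficients is r^2
b_{YX}\, b_{XY} = r\,\frac{\sigma_Y}{\sigma_X} \cdot r\,\frac{\sigma_X}{\sigma_Y} = r^2
\quad\Rightarrow\quad r = \pm\sqrt{b_{YX}\, b_{XY}}

% (b) Since r^2 <= 1, the product cannot exceed 1
|b_{YX}|\,|b_{XY}| = r^2 \le 1
\quad\Rightarrow\quad |b_{YX}| > 1 \text{ implies } |b_{XY}| < 1

% (c) The arithmetic mean of a positive number and its reciprocal is at least 1
\left| \frac{b_{YX} + b_{XY}}{2} \right|
  = |r| \cdot \frac{1}{2}\left( \frac{\sigma_Y}{\sigma_X} + \frac{\sigma_X}{\sigma_Y} \right)
  \ge |r|
```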

Example: Obtain the equations of the two lines of regression for the following data, which represent the
ages of wives (X) and the ages of their husbands (Y) in a sample of 8 couples observed in a
locality.

X   65   66   67   67   68   69   70   72
Y   67   68   65   68   72   72   69   71

1) Find the correlation between the ages of the wives and the ages of their husbands.

2) Obtain the regression equation of Y on X and the regression equation of X on Y.

3) Predict the age of the wife (X) when the age of the husband (Y) is 70.

4) Also predict the age of the husband (Y) when the age of the wife (X) is 60.

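Solution: Since the correlation coefficient is not affected by a change of origin, denote the original
ages of wives and husbands by A and B, and take 68 as the origin for A and 69 as the origin for B,
i.e. X = A - 68 and Y = B - 69. This simplifies the calculation.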

  A     B     X = A-68   Y = B-69   X²    Y²    XY
  65    67       -3         -2       9     4     6
  66    68       -2         -1       4     1     2
  67    65       -1         -4       1    16     4
  67    68       -1         -1       1     1     1
  68    72        0          3       0     9     0
  69    72        1          3       1     9     3
  70    69        2          0       4     0     0
  72    71        4          2      16     4     8

  TOTAL           0          0      36    44    24

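$$\bar{X} = \frac{1}{n}\sum X = \frac{0}{8} = 0, \qquad \bar{Y} = \frac{1}{n}\sum Y = \frac{0}{8} = 0$$

$$r(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sigma_X\,\sigma_Y} = \frac{\frac{1}{n}\sum XY - \bar{X}\,\bar{Y}}{\sqrt{\left(\frac{1}{n}\sum X^2 - \bar{X}^2\right)\left(\frac{1}{n}\sum Y^2 - \bar{Y}^2\right)}}$$

$$= \frac{\frac{1}{8}(24) - (0 \times 0)}{\sqrt{\left(\frac{36}{8} - 0^2\right)\left(\frac{44}{8} - 0^2\right)}} = \frac{3}{\sqrt{4.5 \times 5.5}} = 0.603$$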

Since the correlation coefficient is independent of change of origin, we get r(X,Y) = r(A,B) ≈ 0.6.

1) Hence, the correlation between the ages of wives and their husbands is 0.6, which shows that
there exists a high degree of positive correlation.
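
The claim that shifting the origin leaves r unchanged can be checked numerically. The sketch below uses the data of this example; the helper name `pearson_r` is an assumption made for illustration, and it follows the divide-by-n formula used in the solution.

```python
from math import sqrt


def pearson_r(u, v):
    """Pearson correlation using the (1/n) * sum(uv) - mean_u * mean_v form."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum(a * b for a, b in zip(u, v)) / n - mean_u * mean_v
    var_u = sum(a * a for a in u) / n - mean_u ** 2
    var_v = sum(b * b for b in v) / n - mean_v ** 2
    return cov / sqrt(var_u * var_v)


wife_age = [65, 66, 67, 67, 68, 69, 70, 72]     # A in the table
husband_age = [67, 68, 65, 68, 72, 72, 69, 71]  # B in the table

# Shifted variables X = A - 68 and Y = B - 69, as in the solution
x = [a - 68 for a in wife_age]
y = [b - 69 for b in husband_age]

r_shifted = pearson_r(x, y)
r_original = pearson_r(wife_age, husband_age)
print(round(r_shifted, 3), round(r_original, 3))  # prints 0.603 0.603
```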

In terms of the original ages, the means are 68 for X and 69 for Y, and the standard deviations
are σX = √4.5 ≈ 2.12 and σY = √5.5 ≈ 2.35.

The regression equation of Y on X is:

$$Y - \bar{Y} = r\,\frac{\sigma_Y}{\sigma_X}\,(X - \bar{X})$$

$$\Rightarrow\; Y = 69 + 0.6 \times \frac{2.35}{2.12}\,(X - 68) \;\Rightarrow\; Y = 0.665X + 23.78$$

The regression equation of X on Y is:

$$X - \bar{X} = r\,\frac{\sigma_X}{\sigma_Y}\,(Y - \bar{Y})$$

$$\Rightarrow\; X = 68 + 0.6 \times \frac{2.12}{2.35}\,(Y - 69) \;\Rightarrow\; X = 0.54Y + 30.74$$

2) Hence, the regression equation of Y on X is Y = 0.665X + 23.78 and the regression equation
of X on Y is X = 0.54Y + 30.74.

To predict the age of the wife (X) when the age of the husband is Y = 70, we have to use the regression
equation of X on Y. That is, $\hat{X}$ = 0.54(70) + 30.74 = 68.54, where $\hat{X}$ is the estimate of X.

3) Hence, when the age of the husband is 70, the estimated age of the wife is 68.54.

To predict the age of the husband (Y) when the age of the wife is X = 60, we have to use the regression
equation of Y on X. That is, $\hat{Y}$ = 0.665(60) + 23.78 = 63.68, where $\hat{Y}$ is the estimate of Y.

4) Hence, when the age of the wife is 60, the estimated age of the husband is 63.68.
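
As a cross-check on the worked example, the sketch below fits both lines from the raw data without intermediate rounding. The hand computation rounds r to 0.6 and the standard deviations to two decimals, so the unrounded coefficients and predictions differ slightly in the second decimal place. The variable names are assumptions chosen for illustration.

```python
wife_age = [65, 66, 67, 67, 68, 69, 70, 72]     # X
husband_age = [67, 68, 65, 68, 72, 72, 69, 71]  # Y

n = len(wife_age)
mean_x = sum(wife_age) / n     # 68.0
mean_y = sum(husband_age) / n  # 69.0

# Sums of squares and products about the means (36, 44 and 24 in the table)
s_xx = sum((x - mean_x) ** 2 for x in wife_age)
s_yy = sum((y - mean_y) ** 2 for y in husband_age)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(wife_age, husband_age))

# Least-squares slopes: b_yx = Cov/Var(X), b_xy = Cov/Var(Y); the 1/n factors cancel
b_yx = s_xy / s_xx
b_xy = s_xy / s_yy

# Intercepts from the fact that each line passes through (mean_x, mean_y)
a_yx = mean_y - b_yx * mean_x
a_xy = mean_x - b_xy * mean_y

husband_at_wife_60 = a_yx + b_yx * 60  # uses the line of Y on X
wife_at_husband_70 = a_xy + b_xy * 70  # uses the line of X on Y

print(f"Y = {b_yx:.3f} X + {a_yx:.2f}")
print(f"X = {b_xy:.3f} Y + {a_xy:.2f}")
print(f"{husband_at_wife_60:.2f} {wife_at_husband_70:.2f}")
```

The prediction at X = 60 lies outside the observed range of wives' ages (65 to 72), so it is an extrapolation and should be read with more caution than the prediction at Y = 70.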

I. OBJECTIVE TYPE KEYS

1 Data obtained from a group of individuals on two variables or characters are known as
Bivariate data.

2 The statistical measure of the relationship between the two variables of a bivariate
data set is called Correlation.

3 If the values of both variables increase or decrease together, then the correlation is said to be
Positive correlation.

4 If an increase in the values of one variable results in a decrease in the other variable, then the
correlation is said to be Negative correlation.

5 If the change of one variable does not affect or influence the change of other variable, the
correlation is said to be Zero or No correlation.

6 The study of correlation based on the graphical representation is called Scatter diagram
method.

7 The correlation coefficient (r) always lies in between -1 and 1. i.e., -1≤r≤1.

8 If the correlation coefficient r is positive, then correlation is said to be Positive correlation.

9 If the correlation coefficient r is negative, then correlation is said to be Negative correlation.

10 If the correlation coefficient r is zero, then correlation is said to be Zero correlation.

11 If the correlation coefficient r is +1, then correlation is said to be Perfect Positive correlation.

12 If the correlation coefficient r is -1, then correlation is said to be Perfect Negative correlation.

13 The correlation coefficient is the ratio of covariance to the product of standard deviations.

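14 If (X,Y) is a bivariate data set on n individuals, then Karl Pearson’s coefficient of correlation is
defined as

$$r = \frac{\text{Covariance between } X \text{ and } Y}{(\text{Standard deviation of } X)(\text{Standard deviation of } Y)} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X\,\sigma_Y}$$

or

$$r = \frac{\frac{\sum XY}{n} - \bar{X}\,\bar{Y}}{\sqrt{\left(\frac{\sum X^2}{n} - \bar{X}^2\right)\left(\frac{\sum Y^2}{n} - \bar{Y}^2\right)}}$$

or

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \,\sum (Y - \bar{Y})^2}}$$

or

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left(n\sum X^2 - (\sum X)^2\right)\left(n\sum Y^2 - (\sum Y)^2\right)}}$$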

15 If the two variables are independent in a bivariate data, then the variables are said to be
uncorrelated.

16 Correlation coefficient is independent of change of Origin and Scale.

17 The average relationship between two variables of a bivariate data is provided by Regression.

18 The regression equation of Y on X is used to predict the value of Y if the value of X is given.

19 The regression equation of X on Y is used to predict the value of X if the value of Y is given.

20 The geometric mean of the two regression coefficients is the Correlation coefficient.

21 If the value of one regression coefficient is greater than one, the other must be less than one.

22 Regression coefficients are independent of change of origin but not of scale.

23 The regression line of Y on X is defined as $(Y - \bar{Y}) = b_{YX}(X - \bar{X})$, where $b_{YX}$ is known as the
regression coefficient of Y on X, i.e., $b_{YX} = \dfrac{r\,\sigma_Y}{\sigma_X}$.

24 The regression line of X on Y is defined as $(X - \bar{X}) = b_{XY}(Y - \bar{Y})$, where $b_{XY}$ is known as the
regression coefficient of X on Y, i.e., $b_{XY} = \dfrac{r\,\sigma_X}{\sigma_Y}$.

25 The value of Y can be predicted for any given value X = x using the regression equation of Y on X,
defined as $Y = \left(\dfrac{r\,\sigma_Y}{\sigma_X}\right)(x - \bar{x}) + \bar{y}$.

26 The value of X can be predicted for any given value Y = y using the regression equation of X on Y,
defined as $X = \left(\dfrac{r\,\sigma_X}{\sigma_Y}\right)(y - \bar{y}) + \bar{x}$.
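
Keys 16 and 22 can be illustrated numerically. The sketch below reuses the wife and husband ages from the worked example; the helper name `r_and_b_yx` and the conversion of the wives' ages from years to months (a positive scale factor of 12) are assumptions made only for this illustration.

```python
from math import sqrt


def r_and_b_yx(x, y):
    """Return (r, b_yx) computed from sums of squares about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy), s_xy / s_xx


x = [65, 66, 67, 67, 68, 69, 70, 72]
y = [67, 68, 65, 68, 72, 72, 69, 71]

r0, b0 = r_and_b_yx(x, y)

# Change of origin: subtract constants from both variables
r1, b1 = r_and_b_yx([a - 68 for a in x], [b - 69 for b in y])

# Change of scale (positive factor): express X in months instead of years
r2, b2 = r_and_b_yx([a * 12 for a in x], y)

print(abs(r0 - r1) < 1e-12, abs(b0 - b1) < 1e-12)       # origin: r and b_yx unchanged
print(abs(r0 - r2) < 1e-12, abs(b2 - b0 / 12) < 1e-12)  # scale: r unchanged, b_yx divided by 12
```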

II. LONG ANSWER TYPE QUESTIONS

1. Explain correlation with its types and applications in agricultural analysis.

2. Write a note on the scatter diagram method of observing the correlation.

3. Explain the concept of regression and the regression lines with its practical applications.

III. PRACTICAL EXERCISES

1. The following bivariate distribution shows the yield of chilies (X) and the amount of fertilizer used (Y),
both in kg, from a sample of 10 plots. Calculate Karl Pearson’s coefficient of correlation
for the data and interpret your result.

X   18    22   25    15   24   20   16    23   22   15
Y   4.5   5    5.5   4    6    4    4.5   4    4    3.5

2. The following bivariate data show the yield of brinjal (X) and the amount of pesticides used (Y),
both in kg, from a sample of 10 plots.

X   28    32   35     25     34    30    26     33   32   25
Y   5.5   4    5.25   4.75   6.5   4.5   4.25   4    4    3.25

Calculate the following:

1. Karl Pearson’s coefficient of correlation for the above data, and interpret your result.
2. Obtain the two regression equations.
3. Predict the yield of brinjal when the amount of pesticides used is 5.75.
4. Estimate the amount of pesticides used if the yield of brinjal is 40.
