Regression

Uploaded by

Deepak S
Copyright
© All Rights Reserved

4.2. REGRESSION

The term “regression” literally means “stepping back towards the average”. It was first
used by the British biometrician Sir Francis Galton (1822-1911) in connection with the inheritance
of stature. Galton found that the offspring of abnormally tall or short parents tend to “regress”
or “step back” to the average population height. But the term “regression” as now used in
Statistics is only a convenient term without any reference to biometry.

Regression analysis is a mathematical measure of the average relationship between two
or more variables in terms of the original units of the data. In regression analysis, there are two
types of variables. The variable whose value is influenced or is to be predicted is called the
dependent variable, and the variable which influences the values or is used for prediction is
called the independent variable. The independent variable is also known as the
regressor, predictor or explanatory variable, while the dependent variable is also known as the
regressed or explained variable.

If the variables in a bivariate distribution are related, we will find that the points in the
scatter diagram will cluster round some curve called the “curve of regression”. If the curve is a
straight line, it is called the line of regression and there is said to be linear regression between
the variables, otherwise regression is said to be curvilinear.

4.2.1. Prediction using the Regression Equations

The line of regression is the line which gives the best estimate of the value of one
variable for any specific value of the other variable. Thus, the line of regression is the line of
“best fit” and is obtained by the principle of least squares. Suppose that in the bivariate
distribution (xi, yi); i = 1, 2, …, n; Y is the dependent variable and X is the independent variable. Let the line
of regression of Y on X be Y = a + bX. This equation represents a family of straight lines
for different values of the arbitrary constants ‘a’ and ‘b’. The problem is to determine ‘a’ and ‘b’
so that the line is the line of best fit. The term ‘best fit’ is interpreted in accordance with
Legendre’s principle of least squares, which consists in minimizing the sum of the squares of
the deviations of the actual values of Y from their estimated values as given by the line of best fit.
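
The minimisation described above can be written out explicitly. The following is a sketch of the standard least-squares derivation; the symbol S for the sum of squared deviations is introduced here for illustration and does not appear in the original text.

$$S(a, b) = \sum_{i=1}^{n} (y_i - a - b\,x_i)^2$$

Setting the partial derivatives of S with respect to a and b equal to zero gives the two normal equations:

$$\sum_{i=1}^{n} y_i = n\,a + b \sum_{i=1}^{n} x_i, \qquad \sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2$$

Solving them for a and b yields

$$b = \frac{\mathrm{Cov}(X, Y)}{\sigma_X^2} = r\,\frac{\sigma_Y}{\sigma_X}, \qquad a = \bar{y} - b\,\bar{x}$$

so the fitted line passes through the point of means and can be written as Y − ȳ = b (X − x̄).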

For any bivariate data (X, Y), there will be two regression equations, namely i) the
regression equation of Y on X and ii) the regression equation of X on Y. The regression equation
of Y on X is used to predict or estimate the value of Y for any given value of X = x. Similarly, the
regression equation of X on Y is used to predict or estimate the value of X for any given value of
Y = y.

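i) The regression equation of Y on X is defined as

$$Y - \bar{y} = r\,\frac{\sigma_Y}{\sigma_X}\,(X - \bar{x})$$

$$Y - \bar{y} = b_{YX}\,(X - \bar{x})$$

$$Y = b_{YX}\,(X - \bar{x}) + \bar{y}$$

where $b_{YX}$ is the regression coefficient of Y on X, $b_{YX} = r\,\dfrac{\sigma_Y}{\sigma_X}$.

ii) The regression equation of X on Y is defined as

$$X - \bar{x} = r\,\frac{\sigma_X}{\sigma_Y}\,(Y - \bar{y})$$

$$X - \bar{x} = b_{XY}\,(Y - \bar{y})$$

$$X = b_{XY}\,(Y - \bar{y}) + \bar{x}$$

where $b_{XY}$ is the regression coefficient of X on Y, $b_{XY} = r\,\dfrac{\sigma_X}{\sigma_Y}$.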

The regression coefficient bYX is the slope of the line of regression of Y on X and is also called
the coefficient of regression of Y on X. It represents the increment in the value of the dependent
variable Y corresponding to a unit change in the value of the independent variable X. Similarly, the
regression coefficient bXY is the slope of the line of regression of X on Y and is also called the
coefficient of regression of X on Y. It represents the increment in the value of the dependent
variable X corresponding to a unit change in the value of the independent variable Y.
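
The two regression equations and their coefficients can be computed directly from paired data. The following is a minimal Python sketch, assuming the divide-by-n (population) form of the variance and covariance; the function names and the small data set are hypothetical and chosen only for illustration.

```python
from math import sqrt


def regression_coefficients(xs, ys):
    """Return r, both regression coefficients and the means for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Divide-by-n moments (an assumption of this sketch).
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    r = cov / (sd_x * sd_y)
    return {
        "r": r,
        "b_yx": r * sd_y / sd_x,  # regression coefficient of Y on X
        "b_xy": r * sd_x / sd_y,  # regression coefficient of X on Y
        "x_bar": x_bar,
        "y_bar": y_bar,
    }


def predict_y(fit, x):
    """Estimate Y for a given X = x from the regression equation of Y on X."""
    return fit["y_bar"] + fit["b_yx"] * (x - fit["x_bar"])


def predict_x(fit, y):
    """Estimate X for a given Y = y from the regression equation of X on Y."""
    return fit["x_bar"] + fit["b_xy"] * (y - fit["y_bar"])


# Hypothetical illustrative data, not taken from the text.
fit = regression_coefficients([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(fit["b_yx"], fit["b_xy"])
print(predict_y(fit, 6), predict_x(fit, 6))
```

Note that the two predictions use different lines: the equation of Y on X is used when X is given, and the equation of X on Y when Y is given.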

4.2.2. Properties of Regression Coefficients

(a) Correlation coefficient is the geometric mean between the regression coefficients.

(b) If one of the regression coefficients is greater than one, the other must be less than one.

(c) The modulus value of the arithmetic mean of the regression coefficients is not less than
the modulus value of the correlation coefficient r.
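
Each of these properties follows from the definitions of the two regression coefficients. A short sketch of the reasoning, using only the symbols already defined:

$$b_{YX}\,b_{XY} = \left(r\,\frac{\sigma_Y}{\sigma_X}\right)\left(r\,\frac{\sigma_X}{\sigma_Y}\right) = r^2 \quad\Rightarrow\quad r = \pm\sqrt{b_{YX}\,b_{XY}}$$

where r takes the common sign of the two regression coefficients; this is property (a). Since $r^2 \le 1$, we have $|b_{YX}|\,|b_{XY}| \le 1$, so if one coefficient exceeds one in absolute value the other must be less than one in absolute value; this is property (b). Finally, because both coefficients have the same sign, the inequality between the arithmetic and geometric means gives

$$\left|\frac{b_{YX} + b_{XY}}{2}\right| = \frac{|b_{YX}| + |b_{XY}|}{2} \;\ge\; \sqrt{|b_{YX}|\,|b_{XY}|} = |r|$$

which is property (c).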

Example: Obtain the equations of the two lines of regression for the following data, which represent the
ages of wives (X) and the ages of their husbands (Y) in a sample of 8 couples observed in a
locality.

X 65 66 67 67 68 69 70 72
Y 67 68 65 68 72 72 69 71

1) Find the correlation between the ages of the wives and the ages of their husbands.

2) Obtain the regression equation of Y on X and the regression equation of X on Y.

3) Predict the age of the wife (X) when the age of the husband (Y) is 70.

4) Also predict the age of the husband (Y) when the age of the wife (X) is 60.

Solution: Since the correlation coefficient is not affected by a change of origin, denote the original
ages of the wives and husbands by A and B, and take the origin for A as 68 and for B as 69. This
simplifies the calculation.

A      B      X=A-68   Y=B-69   X²    Y²    XY
65     67     -3       -2       9     4     6
66     68     -2       -1       4     1     2
67     65     -1       -4       1     16    4
67     68     -1       -1       1     1     1
68     72     0        3        0     9     0
69     72     1        3        1     9     3
70     69     2        0        4     0     0
72     71     4        2        16    4     8

TOTAL         0        0        36    44    24

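$$\bar{X} = \frac{1}{n}\sum X = \frac{0}{8} = 0, \qquad \bar{Y} = \frac{1}{n}\sum Y = \frac{0}{8} = 0$$

$$r(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sigma_x\,\sigma_y} = \frac{\frac{1}{n}\sum XY - \bar{X}\,\bar{Y}}{\sqrt{\left(\frac{1}{n}\sum X^2 - \bar{X}^2\right)\left(\frac{1}{n}\sum Y^2 - \bar{Y}^2\right)}}$$

$$= \frac{\frac{1}{8}(24) - (0 \times 0)}{\sqrt{\left(\frac{36}{8} - 0^2\right)\left(\frac{44}{8} - 0^2\right)}} = \frac{3}{\sqrt{4.5 \times 5.5}} = 0.603$$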

Since the correlation coefficient is independent of change of origin, we get r(X,Y) = r(A,B) = 0.603 ≈ 0.6.

1) Hence, the correlation between the ages of the wives and their husbands is about 0.6, which shows that
there exists a high degree of positive correlation.

The standard deviations are likewise unaffected by the change of origin, so σx = √4.5 ≈ 2.12 and
σy = √5.5 ≈ 2.35, while the means of the original ages are 68 (wives) and 69 (husbands).

The regression equation of Y on X is:

$$Y - \bar{Y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X})$$

$$\Rightarrow\; Y = 69 + 0.6 \times \frac{2.35}{2.12}\,(X - 68) \;\Rightarrow\; Y = 0.665X + 23.78.$$

The regression equation of X on Y is:

$$X - \bar{X} = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y})$$

$$\Rightarrow\; X = 68 + 0.6 \times \frac{2.12}{2.35}\,(Y - 69) \;\Rightarrow\; X = 0.54Y + 30.74.$$

2) Hence, the regression equation of Y on X is Y = 0.665X + 23.78 and the regression equation
of X on Y is X = 0.54Y + 30.74.

To predict the age of the wife (X) when the age of the husband is Y = 70, we have to use the regression
equation of X on Y. That is, $\hat{X}$ = 0.54(70) + 30.74 = 68.54, where $\hat{X}$ is the estimate of X.

3) Hence, when the age of the husband is 70, the estimated age of the wife is 68.54.

To predict the age of the husband (Y) when the age of the wife is X = 60, we have to use the regression
equation of Y on X. That is, $\hat{Y}$ = 0.665(60) + 23.78 = 63.68, where $\hat{Y}$ is the estimate of Y.

4) Hence, when the age of the wife is 60, the estimated age of the husband is 63.68.
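
The worked example can be checked numerically. The sketch below uses the raw-sum form of the formulas on the original ages; the variable names are hypothetical. Because it carries full precision rather than rounding r to 0.6 and the standard deviations to two decimals, its coefficients and predictions differ slightly from the hand-rounded figures above.

```python
from math import sqrt

# Ages from the example: wives (X) and husbands (Y).
wife = [65, 66, 67, 67, 68, 69, 70, 72]
husband = [67, 68, 65, 68, 72, 72, 69, 71]
n = len(wife)

# Raw sums needed by the computational formulas.
sx, sy = sum(wife), sum(husband)
sxx = sum(x * x for x in wife)
syy = sum(y * y for y in husband)
sxy = sum(x * y for x, y in zip(wife, husband))

x_bar, y_bar = sx / n, sy / n
num = n * sxy - sx * sy

r = num / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
b_yx = num / (n * sxx - sx ** 2)  # regression coefficient of Y on X
b_xy = num / (n * syy - sy ** 2)  # regression coefficient of X on Y

# Intercepts of the two lines: Y = a_yx + b_yx * X and X = a_xy + b_xy * Y.
a_yx = y_bar - b_yx * x_bar
a_xy = x_bar - b_xy * y_bar

wife_at_husband_70 = a_xy + b_xy * 70
husband_at_wife_60 = a_yx + b_yx * 60

print(f"r = {r:.3f}")
print(f"Y on X: Y = {b_yx:.3f} X + {a_yx:.2f}")
print(f"X on Y: X = {b_xy:.3f} Y + {a_xy:.2f}")
print(f"Estimated age of wife when husband is 70: {wife_at_husband_70:.2f}")
print(f"Estimated age of husband when wife is 60: {husband_at_wife_60:.2f}")
```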

I. OBJECTIVE TYPE KEYS

1 Data obtained from a group of individuals on two variables or characters are known as
Bivariate data.

2 The statistical measure of the relationship between the two variables of a bivariate
data is called Correlation.

3 If the values of both variables increase or decrease together, then the correlation is said to be
Positive correlation.

4 If an increase in the values of one variable results in a decrease in the values of the other variable, then the
correlation is said to be Negative correlation.

5 If a change in one variable does not affect or influence the other variable, the
correlation is said to be Zero or No correlation.

6 The study of correlation based on graphical representation is called the Scatter diagram
method.

7 The correlation coefficient (r) always lies between -1 and 1, i.e., -1 ≤ r ≤ 1.

8 If the correlation coefficient r is positive, then correlation is said to be Positive correlation.

9 If the correlation coefficient r is negative, then correlation is said to be Negative correlation.

10 If the correlation coefficient r is zero, then correlation is said to be Zero correlation.

11 If the correlation coefficient r is +1, then correlation is said to be Perfect Positive correlation.

12 If the correlation coefficient r is -1, then correlation is said to be Perfect Negative correlation.

13 The correlation coefficient is the ratio of covariance to the product of standard deviations.

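14 If (X, Y) is bivariate data on n individuals, then Karl Pearson’s coefficient of correlation is
defined as

$$r = \frac{\text{Covariance between } X \text{ and } Y}{(\text{Standard deviation of } X)(\text{Standard deviation of } Y)} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X\,\sigma_Y}$$

or

$$r = \frac{\frac{\sum XY}{n} - \bar{X}\,\bar{Y}}{\sqrt{\left(\frac{\sum X^2}{n} - \bar{X}^2\right)\left(\frac{\sum Y^2}{n} - \bar{Y}^2\right)}}$$

or

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2\,\sum (Y - \bar{Y})^2}}$$

or

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left(n\sum X^2 - (\sum X)^2\right)\left(n\sum Y^2 - (\sum Y)^2\right)}}$$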

15 If the two variables in bivariate data are independent, then the variables are said to be
uncorrelated.

16 Correlation coefficient is independent of change of Origin and Scale.

17 The average relationship between two variables of a bivariate data is provided by Regression.

18 The regression equation of Y on X is used to predict the value of Y if the value of X is given.

19 The regression equation of X on Y is used to predict the value of X if the value of Y is given.

20 The geometric mean of the two regression coefficients is the Correlation coefficient.

21 If the value of one regression coefficient is greater than one, the other must be less than one.

22 Regression coefficients are independent of change of origin but not of scale.

23 The regression line of Y on X is defined as $(Y - \bar{Y}) = b_{YX}\,(X - \bar{X})$, where $b_{YX}$ is known as the
regression coefficient of Y on X, i.e., $b_{YX} = \dfrac{r\,\sigma_Y}{\sigma_X}$.

24 The regression line of X on Y is defined as $(X - \bar{X}) = b_{XY}\,(Y - \bar{Y})$, where $b_{XY}$ is known as the
regression coefficient of X on Y, i.e., $b_{XY} = \dfrac{r\,\sigma_X}{\sigma_Y}$.

25 The value of Y for any given value X = x can be predicted using the regression equation of Y on X,
which gives $Y = \left(\dfrac{r\,\sigma_Y}{\sigma_X}\right)(x - \bar{x}) + \bar{y}$.

26 The value of X for any given value Y = y can be predicted using the regression equation of X on Y,
which gives $X = \left(\dfrac{r\,\sigma_X}{\sigma_Y}\right)(y - \bar{y}) + \bar{x}$.
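
Keys 16 and 22 can be illustrated numerically. The sketch below reuses the ages from the worked example; the helper name and the scale factors 2 and 5 are arbitrary choices made here for illustration. It shows that r is unchanged by a shift of origin or a positive rescaling, while the regression coefficient of Y on X is unchanged by the shift but is multiplied by the ratio of the scale factors.

```python
from math import sqrt


def corr_and_slope(xs, ys):
    """Return (r, b_yx) for paired data using sums of deviations from the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy), sxy / sxx


xs = [65, 66, 67, 67, 68, 69, 70, 72]
ys = [67, 68, 65, 68, 72, 72, 69, 71]

r0, b0 = corr_and_slope(xs, ys)

# Change of origin: subtract 68 from X and 69 from Y.
r1, b1 = corr_and_slope([x - 68 for x in xs], [y - 69 for y in ys])

# Change of scale: U = 2X and V = 5Y (arbitrary positive factors).
r2, b2 = corr_and_slope([2 * x for x in xs], [5 * y for y in ys])

print(r0, r1, r2)  # the three correlation coefficients agree
print(b0, b1, b2)  # b1 equals b0, while b2 equals (5 / 2) * b0
```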

II. LONG ANSWER TYPE QUESTIONS

1. Explain correlation with its types and applications in agricultural analysis.

2. Write a note on the scatter diagram method of observing the correlation.

3. Explain the concept of regression and the regression lines with its practical applications.

III. PRACTICAL EXERCISES

1. The following bivariate distribution shows the yield of chilies (X) and the amount of fertilizer used (Y)
(both in Kgs.) from a sample of 10 plots. Calculate Karl Pearson’s coefficient of correlation
for these data and interpret your result.

X 18 22 25 15 24 20 16 23 22 15
Y 4.5 5 5.5 4 6 4 4.5 4 4 3.5

2. The following bivariate data show the yield of brinjal (X) and the amount of pesticides used (Y) (both
in Kgs.) from a sample of 10 plots.

X 28 32 35 25 34 30 26 33 32 25
Y 5.5 4 5.25 4.75 6.5 4.5 4.25 4 4 3.25

Calculate the following:

1. Karl Pearson’s coefficient of correlation for the above data and interpret your result.
2. Obtain the two regression equations.
3. Predict the yield of brinjal when the amount of pesticides used is 5.75.
4. Estimate the amount of pesticides used if the yield of brinjal is 40.
