Regression

Uploaded by

Hajraa

Chapter 5

Regression and Correlation


Regression:
The dependence of one variable (the dependent variable) on one or more other variables
(the independent variables) is called regression.
It is used to estimate or predict the average value of the dependent variable from
known values of the independent variables.

Simple regression:
When we study the dependence of a variable on a single independent variable, it is
called simple regression.
Example:
1. Consumption depends on income.
Here consumption is the dependent variable, whereas income is the independent variable.

Multiple regression:
The dependence of one variable on more than one independent variable is called
multiple regression.
Example:
The yield of wheat depends on fertilizer, seed, sand, etc.

Deterministic and probabilistic relations

Deterministic relation:
When an exact relationship exists between two variables, it is called a
deterministic relation.
Example: the conversion from degrees Celsius (C) to degrees Fahrenheit (F):
$$F = 32 + \frac{9}{5}C$$

Probabilistic relation:
When the relationship between two variables is inexact, it is called a probabilistic relation.

Example:
The dependence of wheat production on different factors is an example of a probabilistic
relation.

Scatter diagram:
A scatter diagram is used to check whether the relationship between two variables is
linear. To construct it, the pairs (X_i, Y_i) are plotted as points, with X (the
independent variable) on the X-axis and Y (the dependent variable) on the Y-axis.

Example: Draw a scatter diagram.


X     Y
2     1
4     2
6     3
8     4
10    5

[Figure: Scatter Diagram of the data above, with Y plotted against X (x-axis from 2 to 10)]

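Equation of Regression:

Formula:

$$b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}$$

and

$$a = \bar{Y} - b\bar{X}, \qquad \bar{Y} = \frac{\sum Y}{n}, \quad \bar{X} = \frac{\sum X}{n}$$

Where, in the regression equation
$$\hat{y} = a + bX$$
Y = dependent variable
X = independent variable
a = constant or intercept
b = regression coefficient or slope of the line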
Example Table:

        X     Y     XY    X^2

TOTAL

Q. For the data given below, find the regression equation of Y on X.


        X     Y     XY    X^2   Y^2
        1     1     1     1     1
        2     4     8     4     16
        3     9     27    9     81
        4     16    64    16    256
        6     18    108   36    324
TOTAL   16    48    208   66    678

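$$b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}$$

$$b = \frac{5(208) - (16)(48)}{5(66) - (16)^2}$$

$$b = \frac{1040 - 768}{330 - 256} = \frac{272}{74}$$

b = 3.676

$$a = \bar{Y} - b\bar{X}$$

$$\bar{Y} = \frac{\sum Y}{n} = \frac{48}{5} = 9.6$$

$$\bar{X} = \frac{\sum X}{n} = \frac{16}{5} = 3.2$$

a = 9.6 - 3.676(3.2)
a = -2.16
Regression Equation:
$$\hat{y} = -2.16 + 3.676x$$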
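
The hand calculation above can be checked with a short script. This is only a minimal sketch in Python (the function name fit_line is an assumption made for illustration, not part of these notes); it applies the same formulas for b and a and should reproduce the slope and intercept up to rounding.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: b = (n*SumXY - SumX*SumY) / (n*SumX^2 - (SumX)^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = Y-bar - b * X-bar
    a = sum_y / n - b * (sum_x / n)
    return a, b


xs = [1, 2, 3, 4, 6]
ys = [1, 4, 9, 16, 18]
a, b = fit_line(xs, ys)
print("a =", round(a, 2), "b =", round(b, 3))
```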
Q. The data are defined as X = rainfall and Y = yield of wheat. Then:
i. Find the regression equation that predicts yield.
ii. Find the errors/residuals and show that Σe = 0.
iii. Show the data on a scatter diagram.

        X       Y       XY        X^2       ŷ = 15.73 + 3.16X   e = Y - ŷ
        12.9    62.5    806.25    166.41    56.5                6.0
        7.2     28.7    206.64    51.84     38.5                -9.8
        11.3    52.2    589.86    127.69    51.4                0.8
        18.6    80.6    1499.16   345.96    74.5                6.1
        8.8     41.6    366.08    77.44     43.5                -1.9
        10.3    71.3    734.39    106.09    48.3                23.0
        15.9    54.4    864.96    252.81    66.0                -11.6
        13.1    44.5    582.95    171.61    57.1                -12.6
TOTAL   98.1    435.8   5650.29   1299.85   435.8               0

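$$\hat{y} = a + bX$$

Where

$$b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}$$

$$b = \frac{8(5650.29) - (98.1)(435.8)}{8(1299.85) - (98.1)^2} = \frac{2450.34}{775.19}$$

b = 3.16

$$\bar{Y} = \frac{\sum Y}{n} = \frac{435.8}{8} = 54.475$$

$$\bar{X} = \frac{\sum X}{n} = \frac{98.1}{8} = 12.2625$$

$$a = \bar{Y} - b\bar{X}$$

a = 54.475 - 3.16(12.2625)
a = 15.73

Regression equation: ŷ = a + bX, that is, ŷ = 15.73 + 3.16X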
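
The residual check in part ii can also be sketched in Python. The variable names below are assumptions chosen for illustration. The script keeps a and b unrounded, so the residuals sum to zero up to floating-point error, and the intercept comes out slightly different (about 15.71) from the hand value, which uses the rounded slope 3.16.

```python
rain = [12.9, 7.2, 11.3, 18.6, 8.8, 10.3, 15.9, 13.1]     # X = rainfall
wheat = [62.5, 28.7, 52.2, 80.6, 41.6, 71.3, 54.4, 44.5]  # Y = yield of wheat

n = len(rain)
sum_x, sum_y = sum(rain), sum(wheat)
sum_xy = sum(x * y for x, y in zip(rain, wheat))
sum_x2 = sum(x * x for x in rain)

# Least-squares slope and intercept, kept at full precision
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

# Residual e = Y - y-hat for every observation
residuals = [y - (a + b * x) for x, y in zip(rain, wheat)]

print("b =", round(b, 2))
print("residuals:", [round(e, 1) for e in residuals])
print("sum of residuals is ~0:", abs(sum(residuals)) < 1e-9)
```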


[Figure: Scatter Diagram of wheat production (Y, y-axis from 0 to 90) against rainfall (X, x-axis from 6 to 20)]

Plotted points (X on the x-axis, Y on the y-axis):
1. (12.9, 62.5)
2. (7.2, 28.7)
3. (11.3, 52.2)
4. (18.6, 80.6)
5. (8.8, 41.6)
6. (10.3, 71.3)
7. (15.9, 54.4)
8. (13.1, 44.5)

Where
Y = Wheat production (y-axis)
X = Rainfall (x-axis)

Standard deviation of regression or Standard error of estimate

$$s_{y.x} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n - 2}}$$

Activity: For the previous question, find the standard error of estimate.
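
As an illustration of the formula (not a solution to the activity), here is a minimal Python sketch applied to the first worked example (X = 1, 2, 3, 4, 6 and Y = 1, 4, 9, 16, 18). The function name standard_error is an assumption made for this sketch.

```python
from math import sqrt


def standard_error(xs, ys):
    """Standard error of estimate s_y.x for the regression of Y on X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    # s = sqrt((SumY^2 - a*SumY - b*SumXY) / (n - 2))
    return sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))


s = standard_error([1, 2, 3, 4, 6], [1, 4, 9, 16, 18])
print("standard error of estimate:", round(s, 2))
```

The same function can be called with the rainfall and yield data to check the activity.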

CO-EFFICIENT OF DETERMINATION

$$R^2 = \frac{a\sum Y + b\sum XY - \frac{(\sum Y)^2}{n}}{\sum Y^2 - \frac{(\sum Y)^2}{n}}$$

Practice:
Find R^2 for the previous data.
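
A minimal Python sketch of this calculation follows. It assumes the computational form R^2 = (aΣY + bΣXY - (ΣY)^2/n) / (ΣY^2 - (ΣY)^2/n), and it uses the first worked example (X = 1, 2, 3, 4, 6) rather than the practice data. The function name r_squared is hypothetical.

```python
def r_squared(xs, ys):
    """Coefficient of determination for the regression of Y on X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    correction = sum_y ** 2 / n  # (SumY)^2 / n
    # Explained variation divided by total variation
    return (a * sum_y + b * sum_xy - correction) / (sum_y2 - correction)


r2 = r_squared([1, 2, 3, 4, 6], [1, 4, 9, 16, 18])
print("R^2 =", round(r2, 3))
```

A value close to 1 means the fitted line explains most of the variation in Y.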
Correlation:
The interdependence between two variables is called correlation.
Example: the relationship between gold prices and oil prices.

Positive Correlation:
Two variables are said to be positively correlated if they tend to increase or
decrease together, that is, they move in the same direction.
Example: The length of an iron bar increases as the temperature
increases.
Negative Correlation:
When one variable increases as the other decreases, the two variables are said to be
negatively correlated.
Example: The number of Corona patients decreases as people spend more time at
home or in isolation.

[Figure: two scatter plots titled "Positive Correlation" and "Negative Correlation" (x-axis from 2 to 10, y-axis from 0 to 30)]

Correlation is measured by "r", the correlation coefficient.

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$$

The following columns are made:

        x     y     xy    x^2   y^2

TOTAL

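Q. Find the correlation coefficient for the given data.

        X     Y     X^2   Y^2   XY
        1     2     1     4     2
        2     5     4     25    10
        3     3     9     9     9
        4     8     16    64    32
        5     7     25    49    35
TOTAL   15    25    55    151   88

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$$

$$r = \frac{5(88) - (15)(25)}{\sqrt{[5(55) - (15)^2][5(151) - (25)^2]}} = \frac{65}{\sqrt{(50)(130)}}$$

$$r = \frac{13}{\sqrt{10 \times 26}} = 0.806$$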
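
The same computation as a minimal Python sketch (the function name correlation is an assumption made for illustration):

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation coefficient r from the raw sums of x, y, xy, x^2 and y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = correlation([1, 2, 3, 4, 5], [2, 5, 3, 8, 7])
print(round(r, 3))  # 0.806
```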
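Properties of the correlation coefficient
i. "r" is symmetrical with respect to "x" and "y":
r_xy = r_yx.
ii. It lies between -1 and +1.
iii. It is independent of origin and scale: r_xy = r_uv, where u and v are obtained
from x and y by a change of origin and (positive) scale.
iv. "r" is the geometric mean of the two regression coefficients, and it takes
their common sign:
$$r = \pm\sqrt{b_{yx} \times b_{xy}}$$ where

$$b_{yx} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$

$$b_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum y^2 - (\sum y)^2}$$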
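Q. For the data below, verify that r is the geometric mean of b_yx and b_xy.

X   1   2   3   4   5
Y   2   5   3   8   7
Solution:
We already know that in this question
n = 5, ΣX = 15, ΣY = 25, ΣX^2 = 55, ΣY^2 = 151, ΣXY = 88, r = 0.806

$$b_{yx} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{5(88) - (15)(25)}{5(55) - (15)^2} = \frac{440 - 375}{275 - 225} = \frac{65}{50} = 1.3$$

$$b_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum y^2 - (\sum y)^2} = \frac{5(88) - (15)(25)}{5(151) - (25)^2} = \frac{440 - 375}{755 - 625} = \frac{65}{130} = 0.5$$

$$r = \sqrt{b_{yx} \times b_{xy}} = \sqrt{1.3 \times 0.5} = \sqrt{0.65} = 0.806$$

This agrees with the value r = 0.806 found directly from the correlation formula.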
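
A minimal Python sketch of this check is given below; the variable names are assumptions. The sign of r is taken from the regression coefficients, since the square root alone is never negative.

```python
from math import sqrt, copysign

x = [1, 2, 3, 4, 5]
y = [2, 5, 3, 8, 7]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(p * q for p, q in zip(x, y))
sum_x2 = sum(p * p for p in x)
sum_y2 = sum(q * q for q in y)

numerator = n * sum_xy - sum_x * sum_y
b_yx = numerator / (n * sum_x2 - sum_x ** 2)  # regression coefficient of Y on X
b_xy = numerator / (n * sum_y2 - sum_y ** 2)  # regression coefficient of X on Y

# Geometric mean of the two coefficients, carrying their common sign
r_from_slopes = copysign(sqrt(b_yx * b_xy), b_yx)

# Direct formula for comparison
r_direct = numerator / sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

print(b_yx, b_xy, round(r_from_slopes, 3), round(r_direct, 3))
```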

Q. For the data given below, show that

r_xy = r_uv, where u = X - 69, v = Y - 112

        X     Y      XY      X^2     Y^2      U     V     U^2    V^2    UV
        78    125    9750    6084    15625    9     13    81     169    117
        89    137    12193   7921    18769    20    25    400    625    500
        97    156    15132   9409    24336    28    44    784    1936   1232
        69    112    7728    4761    12544    0     0     0      0      0
        59    107    6313    3481    11449    -10   -5    100    25     50
        79    136    10744   6241    18496    10    24    100    576    240
        68    123    8364    4624    15129    -1    11    1      121    -11
        61    108    6588    3721    11664    -8    -4    64     16     32
TOTAL   600   1004   76812   46242   128012   48    108   1530   3468   2160

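$$r_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}, \qquad r_{uv} = \frac{n\sum uv - (\sum u)(\sum v)}{\sqrt{[n\sum u^2 - (\sum u)^2][n\sum v^2 - (\sum v)^2]}}$$

$$r_{xy} = \frac{8(76812) - (600)(1004)}{\sqrt{[8(46242) - (600)^2][8(128012) - (1004)^2]}} = \frac{12096}{\sqrt{(9936)(16080)}}$$

$$r_{uv} = \frac{8(2160) - (48)(108)}{\sqrt{[8(1530) - (48)^2][8(3468) - (108)^2]}} = \frac{12096}{\sqrt{(9936)(16080)}}$$

r_xy = 0.96 and r_uv = 0.96, so r_xy = r_uv.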
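
The independence of origin can also be checked numerically. The sketch below is only an illustration in Python (the helper name correlation is an assumption); it shifts X by 69 and Y by 112 and compares the two coefficients.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation coefficient r from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(p * q for p, q in zip(xs, ys))
    sum_x2 = sum(p * p for p in xs)
    sum_y2 = sum(q * q for q in ys)
    return (n * sum_xy - sum_x * sum_y) / sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )


X = [78, 89, 97, 69, 59, 79, 68, 61]
Y = [125, 137, 156, 112, 107, 136, 123, 108]
U = [x - 69 for x in X]   # change of origin for X
V = [y - 112 for y in Y]  # change of origin for Y

r_xy = correlation(X, Y)
r_uv = correlation(U, V)
print(round(r_xy, 2), round(r_uv, 2))
```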

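Q. Find the rank correlation.

Here X and Y are the ranks of a and b (rank 1 is given to the largest value).

Data            Ranks
a       b       X     Y     d = X - Y    d^2
7.4     8.5     3     2     1            1
9.0     6.1     2     4     -2           4
11.0    2.4     1     6     -5           25
2.5     6.7     6     3     3            9
4.6     12.6    5     1     4            16
6.5     3.3     4     5     -1           1
                                         Σd^2 = 56

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

$$r_s = 1 - \frac{6(56)}{6(6^2 - 1)}$$

$$r_s = 1 - \frac{336}{210}$$

r_s = -0.60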
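
The ranking and the formula can be sketched in Python as follows. The helper ranks_desc is a hypothetical name, and it assumes there are no tied values, as in this question.

```python
def ranks_desc(values):
    """Rank 1 = largest value. Assumes no two values are equal."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


a = [7.4, 9.0, 11.0, 2.5, 4.6, 6.5]
b = [8.5, 6.1, 2.4, 6.7, 12.6, 3.3]

rank_a = ranks_desc(a)
rank_b = ranks_desc(b)

# Sum of squared rank differences
sum_d2 = sum((p - q) ** 2 for p, q in zip(rank_a, rank_b))

n = len(a)
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(rank_a, rank_b, sum_d2, round(rs, 2))
```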
Rank Correlation for tied values
Q. Find the rank correlation.

Tied values share the average of the ranks they would otherwise occupy.

Data          Ranks
X      Y      a                  b                  d = a - b    d^2
50     90     (3+4)/2 = 3.5      8                  -4.5         20.25
43     95     5                  (6+7)/2 = 6.5      -1.5         2.25
50     112    3.5                4                  -0.5         0.25
40     120    (6+7+8)/3 = 7      (2+3)/2 = 2.5      4.5          20.25
60     95     1                  6.5                -5.5         30.25
40     170    7                  1                  6            36
40     100    7                  5                  2            4
55     120    2                  2.5                -0.5         0.25
                                                    Σd^2 = 113.5

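$$T = \frac{1}{12}\sum (t_i^3 - t_i)$$

where t_i is the number of observations tied in the i-th group of equal values.
Here X has two tied groups (50 twice, 40 three times) and Y has two tied groups
(120 twice, 95 twice).

$$T = \frac{1}{12}\left[(2^3 - 2) + (3^3 - 3) + (2^3 - 2) + (2^3 - 2)\right]$$

$$T = \frac{1}{12}[6 + 24 + 6 + 6]$$

T = 3.5

$$\text{Adj}\sum d_i^2 = \sum d_i^2 + T = 113.5 + 3.5 = 117$$

$$r_s = 1 - \frac{6\,\text{Adj}\sum d_i^2}{n(n^2 - 1)}$$

$$r_s = 1 - \frac{6(117)}{8(8^2 - 1)} = 1 - \frac{702}{504}$$

r_s = -0.39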
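
The tied-rank procedure can be sketched in Python as shown below. The helper names average_ranks and tie_correction are assumptions made for illustration. The sketch gives tied values the mean of their positions and adds T = (1/12)Σ(t^3 - t) to Σd^2 before applying the rank correlation formula.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 = largest value; tied values share the mean of their positions."""
    order = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        first = order.index(v) + 1            # first position occupied by v
        count = order.count(v)                # number of observations tied at v
        rank_of[v] = first + (count - 1) / 2  # mean of first .. first+count-1
    return [rank_of[v] for v in values]


def tie_correction(values):
    """(1/12) * sum(t^3 - t) over groups of equal values (t = 1 adds zero)."""
    return sum(t ** 3 - t for t in Counter(values).values()) / 12


x = [50, 43, 50, 40, 60, 40, 40, 55]
y = [90, 95, 112, 120, 95, 170, 100, 120]

rank_x = average_ranks(x)
rank_y = average_ranks(y)
sum_d2 = sum((p - q) ** 2 for p, q in zip(rank_x, rank_y))
T = tie_correction(x) + tie_correction(y)

n = len(x)
rs = 1 - 6 * (sum_d2 + T) / (n * (n ** 2 - 1))
print("sum d^2 =", sum_d2, "T =", T, "rs =", round(rs, 2))
```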
