Regression and correlation
you will lear bow 0
«interpret seater diagrams foe Bivariate data
» eatte the equations ofthe lest ques eesion ines cluster tn etimate alae
* alclate and interpret he valve a the roductemoment carlton coefiant
SCATTER DIAGRAMS
sppoce you wiht lavigne he elaoosip beeen ro ail SH Fr example
= the weigheat doe end ofa spring tx) and she length of the spring 9),
= the weight ee Spnt dng or am xanax) and ae msi aclewed (9)
Teepe cate ima Pench test) and che mark in a Gorman ts (92,
— a af ese fa ane (2) and the average ng af eat af the ple OD,
vss esas wo varalenarckuews ox bivariate dts, When pai of vlus st let
prance diagram ix produced: Hese are some exatpes
Dependent and independent variables
one ofthe variables bs bem cont called the independent or exglaainy vars
Foor ease ithen the dependent ar response variable:For example if you place weights of 10g. 20 5,25 gs 30.535 8 50, etcon the end af a
Spring and recon the length ofthe mig foreach igh the weight is conroled soi is the
independent variable, The length ofthe spring ithe dependent variable
REGRESSION FUNCTION
oving drawn a seater diagram, you can then lock: for a mathematica! lationship between,
the wariablen = (a), where the Action f known asthe regression fanetioa, ito be
eter.
LINEAR CORRELATION AND REGRESSION LINES:
‘Consider the simples rye of regression function, where y= (x) i saight line.
eke points on te scater diagram appest tlie nea 4 seight ine, called repression ine
"you Would say that dere Binearcoerelation between x andy. Flere are some examples
Ponkive linear conrelation Negative inca correlation No correlation
teas one Yemen dhcrese Nenlaioonin
an ‘sine mien sand 7
Conon serve and eate ate needed when inteepreting seater diagrams,
{© Mathematical, there may appear tobe a relationship, bu dis does oc imply tba there
isa relaonship in reali. You mighe find, For example, that over period of time in a
patticulr ety there has en an nereace inte namber of eobberes and ax increase in
{he numberof health food shops. I woul however be foolish to imply that dete i 8
relationship between these two vatiables.
‘+ The appearance ofa matheinaccal telasionsip doer ot lnply tha tere i causal
Fela. An increase in one variable doesnot novesarily case an increase, or desea,
In the ater variable
ic appeas from the scatter gram dt linear relationship ies sensible interpretation, ou
ray den atempt eo find » medal forthe elatoaship in she Form
ff regression line. x
I previous work you may have drawn line of best fc onthe
seater diagram ttpting to craw its that thee are ae many
points above the line ssbelow ior as many points othe lft of
te ine as to the sight of it The line should also go through che
point (2,9), themes ofthe wo ses of dara
ineEy
al“ris meahod, ow a raring bye er bape Theresa wae 67 of
Fane i cegreion fine known. as th emethod of as squares and ht lhatsted in tt
following example
CGonaier he sivaton in whic the mast of chennai slated othe fag, x REE
on fe chemeal reaction has been taking place, according tthe tbl:
Se a
i ee
Mas ye Fo ae
“Those sesults can be Hastrated on a seater diagsem.
From the seater diagram there seems co be x positive
Tinvar greelation between the mass and the ime
-Theline of best fit must pass through the means of bot
Tarot daa, ue. the pot (9). You show find, by
erealotion, shar thi the polmt (12, 15.8) It has been
plored on the diagram.
roan ea
Diagrams | and 2 show attempts at lrawing the ine of best ft
bib se otunm
ee ogee?
tn each ofthe arenmps the doit ine aes through (and here are three Points howe the
Tec ah wo poums below it. Ye neither of thee Lines corre:
Diingeam 3 shows the srue ine of bet fi
ehas equation 4
Lage tae *
"This equation has been calculated by using che method
of leat squares 8 secre
‘ego‘Note thar the times, x, ae chosen by the person holding the stopwatch, 30 xis the
{ndependeot variable. The values ofthe mass, 7, depend onthe results ofthe chemical process
at these imesy therefore y isthe dependent varlible, IF yeu were to repeat the experiment with
the sume values of, you Would almost certainly gees differen stuf values af yo fora
‘ned value of you could have several diferent values of yal inthe sane vertical neon the
seater diagram,
Least squares regression line of y on x
fo ind the satan ofthe lent uquaessgeesson fine oy on foe the chemical experimen
sata consider verti dans My Ps daw fom each point othe regesson
ine These distances wil he pose or epatine seeding 0 whether he pot are abot Ot
below the ine eo tend work with te pee of thee ves aed conser tel my
omftnes mete mg +72 sorta way af weting chs is, where
Em iamptm2smitm | tm? fori=12.34,5
Aline tha is the dats well rome dha makes Eo an sal a pone, be it is dean so
thet Eb ened.
“Thisline ill she least squares regeesion line of y on.
Consider our three artempes at drawing the Line of best Sit. The vertical distanoes have teen
shown and you can see that Eni is lest in diagram 3.
oe |
Useful formulae when calculating regression lines
Before lacking show wfnd ve equation ofthe regrenion tn oe eid the
formulae
For the dt
“he mean of hes dao # where» = 2
tance in ily wine it nn a ea de von th et 8
could wrie Usually, however, when working in te context of regtesion and coreation,
the varance of the x doin le welten‘Remember that there ase alertatve Formans of the variance:
Let
hy RE
Lyi
org EY “F
For the x and y datar
Fee slants fg comnts the andy does al he formes
=
ze
1 1
Legeeniy = orto bar- ay"
The equation of the regression line y on
‘You are probably imilise with the equation of raihs [sein
the Foes
gemnse
‘aherem isthe gradient andl the y-ntecert
“Ween writing the equation ofthe repression line,» sighty
Foner fo usually ssed i which the ostane xm it
seer fore the terme and the fiers used are @ abd.
‘The Formats
yeatbs
where b isthe praliens and a the intsecept“Ta define the rgreason Line, you need to find the value of and B or «particular set of daca
“This is cone as follows:
For the regression line yon x waitten in the form
yravbe
the gradient, #, can he calculated as follows:
et
‘Note that b is known asthe eegression coeficent of y on x.
“Vo find a, wae the fact dha) is om she line,
Weyaasbe then j=a8R
Reacrging = - BE
“To find the equation of the regression line yon x forthe chernical experiment data
= z a
5 4 2 ”
2 n ° a“
2 4 m4 6
6 a 286 386
2 ™ 00 480
Pee Baye t16
“There are five pits of data, so = S
sae ter apa bttne-t2e oars
vareertarns
atta:
aoa = BR= 15.81.2207 12 ASO ou LAS AR pd
So the equation of the segreston line yom ey ~ 1.1 11.22%.Making predictions using the regression line y on x
"The regression line y-on x gives you the aberage value of y fora giren salut of so in certain
circumstances ican be sed to peedice or estimate missing vals This is awn as
Interpolatng from the given information
“The regression line y on xis wed
‘© when i the independent variable and you want to estimate y fora piven value of x, 0¢
‘You want to estimate x for 2 given value of y
‘9 when nceher variable fs cantolled snd you wane to estimate for a given value of
Far the chemical reaction data, in which x the independent variable, you ean ase the
rogtestion bine y— 11S + 1.220 co estimate (a) y when 3 10, (b) = when y = 20,5 follows
(a) The estimate of y when x © 10, written Js given by
JrMS4 1.22104 13.38
(b) The estimate forx when y~20, writen is wiven by
w= 11se122¢
Lang= 188s
154
‘Warning: you mus ake carey though, as estimating outside the range yor data
Lntliable, Por example, forthe chetal reaction dit, when the reactants have formed thie
rode, the reaction ceaves and the mass would not cntnue to inereaye, Going outside the
ange of davai know as everapolating From the given information,
[portant note Inthe situation here nether variable x controlled and you want to estimate
fora given value ofy, yo voll we «different segrerson ine, the Feast squares line x on 9.
"Yow sol lea use the regression line eon yy is the independent variable.
The equation of the regression line x on y
ste equation ofthe regrenion Line x on. fen wen in she Fors
wectdy whee eS dv
and =
‘Alan, since the ke goes through (9, the equation can be writes
soxediyos
dis known as the repression coefficient of = 00 7Example
"The fllawing data repre che length (4 and bends (9 of 12 uckoos etme
ilmetees.