
Understanding Correlation and Regression

Correlation and regression are statistical methods used to determine the relationship between variables. Correlation determines whether a relationship exists, while regression describes the nature of the relationship as positive or negative, linear or nonlinear. A scatter plot graphs the independent and dependent variables. The correlation coefficient measures the strength and direction of the linear relationship between two variables. Regression finds the equation of the line of best fit to the data using the least squares method.


Correlation and Regression

Libeeth B. Guevarra
Department of Mathematics and Natural Sciences

August 31, 2018

Data Management 1

Correlation is a statistical method used to determine whether a relationship between variables exists.

Regression is a statistical method used to describe the nature of the relationship between variables, that is, positive or negative, linear or nonlinear.

A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable x and the dependent variable y.

Example
Construct a scatter plot for the data shown for car rental companies in City A for a recent year.

Company   Cars (in ten thousands)   Revenue (in billions)
A         63.0                      7.0
B         29.0                      3.9
C         20.8                      2.1
D         19.1                      2.8
E         13.4                      1.4
F          8.5                      1.5

The correlation coefficient measures the strength and direction of a linear relationship between two variables.

The range of the correlation coefficient is from −1 to +1.

Formula for the Correlation Coefficient r

\[
r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}
\]

where n is the number of data pairs.

Example
Compute the correlation coefficient for the data:

Company   Cars (in ten thousands)   Revenue (in billions)
A         63.0                      7.0
B         29.0                      3.9
C         20.8                      2.1
D         19.1                      2.8
E         13.4                      1.4
F          8.5                      1.5
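
One way to carry out this computation is sketched below in Python. It applies the raw-sums formula for r to the six data pairs in the table. The function name `correlation_coefficient` and the variable names are illustrative choices, not part of the original slides.

```python
from math import sqrt

# Data from the example: cars (in ten thousands) and revenue (in billions).
cars = [63.0, 29.0, 20.8, 19.1, 13.4, 8.5]
revenue = [7.0, 3.9, 2.1, 2.8, 1.4, 1.5]


def correlation_coefficient(x, y):
    """Return r computed from the raw-sums formula (hypothetical helper)."""
    n = len(x)                                   # number of data pairs
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = correlation_coefficient(cars, revenue)
print(f"r = {r:.3f}")
```

For these data the sums are Σx = 153.8, Σy = 18.7, Σxy = 682.77, Σx² = 5859.26 and Σy² = 80.67, which gives r of about 0.982, a strong positive linear relationship.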

If the value of the correlation coefficient is significant, the next step is to determine the equation of the regression line, which is the data's line of best fit.

This enables the researcher to see the trend and make predictions on the basis of the data.

The equation of the least-squares line for the ordered pairs (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ) is the line

y − ȳ = m(x − x̄)

y − ȳ = m(x − x̄)
where:
x̄ = mean of variable x
ȳ = mean of variable y
m = slope of the line

\[
m = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n(\bar{x})^2}
\]

Example
Find the equation of the regression line for the data:

Company   Cars (in ten thousands)   Revenue (in billions)
A         63.0                      7.0
B         29.0                      3.9
C         20.8                      2.1
D         19.1                      2.8
E         13.4                      1.4
F          8.5                      1.5
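
A minimal Python sketch of this example follows. It uses the point-slope form y − ȳ = m(x − x̄) with the slope formula for m. The names `least_squares_line` and `predict` are hypothetical helpers introduced here for illustration.

```python
# Data from the example: cars (in ten thousands) and revenue (in billions).
cars = [63.0, 29.0, 20.8, 19.1, 13.4, 8.5]
revenue = [7.0, 3.9, 2.1, 2.8, 1.4, 1.5]


def least_squares_line(x, y):
    """Return (m, x_bar, y_bar) for the line y - y_bar = m(x - x_bar)."""
    n = len(x)
    x_bar = sum(x) / n                           # mean of variable x
    y_bar = sum(y) / n                           # mean of variable y
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    m = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    return m, x_bar, y_bar


m, x_bar, y_bar = least_squares_line(cars, revenue)
intercept = y_bar - m * x_bar                    # value of y when x = 0


def predict(x_value):
    """Predicted y on the fitted line for a given x."""
    return y_bar + m * (x_value - x_bar)


print(f"y - {y_bar:.3f} = {m:.3f}(x - {x_bar:.3f})")
print(f"y = {intercept:.3f} + {m:.3f}x")
```

With these data the slope is about 0.106 and the intercept about 0.396, so the fitted line is approximately y = 0.396 + 0.106x. Calling `predict(x_value)` evaluates that line; predictions are only meaningful for x values within the range of the data.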

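Another formula for the Regression line

\[
y = a + bx
\]
\[
a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}
\qquad
b = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}
\]

where a is the y intercept and b is the slope of the line.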
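
The following short derivation, added here as a check and not taken from the slides, sketches why this form describes the same line as y − ȳ = m(x − x̄). It uses only the definitions x̄ = (Σx)/n and ȳ = (Σy)/n.

```latex
% Substitute \sum x = n\bar{x} and \sum y = n\bar{y} into b, then divide
% numerator and denominator by n:
\[
b = \frac{n\sum xy - (n\bar{x})(n\bar{y})}{n\sum x^2 - (n\bar{x})^2}
  = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n(\bar{x})^2}
  = m
\]
% Rearranging y - \bar{y} = m(x - \bar{x}) into slope-intercept form:
\[
y = (\bar{y} - m\bar{x}) + mx
\quad\Longrightarrow\quad
a = \bar{y} - b\bar{x}
\]
```

For the car rental data, ȳ − b x̄ ≈ 3.117 − (0.106)(25.633) ≈ 0.396, which agrees with the value of a obtained from the sums directly.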

The coefficient of determination is a measure of the variation of the dependent variable that is explained by the regression line and the independent variable. The symbol for the coefficient of determination is r². If r = 0.90, then r² = 0.81, which is equivalent to 81%. This result means that 81% of the variation in the dependent variable is accounted for by the variations in the independent variable. The rest of the variation, 0.19, or 19%, is unexplained.
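
To make the "explained variation" reading concrete, the sketch below computes r² for the car rental data as 1 minus the ratio of unexplained variation to total variation about the fitted line. This is a standard identity for a least-squares line with an intercept, assumed here and not stated in the slides. The function name `coefficient_of_determination` is illustrative.

```python
# Data from the earlier examples: cars (in ten thousands), revenue (in billions).
cars = [63.0, 29.0, 20.8, 19.1, 13.4, 8.5]
revenue = [7.0, 3.9, 2.1, 2.8, 1.4, 1.5]


def coefficient_of_determination(x, y):
    """Return r^2 as 1 - (unexplained variation / total variation)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    # Least-squares slope and intercept, using the slope formula for m.
    slope = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    intercept = y_bar - slope * x_bar
    predicted = [intercept + slope * a for a in x]
    total = sum((b - y_bar) ** 2 for b in y)                    # total variation
    unexplained = sum((b - p) ** 2 for b, p in zip(y, predicted))
    return 1 - unexplained / total


r_squared = coefficient_of_determination(cars, revenue)
print(f"r^2 = {r_squared:.3f}")
print(f"unexplained share = {1 - r_squared:.3f}")

# The illustration from the text: r = 0.90 gives r^2 = 0.81.
example_r = 0.90
example_r_squared = example_r ** 2
print(f"example r^2 = {example_r_squared:.2f}")
```

For the car rental data this gives r² of about 0.964, so roughly 96% of the variation in revenue is accounted for by the number of cars and about 4% is unexplained.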
