Unit 2 Notes
Examining Relationships
In statistics, we often encounter problems that
involve more than one variable. Sometimes we want to
compare two (or more) different populations with
respect to the same variable. At other times, we want to
examine the relationship between two different variables
measured on the same individuals.
Explanatory and Response Variables
In this second case, one of the variables is an
explanatory variable (which we denote by X) and
the other is a response variable (denoted by Y).
Example
Are students who excel in English also good at
mathematics, or are most people strictly left- or
right-brained? A psychology professor at a university
locates 450 students who have taken the same
introductory English course and the same Math
course and compares their percentage grades in the
two courses at the end of the semester.
Example
Consider the relationship between a country’s fertility
rate (average number of children per adult female) and
life expectancy (average lifespan for its citizens). The
table below gives the values of both variables for a
sample of eight countries.
Example
Country       Fertility Rate   Life Expectancy
Bangladesh         3.2              62.5
Brazil             1.9              71.9
Colombia           2.5              72.3
Haiti              4.9              53.2
Italy              1.3              79.8
Lithuania          1.2              74.4
Pakistan           4.0              63.4
Rwanda             5.4              47.3
Example
The scatterplot for these data is shown below:
[Scatterplot: Life Expectancy vs. Fertility Rate for the eight countries]
Examining Relationships
We look for four things when examining a scatterplot:
1) Direction
§ In this case, there is a negative association
between the two variables. An above-average
value of Fertility Rate tends to be accompanied
by a below-average value of Life Expectancy,
and vice-versa. If the pattern of points slopes
upward from left to right, we say there is a
positive association.
Examining Relationships
2) Form
§ A straight line would do a fairly good job
approximating the relationship between the two
variables. It is therefore not unreasonable to
assume that these two variables share a linear
relationship.
Examining Relationships
3) Strength
§ The strength of the relationship is determined by
how close the points lie to a simple form such as
a straight line. In our example, if we draw a line
which roughly approximates the relationship
between the two variables, all points will fall
quite close to the line. As such, the linear
relationship is quite strong.
Examining Relationships
3) Strength (cont’d)
§ Not all relationships are linear in form. They
can be quadratic, logarithmic or exponential, to
name a few. Sometimes the points appear to be
“randomly scattered”, in which case many of
them will fall far from a line used to approximate
the relationship. In this case, we say the linear
relationship between the two variables is weak.
Examining Relationships
4) Outliers
§ There are several types of outliers for bivariate
data. An observation may be outlying in either
the x- or y-directions (or both). Another type of
outlier occurs when an observation simply falls
outside the general pattern of points, even if it is
extreme in neither the x- nor y-directions. Some
types of outliers have more of an impact on our
analysis than others, as we will discuss shortly.
Strength of Linear Relationship
The STAT 1000 and STAT 2000 percentage grades for
a sample of students who have taken both courses are
displayed in the scatterplot below:
[Scatterplot: STAT 2000 grade vs. STAT 1000 grade]
Strength of Linear Relationship
The scatterplot shows a moderately strong positive
linear relationship. Does the relationship for the data
in the following scatterplot appear stronger?
[Scatterplot: the same STAT 2000 vs. STAT 1000 data, plotted on axes running from 0 to 140]
Strength of Linear Relationship
It might, but these are the same data; the scatterplots
are just constructed with different scales!
[Side-by-side scatterplots: the same STAT 1000/STAT 2000 data on the two different scales]
Strength of Linear Relationship
This example shows that our eyes are not the best
tools for assessing the strength of the relationship
between two variables.
Correlation
The correlation r measures the direction and strength
of a linear relationship between two quantitative
variables.
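The defining formula (consistent with the computation worked out below) averages the products of the standardized values of the two variables:

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$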
Correlation
For the Fertility Rate and Life Expectancy example,
(i) $\bar{x} = 3.05$, $\bar{y} = 65.6$, $s_x = 1.60$, $s_y = 11.13$

(ii) the deviations from the means, (iii) their products, and (iv) the sum of those products:

x_i     y_i     x_i − x̄    y_i − ȳ    (x_i − x̄)(y_i − ȳ)
3.2    62.5      0.15       −3.1         −0.465
1.9    71.9     −1.15        6.3         −7.245
2.5    72.3     −0.55        6.7         −3.685
4.9    53.2      1.85      −12.4        −22.940
1.3    79.8     −1.75       14.2        −24.850
1.2    74.4     −1.85        8.8        −16.280
4.0    63.4      0.95       −2.2         −2.090
5.4    47.3      2.35      −18.3        −43.005
               sum = 0     sum = 0   (iv) sum = −120.56
Correlation
(v) Finally,

$$ r = \frac{1}{(n-1)\,s_x s_y} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \frac{-120.56}{7(1.60)(11.13)} = -0.9671 $$
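As a quick check, here is a minimal Python sketch (NumPy only, with the table's data hard-coded) that reproduces this correlation; the small discrepancy from −0.9671 arises because the slide rounds $s_x$ and $s_y$ to two decimals:

```python
import numpy as np

# Fertility rates (x) and life expectancies (y) for the eight countries
x = np.array([3.2, 1.9, 2.5, 4.9, 1.3, 1.2, 4.0, 5.4])
y = np.array([62.5, 71.9, 72.3, 53.2, 79.8, 74.4, 63.4, 47.3])

# r from the defining formula, using unrounded sample standard deviations
n = len(x)
r = np.sum((x - x.mean()) * (y - y.mean())) / ((n - 1) * x.std(ddof=1) * y.std(ddof=1))

print(round(r, 4))                        # -0.9657
print(round(np.corrcoef(x, y)[0, 1], 4))  # -0.9657 (same value via the built-in)
```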
Association vs. Causation
We must be careful when interpreting correlation.
Despite the very strong negative correlation, we
cannot conclude that having more children causes
a shorter life expectancy.
Association vs. Causation
Regardless of the existence of identifiable lurking
variables, we must remember that correlation
measures only the linear association between two
quantitative variables. It gives us no information
about the causal nature of the relationship.
Correlation
Some properties of correlation (cont’d):
§ r has no units.
§ The correlation makes no distinction between X
and Y. As such, an explanatory and response
variable are not necessary.
§ Changing the units of X and Y has no effect on the
correlation. i.e., It doesn’t matter if we measure a
variable in pounds or kilograms, feet or meters,
dollars or cents, etc.
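To illustrate the unit-invariance property, here is a minimal sketch with hypothetical heights and weights (the numbers are made up purely for illustration), showing that converting pounds to kilograms leaves r unchanged:

```python
import numpy as np

# Hypothetical heights (inches) and weights (pounds), made up for illustration
height_in = np.array([63.0, 66.0, 68.0, 70.0, 72.0, 75.0])
weight_lb = np.array([120.0, 145.0, 150.0, 165.0, 180.0, 210.0])
weight_kg = weight_lb * 0.453592   # change of units: pounds to kilograms

r_lb = np.corrcoef(height_in, weight_lb)[0, 1]
r_kg = np.corrcoef(height_in, weight_kg)[0, 1]
print(np.isclose(r_lb, r_kg))      # True: rescaling a variable does not change r
```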
Correlation
Some properties of correlation (cont’d):
§ r measures only the strength of a linear
relationship. In other cases, it is a useless measure.
§ Because the correlation is a function of several
measures that are affected by outliers, r is itself
strongly affected by outliers.
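A minimal sketch of the outlier property, again with made-up numbers: a single point far from the pattern can change r dramatically.

```python
import numpy as np

# Ten points lying almost exactly on a line (made up for illustration)
x = np.arange(1.0, 11.0)
y = 2.0 * x + np.array([0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.1, 0.0, -0.1])
print(round(np.corrcoef(x, y)[0, 1], 3))    # ~1.0

# One point far outside the pattern drags r down substantially
x2 = np.append(x, 10.0)
y2 = np.append(y, 2.0)
print(round(np.corrcoef(x2, y2)[0, 1], 3))  # ~0.63
```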
Regression
When a relationship appears to be linear in nature,
we often wish to estimate this relationship between
variables with a single straight line.
Regression
Given a value of X, we would like to predict the
corresponding value of Y. Unless there is a perfect
relationship, we won’t know the exact value of Y,
because Y is a variable.
Regression
We will use a sample to estimate the true relationship
between the two variables. Our estimate of the true
line is
$$ \hat{y} = b_0 + b_1 x $$
Regression
We would like to find the line that fits our data the
best. That is, we need to find the appropriate values
of b0 and b1.
We choose the line that minimizes the sum of squared vertical deviations of the observed values $y_i$ from the predicted values $\hat{y}_i$:

$$ \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

[Scatterplot: a fitted line with the vertical deviation between an observed $y_i$ and its predicted $\hat{y}_i$ marked]
Least Squares Regression
The values of b0 and b1 that give us the line that
minimizes this sum of squared deviations are:
$$ b_1 = r\,\frac{s_y}{s_x} \qquad \text{and} \qquad b_0 = \bar{y} - b_1 \bar{x} $$

[Scatterplot: the fitted line, with the slope shown as the rise over a run of $\Delta x$]
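A minimal sketch of these two formulas in Python; `least_squares_line` is a hypothetical helper name, and the usage line plugs in the fertility/life expectancy summaries from the correlation example:

```python
def least_squares_line(x_bar, y_bar, s_x, s_y, r):
    """Slope and intercept of the least squares line from summary statistics."""
    b1 = r * s_y / s_x        # slope: r rescaled by the ratio of the sds
    b0 = y_bar - b1 * x_bar   # intercept: forces the line through (x_bar, y_bar)
    return b0, b1

# Fertility rate / life expectancy summaries from the correlation example:
b0, b1 = least_squares_line(3.05, 65.6, 1.60, 11.13, -0.9671)
print(round(b1, 2), round(b0, 2))   # -6.73 86.12
```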
Least Squares Regression
The intercept of the regression line, b0 , is defined as
the predicted value of y when x = 0.
[Scatterplot: the fitted line crossing the y-axis at height $b_0$]
Least Squares Regression
Some variability in Y is accounted for by the fact that,
as X changes, it pulls Y along with it. The remaining
variation is accounted for by other factors (which we
usually don’t know).
Least Squares Regression
If r = –1 or 1, then r2 = 1. That is, we can predict Y
exactly for any value of X, as regression on X
accounts for all of the variation in Y.
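For example, with the fertility and life expectancy data,

$$ r^2 = (-0.9671)^2 = 0.9353, $$

so about 93.5% of the variation in Life Expectancy is accounted for by the regression on Fertility Rate.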
Example
The mercury concentration of fish in a river is of interest.
A sample of fish will be taken from the river and a regression
will be run to predict, from their lengths, the mercury
concentration of fish that will be caught in the future.
This must be done because a fish cannot be sold or eaten
after it has been tested for mercury.
Example
A sample of ten fish is collected and the lengths X (in inches)
and mercury concentrations Y are as follows:
X 5.5 6.1 6.7 7.0 7.5 7.9 8.6 9.2 9.8 10.3
Y 0.11 0.19 0.24 0.37 0.36 0.49 0.59 0.60 0.81 0.78
Example
The scatterplot for these data is shown below:
[Scatterplot: Concentration (ppm) vs. Length (inches) for the ten fish]
Example
We see a strong positive linear relationship between
Length and Concentration. From the data, we
calculate
$\bar{x} = 7.860$, $\bar{y} = 0.454$, $s_x = 1.597$, $s_y = 0.241$, $r = 0.985$
And so
$$ b_1 = r\,\frac{s_y}{s_x} = 0.985 \left( \frac{0.241}{1.597} \right) = 0.149 $$

and

$$ b_0 = \bar{y} - b_1 \bar{x} = 0.454 - 0.149(7.860) = -0.717 $$
[Scatterplot: Concentration vs. Length with the least squares line overlaid]
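As a check on the arithmetic, here is a minimal Python sketch that fits the line from the raw data (NumPy only); the slides' intercept of −0.717 comes from rounding $b_1$ to 0.149 before computing $b_0$:

```python
import numpy as np

# Lengths (inches) and mercury concentrations (ppm) for the ten fish
length = np.array([5.5, 6.1, 6.7, 7.0, 7.5, 7.9, 8.6, 9.2, 9.8, 10.3])
conc = np.array([0.11, 0.19, 0.24, 0.37, 0.36, 0.49, 0.59, 0.60, 0.81, 0.78])

r = np.corrcoef(length, conc)[0, 1]
b1 = r * conc.std(ddof=1) / length.std(ddof=1)   # slope
b0 = conc.mean() - b1 * length.mean()            # intercept
print(round(r, 3), round(b1, 3), round(b0, 3))   # 0.985 0.149 -0.716

# The same line from NumPy's built-in degree-1 polynomial fit
print(np.polyfit(length, conc, 1))               # [slope, intercept]
```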
Example
The slope b1 = 0.149 tells us that, as the length of a
fish increases by one inch, we predict the mercury
concentration to increase by 0.149 ppm.
The intercept b0 = – 0.717 is statistically meaningless
in this case. A fish cannot have a length of 0 inches,
and a negative concentration is impossible.
Example
Suppose we want to predict the mercury concentration of a fish
that is 7 inches long. Our prediction is

$$ \hat{y} = -0.717 + 0.149(7) = 0.326 \text{ ppm.} $$

We call this the predicted value of Y when X = 7.
[Scatterplot: the fitted line with the predicted concentration at X = 7 marked]
Residuals
Note that there is a fish in the sample that is 7.0 inches
long. How does the actual mercury concentration for
this fish compare with the predicted concentration?
$$ y_4 - \hat{y}_4 = 0.37 - 0.326 = 0.044 \text{ ppm} $$
Residuals
residual = actual value of y – predicted value of y
[Scatterplot: a point's actual concentration above the fitted line and its predicted concentration on the line]
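A minimal sketch computing all ten residuals from the fitted line (the arrays repeat the fish data from the earlier snippet):

```python
import numpy as np

length = np.array([5.5, 6.1, 6.7, 7.0, 7.5, 7.9, 8.6, 9.2, 9.8, 10.3])
conc = np.array([0.11, 0.19, 0.24, 0.37, 0.36, 0.49, 0.59, 0.60, 0.81, 0.78])

predicted = -0.717 + 0.149 * length   # fitted values from the line above
residuals = conc - predicted          # residual = actual - predicted

print(round(residuals[3], 3))    # 0.044   (the 7.0-inch fish, above the line)
print(round(residuals[7], 4))    # -0.0538 (the 9.2-inch fish, below the line)
```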
Residuals
A positive residual indicates that an observation falls
above the regression line and a negative residual indicates
that it falls below the line. As an example, check that
the residual for the 9.2 inch fish in the sample is equal
to – 0.0538.
Note that it was in fact the sum of squared residuals that is
minimized in calculating the least squares regression line.
What if we want to predict the mercury concentration for a
fish that is 12 inches long? Our prediction is
Extrapolation
Mathematically, there is no problem with making this
prediction. However, there is a statistical problem.
Our range of values for X is from 5.5 to 10.3 inches.
We have good evidence of a linear relationship within
this range of values. However, we have no fish in our
sample as long as 12 inches, and so we have no idea
whether this relationship continues to hold outside our
range of data.
The process of predicting a value of Y for a value of X
outside our range of data is known as extrapolation, and
should be avoided if at all possible.
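A minimal sketch of a prediction helper that flags extrapolation; `predict_concentration` is a hypothetical name, and the interval endpoints are simply our sample's minimum and maximum lengths:

```python
def predict_concentration(length_in, x_min=5.5, x_max=10.3):
    """Predict mercury concentration (ppm) and flag extrapolation."""
    if not (x_min <= length_in <= x_max):
        print(f"Warning: {length_in} in. is outside the observed range "
              f"[{x_min}, {x_max}] -- this prediction is an extrapolation.")
    return -0.717 + 0.149 * length_in

print(round(predict_concentration(7), 3))    # 0.326
print(round(predict_concentration(12), 3))   # warning, then 1.071
```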
Outliers
We have seen that an outlier can be defined as a point
that is far from the other data points in the x-direction
or the y-direction, or if it falls outside the general
pattern of points.
Outliers
Point # 1 is an outlier in the y-direction. It generally
has little effect on the regression line.
[Scatterplot: point #1 lies far above the other points in the y-direction]
Outliers
Point # 2 is not an outlier in either the x- or
y-directions, but falls outside the pattern of points.
[Scatterplot: point #2 falls below the pattern of points, though it is extreme in neither the x- nor y-direction]
Outliers
A bivariate outlier such as this generally has little
effect on the regression line.
[Scatterplot: the same data with the regression line drawn, showing little change caused by point #2]
Outliers
Point # 3 is an outlier in the x-direction. It has a
strong effect on the regression line.
[Scatterplot: point #3 lies far to the right in the x-direction and pulls the regression line toward it]
Influential Observations
An observation is called influential if removing it
from the data set would dramatically alter the position
of the regression line (and the value of r2).
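A minimal sketch (the points are made up purely for illustration) showing how removing one x-outlier changes the fitted slope, which is the hallmark of an influential observation:

```python
import numpy as np

# Nine made-up points near the line y = x, plus one x-outlier off the pattern
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 20.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 6.9, 8.2, 8.8, 5.0])

slope_all, _ = np.polyfit(x, y, 1)             # fit with the outlier included
slope_trim, _ = np.polyfit(x[:-1], y[:-1], 1)  # fit with the outlier removed
print(round(slope_all, 2), round(slope_trim, 2))   # ~0.23 vs ~0.99
```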
Outliers
We see that, with the outlier included, the regression
line is a less accurate description of the relationship.
[Scatterplot: Concentration vs. Length with an outlier included, comparing the LSR lines fitted with and without the outlier]
Least Squares Regression
One property of the least squares regression line is
that it always passes through the point $(\bar{x}, \bar{y})$.
Consider our previous example for the regression of
Concentration vs. Length of a fish. The mean length
of the fish in the sample was 7.860 inches. The
predicted value for a fish of this length is
$$ \hat{y} = -0.717 + 0.149(7.860) = 0.454 = \bar{y} $$
Association vs. Causation
Recall our discussion of association vs. causation.
The former does not imply the latter. In the fish
example, there was a strong positive relationship
between the length of a fish and its mercury
concentration. However, this doesn’t mean that
getting longer causes a fish’s mercury concentration
to increase. In fact, we know in this case that it is the
age of a fish that causes its concentration to increase,
as it has been exposed to the mercury for a longer
period of time. As such, the age of a fish is a lurking
variable.
Experiment vs. Observational Study
The best way to avoid lurking variables is to perform
an experiment rather than an observational study.
Experiment vs. Observational Study
If we simply ask people how much they smoke (say,
in number of cigarettes per week) and measure their
weight, we might see a positive correlation between
the two variables. But this does not imply that
smoking causes weight gain. (In fact, many people
believe smoking causes weight loss!)
Experiment vs. Observational Study
The reason is that, in a randomized experiment, the groups
being compared are similar with respect to all possible
lurking variables, so an observed association can be
attributed to the explanatory variable itself.
Categorical Variables on a Scatterplot
Sometimes a scatterplot may actually be displaying
two or more distinct relationships.
For example, the Average Driving Distance X and
the Average Score Y are recorded for a sample of
professional golfers. (A “drive” is a golfer’s first
shot on a golf hole.)
Categorical Variables on a Scatterplot
The data are plotted on the scatterplot below. The
relationship does not appear to be linear, but…

[Scatterplot: Average Score vs. Average Driving Distance for the sample of golfers]
Categorical Variables on a Scatterplot
This scatterplot is actually displaying two distinct
linear relationships, one for male golfers and one for
female golfers.
[Scatterplot: the same data with male and female golfers distinguished, each group showing its own linear trend]
Categorical Variables on a Scatterplot
This example illustrates that we should be careful
when examining a relationship to make sure that the
data belong to only one population.