Bivariate Data Analysis

Uploaded by

Emilia Fichter
Copyright: © All Rights Reserved

Bivariate Data: data ordered in pairs (x, y), each representing values of two variables for an individual.

Scatterplots: visual representation of bivariate data, where each point represents an ordered pair.

Correlation Coefficient (r):
Measures the strength and direction of a linear relationship between two variables. Key properties:

r is not affected by the units of measurement or by which variable is labeled x and which is labeled y.
r only measures linear relationships and is sensitive to outliers.

[Figure: scatterplots showing positive linear, strong linear, and negative linear association, illustrating direction, form, and strength. Source: datavizcatalogue.com]
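The key properties of r can be checked numerically. The following is a minimal sketch in plain Python, not a reference implementation: the function name `pearson_r` and the data (hours studied and test scores) are hypothetical and chosen only for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 64, 70, 71]

r = pearson_r(hours, score)

# Swapping which variable is called x and which is called y leaves r unchanged.
r_swapped = pearson_r(score, hours)

# Changing the units of x (hours to minutes) leaves r unchanged.
r_minutes = pearson_r([60 * h for h in hours], score)

# Adding a single extreme point to the same data changes r drastically.
r_outlier = pearson_r(hours + [7], score + [20])

print(r, r_swapped, r_minutes, r_outlier)
```

With this made-up data, r is close to 1, while one added outlier is enough to make the correlation negative, which is why a scatterplot should always be checked alongside r.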

Correlation vs. Causation:
A strong correlation does not imply that one variable causes changes in the other. There may be confounding factors or coincidences...

Module 4: Summarizing Bivariate Data

The Least-Squares Regression Line

In a two-variable relationship, the least-squares regression line summarizes the data with a straight line that best fits the points by minimizing the sum of squared vertical distances (residuals) between the observed values and the line.

Slope: the predicted change in y for a one-unit increase in x.

Intercept: the predicted value of y when x = 0 (only meaningful if x = 0 is within the data range).

Using the Line for Prediction:
Substitute a value for x into the equation to predict y. The line always passes through the point of averages (x̄, ȳ). The regression line predicts the average outcome for a given x, not the effect of changing x for a single individual.

Extrapolation:
Predictions should only be made for x-values within the range of the data used to fit the line. Predicting for values outside this range (extrapolation) is unreliable because the linear relationship may not hold.

Outlier: A point far from the general data pattern.

Influential point: An outlier that substantially changes the regression line if included or excluded. The regression line should be computed both with and without such points to assess their impact.

Residual Plots:
A residual plot graphs residuals versus x. If the plot shows no pattern, a linear model is appropriate. If a pattern (such as curvature) is present, the relationship is not linear and a straight line is not appropriate.

Coefficient of Determination:
r² is the square of the correlation coefficient r. It represents the proportion of the variance in the outcome variable explained by the regression line.
If close to 1: most of the variation is explained by the model.
If close to 0: little is explained by the model.
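The regression ideas above (slope, intercept, point of averages, residuals, influential points, and the coefficient of determination) can be tied together in a short sketch. It assumes the usual textbook formulas, slope = Sxy / Sxx and intercept = mean(y) - slope × mean(x). The function names and the data are hypothetical and serve only as an illustration.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # This choice of intercept makes the line pass through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(xs, ys, intercept, slope):
    """Proportion of the variance in y explained by the line: 1 - SSE / SST."""
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

# Hypothetical data: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 64, 70, 71]

intercept, slope = least_squares(hours, score)

# Prediction inside the data range; x = 3.5 is the mean of hours,
# so the prediction equals the mean score (point of averages).
predicted_at_mean = intercept + slope * 3.5

# Residuals: observed y minus predicted y. They sum to (numerically) zero.
residuals = [y - (intercept + slope * x) for x, y in zip(hours, score)]

fit_quality = r_squared(hours, score, intercept, slope)

# Refit with one hypothetical influential point far from the other x-values.
intercept_inf, slope_inf = least_squares(hours + [20], score + [60])

print(slope, intercept, predicted_at_mean, fit_quality, slope_inf)
```

Plotting `residuals` against `hours` would give the residual plot described above. Comparing `slope` with `slope_inf` shows how a single influential point can flatten the fitted line, which is the reason for computing the line both with and without such points.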

Statistically speaking, your brain gets stronger with every calculation!
