Bivariate Data: data ordered in pairs
(x,y), each representing values of two
variables for an individual.
Scatterplots: visual representation of
bivariate data, where each point
represents an ordered pair.
Correlation Coefficient (r):
Measures the strength and direction
of a linear relationship between two
variables. Key properties:
r is not affected by the units of measurement or by which variable is labeled x and which is labeled y.
r only measures linear relationships and is sensitive to outliers.
[Figure: example scatterplots showing positive linear, strong linear, and negative linear association, annotated with the labels Direction, Form, and Strength. Image source: datavizcatalogue.com]
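
The properties of r described above can be checked numerically. The sketch below is only an illustration: the helper name pearson_r and the height/weight numbers are made up for this example and do not come from the module. It computes r from its usual definition, then repeats the calculation after a change of units, after swapping the two variables, and after adding one outlier.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: heights in inches, weights in pounds.
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [110, 120, 135, 150, 160, 175]
r = pearson_r(heights_in, weights_lb)

# Changing the units (inches to cm, pounds to kg) leaves r unchanged.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.4536 for w in weights_lb]
r_metric = pearson_r(heights_cm, weights_kg)

# Swapping which variable is called x and which is called y leaves r unchanged.
r_swapped = pearson_r(weights_lb, heights_in)

# A single point far from the pattern pulls r down sharply.
r_with_outlier = pearson_r(heights_in + [74], weights_lb + [100])

print(round(r, 3), round(r_metric, 3), round(r_swapped, 3), round(r_with_outlier, 3))
```

In this made-up data set r is close to 1 for the first three calculations and drops substantially once the outlier is added.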
Correlation vs. Causation:
A strong correlation does not imply that one variable causes changes in the other. There may be confounding factors, or the association may be a coincidence.
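
A small simulation can make the confounding idea concrete. This is a sketch on synthetic data, not an analysis from the module: the variable z plays the role of a hypothetical confounder that drives both x and y, while x and y never influence each other. The helper pearson_r and all numbers are assumptions chosen for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)  # fixed seed so the run is repeatable
n = 2000

# z is the confounder; x and y each depend on z plus independent noise.
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 0.5) for zi in z]
y = [zi + random.gauss(0, 0.5) for zi in z]

# x and y are strongly correlated even though neither causes the other.
r_xy = pearson_r(x, y)

# After subtracting the confounder's contribution, the association disappears.
x_rest = [xi - zi for xi, zi in zip(x, z)]
y_rest = [yi - zi for yi, zi in zip(y, z)]
r_rest = pearson_r(x_rest, y_rest)

print(round(r_xy, 2), round(r_rest, 2))
```

Under these assumptions r_xy comes out large while r_rest is near zero, which is the pattern a confounding factor produces.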
Module 4: Summarizing Bivariate Data

The Least-Squares Regression Line
In a two-variable relationship, the least-squares regression line summarizes the data with a straight line that best fits the points by minimizing the sum of squared vertical distances (residuals) between the observed values and the line.
Slope: the predicted change in y for a one-unit increase in x.
Intercept: the predicted value of y when x=0 (only meaningful if x=0 is within the data range).
Using the Line for Prediction:
Substitute a value for x into the equation to predict y. The line always passes through the point of averages (the mean of x, the mean of y). The regression line predicts the average outcome for a given x, not the effect of changing x for a single individual.
Extrapolation:
Predictions should only be made for x-values within the range of the data used to fit the line. Predicting for values outside this range (extrapolation) is unreliable because the linear relationship may not hold.
Outlier: A point far from the general data pattern.
Influential point: An outlier that substantially changes the regression line if included or excluded. The regression line should be computed both with and without such points to assess their impact.
Residual Plots:
A residual plot graphs residuals versus x. If the plot shows no pattern, a linear model is appropriate. If a pattern (such as curvature) is present, the relationship is not linear and a straight line is not appropriate.
Coefficient of Determination:
r² is the square of the correlation coefficient. It represents the proportion of the variance in the outcome variable explained by the regression line.
If r² is close to 1: most of the variation is explained by the model.
If r² is close to 0: little of the variation is explained by the model.
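
The regression ideas in this module (fitting the line, predicting within the data range, residuals, the coefficient of determination, and checking an influential point) can be tried out together in a short sketch. Everything named here is an assumption made for illustration: the functions fit_line, predict, and r_squared are hypothetical helpers, and the hours/scores data are invented. The slope and intercept use the standard least-squares formulas.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # Choosing the intercept this way forces the line through the point of averages.
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x, x_min, x_max):
    """Predict y for x, refusing to extrapolate outside [x_min, x_max]."""
    if not (x_min <= x <= x_max):
        raise ValueError("x is outside the range of the data used to fit the line")
    return intercept + slope * x

def r_squared(xs, ys, intercept, slope):
    """Proportion of the variance in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Made-up data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 68, 71]

intercept, slope = fit_line(hours, scores)  # roughly 47.6 and 4.8

# Residual = observed y minus predicted y; these are what a residual plot graphs against x.
residuals = [y - (intercept + slope * x) for x, y in zip(hours, scores)]

r2 = r_squared(hours, scores, intercept, slope)

# Prediction inside the data range is allowed.
predicted = predict(intercept, slope, 3.5, min(hours), max(hours))

# Refit with one extra point far from the pattern to see whether it is influential.
intercept_inf, slope_inf = fit_line(hours + [12], scores + [60])

print(round(slope, 2), round(intercept, 2), round(r2, 3), round(slope_inf, 2))
```

With this invented data the single added point cuts the slope to a small fraction of its original value, which is what marks it as influential. Plotting the residuals list against hours with any plotting tool would give the residual plot described above.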
Statistically speaking, your brain gets
stronger with every calculation!