Regression
When studying the relationship between two variables, first collect the data and draw a scatter plot
to assess the nature of the relationship (positive, negative, curvilinear, or none). Next, compute the
correlation coefficient and test its significance. If it is significant, determine the equation of the
regression line, which can then be used to make predictions based on the trend in the data. Note:
Fitting a regression line when the correlation coefficient is not significant is meaningless.
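The procedure above can be sketched in a few lines of Python. This is only an illustration: the data
are invented, the names (correlation_and_line, T_CRITICAL) are hypothetical, and the critical value is
assumed to have been read from a t table for a two-tailed test at alpha = 0.05 with n - 2 = 4 degrees
of freedom.

```python
import math

# Hypothetical paired data, invented purely for illustration.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 5, 7]

def correlation_and_line(xs, ys):
    """Return (r, a, b), where r is the correlation coefficient and y' = a + b*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(xi * yi for xi, yi in zip(xs, ys))
    sxx = sum(xi * xi for xi in xs)
    syy = sum(yi * yi for yi in ys)
    num = n * sxy - sx * sy
    r = num / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    b = num / (n * sxx - sx ** 2)   # slope
    a = (sy - b * sx) / n           # intercept
    return r, a, b

r, a, b = correlation_and_line(x, y)

# t test of the null hypothesis that the population correlation is zero.
n = len(x)
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

# Assumed critical value: two-tailed, alpha = 0.05, d.f. = 4 (from a t table).
T_CRITICAL = 2.776
significant = abs(t_stat) > T_CRITICAL

if significant:
    print(f"r = {r:.3f}, t = {t_stat:.3f}: significant, y' = {a:.3f} + {b:.3f}x")
else:
    print(f"r = {r:.3f}, t = {t_stat:.3f}: not significant, so no line is fitted")
```

The regression line is reported only when the test is significant, which mirrors the note above.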
Example:
When predicting beyond available data (extrapolation), be cautious. In 1979, some thought the U.S.
would run out of oil by 2003 based on consumption and reserves, but new technologies, discoveries,
and changes in trends can alter predictions. For instance, the forecasted $10 gasoline price did not
happen. Predictions rely on current conditions or the assumption that trends persist, which may not
hold true in the future. Be careful when interpreting extrapolated predictions.
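One simple safeguard is to flag any prediction whose x value lies outside the range of the observed
data. The sketch below assumes a line that has already been fitted; the function name predict, the
coefficients, and the data are all hypothetical.

```python
def predict(a, b, x_new, x_observed):
    """Return (y', is_extrapolation) for the line y' = a + b*x."""
    lo, hi = min(x_observed), max(x_observed)
    is_extrapolation = not (lo <= x_new <= hi)
    return a + b * x_new, is_extrapolation

# Hypothetical fitted line and observed x values, for illustration only.
a, b = 1.8, 0.77
x_observed = [1, 2, 3, 4, 5, 6]

for x_new in (3.5, 10):
    y_pred, flagged = predict(a, b, x_new, x_observed)
    note = " (extrapolation: interpret with caution)" if flagged else ""
    print(f"x = {x_new}: y' = {y_pred:.2f}{note}")
```

The flag does not make an extrapolated value wrong; it only marks it as resting on the assumption that
the trend continues outside the observed range.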
Here are examples of steps 2 and 3:
Check for outliers on a scatter plot, which are points that seem out of place. Some outliers, called
influential points, can impact the regression line. To identify an influential point, graph the regression
line with and without the point. If the second line shifts significantly, the point is influential. Use
judgment when deciding whether to include influential points in the analysis. If unnecessary, exclude
them; if important, consider adding nearby data values for a more accurate study.
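The with-and-without comparison can also be done numerically instead of graphically. The following is
a minimal sketch with invented data in which the last point lies far from the rest; fit_line and the
variable names are hypothetical, and deciding whether the shift is large enough to matter remains a
judgment call.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y' = a + b*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(xi * yi for xi, yi in zip(xs, ys))
    sxx = sum(xi * xi for xi in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b

# Hypothetical data; the last point sits far from the others.
points = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5), (20, 3)]
suspect = (20, 3)

# Fit once with every point, then again with the suspect point removed.
a_with, b_with = fit_line(*zip(*points))
kept = [p for p in points if p != suspect]
a_without, b_without = fit_line(*zip(*kept))

print(f"with point:    y' = {a_with:.3f} + {b_with:.3f}x")
print(f"without point: y' = {a_without:.3f} + {b_without:.3f}x")
```

In this invented data set the slope changes sign when the suspect point is removed, so the point would
be treated as influential.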
Residual Plots
The values y − y′ are called residuals. Plotting these values against the x values gives a residual
plot, which can be used to judge how well the regression line can be used to make predictions.
The residual plot shows that the regression line y′ = 4.8 + 2.8x is somewhat questionable for making
predictions due to a small sample size.
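As an illustration of how residuals are obtained, the sketch below fits a line to invented data and
lists each residual next to its x value. The data and the helper fit_line are hypothetical and are not
the data behind the line quoted above; the plot itself can then be drawn with any plotting tool by
placing x on the horizontal axis and the residual on the vertical axis.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y' = a + b*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(xi * yi for xi, yi in zip(xs, ys))
    sxx = sum(xi * xi for xi in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b

# Hypothetical data, invented purely for illustration.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 5, 7]

a, b = fit_line(x, y)

# Residual = observed y minus predicted y'.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

for xi, ri in zip(x, residuals):
    print(f"x = {xi}: residual = {ri:+.3f}")
```

Residuals scattered evenly about zero with no visible pattern suggest the line is a reasonable model;
a curve or funnel shape suggests it is not.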
Coefficient of Determination
The coefficient of determination, denoted by r², is the ratio of the explained variation to the total
variation. It measures the proportion of the variation of the dependent variable that is explained by
the regression line and the independent variable.
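The ratio can be computed directly from its definition. The sketch below uses invented data and
hypothetical names, and assumes the usual definitions: explained variation is the sum of squared
deviations of the predicted values from the mean of y, and total variation is the sum of squared
deviations of the observed values from that mean.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y' = a + b*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(xi * yi for xi, yi in zip(xs, ys))
    sxx = sum(xi * xi for xi in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b

# Hypothetical data, invented purely for illustration.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 5, 7]

a, b = fit_line(x, y)
y_mean = sum(y) / len(y)
y_pred = [a + b * xi for xi in x]

explained = sum((yp - y_mean) ** 2 for yp in y_pred)             # variation the line accounts for
unexplained = sum((yi - yp) ** 2 for yi, yp in zip(y, y_pred))   # variation left in the residuals
total = sum((yi - y_mean) ** 2 for yi in y)                      # explained + unexplained

r_squared = explained / total
print(f"r^2 = {r_squared:.3f}")
```

For a least-squares line, the explained and unexplained variation add up to the total variation, and
the ratio equals the square of the correlation coefficient.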
Standard Error of the Estimate
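The standard error of the estimate, denoted by s_est, is the standard deviation of the observed y
values about the predicted y′ values. The formula for the standard error of the estimate is
s_est = sqrt( Σ(y − y′)² / (n − 2) )
where n is the number of data pairs.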
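A worked computation may help here. The sketch below assumes the usual textbook definition, the square
root of the sum of squared residuals divided by n − 2, and applies it to invented data; the helper
fit_line and the variable names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y' = a + b*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(xi * yi for xi, yi in zip(xs, ys))
    sxx = sum(xi * xi for xi in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b

# Hypothetical data, invented purely for illustration.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 5, 7]

a, b = fit_line(x, y)
n = len(x)

# Sum of squared residuals (y - y')^2, divided by n - 2 degrees of freedom.
sum_sq_residuals = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_est = math.sqrt(sum_sq_residuals / (n - 2))

print(f"s_est = {s_est:.3f}")
```

The closer the observed values lie to the regression line, the smaller s_est becomes.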
Prediction Interval
The standard error of the estimate can be used for constructing a prediction interval (similar to a
confidence interval) about a y value. When a specific value x is substituted into the regression equation,
the resulting y′ is a point estimate for y; the prediction interval is an interval estimate built around it.
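The notes do not give the interval formula, so the sketch below assumes the commonly used textbook form
y′ ± t · s_est · sqrt(1 + 1/n + n(x − x̄)² / (nΣx² − (Σx)²)) with n − 2 degrees of freedom. The data,
the function name prediction_interval, and the critical value (two-tailed, 95%, d.f. = 4, read from a
t table) are all assumptions made for illustration.

```python
import math

def prediction_interval(xs, ys, x0, t_critical):
    """Return (lower, y', upper) for a new observation at x0."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(xi * yi for xi, yi in zip(xs, ys))
    sxx = sum(xi * xi for xi in xs)
    denom = n * sxx - sx ** 2

    # Least-squares line y' = a + b*x.
    b = (n * sxy - sx * sy) / denom
    a = (sy - b * sx) / n

    # Standard error of the estimate.
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))
    s_est = math.sqrt(ss_res / (n - 2))

    # The interval widens as x0 moves away from the mean of x.
    x_mean = sx / n
    margin = t_critical * s_est * math.sqrt(1 + 1 / n + n * (x0 - x_mean) ** 2 / denom)

    y_pred = a + b * x0
    return y_pred - margin, y_pred, y_pred + margin

# Hypothetical data, invented purely for illustration.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 5, 7]

lower, y_hat, upper = prediction_interval(x, y, x0=4, t_critical=2.776)
print(f"point estimate y' = {y_hat:.3f}, 95% prediction interval: {lower:.3f} < y < {upper:.3f}")
```

The point estimate sits at the center of the interval, and the interval is narrowest when x0 equals the
mean of the observed x values.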