Correlation and Regression

The document explains correlation and regression analysis, focusing on the correlation coefficient, which measures the linear relationship between two variables. It details how to perform regression analysis in Excel, including input ranges, output options, and interpretation of key statistics such as R Square and significance levels. It also provides an example of interpreting regression results, demonstrating how to use the coefficients for forecasting and how to assess model accuracy through residuals.


Correlation

Correlation between two variables shows how they are related and helps us understand
patterns and trends in the data.

What Is the Correlation Coefficient?

The correlation coefficient is a statistical metric that quantifies the degree and direction of
the linear relationship between two variables, with values ranging from -1 to +1. A
coefficient of +1 indicates a perfect positive correlation, where increases in one variable are
exactly associated with increases in the other; a coefficient of -1 represents a perfect
negative correlation, where an increase in one variable corresponds to a proportional
decrease in the other; and a value of 0 suggests no linear relationship between the
variables.

In data analysis, this metric is crucial for understanding how closely predicted values align
with actual outcomes, which is especially important in fields like demand forecasting.
Business managers rely on accurate models—developed by studying historical data and
running regression analyses—to make informed decisions across production, marketing, and
procurement. Knowing the strength and direction of relationships between key variables
helps refine these models, ensuring that strategic decisions are based on reliable, data-
driven insights.

There are three distinct approaches in Excel: manually calculating the coefficient step by step, using the built-in PEARSON function, and using the Data Analysis ToolPak.

The PEARSON and CORREL functions return identical results because both compute the Pearson correlation coefficient in the same way, so they can be used interchangeably in Excel.

Whichever method you use, the answer will be the same.
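
The same agreement can be verified outside Excel. Below is a minimal sketch in Python, assuming NumPy is installed and using made-up data, that computes the Pearson coefficient both from its definition (the covariance divided by the product of the standard deviations) and with a built-in function, much like PEARSON and CORREL.

    import numpy as np

    # Made-up paired observations (illustration only)
    x = np.array([2.0, 3.0, 4.0, 5.0, 6.0])
    y = np.array([8.1, 7.2, 6.4, 5.0, 4.3])

    # Manual calculation: covariance / (std dev of x * std dev of y)
    r_manual = np.cov(x, y, ddof=1)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))

    # Built-in calculation (plays the role of PEARSON / CORREL)
    r_builtin = np.corrcoef(x, y)[0, 1]

    print(np.isclose(r_manual, r_builtin))  # True: both methods agree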

Regression

Options in the Regression Dialog Box

 Input Y Range: The range of cells containing your dependent variable (the outcome
you want to predict).
 Input X Range: The range of cells containing your independent variable(s) (the
predictor(s)).

 Labels: Tick this if your data ranges include headers; this helps Excel label the output
correctly.

 Constant is Zero: Forces the regression line to go through the origin (sets the
intercept to zero). Usually left unchecked unless you have a specific reason.

 Confidence Level: Lets you set the confidence interval for the coefficients (default is
95%).

 Output Options: Choose where you want the output to appear (new worksheet, new
workbook, or a specific area in your current sheet).

 Residuals: Check this to display the residuals (differences between actual and
predicted values).

 Standardized Residuals: Shows residuals in standardized (z-score) form, useful for identifying outliers.

 Residual Plots: Provides a plot of residuals to help check if your model fits well.

 Line Fit Plots: Shows a plot comparing actual vs. predicted values.

 Normal Probability Plots: Helps check if residuals are normally distributed (important
for some statistical assumptions).
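
For reference, a comparable analysis can be run outside Excel. The sketch below is a rough equivalent of the Regression tool, assuming the NumPy and statsmodels libraries and made-up Price, Advertising, and Quantity Sold figures; its summary covers the same ground as Excel's output (regression statistics, an ANOVA-style F test, and the coefficients table).

    import numpy as np
    import statsmodels.api as sm

    # Made-up data (illustration only): two predictors and one outcome
    price       = np.array([2.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5])
    advertising = np.array([2800.0, 2100.0, 2500.0, 1800.0, 3000.0, 1500.0, 2000.0, 1200.0])
    quantity    = np.array([8500.0, 6200.0, 5800.0, 4400.0, 5300.0, 3300.0, 3700.0, 2500.0])

    X = sm.add_constant(np.column_stack([price, advertising]))  # adds the intercept column
    model = sm.OLS(quantity, X).fit()                           # ordinary least squares fit

    print(model.summary())   # R-squared, F statistic, coefficients, standard errors, p-values
    residuals = model.resid  # actual minus predicted values, like the Residuals output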

Main elements of regression output

1. Regression Statistics

 Multiple R
This is a number between -1 and 1 that shows how closely your two variables move
together.

 If it’s 1, it means there is a perfect positive relationship: as X increases, Y always increases at a constant rate.

 If it’s -1, it means a perfect negative relationship: as X increases, Y always decreases at a constant rate.

 If it’s 0, there’s no linear relationship between them.

 R Square
This tells you what percentage of the variation in your result (Y) is explained by
changes in your input (X). For example, if R Square is 0.80, then 80% of the
variation in Y is explained by X, and the rest is due to other factors. (The formulas behind these statistics are sketched after this list.)
 Adjusted R Square
This is almost the same as R Square, but it adjusts for the number of variables in your
model. If you add more predictors (X’s), Adjusted R Square makes sure you’re not just
getting a higher score by adding extra, unnecessary variables. For simple regression
(only one X), it’s nearly the same as R Square.

 Standard Error
This tells you, on average, how far your predictions are from the actual results.
Smaller numbers mean your model is more accurate.

 Observations
This is just the number of data points you used in your analysis. A data point is one
row of information, like one person’s height and weight.
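
To make these definitions concrete, here is a minimal sketch of the underlying formulas, assuming NumPy and made-up actual and predicted values (none of the numbers come from the class example).

    import numpy as np

    # Made-up actual and predicted values (illustration only)
    y      = np.array([120.0, 150.0, 170.0, 200.0, 230.0, 260.0])
    y_pred = np.array([125.0, 145.0, 175.0, 195.0, 235.0, 255.0])

    n, k = len(y), 2                               # observations and number of predictors (assumed)
    ss_res = np.sum((y - y_pred) ** 2)             # unexplained (residual) variation
    ss_tot = np.sum((y - y.mean()) ** 2)           # total variation in Y

    r_square     = 1 - ss_res / ss_tot                         # share of variation explained
    adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)  # penalised for extra predictors
    std_error    = np.sqrt(ss_res / (n - k - 1))               # typical size of a prediction error
    multiple_r   = np.sqrt(r_square)                           # square root of R Square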

2. ANOVA Table

 df (Degrees of Freedom)
This is a technical term that basically counts how much information you have left to
estimate things after using up some for the model.

 SS (Sum of Squares)
This measures how much the data varies. There are two types:

 Regression SS: The part of the variation explained by your model.

 Residual SS: The part not explained by your model (the “leftover” or error).

 MS (Mean Square)
This is just SS divided by df. It’s a way to average out the variation for calculations.

 F
This is a number that compares how much of the variation your model explains to
how much it doesn’t. A bigger F means your model does a better job explaining the
data.

 Significance F (P-value for F)
This is a probability that tells you whether your model is actually useful or whether the
results could have happened by chance. If this number is less than 0.05, your model is
considered statistically significant, meaning it’s probably not just luck. (These calculations are sketched just below.)
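
The following sketch shows how these ANOVA quantities relate to each other, assuming NumPy and SciPy and reusing made-up actual and predicted values; the F test compares explained to unexplained variation per degree of freedom.

    import numpy as np
    from scipy import stats

    # Made-up actual and predicted values (illustration only)
    y      = np.array([120.0, 150.0, 170.0, 200.0, 230.0, 260.0])
    y_pred = np.array([125.0, 145.0, 175.0, 195.0, 235.0, 255.0])
    n, k = len(y), 2                              # observations and predictors (assumed)

    ss_reg = np.sum((y_pred - y.mean()) ** 2)     # Regression SS: variation explained by the model
    ss_res = np.sum((y - y_pred) ** 2)            # Residual SS: leftover (error) variation

    ms_reg = ss_reg / k                           # MS = SS / df for the regression
    ms_res = ss_res / (n - k - 1)                 # MS = SS / df for the residuals
    f_stat = ms_reg / ms_res                      # F: explained vs. unexplained variation
    sig_f  = stats.f.sf(f_stat, k, n - k - 1)     # Significance F: p-value for that F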

3. Coefficients Table

 Intercept
This is the starting value of Y when X is zero. It shows where your line crosses the Y-
axis.
 X Variable 1 (Slope)
This tells you how much Y will change if X goes up by 1. For example, if the slope is 2,
every time X increases by 1, Y increases by 2.

 Standard Error
This shows how sure we are about the slope and intercept. Smaller numbers mean
we’re more confident in these values.

 t Stat
This is the estimated value (like the slope) divided by its standard error. A bigger t Stat
(in absolute value) means the value is more likely to be real and not just due to random
chance.

 P-value
This is the probability of seeing an effect this large if the coefficient (like the slope)
were actually zero (meaning X had no effect on Y). If the P-value is less than 0.05, the
effect is very likely real.

 Lower 95% / Upper 95%
These give a range (a 95% confidence interval) that is likely to contain the true value of
the slope or intercept. (These calculations are sketched just below.)
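
As a quick illustration of how the t Stat, P-value, and 95% bounds relate to a coefficient and its standard error, here is a minimal sketch assuming SciPy; the coefficient, standard error, and degrees of freedom are made-up numbers, not the class results.

    from scipy import stats

    # Made-up coefficient, standard error, and residual degrees of freedom (illustration only)
    coef, std_err, df_resid = 2.0, 0.5, 12

    t_stat   = coef / std_err                          # t Stat = coefficient / standard error
    p_value  = 2 * stats.t.sf(abs(t_stat), df_resid)   # two-sided P-value
    t_crit   = stats.t.ppf(0.975, df_resid)            # critical t for a 95% interval
    lower_95 = coef - t_crit * std_err                 # Lower 95%
    upper_95 = coef + t_crit * std_err                 # Upper 95%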

4. Residuals (if you select them)

 Residuals
These are the differences between what your model predicted and what actually
happened for each data point. They help you see if your model is making big
mistakes or if there are outliers.

In short:
These numbers and tables help you see if your model fits the data well, if the relationship is
strong or weak, and if your results are meaningful or just due to chance.

Interpretation of the example done in class

This example teaches you how to run a linear regression analysis in Excel and how to
interpret the Summary Output.

R Square equals 0.962, which is a very good fit: 96% of the variation in Quantity Sold is
explained by the independent variables Price and Advertising. The closer R Square is to 1, the
better the regression line fits the data.

To check if your results are reliable (statistically significant), look at Significance F (0.001). If
this value is less than 0.05, you're OK. If Significance F is greater than 0.05, it's probably
better to stop using this set of independent variables. Delete a variable with a high P-value
(greater than 0.05) and rerun the regression until Significance F drops below 0.05.

The regression line is: Quantity Sold = 8536.214 - 835.722 * Price + 0.592 * Advertising. In
other words, for each unit increase in Price, Quantity Sold decreases by 835.722 units, and for
each unit increase in Advertising, Quantity Sold increases by 0.592 units. This is valuable
information. You can also use these coefficients to make a forecast. For example, if Price equals
$4 and Advertising equals $3,000, you might be able to achieve a Quantity Sold of 8536.214 -
835.722 * 4 + 0.592 * 3000 ≈ 6970.
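
As a quick check of the arithmetic, the forecast can be reproduced with a couple of lines, using the coefficients quoted above from the class example.

    # Coefficients quoted from the regression output in the class example
    intercept, b_price, b_advertising = 8536.214, -835.722, 0.592

    def forecast_quantity(price, advertising):
        # Predicted Quantity Sold for a given Price and Advertising spend
        return intercept + b_price * price + b_advertising * advertising

    print(forecast_quantity(4, 3000))  # roughly 6969, i.e. about 6970 as stated above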

The residuals show you how far the actual data points are from the predicted data
points (using the equation). For example, the first data point equals 8500. Using the
equation, the predicted data point equals 8536.214 - 835.722 * 2 + 0.592 * 2800 = 8523.009,
giving a residual of 8500 - 8523.009 = -23.009. (Redoing this arithmetic with the rounded
coefficients shown above gives about 8522.4; the small difference is most likely due to rounding
of the coefficients in the output.)
