
Multiple Linear Regression (MLR): Simplified & Structured Notes

1. Introduction to Multiple Linear Regression (MLR)

Definition and Purpose

• MLR is used to study the impact of multiple explanatory variables on a single dependent
variable.
• It extends Simple Linear Regression (SLR), which involves only one explanatory variable.
• MLR helps analyze the individual and combined effects of multiple variables on the outcome,
including how correlated explanatory variables share their influence on it.

Comparison: MLR vs. SLR

• SLR Equation: Y = β0 + β1X1 + ε
  • One explanatory variable.
  • The effects of all other variables are absorbed into the error term (ε).
• MLR Equation: Y = β0 + β1X1 + β2X2 + ... + βkXk + ε
  • Multiple explanatory variables.
  • Each variable's effect is estimated explicitly.

MLR Model Components

• Y: Response variable
• X1, X2, ..., Xk: Explanatory variables
• β0, β1, ..., βk: Coefficients
• ε: Error term

Error Term Assumptions

• Errors are independent


• Errors have equal variance (homoscedasticity)
• Errors are normally distributed
• E[ε] = 0

Expected Value of Y

• E[Y | X1, ..., Xk] = β0 + β1X1 + β2X2 + ... + βkXk
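A minimal sketch of fitting an MLR in Python with statsmodels, on synthetic data (the variable names and true coefficients below are illustrative, not from these notes):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 100
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)                  # x2 correlated with x1
    y = 1.0 + 0.8 * x1 + 0.6 * x2 + rng.normal(scale=0.5, size=n)

    X = sm.add_constant(np.column_stack([x1, x2]))      # design matrix with an intercept column (β0)
    model = sm.OLS(y, X).fit()                          # ordinary least squares
    print(model.params)                                 # estimates of β0, β1, β2
    print(model.summary())                              # R², adjusted R², Se, t-tests, F-test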

2. Key Concepts in MLR

Adjusted R-squared (R̄²)

• Definition: Adjusted R² accounts for the number of predictors (k) and the sample size (n):
  R̄² = 1 - (1 - R²)(n - 1)/(n - k - 1)
• Purpose: Prevents misleading increases in R² by penalizing unnecessary variables (plain R² never decreases when a variable is added).
• Key Points:
  • Adjusted R² is generally less than R².
  • Higher Adjusted R² = Better model.

Standard Error (Se)

• Definition: Estimate of the population standard deviation of the error terms (σε).
• Smaller Se = Better model.
• In MLR, Se typically decreases as informative variables are added (it can rise if a variable adds little, because of the degrees-of-freedom penalty).
• R̄² and Se move in opposite directions: as one rises, the other falls.
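Continuing the statsmodels sketch from Section 1, these quantities follow from the standard OLS formulas (k = number of predictors):

    resid = model.resid                                 # estimated error terms
    n, k = len(y), 2
    se = np.sqrt(np.sum(resid ** 2) / (n - k - 1))      # standard error of the regression, Se
    adj_r2 = 1 - (1 - model.rsquared) * (n - 1) / (n - k - 1)
    print(se, adj_r2)                                   # agree with model.mse_resid ** 0.5 and model.rsquared_adj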

Coefficient of Correlation (R)

• In SLR: R = correlation between X and Y.


• In MLR: R = correlation between observed Y and predicted Y (Ŷ).

Marginal vs. Partial Slopes

• Marginal Slope (SLR): Total effect of a variable on Y, ignoring other variables.


• Partial Slope (MLR): Effect of a variable on Y holding other variables constant.
• Marginal and partial slopes are the same only if explanatory variables are independent (rare).
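A sketch of the difference on synthetic data (setup assumed for illustration): x2 is built to depend on x1, so the SLR slope for x1 absorbs part of x2's effect, while the MLR slope holds x2 constant.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 500
    x1 = rng.normal(size=n)
    x2 = 0.6 * x1 + rng.normal(size=n)                  # x2 depends on x1
    y = 2.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)  # true partial slope of x1 is 1.0

    marginal = sm.OLS(y, sm.add_constant(x1)).fit().params[1]
    partial = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit().params[1]
    print(marginal)                                     # ≈ 1.0 + 0.6 * 1.0 = 1.6 (direct + indirect)
    print(partial)                                      # ≈ 1.0 (direct effect only)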

3. Collinearity / Multicollinearity
• Definition: High correlation among explanatory variables.
• Effect: Makes MLR results hard to interpret.
• Tools: Path diagram, Variance Inflation Factor (VIF).

4. Path Diagram: Direct and Indirect Effects


• Visualizes relationships among explanatory variables and with Y.
• Direct Effect: From X1 to Y (partial slope).
• Indirect Effect: From X1 → X2 → Y.
• Total Effect = Direct Effect + Indirect Effect
• Total Effect ≈ Marginal Slope from SLR.
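In symbols, for two predictors: if the MLR is Y = β0 + β1X1 + β2X2 + ε and the auxiliary regression of X2 on X1 has slope c (the X1 → X2 path), then Total Effect of X1 = β1 + c·β2, which is exactly what the SLR of Y on X1 alone estimates. The CGPA example in Section 5 quantifies this decomposition.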

5. Example: CGPA Prediction (Business School Admissions)

Dataset

• 15 Students
• Y: CGPA
• X1: Entrance Exam Score (0-10)
• X2: Interview Score (0-10)

Correlations

• CGPA and Entrance: 0.74


• CGPA and Interview: 0.76
• Entrance and Interview: 0.54 → Sign of multicollinearity

SLR: X1 on Y

• R: 0.74, R²: 0.55, Se: 0.785


• Coefficient: 0.72 (Marginal slope)
• p-value: 0.001 → Significant

SLR: X2 on Y

• R: 0.763, R²: 0.58, Se: 0.741


• Coefficient: 0.934 (Marginal slope)
• p-value: 0.0001 → Significant

MLR: X1 and X2 on Y

• Multiple R: 0.86, R²: 0.74, Adjusted R²: 0.69, Se: 0.628


• p-value: 0.0003 → Model is significant

Coefficients (Partial Slopes):

• Intercept: -0.7
• X1: 0.455 (p = 0.019, CI: 0.10 to 0.81)
• X2: 0.622 (p = 0.010, CI: 0.15 to 1.08)

Regression Equation: CGPA = -0.7 + 0.455(Entrance) + 0.622(Interview)
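For example, an applicant scoring 8 on the entrance exam and 7 in the interview (illustrative values) gets a predicted CGPA of -0.7 + 0.455(8) + 0.622(7) = -0.7 + 3.64 + 4.354 ≈ 7.29.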

Path Diagram Quantification

• X1 → X2: Coefficient = 0.42


• Indirect Effect (X1): 0.42 * 0.622 = 0.26
• Total Effect (X1): 0.455 + 0.26 = 0.715 ≈ 0.72 (Marginal)
• X2 → X1: Coefficient = 0.68
• Indirect Effect (X2): 0.68 * 0.455 = 0.31
• Total Effect (X2): 0.622 + 0.31 = 0.932 ≈ 0.934 (Marginal)

6. Variance Inflation Factor (VIF)

Definition

• Measures how much of the variance in Xi is explained by the other explanatory variables.
• Formula: VIF(Xi) = 1 / (1 - Ri²)
• Ri²: R-squared from the regression of Xi on all the other explanatory variables.
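A sketch of computing VIFs with statsmodels' variance_inflation_factor (the random matrix below is a stand-in for real predictor data; with independent columns the VIFs come out near 1):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    X_raw = np.random.default_rng(2).normal(size=(50, 3))  # stand-in for the n-by-k predictor matrix
    X = sm.add_constant(X_raw)                              # VIF is computed on the design matrix with intercept
    vifs = [variance_inflation_factor(X, i) for i in range(1, X.shape[1])]  # index 0 is the constant
    print(vifs)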

Interpretation

• VIF = 1: No collinearity (Xi is uncorrelated with the other predictors)
• VIF > 1: Some collinearity; common rules of thumb flag VIF above about 5-10 as serious

Impact on Standard Error

• SE(bi) = √VIF(Xi) × [SE of bi if Xi were uncorrelated with the other predictors]
• Higher VIF → larger SE → smaller t-statistic → larger p-value

CGPA Example

• Correlation between X1 and X2: 0.54
• R² from the auxiliary regression: 0.54² ≈ 0.29 → VIF = 1/(1 - 0.29) ≈ 1.41
• √1.41 ≈ 1.19 → SE is inflated by about 19%

7. Case Study: Apartment Price Prediction

Variables

• Y: Price
• X1: Area (sq. ft.)
• X2: Bedrooms
• X3: Parking Lots
• Data: 20 apartments

SLR Results

• All variables: Significant individually (p < 0.05)


• Area marginal slope = 0.32

MLR Results

• Multiple R = 0.7, R² = 0.49, Model p-value = 0.01

But:

• Area (X1):
  • Partial slope ≈ 0.05
  • p = 0.7 → Not significant
  • CI includes 0
• Parking Lots (X3):
  • p = 0.11 → Not significant
  • CI includes 0

Conclusion: Although each variable is significant in its own SLR, Area and Parking Lots become
insignificant in the MLR because of multicollinearity.

VIF Values

• Area: 1.53
• Bedrooms: 1.34
• Parking Lots: 1.23

8. Signs of Multicollinearity
• R² increases only slightly with more variables
• Marginal vs. Partial slopes differ drastically
• Strong overall F-statistic, but weak individual t-tests
• Partial slope SE > Marginal slope SE

9. Remedies for Multicollinearity


1. Remove Redundant Variables: drop variables that add little unique value.

2. Re-express Variables: combine correlated variables into one (e.g., an "economic status" index); see the sketch after this list.

3. Do Nothing: if p-values are low and estimates are stable, the collinearity may be acceptable.
   Example: in the CGPA model, both variables had significant partial slopes despite the correlation.
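A sketch of remedy 2, re-expressing two correlated predictors as a single composite index by averaging their z-scores (one simple choice among several; all names below are illustrative):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n = 200
    income = rng.normal(50, 10, size=n)
    education = 0.7 * income + rng.normal(size=n)       # strongly correlated with income
    y = 3.0 + 0.05 * income + 0.04 * education + rng.normal(size=n)

    def z(v):                                           # standardize to mean 0, sd 1
        return (v - v.mean()) / v.std()

    status = (z(income) + z(education)) / 2             # composite "economic status" index
    model = sm.OLS(y, sm.add_constant(status)).fit()
    print(model.params)                                 # one stable coefficient replaces two collinear ones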
