
Module4 Regression Analysis

This document provides an overview of regression analysis, covering linear, multiple, and logistic regression methods. It discusses their applications, mathematical foundations, assumptions, and evaluation metrics, along with practical considerations for implementation. Case studies illustrate the use of these techniques in predicting housing prices and assessing credit risk.

Uploaded by

waqar.namalite
Copyright: © All Rights Reserved

Module 4: Regression Analysis: Linear, Multiple, and Logistic



Introduction to Regression

Definition: Statistical method for modeling relationships between variables
Purpose:
  Predict continuous outcomes (Linear Regression)
  Classify categorical outcomes (Logistic Regression)
Applications:
  Sales forecasting, risk assessment, medical diagnosis
Real Example: Predicting house prices (Zillow), credit scoring (FICO)

Everyday Regression
Everyday products such as Netflix recommendations and weather forecasts rely on regression-style predictive models.



Types of Regression

Continuous Outcomes:
  Simple Linear
  Multiple Linear
  Polynomial
  Ridge/Lasso
  Use Case: Stock price prediction
Categorical Outcomes:
  Logistic (Binary)
  Multinomial
  Ordinal
  Use Case: Spam detection



Simple Linear Regression

Model Equation:
\[ Y = \beta_0 + \beta_1 X + \epsilon \]

\(Y\): Dependent variable (e.g., House Price)
\(X\): Independent variable (e.g., Square Footage)
\(\beta_0\): Base price when \(X = 0\) ($150,000)
\(\beta_1\): Price per unit increase ($250/sqft)
\(\epsilon\): Error term, \(\epsilon \sim N(0, \sigma^2)\)

Business Insight
Adding 500 sqft? Expected price increase: $125,000
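
The Business Insight arithmetic can be checked with a short sketch. It plugs the slide's illustrative coefficients ($150,000 base price, $250/sqft) into the model; the function name `predict_price` and the two house sizes are assumptions made only for this example.

```python
# Illustrative coefficients taken from the slide; not fitted to real data.
BETA_0 = 150_000  # base price when X = 0, in dollars
BETA_1 = 250      # price change per additional square foot, in dollars

def predict_price(sqft):
    """Expected price under Y = beta_0 + beta_1 * X (error term averaged out)."""
    return BETA_0 + BETA_1 * sqft

# Hypothetical house sizes that differ by exactly 500 sqft.
small = predict_price(1_800)
large = predict_price(2_300)
price_increase = large - small
print(price_increase)
```

Because the model is linear, the increase depends only on the 500 sqft difference, not on the starting size.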



Linear Regression Mathematics

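Objective: Minimize Residual Sum of Squares (RSS)

\[ \mathrm{RSS} = \sum_{i=1}^{n} \bigl( Y_i - (\beta_0 + \beta_1 X_i) \bigr)^2 \]

Closed-form Solutions:

\[ \beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \]

\[ \beta_0 = \bar{Y} - \beta_1 \bar{X} \]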
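
A minimal sketch of the closed-form solutions in plain Python. The function name `fit_simple_ols` and the data points are assumptions for illustration; the data are made up to lie exactly on Y = 2 + 3X so the result is easy to verify by hand.

```python
def fit_simple_ols(xs, ys):
    """Return (beta_0, beta_1) that minimize RSS for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope formula.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta_1 = s_xy / s_xx
    beta_0 = y_bar - beta_1 * x_bar
    return beta_0, beta_1

# Made-up data on the line Y = 2 + 3X (no noise).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [5.0, 8.0, 11.0, 14.0]
beta_0, beta_1 = fit_simple_ols(xs, ys)
print(beta_0, beta_1)
```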



Assumptions of Linear Regression

1. Linearity: Check with scatterplots
2. Independence: Durbin-Watson test
3. Homoscedasticity: Residual vs fitted plots
4. Normality: Q-Q plots
5. No perfect multicollinearity: VIF scores
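
As one example of these checks, the Durbin-Watson statistic behind the independence test can be sketched in a few lines. It is the sum of squared successive residual differences divided by the sum of squared residuals; values near 2 suggest no first-order autocorrelation, values toward 0 positive autocorrelation, and values toward 4 negative autocorrelation. The residual lists below are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals, chosen to show the two extremes.
alternating = [1.0, -1.0, 1.0, -1.0]           # sign flips every step
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]  # long runs of the same sign

dw_alternating = durbin_watson(alternating)
dw_persistent = durbin_watson(persistent)
print(dw_alternating, dw_persistent)
```

This computes only the statistic; a full test also compares it against tabulated critical values.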

Medical Study Example
In small drug trials, non-normal residuals make p-values and confidence intervals unreliable, which can lead to false conclusions about treatment effects.



Multiple Linear Regression

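Model Equation:

\[ \text{Price} = \beta_0 + \beta_1\,\text{SqFt} + \beta_2\,\text{Bedrooms} + \beta_3\,\text{Age} + \epsilon \]

Matrix Form:

\[ Y = X\beta + \epsilon \]

Marketing Application
\[ \text{Sales} = 50{,}000 + 2.5 \times \text{FB Ads} + 1.8 \times \text{Google Ads} \]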



Case Study: House Price Prediction

Multiple Linear Regression Model:

\[ \text{Price} = 50{,}000 + 180 \times \text{SqFt} + 15{,}000 \times \text{Bedrooms} + 30{,}000 \times \text{SchoolRating} \]

Sample Dataset:
Price ($)   SqFt    Beds   Baths   Location
450,000     1,800   3      2       Suburban
320,000     1,400   2      1       Urban
525,000     2,200   4      3       Suburban
380,000     1,600   3      2       Rural

Key Features:
  SqFt: Living area
  Beds/Baths: Bedrooms/Bathrooms
  Location: Urban/Suburban/Rural
  SchoolRating: 1-10 scale
  YearBuilt: Construction year



Multiple Regression Mathematics

Normal Equation:
\[ \hat{\beta} = (X^T X)^{-1} X^T Y \]
Geometric Interpretation:
  Projection of \(Y\) onto column space of \(X\)
  Requires \(\operatorname{rank}(X) = p + 1\)
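
A sketch of the normal equation in plain Python, assuming no linear algebra library is available. Rather than forming the inverse explicitly, it solves the equivalent system (X^T X) β = X^T Y by Gaussian elimination. The helper names (`transpose`, `matmul`, `solve`, `fit_ols`) and the data are illustrative assumptions; the data are generated from Y = 1 + 2·x1 + 3·x2 with no noise.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    b_t = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_t] for row in a]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            # Happens when X lacks full column rank (rank(X) < p + 1).
            raise ValueError("X^T X is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def fit_ols(rows, y):
    """Return [beta_0, beta_1, ..., beta_p] for predictors in `rows`."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    XtY = [sum(a * b for a, b in zip(col, y)) for col in Xt]
    return solve(XtX, XtY)

# Made-up data from Y = 1 + 2*x1 + 3*x2.
rows = [(1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
y = [3.0, 4.0, 6.0, 8.0, 9.0]
beta_hat = fit_ols(rows, y)
print(beta_hat)
```

If one predictor is an exact multiple of another, `solve` raises an error, which is the rank condition above showing up numerically.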

Financial Modeling
Hedge funds use regressions of this form when optimizing portfolios with hundreds of assets



Model Evaluation Metrics

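R-squared:
\[ R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} \]
Benchmarks depend on the field: 0.85 is considered good in the social sciences, while physics typically expects about 0.95
Adjusted R-squared:
\[ \bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} \]

MSE:
\[ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \]

MSE is in squared units (here, squared dollars), so the typical prediction error is read from its square root: an RMSE of $50,000 → typical prediction error of about $50,000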
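
A small sketch of these metrics in plain Python, with the root of the MSE (RMSE) added because it is in the same units as Y. The function name `regression_metrics` and the numbers are hypothetical; `p` is the number of predictors.

```python
import math

def regression_metrics(y_true, y_pred, p):
    """Compute R^2, adjusted R^2, MSE, and RMSE for a model with p predictors."""
    n = len(y_true)
    y_bar = sum(y_true) / n
    rss = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))
    tss = sum((y - y_bar) ** 2 for y in y_true)
    r2 = 1 - rss / tss
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    mse = rss / n
    return {"r2": r2, "adj_r2": adj_r2, "mse": mse, "rmse": math.sqrt(mse)}

# Hypothetical observations and predictions.
y_true = [3.0, 5.0, 7.0, 9.0, 11.0]
y_pred = [4.0, 5.0, 7.0, 9.0, 10.0]
metrics = regression_metrics(y_true, y_pred, p=1)
print(metrics)
```

Adjusted R-squared comes out below R-squared here because the (n - 1)/(n - p - 1) factor penalizes each added predictor.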



Regularization Techniques

Ridge Regression (L2):

\[ \min_{\beta} \; \|Y - X\beta\|^2 + \lambda \|\beta\|_2^2 \]

Stabilizes Netflix recommendations

Lasso (L1):

\[ \min_{\beta} \; \|Y - X\beta\|^2 + \lambda \|\beta\|_1 \]

Selects key genes in cancer research
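
To see how the two penalties behave, here is a sketch for the simplest case: one predictor and no intercept, where both objectives have closed-form minimizers. With Sxy = Σ x·y and Sxx = Σ x², ridge gives β = Sxy / (Sxx + λ) and lasso gives the soft-thresholded value sign(Sxy) · max(|Sxy| − λ/2, 0) / Sxx. The function names and data are assumptions for illustration; this is not a general multi-predictor solver.

```python
import math

def ridge_1d(xs, ys, lam):
    """Minimizer of sum((y - b*x)^2) + lam * b^2 for a single coefficient b."""
    s_xy = sum(x * y for x, y in zip(xs, ys))
    s_xx = sum(x * x for x in xs)
    return s_xy / (s_xx + lam)

def lasso_1d(xs, ys, lam):
    """Minimizer of sum((y - b*x)^2) + lam * |b| (soft thresholding)."""
    s_xy = sum(x * y for x, y in zip(xs, ys))
    s_xx = sum(x * x for x in xs)
    shrunk = max(abs(s_xy) - lam / 2, 0.0)
    return math.copysign(shrunk, s_xy) / s_xx

# Made-up data on the line y = 2x, so the unpenalized slope is 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

print(ridge_1d(xs, ys, 0.0))    # no penalty: ordinary least squares
print(ridge_1d(xs, ys, 14.0))   # ridge shrinks the slope toward zero
print(lasso_1d(xs, ys, 14.0))   # lasso also shrinks the slope
print(lasso_1d(xs, ys, 56.0))   # a large enough penalty sets it exactly to zero
```

Ridge only shrinks the coefficient, while lasso can set it exactly to zero, which is why lasso is used for feature selection.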



Introduction to Logistic Regression

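Used for binary classification (\(Y \in \{0, 1\}\))
Models probability \(P(Y = 1 \mid X)\)
Logit Transformation:
\[ \operatorname{logit}(p) = \ln\!\left( \frac{p}{1 - p} \right) = X\beta \]

Titanic Survival Prediction
\[ P(\text{Survive}) = \frac{1}{1 + e^{-(2.5 + 1.8 \times \text{Female})}} \]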



Logistic Regression Model

Sigmoid Function:
\[ P(Y = 1) = \frac{1}{1 + e^{-X\beta}} \]
Decision Boundary:
  Typically at \(P(Y = 1) = 0.5\)
  Corresponds to \(X\beta = 0\)
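
A minimal sketch of the sigmoid and the 0.5 decision boundary. It reuses the illustrative Titanic coefficients from the earlier slide (intercept 2.5, coefficient 1.8 for Female); the function names are assumptions for this example.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, beta):
    """P(Y = 1 | x), where beta[0] is the intercept."""
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return sigmoid(z)

def classify(x, beta, threshold=0.5):
    """Predict 1 when the probability reaches the threshold, else 0."""
    return 1 if predict_proba(x, beta) >= threshold else 0

# Illustrative coefficients from the Titanic slide: intercept, Female.
beta = [2.5, 1.8]
p_female = predict_proba([1], beta)
p_male = predict_proba([0], beta)
print(p_female, p_male)
print(sigmoid(0.0))  # the boundary: X beta = 0 gives probability 0.5
```

With a threshold of 0.5, `classify` returns 1 exactly when the linear score Xβ is non-negative.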



Maximum Likelihood Estimation

Likelihood Function:
\[ L(\beta) = \prod_{i=1}^{n} P(Y_i)^{Y_i} \, (1 - P(Y_i))^{1 - Y_i} \]

Log-Likelihood:
\[ \ell(\beta) = \sum_{i=1}^{n} \left[ Y_i \ln P(Y_i) + (1 - Y_i) \ln(1 - P(Y_i)) \right] \]

Credit Scoring Application
Banks maximize this to predict loan default probabilities
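
The log-likelihood can be evaluated directly, as in this sketch. Here P(Y_i) is read as the model's predicted probability that observation i has Y = 1. The labels and the two sets of predicted probabilities are hypothetical.

```python
import math

def log_likelihood(y, p):
    """Sum of y_i * ln(p_i) + (1 - y_i) * ln(1 - p_i) over all observations."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

# Hypothetical labels and predicted probabilities of Y = 1.
y = [1, 0, 1, 1]
confident = [0.9, 0.1, 0.8, 0.7]   # probabilities that agree with the labels
uninformed = [0.5, 0.5, 0.5, 0.5]  # a model that always predicts 0.5

ll_confident = log_likelihood(y, confident)
ll_uninformed = log_likelihood(y, uninformed)
print(ll_confident, ll_uninformed)
```

Maximum likelihood estimation searches for the β whose predicted probabilities make this quantity as large (as close to zero) as possible.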



Gradient Descent for Logistic Regression

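Cost Function:
\[ J(\beta) = -\frac{1}{n} \sum_{i=1}^{n} \left[ Y_i \ln P(Y_i) + (1 - Y_i) \ln(1 - P(Y_i)) \right] \]

Gradient Update:
\[ \beta_j := \beta_j - \alpha \frac{\partial J}{\partial \beta_j} \]

Learning rate \(\alpha\) is crucial: too large and the updates diverge, too small and convergence is slow
Used in large-scale applications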
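
A sketch of batch gradient descent for this cost function in plain Python. It uses the standard gradient for logistic regression, ∂J/∂β_j = (1/n) Σ (P(Y_i) − Y_i) x_ij. The data, the learning rate of 0.1, and the 2,000 iterations are assumptions chosen for illustration, not tuned values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(rows, y, beta):
    """Average negative log-likelihood J(beta); rows include the intercept column."""
    total = 0.0
    for row, yi in zip(rows, y):
        p = sigmoid(sum(b * v for b, v in zip(beta, row)))
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return -total / len(rows)

def fit_logistic_gd(rows, y, alpha=0.1, n_iter=2000):
    """Batch gradient descent starting from beta = 0."""
    n = len(rows)
    beta = [0.0] * len(rows[0])
    for _ in range(n_iter):
        probs = [sigmoid(sum(b * v for b, v in zip(beta, row))) for row in rows]
        # dJ/dbeta_j = (1/n) * sum_i (p_i - y_i) * x_ij
        grad = [sum((p - yi) * row[j] for p, yi, row in zip(probs, y, rows)) / n
                for j in range(len(beta))]
        beta = [b - alpha * g for b, g in zip(beta, grad)]
    return beta

# Hypothetical, deliberately non-separable data: one predictor, binary label.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0, 0, 1, 0, 1, 1]
rows = [[1.0, x] for x in xs]  # leading 1.0 is the intercept column

beta = fit_logistic_gd(rows, y)
start_cost = cost(rows, y, [0.0, 0.0])
final_cost = cost(rows, y, beta)
print(beta, start_cost, final_cost)
```

The cost at β = 0 equals ln 2, since every prediction is 0.5, and training lowers it from there.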



Multicollinearity

Definition: High correlation among predictors
Problems:
  Unstable coefficient estimates
  Inflated standard errors
Detection:
  Variance Inflation Factor (VIF)
\[ \mathrm{VIF}_j = \frac{1}{1 - R_j^2} \]

Marketing Data Example
Ad spend vs. impressions often correlated → Use regularization
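
A sketch of the VIF for the two-predictor case. With only two predictors, R_j² from regressing one on the other equals their squared Pearson correlation, so no regression fit is needed; with more predictors, each R_j² would come from a full auxiliary regression instead. The function names and the ad-spend and impressions figures are made up for illustration.

```python
def r_squared_between(a, b):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    var_a = sum((x - a_bar) ** 2 for x in a)
    var_b = sum((y - b_bar) ** 2 for y in b)
    return cov ** 2 / (var_a * var_b)

def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); identical for both predictors in this case."""
    return 1.0 / (1.0 - r_squared_between(x1, x2))

# Hypothetical marketing data in arbitrary units.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
impressions = [2.0, 4.0, 5.0, 4.0, 5.0]
vif_correlated = vif_two_predictors(ad_spend, impressions)

# Two predictors with zero correlation give the minimum possible VIF.
vif_uncorrelated = vif_two_predictors([-1.0, 0.0, 1.0], [1.0, -2.0, 1.0])
print(vif_correlated, vif_uncorrelated)
```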



Model Selection Criteria

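AIC:
\[ \mathrm{AIC} = 2k - 2\ln(L) \]
  Prefers simpler models
BIC:
\[ \mathrm{BIC} = \ln(n)\,k - 2\ln(L) \]
  Stronger penalty for complexity
Here \(k\) is the number of estimated parameters, \(L\) the maximized likelihood, and \(n\) the sample size; lower values are better for both criteria.

COVID-19 Model Selection
Used to identify most predictive factors: masks vs. mobility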
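
A small sketch comparing two candidate models with these criteria. The log-likelihoods, parameter counts, and sample size are hypothetical numbers chosen only to show the calculation.

```python
import math

def aic(log_lik, k):
    """AIC = 2k - 2 ln(L), given the maximized log-likelihood."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """BIC = ln(n) k - 2 ln(L), given the maximized log-likelihood."""
    return math.log(n) * k - 2 * log_lik

n = 200  # hypothetical sample size

# Hypothetical fits: the larger model gains only a little likelihood.
small_model = {"log_lik": -120.0, "k": 3}
large_model = {"log_lik": -118.5, "k": 6}

aic_small = aic(small_model["log_lik"], small_model["k"])
aic_large = aic(large_model["log_lik"], large_model["k"])
bic_small = bic(small_model["log_lik"], small_model["k"], n)
bic_large = bic(large_model["log_lik"], large_model["k"], n)
print(aic_small, aic_large, bic_small, bic_large)
```

Both criteria favor the smaller model here, and the gap is wider under BIC because ln(200) exceeds 2.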



Comparison of Regression Methods

Feature              Linear         Logistic
Response Type        Continuous     Binary
Equation             Y = Xβ + ϵ     logit(P) = Xβ
Error Distribution   Normal         Binomial
Estimation Method    OLS            MLE
Output               Prediction     Probability

Healthcare Example
Linear: Blood pressure levels
Logistic: Heart disease risk (Yes/No)



Practical Considerations

Feature Scaling: Standardization for regularization
Outlier Handling: Robust regression for financial data
Missing Data: Multiple imputation for clinical trials

Data Science Pipeline
1. Clean data → 2. Explore relationships → 3. Check assumptions → 4. Model → 5. Validate



Case Study: Linear Regression

Housing Price Prediction:
  Predict price based on square footage, bedrooms, location
  Model interpretation:
    Coefficient for bedrooms: $15,000 per bedroom
    School quality: $30,000 per school-rating point
  Evaluation: R² = 0.85, RMSE = $50,000



Case Study: Logistic Regression

Credit Risk Assessment:
  Predict default (Yes/No) based on income, debt, credit history
  Model interpretation:
    Odds ratio for income: 0.92 (per $1,000 increase)
    Late payments: 3x risk multiplier
  Evaluation: Accuracy = 87%, AUC = 0.91
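
The income odds ratio can be unpacked with a short sketch. An odds ratio of 0.92 per $1,000 compounds multiplicatively, so a $10,000 increase multiplies the odds of default by 0.92 raised to the 10th power. The 20% baseline default probability and the function names are assumptions for illustration only.

```python
import math

OR_INCOME_PER_1K = 0.92  # odds ratio from the case study, per $1,000 of income

def scale_odds_ratio(odds_ratio, units):
    """Odds ratio for a change of `units` steps (effects multiply on the odds scale)."""
    return odds_ratio ** units

def apply_odds_ratio(prob, odds_ratio):
    """Convert a probability to odds, apply the ratio, and convert back."""
    odds = prob / (1 - prob)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)

# Effect of a $10,000 income increase (10 steps of $1,000).
or_10k = scale_odds_ratio(OR_INCOME_PER_1K, 10)

baseline_default = 0.20  # hypothetical baseline probability of default
new_default = apply_odds_ratio(baseline_default, or_10k)
print(or_10k, new_default)

# The matching logistic coefficient is the log of the odds ratio.
beta_income = math.log(OR_INCOME_PER_1K)
print(beta_income)
```

The odds ratio acts on odds, not on probabilities, so the change in default probability depends on the baseline chosen.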

Business Impact
Reduced bad loans by 22% while approving 15



Summary

Linear Regression:
  Continuous outcomes
  OLS estimation
  Example: Demand forecasting
Logistic Regression:
  Binary outcomes
  MLE estimation
  Example: Fraud detection



Questions?
