Module 4: Regression Analysis: Linear, Multiple, and Logistic
Introduction to Regression
Definition: Statistical method for modeling relationships between
variables
Purpose:
Predict continuous outcomes (Linear Regression)
Classify categorical outcomes (Logistic Regression)
Applications:
Sales forecasting, risk assessment, medical diagnosis
Real Example: Predicting house prices (Zillow), credit scoring (FICO)
Everyday Regression
Your Netflix recommendations and weather forecasts both use regression
models!
Types of Regression
Continuous Outcomes:
  Simple Linear
  Multiple Linear
  Polynomial
  Ridge/Lasso
  Use Case: Stock price prediction
Categorical Outcomes:
  Logistic (Binary)
  Multinomial
  Ordinal
  Use Case: Spam detection
Simple Linear Regression
Model Equation:
Y = β0 + β1 X + ϵ
Y : Dependent variable (e.g., House Price)
X : Independent variable (e.g., Square Footage)
β0 : Base price when X = 0 ($150,000)
β1 : Price per unit increase ($250/sqft)
ϵ: Error term ∼ N(0, σ²)
Business Insight
Adding 500 sqft? Expected price increase: $125,000
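A minimal Python sketch of this prediction rule, using the slide's illustrative coefficients (base price $150,000, $250/sqft); the function name and inputs are assumptions:

def predict_price(sqft):
    """Simple linear model: price = beta0 + beta1 * sqft."""
    beta0, beta1 = 150_000, 250                        # base price, price per sqft
    return beta0 + beta1 * sqft

print(predict_price(2000))                             # 650000
print(predict_price(2500) - predict_price(2000))       # 125000: the 500-sqft insight above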
Linear Regression Mathematics
Objective: Minimize Residual Sum of Squares (RSS)
RSS = Σᵢ₌₁ⁿ (Yi − (β0 + β1 Xi))²
Closed-form Solutions:
β1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
β0 = Ȳ − β1 X̄
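A minimal NumPy sketch of these closed-form estimates; the data values are illustrative, not from the slides:

import numpy as np

X = np.array([1400, 1600, 1800, 2200], dtype=float)              # sqft
Y = np.array([320_000, 380_000, 450_000, 525_000], dtype=float)  # price

beta1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
beta0 = Y.mean() - beta1 * X.mean()
print(beta0, beta1)   # intercept and slope from the closed-form formulas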
Assumptions of Linear Regression
1 Linearity: Check with scatterplots
2 Independence: Durbin-Watson test
3 Homoscedasticity: Residual vs fitted plots
4 Normality: Q-Q plots
5 No perfect multicollinearity: VIF scores
Medical Study Example
Violating normality in drug trials can lead to false conclusions about
treatment effects!
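A minimal diagnostics sketch with statsmodels and SciPy on synthetic data; the Shapiro-Wilk test is used here as a numeric stand-in for the Q-Q plot named above:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 + 3 * x + rng.normal(size=100)      # synthetic data that satisfies the assumptions

results = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(results.resid))       # near 2 suggests independent residuals
print(stats.shapiro(results.resid))       # normality check on the residuals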
Multiple Linear Regression
Model Equation:
Price = β0 + β1 SqFt + β2 Bedrooms + β3 Age + ϵ
Matrix Form:
Y = Xβ + ϵ
Marketing Application
Sales = 50,000 + 2.5 × FB Ads + 1.8 × Google Ads
Case Study: House Price Prediction
Multiple Linear Regression Model:
Price = 50,000 + 180×SqFt + 15,000×Bedrooms + 30,000×SchoolRating
Key Features:
  SqFt: Living area
  Beds/Baths: Bedrooms/Bathrooms
  Location: Urban/Suburban/Rural
  SchoolRating: 1-10 scale
  YearBuilt: Construction year
Sample Dataset:
  Price ($)   SqFt    Beds   Baths   Location
  450,000     1,800   3      2       Suburban
  320,000     1,400   2      1       Urban
  525,000     2,200   4      3       Suburban
  380,000     1,600   3      2       Rural
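A minimal pandas sketch that loads the sample dataset above and one-hot encodes Location for use in a multiple regression; column names follow the table, everything else is illustrative:

import pandas as pd

df = pd.DataFrame({
    "Price":    [450_000, 320_000, 525_000, 380_000],
    "SqFt":     [1_800, 1_400, 2_200, 1_600],
    "Beds":     [3, 2, 4, 3],
    "Baths":    [2, 1, 3, 2],
    "Location": ["Suburban", "Urban", "Suburban", "Rural"],
})

# The categorical Location column becomes numeric dummy columns
X = pd.get_dummies(df.drop(columns="Price"), columns=["Location"], drop_first=True)
y = df["Price"]
print(X)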
Multiple Regression Mathematics
Normal Equation:
β̂ = (XᵀX)⁻¹XᵀY
Geometric Interpretation:
Projection of Y onto column space of X
Requires rank(X) = p + 1
Financial Modeling
Hedge funds use this to optimize portfolios with hundreds of assets
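A minimal NumPy sketch of the normal equation on synthetic data; the design matrix includes a column of ones for the intercept, and np.linalg.solve is used instead of forming the inverse explicitly:

import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # intercept + 2 predictors
beta_true = np.array([5.0, 2.0, -1.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # solves (X'X) beta = X'y
print(beta_hat)                                # close to [5, 2, -1]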
Model Evaluation Metrics
R-squared:
R² = 1 − RSS/TSS
Rule of thumb: R² ≈ 0.85 is strong in the social sciences; physics experiments often expect ≈ 0.95.
Adjusted R-squared:
R̄² = 1 − (1 − R²) · (n − 1)/(n − p − 1)
MSE:
MSE = (1/n) Σᵢ₌₁ⁿ (Yi − Ŷi)²
√MSE (RMSE) of $50,000 → typical prediction error of about $50,000
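A minimal NumPy sketch computing the three metrics from actual and predicted values; the numbers are illustrative:

import numpy as np

y     = np.array([450_000, 320_000, 525_000, 380_000], dtype=float)
y_hat = np.array([430_000, 340_000, 510_000, 400_000], dtype=float)
n, p  = len(y), 2                          # p = number of predictors (assumed)

rss = np.sum((y - y_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)
r2 = 1 - rss / tss
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
mse = rss / n
print(r2, adj_r2, mse, np.sqrt(mse))       # RMSE is the error in dollar units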
Regularization Techniques
Ridge Regression (L2):
min_β ∥Y − Xβ∥² + λ∥β∥₂²
Stabilizes Netflix recommendations
Lasso (L1):
min_β ∥Y − Xβ∥² + λ∥β∥₁
Selects key genes in cancer research
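A minimal scikit-learn sketch contrasting the two penalties on synthetic correlated features; alpha plays the role of λ and all values are illustrative:

import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)    # nearly collinear with x1
x3 = rng.normal(size=n)                    # irrelevant feature
X = np.column_stack([x1, x2, x3])
y = 3 * x1 + rng.normal(scale=0.5, size=n)

print(Ridge(alpha=1.0).fit(X, y).coef_)    # shrinks and spreads weight across x1 and x2
print(Lasso(alpha=0.1).fit(X, y).coef_)    # tends to zero out redundant/irrelevant features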
Introduction to Logistic Regression
Used for binary classification (Y ∈ {0, 1})
Models probability P(Y = 1|X )
Logit Transformation:
logit(p) = ln(p / (1 − p)) = Xβ
Titanic Survival Prediction
P(Survive) = 1 / (1 + e^−(2.5 + 1.8×Female))
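A minimal sketch evaluating this formula for both groups, using the slide's coefficients (the numbers are illustrative):

import numpy as np

def p_survive(female):
    return 1 / (1 + np.exp(-(2.5 + 1.8 * female)))

print(p_survive(1))   # ≈ 0.99 for Female = 1
print(p_survive(0))   # ≈ 0.92 for Female = 0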
Logistic Regression Model
Sigmoid Function:
P(Y = 1) = 1 / (1 + e^−Xβ)
Decision Boundary:
Typically at P(Y = 1) = 0.5
Corresponds to Xβ = 0
Maximum Likelihood Estimation
Likelihood Function:
L(β) = Πᵢ₌₁ⁿ P(Yi)^Yi · (1 − P(Yi))^(1−Yi)
Log-Likelihood:
ℓ(β) = Σᵢ₌₁ⁿ [Yi ln P(Yi) + (1 − Yi) ln(1 − P(Yi))]
Credit Scoring Application
Banks maximize this to predict loan default probabilities
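A minimal NumPy sketch computing the log-likelihood for one candidate β on synthetic data; an MLE routine searches for the β that maximizes this value:

import numpy as np

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(100), rng.normal(size=100)])   # intercept + one predictor
beta = np.array([-0.5, 1.2])                                # candidate coefficients
y = rng.integers(0, 2, size=100)                            # synthetic 0/1 outcomes

p = 1 / (1 + np.exp(-X @ beta))                             # P(Y = 1 | X) under beta
log_lik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
print(log_lik)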
Gradient Descent for Logistic Regression
Cost Function:
J(β) = −(1/n) Σᵢ₌₁ⁿ [Yi ln P(Yi) + (1 − Yi) ln(1 − P(Yi))]
Gradient Update:
βj := βj − α · ∂J/∂βj
Learning rate α crucial
Used in large-scale applications
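A minimal NumPy sketch of batch gradient descent for this cost, using the standard gradient (1/n)·Xᵀ(p − y); the data, learning rate, and iteration count are illustrative:

import numpy as np

rng = np.random.default_rng(4)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([-1.0, 2.0])
y = (rng.random(n) < 1 / (1 + np.exp(-X @ true_beta))).astype(float)

beta, alpha = np.zeros(2), 0.1
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ beta))   # current predicted probabilities
    grad = X.T @ (p - y) / n          # gradient of the cross-entropy cost
    beta -= alpha * grad              # update step with learning rate alpha
print(beta)                           # approaches the true coefficients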
Multicollinearity
Definition: High correlation among predictors
Problems:
Unstable coefficient estimates
Inflated standard errors
Detection:
Variance Inflation Factor (VIF)
VIFj = 1 / (1 − Rj²)
Marketing Data Example
Ad spend vs. impressions often correlated → Use regularization
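A minimal statsmodels sketch computing VIFs for two deliberately correlated predictors (synthetic data; a constant column is included because the function expects the full design matrix):

import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(5)
spend = rng.normal(size=200)
impressions = spend + rng.normal(scale=0.1, size=200)   # nearly collinear with spend
X = np.column_stack([np.ones(200), spend, impressions])

for j, name in [(1, "spend"), (2, "impressions")]:
    print(name, variance_inflation_factor(X, j))        # VIF well above 10 flags trouble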
Model Selection Criteria
AIC:
AIC = 2k − 2 ln(L)
Prefers simpler models
BIC:
BIC = ln(n)k − 2 ln(L)
Stronger penalty for complexity
COVID-19 Model Selection
Used to identify most predictive factors: masks vs. mobility
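A minimal statsmodels sketch comparing two candidate models by AIC and BIC on synthetic data; lower values are preferred:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 2 * x1 + rng.normal(size=n)                               # x2 is irrelevant here

m1 = sm.OLS(y, sm.add_constant(x1)).fit()                         # smaller model
m2 = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()  # larger model
print(m1.aic, m1.bic)
print(m2.aic, m2.bic)   # the complexity penalty usually favors the simpler model here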
Comparison of Regression Methods
Feature              Linear           Logistic
Response Type        Continuous       Binary
Equation             Y = Xβ + ϵ       logit(P) = Xβ
Error Distribution   Normal           Binomial
Estimation Method    OLS              MLE
Output               Prediction       Probability
Healthcare Example
Linear: Blood pressure levels
Logistic: Heart disease risk (Yes/No)
Practical Considerations
Feature Scaling: Standardization for regularization
Outlier Handling: Robust regression for financial data
Missing Data: Multiple imputation for clinical trials
Data Science Pipeline
1. Clean data → 2. Explore relationships → 3. Check assumptions → 4.
Model → 5. Validate
Case Study: Linear Regression
Housing Price Prediction:
Predict price based on square footage, bedrooms, location
Model interpretation:
Coefficient for bedrooms: $15,000 per bedroom
School quality adds $30,000
Evaluation: R² = 0.85, RMSE ≈ $50,000
Case Study: Logistic Regression
Credit Risk Assessment:
Predict default (Yes/No) based on income, debt, credit history
Model interpretation:
Odds ratio for income: 0.92 (per $1,000 increase)
Late payments 3x risk multiplier
Evaluation: Accuracy = 87%, AUC = 0.91
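A minimal sketch of the odds-ratio interpretation (OR = e^β); the 0.92 value comes from the slide, the rest is illustrative:

import numpy as np

beta_income = np.log(0.92)     # coefficient implied by an odds ratio of 0.92
print(beta_income)             # ≈ -0.083 per $1,000 of income
print(np.exp(beta_income))     # back to the odds ratio: 0.92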
Business Impact
Reduced bad loans by 22% while approving 15% more applicants
Summary
Linear Regression:
  Continuous outcomes
  OLS estimation
  Example: Demand forecasting
Logistic Regression:
  Binary outcomes
  MLE estimation
  Example: Fraud detection
Questions?