Unit II : Regression
What is Regression? Explain the types of Regression.
1. What is Regression?
Regression is a supervised learning technique in machine learning used to predict a
continuous numerical value (quantity) based on one or more input features.
Goal: Find the relationship between a dependent variable (target) and independent
variables (predictors).
Example: Predicting a house price using features like size, location, and number of
rooms.
Key Terms:
1. Dependent Variable (Target) – The value we want to predict (e.g., house price).
2. Independent Variables (Features) – Input factors affecting the prediction (e.g.,
locality, rooms).
Need for Regression:
Price prediction (houses, stocks, etc.)
Trend forecasting (sales, demand)
Risk analysis (medical or financial risk)
Decision-making based on patterns
2. Types of Regression
There are several types, but the main ones covered in your syllabus are:
A. Linear Regression
Definition: Models the relationship between dependent and independent variables
with a straight-line equation.
Formula:
y = b_0 + b_1 x
where b_0 = intercept, b_1 = slope.
Types:
1. Simple Linear Regression:
One independent variable.
Example: Predicting marks based on study hours.
2. Multiple Linear Regression:
Two or more independent variables.
Example: Predicting house price using size, location, and number of
bedrooms.
Advantages: Easy to interpret, works well for linear data.
Limitations: Cannot model non-linear relationships well.
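A minimal sketch of simple linear regression, assuming scikit-learn and a small made-up study-hours dataset (multiple linear regression uses the same API, just with more than one column in X):

```python
# Minimal sketch of simple linear regression (scikit-learn assumed;
# the study-hours -> marks values are made up for illustration).
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])   # hours studied (one feature)
y = np.array([35, 50, 60, 72, 85])        # marks obtained

model = LinearRegression()
model.fit(X, y)

print("intercept b0:", model.intercept_)
print("slope b1   :", model.coef_[0])
print("predicted marks for 6 hours:", model.predict([[6]])[0])
```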
B. Non-Linear Regression
Definition: Models situations where the relationship between variables is not a
straight line.
Formula: Could be polynomial, exponential, logarithmic, etc.
Example: Population growth, disease spread curves.
Advantages: Can handle complex patterns.
Limitations: More complex, harder to interpret, may require iterative methods.
C. Polynomial Regression
Definition: A special case of non-linear regression where the model is a polynomial
of the independent variable(s).
Formula:
y = b_0 + b_1 x + b_2 x^2 + ... + b_n x^n
Example: Predicting traffic flow across different times of the day.
Advantage: Fits curves better than linear regression.
Limitation: Risk of overfitting if degree is too high.
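A minimal polynomial regression sketch, assuming scikit-learn; the degree-2 choice and the toy traffic values are illustrative assumptions:

```python
# Polynomial regression sketch: expand x into [x, x^2, ...] and fit an
# ordinary linear model on the expanded features (scikit-learn assumed).
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

X = np.array([[1], [2], [3], [4], [5], [6]])   # hour of day (toy values)
y = np.array([10, 30, 70, 90, 60, 20])         # traffic flow (toy values)

# degree=2 keeps the curve simple; a very high degree risks overfitting
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(X, y)
print("predicted flow at 3.5:", poly_model.predict([[3.5]])[0])
```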
D. Stepwise Regression
Definition: Iteratively adds or removes variables to find the most relevant predictors.
Types:
1. Forward Selection – Start with no variables, add one by one.
2. Backward Elimination – Start with all variables, remove the least useful
ones.
Advantages: Reduces complexity, focuses on important variables.
Limitations: Can lead to overfitting, may miss the best combination of features.
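Stepwise selection can be approximated in code with greedy sequential feature selection; the sketch below assumes scikit-learn's SequentialFeatureSelector and synthetic data:

```python
# Forward selection sketch via greedy sequential feature selection
# (scikit-learn assumed; data are synthetic).
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       random_state=0)

selector = SequentialFeatureSelector(
    LinearRegression(),
    n_features_to_select=3,   # keep the 3 most useful predictors
    direction="forward",      # "backward" approximates backward elimination
)
selector.fit(X, y)
print("selected feature indices:", selector.get_support(indices=True))
```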
E. Decision Tree Regression
Definition: Uses a tree-like model to split data into smaller groups based on feature
values, predicting the average of the group.
Advantages: Easy to interpret, handles non-linear data.
Limitations: Can overfit; predictions are sensitive to small changes in the data.
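A minimal decision tree regression sketch, assuming scikit-learn and made-up salary data:

```python
# Decision tree regression sketch (scikit-learn assumed, toy data).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])   # years of experience
y = np.array([30, 32, 40, 45, 60, 62, 80, 85])           # salary (toy values)

# max_depth limits the number of splits, which also helps against overfitting
tree = DecisionTreeRegressor(max_depth=2, random_state=0)
tree.fit(X, y)
print(tree.predict([[5.5]]))   # prediction = average of the matching leaf group
```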
F. Random Forest Regression
Definition: An ensemble method that combines many decision trees to improve
accuracy.
Advantages: High accuracy, handles missing data, less overfitting.
Limitations: More complex, less interpretable than a single tree.
✅ Summary Table:
| Type | Relationship Shape | Complexity | Handles Non-Linear Data? | Example |
|---|---|---|---|---|
| Simple Linear | Straight line | Low | ❌ No | Marks vs Study Hours |
| Multiple Linear | Straight plane | Medium | ❌ No | House Price |
| Polynomial | Curved line | Medium | ✅ Yes | Traffic Flow |
| Stepwise | Variable | Medium | ✅ Sometimes | Feature Selection |
| Decision Tree | Piecewise splits | Medium | ✅ Yes | Salary Prediction |
| Random Forest | Many trees | High | ✅ Yes | Stock Price |
Differentiate multivariate regression and univariate regression.
| Aspect | Univariate Regression | Multivariate Regression |
|---|---|---|
| Number of Variables Considered | Deals with only one dependent variable and one independent variable (Simple Linear Regression), or one dependent variable and multiple independent variables (Multiple Regression is still univariate if there is only one dependent variable). | Deals with more than one dependent variable and multiple independent variables. |
| Purpose | Studies the relationship between a single dependent variable and predictors. | Studies relationships among multiple dependent variables simultaneously. |
| Complexity | Less complex, easier to visualize and interpret. | More complex, requires advanced statistical techniques. |
| Equation Form | y = b_0 + b_1 x (or extended for multiple predictors, but still one y). | Multiple equations, one for each dependent variable, e.g., y_1 = b_{01} + b_{11} x_1 + ... and y_2 = b_{02} + b_{12} x_1 + ... |
| Output | Predicts one output value. | Predicts multiple output values at once. |
| Example | Predicting a student's marks based on study hours. | Predicting the height and weight of a person based on age, diet, and exercise. |
In Short:
Univariate regression → 1 dependent variable
Multivariate regression → 2 or more dependent variables
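A small sketch of this difference, assuming scikit-learn (whose LinearRegression accepts a multi-column target) and made-up data:

```python
# Univariate vs multivariate regression sketch (scikit-learn assumed, toy data).
import numpy as np
from sklearn.linear_model import LinearRegression

# Features: age, diet score, exercise hours (made-up values)
X = np.array([[20, 5, 2], [25, 6, 3], [30, 4, 1], [35, 7, 4]])

# Univariate: one dependent variable (e.g., weight)
y_weight = np.array([60, 65, 72, 70])
uni_model = LinearRegression().fit(X, y_weight)

# Multivariate: two dependent variables predicted together (height, weight)
Y = np.array([[165, 60], [170, 65], [168, 72], [175, 70]])
multi_model = LinearRegression().fit(X, Y)
print(multi_model.predict([[28, 5, 2]]))   # returns [height, weight]
```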
Explain Bias-Variance Trade-off with respect to Machine Learning.
1. What is Bias?
Definition: The error caused by wrong assumptions in the learning algorithm.
High Bias → Underfitting
o Model is too simple.
o Misses important patterns in the data.
o Performs poorly on both training and test data.
Example: Trying to fit a straight line to curved data.
2. What is Variance?
Definition: The error caused by model sensitivity to small changes in the training
data.
High Variance → Overfitting
o Model is too complex.
o Fits noise as well as actual patterns.
o Performs well on training data but poorly on new data.
Example: Very deep decision tree memorizing the training set.
3. Bias–Variance Trade-off
Definition: The balance between bias and variance to achieve the best generalization
on unseen data.
Goal: Find the "sweet spot" where total error is minimal.
Reason for Trade-off:
o If a model is too simple → High bias, low variance → Underfits.
o If a model is too complex → Low bias, high variance → Overfits.
o We need a model that’s just complex enough to capture patterns without
memorizing noise.
4. Graphical Understanding
Imagine a curve showing:
Bias decreases as model complexity increases.
Variance increases as model complexity increases.
Total error is minimized at a middle point → This is the ideal trade-off.
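This curve can be reproduced numerically; the sketch below assumes scikit-learn, synthetic curved data, and polynomial degree as the stand-in for model complexity:

```python
# Bias-variance sketch: train vs validation error as complexity grows
# (scikit-learn assumed; data generated synthetically for illustration).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(-3, 3, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=80)   # curved data + noise

X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):   # too simple, about right, too complex
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(degree,
          mean_squared_error(y_tr, model.predict(X_tr)),    # training error
          mean_squared_error(y_val, model.predict(X_val)))  # validation error
```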
5. Summary Table
| Aspect | Low Bias & High Variance | High Bias & Low Variance |
|---|---|---|
| Model Complexity | Too complex | Too simple |
| Error on Training Data | Low | High |
| Error on Test Data | High | High |
| Problem Type | Overfitting | Underfitting |
| Example | High-degree polynomial curve | Straight line for curved data |
✅ Key Tip for Exams:
Think of it like Goldilocks’ porridge:
Too simple → underfit (high bias).
Too complex → overfit (high variance).
Just right → good trade-off, best performance.
Differentiate Ridge and Lasso Regression techniques
1. Basic Idea
Both Ridge and Lasso are regularization techniques used in regression to:
Reduce overfitting
Improve model generalization
Work by adding a penalty term to the regression cost (loss) function
3. Summary in Simple Words
Ridge → "Shrink but don’t delete" coefficients.
Lasso → "Shrink and sometimes delete" coefficients.
4. Quick Example
Imagine predicting house prices with 100 features:
Ridge will keep all features but reduce the importance of less useful ones.
Lasso will completely remove irrelevant features and keep only the most important
ones.
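This behaviour can be checked directly: Ridge applies an L2 penalty (sum of squared coefficients) and Lasso an L1 penalty (sum of absolute coefficients). The sketch below assumes scikit-learn, synthetic data with 100 features, and an arbitrary penalty strength:

```python
# Ridge vs Lasso sketch: Ridge shrinks all coefficients, Lasso sets many
# of them exactly to zero (scikit-learn assumed; data are synthetic).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

X, y = make_regression(n_samples=200, n_features=100, n_informative=10,
                       noise=10, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

print("Ridge coefficients exactly zero:", np.sum(ridge.coef_ == 0))  # usually 0
print("Lasso coefficients exactly zero:", np.sum(lasso.coef_ == 0))  # many removed
```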
Explain three evaluation metrics used for regression models.
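Three commonly chosen metrics are MAE, MSE, and R²; treating these as the intended three is an assumption. A minimal sketch computing them with scikit-learn on toy values:

```python
# Three common regression metrics on toy predictions (scikit-learn assumed;
# which three metrics are intended is an assumption).
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.8, 5.4, 2.0, 6.5]

print("MAE :", mean_absolute_error(y_true, y_pred))   # average absolute error
print("MSE :", mean_squared_error(y_true, y_pred))    # average squared error
print("R^2 :", r2_score(y_true, y_pred))              # fraction of variance explained
```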
Explain the Random forest Regression in detail.
1. What is Random Forest Regression?
Definition: A machine learning algorithm that predicts continuous numerical values
by combining results from multiple decision trees (an ensemble method).
Idea: Instead of relying on one decision tree (which might overfit), build many trees
and average their predictions.
Type: Supervised learning algorithm.
2. How It Works
Random Forest builds multiple decision trees in four main steps:
1. Bootstrap Sampling (Bagging)
o Randomly select samples with replacement from the dataset to train each
tree.
o Ensures each tree gets slightly different data.
2. Feature Sampling
o At each split in a tree, only a random subset of features is considered.
o Helps make trees diverse and less correlated.
3. Tree Building
o Each tree is grown independently using its sampled data and features.
o Uses Mean Squared Error (MSE) as splitting criterion for regression tasks.
4. Prediction Aggregation
o For regression, predictions from all trees are averaged to get the final output.
3. Example
Suppose we want to predict house price:
Tree 1 predicts ₹52 lakh
Tree 2 predicts ₹50 lakh
Tree 3 predicts ₹55 lakh
Final Prediction = (52 + 50 + 55) / 3 = ₹52.33 lakh
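A minimal Random Forest Regression sketch, assuming scikit-learn and made-up house data; the averaging over trees happens inside predict():

```python
# Random Forest Regression sketch (scikit-learn assumed, toy house data).
# Each tree trains on a bootstrap sample with random feature subsets;
# the final prediction is the average over all trees.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Features: size (sq. ft), bedrooms, age of house (made-up values)
X = np.array([[1000, 2, 10], [1500, 3, 5], [2000, 4, 2],
              [1200, 2, 8], [1800, 3, 3]])
y = np.array([50, 65, 90, 55, 80])   # price in lakh (toy values)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)
print("predicted price:", forest.predict([[1600, 3, 4]])[0])
print("feature importances:", forest.feature_importances_)
```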
4. Advantages
High Accuracy: Averaging multiple trees reduces error.
Handles Non-linear Relationships: Works well with complex patterns.
Robustness: Less affected by noise or missing values.
Feature Importance: Can tell which features impact predictions most.
Less Overfitting: Bagging and feature sampling reduce variance.
5. Disadvantages
Complexity: More difficult to interpret compared to a single tree.
Computation Time: Slower to train and predict if there are many trees.
Memory Usage: Requires storing multiple trees in memory.
6. When to Use
Large datasets with many features.
Problems with non-linear or complex relationships.
When avoiding overfitting is important.
✅ Quick Summary Table
| Aspect | Random Forest Regression |
|---|---|
| Type | Ensemble (Bagging) |
| Base Learner | Decision Tree |
| Output | Average of tree outputs |
| Strength | High accuracy, robust |
| Weakness | Less interpretable, slower |
Differentiate between Regression and Correlation.
| Aspect | Correlation | Regression |
|---|---|---|
| Meaning | Measures the strength and direction of the relationship between two variables. | Models the relationship between dependent and independent variables to make predictions. |
| Purpose | To see if variables are related and how strongly. | To predict the value of a dependent variable based on one or more independent variables. |
| Output | A single value (correlation coefficient, e.g., Pearson's r) between -1 and +1. | An equation that describes the relationship, e.g., y = b_0 + b_1 x. |
| Direction of Relationship | Shows positive, negative, or no correlation. | Shows how much the dependent variable changes when the independent variable changes. |
| Prediction | ❌ Cannot be used for prediction. | ✅ Can be used for prediction. |
| Causation | Does not imply causation. | Does not prove causation, but can help investigate possible causal effects. |
| Mathematical Expression | Single coefficient r. | Equation with coefficients (slope, intercept). |
| Example | Correlation between ice cream sales and temperature. | Predicting house price based on size and location. |
✅ Key Tip to Remember:
Correlation → "Are they related?" (strength & direction only)
Regression → "How are they related?" + "Can we predict?"
What is underfitting and overfitting in Machine Learning? Explain the
techniques to reduce overfitting.
1. Underfitting
Definition:
Happens when a model is too simple to capture the underlying patterns in data.
Performs poorly on both training data and test data.
Causes:
Model complexity is too low.
Not enough training time (early stopping too soon).
Missing important features in the dataset.
Incorrect assumptions (e.g., using linear regression for non-linear data).
Characteristics:
High Bias, Low Variance.
Predictions are inaccurate even on training data.
Example:
Using a straight line (linear model) to fit a dataset with a clear curve.
2. Overfitting
Definition:
Happens when a model memorizes the training data, including noise and outliers.
Performs well on training data but poorly on unseen (test) data.
Causes:
Model complexity is too high.
Too many features without proper regularization.
Training for too many epochs without monitoring performance.
Small dataset with high model capacity.
Characteristics:
Low Bias, High Variance.
Training error is low, but test error is high.
Example:
Very deep decision tree fitting every point in training data, including noise.
3. Bias–Variance View
Underfitting → High Bias, Low Variance.
Overfitting → Low Bias, High Variance.
Goal: Find the right bias–variance trade-off for best generalization.
4. Techniques to Reduce Overfitting
Here are the main methods used in practice:
A. Simplify the Model
Reduce the number of features (Feature Selection).
Use fewer parameters.
B. Regularization
Add penalty terms to control coefficient size:
o Ridge Regression (L2 penalty)
o Lasso Regression (L1 penalty)
o Elastic Net (combination of L1 & L2).
C. Cross-Validation
Use k-fold cross-validation to check performance on different subsets of data and
prevent reliance on a single train/test split.
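A minimal k-fold cross-validation sketch, assuming scikit-learn, a Ridge model, and synthetic data:

```python
# k-fold cross-validation sketch to check generalization
# (scikit-learn assumed; data are synthetic).
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=150, n_features=20, noise=15, random_state=0)

# 5-fold CV: the model is trained/validated on 5 different splits,
# so the score does not depend on a single lucky train/test split.
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2")
print("R^2 per fold:", scores)
print("mean R^2   :", scores.mean())
```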
D. Early Stopping
Stop training when validation error starts increasing, even if training error is
decreasing.
E. Pruning (in Decision Trees)
Remove unnecessary branches to simplify the tree.
F. Dropout (in Neural Networks)
Randomly drop some neurons during training to prevent over-dependence on certain
paths.
G. Data Augmentation
Create more training data artificially (especially for images, text) by transformations
like rotation, flipping, cropping, etc.
H. Increase Training Data
More diverse training samples help the model generalize better.
5. Quick Comparison Table
| Feature | Underfitting | Overfitting |
|---|---|---|
| Model Complexity | Too simple | Too complex |
| Bias | High | Low |
| Variance | Low | High |
| Training Error | High | Low |
| Test Error | High | High |
| Fix | Increase complexity | Reduce complexity / Regularize |
Explain Elastic Net regression in Machine Learning.
1. What is Elastic Net Regression?
Definition:
Elastic Net is a regularization technique that combines Ridge Regression (L2
penalty) and Lasso Regression (L1 penalty) into a single model.
Purpose:
To handle limitations of both Ridge and Lasso and work well when:
o There are many correlated features.
o We need both feature selection and coefficient shrinkage.
3. How It Works
L1 (Lasso) part → forces some coefficients to exactly zero (feature selection).
L2 (Ridge) part → shrinks remaining coefficients smoothly (reduces variance).
This combination helps when:
o Some features are irrelevant.
o Some features are highly correlated.
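A minimal Elastic Net sketch, assuming scikit-learn and synthetic correlated features; the alpha and l1_ratio values are arbitrary choices for illustration:

```python
# Elastic Net sketch (scikit-learn assumed; data are synthetic).
# l1_ratio controls the L1/L2 mix: 1.0 = pure Lasso, 0.0 = pure Ridge.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=200, n_features=50, n_informative=8,
                       noise=10, random_state=0)

enet = ElasticNet(alpha=1.0, l1_ratio=0.5)   # alpha = overall penalty strength
enet.fit(X, y)

print("coefficients set exactly to zero:", np.sum(enet.coef_ == 0))
print("largest remaining coefficient  :", np.max(np.abs(enet.coef_)))
```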
4. Advantages
Handles multicollinearity (like Ridge).
Performs feature selection (like Lasso).
Works well when:
o Number of predictors > Number of observations.
o Features are highly correlated.
More stable than Lasso when predictors are correlated.
5. Disadvantages
Slightly more complex to tune because there are two hyperparameters (λ and α).
Requires careful cross-validation to find the best values.
6. Example Use Case
Genomics: Thousands of gene features, many correlated, but only some relevant for
predicting a disease risk.
Finance: Predicting stock returns where many economic indicators are correlated.
7. Quick Summary Table
| Feature | Ridge | Lasso | Elastic Net |
|---|---|---|---|
| Penalty | L2 | L1 | L1 + L2 |
| Feature Selection | No | Yes | Yes |
| Handles Multicollinearity | Yes | No | Yes |
| Coefficient Shrinking | Yes | Yes (some to zero) | Yes |
| Best When | All features useful | Few features important | Many correlated features & selection needed |