Unit-2
Advanced Supervised Algorithms
2.1 Ridge Regression
2.2 Lasso Regression
2.3 Decision tree classifier
2.4 Random forest classifier
2.5 Supervised model optimization Techniques
Ridge Regression-
What is Ridge Regression?
1. Ridge Regression is a method used to improve linear regression models.
2. It is also called L2 regularization.
3. It helps when your independent variables (inputs) are highly correlated — this issue is called
multicollinearity.
4. Multicollinearity can make the model’s predictions unstable and the coefficients unreliable.
5. Ridge Regression also helps to prevent overfitting, which happens when a model learns the
training data too well (including noise) and performs poorly on new data.
Purpose?
Example-1:-
We are building a model to predict house prices based on features like:
Size of the house (in square feet)
Number of bedrooms
Number of bathrooms
Total number of rooms
Issue: Multicollinearity
Total rooms, bedrooms, and bathrooms are highly correlated.
(E.g., more bedrooms usually means more total rooms.)
If we use ordinary linear regression, the model may give very large and unstable
coefficients to some features to compensate for that overlap.
A small increase in the number of bedrooms also increases total rooms, so the model
struggles to tell which feature is actually causing the change in price.
Example-2:-
We want to predict students’ final exam scores using clean, meaningful features:
Hours studied
Attendance rate (%)
Assignment scores
Midterm exam score
Class participation score
We decide to use a very complex model — say, a polynomial regression of
degree 5 or 6. Even with clean, useful features, the model can over-learn small
patterns, noise, or outliers in the training data.
Issue: Overfitting
The model fits the training data perfectly, even capturing tiny ups and downs in
scores, but when tested on new students, the predictions are wildly inaccurate.
This is classic overfitting due to high model complexity: the model has learned
the training data too well and cannot generalize.
How to Fix it?
1. Use a simpler model (e.g., linear regression instead of polynomial of degree 6)
2. Apply regularization (e.g., Ridge or Lasso) to control the size of coefficients
3. Use cross-validation to find a model that works well on unseen data
Let’s Focus on Ridge……
Ridge Regression helps by simplifying the model and reducing the
influence of each feature (input variable).
It uses a method called L2 regularization, which means it adds a penalty
based on the sum of the squares of the coefficients.
This penalty is added to the loss function (a formula that tells how
wrong the model's predictions are).
The loss function for Ridge Regression is:
Loss = RSS + λ × ∑(bᵢ²)
where:
→ RSS/MSE = Residual Sum of Squares (error)
→ λ (lambda) = regularization strength
→ bᵢ = model coefficients
Impact of Lambda (λ) on the Model-
λ = 0 → Ridge becomes ordinary linear regression (no regularization)
λ very high → All coefficients shrink close to 0 (underfitting may occur)
Moderate λ → Best balance between bias and variance.
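To make the effect of λ concrete, here is a minimal sketch (not from these notes) using scikit-learn, where the Ridge penalty strength is called alpha; the house data below is made up for illustration:

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
size = rng.uniform(500, 3000, 100)                 # house size in square feet (synthetic)
bedrooms = size / 800 + rng.normal(0, 0.3, 100)    # deliberately correlated with size
X = np.column_stack([size, bedrooms])
price = 50 * size + 10000 * bedrooms + rng.normal(0, 5000, 100)

for lam in [0.01, 1, 100, 10000]:
    model = Ridge(alpha=lam).fit(X, price)         # alpha plays the role of lambda
    print(f"lambda={lam:>8}: coefficients={model.coef_}")

A very small alpha behaves almost like ordinary linear regression, while a very large alpha shrinks both coefficients close to zero (risking underfitting).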
Dataset Example: -
Step-1
Step-2 Apply Linear Regression-
Build a model-
Step-3 Problem — Multicollinearity
Bedrooms and Total Rooms are highly correlated.
The model gives:
Large +95 to bedrooms
Large –80 to total rooms
It’s compensating one for the other — an unstable model!
Step 4: Evaluate the Model
Negative price! This is not realistic.
This shows the instability of coefficients due to multicollinearity.
Step 5: MSE
Let's say MSE = 430
Step 6: Apply Ridge Regression
Let's set λ=10
Step 7: Ridge Coefficients-
Step 8: Build model
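The house-price table used in Steps 1-8 is not reproduced above, so the sketch below uses a small made-up dataset with the same structure (bedrooms and total rooms almost perfectly correlated) to show how ordinary least squares can produce unstable coefficients while Ridge with λ = 10 keeps them smaller and more stable:

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(42)
bedrooms = rng.integers(1, 6, size=30).astype(float)
total_rooms = bedrooms + rng.integers(2, 4, size=30)      # highly correlated with bedrooms
X = np.column_stack([bedrooms, total_rooms])
price = 20 * bedrooms + 15 * total_rooms + rng.normal(0, 5, size=30)   # synthetic prices (in thousands)

ols = LinearRegression().fit(X, price)
ridge = Ridge(alpha=10).fit(X, price)                     # lambda = 10, as in Step 6

print("OLS coefficients:  ", ols.coef_)                   # may come out large and opposite in sign
print("Ridge coefficients:", ridge.coef_)                 # shrunk toward zero, more stable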
Lasso Regression-
Lasso Regression stands for Least Absolute Shrinkage and Selection Operator.
Like Ridge, it’s a regularization technique used to prevent overfitting and improve
model generalization.
It is also known as L1 regularization.
Lasso adds a penalty based on the absolute value of the coefficients.
Unlike Ridge, Lasso can shrink some coefficients to exactly zero, effectively performing
feature selection.
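Before the worked example, here is a minimal sketch (assumed, not from these notes) showing Lasso's feature-selection behaviour with scikit-learn on synthetic data where only two of five features actually matter:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                                      # 5 candidate features
y = 4 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.5, size=200)    # only the first two matter

lasso = Lasso(alpha=0.5).fit(X, y)            # alpha plays the role of lambda
print("Lasso coefficients:", lasso.coef_)     # the unimportant features end up at exactly 0.0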
Example:-
Step-1
Step 2: Apply Simple Linear Regression
For Student ‘C’
Let’s say the actual score was 92
Step 4: Apply Lasso Regression (λ = 1)
Difference Between Ridge and Lasso Regression
Full Name: Ridge Regression (L2 Regularization) vs. Lasso Regression (L1 Regularization)
Penalty Term: Ridge adds λ × ∑(coefficients²); Lasso adds λ × ∑|coefficients|
Purpose: Ridge reduces model complexity and avoids overfitting; Lasso reduces model complexity and selects key features
Feature Selection: Ridge keeps all features; Lasso can remove unimportant features (sets their coefficients to 0)
When to Use: Ridge when all features are likely useful but the model may overfit; Lasso when only a few features are important
Output Model: Ridge retains all features with smaller coefficients; Lasso gives a simpler model (some features removed)
Coefficient Shrinking: Ridge shrinks all coefficients toward zero; Lasso shrinks some coefficients to exactly zero
Multicollinearity Handling: Ridge is good at handling multicollinearity; Lasso can help, but may drop one of the correlated features
Interpretability: Ridge is less interpretable (many features remain); Lasso is more interpretable (only key features remain)
High λ (lambda) Impact: Ridge makes coefficients very small, but none become zero; Lasso sets some coefficients exactly to zero
Model Complexity: Ridge is moderate (all features included); Lasso is simpler (fewer features used)
Computational Cost: Ridge is faster (no feature elimination); Lasso is slightly slower (due to feature selection)
Use Case Example: Ridge for predicting house prices using size, location, etc.; Lasso for selecting key genes from thousands in medical data
Drawback: Ridge does not simplify the model (all features are retained); Lasso may remove useful features if λ is too high
Mathematical Optimization: Ridge uses the L2 norm ‖w‖²; Lasso uses the L1 norm ‖w‖₁
Decision tree classifier-
Decision tree classifiers are a fundamental type of supervised
machine learning algorithm used for classification tasks.
It works by splitting the data into branches based on feature
values, creating a tree-like model of decisions.
They create a model that predicts the value of a target variable by
learning simple decision rules inferred from the data features.
Terms-
Root Node: The top decision node in a tree.
Decision Node: A node that splits the data based on a feature.
Leaf Node: A terminal node that gives the classification output.
Splitting: Dividing data based on a condition.
Entropy: A measure of disorder (impurity) in the data, used to determine the best feature to split on.
Pruning: Removing parts of the tree to prevent overfitting.
Working-
1. Start at the Top (Root Node): The decision tree begins by asking an
important question based on the data.
2. Yes or No Questions: It then asks simple yes-or-no questions to divide
the data into smaller groups.
3. Follow the Branches:
o If the answer is yes, it goes one way.
o If the answer is no, it goes another way.
4. Keep Splitting: It keeps asking questions at each step to break the data
down further.
5. Final Answer (Leaf Node): When no more questions are needed, it
reaches a final decision — like classifying something as a "Yes" or "No"
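As a quick illustration of this workflow, here is a minimal sketch using scikit-learn's DecisionTreeClassifier on its built-in Iris dataset (an illustrative choice, not the example dataset used later in these notes):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)                                   # learns yes/no style questions on the features

print(export_text(tree, feature_names=load_iris().feature_names))   # the question asked at each node
print("Predicted class:", tree.predict([[5.1, 3.5, 1.4, 0.2]]))     # follow the branches to a leaf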
Splitting Criteria in Decision Trees-
When building a decision tree, it’s important to choose the best feature to
split the data at each step. This is done using splitting criteria, which help the
tree decide how to divide the data effectively.
Entropy
Measures the amount of disorder or uncertainty in the data.
The tree tries to split the data in a way that reduces entropy the most.
The reduction in entropy achieved by a split is called Information Gain: the more
information we gain from a split, the better.
Pruning-
Pruning is a way to simplify the decision tree by removing extra
branches that don’t add much value.
It is used to avoid overfitting — when the tree learns the training data
too well, including the noise or random patterns.
It Improves accuracy on new (unseen) data by focusing on general
patterns.
Reduces complexity, making the model faster and easier to understand.
Helps the tree generalize better, instead of just memorizing the training
data.
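A small sketch of pruning in practice, using scikit-learn's cost-complexity pruning parameter ccp_alpha on a built-in dataset (the dataset and alpha values are arbitrary choices for illustration): larger ccp_alpha removes more branches, trading a little training accuracy for a simpler tree.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for alpha in [0.0, 0.01, 0.03]:                   # 0.0 means no pruning
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_tr, y_tr)
    print(f"ccp_alpha={alpha}: leaves={tree.get_n_leaves()}, "
          f"train acc={tree.score(X_tr, y_tr):.2f}, test acc={tree.score(X_te, y_te):.2f}")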
Example:-
Step-1
Total: 14 instances
Target: PlayTennis (Yes = 9, No = 5)
Step 2: Calculate Entropy of the full dataset-
Entropy(S) = −(9/14) log₂(9/14) − (5/14) log₂(5/14) ≈ 0.940
Step 3: Calculate Information Gain for each feature
Feature 1: Outlook
Values: Sunny, Overcast, Rain
→ Split by Sunny:
5 samples: [No, No, No, Yes, Yes] → 2 Yes, 3 No
→ Split by Overcast:
4 samples: All Yes → Entropy = 0 (pure class)
→ Split by Rain:
5 samples: [Yes, Yes, No, Yes, No] → 3 Yes, 2 No
Feature 2: Humidity
Values: High, Normal
High: [No, No, Yes, Yes, No, Yes, No] → 3 Yes, 4 No
Normal: [Yes, No, Yes, Yes, Yes, Yes, Yes] → 6 Yes, 1 No
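The entropy and information-gain calculations for Steps 2-3 can be checked with a few lines of Python, using the class counts listed above:

from math import log2

def entropy(pos, neg):
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:                                  # 0 * log2(0) is treated as 0
            p = count / total
            result -= p * log2(p)
    return result

E_full = entropy(9, 5)                             # entropy of the full dataset
# Outlook branches: Sunny (2 Yes, 3 No), Overcast (4 Yes, 0 No), Rain (3 Yes, 2 No)
E_outlook = (5/14) * entropy(2, 3) + (4/14) * entropy(4, 0) + (5/14) * entropy(3, 2)
# Humidity branches: High (3 Yes, 4 No), Normal (6 Yes, 1 No)
E_humidity = (7/14) * entropy(3, 4) + (7/14) * entropy(6, 1)

print(f"Entropy(S)        = {E_full:.3f}")               # about 0.940
print(f"Gain(S, Outlook)  = {E_full - E_outlook:.3f}")   # about 0.247, the best split
print(f"Gain(S, Humidity) = {E_full - E_humidity:.3f}")  # about 0.152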
Outlook has the highest Information Gain, so it becomes the root node, with branches Sunny, Overcast, and Rain.
Step-4 Split 'Sunny' Branch Further
Step- 5 Calculate Information Gain for Each Feature
Split on Humidity-
Split on Temperature-
Split on Wind-
Step-6
Overcast: All Yes → Leaf node
Step-7 Split 'Rain' Branch Further
Step-8 Calculate Entropy of the "Rain" Subset-
Step-9 Calculate Information Gain for Each Feature
Split on Wind-
Split on Temperature-
Random forest classifier-
A Random Forest Classifier is a powerful and widely used machine learning
algorithm.
Random forests can be used for solving regression (numeric target variable) and
classification (categorical target variable) problems, but mostly for classification.
Random forests are an ensemble method, meaning they combine the predictions of
multiple smaller models.
Each of the smaller models in the random forest ensemble is a decision tree.
It works by creating many small decision trees (like mini decision-makers), and then
all these trees vote on the final decision.
It outputs the mode (most common class) of the classes predicted by individual
trees.
It is called "random forest" because it uses many trees (a forest), and each tree is
built in a random way.
Terms-
Ensemble Learning: Combines predictions from multiple models to improve accuracy
and robustness.
Decision Trees: Basic building blocks. Each tree is trained on a random subset of the
data and features.
Bagging (Bootstrap Aggregating): Random Forests use bagging to train trees on
different subsets of the training data.
Feature Randomness: At each split in a tree, a random subset of features is
considered — reducing correlation between trees.
Working-
Suppose we have a complex problem to solve, and we gather a group of experts from
different fields to provide their input. Each expert provides their opinion based on their
expertise and experience. Then, the experts would vote to arrive at a final decision.
In a random forest classification, multiple decision trees are created using different random
subsets of the data and features. Each decision tree is like an expert, providing its opinion
on how to classify the data. Predictions are made by calculating the prediction for each
decision tree and then taking the most popular result.
1. Bootstrap Sampling: Random rows are picked (with replacement) to train each tree.
2. Random Feature Selection: Each tree uses a random set of features (not all features).
3. Build Decision Trees: Trees split the data using the best feature from their random set.
Splitting continues until a stopping rule is met (like max depth).
4. Make Predictions: Each tree gives its own prediction.
5. Majority Voting: The final prediction is the one most trees agree on.
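A minimal sketch of the same workflow with scikit-learn (the dataset and settings are illustrative assumptions, not part of these notes):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,        # number of trees in the forest
    max_features="sqrt",     # random subset of features considered at each split
    bootstrap=True,          # each tree sees a bootstrap sample of the rows
    random_state=0,
).fit(X_tr, y_tr)

print("Test accuracy:", forest.score(X_te, y_te))
print("One tree's prediction:  ", forest.estimators_[0].predict(X_te[:1]))
print("Forest's majority vote: ", forest.predict(X_te[:1]))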
Benefits-
Random Forest can handle large datasets and high-dimensional data.
By combining predictions from many decision trees, it reduces the risk of overfitting
compared to a single decision tree.
It is robust to noisy data and works well with categorical data.
Difference Between Decision Tree and Random Forest-
1. Overfitting: Decision trees normally suffer from the problem of overfitting if they are allowed to grow without any control. Random forests are created from subsets of the data, and the final output is based on averaging or majority voting, so the problem of overfitting is taken care of.
2. Speed: A single decision tree is faster in computation; a random forest is comparatively slower.
3. Approach: When a data set with features is taken as input by a decision tree, it formulates a set of rules to make predictions. A random forest randomly selects observations and features, builds many decision trees, and takes the average (or majority) result; it does not rely on a single set of rules.
Example: -
Step-1
Step-2 Build 3 Decision Trees (Using Bootstrapped Samples)
We'll randomly select subsets (with replacement) to create Tree 1, Tree 2, Tree 3.
Step 3: Predict a New Day
Step 4: Voting
Majority Vote → Final Prediction: YES (Play Tennis)
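A tiny sketch of the majority vote in Step 4; the three per-tree predictions below are placeholders for illustration, not the actual outputs of Trees 1-3:

from collections import Counter

tree_predictions = ["Yes", "Yes", "No"]                 # one vote per decision tree
final = Counter(tree_predictions).most_common(1)[0][0]  # the most common class wins
print("Final prediction:", final)                        # -> "Yes"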
Supervised model optimization Techniques-
Optimizing a supervised learning model involves a multifaceted approach that considers
various aspects of the data, the chosen algorithm, and the training process.
The goal is to build a model that performs well on unseen data, generalizes effectively to
new situations, and avoids issues like overfitting or underfitting.
Some common supervised model optimization techniques:
Data preprocessing and feature engineering-
1. Cleaning and Handling Missing Values
2. Feature Scaling
3. Feature Engineering
Algorithm selection and tuning
1. Choosing the Right Learning Algorithm
2. Hyperparameter Tuning
3. Regularization Techniques
4. Gradient Descent Optimization
Model evaluation and refinement
1. Cross-Validation
2. Ensemble Methods
3. Continuous Evaluation and Adjustment
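As a sketch of how cross-validation and hyperparameter tuning fit together in practice (the model, grid values, and dataset below are illustrative assumptions):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,                        # 5-fold cross-validation for each combination
    scoring="accuracy",
)
search.fit(X, y)

print("Best hyperparameters:", search.best_params_)
print("Best cross-validated accuracy:", round(search.best_score_, 3))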
Will Study-
1. Gradient Descent
2. Hyperparameter Tuning
1. Gradient Descent
Cost Function-
It is a function that measures the performance of a model for any given data. Cost
Function quantifies the error between predicted values and expected values and
presents it in the form of a single real number.
After making a hypothesis (an initial guess) with starting parameters, we calculate the cost
function. Then, with the goal of reducing the cost, we modify the parameters using the
gradient descent algorithm on the given data.
Here’s the mathematical representation for it (using mean squared error):
J(w, b) = (1/n) × ∑ᵢ (ŷᵢ − yᵢ)², where ŷᵢ = w·xᵢ + b is the model's prediction for the i-th example.
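A small sketch of this cost function in code, with made-up numbers:

import numpy as np

def mse_cost(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)       # average of the squared errors

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.5, 6.0])
print("Cost (MSE):", mse_cost(y_pred, y_true))   # 0.5 = (0.25 + 0.25 + 1.0) / 3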
learning rate-
The learning rate is a key parameter in the gradient descent algorithm. It controls how big a
step the model takes when updating its parameters during training.
If the learning rate is too small, the model learns very slowly and takes many steps
to reach the optimal solution.
If the learning rate is too large, the model might skip over the best solution or even
diverge, making the training unstable.
What is Gradient Descent?
Gradient means slope (the derivative) in one dimension.
A vector is a mathematical object that has two main properties:
Magnitude (how big it is)
Direction (where it's pointing)
In higher dimensions, the gradient is a vector showing how much the function increases in each
direction.
The negative gradient is therefore a vector pointing along the steepest downward path at a
given point.
Descent means going downward or decreasing.
A local minimum is a point in a function where the value is lower than at all nearby points,
but not necessarily the lowest overall.
Gradient descent is an optimization algorithm commonly used in machine learning to
minimize a cost function by iteratively adjusting model parameters.
The goal is to find the set of parameters that reduces the difference between the
model's predictions and the actual outputs, thereby improving performance.
The algorithm works by computing the gradient of the cost function, which indicates
the direction of steepest increase.
To minimize the cost, gradient descent moves in the opposite direction—along the
negative gradient.
In each iteration, the model's parameters are updated based on this negative
gradient.
The learning rate, a key hyperparameter, controls the step size of these updates,
affecting both the speed and stability of convergence.
Gradient descent is a versatile method applicable to various machine learning
models, including linear and logistic regression, neural networks, and support vector
machines, making it a foundational tool for model optimization.
Working: -
1. Start with an initial guess for the model's parameters (weights).
2. Calculate the gradient (i.e., the slope or direction of the steepest increase of the loss
function).
3. Update the parameters by moving in the opposite direction of the gradient (to
reduce the loss).
4. Repeat until you reach a minimum (ideally, the lowest point).
Formula: -
Two important points-
Which direction to go (downhill, along the negative gradient)
How big a step to take (controlled by the learning rate α)
For one parameter w:
w := w − α × ∂J/∂w
For multiple weights:
wⱼ := wⱼ − α × ∂J/∂wⱼ (the same update is applied to every weight wⱼ, and to the bias b)
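A hedged end-to-end sketch of these update rules on a tiny made-up dataset (roughly y = 2x), using the mean-squared-error gradients for w and b:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

w, b = 0.0, 0.0            # initial guesses for the parameters
alpha = 0.05               # learning rate
n = len(x)

for step in range(200):
    y_pred = w * x + b                        # predictions with the current parameters
    dw = (2 / n) * np.sum((y_pred - y) * x)   # gradient of MSE with respect to w
    db = (2 / n) * np.sum(y_pred - y)         # gradient of MSE with respect to b
    w -= alpha * dw                           # move against the gradient
    b -= alpha * db
    if step % 50 == 0:
        print(f"step {step}: w={w:.3f}, b={b:.3f}, loss={np.mean((y_pred - y) ** 2):.4f}")

print(f"final: w={w:.3f}, b={b:.3f}")         # should approach w ≈ 2, b ≈ 0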
Example:-
Step-by-Step: Simple Linear Regression with Gradient Descent:-
Step 1: Dataset
Step 2: Initialize
Step 3: Compute Predictions and Loss
Step 4: Compute Gradients-
Formula for w: ∂J/∂w = (2/n) × ∑ᵢ (ŷᵢ − yᵢ) × xᵢ
Formula for b: ∂J/∂b = (2/n) × ∑ᵢ (ŷᵢ − yᵢ)
Step-5 Update Parameters-
Step 6: New Predictions and Loss
Final conclusion-