C1 Supervised Machine Learning Week 3

This document covers the fundamentals of classification in supervised machine learning, focusing on logistic regression as a solution to the limitations of linear regression for binary classification tasks. It discusses the importance of cost functions, regularization techniques to prevent overfitting, and the implementation of gradient descent for optimizing logistic regression models. Additionally, it highlights the significance of finding a balance between model complexity and generalization to new data.


Supervised Machine Learning: Regression and Classification
Week 3 – CLASSIFICATION
Classification
 Classification problems involve predicting a limited number of
possible outcomes, such as determining if an email is spam (yes or
no) or if a financial transaction is fraudulent (true or false).
 Binary classification refers to problems with only two possible
outputs, often represented as 0 (negative class) and 1 (positive
class).
Limitations of Linear Regression for Classification
 Linear regression predicts a continuous range of values, which is not ideal for classification tasks where the output should be categorical.
 Adding new data points can shift the decision boundary inappropriately, leading to incorrect classifications.
Introduction to Logistic Regression
 Logistic regression is introduced as a more effective algorithm for
binary classification, ensuring outputs remain between 0 and 1.
 Despite its name, logistic regression is used for classification rather
than regression, addressing the limitations of linear regression in
these scenarios.
Optional lab: Classification (linear regression approach)
 The Sigmoid function, defined as ( g(z) = \frac{1}{1 + e^{-z}} ),
transforms the linear combination of features into a probability.
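
As a quick illustration, here is a minimal NumPy sketch of the sigmoid as defined above (not taken from the lab itself):

    import numpy as np

    def sigmoid(z):
        # g(z) = 1 / (1 + e^(-z)) maps any real number into the range (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    # Large negative inputs give values near 0, large positive inputs values near 1
    print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # approx. [0.0000454, 0.5, 0.9999546]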
Optional Lab: Logistic Regression - Sigmoid or Logistic Function

Understanding Logistic Regression
 The logistic regression model computes outputs in two steps: first, calculating ( z = w \cdot x + b ), and then applying the Sigmoid function ( g(z) ) to obtain the probability that ( y = 1 ) given ( x ).
 A common threshold for making predictions is 0.5; if ( f(x) \geq 0.5 ),
then ( y ) is predicted as 1, otherwise as 0.
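
A minimal sketch of this two-step computation and the 0.5 threshold; the parameter values and the helper name predict are made up for illustration:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def predict(x, w, b, threshold=0.5):
        # Step 1: linear combination z = w . x + b
        z = np.dot(w, x) + b
        # Step 2: sigmoid gives the probability that y = 1 given x
        prob = sigmoid(z)
        # Predict 1 if the probability is at least the threshold, otherwise 0
        return int(prob >= threshold), prob

    # Illustrative example with made-up parameter values
    w = np.array([1.5, -0.5])
    b = -1.0
    print(predict(np.array([2.0, 1.0]), w, b))  # -> (1, ~0.82)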

Complex Decision Boundaries

 By incorporating polynomial features, logistic regression can model more complex decision boundaries, such as circles or ellipses, allowing it to fit intricate data patterns.
 The decision boundary can become non-linear with higher-order polynomial terms, enabling the model to predict ( y = 1 ) or ( y = 0 ) based on more complex relationships between features.
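
A hedged sketch of this idea with scikit-learn; the toy circular dataset and the choice of degree-2 features are assumptions made for illustration, not the lab's data:

    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LogisticRegression

    # Toy data: label is 1 when the point lies inside a circle of radius 1
    rng = np.random.default_rng(0)
    X = rng.uniform(-2, 2, size=(200, 2))
    y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1.0).astype(int)

    # Degree-2 features (x1, x2, x1^2, x1*x2, x2^2) let a linear boundary in
    # feature space become a circle or ellipse in the original x-space
    poly = PolynomialFeatures(degree=2, include_bias=False)
    X_poly = poly.fit_transform(X)

    model = LogisticRegression().fit(X_poly, y)
    print(model.score(X_poly, y))  # training accuracy; typically close to 1.0 here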

Optional Lab: Logistic Regression - Decision Boundary (plotting the decision boundary)
Cost Functions
 The cost function measures how well a set of parameters fits the training data, guiding the selection of better parameters.
 The squared error cost function is not suitable for logistic regression because, with the sigmoid model, it becomes non-convex, so gradient descent can get stuck in local minima during optimization.
New Loss Function
 A new loss function is introduced for logistic regression, defined
based on the true label and the predicted probability. The loss
function is designed to be convex, ensuring that gradient descent
can reliably converge to the global minimum.
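
Concretely, the loss described here is the standard logistic loss, which can be written piecewise as:

( L(f(x^{(i)}), y^{(i)}) = -\log(f(x^{(i)})) ) if ( y^{(i)} = 1 )
( L(f(x^{(i)}), y^{(i)}) = -\log(1 - f(x^{(i)})) ) if ( y^{(i)} = 0 )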
Analyzing Loss for Different Labels
 When the true label is 1, the loss incentivizes accurate predictions
close to 1, resulting in a low loss value. Conversely, when the true
label is 0, the loss increases significantly as the predicted probability
approaches 1, penalizing incorrect predictions.

Optional lab: Logistic loss function
Simplified Loss Function
 The loss function can be expressed as a single equation: ( L(f(x^{(i)}), y^{(i)}) = -y^{(i)} \log(f(x^{(i)})) - (1 - y^{(i)}) \log(1 - f(x^{(i)})) ), which reduces to the two cases above depending on whether ( y^{(i)} ) is 1 or 0.
Cost Function for Logistic Regression
 The cost function ( J ) is the average loss across the training set: ( J(w, b) = \frac{1}{m} \sum_{i=1}^{m} L(f(x^{(i)}), y^{(i)}) ).
 The derived cost function is commonly used in training logistic
regression and is based on the principle of maximum likelihood
estimation.
 This cost function is convex, which is beneficial for optimization,
ensuring that gradient descent can effectively find the best
parameters.
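
A minimal NumPy sketch of this averaging (function and variable names are illustrative, not the lab's):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def compute_cost(X, y, w, b):
        # Average logistic loss over all m training examples
        m = X.shape[0]
        f = sigmoid(X @ w + b)                            # predicted probabilities
        loss = -y * np.log(f) - (1 - y) * np.log(1 - f)   # per-example loss
        return loss.sum() / m

    # Example with tiny made-up data: w = 0, b = 0 predicts 0.5 everywhere,
    # so the cost is ln(2) ≈ 0.693
    X = np.array([[1.0, 2.0], [2.0, 0.5], [0.5, 1.0]])
    y = np.array([1, 1, 0])
    print(compute_cost(X, y, w=np.zeros(2), b=0.0))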
Optional Lab: Cost Function for Logistic Regression
Gradient Descent Algorithm


 The gradient descent algorithm updates each parameter by
calculating the derivative of the cost function with respect
to ( w ) and ( b ).
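
One way these simultaneous updates could look in NumPy; the learning rate alpha and the function name are assumptions made for illustration:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def gradient_descent_step(X, y, w, b, alpha):
        # Gradients of the logistic cost:
        # dJ/dw_j = (1/m) * sum((f - y) * x_j),  dJ/db = (1/m) * sum(f - y)
        m = X.shape[0]
        err = sigmoid(X @ w + b) - y          # prediction error for each example
        dj_dw = (X.T @ err) / m               # partial derivatives w.r.t. each w_j
        dj_db = err.sum() / m                 # partial derivative w.r.t. b
        # Simultaneous update of all parameters
        return w - alpha * dj_dw, b - alpha * dj_db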
Difference Between Linear and Logistic Regression
 Although the equations for gradient descent in both algorithms appear similar, they differ in the definition of the function ( f(x) ); logistic regression uses the sigmoid function, while linear regression uses a linear function.
 Feature scaling can be applied to both algorithms to help gradient descent converge faster.
Optional lab: Gradient descent for logistic regression
Optional lab: Logistic regression with scikit-learn
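
A minimal sketch in the spirit of that scikit-learn lab, using a made-up toy dataset and only standard LogisticRegression calls:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Toy 1-D dataset: small values labelled 0, larger values labelled 1
    X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
    y = np.array([0, 0, 0, 1, 1, 1])

    model = LogisticRegression()
    model.fit(X, y)

    print(model.predict(X))   # predicted labels
    print(model.score(X, y))  # accuracy on the training set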

Understanding Overfitting and Underfitting
 Overfitting occurs when a model learns the training data too well, capturing noise and fluctuations, which leads to poor generalization on new data. This is often associated with high variance.
 Underfitting happens when a model is too simple to capture the underlying patterns in the data, resulting in poor performance on both training and new data. This is linked to high bias.
 The goal in machine learning is to find a model that is "just right," meaning it neither underfits nor overfits the data. This balance allows for good generalization to new examples.
 Techniques like regularization can help mitigate overfitting, ensuring that the model remains flexible enough to capture the data's patterns without becoming overly complex.
Addressing Overfitting
Collecting More Data
 One effective way to combat overfitting is to gather more training data, which helps the learning algorithm fit a less complex function.
 More data allows the model to generalize better, reducing the likelihood of high variance.
Feature Selection
 If more data isn't available, consider using fewer features by selecting only the most relevant ones for the prediction task.
 This process, known as feature selection, can help prevent the model from overfitting by reducing complexity.
Regularization Techniques
 Regularization is another method to address overfitting by shrinking
the values of the model's parameters without eliminating features
entirely.
 This technique allows the model to retain all features while
minimizing their impact, leading to better generalization.

Optional Lab: Overfitting
Understanding Regularization
 Regularization aims to keep the parameter values ( w_1 ) through ( w_n ) small, which helps in creating a simpler model that is less prone to overfitting.
 By adding a penalty term to the cost function, such as ( 1000 \cdot w_3^2 + 1000 \cdot w_4^2 ), we encourage smaller values for those particular parameters.

Modified Cost Function
 The modified cost function includes the original mean squared error cost plus a regularization term, which penalizes all parameters ( w_1 ) to ( w_{100} ) to keep them small.
 The regularization parameter, lambda (λ), controls the trade-off between fitting the training data well and keeping the parameters small.
 In the summations, ( i ) runs from 1 to ( m ) (over training examples) and ( j ) runs from 1 to ( n ) (over features); note while studying that the ( j ) on the slide is hard to read because of the green dots and can look like an ( i ).
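
For reference, the regularized cost for linear regression being described here is usually written as:

( J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} (f(x^{(i)}) - y^{(i)})^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2 )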
Choosing the Right Lambda
 If λ is set to 0, the model may overfit, while a very large λ (e.g.,
10^10) can lead to underfitting by forcing all parameters close to 0.
 The goal is to find a balanced λ that minimizes both the mean
squared error and the regularization term, resulting in a model that
fits the data appropriately.
Gradient Descent Updates
 The goal is to find parameters w and b that minimize this regularized cost function.
 The gradient descent algorithm updates parameters w and b using specific formulas, with the update for w_j now including an additional term from the regularization.
 The update for b remains unchanged since it is not regularized, while the update for w_j incorporates the regularization term to shrink its value.
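
Written out, the regularized update rules referred to here take the standard form:

( w_j := w_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} (f(x^{(i)}) - y^{(i)}) x_j^{(i)} + \frac{\lambda}{m} w_j \right] )
( b := b - \alpha \cdot \frac{1}{m} \sum_{i=1}^{m} (f(x^{(i)}) - y^{(i)}) )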
Intuition Behind Regularization
 Regularization effectively shrinks the parameters w_j by multiplying them by a factor slightly less than 1 (namely ( 1 - \alpha \frac{\lambda}{m} )) during each iteration of gradient descent.
 This process helps reduce overfitting, especially when dealing with many features and a small training set, leading to improved performance in linear regression tasks.
Understanding Overfitting in Logistic Regression
 Logistic regression can overfit when using high-order polynomial
features, leading to complex decision boundaries that do not
generalize well to new data.
 Regularization helps mitigate overfitting by adding a penalty term to
the cost function, which discourages large parameter values.

Implementing Regularized Logistic Regression


 The cost function for logistic regression can be modified by adding a regularization term, ( \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2 ), to the average logistic loss.
 Gradient descent is used to minimize this cost function, with an
additional term included in the update rules for the
parameters ( w_j ).
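
A hedged NumPy sketch of the regularized cost and gradient step for logistic regression; names such as lambda_ and regularized_cost are illustrative, not the lab's:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def regularized_cost(X, y, w, b, lambda_):
        m = X.shape[0]
        f = sigmoid(X @ w + b)
        loss = -y * np.log(f) - (1 - y) * np.log(1 - f)
        # Average logistic loss plus the penalty on w (b is not regularized)
        return loss.sum() / m + (lambda_ / (2 * m)) * np.sum(w ** 2)

    def regularized_gradient_step(X, y, w, b, alpha, lambda_):
        m = X.shape[0]
        err = sigmoid(X @ w + b) - y
        dj_dw = (X.T @ err) / m + (lambda_ / m) * w   # extra (lambda/m) * w_j term
        dj_db = err.sum() / m                         # unchanged: b is not regularized
        return w - alpha * dj_dw, b - alpha * dj_db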

Optional Lab: Regularized cost and gradient descent for both linear and logistic regression

LAB ASSIGNMENT:
Logistic Regression: sigmoid function, cost function, gradient descent, evaluating logistic regression, regularized logistic regression (cost function, gradient descent)
