Module 2-Supervised Learning

The document provides an overview of supervised learning, focusing on linear regression and logistic regression. It explains the concepts of simple and multiple linear regression, including how to derive the regression line and predict outcomes, as well as the use of logistic regression for binary classification problems. Additionally, it discusses performance measures, cost functions, and optimization techniques like gradient descent for improving model accuracy.

Uploaded by

kush tejani
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
49 views74 pages

Module 2-Supervised Learning

The document provides an overview of supervised learning, focusing on linear regression and logistic regression. It explains the concepts of simple and multiple linear regression, including how to derive the regression line and predict outcomes, as well as the use of logistic regression for binary classification problems. Additionally, it discusses performance measures, cost functions, and optimization techniques like gradient descent for improving model accuracy.

Uploaded by

kush tejani
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Supervised Learning

Linear Regression
Module 2
Linear Regression
Simple Linear Regression
• Simple linear regression is when you want to predict
values of one variable, given values of another variable.
For example, you might want to predict a person's height
(in inches) from his weight (in pounds).
• Imagine a sample of ten people for whom you know their
height and weight. You could plot the values on a graph, with
weight on the x axis and height on the y axis.
Linear Regression

• If there were a perfect linear relationship between height and weight, then all 10 points on the graph would fit on a straight line. But this is never the case (unless your data are rigged).
• If there is a (non-perfect) linear relationship between height and weight (presumably a positive one), then you would get a cluster of points on the graph which slopes upward. In other words, people who weigh more should tend to be taller than people who weigh less. (See graph.)
Linear Regression
• The purpose of regression analysis is to come up with the equation of a line that fits through that cluster of points with the minimum amount of deviation from the line.
• The deviation of the points from the line is called
"error." Once you have this regression equation, if you
knew a person's weight, you could then predict their
height.
• Simple linear regression is actually the same as a
bivariate correlation between the independent and
dependent variable.
Linear Regression
• After verifying that the linear correlation between two variables is significant,
next we determine the equation of the line that can be used to predict the value of
y for a given value of x.
For a given x-value,
d = (observed y-value) – (predicted y-value)

• For each data point, di is the difference between the observed y-value and the predicted y-value for a given x-value on the line. These differences are called residuals.
Regression Line
A regression line, also called a line of best fit, is the line for which the sum of the
squares of the residuals is a minimum.

The Equation of a Regression Line


The equation of a regression line for an independent variable
x and a dependent variable y is
ŷ = mx + b
where ŷ is the predicted y-value for a given x-value. The
slope m and y-intercept b are given by,
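The formulas themselves appear as an image in the original; the standard least-squares expressions they denote are:

m = (n ∑xy - (∑x)(∑y)) / (n ∑x² - (∑x)²)
b = ȳ - m·x̄ = (∑y)/n - m·(∑x)/n

where n is the number of data points.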
Regression Line

Example:
Find the equation of the regression line.
x     y     xy     x²     y²
1    -3    -3      1      9
2    -1    -2      4      1
3     0     0      9      0
4     1     4     16      1
5     2    10     25      4
∑x = 15   ∑y = -1   ∑xy = 9   ∑x² = 55   ∑y² = 15
Regression Line

Step 1 : Calculate m
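Using the column sums above (with n = 5), the arithmetic works out as:

m = (n ∑xy - ∑x ∑y) / (n ∑x² - (∑x)²) = (5·9 - 15·(-1)) / (5·55 - 15²) = (45 + 15) / (275 - 225) = 60/50 = 1.2

Step 2 : Calculate b
b = ȳ - m·x̄ = (-1/5) - 1.2·(15/5) = -0.2 - 3.6 = -3.8

So the equation of the regression line is ŷ = 1.2x - 3.8.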
Regression Line
Example:
The following data represents the number of hours 12 different students watched television
during the weekend and the scores of each student who took a test the following Monday.
a.) Find the equation of the regression line.
b.) Use the equation to find the expected test score
for a student who watches 9 hours of TV.

Hours, x        0     1     2     3     3     5     5     5     6     7     7     10
Test score, y   96    85    82    74    95    68    76    84    58    65    75    50
xy              0     85    164   222   285   340   380   420   348   455   525   500
x²              0     1     4     9     9     25    25    25    36    49    49    100
y²              9216  7225  6724  5476  9025  4624  5776  7056  3364  4225  5625  2500
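Using the column sums (n = 12, ∑x = 54, ∑y = 908, ∑xy = 3724, ∑x² = 332), the regression coefficients work out to:

m = (12·3724 - 54·908) / (12·332 - 54²) = (44688 - 49032) / (3984 - 2916) = -4344/1068 ≈ -4.07
b = ȳ - m·x̄ = 908/12 - (-4.07)(54/12) ≈ 75.67 + 18.30 ≈ 93.97

a.) ŷ ≈ -4.07x + 93.97
b.) For x = 9 hours of TV: ŷ ≈ -4.07(9) + 93.97 ≈ 57.4, so the expected test score is about 57.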
Performance Measures
Performance metrics for linear regression help evaluate how well the model fits the
data. Here are the most commonly used ones:

yi is the ith observed value, and ŷi is the estimated (predicted) value of yi.
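The metric formulas themselves appear as images in the original; the standard definitions presumably being referenced are:

MSE (Mean Squared Error) = (1/n) ∑ (yi - ŷi)²
RMSE (Root Mean Squared Error) = √MSE
MAE (Mean Absolute Error) = (1/n) ∑ |yi - ŷi|
R² (Coefficient of Determination) = 1 - ∑ (yi - ŷi)² / ∑ (yi - ȳ)²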


Multiple regression Methods

In many instances, a better prediction can be found for a dependent (response) variable by using more than one independent (explanatory) variable.
For example, a more accurate prediction of Monday's test grade from the previous section might be made by considering the number of other classes a student is taking as well as the student's previous knowledge of the test material.
A multiple regression equation has the form
ŷ = b + m1x1 + m2x2 + m3x3 + … + mkxk
where x1, x2, x3,…, xk are independent variables, b is the y-intercept, and y is the
dependent variable.
Multiple Regression
After finding the equation of the multiple regression line, you can use the equation to
predict y-values over the range of the data.

Example:
The following multiple regression equation can be used to predict the annual U.S. rice yield
(in pounds).

ŷ = 859 + 5.76x1 + 3.82x2

where x1 is the number of acres planted (in thousands), and x2 is the number of acres
harvested (in thousands).
a.) Predict the annual rice yield when x1 = 2758 and x2 = 2714.
b.) Predict the annual rice yield when x1 = 3581 and x2 = 3021.
Multiple Regression
Example continued:

a.) ŷ = 859 + 5.76x1 + 3.82x2
    = 859 + 5.76(2758) + 3.82(2714)
    = 27,112.56
The predicted annual rice yield is 27,112.56 pounds.

b.) ŷ = 859 + 5.76x1 + 3.82x2
    = 859 + 5.76(3581) + 3.82(3021)
    = 33,025.78
The predicted annual rice yield is 33,025.78 pounds.
Logistic Regression
• Logistic Regression is a “Supervised machine learning” algorithm that can be
used to model the probability of a certain class or event. It is used when the data
is linearly separable and the outcome is binary.
• Logistic regression is usually used for Binary classification problems.
• Let us first see whether we can use linear regression to solve a binary classification problem. Assume we have a dataset that is linearly separable and whose output is discrete with two classes (0, 1).
Logistic Regression
How does Logistic Regression Work?
Consider a model with one predictor “x” and one response variable “ŷ”, where p is the probability that ŷ = 1. The linear equation can be written as:
p = b0+b1x --------> eq 1
• The right-hand side of the equation (b0+b1x) is a linear equation and can hold
values that exceed the range (0,1). But we know probability will always be in the
range of (0,1).
• To overcome that, we predict odds instead of probability.
• Odds: The ratio of the probability of an event occurring to the probability of an
event not occurring.
• Odds = p/(1-p)
• The equation 1 can be re-written as:
• p/(1-p) = b0+b1x --------> eq 2
• ln(p/(1-p)) = b0+b1x --------> eq 3
Logistic Regression
• To recover p from equation 3, we apply the exponential on both sides:
exp(ln(p/(1-p))) = exp(b0+b1x)
p/(1-p) = e^(b0+b1x)
• p = (1-p) * e^(b0+b1x), i.e. p = e^(b0+b1x) - p * e^(b0+b1x)
• Moving the p terms to one side and taking p as common:
p * (1 + e^(b0+b1x)) = e^(b0+b1x), so p = e^(b0+b1x) / (1 + e^(b0+b1x))
• Dividing the numerator and denominator by e^(b0+b1x) on the right-hand side:
p = 1 / (1 + e^-(b0+b1x))
• Similarly, the equation for a logistic model with n predictors is:
p = 1 / (1 + e^-(b0+b1x1+b2x2+b3x3+...+bnxn))
• The right-hand side looks familiar, doesn't it? Yes, it is the sigmoid function. It squeezes the output into the range between 0 and 1.
Sigmoid Function
• The sigmoid function maps any real-valued input to a value between 0 and 1, so its output can be interpreted as a probability.
Logistic Regression

• Linear model: ŷ = b0 + b1x
• Sigmoid function: σ(z) = 1/(1 + e^-z)
• Logistic regression model: ŷ = σ(b0 + b1x) = 1/(1 + e^-(b0+b1x))
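A minimal Python sketch of this model; the coefficient values b0 and b1 below are made up purely for illustration:

import numpy as np

def sigmoid(z):
    # squeezes any real-valued input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

b0, b1 = -4.0, 1.5          # hypothetical coefficients, for illustration only

def predict_proba(x):
    # logistic regression model: p(y = 1 | x) = sigmoid(b0 + b1*x)
    return sigmoid(b0 + b1 * x)

x = np.array([1.0, 2.0, 3.0, 4.0])
print(predict_proba(x))                        # probabilities between 0 and 1
print((predict_proba(x) >= 0.5).astype(int))   # class labels with a 0.5 threshold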
Logistic Regression
E.g. 1. Construct a logistic regression model with two predictors
Logistic Regression

Calculate Accuracy using above confusion matrix
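The confusion matrix itself is shown as a figure; in general, accuracy is computed from a confusion matrix as

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where TP, TN, FP and FN are the counts of true positives, true negatives, false positives and false negatives.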


Maximum Likelihood Estimation

But how would you be sure that the given sigmoid curve is the best sigmoid curve?
The logistic regression does this by estimating the optimal values for the coefficients that
maximize the likelihood of the observed data.
This process is called Maximum Likelihood Estimation.

Maximum Likelihood Estimation (MLE) is a statistical method used to estimate the parameters of
a probability distribution that best explains the observed data.
Mathematically, the likelihood function L(θ) is defined as the product of the individual probabilities of each data point given the parameters (derived from the Bernoulli distribution), or equivalently the joint probability of independent events.
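In symbols, for binary labels yi ∈ {0, 1} and predicted probabilities pi, the likelihood described above is

L(β0, β1, ..., βn) = ∏ pi^yi · (1 - pi)^(1 - yi)

where the product runs over all training examples and pi is the model's predicted probability that yi = 1.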
Logistic Regression

So, the best-fitting combination of β0, β1… βn will be the one that maximizes this product.

Log Likelihood
Because the individual probabilities all lie between 0 and 1, their product quickly becomes an extremely small number, which is not easy or feasible to work with mathematically.
Taking the logarithm transforms the product of probabilities in the likelihood function into a sum of logarithms, which is generally easier to work with; we call this the Log Likelihood. Our target is to maximize it, so that we can get the best coefficient values.
Logistic Regression

Negative Log Likelihood

Since the logarithm of a probability value lying between 0 and 1 is always negative, the log likelihood will always return negative values.

So, in order to work with positive values, we multiply it by -1, and by doing so we get the negative log-likelihood.

Cost Function

In logistic regression, we consider the negative log-likelihood as the cost function, and it is also called a
cross-entropy function.
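Written out, the log likelihood and the resulting cost are

log L = ∑ [ yi·ln(pi) + (1 - yi)·ln(1 - pi) ]
Cost (negative log-likelihood / cross-entropy) = -(1/m) ∑ [ yi·ln(pi) + (1 - yi)·ln(1 - pi) ]

where the sum runs over the m training examples (the 1/m averaging factor is a common convention).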
Logistic Regression
Cross entropy is a loss function that is used to quantify the difference between two probability distributions. It is
a measure of how well one distribution predicts another distribution.
So, if it is at a minimum, the true and the predicted probability distributions are close to each other, which is what we always aim for.
To optimize the parameters in logistic regression, iterative optimization algorithms like gradient descent are
commonly used. These algorithms may converge to a local minimum, but there is no guarantee that the found
solution is the global minimum.
Logistic Regression
In linear regression, we use mean squared error (MSE) as the cost function. But in logistic regression, using the mean of the squared differences between actual and predicted outcomes as the cost function would give a wavy, non-convex cost surface containing many local optima.
In this case, gradient descent is not guaranteed to find the optimal solution.
Instead, we use a logarithmic function to represent the
cost of logistic regression.
It is guaranteed to be convex for all input values, containing
only one minimum, allowing us to run the gradient descent
algorithm.
When dealing with a binary classification problem, the
logarithmic cost of error depends on the value of y. We can
define the cost for two cases separately:
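The two cases (shown as a figure in the original) are the standard logarithmic costs

Cost(ŷ, y) = -ln(ŷ)        if y = 1
Cost(ŷ, y) = -ln(1 - ŷ)    if y = 0

so the cost is 0 when the prediction matches the true label exactly and grows without bound as the prediction moves towards the wrong label.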
Logistic Regression
Gradient Descent Algorithm
Gradient descent is an iterative optimization algorithm, which finds the minimum of a differentiable function. In
this process, we try different values and update them to reach the optimal ones, minimizing the output.

This way, we can find an optimal solution minimizing the cost over model parameters:
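In general form, each parameter θj is moved a small step in the direction of the negative gradient of the cost J, with learning rate α:

θj := θj - α · ∂J/∂θj

and the update is repeated until the cost stops decreasing (convergence).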
Gradient descent Algorithm for Logistic Regression

x (Exam Score)    y (Admitted = 1)
2                 0
4                 0
6                 1
8                 1
Gradient descent Algorithm

Iteration    β₀         β₁        grad_b0    grad_b1
1            0.0000     0.1000    0.0000     -1.0000
2           -0.0121     0.1278    0.1210     -0.2780
3           -0.0270     0.1389    0.1490     -0.1110
4           -0.0428     0.1447    0.1575     -0.0581
5           -0.0588     0.1486    0.1602     -0.0392
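A minimal Python sketch that reproduces the table above; the initial values β₀ = β₁ = 0 and the learning rate of 0.1 are assumptions, since the corresponding slide is an image:

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])   # exam scores
y = np.array([0.0, 0.0, 1.0, 1.0])   # admitted (1) or not (0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

b0, b1 = 0.0, 0.0    # assumed initial parameters
lr = 0.1             # assumed learning rate

for it in range(1, 6):
    p = sigmoid(b0 + b1 * x)           # predicted probabilities
    grad_b0 = np.mean(p - y)           # gradient of the cross-entropy cost w.r.t. b0
    grad_b1 = np.mean((p - y) * x)     # gradient w.r.t. b1
    b0 -= lr * grad_b0                 # gradient descent updates
    b1 -= lr * grad_b1
    print(f"{it}  b0={b0:.4f}  b1={b1:.4f}  grad_b0={grad_b0:.4f}  grad_b1={grad_b1:.4f}")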


Gradient descent Algorithm
For linear regression
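For the linear model ŷ = mx + b with the MSE cost, the gradient descent updates take the form

m := m - α · (1/n) ∑ (ŷi - yi)·xi
b := b - α · (1/n) ∑ (ŷi - yi)

where α is the learning rate and n is the number of training examples (some presentations keep a factor of 2 from differentiating the squared error).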
Gradient descent Algorithm with logistic regression

We have one feature (e.g. a normalized tumor measurement) and a binary label:
x = feature (continuous)
y ∈ {0, 1}, where 1 = malignant, 0 = benign

x      y
0.5    0
1.0    0
1.5    1
2.0    1
2.5    1
Gradient descent Algorithm with logistic regression
Optimization
• In machine learning, optimizers and loss functions are two components that help
improve the performance of the model.
• A loss function measures the performance of a model by measuring the difference
between the output expected from the model and the actual output obtained from
the model.
• The optimizer helps improve the model by adjusting its parameters to minimize the
loss function value.
• The role of the optimizer is to find the best set of parameters (weights and biases)
of the neural network that allow it to make accurate predictions.
• Optimization is at the heart of almost all machine learning techniques: choosing the best element from some set of available alternatives.
Optimization
• The computational methods for iterative optimization can be broadly divided into three types.
• Zero-Order or Direct Search methods: these explore a range of potential values of the variable x to find the minimum of the objective function.
• First-Order or Gradient methods: these make use of the first-order partial derivatives, e.g. gradient descent.
• Second-Order methods: these make use of the second-order partial derivatives, e.g. Newton's method.
or
• Derivative-based optimization - Steepest Descent, Newton's method
• Derivative-free optimization - Random Search, Downhill Simplex


Optimization
Optimization is the process of finding the best solution from all possible solutions to a problem, usually by maximizing
or minimizing an objective function.
Mathematically:
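In its general form (shown as a figure in the original), an optimization problem is written as

minimize f(x)  subject to  x ∈ S

where f is the objective function, x is the vector of decision variables, and S is the feasible set defined by the constraints (for a maximization problem, maximize f(x) instead).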

Types of Optimization Methods:

1. Derivative-Based Methods
   Gradient Descent
   Newton's Method
2. Derivative-Free Methods
   Grid Search
   Random Search, Downhill Simplex methods
Optimization
General Optimization Algorithm Structure
1. Initialize variables or population.
2. Evaluate the objective function for current solutions.
3. Update parameters using search strategy (gradient step, mutation, exploration, etc.).
4. Check stopping criteria (max iterations, convergence, tolerance).
5. Return the best solution.
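A schematic Python sketch of this loop; the objective, step rule and tolerance below are placeholders chosen only for illustration:

def optimize(objective, x0, step_rule, max_iters=1000, tol=1e-6):
    x = x0                                  # 1. Initialize variables
    fx = objective(x)                       # 2. Evaluate the objective
    best_x, best_fx = x, fx
    for _ in range(max_iters):              # 4. Stop after max_iters at the latest
        x = step_rule(x)                    # 3. Update using the chosen search strategy
        fx_new = objective(x)
        if fx_new < best_fx:
            best_x, best_fx = x, fx_new
        if abs(fx_new - fx) < tol:          # 4. Stop when the change is below tolerance
            break
        fx = fx_new
    return best_x, best_fx                  # 5. Return the best solution found

# example: minimize (x - 3)^2 with a fixed gradient step
print(optimize(lambda x: (x - 3.0) ** 2, 0.0, lambda x: x - 0.1 * 2.0 * (x - 3.0)))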

Example Applications
● Engineering: Minimize material cost while keeping strength.

● Machine Learning: Minimize loss function to improve model accuracy.

● Operations Research: Maximize profits with limited resources.

● Networking: Minimize latency or maximize throughput.


Basic Elements of Optimization
Basic elements of optimization: There are three basic elements of any optimization
problem

Variables: These are the free parameters which the algorithm can tune

Constraints: These are the boundaries within which the parameters (or some
combination thereof) must fall

Objective function: This is the goal towards which the algorithm drives the solution. For machine learning, this often amounts to minimizing some error measure or maximizing some utility function.
Steepest Descent method

Derivative-based optimization deals with gradient-based optimization techniques, capable of determining search directions according to an objective function's derivative information.

1. Steepest Descent: In this method, the search starts from an initial trial point X1 and iteratively moves along the steepest descent direction until the optimum point is found. Although the method is straightforward, it is not well suited to problems having multiple local optima; in such cases the solution may get stuck at a local optimum.
Newton’s Method

Newton method: Newton's method is a very popular method which is based on Taylor's Series
expansion.
Basic Concept of Newton Method
Let us first understand Newton's method for root finding; then we will see how it can be used for optimization.
Problem: We have a function f(x). The goal is to find a root of f(x), i.e. a point where f(x) = 0.
Initial guess: We make an initial guess x0 for the root.
Update: We update the current guess to get a new estimate using the formula below.

The method proceeds iteratively, repeating the above step until we reach a predefined tolerance or a maximum number of iterations.
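This update is the standard Newton root-finding step

x(n+1) = x(n) - f(x(n)) / f'(x(n))

For example, for f(x) = x² - 2 with initial guess x0 = 1, one step gives x1 = 1 - (-1)/2 = 1.5, and a second step gives x2 = 1.5 - 0.25/3 ≈ 1.4167, already close to √2.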
Newton’s Method

Newton's method is a very popular method based on the Taylor series expansion of a function about the current point.
Newton Method for Optimization
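In outline, Newton's method for optimization applies the root-finding step to f'(x) = 0 (a stationary point): truncating the Taylor series at second order around the current point and minimizing that quadratic gives the update

x(n+1) = x(n) - f'(x(n)) / f''(x(n))

or, in several dimensions, x(n+1) = x(n) - H⁻¹ ∇f(x(n)), where H is the Hessian matrix of second derivatives.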


Derivative free optimization
• Derivative-free optimization algorithms are often used when it is difficult to find function derivatives, or when finding such derivatives is time-consuming.
• Derivative-free optimization is a discipline in mathematical optimization that does not use derivative information in the classical sense to find optimal solutions.
• Sometimes information about the derivative of the objective function f is unavailable, unreliable or impractical to obtain.
• For example, f might be non-smooth, time-consuming to evaluate, or noisy in some way, so that methods that rely on derivatives or approximate them via finite differences are of little use.
• The problem of finding optimal points in such situations is referred to as derivative-free optimization.
Random Search
Random Search: This method generates trial solutions for the optimization model using random number generators for the decision variables.
Random search methods include the random jump method, the random walk method, and the random walk method with direction exploitation.
The random jump method generates a large number of data points for the decision variables, assuming a uniform distribution for them, and finds the best solution by comparing the corresponding objective function values.
The random walk method generates trial solutions with sequential improvements, governed by a scalar step length and a unit random vector.
The random walk method with direction exploitation is an improved version of the random walk method in which, first, the successful direction of generating trial solutions is found, and then the maximum possible steps are taken along this successful direction.
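A minimal Python sketch of the random jump method; the objective function, bounds and number of trials below are placeholders chosen only for illustration:

import random

def objective(x):
    # example objective to minimize (placeholder): minimum value 1 at x = 2
    return (x - 2.0) ** 2 + 1.0

def random_jump(objective, low, high, n_trials=1000, seed=0):
    # generate trial points uniformly in [low, high] and keep the best one found
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_trials):
        x = rng.uniform(low, high)
        f = objective(x)
        if f < best_f:
            best_x, best_f = x, f
    return best_x, best_f

best_x, best_f = random_jump(objective, -10.0, 10.0)
print(best_x, best_f)   # best_x close to 2, best_f close to 1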
Methods of Random Search
Random Search
Random Walk Method
Random Walk Method with Direction Exploitation
Single feature Logistic Regression
Simplex

In geometry, a simplex (plural: simplexes or simplices) is a generalization of the notion of a triangle or tetrahedron to arbitrary dimensions. The simplex is so named because it represents the simplest possible polytope in any given dimension. For example,
a 0-dimensional simplex is a point,
a 1-dimensional simplex is a line segment,
a 2-dimensional simplex is a triangle,
a 3-dimensional simplex is a tetrahedron, and
a 4-dimensional simplex is a 5-cell.
Downhill Simplex
Simplex Method: Simplex method is a conventional direct search algorithm where the best
solution lies on the vertices of a geometric figure in N-dimensional space made of a set of N+1
points.
The method compares the objective function values at the N+1 vertices and moves towards
the optimum point iteratively.
The movement of the simplex algorithm is achieved by reflection, contraction and expansion.

The Nelder–Mead method (also downhill simplex method, amoeba method, or polytope
method) is a numerical method used to find the minimum or maximum of an objective
function in a multidimensional space.
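For reference, a short Python sketch using SciPy's implementation of the Nelder-Mead (downhill simplex) method; the objective function and starting point are placeholders chosen for illustration:

import numpy as np
from scipy.optimize import minimize

def objective(v):
    # example 2-D objective (placeholder): minimum at (1, 2)
    x, y = v
    return (x - 1.0) ** 2 + (y - 2.0) ** 2

x0 = np.array([0.0, 0.0])                           # initial guess
result = minimize(objective, x0, method="Nelder-Mead")
print(result.x)      # approximately [1.0, 2.0]
print(result.fun)    # approximately 0.0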
Simplex
