Lecture 3: Linear Neural Network and Linear Regression: Part 1

Md. Shahriar Hussain
ECE Department, North South University
Source: Andrew Ng's Machine Learning lectures

Linear Neural Networks (LNNs)

• The neuron aggregates the


weighted input data.

North South University Source: Andrew NG Lectures CSE465 Md. Shahriar Hussain 3
Linear Neural Networks (LNNs)

• There can be two different types of Linear Neural Networks:
  – Regression Problem
  – Classification Problem

[Figure: side-by-side LNN diagrams for a regression problem and a classification problem.]
Linear Neural Networks (LNNs)

• For Regression:
  – There is only aggregation.
  – No activation function is needed.
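To make the aggregation concrete, here is a minimal sketch of a linear neuron in Python (NumPy assumed; the weight, bias, and input values are illustrative, not from the slides):

    import numpy as np

    def linear_neuron(x, w, b):
        # Aggregate the weighted inputs plus a bias; no activation (regression).
        return np.dot(w, x) + b

    # Illustrative (hypothetical) values:
    x = np.array([2104.0])            # one feature, e.g. house size in square feet
    w = np.array([0.2])               # weight
    b = 50.0                          # bias
    print(linear_neuron(x, w, b))     # 0.2 * 2104 + 50 = 470.8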


Linear Neural Networks (LNNs)

• For a Regression Problem, we need to:
  – Cast the Linear Regression technique as an LNN model.
• For a Classification Problem, we need to:
  – Cast the Logistic and Softmax Regression techniques as an LNN model.

[Figure: LNN diagrams for a regression problem and a classification problem.]


What is Linear Regression

• Linear regression is an algorithm that models a linear relationship between an independent variable and a dependent variable in order to predict the outcome of future events.
Linear Regression Example

• A line of best fit (regression line) is a straight line that represents the best approximation of a scatter plot of data points.
Linear Regression Example

[Figure: the regression line, with the estimated/predicted value (ŷ) and the actual/true value (y, the ground truth) marked for one data point.]
Data Set Description

Notation:
  (x, y) = one training example
  (x^(i), y^(i)) = the i-th training example

From the housing table:
  x^(1) = 2104, y^(1) = 460
  x^(2) = 1416, y^(2) = 232
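The notation maps directly onto arrays. A minimal sketch (NumPy assumed) holding the two examples above:

    import numpy as np

    # Training set from the table: size in square feet -> price in $1000's.
    x = np.array([2104.0, 1416.0])    # x^(1), x^(2)
    y = np.array([460.0, 232.0])      # y^(1), y^(2)

    i = 1                             # the notation is 1-indexed; Python is 0-indexed
    print(x[i - 1], y[i - 1])         # x^(1) = 2104.0, y^(1) = 460.0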
Hypothesis

Training Set  →  Learning Algorithm  →  hypothesis h

Size of house / new, unseen data (x)  →  h  →  estimated price ŷ
Hypothesis

• How do we represent h?

  h_θ(x) = θ0 + θ1·x

• θ0 and θ1 are parameters (weights) that will be trained/determined by the ML model; they are not hyperparameters.
  – θ0 = intercept / bias / constant
  – θ1 = slope / coefficient / gradient
• Linear regression with one variable is called univariate linear regression.
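As code, the hypothesis is a one-line function; a sketch with hypothetical parameter values:

    def h(x, theta0, theta1):
        # Univariate linear regression hypothesis: h_theta(x) = theta0 + theta1 * x.
        return theta0 + theta1 * x

    # Hypothetical parameters, not fitted values:
    print(h(2104, 50.0, 0.1))   # 50 + 0.1 * 2104 = 260.4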
Hypothesis

• The goal is to choose θ0 and θ1 properly so that h_θ(x) is close to y.
• A cost function lets us figure out how to fit the best straight line to our data.
Hypothesis

  Size in feet² (x)      Price ($) in 1000's (y)
  2104                   460
  1416                   232
  1534                   315
  852                    178
  …                      …

Hypothesis: h_θ(x) = θ0 + θ1·x
θ0, θ1: parameters
How do we choose the θ's?
Cost Function

• We need to choose θ0 and θ1 so that the squared error is minimized over all m training examples:

  minimize over θ0, θ1:  (1/2m) · Σ_{i=1}^{m} ( h_θ(x^(i)) − y^(i) )²

• This expression, viewed as a function of the parameters, is called the cost function:

  J(θ0, θ1) = (1/2m) · Σ_{i=1}^{m} ( h_θ(x^(i)) − y^(i) )²

  Goal: minimize over θ0, θ1: J(θ0, θ1)
Cost Function

Cost Function:  J(θ0, θ1) = (1/2m) · Σ_{i=1}^{m} ( h_θ(x^(i)) − y^(i) )²

Goal:  minimize over θ0, θ1:  J(θ0, θ1)

• This cost function is called the squared-error cost function.
• It minimizes the squared difference between the predicted house price and the actual house price.
• The 1/m factor means we take the average over the training examples.
• The extra 2 in 1/2m makes the math a bit easier (it cancels when differentiating) and doesn't change the minimizer at all: half the smallest value is still the smallest value!
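A minimal sketch of the squared-error cost over the four rows of the housing table above (NumPy assumed):

    import numpy as np

    x = np.array([2104.0, 1416.0, 1534.0, 852.0])   # size in square feet
    y = np.array([460.0, 232.0, 315.0, 178.0])      # price in $1000's

    def cost(theta0, theta1):
        # J(theta0, theta1) = (1/2m) * sum over i of (h(x^(i)) - y^(i))^2
        m = len(x)
        predictions = theta0 + theta1 * x
        return np.sum((predictions - y) ** 2) / (2 * m)

    print(cost(0.0, 0.2))   # cost for one (hypothetical) parameter choice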
Cost Function Calculation

• For simplicity, assume θ0 = 0, so h_θ(x) = θ1·x.
• Find the value of θ1 for which J(θ1) is minimum.
Cost Function Calculation

[Figure: left, the training points with the line h(x) = θ1·x for θ1 = 1 (both axes run 0 to 3); right, J(θ1) plotted against θ1 over −0.5 to 2.5.]

For θ1 = 1:
  J(θ1) = 1/(2·3) · [0 + 0 + 0] = 0
Cost Function Calculation

For θ1 = 0.5:
  J(θ1) = ?
Cost Function Calculation

For θ1 = 0:
  J(θ1) = ?
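The plots suggest the classic three-point training set (1, 1), (2, 2), (3, 3): both axes run 0 to 3, and J(1) = 0 matches it. Assuming that data, the two questions work out as:

For θ1 = 0.5:  h(x) = 0.5x, so
  J(0.5) = 1/(2·3) · [(0.5 − 1)² + (1 − 2)² + (1.5 − 3)²] = 1/6 · [0.25 + 1 + 2.25] = 3.5/6 ≈ 0.58

For θ1 = 0:  h(x) = 0, so
  J(0) = 1/(2·3) · [(0 − 1)² + (0 − 2)² + (0 − 3)²] = 1/6 · [1 + 4 + 9] = 14/6 ≈ 2.33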
Cost Function Calculation

• If we compute J(θ1) over a range of values and plot J(θ1) vs. θ1, we get a bowl-shaped curve (a quadratic in θ1).
• The optimization objective for the learning algorithm is to find the value of θ1 that minimizes J(θ1).
  – Here θ1 = 1 is the best value for θ1.
  – The line with the least sum of squared errors is the best-fit line.
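A short sketch that reproduces the bowl by sweeping θ1 over the same assumed three-point data set:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])     # assumed training data (see above)
    y = np.array([1.0, 2.0, 3.0])

    def cost(theta1):
        # J(theta1) with theta0 fixed at 0.
        m = len(x)
        return np.sum((theta1 * x - y) ** 2) / (2 * m)

    for theta1 in (-0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
        print(f"theta1 = {theta1:4.1f}   J = {cost(theta1):.3f}")
    # J is quadratic in theta1 and bottoms out at theta1 = 1, where J = 0.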
Important Equations

Hypothesis:     h_θ(x) = θ0 + θ1·x

Parameters:     θ0, θ1

Cost Function:  J(θ0, θ1) = (1/2m) · Σ_{i=1}^{m} ( h_θ(x^(i)) − y^(i) )²

Goal:           minimize over θ0, θ1:  J(θ0, θ1)
Cost Function for two parameters

(For fixed θ0, θ1, h_θ(x) is a function of x.)  (J is a function of the parameters θ0, θ1.)

[Figure: the training data with h_θ(x) drawn through it; y-axis Price ($) in 1000's from 0 to 500, x-axis Size in feet² from 0 to 3000.]
Cost Function for two parameters

• Previously we plotted our cost function as θ1 vs. J(θ1).
• Now we have two parameters, so the plot becomes a bit more complicated:
  – It is a 3D surface plot whose axes are:
    • X = θ1
    • Z = θ0
    • Y = J(θ0, θ1)
Cost Function for two parameters

• The height (Y) of the surface gives the value of the cost function; we need to find where that height is at a minimum.
Cost Function for two parameters

• A contour plot is a graphical technique for representing a 3-dimensional surface by plotting constant-z slices, called contours, in a 2-dimensional format.
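A minimal sketch that draws such contours for the housing data's cost surface (NumPy and Matplotlib assumed; the grid ranges are arbitrary choices):

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.array([2104.0, 1416.0, 1534.0, 852.0])
    y = np.array([460.0, 232.0, 315.0, 178.0])
    m = len(x)

    # Evaluate J(theta0, theta1) on a grid of parameter values.
    T0, T1 = np.meshgrid(np.linspace(-100, 300, 100), np.linspace(-0.2, 0.5, 100))
    J = sum((T0 + T1 * xi - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

    plt.contour(T0, T1, J, levels=30)   # constant-J slices in the (theta0, theta1) plane
    plt.xlabel("theta0")
    plt.ylabel("theta1")
    plt.show()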
Cost Function for two parameters

[Figures: further contour-plot examples of J(θ0, θ1) for different parameter choices.]
Gradient descent

• We want to get min over θ0, θ1 of J(θ0, θ1).
• Gradient descent:
  – Used all over machine learning for minimization.
• Outline:
  – Start with some initial θ0, θ1.
  – Keep changing θ0, θ1 to reduce J(θ0, θ1) until we hopefully end up at a minimum.
Gradient descent

 Start with initial guesses:
   Start at (0, 0), or any other value.
 Keep changing θ0 and θ1 a little bit to try to reduce J(θ0, θ1):
   Each time you change the parameters, you move in the direction (the negative gradient) that reduces J(θ0, θ1) the most.
 Repeat until you converge to a local minimum.
 Gradient descent has an interesting property:
   Where you start can determine which minimum you end up in.
   One initialization point can lead to one local minimum; another can lead to a different one.
Gradient descent

• One initialization point led to one local minimum; the other led to a different one.

[Figure: the J(θ0, θ1) surface with two gradient descent paths ending in different local minima.]
Gradient Descent Algorithm
• Gradient descent is used to minimize the MSE by calculating the gradient of the cost function and repeatedly stepping against it:

  repeat until convergence {
      θ_j := θ_j − α · ∂/∂θ_j J(θ0, θ1)      (for j = 0 and j = 1)
  }

• Correct (simultaneous update): compute both new values from the old θ0, θ1, then assign:
  temp0 := θ0 − α · ∂/∂θ0 J(θ0, θ1)
  temp1 := θ1 − α · ∂/∂θ1 J(θ0, θ1)
  θ0 := temp0
  θ1 := temp1
• Incorrect: updating θ0 first and then using the new θ0 when updating θ1.
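The same point in runnable form; a sketch on a toy coupled cost J = (θ0 + θ1)², whose gradient with respect to either parameter is 2(θ0 + θ1), so the update order visibly matters:

    alpha = 0.1

    def grad(t0, t1):
        # Gradient of the toy cost J = (t0 + t1)^2 with respect to either parameter.
        return 2 * (t0 + t1)

    # Correct: simultaneous update -- both gradients use the old values.
    theta0, theta1 = 1.0, 2.0
    temp0 = theta0 - alpha * grad(theta0, theta1)
    temp1 = theta1 - alpha * grad(theta0, theta1)
    theta0, theta1 = temp0, temp1
    print(theta0, theta1)    # 0.4 1.4

    # Incorrect: sequential update -- theta1's gradient sees the new theta0.
    theta0, theta1 = 1.0, 2.0
    theta0 = theta0 - alpha * grad(theta0, theta1)   # theta0 -> 0.4
    theta1 = theta1 - alpha * grad(theta0, theta1)   # uses 0.4, not 1.0
    print(theta0, theta1)    # 0.4 1.52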
Gradient Descent Algorithm

For the squared-error cost of univariate linear regression, the partial derivatives work out to:

  ∂/∂θ0 J(θ0, θ1) = (1/m) · Σ_{i=1}^{m} ( h_θ(x^(i)) − y^(i) )
  ∂/∂θ1 J(θ0, θ1) = (1/m) · Σ_{i=1}^{m} ( h_θ(x^(i)) − y^(i) ) · x^(i)

so each iteration updates (simultaneously):

  θ0 := θ0 − α · (1/m) · Σ ( h_θ(x^(i)) − y^(i) )
  θ1 := θ1 − α · (1/m) · Σ ( h_θ(x^(i)) − y^(i) ) · x^(i)
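Putting the update equations together, a sketch of batch gradient descent on the housing rows from earlier (NumPy assumed; the size is rescaled to thousands of square feet so a single learning rate works without further feature scaling, and α and the step count are arbitrary choices):

    import numpy as np

    x = np.array([2104.0, 1416.0, 1534.0, 852.0]) / 1000.0   # size in 1000s of sq ft
    y = np.array([460.0, 232.0, 315.0, 178.0])               # price in $1000's
    m = len(x)

    alpha = 0.1
    theta0, theta1 = 0.0, 0.0

    for _ in range(5000):
        error = theta0 + theta1 * x - y               # h_theta(x^(i)) - y^(i)
        grad0 = error.sum() / m                       # dJ/dtheta0
        grad1 = (error * x).sum() / m                 # dJ/dtheta1
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1  # simultaneous

    print(theta0, theta1)   # fitted intercept and slope (price per 1000 sq ft)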
Learning Rate

• Here, α is the learning rate, a hyperparameter.
• It controls how big a step we take on each update:
  – If α is small, we take tiny steps.
  – If α is big, we get an aggressive gradient descent.
Learning Rate

• If α is too small, gradient descent can be slow (higher training time).
• If α is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge.
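A sketch of both failure modes on the one-dimensional cost J(θ) = θ² (gradient 2θ); the α values are illustrative:

    def step(theta, alpha):
        # One gradient descent step on J(theta) = theta^2.
        return theta - alpha * 2 * theta

    for alpha in (0.01, 0.5, 1.1):
        theta = 1.0
        for _ in range(10):
            theta = step(theta, alpha)
        print(f"alpha = {alpha:4.2f} -> theta after 10 steps: {theta:.4f}")
    # alpha = 0.01: slow crawl toward 0 (about 0.82 after 10 steps).
    # alpha = 0.50: lands exactly on the minimum in one step.
    # alpha = 1.10: |theta| grows every step -- divergence.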
Local Minima

• Local minimum: the value of the loss function is minimal at that point within a local region.
• Global minimum: the value of the loss function is minimal across the entire domain of the loss function.
Local Minima

[Figure: a loss curve with a local minimum and the global minimum marked. At a local minimum the derivative ∂/∂θ1 J(θ1) = 0, so the update θ1 := θ1 − α·0 leaves θ1 unchanged and gradient descent stays there.]
Gradient Descent Calculation
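The original calculation slides did not survive extraction; as a hedged stand-in, here is one hand-worked update on the assumed three-point data set (1, 1), (2, 2), (3, 3), starting from θ0 = 0, θ1 = 0 with α = 0.1:

  Errors h_θ(x^(i)) − y^(i):  −1, −2, −3
  ∂J/∂θ0 = (1/3)(−1 − 2 − 3) = −2
  ∂J/∂θ1 = (1/3)(−1·1 − 2·2 − 3·3) = −14/3 ≈ −4.67
  θ0 := 0 − 0.1·(−2) = 0.2
  θ1 := 0 − 0.1·(−14/3) ≈ 0.467

Repeating this update drives the parameters toward the minimizer of J.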
• Reference:
  – Andrew Ng, Lectures on Machine Learning, Stanford University
