Solution Week2

Uploaded by Toufeeq HusSain

◆ Suppose we want to find a linear model $f(x) = \theta_0 + \theta_1 x$ to fit a dataset $D = \{(x, y)\} = \{(0, 1), (1, 2)\}$. We consider the squared error

$$J(\theta) = \frac{1}{m} \sum_{j=1}^{m} \left( f(x^{(j)}) - y^{(j)} \right)^2$$

as the cost function, where $m$ is the number of data examples. We use gradient descent to iteratively update the parameters $\theta_0$ and $\theta_1$, starting from $\theta_0 = \theta_1 = 0$.

Question 1: Calculate $\theta_0$ and $\theta_1$ after the first iteration of the update process in gradient descent, where the learning rate is set to 0.1.

$$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{\partial \left( \frac{1}{m} \sum_{j=1}^{m} (f(x^{(j)}) - y^{(j)})^2 \right)}{\partial \theta_0} = \frac{1}{m} \sum_{j=1}^{m} \frac{\partial (f(x^{(j)}) - y^{(j)})^2}{\partial \theta_0}$$

Since $f(x^{(j)}) - y^{(j)} = \theta_0 + \theta_1 x^{(j)} - y^{(j)}$, and according to the derivative rule $\frac{d(a\theta - b)^2}{d\theta} = 2(a\theta - b)\,a$, we have

$$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m} \sum_{j=1}^{m} 2\left(\theta_0 + \theta_1 x^{(j)} - y^{(j)}\right) = \frac{2}{2}\left[(0 + 0 - 1) + (0 + 0 - 2)\right] = -3$$

$$\theta_0 \leftarrow \theta_0 - \alpha \frac{\partial J(\theta)}{\partial \theta_0} = 0 - 0.1 \times (-3) = 0.3$$
Similarly, for $\theta_1$:

$$\frac{\partial J(\theta)}{\partial \theta_1} = \frac{1}{m} \sum_{j=1}^{m} \frac{\partial \left(\theta_0 + \theta_1 x^{(j)} - y^{(j)}\right)^2}{\partial \theta_1} = \frac{1}{m} \sum_{j=1}^{m} 2\left(\theta_0 + \theta_1 x^{(j)} - y^{(j)}\right) x^{(j)}$$

$$= \frac{2}{2}\left[(0 + 0 - 1) \times 0 + (0 + 0 - 2) \times 1\right] = -2$$

$$\theta_1 \leftarrow \theta_1 - \alpha \frac{\partial J(\theta)}{\partial \theta_1} = 0 - 0.1 \times (-2) = 0.2$$
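The one-step update for Question 1 can be checked numerically. Below is a minimal sketch (variable names are my own, not from the original):

```python
# One gradient-descent step for f(x) = theta0 + theta1*x on D = {(0,1), (1,2)},
# with cost J = (1/m) * sum_j (f(x_j) - y_j)^2 and learning rate 0.1.
xs = [0.0, 1.0]
ys = [1.0, 2.0]
m = len(xs)
theta0, theta1 = 0.0, 0.0
alpha = 0.1

# Partial derivatives of J at the current parameters
grad0 = (1 / m) * sum(2 * (theta0 + theta1 * x - y) for x, y in zip(xs, ys))
grad1 = (1 / m) * sum(2 * (theta0 + theta1 * x - y) * x for x, y in zip(xs, ys))

theta0 -= alpha * grad0  # 0 - 0.1 * (-3) = 0.3
theta1 -= alpha * grad1  # 0 - 0.1 * (-2) = 0.2
print(theta0, theta1)
```

Running this reproduces the values derived above: $\theta_0 = 0.3$ and $\theta_1 = 0.2$.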
Question 2: Now suppose we want to find a quadratic model $f(x) = \theta_0 + \theta_1 x + \theta_2 x^2$ to fit the data. Consider the same cost function and the initial values $\theta_0 = \theta_1 = \theta_2 = 0$. Calculate $\theta_0, \theta_1, \theta_2$ after the first iteration of the update process in gradient descent, where the learning rate is set to 0.1.

$$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m} \sum_{j=1}^{m} \frac{\partial (f(x^{(j)}) - y^{(j)})^2}{\partial \theta_0} = \frac{2}{m} \sum_{j=1}^{m} \left( f(x^{(j)}) - y^{(j)} \right) = \frac{2}{2}\left[(-1) + (-2)\right] = -3$$

$$\theta_0 \leftarrow \theta_0 - \alpha \frac{\partial J(\theta)}{\partial \theta_0} = 0 - 0.1 \times (-3) = 0.3$$
For $\theta_1$:

$$\frac{\partial J(\theta)}{\partial \theta_1} = \frac{2}{m} \sum_{j=1}^{m} \left( f(x^{(j)}) - y^{(j)} \right) x^{(j)} = \frac{2}{2}\left[(-1) \times 0 + (-2) \times 1\right] = -2$$

$$\theta_1 \leftarrow \theta_1 - \alpha \frac{\partial J(\theta)}{\partial \theta_1} = 0 - 0.1 \times (-2) = 0.2$$
For $\theta_2$:

$$\frac{\partial J(\theta)}{\partial \theta_2} = \frac{2}{m} \sum_{j=1}^{m} \left( f(x^{(j)}) - y^{(j)} \right) \left(x^{(j)}\right)^2 = \frac{2}{2}\left[(0 + 0 + 0 - 1) \times 0 + (0 + 0 + 0 - 2) \times 1\right] = -2$$

$$\theta_2 \leftarrow \theta_2 - \alpha \frac{\partial J(\theta)}{\partial \theta_2} = 0 - 0.1 \times (-2) = 0.2$$
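The same numerical check works for the quadratic model. The sketch below (my own variable names) exploits the fact that the gradient with respect to $\theta_k$ multiplies each residual by $(x^{(j)})^k$:

```python
# One gradient-descent step for f(x) = theta0 + theta1*x + theta2*x^2
# on D = {(0,1), (1,2)}, learning rate 0.1, all parameters starting at 0.
xs = [0.0, 1.0]
ys = [1.0, 2.0]
m = len(xs)
theta = [0.0, 0.0, 0.0]
alpha = 0.1

def f(x):
    return theta[0] + theta[1] * x + theta[2] * x ** 2

# dJ/dtheta_k = (2/m) * sum_j (f(x_j) - y_j) * x_j**k   (note: 0.0**0 == 1.0)
grads = [(2 / m) * sum((f(x) - y) * x ** k for x, y in zip(xs, ys))
         for k in range(3)]
theta = [t - alpha * g for t, g in zip(theta, grads)]
print(theta)
```

This reproduces the derivation above: the gradients are $(-3, -2, -2)$, so after one step $\theta = (0.3, 0.2, 0.2)$.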
