Advanced Machine Learning
Lecture 3: Linear models
Sandjai Bhulai
Vrije Universiteit Amsterdam
[Link]@[Link]
12 September 2023
Towards a Bayesian framework
Advanced Machine Learning
The frequentist pitfall
p(t | x0, w, β) = 𝒩(t | y(x0, w), β⁻¹)
The frequentist pitfall
▪ Given p(t | x) or p(x, t), directly minimize the expected loss
function
𝔼[L] = ∫∫ L(t, y(x)) p(x, t) dx dt
▪ Natural choice:
𝔼[L] = ∫∫ {y(x) − t}² p(x, t) dx dt
The frequentist pitfall
▪ Given a point x, the expected loss at that point is given by
𝔼[L(t, y(x))] = ∫ {y(x) − t}² p(t | x) dt
▪ Taking the derivative w.r.t. y(x) yields
2 ∫ {y(x) − t} p(t | x) dt
▪ Setting this expression to 0 yields
y(x) = ∫ t p(t | x) dt = 𝔼_t[t | x]
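As a quick numerical check of this result (a sketch of my own, not from the slides; the conditional distribution of t is an illustrative choice), the squared loss at a fixed point is indeed minimized by the conditional mean:

```python
# Minimal sketch: for a fixed x, the expected squared loss is minimized by E[t | x].
# Assumption: t | x ~ N(sin(2*pi*x), 0.25^2), chosen purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
x0 = 0.3
t_samples = np.sin(2 * np.pi * x0) + 0.25 * rng.standard_normal(100_000)

candidates = np.linspace(-1.5, 1.5, 301)
losses = [np.mean((y - t_samples) ** 2) for y in candidates]

print(f"empirical minimizer : {candidates[np.argmin(losses)]:.3f}")
print(f"conditional mean    : {t_samples.mean():.3f}")   # both approx. sin(2*pi*0.3)
```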
The frequentist pitfall
▪ The expected loss function
𝔼[L] = ∫∫ {y(x) − t}² p(x, t) dx dt
▪ Rewrite the integrand as
{y(x) − t}² = {y(x) − 𝔼[t | x] + 𝔼[t | x] − t}²
= {y(x) − 𝔼[t | x]}² + 2{y(x) − 𝔼[t | x]}{𝔼[t | x] − t} + {𝔼[t | x] − t}²
▪ Taking the expected value (the cross term vanishes when integrating over t) yields
𝔼[L] = ∫ {y(x) − 𝔼[t | x]}² p(x) dx + ∫ var[t | x] p(x) dx
The frequentist pitfall
▪ Recall the expected square loss,
𝔼[L] = ∫ {y(x) − h(x)}² p(x) dx + ∫∫ {h(x) − t}² p(x, t) dx dt
where h(x) = 𝔼[t | x] = ∫ t p(t | x) dt
▪ The second term corresponds to the noise inherent in the
random variable t
▪ What about the first term?
The frequentist pitfall
▪ Suppose we were given multiple datasets, each of size N.
Any particular dataset 𝒟 will give a particular function
y(x; 𝒟)
▪ We then have
{y(x; 𝒟) − h(x)}²
= {y(x; 𝒟) − 𝔼_𝒟[y(x; 𝒟)] + 𝔼_𝒟[y(x; 𝒟)] − h(x)}²
= {y(x; 𝒟) − 𝔼_𝒟[y(x; 𝒟)]}² + {𝔼_𝒟[y(x; 𝒟)] − h(x)}²
+ 2{y(x; 𝒟) − 𝔼_𝒟[y(x; 𝒟)]}{𝔼_𝒟[y(x; 𝒟)] − h(x)}
The frequentist pitfall
▪ Taking the expectation over 𝒟 yields:
𝔼_𝒟[{y(x; 𝒟) − h(x)}²]
= {𝔼_𝒟[y(x; 𝒟)] − h(x)}² + 𝔼_𝒟[{y(x; 𝒟) − 𝔼_𝒟[y(x; 𝒟)]}²]
where the first term is the (bias)² and the second term is the variance
The frequentist pitfall
▪ In conclusion:
expected loss = (bias)² + variance + noise
where
(bias)² = ∫ {𝔼_𝒟[y(x; 𝒟)] − h(x)}² p(x) dx
variance = ∫ 𝔼_𝒟[{y(x; 𝒟) − 𝔼_𝒟[y(x; 𝒟)]}²] p(x) dx
noise = ∫∫ {h(x) − t}² p(x, t) dx dt
Bias-variance decomposition
▪ Example: 100 datasets from the sinusoidal with 25 data
points, varying the degree of regularization
Bias-variance decomposition
▪ From these plots, we note that an over-regularized model
(large λ) will have a high bias, while an under-regularized
model (small λ) will have a high variance
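A minimal simulation of this behaviour (my own sketch; it uses a regularized degree-9 polynomial basis rather than the Gaussian basis functions of the lecture example, and assumes noise with standard deviation 0.3):

```python
# Sketch: estimate (bias)^2 and variance over 100 datasets of 25 points drawn
# from sin(2*pi*x), fit with a regularized degree-9 polynomial (illustrative setup).
import numpy as np

rng = np.random.default_rng(1)
N, n_datasets, degree = 25, 100, 9
x_test = np.linspace(0, 1, 100)
h = np.sin(2 * np.pi * x_test)                          # h(x) = E[t | x]

def design(x):
    return np.vander(x, degree + 1, increasing=True)    # phi_j(x) = x^j

Phi_test = design(x_test)

for lam in (1e-6, 1e-2, 1.0):
    preds = np.empty((n_datasets, x_test.size))
    for d in range(n_datasets):
        x = rng.uniform(0, 1, N)
        t = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(N)
        Phi = design(x)
        w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(degree + 1), Phi.T @ t)
        preds[d] = Phi_test @ w
    y_bar = preds.mean(axis=0)                          # E_D[y(x; D)]
    bias2 = np.mean((y_bar - h) ** 2)
    variance = np.mean((preds - y_bar) ** 2)
    print(f"lambda={lam:g}: bias^2={bias2:.4f}, variance={variance:.4f}")
```

Large λ gives a high bias and small variance; small λ gives the reverse, matching the observation above.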
Bias-variance decomposition
▪ These insights are of limited practical value
▪ They are based on averages with respect to ensembles of
datasets
▪ In practice, we have only the single observed dataset
Bias-variance tradeoff
Bayesian linear regression
▪ Bayes’ theorem:
p(Y | X) = p(X | Y) p(Y) / p(X)
▪ Essentially, this leads to
posterior ∝ likelihood × prior
▪ The idea is to place a probability distribution over the weights,
and then update this distribution based on the observed data
Bayesian linear regression
▪ Define a conjugate prior over w
p(w) = 𝒩(w | m0, S0)
▪ Combining this with the likelihood function and using results
for marginal and conditional Gaussian distributions gives the
posterior
p(w | t) = 𝒩(w | mN, SN)
where
mN = SN(S0⁻¹m0 + βΦ⊤t)
SN⁻¹ = S0⁻¹ + βΦ⊤Φ
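These update equations translate directly into code; a minimal sketch (variable names are my own):

```python
# Sketch of the posterior update:
#   S_N^{-1} = S_0^{-1} + beta * Phi^T Phi,   m_N = S_N (S_0^{-1} m_0 + beta * Phi^T t)
import numpy as np

def posterior(Phi, t, m0, S0, beta):
    """Return the posterior mean m_N and covariance S_N over the weights w."""
    S0_inv = np.linalg.inv(S0)
    SN = np.linalg.inv(S0_inv + beta * Phi.T @ Phi)
    mN = SN @ (S0_inv @ m0 + beta * Phi.T @ t)
    return mN, SN
```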
Bayesian linear regression
▪ A common choice for the prior is
p(w) = 𝒩(w | 0, α⁻¹I)
for which
mN = βSNΦ⊤t
SN⁻¹ = αI + βΦ⊤Φ
▪ Consider the following example to make the concept less
abstract: y(x) = −0.3 + 0.5x
Bayesian linear regression
▪ 0 data points observed
Bayesian linear regression
▪ 1 data point observed
Bayesian linear regression
▪ 2 data points observed
Bayesian linear regression
▪ 20 data points observed
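The sequence shown in these slides can be reproduced with a short script. A sketch, assuming α = 2.0 and noise standard deviation 0.2 (so β = 1/0.2² = 25); the exact values used on the slides are not stated, so these are assumptions:

```python
# Sketch: sequential posterior updates for data from y(x) = -0.3 + 0.5*x.
# Assumed settings (not stated on the slides): alpha = 2.0, noise std 0.2, beta = 25.
import numpy as np

rng = np.random.default_rng(2)
alpha, beta = 2.0, 25.0

m = np.zeros(2)               # prior mean m_0 = 0
S = np.eye(2) / alpha         # prior covariance S_0 = alpha^{-1} I

for n in range(20):
    x = rng.uniform(-1, 1)
    t = -0.3 + 0.5 * x + 0.2 * rng.standard_normal()
    phi = np.array([1.0, x])  # phi(x) = (1, x)^T

    # The current posterior acts as the prior for the next observation.
    S_prior_inv = np.linalg.inv(S)
    S = np.linalg.inv(S_prior_inv + beta * np.outer(phi, phi))
    m = S @ (S_prior_inv @ m + beta * t * phi)

    if n + 1 in (1, 2, 20):
        print(f"after {n + 1:2d} point(s): m_N = {np.round(m, 3)}")  # tends to [-0.3, 0.5]
```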
Bayesian linear regression
▪ A common choice for the prior is
p(w) = 𝒩(w | 0, α⁻¹I)
for which
mN = βSNΦ⊤t
SN⁻¹ = αI + βΦ⊤Φ
▪ What is the log of the posterior distribution, i.e., ln p(w | t)?
ln p(w | t) = ln [p(t | w) p(w) / p(t)]
= −(β/2) Σ_{n=1}^{N} {tn − w⊤φ(xn)}² − (α/2) w⊤w + const
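Maximizing this log posterior is equivalent to minimizing the sum-of-squares error with a quadratic regularizer and regularization coefficient λ = α/β, so mN coincides with the ridge-regression solution. A quick numerical check (a sketch with synthetic data):

```python
# Sketch: the posterior mean equals the regularized least-squares (ridge) solution
# with lambda = alpha / beta.
import numpy as np

rng = np.random.default_rng(3)
alpha, beta = 2.0, 25.0
Phi = rng.standard_normal((50, 4))        # synthetic design matrix (illustrative)
t = rng.standard_normal(50)               # synthetic targets

SN = np.linalg.inv(alpha * np.eye(4) + beta * Phi.T @ Phi)
mN = beta * SN @ Phi.T @ t

ridge = np.linalg.solve(Phi.T @ Phi + (alpha / beta) * np.eye(4), Phi.T @ t)
print(np.allclose(mN, ridge))             # True
```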
Bayesian linear regression
▪ A common choice for the prior is
p(w) = 𝒩(w | 0, α⁻¹I)
for which
mN = βSNΦ⊤t
SN⁻¹ = αI + βΦ⊤Φ
▪ What if we have no prior information, i.e., α → 0?
mN → (Φ⊤Φ)⁻¹Φ⊤t, the maximum likelihood solution
Bayesian linear regression
▪ A common choice for the prior is
p(w) = 𝒩(w | 0, α⁻¹I)
for which
mN = βSNΦ⊤t
SN⁻¹ = αI + βΦ⊤Φ
▪ What if we have precise prior information, i.e., α → ∞?
mN → 0
Bayesian linear regression
▪ A common choice for the prior is
p(w) = 𝒩(w | 0, α⁻¹I)
for which
mN = βSNΦ⊤t
SN⁻¹ = αI + βΦ⊤Φ
▪ What if we have infinite data, i.e., N → ∞?
lim_{N→∞} mN = (Φ⊤Φ)⁻¹Φ⊤t
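These limiting cases are easy to verify numerically (a sketch with synthetic data; the α values are chosen purely for illustration):

```python
# Sketch: m_N approaches the maximum likelihood solution as alpha -> 0
# and shrinks towards 0 as alpha -> infinity.
import numpy as np

rng = np.random.default_rng(4)
beta = 25.0
Phi = rng.standard_normal((50, 4))
t = rng.standard_normal(50)

w_ml = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)     # (Phi^T Phi)^{-1} Phi^T t

def m_N(alpha):
    SN = np.linalg.inv(alpha * np.eye(4) + beta * Phi.T @ Phi)
    return beta * SN @ Phi.T @ t

print(np.round(m_N(1e-8), 4), np.round(w_ml, 4))   # nearly identical
print(np.round(m_N(1e8), 6))                       # nearly zero
```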
Bayesian linear regression
▪ Predict t for new values of x by integrating over w
p(t | t, α, β) = ∫ p(t | w, β) p(w | t, α, β) dw = 𝒩(t | mN⊤φ(x), σN²(x))
where
σN²(x) = 1/β + φ(x)⊤SNφ(x)
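In code, the predictive mean and variance at a new input follow directly from mN and SN; a minimal sketch (the argument `phi_x` stands for the basis-function vector φ(x)):

```python
# Sketch: predictive mean m_N^T phi(x) and variance sigma_N^2(x) = 1/beta + phi(x)^T S_N phi(x).
import numpy as np

def predictive(phi_x, mN, SN, beta):
    """phi_x is the basis-function vector phi(x) at the new input."""
    mean = mN @ phi_x
    var = 1.0 / beta + phi_x @ SN @ phi_x   # noise variance + parameter uncertainty
    return mean, var
```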
Bayesian linear regression
▪ Sinusoidal data, 9 Gaussian basis functions: 1 data point
Bayesian linear regression
▪ Sinusoidal data, 9 Gaussian basis functions: 2 data points
Bayesian linear regression
▪ Sinusoidal data, 9 Gaussian basis functions: 4 data points
Bayesian linear regression
▪ Sinusoidal data, 9 Gaussian basis functions: 25 data points
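The example in these slides can be reproduced end to end. A sketch, assuming evenly spaced basis centres on [0, 1] with width 0.1, α = 2.0 and β = 25 (the slides' exact settings are not given):

```python
# Sketch: Bayesian linear regression on sinusoidal data with 9 Gaussian basis functions.
# Assumed settings: centres evenly spaced on [0, 1], width s = 0.1, alpha = 2.0, beta = 25.
import numpy as np

rng = np.random.default_rng(5)
alpha, beta = 2.0, 25.0
centres, s = np.linspace(0, 1, 9), 0.1

def design(x):
    """N x 9 design matrix with phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2))."""
    return np.exp(-((x[:, None] - centres[None, :]) ** 2) / (2 * s ** 2))

for N in (1, 2, 4, 25):
    x = rng.uniform(0, 1, N)
    t = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(N)
    Phi = design(x)
    SN = np.linalg.inv(alpha * np.eye(9) + beta * Phi.T @ Phi)
    mN = beta * SN @ Phi.T @ t

    phi_new = design(np.array([0.5]))[0]               # predict at x = 0.5
    mean = mN @ phi_new
    std = np.sqrt(1.0 / beta + phi_new @ SN @ phi_new)
    print(f"N={N:2d}: predictive mean {mean:+.3f}, std {std:.3f} at x = 0.5")
```

As more data points are observed, the predictive standard deviation shrinks towards the noise level, mirroring the narrowing bands in the figures.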
Conclusions
▪ The use of maximum likelihood, or equivalently, least
squares, can lead to severe overfitting if complex models are
trained using data sets of limited size
▪ A Bayesian approach to machine learning avoids the
overfitting and also quantifies the uncertainty in model
parameters
Linear models for regression
Advanced Machine Learning
Linear regression
▪ General model is:
y(x, w) = Σ_{j=0}^{M−1} wj φj(x) = w⊤φ(x)
▪ φj(x) = x^j
▪ Take M = 2
▪ Calculate wML = (Φ⊤Φ)⁻¹Φ⊤t
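For M = 2 with polynomial basis functions this is a straight-line fit; a minimal sketch of the closed-form solution on synthetic data:

```python
# Sketch: closed-form least squares w_ML = (Phi^T Phi)^{-1} Phi^T t
# for M = 2 with phi_0(x) = 1, phi_1(x) = x (synthetic data for illustration).
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(0, 1, 30)
t = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(30)

Phi = np.column_stack([np.ones_like(x), x])        # N x 2 design matrix
w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)     # stable solve of the normal equations
print(np.round(w_ml, 2))                           # approximately [1.0, 2.0]
```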
Linear regression
▪ Thus, we have y(x, w) = w0 + w1x
▪ Performance is measured by
E(w) = 1/(2N) Σ_{n=1}^{N} {y(xn, w) − tn}²
▪ Goal: min_{w0, w1} E(w0, w1)
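A small sketch of this error function for the straight-line model:

```python
# Sketch: E(w) = 1/(2N) * sum_n {y(x_n, w) - t_n}^2 for y(x, w) = w0 + w1 * x.
import numpy as np

def cost(w0, w1, x, t):
    residuals = (w0 + w1 * x) - t
    return np.mean(residuals ** 2) / 2.0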
Gradient descent
▪ Gradient descent algorithm:
repeat until convergence {
    wj := wj − α ∂E(w0, w1)/∂wj
}
Gradient descent
▪ Correct update:
temp0 := w0 − α ∂E(w0, w1)/∂w0
temp1 := w1 − α ∂E(w0, w1)/∂w1
w0 := temp0
w1 := temp1
Gradient descent
▪ Incorrect update:
temp0 := w0 − α ∂E(w0, w1)/∂w0
w0 := temp0
temp1 := w1 − α ∂E(w0, w1)/∂w1
w1 := temp1
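In code, the correct (simultaneous) update computes both temporaries from the old parameter values before assigning either; the incorrect version would update w1 using an already-modified w0. A sketch for the straight-line model with analytic gradients:

```python
# Sketch: one simultaneous gradient-descent step for the straight-line model.
# Both temporaries are computed from the OLD (w0, w1) before either is assigned.
import numpy as np

def step(w0, w1, x, t, lr):
    err = (w0 + w1 * x) - t        # y(x_n, w) - t_n
    grad0 = np.mean(err)           # dE/dw0
    grad1 = np.mean(err * x)       # dE/dw1
    temp0 = w0 - lr * grad0
    temp1 = w1 - lr * grad1
    return temp0, temp1            # assign w0, w1 only after both are computed
```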
Gradient descent
▪ Feature scaling is important
Gradient descent
▪ Step size is important for convergence
Gradient descent
▪ Convexity of the problem is important for global optimality
Gradient descent for linear regression
▪ General model is:
y(x, w) = Σ_{j=0}^{M−1} wj φj(x) = w⊤φ(x)
▪ Repeat {
    wj := wj − α (1/N) Σ_{n=1}^{N} (y(xn, w) − tn) φj(xn)
}
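Putting the pieces together, a compact sketch of batch gradient descent with this update rule (the basis functions, learning rate, and synthetic data are chosen for illustration):

```python
# Sketch: batch gradient descent for linear regression with basis functions,
#   w_j := w_j - alpha * (1/N) * sum_n (y(x_n, w) - t_n) * phi_j(x_n)
import numpy as np

def gradient_descent(Phi, t, lr=0.1, n_iters=5000):
    """Phi is the N x M design matrix with entries phi_j(x_n); t is the target vector."""
    N, M = Phi.shape
    w = np.zeros(M)
    for _ in range(n_iters):
        err = Phi @ w - t                  # y(x_n, w) - t_n for all n
        w = w - lr * (Phi.T @ err) / N     # simultaneous update of all w_j
    return w

# Example: recover the line t = 1 + 2x from noisy data.
rng = np.random.default_rng(7)
x = rng.uniform(0, 1, 100)
t = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(100)
Phi = np.column_stack([np.ones_like(x), x])
print(np.round(gradient_descent(Phi, t), 2))       # approximately [1.0, 2.0]
```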