
Probabilistic Bayesian Modelling

Debaditya Roy
Probabilistic Model
• 𝑥 – an observation (random variable/vector)
• 𝑋 = {𝑥₁, 𝑥₂, …, 𝑥ₙ}, set of observations, evidence, data
• Probabilistic model – a mathematical form which provides stochastic
information about the random variable 𝑋
• 𝜃 - parameters of a model
• 𝑀 – hyperparameters of a model
Modelling Goals
• Estimation (of the underlying model parameters) − 𝑝(𝜃, 𝑀 | 𝑋)
• Understand
• Generate new data

• Prediction − 𝑝(𝑥* | 𝜃) or 𝑝(𝑥* | 𝑋), where 𝑥* is a new observation

• Model comparison − 𝑝(𝑋 | 𝜃₁) > 𝑝(𝑋 | 𝜃₂) (see the sketch below)

• Solving the first goal helps solve the second and third goals
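As a concrete illustration of the model-comparison goal, the sketch below (with hypothetical coin-toss data and candidate parameter values, not taken from the slides) compares two Bernoulli parameters by their log-likelihood on the observed data.

```python
# A minimal sketch (hypothetical data and parameter values, not from the
# slides) of the model-comparison goal: prefer the parameter value with the
# higher likelihood on the observed data.
import numpy as np

X = np.array([1, 0, 1, 1, 0, 1, 1, 1])  # hypothetical coin tosses (1 = head)

def log_likelihood(theta, X):
    # log p(X | theta) for i.i.d. Bernoulli observations
    return np.sum(X * np.log(theta) + (1 - X) * np.log(1 - theta))

theta1, theta2 = 0.5, 0.75
print(log_likelihood(theta1, X), log_likelihood(theta2, X))
# p(X | theta2) > p(X | theta1) here, so theta2 is the preferred "model"
```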
Some probabilities of interest

Note: We are talking about probability distributions and not single (point) probabilities
Maximum Likelihood Estimation
Rules of Probability
Posterior Distribution
Posterior Distribution
Posterior Predictive Distribution
Marginal Likelihood
Model Comparison/Averaging
A Simple Parameter Estimation Problem
• For a single-parameter model
• Hyperparameters, if any, will be assumed to be fixed/known
Simple Example (MLE)
• Consider a sequence of N coin tosses (call head = 1, tail = 0)
• The 𝑛ᵗʰ outcome 𝑥ₙ is a binary random variable ∈ {0, 1}
• Assume 𝜃 to be the probability of a head (the parameter we wish to estimate)

• Each likelihood term 𝑝(𝑥ₙ | 𝜃) is Bernoulli: 𝑝(𝑥ₙ | 𝜃) = 𝜃^𝑥ₙ (1 − 𝜃)^(1−𝑥ₙ)

• Log-likelihood: ∑ₙ₌₁ᴺ log 𝑝(𝑥ₙ | 𝜃) = ∑ₙ₌₁ᴺ [𝑥ₙ log 𝜃 + (1 − 𝑥ₙ) log(1 − 𝜃)]

• Taking the derivative of the log-likelihood w.r.t. 𝜃 and setting it to zero gives:

  𝜃̂ₘₗₑ = (∑ₙ₌₁ᴺ 𝑥ₙ) / 𝑁

• 𝜃̂ₘₗₑ in this example is simply the fraction of heads! (A numerical check follows below.)
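A minimal numerical check of the closed-form result, using synthetic data (the tosses below are assumed for illustration): the fraction of heads should match a direct numerical maximization of the log-likelihood.

```python
# A minimal numerical check (synthetic data, assumed for illustration) that the
# closed-form MLE, the fraction of heads, matches a direct maximization of the
# Bernoulli log-likelihood.
import numpy as np
from scipy.optimize import minimize_scalar

X = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])  # hypothetical coin tosses
theta_closed_form = X.mean()                   # sum(x_n) / N = fraction of heads

def neg_log_lik(theta):
    return -np.sum(X * np.log(theta) + (1 - X) * np.log(1 - theta))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(theta_closed_form, res.x)                # both should be ~0.7
```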
MAP Estimate
Posterior Distribution

Posterior has the same form as prior – conjugate prior


Posterior Predictive Distribution
Visualization

• Prior: 𝐵𝑒𝑡𝑎(2, 2)
• Likelihood (scaled)
• Posterior: 𝐵𝑒𝑡𝑎(14, 10)

Vertical lines for:

• MLE: 𝜃 = 12/20 = 0.60
• MAP: 𝜃 = 13/22 ≈ 0.59
• Bayesian mean: 𝜃 = 14/24 ≈ 0.58

The Bayesian mean and MAP are pulled slightly toward the prior compared to the MLE (the sketch below reproduces these numbers).
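The sketch below reproduces the numbers on this slide from the Beta(2, 2) prior and the implied data (12 heads in 20 tosses); only the use of scipy.stats.beta is an implementation choice.

```python
# A short sketch reproducing the numbers on this slide: a Beta(2, 2) prior and
# 12 heads in 20 tosses, which gives the Beta(14, 10) posterior shown above.
from scipy.stats import beta

a0, b0 = 2, 2                           # prior Beta(a0, b0)
heads, N = 12, 20                       # observed data
a, b = a0 + heads, b0 + (N - heads)     # posterior Beta(14, 10)

theta_mle  = heads / N                  # 12/20 = 0.60
theta_map  = (a - 1) / (a + b - 2)      # 13/22 ≈ 0.59
theta_mean = beta(a, b).mean()          # 14/24 ≈ 0.58
print(theta_mle, theta_map, theta_mean)
```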
Multinoulli Observation Model
Multinoulli Model
Detour: Dirichlet Distribution
A Bag of Proportions
Imagine you're trying to model the proportions of 𝐾 different categories (say: red, green, blue
marbles in a bag). But instead of knowing the exact proportions, you're uncertain — and you
want a probabilistic guess of what those proportions might be.

The Dirichlet distribution gives you a way to describe that uncertainty:


• Each sample from a Dirichlet distribution gives you a possible set of proportions (like: 60%
red, 30% green, 10% blue).
• Different parameters of the Dirichlet control what kinds of proportions you're more likely to
see.
Detour: Dirichlet Distribution
The Dirichlet distribution has a parameter vector α = [α₁, α₂, …, α_K], one for each of the 𝐾 categories.
Here’s what those parameters intuitively do:

• αᵢ > 1 → “I believe the 𝑖ᵗʰ category will have a large proportion.”

• αᵢ < 1 → “I believe the 𝑖ᵗʰ category will have a small proportion (or maybe even zero).”

• αᵢ = 1 → “I have no strong preference for the 𝑖ᵗʰ category.”

The sum of the αs, often denoted α₀ = ∑ᵢ αᵢ, controls the concentration:


• High α₀ (e.g. all αᵢ = 10): samples are tightly clustered around the mean (less variability).
• Low α₀ (e.g. all αᵢ = 0.2): samples are sparse — most of the probability mass goes to just
one or two categories in each sample.
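A small sketch of this concentration effect, using NumPy's Dirichlet sampler; the α values below are illustrative choices, not from the slides.

```python
# A small sketch of the concentration effect described above, using NumPy's
# Dirichlet sampler (the alpha values are illustrative choices).
import numpy as np

rng = np.random.default_rng(0)
for alpha in ([10.0, 10.0, 10.0], [0.2, 0.2, 0.2]):
    samples = rng.dirichlet(alpha, size=5)
    print("alpha =", alpha)
    print(np.round(samples, 2))
# High alpha_0: samples cluster near (1/3, 1/3, 1/3).
# Low  alpha_0: most of the mass lands on one or two categories per sample.
```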
Detour: Dirichlet Distribution
Posterior Distribution
Exercise
For Multinoulli Likelihood and Dirichlet Prior
- What is the MLE/MAP?
- Posterior Predictive Distribution?
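If you want to check your derivations numerically, the sketch below (with an assumed symmetric prior and hypothetical counts) implements the standard closed-form answers for this conjugate pair: the posterior adds the observed counts to α, the MLE is the empirical proportion, and the posterior predictive is the normalized posterior α.

```python
# Hypothetical counts and prior, for numerically checking the exercise answers.
import numpy as np

alpha  = np.array([1.0, 1.0, 1.0])          # Dirichlet prior parameters
counts = np.array([5, 3, 2])                # observed category counts n_k

alpha_post = alpha + counts                 # Dirichlet posterior parameters
theta_mle  = counts / counts.sum()          # empirical proportions
theta_map  = (alpha_post - 1) / (alpha_post.sum() - len(alpha_post))
predictive = alpha_post / alpha_post.sum()  # posterior predictive over categories
print(alpha_post, theta_mle, theta_map, predictive)
```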
Gaussian Models
• Univariate with fixed variance
• Univariate with fixed mean
• Univariate with varying mean and variance
• Multivariate
Detour: Generative Models
Generative models are invariably also probabilistic models

• Image-to-image translation
• Deepfake generation
• Anomaly detection in medical imaging
• Generating synthetic but interpretable data
• High-fidelity audio generation (e.g., WaveGlow for speech synthesis)
• Text-to-image generation

Figure credit: Lilian Weng


Fixed Variance Gaussian Model
Bayesian Inference for Mean of a Gaussian

Notion of Sufficient Statistics


We only need the sufficient statistics to estimate the parameters; the values of individual observations aren't needed.
Likelihood

Prior
Completing the square

Resulting Posterior
Posterior Predictive Distribution

Convolution of Gaussians
Posterior Predictive Distribution

Why? Because the predictive distribution adds the observation noise variance on top of the posterior uncertainty (a convolution of Gaussians), as sketched below.
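The update equations themselves appear as images on the slides; as a hedged sketch, the block below implements the standard conjugate update for a Gaussian mean with known variance (prior values and synthetic data are assumptions). The predictive variance is the posterior variance plus the observation variance.

```python
# A sketch of the standard conjugate update for a Gaussian mean with known
# variance; prior N(mu0, tau0_sq) on mu, likelihood N(mu, sigma_sq) per point.
# The numbers below are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
sigma_sq = 1.0                                             # known noise variance
x = rng.normal(loc=2.0, scale=np.sqrt(sigma_sq), size=30)  # synthetic data

mu0, tau0_sq = 0.0, 10.0                                   # Gaussian prior on mu
N, x_sum = len(x), x.sum()                                 # sufficient statistics

tau_N_sq = 1.0 / (1.0 / tau0_sq + N / sigma_sq)            # posterior variance
mu_N = tau_N_sq * (mu0 / tau0_sq + x_sum / sigma_sq)       # posterior mean

# Posterior predictive for a new x*: Gaussian, variance = tau_N_sq + sigma_sq
print(mu_N, tau_N_sq, tau_N_sq + sigma_sq)
```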


Fixed Mean Gaussian Model
Choosing a Conjugate Prior for 𝜎²
Goal: Find a prior 𝑝(𝜎²) that makes posterior inference tractable (i.e., a conjugate prior).
Posterior Distribution over 𝜎² or the Precision 𝜆

sum of squared deviations


Visualization

• The posterior sharpens around the true variance.
• Bayesian inference updates our belief after observing data (see the sketch below).
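As a hedged sketch of this update (the slides' exact parameterization appears as images), the block below uses a Gamma(a₀, b₀) prior on the precision λ = 1/σ², equivalent to an inverse-gamma prior on σ²; hyperparameters and data are assumptions for illustration.

```python
# A sketch of the conjugate update for the precision lambda = 1 / sigma^2 when
# the mean is known: Gamma(a0, b0) prior (shape a0, rate b0), equivalent to an
# inverse-gamma prior on sigma^2. Hyperparameters and data are assumptions.
import numpy as np

rng = np.random.default_rng(0)
mu_known, sigma_true = 0.0, 2.0
x = rng.normal(mu_known, sigma_true, size=200)   # synthetic data, mean known

a0, b0 = 1.0, 1.0                                # prior shape and rate
sq_dev = np.sum((x - mu_known) ** 2)             # sum of squared deviations

a_N = a0 + len(x) / 2.0                          # posterior shape
b_N = b0 + 0.5 * sq_dev                          # posterior rate

print(a_N / b_N, 1.0 / sigma_true ** 2)          # posterior mean of lambda vs. true precision
```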
Univariate Gaussian — Unknown Mean & Variance

𝜅₀ is a scaling parameter that determines the confidence in the prior belief about 𝜇.
Posterior Derivation — Normal-Inverse-Gamma
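Since the derivation itself appears as equations on the slide, here is a minimal numerical sketch of the standard Normal-Inverse-Gamma update; the hyperparameters (𝜇₀, 𝜅₀, 𝑎₀, 𝑏₀) and synthetic data are assumptions for illustration.

```python
# A sketch of the standard Normal-Inverse-Gamma update for unknown mean and
# variance; the hyperparameter values and synthetic data are assumptions.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(1.5, 1.0, size=50)                  # synthetic data
N, xbar = len(x), x.mean()
ss = np.sum((x - xbar) ** 2)                       # sum of squared deviations

mu0, kappa0, a0, b0 = 0.0, 1.0, 1.0, 1.0           # NIG prior hyperparameters

kappa_N = kappa0 + N
mu_N = (kappa0 * mu0 + N * xbar) / kappa_N
a_N = a0 + N / 2.0
b_N = b0 + 0.5 * ss + kappa0 * N * (xbar - mu0) ** 2 / (2.0 * kappa_N)

print(mu_N, b_N / (a_N - 1))   # posterior mean of mu and posterior mean of sigma^2
```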
Visualization
Multivariate Gaussian

A two-dimensional Gaussian
Multivariate Gaussian: Examples
The covariance matrix Σ determines:
• Shape of the distribution
• Orientation and spread (illustrated in the sketch below)
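A small sketch of this point with illustrative covariance matrices (assumed, not from the slides): sampling from each Gaussian and computing the empirical covariance approximately recovers Σ.

```python
# A small sketch (with illustrative covariance matrices) of how Sigma controls
# shape, orientation, and spread; the empirical covariance of samples should
# approximately recover each Sigma.
import numpy as np

rng = np.random.default_rng(0)
mean = np.zeros(2)
covs = {
    "isotropic":  np.eye(2),
    "elongated":  np.array([[3.0, 0.0], [0.0, 0.3]]),
    "correlated": np.array([[1.0, 0.8], [0.8, 1.0]]),
}
for name, Sigma in covs.items():
    samples = rng.multivariate_normal(mean, Sigma, size=2000)
    print(name, np.round(np.cov(samples, rowvar=False), 2))
```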
Multivariate Gaussian: Marginals and Conditionals
Multivariate Gaussian : Full Bayesian Estimation
Multivariate Gaussian : Full Bayesian Estimation
Multivariate Gaussian : Full Bayesian Estimation
Linear Gaussian Model Formulation
LGM ↔ Bayesian Inference Mapping
Bayesian Inference                       Linear Gaussian Model
Unknown 𝝁                                Latent variable
𝐱ᵢ = 𝜇 + 𝜖                               LGM equation
Gaussian noise 𝜖 ∼ 𝒩(0, 𝚺)               Measurement uncertainty
Gaussian prior on 𝝁                      Conjugate prior
Posterior of 𝝁                           LGM inference result

Bayesian inference in this setup is equivalent to inference in a Linear Gaussian Model where
parameters (like the mean vector) are latent variables and observations are generated through a
linear-Gaussian transformation.
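As a hedged sketch of the inference this mapping describes, the block below computes the posterior over the latent mean 𝝁 for 𝐱ᵢ = 𝝁 + 𝜖 with known 𝚺 and a Gaussian prior; the prior, 𝚺, and synthetic data are assumptions for illustration.

```python
# A sketch of the multivariate conjugate update described by this mapping:
# x_i = mu + eps with known Sigma, and a Gaussian prior on mu. The prior,
# Sigma, and synthetic data below are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
D, N = 2, 40
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])             # known noise covariance
mu_true = np.array([1.0, -2.0])
X = rng.multivariate_normal(mu_true, Sigma, size=N)    # observations x_i

m0, S0 = np.zeros(D), 10.0 * np.eye(D)                 # Gaussian prior on mu

S0_inv, Sigma_inv = np.linalg.inv(S0), np.linalg.inv(Sigma)
S_N = np.linalg.inv(S0_inv + N * Sigma_inv)            # posterior covariance of mu
m_N = S_N @ (S0_inv @ m0 + Sigma_inv @ X.sum(axis=0))  # posterior mean of mu

print(np.round(m_N, 2))                                # should be close to mu_true
```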
Gaussian Observation Model
• MLE/MAP for 𝜇, 𝜎² (or both) is straightforward in Gaussian observation models.

• The posterior is also straightforward in most situations for such models.

• (As we saw) computing the posterior of 𝜇 is easy (using a Gaussian prior) if the variance 𝜎² is known.
• Likewise, computing the posterior of 𝜎² is easy (using a gamma prior on the precision 1/𝜎², i.e., an inverse-gamma prior on 𝜎²) if the mean 𝜇 is known.

• If 𝜇 and 𝜎² are both unknown, posterior computation requires computing 𝑝(𝜇, 𝜎² | 𝒙).

• Computing the joint posterior 𝑝(𝜇, 𝜎² | 𝒙) exactly requires a jointly conjugate prior 𝑝(𝜇, 𝜎²).
• The "Gaussian-gamma" ("Normal-gamma") prior is such a conjugate prior – a product of a normal and a gamma distribution.
• Note: Computing joint posteriors exactly is possible only in rare cases such as this one.

• If each observation 𝒙ₙ ∈ ℝᴰ, we can assume a likelihood/observation model 𝒩(𝒙 | 𝝁, 𝚺).

• Need to estimate a vector-valued mean 𝝁 ∈ ℝᴰ: can use a multivariate Gaussian prior.
• Need to estimate a 𝐷 × 𝐷 positive definite covariance matrix 𝚺: can use a Wishart prior on the precision matrix 𝚺⁻¹ (equivalently, an inverse-Wishart prior on 𝚺).
• If 𝝁 and 𝚺 are both unknown, can use a Normal-Wishart as a jointly conjugate prior.
References
• Zoubin Ghahramani, Probabilistic machine learning and artificial intelligence, Nature, 521(7553), 452–459, 2015 (freely available online)

• Section 4.6 and Section 11.7 of Kevin Murphy, Probabilistic Machine Learning: An Introduction (PML-1), MIT Press, 2022 (freely available online)

• Chapter 2 and Appendix B of Christopher Bishop, Pattern Recognition and Machine Learning (PRML), Springer, 2007 (freely available online)

• Kevin Murphy, Conjugate Bayesian analysis of the Gaussian distribution, https://www.cs.ubc.ca/~murphyk/Papers/bayesGauss.pdf

• Probabilistic Machine Learning (CS772A), Piyush Rai
