FIT3154 Lecture 5
Bayesian Inference #2
Daniel F. Schmidt
Faculty of Information Technology, Monash University
August 21, 2024
Outline
1 Prior Distributions Revisited
Bayesian Inference of Normal Distribution
Transformations of Priors
2 Weakly Informative Prior Distributions
The Cauchy Prior
Bayesian Inference of Normal Revisited
Bayes Inference - A Recap (1)
We have a model of our data, p(y | θ)
Assume our population parameter is a R.V. distributed as per
θ ∼ π(θ)dθ
where π(θ) is our prior distribution
Chosen to represent prior beliefs about θ/prior ignorance/convenience
We observe some data y = (y1 , . . . , yn )
We form the posterior distribution of θ, given y
p(θ | y) = p(y | θ) π(θ) / ∫ p(y | θ) π(θ) dθ
Bayes Inference - A Recap (2)
Frequentist vs Bayesian Inference
Model of population (both): p(y | θ), with the true population parameter θ unknown.
Population parameter: Frequentist: true θ is unknown, but fixed. Bayesian: true θ is a random variable, i.e., θ ∼ π(θ)dθ.
Point estimates: Frequentist: maximum likelihood θ̂_ML, penalised maximum likelihood, etc. Bayesian: posterior mean, posterior mode, general Bayes estimators.
Measures of uncertainty: Frequentist: standard error √(V[θ̂_ML]). Bayesian: posterior standard deviation √(V[θ | y]).
Interval estimates: Frequentist: 100α% confidence intervals, A(y) such that P(θ ∈ A(y)) = α when y ∼ p(y | θ) with θ unknown but fixed. Bayesian: 100α% credible intervals, A such that P(θ ∈ A | y) = α, conditional on the observed y.
Today’s Relevant Figure (N/A)
Dennis Lindley (1923 - 2013). Born in London, England. Studied mathematics at
Cambridge, where he first encountered statistics. He was a committed Bayesian,
and was one of the few prominent figures in the 50s, 60s and 70s to promote
Bayesian statistics. Made many fundamental contributions to the area of
Bayesian inference.
Outline
1 Prior Distributions Revisited
Bayesian Inference of Normal Distribution
Transformations of Priors
2 Weakly Informative Prior Distributions
The Cauchy Prior
Bayesian Inference of Normal Revisited
Bayesian Inference of the Normal Distribution
Let’s start with another example of Bayesian inference
Let us examine Bayesian inference of the normal distribution
p(y | µ, σ²) = (1/(2πσ²))^(n/2) exp( −(1/(2σ²)) Σ_{j=1}^n (yj − µ)² )
with unknown mean µ and known variance σ 2
We will relax the latter assumption later
So given a data sample y = (y1 , . . . , yn ), we want to infer the population mean µ using Bayesian inference:
Point estimate for µ
Interval estimates for µ
Bayesian Inference of the Normal Distribution
For Bayesian inference we need a prior distribution
Describes our a priori beliefs about potential values of µ
Let us use the normal distribution as prior for µ:
π(µ | m, s²) = (1/(2πs²))^(1/2) exp( −(µ − m)² / (2s²) )
where m and s² are the prior hyperparameters.
Prior mean E [µ] = m sets our “best guess” of the population
parameter
Prior variance V [µ] = s2 controls how much confidence we have in our
guess
Normal is convenient as it is the “conjugate prior”
Normal Prior Distributions
Two normal prior distributions; the first, N (m = 10, s2 = 1) expresses strong
belief µ is near 10; the second, N (m = 10, s2 = 16) expresses much weaker belief
that µ is near 10.
Bayesian Inference of the Normal Distribution
We can write our Bayesian model as the hierarchy
yj | µ, σ 2 ∼ N (µ, σ 2 ), j = 1, . . . , n
µ | m, s2 ∼ N (m, s2 )
After observing a sample y = (y1 , . . . , yn ), the posterior is
µ | y ∼ N( wȳ + (1 − w)m, (n/σ² + 1/s²)^(−1) )
where
w = ns² / (ns² + σ²)
is the weight put on the information in the data
=⇒ (1 − w) is the weight put on the prior information
Weight controlled by sample size and population and prior variances
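The update above is simple enough to compute directly; here is a minimal R sketch (the function name and interface are my own, not from the unit materials):
normal_posterior <- function(y, sigma2, m, s2) {
  # posterior of mu given data y, known variance sigma2 and prior N(m, s2)
  n <- length(y)
  w <- n * s2 / (n * s2 + sigma2)          # weight w on the data
  post_mean <- w * mean(y) + (1 - w) * m   # w*ybar + (1 - w)*m
  post_var  <- 1 / (n / sigma2 + 1 / s2)   # (n/sigma^2 + 1/s^2)^(-1)
  c(mean = post_mean, var = post_var)
}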
Bayesian Inference of the Normal Distribution
We have recorded the body mass index (BMI) of n = 25 members of the Pima Indian ethnic group
We want to estimate the average BMI for the Pima population
We are told that the population standard deviation of BMI in the US is 6.5 kg/m²
We choose to use a normal prior distribution for µ
Need to choose values for m and s
We know that the average BMI of people in the US is 28.6 (source: CDC)
So we could set m = 28.6 kg/m²
Bayesian Inference of the Normal Distribution
How do we select s (the prior standard deviation)?
We know that a majority of the US population has a BMI > 25
So we could choose s = 2, so that there is a 95% prior probability that
the population mean BMI for Pima indians is between ≈ (24.6, 32.6)
So most prior probability concentrated on µ being in the higher range
of plausible BMI values
Let’s see how our analysis turns out
Bayesian Inference of the Normal Distribution
The sample mean of our data is ȳ = 33.20kg/m2
Standard error of 1.3kg/m2
Using our prior (m = 28.6, s = 2) yields the posterior
µ | y ∼ N (31.84, 1.188)
and the 95% credible interval
(29.703, 33.976)
using qnorm(c(0.025,0.975),31.84,sqrt(1.188))
How sensitive is this to choice of prior guess m?
Instead use population average BMI of Japan (m = 22); then
µ | y ∼ N (29.87, 1.188)
which is a ≈ 7% change in estimate of mean BMI
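These numbers are easy to reproduce from the quantities quoted above (ȳ = 33.20, σ = 6.5, n = 25); a small R sketch:
n <- 25; ybar <- 33.20; sigma <- 6.5   # sample size, sample mean, known population sd
m <- 28.6; s <- 2                      # prior hyperparameters
w <- n * s^2 / (n * s^2 + sigma^2)     # weight on the data, approx 0.70
w * ybar + (1 - w) * m                 # posterior mean, approx 31.84
1 / (n / sigma^2 + 1 / s^2)            # posterior variance, approx 1.188
w * ybar + (1 - w) * 22                # with the Japanese prior guess m = 22: approx 29.87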
Normal Posterior Distributions
[Figure: posterior densities of µ under the N(m = 28.6, s² = 4) and N(m = 22, s² = 4) priors, with the sample mean marked]
Posterior distributions of µ for Pima Indians BMI data, for two choices of normal
prior distribution (m = 28.6kg/m2 and m = 22kg/m2 ). Notice how much the
posterior is affected by choice of prior guess m.
Why is the posterior mean so sensitive?
The posterior mean (and mode, and median) is given by
E [µ | y] = wȳ + (1 − w)m
where
w = ns² / (ns² + σ²)
Important observations:
As s2 → ∞, w → 1 (use only the data)
As s2 → 0, w → 0 (use only the prior guess)
If s² < ∞, then as |m| → ∞, |E[µ | y] − ȳ| → ∞
that is, the posterior mean gets further and further away from the
sample mean as we move the prior mean guess
=⇒ the posterior is very sensitive to choices of m and s2
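Plugging in the Pima numbers makes this concrete (the prior guesses other than 28.6 and 22 are made-up values, purely for illustration):
w <- 25 * 2^2 / (25 * 2^2 + 6.5^2)   # weight on the data, approx 0.70
m <- c(28.6, 22, 10, 100)            # increasingly unreasonable prior guesses
w * 33.2 + (1 - w) * m               # posterior means drift without bound as |m| grows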
Uninformative Priors
How to solve this sensitivity?
One way is to try and make the prior uninformative
In our analysis of the normal-normal posterior, we noticed that if s² is larger, the prior has less effect
So perhaps we could let s2 grow very large to make a more
“uninformative” prior ...
Uninformative Priors
[Figure: N(28.6, s²) prior densities for s = 2, 10, 20 and 100]
Normal prior distributions as s increases. Notice for very large s the distribution
becomes “flat” and spreads its probability very thinly across the µ-line.
Uninformative Priors
In the extreme case that s2 → ∞, the prior becomes uniform
π(µ) ∝ 1
which tries to say that any value of µ is a priori equally likely
=⇒ no a priori preference for any particular value of µ
One issue is that the uniform prior on R doesn’t normalise, i.e.,
∫_{−∞}^{+∞} (1) dµ = ∞
This type of prior is called improper
It lacks even a subjective probability interpretation
Assigns “zero” prior probability to every bounded set A ⊂ R ...
Uninformative Priors
Incredibly, despite being improper, using it with the normal likelihood
results in a proper posterior
We now have the hierarchy
yj | µ, σ 2 ∼ N (µ, σ 2 ), j = 1, . . . , n
µ ∼ (1)dµ
with p(x)dx denoting the distribution associated with the PDF p(x)
The posterior is now
µ | y ∼ N( ȳ, σ²/n )
where we can see that
The posterior mean is now the sample mean
The posterior variance is the square of the standard error
=⇒ data completely determines inferences
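For the Pima data this gives µ | y ∼ N(33.20, 1.3²), so a 95% credible interval can be read off directly (a sketch using the sample figures quoted earlier):
qnorm(c(0.025, 0.975), mean = 33.20, sd = 1.3)   # approx (30.65, 35.75)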
Normal Posterior Distributions
[Figure: posterior densities of µ under the N(m = 28.6, s² = 4) prior, the N(m = 22, s² = 4) prior, and the uninformative (s = ∞) prior, with the sample mean marked]
Posterior distributions of µ for Pima Indians BMI data, for two choices of normal
prior distribution (m = 28.6kg/m2 and m = 22kg/m2 ) and uninformative
(uniform prior).
Uninformative Priors
Why does the posterior normalise even though prior does not?
We can write any prior probability distribution as
π(θ) = π_u(θ) / π_c
where
π_u(θ) is the prior density up to constants in θ
π_c is the normalizing constant so that ∫ π(θ) dθ = 1, i.e.,
π_c = ∫ π_u(θ) dθ
The posterior can then be written as
p(θ | y) = p(y | θ) π_u(θ) (1/π_c) / [ (1/π_c) ∫ p(y | θ) π_u(θ) dθ ]
so that the normalizing constant π_c cancels
Uninformative Priors
Uniform prior distributions are “uninformative”
So we have solved the problem of Bayesian inference!
Not quite ...
There are three problems with this approach:
1 Posteriors based on improper priors lack many Bayesian optimality
properties
2 The marginal probability p(y) is zero – which causes problems for some
parts of Bayesian theory
3 Being uniform for one parameterisation does not necessarily mean
“uninformative” in actuality
To see the last point we now need to examine transformations of
random variables
Reparameterisation of models (1)
Recall our definition of a model:
A distribution p(y | θ) over dataspace y ∈ Y n
The quantity θ are the parameter(s)
Example: normal distribution
p(y | µ, σ²) = (1/(2πσ²))^(1/2) exp( −(y − µ)² / (2σ²) )
θ = (µ, σ 2 ) are the parameters
But this parameterisation is not unique!
Reparameterisation of models (2)
We can choose any one-to-one transformation of θ, i.e.,
ϕ = f (θ) ⇐⇒ θ = f −1 (ϕ)
so that ϕ is the new parameterisation; then
p(y | ϕ) ≡ p(y | θ = f −1 (ϕ))
Example, we can use precision, τ , instead of variance
τ = 1/σ² ⇐⇒ σ = √(1/τ)
which leads to
p(y | µ, σ = τ^(−1/2)) = (τ/(2π))^(1/2) exp( −τ(y − µ)²/2 )
E.g., (µ = 0, σ = 2) is the same model as (µ = 0, τ = 1/4)
No parameterisation is better than any other, though some may be
more interpretable
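A quick numerical check that the two parameterisations give the same density (the evaluation point y = 1.7 is an arbitrary choice of mine):
y <- 1.7
dnorm(y, mean = 0, sd = 2)                  # density under (mu = 0, sigma = 2)
tau <- 1/4
sqrt(tau / (2 * pi)) * exp(-tau * y^2 / 2)  # density under (mu = 0, tau = 1/4): same value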
Transformation of Random Variables (1)
We now consider transformation of random variables
Let X be a discrete RV with probability distribution p(X = x)
Now consider a one-to-one transformation
Y = f(X) ⇐⇒ X = f⁻¹(Y);
then
P(Y = y) = P(X = f⁻¹(y))
Example; if X ∈ {1, 2, 3, 4, 5, 6} is the result of a fair dice throw, and
Y = 1/X, then
Y ∈ {1, 1/2, 1/3, 1/4, 1/5, 1/6}
and
P(Y = 1/3) = P(X = 3)
Transformation of Random Variables (2)
For continuous RV we have a pdf, say p(X = x) ≡ p(x)
This means that p(y) ̸= p(x = f −1 (y)), in general
=⇒ p(x) is not the probability of x; p(x)dx is for small dx
So, we have
p(y) dy = p(x) dx
p(y) = p(x) |dx/dy|
p(y) = p(f⁻¹(y)) |df⁻¹(y)/dy|
where the last term, |df⁻¹(y)/dy|, is called the Jacobian
We need y = f(x) to be differentiable as well as one-to-one
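A small simulation sketch of the formula (this lognormal example is my own, not from the slides): if X ∼ N(0, 1) and Y = exp(X), then f⁻¹(y) = log y and |df⁻¹(y)/dy| = 1/y, so p(y) = dnorm(log y)/y.
set.seed(1)
x <- rnorm(1e5)                  # X ~ N(0, 1)
y <- exp(x)                      # Y = f(X) = exp(X)
hist(y, breaks = 200, freq = FALSE, xlim = c(0, 5))
yy <- seq(0.01, 5, length.out = 500)
lines(yy, dnorm(log(yy)) / yy)   # p(f^{-1}(y)) * |d f^{-1}(y)/dy| matches the histogram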
Transformation of Prior Distributions (1)
Why is this important to us?
In Bayesian inference we have:
1 A probability model, p(y | θ);
2 A prior distribution π(θ) on the parameter θ
If we reparameterise our model, say ϕ = f (θ), then
Our new probability model is p(y | ϕ) ≡ p(y | θ = f −1 (ϕ))
Our new prior is the transformation of π(θ) to π(ϕ)
=⇒ θ is a random variable, so need to transform the prior
This means uniform prior for one parameterisation does not imply
uniform prior for another ...
Transformation of Prior Distributions (2)
Example: Bayesian inference of Bernoulli model
Here our probability model is
p(yj | θ) = θ^yj (1 − θ)^(1−yj)
so θ is the probability of success
Let's choose a uniform prior on θ, i.e.,
θ ∼ Beta(1, 1)
which has a probability density
π(θ) = 1
so that we are “uninformative”
=⇒ any probability of success equally likely
Transformation of Prior Distributions (3)
But an alternative parameterisation could be in terms of odds
O = θ/(1 − θ),  θ = O/(O + 1)
i.e., how much more likely a success is than a failure
This is one-to-one and differentiable; for example
θ = 0.5 ⇐⇒ O = 1
θ = 0.9 ⇐⇒ O = 9
θ = 0.1 ⇐⇒ O = 1/9
In terms of odds, our probability model is
p(yj | θ = O/(O+1)) = (O/(O+1))^yj (1 − O/(O+1))^(1−yj)
                    = O^yj / (O + 1)
What about our prior for O?
Transformation of Prior Distributions (4)
We said that θ ∼ Beta(1, 1) (i.e., uniform)
So to find π(O) we need to transform the probability density
π(O) = π(θ = O/(O+1)) · d[O/(O+1)]/dO
     = 1/(O + 1)²
This is clearly not uniform
Let’s have a look at it ...
Note: see if you can transform this prior back to θ (you should
recover the uniform distribution ...)
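A quick simulation check of this transformation (my own sketch, not from the slides): draw θ uniformly, map each draw to odds, and check the implied probabilities against 1/(O + 1)².
set.seed(1)
theta <- runif(1e5)        # theta ~ Beta(1,1), i.e. uniform on (0,1)
O <- theta / (1 - theta)   # transform each draw to odds
mean(O < 1)                # approx 0.5 = integral of (O + 1)^(-2) from 0 to 1
mean(O < 9)                # approx 0.9, i.e. P(theta < 0.9)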
Example: Bayesian Analysis of Bernoulli Distribution (2)
Uniform prior on θ leads to very non-uniform prior on odds O; would we expect uniform prior on O to make sense? Think: θ < 1/2 ⇔ O < 1, so that ∫₀¹ (1 + O)⁻² dO = 1/2 just as ∫₀^(1/2) (1) dθ = 1/2.
Transformation of Prior Distributions (3)
But it can be much worse; imagine we start by putting a uniform prior
on O
π(O) ∝ 1
which is improper as O ∈ (0, ∞)
We might think we are being uninformative, as we are saying all odds
are equally likely; but does this make sense?
Transforming back to a prior on θ yields:
π(θ) ∝ π(O = θ/(1−θ)) · d[θ/(1−θ)]/dθ
     ∝ 1/(1 − θ)²
This is definitely not uninformative; in fact, let’s look at it ...
Example: Bayesian Analysis of Bernoulli Distribution (2)
Uniform prior on O leads to prior that heavily favours θ close to 1 – so “uniform”
does not mean uninformative; we need to think about what our priors are saying
about our beliefs about the population parameter.
Outline
1 Prior Distributions Revisited
Bayesian Inference of Normal Distribution
Transformations of Priors
2 Weakly Informative Prior Distributions
The Cauchy Prior
Bayesian Inference of Normal Revisited
Weakly Informative Priors (1)
We now look at weakly informative priors
They are proper and let us specify prior beliefs
But they don’t affect our inferences too much
For our normal example, instead of using a normal prior for µ, let's use the Cauchy distribution
π(µ | m, s) = 1 / ( πs (1 + (µ − m)²/s²) )
where
m is the location parameter, and
s is the scale parameter
The Cauchy is bell-shaped and symmetric around m
However, Cauchy does not have a mean or variance
The parameter m sets the median (and mode)
Weakly Informative Priors (2)
Our hierarchy is then
yj | µ, σ 2 ∼ N (µ, σ 2 ), j = 1, . . . , n
µ | m, s ∼ C(m, s)
where X ∼ C(m, s) denotes that X is a RV distributed as per a Cauchy with location m and scale s
Unfortunately, the posterior does not have a nice form and the
marginal p(y) does not have a nice solution
Because we only have one unknown parameter (here µ) we can use numerical integration to compute the marginal
Let's see how it works; a short sketch of this computation follows below
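Here is a minimal sketch of that grid-based numerical integration (the function name, grid range and interface are my own assumptions, not the unit's code):
cauchy_posterior_mean <- function(y, sigma, m, s, grid = seq(0, 60, length.out = 1e4)) {
  loglik  <- sapply(grid, function(mu) sum(dnorm(y, mu, sigma, log = TRUE)))
  logpost <- loglik + dcauchy(grid, location = m, scale = s, log = TRUE)
  w <- exp(logpost - max(logpost))   # unnormalised posterior evaluated on the grid
  w <- w / sum(w)                    # normalising on the grid stands in for computing p(y)
  sum(w * grid)                      # approximate posterior mean E[mu | y]
}
This kind of calculation is what produces the Cauchy-prior posterior means quoted a few slides below.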
Weakly Informative Priors (3)
For this example, let’s choose m and s to calibrate with our normal
prior
Our normal prior with s = 2 placed 95% prior probability on µ ∈ (24.6, 32.6); we choose the Cauchy hyperparameters to match this
Let’s set m = 28.6 for Cauchy, and then adjust s until
1 - pcauchy(32.6, 28.6, scale=s)
is approximately 0.025
Choice of s = 0.32 approximately satisfies this, so that
∫_{24.6}^{32.6} π(µ | m = 28.6, s = 0.32) dµ ≈ 0.95
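The calibration itself can be automated with a short root search (a sketch; uniroot is just one convenient way to do it):
s_cal <- uniroot(function(s) 1 - pcauchy(32.6, location = 28.6, scale = s) - 0.025,
                 interval = c(0.01, 5))$root
s_cal                                                        # approx 0.32
diff(pcauchy(c(24.6, 32.6), location = 28.6, scale = 0.32))  # approx 0.95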
Cauchy vs Normal Prior Distribution (1)
Normal and Cauchy prior distributions for µ calibrated so that P(µ ∈ (24.6, 32.6)) ≈ 0.95. Note how much more peaked the Cauchy prior is; this suggests it might be more informative ...
Cauchy vs Normal Prior Distribution (2)
... but plotting the two densities on the log scale shows that the normal distribution races off to zero probability much faster than the Cauchy as |µ − m| grows. The Cauchy has “heavier tails”.
Cauchy vs Normal Prior Distribution (3)
Sample mean ȳ = 33.20
For normal prior:
When µ ∼ N (28.6, 4), then E [µ | y] = 31.84kg/m2
When µ ∼ N (22.0, 4), then E [µ | y] = 29.87kg/m2
For Cauchy prior:
When µ ∼ C(28.6, 0.32), then E [µ | y] = 31.96kg/m2
When µ ∼ C(22.0, 0.32), then E [µ | y] = 32.89kg/m2
Setting our prior guess further away from ȳ made our estimate closer to ȳ!
Let’s see behaviour as our prior guess m is varied for the two priors
(normal vs Cauchy)
Cauchy vs Normal Prior Distribution (4)
Posterior mean for the Pima indians BMI data for the three different priors as prior
guess m is varied. The uninformative prior is π(µ) ∝ 1 and has no “prior guess”.
Note how the Cauchy prior only uses its prior guess when m is “near” ȳ = 33.2.
Weakly Informative Priors: Summary
In summary, using weakly informative priors:
If our sample mean is in “vicinity” of our prior guess, use prior
information
If our sample mean is far from our prior guess, then ignore our prior
guess
Cauchy is very robust
It means we can specify any prior information we might have, but be safe in the knowledge that if it is grossly wrong, it won't bias our results
Better than uninformative prior because it remains proper
(normalizable)
(Minor) drawback is they can be harder to work with
Tails of Distributions
Cauchy distribution vs normal distribution
For simplicity, let m = 0 and s = 1 for both
Then, for the normal:
π(µ) ∝ exp( −µ²/2 )
For the Cauchy:
π(µ) ∝ 1 / (1 + µ²)
So as |µ| → ∞ ...
probability vanishes to zero exponentially fast for normal;
probability vanishes to zero polynomially fast for Cauchy
Cauchy goes to zero infinitely slower (“heavier tails”)
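A quick numerical look at the two tails (m = 0, s = 1; the evaluation points are arbitrary):
x <- c(2, 5, 10)
dnorm(x) / dnorm(0)      # normal density relative to its peak: vanishes extremely fast
dcauchy(x) / dcauchy(0)  # Cauchy density relative to its peak: vanishes like 1/(1 + x^2)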
Inference of Normal with Unknown σ 2 (1)
Let’s finish by revisiting inference of the normal distribution
So far we have assumed the data variance σ 2 was known
Somewhat restrictive - and unrealistic
Can we relax this assumption in the Bayesian framework?
Of course we can!
But we need a prior for σ ...
Inference of Normal with Unknown σ 2 (2)
σ controls the scale of the data (m, km, etc.)
The uninformative prior for σ is
π(σ) ∝ 1/σ
which is not normalizable
Instead we might choose to use the (unit) half-Cauchy
π(σ) = 2 / ( π(1 + σ²) ),  σ > 0
which is a good default choice for any scale type parameter
This has heavy tails; no mean, or variance
Median is at σ = 1
If an RV X follows a unit half-Cauchy, we can say that
X ∼ C + (0, 1)
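One convenient way to draw from C⁺(0, 1) in R (a sketch of my own, not from the slides) is to take absolute values of standard Cauchy draws:
set.seed(1)
sigma_draws <- abs(rcauchy(1e5))   # |X| with X ~ C(0, 1) is half-Cauchy C+(0, 1)
median(sigma_draws)                # close to 1, the prior median
mean(sigma_draws > 10)             # heavy tail: a few percent of mass beyond 10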
Half-Cauchy Prior Distribution
Unit half-Cauchy prior distribution for σ; it tails off to probability of zero slowly
as σ → ∞.
Inference of Normal with Unknown σ 2
Our hierarchy is then
yj | µ, σ 2 ∼ N (µ, σ 2 ), j = 1, . . . , n
µ | m, s ∼ C(m, s)
σ ∼ C + (0, 1)
The posterior distribution is then
p(µ, σ | y) = p(y | µ, σ) π(µ | m, s) π(σ) / ∫∫ p(y | µ, σ) π(µ | m, s) π(σ) dµ dσ
The denominator integral does not exist in closed form
Two dimensional integral, harder to compute numerically
Even if it did, dealing with two parameter PDFs is difficult
Hard to calculate credible intervals, etc.
Bayesian Inference with Difficult Posteriors
This problem with the denominator integral is the norm rather than the exception
Can we overcome this or is that it for the Bayesian approach?
It turns out it is much easier to draw random samples from p(θ | y)
than to compute the integral p(y)
So what we usually do is draw many “samples” of θ from the
posterior, and use these samples to approximate mean, intervals, etc.
Called the Markov-Chain Monte-Carlo (MCMC) approach
We will look at how we do this a little in the next lecture
Example: Pima indian BMI data
For now, let’s return to inference of our normal distribution
Our hierarchy is then
yj | µ, σ 2 ∼ N (µ, σ 2 ), j = 1, . . . , n
µ | m, s ∼ C(m, s)
σ ∼ C + (0, 1)
For our Pima indian BMI data, we chose to use m = 28.6 and
s = 0.32 for our hyperparameters
Draw 100,000 samples of µ and σ from the posterior p(µ, σ | y)
Use histograms of samples to visualise posterior
Use mean, sd, etc. to get statistics
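For completeness, here is a rough random-walk Metropolis sketch of how such samples could be drawn (this is my own illustrative implementation, not the code used for the results below; step size and starting values are arbitrary):
log_post <- function(mu, sigma, y, m, s) {
  if (sigma <= 0) return(-Inf)                  # half-Cauchy support: sigma > 0
  sum(dnorm(y, mu, sigma, log = TRUE)) +
    dcauchy(mu, location = m, scale = s, log = TRUE) +
    log(2 / (pi * (1 + sigma^2)))               # unit half-Cauchy prior on sigma
}
metropolis <- function(y, m, s, iters = 1e5, step = 0.5) {
  draws <- matrix(NA, iters, 2, dimnames = list(NULL, c("mu", "sigma")))
  mu <- mean(y); sigma <- sd(y)                 # crude starting values
  for (i in 1:iters) {
    mu_prop    <- mu    + rnorm(1, 0, step)     # symmetric random-walk proposals
    sigma_prop <- sigma + rnorm(1, 0, step)
    log_accept <- log_post(mu_prop, sigma_prop, y, m, s) - log_post(mu, sigma, y, m, s)
    if (log(runif(1)) < log_accept) { mu <- mu_prop; sigma <- sigma_prop }
    draws[i, ] <- c(mu, sigma)
  }
  draws
}
After running, e.g., samples <- metropolis(y, m = 28.6, s = 0.32), the µ and σ columns can be summarised with mean, sd and quantile as described above. In practice general-purpose samplers do this work for us (next lecture).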
Posterior Samples of µ
Posterior samples of µ: mean ≈ 32.91; sd ≈ 1.27; central 95% interval (0.025 and 0.975 quantiles) ≈ (30.36, 35.40).
Note the two modes – one near ȳ and one near our prior guess m.
Posterior Samples of σ
Posterior samples of σ: mean ≈ 6.17; sd ≈ 0.92; central 95% interval (0.025 and 0.975 quantiles) ≈ (4.68, 8.24).
Markov-Chain Monte-Carlo Approaches
In general, this sampling approach lets us explore the posterior for any
Bayesian hierarchy
The drawback is we don’t get a clear mathematical
look/understanding of what is happening
The advantage is we don't have to worry too much about being clever, as there are general-purpose programs for this
The drawback of those is that they are much slower than being clever
and developing specialised programs
Terms to Revise
Terms you should know/be aware of:
Improper prior
Transformation of random variables
Weakly informative prior distributions
Cauchy distribution
Next week we will examine Bayesian Poisson models and Bayesian
linear regression