Today, Lecture 4
1. Bayesian Statistics
• Bayesian prediction, testing
2. E-Processes with simple nulls
• Simple alternative: GRO e-variable
• Composite alternative: learning in Bayesian & non-Bayesian manner
3. Bayesian vs. Neyman-Pearson vs. E-Process Testing with simple nulls
Models
• Let Ω = 𝒳^n be a sample space and suppose we observe data (x_1, …, x_n) ∈ 𝒳^n
• We call a set of distributions ℳ = {P_θ : θ ∈ Θ} on Ω a statistical model (or often: hypothesis) for the data
• Simple example: 𝒳 = {0,1}, Θ = [0,1], ℳ is the Bernoulli model, defined by
  p_θ(x_1, …, x_n) = θ^{n_1} (1−θ)^{n−n_1}, where n_1 is the number of 1s in x_1, …, x_n
Note: of all distributions on Ω, the Bernoulli model is the restriction to those under which the outcomes are i.i.d.
Maximum Likelihood
• The method of maximum likelihood (Fisher, 1922) tells us to pick, as a 'best guess' of the true θ, the value θ̂ maximizing the probability of the actually observed data:
  θ̂ = arg max_θ p_θ(x_1, …, x_n)
• For the Bernoulli model this gives θ̂ = n_1/n, the observed frequency of 1s
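The Bernoulli MLE can be checked numerically. This is a small sketch of my own (not part of the slides): it computes θ̂ = n_1/n and verifies by brute force over a grid that no other candidate achieves a higher likelihood.

```python
from fractions import Fraction

def bernoulli_likelihood(theta, xs):
    """p_theta(x_1,...,x_n) = theta^{n_1} * (1-theta)^{n-n_1}."""
    n1 = sum(xs)
    return theta ** n1 * (1 - theta) ** (len(xs) - n1)

def bernoulli_mle(xs):
    """The likelihood above is maximized by the observed frequency n_1 / n."""
    return Fraction(sum(xs), len(xs))

xs = [1, 0, 1, 1, 0, 1]            # n = 6 observations, n_1 = 4 ones
theta_hat = bernoulli_mle(xs)      # Fraction(2, 3)

# Brute-force check: the best grid point sits next to theta_hat.
grid = [Fraction(k, 100) for k in range(101)]
best = max(grid, key=lambda t: bernoulli_likelihood(t, xs))
```

Exact `Fraction` arithmetic avoids any floating-point ties when comparing likelihood values on the grid.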
The Likelihood Function
[Plot: p_θ(X^n) as a function of θ]
The Bayesian Posterior
• From the Bayesian perspective, you do not necessarily want to make a 'single' estimate of θ
• Rather, you want to report the full posterior – this encapsulates everything you have learned from the data
• Example – Bernoulli model with prior P on Θ = [0,1]
• We have already seen the example with a prior P putting mass 1/2 on each of two parameter values; the posterior was P(θ|D), a probability distribution on those 2 parameter values
The Bayesian Posterior
• If we want to take a prior on the full Bernoulli model, we should take one with a continuous probability density p(θ)
• Everything works as before: the posterior is
  p(θ | x^n) = p_θ(x^n) p(θ) / ∫ p_{θ′}(x^n) p(θ′) dθ′ ∝ p_θ(x^n) p(θ)
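To make the continuous-prior case concrete: for Bernoulli data under the uniform prior, the posterior is a Beta(n_1+1, n_0+1) density. A sketch of my own (not from the slides) that spells out the normalizing constant and checks numerically that the density integrates to 1:

```python
from math import comb

def posterior_density(theta, n1, n0):
    """Posterior of theta under a uniform prior, given n1 ones and n0 zeros:
    proportional to theta^{n1} (1-theta)^{n0}; the normalizing constant is
    1/B(n1+1, n0+1) = (n+1) * C(n, n1)."""
    n = n1 + n0
    return (n + 1) * comb(n, n1) * theta**n1 * (1 - theta)**n0

# Midpoint-rule check that the density integrates to 1 over [0, 1].
m = 100_000
total = sum(posterior_density((k + 0.5) / m, n1=7, n0=3) for k in range(m)) / m
```

The identity 1/B(n_1+1, n_0+1) = (n+1)·C(n, n_1) follows from B(a, b) = (a−1)!(b−1)!/(a+b−1)!.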
The Bayesian Posterior
• If we take the uniform prior p(θ) ≡ 1, the posterior is proportional to the likelihood function!
• For more general models, a uniform prior is not always well-defined (and even for Bernoulli, perhaps not desirable!)
• Why not desirable? It is not invariant to reparametrization:
  • …we could just as well have parametrized the model by p_θ(X_i = 1) = θ², and a prior that is uniform in one parametrization is not uniform in the other
The Bayesian Posterior
• For general parametric models and continuous priors, the posterior looks more and more like a normal distribution as n increases, centered around the MLE θ̂, with standard deviation of order 1/√n (variance of order 1/n)
A Note On Notation
• We will henceforth use w(θ) and w(θ|D) = w(θ|X^n) for prior and posterior (w stands for "weight"), and write p_θ(X^n) instead of p(X^n | θ), and p_W(X^n) for p(X^n), the marginal probability of the data.
• So Bayes' theorem becomes
  w(θ | X^n) = p_θ(X^n) w(θ) / p_W(X^n)
  …and p_W(X^n) = ∫ p_θ(X^n) w(θ) dθ
Bayesian Prediction / Predictive Estimation
• As a Bayesian you prefer to output the full posterior
• But what if you are asked to make a specific prediction for the next outcome? Then you have to come up with a single distribution after all
• Bayesian predictive distribution:
  p̄(X_{n+1} | X^n) = ∫ p_θ(X_{n+1}) w(θ | X^n) dθ
Laplace Rule of Succession
• For the Bernoulli model with uniform prior W,
  P̄(X_{n+1} = 1 | X^n) = (n_1 + 1) / (n + 2), where n_1 is the number of 1s in X^n
  …a formula first derived by Laplace, around 1800.
• We can also view these predictions as a 'Bayesian estimate' of θ…
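The rule can be verified as a ratio of Bayes marginal likelihoods: under the uniform prior the marginal of a binary sequence with n_1 ones and n_0 zeros is n_1! n_0!/(n+1)!, and the predictive probability is the marginal of (x^n, 1) divided by the marginal of x^n. A sketch of my own:

```python
from fractions import Fraction
from math import factorial

def laplace_rule(xs):
    """P(X_{n+1} = 1 | x^n) = (n_1 + 1) / (n + 2)."""
    return Fraction(sum(xs) + 1, len(xs) + 2)

def bayes_marginal(xs):
    """Marginal probability of x^n under a uniform prior on theta:
    integral of theta^{n1} (1-theta)^{n0} dtheta = n1! n0! / (n+1)!."""
    n1 = sum(xs)
    n0 = len(xs) - n1
    return Fraction(factorial(n1) * factorial(n0), factorial(len(xs) + 1))

xs = [1, 0, 1]
# Predictive probability = marginal of (x^n, 1) divided by marginal of x^n:
ratio = bayes_marginal(xs + [1]) / bayes_marginal(xs)
```

With exact fractions the two computations agree identically, not just up to rounding.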
Two Fundamentally Different Uses of Bayes
Theorem
1. A Priori Probabilities can be meaningfully estimated
(medical testing, for example!)
2. A Priori Probabilities are a wild guess (and conceivably do not exist)
• Sweden/France
• “Bayesian inference” in statistics
(…in reality it’s often ‘somewhere in the middle’)
Hypothesis Testing
via Bayes Factors
• Bayes factor testing: an alternative to Neyman-Pearson / E-based testing
• First, a very special case: H_0 and H_1 are both point (simple) hypotheses, just like the last two weeks
• Jeffreys: evidence in favor of H_1, against H_0, should be measured by the Bayes factor
  BF = p(D | H_1) / p(D | H_0)
  = likelihood ratio (but only if H_0 and H_1 are simple)
  = posterior odds if prior odds are equal
Hypothesis Testing
via Bayes Factors
• Composite case: still the Bayes factor p(D | H_1) / p(D | H_0)
• …with now p(D | H_j) = ∫ p(D | θ) p(θ | H_j) dθ given by the marginal likelihood (the probability of the data, averaged according to the prior 'within' H_j)
• Evidence in favor of H_1, against H_0, is still measured by the Bayes factor
  = marginal likelihood ratio, ≠ standard likelihood ratio
  = posterior odds if prior odds are equal
Example:
testing whether a coin is fair
• Under P_θ, data are i.i.d. Bernoulli(θ)
• Θ_0 = {1/2}, Θ_1 = [0,1] ∖ {1/2}
• Θ_0 is simple, so no need to put a prior on its elements
• Θ_1 is represented by (for example) w_1(θ), the uniform prior density on [0,1] (it puts mass 0 on 1/2, so this seems o.k.)
• Evidence against H_0 measured by the Bayes factor
  ∫ p_θ(X^n) w_1(θ) dθ / p_{1/2}(X^n)
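For this example the Bayes factor has a closed form: the marginal likelihood under the uniform prior is n_1! n_0!/(n+1)!, while p_{1/2}(x^n) = 2^{−n}. A sketch of my own illustrating the slide's formula with exact arithmetic:

```python
from fractions import Fraction
from math import factorial

def bayes_factor_fair_coin(n1, n0):
    """BF_{10} for H_0: theta = 1/2 vs H_1: uniform prior on theta."""
    n = n1 + n0
    marginal_h1 = Fraction(factorial(n1) * factorial(n0), factorial(n + 1))
    likelihood_h0 = Fraction(1, 2**n)
    return marginal_h1 / likelihood_h0

# Ten heads in a row: strong evidence against fairness.
bf_extreme = bayes_factor_fair_coin(10, 0)    # 1024/11, about 93
# Five heads, five tails: the Bayes factor favors the fair coin.
bf_balanced = bayes_factor_fair_coin(5, 5)
```

A BF above 1 favors H_1; a BF below 1 favors the fair-coin null.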
Bayes factor testing in
'non-Bayesian' notation
H_0 = {p_θ : θ ∈ Θ_0} vs H_1 = {p_θ : θ ∈ Θ_1}:
Evidence in favour of H_1 provided by the data is measured by
  p̄_1(X^n) / p̄_0(X^n), where p̄_j(X^n) = ∫ p_θ(X^n) w_j(θ) dθ
Example:
testing whether a coin is fair
• Evidence against H_0 measured by the Bayes factor
  ∫ p_θ(X^n) w_1(θ) dθ / p_{1/2}(X^n)
• …wait! Last week we saw the same formula as an e-process for testing H_0 against H_1!?
Today, Lecture 4
1. Bayesian Statistics
• Bayesian prediction, testing
2. E-Processes with simple nulls
• Simple alternative: GRO e-variable
• Composite alternative: learning in Bayesian & non-Bayesian manner
3. Bayesian vs. Neyman-Pearson vs. E-Process Testing with simple nulls
E-Processes and Betting
• Let 𝒳 = {1, …, K}.
  At each time t = 1, 2, … there are K tickets available. Ticket k pays off 1/p_0(k) if the outcome is k, and 0 otherwise.
  You may buy multiple and fractional numbers of tickets.
• You start by investing an initial capital of $1 in round 1.
• At each time t you put fraction P̄_1(X_t = k | X^{t−1}) of your money on outcome k. Then your total capital M^{(t)} gets multiplied by M_t := p̄_1(X_t | X^{t−1}) / p_0(X_t)
• After 1 outcome you either stop with end capital M_1 or continue, putting fraction P̄_1(X_2 = k | X_1) of M_1 on outcome X_2 = k ("reinvest everything"). After the 2nd outcome you stop with end capital M^{(2)} = M_1 ⋅ M_2, or you continue, and so on…
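The protocol above can be simulated directly: capital evolves as the running product of p̄_1/p_0 factors. A minimal sketch of my own, where the fair-coin null and the Laplace-rule betting strategy are my example choices:

```python
def run_bets(xs, p0, strategy):
    """Bet fraction strategy(history)[k] of current capital on outcome k;
    ticket k pays 1/p0[k], so capital multiplies by strategy(history)[x]/p0[x]."""
    capital = 1.0
    history = []
    for x in xs:
        bets = strategy(history)       # a probability distribution over outcomes
        capital *= bets[x] / p0[x]
        history.append(x)
    return capital

def laplace_strategy(history):
    """pbar_1 given by the Laplace rule of succession."""
    n1 = sum(history)
    p_one = (n1 + 1) / (len(history) + 2)
    return {1: p_one, 0: 1 - p_one}

p0 = {0: 0.5, 1: 0.5}                  # null: fair coin
capital = run_bets([1, 1, 1, 1], p0, laplace_strategy)   # grows on skewed data
```

On four heads in a row the factors are 1 · (4/3) · (3/2) · (8/5), so the capital ends at 3.2; on balanced data the same strategy loses money.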
Good Betting Strategies
• If the null is true, you do not expect to gain any money, under any stopping time, no matter what strategy p̄_1 you use
• If you think the alternative is a specific p_1, then using p̄_1 = p_1 is a good idea
  • "constant" strategy
• If you think H_0 is wrong, but you do not know which alternative is true, then… you can try to learn p_1
  • Use a p̄_1 that better and better mimics the true, or just "best", fixed p_1
Simple H_1, log-optimal betting
If null and alternative are simple, H_0 = {P_0}, H_1 = {P_1}, and X_1, X_2, … are i.i.d. according to P_1, then using p̄_1 = p_1 is a good idea. Why?
• For any choice of e-variable S_i = s(X_i), we have, with S^{(n)} = ∏_{i=1}^n s(X_i),
  (1/n) log S^{(n)} = (1/n) ∑_{i=1}^n log S_i → E_{X∼P_1}[log s(X)],  P_1-a.s.
• …hence if we measure evidence against H_0 with the same e-variable s(X_i) at each i, we would like to pick the s*(X) maximizing
  E_{X∼P_1}[log s(X)] over all e-variables s(X) for H_0:
  it leads a.s. to exponentially more money than any other e-variable!
• The argument can be extended: ∏_{i=1}^n s*(X_i) remains best even among all (non-time-constant) e-processes
Simple H_1, log-optimal betting
We aim to pick the s*(X) maximizing
  E_{X∼P_1}[log s(X)] over all e-variables s(X) for H_0
It turns out that the maximum is achieved for s*(X) = p_1(X)/p_0(X): the LR e-variable
• We say: betting according to p_1(X_i) at each X_i is log-optimal or GRO (GRO = Growth-Rate Optimal)
• We say that the LR e-variable s*(X) is log-optimal/GRO
• Note that many sub-log-optimal e-variables exist as well…
  e.g. λ + (1−λ) p_1(X)/p_0(X) for any λ ∈ [0,1], or the Neyman-Pearson e-variable
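The growth-rate comparison can be checked numerically on a small example. In this sketch (the particular P_0 and P_1 are my own choices, not from the slides), the LR e-variable achieves the largest E_{P_1}[log s(X)], while the λ-mixtures remain valid e-variables (P_0-expectation 1) with strictly smaller growth:

```python
from math import log

p0 = {0: 0.5, 1: 0.5}          # simple null
p1 = {0: 0.2, 1: 0.8}          # simple alternative

def growth_rate(s, p):
    """E_{X~P}[log s(X)] over a finite outcome space."""
    return sum(p[x] * log(s[x]) for x in p)

def expectation(s, p):
    """E_{X~P}[s(X)]; equals 1 under p0 for a valid e-variable."""
    return sum(p[x] * s[x] for x in p)

lr = {x: p1[x] / p0[x] for x in p0}                 # GRO / log-optimal e-variable
mixtures = [{x: lam + (1 - lam) * lr[x] for x in p0}
            for lam in (0.25, 0.5, 0.75)]           # sub-log-optimal e-variables
```

Here growth_rate(lr, p1) equals the KL divergence between P_1 and P_0, the best achievable exponential growth rate.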
Simple H_1, log-optimal betting
We aim to pick the s*(X) maximizing
  E_{X∼P_1}[log s(X)] over all e-variables s(X) for H_0
The maximum is achieved for s*(X) = p_1(X)/p_0(X).
Proof: homework (with a substantial hint)
Composite H_1
• If you think H_0 is wrong, but you do not know which alternative is true, then… you can try to learn p_1
• Use a p̄_1 that better and better mimics the true, or just "best", fixed p_1
Example, H_0: X_i ∼ Ber(1/2), H_1: X_i ∼ Ber(θ), θ ≠ 1/2: set
  p̄_1(X_{n+1} = 1 | x^n) := (n_1 + 1)/(n + 2), where n_1 is the number of 1s in x^n
…we use notation for conditional probabilities, but we should really think of p̄_1 as a sequential betting strategy, with the "conditional probabilities" indicating how to bet/invest in the next round, given the past data
Composite H_1
Example, H_0: X_i ∼ Ber(1/2); set
  p̄_1(X_{n+1} = 1 | x^n) := (n_1 + 1)/(n + 2), where n_1 is the number of 1s in x^n
Using telescoping-in-reverse, we find that p̄_1 also uniquely defines a marginal probability distribution for X^n, for each n, and our accumulated capital at time n is again given by the likelihood ratio:
  ∏_{i=1}^n p̄_1(X_i | X^{i−1}) / p_0(X_i) = p̄_1(X^n) / p_0(X^n) = ∫ p_θ(X^n) w(θ) dθ / p_0(X^n)
(with w the uniform prior)
Last week's "plug-in" strategy turns out to be equal to a Bayesian strategy: the Laplace Rule of Succession
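The telescoping identity can be verified exactly with rational arithmetic: the running product of Laplace-rule prediction ratios coincides with the Bayes marginal likelihood ratio under the uniform prior, on every binary sequence. A sketch of my own:

```python
from fractions import Fraction
from math import factorial
from itertools import product

def plugin_capital(xs):
    """Product over t of pbar_1(x_t | x^{t-1}) / p_0(x_t),
    with the Laplace rule and the fair-coin null p_0 = 1/2."""
    cap = Fraction(1)
    n1 = 0
    for t, x in enumerate(xs):
        pred_one = Fraction(n1 + 1, t + 2)
        pred = pred_one if x == 1 else 1 - pred_one
        cap *= pred / Fraction(1, 2)
        n1 += x
    return cap

def bayes_capital(xs):
    """Marginal likelihood under the uniform prior, divided by p_0(x^n) = 2^{-n}."""
    n1 = sum(xs)
    n = len(xs)
    marginal = Fraction(factorial(n1) * factorial(n - n1), factorial(n + 1))
    return marginal / Fraction(1, 2**n)

# The two capitals agree exactly on every binary sequence of length 6.
all_equal = all(plugin_capital(list(xs)) == bayes_capital(list(xs))
                for xs in product([0, 1], repeat=6))
```

Because the prediction for x = 0 is (n_0 + 1)/(t + 2), each factor is exactly the ratio of consecutive Beta marginals, which is what makes the product telescope.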
Composite H_1: plug-in vs. Bayes
Two general strategies for learning P_1 ∈ H_1:
• "prequential plug-in" (or simply "plug-in") vs.
• "method-of-mixture" (or, in the present simple context, simply "Bayesian")
H_1 the Bernoulli model:
• the plug-in based on the regularized MLE (n_1 + 1)/(n + 2) is precisely equal to the Bayesian strategy based on the uniform prior
Composite H_1: plug-in vs. Bayes
Two general strategies for learning P_1 ∈ H_1:
• "prequential plug-in" (or simply "plug-in") vs.
• "method-of-mixture" (or, in the present simple context, simply "Bayesian")
H_1 the Bernoulli model:
• the plug-in based on the regularized MLE (n_1 + m_1)/(n + m_1 + m_2) is precisely equal to the Bayesian strategy based on the beta prior B(m_1, m_2)
Composite H_1: plug-in vs. Bayes
H_1 the Bernoulli model:
• plug-in can be precisely equal to a Bayesian strategy
• This is highly specific to Bernoulli/multinomial models; e.g. take H_1 = {N(μ, 1) : μ ∈ ℝ}:
  • plug-in: normal density with mean (∑_{i=1}^n X_i + a)/(n + 1) and variance 1
  • Bayes with normal prior N(a, ρ): Bayes predictive distribution with the same mean but variance 1 + ρ_n > 1, with ρ_n the posterior variance of μ ("out-model": the predictive does not lie in H_1)
• Other models: differences even more substantial
General Insight for
Simple Nulls, Composite Alternatives
• If the null is simple, every Bayes factor defines an e-process: with q(X^n) = ∫ p_θ(X^n) w_1(θ) dθ the Bayes marginal under H_1,
  E_{P_0}[ q(X^n) / p_0(X^n) ] = ∫ p_0(X^n) ⋅ ( q(X^n) / p_0(X^n) ) dX^n = ∫ q(X^n) dX^n = 1
• …but there are e-processes which are not Bayes factors
  • general plug-in processes, e.g. for non-Bernoulli models
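The identity E_{P_0}[q(X^n)/p_0(X^n)] = 1 can be checked by brute force over all binary sequences of a given length, with q the Bayes marginal under the uniform prior. A sketch of my own:

```python
from fractions import Fraction
from math import factorial
from itertools import product

def q_marginal(xs):
    """Bayes marginal of x^n under a uniform prior on Bernoulli theta."""
    n1 = sum(xs)
    n = len(xs)
    return Fraction(factorial(n1) * factorial(n - n1), factorial(n + 1))

n = 8
p0_seq = Fraction(1, 2**n)   # each length-n sequence has probability 2^{-n} under Ber(1/2)

# E_{P_0}[ q(X^n) / p_0(X^n) ] summed exactly over the whole sample space:
e_value = sum(p0_seq * (q_marginal(xs) / p0_seq) for xs in product([0, 1], repeat=n))
```

The p_0 factors cancel, so the sum is just the total q-probability of the sample space, which is 1: exactly the argument on the slide, instantiated numerically.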
Today, Lecture 4
1. Bayesian Statistics
• Bayesian prediction, testing
2. E-Processes with simple nulls
• Simple alternative: GRO e-variable
• Composite alternative: learning in Bayesian & non-Bayesian manner
3. Bayesian vs. Neyman-Pearson vs. E-Process Testing with simple nulls
Similarities & Differences
Bayes Factor vs Neyman-Pearson vs E-Testing
• In Bayesian testing, the roles of H_0 and H_1 are symmetrical
• In NP and E-testing they are not
  • Type-I error control is the most important
  • May seem like a bug, but turns out to be a feature when moving to confidence intervals
• Likelihood ratios play an important role in all three theories
  • NP: via the NP Lemma
  • E: via growth-rate optimality of the likelihood ratio
  • Bayes: via the occurrence of the likelihood in Bayes' theorem
Differences
Bayes Factor vs Neyman-Pearson
• The Bayesian views (marginal) likelihood ratios as evidence in favour of either hypothesis and views the goal of testing as induction: one wants to find out which is true, H_0 or H_1, and gets statements like 'the probability that H_1 is true is close to 95%'
• The Neymanian thinks that statements like 'the probability of H_1 is…' are meaningless and that finding out which one is true is too ambitious. She is only interested in inductive behavior: not making mistakes too often if one does many hypothesis tests in one's lifetime
BF vs NP vs E
• Even though the philosophies are different, we can still try to compare the methods more closely
• As a Bayesian you can report the full posterior, but it is also fine to merely use the posterior as a tool if your goal is to make a specific decision (which, as in NP theory, can e.g. be 'accept' or 'reject')
• It then makes sense to reject the null if the Bayes posterior for H_0 is smaller than α, since then the conditional (on the data) Type-I error, i.e. the probability that H_0 is true given that you reject it, is bounded by α:
  P(H_0 is true | δ(X^n) = reject) ≤ α
The Bayesian's Conditional Type-I Error
  P(H_0 is true | δ(X^n) = reject) ≤ α
• This is intuitively correct, but it does need proof:
• P(H_0 is true | {X^n : δ(X^n) = reject})
  = E_{X^n ∼ P | {X^n : δ(X^n) = reject}} [ P(H_0 is true | X^n) ]
  ≤ E_{X^n ∼ P | {X^n : δ(X^n) = reject}} [α] = α
BF in "some sense"
less conservative than E
• With α = 0.05 = 1/20 and w(H_0) = w(H_1) = 1/2, P(H_0 | X^n) ≤ 1/20 is equivalent to Bayes factor ≥ 19
• The Bayesian would reject the null if BF ≥ 19 and would get a conditional Type-I error probability bound of 0.05
• The E-Statistician, who uses Bayesian learning for H_1, would reject the null if BF ≥ 20 and get an unconditional Type-I error probability bound of 0.05
• Conditional bounds imply unconditional ones (why?), but not vice versa.
• It seems the Bayesian gets a better bound with a less conservative rule!?!?
This is possible because the Bayesian makes much stronger assumptions:
• E-bounds hold irrespective of whether the (uniform) prior on H_1 is "correct"
• Bayesian bounds rely on the correctness of this prior.
BF usually
more conservative than NP
• With α = 0.05 = 1/20 and w(H_0) = w(H_1) = 1/2, P(H_0 | X^n) < 1/20 is equivalent to BF > 19
• Suppose H_0, H_1 are simple (so the Bayes factor = LR), α = 0.05
• NP: reject the null if LR ≥ ℓ, with ℓ such that P_{H_0}(LR ≥ ℓ) = 0.05, i.e. p ≤ 0.05
  • (in contrast to BF and E, the NP test does not depend on the actual alternative P_1 ∈ H_1 or a prior thereon; this is one advantage of it!)
How difficult is p < 0.05 as a function of n? (fair-coin test: number of 1s needed)
  n:               10    20    30    50    100   200   500
  n_1 needed:      ≥9    ≥15   ≥20   ≥32   ≥59   ≥113  ≥269
  fraction of 1s:  90%   75%   67%   64%   59%   56%   54%
How difficult is BF > 19?
  n:               10    20    30    50    100   200   500
  n_1 needed:      ≥10   ≥17   ≥24   ≥36   ≥66   ≥124  ≥289
  fraction of 1s:  100%  85%   80%   72%   66%   62%   58%
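Both tables can be reproduced with a short computation (a sketch of my own): the NP row is the smallest n_1 whose one-sided binomial tail probability under Ber(1/2) is at most 0.05, and the BF row is the smallest n_1 (at or above n/2) whose Bayes factor under the uniform prior exceeds 19. The one-sided tail is an assumption on my part; it matches the tabulated entries.

```python
from fractions import Fraction
from math import comb, factorial

def np_threshold(n, alpha=Fraction(5, 100)):
    """Smallest n1 with P(N >= n1) <= alpha under Binomial(n, 1/2), one-sided."""
    tail = Fraction(0)
    for k in range(n, -1, -1):
        tail += Fraction(comb(n, k), 2**n)
        if tail > alpha:
            return k + 1
    return 0

def bf_threshold(n, bound=19):
    """Smallest n1 >= n/2 whose Bayes factor (uniform prior) exceeds bound.
    BF(n1) = 2^n * n1! (n-n1)! / (n+1)!, symmetric around n/2."""
    for n1 in range(n // 2, n + 1):
        bf = Fraction(factorial(n1) * factorial(n - n1), factorial(n + 1)) * 2**n
        if bf > bound:
            return n1
    return None
```

Exact `Fraction` arithmetic means the comparisons against 0.05 and 19 are free of rounding issues.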
Upcoming Weeks
• Beyond Testing: Confidence Intervals
• Composite null hypotheses
• Math: exponential families, concentration inequalities