
Machine Learning B (2025)

Home Assignment 7
Benjamin Baadsager (NPR151)

Contents
1 XGBoost (30 points)

2 A simple version of Empirical Bernstein’s inequality (30 points)

3 PAC-Bayes-Unexpected-Bernstein (40 points)

1 XGBoost (30 points)
We aim to predict quasar redshift y ∈ R from ten photometric features x ∈ R10 using
an XGBoost regression model. The dataset quasars.csv contains n observations, each row
giving ten input variables and the corresponding redshift.

1.
First, we load the CSV file and separate the first ten columns as our feature matrix X
and the last column as our target vector y. We then split the data into training (80%)
and hold-out test (20%) sets with scikit-learn's train_test_split(X, y, test_size=0.2,
random_state=2025), where random_state=2025 ensures reproducibility.

2.
Next, we carve out 10% of the training data to form a validation set (again via train_test_split).
Using the remaining 90% for training, we initialize an XGBRegressor with:

• objective='reg:squarederror'

• colsample_bytree = 0.5

• learning_rate = 0.1

• max_depth = 4

• reg_lambda = 1

• n_estimators = 500

We pass our training and validation splits into the fit method via the eval_set argument
so XGBoost can record RMSE at each boosting round. After fitting, we extract the
RMSE curves from model.evals_result() and plot training vs. validation RMSE against
boosting iterations. This is shown in the figure below.

As expected, both curves drop initially, with the validation error leveling off.
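
A sketch of this training step with the hyperparameters listed above (continuing from the snippet above; depending on the xgboost version, eval_metric may need to be passed to fit rather than to the constructor):

import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Carve a 10% validation set out of the 80% training split.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.1, random_state=2025
)

model = XGBRegressor(
    objective="reg:squarederror",
    colsample_bytree=0.5,
    learning_rate=0.1,
    max_depth=4,
    reg_lambda=1,
    n_estimators=500,
    eval_metric="rmse",
)

# eval_set makes XGBoost record train/validation RMSE at every boosting round.
model.fit(X_tr, y_tr, eval_set=[(X_tr, y_tr), (X_val, y_val)], verbose=False)

history = model.evals_result()
plt.plot(history["validation_0"]["rmse"], label="training RMSE")
plt.plot(history["validation_1"]["rmse"], label="validation RMSE")
plt.xlabel("boosting round")
plt.ylabel("RMSE")
plt.legend()
plt.show()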

Finally, we use the fitted model to predict on the hold-out test set. The initial performance
is:

• Test RMSE: 0.4994

• Test R²: 0.3778

3.
To improve performance, we perform a grid search over a broad set of hyperparameters,
varying colsample_bytree, learning_rate, max_depth, reg_lambda and n_estimators.
Specifically, we search over the grid

• colsample_bytree: [0.3, 0.5, 0.7, 0.9]

• learning_rate: [0.01, 0.1, 0.05]

• max_depth: [3, 4, 5, 6]

• reg_lambda: [0.1, 1, 2]

• n_estimators: [200, 300, 500, 700, 900]

We use GridSearchCV with 3-fold cross-validation on the 80% training data, optimizing
for RMSE by specifying scoring='neg_root_mean_squared_error'. This returns the best
combination:
• colsample_bytree: 0.7
• learning_rate: 0.01
• max_depth: 6
• reg_lambda: 2
• n_estimators: 500
With these optimal hyperparameters in hand, we refit an XGBoost model on the full 80%
training set and evaluate on the 20% hold-out test set. As a simple baseline, we also train
a 5-NN regressor (KNeighborsRegressor(n_neighbors=5)). The test results are:

Model                 Test RMSE   Test R²

Grid-Search XGBoost   0.4754      0.4361
5-NN Baseline         0.4778      0.4306

We see that the tuned XGBoost model slightly outperforms the 5-NN baseline in both
RMSE and R².
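
A sketch of the tuning and baseline comparison, reusing the 80/20 split from the snippets above and the grid listed earlier:

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from xgboost import XGBRegressor

param_grid = {
    "colsample_bytree": [0.3, 0.5, 0.7, 0.9],
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [3, 4, 5, 6],
    "reg_lambda": [0.1, 1, 2],
    "n_estimators": [200, 300, 500, 700, 900],
}

# 3-fold CV on the training split, scored by (negative) RMSE.
search = GridSearchCV(
    XGBRegressor(objective="reg:squarederror"),
    param_grid,
    cv=3,
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)

# best_estimator_ is refit on the full training split by default (refit=True).
best = search.best_estimator_
print("XGBoost  RMSE:", np.sqrt(mean_squared_error(y_test, best.predict(X_test))),
      "R²:", r2_score(y_test, best.predict(X_test)))

# 5-NN baseline for comparison.
knn = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train)
print("5-NN     RMSE:", np.sqrt(mean_squared_error(y_test, knn.predict(X_test))),
      "R²:", r2_score(y_test, knn.predict(X_test)))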

2 A simple version of Empirical Bernstein’s inequality (30 points)
1. Let X and X′ be two independent identically distributed random variables with mean
µ = E[X] and variance Var(X) = σ². We wish to prove that

\[ \mathbb{E}\big[(X - X')^2\big] = 2\,\mathrm{Var}(X). \]

Observe first that

\[ (X - X')^2 = X^2 + X'^2 - 2XX'. \]

Taking expectations on both sides yields

\[ \mathbb{E}\big[(X - X')^2\big]
   = \mathbb{E}[X^2] + \mathbb{E}[X'^2] - 2\,\mathbb{E}[XX']
   = \mathbb{E}[X^2] + \mathbb{E}[X'^2] - 2\,\mathbb{E}[X]\,\mathbb{E}[X']
   = 2\,\mathbb{E}[X^2] - 2\mu^2, \]

where we used that X′ is an independent copy of X, so that E[XX′] = E[X]E[X′], E[X′] = µ
and E[X′²] = E[X²].

Finally, since Var(X) = E[X²] − µ², we get

\[ 2\,\mathbb{E}[X^2] - 2\mu^2 = 2\big(\mathbb{E}[X^2] - \mu^2\big) = 2\,\mathrm{Var}(X), \]

and hence

\[ \mathbb{E}\big[(X - X')^2\big] = 2\,\mathrm{Var}(X), \]

as required.
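
A quick numerical sanity check of this identity (a sketch; the Beta distribution and sample size are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)
# Two independent samples from the same distribution.
x = rng.beta(2.0, 5.0, size=1_000_000)
x_prime = rng.beta(2.0, 5.0, size=1_000_000)

print(np.mean((x - x_prime) ** 2))  # ≈ 2 * Var(X)
print(2 * np.var(x))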

2. Let X1, X2, . . . , Xn be i.i.d. random variables taking values in [0, 1], and assume n is
even. Define

\[ \hat\nu_n = \frac{1}{n}\sum_{i=1}^{n/2} \big( X_{2i} - X_{2i-1} \big)^2 \qquad\text{and}\qquad \nu = \mathrm{Var}(X_1). \]

We then wish to prove that

\[ \mathbb{P}\!\left( \nu \ge \hat\nu_n + \sqrt{\frac{\ln\frac{1}{\delta}}{n}} \right) \le \delta. \]

We let

\[ Z_i := \big( X_{2i} - X_{2i-1} \big)^2, \qquad i = 1, \dots, \tfrac{n}{2}. \]

Since each Xj ∈ [0, 1], we have Zi ∈ [0, 1]. Independence of the Xj implies that the Zi are
independent as well. Moreover, by part 1,

\[ \mathbb{E}[Z_i] = \mathbb{E}\big[ ( X_{2i} - X_{2i-1} )^2 \big] = 2\,\mathrm{Var}(X_1) = 2\nu. \]

Next we observe that

\[ \frac{2}{n}\sum_{i=1}^{n/2} Z_i = 2\hat\nu_n
   \qquad\text{and}\qquad
   \mathbb{E}\!\left[ \frac{2}{n}\sum_{i=1}^{n/2} Z_i \right] = 2\nu. \]

Next we define

\[ \varepsilon = \sqrt{\frac{\ln\frac{1}{\delta}}{n}} \]

and observe the equivalences

\[ \nu \ge \hat\nu_n + \varepsilon
   \;\Longleftrightarrow\; 2\nu - 2\hat\nu_n \ge 2\varepsilon
   \;\Longleftrightarrow\; \mathbb{E}\!\left[ \frac{2}{n}\sum_{i=1}^{n/2} Z_i \right] - \frac{2}{n}\sum_{i=1}^{n/2} Z_i \ge 2\varepsilon. \]

Now apply Hoeffding's inequality to the n/2 independent [0, 1]-valued Zi:

\[ \mathbb{P}\!\left( \mathbb{E}\!\left[ \frac{2}{n}\sum_{i=1}^{n/2} Z_i \right] - \frac{2}{n}\sum_{i=1}^{n/2} Z_i \ge 2\varepsilon \right)
   \le \exp\!\left( -2\,\frac{n}{2}\,(2\varepsilon)^2 \right) = \exp\!\left( -4n\varepsilon^2 \right). \]

Inserting the expression for ε, we obtain

\[ \exp\!\left( -4n \left( \sqrt{\frac{\ln\frac{1}{\delta}}{n}} \right)^{\!2} \right)
   = \exp\!\left( -4\ln\tfrac{1}{\delta} \right) = \delta^4. \]

Since δ ∈ (0, 1), we have δ⁴ ≤ δ, and therefore

\[ \mathbb{P}\!\left( 2\nu - 2\hat\nu_n \ge 2\sqrt{\frac{\ln\frac{1}{\delta}}{n}} \right) \le \delta. \]

Thus

\[ \mathbb{P}\!\left( \nu - \hat\nu_n \ge \sqrt{\frac{\ln\frac{1}{\delta}}{n}} \right) \le \delta, \]

and hence we get

\[ \mathbb{P}\!\left( \nu \ge \hat\nu_n + \sqrt{\frac{\ln\frac{1}{\delta}}{n}} \right) \le \delta, \]

as required.
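
A small simulation illustrating the bound of part 2 (a sketch; the Bernoulli(0.3) distribution, n = 100, δ = 0.05 and the number of repetitions are arbitrary choices):

import numpy as np

rng = np.random.default_rng(1)
n, delta, trials = 100, 0.05, 20_000

# i.i.d. samples in [0, 1]; the true variance of Bernoulli(0.3) is 0.3 * 0.7.
X = rng.binomial(1, 0.3, size=(trials, n)).astype(float)
nu = 0.3 * 0.7

# Empirical variance estimate from part 2: sum of n/2 squared pair differences, divided by n.
nu_hat = np.sum((X[:, 1::2] - X[:, 0::2]) ** 2, axis=1) / n

eps = np.sqrt(np.log(1 / delta) / n)
print("empirical failure probability:", np.mean(nu >= nu_hat + eps), "  delta:", delta)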

3. Let X1, . . . , Xn, n, ν and ν̂n be as before, and let µ = E[X1]. We now wish to prove that

\[ \mathbb{P}\!\left( \mu \ge \frac{1}{n}\sum_{i=1}^{n} X_i
   + \sqrt{\frac{2\hat\nu_n \ln\frac{2}{\delta}}{n}}
   + \sqrt{2}\left(\frac{\ln\frac{2}{\delta}}{n}\right)^{\!3/4}
   + \frac{\ln\frac{2}{\delta}}{3n} \right) \le \delta. \]

We start by fixing δ ∈ (0, 1) and set

\[ \varepsilon = \sqrt{\frac{\ln\frac{2}{\delta}}{n}}. \]

Next define the event

\[ B = \{ \nu \le \hat\nu_n + \varepsilon \}. \]

By part 2 (applied with confidence parameter δ/2),

\[ \mathbb{P}(\bar B) = \mathbb{P}\!\left( \nu > \hat\nu_n + \sqrt{\frac{\ln\frac{2}{\delta}}{n}} \right) \le \frac{\delta}{2}. \]

On B we have

\[ \sqrt{\frac{2\nu \ln\frac{2}{\delta}}{n}}
   \le \sqrt{\frac{2(\hat\nu_n + \varepsilon)\ln\frac{2}{\delta}}{n}}
   = \sqrt{a + b} \le \sqrt{a} + \sqrt{b}, \]

with

\[ a = \frac{2\hat\nu_n \ln\frac{2}{\delta}}{n}, \qquad b = \frac{2\varepsilon \ln\frac{2}{\delta}}{n}. \]

Since ε = √(ln(2/δ)/n), we obtain

\[ \sqrt{b} = \sqrt{\frac{2\varepsilon \ln\frac{2}{\delta}}{n}}
   = \sqrt{2}\left(\frac{\ln\frac{2}{\delta}}{n}\right)^{\!3/4}. \]

Hence on B,

\[ \sqrt{\frac{2\nu \ln\frac{2}{\delta}}{n}}
   \le \sqrt{\frac{2\hat\nu_n \ln\frac{2}{\delta}}{n}}
   + \sqrt{2}\left(\frac{\ln\frac{2}{\delta}}{n}\right)^{\!3/4}. \]

Now let

\[ A = \left\{ \mu \ge \frac{1}{n}\sum_{i=1}^{n} X_i
   + \sqrt{\frac{2\hat\nu_n \ln\frac{2}{\delta}}{n}}
   + \sqrt{2}\left(\frac{\ln\frac{2}{\delta}}{n}\right)^{\!3/4}
   + \frac{\ln\frac{2}{\delta}}{3n} \right\}. \]

On B, the bound above shows

\[ A \cap B \subset \left\{ \mu \ge \frac{1}{n}\sum_{i=1}^{n} X_i
   + \sqrt{\frac{2\nu \ln\frac{2}{\delta}}{n}}
   + \frac{\ln\frac{2}{\delta}}{3n} \right\}. \]

But by Bernstein's inequality (applied with variance ν and confidence parameter δ/2),

\[ \mathbb{P}\!\left( \mu \ge \frac{1}{n}\sum_{i=1}^{n} X_i
   + \sqrt{\frac{2\nu \ln\frac{2}{\delta}}{n}}
   + \frac{\ln\frac{2}{\delta}}{3n} \right) \le \frac{\delta}{2}. \]

Hence P(A ∩ B) ≤ δ/2, and thus

\[ \mathbb{P}(A) \le \mathbb{P}(A \cap B) + \mathbb{P}(\bar B) \le \frac{\delta}{2} + \frac{\delta}{2} = \delta, \]

which gives us the desired result:

\[ \mathbb{P}\!\left( \mu \ge \frac{1}{n}\sum_{i=1}^{n} X_i
   + \sqrt{\frac{2\hat\nu_n \ln\frac{2}{\delta}}{n}}
   + \sqrt{2}\left(\frac{\ln\frac{2}{\delta}}{n}\right)^{\!3/4}
   + \frac{\ln\frac{2}{\delta}}{3n} \right) \le \delta. \]

3 PAC-Bayes-Unexpected-Bernstein (40 points)
1.

Let Z ≤ 1 be a random variable. We wish to show that for any λ ∈ [0, 1/2]:

\[ \mathbb{E}\big[ e^{-\lambda Z - \lambda^2 Z^2} \big] \le e^{-\lambda \mathbb{E}[Z]}. \]

We start by letting z = −λZ. Since 0 ≤ λ ≤ 1/2 and Z ≤ 1, we have

\[ -\lambda Z \ge -\lambda \ge -\tfrac{1}{2}, \]

so the hinted inequality

\[ z - z^2 \le \ln(1 + z) \qquad (\forall z \ge -\tfrac{1}{2}) \]

applies. Noting that

\[ z - z^2 = -\lambda Z - \lambda^2 Z^2, \]

we obtain

\[ -\lambda Z - \lambda^2 Z^2 \le \ln(1 - \lambda Z). \]

Exponentiating both sides gives

\[ e^{-\lambda Z - \lambda^2 Z^2} \le 1 - \lambda Z. \]

Taking expectations,

\[ \mathbb{E}\big[ e^{-\lambda Z - \lambda^2 Z^2} \big] \le \mathbb{E}[1 - \lambda Z] = 1 - \lambda \mathbb{E}[Z]. \]

Finally, since 1 + x ≤ e^x for all x, with x = −λE[Z] we get

\[ 1 - \lambda \mathbb{E}[Z] \le e^{-\lambda \mathbb{E}[Z]}. \]

Combining these,

\[ \mathbb{E}\big[ e^{-\lambda Z - \lambda^2 Z^2} \big] \le 1 - \lambda \mathbb{E}[Z] \le e^{-\lambda \mathbb{E}[Z]}, \]

as required.

Here the assumptions Z ≤ 1 and λ ≤ 1/2 ensure −λZ ≥ −λ ≥ −1/2, so we may use z − z² ≤ ln(1 + z).
The assumption λ ∈ [0, 1/2] together with Z ≤ 1 also guarantees 1 − λZ > 0, making ln(1 − λZ) well-defined.

2.

In this exercise we aim to prove that for Z ≤ 1 and λ ∈ [0, 1/2] we have

\[ \mathbb{E}\big[ e^{\lambda(\mathbb{E}[Z] - Z) - \lambda^2 Z^2} \big] \le 1. \]

We start by rewriting

\[ e^{\lambda(\mathbb{E}[Z] - Z) - \lambda^2 Z^2} = e^{\lambda \mathbb{E}[Z]} \, e^{-\lambda Z - \lambda^2 Z^2}. \]

Taking expectations (e^{λE[Z]} is a constant) gives

\[ \mathbb{E}\big[ e^{\lambda(\mathbb{E}[Z] - Z) - \lambda^2 Z^2} \big]
   = e^{\lambda \mathbb{E}[Z]} \, \mathbb{E}\big[ e^{-\lambda Z - \lambda^2 Z^2} \big]. \]

By part 1, for Z ≤ 1 and λ ∈ [0, 1/2] we have

\[ \mathbb{E}\big[ e^{-\lambda Z - \lambda^2 Z^2} \big] \le e^{-\lambda \mathbb{E}[Z]}. \]

Hence

\[ \mathbb{E}\big[ e^{\lambda(\mathbb{E}[Z] - Z) - \lambda^2 Z^2} \big]
   \le e^{\lambda \mathbb{E}[Z]} \cdot e^{-\lambda \mathbb{E}[Z]} = 1, \]

as claimed.

3.

Let Z1, . . . , Zn be independent random variables with Zi ≤ 1 for all i, and fix λ ∈ [0, 1/2]. We then
aim to show

\[ \mathbb{E}\!\left[ \exp\!\left( \lambda \sum_{i=1}^{n} \big( \mathbb{E}[Z_i] - Z_i \big) - \lambda^2 \sum_{i=1}^{n} Z_i^2 \right) \right] \le 1. \]

First write

\[ S = \lambda \sum_{i=1}^{n} \big( \mathbb{E}[Z_i] - Z_i \big) - \lambda^2 \sum_{i=1}^{n} Z_i^2
     = \sum_{i=1}^{n} \Big[ \lambda \big( \mathbb{E}[Z_i] - Z_i \big) - \lambda^2 Z_i^2 \Big]. \]

Then

\[ \exp(S) = \prod_{i=1}^{n} \exp\!\Big( \lambda \big( \mathbb{E}[Z_i] - Z_i \big) - \lambda^2 Z_i^2 \Big). \]

By independence,

\[ \mathbb{E}\big[ \exp(S) \big] = \prod_{i=1}^{n} \mathbb{E}\Big[ \exp\!\Big( \lambda \big( \mathbb{E}[Z_i] - Z_i \big) - \lambda^2 Z_i^2 \Big) \Big]. \]

But by part 2 each factor satisfies

\[ \mathbb{E}\Big[ \exp\!\Big( \lambda \big( \mathbb{E}[Z_i] - Z_i \big) - \lambda^2 Z_i^2 \Big) \Big] \le 1. \]

Hence

\[ \prod_{i=1}^{n} \mathbb{E}\Big[ \exp\!\Big( \lambda \big( \mathbb{E}[Z_i] - Z_i \big) - \lambda^2 Z_i^2 \Big) \Big] \le \prod_{i=1}^{n} 1 = 1. \]

Thus we get the desired result:

\[ \mathbb{E}\!\left[ \exp\!\left( \lambda \sum_{i=1}^{n} \big( \mathbb{E}[Z_i] - Z_i \big) - \lambda^2 \sum_{i=1}^{n} Z_i^2 \right) \right] \le 1. \]

4.

Let Z1, . . . , Zn be independent random variables upper bounded by 1 and fix λ ∈ (0, 1/2].
We wish to show:

\[ \mathbb{P}\!\left( \mathbb{E}\!\left[ \frac{1}{n}\sum_{i=1}^{n} Z_i \right]
   \ge \frac{1}{n}\sum_{i=1}^{n} Z_i + \frac{\lambda}{n}\sum_{i=1}^{n} Z_i^2 + \frac{\ln\frac{1}{\delta}}{\lambda n} \right) \le \delta. \]

From part 3 we have the following:

\[ \mathbb{E}\!\left[ \exp\!\left( \lambda \sum_{i=1}^{n} \big( \mathbb{E}[Z_i] - Z_i \big) - \lambda^2 \sum_{i=1}^{n} Z_i^2 \right) \right] \le 1. \tag{*} \]

Define the event

\[ A = \left\{ \sum_{i=1}^{n} \mathbb{E}[Z_i] \ge \sum_{i=1}^{n} Z_i + \lambda \sum_{i=1}^{n} Z_i^2 + \frac{1}{\lambda}\ln\frac{1}{\delta} \right\}. \]

On A we then have

\[ \sum_{i=1}^{n} \mathbb{E}[Z_i] - \sum_{i=1}^{n} Z_i \ge \lambda \sum_{i=1}^{n} Z_i^2 + \frac{1}{\lambda}\ln\frac{1}{\delta}. \]

Multiplying by λ > 0 and exponentiating (since e^x is increasing) gives, on A,

\[ \exp\!\left( \lambda \sum_{i=1}^{n} \big( \mathbb{E}[Z_i] - Z_i \big) - \lambda^2 \sum_{i=1}^{n} Z_i^2 \right)
   \ge \exp\!\left( \ln\frac{1}{\delta} \right) = \frac{1}{\delta}. \]

Now let

\[ X = \lambda \sum_{i=1}^{n} \big( \mathbb{E}[Z_i] - Z_i \big) - \lambda^2 \sum_{i=1}^{n} Z_i^2. \]

We then apply Markov's inequality:

\[ \mathbb{P}(A) \le \mathbb{P}\!\left( \exp(X) \ge \frac{1}{\delta} \right) \le \delta\,\mathbb{E}\big[ \exp(X) \big]. \]

By (*) we have E[exp(X)] ≤ 1, so

\[ \mathbb{P}\!\left( \sum_{i=1}^{n} \mathbb{E}[Z_i] \ge \sum_{i=1}^{n} Z_i + \lambda \sum_{i=1}^{n} Z_i^2 + \frac{1}{\lambda}\ln\frac{1}{\delta} \right) \le \delta. \]

Dividing all sums by n yields exactly the stated inequality:

\[ \mathbb{P}\!\left( \mathbb{E}\!\left[ \frac{1}{n}\sum_{i=1}^{n} Z_i \right]
   \ge \frac{1}{n}\sum_{i=1}^{n} Z_i + \frac{\lambda}{n}\sum_{i=1}^{n} Z_i^2 + \frac{\ln\frac{1}{\delta}}{\lambda n} \right) \le \delta. \]

5.

Let Λ = {λ1, . . . , λk} be a grid of k values such that λi ∈ (0, 1/2] for all i. We wish to prove that

\[ \mathbb{P}\!\left( \mathbb{E}\!\left[ \frac{1}{n}\sum_{i=1}^{n} Z_i \right]
   \ge \frac{1}{n}\sum_{i=1}^{n} Z_i + \min_{\lambda \in \Lambda}\left( \frac{\lambda}{n}\sum_{i=1}^{n} Z_i^2 + \frac{\ln(k/\delta)}{\lambda n} \right) \right) \le \delta. \]

For each fixed λi ∈ Λ, part 4 gives

\[ \mathbb{P}\!\left( \mathbb{E}\!\left[ \frac{1}{n}\sum_{i=1}^{n} Z_i \right]
   \ge \frac{1}{n}\sum_{i=1}^{n} Z_i + \frac{\lambda_i}{n}\sum_{i=1}^{n} Z_i^2 + \frac{\ln\frac{1}{\delta'}}{\lambda_i n} \right) \le \delta' \]

for any δ′ > 0. Now choose δ′ = δ/k. Then for each λi ∈ Λ,

\[ \mathbb{P}\!\left( \mathbb{E}\!\left[ \frac{1}{n}\sum_{i=1}^{n} Z_i \right]
   \ge \frac{1}{n}\sum_{i=1}^{n} Z_i + \frac{\lambda_i}{n}\sum_{i=1}^{n} Z_i^2 + \frac{\ln\frac{k}{\delta}}{\lambda_i n} \right) \le \frac{\delta}{k}. \]

By the union bound over the k grid points, the probability that any λi fails its bound is at most

\[ \mathbb{P}\!\left( \exists \lambda \in \Lambda :\;
   \mathbb{E}\!\left[ \frac{1}{n}\sum_{i=1}^{n} Z_i \right]
   \ge \frac{1}{n}\sum_{i=1}^{n} Z_i + \frac{\lambda}{n}\sum_{i=1}^{n} Z_i^2 + \frac{\ln(k/\delta)}{\lambda n} \right)
   \le \sum_{i=1}^{k} \frac{\delta}{k} = \delta. \]

Equivalently, with probability at least 1 − δ we have

\[ \mathbb{E}\!\left[ \frac{1}{n}\sum_{i=1}^{n} Z_i \right]
   \le \frac{1}{n}\sum_{i=1}^{n} Z_i + \frac{\lambda}{n}\sum_{i=1}^{n} Z_i^2 + \frac{\ln(k/\delta)}{\lambda n}
   \qquad \text{for all } \lambda \in \Lambda. \]

Since this holds for all λ ∈ Λ, it also holds for the λ giving the smallest right-hand side, so

\[ \mathbb{E}\!\left[ \frac{1}{n}\sum_{i=1}^{n} Z_i \right]
   \le \frac{1}{n}\sum_{i=1}^{n} Z_i + \min_{\lambda \in \Lambda}\left( \frac{\lambda}{n}\sum_{i=1}^{n} Z_i^2 + \frac{\ln(k/\delta)}{\lambda n} \right), \]

and hence we end up with the desired result:

\[ \mathbb{P}\!\left( \mathbb{E}\!\left[ \frac{1}{n}\sum_{i=1}^{n} Z_i \right]
   \ge \frac{1}{n}\sum_{i=1}^{n} Z_i + \min_{\lambda \in \Lambda}\left( \frac{\lambda}{n}\sum_{i=1}^{n} Z_i^2 + \frac{\ln(k/\delta)}{\lambda n} \right) \right) \le \delta. \]

6.

We now wish to compare two high-probability upper bounds on p − p̂n when sampling
n = 100 i.i.d. draws from the three-point distribution

\[ \mathbb{P}(Z = 0) = \mathbb{P}(Z = 1) = \frac{1 - p_{1/2}}{2}, \qquad \mathbb{P}\big(Z = \tfrac{1}{2}\big) = p_{1/2}, \]

so that the true mean is p = 1/2 for all p_{1/2} ∈ [0, 1]. For each p_{1/2} on a fine grid, we simulate
Z1, . . . , Z100 and compute

\[ \hat p_n = \frac{1}{n}\sum_{i=1}^{n} Z_i, \qquad \hat v_n = \frac{1}{n}\sum_{i=1}^{n} Z_i^2, \]

and plot:

• kl bound:

\[ p - \hat p_n \le \mathrm{kl}^{-1,+}\!\left( \hat p_n, \frac{\ln\frac{1}{\delta}}{n} \right) - \hat p_n \]

• Unexpected-Bernstein bound: Choose

\[ k = \log_2\!\left( \sqrt{\frac{n}{\ln(1/\delta)}} \,\Big/\, 2 \right),
   \qquad \Lambda = \left\{ \tfrac{1}{2}, \tfrac{1}{2^2}, \dots, \tfrac{1}{2^k} \right\}. \]

Then with probability at least 1 − δ,

\[ p - \hat p_n \le \min_{\lambda \in \Lambda}\left( \lambda \hat v_n + \frac{\ln(k/\delta)}{\lambda n} \right). \]

The figure below shows the two bounds as a function of p_{1/2} with a fixed confidence
parameter δ = 0.05.

The kl bound is tight at p_{1/2} = 0 (the Bernoulli case), but as p_{1/2} grows and the variance shrinks, the
Unexpected-Bernstein bound performs better than the kl bound.
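
A sketch of the simulation behind this comparison (assuming δ = 0.05 and n = 100 as above; kl_upper_inverse is a helper introduced here that computes the upper inverse of the binary kl divergence by bisection):

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2025)
n, delta = 100, 0.05


def kl(p, q):
    # Binary KL divergence kl(p || q), with clipping for numerical safety.
    eps = 1e-12
    p, q = min(max(p, eps), 1 - eps), min(max(q, eps), 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))


def kl_upper_inverse(p_hat, eps):
    # Largest q >= p_hat with kl(p_hat || q) <= eps, found by bisection.
    lo, hi = p_hat, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if kl(p_hat, mid) <= eps:
            lo = mid
        else:
            hi = mid
    return lo


# Grid of lambda values for the Unexpected-Bernstein bound (choice from above).
k = int(np.log2(np.sqrt(n / np.log(1 / delta)) / 2))
Lambda = [2.0 ** -(i + 1) for i in range(k)]

p_half_grid = np.linspace(0.0, 1.0, 51)
kl_bounds, ub_bounds = [], []
for p_half in p_half_grid:
    # Sample n draws from the three-point distribution on {0, 1/2, 1}.
    Z = rng.choice([0.0, 0.5, 1.0], size=n,
                   p=[(1 - p_half) / 2, p_half, (1 - p_half) / 2])
    p_hat, v_hat = Z.mean(), np.mean(Z ** 2)
    kl_bounds.append(kl_upper_inverse(p_hat, np.log(1 / delta) / n) - p_hat)
    ub_bounds.append(min(lam * v_hat + np.log(k / delta) / (lam * n) for lam in Lambda))

plt.plot(p_half_grid, kl_bounds, label="kl bound")
plt.plot(p_half_grid, ub_bounds, label="Unexpected-Bernstein bound")
plt.xlabel("p_1/2")
plt.ylabel("bound on p - p_hat_n")
plt.legend()
plt.show()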

7.

Define Zi = ℓ(h(Xi), Yi), so that each Zi ∈ [0, 1]. Then

\[ L(h) = \mathbb{E}\big[ \ell(h(X), Y) \big] = \mathbb{E}[Z_i], \qquad
   \hat L(h, S) = \frac{1}{n}\sum_{i=1}^{n} Z_i, \qquad
   \hat V(h, S) = \frac{1}{n}\sum_{i=1}^{n} Z_i^2. \]

Our goal is then to show that for any λ ∈ [0, 1/2]:

\[ \mathbb{E}\Big[ \exp\!\Big( n\big( \lambda ( L(h) - \hat L(h, S) ) - \lambda^2 \hat V(h, S) \big) \Big) \Big] \le 1. \]

Start by observing that

\[ L(h) - \hat L(h, S) = \frac{1}{n}\sum_{i=1}^{n} \big( \mathbb{E}[Z_i] - Z_i \big)
   \qquad\text{and}\qquad
   \hat V(h, S) = \frac{1}{n}\sum_{i=1}^{n} Z_i^2. \]

Hence

\[ n\big( \lambda ( L(h) - \hat L(h, S) ) - \lambda^2 \hat V(h, S) \big)
   = n\left( \lambda\,\frac{1}{n}\sum_{i=1}^{n} \big( \mathbb{E}[Z_i] - Z_i \big) - \lambda^2\,\frac{1}{n}\sum_{i=1}^{n} Z_i^2 \right)
   = \lambda \sum_{i=1}^{n} \big( \mathbb{E}[Z_i] - Z_i \big) - \lambda^2 \sum_{i=1}^{n} Z_i^2. \]

Therefore

\[ \exp\!\Big( n\big( \lambda ( L(h) - \hat L(h, S) ) - \lambda^2 \hat V(h, S) \big) \Big)
   = \exp\!\left( \lambda \sum_{i=1}^{n} \big( \mathbb{E}[Z_i] - Z_i \big) - \lambda^2 \sum_{i=1}^{n} Z_i^2 \right). \]

Next, using the independence of the Zi ≤ 1 together with part 3 gives, for any λ ∈ [0, 1/2],

\[ \mathbb{E}\!\left[ \exp\!\left( \lambda \sum_{i=1}^{n} \big( \mathbb{E}[Z_i] - Z_i \big) - \lambda^2 \sum_{i=1}^{n} Z_i^2 \right) \right] \le 1. \]

Thus we conclude that for any λ ∈ [0, 1/2] we have

\[ \mathbb{E}\Big[ \exp\!\Big( n\big( \lambda ( L(h) - \hat L(h, S) ) - \lambda^2 \hat V(h, S) \big) \Big) \Big] \le 1, \]

as required.

8.

As the hint suggests, we define the function

\[ f(h, S) := n\big( \lambda ( L(h) - \hat L(h, S) ) - \lambda^2 \hat V(h, S) \big), \]

with λ ∈ (0, 1/2] and

\[ \hat V(h, S) = \frac{1}{n}\sum_{i=1}^{n} \ell\big( h(X_i), Y_i \big)^2. \]

From part 7 we know that, for each fixed h,

\[ \mathbb{E}_S\big[ \exp( f(h, S) ) \big] \le 1. \]

Hence also

\[ \mathbb{E}_{h \sim \pi}\big[ \mathbb{E}_S\big[ \exp( f(h, S) ) \big] \big] \le 1 \]

for any prior π independent of S.

Now apply the PAC-Bayes lemma (Lemma 3.28):

\[ \mathbb{P}\!\left( \exists \rho :\; \mathbb{E}_{h \sim \rho}\big[ f(h, S) \big]
   \ge \mathrm{KL}(\rho \,\|\, \pi) + \ln\!\left( \frac{\mathbb{E}_{h \sim \pi}\big[ \mathbb{E}_S\big[ e^{f(h, S)} \big] \big]}{\delta} \right) \right) \le \delta. \]

Since we have already shown that E_{h∼π}[E_S[exp(f(h, S))]] ≤ 1, we get

\[ \mathbb{P}\!\left( \exists \rho :\; \mathbb{E}_\rho\big[ f(h, S) \big]
   \ge \mathrm{KL}(\rho \,\|\, \pi) + \ln\frac{1}{\delta} \right) \le \delta. \]

Substituting back f and rearranging gives

\[ \mathbb{P}\!\left( \exists \rho :\; \mathbb{E}_\rho\Big[ n\big( \lambda ( L(h) - \hat L(h, S) ) - \lambda^2 \hat V(h, S) \big) \Big]
   \ge \mathrm{KL}(\rho \,\|\, \pi) + \ln\frac{1}{\delta} \right) \le \delta. \]

Thus, with probability at least 1 − δ, for all ρ

\[ \mathbb{E}_\rho\Big[ n\big( \lambda ( L(h) - \hat L(h, S) ) - \lambda^2 \hat V(h, S) \big) \Big]
   \le \mathrm{KL}(\rho \,\|\, \pi) + \ln\frac{1}{\delta}, \]

which can be written as

\[ n\lambda\Big( \mathbb{E}_\rho[ L(h) ] - \mathbb{E}_\rho[ \hat L(h, S) ] \Big)
   \le n\lambda^2\, \mathbb{E}_\rho[ \hat V(h, S) ] + \mathrm{KL}(\rho \,\|\, \pi) + \ln\frac{1}{\delta}. \]

Dividing by nλ > 0 yields, with probability at least 1 − δ,

\[ \mathbb{E}_\rho[ L(h) ] \le \mathbb{E}_\rho[ \hat L(h, S) ] + \lambda\, \mathbb{E}_\rho[ \hat V(h, S) ]
   + \frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln\frac{1}{\delta}}{n\lambda}. \]

Thus we get the desired result:

\[ \mathbb{P}\!\left( \exists \rho :\; \mathbb{E}_\rho[ L(h) ] \ge \mathbb{E}_\rho[ \hat L(h, S) ] + \lambda\, \mathbb{E}_\rho[ \hat V(h, S) ]
   + \frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln\frac{1}{\delta}}{n\lambda} \right) \le \delta. \]

9.

Let S be an i.i.d. sample, ℓ ≤ 1 a loss function, and π any prior on H. Define

\[ \hat L(h, S) = \frac{1}{n}\sum_{i=1}^{n} \ell\big( h(X_i), Y_i \big), \qquad
   \hat V(h, S) = \frac{1}{n}\sum_{i=1}^{n} \ell\big( h(X_i), Y_i \big)^2, \]

and denote the true loss L(h) = E[ℓ(h(X), Y)].

From part 8 we have: for every λ ∈ (0, 1/2],

\[ \mathbb{P}\!\left( \exists \rho :\; \mathbb{E}_\rho[ L(h) ] \ge \mathbb{E}_\rho[ \hat L(h, S) ]
   + \lambda\, \mathbb{E}_\rho[ \hat V(h, S) ]
   + \frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln\frac{1}{\delta}}{n\lambda} \right) \le \delta. \]

Now let Λ = {λ1, λ2, . . . , λk} ⊂ (0, 1/2]. For each λi ∈ Λ, replace δ by δ/k in the bound above. Then

\[ \mathbb{P}\!\left( \exists \rho :\; \mathbb{E}_\rho[ L(h) ] \ge \mathbb{E}_\rho[ \hat L(h, S) ]
   + \lambda_i\, \mathbb{E}_\rho[ \hat V(h, S) ]
   + \frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln\frac{k}{\delta}}{n\lambda_i} \right) \le \frac{\delta}{k}. \]

A union bound over i = 1, . . . , k shows that with probability at least 1 − δ, all k inequalities
hold simultaneously. Hence for every posterior ρ and every λi ∈ Λ we get

\[ \mathbb{E}_\rho[ L(h) ] \le \mathbb{E}_\rho[ \hat L(h, S) ]
   + \lambda_i\, \mathbb{E}_\rho[ \hat V(h, S) ]
   + \frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln\frac{k}{\delta}}{n\lambda_i}. \]

Taking the minimum over i then yields, for every ρ, with probability at least 1 − δ:

\[ \mathbb{E}_\rho[ L(h) ] \le \mathbb{E}_\rho[ \hat L(h, S) ]
   + \min_{\lambda \in \Lambda}\left\{ \lambda\, \mathbb{E}_\rho[ \hat V(h, S) ]
   + \frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln\frac{k}{\delta}}{n\lambda} \right\}, \]

which can be written as

\[ \mathbb{P}\!\left( \exists \rho :\; \mathbb{E}_\rho[ L(h) ] \ge \mathbb{E}_\rho[ \hat L(h, S) ]
   + \min_{\lambda \in \Lambda}\left\{ \lambda\, \mathbb{E}_\rho[ \hat V(h, S) ]
   + \frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln\frac{k}{\delta}}{n\lambda} \right\} \right) \le \delta. \]

This is exactly the desired statement.
