Supervised Learning (COMP0078)
6. Learning Theory (Part I)
Carlo Ciliberto
University College London
Department of Computer Science
1
Previous Classes
In the previous classes we:
• Formulated the supervised learning problem.
• Focused on Empirical Risk Minimization (ERM).
• Discussed overfitting and how to tackle it (heuristically).
2
Previous Classes
We studied a few different algorithms (in particular focusing on
classification):
• Nearest neighbor
• Least-squares
• Support Vector Machines
• Logistic Regression
• Decision Trees
• Ensemble methods (Bagging, Boosting)
We observed that many of these methods can be formulated as
ERM problems.
3
Outline
Equipped with our shiny new tool-set of supervised learning
algorithms, we will now go back to asking ourselves the
question:
| When/why/how do these methods “work”?
Outline:
• Empirical Risk Minimization (Again!)
• Generalization error
• Excess Risk Decomposition
• Regularization & Bias-Variance Trade-off (Again!)
4
Refresher on the Learning Problem
The goal of Supervised Learning is to find a “good” estimator
fn : X → Y, approximating the lowest expected risk
    inf_{f : X→Y} E(f),     E(f) = ∫_{X×Y} ℓ(f(x), y) dρ(x, y)

given only a finite number of (training) examples (x_i, y_i)_{i=1}^n
sampled independently from the unknown distribution ρ.
5
A Wishlist
What do we mean by... “good”?
To have a low excess risk E(f_n) − E(f_*).
• Consistency. Does E(f_n) − E(f_*) → 0
  • in expectation?
  • in probability?
as the size of the training set S = (x_i, y_i)_{i=1}^n of points
randomly sampled from ρ grows, n → +∞?
• Learning Rates. How "fast" is consistency achieved?
Non-asymptotic bounds: finite sample complexity, tail
bounds, error bounds...
6
Empirical Risk as a Proxy
If ρ is unknown, can we say anything about E(f_n) − inf_{f∈F} E(f)?
We only "glimpse" ρ via the samples (x_i, y_i)_{i=1}^n. Can we use
them to gather some information about ρ (or better, on E(f))?
Consider a function f : X → Y and its empirical risk

    E_n(f) = (1/n) Σ_{i=1}^n ℓ(f(x_i), y_i)

Is this a good idea? A simple calculation shows that

    E_{S∼ρ^n}[E_n(f)] = (1/n) Σ_{i=1}^n E_{(x_i,y_i)∼ρ}[ℓ(f(x_i), y_i)] = (1/n) Σ_{i=1}^n E(f) = E(f)

The expectation of E_n(f) is the expected risk E(f)!
7
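The unbiasedness claim above can be checked numerically. The sketch below is purely illustrative: the distribution ρ (uniform inputs, linear signal plus Gaussian noise), the fixed hypothesis f, and the squared loss are all toy choices, not from the slides. Averaging E_n(f) over many independent training sets should recover E(f).

```python
import random

random.seed(0)

def sample_point():
    # toy distribution rho: x uniform on [0, 1], y = x + Gaussian noise
    x = random.random()
    y = x + random.gauss(0.0, 0.1)
    return x, y

def f(x):
    # a fixed hypothesis, chosen before seeing any data
    return 0.5 * x

def sq_loss(yhat, y):
    return (yhat - y) ** 2

def empirical_risk(f, sample):
    return sum(sq_loss(f(x), y) for x, y in sample) / len(sample)

# Average the empirical risk over many independent training sets S ~ rho^n:
n, trials = 50, 2000
mean_emp = sum(empirical_risk(f, [sample_point() for _ in range(n)])
               for _ in range(trials)) / trials

# Estimate the expected risk E(f) with one very large sample:
big = [sample_point() for _ in range(200_000)]
expected = empirical_risk(f, big)

print(mean_emp, expected)  # the two agree: E_S[E_n(f)] = E(f)
```

Note that f is fixed independently of the data; this is exactly the assumption that fails for the ERM estimator f_n later in the lecture.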
Empirical Vs Expected
Nice! But... how close is E_n(f) to E(f) with respect to n?
Let X and (X_i)_{i=1}^n be i.i.d. random variables, and let
X̄_n = (1/n) Σ_{i=1}^n X_i. Then

    E[(X̄_n − E[X])²] = Var(X̄_n) = Var(X) / n

Therefore, the expected (squared) distance between the
empirical mean of the X_i and their expectation E[X] goes to
zero as O(1/n)... (assuming X has finite variance).
8
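The O(1/n) decay of Var(X̄_n) is easy to see in simulation. A minimal sketch, with uniform X_i on [0, 1] (so Var(X) = 1/12) as an illustrative choice:

```python
import random

random.seed(0)

# X_i uniform on [0, 1]: Var(X) = 1/12, so Var(X_bar_n) should be 1/(12 n)
def sample_mean(n):
    return sum(random.random() for _ in range(n)) / n

trials = 5_000
emp_var = {}
for n in (10, 100, 400):
    means = [sample_mean(n) for _ in range(trials)]
    mu = sum(means) / trials
    emp_var[n] = sum((m - mu) ** 2 for m in means) / trials
    print(n, emp_var[n], 1 / (12 * n))  # empirical variance tracks Var(X)/n
```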
Empirical Vs Expected Risk
If X_i = ℓ(f(x_i), y_i), we have X̄_n = E_n(f) and therefore

    E[(E_n(f) − E(f))²] = V_f / n

where V_f = Var(ℓ(f(x), y)). In particular,

    E[|E_n(f) − E(f)|] ≤ √(V_f / n)

Exercise. Can we similarly get a "tail" bound?

    P( |E_n(f) − E(f)| ≤ ε(n, δ) ) ≥ 1 − δ
9
Empirical Vs Expected
Assume there exists a minimizer f_* : X → Y of the expected risk

    E(f_*) = inf_{f∈F} E(f)

Then, for any f : X → Y we can decompose the excess risk as

    E(f) − E(f_*) = [E(f) − E_n(f)] + [E_n(f) − E_n(f_*)] + [E_n(f_*) − E(f_*)]

We can therefore leverage the statistical relation between E_n
and E to study the expected risk in terms of the empirical risk.
This leads naturally to Empirical Risk Minimization.
10
Empirical Risk Minimization (ERM)
Let fn be the minimizer of the empirical risk
    f_n = argmin_{f∈F} E_n(f)

We automatically have E_n(f_n) − E_n(f_*) ≤ 0 (for any training set).
Then...

    E[E(f_n) − E(f_*)] ≤ E[E(f_n) − E_n(f_n)]   (Question. Why?)

We can "just" focus on studying the generalization error

    E[E(f_n) − E_n(f_n)]
11
Generalization Error
How can we control the generalization error

    E_n(f_n) − E(f_n)

with respect to the number n of examples?
This question is far from trivial (a key one in SLT, in fact):
In general,

    E[E_n(f_n) − E(f_n)] ≠ 0   (Question. Why?)

E_n and f_n both depend on the sampled training data. Therefore,
we cannot use the result

    E[|E_n(f_n) − E(f_n)|] ≤ √( Var(ℓ(f_n(x), y)) / n )

which indeed is not true in general...
12
Issues with ERM
Let X = Y = ℝ, ρ with dense support¹ and ℓ(y, y) = 0 for all y ∈ Y.
For any S = (x_i, y_i)_{i=1}^n with distinct inputs x_i, let f_n : X → Y be

    f_n(x) = y_i   if x = x_i for some i = 1, ..., n
    f_n(x) = 0     otherwise

Then, for any number n of training points:
• E[E_n(f_n)] = 0
• E[E(f_n)] = E(0), which is greater than E(f_*) (unless f_* ≡ 0)
Therefore E[E(f_n) − E_n(f_n)] = E(0) ↛ 0 as n increases!
¹ and such that every pair (x, y) has measure zero according to ρ
13
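The memorizing estimator above can be simulated directly. In this illustrative sketch (the noiseless distribution y = x and the squared loss are toy assumptions, not from the slides), the estimator stores the training pairs in a lookup table: its empirical risk is exactly zero, while its expected risk stays at E(0) ≈ E[y²] no matter how large n is.

```python
import random

random.seed(0)

def sample(n):
    # rho: x uniform on [0, 1] (dense support, distinct x_i a.s.), y = x
    return [(x, x) for x in (random.random() for _ in range(n))]

train = sample(200)
lookup = dict(train)

def f_n(x):
    # memorize the training set, predict 0 everywhere else
    return lookup.get(x, 0.0)

sq = lambda a, b: (a - b) ** 2
train_risk = sum(sq(f_n(x), y) for x, y in train) / len(train)
test_risk = sum(sq(f_n(x), y) for x, y in sample(10_000)) / 10_000

print(train_risk, test_risk)  # 0.0 versus roughly E[y^2] = 1/3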
Overfitting
An estimator fn is said to overfit the training data if ∀n ∈ N:
• E [E(fn ) − E(f∗ )] > C for a constant C > 0, and
• E [En (fn ) − En (f∗ )] ≤ 0
According to this definition ERM overfits...
14
ERM on Finite Hypotheses Spaces
Is ERM hopeless? Consider the case where X and Y are finite.
Then F = Y^X = {f : X → Y} is finite as well (albeit possibly
very large), and therefore:

    E|E_n(f_n) − E(f_n)| ≤ E[ sup_{f∈F} |E_n(f) − E(f)| ]
                         ≤ Σ_{f∈F} E|E_n(f) − E(f)| ≤ |F| √(V_F / n)

where V_F = sup_{f∈F} V_f and |F| denotes the cardinality of F.
Here ERM works!

    lim_{n→+∞} E|E(f_n) − E(f_*)| = 0
15
ERM on Finite Hypotheses (Sub) Spaces
The same argument holds in general: let H ⊂ F be a finite
space of hypotheses (even if F is not). Then,
    E|E_n(f_n) − E(f_n)| ≤ |H| √(V_H / n)

In particular, if f_* ∈ H, then

    E|E(f_n) − E(f_*)| ≤ |H| √(V_H / n)
and ERM is a good estimator for the problem considered.
16
Example: Threshold functions
Consider a binary classification problem with Y = {0, 1}. Someone
has told us that the minimizer of the risk is a "threshold
function" f_{a_*}(x) = 1_{[a_*,+∞)}(x) with a_* ∈ [−1, 1].
[Figure: two threshold functions f_a and f_b plotted on the interval [−1.5, 1.5].]
We can learn on H = {f_a | a ∈ [−1, 1]} ≅ [−1, 1]. However, on a
computer we can only represent real numbers up to a given
precision. 17
Example: Threshold Functions (with precision p)
Discretization: given p > 0, we can consider

    H_p = { f_a | a ∈ [−1, 1],  a · 10^p = [a · 10^p] }

with [a] denoting the integer part (i.e. the closest integer) of a
scalar a. The value p can be interpreted as the "precision" of
our space of functions H_p. Note that |H_p| = 2 · 10^p.
If f_* ∈ H_p, then we automatically have

    E|E(f_n) − E(f_*)| ≤ |H_p| √(V_H / n) ≤ 10^p / √n

(This is because V_H ≤ 1/4. Question. Why?)
18
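ERM over the finite space H_p is simple enough to run exhaustively. A minimal sketch, assuming a noiseless toy distribution (uniform inputs on [−1, 1], labels given exactly by a hypothetical true threshold a_* = 0.3) and precision p = 2:

```python
import random

random.seed(0)

a_star = 0.3                      # hypothetical true threshold (illustrative)
f_star = lambda x: 1 if x >= a_star else 0

n = 2000
data = [(x, f_star(x)) for x in (random.uniform(-1, 1) for _ in range(n))]

p = 2                             # precision: grid of step 10^(-p) on [-1, 1]
grid = [k / 10**p for k in range(-10**p, 10**p + 1)]

def emp_risk(a):
    # 0-1 empirical risk of the threshold function f_a on the training set
    return sum(((x >= a) != (y == 1)) for x, y in data) / n

a_hat = min(grid, key=emp_risk)   # ERM: exhaustive search over H_p
print(a_hat)                      # close to a_star = 0.3
```

With noiseless labels the empirical risk of the returned threshold is exactly zero, and a_hat lands within roughly the grid resolution of a_*.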
Rates in Expectation Vs Probability
In practice, even for small values of p,

    E|E(f_n) − E(f_*)| ≤ 10^p / √n

will need a very large n in order to give a meaningful bound on
the expected error.
Interestingly, we can get much better constants (not rates
though!) by working in probability...
19
Hoeffding’s Inequality
Let X_1, ..., X_n be independent random variables such that
X_i ∈ [a_i, b_i]. Let X̄ = (1/n) Σ_{i=1}^n X_i. Then,

    P( |X̄ − E[X̄]| ≥ ε ) ≤ 2 exp( −2n²ε² / Σ_{i=1}^n (b_i − a_i)² )
20
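Hoeffding's inequality can be checked by Monte Carlo. Sketch below, with the illustrative choice X_i ∼ Bernoulli(1/2) so that X_i ∈ [0, 1] and the bound reads P(|X̄ − 1/2| ≥ ε) ≤ 2 exp(−2nε²):

```python
import math
import random

random.seed(0)

n, eps, trials = 100, 0.1, 20_000

def deviates():
    # one draw of X_bar for n Bernoulli(1/2) variables
    xbar = sum(random.random() < 0.5 for _ in range(n)) / n
    return abs(xbar - 0.5) >= eps

empirical = sum(deviates() for _ in range(trials)) / trials
bound = 2 * math.exp(-2 * n * eps**2)
print(empirical, bound)   # the empirical tail probability sits below the bound
```

Hoeffding is not tight here (the true tail is much smaller than the bound), but it holds for every n, ε, and bounded distribution.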
Applying Hoeffding’s inequality
Assume that ∀f ∈ H, x ∈ X , y ∈ Y the loss is bounded
|ℓ(f (x), y)| ≤ M by some constant M > 0. Then, for any f ∈ H
we have
    P( |E_n(f) − E(f)| ≥ ε ) ≤ 2 exp( −nε² / (2M²) )
21
Controlling the Generalization Error
We would like to control the generalization error E_n(f_n) − E(f_n)
of our estimator in probability. One possible way to do that is by
controlling the generalization error of the whole set H:

    P( |E_n(f_n) − E(f_n)| ≥ ε ) ≤ P( sup_{f∈H} |E_n(f) − E(f)| ≥ ε )

The latter term is the probability that at least one of the events
|E_n(f) − E(f)| ≥ ε occurs for f ∈ H. In other words, the
probability of the union of such events. Therefore

    P( sup_{f∈H} |E_n(f) − E(f)| ≥ ε ) ≤ Σ_{f∈H} P( |E_n(f) − E(f)| ≥ ε )

by the so-called union bound.
22
Hoeffding the Generalization Error
By applying Hoeffding's inequality and the union bound,

    P( |E_n(f_n) − E(f_n)| ≥ ε ) ≤ 2|H| exp( −nε² / (2M²) )

Or, equivalently, for any δ ∈ (0, 1],

    |E_n(f_n) − E(f_n)| ≤ √( 2M² log(2|H|/δ) / n )

with probability at least 1 − δ.
with probability at least 1 − δ.
23
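The high-probability bound above is a closed-form function of n, δ, M and |H|, so it is worth coding up. A minimal sketch (the parameter values passed in are illustrative only):

```python
import math

def generalization_bound(n, delta, M, card_H):
    # eps(n, delta): |E_n(f_n) - E(f_n)| <= eps with probability >= 1 - delta
    return math.sqrt(2 * M**2 * math.log(2 * card_H / delta) / n)

for n in (100, 10_000, 1_000_000):
    print(n, generalization_bound(n, delta=0.01, M=1.0, card_H=200))
```

Note that |H| enters only through a logarithm, while δ costs just log(1/δ): high confidence is cheap.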
Example: Threshold Functions (in Probability)
Going back to the space H_p of threshold functions...

    |E_n(f_n) − E(f_n)| ≤ √( (4 + 6p − 2 log δ) / n )

since M = 1 and

    log(2|H_p|) = log(4 · 10^p) = log 4 + p log 10 ≤ 2 + 3p.

For example, let δ = 0.001, so that −2 log δ ≤ 14. We can say that

    |E_n(f_n) − E(f_n)| ≤ √( (6p + 18) / n )

holds at least 99.9% of the time.
24
Bounds in Expectation Vs Probability
Comparing the two bounds:

    E|E_n(f_n) − E(f_n)| ≤ 10^p / √n    (Expectation)

while, with probability greater than 99.9%,

    |E_n(f_n) − E(f_n)| ≤ √( (6p + 18) / n )    (Probability)
Although we cannot be 100% sure of it, we can be quite
confident that the generalization error will be much smaller than
what the bound in expectation tells us...
Rates: note however that the rates of convergence to 0 are the
same (i.e. O(1/√n)).
25
Improving the bound in Expectation
Exploiting the bound in probability and the knowledge that on
Hp the excess risk is bounded by a constant, we can improve
the bound in expectation...
Let X be a random variable such that |X| < M for some constant
M > 0. Then, for any ε > 0 we have

    E|X| ≤ ε P(|X| ≤ ε) + M P(|X| > ε)

Applying this to our problem: for any δ ∈ (0, 1],

    E|E_n(f_n) − E(f_n)| ≤ (1 − δ) √( 2M² log(2|H_p|/δ) / n ) + δM

Therefore only log |H_p| appears (no |H_p| alone).
26
Infinite Hypotheses Spaces
What if f∗ ∈ H \ Hp for any p > 0?
ERM on Hp will never minimize the expected risk. There will
always be a gap for E(fn,p ) − E(f∗ ).
For p → +∞ it is natural to expect such gap to decrease... BUT
if p increases too fast (with respect to the number n of
examples) we cannot control the generalization error anymore!
    |E_n(f_n) − E(f_n)| ≤ √( (6p + 18) / n ) → +∞   for p → +∞ (n fixed)
Therefore we need to increase p gradually as a function p(n) of
the number of training examples. This approach is known as
regularization.
27
Approximation Error for Threshold Functions
Consider f_p = 1_{[a_p,+∞)} = argmin_{f∈H_p} E(f), with a_p ∈ [−1, 1].
We decompose the excess risk E(f_n) − E(f_*) as

    [E(f_n) − E_n(f_n)] + [E_n(f_n) − E_n(f_p)] + [E_n(f_p) − E(f_p)] + [E(f_p) − E(f_*)]

where the second term satisfies E_n(f_n) − E_n(f_p) ≤ 0.
We already know how to control the generalization of f_n (via the
supremum over H_p) and of f_p (since it is a single function).
Moreover, we have that the approximation error is

    E(f_p) − E(f_*) ≤ |a_p − a_*| ≤ 10^{−p}   (why?)
Note that it does not depend on training data!
28
Approximation Error for Threshold Functions II
Putting everything together: for any δ ∈ (0, 1] and p ≥ 0,
    E(f_n) − E(f_*) ≤ 2 √( (4 + 6p − 2 log δ) / n ) + 10^{−p} = φ(n, δ, p)

holds with probability greater than or equal to 1 − δ.
So, for any n and δ, we can choose the best precision as

    p(n, δ) = argmin_{p≥0} φ(n, δ, p)

which leads to an error bound ε(n, δ) = φ(n, δ, p(n, δ)) holding
with probability at least 1 − δ.
29
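The trade-off φ(n, δ, p) can be optimized numerically. A minimal sketch, restricting to integer precisions p for illustration (the slide allows any p ≥ 0):

```python
import math

def phi(n, delta, p):
    # excess-risk bound: estimation term + approximation term 10^(-p)
    return 2 * math.sqrt((4 + 6 * p - 2 * math.log(delta)) / n) + 10.0 ** (-p)

def best_precision(n, delta, p_max=12):
    # integer precisions are enough for this illustration
    return min(range(p_max + 1), key=lambda p: phi(n, delta, p))

for n in (100, 10_000, 1_000_000):
    p = best_precision(n, 0.001)
    print(n, p, phi(n, 0.001, p))
```

The optimal precision grows slowly with n: a larger training set licenses a richer (finer) hypothesis space, which is exactly the regularization idea of the next slide.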
Regularization
Most hypotheses spaces are “too” large and therefore prone to
overfitting. Regularization is the process of controlling the
"freedom" of an estimator as a function of the number of
training examples.
Idea. Parametrize H as a union H = ∪γ>0 Hγ of hypotheses
spaces Hγ that are not prone to overfitting (e.g. finite spaces).
γ is known as the regularization parameter (e.g. the precision p
in our examples). Assume Hγ ⊂ Hγ ′ if γ ≤ γ ′ .
Regularization Algorithm. Given n training points, find an
estimator fγ,n on Hγ (e.g. ERM on Hγ ). Let γ = γ(n) increase
as n → +∞.
30
Regularization and Decomposition of the Excess Risk
Let γ > 0 and f_γ = argmin_{f∈H_γ} E(f).
We can decompose the excess risk E(f_{γ,n}) − E(f_*) as

    [E(f_{γ,n}) − E(f_γ)] + [E(f_γ) − inf_{f∈H} E(f)] + [inf_{f∈H} E(f) − E(f_*)]
      (Sample error)        (Approximation error)         (Irreducible error)
31
Irreducible Error
    inf_{f∈H} E(f) − E(f_*)
Recall: H is the “largest” possible Hypotheses space we are
considering.
If the irreducible error is zero, H is called universal (e.g. the
RKHS induced by the Gaussian kernel is a universal
Hypotheses space).
32
Approximation Error
    E(f_γ) − inf_{f∈H} E(f)
• Does not depend on the dataset (deterministic).
• Does depend on the distribution ρ.
• Also referred to as bias.
33
Convergence of the Approximation Error
Under mild assumptions,
    lim_{γ→+∞} [ E(f_γ) − inf_{f∈H} E(f) ] = 0
34
Density Results
    lim_{γ→+∞} [ E(f_γ) − E(f_*) ] = 0
• Convergence of Approximation error
+
• Universal Hypotheses space
Note: It corresponds to a density property of the space H in
F = {f : X → Y}
35
Approximation error bounds
    E(f_γ) − inf_{f∈H} E(f) ≤ A(ρ, γ)
• No rates without assumptions – related to the so-called No
Free Lunch Theorem.
• Studied in Approximation Theory using tools such as
Kolmogorov n-width, K-functionals, interpolation spaces. . .
Prototypical result:
If f_* is "regular"² with regularity parameter equal to s, then
A(ρ, γ) = c γ^{−s}.
2
Some abstract notion of regularity parametrizing the class of target
functions. Typical example: f∗ in a Sobolev space W s,2 .
36
Sample Error
E(fγ,n ) − E(fγ )
Random quantity depending on the data.
Two main ways to study it:
• Capacity/Complexity estimates on Hγ .
• Stability.
37
Sample Error Decomposition
We have seen how to decompose E(fγ,n ) − E(fγ ) in
    [E(f_{γ,n}) − E_n(f_{γ,n})] + [E_n(f_{γ,n}) − E_n(f_γ)] + [E_n(f_γ) − E(f_γ)]
     (Generalization error)       (Excess empirical risk)     (Generalization error)
38
Generalization Error(s)
As we have observed,
    E(f_{γ,n}) − E_n(f_{γ,n})   and   E_n(f_γ) − E(f_γ)

can be controlled by studying the empirical process

    sup_{f∈H_γ} |E_n(f) − E(f)|

Example: we have already observed that for a finite space H_γ

    E[ sup_{f∈H_γ} |E_n(f) − E(f)| ] ≤ |H_γ| √( V_{H_γ} / n )
39
ERM on Finite Spaces and Computational Efficiency
The strategy used for threshold functions can be generalized to
any H for which it is possible to find a finite discretization Hp
with respect to the L1 (X , ρX ) norm (e.g. H compact with
respect to such norm).
However, it could be computationally very expensive to find the
empirical risk minimizer on a discretization Hp , since in
principle it could be necessary to evaluate En (f ) for any f ∈ Hp .
40
ERM on Convex Spaces?
As we have seen in previous classes, ERM on convex (thus
dense) spaces is often much more amenable to computations.
In principle, we have observed that on infinite hypotheses
spaces it is difficult to control the generalization error but...
...we might be able to leverage the discretization argument
used for threshold functions to control the generalization error
of ERM for a larger family of hypotheses spaces.
41
Example: Risks for Continuous functions
Let X ⊂ Rd be a compact (i.e., closed and bounded) space and
C(X ) be the space of continuous functions. Let ∥ · ∥∞ be
defined for any f ∈ C(X ) as ∥f ∥∞ = supx∈X |f (x)|.
If the loss function ℓ : Y × Y → R is such that ℓ(·, y) is uniformly
Lipschitz with constant L > 0, for any y ∈ Y, we have that
1) |E(f_1) − E(f_2)| ≤ L ∥f_1 − f_2∥_{L¹(X,ρ_X)} ≤ L ∥f_1 − f_2∥_∞, and

2) |E_n(f_1) − E_n(f_2)| ≤ (1/n) Σ_{i=1}^n |ℓ(f_1(x_i), y_i) − ℓ(f_2(x_i), y_i)| ≤ L ∥f_1 − f_2∥_∞
Therefore, “close” functions in ∥ · ∥∞ will have similar expected
and empirical risks!
42
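Property 2) above is easy to verify numerically. In this illustrative sketch we use the absolute loss ℓ(a, y) = |a − y|, which is Lipschitz in its first argument with L = 1; the two hypotheses and the toy data are arbitrary choices, not from the slides:

```python
import math
import random

random.seed(0)

# Absolute loss l(a, y) = |a - y| is Lipschitz in its first argument with L = 1
f1 = lambda x: x ** 2
f2 = lambda x: x ** 2 + 0.05 * math.sin(10 * x)   # a small sup-norm perturbation

# ||f1 - f2||_inf on [0, 1], approximated on a fine grid
sup_dist = max(abs(f1(t / 1000) - f2(t / 1000)) for t in range(1001))

data = [(random.random(), random.random()) for _ in range(500)]
def En(f):
    return sum(abs(f(x) - y) for x, y in data) / len(data)

gap = abs(En(f1) - En(f2))
print(gap, sup_dist)   # gap <= L * ||f1 - f2||_inf with L = 1
```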
Example: Covering numbers
We define the covering number of H at radius η > 0 as the
cardinality of a minimal cover of H with balls of radius η:

    N(H, η) = inf{ m | H ⊆ ∪_{i=1}^m B_η(h_i),  h_i ∈ H }

[Figure: a cover of a set H by balls of radius η. Image credits: Lorenzo Rosasco.]

Example. If H ≅ B_R(0) is a ball of radius R in ℝ^d, then

    N(B_R(0), η) ≤ (4R/η)^d
43
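For d = 1 the covering number can be computed exactly and compared with the (4R/η)^d estimate. A minimal sketch (the floating-point guard is an implementation detail, not part of the theory):

```python
import math

def cover_interval(R, eta):
    # minimal number of intervals of radius eta (length 2*eta) covering [-R, R]
    # (the small slack guards against floating-point artifacts in the division)
    return math.ceil(R / eta - 1e-12)

def cover_bound(R, eta, d):
    # the (4R/eta)^d estimate for a ball of radius R in d dimensions
    return (4 * R / eta) ** d

for eta in (0.5, 0.1, 0.01):
    print(eta, cover_interval(1.0, eta), cover_bound(1.0, eta, d=1))
```

As expected, the exact count R/η for the interval sits well below the generic estimate, which trades tightness for validity in every dimension.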
Example: Covering numbers (continued)
Note that the resolution η is related to the precision in the
sense of a distance between two hypothesis. We can apply the
same reasoning used for threshold functions
    E[ sup_{f∈H} |E_n(f) − E(f)| ] ≤ 2Lη + N(H, η) √( V_H / n )

For η → 0 the covering number N(H, η) → +∞, while for fixed η
the second term vanishes as n → +∞.
It is typically possible to choose an η(n) → 0 for which the
whole bound tends to zero as n → +∞.
(Exercise. Find such an η(n) for H a ball of radius R.)
44
Example: Covering numbers (continued)
The same argument can be reproduced for bounds in probability,
namely for any δ ∈ (0, 1],

    sup_{f∈H} |E_n(f) − E(f)| ≤ 2Lη + √( 2M² log(2 N(H, η)/δ) / n )

holds with probability at least 1 − δ.
45
Complexity Measures
In general, the error
sup |En (f ) − E(f )|
f ∈Hγ
Can be controlled via capacity/complexity measures:
• Covering numbers,
• combinatorial dimension, e.g. VC-dimension, fat-shattering
dimension
• Rademacher complexities
• Gaussian complexities
• ...
46
Prototypical Results
A prototypical result (under suitable assumptions, e.g.
regularity of f∗ ):
    E(f_{γ,n}) − E(f_*) ≤ [E(f_{γ,n}) − E(f_γ)] + [E(f_γ) − E(f_*)]
                            ≲ γ^β n^{−α}           ≲ γ^{−τ}
                            (Variance)             (Bias)
47
Choosing γ(n) in practice
The best γ(n) depends on the unknown distribution ρ. So how
can we choose such a parameter in practice?
Problem known as model selection. Possible approaches:
• Cross validation,
• complexity regularization/structural risk minimization,
• balancing principles.
• ...
48
Abstract Regularization
We got a new perspective on the concept of regularization:
controlling the expressiveness of the hypotheses space
according to the number of training examples in order to
guarantee good prediction performance and consistency.
There are many ways to implement this strategy in practice:
• Tikhonov (and Ivanov) regularization
• Spectral filtering
• Early stopping
• Random sampling
• ...
49
Wrapping Up
Building on a few (reasonable?) assumptions on the learning
problem, we:
• Have shown how the empirical risk can be a proxy for the
expected risk.
• Identified the main reasons behind overfitting and
discussed how to counteract it (in a more principled way!).
• Highlighted the key role played by the choice of the
hypotheses space and how its "complexity" affects
performance.
Next class we will focus on one such measure of complexity
and derive upper bounds on the generalization error.
50
Recommended Reading
Chapters 4, 5 and 6 of Shalev-Shwartz, Shai, and Shai
Ben-David. Understanding Machine Learning: From Theory to
Algorithms. Cambridge University Press, 2014.
51