CSE598 Statistical Learning Theory
Lecture01
Yingzhen Yang
1 / 20
Lecture Overview
Course information, syllabus and logistics
2 / 20
Instructor and TA
Instructor: Yingzhen Yang, assistant professor at SCAI
Research area: Statistical Machine Learning and Deep Learning. My research mostly
focuses on machine learning/deep learning theory, with tools from
statistics and other fields, as well as optimization/generalization for
machine learning.
Email: [email protected]. Office hours: Monday &
Wednesday, 2 pm – 3 pm, both online (Zoom link
https://asu.zoom.us/j/9531979310) and in person (BYENG 590).
TA: Yancheng Wang
PhD student in deep learning, especially Neural Architecture
Search (NAS)
Email: [email protected]. Office hours: Monday & Wednesday,
11 am – 12 pm, both online (Zoom link
https://asu.zoom.us/j/7737185236) and in person (BYENG 221).
3 / 20
Difference from Other Machine Learning Courses
The major difference between CSE 575 and this course is that CSE
575 focuses more on concrete machine learning algorithms, while
this course focuses on the theoretical foundations of those learning
algorithms.
4 / 20
Difference from Other Machine Learning Courses
Example: Support Vector Machines (SVMs) for classification
CSE 575 introduces the concept of maximum margin classification
(by SVMs) and its optimization algorithm
This course will explain the fundamental mathematical reason why
maximum margin is good.
5 / 20
Difference from Other Machine Learning Courses
Another example: Neural Networks
CSE 575 introduces the concept of neural networks and the basic
(stochastic) gradient descent algorithm for optimization of neural
networks.
This course will explain the fundamental mathematical reason why
gradient descent can be used for the optimization of neural
networks and when the trained neural networks can generalize well
(generalizing well means low test error or high test accuracy).
As you can see, prior knowledge of basic machine learning
algorithms is very helpful. However, all the relevant machine
learning algorithms will be reviewed in this course so that you can
understand why the learning theory studied in this course is crucial
to the design of these algorithms.
The major mathematical tools used for theoretical analysis of
machine learning algorithms will be reviewed in lectures. You are
expected to feel comfortable with basic mathematics and statistics.
6 / 20
Course Assessment Plan
40%: Four assignments of equal weight.
20%: Paper presentation. Every student will present at least one
paper related to theoretical aspects of machine learning or deep
learning. Presenting application-oriented papers with theoretical
background or theoretical motivation is welcome, and presenting
multiple papers is strongly recommended.
7 / 20
Course Assessment Plan
40%: Project
The project can be in any area related to the topics of the course.
For example, you may develop a new method and demonstrate its
empirical performance, or perform experiments with an existing
method.
You may also write a comprehensive survey of a topic of your own
interest (relevant to this course) as the project.
You will submit a final project report which will be due on
12/08/2025.
You are allowed to work on projects in groups of two, and all group
members will receive the same grade for the project.
8 / 20
Prerequisites and Course Objective
Prerequisites: linear algebra, probability, statistics, and algorithm design
and analysis. While the major mathematical tools will be reviewed
in lectures, a solid mathematical background is required to
understand the materials.
Course Objective: An in-depth understanding of statistical learning
theory behind the design of machine learning algorithms.
9 / 20
Topics to Cover (tentative)
Basics of statistical prediction theory
Concentration inequalities
Supervised and unsupervised learning
Empirical risk minimization
Complexity-regularized estimation; Generalization bounds for
learning algorithms; VC dimension and Rademacher complexity;
minimax lower bounds
Optimization for machine learning
Recent theoretical developments in deep learning, including
optimization and generalization of deep neural networks, and Neural
Tangent Kernel
10 / 20
More Information
Reference Book: Pattern Recognition and Machine Learning by
Christopher M. Bishop.
The relevant machine learning algorithms will be reviewed so that
you can understand why the learning theory studied in this course is
crucial to the design of such algorithms.
We will post links to additional resources which will be helpful for
you to understand the course materials.
11 / 20
What is Machine Learning?
Machine Learning: Study of algorithms that
improve their performance P
at some task T
with experience E
Well-defined learning task: (P,T,E)
A machine learning algorithm learns from some experience E
at some task T, aiming to improve some performance P.
E is also called the training data.
12 / 20
What is Machine Learning?
Example: Spam Filter, an algorithm to tell if an input email is spam
or not spam (the task T)
Examples of training emails (the experience E) are shown on the slide.
13 / 20
Probabilistic Formulation of Prediction Problems
Prediction problem: predict an outcome y from some set Y of
possible outcomes based on an input x from a feature space X. X
and Y are also called the input space and the output space.
In the example of spam filter, the outcome y is a label in
Y = {spam, no spam} based on an input email x.
The training data for such prediction problems is a data set of n
pairs, (x1 , y1 ), . . . , (xn , yn ), and we will use the training data to
produce a function f : X → Y so that f (x) is a good prediction of
y for some unseen pair (x, y).
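As a toy, hypothetical illustration of turning the n training pairs into a prediction function f (the 1-nearest-neighbor rule and all names below are my own choices, not part of the lecture):

```python
import numpy as np

def fit_1nn(x_train, y_train):
    """Build a prediction function f: X -> Y from training pairs (x_i, y_i)."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train)

    def f(x):
        # Predict the label of the closest training input (1-nearest neighbor).
        return y_train[np.argmin(np.abs(x_train - x))]

    return f

# Toy 1-D training data with labels in {1, -1} (e.g. spam / no spam).
f = fit_1nn([0.1, 0.4, 0.9], [1, 1, -1])
print(f(0.2), f(0.8))  # 1 -1
```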
14 / 20
Probabilistic Formulation of Prediction Problems
To measure the quality of the prediction, we define a loss function
ℓ : Y × Y → R, so that ℓ(ŷ, y) measures the cost of the prediction
ŷ = f(x) when the true outcome is y.
Our goal is to ensure that ℓ(f (x), y) is small.
In classification problems, the goal is to classify an input x as one of
a finite number of classes (that is, the label space Y is finite). We
can choose a 0 − 1 loss function defined as
ℓ(ŷ, y) = 1I{ŷ ≠ y} =
    1 if ŷ ≠ y,
    0 otherwise.
1I{E} is an indicator function with 1I{E} = 1 if the event E is true,
and 0 if E is not true.
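As a minimal NumPy sketch (the function name is mine), the 0 − 1 loss is just the indicator of a mismatch:

```python
import numpy as np

def zero_one_loss(y_hat, y):
    """0-1 loss: the indicator 1I{y_hat != y}, i.e. 1 on a mistake and 0 otherwise."""
    return (np.asarray(y_hat) != np.asarray(y)).astype(float)

y_true = np.array([1, -1, 1, 1])
y_pred = np.array([1, 1, 1, -1])
print(zero_one_loss(y_pred, y_true))  # [0. 1. 0. 1.]
```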
15 / 20
Probabilistic Formulation of Prediction Problems
In a regression problem, with Y = R, we can define the quadratic
loss function ℓ(ŷ, y) = (ŷ − y)².
Assume that there is a probability distribution P over X × Y, and
that the training data (X1, Y1), . . . , (Xn, Yn) are chosen
independently according to P. The aim is to produce a function f
using the training data so that the risk of f, defined as
R(f) = E_{(X,Y)∼P} [ℓ(f(X), Y)],
is small. E[·] denotes expectation.
Important notes:
We use capital letters to denote random variables.
Because the training data is random, the produced prediction
function f , which depends on the training data, is also random.
We hope to have a small R(f ). In statistical learning theory, we
aim to make R(f ) small with high probability.
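Since the true P is unknown in practice, R(f) is typically approximated by averaging the loss over fresh samples from P (this is what the test error does). Below is a minimal Monte Carlo sketch; the toy distribution and the classifier f are assumptions of mine, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy joint distribution P over X x Y (an assumption for illustration):
# X ~ Uniform[0, 1], and Y = 1 with probability eta(x), else Y = -1.
def eta(x):
    return 1.0 / (1.0 + np.exp(-10.0 * (x - 0.5)))  # P(Y = 1 | X = x)

def sample(n):
    x = rng.uniform(size=n)
    y = np.where(rng.uniform(size=n) < eta(x), 1, -1)
    return x, y

# A candidate prediction function f (a simple threshold rule, also an assumption).
def f(x):
    return np.where(x >= 0.5, 1, -1)

# Monte Carlo approximation of the risk R(f) = E[1I{f(X) != Y}].
x, y = sample(200_000)
print("estimated risk:", np.mean(f(x) != y))
```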
16 / 20
Probabilistic Formulation of Prediction Problems
When the 0 − 1 loss function is used for a classification problem,
then R(f) = E_{(X,Y)∼P} [ℓ(f(X), Y)] = P(Y ≠ f(X)). Here we used
the basic fact that E[1I{E}] = P(E).
Assume that we are dealing with a binary classification problem with
the output space Y = {1, −1}, and P is the joint distribution over
X × Y. Define η as the conditional probability of Y = 1 given
X = x: η(x) = P(Y = 1 | X = x).
Then we can compute the risk of f .
17 / 20
Probabilistic Formulation of Prediction Problems
We can compute the risk of f .
R(f) = E[ℓ(f(X), Y)]
     = E[E[ℓ(f(X), Y) | X]]
     = E[ℓ(f(X), 1) P(Y = 1 | X) + ℓ(f(X), −1) P(Y = −1 | X)]
     = E[1I{f(X) ≠ 1} η(X) + 1I{f(X) ≠ −1} (1 − η(X))]
     = E[1I{f(X) ≠ 1} η(X) + (1 − 1I{f(X) ≠ 1})(1 − η(X))]
     = E[1I{f(X) ≠ 1} (2η(X) − 1) + 1 − η(X)].
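The identity above can be sanity-checked numerically; in the sketch below (the choices of η and f are mine, purely for illustration), a Monte Carlo estimate of P(Y ≠ f(X)) is compared with the last expression of the derivation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

# Toy setup (assumptions for illustration): X ~ Uniform[0, 1],
# eta(x) = P(Y = 1 | X = x), and a fixed classifier f.
eta = lambda x: 0.2 + 0.6 * x
f = lambda x: np.where(x >= 0.4, 1, -1)

x = rng.uniform(size=n)
y = np.where(rng.uniform(size=n) < eta(x), 1, -1)

# Direct Monte Carlo estimate of the risk P(Y != f(X)).
risk_direct = np.mean(f(x) != y)

# Last line of the derivation: E[1I{f(X) != 1} (2*eta(X) - 1) + 1 - eta(X)].
risk_formula = np.mean((f(x) != 1) * (2.0 * eta(x) - 1.0) + 1.0 - eta(x))

print(risk_direct, risk_formula)  # the two agree up to sampling noise
```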
19 / 20
Probabilistic Formulation of Prediction Problems
Remember that we aim to make R(f ) small.
Define a function f∗ that predicts the label “naturally”:
f∗(x) =
    1 if η(x) ≥ 1/2,
    −1 otherwise.
The optimal risk R∗ = inf_f R(f) is called the Bayes risk
(inf_f R(f) takes the infimum of R(f) over all possible prediction
functions f). We will see that f∗ achieves the Bayes risk; f∗ is
called the Bayes decision function.
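A small numerical illustration of the claim (the toy η and the competing rule below are assumptions of mine): on samples from P, the risk of f∗ is never larger than that of another decision rule, up to sampling noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

# Toy distribution (an assumption for illustration): X ~ Uniform[0, 1],
# eta(x) = P(Y = 1 | X = x).
eta = lambda x: 0.2 + 0.6 * x

x = rng.uniform(size=n)
y = np.where(rng.uniform(size=n) < eta(x), 1, -1)

# The Bayes decision function: predict 1 exactly when eta(x) >= 1/2.
f_star = lambda x: np.where(eta(x) >= 0.5, 1, -1)

# Some other decision rule, e.g. thresholding x at a different point.
f_other = lambda x: np.where(x >= 0.8, 1, -1)

print("risk of f*    :", np.mean(f_star(x) != y))
print("risk of other :", np.mean(f_other(x) != y))  # >= risk of f* (up to noise)
```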
20 / 20