Lecture 24: Linear regression I
Introduction to Mathematical Modeling, Spring 2025
Lecturer: Yijun Dong
Helpful references:
Percy Liang, Statistical Learning Theory Lecture Notes, §2.7 (§2.8 FYI)
Larry Wasserman, All of Statistics, §13
Example: determine ages based on face images
Figure 1: UTKFace dataset: face images with age labels.
Regression
• Consider a joint distribution $P(x, y)$ of a random vector $X \in \mathbb{R}^d$ and a random variable $Y \in \mathbb{R}$.
• Regression is a method for studying the relationship between a response R.V. $Y \in \mathbb{R}$ and a feature R.V. $X \in \mathbb{R}^d$ through a regression function:
$$f^\star(x) = \mathbb{E}[Y \mid X = x] = \int_{\mathbb{R}} y \, P(y \mid x) \, dy$$
• The goal of regression is to estimate the regression function $f^\star(x)$ based on data (observations) drawn from the joint distribution $P(x, y)$:
$$(X_1, Y_1), (X_2, Y_2), \cdots, (X_n, Y_n) \sim P(x, y)$$
• Without prior knowledge of the regression function, finding a good estimate $f \approx f^\star$ is hard. A common approach is to assume a parametric model for the regression function (see the sketch below), e.g.:
  • Linear regression: $f(x) = \theta^\top x$ for some $\theta \in \mathbb{R}^d$.
  • Two-layer neural network regression: $f(x) = a^\top \sigma(W^\top x)$ for some $a \in \mathbb{R}^m$, $W \in \mathbb{R}^{d \times m}$, where $\sigma$ is a non-linear activation function.
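To make the two parametric families concrete, here is a minimal NumPy sketch; the dimensions $d$ and $m$, the ReLU choice for the activation $\sigma$, and the randomly drawn parameter values are illustrative assumptions, not part of the lecture.

import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 16                      # illustrative feature dimension and hidden width

# Linear regression model: f(x) = theta^T x with theta in R^d
theta = rng.normal(size=d)
def f_linear(x):
    return theta @ x

# Two-layer network: f(x) = a^T sigma(W^T x), with sigma = ReLU (one common choice)
a, W = rng.normal(size=m), rng.normal(size=(d, m))
def f_two_layer(x):
    return a @ np.maximum(W.T @ x, 0.0)   # ReLU applied elementwise

x = rng.normal(size=d)
print(f_linear(x), f_two_layer(x))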
Fixed design linear regression: parameter estimation
• Consider a joint distribution $P_{\theta^\star}(x, y)$ of a random vector $X \in \mathbb{R}^d$ and a random variable $Y \in \mathbb{R}$ parameterized by some unknown parameter $\theta^\star \in \mathbb{R}^d$:
$$(x, y) \sim P_{\theta^\star}(x, y) \iff y = x^\top \theta^\star + z, \quad z \sim \mathcal{N}(0, \sigma^2),$$
where $z \sim \mathcal{N}(0, \sigma^2)$ is an independent Gaussian noise on the response/label with mean $0$ and variance $\sigma^2$.
• Fixed design linear regression aims to estimate the parameter $\theta^\star$ based on a fixed set of features (i.e., no randomness) $X = [x_1, x_2, \cdots, x_n]^\top \in \mathbb{R}^{n \times d}$.
• For each $x_i$ ($i = 1, 2, \cdots, n$), the corresponding label (response) is
$$y_i = x_i^\top \theta^\star + z_i, \quad z_i \sim \mathcal{N}(0, \sigma^2),$$
where $z_i$ is the independent Gaussian label noise (see the simulation sketch below).
• Exercise: Let $y = [y_1, y_2, \cdots, y_n]^\top \in \mathbb{R}^n$. Is $y$ a random vector in fixed design? If so, where does the randomness come from?
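Below is a minimal simulation of the fixed-design model; the sample size, dimension, $\theta^\star$, and $\sigma$ are arbitrary illustrative choices. It highlights that $X$ and $\theta^\star$ are held fixed and the labels are random only through the noise $z$.

import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 50, 3, 0.5                    # illustrative sample size, dimension, noise level

X = rng.normal(size=(n, d))                 # fixed design: generated once, then held fixed
theta_star = np.array([1.0, -2.0, 0.5])     # "unknown" true parameter, chosen for illustration

z = rng.normal(scale=sigma, size=n)         # independent Gaussian label noise z ~ N(0, sigma^2 I_n)
y = X @ theta_star + z                      # labels: the only source of randomness is z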
Square loss for linear regression
• The fixed features $X \in \mathbb{R}^{n \times d}$ and corresponding labels $y \in \mathbb{R}^n$ are related as
$$y = X\theta^\star + z, \quad z = [z_1, z_2, \cdots, z_n]^\top \sim \mathcal{N}(0, \sigma^2 I_n).$$
• Exercise: show that with independent Gaussian label noises $z_i \sim \mathcal{N}(0, \sigma^2)$ for all $i \in [n]$, $z \sim \mathcal{N}(0, \sigma^2 I_n)$.
• The square loss (i.e., $\ell_2$ loss) is defined as $\ell(y, \hat{y}) = (y - \hat{y})^2$.
• The expected/population risk of a regression function parameterized by $\theta \in \mathbb{R}^d$ under the square loss is defined as
$$L(\theta) = \mathbb{E}_{(x,y) \sim P_{\theta^\star}(x,y)}\left[\ell(y, x^\top \theta)\right] = \mathbb{E}_{(x,y) \sim P_{\theta^\star}(x,y)}\left[(y - x^\top \theta)^2\right].$$
• For fixed design given $X \in \mathbb{R}^{n \times d}$, the expected risk can be expressed as
$$L(\theta) = \mathbb{E}_{y' \sim P_{\theta^\star}(\cdot \mid X)}\left[\frac{1}{n}\|X\theta - y'\|_2^2\right] = \mathbb{E}_{y' \mid X}\left[\frac{1}{n}\|X\theta - y'\|_2^2\right],$$
where $y' = X\theta^\star + z'$ denotes a fresh draw of the labels for the same fixed $X$, independent of the observed $y$ (a Monte Carlo sketch follows).
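One way to read the fixed-design population risk is as an average over fresh label draws $y'$ for the same fixed $X$. Here is a Monte Carlo sketch using the same illustrative setup as above; the closed form it is compared against, $\frac{1}{n}\|X(\theta - \theta^\star)\|_2^2 + \sigma^2$, is the expression derived on the next slides.

import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 50, 3, 0.5
X = rng.normal(size=(n, d))
theta_star = np.array([1.0, -2.0, 0.5])

def population_risk_mc(theta, n_draws=20000):
    """Monte Carlo estimate of E_{y'|X}[ (1/n) * ||X theta - y'||_2^2 ] for the fixed design X."""
    Z = rng.normal(scale=sigma, size=(n_draws, n))   # fresh noise draws z'
    Y_fresh = X @ theta_star + Z                     # fresh labels y' = X theta_star + z'
    residuals = X @ theta - Y_fresh
    return np.mean(np.sum(residuals ** 2, axis=1) / n)

theta = np.zeros(d)
print(population_risk_mc(theta))                               # Monte Carlo estimate of L(theta)
print(np.sum((X @ (theta - theta_star)) ** 2) / n + sigma**2)  # closed form derived on the next slides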
Empirical risk minimization (ERM)
• Population (true distribution): the true joint distribution $P_{\theta^\star}(x, y)$ that generates the data (features $x$ and labels $y$).
• Samples (empirical distribution): the fixed features $x_1, \cdots, x_n$ and the corresponding random labels $y_1, \cdots, y_n$ drawn from the population.
• Population risk (fixed design, square loss): $L(\theta) = \mathbb{E}_{y' \mid X}\left[\frac{1}{n}\|X\theta - y'\|_2^2\right]$.
• Empirical risk (fixed design, square loss):
$$\hat{L}(\theta) = \frac{1}{n}\|X\theta - y\|_2^2.$$
Empirical risk minimization (ERM)
• What we want: estimate $\theta^\star$, which characterizes the population $P_{\theta^\star}(x, y)$.
• What we have: $n$ samples $(X, y)$ where $y = X\theta^\star + z$ and $z \sim \mathcal{N}(0, \sigma^2 I_n)$.
$$\text{ERM}: \quad \hat{\theta} = \operatorname*{argmin}_{\theta \in \mathbb{R}^d} \left\{ \hat{L}(\theta) = \frac{1}{n}\|X\theta - y\|_2^2 \right\}.$$
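Minimizing the empirical risk above is an ordinary least-squares problem. A minimal sketch with the same illustrative data-generating choices as before; the normal-equations closed form $(X^\top X)^{-1} X^\top y$ is a standard fact quoted here only for comparison, not something derived on these slides.

import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 50, 3, 0.5
X = rng.normal(size=(n, d))
theta_star = np.array([1.0, -2.0, 0.5])
y = X @ theta_star + rng.normal(scale=sigma, size=n)

# ERM under the square loss: minimize (1/n) * ||X theta - y||_2^2 over theta
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# When X^T X is invertible this agrees with the normal-equations solution.
theta_hat_ne = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_hat, theta_hat_ne)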
Generalization error
Generalization error measures how much the regression function $\hat{f}(x) = x^\top \hat{\theta}$ learned from the finite samples $(X, y)$ underperforms the best possible regression function over the entire population.
• The best possible regression function over the population is $\theta^\star$:
$$
\begin{aligned}
\min_{\theta \in \mathbb{R}^d} L(\theta) &= \min_{\theta \in \mathbb{R}^d} \mathbb{E}_{y' \mid X}\left[\frac{1}{n}\|X\theta - y'\|_2^2\right] \\
&= \min_{\theta \in \mathbb{R}^d} \mathbb{E}_{z'}\left[\frac{1}{n}\|X\theta - X\theta^\star - z'\|_2^2\right] \\
&= \min_{\theta \in \mathbb{R}^d} \frac{1}{n}\, \mathbb{E}_{z'}\left[\|X(\theta - \theta^\star)\|_2^2 + \|z'\|_2^2 - 2\, z'^\top X(\theta - \theta^\star)\right] \\
&= \min_{\theta \in \mathbb{R}^d} \frac{1}{n} \left(\|X(\theta - \theta^\star)\|_2^2 + \mathbb{E}_{z'}\left[\|z'\|_2^2\right] - 2\, \mathbb{E}_{z'}\left[z'\right]^\top X(\theta - \theta^\star)\right) \\
&= \min_{\theta \in \mathbb{R}^d} \frac{1}{n}\|X(\theta - \theta^\star)\|_2^2 + \sigma^2 = \sigma^2 \quad \text{when } \theta = \theta^\star,
\end{aligned}
$$
using $\mathbb{E}_{z'}[\|z'\|_2^2] = n\sigma^2$ and $\mathbb{E}_{z'}[z'] = 0$.
• The population risk of the best possible regression function $\theta^\star$ is $L(\theta^\star) = \sigma^2$.
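The only stochastic ingredients in the derivation above are $\mathbb{E}[\|z'\|_2^2] = n\sigma^2$ and $\mathbb{E}[z'] = 0$. A quick numerical sanity check with the same illustrative setup as in the earlier sketches, also confirming that the population risk at $\theta = \theta^\star$ is $\sigma^2$.

import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 50, 3, 0.5
X = rng.normal(size=(n, d))
theta_star = np.array([1.0, -2.0, 0.5])

# Check E[||z'||_2^2] = n * sigma^2 with many fresh noise draws.
Z = rng.normal(scale=sigma, size=(100_000, n))
print(np.mean(np.sum(Z ** 2, axis=1)), n * sigma ** 2)

# At theta = theta_star the population risk reduces to sigma^2.
Y_fresh = X @ theta_star + Z                               # fresh labels y' = X theta_star + z'
print(np.mean(np.sum((X @ theta_star - Y_fresh) ** 2, axis=1) / n), sigma ** 2)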
Generalization error
• The population risk of the regression function learned via ERM over the $n$ samples $(X, y)$ is
$$L(\hat{\theta}) = \mathbb{E}_{y' \mid X}\left[\frac{1}{n}\left\|X\hat{\theta} - y'\right\|_2^2\right] = \frac{1}{n}\left\|X(\hat{\theta} - \theta^\star)\right\|_2^2 + \sigma^2.$$
• Formally, the generalization error is defined as the suboptimality of $\hat{\theta}$ compared to $\theta^\star$ in terms of the population risk, known as the excess risk:
$$\mathrm{ER}(\hat{\theta}) := L(\hat{\theta}) - L(\theta^\star) = \frac{1}{n}\left\|X(\hat{\theta} - \theta^\star)\right\|_2^2.$$
• Define the covariance matrix of the features $X$ as
$$\Sigma = \frac{1}{n} X^\top X \in \mathbb{R}^{d \times d}.$$
• Notice that $\mathrm{ER}(\hat{\theta}) = \|\hat{\theta} - \theta^\star\|_\Sigma^2$, where $\|u\|_\Sigma = \sqrt{u^\top \Sigma u}$ is the Mahalanobis norm of any $u \in \mathbb{R}^d$ with respect to $\Sigma \succeq 0$.
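The identity $\mathrm{ER}(\hat{\theta}) = \|\hat{\theta} - \theta^\star\|_\Sigma^2$ is easy to verify numerically; the sketch below reuses the illustrative data-generating choices and the least-squares estimate from the previous snippets.

import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 50, 3, 0.5
X = rng.normal(size=(n, d))
theta_star = np.array([1.0, -2.0, 0.5])
y = X @ theta_star + rng.normal(scale=sigma, size=n)
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)    # ERM estimate, as in the earlier sketch

Sigma = X.T @ X / n                                  # feature covariance of the fixed design
diff = theta_hat - theta_star
er_direct = np.sum((X @ diff) ** 2) / n              # (1/n) * ||X (theta_hat - theta_star)||_2^2
er_mahalanobis = diff @ Sigma @ diff                 # ||theta_hat - theta_star||_Sigma^2
print(er_direct, er_mahalanobis)                     # agree up to floating-point error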
Intuition for generalization error