
9/5/23

Polynomial Regression

Warm-up: Linear Regression

Linear Regression (Task)

Input: vectors 𝐱_1, ⋯, 𝐱_n ∈ ℝ^d and labels y_1, ⋯, y_n ∈ ℝ.
Output: a vector 𝐰 ∈ ℝ^d and a scalar b ∈ ℝ such that 𝐱_i^T 𝐰 + b ≈ y_i.
The task assumes y_i is a linear function of 𝐱_i.

Tasks → Methods: Linear Regression → Least Squares Regression

Least Squares Regression (Method)

Input: vectors 𝐱_1, ⋯, 𝐱_n ∈ ℝ^d and labels y_1, ⋯, y_n ∈ ℝ.
1. Add one dimension to each 𝐱_j ∈ ℝ^d: 𝐱̄_j = [𝐱_j; 1] ∈ ℝ^{d+1}.
2. Solve the least squares problem: min_{𝐰 ∈ ℝ^{d+1}} ‖𝐗̄𝐰 − 𝐲‖_2^2,
   where 𝐗̄ ∈ ℝ^{n×(d+1)} stacks the 𝐱̄_j as rows and 𝐲 = [y_1, ⋯, y_n]^T.
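Not from the slides: a minimal numpy sketch of steps 1–2, using made-up toy data and my own variable names, to show how the constant-1 column and the least squares solve fit together.

import numpy

# toy data: n = 5 samples, d = 2 features (made up for illustration)
X = numpy.array([[0.0, 1.0], [1.0, 0.5], [2.0, 1.5], [3.0, 2.0], [4.0, 3.5]])
y = numpy.array([1.0, 2.1, 4.0, 5.9, 8.2])

n, d = X.shape
# step 1: append a constant-1 column so the bias b becomes the last entry of w
Xbar = numpy.concatenate((X, numpy.ones((n, 1))), axis=1)   # shape (n, d+1)

# step 2: solve min_w ||Xbar w - y||_2^2
w, residuals, rank, sv = numpy.linalg.lstsq(Xbar, y, rcond=None)

print('w (weights and bias):', w)
print('predictions:', Xbar.dot(w))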

Least Squares Regression (Method)

Input: vectors 𝐱_1, ⋯, 𝐱_n ∈ ℝ^d and labels y_1, ⋯, y_n ∈ ℝ.
1. Add one dimension to each 𝐱_j ∈ ℝ^d: 𝐱̄_j = [𝐱_j; 1] ∈ ℝ^{d+1}.
2. Solve the least squares problem: min_{𝐰 ∈ ℝ^{d+1}} ‖𝐗̄𝐰 − 𝐲‖_2^2.

Tasks → Methods → Algorithms:
Linear Regression → Least Squares Regression → { Analytical Solution, Gradient Descent, Conjugate Gradient }

Polynomial Regression

The Regression Task

Input: vectors 𝐱_1, ⋯, 𝐱_n ∈ ℝ^d and labels y_1, ⋯, y_n ∈ ℝ.
Output: a function f: ℝ^d ↦ ℝ such that f(𝐱) ≈ y.

Question: f is unknown! So how can we learn f?
Answer: polynomial approximation; model f as a polynomial function.

Taylor expansion: f(x) = f(a) + f′(a)(x − a) + (f″(a)/2!)(x − a)^2 + ⋯

Polynomial Regression: 1D Example

Input: scalars x_1, ⋯, x_n ∈ ℝ and labels y_1, ⋯, y_n ∈ ℝ.
Output: a function f: ℝ ↦ ℝ such that f(x) ≈ y.

One-dimensional example: f(x) = w_0 + w_1 x + w_2 x^2 + ⋯ + w_p x^p.

Polynomial regression (a numpy sketch follows below):
1. Define a feature map 𝛟(x) = [1, x, x^2, x^3, ⋯, x^p].
2. For j = 1 to n, apply the mapping x_j ↦ 𝛟(x_j).
   Let 𝚽 ∈ ℝ^{n×(p+1)} be the matrix whose j-th row is 𝛟(x_j).
3. Solve the least squares problem: min_{𝐰 ∈ ℝ^{p+1}} ‖𝚽𝐰 − 𝐲‖_2^2.
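A minimal numpy sketch of steps 1–3 (my own illustration, with made-up toy data): numpy.vander builds the matrix 𝚽 whose j-th row is 𝛟(x_j), and numpy.linalg.lstsq solves the least squares problem.

import numpy

# toy 1D data (made up for illustration)
x = numpy.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
y = numpy.array([1.0, 1.3, 2.1, 3.4, 5.2, 7.8])
p = 3  # polynomial degree

# steps 1-2: feature map phi(x) = [1, x, x^2, ..., x^p], applied to every sample
Phi = numpy.vander(x, N=p + 1, increasing=True)   # shape (n, p+1)

# step 3: least squares  min_w ||Phi w - y||_2^2
w, _, _, _ = numpy.linalg.lstsq(Phi, y, rcond=None)

print('w =', w)
print('train MSE =', numpy.mean((Phi.dot(w) - y) ** 2))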

Polynomial Regression: 2D Example

Input: vectors 𝐱_1, ⋯, 𝐱_n ∈ ℝ^2 and labels y_1, ⋯, y_n ∈ ℝ.
Output: a function f: ℝ^2 ↦ ℝ such that f(𝐱_i) ≈ y_i.

Two-dimensional example: how do we do the feature mapping?

Polynomial features (degree 0 through degree 3):
𝛟(𝐱) = [1, x_1, x_2, x_1^2, x_2^2, x_1 x_2, x_1^3, x_2^3, x_1 x_2^2, x_1^2 x_2].

Polynomial Regression

In [1]:
import numpy

X = numpy.arange(6).reshape(3, 2)
print('X = ')
print(X)

X =
[[0 1]
 [2 3]
 [4 5]]

In [2]:
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=3)
Phi = poly.fit_transform(X)
print('Phi = ')
print(Phi)

Phi =
[[ 1. 0. 1. 0. 0. 1. 0. 0. 0. 1.]
 [ 1. 2. 3. 4. 6. 9. 8. 12. 18. 27.]
 [ 1. 4. 5. 16. 20. 25. 64. 80. 100. 125.]]

Polynomial Regression

In the scikit-learn example above, X has d = 2 columns and PolynomialFeatures(degree=3) produces 10 features per sample. In general:

• 𝐱: d-dimensional.
• 𝛟(𝐱): degree-p polynomial features.
• The dimension of 𝛟(𝐱) is O(d^p).
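To make the growth concrete, here is a small sketch (my own, not from the slides). The number of monomials of degree at most p in d variables is C(d+p, p), which for fixed p grows on the order of d^p; the loop cross-checks the closed form against the number of columns PolynomialFeatures actually produces.

from math import comb

import numpy
from sklearn.preprocessing import PolynomialFeatures

for d, p in [(2, 3), (13, 2), (13, 3), (13, 4)]:
    # closed-form count of monomials of degree <= p in d variables
    n_features = comb(d + p, p)
    # cross-check against PolynomialFeatures on a single dummy sample
    phi = PolynomialFeatures(degree=p).fit_transform(numpy.zeros((1, d)))
    print('d =', d, 'p =', p, '->', n_features, 'features (sklearn:', phi.shape[1], ')')

For d = 2 and p = 3 this gives the 10 features shown above; for the 13 housing features it already reaches 560 features at degree 3.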
Polynomial Regression

Input: vectors 𝐱_1, ⋯, 𝐱_n ∈ ℝ^d and labels y_1, ⋯, y_n ∈ ℝ.
Output: a function f: ℝ^d ↦ ℝ such that f(𝐱_i) ≈ y_i.

Example on the Boston housing data:

In [1]:
from keras.datasets import boston_housing

(x_train, y_train), (x_test, y_test) = boston_housing.load_data()

print('shape of x_train: ' + str(x_train.shape))
print('shape of x_test: ' + str(x_test.shape))
print('shape of y_train: ' + str(y_train.shape))
print('shape of y_test: ' + str(y_test.shape))

Using TensorFlow backend.
shape of x_train: (404, 13)
shape of x_test: (102, 13)
shape of y_train: (404,)
shape of y_test: (102,)

In [2]:
import numpy

n, d = x_train.shape
xbar_train = numpy.concatenate((x_train, numpy.ones((n, 1))), axis=1)

print('shape of x_train: ' + str(x_train.shape))
print('shape of xbar_train: ' + str(xbar_train.shape))

shape of x_train: (404, 13)
shape of xbar_train: (404, 14)

In [3]:
# the analytical solution: w = (Xbar^T Xbar)^+ Xbar^T y
xx = numpy.dot(xbar_train.T, xbar_train)
xx_inv = numpy.linalg.pinv(xx)
xy = numpy.dot(xbar_train.T, y_train)
w = numpy.dot(xx_inv, xy)

In [4]:
# mean squared error (training)
y_lsr = numpy.dot(xbar_train, w)
diff = y_lsr - y_train
mse = numpy.mean(diff * diff)
print('Train MSE: ' + str(mse))

Train MSE: 22.00480083834814

In [5]:
print(y_train[0:10])

Training, Test, and Overfitting
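The notebook above reports only the training MSE. A natural next step, sketched here as my own addition rather than a cell from the slides, is to evaluate the same w on the held-out test set, building xbar_test exactly as xbar_train was built.

# mean squared error (test); assumes x_test, y_test, and w from the cells above
m = x_test.shape[0]
xbar_test = numpy.concatenate((x_test, numpy.ones((m, 1))), axis=1)

y_pred = numpy.dot(xbar_test, w)
test_mse = numpy.mean((y_pred - y_test) ** 2)
print('Test MSE: ' + str(test_mse))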
Polynomial Regression: Training

Input: vectors 𝐱_1, ⋯, 𝐱_n ∈ ℝ^d and labels y_1, ⋯, y_n ∈ ℝ.
Feature map: 𝛟(𝐱) = ⊗^p 𝐱. Its dimension is O(d^p).
Least squares: min_𝐰 ‖𝚽𝐰 − 𝐲‖_2^2.

Question: what happens as p grows?
1. For sufficiently large p, the dimension of the feature 𝛟(𝐱) exceeds n.
2. Then you can find 𝐰 such that 𝚽𝐰 = 𝐲 exactly. (Zero training error!)
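A small sketch (my own, not from the slides) that illustrates point 2: with n = 6 made-up training points, once the degree p satisfies p + 1 ≥ n the fitted polynomial can interpolate the data and the training MSE drops to (numerically) zero.

import numpy

# 6 training points from a noisy quadratic (made up for illustration)
rng = numpy.random.RandomState(0)
x = numpy.linspace(0, 1, 6)
y = 1.0 + 2.0 * x - 3.0 * x ** 2 + 0.1 * rng.randn(6)

for p in [2, 5, 15]:
    Phi = numpy.vander(x, N=p + 1, increasing=True)      # n x (p+1) feature matrix
    w, _, _, _ = numpy.linalg.lstsq(Phi, y, rcond=None)  # least squares fit
    train_mse = numpy.mean((Phi.dot(w) - y) ** 2)
    print('degree', p, 'train MSE =', train_mse)
# once p + 1 >= n (here p >= 5), the fit can interpolate: train MSE ~ 0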

Training and Testing

Train:
Input: vectors 𝐱_1, ⋯, 𝐱_n ∈ ℝ^d and labels y_1, ⋯, y_n ∈ ℝ.
Output: a function f: ℝ^d ↦ ℝ such that f(𝐱_i) ≈ y_i.

Test:
Input: a never-seen-before feature vector 𝐱′ ∈ ℝ^d.
Output: predict its label by f(𝐱′).

[Figure: fitted curves illustrating underfitting and overfitting.]

Training and Testing

[Figure: three fits to the same training data, each with a test point 𝐱′ marked. The linear model underfits (BAD), the degree-4 polynomial fits well (GOOD), and the degree-15 polynomial overfits (BAD).]

Hyper-Parameter Tuning

Question: for the polynomial regression model, how do we determine the degree p?
Answer: pick the degree p that leads to the smallest test error.

Training Set → Test Set
Train a degree-1 polynomial regression → Test MSE = 23.2
Train a degree-2 polynomial regression → Test MSE = 19.0
Train a degree-3 polynomial regression → Test MSE = 16.7
Train a degree-4 polynomial regression → Test MSE = 12.2
Train a degree-5 polynomial regression → Test MSE = 14.8
Train a degree-6 polynomial regression → Test MSE = 25.1
Train a degree-7 polynomial regression → Test MSE = 39.4
Train a degree-8 polynomial regression → Test MSE = 53.0

Hyper-Parameter Tuning

Training Set → Test Set
Train a degree-1 polynomial regression → Test MSE = 23.2
Train a degree-2 polynomial regression → Test MSE = 19.0
Train a degree-3 polynomial regression → Test MSE = 16.7
Train a degree-4 polynomial regression → Test MSE = 12.2
Train a degree-5 polynomial regression → Test MSE = 14.8
Train a degree-6 polynomial regression → Test MSE = 25.1
Train a degree-7 polynomial regression → Test MSE = 39.4
Train a degree-8 polynomial regression → Test MSE = 53.0

• Wrong! The test labels are unavailable!
• Even if you have the test labels, never do this!

Select Models Using Test Labels

[Figure: a sequence of models selected using the test labels.]

Cross-Validation (Naïve Approach) for Hyper-Parameter Tuning

Cross-Validation (Naïve Approach)

[Diagram: the data consist of features and labels, with n training samples and m test samples. In the naïve approach, n_val of the training samples are set aside as a validation set, leaving the rest for training, while the m test samples are left untouched.]

Cross-Validation (Naïve Approach)

                                           Test MSE    Valid. MSE
Train a degree-1 polynomial regression      23.2        23.1
Train a degree-2 polynomial regression      19.0        19.2
Train a degree-3 polynomial regression      16.7        16.3
Train a degree-4 polynomial regression      12.2        12.5
Train a degree-5 polynomial regression      14.8        14.4
Train a degree-6 polynomial regression      25.1        25.0
Train a degree-7 polynomial regression      39.4        39.1
Train a degree-8 polynomial regression      53.0        53.5

The validation MSE closely tracks the test MSE, so the degree can be chosen using the validation set alone.
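A minimal sketch of the naïve approach (my own illustration; it assumes x_train and y_train from the housing example above, and fit_lsq is a hypothetical helper): hold out part of the training data as a validation set, fit each candidate degree on the rest, and score it on the held-out part.

import numpy
from sklearn.preprocessing import PolynomialFeatures

def fit_lsq(Phi, y):
    # least squares via the pseudo-inverse, as in the earlier slides
    return numpy.linalg.pinv(Phi.T.dot(Phi)).dot(Phi.T.dot(y))

rng = numpy.random.RandomState(0)
perm = rng.permutation(len(y_train))
n_val = len(y_train) // 5                      # hold out ~20% for validation
val_idx, tr_idx = perm[:n_val], perm[n_val:]

for p in [1, 2, 3]:
    poly = PolynomialFeatures(degree=p)
    Phi_tr = poly.fit_transform(x_train[tr_idx])
    Phi_val = poly.transform(x_train[val_idx])
    w = fit_lsq(Phi_tr, y_train[tr_idx])
    val_mse = numpy.mean((Phi_val.dot(w) - y_train[val_idx]) ** 2)
    print('degree', p, 'validation MSE =', val_mse)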

𝑘-Fold Cross-Validation

1. Propose a grid of hyper-parameters.
   • E.g. p ∈ {1, 2, 3, 4, 5, 6}.
2. Randomly partition the training samples into 𝑘 parts.
   • 𝑘 − 1 parts are used for training.
   • The remaining part is held out for evaluation.
3. Average the held-out errors over the 𝑘 repeats.
   • The average is called the validation error.
4. Choose the hyper-parameter p that leads to the smallest validation error.

[Figure: example of 5-fold cross-validation.]

A code sketch of this procedure is shown below.
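A minimal sketch using scikit-learn's KFold (my own illustration, not the course's reference code; it assumes x_train and y_train from the housing example above, and fit_lsq is a hypothetical helper).

import numpy
from sklearn.model_selection import KFold
from sklearn.preprocessing import PolynomialFeatures

def fit_lsq(Phi, y):
    # least squares via the pseudo-inverse, as in the earlier slides
    return numpy.linalg.pinv(Phi.T.dot(Phi)).dot(Phi.T.dot(y))

k = 5
kfold = KFold(n_splits=k, shuffle=True, random_state=0)

for p in [1, 2, 3]:                              # the grid of hyper-parameters
    poly = PolynomialFeatures(degree=p)
    errs = []
    for tr_idx, val_idx in kfold.split(x_train): # k train/held-out splits
        Phi_tr = poly.fit_transform(x_train[tr_idx])
        Phi_val = poly.transform(x_train[val_idx])
        w = fit_lsq(Phi_tr, y_train[tr_idx])
        errs.append(numpy.mean((Phi_val.dot(w) - y_train[val_idx]) ** 2))
    print('degree', p, 'validation error =', numpy.mean(errs))
# pick the degree with the smallest validation error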

Example: 10-Fold Cross-Validation

hyper-parameter    validation error
p = 1              23.19
p = 2              21.00
p = 3              18.54
p = 4              24.36
p = 5              27.96
p = 6              33.10

[Figure: validation error plotted against the degree p; the minimum is at p = 3.]

The Available Data

             Training    Public       Private
Labels:      𝐲           unknown      unknown
Features:    𝐗           𝐗_public     𝐗_private

Real-World Machine Learning Competition
Test Data: the public and private test sets are mixed; participants cannot distinguish them.

Train A Model

[Diagram: a model is trained on the training features 𝐗 and labels 𝐲.]

Prediction

[Diagram: the trained model predicts 𝐲_public from 𝐗_public and 𝐲_private from 𝐗_private.]

Submission to Leaderboard

Submission: the predictions 𝐲_public and 𝐲_private.
Public leaderboard: Score = 0.9527. Private leaderboard: secret!

Question: Why two leaderboards?
Answer: Otherwise the public score could be abused for hyper-parameter tuning (cheating).

Summary

• Polynomial regression handles non-linear problems.
• Polynomial regression has a hyper-parameter p (the degree).
• Very small p causes underfitting; very large p causes overfitting.
• Tune the hyper-parameter using cross-validation.
• Keep your model parameters and hyper-parameters independent of the test set!
