Introduction to Machine Learning
Linear Models for Regression
林彥宇 教授
Yen-Yu Lin, Professor
國立陽明交通大學 資訊工程學系
Computer Science, National Yang Ming Chiao Tung University
Some slides are modified from Prof. Sheng-Jyh Wang
and Prof. Hwang-Tzong Chen
Regression
• Given a training data set comprising $N$ observations $\{\mathbf{x}_n\}_{n=1}^{N}$ and the corresponding target values $\{t_n\}_{n=1}^{N}$, the goal of regression is to predict the value of $t$ for a new value of $\mathbf{x}$
[Link]
2
A simple regression model
• A simple linear model: $y(\mathbf{x}, \mathbf{w}) = w_0 + w_1 x_1 + \cdots + w_D x_D$
➢ Each observation lies in a $D$-dimensional space: $\mathbf{x} = (x_1, \ldots, x_D)^{\mathrm{T}}$
➢ $y$ is a regression model parametrized by $\mathbf{w} = (w_0, \ldots, w_D)^{\mathrm{T}}$
➢ The output is a linear combination of the input variables
➢ It is also a linear function of the parameters
➢ Its fitting power is quite limited, so we seek a nonlinear extension in the input variables
3
An example
• A regressor of the form $y(x, \mathbf{w}) = w_0 + w_1 x$
➢ A straight line in this case → insufficient fitting power
➢ Apply nonlinear feature transforms before linear regression
4
Linear regression with nonlinear basis functions
• Simple linear model: $y(\mathbf{x}, \mathbf{w}) = w_0 + w_1 x_1 + \cdots + w_D x_D$
• A linear model with nonlinear basis functions:
$y(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{j=1}^{M-1} w_j \phi_j(\mathbf{x})$
where $\{\phi_j\}_{j=1}^{M-1}$: nonlinear basis functions
𝑀: the number of parameters
𝑤0 : the bias parameter allowing a fixed offset
• The regression output is a linear combination of nonlinear
basis functions of the inputs
5
Linear regression with nonlinear basis functions
• A linear model with nonlinear basis functions:
$y(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{j=1}^{M-1} w_j \phi_j(\mathbf{x})$
• Let $\phi_0(\mathbf{x}) = 1$ be a dummy basis function. The regression function is equivalently expressed as
$y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x})$
where $\mathbf{w} = (w_0, \ldots, w_{M-1})^{\mathrm{T}}$ and $\boldsymbol{\phi} = (\phi_0, \ldots, \phi_{M-1})^{\mathrm{T}}$
6
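As a concrete illustration (not part of the original slides), here is a minimal NumPy sketch of the compact form $y(\mathbf{x}, \mathbf{w}) = \mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x})$, assuming a simple polynomial basis for one-dimensional inputs; the function names are illustrative.

```python
import numpy as np

def phi(x, M):
    """Basis vector (phi_0(x), ..., phi_{M-1}(x))^T with phi_0(x) = 1
    (the dummy basis) and, for illustration, phi_j(x) = x**j."""
    return np.array([x ** j for j in range(M)])

def y(x, w):
    """Regression output y(x, w) = w^T phi(x)."""
    return float(w @ phi(x, len(w)))

w = np.array([0.5, -1.0, 2.0])   # example parameters (w_0, w_1, w_2)
print(y(1.5, w))                 # prediction at x = 1.5 -> 0.5 - 1.5 + 4.5 = 3.5
```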
Examples of basis functions
• Polynomial basis function: taking the form of powers of $x$, i.e., $\phi_j(x) = x^j$
• Gaussian basis function: $\phi_j(x) = \exp\!\left(-\dfrac{(x - \mu_j)^2}{2s^2}\right)$, governed by $\mu_j$ and $s$
➢ $\mu_j$ governs the location while $s$ governs the scale
• Sigmoidal basis function: $\phi_j(x) = \sigma\!\left(\dfrac{x - \mu_j}{s}\right)$, governed by $\mu_j$ and $s$,
where $\sigma(a) = \dfrac{1}{1 + \exp(-a)}$
7
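A small NumPy sketch (added here, not in the slides) of the three basis-function families above; the arguments `mu` and `s` correspond to $\mu_j$ and $s$.

```python
import numpy as np

def poly_basis(x, j):
    """Polynomial basis: phi_j(x) = x^j."""
    return x ** j

def gauss_basis(x, mu, s):
    """Gaussian basis: exp(-(x - mu)^2 / (2 s^2)).
    mu controls the location, s controls the scale."""
    return np.exp(-(x - mu) ** 2 / (2.0 * s ** 2))

def sigmoid_basis(x, mu, s):
    """Sigmoidal basis: sigma((x - mu) / s), with sigma(a) = 1 / (1 + exp(-a))."""
    a = (x - mu) / s
    return 1.0 / (1.0 + np.exp(-a))

x = np.linspace(-1, 1, 5)
print(poly_basis(x, 2))
print(gauss_basis(x, mu=0.0, s=0.3))
print(sigmoid_basis(x, mu=0.0, s=0.1))
```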
How basis functions work
• Take Gaussian basis functions as an example
$y = w_0 + w_1\phi_1(x) + w_2\phi_2(x) + \cdots + w_{M-1}\phi_{M-1}(x)$
[Figure: Gaussian basis functions $\phi_1(x), \ldots, \phi_8(x)$ placed at different locations along the input axis]
8
Maximum likelihood and least squares
• Assume each target value is generated by a deterministic function with additive Gaussian noise:
$t = y(\mathbf{x}, \mathbf{w}) + \varepsilon$
where $\varepsilon$ is a zero-mean Gaussian random variable with precision (inverse variance) $\beta$
• Thus, we have the conditional probability
$p(t \mid \mathbf{x}, \mathbf{w}, \beta) = \mathcal{N}(t \mid y(\mathbf{x}, \mathbf{w}), \beta^{-1})$
9
Maximum likelihood and least squares
• Given a data set of inputs $\mathbf{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ with corresponding target values $t_1, \ldots, t_N$, we have the likelihood function
$p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid \mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1})$
• The log likelihood function is
$\ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \frac{N}{2}\ln\beta - \frac{N}{2}\ln(2\pi) - \beta E_D(\mathbf{w})$
where $E_D(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\{t_n - \mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x}_n)\}^2$ is the sum-of-squares error
10
Maximum likelihood and least squares
• Maximizing the likelihood under Gaussian noise ⟺ minimizing the sum-of-squares error function
• Maximum likelihood solution: optimize $\mathbf{w}$ by maximizing the log likelihood function
• Step 1: Compute the gradient of the log likelihood w.r.t. $\mathbf{w}$
$\nabla_{\mathbf{w}} \ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \beta\sum_{n=1}^{N}\{t_n - \mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x}_n)\}\boldsymbol{\phi}(\mathbf{x}_n)^{\mathrm{T}}$
• Step 2: Set the gradient to zero, which gives
$0 = \sum_{n=1}^{N} t_n\boldsymbol{\phi}(\mathbf{x}_n)^{\mathrm{T}} - \mathbf{w}^{\mathrm{T}}\left(\sum_{n=1}^{N}\boldsymbol{\phi}(\mathbf{x}_n)\boldsymbol{\phi}(\mathbf{x}_n)^{\mathrm{T}}\right)$
12
Maximum likelihood and least squares
• Define the design matrix $\boldsymbol{\Phi}$ in this task, with entries $\Phi_{nj} = \phi_j(\mathbf{x}_n)$:
$\boldsymbol{\Phi} = \begin{pmatrix} \phi_0(\mathbf{x}_1) & \phi_1(\mathbf{x}_1) & \cdots & \phi_{M-1}(\mathbf{x}_1) \\ \phi_0(\mathbf{x}_2) & \phi_1(\mathbf{x}_2) & \cdots & \phi_{M-1}(\mathbf{x}_2) \\ \vdots & \vdots & \ddots & \vdots \\ \phi_0(\mathbf{x}_N) & \phi_1(\mathbf{x}_N) & \cdots & \phi_{M-1}(\mathbf{x}_N) \end{pmatrix}$
➢ It has 𝑁 rows, one for each training sample
➢ It has 𝑀 columns, one for each basis function
13
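One possible way to build the design matrix in NumPy (an illustrative sketch; the helper names and the particular Gaussian basis are choices made here, not taken from the slides):

```python
import numpy as np

def design_matrix(X, basis_fns):
    """Build the N x M design matrix Phi with Phi[n, j] = phi_j(x_n).
    basis_fns is a list of M callables; the first should be the dummy
    basis phi_0(x) = 1, which absorbs the bias term w_0."""
    return np.array([[phi(x) for phi in basis_fns] for x in X])

# Example: dummy basis plus three Gaussian basis functions (M = 4)
mus = np.linspace(-1.0, 1.0, 3)
basis_fns = [lambda x: 1.0] + [
    lambda x, mu=mu: np.exp(-(x - mu) ** 2 / (2 * 0.3 ** 2)) for mu in mus
]
X = np.array([-0.8, -0.2, 0.4, 0.9])
Phi = design_matrix(X, basis_fns)
print(Phi.shape)  # (N, M) = (4, 4)
```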
Maximum likelihood and least squares
• Setting the gradient to zero,
$0 = \sum_{n=1}^{N} t_n\boldsymbol{\phi}(\mathbf{x}_n)^{\mathrm{T}} - \mathbf{w}^{\mathrm{T}}\left(\sum_{n=1}^{N}\boldsymbol{\phi}(\mathbf{x}_n)\boldsymbol{\phi}(\mathbf{x}_n)^{\mathrm{T}}\right),$
we have
$\mathbf{w}_{\mathrm{ML}} = (\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{t}$
• How to derive?
➢ Hint 1: $\sum_{n=1}^{N}\boldsymbol{\phi}(\mathbf{x}_n)\boldsymbol{\phi}(\mathbf{x}_n)^{\mathrm{T}} = \boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi}$
➢ Hint 2: $\sum_{n=1}^{N} t_n\boldsymbol{\phi}(\mathbf{x}_n) = \boldsymbol{\Phi}^{\mathrm{T}}\mathbf{t}$
14
Maximum likelihood and least squares
• The ML solution: $\mathbf{w}_{\mathrm{ML}} = (\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{t}$
• $\boldsymbol{\Phi}^{\dagger} \equiv (\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^{\mathrm{T}}$ is known as the Moore-Penrose pseudo-inverse of the design matrix
• Suppose $\boldsymbol{\Phi}$ has linearly independent columns. Why is $\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi}$ invertible?
15
Maximum likelihood and least squares
• Similarly, $\beta$ is optimized by maximizing the log likelihood
$\ln p(\mathbf{t} \mid \mathbf{w}, \beta) = \frac{N}{2}\ln\beta - \frac{N}{2}\ln(2\pi) - \beta E_D(\mathbf{w})$
where $E_D(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\{t_n - \mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x}_n)\}^2$
• We get
$\dfrac{1}{\beta_{\mathrm{ML}}} = \dfrac{1}{N}\sum_{n=1}^{N}\{t_n - \mathbf{w}_{\mathrm{ML}}^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x}_n)\}^2$
16
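A sketch of the ML solution in NumPy (not part of the slides): `np.linalg.lstsq` computes the pseudo-inverse solution $\mathbf{w}_{\mathrm{ML}} = \boldsymbol{\Phi}^{\dagger}\mathbf{t}$ in a numerically stable way, and $\beta_{\mathrm{ML}}$ follows from the mean squared residual. The toy data are invented for illustration.

```python
import numpy as np

def fit_ml(Phi, t):
    """Maximum likelihood fit: w_ML = (Phi^T Phi)^{-1} Phi^T t
    (via a least-squares solver) and 1/beta_ML = mean squared residual."""
    w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    residuals = t - Phi @ w_ml
    beta_ml = 1.0 / np.mean(residuals ** 2)
    return w_ml, beta_ml

# toy data: noisy line, with design matrix columns (1, x)
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
t = 0.5 + 2.0 * x + rng.normal(scale=0.1, size=x.shape)
Phi = np.column_stack([np.ones_like(x), x])
w_ml, beta_ml = fit_ml(Phi, t)
print(w_ml, 1.0 / beta_ml)  # fitted weights and estimated noise variance
```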
Regression for a new data point
• The conditional probability (likelihood function): $p(t \mid \mathbf{x}, \mathbf{w}, \beta) = \mathcal{N}(t \mid y(\mathbf{x}, \mathbf{w}), \beta^{-1})$
• After learning, we set $\mathbf{w} \leftarrow \mathbf{w}_{\mathrm{ML}}$ and $\beta \leftarrow \beta_{\mathrm{ML}}$
• The prediction for a new data point $\mathbf{x}$ is then a Gaussian distribution with mean $y(\mathbf{x}, \mathbf{w}_{\mathrm{ML}})$ and variance $\beta_{\mathrm{ML}}^{-1}$
17
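A tiny illustrative sketch (with hypothetical fitted values, not from the slides) of forming this predictive Gaussian for a new input:

```python
import numpy as np

def predict_ml(phi_x, w_ml, beta_ml):
    """Predictive Gaussian for a new input x:
    mean = y(x, w_ML) = w_ML^T phi(x), variance = 1 / beta_ML."""
    return float(w_ml @ phi_x), 1.0 / beta_ml

# example with the basis phi(x) = (1, x)^T and illustrative fitted values
phi_x = np.array([1.0, 0.3])
mean, var = predict_ml(phi_x, w_ml=np.array([0.5, 2.0]), beta_ml=100.0)
print(mean, var)
```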
Regularized least squares
• Adding a regularization term helps alleviate over-fitting
• The simplest form of the regularization term is $E_W(\mathbf{w}) = \frac{1}{2}\mathbf{w}^{\mathrm{T}}\mathbf{w}$
• The total error function becomes
$\frac{1}{2}\sum_{n=1}^{N}\{t_n - \mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x}_n)\}^2 + \frac{\lambda}{2}\mathbf{w}^{\mathrm{T}}\mathbf{w}$
• Setting the gradient of this function w.r.t. $\mathbf{w}$ to 0, we have
$\mathbf{w} = (\lambda\mathbf{I} + \boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{t}$
18
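A minimal NumPy sketch of the regularized solution $(\lambda\mathbf{I} + \boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{t}$; the function name and toy data are invented for illustration.

```python
import numpy as np

def fit_ridge(Phi, t, lam):
    """Regularized least squares: w = (lam * I + Phi^T Phi)^{-1} Phi^T t."""
    M = Phi.shape[1]
    return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)

# toy example with design matrix columns (1, x)
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
t = 0.5 + 2.0 * x + rng.normal(scale=0.1, size=x.shape)
Phi = np.column_stack([np.ones_like(x), x])
print(fit_ridge(Phi, t, lam=0.1))  # weights shrink toward zero as lam grows
```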
Regularized least squares
• A more general regularizer:
$\frac{1}{2}\sum_{n=1}^{N}\{t_n - \mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x}_n)\}^2 + \frac{\lambda}{2}\sum_{j=1}^{M}|w_j|^q$
• q=2 → quadratic regularizer
• q=1 → the lasso in the statistics literature
• Contours of the regularization term
19
Multiple outputs
• In some applications, we wish to predict 𝐾 > 1 target values
➢ One target value: Income -> Happiness
➢ Multiple target values: Income -> Happiness, Hours of duty, Health
• Recall the one-dimensional case: $y(\mathbf{x}, \mathbf{w}) = \mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x})$
• With the same basis functions, the regression model becomes
$\mathbf{y}(\mathbf{x}, \mathbf{W}) = \mathbf{W}^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x})$
where $\mathbf{W}$ is an $M \times K$ matrix, $M$ is the number of basis functions, and $K$ is the number of target values
20
Multiple outputs
• The conditional probability of a single observation is
$p(\mathbf{t} \mid \mathbf{x}, \mathbf{W}, \beta) = \mathcal{N}(\mathbf{t} \mid \mathbf{W}^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x}), \beta^{-1}\mathbf{I})$
➢ An isotropic Gaussian, i.e., the covariance matrix is a scaled identity
➢ Each pair of target variables is independent
• The log likelihood function is
$\ln p(\mathbf{T} \mid \mathbf{X}, \mathbf{W}, \beta) = \sum_{n=1}^{N}\ln\mathcal{N}(\mathbf{t}_n \mid \mathbf{W}^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1}\mathbf{I}) = \frac{NK}{2}\ln\!\left(\frac{\beta}{2\pi}\right) - \frac{\beta}{2}\sum_{n=1}^{N}\left\|\mathbf{t}_n - \mathbf{W}^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x}_n)\right\|^2$
21
Multiple outputs: Maximum likelihood solution
• Setting the gradient of the log likelihood function w.r.t. $\mathbf{W}$ to 0, we have
$\mathbf{W}_{\mathrm{ML}} = (\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{T}$
• Consider the $k$th column of $\mathbf{W}_{\mathrm{ML}}$:
$\mathbf{w}_k = (\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{t}_k$
where $\mathbf{t}_k$ is an $N$-dimensional vector with components $t_{nk}$
• It leads to $K$ independent regression problems
22
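A short sketch (not from the slides) of the multi-output solution; it relies on the fact that a least-squares solver applied to an $N \times K$ target matrix solves the $K$ columns independently.

```python
import numpy as np

def fit_ml_multi(Phi, T):
    """Multi-output ML solution W_ML = (Phi^T Phi)^{-1} Phi^T T.
    Phi is N x M and T is N x K, so W_ML is M x K; column k of W_ML is
    the ordinary ML solution for the k-th target vector t_k."""
    W_ml, *_ = np.linalg.lstsq(Phi, T, rcond=None)
    return W_ml

# toy example: N = 4 samples, M = 2 basis functions, K = 3 targets
Phi = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
T = Phi @ np.array([[0.5, -0.3, 1.0], [2.0, 0.7, -1.0]])  # noise-free targets
print(fit_ml_multi(Phi, T))  # recovers the 2 x 3 weight matrix
```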
Sequential learning
• The maximum likelihood derivation is a batch technique
➢ It takes all training data into account at the same time
➢ Case 1: The training data set is so large that processing it all at once is costly
➢ Case 2: Data points are arriving in a continuous stream
• For the two cases, it is worthwhile to use sequential
algorithms, or on-line algorithms, in which the data points are
considered one by one, and the model parameters are
updated incrementally
23
Sequential learning
• Stochastic gradient descent
➢ The error function comprises a sum over data points: $E = \sum_n E_n$
➢ Given data point $\mathbf{x}_n$, the parameter vector $\mathbf{w}$ is updated by
$\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta\nabla E_n$
where $\tau$ is the iteration number and $\eta$ is the learning rate
➢ In the case of the sum-of-squares error, this becomes
$\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} + \eta\left(t_n - \mathbf{w}^{(\tau)\mathrm{T}}\boldsymbol{\phi}(\mathbf{x}_n)\right)\boldsymbol{\phi}(\mathbf{x}_n)$
24
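A minimal sketch (added here) of this stochastic update for the sum-of-squares error, run over a synthetic data stream; the toy data and learning rate are illustrative choices.

```python
import numpy as np

def sgd_update(w, phi_n, t_n, eta):
    """One stochastic gradient step for the sum-of-squares error:
    w <- w + eta * (t_n - w^T phi(x_n)) * phi(x_n)."""
    return w + eta * (t_n - w @ phi_n) * phi_n

# sequential pass over a toy data stream with basis phi(x) = (1, x)^T
rng = np.random.default_rng(1)
w = np.zeros(2)
for _ in range(1000):
    x = rng.uniform(0, 1)
    t = 0.5 + 2.0 * x + rng.normal(scale=0.1)
    w = sgd_update(w, np.array([1.0, x]), t, eta=0.05)
print(w)  # approaches roughly (0.5, 2.0)
```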
Maximum a posteriori
• Likelihood function: $p(\mathbf{t} \mid \mathbf{w}) = \prod_{n=1}^{N}\mathcal{N}(t_n \mid \mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x}_n), \beta^{-1})$
• Let's consider a prior function, which is a Gaussian
$p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0)$
where $\mathbf{m}_0$ is the mean and $\mathbf{S}_0$ is the covariance matrix
• The posterior function is also a Gaussian
$p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N)$
where $\mathbf{m}_N = \mathbf{S}_N(\mathbf{S}_0^{-1}\mathbf{m}_0 + \beta\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{t})$ is the mean
and $\mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi}$ gives the covariance
25
How to derive the mean and covariance of the posterior
• According to the marginal and conditional Gaussians on page 93 of the PRML textbook: given $p(\mathbf{x}) = \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Lambda}^{-1})$ and $p(\mathbf{y} \mid \mathbf{x}) = \mathcal{N}(\mathbf{y} \mid \mathbf{A}\mathbf{x} + \mathbf{b}, \mathbf{L}^{-1})$, the conditional distribution is
$p(\mathbf{x} \mid \mathbf{y}) = \mathcal{N}\!\left(\mathbf{x} \mid \boldsymbol{\Sigma}\{\mathbf{A}^{\mathrm{T}}\mathbf{L}(\mathbf{y} - \mathbf{b}) + \boldsymbol{\Lambda}\boldsymbol{\mu}\}, \boldsymbol{\Sigma}\right)$, where $\boldsymbol{\Sigma} = (\boldsymbol{\Lambda} + \mathbf{A}^{\mathrm{T}}\mathbf{L}\mathbf{A})^{-1}$
26
A zero-mean isotropic Gaussian prior
• A general Gaussian prior function: $p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0)$,
where $\mathbf{m}_0$ is the mean and $\mathbf{S}_0$ is the covariance matrix
• A widely used Gaussian prior: $p(\mathbf{w} \mid \alpha) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I})$
• Mean and covariance of the resulting posterior function:
$\mathbf{m}_N = \beta\mathbf{S}_N\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{t}, \qquad \mathbf{S}_N^{-1} = \alpha\mathbf{I} + \beta\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi}$
27
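A minimal sketch (not from the slides) of computing this posterior for the zero-mean isotropic prior; the function name is illustrative.

```python
import numpy as np

def posterior(Phi, t, alpha, beta):
    """Posterior over w for the prior p(w) = N(w | 0, alpha^{-1} I):
    S_N^{-1} = alpha I + beta Phi^T Phi,  m_N = beta S_N Phi^T t."""
    M = Phi.shape[1]
    S_N_inv = alpha * np.eye(M) + beta * Phi.T @ Phi
    S_N = np.linalg.inv(S_N_inv)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

# toy usage with design matrix columns (1, x)
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 10)
t = -0.3 + 0.5 * x + rng.normal(scale=0.2, size=x.shape)
Phi = np.column_stack([np.ones_like(x), x])
m_N, S_N = posterior(Phi, t, alpha=2.0, beta=25.0)
print(m_N)
```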
Sequential Bayesian learning: An example
• Data, including observations and target values, are given one by one
• Data are in a one-dimensional space
• Data are sampled from the function $f(x, \mathbf{a}) = a_0 + a_1 x$, where $a_0 = -0.3$ and $a_1 = 0.5$, with additive Gaussian noise
➢ Note that the function is unknown
➢ We have just the observations and the target values
28
An example
• Regression function: $y(x, \mathbf{w}) = w_0 + w_1 x$
29
An example
• Regression function: $y(x, \mathbf{w}) = w_0 + w_1 x$
• In the beginning, no
data are available
• Constant likelihood
• Prior = posterior
• Sample 6 curves of the regression function according to the posterior distribution
30
An example
• Regression function: $y(x, \mathbf{w}) = w_0 + w_1 x$
• One data sample (blue circle) is given
• Likelihood for this sample
• White cross: the true parameter values
• Posterior proportional to likelihood × prior
• Sample 6 curves
according to posterior
31
An example
• Regression function: $y(x, \mathbf{w}) = w_0 + w_1 x$
• A second data sample (blue circle) is given
• Likelihood for the second sample
• White cross: the true parameter values
• Posterior proportional to likelihood × prior
• Sample 6 curves
according to posterior
32
An example
• Regression function: $y(x, \mathbf{w}) = w_0 + w_1 x$
• 20 data samples (blue circles) are given
• Likelihood for the 20th sample
• White cross: the true parameter values
• Posterior proportional to likelihood × prior
• Sample 6 curves
according to posterior
33
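To mirror the sequential example above, here is an illustrative sketch (not from the slides) in which the current posterior serves as the prior for the next data point. The toy data follow the $a_0 = -0.3$, $a_1 = 0.5$ setup above; the noise standard deviation 0.2 (hence $\beta = 25$) and $\alpha = 2.0$ are values assumed here for illustration.

```python
import numpy as np

def sequential_update(m, S, phi_n, t_n, beta):
    """Bayesian update after observing one (x_n, t_n): the current
    posterior N(w | m, S) acts as the prior, giving
    S_new^{-1} = S^{-1} + beta * phi phi^T,
    m_new = S_new (S^{-1} m + beta * phi * t_n)."""
    S_inv = np.linalg.inv(S)
    S_new = np.linalg.inv(S_inv + beta * np.outer(phi_n, phi_n))
    m_new = S_new @ (S_inv @ m + beta * phi_n * t_n)
    return m_new, S_new

# stream of 20 points from f(x, a) = -0.3 + 0.5 x plus noise (std 0.2 assumed)
rng = np.random.default_rng(2)
alpha, beta = 2.0, 25.0
m, S = np.zeros(2), np.eye(2) / alpha      # prior N(0, alpha^{-1} I)
for _ in range(20):
    x = rng.uniform(-1, 1)
    t = -0.3 + 0.5 * x + rng.normal(scale=0.2)
    m, S = sequential_update(m, S, np.array([1.0, x]), t, beta)
print(m)  # posterior mean approaches (-0.3, 0.5)
```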
Predictive distribution
• Recall the posterior function
$p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N)$
where $\mathbf{m}_N = \beta\mathbf{S}_N\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{t}$ and $\mathbf{S}_N^{-1} = \alpha\mathbf{I} + \beta\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi}$
• Given $\mathbf{w}$, we regress a data sample via $p(t \mid \mathbf{x}, \mathbf{w}, \beta) = \mathcal{N}(t \mid y(\mathbf{x}, \mathbf{w}), \beta^{-1})$
• In the Bayesian treatment, the predictive distribution is
$p(t \mid \mathbf{t}, \alpha, \beta) = \int p(t \mid \mathbf{w}, \beta)\, p(\mathbf{w} \mid \mathbf{t}, \alpha, \beta)\, d\mathbf{w}$
• Then we have
$p(t \mid \mathbf{x}, \mathbf{t}, \alpha, \beta) = \mathcal{N}\!\left(t \mid \mathbf{m}_N^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x}), \sigma_N^2(\mathbf{x})\right)$
• where $\sigma_N^2(\mathbf{x}) = \dfrac{1}{\beta} + \boldsymbol{\phi}(\mathbf{x})^{\mathrm{T}}\mathbf{S}_N\boldsymbol{\phi}(\mathbf{x})$
34
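A small illustrative sketch (added here) of evaluating the predictive mean and variance for a new input, given a posterior mean and covariance; the numerical values below are hypothetical.

```python
import numpy as np

def predictive(phi_x, m_N, S_N, beta):
    """Predictive distribution N(t | m_N^T phi(x), sigma_N^2(x)),
    where sigma_N^2(x) = 1/beta + phi(x)^T S_N phi(x)."""
    mean = float(m_N @ phi_x)
    var = 1.0 / beta + float(phi_x @ S_N @ phi_x)
    return mean, var

# usage with a 2-parameter model and basis phi(x) = (1, x)^T (illustrative values)
m_N, S_N = np.array([-0.28, 0.46]), 0.01 * np.eye(2)
print(predictive(np.array([1.0, 0.5]), m_N, S_N, beta=25.0))
```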
• Green curve: the underlying function used to sample data; it is unknown to the learner
• Blue circles: the sampled data points
• After learning, we obtain the predictive distribution $p(t \mid x, \mathbf{t}, \alpha, \beta)$
• Red curve: the mean of the predictive Gaussian
• Red shaded region: one standard deviation on either side of the mean
35
36
• Sample 5 parameter vectors $\mathbf{w}$ according to the posterior distribution
• Plot the corresponding regression functions
37
References
• Sections 3.1 and 3.3 of the PRML textbook: C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
38
Thank You for Your Attention!
Yen-Yu Lin (林彥宇)
Email: lin@[Link]
URL: [Link]
39