
CME 250:

Introduction to Machine Learning


Lecture 2:
Linear and Logistic Regression
Sherrie Wang
[email protected]

Slides are online at cme250.stanford.edu

Agenda

• Bias-variance trade-off

• Linear regression
- Simple linear regression
- What is a “good” fit?
- Multiple linear regression
- Variations on linear regression

• Logistic regression
- What is a “good” fit?

Recall: Types of Machine Learning

Do you have labeled data?
• Yes → Supervised learning
• No → Unsupervised learning

If supervised, what do you want to predict?
• A category → Classification
• A quantity → Regression


Bias and Variance



Assessing Model Performance
There are a number of metrics used to assess model performance on
supervised tasks (regression and classification).

Key point: We want to know how good predictions are when we apply our
method to previously unseen data.

Why? The ability to generalize to unseen data is what makes these
methods useful.


Datasets
Training data
• Observations used to learn the model

Validation data
• Observations used to estimate error for parameter-tuning or model
selection

Test data
• Observations used to measure performance on unseen data (how well the
model generalizes)
• Not available to the algorithm during any part of the learning process
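
As a concrete illustration (not from the original slides), one common
way to carve out all three subsets with scikit-learn; the 60/20/20
split and variable names are arbitrary choices for this sketch:

from sklearn.model_selection import train_test_split

# Hold out 20% as the test set, then split the remainder into
# training (60% overall) and validation (20% overall) sets.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)
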
Assessing Model Performance
We want a method that gives high test set performance, or low
test error.

What about high training set performance / low training set error?

There is no guarantee that the method with the highest training
performance will have the highest test performance.


Model Flexibility

[Figure: training error and test error vs. model flexibility]
FIGURE 2.9, ISL (8th printing 2017)


Model Flexibility
As model flexibility increases, training error will decrease.

The model can fit more and more of the variance in the training set.
Some of this variance, however, may be noise.

Therefore the test error may or may not decrease.

If training error is much smaller than test error, the model is overfitting.


Model Flexibility

[Figure: training error and test error vs. model flexibility, second
example]
FIGURE 2.10, ISL (8th printing 2017)


Bias and Variance
Expected test error = Variance + Bias² + irreducible error

Bias = error caused by simplifying assumptions built into the model

Variance = how much the learned function will change if trained on a
different training set
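
In symbols, this is the bias-variance decomposition (ISL Eq. 2.7),
where f̂ is the fitted model, x0 a test point, and ε the irreducible
noise:

E[(y0 − f̂(x0))²] = Var(f̂(x0)) + [Bias(f̂(x0))]² + Var(ε)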


Bias-Variance Trade-off
Generally, more flexible methods have more variance and less bias.

Less flexible methods have more bias and less variance.

The best method for a task will balance the two types of error to
achieve the lowest test error.
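
To make the trade-off concrete, here is a minimal simulation sketch
(not from the slides): it refits polynomials of low and high degree on
many noisy training sets drawn from the same true curve and estimates
bias² and variance empirically.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
f = np.sin(2 * np.pi * x)              # true function

for degree in (1, 15):                 # low vs. high flexibility
    preds = []
    for _ in range(200):               # many independent training sets
        y = f + rng.normal(scale=0.3, size=x.size)
        preds.append(np.polyval(np.polyfit(x, y, degree), x))
    preds = np.array(preds)
    bias2 = np.mean((preds.mean(axis=0) - f) ** 2)
    var = preds.var(axis=0).mean()
    print(f"degree {degree}: bias^2 = {bias2:.3f}, variance = {var:.3f}")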



Bias-Variance Trade-off

[Figure: bias-variance curves for Dataset #1, Dataset #2, and Dataset #3]
FIGURE 2.12, ISL (8th printing 2017)


Supervised Algorithm #2:
Linear Regression



Linear Regression
Simple supervised learning method, used to predict quantitative
output values.

• Many machine learning methods are generalizations of linear
regression.

• Illustrates key concepts in supervised learning while maintaining
interpretability.


Simple Linear Regression
Predict a quantitative response Y on the basis of a single predictor
variable X.

Assumes there is an approximately linear relationship between X and Y:

Y ≈ β0 + β1X

β0 and β1 are the coefficients or parameters of the linear model. In
this case they represent the intercept and slope terms of a line.


Simple Linear Regression
Estimate βs using training data: (x(1), y(1)), (x(2), y(2)), … (x(n), y(n))

Once βs are estimated, we denote them using “hats”: β̂0, β̂1.

For a particular realization of X, aka X = x, the predicted output is
denoted “y-hat”:

ŷ = β̂0 + β̂1x

Goal: Pick β̂0, β̂1 such that the model is a good fit to the training data.


Simple Linear Regression

[Figure: least squares fit on the Advertising dataset]
FIGURE 3.1, ISL (8th printing 2017)


Simple Linear Regression
Two related questions:

• How do we estimate the coefficients? (aka “fit the model”)

• What is a “good fit” to the data?



Least Squares
Typically, how well a linear model is fit to the data is measured using
least squares.

Residual for the i-th sample: e(i) = y(i) − ŷ(i)

Residual sum of squares (RSS): RSS = (e(1))² + (e(2))² + … + (e(n))²


Least Squares
The model fit using least squares finds β̂0, β̂1 that minimize the RSS.

Recall that the extrema of a function can be found by setting its
derivative to zero, and verified to be minima via the second derivative.
Doing so yields closed-form estimates (x̄, ȳ are the sample means):

β̂1 = Σ (x(i) − x̄)(y(i) − ȳ) / Σ (x(i) − x̄)²
β̂0 = ȳ − β̂1 x̄
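
A quick NumPy sketch (not from the slides) that computes these
estimates directly from the formulas above:

import numpy as np

def simple_ols(x, y):
    # Closed-form least squares for y ≈ b0 + b1 * x
    x_bar, y_bar = x.mean(), y.mean()
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    b0 = y_bar - b1 * x_bar
    return b0, b1

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
print(simple_ols(x, y))   # slope close to 2, intercept close to 0
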
Least Squares in Pictures

RSS is the sum of the squares of all vertical gray lines.

As we vary the βs, RSS changes. Least squares finds βs that minimize RSS.

FIGURE 3.1 and FIGURE 3.2, ISL (8th printing 2017)


How good is the model fit?
Linear regression is typically assessed using two related metrics:

• Residual standard error (RSE): RSE = sqrt(RSS / (n − 2))
  Higher RSE means worse fit.

• R2 statistic: R2 = 1 − RSS/TSS, where TSS = Σ (y(i) − ȳ)² is the
  “total sum of squares”. Higher R2 means better fit.

R2 measures the proportion of variability in Y explained by X.
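
A minimal sketch (not from the slides) computing both metrics from
residuals, assuming y and y_hat are NumPy arrays from a simple linear
regression:

import numpy as np

def rse_and_r2(y, y_hat):
    rss = np.sum((y - y_hat) ** 2)       # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)    # total sum of squares
    rse = np.sqrt(rss / (len(y) - 2))    # n - 2: two estimated coefficients
    r2 = 1 - rss / tss
    return rse, r2
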
What range of values can R2 take?

For a least squares fit with an intercept, R2 on the training data lies
between 0 and 1. (It can be negative on unseen data, or for models fit
without an intercept.)


How good is the model fit?
What values of RSE and R2 are “good”?

Depends on the domain and application.

• In physics, we may know the data comes from a linear model. In such
a case, we’d require R2 close to 1 in order to call the fit good.

• In biology, social sciences, and other domains, a linear model may
be a crude approximation. Existing models may also not be good at
estimating Y, so a model with R2 of 0.4 may be considered good.


Multiple Linear Regression
What if our dataset contains multiple input dimensions Xj?

We could run a simple linear regression for each input dimension.

Is this satisfactory?



Multiple Linear Regression
Predict the response variable using more than one predictor variable:

Y ≈ β0 + β1X1 + β2X2 + … + βpXp

Here, Xj represents the j-th predictor.

We interpret βj as the average effect on Y of a 1 unit increase in Xj,
holding all other predictors fixed.


Multiple Linear Regression
3 Simple Regressions vs. 1 Multiple Regression

[Tables: coefficient estimates from three simple regressions vs. one
multiple regression, and the variable correlations]


Multiple Linear Regression

[Figure: least squares plane fit to two predictors]
FIGURE 3.4, ISL (8th printing 2017)


Multiple Linear Regression
Another way to write

Y ≈ β0 + β1X1 + β2X2 + … + βpXp

is to use matrix notation.


Multiple Linear Regression
Define y as the n-vector of outputs, X as the n × (p+1) matrix of
inputs (with a leading column of ones for the intercept), and β as the
(p+1)-vector of coefficients.

Then: y ≈ Xβ


Least Squares
The model fit using least squares finds β̂ that minimizes the RSS:

RSS(β) = ||y − Xβ||₂²

Notation: ||v||₂² is the square of the L2-norm of the vector v.

The normal equations give the analytical solution for β̂:

(XᵀX)β̂ = Xᵀy, so β̂ = (XᵀX)⁻¹Xᵀy when XᵀX is invertible.
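
A minimal NumPy sketch (not from the slides) of this solution. In
practice, solving the linear system (or using np.linalg.lstsq) is
preferred over forming the inverse explicitly:

import numpy as np

def ols_fit(X, y):
    # Least squares via the normal equations
    X1 = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    beta = np.linalg.solve(X1.T @ X1, X1.T @ y)  # solve (X^T X) beta = X^T y
    return beta                                  # beta[0] is the intercept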


How good is the model fit?
Multiple linear regression can also be assessed using the 2 metrics
previously discussed.

• Residual standard error (RSE)

• R2 statistic



Qualitative Inputs
How do we include qualitative inputs in regression?

• E.g. predictor is “gender”: “male” or “female”

If only two possible values, we can create a 0/1 dummy variable, e.g.

x = 1 if gender is “female”, x = 0 if gender is “male”


Qualitative Inputs
If more than two possible values, introduce multiple dummy variables
(one fewer than the number of levels).

• E.g. predictor is “eye color”: “blue”, “green”, or “brown” → two
dummies, x1 = 1 if blue and x2 = 1 if green; brown is the baseline
absorbed by the intercept.
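
A minimal pandas sketch (not from the slides) of this encoding;
drop_first makes one level the baseline:

import pandas as pd

df = pd.DataFrame({"eye_color": ["blue", "green", "brown", "blue"]})
dummies = pd.get_dummies(df["eye_color"], drop_first=True)
print(dummies)   # two 0/1 columns; the dropped level is the baseline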


Potential Problems
• Non-linearity of response-predictor relationships

• Non-additivity of predictors

• Non-constant variance of error terms

• Outliers can mislead model performance

• High-leverage points overly influence βs

• Correlation of error terms

• Collinearity

Non-linearity of Data
Residual plots can help diagnose if this is an issue.

Plot residuals vs. fitted values.

If there is a pattern in the residual plot, then the linearity
assumption is suspect.

FIGURE 3.11, ISL (8th printing 2017)


Non-linearity of Data
A simple way to extend linear regression to model non-linear
relationships is via polynomial regression, e.g.

Y ≈ β0 + β1X + β2X²

This is still a linear model (linear in the βs); it can be fit via
least squares.

FIGURE 3.11, ISL (8th printing 2017)
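
A minimal scikit-learn sketch (not from the slides) of fitting a
quadratic model this way:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.linspace(-2, 2, 50).reshape(-1, 1)
y = 1 + 2 * X[:, 0] + 3 * X[:, 0] ** 2   # quadratic ground truth

X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
model = LinearRegression().fit(X_poly, y)
print(model.intercept_, model.coef_)     # approximately 1, [2, 3]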


Non-additivity of Predictors
Additivity means each predictor Xj affects Y independently of the
value of other predictors.

If this is not true, we can extend linear regression by including
interaction effects, e.g.

Y ≈ β0 + β1X1 + β2X2 + β3X1X2
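
A sketch (not from the slides) of adding an interaction term by hand,
assuming X is a two-column NumPy array of predictors:

import numpy as np

# Append the product X1 * X2 as an extra feature, then fit as usual
X_interact = np.column_stack([X, X[:, 0] * X[:, 1]])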


Non-constant Variance of Error
Can be seen in a funnel shape in the residual plot.

Standard errors, confidence intervals, and hypothesis tests rely on
the constant variance assumption.

One possible solution is to transform the response Y using a concave
function (log, square root).

FIGURE 3.1, ISL (8th printing 2017)


Outliers and High-leverage Points
Outliers have unusual Y values, and can inflate RSE and deflate R2.

High-leverage points have unusual values for the predictors X, and
influence the least squares line substantially.


Other Questions
Not addressed in this course, but important to consider:

• Does the data contain evidence of a relationship between X and Y?

• Are the estimated coefficients close to the true ones?

Future lecture:

• If we have many predictors Xj, which one(s) do we include in the
model?


Linear Regression Summary
Advantages:

• Simple model

• Interpretable coefficients

• Can obtain good results with small datasets

Disadvantages:

• Model may be too simple to make accurate predictions over a large
range of values

• Sensitive to outliers in data due to minimization of squared error


Linear Regression in sklearn
from sklearn.linear_model import LinearRegression

linreg = LinearRegression()
linreg.fit(X, y)            # estimate coefficients from training data
linreg.coef_                # coefficients of inputs
linreg.intercept_           # model intercept
y_hat = linreg.predict(X)   # predict from inputs X (not y)


Assessment Metrics in sklearn
from sklearn.metrics import r2_score, mean_squared_error

r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)


Supervised Algorithm #3:
Logistic Regression



Classification
Classification methods are used to predict qualitative output values.

• Assign each observation to a class / category

• e.g. K-nearest neighbor classifier from Lecture 1

FIGURE 4.1, ISL (8th printing 2017)


Logistic Regression
Binary classification: Y takes on two values, 0 or 1, corresponding to 2
classes.

Logistic regression models binary classification via the probability
p(X) = P(Y = 1 | X).

• Set a threshold on p(X) to obtain class decisions

• Extension of linear regression for probabilities in [0, 1]


Linear regression?
Why don’t we use the linear expression p(X) = β0 + β1X to model
probability?


Logistic Regression
The logistic (sigmoid) function bounds the output:

σ(z) = 1 / (1 + e^(−z))

• S-shaped curve

• Always takes values in (0, 1), which are valid probabilities

Logistic regression model:

p(X) = e^(β0 + β1X) / (1 + e^(β0 + β1X))
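
A minimal sketch (not from the slides) of the model's prediction rule,
with made-up coefficient values for illustration:

import numpy as np

def predict_proba(x, b0, b1):
    # Logistic regression probability p(X) for a single predictor
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))

p = predict_proba(np.array([-2.0, 0.0, 2.0]), b0=0.5, b1=1.2)  # hypothetical betas
y_hat = (p >= 0.5).astype(int)   # threshold at 0.5 for class decisions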


Logistic Regression

[Figure: the logistic function]
https://www.saedsayad.com/logistic_regression.htm


Logistic Regression

[Figures, built up over several slides: the fitted logistic curve and a
decision threshold on p(X)]


Logistic Regression

[Figure: linear vs. logistic fit to a binary response]
FIGURE 4.2, ISL (8th printing 2017)


Estimating Coefficients (β)
Recall that for linear regression, βs are estimated using least squares
on the training data.

For logistic regression, there is no closed-form solution for the βs
obtainable by taking the derivative and setting it to zero.

Instead, βs are estimated using maximum likelihood estimation.
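
The likelihood being maximized (ISL Eq. 4.5), optimized numerically:

ℓ(β0, β1) = Π_{i: y(i)=1} p(x(i)) × Π_{i: y(i)=0} (1 − p(x(i)))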


Multiple Logistic Regression
We can extend logistic regression to the case of multiple predictor
variables:

p(X) = e^(β0 + β1X1 + … + βpXp) / (1 + e^(β0 + β1X1 + … + βpXp))

[Figure: geometric intuition for the logistic decision boundary]
https://florianhartl.com/logistic-regression-geometric-intuition.html


How good is the model fit?
Classifier performance can be summarized in a table called the
confusion matrix:

                       Predicted 0            Predicted 1
True class 0    True Negative (TN)     False Positive (FP)
True class 1    False Negative (FN)    True Positive (TP)

• “Good performance” is when TP, TN are large and FP, FN are small

• Can be computed for training, validation, and test sets. The test
set informs you about model generalizability.

How good is the model fit?
Some common metrics to assess performance:

• Accuracy: (TP + TN) / n

• Recall: TP / (TP + FN)

• Precision: TP / (TP + FP)

• Specificity: TN / (FP + TN)

• False positive rate: FP / (FP + TN)

How good is the model fit?
The ROC (receiver operating characteristic) curve summarizes the
trade-off between recall and false positive rate; the area under the
curve (AUC) summarizes it in one number.

Many classifiers have a “knob” or threshold that can be adjusted to
make the classifier more or less conservative in predicting Y = 1.

Trade-off: more true positives (TP) → more false positives (FP)
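
A minimal scikit-learn sketch (not from the slides) of computing the
ROC curve and its AUC, assuming a fitted classifier logreg and held-out
X_test, y_test:

from sklearn.metrics import roc_curve, roc_auc_score

# Use predicted probabilities for the positive class, not hard labels
p_pred = logreg.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, p_pred)
auc = roc_auc_score(y_test, p_pred)
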
Logistic Regression Summary
Advantages:

• Extension of linear regression, simple

• Interpretable: log-odds are linear in predictors

• No hyperparameters to tune

Disadvantages:

• Cannot model complex decision boundaries



Dataset Splitting in sklearn
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv('dataset.csv')
X = data[['input']]   # double brackets keep X 2-D, as sklearn expects
y = data['output']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2)


Logistic Regression in sklearn
from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression()
logreg.fit(X_train, y_train)      # fit betas (regularized MLE by default)
y_pred = logreg.predict(X_test)   # class labels at the 0.5 threshold


Assessment Metrics in sklearn
from sklearn.metrics import (accuracy_score, precision_score,
    recall_score, confusion_matrix)

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
