INFO6105
Data Science Engineering
Methods and Tools
Fall 2024
Email: [Link]@[Link]
Overview
• Module 1 — Linear Algebra, Probability and Statistics
• Module 2 — Linear Regression, Gradient Descent
• Module 3 — Neural Networks
• Module 4 — Decision Trees
• Module 5 — Ensemble Learning
• Module 6 — Instance Based Learning
• Module 7 — Kernel Methods and SVMs
Module 2: Linear Regression, Gradient Descent
1. Supervised Learning & AI
2. Linear Regression
3. Gradient Descent
Artificial Intelligence (AI)
[Link]
Types of machine learning tasks
• Supervised: correct output known for each training example
– Learn to predict output when given an input vector
• Classification: 1-of-N output (speech recognition, object recognition, medical diagnosis)
• Regression: real-valued output (predicting market prices, customer rating)
• Unsupervised learning
– Create an internal representation of the input, capturing regularities /
structure in data
– Examples: form clusters; extract features
• How do we know if a representation is good?
• Reinforcement learning
– Learn action to maximize payoff
– Not much information in a payoff signal
– Payoff is often delayed
[Link]
Many learning algorithms for different tasks
1. Classification: Determine which discrete category the example is
2. Recognizing patterns: speech recognition, facial identity, etc.
3. Recommender Systems: noisy data, commercial pay-off (e.g., Amazon, Netflix).
4. Information retrieval: Find documents or images with similar content
5. Computer vision: detection, segmentation, depth estimation, optical flow, etc
6. Robotics: perception, planning, etc
7. Learning to play games
8. Recognizing anomalies: Unusual sequences of credit card transactions, panic
situation at an airport
9. Spam filtering, fraud detection: The enemy adapts so we must adapt too
10. Many more!
[Link]
Classification
[Link]
Recognizing patterns
[Link]
Recommender Systems
Information retrieval
Computer Vision
[Link]
Robotics
[Link]
Play video games
[Link]
Play games: AlphaGo [Link]
[Link]
Machine Learning vs Data Mining
• Data-mining: Typically using very simple machine learning techniques
on very large databases because computers are too slow to do
anything more interesting with ten billion examples
• Previously used in a negative sense
• A misguided statistical procedure of looking for all kinds of relationships in the data until one is finally found
• Now lines are blurred: many ML problems involve tons of data
• But problems with AI flavor (e.g., recognition, robot navigation) still
domain of ML
[Link]
Machine Learning vs Statistics
• ML uses statistical theory to build models
• A lot of ML is rediscovery of things statisticians already knew; often
disguised by differences in terminology
• But the emphasis is very different:
• Good piece of statistics: Clever proof that relatively simple estimation procedure is
asymptotically unbiased.
• Good piece of ML: Demo that a complicated algorithm produces impressive results
on a specific task.
• ML can be viewed as applying computational techniques to statistical problems, but it goes beyond typical statistics problems, with different aims
[Link]
Why Machine Learning
• The essence of machine learning:
– We are sure there is a pattern.
– We can’t pin down exactly the equations for the pattern.
– We have data.
Recognize my Mom among all humans (similarly: credit card approval?):
- I am sure I can recognize my mom by how she looks.
- I can’t describe exactly the facial patterns to you.
- I have lots of photos of my mom’s face.
Recommend a movie I like:
- How others like this movie and how you like other movies
have a pattern.
- I can’t describe exactly this pattern.
- I have lots of movie watching data.
Supervised Learning
• The machine learning task of
learning a function that maps
an input to an output based
on example input-output
pairs.
• Classification
• Regression
Loan/credit card approval:
Data: inputs x1, x2, x3, …; output y
Supervised learning (SL) is to learn a function f that maps x to y.
Supervised Learning
• Linear regression
• Logistic regression
• Decision trees
• Support-vector machines
• K-nearest neighbor algorithm
• Neural networks (Multilayer
perceptron)
•…
[Link]
Covariance, Correlation and Regression
Coefficient
• Covariance measures how two random variables move together around their own means.
• Variance is a special case of covariance: the covariance of a variable with itself.
Correlation
• Correlation is the standardized
value of covariance.
• The standardized values are
bound between -1 and 1.
• X and Y move in the same
direction, correlation is positive.
• X and Y move in opposite
directions, correlation is negative.
• No correlation: Cor(X, Y) = 0.
Pearson correlation coefficient of x and y
[Link]
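For reference, the standard sample formulas behind these slides (textbook definitions, restated here):
$$\mathrm{Cov}(X,Y)=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}),\qquad \mathrm{Var}(X)=\mathrm{Cov}(X,X)$$
$$r_{xy}=\frac{\mathrm{Cov}(X,Y)}{s_X\,s_Y},\qquad \hat{\beta}_{Y\text{ on }X}=\frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)}=r_{xy}\,\frac{s_Y}{s_X}$$
where $s_X$ and $s_Y$ are the sample standard deviations.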
Regression Coefficient
Exercise:
• X=[1,2,3,4,5,6]
• Y=[1.8,3.2,4.1,6.5,5.8,7.2]
• Calculate the slope for the regression model of Y on X: Use the
above formulae.
• X on Y?
Hands-on
import matplotlib.pyplot as plt
import numpy as np
import mglearn

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([1.8, 3.2, 4.1, 6.5, 5.8, 7.2])

cov_matrix = np.cov(x, y)
print(cov_matrix)
# array([[3.5       , 3.72      ],
#        [3.72      , 4.33866667]])

reg_coeff = cov_matrix[0, 1] / cov_matrix[0, 0]   # slope of Y on X
z = np.polyfit(x, y, 1)                           # least-squares fit: (slope, intercept)
# reg_coeff -> 1.062857142857143
# z         -> array([1.06285714, 1.04666667])

X, y = mglearn.datasets.make_wave(n_samples=40)
plt.plot(X, y, 'o')
plt.ylim(-3, 3)
plt.xlabel("Feature")
plt.ylabel("Target")
plt.show()
Hands-on
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import mglearn

X, y = mglearn.datasets.make_wave(n_samples=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

lr = LinearRegression().fit(X_train, y_train)
print("lr.coef_: {}".format(lr.coef_))
print("lr.intercept_: {}".format(lr.intercept_))
print("Training set score: {:.2f}".format(lr.score(X_train, y_train)))
print("Test set score: {:.2f}".format(lr.score(X_test, y_test)))
Linear Regression Hypothesis Testing
• X and Y are random variables with unknown joint distribution.
• β̂ (the estimated coefficient) is a random variable, normally distributed or t-distributed with df = n − k − 1
• k is the number of predictor variables (here k = 1 for simple linear regression)
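The usual test of H0: β1 = 0 uses the statistic (standard form, stated here for reference):
$$t=\frac{\hat{\beta}_1}{\mathrm{SE}(\hat{\beta}_1)}\ \sim\ t_{\,n-k-1}\ \text{under } H_0$$
The p-values printed by statsmodels in the next hands-on come from this kind of test.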
Least Squares
from sklearn import datasets
import statsmodels.api as sm
from scipy import stats

# Fit ordinary least squares with statsmodels to get coefficient tests
diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target
X2 = sm.add_constant(X)   # add the intercept column
est = sm.OLS(y, X2)
est2 = est.fit()
print(est2.summary())     # coefficients, standard errors, t statistics, p-values
Linear Regression
Cost (Loss) function
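A common form of the squared-error cost (standard notation; m is the number of training examples, assumed here):
$$J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}\big(\theta^{\top}x^{(i)}-y^{(i)}\big)^{2}$$
Setting the gradient to zero gives the normal equation $\theta=(X^{\top}X)^{-1}X^{\top}y$, which is what the code below computes.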
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import mglearn
import numpy as np

X, y = mglearn.datasets.make_wave(n_samples=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Normal-equation solution: prepend a bias column, then solve for theta directly
X_train2 = np.c_[np.ones(len(X_train)), X_train]
theta = np.linalg.inv(X_train2.T @ X_train2) @ X_train2.T @ y_train
# theta -> array([-0.03180434, 0.39390555])
# matches the earlier scikit-learn fit:
# lr.intercept_: -0.031804343026759746
# lr.coef_:      [0.39390555]
Gradient descent
Gradient descent vs stochastic gradient descent
Linear regression
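A minimal sketch of batch gradient descent and a stochastic variant for this linear regression, using the same make_wave data as the earlier hands-on; the learning rates and iteration counts are illustrative choices, not values from the slides.

import numpy as np
import mglearn
from sklearn.model_selection import train_test_split

X, y = mglearn.datasets.make_wave(n_samples=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
Xb = np.c_[np.ones(len(X_train)), X_train]        # prepend a bias column

# Batch gradient descent on the mean squared-error cost
theta = np.zeros(Xb.shape[1])
learning_rate = 0.1                               # illustrative value
for _ in range(2000):
    gradient = Xb.T @ (Xb @ theta - y_train) / len(y_train)
    theta -= learning_rate * gradient
print("batch GD:", theta)                         # converges toward the normal-equation solution [-0.0318, 0.3939]

# Stochastic gradient descent: update on one randomly chosen example at a time
rng = np.random.default_rng(0)
theta_sgd = np.zeros(Xb.shape[1])
for epoch in range(100):
    for i in rng.permutation(len(y_train)):
        grad_i = Xb[i] * (Xb[i] @ theta_sgd - y_train[i])
        theta_sgd -= 0.01 * grad_i
print("SGD:", theta_sgd)                          # noisier updates, but ends up near the same solution

Batch GD uses all training examples per update; SGD uses one example per update, which is noisier but cheaper per step and scales better to large datasets.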
Order of the Polynomial
• k = 0 → constant
• k = 1 → line
• k = 2 → parabola
• …
• k = 8?
[Figure: example polynomial fits for k = 0, 1, 2 and the resulting cost]
Underfitting vs Overfitting
[Figure: model fit vs. size, illustrating underfitting and overfitting]
Cross validation
[Figure: training error vs. testing error curves]
Bias vs Variance
[Link]
Regression: statistics or machine learning
• Machine learning: learn a function
• Statistics: learn a distribution
Likelihood
• Maximize log likelihood:
What does it mean?
• The least-squares regression corresponds to finding the maximum
likelihood estimate of θ.
• The target y follows a Gaussian distribution.
[Link]
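A sketch of the standard derivation: assume $y^{(i)}=\theta^{\top}x^{(i)}+\epsilon^{(i)}$ with i.i.d. $\epsilon^{(i)}\sim\mathcal{N}(0,\sigma^{2})$. The log likelihood is
$$\ell(\theta)=\sum_{i=1}^{m}\log\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\Big(-\frac{(y^{(i)}-\theta^{\top}x^{(i)})^{2}}{2\sigma^{2}}\Big)= m\log\frac{1}{\sqrt{2\pi}\,\sigma}-\frac{1}{2\sigma^{2}}\sum_{i=1}^{m}\big(y^{(i)}-\theta^{\top}x^{(i)}\big)^{2}$$
so maximizing $\ell(\theta)$ over $\theta$ is equivalent to minimizing the sum of squared errors.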
Logistic Regression
• Despite the name, this is classification.
• Y is usually called the “label”.
Logistic or Sigmoid Function
[Link]
Logistic Function
• Maximum log likelihood:
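For reference, the standard forms: the sigmoid function is
$$\sigma(z)=\frac{1}{1+e^{-z}},\qquad h_\theta(x)=\sigma(\theta^{\top}x)$$
and with $P(y=1\mid x;\theta)=h_\theta(x)$ for labels $y\in\{0,1\}$, the log likelihood to maximize is
$$\ell(\theta)=\sum_{i=1}^{m}\Big[\,y^{(i)}\log h_\theta(x^{(i)})+\big(1-y^{(i)}\big)\log\big(1-h_\theta(x^{(i)})\big)\Big]$$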
Regularization
• Simple Linear Regression
• Minimize the residual sum of squares
• RSS = $\sum_{i=1}^{n}\big(y_i-\beta_0-\sum_{j=1}^{p}\beta_j x_{ij}\big)^{2}$
• Regularization avoids overfitting by penalizing large regression coefficients.
[Link]
Lasso: Least Absolute Shrinkage and Selection Operator
• Lasso performs L1 regularization, where λ is the tuning parameter for the L1 penalty (the penalized objective is sketched after this list).
• λ = 0 – no coefficients are eliminated; result is the same as least squares
regression
• λ = ∞ – all coefficients are eliminated
• As λ increases, bias increases
• As λ decreases, variance increases
• L1 can reduce coefficients to zero and eliminate the variable.
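The L1-penalized objective referred to above, in its standard form:
$$\hat{\beta}^{\text{lasso}}=\arg\min_{\beta}\ \sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}\beta_j x_{ij}\Big)^{2}+\lambda\sum_{j=1}^{p}\lvert\beta_j\rvert$$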
Ridge regression
• Ridge implements L2 regularization, where λ is the tuning parameter for the L2 penalty (the penalized objective is sketched after this list).
• λ = 0 – no coefficients are eliminated; result is the same as least squares
regression
• λ = ∞ – all coefficients are reduced and approach zero
• As λ increases, bias increases
• As λ decreases, variance increases
• All coefficients are shrunk by the same factor (none are eliminated).
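The L2-penalized (ridge) objective, in its standard form:
$$\hat{\beta}^{\text{ridge}}=\arg\min_{\beta}\ \sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}\beta_j x_{ij}\Big)^{2}+\lambda\sum_{j=1}^{p}\beta_j^{2}$$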
Lasso vs. Ridge Regression
• Lasso method produces a
simpler and more
interpretable model.
• Lasso can reduce coefficients
to zero, performing feature
selection and eliminating
variables of no value.
• Lasso tends to be more accurate than Ridge when only a few predictors truly matter; Ridge tends to do better when many predictors have moderate effects.
• Cross-validation should be used to choose between them and to tune λ.
Lasso exercise
X, y = mglearn.datasets.load_extended_boston()
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

lr = LinearRegression().fit(X_train, y_train)
print("Training set score: {:.2f}".format(lr.score(X_train, y_train)))      # Training set score: 0.95
print("Test set score: {:.2f}".format(lr.score(X_test, y_test)))            # Test set score: 0.61

from sklearn.linear_model import Lasso
lasso = Lasso().fit(X_train, y_train)
print("Training set score: {:.2f}".format(lasso.score(X_train, y_train)))   # Training set score: 0.29
print("Test set score: {:.2f}".format(lasso.score(X_test, y_test)))         # Test set score: 0.21
print("Number of features used: {}".format(np.sum(lasso.coef_ != 0)))       # Number of features used: 4
Lasso exercise
lasso001 = Lasso(alpha=0.01, max_iter=100000).fit(X_train, y_train)
print("Training set score: {:.2f}".format(lasso001.score(X_train, y_train)))
print("Test set score: {:.2f}".format(lasso001.score(X_test, y_test)))
print("Number of features used: {}".format(np.sum(lasso001.coef_ != 0)))

lasso00001 = Lasso(alpha=0.0001, max_iter=100000).fit(X_train, y_train)
print("Training set score: {:.2f}".format(lasso00001.score(X_train, y_train)))
print("Test set score: {:.2f}".format(lasso00001.score(X_test, y_test)))
print("Number of features used: {}".format(np.sum(lasso00001.coef_ != 0)))

plt.plot(lasso.coef_, 's', label="Lasso alpha=1")
plt.plot(lasso001.coef_, '^', label="Lasso alpha=0.01")
plt.plot(lasso00001.coef_, 'v', label="Lasso alpha=0.0001")
plt.legend(ncol=2, loc=(0, 1.05))
plt.ylim(-25, 25)
plt.xlabel("Coefficient index")
plt.ylabel("Coefficient magnitude")
plt.show()
Values of Lambda
• Through cross-validation, we can determine the value of λ that minimizes the out-of-sample loss.
[Figure: cross-validation curve over log(λ), showing λmin, λmax, and the number of non-zero coefficients]
[Link]
[Link]/en/latest/glmnet_vignette.html
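A minimal sketch of this selection in scikit-learn using LassoCV (the alpha grid, cv=5, and max_iter values are illustrative choices; scikit-learn's alpha plays the role of λ):

from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
import mglearn
import numpy as np

X, y = mglearn.datasets.load_extended_boston()
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Cross-validate over a grid of alpha (lambda) values and keep the best one
lasso_cv = LassoCV(alphas=np.logspace(-4, 1, 50), cv=5, max_iter=100000).fit(X_train, y_train)
print("Best alpha:", lasso_cv.alpha_)
print("Test set score: {:.2f}".format(lasso_cv.score(X_test, y_test)))
print("Number of features used:", np.sum(lasso_cv.coef_ != 0))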
Generalized Linear Models
• Regression
• Classification
• The exponential family: $p(y;\eta)=b(y)\exp\big(\eta^{\top}T(y)-a(\eta)\big)$
  – η: the natural parameter
  – T(y): the sufficient statistic
  – a(η): the log partition function
McCullagh and Nelder, Generalized Linear Models (2nd ed.)
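As a standard illustration of the exponential family (not from the slides): the Bernoulli distribution can be written as
$$p(y;\phi)=\phi^{y}(1-\phi)^{1-y}=\exp\!\Big(y\log\frac{\phi}{1-\phi}+\log(1-\phi)\Big)$$
so $\eta=\log\frac{\phi}{1-\phi}$, $T(y)=y$, $a(\eta)=-\log(1-\phi)=\log(1+e^{\eta})$, and $b(y)=1$. Inverting $\eta$ gives $\phi=1/(1+e^{-\eta})$, the sigmoid, which is how logistic regression arises as the GLM for binary labels.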
Summary
• Relationship between AI, ML, and Supervised Learning.
• Linear Regression from probability and machine learning perspective.
• Gradient descent: one method to learn the function.
• Linear in terms of coefficients.
• Bias vs Variance.
• Regularization.
• Generalized linear models.