Data Mining
Logistic Regression and Neural Networks
CS 584 :: Fall 2024
Ziwei Zhu
Department of Computer Science
George Mason University
Some slides are from Dr. Theodora Chaspari.
1
• Quiz 2 today.
• Solutions to Quizzes 1 and 2 will be explained next week.
• Midterm exam is on 10/08
• HW2 out, due in 3 weeks (10/14).
2
Linear Classification
RentPrice = w0 + w1 × Size + w2 × DistanceFromGMU + ...

Predict whether the apartment will be rented or not.

[Figure: apartments plotted with axes Size and Distance to GMU]
3
Outline
• Brief review of probability
• Binary Logistic Regression
• Multi-class Logistic Regression
• Perceptron
• Multi-layer Perceptron (MLP)
• Design Issues of Neural Networks
4
Bernoulli Distribution
Toss a biased coin. A single experiment outputs head/tail.
[Figure: the probability distribution function (PDF) p(Y = y): p(Y = 1) = θ and p(Y = 0) = 1 − θ]
6
Bernoulli Distribution: Likelihood
• Suppose we toss this biased coin n times.
• Assume the result of the i-th toss is y_i (i = 1, 2, …, n).
• What’s the probability of observing these results?
7
Bernoulli Distribution: Likelihood
Suppose we toss this biased coin n times and the result of the i-th toss is y_i (i = 1, 2, …, n). The probability of observing these results is the likelihood

p(y_1, …, y_n | θ) = ∏_{i=1}^{n} p(y_i | θ) = ∏_{i=1}^{n} θ^{y_i} (1 − θ)^{1 − y_i}

and the log-likelihood is

log p(y_1, …, y_n | θ) = Σ_{i=1}^{n} [ y_i log θ + (1 − y_i) log(1 − θ) ]
9
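As a quick illustration (not from the slides), here is a minimal NumPy sketch that evaluates this log-likelihood; the outcome array and the θ values below are made-up examples:

```python
import numpy as np

def bernoulli_log_likelihood(y, theta):
    """Log-likelihood of observing the 0/1 outcomes y under Bernoulli(theta)."""
    y = np.asarray(y, dtype=float)
    return np.sum(y * np.log(theta) + (1 - y) * np.log(1 - theta))

y = np.array([1, 0, 1, 1, 0, 1])              # example outcomes (1 = head, 0 = tail)
print(bernoulli_log_likelihood(y, 0.5))       # log-likelihood under a fair coin
print(bernoulli_log_likelihood(y, y.mean()))  # larger value at the MLE theta* = mean(y)
```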
Maximum Likelihood Estimation (MLE)
If we don’t know the parameter 𝜃 of the biased
coin but we observe 𝑦1 , . . , 𝑦𝑛 from 𝑛 independent
experiments, how can we estimate 𝜃?
10
Maximum Likelihood Estimation (MLE)
If we don’t know the parameter 𝜃 of the biased coin
but we observe 𝑦1 , . . , 𝑦𝑛 from 𝑛 independent
experiments, then we can estimate 𝜃 by maximizing the
log likelihood:
θ* = arg max_θ Σ_{i=1}^{n} [ y_i log θ + (1 − y_i) log(1 − θ) ]

Setting the derivative with respect to θ to zero gives

θ* = (Σ_{i=1}^{n} y_i) / n
11
Outline
• Brief review of probability
• Binary Logistic Regression
• Multi-class Logistic Regression
• Perceptron
• Multi-layer Perceptron (MLP)
• Design Issues of Neural Networks
12
Logistic Regression
Key idea: given one input sample, represented by its
features 𝒙 ∈ ℝ𝐷 , we use a linear model to predict
the probability that the label 𝑦 of this sample is
positive (y = 1).
13
Logistic Regression
Example:
Classification task: whether a student passes the course or not
Features: SAT scores
Logistic Regression: estimates the "pass" probability, i.e.,
f(score) = p(pass). If p(pass) = f(score) > 0.5, predict "pass";
otherwise predict "fail".
14
How exactly do we do it?
Input: 𝒙 ∈ ℝ^{D+1}
Predict probability: p(y = 1 | 𝒙) = 𝒘^T 𝒙
𝒘 = [w_0, w_1, …, w_D]^T, the same as 𝒘 in the linear regression model
p(y = 1 | 𝒙) = 𝒘^T 𝒙 ∈ ℝ
However, a probability should lie between 0 and 1.
15
How exactly do we do it?
Input: 𝒙 ∈ ℝ^{D+1}
Predict probability: p(y = 1 | 𝒙) = σ(𝒘^T 𝒙)
𝒘 = [w_0, w_1, …, w_D]^T, the same as 𝒘 in the linear regression model
Sigmoid function: σ(x) = 1 / (1 + e^{−x}) = e^x / (1 + e^x) ∈ (0, 1)
16
The Sigmoid Function

dσ(x)/dx = σ(x)(1 − σ(x))
17
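To make the shape concrete, here is a minimal NumPy sketch (not from the slides) of the sigmoid together with a numerical check of the derivative identity above; the test points are arbitrary:

```python
import numpy as np

def sigmoid(x):
    """Sigmoid: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-3.0, 0.0, 3.0])
# Finite-difference check that d sigma / dx equals sigma(x) * (1 - sigma(x)).
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
analytic = sigmoid(x) * (1 - sigmoid(x))
print(numeric, analytic)  # the two should agree closely
```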
How exactly do we do it?
Input: 𝒙 ∈ ℝ^D
Predict probability: p(y = 1 | 𝒙) = σ(𝒘^T 𝒙)
Predict class: ŷ = 1 if p(y = 1 | 𝒙) > 0.5, and ŷ = 0 otherwise
(the 0.5 threshold can be changed as needed)
21
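A minimal sketch of this prediction rule, assuming the bias is folded in as a leading constant feature; the weights and input below are made-up values, not learned parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, x):
    """p(y = 1 | x) for a single sample x (with a leading constant 1 for the bias)."""
    return sigmoid(np.dot(w, x))

def predict_class(w, x, threshold=0.5):
    return 1 if predict_proba(w, x) > threshold else 0

w = np.array([-1.0, 0.8, 0.3])   # [w0 (bias), w1, w2] -- illustrative values only
x = np.array([1.0, 2.0, -0.5])   # [1, x1, x2]
print(predict_proba(w, x), predict_class(w, x))
```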
Logistic Regression
[Figure: a constant feature +1 is appended to the input, so the augmented 𝒙 ∈ ℝ^{D+1}]
22
Logistic Regression
Binary classification can be viewed as predicting the parameter of a Bernoulli distribution:
y ~ Bernoulli(θ), with θ = σ(𝒘^T 𝒙) = p(y = 1 | 𝒙)
23
Likelihood as the Model Evaluation
24
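Combining the Bernoulli likelihood above with p(y = 1 | 𝒙) = σ(𝒘^T 𝒙) gives the natural evaluation criterion: the negative log-likelihood of the training data, often called the cross-entropy loss. A minimal NumPy sketch, assuming a design matrix X with a leading column of ones and 0/1 labels y:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_loss(w, X, y):
    """Negative log-likelihood of logistic regression:
    -sum_i [ y_i * log(p_i) + (1 - y_i) * log(1 - p_i) ], where p_i = sigmoid(w^T x_i)."""
    p = sigmoid(X @ w)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```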
Optimization
𝒘* = arg min_𝒘 ε(𝒘)
27
Recap: Gradient Descent
29
Optimization: Gradient Descent
Update at the k-th step: 𝒘^{(k+1)} = 𝒘^{(k)} − η ∇_𝒘 ε(𝒘^{(k)}), where η is the learning rate
30
Optimization: Gradient Descent
31
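As an illustration of how gradient descent could be applied to the cross-entropy loss above, here is a minimal NumPy sketch; the data, learning rate, and iteration count are arbitrary choices for the example, not the course's reference implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, lr=0.1, n_steps=1000):
    """Batch gradient descent on the negative log-likelihood.

    X: (n, D+1) matrix whose first column is all ones (bias feature).
    y: (n,) vector of 0/1 labels.
    """
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        p = sigmoid(X @ w)          # predicted p(y = 1 | x) for every sample
        grad = X.T @ (p - y)        # gradient of the cross-entropy loss w.r.t. w
        w -= lr * grad / len(y)     # gradient descent step
    return w

# Tiny made-up dataset: one feature plus a bias column.
X = np.array([[1, 0.5], [1, 1.5], [1, 3.0], [1, 4.0]])
y = np.array([0, 0, 1, 1])
w = fit_logistic_regression(X, y)
print(w, sigmoid(X @ w))
```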
Recap: Non-Linear Regression
32
Can be Applied to Logistic Regression
Just replace 𝑿𝑛 with Φ(𝑿𝑛 )
33
Recap: Overfitting
34
Overfitting in Logistic Regression
35
Regularization in Logistic Regression
36
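As one common option (assuming an L2 penalty; other penalties are possible), regularization adds a term that discourages large weights. A minimal sketch extending the cross-entropy loss above:

```python
import numpy as np

def regularized_loss(w, X, y, lam=0.1):
    """Cross-entropy loss plus an L2 penalty lam * ||w||^2 on the weights.
    Excluding the bias w[0] from the penalty is a common convention."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    data_loss = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return data_loss + lam * np.sum(w[1:] ** 2)
```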
Outline
• Brief review of probability
• Binary Logistic Regression
• Multi-class Logistic Regression
37
Multi-class Logistic Regression
38
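One standard way to extend the binary model to C classes is to keep one weight vector 𝒘_c per class and turn the C scores 𝒘_c^T 𝒙 into probabilities with the softmax function. A minimal NumPy sketch; the weights and input below are made-up values:

```python
import numpy as np

def softmax(scores):
    """Convert a vector of class scores w_c^T x into probabilities that sum to 1."""
    scores = scores - np.max(scores)      # subtract the max for numerical stability
    e = np.exp(scores)
    return e / e.sum()

W = np.array([[ 0.2, -1.0,  0.5],        # one row of weights per class (illustrative values)
              [ 0.0,  0.3, -0.2],
              [-0.1,  0.8,  0.1]])
x = np.array([1.0, 2.0, -1.0])            # input with a leading bias feature
print(softmax(W @ x))                     # p(y = c | x) for each class c
```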
Optimization
43
What have we learnt so far
47
Outline
• Brief review of probability
• Binary Logistic Regression
• Multi-class Logistic Regression
• Perceptron
• Multi-layer Perceptron (MLP)
• Design Issues of Neural Networks
48
Another Way to View Logistic Regression
ŷ = σ(w_0 + w_1 x_1 + w_2 x_2 + ⋯ + w_D x_D)
49
Perceptron
• Feature inputs x_d ∈ ℝ, d = 1, …, D
• Each input is associated with a connection weight w_d ∈ ℝ, d = 1, …, D
• One additional bias term w_0, also denoted b
• The output is some function (called the activation function) of the linear combination of the inputs: ŷ = s(w_0 + w_1 x_1 + w_2 x_2 + ⋯ + w_D x_D) = s(𝒘^T 𝒙), where s(·) has many choices, e.g., the sigmoid function σ(·)
• A perceptron can be used for classification and regression (a minimal forward-pass sketch follows below)
50
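A minimal sketch of a single perceptron's forward pass, assuming a sigmoid activation; the weights below are illustrative values, not learned:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, b, activation=sigmoid):
    """One perceptron: activation of the linear combination of the inputs."""
    return activation(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])   # D = 3 feature inputs
w = np.array([0.4, 0.1, -0.3])   # connection weights (illustrative values)
b = 0.2                          # bias term w0
print(perceptron(x, w, b))
```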
Perceptron: A Basic Layer
55
Perceptron: Training
56
Perceptron: Approximate Linear Functions
Example: Boolean AND
57
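As one illustration, a single perceptron with a step activation and hand-picked weights (one of many valid choices) can realize Boolean AND:

```python
def step(z):
    return 1 if z > 0 else 0

def and_perceptron(x1, x2):
    # w1 = w2 = 1, bias = -1.5: the sum exceeds the threshold only when both inputs are 1.
    return step(1.0 * x1 + 1.0 * x2 - 1.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, and_perceptron(x1, x2))
```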
Perceptron: Approximate Linear Functions
Example: Boolean XOR (not linearly separable, so a single perceptron cannot represent it)
58
Outline
• Perceptron
• Multi-layer Perceptron (MLP)
• Design Issues of Neural Networks
59
Multi-layer Perceptron
60
Multi-layer Perceptron: Example
61
Multi-layer Perceptron
[Figure: network diagram showing the input layer, a hidden layer, and the output layer]
62
Multi-layer Perceptron
63
Multi-layer Perceptron
Flexible: add hidden layers and nodes per layer to increase the complexity of the model.
Theoretically, an MLP can approximate any function (a small example follows below).
65
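To make this concrete, here is a minimal sketch of a two-layer MLP whose hand-picked weights compute XOR, the function a single perceptron cannot represent; the weights are one illustrative choice, not learned:

```python
import numpy as np

def step(z):
    return (z > 0).astype(float)

def xor_mlp(x1, x2):
    x = np.array([x1, x2], dtype=float)
    # Hidden layer: two units computing OR(x1, x2) and AND(x1, x2).
    W1 = np.array([[1.0, 1.0],
                   [1.0, 1.0]])
    b1 = np.array([-0.5, -1.5])
    h = step(W1 @ x + b1)
    # Output layer: OR minus AND gives XOR.
    w2 = np.array([1.0, -1.0])
    b2 = -0.5
    return step(np.array([w2 @ h + b2]))[0]

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, int(xor_mlp(x1, x2)))
```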
Multi-layer Perceptron
Loss for classification (e.g., cross-entropy)
Loss for regression (e.g., squared error)
How to train the model?
66
Optimization: Backpropagation
• Forward propagation to calculate the training loss
• Backpropagation to measure how much each node is "responsible" for the training loss; we then update the corresponding weights by gradient descent
67
Optimization: Backpropagation
• Not easy, especially for neural networks with complex structures.
• Fortunately, we have excellent tools (e.g., automatic-differentiation frameworks such as PyTorch or TensorFlow)!
68
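As one illustration of such tools (assuming PyTorch here; other frameworks work similarly), a small MLP can be trained without deriving any gradients by hand, since autograd performs backpropagation automatically. A minimal sketch with made-up data:

```python
import torch
import torch.nn as nn

# Tiny made-up binary-classification dataset: 4 samples, 2 features (the XOR pattern).
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

# Two-layer MLP: 2 inputs -> 4 hidden units -> 1 output probability.
model = nn.Sequential(nn.Linear(2, 4), nn.ReLU(), nn.Linear(4, 1), nn.Sigmoid())
loss_fn = nn.BCELoss()                      # cross-entropy loss for binary labels
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

for _ in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)             # forward propagation
    loss.backward()                         # backpropagation (gradients via autograd)
    optimizer.step()                        # gradient descent update

print(model(X).detach().round())            # should approximate the XOR labels
```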
Outline
• Perceptron
• Multi-layer Perceptron (MLP)
• Design Issues of Neural Networks
69
Determine number of layers and sizes
• More layers and larger layers increase the capacity of the network (i.e., the number and complexity of representable functions).
• How to avoid overfitting?
70
Determine number of layers and sizes
71
Determine Activation Function
Linear: 𝑠 𝑥 = 𝑥
• Cannot introduce non-linearity into the neural network.
73
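For reference, minimal NumPy versions of a few commonly used non-linear activations (common choices in practice; the slides may cover a different set):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # output in (0, 1)

def tanh(x):
    return np.tanh(x)                  # output in (-1, 1)

def relu(x):
    return np.maximum(0.0, x)          # 0 for negative inputs, identity otherwise

x = np.linspace(-3, 3, 7)
print(sigmoid(x))
print(tanh(x))
print(relu(x))
```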
What have we learnt so far
• Logistic regression can be viewed as a perceptron, the basic processing unit of neural networks, which can represent linear functions.
• Multi-layer perceptrons can approximate non-linear functions.
• We need to determine the number of layers, the size of each layer, and the activation function for a neural network.
78