Introduction to Machine Learning
Yifeng Tao
School of Computer Science
Carnegie Mellon University
Logistics
o Course website:
o http://www.cs.cmu.edu/~yifengt/courses/machine-learning
o Slides uploaded after lecture
o Time: Mon-Fri 9:50-11:30am lecture, 11:30am-12:00pm discussion
o Contact:
[email protected]
What is machine learning?
o What are we talking about when we talk about AI and ML?
[Figure: nested scope of artificial intelligence ⊃ machine learning ⊃ deep learning]
What is machine learning?
[Figure: machine learning builds on probability, statistics, calculus, and linear algebra, and powers applications such as natural language processing, computer vision, and computational biology]
Where are we?
o Supervised learning: linear models
o Kernel machines: SVMs and duality
o Unsupervised learning: latent space analysis and clustering
o Supervised learning: decision tree, kNN and model selection
o Learning theory: generalization and VC dimension
o Neural network (basics)
o Deep learning in CV and NLP
o Probabilistic graphical models
o Reinforcement learning and its application in clinical text mining
o Attention mechanism and transfer learning in precision medicine
What’s more after introduction?
[Figure: topics that build on introductory machine learning: deep learning, probabilistic graphical models, optimization, and learning theory]
What’s more after introduction?
o Supervised learning: linear models
o Kernel machines: SVMs and duality
o → Optimization
o Unsupervised learning: latent space analysis and clustering
o Supervised learning: decision tree, kNN and model selection
o Learning theory: generalization and VC dimension
o → Statistical machine learning
o Neural network (basics)
o Deep learning in CV and NLP
o → Deep learning
o Probabilistic graphical models
Curriculum for an ML master's/Ph.D. student at CMU
o 10701 Introduction to Machine Learning:
o http://www.cs.cmu.edu/~epxing/Class/10701/
o 36705 Intermediate Statistics:
o http://www.stat.cmu.edu/~larry/=stat705/
o 36708 Statistical Machine Learning:
o http://www.stat.cmu.edu/~larry/=sml/
o 10725 Convex Optimization:
o http://www.stat.cmu.edu/~ryantibs/convexopt/
o 10708 Probabilistic Graphical Models:
o http://www.cs.cmu.edu/~epxing/Class/10708-17/
o 10707 Deep Learning:
o https://deeplearning-cmu-10707.github.io/
o Books:
o Bishop. Pattern Recognition and Machine Learning
o Goodfellow et al. Deep Learning
Introduction to Machine Learning
Neural network (basics)
Yifeng Tao
School of Computer Science
Carnegie Mellon University
Slides adapted from Eric Xing, Maria-Florina Balcan, Russ Salakhutdinov, Matt Gormley
A Recipe for Supervised Learning
o 1. Given training data:
o 2. Choose each of these:
o Decision function
o Loss function
o 3. Define goal and train with SGD:
o (take small steps opposite the gradient)
[Slide from Matt Gormley et al.]
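A minimal NumPy sketch of this recipe; the linear decision function, squared loss, toy data, and learning rate below are illustrative choices, not ones fixed by the slide:

```python
import numpy as np

# 1. Training data: illustrative random regression set (x_i, y_i)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

# 2. Decision function f(x) = w^T x and squared loss (example choices)
w = np.zeros(3)

def loss_and_grad(w, x_i, y_i):
    pred = x_i @ w
    loss = 0.5 * (pred - y_i) ** 2
    grad = (pred - y_i) * x_i          # dLoss/dw for this single example
    return loss, grad

# 3. Train with SGD: take small steps opposite the gradient
lr = 0.1
for epoch in range(20):
    for i in rng.permutation(len(X)):
        _, g = loss_and_grad(w, X[i], y[i])
        w -= lr * g

print(w)   # weights should move toward [1.0, -2.0, 0.5]
```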
Logistic Regression
o The prediction rule:
o In this case, learning P(y|x) amounts to learning the conditional probability over two Gaussian distributions.
o Limitation: it can only capture simple data distributions.
[Slide from Eric Xing et al.]
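As a reminder of what the prediction rule looks like, a minimal sketch with made-up weights and features (the values are illustrative only):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# P(y = 1 | x) = sigmoid(w^T x + b); predict y = 1 when this exceeds 0.5,
# i.e. when w^T x + b >= 0 (a linear decision boundary).
w = np.array([0.8, -1.2])    # illustrative weights
b = 0.3
x = np.array([1.0, 0.5])

p = sigmoid(w @ x + b)
y_hat = int(p >= 0.5)
```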
Learning highly non-linear functions
o f: X → y
o f might be a non-linear function
o X: continuous or discrete variables
o y: continuous or discrete variables
[Slide from Eric Xing et al.]
From biological neuron networks to artificial neural networks
o Signals propagate through neurons in the brain.
o Signals propagate through perceptrons in an artificial neural network.
[Slide from Eric Xing et al.]
Perceptron Algorithm and SVM
o Perceptron: a simple learning algorithm for supervised classification, analyzed via geometric margins in the 1950s [Rosenblatt '57].
o Similar to the SVM, it is a linear classifier based on an analysis of margins.
o Originally introduced in the online learning setting.
o Online learning model
o Its guarantees under large margins
[Slide from Maria-Florina Balcan et al.]
The Online Learning Algorithm
o Examples arrive sequentially.
o We need to make a prediction.
o Afterwards, we observe the outcome.
o For i = 1, 2, ...:
o Applications:
o Email classification
o Recommendation systems
o Ad placement in a new market
[Slide from Maria-Florina Balcan et al.]
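A schematic of the online protocol as a loop; `predict`, `observe_outcome`, and `update` are placeholder callables for illustration, not part of the original slide:

```python
def online_learning(stream, predict, observe_outcome, update):
    """Generic online protocol: predict first, then see the truth, then update."""
    mistakes = 0
    for x in stream:                  # examples arrive sequentially
        y_hat = predict(x)            # we must commit to a prediction
        y = observe_outcome(x)        # only afterwards is the outcome revealed
        if y_hat != y:
            mistakes += 1
        update(x, y)                  # learner may adapt before the next example
    return mistakes
```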
Linear Separators: Perceptron Algorithm
o h(x) = w^T x + w_0; if h(x) ≥ 0, label x as +, otherwise label it as −
o Set t = 1, start with the all-zero vector w_1.
o Given example x, predict positive iff w_t^T x ≥ 0
o On a mistake, update as follows:
o Mistake on a positive example: w_{t+1} ← w_t + x
o Mistake on a negative example: w_{t+1} ← w_t − x
o Natural greedy procedure:
o If the true label of x is +1 and w_t is incorrect on x, we have w_t^T x < 0; after the update, w_{t+1}^T x = w_t^T x + x^T x = w_t^T x + ||x||^2, so w_{t+1} has a better chance of classifying x correctly.
o Similarly for mistakes on negative examples.
[Slide from Maria-Florina Balcan et al.]
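A minimal NumPy sketch of these updates on toy data; the bias w_0 is folded into w by appending a constant-1 feature, and the data values are made up for illustration:

```python
import numpy as np

def perceptron(X, y, epochs=10):
    """Online perceptron: X has a constant-1 column so w absorbs w_0; y in {+1, -1}."""
    w = np.zeros(X.shape[1])          # start with the all-zero vector w_1
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            pred = 1 if w @ x_i >= 0 else -1
            if pred != y_i:           # on a mistake, move w toward or away from x
                w += y_i * x_i        # +x on a false negative, -x on a false positive
    return w

# Toy linearly separable data (last column is the constant-1 bias feature)
X = np.array([[2.0, 1.0, 1.0], [1.0, 3.0, 1.0], [-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]])
y = np.array([1, 1, -1, -1])
w = perceptron(X, y)
```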
Perceptron: Example and Guarantee
o Example:
o Guarantee: If the data has margin γ and all points lie inside a ball of radius R, then the Perceptron makes ≤ (R/γ)^2 mistakes.
o Normalized margin: multiplying all points by 100, or dividing all points by 100, doesn't change the number of mistakes; the algorithm is invariant to scaling.
[Slide from Maria-Florina Balcan et al.]
Perceptron: Proof of Mistake Bound
o Guarantee: If the data has margin γ and all points lie inside a ball of radius R, then the Perceptron makes ≤ (R/γ)^2 mistakes.
o Proof:
o Idea: analyze w_t^T w* and ||w_t||, where w* is the max-margin separator with ||w*|| = 1.
o Claim 1: w_{t+1}^T w* ≥ w_t^T w* + γ. (because x^T w* ≥ γ)
o Claim 2: ||w_{t+1}||^2 ≤ ||w_t||^2 + R^2. (by the Pythagorean theorem)
o After M mistakes:
o w_{M+1}^T w* ≥ γM (by Claim 1)
o ||w_{M+1}|| ≤ R√M (by Claim 2)
o w_{M+1}^T w* ≤ ||w_{M+1}|| (since w* is unit length)
o So γM ≤ R√M, hence M ≤ (R/γ)^2.
[Slide from Maria-Florina Balcan et al.]
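Chaining the two claims makes the last step explicit:

\[
\gamma M \;\le\; w_{M+1}^{\top} w^{*} \;\le\; \|w_{M+1}\|\,\|w^{*}\| \;=\; \|w_{M+1}\| \;\le\; R\sqrt{M}
\quad\Longrightarrow\quad M \le \left(\frac{R}{\gamma}\right)^{2}.
\]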
Multilayer perceptron (MLP)
o A simple and basic type of feedforward neural network
o Contains many perceptrons that are organized into layers
o MLP "perceptrons" are not perceptrons in the strict sense
[Slide from Russ Salakhutdinov et al.]
Artificial Neuron (Perceptron)
[Slide from Russ Salakhutdinov et al.]
Activation Function
o Sigmoid activation function:
o Squashes the neuron's output between 0 and 1
o Always positive
o Bounded
o Strictly increasing
o Used in the classification output layer
o Tanh activation function:
o Squashes the neuron's output between -1 and 1
o Bounded
o Strictly increasing
o An affine transformation of the sigmoid function: tanh(x) = 2·sigmoid(2x) − 1
[Slide from Russ Salakhutdinov et al.]
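A quick numerical check of these properties, including the tanh-as-rescaled-sigmoid relation noted above; the input grid is arbitrary:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 11)
assert np.all((sigmoid(x) > 0) & (sigmoid(x) < 1))        # squashed into (0, 1), always positive
assert np.all((np.tanh(x) > -1) & (np.tanh(x) < 1))       # squashed into (-1, 1)
assert np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1)    # tanh as a rescaled, shifted sigmoid
```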
Activation Function
o Rectified linear (ReLU) activation:
o Bounded below by 0 (always non-negative)
o Tends to produce units with sparse activities
o Not upper bounded
o Monotonically non-decreasing (flat at 0 for negative inputs)
o Most widely used activation function
o Advantages:
o Biological plausibility
o Sparse activation
o Better gradient propagation: avoids the vanishing gradients of sigmoidal activations
[Slide from Russ Salakhutdinov et al.]
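A minimal ReLU sketch illustrating the sparsity point; the random pre-activations are made up for illustration:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)
pre_activations = rng.normal(size=1000)   # hypothetical pre-activation values
h = relu(pre_activations)

print((h == 0).mean())   # roughly half the units are exactly zero -> sparse activity
print(h.min())           # never negative: bounded below by 0, unbounded above
```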
Activation Function in AlexNet
o A four-layer convolutional neural network
o ReLU: solid line
o Tanh: dashed line
[Slide from https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf]
Single Hidden Layer MLP
[Slide from Russ Salakhutdinov et al.]
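A forward-pass sketch for a single-hidden-layer MLP; the layer sizes, random weights, and the tanh/softmax choices are illustrative assumptions, not the slide's exact network:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input dim 3 -> 4 hidden units
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # 4 hidden units -> 2 output classes

def forward(x):
    h = np.tanh(W1 @ x + b1)                    # hidden layer: affine map + nonlinearity
    z = W2 @ h + b2                             # output pre-activations
    z = z - z.max()                             # shift for numerical stability
    return np.exp(z) / np.exp(z).sum()          # softmax over the two classes

probs = forward(np.array([0.5, -1.0, 2.0]))     # class probabilities summing to 1
```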
Capacity of MLP
o Consider a single layer neural network
[Slide from Russ Salakhutdinov et al.]
Capacity of Neural Nets
o Consider a single layer neural network
[Slide from Russ Salakhutdinov et al.]
MLP with Multiple Hidden Layers
[Slide from Russ Salakhutdinov et al.]
Capacity of Neural Nets
o Deep learning playground
[Slide from https://playground.tensorflow.org]
Training a Neural Network
[Slide from Russ Salakhutdinov et al.]
Stochastic Gradient Descent
[Slide from Russ Salakhutdinov et al.]
Mini-batch SGD
o Make updates based on a mini-batch of examples (instead of a single example)
o The gradient is that of the average regularized loss over the mini-batch
o Can give a more accurate (less noisy) estimate of the gradient
o Can leverage matrix-matrix operations, which are more efficient
[Slide from Russ Salakhutdinov et al.]
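A mini-batch SGD sketch; `grad_fn`, the batch size, and the learning rate are illustrative assumptions, not values fixed by the slide:

```python
import numpy as np

def minibatch_sgd(X, y, grad_fn, w, lr=0.1, batch_size=32, epochs=10, seed=0):
    """Update w using the average gradient over each mini-batch."""
    rng = np.random.default_rng(seed)
    n = len(X)
    for _ in range(epochs):
        idx = rng.permutation(n)                      # reshuffle each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            # average gradient over the mini-batch: less noisy than a single example
            g = np.mean([grad_fn(w, X[i], y[i]) for i in batch], axis=0)
            w = w - lr * g
    return w

# Example usage with a squared-loss gradient for a linear model (illustrative)
squared_loss_grad = lambda w, x_i, y_i: (x_i @ w - y_i) * x_i
```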
Backpropagation
o A method used to compute the gradients needed to train neural networks by gradient descent
o Essentially an implementation of the chain rule plus dynamic programming
o The derivative of the last two terms:
[Slide from https://en.wikipedia.org/wiki/Backpropagation]
Backpropagation
o If o_j is an output unit → straightforward
o Else:
[Slide from https://en.wikipedia.org/wiki/Backpropagation]
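A hand-rolled forward and backward pass for a tiny two-layer network with squared error, written in matrix form rather than the per-unit o_j notation; all sizes and values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x, t = rng.normal(size=3), np.array([1.0])      # input and target
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(1, 4))

# Forward pass, caching intermediate values for reuse in the backward pass
z1 = W1 @ x
h = np.tanh(z1)
yhat = W2 @ h
loss = 0.5 * np.sum((yhat - t) ** 2)

# Backward pass: apply the chain rule layer by layer, reusing cached activations
d_yhat = yhat - t                               # dL/dyhat
dW2 = np.outer(d_yhat, h)                       # dL/dW2
d_h = W2.T @ d_yhat                             # propagate the error to the hidden layer
d_z1 = d_h * (1 - np.tanh(z1) ** 2)             # through the tanh nonlinearity
dW1 = np.outer(d_z1, x)                         # dL/dW1
```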
Weight Decay
[Slide from Russ Salakhutdinov et al.]
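The weight decay idea in one update step; the penalty strength and the example vectors are illustrative:

```python
import numpy as np

def sgd_step_with_weight_decay(w, grad, lr=0.1, lam=1e-4):
    """One SGD step with an L2 penalty (lam/2)*||w||^2 added to the loss."""
    return w - lr * (grad + lam * w)   # the extra lam*w term shrinks weights toward 0

w = np.array([1.0, -2.0, 0.5])
g = np.array([0.2, 0.1, -0.3])         # illustrative data-loss gradient
w = sgd_step_with_weight_decay(w, g)
```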
Optimization: Momentum
o Momentum: can use an exponential average of previous gradients:
o Can get past plateaus more quickly by "gaining momentum"
o Works well in regions with an ill-conditioned Hessian matrix
o SGD without momentum vs. SGD with momentum (animations)
[Slide from http://ruder.io/optimizing-gradient-descent/]
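A momentum update sketch; beta and the learning rate are typical but illustrative values:

```python
import numpy as np

def momentum_step(w, v, grad, lr=0.01, beta=0.9):
    """Keep an exponential average v of past gradients and step along it."""
    v = beta * v + grad        # accumulated "velocity"
    w = w - lr * v             # step along the smoothed direction
    return w, v

w, v = np.array([1.0, -1.0]), np.zeros(2)
g = np.array([0.5, -0.2])      # illustrative gradient
w, v = momentum_step(w, v, g)
```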
Momentum-based Optimization
o Nesterov accelerated gradient (NAG):
o Adagrad:
o Smaller updates for parameters associated with frequently occurring features
o Larger updates for parameters associated with infrequent features
o RMSprop and Adadelta:
o Reduce the aggressive, monotonically decreasing learning rate of Adagrad
o Adam: combines RMSprop-style scaling with momentum
[Slide from http://ruder.io/optimizing-gradient-descent/]
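A sketch of the per-parameter scaling behind Adagrad and Adam; the hyperparameters are commonly quoted defaults, used here only for illustration:

```python
import numpy as np

def adagrad_step(w, cache, grad, lr=0.01, eps=1e-8):
    """Adagrad: accumulate squared gradients; rarely-updated parameters keep larger steps."""
    cache = cache + grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

def adam_step(w, m, v, grad, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: momentum-style first moment + RMSprop-style second moment, bias-corrected."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)   # bias-corrected estimates
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# One illustrative Adam step (t starts at 1)
w, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
w, m, v = adam_step(w, m, v, grad=np.array([0.1, -0.4, 0.05]), t=1)
```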
Demo of Optimization Methods
[Slide from http://ruder.io/optimizing-gradient-descent/]
Take home message
o The perceptron is an online linear classifier
o A multilayer perceptron consists of perceptron-like units with various activation functions
o Backpropagation computes the gradients of a neural network in a backward pass, combining the chain rule with dynamic programming
o Momentum-based mini-batch gradient descent methods are used to optimize neural networks
o What's next?
o Regularization in neural networks
o Widely used NN architectures in practice
References
o Eric Xing, Tom Mitchell. 10701 Introduction to Machine Learning: http://www.cs.cmu.edu/~epxing/Class/10701-06f/
o Barnabás Póczos, Maria-Florina Balcan, Russ Salakhutdinov. 10715 Advanced Introduction to Machine Learning: https://sites.google.com/site/10715advancedmlintro2017f/lectures
o Matt Gormley. 10601 Introduction to Machine Learning: http://www.cs.cmu.edu/~mgormley/courses/10601/index.html
o Wikipedia