
Introduction to Deep Learning:

a 2-week lecture
Part 1

Presented by: Dra. Jeaneth Machicao, October 2020


[email protected]
Course overview: STAT 453: Deep Learning, Spring 2020
by Prof. Sebastian Raschka

Part 1: Introduction
● Introduction to deep learning
● The brief history of deep learning
● Single-layer neural networks: the perceptron
● Motivation: use cases
● Hands-on

Part 2: Mathematical and computational foundations
● Linear algebra and calculus for deep learning
● Parameter optimization with gradient descent
● Automatic differentiation & PyTorch

Part 3: Introduction to neural networks
● Multinomial logistic regression
● Multilayer perceptrons
● Regularization
● Input normalization and weight initialization
● Learning rates and advanced optimization algorithms

Part 4: DL for computer vision and language modeling
● Introduction to convolutional neural networks 1-2
  ○ CNN architectures illustrated
● Introduction to recurrent neural networks 1-2

Part 5: Deep generative models
● Autoencoders
● Autoregressive models
● Variational autoencoders
● Normalizing flow models
● Generative adversarial networks
● Evaluating generative models

http://stat.wisc.edu/~sraschka/teaching/stat453-ss2020/
https://github.com/rasbt/stat453-deep-learning-ss20

• Course playlists on YouTube:
Prof. Dalcimar Casanova
https://www.youtube.com/watch?v=0VD_2t6EdS4&list=PL9At2PVRU0ZqVArhU9QMyI3jSe113_m2-
Prof. Sebastian Raschka
https://www.youtube.com/watch?v=e_I0q3mmfw4&list=PLTKMiZHVd_2JkR6QtQEnml7swCnFBtq4P
Overview of our 2-week lecture!

1: Introduction
● Introduction to deep learning
● The brief history of deep learning
● Single-layer neural networks: the perceptron
● Motivation: use cases
● Hands-on (report)

2: Mathematical and computational foundations
● Linear algebra and calculus for deep learning
● Parameter optimization with gradient descent
● Automatic differentiation & PyTorch

3: Introduction to neural networks
● Multinomial logistic regression
● Multilayer perceptrons
● Regularization
● Input normalization and weight initialization
● Learning rates and advanced optimization algorithms

4: DL for computer vision and language modeling
● Introduction to convolutional neural networks 1-2
  ○ CNN architectures illustrated
● Introduction to recurrent neural networks 1-2
● Deliver report of the hands-on

http://stat.wisc.edu/~sraschka/teaching/stat453-ss2020/
https://github.com/rasbt/stat453-deep-learning-ss20

• Course playlists on YouTube: Prof. Dalcimar Casanova, Prof. Sebastian Raschka

5
Deep learning (DL) - A little of history (I)
● LeNet: the classic CNN, born in the early 1990s (Y. LeCun, 1998)
● Predecessor: the Neocognitron

Source: Rayner Montes

6
Deep learning (DL) - A little of history (II)
Big paper: “ImageNet Classification with Deep Convolutional Neural Networks” (Alex Krizhevsky, 2012)

● Task: object classification over 1000 classes, with millions of training images (the ImageNet competition)
● AlexNet was much better than all state-of-the-art methods.

Source: Rayner Montes

7
Why did it take so long?
From the early 90s to 2012
Answer: no GPUs

9
Why GPU? (I)
More layers and more training samples mean more execution time.

Source: Rayner Montes
Why GPU? (II)

[Chart: success rate vs. amount of training samples, with curves for a big NN, a medium NN, a small NN, and classic ML]

Source: Rayner Montes
11
Object Classification - Classic Machine Learning

[Pipeline: Feature Extraction → ML Algorithm]

Source: Rayner Montes

12
Object Classification - Classic ML

[Pipeline: Feature Extraction → ML Algorithm, with scatter plots of the samples in the (x1, x2) feature space]

Source: Rayner Montes

13
Object Classification - Classic ML

[Pipeline: Feature Extraction → ML Algorithm; an unknown sample ("??") is placed in the (x1, x2) feature space]

Source: Rayner Montes

14
Object Classification - Classic ML

[Pipeline: Feature Extraction → ML Algorithm]

● Results depend heavily on the feature-extraction phase.
● Features are designed by an expert.

Source: Rayner Montes

16
https://paperswithcode.com/sota

https://github.com/terryum/awesome-deep-learning-papers

State-of-the-art

Computer Vision
● Image Segmentation
● Image Classification
● Object Detection
● Image Generation

Speech
● Speech Recognition
● Speech Synthesis
● Speech Enhancement
● Speaker Verification

Music
● Music Generation
● Music Information Retrieval
● Music Source Separation
● Music Modeling

Natural Language Processing
● Machine Translation
● Question Answering
● Sentiment Analysis
● Text Classification

Time Series
● Imputation
● Time Series Classification
● Time Series Forecasting
● Gesture Recognition

Computer Code
● Dimensionality Reduction
● Feature Selection
● Code Generation
● Program Synthesis

Medical
● Medical Image Segmentation
● Drug Discovery
● Lesion Segmentation
● Brain Tumor Segmentation

Audio
● Music Generation
● Audio Classification
● Audio Generation
● Sound Event Detection

Playing Games
● Atari Games
● Continuous Control
● Starcraft
● Real-Time Strategy Games

17
Computer Vision
● Image Segmentation; Image Classification
● Object Detection; Image Generation
● Super-Resolution; Autonomous Vehicles
● Video...

Natural Language Processing
● Machine Translation
● Question Answering
● Sentiment Analysis
● Text Classification
● Representation Learning
● Word Embeddings

18
Some Applications Of Machine Learning/Deep Learning

19
AI in marketing & sales: Propensity to buy

The problem
● A lack of knowledge about a customer’s propensity to buy.
● “Propensity to buy” is the likelihood of a customer to purchase a particular product.

What can be achieved?
● Classify potential customers by their likelihood to purchase a particular product.
● This can be integrated into marketing and sales strategies.

Opportunity for DL
● Model using a combination of semantic analysis of:
  ○ Text written by the customer,
  ○ Demographic information,
  ○ Purchase history,
  ○ Information about how they navigate the website
  to make a prediction for that customer’s propensity to buy.

Data requirements
● A model like this would need historical data of demographics and pre-purchase behavior of customers, linked to whether a purchase was made.

Source: https://peltarion.com/use-cases/propensity-to-buy 20
Using AI to detect fraud

The problem
● Globally, fraud costs ~£3.24tn.
● Classically done by rules-based algorithms:
  ○ typically complicated and not always very hard to circumvent.

What can be achieved?
● The improved accuracy promises substantial cost reduction for many industries and sectors.

Opportunity for DL
● Detect complicated underlying patterns from seemingly unrelated information.
● Ability to continuously learn and evolve to remain up to date with a dynamic environment.

Data requirements
● Historical data of demographics and pre-purchase behavior of customers, labeled
  ○ fraudulent or normal.

Source: https://peltarion.com/use-cases/fraud-detection 21
Automated defect detection

The problem
● Product quality testing is slow and inefficient (bottlenecks).
● Traditional automated systems are both expensive and difficult to implement.

What can be achieved?
● AI to reduce production cost and improve speed and accuracy.

Opportunity for DL
● DL for a fully automated production line, enabling more accurate analysis of the quality of each individual part.

Data requirements
● Trained on images of manufactured parts, labeled
  ○ defective or non-defective.
● Cameras mounted on the production line feed images to the model.

Source: https://peltarion.com/use-cases/defect-detection 22
Audio analysis for industrial maintenance

A key part of smart manufacturing and a modern factory approach involves real-time monitoring of machinery operating conditions.

What can be achieved?
● DL to detect malfunctioning machinery in real time will lead to increased productivity and decreased costs.

Data requirements
● Audio recordings, labeled
  ○ functioning or malfunctioning machinery.
● Microphones mounted in key parts of each machine.

[Figure: Mel-spectrogram of an industrial solenoid valve]

Source: https://peltarion.com/use-cases/machinery-operating-conditions 23
Improving customer service through sentiment

The problem
● Frustration associated with bad experiences can have a significant impact on customer retention.
● Automated customer service phone calls and chatbots are becoming increasingly easy to interact with.

Opportunity for DL
● Natural language processing (NLP) is ideal for gaining insight into the user experience in customer service interactions.

Data requirements
● Text or audio from historical examples of
  ○ successful and unsuccessful automated customer service interactions.

Source: https://peltarion.com/use-cases/customer-service-sentiment-analysis 24
Main researchers in Deep Learning
● Samy Bengio https://research.google.com/pubs/bengio.html
● Yoshua Bengio http://www.iro.umontreal.ca/~bengioy/yoshua_en/research.html
● Thomas Dean https://research.google.com/pubs/author189.html
● Jeffrey Dean https://research.google.com/pubs/jeff.html
● Nando de Freitas https://www.cs.ox.ac.uk/people/nando.defreitas/
● Geoff Hinton http://www.cs.toronto.edu/~hinton/
● Yann LeCun http://yann.lecun.com/
● Andrew Ng http://www.andrewng.org/
● Quoc Le, Honglak Lee, Tommy Poggio, ...

25
Resources
● Aurélien Géron. Hands-On Machine Learning with Scikit-Learn and TensorFlow. 2017 (✰✰✰✰✰)
● François Chollet. Deep Learning with Python. 2017 (✰✰✰✰)
○ Practitioner’s approach. Keras implementation per topic
● Ian Goodfellow and Yoshua Bengio and Aaron Courville. Deep Learning (Adaptive Computation and Machine
Learning series). 2015 (✰✰)
○ Theoretical book. There is no code covered in the book.
● Michael Nielsen. Neural Networks and Deep Learning
○ Theory-based learning approach. Some code snippets.
● Gulli and Kapoor. TensorFlow Deep Learning Cookbook.
○ Lots of code and explanations of what the code is doing
● Adrian Rosebrock. Deep Learning for Computer Vision with Python.
● Sandro Skansi. Introduction to Deep Learning: From Logical Calculus to Artificial Intelligence. 2018
● Andriy Burkov. The Hundred-Page Machine Learning Book.
○ Started as a challenge the author accepted
● Andrew Ng. Machine Learning Yearning: Technical strategy for AI engineers, in the era of Deep Learning.
● Yaser S. Abu-Mostafa, Malik Magdon-Ismail, Hsuan-Tien Lin. Learning from Data: A short course.
○ Supplement with lectures and videos.

26
Libraries for Deep Learning

27
Lecture 01

What Are Machine Learning And Deep Learning? An Overview.

STAT 453: Introduction to Deep Learning and Generative Models
Spring 2020

Sebastian Raschka
http://stat.wisc.edu/~sraschka/teaching/stat453-ss2020/

Sebastian Raschka STAT 453: Intro to Deep Learning SS 2020 1


The 3 Broad Categories Of ML (And DL)

Supervised Learning
● Labeled data
● Direct feedback
● Predict outcome/future

Unsupervised Learning
● No labels/targets
● No feedback
● Find hidden structure in data

Reinforcement Learning
● Decision process
● Reward system
● Learn series of actions

Source: Raschka and Mirjalili (2019). Python Machine Learning, 3rd Edition
Sebastian Raschka STAT 453: Intro to Deep Learning SS 2020 29
Machine Learning
Terminology and Notation

(Again, this also applies to DL)

1/5 -- What Is Machine Learning?
2/5 -- The 3 Broad Categories of ML
3/5 -- Machine Learning Terminology and Notation
4/5 -- Machine Learning Modeling Pipeline
5/5 -- The Practical Aspects: Our Tools!

Sebastian Raschka STAT 453: Intro to Deep Learning SS 2020 46


Machine Learning Jargon 1/2

• supervised learning:
learn a function to map input x (features) to output y (targets)

• structured data:
databases, spreadsheets/CSV files

• unstructured data:
features like image pixels, audio signals, text sentences
(prior to DL, extensive feature engineering was required)

Sebastian Raschka STAT 453: Intro to Deep Learning SS 2020 47


Supervised Learning (More Formal Notation)

Training set ("training examples"): 𝒟 = {⟨x[i], y[i]⟩, i = 1, …, n}

Unknown function: f(x) = y

Hypothesis: h(x) = ŷ   (sometimes t or o is used instead)

Classification: h : ℝᵐ → 𝒴, 𝒴 = {1, ..., k}
Regression: h : ℝᵐ → ℝ

Sebastian Raschka STAT 453: Intro to Deep Learning SS 2020 48
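The notation above can be made concrete with two toy hypotheses, a minimal pure-Python sketch (the function names, weights, and data are our own illustration, not from the slides):

```python
# A hypothesis h maps a feature vector x in R^m to a prediction y_hat.

def h_classification(x):
    """Toy classifier h : R^m -> {0, 1}: predict 1 if the feature sum is positive."""
    return 1 if sum(x) > 0 else 0

def h_regression(x):
    """Toy linear regressor h : R^m -> R with fixed (made-up) weights."""
    w, b = [0.5, -0.25], 1.0
    return sum(wi * xi for wi, xi in zip(w, x)) + b

y_hat_cls = h_classification([2.0, -1.0])   # sum = 1.0 > 0, so predicts 1
y_hat_reg = h_regression([2.0, 4.0])        # 0.5*2 - 0.25*4 + 1 = 1.0
```

Learning, in this notation, means searching for an h whose predictions ŷ match the targets y.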


Data Representation

Feature vector:

x = [x1, x2, …, xm]ᵀ

Design matrix (one row per training example, one column per feature):

    ⎡ x[1]ᵀ ⎤   ⎡ x1[1] x2[1] ⋯ xm[1] ⎤
X = ⎢ x[2]ᵀ ⎥ = ⎢ x1[2] x2[2] ⋯ xm[2] ⎥
    ⎢   ⋮   ⎥   ⎢   ⋮     ⋮   ⋱   ⋮   ⎥
    ⎣ x[n]ᵀ ⎦   ⎣ x1[n] x2[n] ⋯ xm[n] ⎦

Sebastian Raschka STAT 453: Intro to Deep Learning SS 2020 50
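A design matrix like the one above can be sketched with nested lists (a minimal illustration; the data values are made up):

```python
# n = 3 training examples (rows), m = 2 features (columns).
X = [
    [5.1, 3.5],   # x[1]
    [4.9, 3.0],   # x[2]
    [6.2, 2.9],   # x[3]
]

n = len(X)      # number of training examples
m = len(X[0])   # number of features

# Entry xj[i]: feature j of training example i (1-based in the slides):
x2_of_example_3 = X[3 - 1][2 - 1]   # feature 2 of example 3 -> 2.9
```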


Data Representation (structured data)

m= _____

n= _____

Sebastian Raschka STAT 453: Intro to Deep Learning SS 2020 51


Data Representation (unstructured data; images)

"traditional methods"

Sebastian Raschka STAT 453: Intro to Deep Learning SS 2020 52


Data Representation (unstructured data; images)
Convolutional Neural Networks

Image batch dimensions: torch.Size([128, 1, 28, 28])   ("NCHW" representation; more on that later)
Image label dimensions: torch.Size([128])

Sebastian Raschka STAT 453: Intro to Deep Learning SS 2020 53
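"NCHW" orders the axes as [batch N, channels C, height H, width W]. A minimal sketch of that layout using nested lists instead of torch tensors (the sizes mirror the slide, except a smaller batch for speed):

```python
N, C, H, W = 4, 1, 28, 28   # batch, channels, height, width

# Build a zero-filled image batch with NCHW layout.
batch = [[[[0.0 for _ in range(W)] for _ in range(H)]
          for _ in range(C)] for _ in range(N)]

# Pixel (row 5, col 7) of channel 0 of image 2 in the batch:
batch[2][0][5][7] = 1.0

shape = (len(batch), len(batch[0]), len(batch[0][0]), len(batch[0][0][0]))
# shape -> (4, 1, 28, 28), analogous to torch.Size([4, 1, 28, 28])
```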


Machine Learning Jargon 2/2

• Training example, synonymous to:
observation, training record, training instance, training sample (in some contexts, "sample" refers to a collection of training examples)
• Feature, synonymous to:
predictor, variable, independent variable, input, attribute, covariate
• Target, synonymous to:
outcome, ground truth, output, response variable, dependent variable, (class) label (in classification)
• Output / Prediction: use this to distinguish from targets; here, it means the output from the model

• Use loss L for a single training example
• Use cost C for the average loss over the training set
• Use ϕ(⋅), unless noted otherwise, for the activation function
(this will make more sense later)

Sebastian Raschka STAT 453: Intro to Deep Learning SS 2020 54
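The loss-vs.-cost convention above can be sketched in a few lines (using squared-error loss as an example; the function names and numbers are our own):

```python
def loss(y, y_hat):
    """Loss L for a single training example (squared error here)."""
    return (y - y_hat) ** 2

def cost(ys, y_hats):
    """Cost C: the average loss over the training set."""
    return sum(loss(y, y_hat) for y, y_hat in zip(ys, y_hats)) / len(ys)

targets     = [1.0, 0.0, 1.0]
predictions = [0.8, 0.2, 0.6]
C = cost(targets, predictions)   # (0.04 + 0.04 + 0.16) / 3 = 0.08
```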


Machine Learning
Modeling Pipeline

(Like before, this also applies to DL)

1/5 -- What Is Machine Learning?
2/5 -- The 3 Broad Categories of ML
3/5 -- Machine Learning Terminology and Notation
4/5 -- Machine Learning Modeling Pipeline
5/5 -- The Practical Aspects: Our Tools!

Sebastian Raschka STAT 453: Intro to Deep Learning SS 2020 55


Supervised Learning Workflow

[Diagram: Training Data + Labels → Machine Learning Algorithm → Predictive Model; New Data → Predictive Model → Prediction]

Sebastian Raschka STAT 453: Intro to Deep Learning SS 2020 56
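The workflow above maps onto a fit/predict interface. A minimal pure-Python sketch (the toy classifier and its names follow scikit-learn-style conventions but are our own illustration):

```python
class NearestMeanClassifier:
    """Learns the mean feature value per class; predicts the nearest class."""

    def fit(self, X, y):                       # training data + labels -> model
        per_class = {}
        for xi, yi in zip(X, y):
            per_class.setdefault(yi, []).append(sum(xi) / len(xi))
        self.mean_ = {c: sum(v) / len(v) for c, v in per_class.items()}
        return self

    def predict(self, X):                      # new data -> prediction
        return [min(self.mean_,
                    key=lambda c: abs(self.mean_[c] - sum(xi) / len(xi)))
                for xi in X]

model = NearestMeanClassifier().fit(
    [[1.0, 1.2], [0.9, 1.1], [3.0, 3.2]],      # training data
    [0, 0, 1],                                 # labels
)
pred = model.predict([[3.1, 2.9]])             # new data -> [1]
```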


Supervised Learning Workflow (more detailed)

Preprocessing (mostly not needed in DL):
● Feature Extraction and Scaling
● Feature Selection
● Dimensionality Reduction
● Sampling

Learning:
● Model Selection
● Cross-Validation
● Performance Metrics
● Hyperparameter Optimization

[Diagram: Raw Data → Preprocessing → Training Dataset + Labels and Test Dataset + Labels → Learning Algorithm → Evaluation → Final Model → Prediction on New Data]

Source: Raschka and Mirjalili (2019). Python Machine Learning, 3rd Edition
Sebastian Raschka STAT 453: Intro to Deep Learning SS 2020 57
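The training/test split in the detailed workflow can be sketched as a simple holdout split (a minimal illustration; the helper name and 80/20 ratio are our own, echoing scikit-learn's convention):

```python
import random

def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle the indices, then hold out a fraction of the data for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(X) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx])

X = [[float(i)] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, y_train, X_test, y_test = train_test_split(X, y)
# len(X_train) -> 8, len(X_test) -> 2
```

Evaluation on the held-out test set estimates how the final model will do on genuinely new data.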
Lecture 05

Fitting Neurons with Gradient Descent

STAT 453: Deep Learning, Spring 2020
Sebastian Raschka
http://stat.wisc.edu/~sraschka/teaching/stat453-ss2020/

Sebastian Raschka STAT 453: Intro to Deep Learning and Generative Models SS 2020 1
Perceptron Recap

Inputs x1, x2, …, xm are weighted by w1, w2, …, wm and combined with the bias b to give the net input

z = Σᵢ₌₁ᵐ xᵢwᵢ + b = xᵀw + b,

and the threshold activation

σ(z) = 1, if z ≥ 0; σ(z) = 0, if z < 0

produces the output ŷ = σ(z).

Let 𝒟 = {⟨x[i], y[i]⟩, i = 1, …, n} ∈ (ℝᵐ × {0,1})ⁿ

1. Initialize w := 0 ∈ ℝᵐ, b := 0
2. For every training epoch:
   A. For every ⟨x[i], y[i]⟩ ∈ 𝒟:
      (a) ŷ[i] := σ(x[i]ᵀw + b)      (compute output/prediction)
      (b) err := (y[i] − ŷ[i])       (calculate error)
      (c) w := w + err × x[i], b := b + err   (update parameters)

Sebastian Raschka STAT 453: Intro to Deep Learning and Generative Models SS 2020 8
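The perceptron learning rule above translates directly into runnable Python (a minimal sketch; the function names and the toy dataset are our own illustration, not from the slides):

```python
def sigma(z):
    """Threshold activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def train_perceptron(D, epochs=10):
    """D is a list of (x, y) pairs with x a feature list and y in {0, 1}."""
    m = len(D[0][0])
    w, b = [0.0] * m, 0.0                      # 1. initialize w := 0, b := 0
    for _ in range(epochs):                    # 2. for every training epoch
        for x, y in D:                         #    A. for every <x[i], y[i]> in D
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            y_hat = sigma(z)                   # (a) compute output (prediction)
            err = y - y_hat                    # (b) calculate error
            w = [wi + err * xi for wi, xi in zip(w, x)]   # (c) update w
            b = b + err                        #     ... and b
    return w, b

# A linearly separable toy problem (logical AND of two binary inputs):
D = [([0.0, 0.0], 0), ([1.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 1.0], 1)]
w, b = train_perceptron(D, epochs=20)
preds = [sigma(sum(wi * xi for wi, xi in zip(w, x)) + b) for x, _ in D]
# preds -> [0, 0, 0, 1]
```

Because the data is linearly separable, the perceptron convergence theorem guarantees this loop stops making updates after finitely many epochs.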
General Learning Principle

Let D = {⟨x[1], y[1]⟩, ⟨x[2], y[2]⟩, ..., ⟨x[n], y[n]⟩} ∈ (ℝᵐ × {0, 1})ⁿ

"On-line" mode

1. Initialize w := 0 ∈ ℝᵐ, b := 0
2. For every training epoch:
    A. For every ⟨x[i], y[i]⟩ ∈ D:
        (a) Compute output (prediction)
        (b) Calculate error
        (c) Update w, b

This applies to all common neuron models and (deep) neural network
architectures!

There are some variants of it, namely the "batch mode" and the
"minibatch mode", which we will briefly go over in the next slides
and then discuss in more detail later.
General Learning Principle

Let D = {⟨x[1], y[1]⟩, ..., ⟨x[n], y[n]⟩} ∈ (ℝᵐ × {0, 1})ⁿ

Minibatch mode (mix between on-line and batch)

1. Initialize w := 0 ∈ ℝᵐ, b := 0
2. For every training epoch:
    A. Initialize Δw := 0, Δb := 0
    B. For every {⟨x[i], y[i]⟩, ..., ⟨x[i+k], y[i+k]⟩} ⊂ D:
        (a) Compute output (prediction)
        (b) Calculate error
        (c) Update Δw, Δb
    C. Update w := w + Δw, b := b + Δb

Most commonly used in DL, because
1. choosing a subset (vs. 1 example at a time) takes advantage of
   vectorization, so each iteration through an epoch is faster than
   in on-line mode;
2. having fewer updates than "on-line" mode makes the updates less noisy;
3. it makes more updates per epoch than "batch" mode and is thus faster.
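The minibatch scheme can be sketched for a linear neuron with squared-error loss. This is a minimal sketch under my own assumptions: the function name `train_minibatch`, the learning rate, the batch size, and the noiseless toy data are all choices of mine, not from the slides.

```python
import numpy as np

def train_minibatch(X, y, lr=0.1, batch_size=2, num_epochs=200):
    """Minibatch training of a linear neuron with squared-error loss:
    accumulate the update over a small batch (delta_w, delta_b),
    then apply it once per batch."""
    rng = np.random.default_rng(0)
    w, b = np.zeros(X.shape[1]), 0.0
    n = len(X)
    for _ in range(num_epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            # A. initialize the accumulated update
            delta_w, delta_b = np.zeros_like(w), 0.0
            for i in batch:
                # (a) compute output, (b) calculate error, (c) accumulate update
                err = y[i] - (X[i] @ w + b)
                delta_w += lr * err * X[i]
                delta_b += lr * err
            # C. apply the averaged, accumulated update once per batch
            w += delta_w / len(batch)
            b += delta_b / len(batch)
    return w, b

# Toy data from the noiseless model y = 2*x1 - 1*x2 + 0.5
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5
w, b = train_minibatch(X, y)
```

In practice the inner per-example loop would itself be vectorized over the batch (that is the whole point of minibatching); it is spelled out here to mirror the accumulate-then-apply structure of the pseudocode.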
Linear Regression

Perceptron: the activation function is the threshold function;
the output is a binary label ŷ ∈ {0, 1}

[Figure: single-neuron diagram. The inputs x1, x2, ..., xm, weighted by
w1, w2, ..., wm, together with the bias b, feed into the net input Σ,
which is passed through the activation σ to produce the output.]

You can think of linear regression as a linear neuron!

Linear Regression: the activation function is the identity function,
σ(x) = x; the output is a real number ŷ ∈ ℝ
(Least-Squares) Linear Regression, fitted iteratively

• A very naive way to fit a linear regression model (and any neural net)
  is to start by initializing the parameters to 0's or small random values
• Then, for k rounds:
  • Choose another random set of weights
  • If the model performs better, keep those weights
  • If the model performs worse, discard the weights

There's a better way!

• We analyze what effect a change of a parameter has on the predictive
  performance (loss) of the model; then, we change the weight a little
  bit in the direction that improves the performance (minimizes the loss)
  the most
• We repeat this in several (small) steps until the loss does not
  decrease any further

Sebastian Raschka STAT 453: Intro to Deep Learning and Generative Models SS 2020 20
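The naive random-search procedure can be written down directly. A toy sketch (the function names `loss` and `random_search_fit`, the Gaussian proposal scale, and the toy data are all my own choices, not from the lecture):

```python
import numpy as np

def loss(w, b, X, y):
    """Mean squared error of the linear model yhat = X @ w + b."""
    return np.mean((X @ w + b - y) ** 2)

def random_search_fit(X, y, k=2000, seed=1):
    """Naive fitting: propose random parameters for k rounds and
    keep a candidate only if it lowers the loss."""
    rng = np.random.default_rng(seed)
    best_w, best_b = np.zeros(X.shape[1]), 0.0
    best_loss = loss(best_w, best_b, X, y)
    for _ in range(k):
        cand_w = rng.normal(scale=2.0, size=X.shape[1])
        cand_b = rng.normal(scale=2.0)
        cand_loss = loss(cand_w, cand_b, X, y)
        if cand_loss < best_loss:  # better: keep; worse: discard
            best_w, best_b, best_loss = cand_w, cand_b, cand_loss
    return best_w, best_b, best_loss

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X[:, 0] + 1.0   # noiseless data from w = 2, b = 1
w, b, final_loss = random_search_fit(X, y)
init_loss = loss(np.zeros(1), 0.0, X, y)
```

This does lower the loss on a one-dimensional toy problem, but it wastes almost every proposal; in higher dimensions the chance of randomly hitting a good weight vector collapses, which is why the gradient-based "better way" is needed.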
(Least-Squares) Linear Regression

The update rule turns out to be this ("on-line" mode):

Perceptron learning rule:

1. Initialize w := 0 ∈ ℝᵐ, b := 0
2. For every training epoch:
    A. For every ⟨x[i], y[i]⟩ ∈ D:
        (a) ŷ[i] := σ(x[i]ᵀw + b)
        (b) err := (y[i] − ŷ[i])
        (c) w := w + err × x[i]
            b := b + err

Stochastic gradient descent:

1. Initialize w := 0 ∈ ℝᵐ, b := 0
2. For every training epoch:
    A. For every ⟨x[i], y[i]⟩ ∈ D:
        (a) ŷ[i] := x[i]ᵀw + b
        (b) ∇w L = −(y[i] − ŷ[i]) x[i]
            ∇b L = −(y[i] − ŷ[i])
        (c) w := w + η × (−∇w L)
            b := b + η × (−∇b L)

where η is the learning rate and −∇L is the negative gradient.
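The stochastic gradient descent rule translates almost line for line into NumPy. A minimal sketch, assuming the per-example loss L = ½(y[i] − ŷ[i])²; the function name `train_sgd`, the learning rate, and the toy data are my own choices:

```python
import numpy as np

def train_sgd(X, y, lr=0.05, num_epochs=300, seed=0):
    """On-line stochastic gradient descent for least-squares linear
    regression with per-example loss L = 1/2 * (y[i] - yhat[i])**2."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(num_epochs):
        for i in rng.permutation(len(X)):
            y_hat = X[i] @ w + b              # (a) prediction (identity activation)
            grad_w = -(y[i] - y_hat) * X[i]   # (b) gradient w.r.t. w
            grad_b = -(y[i] - y_hat)          #     gradient w.r.t. b
            w += lr * -grad_w                 # (c) step along the negative gradient
            b += lr * -grad_b
    return w, b

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = 3.0 * X[:, 0] - 2.0   # noiseless data from w = 3, b = -2
w, b = train_sgd(X, y)
```

The double negation in step (c) is kept on purpose to mirror the slide's w := w + η × (−∇w L); in production code one would simply subtract the gradient.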
(Least-Squares) Linear Regression

The update rule turns out to be this ("on-line" mode):

1. Initialize w := 0 ∈ ℝᵐ, b := 0
2. For every training epoch:
    A. For every ⟨x[i], y[i]⟩ ∈ D:
        (a) ŷ[i] := x[i]ᵀw + b
        B. For weight j in {1, ..., m}:
            (b) ∂L/∂wj = −(y[i] − ŷ[i]) xj[i]
            (c) wj := wj + η × (−∂L/∂wj)
        C. ∂L/∂b = −(y[i] − ŷ[i])
            b := b + η × (−∂L/∂b)

Coincidentally, this appears to be almost the same as the perceptron
rule, except that the prediction is a real number and we have a
learning rate.
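The per-weight partial derivatives can be sanity-checked against central finite differences. A small sketch; the helper name `mse_half` and all the numeric values are arbitrary choices of mine:

```python
import numpy as np

def mse_half(w, b, x, y):
    """Per-example loss L = 1/2 * (y - yhat)**2 with yhat = x @ w + b."""
    return 0.5 * (y - (x @ w + b)) ** 2

# Analytic gradients from the update rule:
# dL/dw_j = -(y - yhat) * x_j  and  dL/db = -(y - yhat)
x = np.array([1.5, -0.5])
y_true = 2.0
w, b = np.array([0.3, -0.2]), 0.1
y_hat = x @ w + b
grad_w = -(y_true - y_hat) * x
grad_b = -(y_true - y_hat)

# Central finite-difference approximation, one weight j at a time
eps = 1e-6
num_grad_w = np.zeros_like(w)
for j in range(len(w)):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[j] += eps
    w_minus[j] -= eps
    num_grad_w[j] = (mse_half(w_plus, b, x, y_true)
                     - mse_half(w_minus, b, x, y_true)) / (2 * eps)
num_grad_b = (mse_half(w, b + eps, x, y_true)
              - mse_half(w, b - eps, x, y_true)) / (2 * eps)
```

Agreement between the analytic and numerical gradients, component by component, confirms the signs and the xj[i] factor in the derivation.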
This learning rule (from the previous slide)
is called (stochastic) gradient descent.
So, how did we get there?

Back to Linear Regression

[Figure: linear-neuron diagram. The inputs x1, x2, ..., xm, weighted by
w1, w2, ..., wm, together with the bias b, feed into the net input Σ,
which is passed through the activation to produce the output.]

Output
.. <latexit sha1_base64="gvQE9cb1Dja5lCiXX5pMwfJapj8=">AAAB9HicbZBNS8NAEIY3ftb6VfXoZbEInkoigh6LXrxZwX5AG8pmO2mXbjZxd1Isob/DiwdFvPpjvPlv3LY5aOsLCw/vzLAzb5BIYdB1v52V1bX1jc3CVnF7Z3dvv3Rw2DBxqjnUeSxj3QqYASkU1FGghFaigUWBhGYwvJnWmyPQRsTqAccJ+BHrKxEKztBafgfhCbO7FJMUJ91S2a24M9Fl8HIok1y1bumr04t5GoFCLpkxbc9N0M+YRsElTIqd1EDC+JD1oW1RsQiMn82WntBT6/RoGGv7FNKZ+3siY5Ex4yiwnRHDgVmsTc3/au0Uwys/E8qeBIrPPwpTSTGm0wRoT2jgKMcWGNfC7kr5gGnG0eZUtCF4iycvQ+O84lm+vyhXr/M4CuSYnJAz4pFLUiW3pEbqhJNH8kxeyZszcl6cd+dj3rri5DNH5I+czx+i4ZKm</latexit>

. Net input
<latexit sha1_base64="vnW9SOTDG2wSeqwpvMYjb0pYOfc=">AAAB+XicbZDLSsNAFIYn9VbrLerSzWARXJVEBF0W3biSCvYCbSiT6Uk7dHJh5qRYQt/EjQtF3Pom7nwbp2kW2vrDwMd/zmHO+f1ECo2O822V1tY3NrfK25Wd3b39A/vwqKXjVHFo8ljGquMzDVJE0ESBEjqJAhb6Etr++HZeb09AaRFHjzhNwAvZMBKB4AyN1bftHsITZveAVERJirO+XXVqTi66Cm4BVVKo0be/eoOYpyFEyCXTuus6CXoZUyi4hFmll2pIGB+zIXQNRiwE7WX55jN6ZpwBDWJlXoQ0d39PZCzUehr6pjNkONLLtbn5X62bYnDtZflJEPHFR0EqKcZ0HgMdCAUc5dQA40qYXSkfMcU4mrAqJgR3+eRVaF3UXMMPl9X6TRFHmZyQU3JOXHJF6uSONEiTcDIhz+SVvFmZ9WK9Wx+L1pJVzByTP7I+fwD13ZPb</latexit>

wm
<latexit sha1_base64="3SltFZgdSbEccduFdJMJ4sVJM+s=">AAAB6nicbZBNSwMxEIZn61etX1WPXoJF8FR2RdBj0YvHirYW2qVk07QNTbJLMquUpT/BiwdFvPqLvPlvTNs9aOsLgYd3ZsjMGyVSWPT9b6+wsrq2vlHcLG1t7+zulfcPmjZODeMNFsvYtCJquRSaN1Cg5K3EcKoiyR+i0fW0/vDIjRWxvsdxwkNFB1r0BaPorLunruqWK37Vn4ksQ5BDBXLVu+WvTi9mqeIamaTWtgM/wTCjBgWTfFLqpJYnlI3ogLcdaqq4DbPZqhNy4pwe6cfGPY1k5v6eyKiydqwi16koDu1ibWr+V2un2L8MM6GTFLlm84/6qSQYk+ndpCcMZyjHDigzwu1K2JAaytClU3IhBIsnL0PzrBo4vj2v1K7yOIpwBMdwCgFcQA1uoA4NYDCAZ3iFN096L9679zFvLXj5zCH8kff5A2Ucjds=</latexit>

xm <latexit sha1_base64="UZ/Cq01CQU77ibJgEHsrgiYApIY=">AAAB6nicbZBNSwMxEIZn61etX1WPXoJF8FR2RdBj0YvHirYW2qVk07QNTbJLMiuWpT/BiwdFvPqLvPlvTNs9aOsLgYd3ZsjMGyVSWPT9b6+wsrq2vlHcLG1t7+zulfcPmjZODeMNFsvYtCJquRSaN1Cg5K3EcKoiyR+i0fW0/vDIjRWxvsdxwkNFB1r0BaPorLunruqWK37Vn4ksQ5BDBXLVu+WvTi9mqeIamaTWtgM/wTCjBgWTfFLqpJYnlI3ogLcdaqq4DbPZqhNy4pwe6cfGPY1k5v6eyKiydqwi16koDu1ibWr+V2un2L8MM6GTFLlm84/6qSQYk+ndpCcMZyjHDigzwu1K2JAaytClU3IhBIsnL0PzrBo4vj2v1K7yOIpwBMdwCgFcQA1uoA4NYDCAZ3iFN096L9679zFvLXj5zCH8kff5A2aijdw=</latexit>

Inputs
<latexit sha1_base64="kW2ZbIA+FSvwKPaKbZPllX8WYNo=">AAAB9HicbZBNS8NAEIYnftb6VfXoZbEInkoigh6LXvRWwX5AG8pmu2mXbjZxd1Isob/DiwdFvPpjvPlv3LY5aOsLCw/vzLAzb5BIYdB1v52V1bX1jc3CVnF7Z3dvv3Rw2DBxqhmvs1jGuhVQw6VQvI4CJW8lmtMokLwZDG+m9eaIayNi9YDjhPsR7SsRCkbRWn4H+RNmdypJ0Uy6pbJbcWciy+DlUIZctW7pq9OLWRpxhUxSY9qem6CfUY2CST4pdlLDE8qGtM/bFhWNuPGz2dITcmqdHgljbZ9CMnN/T2Q0MmYcBbYzojgwi7Wp+V+tnWJ45WdiehNXbP5RmEqCMZkmQHpCc4ZybIEyLeyuhA2opgxtTkUbgrd48jI0ziue5fuLcvU6j6MAx3ACZ+DBJVThFmpQBwaP8Ayv8OaMnBfn3fmYt644+cwR/JHz+QONXpKY</latexit>

L
<latexit sha1_base64="P35O/4hZ2SiSHSfDfAG1bUHgTNI=">AAAB8nicbVBNS8NAFHypX7V+VT16WSyCp5KooMeiFw8eKlhbSEPZbLft0s0m7L4IJfRnePGgiFd/jTf/jZs2B20dWBhm3mPnTZhIYdB1v53Syura+kZ5s7K1vbO7V90/eDRxqhlvsVjGuhNSw6VQvIUCJe8kmtMolLwdjm9yv/3EtRGxesBJwoOIDpUYCEbRSn43ojhiVGZ301615tbdGcgy8QpSgwLNXvWr249ZGnGFTFJjfM9NMMioRsEkn1a6qeEJZWM65L6likbcBNks8pScWKVPBrG2TyGZqb83MhoZM4lCO5lHNIteLv7n+SkOroJMqCRFrtj8o0EqCcYkv5/0heYM5cQSyrSwWQkbUU0Z2pYqtgRv8eRl8nhW987r7v1FrXFd1FGGIziGU/DgEhpwC01oAYMYnuEV3hx0Xpx352M+WnKKnUP4A+fzB4GvkWQ=</latexit>

Convex loss function


X
L(w, b) = (ŷ [i] y [i] )2
<latexit sha1_base64="J5nwe8Lv0CM44wk1z3nGWinGIrk=">AAACK3icbZDLSgMxFIYzXmu9VV26CRahBS0zVdCNUOrGhYsK9gKdacmkmTY0cyHJKEOY93Hjq7jQhRfc+h6m7Sy0+kPg4z/ncHJ+N2JUSNN8NxYWl5ZXVnNr+fWNza3tws5uS4Qxx6SJQxbyjosEYTQgTUklI52IE+S7jLTd8eWk3r4jXNAwuJVJRBwfDQPqUYyktvqFuu0jOcKIqeu0NGXXU/fpEXTL8ALaIvb7iqawZI+QVEnaU13qpPAYJhmVe9V+oWhWzKngX7AyKIJMjX7h2R6EOPZJIDFDQnQtM5KOQlxSzEiat2NBIoTHaEi6GgPkE+Go6a0pPNTOAHoh1y+QcOr+nFDIFyLxXd05uUbM1ybmf7VuLL1zR9EgiiUJ8GyRFzMoQzgJDg4oJ1iyRAPCnOq/QjxCHGGp483rEKz5k/9Cq1qxTirmzWmxVs/iyIF9cABKwAJnoAauQAM0AQYP4Am8gjfj0XgxPozPWeuCkc3sgV8yvr4BCWGm5A==</latexit>
i

w1
<latexit sha1_base64="ELWCbynYAUOpCjzaHAkeFZeonCw=">AAAB6nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lU0GPRi8eK9gPaUDbbSbt0swm7G6WE/gQvHhTx6i/y5r9x2+agrQ8GHu/NMDMvSATXxnW/ncLK6tr6RnGztLW9s7tX3j9o6jhVDBssFrFqB1Sj4BIbhhuB7UQhjQKBrWB0M/Vbj6g0j+WDGSfoR3QgecgZNVa6f+p5vXLFrbozkGXi5aQCOeq98le3H7M0QmmYoFp3PDcxfkaV4UzgpNRNNSaUjegAO5ZKGqH2s9mpE3JilT4JY2VLGjJTf09kNNJ6HAW2M6JmqBe9qfif10lNeOVnXCapQcnmi8JUEBOT6d+kzxUyI8aWUKa4vZWwIVWUGZtOyYbgLb68TJpnVe+86t5dVGrXeRxFOIJjOAUPLqEGt1CHBjAYwDO8wpsjnBfn3fmYtxacfOYQ/sD5/AEK1I2h</latexit>

Sebastian Raschka STAT 453: Intro to Deep Learning and Generative Models SS 2020 51
Gradient Descent

[Figure: gradient descent steps on the convex loss L(w, b) = Σ_i (ŷ[i] − y[i])², plotted over w1]

Learning rate and steepness of the gradient determine how much we update.
Gradient Descent

[Figure: loss curves over w1 illustrating the two failure modes]

If the learning rate is too large, we can overshoot.

If the learning rate is too small, convergence is very slow.
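The three regimes are easy to see on the 1-D convex loss L(w) = w², whose gradient is dL/dw = 2w. A minimal sketch (not from the slides; values chosen for illustration):

```python
# Gradient descent on L(w) = w**2: each step multiplies w by (1 - 2*lr),
# so a good lr contracts toward 0, a tiny lr barely moves, and a too-large
# lr makes the factor exceed 1 in magnitude, so the iterate diverges.

def gradient_descent(w0, lr, n_steps):
    w = w0
    for _ in range(n_steps):
        w -= lr * 2 * w          # update in the direction opposite the gradient
    return w

w_good    = gradient_descent(w0=1.0, lr=0.1,   n_steps=50)  # converges toward 0
w_slow    = gradient_descent(w0=1.0, lr=0.001, n_steps=50)  # barely moves
w_diverge = gradient_descent(w0=1.0, lr=1.1,   n_steps=50)  # overshoots, diverges
print(w_good, w_slow, w_diverge)
```

With lr = 0.1 the iterate shrinks by a factor 0.8 per step; with lr = 1.1 the factor is −1.2, so the iterate oscillates across the minimum with growing amplitude.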
Linear Regression Loss Derivative

L(w, b) = Σ_i (ŷ[i] − y[i])²    (Sum Squared Error (SSE) loss)

∂L/∂wj = ∂/∂wj Σ_i (ŷ[i] − y[i])²
       = ∂/∂wj Σ_i (σ(wᵀx[i]) − y[i])²
       = Σ_i 2(σ(wᵀx[i]) − y[i]) · ∂/∂wj (σ(wᵀx[i]) − y[i])
       = Σ_i 2(σ(wᵀx[i]) − y[i]) · dσ/d(wᵀx[i]) · ∂/∂wj wᵀx[i]
       = Σ_i 2(σ(wᵀx[i]) − y[i]) · dσ/d(wᵀx[i]) · xj[i]
       = Σ_i 2(σ(wᵀx[i]) − y[i]) · xj[i]

(Note that the activation function σ is the identity function in linear regression, so dσ/d(wᵀx[i]) = 1.)
Linear Regression Loss Derivative (alt.)

L(w, b) = 1/(2n) Σ_i (ŷ[i] − y[i])²    (Mean Squared Error (MSE) loss, often scaled by a factor of 1/2 for convenience)

∂L/∂wj = ∂/∂wj 1/(2n) Σ_i (ŷ[i] − y[i])²
       = ∂/∂wj Σ_i 1/(2n) (σ(wᵀx[i]) − y[i])²
       = 1/n Σ_i (σ(wᵀx[i]) − y[i]) · ∂/∂wj (σ(wᵀx[i]) − y[i])
       = 1/n Σ_i (σ(wᵀx[i]) − y[i]) · dσ/d(wᵀx[i]) · ∂/∂wj wᵀx[i]
       = 1/n Σ_i (σ(wᵀx[i]) − y[i]) · dσ/d(wᵀx[i]) · xj[i]
       = 1/n Σ_i (σ(wᵀx[i]) − y[i]) · xj[i]

(Note that the activation function σ is the identity function in linear regression, so dσ/d(wᵀx[i]) = 1.)
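The derived gradient can be sanity-checked numerically. This sketch (random made-up data; identity activation) compares the analytic MSE gradient (1/n) Σ_i (wᵀx[i] − y[i]) xj[i] with a central finite-difference estimate:

```python
import numpy as np

# Check the analytic gradient of the 1/2-scaled MSE loss against
# a numerical finite-difference estimate on random toy data.

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
w = rng.normal(size=3)

def mse_loss(w):
    return np.mean((X @ w - y) ** 2) / 2.0

# analytic gradient: (1/n) * X^T (Xw - y)
grad_analytic = X.T @ (X @ w - y) / len(y)

# central finite differences along each coordinate direction
eps = 1e-6
grad_numeric = np.array([
    (mse_loss(w + eps * e) - mse_loss(w - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
print(np.max(np.abs(grad_analytic - grad_numeric)))
```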
Batch Gradient Descent as Surface Plot

[Figure: contour plot of the loss over (w1, w2) with minimum Lmin; the gradient descent updates are perpendicular to the contour lines]
Stochastic Gradient Descent as Surface Plot

[Figure: contour plot of the loss over (w1, w2) with a noisy update trajectory toward Lmin]

Stochastic updates are a bit noisier, because each batch is an approximation of the overall loss on the training set. (Later, in deep neural nets, we will see why noisier updates are actually helpful.)
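A minimal minibatch SGD sketch on linear regression (made-up data and hyperparameters): each update uses the gradient of a small batch, an approximation of the full-dataset gradient, so the trajectory is noisier than batch gradient descent but still reaches the minimum.

```python
import numpy as np

# Minibatch SGD on linear regression with known true weights.

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
true_w = np.array([3.0, -2.0])
y = X @ true_w + 0.1 * rng.normal(size=200)   # targets with a little noise

w = np.zeros(2)
lr, batch_size = 0.1, 16
for epoch in range(20):
    idx = rng.permutation(len(y))             # reshuffle each epoch
    for start in range(0, len(y), batch_size):
        b = idx[start:start + batch_size]
        # MSE gradient on the minibatch only
        grad = X[b].T @ (X[b] @ w - y[b]) / len(b)
        w -= lr * grad

print(w)   # close to true_w despite the noisy per-batch gradients
```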
Batch Gradient Descent as Surface Plot

[Figure: elongated loss contours over (w1, w2)]

If inputs are on very different scales, some weights will update more than others ... and it will also harm convergence (always normalize inputs!)
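Normalizing inputs usually means z-score standardization, so each feature has mean 0 and standard deviation 1. A sketch with a made-up feature matrix where one column dwarfs the other:

```python
import numpy as np

# z-score standardization: puts all input features on comparable scales
# before gradient descent.

X = np.array([[1000.0, 0.1],
              [2000.0, 0.2],
              [3000.0, 0.3]])   # feature 1 is ~10,000x larger than feature 2

mean, std = X.mean(axis=0), X.std(axis=0)
X_std = (X - mean) / std        # each column now has mean 0 and std 1

print(X_std.mean(axis=0), X_std.std(axis=0))
```

The same `mean` and `std` computed on the training set should also be applied to validation and test data.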
Multilayer Perceptrons

Source: Prof. Dalcimar Casanova. Curso Deep Learning - UTFPR - 2020 1


Historical milestones:



With 1 layer and 1 neuron

http://playground.tensorflow.org/
With 1 layer and N neurons

http://vision.stanford.edu/teaching/cs231n-demos/linear-classify/
Multi-Layer Perceptron

 The great challenge was to find a learning
algorithm to update the weights of the
intermediate (hidden) layers

 Central idea
 The errors of the processing units in the
output layer (known from the supervised
training targets) are back-propagated to the
intermediate layers


Learning process

 Unit j belongs to the output layer:


Learning process

 Unit j belongs to a hidden layer:
Learning process

 Phase 1: Feed-Forward


Learning process

 Phase 2: Feed-Backward
 Computing the error of the output layer
 Updating the weights of the output layer
 Computing the error of the 2nd hidden layer
 Updating the weights of the 2nd hidden layer
 Computing the error of the 1st hidden layer
 Updating the weights of the 1st hidden layer
MLP Example


MLP Example

 Input: x1 = 1, x2 = 0
 Desired output: t3 = 1
 Initial weights: wij(0) = 0
 Learning rate: η = 0.5
 Activation function: F(y) = 1 / (1 + e^(−y)) (sigmoid)
 Derivative of the activation function: F′(y) = F(y) (1 − F(y))


MLP Example

 Learning algorithm (weight-update rules shown in the slide):
 Output layer
 Hidden layer


MLP Example

 Feed-Forward:
 y1 = 1*0 + 1*0 + 0*0 = 0
 x1 = F(y1) = 0.5
 y2 = 1*0 + 1*0 + 0*0 = 0
 x2 = F(y2) = 0.5
 y3 = 1*0 + 0.5*0 + 0.5*0 = 0
 x3 = F(y3) = 0.5
MLP Example

 Feed-Backward:
 t3 − x3 = 1 − 0.5 = 0.5
 e3 = 0.5 * 0.25 = 0.125
MLP Example

 Feed-Backward:
 w(2)03 = 0 + 0.5*1*0.125 = 0.0625
 w(2)13 = 0 + 0.5*0.5*0.125 = 0.0313
 w(2)23 = 0 + 0.5*0.5*0.125 = 0.0313
MLP Example

 Feed-Backward:
 e1 = 0.25 * (0.125 * 0.0313) = 0.00097813
 e2 = 0.25 * (0.125 * 0.0313) = 0.00097813
MLP Example

 Feed-Backward:
 w(1)01 = 0 + 0.5*1*0.00097813 = 0.00048907
 w(1)02 = 0 + 0.5*1*0.00097813 = 0.00048907
 w(1)11 = 0 + 0.5*1*0.00097813 = 0.00048907
 w(1)12 = 0 + 0.5*1*0.00097813 = 0.00048907
 w(1)21 = 0 + 0.5*0*0.00097813 = 0
 w(1)22 = 0 + 0.5*0*0.00097813 = 0
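The numbers above can be reproduced in plain Python. This sketch uses my own variable names; note that the slides propagate the output error through the already-updated output-layer weights (with all-zero initial weights, using the pre-update weights would give zero hidden error), and the slide values are rounded:

```python
import math

# Reproduction of the slides' numeric example: 2 inputs (+ bias unit),
# 2 hidden units, 1 output, sigmoid activation, all weights 0, eta = 0.5.

def F(y):                                   # sigmoid activation
    return 1.0 / (1.0 + math.exp(-y))

def dF(y):                                  # sigmoid derivative F(y)(1 - F(y))
    return F(y) * (1.0 - F(y))

eta = 0.5
x = [1.0, 1.0, 0.0]                         # [bias, x1, x2]
t3 = 1.0                                    # desired output

w1 = [[0.0, 0.0] for _ in range(3)]         # w1[i][j]: input i -> hidden unit j
w2 = [0.0, 0.0, 0.0]                        # [bias, h1, h2] -> output

# Feed-forward
y1 = sum(x[i] * w1[i][0] for i in range(3))         # = 0
y2 = sum(x[i] * w1[i][1] for i in range(3))         # = 0
h = [1.0, F(y1), F(y2)]                             # = [1, 0.5, 0.5]
y3 = sum(h[i] * w2[i] for i in range(3))            # = 0
x3 = F(y3)                                          # = 0.5

# Feed-backward: output error, then output-layer weight updates
e3 = (t3 - x3) * dF(y3)                             # 0.5 * 0.25 = 0.125
for i in range(3):
    w2[i] += eta * h[i] * e3                        # [0.0625, 0.03125, 0.03125]

# Hidden-layer errors (using the already-updated w2, as the slides do)
e1 = dF(y1) * e3 * w2[1]                            # ~0.000978
e2 = dF(y2) * e3 * w2[2]
for i in range(3):
    w1[i][0] += eta * x[i] * e1
    w1[i][1] += eta * x[i] * e2

print(e3, w2, e1, w1[0][0])
```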
XOR Problem
XOR Problem

 Decision boundary built by the 1st hidden neuron
 Decision boundary built by the 2nd hidden neuron
XOR Problem

 Decision boundary built by the complete network
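A 2-2-1 sigmoid MLP trained with plain gradient descent can learn XOR, which no single-layer perceptron can. This is a sketch, not the slides' code; the seed, learning rate, and epoch count are arbitrary choices:

```python
import numpy as np

# 2-2-1 sigmoid MLP trained on the four XOR patterns with full-batch
# gradient descent on the MSE loss.

rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(size=(2, 2))
b1 = np.zeros(2)
W2 = rng.normal(size=(2, 1))
b2 = np.zeros(1)

lr = 1.0
losses = []
for _ in range(5000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(np.mean((out - y) ** 2))
    # backward pass (MSE loss + sigmoid derivatives)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(losses[0], losses[-1])   # loss decreases as training proceeds
```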
With N layers and 1 neuron

http://playground.tensorflow.org/
Lecture 09

Multilayer Perceptrons

STAT 453: Deep Learning, Spring 2020


Sebastian Raschka
http://stat.wisc.edu/~sraschka/teaching/stat453-ss2020/

Topics

Multilayer Perceptron Architecture

Nonlinear Activation Functions

Multilayer Perceptron Code Examples

Overfitting and Underfitting

Cats & Dogs and Custom Data Loaders

Graph with Fully-Connected Layers
= Multilayer Perceptron
Nothing new, really
(bias not shown)
(1)
w1,1
(1) a1 <latexit sha1_base64="51Rbp1GGPW28qr7Kl7NY0LPiq2o=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXsquCnosevFYwX5Iu5Zsmm1Dk+ySZIWy9Fd48aCIV3+ON/+N6XYP2vpg4PHeDDPzgpgzbVz32ymsrK6tbxQ3S1vbO7t75f2Dlo4SRWiTRDxSnQBrypmkTcMMp51YUSwCTtvB+Gbmt5+o0iyS92YSU1/goWQhI9hY6QH3vce06p1O++WKW3MzoGXi5aQCORr98ldvEJFEUGkIx1p3PTc2foqVYYTTaamXaBpjMsZD2rVUYkG1n2YHT9GJVQYojJQtaVCm/p5IsdB6IgLbKbAZ6UVvJv7ndRMTXvkpk3FiqCTzRWHCkYnQ7Hs0YIoSwyeWYKKYvRWREVaYGJtRyYbgLb68TFpnNe+85t5dVOrXeRxFOIJjqIIHl1CHW2hAEwgIeIZXeHOU8+K8Ox/z1oKTzxzCHzifP5zWj58=</latexit>

w1,1
(2)
y
<latexit sha1_base64="cs1Q9fet/6GNtc+Tzw/y6WCTX8Y=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0lU0GPRi8cW7Ae0oWy2k3btZhN2N0Io/QVePCji1Z/kzX/jts1BWx8MPN6bYWZekAiujet+O4W19Y3NreJ2aWd3b/+gfHjU0nGqGDZZLGLVCahGwSU2DTcCO4lCGgUC28H4bua3n1BpHssHkyXoR3QoecgZNVZqZP1yxa26c5BV4uWkAjnq/fJXbxCzNEJpmKBadz03Mf6EKsOZwGmpl2pMKBvTIXYtlTRC7U/mh07JmVUGJIyVLWnIXP09MaGR1lkU2M6ImpFe9mbif143NeGNP+EySQ1KtlgUpoKYmMy+JgOukBmRWUKZ4vZWwkZUUWZsNiUbgrf88ippXVS9y6rbuKrUbvM4inACp3AOHlxDDe6hDk1ggPAMr/DmPDovzrvzsWgtOPnMMfyB8/kD6GeM/w==</latexit>

<latexit sha1_base64="5CoRH/4hNmmOELpSJVIbVc5Zpaw=">AAAB9XicbVBNSwMxEJ31s9avqkcvwSJUkLJRQY9FLx4r2A9otyWbZtvQbHZJspay9H948aCIV/+LN/+NabsHbX0w8Hhvhpl5fiy4Nq777aysrq1vbOa28ts7u3v7hYPDuo4SRVmNRiJSTZ9oJrhkNcONYM1YMRL6gjX84d3UbzwxpXkkH804Zl5I+pIHnBJjpc6om+JzhCedtITPJt1C0S27M6BlgjNShAzVbuGr3YtoEjJpqCBat7AbGy8lynAq2CTfTjSLCR2SPmtZKknItJfOrp6gU6v0UBApW9Kgmfp7IiWh1uPQt50hMQO96E3F/7xWYoIbL+UyTgyTdL4oSAQyEZpGgHpcMWrE2BJCFbe3IjogilBjg8rbEPDiy8ukflHGl2X34apYuc3iyMExnEAJMFxDBe6hCjWgoOAZXuHNGTkvzrvzMW9dcbKZI/gD5/MHvkORXA==</latexit>

<latexit sha1_base64="UUz+fFdIMuJxCEAyuchQQrqM+Xo=">AAAB9XicbVBNS8NAEJ3Ur1q/qh69LBahgpSkCnosevFYwX5Am5bNdtMu3WzC7sZSQv6HFw+KePW/ePPfuG1z0NYHA4/3ZpiZ50WcKW3b31ZubX1jcyu/XdjZ3ds/KB4eNVUYS0IbJOShbHtYUc4EbWimOW1HkuLA47Tlje9mfuuJSsVC8ainEXUDPBTMZwRrI/Um/cS5QE7aS8rV87RfLNkVew60SpyMlCBDvV/86g5CEgdUaMKxUh3HjrSbYKkZ4TQtdGNFI0zGeEg7hgocUOUm86tTdGaUAfJDaUpoNFd/TyQ4UGoaeKYzwHqklr2Z+J/XibV/4yZMRLGmgiwW+TFHOkSzCNCASUo0nxqCiWTmVkRGWGKiTVAFE4Kz/PIqaVYrzmXFfrgq1W6zOPJwAqdQBgeuoQb3UIcGEJDwDK/wZk2sF+vd+li05qxs5hj+wPr8Ab/JkV0=</latexit>

x1 (2) (3)
<latexit sha1_base64="5HJHR/B9CHeIlPgqihTyAybn2c4=">AAAB6nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lU0GPRi8eK9gPaUDbbSbt0swm7G7GE/gQvHhTx6i/y5r9x2+agrQ8GHu/NMDMvSATXxnW/ncLK6tr6RnGztLW9s7tX3j9o6jhVDBssFrFqB1Sj4BIbhhuB7UQhjQKBrWB0M/Vbj6g0j+WDGSfoR3QgecgZNVa6f+p5vXLFrbozkGXi5aQCOeq98le3H7M0QmmYoFp3PDcxfkaV4UzgpNRNNSaUjegAO5ZKGqH2s9mpE3JilT4JY2VLGjJTf09kNNJ6HAW2M6JmqBe9qfif10lNeOVnXCapQcnmi8JUEBOT6d+kzxUyI8aWUKa4vZWwIVWUGZtOyYbgLb68TJpnVe+86t5dVGrXeRxFOIJjOAUPLqEGt1CHBjAYwDO8wpsjnBfn3fmYtxacfOYQ/sD5/AEMWo2i</latexit>

a1
<latexit sha1_base64="vfx38n+ae04OFRd5luhElMypRJ0=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXspuK+ix6MVjBfsh7VqyabYNTbJLkhXK0l/hxYMiXv053vw3pu0etPXBwOO9GWbmBTFn2rjut5NbW9/Y3MpvF3Z29/YPiodHLR0litAmiXikOgHWlDNJm4YZTjuxolgEnLaD8c3Mbz9RpVkk780kpr7AQ8lCRrCx0gPue49puXo+7RdLbsWdA60SLyMlyNDoF796g4gkgkpDONa667mx8VOsDCOcTgu9RNMYkzEe0q6lEguq/XR+8BSdWWWAwkjZkgbN1d8TKRZaT0RgOwU2I73szcT/vG5iwis/ZTJODJVksShMODIRmn2PBkxRYvjEEkwUs7ciMsIKE2MzKtgQvOWXV0mrWvFqFffuolS/zuLIwwmcQhk8uIQ63EIDmkBAwDO8wpujnBfn3flYtOacbOYY/sD5/AGeXI+g</latexit>
w1,1
<latexit sha1_base64="2hc9HR5bv0+inQ8IsEn3Gd2v7nM=">AAAB9XicbVBNS8NAEJ3Ur1q/qh69LBahgpTECnosevFYwX5Am5bNdtMu3WzC7sZSQv6HFw+KePW/ePPfuG1z0NYHA4/3ZpiZ50WcKW3b31ZubX1jcyu/XdjZ3ds/KB4eNVUYS0IbJOShbHtYUc4EbWimOW1HkuLA47Tlje9mfuuJSsVC8ainEXUDPBTMZwRrI/Um/cS5QE7aS8rV87RfLNkVew60SpyMlCBDvV/86g5CEgdUaMKxUh3HjrSbYKkZ4TQtdGNFI0zGeEg7hgocUOUm86tTdGaUAfJDaUpoNFd/TyQ4UGoaeKYzwHqklr2Z+J/XibV/4yZMRLGmgiwW+TFHOkSzCNCASUo0nxqCiWTmVkRGWGKiTVAFE4Kz/PIqaV5WnGrFfrgq1W6zOPJwAqdQBgeuoQb3UIcGEJDwDK/wZk2sF+vd+li05qxs5hj+wPr8AcFPkV4=</latexit>

L(y, o) = l
(1) o
<latexit sha1_base64="xkDVhV2R7yGjiI8Bkoa6EodHAlw=">AAAB/nicbVDLSsNAFL2pr1pfUXHlZrAIFaQkKuhGKLpx4aKCfUAbymQ6aYdOJmFmIpRQ8FfcuFDErd/hzr9x0mah1QMDh3Pu5Z45fsyZ0o7zZRUWFpeWV4qrpbX1jc0te3unqaJEEtogEY9k28eKciZoQzPNaTuWFIc+py1/dJ35rQcqFYvEvR7H1AvxQLCAEayN1LP3uiHWQ4J5ejupjI9RdIQuEe/ZZafqTIH+EjcnZchR79mf3X5EkpAKTThWquM6sfZSLDUjnE5K3UTRGJMRHtCOoQKHVHnpNP4EHRqlj4JImic0mqo/N1IcKjUOfTOZhVXzXib+53USHVx4KRNxoqkgs0NBwpGOUNYF6jNJieZjQzCRzGRFZIglJto0VjIluPNf/kuaJ1X3tOrcnZVrV3kdRdiHA6iAC+dQgxuoQwMIpPAEL/BqPVrP1pv1PhstWPnOLvyC9fENUuuUZw==</latexit>

(1)
w1,2 a2
<latexit sha1_base64="UEIEXkJI4Qcu+777LfA5dwpJBR0=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXspuK+ix6MVjBfsh7VqyabYNTbJLkhXK0l/hxYMiXv053vw3pu0etPXBwOO9GWbmBTFn2rjut5NbW9/Y3MpvF3Z29/YPiodHLR0litAmiXikOgHWlDNJm4YZTjuxolgEnLaD8c3Mbz9RpVkk780kpr7AQ8lCRrCx0gPuVx/Tsnc+7RdLbsWdA60SLyMlyNDoF796g4gkgkpDONa667mx8VOsDCOcTgu9RNMYkzEe0q6lEguq/XR+8BSdWWWAwkjZkgbN1d8TKRZaT0RgOwU2I73szcT/vG5iwis/ZTJODJVksShMODIRmn2PBkxRYvjEEkwUs7ciMsIKE2MzKtgQvOWXV0mrWvFqFffuolS/zuLIwwmcQhk8uIQ63EIDmkBAwDO8wpujnBfn3flYtOacbOYY/sD5/AGeYI+g</latexit>
<latexit sha1_base64="zmvhV5w6wvufBjgJnplzs3qmpp8=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69BIvgqSQq6LHoxWML9gPaUDbbSbt2sxt2N0IJ/QVePCji1Z/kzX/jts1BWx8MPN6bYWZemHCmjed9O4W19Y3NreJ2aWd3b/+gfHjU0jJVFJtUcqk6IdHImcCmYYZjJ1FI4pBjOxzfzfz2EyrNpHgwkwSDmAwFixglxkoN2S9XvKo3h7tK/JxUIEe9X/7qDSRNYxSGcqJ11/cSE2REGUY5Tku9VGNC6JgMsWupIDHqIJsfOnXPrDJwI6lsCePO1d8TGYm1nsSh7YyJGellbyb+53VTE90EGRNJalDQxaIo5a6R7uxrd8AUUsMnlhCqmL3VpSOiCDU2m5INwV9+eZW0Lqr+ZdVrXFVqt3kcRTiBUzgHH66hBvdQhyZQQHiGV3hzHp0X5935WLQWnHzmGP7A+fwB2T+M9Q==</latexit>

[Figure: two-hidden-layer network with inputs x1, x2, activations a_j^(l), weights w_{i,j}^(l), and output o feeding the loss l]

where a := σ(z) = σ(w^⊤ x + b)
(Assume a network for binary classification.)

$$\frac{\partial l}{\partial w_{1,1}^{(1)}} = \frac{\partial l}{\partial o}\cdot\frac{\partial o}{\partial a_1^{(2)}}\cdot\frac{\partial a_1^{(2)}}{\partial a_1^{(1)}}\cdot\frac{\partial a_1^{(1)}}{\partial w_{1,1}^{(1)}} + \frac{\partial l}{\partial o}\cdot\frac{\partial o}{\partial a_2^{(2)}}\cdot\frac{\partial a_2^{(2)}}{\partial a_1^{(1)}}\cdot\frac{\partial a_1^{(1)}}{\partial w_{1,1}^{(1)}}$$
Sebastian Raschka STAT 453: Intro to Deep Learning and Generative Models SS 2020 7
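The two-path chain rule above can be checked numerically. The following is a minimal NumPy sketch (all shapes, seeds, and values are illustrative assumptions, not taken from the slides): it builds a tiny network matching the figure, with two inputs, three first-hidden-layer units, two second-hidden-layer units, and one sigmoid output, computes the gradient of the loss with respect to w^(1)_{1,1} (here `W1[0, 0]`) by summing over the two paths through a^(2)_1 and a^(2)_2, and compares against a central finite difference.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)           # illustrative random weights
x = rng.normal(size=2)                   # inputs x1, x2
W1 = rng.normal(size=(3, 2))             # 1st hidden layer (3 units)
W2 = rng.normal(size=(2, 3))             # 2nd hidden layer (2 units)
w3 = rng.normal(size=2)                  # output weights
y = 1.0                                  # binary target

def forward(W1):
    a1 = sigmoid(W1 @ x)                 # a^(1)
    a2 = sigmoid(W2 @ a1)                # a^(2)
    o = sigmoid(w3 @ a2)                 # output
    return a1, a2, o, 0.5 * (y - o) ** 2 # squared-error loss l

a1, a2, o, l = forward(W1)

# Chain rule, summing over both paths through a^(2)_1 and a^(2)_2:
dl_do = -(y - o)                         # ∂l/∂o
do_da2 = o * (1 - o) * w3                # ∂o/∂a^(2)_k
da2_da1 = a2 * (1 - a2) * W2[:, 0]       # ∂a^(2)_k/∂a^(1)_1
da1_dw = a1[0] * (1 - a1[0]) * x[0]      # ∂a^(1)_1/∂w^(1)_{1,1}
grad = dl_do * float(np.sum(do_da2 * da2_da1)) * da1_dw

# Central-difference check on w^(1)_{1,1}:
eps = 1e-5
Wp, Wm = W1.copy(), W1.copy()
Wp[0, 0] += eps
Wm[0, 0] -= eps
num = (forward(Wp)[3] - forward(Wm)[3]) / (2 * eps)
print(abs(grad - num) < 1e-7)            # True
```

The sum over k in `np.sum(do_da2 * da2_da1)` is exactly the two-term sum in the displayed equation: every downstream unit that a^(1)_1 feeds into contributes one path.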
Graph with Fully-Connected Layers
= Multilayer Perceptron

[Figure: the network as a computational graph, with input layer (layer 1), 1st hidden layer (layer 2), 2nd hidden layer (layer 3), and output layer (layer 4); the output o feeds the loss L(y, o) = l. Here, we could also write o as a_1^(3).]
A more common counting/naming scheme counts only the weight layers, because then a perceptron/Adaline/logistic regression model can be called a "1-layer neural network":

[Figure: the same network relabeled as input layer (layer 0), 1st hidden layer (layer 1), 2nd hidden layer (layer 2), and output layer (layer 3)]
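Under this counting scheme, a network's depth is the number of weight layers, not the number of node columns in the graph. A toy sketch (weight shapes and input values are made-up illustration numbers): the input is "layer 0" and is not counted, so a model with three weight matrices is a "3-layer neural network".

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy weight matrices; only their count matters for the naming scheme.
weight_layers = [np.ones((3, 2)),   # layer 1: 1st hidden layer
                 np.ones((2, 3)),   # layer 2: 2nd hidden layer
                 np.ones((1, 2))]   # layer 3: output layer

a = np.array([0.5, -0.5])           # layer 0: the input (not counted)
for k, W in enumerate(weight_layers, start=1):
    a = sigmoid(W @ a)              # activation a^(k) of layer k

print(len(weight_layers))           # 3 -> a "3-layer neural network"
```

With a single weight matrix the same loop reduces to logistic regression, which is why that model counts as a 1-layer network here.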
[Figure: the same network, annotated "could use sigmoid here" at three of the activation units]
[Figure: multilayer perceptron with multiple output units o_1, …, paired with targets y_1, y_2, y_3]

x2 (2)
a2 o2 <latexit sha1_base64="lp3ZeQ57DPk1EUCeNsK7Ny2DKe8=">AAAB6nicbVBNSwMxEJ3Ur1q/qh69BIvgqexWQY9FLx4r2g9ol5JNs21oNlmSrFCW/gQvHhTx6i/y5r8xbfegrQ8GHu/NMDMvTAQ31vO+UWFtfWNzq7hd2tnd2z8oHx61jEo1ZU2qhNKdkBgmuGRNy61gnUQzEoeCtcPx7cxvPzFtuJKPdpKwICZDySNOiXXSg+rX+uWKV/XmwKvEz0kFcjT65a/eQNE0ZtJSQYzp+l5ig4xoy6lg01IvNSwhdEyGrOuoJDEzQTY/dYrPnDLAkdKupMVz9fdERmJjJnHoOmNiR2bZm4n/ed3URtdBxmWSWibpYlGUCmwVnv2NB1wzasXEEUI1d7diOiKaUOvSKbkQ/OWXV0mrVvUvqt79ZaV+k8dRhBM4hXPw4QrqcAcNaAKFITzDK7whgV7QO/pYtBZQPnMMf4A+fwAAKI2a</latexit>
L(y, o)
<latexit sha1_base64="jRuGYuNAf6C7yfYguq+x/vIHq08=">AAACDHicbZDLSsNAFIYnXmu9VV26GSxCBSmJCrosunHhooK9QBvKZDpph05mwsxECCEP4MZXceNCEbc+gDvfxkkaQVt/GPj4zznMOb8XMqq0bX9ZC4tLyyurpbXy+sbm1nZlZ7etRCQxaWHBhOx6SBFGOWlpqhnphpKgwGOk402usnrnnkhFBb/TcUjcAI049SlG2liDSrUfID3GiCU3aS1nz0/i9Bj+sEiPTJddt3PBeXAKqIJCzUHlsz8UOAoI15ghpXqOHWo3QVJTzEha7keKhAhP0Ij0DHIUEOUm+TEpPDTOEPpCmsc1zN3fEwkKlIoDz3RmK6rZWmb+V+tF2r9wE8rDSBOOpx/5EYNawCwZOKSSYM1iAwhLanaFeIwkwtrkVzYhOLMnz0P7pO6c1u3bs2rjsoijBPbBAagBB5yDBrgGTdACGDyAJ/ACXq1H69l6s96nrQtWMbMH/sj6+AYGV5uW</latexit>

<latexit sha1_base64="gBTwEt+X3BPX1KgMo6lYVWIC09o=">AAAB6nicbVBNS8NAEJ34WetX1aOXxSJ4KkkV9Fj04rGi/YA2lM120y7dbMLuRCyhP8GLB0W8+ou8+W/ctjlo64OBx3szzMwLEikMuu63s7K6tr6xWdgqbu/s7u2XDg6bJk414w0Wy1i3A2q4FIo3UKDk7URzGgWSt4LRzdRvPXJtRKwecJxwP6IDJULBKFrp/qlX7ZXKbsWdgSwTLydlyFHvlb66/ZilEVfIJDWm47kJ+hnVKJjkk2I3NTyhbEQHvGOpohE3fjY7dUJOrdInYaxtKSQz9fdERiNjxlFgOyOKQ7PoTcX/vE6K4ZWfCZWkyBWbLwpTSTAm079JX2jOUI4toUwLeythQ6opQ5tO0YbgLb68TJrVindece8uyrXrPI4CHMMJnIEHl1CDW6hDAxgM4Ble4c2Rzovz7nzMW1ecfOYI/sD5/AEN3o2j</latexit>

<latexit sha1_base64="Rx/RXsiT+s/v11w3kFUY/JZyKRU=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXspuK+ix6MVjBfsh7VqyabYNTbJLkhXK0l/hxYMiXv053vw3pu0etPXBwOO9GWbmBTFn2rjut5NbW9/Y3MpvF3Z29/YPiodHLR0litAmiXikOgHWlDNJm4YZTjuxolgEnLaD8c3Mbz9RpVkk780kpr7AQ8lCRrCx0gPuVx/TcvV82i+W3Io7B1olXkZKkKHRL371BhFJBJWGcKx113Nj46dYGUY4nRZ6iaYxJmM8pF1LJRZU++n84Ck6s8oAhZGyJQ2aq78nUiy0nojAdgpsRnrZm4n/ed3EhFd+ymScGCrJYlGYcGQiNPseDZiixPCJJZgoZm9FZIQVJsZmVLAheMsvr5JWteLVKu7dRal+ncWRhxM4hTJ4cAl1uIUGNIGAgGd4hTdHOS/Ou/OxaM052cwx/IHz+QOf5o+h</latexit>

(1)
a3 <latexit sha1_base64="F0cJIqijoEg/scv4wVZxoymO2Dc=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXsquLeix6MVjBfsh7VqyabYNTbJLkhXK0l/hxYMiXv053vw3pu0etPXBwOO9GWbmBTFn2rjut5NbW9/Y3MpvF3Z29/YPiodHLR0litAmiXikOgHWlDNJm4YZTjuxolgEnLaD8c3Mbz9RpVkk780kpr7AQ8lCRrCx0gPuVx/Tsnc+7RdLbsWdA60SLyMlyNDoF796g4gkgkpDONa667mx8VOsDCOcTgu9RNMYkzEe0q6lEguq/XR+8BSdWWWAwkjZkgbN1d8TKRZaT0RgOwU2I73szcT/vG5iwis/ZTJODJVksShMODIRmn2PBkxRYvjEEkwUs7ciMsIKE2MzKtgQvOWXV0nrouJVK+5drVS/zuLIwwmcQhk8uIQ63EIDmkBAwDO8wpujnBfn3flYtOacbOYY/sD5/AGf6o+h</latexit>

(2) o3
a3
<latexit sha1_base64="wT+53Eb88nVtTpSfn4qk4kRsjrU=">AAAB6nicbVBNSwMxEJ3Ur1q/qh69BIvgqexaQY9FLx4r2g9ol5JNs21oNlmSrFCW/gQvHhTx6i/y5r8xbfegrQ8GHu/NMDMvTAQ31vO+UWFtfWNzq7hd2tnd2z8oHx61jEo1ZU2qhNKdkBgmuGRNy61gnUQzEoeCtcPx7cxvPzFtuJKPdpKwICZDySNOiXXSg+rX+uWKV/XmwKvEz0kFcjT65a/eQNE0ZtJSQYzp+l5ig4xoy6lg01IvNSwhdEyGrOuoJDEzQTY/dYrPnDLAkdKupMVz9fdERmJjJnHoOmNiR2bZm4n/ed3URtdBxmWSWibpYlGUCmwVnv2NB1wzasXEEUI1d7diOiKaUOvSKbkQ/OWXV0nrourXqt79ZaV+k8dRhBM4hXPw4QrqcAcNaAKFITzDK7whgV7QO/pYtBZQPnMMf4A+fwABrI2b</latexit>

<latexit sha1_base64="vBxbcVs2Wnfm0yi6DKhPPczIBHw=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXspuK+ix6MVjBfsh7VqyabYNTbJLkhXK0l/hxYMiXv053vw3pu0etPXBwOO9GWbmBTFn2rjut5NbW9/Y3MpvF3Z29/YPiodHLR0litAmiXikOgHWlDNJm4YZTjuxolgEnLaD8c3Mbz9RpVkk780kpr7AQ8lCRrCx0gPu1x7TcvV82i+W3Io7B1olXkZKkKHRL371BhFJBJWGcKx113Nj46dYGUY4nRZ6iaYxJmM8pF1LJRZU++n84Ck6s8oAhZGyJQ2aq78nUiy0nojAdgpsRnrZm4n/ed3EhFd+ymScGCrJYlGYcGQiNPseDZiixPCJJZgoZm9FZIQVJsZmVLAheMsvr5JWteLVKu7dRal+ncWRhxM4hTJ4cAl1uIUGNIGAgGd4hTdHOS/Ou/OxaM052cwx/IHz+QOhcI+i</latexit>

(1)
a4
(2)
<latexit sha1_base64="uxWzlquY+EeW/UpcO69SCXeIYtQ=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXsquLeix6MVjBfsh7VqyabYNTbJLkhXK0l/hxYMiXv053vw3pu0etPXBwOO9GWbmBTFn2rjut5NbW9/Y3MpvF3Z29/YPiodHLR0litAmiXikOgHWlDNJm4YZTjuxolgEnLaD8c3Mbz9RpVkk780kpr7AQ8lCRrCx0gPu1x7Tsnc+7RdLbsWdA60SLyMlyNDoF796g4gkgkpDONa667mx8VOsDCOcTgu9RNMYkzEe0q6lEguq/XR+8BSdWWWAwkjZkgbN1d8TKRZaT0RgOwU2I73szcT/vG5iwis/ZTJODJVksShMODIRmn2PBkxRYvjEEkwUs7ciMsIKE2MzKtgQvOWXV0nrouJVK+5drVS/zuLIwwmcQhk8uIQ63EIDmkBAwDO8wpujnBfn3flYtOacbOYY/sD5/AGhdI+i</latexit>

a4
<latexit sha1_base64="vsgJntgqeAyiGWhpRcem3fXieTw=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXspuLeix6MVjBfsh7VqyabYNTbJLkhXK0l/hxYMiXv053vw3pu0etPXBwOO9GWbmBTFn2rjut5NbW9/Y3MpvF3Z29/YPiodHLR0litAmiXikOgHWlDNJm4YZTjuxolgEnLaD8c3Mbz9RpVkk780kpr7AQ8lCRrCx0gPu1x7TcvV82i+W3Io7B1olXkZKkKHRL371BhFJBJWGcKx113Nj46dYGUY4nRZ6iaYxJmM8pF1LJRZU++n84Ck6s8oAhZGyJQ2aq78nUiy0nojAdgpsRnrZm4n/ed3EhFd+ymScGCrJYlGYcGQiNPseDZiixPCJJZgoZm9FZIQVJsZmVLAheMsvr5JWteJdVNy7Wql+ncWRhxM4hTJ4cAl1uIUGNIGAgGd4hTdHOS/Ou/OxaM052cwx/IHz+QOi+o+j</latexit>
use softmax if this is a multi-class
(1) problem with mutually exclusive classes
a5 <latexit sha1_base64="NHK0ywkULzi4Jl2BlAjdO8n2Yig=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXsquVfRY9OKxgv2Qdi3ZNNuGJtklyQpl6a/w4kERr/4cb/4b03YP2vpg4PHeDDPzgpgzbVz328mtrK6tb+Q3C1vbO7t7xf2Dpo4SRWiDRDxS7QBrypmkDcMMp+1YUSwCTlvB6Gbqt56o0iyS92YcU1/ggWQhI9hY6QH3Lh7Tsnc66RVLbsWdAS0TLyMlyFDvFb+6/YgkgkpDONa647mx8VOsDCOcTgrdRNMYkxEe0I6lEguq/XR28ASdWKWPwkjZkgbN1N8TKRZaj0VgOwU2Q73oTcX/vE5iwis/ZTJODJVkvihMODIRmn6P+kxRYvjYEkwUs7ciMsQKE2MzKtgQvMWXl0nzrOJVK+7deal2ncWRhyM4hjJ4cAk1uIU6NICAgGd4hTdHOS/Ou/Mxb8052cwh/IHz+QOi/o+j</latexit>
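The annotation above suggests softmax at the output layer for mutually exclusive classes. As a minimal plain-Python sketch of softmax (not the lecture's PyTorch code), with the maximum subtracted first for numerical stability:

```python
import math

def softmax(z):
    # subtract the max for numerical stability, then normalize the exponentials
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]
```

The max-subtraction leaves the result unchanged mathematically but avoids overflow for large logits.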

Sebastian Raschka STAT 453: Intro to Deep Learning and Generative Models SS 2020 11
Activation Functions

Question: What happens if we don't use non-linear activation functions?
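One way to see the answer: without non-linear activations, stacking linear layers collapses into a single linear layer, since W2(W1 x) = (W2 W1) x. A minimal plain-Python check with arbitrary toy matrices (values chosen only for illustration):

```python
# Without a nonlinearity, two "layers" reduce to one:
# W2 (W1 x) = (W2 W1) x, so the stacked model is still linear in x.

def matmul(A, B):
    # plain-Python matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, x):
    # matrix-vector product
    return [sum(A[i][k] * x[k] for k in range(len(x))) for i in range(len(A))]

W1 = [[1.0, 2.0], [3.0, 4.0]]   # arbitrary toy weights
W2 = [[0.0, 1.0], [1.0, 0.0]]
x = [1.0, 1.0]

two_layer = matvec(W2, matvec(W1, x))   # "deep" linear network
one_layer = matvec(matmul(W2, W1), x)   # equivalent single linear layer
```

So depth without nonlinearity adds no expressive power.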

Multilayer Perceptron Architecture


Nonlinear Activation Functions

Multilayer Perceptron Code Examples


Overfitting and Underfitting
Cats & Dogs and Custom Data Loaders

Solving the XOR Problem with Non-Linear Activations

Solving the XOR Problem with Non-Linear Activations

[Figure: decision regions of two 1-hidden-layer MLPs on the XOR data — left: with linear activation function; right: with non-linear activation function (ReLU).]

https://github.com/rasbt/stat453-deep-learning-ss20/blob/master/L08-mlp/code/xor-problem.ipynb
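As a sketch of why the non-linear version can succeed, here is a hand-constructed 1-hidden-layer ReLU network that computes XOR exactly (a classic textbook construction; these are not the weights learned in the notebook above):

```python
def relu(v):
    # elementwise rectifier
    return [max(0.0, x) for x in v]

def xor_mlp(x1, x2):
    # hidden layer: h = ReLU(W1 [x1, x2] + b1) with W1 = [[1, 1], [1, 1]], b1 = [0, -1]
    h = relu([x1 + x2, x1 + x2 - 1.0])
    # output layer: o = [1, -2] . h  (already gives 0/1 for binary inputs)
    return h[0] - 2.0 * h[1]
```

No choice of weights for a purely linear model can reproduce this truth table.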

A Selection of Common Activation Functions (1)

Identity:
    σ(z) = z

(Logistic) Sigmoid:
    σ(z) = 1 / (1 + exp(−z))

Tanh ("tanH"):
    tanh(z) = (exp(z) − exp(−z)) / (exp(z) + exp(−z))

Hard Tanh:
    HardTanh(z) = 1 if z > 1; −1 if z < −1; z otherwise
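The definitions above can be sketched in a few lines of plain Python (in practice you would use a framework's built-ins):

```python
import math

def sigmoid(z):
    # (logistic) sigmoid: 1 / (1 + exp(-z)), output in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    # tanh(z) = (exp(z) - exp(-z)) / (exp(z) + exp(-z)), output in (-1, 1)
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))

def hard_tanh(z):
    # piecewise-linear approximation of tanh: clip z to [-1, 1]
    return max(-1.0, min(1.0, z))
```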

A Selection of Common Activation Functions (1)
(Logistic) Sigmoid

• Advantages of Tanh
• Mean centering
• Positive and negative values
• Larger gradients

Tanh ("tanH")
Additional tip: Also good to
normalize inputs to mean zero and
use random weight initialization
with avg. weight centered at zero
Also simple derivative:
    d/dz tanh(z) = 1 − tanh(z)²
A Selection of Common Activation Functions (2)

ReLU (Rectified Linear Unit):
    ReLU(z) = z if z ≥ 0; 0 otherwise
    equivalently: ReLU(z) = max(0, z)

Leaky ReLU:
    LeakyReLU(z) = z if z ≥ 0; α × z otherwise    (e.g., α = 0.025)
    equivalently: LeakyReLU(z) = max(0, z) + α × min(0, z)

PReLU (Parameterized Rectified Linear Unit):
    PReLU(z) = z if z ≥ 0; α × z otherwise
    here, α is a trainable parameter

ELU (Exponential Linear Unit):
    ELU(z) = max(0, z) + min(0, α × (exp(z) − 1))    (e.g., α = 1)
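Plain-Python sketches of the definitions above, with the α defaults chosen to match the slide's examples:

```python
import math

def relu(z):
    # ReLU(z) = max(0, z)
    return max(0.0, z)

def leaky_relu(z, alpha=0.025):
    # LeakyReLU(z) = max(0, z) + alpha * min(0, z)
    return max(0.0, z) + alpha * min(0.0, z)

def elu(z, alpha=1.0):
    # ELU(z) = max(0, z) + min(0, alpha * (exp(z) - 1))
    return max(0.0, z) + min(0.0, alpha * (math.exp(z) - 1.0))
```

PReLU uses the same formula as Leaky ReLU but treats α as a trainable parameter instead of a fixed constant.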

Model Evaluation

Multilayer Perceptron Architecture


Nonlinear Activation Functions
Multilayer Perceptron Code Examples
Overfitting and Underfitting
Cats & Dogs and Custom Data Loaders

Recommended Practice: Looking at Some Failure Cases

Failure cases of a ~93%-accuracy (not very good, but beside the point) 2-layer (1-hidden-layer) MLP on MNIST (where t = target class and p = predicted class)

Overfitting and Underfitting

We usually use the test set error as an estimator of the generalization error.

[Figure: error vs. model capacity; the training error keeps decreasing with capacity, while the generalization error eventually rises again in the overfitting regime.]
Bias-Variance Decomposition

General Definition:

    Bias[θ̂] = E[θ̂] − θ

    Var[θ̂] = E[(θ̂ − E[θ̂])²] = E[θ̂²] − (E[θ̂])²

(we ignore noise in this lecture for simplicity)
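The two equivalent forms of the variance can be checked numerically on a set of estimates θ̂ obtained from repeated samples; a minimal sketch (hypothetical helper, not part of the lecture code):

```python
def bias_and_variance(estimates, true_theta):
    # empirical Bias = mean(theta_hat) - theta
    # empirical Var  = mean(theta_hat^2) - mean(theta_hat)^2
    n = len(estimates)
    mean = sum(estimates) / n
    mean_sq = sum(t * t for t in estimates) / n
    return mean - true_theta, mean_sq - mean ** 2
```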

Bias & Variance vs Overfitting & Underfitting

[Figure: training and generalization error vs. model capacity; toward low capacity (underfitting) the bias increases, toward high capacity (overfitting) the variance increases.]

Deep Learning Works Best with Large Datasets

Bias & Variance vs Overfitting & Underfitting

When reading DL resources, you'll notice many researchers use bias and variance to describe underfitting and overfitting (they are related, but not the same!)

Multilayer Perceptron Architecture
Nonlinear Activation Functions
Multilayer Perceptron Code Examples
Overfitting and Underfitting
Cats & Dogs and Custom Data Loaders

VGG16 Convolutional Neural Network for
Kaggle's Cats and Dogs Images
A "real world" example

Training/Validation/Test splits

Ratio depends on the dataset size, but an 80/5/15 split is usually a good idea

• Training set is used for training; it is not necessary to plot the training accuracy during training, but it can be useful
• Validation set accuracy provides a rough estimate of the generalization performance (it can be optimistically biased if you design the network to do well on the validation set: "information leakage")
• Test set should only be used once to get an unbiased estimate of the generalization performance
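A library-free sketch of an 80/5/15 split over example indices (hypothetical helper; the course notebooks do this with PyTorch data loaders instead):

```python
import random

def train_val_test_split(n, val_frac=0.05, test_frac=0.15, seed=123):
    # shuffle all example indices once, then carve out the three partitions
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_val, n_test = round(n * val_frac), round(n * test_frac)
    val = indices[:n_val]
    test = indices[n_val:n_val + n_test]
    train = indices[n_val + n_test:]
    return train, val, test
```

Fixing the seed keeps the partitions reproducible across runs.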

Training/Validation/Test splits

Epoch: 001/100 | Batch 000/156 | Cost: 1136.9125


Epoch: 001/100 | Batch 120/156 | Cost: 0.6327
Epoch: 001/100 Train Acc.: 63.35% | Validation Acc.: 62.12%
Time elapsed: 3.09 min
Epoch: 002/100 | Batch 000/156 | Cost: 0.6675
Epoch: 002/100 | Batch 120/156 | Cost: 0.6640
Epoch: 002/100 Train Acc.: 66.05% | Validation Acc.: 66.32%
Time elapsed: 6.15 min
Epoch: 003/100 | Batch 000/156 | Cost: 0.6137
Epoch: 003/100 | Batch 120/156 | Cost: 0.6311
Epoch: 003/100 Train Acc.: 65.82% | Validation Acc.: 63.76%
Time elapsed: 9.21 min
Epoch: 004/100 | Batch 000/156 | Cost: 0.5993
Epoch: 004/100 | Batch 120/156 | Cost: 0.5832
Epoch: 004/100 Train Acc.: 66.75% | Validation Acc.: 64.52%
Time elapsed: 12.27 min
Epoch: 005/100 | Batch 000/156 | Cost: 0.5918
Epoch: 005/100 | Batch 120/156 | Cost: 0.5747
Epoch: 005/100 Train Acc.: 68.29% | Validation Acc.: 67.00%
Time elapsed: 15.33 min
...

Parameters vs Hyperparameters
Parameters:
• weights (weight parameters)
• biases (bias units)

Hyperparameters:
• minibatch size
• data normalization schemes
• number of epochs
• number of hidden layers
• number of hidden units
• learning rates
• (random seed, why?)
• loss function
• various weights (weighting terms)
• activation function types
• regularization schemes (more later)
• weight initialization schemes (more later)
• optimization algorithm type (more later)
• ...

(Mostly no scientific explanation, mostly engineering; need to try many things -> "graduate student descent")

Custom DataLoader Classes ...
• Example showing how you can create your own data loader to efficiently iterate
through your own collection of images
(pretend the MNIST images there are some custom image collection)

https://github.com/rasbt/stat453-deep-learning-ss20/blob/master/L08-mlp/code/custom-dataloader/custom-dataloader-example.ipynb

DataLoader with Train/Validation/Test splits

https://github.com/rasbt/stat453-deep-learning-ss20/blob/master/L08-mlp/code/mnist-validation-split.ipynb

Lecture 09

Regularization

STAT 453: Deep Learning, Spring 2020


Sebastian Raschka
http://stat.wisc.edu/~sraschka/teaching/stat453-ss2020/

Goal: Reduce Overfitting

usually achieved by reducing model capacity and/or reduction of the variance of the predictions (as explained last lecture)

Regularization

In the context of deep learning, regularization can be understood as the process of adding information / changing the objective function to prevent overfitting.

Regularization / Regularizing Effects

Goal: reduce overfitting


usually achieved by reducing model capacity and/or reduction of the variance of the predictions (as explained last lecture)

Common Regularization Techniques for DNNs:


• Early stopping
• L1/L2 regularization (norm penalties)
• Dropout

Lecture Overview

1. Avoiding overfitting with more data and data augmentation

2. Reducing network capacity & early stopping

3. Adding norm penalties to the loss: L1 & L2 regularization

4. Dropout

General Strategies to Avoid Overfitting

1. Collecting more data is best & always recommended

2. Data augmentation is also helpful (e.g., for images: random rotation, crop, translation ...)

3. Additionally, reducing the model capacity by reducing the number of parameters or adding regularization (better) helps

Best Way to Reduce Overfitting is Collecting More Data

Figure 3: Illustration of bias and variance.

Figure 4: Learning curves of softmax classifiers fit to MNIST subsets (test set size is kept constant). When the training set is small, the algorithm is more likely to pick up noise in the training set.
Data Augmentation in PyTorch via TorchVision

[Figure: an original image next to randomly augmented versions, with and without resample=PIL.Image.BILINEAR.]

https://github.com/rasbt/stat453-deep-learning-ss20/blob/master/L09-regularization/code/data-augmentation.ipynb
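The notebook above uses torchvision transforms; as a library-free illustration of one such augmentation, here is a random-crop sketch over a 2-D list "image" (hypothetical helper, not the notebook's code):

```python
import random

def random_crop(img, crop_h, crop_w, rng=random):
    # img is a 2-D list (H x W); sample a random crop_h x crop_w window
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - crop_h)
    left = rng.randint(0, w - crop_w)
    return [row[left:left + crop_w] for row in img[top:top + crop_h]]
```

Applying a fresh random crop each epoch effectively enlarges the training set.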

Use mean = (0.5, 0.5, 0.5) and std = (0.5, 0.5, 0.5) in transforms.Normalize for RGB images (this maps pixel values from [0, 1] to [-1, 1])

Other Ways for Dealing with Overfitting
if Collecting More Data is not Feasible
=> Reducing Network's Capacity by Other Means

1. Avoiding overfitting with more data and data augmentation


2. Reducing network capacity & early stopping
3. Adding norm penalties to the loss: L1 & L2 regularization
4. Dropout

Other Ways for Dealing with Overfitting
if Collecting More Data is not Feasible
=> Reducing Network's Capacity by Other Means

• choose a smaller architecture: fewer hidden layers & units, add dropout (use ReLU, which can result in "dead activations"; add L1 norm penalty)

• enforce smaller weights: Early stopping, L2 norm penalty

• add noise: Dropout

Early Stopping
Step 1: Split your dataset into 3 parts (always recommended)

• use test set only once at the end (for unbiased estimate of
generalization performance)
• use validation accuracy for tuning (always recommended)

[Diagram: the dataset is split into a training dataset, a validation dataset, and a test dataset.]

Early Stopping

Step 2: Early stopping (not very common anymore)

• reduce overfitting by observing the training/validation accuracy gap during training and then stop at the "right" point

[Figure: training and validation accuracy over epochs; a good early stopping point is where the validation accuracy stops improving while the training accuracy keeps rising.]
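A minimal sketch of picking that stopping point from a recorded validation-accuracy curve (hypothetical helper with an assumed `patience` parameter):

```python
def early_stopping_epoch(val_accs, patience=3):
    # return the epoch with the best validation accuracy so far, stopping once
    # `patience` epochs have passed without any improvement
    best_acc, best_epoch = float("-inf"), 0
    for epoch, acc in enumerate(val_accs):
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_epoch
```

In practice one would also checkpoint the model weights at the best epoch.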
1. Avoiding overfitting with more data and data augmentation
2. Reducing network capacity & early stopping
3. Adding norm penalties to the loss: L1 & L2 regularization
4. Dropout

L1/L2 Regularization

As I am sure you already know this from various statistics classes, we will keep it short:

• L1-regularization => LASSO regression

• L2-regularization => Ridge regression (Tikhonov regularization)

Basically, a "weight shrinkage" or a "penalty against complexity"

L1/L2 Regularization
for Linear Models (e.g., Logistic Regression)

Cost_{w,b} = (1/n) Σ_{i=1}^{n} L(y^[i], ŷ^[i])

L2-regularized-Cost_{w,b} = (1/n) Σ_{i=1}^{n} L(y^[i], ŷ^[i]) + (λ/n) Σ_j w_j²

where Σ_j w_j² = ||w||₂², and λ is a hyperparameter
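The regularized cost above can be sketched directly (hypothetical helper; `losses` stands in for the per-example values L(y^[i], ŷ^[i])):

```python
def l2_regularized_cost(losses, weights, lam):
    # losses: per-example loss values; weights: model weights w_j; lam: lambda
    n = len(losses)
    data_term = sum(losses) / n                        # (1/n) * sum_i L(...)
    penalty = (lam / n) * sum(w * w for w in weights)  # (lambda/n) * sum_j w_j^2
    return data_term + penalty
```

With lam = 0 this reduces to the unregularized cost.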

L2 Regularization for Multilayer Neural Networks

$$\text{L2-Regularized-Cost}_{\mathbf{w}, b} = \frac{1}{n}\sum_{i=1}^{n} \mathcal{L}\big(y^{[i]}, \hat{y}^{[i]}\big) + \frac{\lambda}{n}\sum_{l=1}^{L} \big\|\mathbf{W}^{(l)}\big\|_F^2$$

(the second term sums over the layers $l = 1, \dots, L$)

where $\big\|\mathbf{W}^{(l)}\big\|_F^2$ is the squared Frobenius norm:

$$\big\|\mathbf{W}^{(l)}\big\|_F^2 = \sum_{i}\sum_{j} \big(w_{i,j}^{(l)}\big)^2$$
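For a concrete (made-up) 2×2 weight matrix, the squared Frobenius norm is simply the sum of all squared entries:

```python
# Squared Frobenius norm of a toy weight matrix (hypothetical values).
W = [[1.0, -2.0],
     [0.5,  0.0]]

frob_sq = sum(w_ij**2 for row in W for w_ij in row)
print(frob_sq)  # 5.25
```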

L2 Regularization for Neural Nets

Regular gradient descent update:

$$w_{i,j} := w_{i,j} - \eta \, \frac{\partial \mathcal{L}}{\partial w_{i,j}}$$

Gradient descent update with L2 regularization:


$$w_{i,j} := w_{i,j} - \eta \left( \frac{\partial \mathcal{L}}{\partial w_{i,j}} + \frac{2\lambda}{n}\, w_{i,j} \right)$$
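A minimal numeric sketch of the two update rules (plain Python, hypothetical scalar values; `eta` is the learning rate η, `lam` is λ):

```python
# Comparing one plain vs. one L2-regularized gradient-descent step
# for a single weight (all numbers are made up for illustration).
eta = 0.1     # learning rate
lam = 0.5     # lambda
n = 10        # number of training examples
w = 1.0       # current weight value
grad = 0.2    # dL/dw obtained from backpropagation

w_plain = w - eta * grad                       # regular update
w_l2 = w - eta * (grad + (2 * lam / n) * w)    # update with L2 penalty

print(round(w_plain, 5))  # 0.98
print(round(w_l2, 5))     # 0.97
```

Note how the extra term (2λ/n)·w always pulls the weight toward zero, which is why L2 regularization is also known as "weight decay."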

L2 Regularization for Neural Nets in PyTorch

• For all layers, same as before ("automatic approach" via weight_decay)

• Or, manually:

  for epoch in range(NUM_EPOCHS):

      model.train()
      for batch_idx, (features, targets) in enumerate(train_loader):

          features = features.view(-1, 28*28).to(DEVICE)
          targets = targets.to(DEVICE)

          ### FORWARD AND BACK PROP
          logits, probas = model(features)
          cost = F.cross_entropy(logits, targets)

          # regularize loss
          L2 = 0.
          for p in model.parameters():
              L2 = L2 + (p**2).sum()
          cost = cost + 2./targets.size(0) * LAMBDA * L2

          optimizer.zero_grad()
          cost.backward()

L2 Regularization for Neural Nets in PyTorch

• (Same code as on the previous slide.)

Why did I use "/targets.size(0)" here?

L2 Regularization for Neural Nets in PyTorch

• Or, if you only want to regularize the weights, not the biases:

  # regularize loss
  L2 = 0.
  for name, p in model.named_parameters():
      if 'weight' in name:
          L2 = L2 + (p**2).sum()

  cost = cost + 2./targets.size(0) * LAMBDA * L2

  optimizer.zero_grad()
  cost.backward()

Effect of Norm Penalties on the Decision Boundary
Assume a nonlinear model

Dropout*
*Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way
to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958.
http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf

1. Avoiding overfitting with more data and data augmentation


2. Reducing network capacity & early stopping
3. Adding norm penalties to the loss: L1 & L2 regularization
4. Dropout

Dropout

Original research articles:


Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2012).
Improving neural networks by preventing co-adaptation of feature detectors. arXiv
preprint arXiv:1207.0580.

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014).
Dropout: a simple way to prevent neural networks from overfitting. The Journal of
Machine Learning Research, 15(1), 1929-1958.

Dropout in a Nutshell: Dropping Nodes

[Figure: a network with two hidden layers (inputs x1, x2; hidden activations
a^(1), a^(2); outputs o1, o2, o3; loss L(y, o)), where randomly selected
hidden units are dropped]

Originally, drop probability 0.5
(but 0.2-0.8 also common now)

Dropout in a Nutshell: Dropping Nodes

How do we drop the nodes practically/efficiently?

Bernoulli Sampling (during training):

• p := drop probability
• v := random sample from uniform distribution in range [0, 1]
• ∀i : v_i := 0 if v_i < p, else v_i := 1
• a := a ⊙ v   (on average, p × 100% of the activations a will be zeroed)

Sebastian Raschka STAT 453: Intro to Deep Learning and Generative Models SS 2020 33

Then, after training, when making predictions (DL jargon: "inference"),
scale the activations via a := a · (1 − p)

Q for you: Why is this required?
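The sampling and rescaling steps above can be sketched in plain Python; the Bernoulli mask is hard-coded here (instead of drawn randomly) to keep the example deterministic:

```python
# Classic dropout, sketched with toy activation values.
p = 0.5                           # drop probability
a = [2.0, 4.0, 6.0, 8.0]          # activations of one layer

# Training: zero out activations whose mask entry is 0.
# (A fixed "sampled" keep mask stands in for the Bernoulli draw.)
v = [1, 0, 1, 0]
a_train = [ai * vi for ai, vi in zip(a, v)]

# Inference: keep every activation, but scale by (1 - p) so the expected
# magnitude matches what the next layer saw during training.
a_infer = [ai * (1 - p) for ai in a]

print(a_train)  # [2.0, 0.0, 6.0, 0.0]
print(a_infer)  # [1.0, 2.0, 3.0, 4.0]
```

This also hints at the answer to the question: during training only a (1 − p) fraction of units is active on average, so without the scaling the net inputs at inference would be systematically too large.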

Dropout: Co-Adaptation Interpretation

Why does Dropout work well?


• Network will learn not to rely on particular connections too
heavily
• Thus, will consider more connections (because it cannot rely on
individual ones)
• The weight values will be more spread-out (may lead to smaller
weights like with L2 norm)
• Side note: You can certainly use different dropout probabilities in
different layers (assigning them proportional to the number of
units in a layer is not a bad idea, for example)

Inverted Dropout

• Most frameworks implement inverted dropout


• Here, the surviving activation values are scaled by the factor 1/(1-p)
during training, instead of scaling the activations during "inference"
• I believe Google started this trend (because it's computationally
cheaper in the long run if you use your model a lot after
training)
• PyTorch's Dropout implementation is also inverted Dropout
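A minimal sketch of the inverted variant (same toy numbers and hard-coded mask as the earlier sketch):

```python
# Inverted dropout: scale the kept activations by 1/(1 - p) during
# training, so no rescaling is needed at inference (toy values).
p = 0.5
a = [2.0, 4.0, 6.0, 8.0]
v = [1, 0, 1, 0]                  # hard-coded "sampled" keep mask

a_train = [ai * vi / (1 - p) for ai, vi in zip(a, v)]
a_infer = list(a)                 # inference: pass through unchanged

print(a_train)  # [4.0, 0.0, 12.0, 0.0]
print(a_infer)  # [2.0, 4.0, 6.0, 8.0]
```

Shifting the scaling to training time means the trained model can be deployed as-is, with no dropout-specific logic at prediction time.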

Dropout in PyTorch (Functional API)
class MultilayerPerceptron(torch.nn.Module):

    def __init__(self, num_features, num_classes, drop_proba,
                 num_hidden_1, num_hidden_2):
        super(MultilayerPerceptron, self).__init__()

        self.drop_proba = drop_proba
        self.linear_1 = torch.nn.Linear(num_features, num_hidden_1)
        self.linear_2 = torch.nn.Linear(num_hidden_1, num_hidden_2)
        self.linear_out = torch.nn.Linear(num_hidden_2, num_classes)

    def forward(self, x):
        out = self.linear_1(x)
        out = F.relu(out)
        out = F.dropout(out, p=self.drop_proba, training=self.training)
        out = self.linear_2(out)
        out = F.relu(out)
        out = F.dropout(out, p=self.drop_proba, training=self.training)
        logits = self.linear_out(out)
        probas = F.log_softmax(logits, dim=1)
        return logits, probas

Dropout in PyTorch ([more] Object-Oriented API)

class MultilayerPerceptron(torch.nn.Module):

    def __init__(self, num_features, num_classes, drop_proba,
                 num_hidden_1, num_hidden_2):
        super(MultilayerPerceptron, self).__init__()

        self.my_network = torch.nn.Sequential(
            torch.nn.Linear(num_features, num_hidden_1),
            torch.nn.ReLU(),
            torch.nn.Dropout(drop_proba),
            torch.nn.Linear(num_hidden_1, num_hidden_2),
            torch.nn.ReLU(),
            torch.nn.Dropout(drop_proba),
            torch.nn.Linear(num_hidden_2, num_classes)
        )

    def forward(self, x):
        logits = self.my_network(x)
        probas = F.softmax(logits, dim=1)
        return logits, probas

Dropout in PyTorch
Here, it is very important that you use model.train() and model.eval()!

  for epoch in range(NUM_EPOCHS):
      model.train()
      for batch_idx, (features, targets) in enumerate(train_loader):

          features = features.view(-1, 28*28).to(DEVICE)
          targets = targets.to(DEVICE)

          ### FORWARD AND BACK PROP
          logits, probas = model(features)
          cost = F.cross_entropy(logits, targets)

          optimizer.zero_grad()
          cost.backward()
          minibatch_cost.append(cost)

          ### UPDATE MODEL PARAMETERS
          optimizer.step()

      model.eval()
      with torch.no_grad():
          cost = compute_loss(model, train_loader)
          epoch_cost.append(cost)
          print('Epoch: %03d/%03d Train Cost: %.4f' % (
                epoch+1, NUM_EPOCHS, cost))
      print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))

Dropout in PyTorch (Functional API)

Example implementation of the 3 previous slides:

https://github.com/rasbt/stat453-deep-learning-ss20/blob/master/L09-regularization/code/
dropout.ipynb

Dropout: More Practical Tips

• Don't use Dropout if your model does not overfit


• However, in that case it is recommended to increase the capacity until
the model does overfit, and then use dropout; this lets you use a
larger-capacity model without overfitting

Lecture 10

Feature Normalization and


Weight Initialization
STAT 453: Deep Learning, Spring 2020
Sebastian Raschka
http://stat.wisc.edu/~sraschka/teaching/stat453-ss2020/

Slides:
https://github.com/rasbt/stat453-deep-learning-ss20/10_norm-and-init/

"Tricks" for Improving
Deep Neural Network Training

Today:
1. Feature/Input Normalization
(BatchNorm, InstanceNorm, GroupNorm, LayerNorm)
2. Weight Initialization (Xavier Glorot, Kaiming He)

Next Lecture:
3. Optimization Algorithms (RMSProp, Adagrad, ADAM)

Part 1: Input Normalization

Recap: Why We Normalize Inputs for Gradient Descent

[Figure: surface of a convex cost function (for simplicity), plotted over
the weights w1 and w2, with the minimum marked]

(Keep in mind that we are using the same learning rate for all weights,
so large parameters will dominate the updates)
"Standardization" of input features:

$$x_j'^{[i]} = \frac{x_j^{[i]} - \mu_j}{\sigma_j}$$

(the scaled feature will have zero mean, unit variance)
However, normalizing
the inputs to the network
only affects the first hidden layer ...
What about the other hidden layers?

Batch Normalization ("BatchNorm")

Ioffe, S., & Szegedy, C. (2015, June). Batch Normalization: Accelerating


Deep Network Training by Reducing Internal Covariate Shift. In
International Conference on Machine Learning (pp. 448-456).
http://proceedings.mlr.press/v37/ioffe15.html

Batch Normalization ("BatchNorm")

• Normalizes hidden layer inputs


• Helps with exploding/vanishing gradient problems
• Can increase training stability and convergence rate
• Can be understood as additional (normalization) layers
(with additional parameters)
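The core of the BatchNorm computation can be sketched for one unit over a mini-batch (plain Python, toy values; the learnable scale γ and shift β are omitted here, i.e. γ = 1, β = 0):

```python
# BatchNorm's normalization step for one hidden unit across a mini-batch.
eps = 1e-5                         # small constant for numerical stability
z = [1.0, 3.0, 5.0, 7.0]           # the unit's net inputs over the batch

mu = sum(z) / len(z)                           # batch mean
var = sum((zi - mu)**2 for zi in z) / len(z)   # batch variance
z_norm = [(zi - mu) / (var + eps)**0.5 for zi in z]

print(mu, var)                     # 4.0 5.0
print(abs(sum(z_norm)) < 1e-9)     # True
```

The learnable γ and β then rescale and shift z_norm, which is why BatchNorm behaves like an additional layer with its own parameters.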

Suppose we have a net input z_1^(2) associated with an activation in the
2nd hidden layer:

[Figure: a 2-hidden-layer network with inputs x1, x2, hidden activations
a^(1), a^(2), outputs o1, o2, o3, targets y1, y2, y3, and loss L(y, o)]

(1)
a3 <latexit sha1_base64="F0cJIqijoEg/scv4wVZxoymO2Dc=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXsquLeix6MVjBfsh7VqyabYNTbJLkhXK0l/hxYMiXv053vw3pu0etPXBwOO9GWbmBTFn2rjut5NbW9/Y3MpvF3Z29/YPiodHLR0litAmiXikOgHWlDNJm4YZTjuxolgEnLaD8c3Mbz9RpVkk780kpr7AQ8lCRrCx0gPuVx/Tsnc+7RdLbsWdA60SLyMlyNDoF796g4gkgkpDONa667mx8VOsDCOcTgu9RNMYkzEe0q6lEguq/XR+8BSdWWWAwkjZkgbN1d8TKRZaT0RgOwU2I73szcT/vG5iwis/ZTJODJVksShMODIRmn2PBkxRYvjEEkwUs7ciMsIKE2MzKtgQvOWXV0nrouJVK+5drVS/zuLIwwmcQhk8uIQ63EIDmkBAwDO8wpujnBfn3flYtOacbOYY/sD5/AGf6o+h</latexit>

(2) o3
a3
<latexit sha1_base64="wT+53Eb88nVtTpSfn4qk4kRsjrU=">AAAB6nicbVBNSwMxEJ3Ur1q/qh69BIvgqexaQY9FLx4r2g9ol5JNs21oNlmSrFCW/gQvHhTx6i/y5r8xbfegrQ8GHu/NMDMvTAQ31vO+UWFtfWNzq7hd2tnd2z8oHx61jEo1ZU2qhNKdkBgmuGRNy61gnUQzEoeCtcPx7cxvPzFtuJKPdpKwICZDySNOiXXSg+rX+uWKV/XmwKvEz0kFcjT65a/eQNE0ZtJSQYzp+l5ig4xoy6lg01IvNSwhdEyGrOuoJDEzQTY/dYrPnDLAkdKupMVz9fdERmJjJnHoOmNiR2bZm4n/ed3URtdBxmWSWibpYlGUCmwVnv2NB1wzasXEEUI1d7diOiKaUOvSKbkQ/OWXV0nrourXqt79ZaV+k8dRhBM4hXPw4QrqcAcNaAKFITzDK7whgV7QO/pYtBZQPnMMf4A+fwABrI2b</latexit>

<latexit sha1_base64="vBxbcVs2Wnfm0yi6DKhPPczIBHw=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXspuK+ix6MVjBfsh7VqyabYNTbJLkhXK0l/hxYMiXv053vw3pu0etPXBwOO9GWbmBTFn2rjut5NbW9/Y3MpvF3Z29/YPiodHLR0litAmiXikOgHWlDNJm4YZTjuxolgEnLaD8c3Mbz9RpVkk780kpr7AQ8lCRrCx0gPu1x7TcvV82i+W3Io7B1olXkZKkKHRL371BhFJBJWGcKx113Nj46dYGUY4nRZ6iaYxJmM8pF1LJRZU++n84Ck6s8oAhZGyJQ2aq78nUiy0nojAdgpsRnrZm4n/ed3EhFd+ymScGCrJYlGYcGQiNPseDZiixPCJJZgoZm9FZIQVJsZmVLAheMsvr5JWteLVKu7dRal+ncWRhxM4hTJ4cAl1uIUGNIGAgGd4hTdHOS/Ou/OxaM052cwx/IHz+QOhcI+i</latexit>

(1)
a4
(2)
<latexit sha1_base64="uxWzlquY+EeW/UpcO69SCXeIYtQ=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXsquLeix6MVjBfsh7VqyabYNTbJLkhXK0l/hxYMiXv053vw3pu0etPXBwOO9GWbmBTFn2rjut5NbW9/Y3MpvF3Z29/YPiodHLR0litAmiXikOgHWlDNJm4YZTjuxolgEnLaD8c3Mbz9RpVkk780kpr7AQ8lCRrCx0gPu1x7Tsnc+7RdLbsWdA60SLyMlyNDoF796g4gkgkpDONa667mx8VOsDCOcTgu9RNMYkzEe0q6lEguq/XR+8BSdWWWAwkjZkgbN1d8TKRZaT0RgOwU2I73szcT/vG5iwis/ZTJODJVksShMODIRmn2PBkxRYvjEEkwUs7ciMsIKE2MzKtgQvOWXV0nrouJVK+5drVS/zuLIwwmcQhk8uIQ63EIDmkBAwDO8wpujnBfn3flYtOacbOYY/sD5/AGhdI+i</latexit>

a4
<latexit sha1_base64="vsgJntgqeAyiGWhpRcem3fXieTw=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXspuLeix6MVjBfsh7VqyabYNTbJLkhXK0l/hxYMiXv053vw3pu0etPXBwOO9GWbmBTFn2rjut5NbW9/Y3MpvF3Z29/YPiodHLR0litAmiXikOgHWlDNJm4YZTjuxolgEnLaD8c3Mbz9RpVkk780kpr7AQ8lCRrCx0gPu1x7TcvV82i+W3Io7B1olXkZKkKHRL371BhFJBJWGcKx113Nj46dYGUY4nRZ6iaYxJmM8pF1LJRZU++n84Ck6s8oAhZGyJQ2aq78nUiy0nojAdgpsRnrZm4n/ed3EhFd+ymScGCrJYlGYcGQiNPseDZiixPCJJZgoZm9FZIQVJsZmVLAheMsvr5JWteJdVNy7Wql+ncWRhxM4hTJ4cAl1uIUGNIGAgGd4hTdHOS/Ou/OxaM052cwx/IHz+QOi+o+j</latexit>

(1)
a5 <latexit sha1_base64="NHK0ywkULzi4Jl2BlAjdO8n2Yig=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXsquVfRY9OKxgv2Qdi3ZNNuGJtklyQpl6a/w4kERr/4cb/4b03YP2vpg4PHeDDPzgpgzbVz328mtrK6tb+Q3C1vbO7t7xf2Dpo4SRWiDRDxS7QBrypmkDcMMp+1YUSwCTlvB6Gbqt56o0iyS92YcU1/ggWQhI9hY6QH3Lh7Tsnc66RVLbsWdAS0TLyMlyFDvFb+6/YgkgkpDONa647mx8VOsDCOcTgrdRNMYkxEe0I6lEguq/XR28ASdWKWPwkjZkgbN1N8TKRZaj0VgOwU2Q73oTcX/vE5iwis/ZTJODJVkvihMODIRmn6P+kxRYvjYEkwUs7ciMsQKE2MzKtgQvMWXl0nzrOJVK+7deal2ncWRhyM4hjJ4cAk1uIU6NICAgGd4hTdHOS/Ou/Mxb8052cwh/IHz+QOi/o+j</latexit>

Sebastian Raschka STAT 453: Intro to Deep Learning and Generative Models SS 2020 8
Now, consider all examples in a minibatch, such that the net input of a given training example $i$ at layer 2 is written as $z_1^{(2)[i]}$, where $i \in \{1, \dots, n\}$.

In the next slides, let's omit the layer index, as it may be distracting...

[Figure: the same multilayer perceptron as on the previous slide.]
BatchNorm Step 1: Normalize Net Inputs

$$\mu_j = \frac{1}{n} \sum_i z_j^{[i]}$$

$$\sigma_j^2 = \frac{1}{n} \sum_i \left( z_j^{[i]} - \mu_j \right)^2$$

$$z_j^{\prime\,[i]} = \frac{z_j^{[i]} - \mu_j}{\sigma_j}$$
In practice, the normalization is computed as

$$z_j^{\prime\,[i]} = \frac{z_j^{[i]} - \mu_j}{\sqrt{\sigma_j^2 + \epsilon}}$$

for numerical stability, where epsilon is a small number like 1E-5.
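As a minimal sketch (my own illustration, not the lecture's code), Step 1 with the $\epsilon$ term can be written in NumPy; here `z` is assumed to hold the net inputs of one minibatch, with one row per training example $i$ and one column per unit $j$:

```python
import numpy as np

def batchnorm_step1(z, eps=1e-5):
    """Normalize net inputs column-wise (per unit j) over a minibatch.

    z: array of shape (n, m) -- n training examples, m units.
    Returns z' where each column has (approximately) zero mean and unit variance.
    """
    mu = z.mean(axis=0)                   # mu_j = (1/n) sum_i z_j^[i]
    var = z.var(axis=0)                   # sigma_j^2 = (1/n) sum_i (z_j^[i] - mu_j)^2
    return (z - mu) / np.sqrt(var + eps)  # z'_j^[i] = (z_j^[i] - mu_j) / sqrt(sigma_j^2 + eps)

z = np.array([[1.0, 2.0],
              [3.0, 6.0]])
z_prime = batchnorm_step1(z)
# Each column of z_prime now has mean ~0 and variance ~1.
```

Note that `np.var` uses the biased estimator (division by n), which matches the 1/n in the formula above.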
BatchNorm Step 2: Pre-Activation Scaling

$$a_j^{\prime\,[i]} = \gamma_j \cdot z_j^{\prime\,[i]} + \beta_j$$

Here, $\gamma_j$ and $\beta_j$ are learnable parameters.
$\gamma_j$ controls the spread or scale of the pre-activations, and $\beta_j$ controls their mean. Technically, a BatchNorm layer could learn to perform "standardization" with zero mean and unit variance.
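Continuing the sketch, Step 2 applies the learnable per-unit parameters on top of the normalized values (the `gamma` and `beta` values below are made up for illustration). After scaling and shifting, column $j$ has mean $\beta_j$ and standard deviation $|\gamma_j|$, which is why the layer can represent any pre-activation mean and scale, including plain standardization ($\gamma_j = 1$, $\beta_j = 0$):

```python
import numpy as np

def batchnorm_step2(z_prime, gamma, beta):
    """Scale and shift normalized net inputs: a'_j = gamma_j * z'_j + beta_j."""
    return gamma * z_prime + beta

# Normalized net inputs from Step 1: each column has mean 0 and variance 1.
z_prime = np.array([[-1.0, -1.0],
                    [ 1.0,  1.0]])
gamma = np.array([2.0, 1.0])  # learnable scale, one per unit
beta  = np.array([0.5, 0.0])  # learnable shift, one per unit

a_prime = batchnorm_step2(z_prime, gamma, beta)
# Column 0 now has mean 0.5 and std 2.0; column 1 stays standardized.
```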
BatchNorm Step 1 & 2 Summarized

$$z_j^{\prime\,[i]} = \frac{z_j^{[i]} - \mu_j}{\sigma_j}, \qquad a_j^{\prime\,[i]} = \gamma_j \cdot z_j^{\prime\,[i]} + \beta_j$$

[Figure: in each layer, the net input $z_1^{(l)}$ is normalized to $z_1^{\prime\,(l)}$ (Step 1), then scaled and shifted to the pre-activation $a_1^{\prime\,(l)}$ (Step 2), which is passed through the activation function to yield $a_1^{(l)}$; this is repeated for the first hidden layer, the second hidden layer, and so on.]
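Putting both steps together, the per-layer pipeline above can be sketched as a single NumPy function (a toy forward pass with made-up shapes and ReLU as the activation; in a framework this is what a layer such as PyTorch's `torch.nn.BatchNorm1d` handles, including the running statistics needed at inference time):

```python
import numpy as np

def dense_batchnorm_relu(x, W, b, gamma, beta, eps=1e-5):
    """One hidden layer: net input -> BatchNorm (Steps 1 & 2) -> activation."""
    z = x @ W + b                            # net inputs z^(l) for the minibatch
    mu, var = z.mean(axis=0), z.var(axis=0)  # minibatch statistics per unit
    z_prime = (z - mu) / np.sqrt(var + eps)  # Step 1: normalize
    a_prime = gamma * z_prime + beta         # Step 2: scale and shift
    return np.maximum(0.0, a_prime)          # activation a^(l) (here: ReLU)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))                  # minibatch: 8 examples, 4 features
W = rng.normal(size=(4, 3))                  # weights for a layer with 3 units
b = np.zeros(3)
a = dense_batchnorm_relu(x, W, b, gamma=np.ones(3), beta=np.zeros(3))
```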
Backpropagation for BatchNorm Parameters
Let's consider a simpler case ... a network with a single unit per layer:

$$x_1 \;\xrightarrow{w_j^{(1)}}\; a_j^{(1)} \;\xrightarrow{w_j^{(2)}}\; a_j^{(2)} \;\xrightarrow{w_j^{(3)}}\; o, \qquad l = \mathcal{L}(y, o)$$

The gradients of the loss with respect to the weights follow from the chain rule:

$$\frac{\partial l}{\partial w_j^{(3)}} = \frac{\partial l}{\partial o} \cdot \frac{\partial o}{\partial w_j^{(3)}}$$

$$\frac{\partial l}{\partial w_j^{(2)}} = \frac{\partial l}{\partial o} \cdot \frac{\partial o}{\partial a_j^{(2)}} \cdot \frac{\partial a_j^{(2)}}{\partial w_j^{(2)}}$$

$$\frac{\partial l}{\partial w_j^{(1)}} = \frac{\partial l}{\partial o} \cdot \frac{\partial o}{\partial a_j^{(2)}} \cdot \frac{\partial a_j^{(2)}}{\partial a_j^{(1)}} \cdot \frac{\partial a_j^{(1)}}{\partial w_j^{(1)}}$$
j

Sebastian Raschka STAT 453: Intro to Deep Learning and Generative Models SS 2020 19
(previously, we didn't write the
net input explicitly in the comp.
graph)
y
<latexit sha1_base64="cs1Q9fet/6GNtc+Tzw/y6WCTX8Y=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0lU0GPRi8cW7Ae0oWy2k3btZhN2N0Io/QVePCji1Z/kzX/jts1BWx8MPN6bYWZekAiujet+O4W19Y3NreJ2aWd3b/+gfHjU0nGqGDZZLGLVCahGwSU2DTcCO4lCGgUC28H4bua3n1BpHssHkyXoR3QoecgZNVZqZP1yxa26c5BV4uWkAjnq/fJXbxCzNEJpmKBadz03Mf6EKsOZwGmpl2pMKBvTIXYtlTRC7U/mh07JmVUGJIyVLWnIXP09MaGR1lkU2M6ImpFe9mbif143NeGNP+EySQ1KtlgUpoKYmMy+JgOukBmRWUKZ4vZWwkZUUWZsNiUbgrf88ippXVS9y6rbuKrUbvM4inACp3AOHlxDDe6hDk1ggPAMr/DmPDovzrvzsWgtOPnMMfyB8/kD6GeM/w==</latexit>

(1) (2) (3)


wj (1) wj (2)
(·)
(2) w j L(y, o) = l
x1 ...
<latexit sha1_base64="DSGzTYKR+NJYbAJ1JhYvLKBcRTs=">AAAB9HicbVBNSwMxEJ2tX7V+VT16CRahXsquCnosevFYwX5AdynZbLYNTbJrki2U0t/hxYMiXv0x3vw3pu0etPXBwOO9GWbmhSln2rjut1NYW9/Y3Cpul3Z29/YPyodHLZ1kitAmSXiiOiHWlDNJm4YZTjupoliEnLbD4d3Mb4+o0iyRj2ac0kDgvmQxI9hYKfA16wtc9UmUmPNeueLW3DnQKvFyUoEcjV75y48SkgkqDeFY667npiaYYGUY4XRa8jNNU0yGuE+7lkosqA4m86On6MwqEYoTZUsaNFd/T0yw0HosQtspsBnoZW8m/ud1MxPfBBMm08xQSRaL4owjk6BZAihiihLDx5Zgopi9FZEBVpgYm1PJhuAtv7xKWhc177LmPVxV6rd5HEU4gVOoggfXUId7aEATCDzBM7zCmzNyXpx352PRWnDymWP4A+fzB0PvkcM=</latexit>

aj aj
<latexit sha1_base64="xkDVhV2R7yGjiI8Bkoa6EodHAlw=">AAAB/nicbVDLSsNAFL2pr1pfUXHlZrAIFaQkKuhGKLpx4aKCfUAbymQ6aYdOJmFmIpRQ8FfcuFDErd/hzr9x0mah1QMDh3Pu5Z45fsyZ0o7zZRUWFpeWV4qrpbX1jc0te3unqaJEEtogEY9k28eKciZoQzPNaTuWFIc+py1/dJ35rQcqFYvEvR7H1AvxQLCAEayN1LP3uiHWQ4J5ejupjI9RdIQuEe/ZZafqTIH+EjcnZchR79mf3X5EkpAKTThWquM6sfZSLDUjnE5K3UTRGJMRHtCOoQKHVHnpNP4EHRqlj4JImic0mqo/N1IcKjUOfTOZhVXzXib+53USHVx4KRNxoqkgs0NBwpGOUNYF6jNJieZjQzCRzGRFZIglJto0VjIluPNf/kuaJ1X3tOrcnZVrV3kdRdiHA6iAC+dQgxuoQwMIpPAEL/BqPVrP1pv1PhstWPnOLvyC9fENUuuUZw==</latexit>

<latexit sha1_base64="5HJHR/B9CHeIlPgqihTyAybn2c4=">AAAB6nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lU0GPRi8eK9gPaUDbbSbt0swm7G7GE/gQvHhTx6i/y5r9x2+agrQ8GHu/NMDMvSATXxnW/ncLK6tr6RnGztLW9s7tX3j9o6jhVDBssFrFqB1Sj4BIbhhuB7UQhjQKBrWB0M/Vbj6g0j+WDGSfoR3QgecgZNVa6f+p5vXLFrbozkGXi5aQCOeq98le3H7M0QmmYoFp3PDcxfkaV4UzgpNRNNSaUjegAO5ZKGqH2s9mpE3JilT4JY2VLGjJTf09kNNJ6HAW2M6JmqBe9qfif10lNeOVnXCapQcnmi8JUEBOT6d+kzxUyI8aWUKa4vZWwIVWUGZtOyYbgLb68TJpnVe+86t5dVGrXeRxFOIJjOAUPLqEGt1CHBjAYwDO8wpsjnBfn3fmYtxacfOYQ/sD5/AEMWo2i</latexit>
<latexit sha1_base64="F7Swd/C4QN3QVLL72remZ0dqisQ=">AAAB8nicbVDLSgNBEJz1GeMr6tHLYBDiJeyqoMegF48RzAM2a5idzCZjZmeWmV4lLPsZXjwo4tWv8ebfOHkcNLGgoajqprsrTAQ34LrfztLyyuraemGjuLm1vbNb2ttvGpVqyhpUCaXbITFMcMkawEGwdqIZiUPBWuHweuy3Hpk2XMk7GCUsiElf8ohTAlbyn7rZQ36fVbyTvFsqu1V3ArxIvBkpoxnq3dJXp6doGjMJVBBjfM9NIMiIBk4Fy4ud1LCE0CHpM99SSWJmgmxyco6PrdLDkdK2JOCJ+nsiI7Exozi0nTGBgZn3xuJ/np9CdBlkXCYpMEmni6JUYFB4/D/ucc0oiJElhGpub8V0QDShYFMq2hC8+ZcXSfO06p1Vvdvzcu1qFkcBHaIjVEEeukA1dIPqqIEoUugZvaI3B5wX5935mLYuObOZA/QHzucP4a6Q+w==</latexit>

<latexit sha1_base64="yiocqbqzfqGkxQidQIigucf7A/c=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXsquFfRY9OKxgv2Qdi3ZNNvGJtklyQpl6a/w4kERr/4cb/4b03YP2vpg4PHeDDPzgpgzbVz328mtrK6tb+Q3C1vbO7t7xf2Dpo4SRWiDRDxS7QBrypmkDcMMp+1YUSwCTlvB6Hrqt56o0iySd2YcU1/ggWQhI9hY6R73Hh/Ssnc66RVLbsWdAS0TLyMlyFDvFb+6/YgkgkpDONa647mx8VOsDCOcTgrdRNMYkxEe0I6lEguq/XR28ASdWKWPwkjZkgbN1N8TKRZaj0VgOwU2Q73oTcX/vE5iwks/ZTJODJVkvihMODIRmn6P+kxRYvjYEkwUs7ciMsQKE2MzKtgQvMWXl0nzrOJVK97teal2lcWRhyM4hjJ4cAE1uIE6NICAgGd4hTdHOS/Ou/Mxb8052cwh/IHz+QP04o/Z</latexit>
<latexit sha1_base64="a7DEW9TA7F3/3elNIcTaVl6Dj6g=">AAAB8nicbVBNS8NAEN3Ur1q/qh69BItQLyWpgh6LXjxWsB+QxrLZbtq1m92wO1FKyM/w4kERr/4ab/4bt20O2vpg4PHeDDPzgpgzDY7zbRVWVtfWN4qbpa3tnd298v5BW8tEEdoikkvVDbCmnAnaAgacdmNFcRRw2gnG11O/80iVZlLcwSSmfoSHgoWMYDCS99RPH7L7tFo/zfrlilNzZrCXiZuTCsrR7Je/egNJkogKIBxr7blODH6KFTDCaVbqJZrGmIzxkHqGChxR7aezkzP7xCgDO5TKlAB7pv6eSHGk9SQKTGeEYaQXvan4n+clEF76KRNxAlSQ+aIw4TZIe/q/PWCKEuATQzBRzNxqkxFWmIBJqWRCcBdfXibtes09q7m355XGVR5HER2hY1RFLrpADXSDmqiFCJLoGb2iNwusF+vd+pi3Fqx85hD9gfX5A+M0kPw=</latexit>

zj
<latexit sha1_base64="/U5BI6r2ubz7lzZTHbYvf7aPySc=">AAAB8HicbVBNSwMxEJ31s9avqkcvwSLUS9mtgh6LXjxWsB/SriWbZtvYJLskWaEu/RVePCji1Z/jzX9j2u5BWx8MPN6bYWZeEHOmjet+O0vLK6tr67mN/ObW9s5uYW+/oaNEEVonEY9UK8CaciZp3TDDaStWFIuA02YwvJr4zUeqNIvkrRnF1Be4L1nICDZWunvqPtynpcrJuFsoumV3CrRIvIwUIUOtW/jq9CKSCCoN4VjrtufGxk+xMoxwOs53Ek1jTIa4T9uWSiyo9tPpwWN0bJUeCiNlSxo0VX9PpFhoPRKB7RTYDPS8NxH/89qJCS/8lMk4MVSS2aIw4chEaPI96jFFieEjSzBRzN6KyAArTIzNKG9D8OZfXiSNStk7LXs3Z8XqZRZHDg7hCErgwTlU4RpqUAcCAp7hFd4c5bw4787HrHXJyWYO4A+czx8dI4/z</latexit>
<latexit sha1_base64="oFSf5mIvKefo1ut2K84bI/HuXwI=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXspuFfRY9OKxgv2Qdi3ZNNvGJtklyQpl6a/w4kERr/4cb/4b03YP2vpg4PHeDDPzgpgzbVz328mtrK6tb+Q3C1vbO7t7xf2Dpo4SRWiDRDxS7QBrypmkDcMMp+1YUSwCTlvB6Hrqt56o0iySd2YcU1/ggWQhI9hY6R73Hh/ScvV00iuW3Io7A1omXkZKkKHeK351+xFJBJWGcKx1x3Nj46dYGUY4nRS6iaYxJiM8oB1LJRZU++ns4Ak6sUofhZGyJQ2aqb8nUiy0HovAdgpshnrRm4r/eZ3EhJd+ymScGCrJfFGYcGQiNP0e9ZmixPCxJZgoZm9FZIgVJsZmVLAheIsvL5NmteKdVbzb81LtKosjD0dwDGXw4AJqcAN1aAABAc/wCm+Ocl6cd+dj3ppzsplD+APn8wf2aI/a</latexit>
<latexit sha1_base64="9JMSM66KNyCTK9VIv0nIhGcw/P8=">AAAB8nicbVBNS8NAEN3Ur1q/qh69LBahXkpiBT0WvXisYD8gjWWz3bRrN9mwO1FKyM/w4kERr/4ab/4bt20O2vpg4PHeDDPz/FhwDbb9bRVWVtfWN4qbpa3tnd298v5BW8tEUdaiUkjV9YlmgkesBRwE68aKkdAXrOOPr6d+55EpzWV0B5OYeSEZRjzglICR3Kd++pDdp9X6adYvV+yaPQNeJk5OKihHs1/+6g0kTUIWARVEa9exY/BSooBTwbJSL9EsJnRMhsw1NCIh0146OznDJ0YZ4EAqUxHgmfp7IiWh1pPQN50hgZFe9Kbif56bQHDppTyKE2ARnS8KEoFB4un/eMAVoyAmhhCquLkV0xFRhIJJqWRCcBZfXibts5pTrzm355XGVR5HER2hY1RFDrpADXSDmqiFKJLoGb2iNwusF+vd+pi3Fqx85hD9gfX5A+S6kP0=</latexit>

l <latexit sha1_base64="E5Kc1ZKr520j8ga7QDzfGA0mefk=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0lU0GPRi8cW7Ae0oWy2k3btZhN2N0IJ/QVePCji1Z/kzX/jts1BWx8MPN6bYWZekAiujet+O4W19Y3NreJ2aWd3b/+gfHjU0nGqGDZZLGLVCahGwSU2DTcCO4lCGgUC28H4bua3n1BpHssHM0nQj+hQ8pAzaqzUEP1yxa26c5BV4uWkAjnq/fJXbxCzNEJpmKBadz03MX5GleFM4LTUSzUmlI3pELuWShqh9rP5oVNyZpUBCWNlSxoyV39PZDTSehIFtjOiZqSXvZn4n9dNTXjjZ1wmqUHJFovCVBATk9nXZMAVMiMmllCmuL2VsBFVlBmbTcmG4C2/vEpaF1Xvsuo2riq12zyOIpzAKZyDB9dQg3uoQxMYIDzDK7w5j86L8+58LFoLTj5zDH/gfP4A1LOM8g==</latexit>

Adding a
BatchNorm layer ...

(1) (2) (3)


wj (1) wj (2) 0 (2) 0 (2)
(·) (2) wj
x1 aj zj aj aj ...
<latexit sha1_base64="DSGzTYKR+NJYbAJ1JhYvLKBcRTs=">AAAB9HicbVBNSwMxEJ2tX7V+VT16CRahXsquCnosevFYwX5AdynZbLYNTbJrki2U0t/hxYMiXv0x3vw3pu0etPXBwOO9GWbmhSln2rjut1NYW9/Y3Cpul3Z29/YPyodHLZ1kitAmSXiiOiHWlDNJm4YZTjupoliEnLbD4d3Mb4+o0iyRj2ac0kDgvmQxI9hYKfA16wtc9UmUmPNeueLW3DnQKvFyUoEcjV75y48SkgkqDeFY667npiaYYGUY4XRa8jNNU0yGuE+7lkosqA4m86On6MwqEYoTZUsaNFd/T0yw0HosQtspsBnoZW8m/ud1MxPfBBMm08xQSRaL4owjk6BZAihiihLDx5Zgopi9FZEBVpgYm1PJhuAtv7xKWhc177LmPVxV6rd5HEU4gVOoggfXUId7aEATCDzBM7zCmzNyXpx352PRWnDymWP4A+fzB0PvkcM=</latexit>

<latexit sha1_base64="5HJHR/B9CHeIlPgqihTyAybn2c4=">AAAB6nicbVBNS8NAEJ3Ur1q/qh69LBbBU0lU0GPRi8eK9gPaUDbbSbt0swm7G7GE/gQvHhTx6i/y5r9x2+agrQ8GHu/NMDMvSATXxnW/ncLK6tr6RnGztLW9s7tX3j9o6jhVDBssFrFqB1Sj4BIbhhuB7UQhjQKBrWB0M/Vbj6g0j+WDGSfoR3QgecgZNVa6f+p5vXLFrbozkGXi5aQCOeq98le3H7M0QmmYoFp3PDcxfkaV4UzgpNRNNSaUjegAO5ZKGqH2s9mpE3JilT4JY2VLGjJTf09kNNJ6HAW2M6JmqBe9qfif10lNeOVnXCapQcnmi8JUEBOT6d+kzxUyI8aWUKa4vZWwIVWUGZtOyYbgLb68TJpnVe+86t5dVGrXeRxFOIJjOAUPLqEGt1CHBjAYwDO8wpsjnBfn3fmYtxacfOYQ/sD5/AEMWo2i</latexit>
<latexit sha1_base64="F7Swd/C4QN3QVLL72remZ0dqisQ=">AAAB8nicbVDLSgNBEJz1GeMr6tHLYBDiJeyqoMegF48RzAM2a5idzCZjZmeWmV4lLPsZXjwo4tWv8ebfOHkcNLGgoajqprsrTAQ34LrfztLyyuraemGjuLm1vbNb2ttvGpVqyhpUCaXbITFMcMkawEGwdqIZiUPBWuHweuy3Hpk2XMk7GCUsiElf8ohTAlbyn7rZQ36fVbyTvFsqu1V3ArxIvBkpoxnq3dJXp6doGjMJVBBjfM9NIMiIBk4Fy4ud1LCE0CHpM99SSWJmgmxyco6PrdLDkdK2JOCJ+nsiI7Exozi0nTGBgZn3xuJ/np9CdBlkXCYpMEmni6JUYFB4/D/ucc0oiJElhGpub8V0QDShYFMq2hC8+ZcXSfO06p1Vvdvzcu1qFkcBHaIjVEEeukA1dIPqqIEoUugZvaI3B5wX5935mLYuObOZA/QHzucP4a6Q+w==</latexit>

<latexit sha1_base64="yiocqbqzfqGkxQidQIigucf7A/c=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXsquFfRY9OKxgv2Qdi3ZNNvGJtklyQpl6a/w4kERr/4cb/4b03YP2vpg4PHeDDPzgpgzbVz328mtrK6tb+Q3C1vbO7t7xf2Dpo4SRWiDRDxS7QBrypmkDcMMp+1YUSwCTlvB6Hrqt56o0iySd2YcU1/ggWQhI9hY6R73Hh/Ssnc66RVLbsWdAS0TLyMlyFDvFb+6/YgkgkpDONa647mx8VOsDCOcTgrdRNMYkxEe0I6lEguq/XR28ASdWKWPwkjZkgbN1N8TKRZaj0VgOwU2Q73oTcX/vE5iwks/ZTJODJVkvihMODIRmn6P+kxRYvjYEkwUs7ciMsQKE2MzKtgQvMWXl0nzrOJVK97teal2lcWRhyM4hjJ4cAE1uIE6NICAgGd4hTdHOS/Ou/Mxb8052cwh/IHz+QP04o/Z</latexit>
<latexit sha1_base64="a7DEW9TA7F3/3elNIcTaVl6Dj6g=">AAAB8nicbVBNS8NAEN3Ur1q/qh69BItQLyWpgh6LXjxWsB+QxrLZbtq1m92wO1FKyM/w4kERr/4ab/4bt20O2vpg4PHeDDPzgpgzDY7zbRVWVtfWN4qbpa3tnd298v5BW8tEEdoikkvVDbCmnAnaAgacdmNFcRRw2gnG11O/80iVZlLcwSSmfoSHgoWMYDCS99RPH7L7tFo/zfrlilNzZrCXiZuTCsrR7Je/egNJkogKIBxr7blODH6KFTDCaVbqJZrGmIzxkHqGChxR7aezkzP7xCgDO5TKlAB7pv6eSHGk9SQKTGeEYaQXvan4n+clEF76KRNxAlSQ+aIw4TZIe/q/PWCKEuATQzBRzNxqkxFWmIBJqWRCcBdfXibtes09q7m355XGVR5HER2hY1RFLrpADXSDmqiFCJLoGb2iNwusF+vd+pi3Fqx85hD9gfX5A+M0kPw=</latexit>

zj
<latexit sha1_base64="/U5BI6r2ubz7lzZTHbYvf7aPySc=">AAAB8HicbVBNSwMxEJ31s9avqkcvwSLUS9mtgh6LXjxWsB/SriWbZtvYJLskWaEu/RVePCji1Z/jzX9j2u5BWx8MPN6bYWZeEHOmjet+O0vLK6tr67mN/ObW9s5uYW+/oaNEEVonEY9UK8CaciZp3TDDaStWFIuA02YwvJr4zUeqNIvkrRnF1Be4L1nICDZWunvqPtynpcrJuFsoumV3CrRIvIwUIUOtW/jq9CKSCCoN4VjrtufGxk+xMoxwOs53Ek1jTIa4T9uWSiyo9tPpwWN0bJUeCiNlSxo0VX9PpFhoPRKB7RTYDPS8NxH/89qJCS/8lMk4MVSS2aIw4chEaPI96jFFieEjSzBRzN6KyAArTIzNKG9D8OZfXiSNStk7LXs3Z8XqZRZHDg7hCErgwTlU4RpqUAcCAp7hFd4c5bw4787HrHXJyWYO4A+czx8dI4/z</latexit>
<latexit sha1_base64="MjYPTcWhFsUmb5oT0mVBLyugRyU=">AAAB+XicbVDLTsMwEHTKq5RXgCMXiwpRLlVSkOBYwYVjkehDakPkuE5r6jiR7VQqVv6ECwcQ4sqfcONvcNscoGWklUYzu9rdCRJGpXKcb6uwsrq2vlHcLG1t7+zu2fsHLRmnApMmjlksOgGShFFOmooqRjqJICgKGGkHo5up3x4TIWnM79UkIV6EBpyGFCNlJN+29VOmTzNfP2YPulI7y3y77FSdGeAycXNSBjkavv3V68c4jQhXmCEpu66TKE8joShmJCv1UkkShEdoQLqGchQR6enZ5Rk8MUofhrEwxRWcqb8nNIqknESB6YyQGspFbyr+53VTFV55mvIkVYTj+aIwZVDFcBoD7FNBsGITQxAW1NwK8RAJhJUJq2RCcBdfXiatWtU9r7p3F+X6dR5HERyBY1ABLrgEdXALGqAJMBiDZ/AK3ixtvVjv1se8tWDlM4fgD6zPH1+Ok3k=</latexit>
<latexit sha1_base64="Mon6EHXCa11EjnXsym3XNpquRp0=">AAAB9XicbVBNT8JAEJ3iF+IX6tHLRmLEC2nRRI9ELx4xkY8ECtkuW1jZbpvdrYY0/R9ePGiMV/+LN/+NC/Sg4EsmeXlvJjPzvIgzpW3728qtrK6tb+Q3C1vbO7t7xf2DpgpjSWiDhDyUbQ8rypmgDc00p+1IUhx4nLa88c3Ubz1SqVgo7vUkom6Ah4L5jGBtpF6CT9N+8pD2knL1LO0XS3bFngEtEycjJchQ7xe/uoOQxAEVmnCsVMexI+0mWGpGOE0L3VjRCJMxHtKOoQIHVLnJ7OoUnRhlgPxQmhIazdTfEwkOlJoEnukMsB6pRW8q/ud1Yu1fuQkTUaypIPNFfsyRDtE0AjRgkhLNJ4ZgIpm5FZERlphoE1TBhOAsvrxMmtWKc15x7i5KtessjjwcwTGUwYFLqMEt1KEBBCQ8wyu8WU/Wi/Vufcxbc1Y2cwh/YH3+APCYkiM=</latexit>
<latexit sha1_base64="oFSf5mIvKefo1ut2K84bI/HuXwI=">AAAB8HicbVBNSwMxEJ2tX7V+VT16CRahXspuFfRY9OKxgv2Qdi3ZNNvGJtklyQpl6a/w4kERr/4cb/4b03YP2vpg4PHeDDPzgpgzbVz328mtrK6tb+Q3C1vbO7t7xf2Dpo4SRWiDRDxS7QBrypmkDcMMp+1YUSwCTlvB6Hrqt56o0iySd2YcU1/ggWQhI9hY6R73Hh/ScvV00iuW3Io7A1omXkZKkKHeK351+xFJBJWGcKx1x3Nj46dYGUY4nRS6iaYxJiM8oB1LJRZU++ns4Ak6sUofhZGyJQ2aqb8nUiy0HovAdgpshnrRm4r/eZ3EhJd+ymScGCrJfFGYcGQiNP0e9ZmixPCxJZgoZm9FZIgVJsZmVLAheIsvL5NmteKdVbzb81LtKosjD0dwDGXw4AJqcAN1aAABAc/wCm+Ocl6cd+dj3ppzsplD+APn8wf2aI/a</latexit>
<latexit sha1_base64="9JMSM66KNyCTK9VIv0nIhGcw/P8=">AAAB8nicbVBNS8NAEN3Ur1q/qh69LBahXkpiBT0WvXisYD8gjWWz3bRrN9mwO1FKyM/w4kERr/4ab/4bt20O2vpg4PHeDDPz/FhwDbb9bRVWVtfWN4qbpa3tnd298v5BW8tEUdaiUkjV9YlmgkesBRwE68aKkdAXrOOPr6d+55EpzWV0B5OYeSEZRjzglICR3Kd++pDdp9X6adYvV+yaPQNeJk5OKihHs1/+6g0kTUIWARVEa9exY/BSooBTwbJSL9EsJnRMhsw1NCIh0146OznDJ0YZ4EAqUxHgmfp7IiWh1pPQN50hgZFe9Kbif56bQHDppTyKE2ARnS8KEoFB4un/eMAVoyAmhhCquLkV0xFRhIJJqWRCcBZfXibts5pTrzm355XGVR5HER2hY1RFDrpADXSDmqiFKJLoGb2iNwusF+vd+pi3Fqx85hD9gfX5A+S6kP0=</latexit>

(2)
0 (2)
zj µj 0 (2) 0 (2)
zj = aj = j · zj + j
j
<latexit sha1_base64="pi6ijXkXslQZaFrhMo0WalC9xvk=">AAACH3icbZBNS8NAEIY3flu/qh69LBZREUqiol4E0YvHClaFpobJdlu37iZhdyLUkH/ixb/ixYMi4q3/xm3Nwa+BhZf3mWF23jCRwqDr9p2R0bHxicmp6dLM7Nz8Qnlx6cLEqWa8zmIZ66sQDJci4nUUKPlVojmoUPLL8PZkwC/vuDYijs6xl/Cmgk4k2oIBWiso72WwngdZN7/ONrY3c3pI/Q4oBUGX+qwVI83uLe8WdIv6IUcLg3LFrbrDon+FV4gKKaoWlD/8VsxSxSNkEoxpeG6CzQw0CiZ5XvJTwxNgt9DhDSsjUNw0s+F9OV2zTou2Y21fhHTofp/IQBnTU6HtVIA35jcbmP+xRortg2YmoiRFHrGvRe1UUozpICzaEpozlD0rgGlh/0rZDWhgaCMt2RC83yf/FRfbVW+n6p3tVo6OizimyApZJRvEI/vkiJySGqkTRh7IE3khr86j8+y8Oe9frSNOMbNMfpTT/wQ5caE2</latexit>

Sebastian Raschka STAT 453: Intro to Deep Learning and Generative Models SS 2020 21
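As a sanity check, the factorized chain rule above can be verified with autograd on a tiny one-unit-per-layer network. The sigmoid activations, scalar weights, and squared-error loss below are illustrative assumptions, not the lecture's exact setup:

```python
import torch

# Tiny one-unit-per-layer network matching the graph above:
# x --w1--> a1 --w2--> a2 --w3--> o, with l = (y - o)^2.
x, y = torch.tensor(0.5), torch.tensor(1.0)
w1, w2, w3 = (torch.tensor(v, requires_grad=True) for v in (0.3, -0.2, 0.8))

a1 = torch.sigmoid(w1 * x)       # a_j^(1)
a2 = torch.sigmoid(w2 * a1)      # a_j^(2)
o = w3 * a2                      # output
l = (y - o) ** 2                 # loss
l.backward()

# Manual chain rule: dl/dw1 = dl/do * do/da2 * da2/da1 * da1/dw1
dl_do = -2 * (y - o)
do_da2 = w3
da2_da1 = a2 * (1 - a2) * w2     # sigmoid'(z) = a * (1 - a)
da1_dw1 = a1 * (1 - a1) * x
manual = (dl_do * do_da2 * da2_da1 * da1_dw1).detach()

print(torch.allclose(w1.grad, manual))  # True
```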
Backprop for BatchNorm Parameters
Recall the BatchNorm transformation:

\[
z'^{(2)}_j = \frac{z_j^{(2)} - \mu_j}{\sigma_j}, \qquad a'^{(2)}_j = \gamma_j \cdot z'^{(2)}_j + \beta_j
\]

Summing over the n examples in the minibatch, the gradients with respect to the shift and scale parameters are

\[
\frac{\partial l}{\partial \beta_j} = \sum_{i=1}^{n} \frac{\partial l}{\partial a'^{(2)[i]}_j} \cdot \frac{\partial a'^{(2)[i]}_j}{\partial \beta_j} = \sum_{i=1}^{n} \frac{\partial l}{\partial a'^{(2)[i]}_j}
\]

\[
\frac{\partial l}{\partial \gamma_j} = \sum_{i=1}^{n} \frac{\partial l}{\partial a'^{(2)[i]}_j} \cdot \frac{\partial a'^{(2)[i]}_j}{\partial \gamma_j} = \sum_{i=1}^{n} \frac{\partial l}{\partial a'^{(2)[i]}_j} \cdot z'^{(2)[i]}_j
\]
Sebastian Raschka STAT 453: Intro to Deep Learning and Generative Models SS 2020 22
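A minimal sketch checking these two formulas against autograd for a single unit j over a minibatch of n examples; the sum-of-squares loss is an arbitrary stand-in:

```python
import torch

# Check dl/dbeta and dl/dgamma against autograd for one unit j.
torch.manual_seed(1)
n = 8
z = torch.randn(n)                                # net inputs z_j^(2)[i]
gamma = torch.tensor(1.5, requires_grad=True)
beta = torch.tensor(0.2, requires_grad=True)

sigma = ((z - z.mean()) ** 2).mean().sqrt()       # biased minibatch std
z_hat = (z - z.mean()) / sigma                    # z'_j^(2)[i]
a = gamma * z_hat + beta                          # a'_j^(2)[i]
l = (a ** 2).sum()
l.backward()

dl_da = (2 * a).detach()                 # upstream gradient dl/da'[i]
dbeta_manual = dl_da.sum()               # sum_i dl/da'[i]
dgamma_manual = (dl_da * z_hat).sum()    # sum_i dl/da'[i] * z'[i]

print(torch.allclose(beta.grad, dbeta_manual),
      torch.allclose(gamma.grad, dgamma_manual))  # True True
```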
Backprop Beyond the BatchNorm Layer

Since the minibatch mean and variance act as parameters (each depends on every example in the batch), we can/have to apply the multivariable chain rule:

\[
\frac{\partial l}{\partial z_j^{(2)[i]}} = \frac{\partial l}{\partial z'^{(2)[i]}_j} \cdot \frac{\partial z'^{(2)[i]}_j}{\partial z_j^{(2)[i]}} + \frac{\partial l}{\partial \mu_j} \cdot \frac{\partial \mu_j}{\partial z_j^{(2)[i]}} + \frac{\partial l}{\partial \sigma_j^2} \cdot \frac{\partial \sigma_j^2}{\partial z_j^{(2)[i]}}
\]

\[
\frac{\partial l}{\partial z_j^{(2)[i]}} = \frac{\partial l}{\partial z'^{(2)[i]}_j} \cdot \frac{1}{\sigma_j} + \frac{\partial l}{\partial \mu_j} \cdot \frac{1}{n} + \frac{\partial l}{\partial \sigma_j^2} \cdot \frac{2(z_j^{(2)[i]} - \mu_j)}{n}
\]
Sebastian Raschka STAT 453: Intro to Deep Learning and Generative Models SS 2020 23
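The same kind of autograd check for the input gradient, implementing the expression above term by term (biased minibatch variance, arbitrary loss weights; a sketch, not the lecture's code):

```python
import torch

# Implement the multivariable chain rule term by term and compare
# against autograd.
torch.manual_seed(2)
n = 8
z = torch.randn(n, requires_grad=True)   # z_j^(2)[i] for i = 1..n
w = torch.arange(1., n + 1)              # arbitrary loss weights

mu = z.mean()
var = ((z - mu) ** 2).mean()             # sigma_j^2 (biased)
sigma = var.sqrt()
z_hat = (z - mu) / sigma                 # z'_j^(2)[i]
l = (w * z_hat ** 2).sum()
l.backward()

dl_dzhat = (2 * w * z_hat).detach()
dl_dmu = (dl_dzhat * (-1 / sigma)).sum()
dl_dvar = (dl_dzhat * (z - mu) * (-0.5) * var ** (-1.5)).sum()
dz_manual = (dl_dzhat / sigma            # dl/dz'[i] * 1/sigma
             + dl_dmu / n                # dl/dmu * 1/n
             + dl_dvar * 2 * (z - mu) / n).detach()

print(torch.allclose(z.grad, dz_manual, rtol=1e-4, atol=1e-5))  # True
```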
BatchNorm in PyTorch

import torch
import torch.nn.functional as F

# num_hidden_1 and num_hidden_2 are assumed to be defined elsewhere
# (e.g., as module-level settings in the accompanying notebook)

class MultilayerPerceptron(torch.nn.Module):

    def __init__(self, num_features, num_classes):
        super(MultilayerPerceptron, self).__init__()

        ### 1st hidden layer
        self.linear_1 = torch.nn.Linear(num_features, num_hidden_1)
        self.linear_1_bn = torch.nn.BatchNorm1d(num_hidden_1)

        ### 2nd hidden layer
        self.linear_2 = torch.nn.Linear(num_hidden_1, num_hidden_2)
        self.linear_2_bn = torch.nn.BatchNorm1d(num_hidden_2)

        ### Output layer
        self.linear_out = torch.nn.Linear(num_hidden_2, num_classes)

    def forward(self, x):
        out = self.linear_1(x)
        # note that batchnorm is in the classic
        # sense placed before the activation
        out = self.linear_1_bn(out)
        out = F.relu(out)

        out = self.linear_2(out)
        out = self.linear_2_bn(out)
        out = F.relu(out)

        logits = self.linear_out(out)
        probas = F.softmax(logits, dim=1)
        return logits, probas
Sebastian Raschka STAT 453: Intro to Deep Learning and Generative Models SS 2020 25
BatchNorm in PyTorch (cont.)

(same model as on the previous slide)

Don't forget model.train() and model.eval() in training and test loops:
BatchNorm uses minibatch statistics in training mode and running
statistics in evaluation mode.
Sebastian Raschka STAT 453: Intro to Deep Learning and Generative Models SS 2020 26
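A minimal training/evaluation sketch illustrating the mode switch; random data and a small Sequential model stand in for the MLP above:

```python
import torch
import torch.nn.functional as F

# Small stand-in model with a BatchNorm layer (not the exact MLP above).
model = torch.nn.Sequential(
    torch.nn.Linear(20, 16),
    torch.nn.BatchNorm1d(16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 3),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
X, y = torch.randn(64, 20), torch.randint(0, 3, (64,))

model.train()                     # BatchNorm uses minibatch statistics
for _ in range(5):
    optimizer.zero_grad()
    loss = F.cross_entropy(model(X), y)
    loss.backward()
    optimizer.step()

model.eval()                      # BatchNorm uses running statistics
with torch.no_grad():
    single = model(X[:1])         # even a "batch" of one example works
print(single.shape)  # torch.Size([1, 3])
```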
BatchNorm During Prediction ("Inference")

• Use an exponentially weighted average (moving average) of the mean
  and variance:

  running_mean = (1 - momentum) * running_mean
                 + momentum * sample_mean

  (PyTorch convention, where momentum is typically ~0.1; the variance
  is tracked the same way)

• Alternatively, can also use global training set mean and variance

Sebastian Raschka STAT 453: Intro to Deep Learning and Generative Models SS 2020 27
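A sketch tracking the moving average by hand next to torch.nn.BatchNorm1d's own buffer update, to confirm the convention:

```python
import torch

# Track the moving average by hand alongside BatchNorm1d's buffer.
torch.manual_seed(0)
momentum = 0.1
bn = torch.nn.BatchNorm1d(4, momentum=momentum)
bn.train()

running_mean = torch.zeros(4)    # BatchNorm1d initializes its buffer to 0
for _ in range(3):
    batch = torch.randn(32, 4) + 5.0          # data with true mean ~5
    bn(batch)                                 # updates bn.running_mean
    running_mean = ((1 - momentum) * running_mean
                    + momentum * batch.mean(dim=0))

print(torch.allclose(bn.running_mean, running_mean, atol=1e-5))  # True
```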
BatchNorm Variants

Pre-Activation ("original" version, as discussed in previous slides):

  compute net inputs
  BatchNorm
  apply activation function
  compute next-layer net inputs

Post-Activation (may make more sense, but less common):

  compute net inputs
  apply activation function
  BatchNorm
  compute next-layer net inputs

Sebastian Raschka STAT 453: Intro to Deep Learning and Generative Models SS 2020 34
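The two orderings as code, a sketch with identical layers where only BatchNorm's position relative to the activation differs:

```python
import torch
import torch.nn.functional as F

lin = torch.nn.Linear(10, 8)
bn = torch.nn.BatchNorm1d(8)
x = torch.randn(32, 10)

# Pre-activation ("original"): net input -> BatchNorm -> activation
out_pre = F.relu(bn(lin(x)))

# Post-activation: net input -> activation -> BatchNorm
# (reusing the same bn module here is just for brevity)
out_post = bn(F.relu(lin(x)))

print(out_pre.shape, out_post.shape)
```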
Some Benchmarks
https://github.com/ducha-aiki/caffenet-benchmark/blob/master/batchnorm.md#bn----before-or-
after-relu

Sebastian Raschka STAT 453: Intro to Deep Learning and Generative Models SS 2020 35
Other Normalization Methods for Hidden Activations
4 Wu and He

Batch Norm Layer Norm Instance Norm Group Norm

H, W

H, W

H, W

H, W
C N C N C N C N

Figure 2. Normalization methods. Each subplot shows a feature map tensor. The
pixels in blue are normalized by the same mean and variance, computed by aggregating
the values of these pixels. Group Norm is illustrated using a group number of 2.

Wu, Y., & He, K. (2018). Group normalization. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 3-19).
Group-wise computation. Group convolutions have been presented by AlexNet [28] for distributing a model into two GPUs. The concept of groups as a dimension for model design has been more widely studied recently. The work of ResNeXt [7] investigates the trade-off between depth, width, and groups, and it suggests that a larger number of groups can improve accuracy under similar computational cost. MobileNet [38] and Xception [39] exploit channel-wise (also called "depth-wise") convolutions, which are group convolutions with a group number equal to the channel number. ShuffleNet [40] proposes a channel shuffle operation that permutes the axes of grouped features. These methods all involve dividing the channel dimension into groups.

(will revisit after introducing Convolutional Neural Networks)
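As a toy illustration of the grouping idea (not the paper's implementation; `group_norm`, the group size, and the shapes are made up here), one can normalize a single sample's channels in groups with plain Python:

```python
def group_norm(channels, group_size, eps=1e-5):
    # channels: list of per-channel value lists for ONE sample.
    # Each group of `group_size` consecutive channels shares a mean/variance,
    # mirroring Group Norm's per-group (C/G, H, W) aggregation.
    out = []
    for g in range(0, len(channels), group_size):
        group = [v for ch in channels[g:g + group_size] for v in ch]
        mean = sum(group) / len(group)
        var = sum((v - mean) ** 2 for v in group) / len(group)
        for ch in channels[g:g + group_size]:
            out.append([(v - mean) / (var + eps) ** 0.5 for v in ch])
    return out

# Four channels, two spatial values each, normalized in two groups of 2:
x = [[1.0, 1.0], [3.0, 3.0], [5.0, 5.0], [7.0, 7.0]]
y = group_norm(x, group_size=2)
```

Setting the group size to the channel count recovers a Layer-Norm-like computation, and a group size of 1 recovers an Instance-Norm-like one, which is the sense in which Group Norm interpolates between the two.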
Part 2: Weight Initialization

Weight Initialization

• We previously discussed that we want to initialize weights to small, random numbers to break symmetry
• Also, we want the weights to be relatively small. Why?

Tip (from an earlier slide):

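The symmetry-breaking point can be seen in a few lines of plain Python: if every row of a weight matrix is identical, every hidden unit computes exactly the same value (and would therefore receive the same gradient update). The values and names below are illustrative:

```python
def hidden_activations(x, W):
    # Net input of each hidden unit: dot product of its weight row with x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

x = [0.3, -1.2, 0.8]
W_same = [[0.5, 0.5, 0.5]] * 4     # all rows identical: symmetry is NOT broken
W_rand = [[0.1, -0.2, 0.05],       # small, distinct values break the tie
          [0.3, 0.0, -0.1],
          [-0.4, 0.2, 0.1],
          [0.0, 0.1, -0.3]]

same = hidden_activations(x, W_same)   # four identical activations
diff = hidden_activations(x, W_rand)   # four distinct activations
```

With identical rows, all units stay clones of each other throughout training, which is why the random part of the initialization matters.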
Sidenote: Vanishing/Exploding Gradient Problems

[Figure: a small fully-connected network with inputs $x_1, x_2$, first-hidden-layer activations $a_1^{(1)}, a_2^{(1)}, a_3^{(1)}$, second-layer activations $a_1^{(2)}, a_2^{(2)}$, output $o$, and loss $L(y, o) = l$.]

Now, imagine we have many layers and sigmoid activations ...

$$\frac{\partial l}{\partial w_{1,1}^{(1)}}
= \frac{\partial l}{\partial o} \cdot \frac{\partial o}{\partial a_1^{(2)}} \cdot \frac{\partial a_1^{(2)}}{\partial a_1^{(1)}} \cdot \frac{\partial a_1^{(1)}}{\partial w_{1,1}^{(1)}}
+ \frac{\partial l}{\partial o} \cdot \frac{\partial o}{\partial a_2^{(2)}} \cdot \frac{\partial a_2^{(2)}}{\partial a_1^{(1)}} \cdot \frac{\partial a_1^{(1)}}{\partial w_{1,1}^{(1)}}$$
Sidenote: Vanishing/Exploding Gradient Problems

Now, imagine we have many layers and logistic sigmoid activations ... Each hidden-layer factor in the chain-rule expression above then contains the sigmoid derivative

$$\sigma'(z^{[i]}) = \sigma(z^{[i]}) \cdot \left(1 - \sigma(z^{[i]})\right)$$
Sidenote: Vanishing/Exploding Gradient Problems

$$\sigma(z) = \frac{1}{1+e^{-z}}, \qquad \frac{d}{dz}\,\sigma(z) = \frac{e^{-z}}{(1+e^{-z})^2} = \sigma(z)\,(1-\sigma(z))$$
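These derivative factors can be checked numerically with a short plain-Python sketch: the logistic sigmoid's derivative never exceeds 0.25, so a product of many such factors shrinks geometrically (the 10-layer figure below is illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    # Derivative of the logistic sigmoid: sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)

# The derivative peaks at z = 0, where it equals exactly 0.25;
# far from 0 the unit saturates and the derivative is near zero.
peak = sigmoid_prime(0.0)

# Backprop through 10 such layers multiplies at most 0.25 per layer:
upper_bound = peak ** 10   # < 1e-6, i.e., the gradient all but vanishes
```

For saturated units the situation is even worse, since each factor is then much smaller than 0.25.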
Weight Initialization

• Traditionally, we can initialize weights by sampling from a random uniform distribution in range [0, 1], or better, [-0.5, 0.5]
• Or, we could sample from a Gaussian distribution with mean 0 and small variance (e.g., 0.1 or 0.01)
• When would you choose which?

Tip (from an earlier slide):

Sidenote: You can initialize the bias units to all zeros
Custom Weight Initialization in PyTorch

class MLP(torch.nn.Module):

    def __init__(self, num_features, num_hidden, num_classes):
        super(MLP, self).__init__()
        self.num_classes = num_classes

        ### 1st hidden layer
        self.linear_1 = torch.nn.Linear(num_features, num_hidden)
        self.linear_1.weight.detach().normal_(0.0, 0.1)
        self.linear_1.bias.detach().zero_()

        ### Output layer
        self.linear_out = torch.nn.Linear(num_hidden, num_classes)
        self.linear_out.weight.detach().normal_(0.0, 0.1)
        self.linear_out.bias.detach().zero_()

    def forward(self, x):
        out = self.linear_1(x)
        out = torch.sigmoid(out)
        logits = self.linear_out(out)
        probas = torch.sigmoid(logits)
        return logits, probas

Weight Initialization -- Xavier Initialization
Glorot, Xavier, and Yoshua Bengio. "Understanding the difficulty of training deep feedforward neural networks."
Proceedings of the thirteenth international conference on artificial intelligence and statistics. 2010.

• TanH is a bit more robust regarding vanishing gradients (compared to the logistic sigmoid)
• It still has the problem of saturation (near-zero gradients if inputs are very large positive or negative values)
• Xavier initialization is a small improvement for initializing weights for tanH

Weight Initialization -- Xavier Initialization
Glorot, Xavier, and Yoshua Bengio. "Understanding the difficulty of training deep feedforward neural networks."
Proceedings of the thirteenth international conference on artificial intelligence and statistics. 2010.

Method:

Step 1: Initialize weights from Gaussian or uniform distribution with (previous slide)

Step 2: Scale the weights proportional to the number of inputs to the layer

(For the first hidden layer, that is the number of features in the dataset;
for the second hidden layer, that is the number of units in the 1st hidden layer
etc.)

Weight Initialization -- Xavier Initialization
Glorot, Xavier, and Yoshua Bengio. "Understanding the difficulty of training deep feedforward neural networks."
Proceedings of the thirteenth international conference on artificial intelligence and statistics. 2010.

Method:

Scale the weights proportional to the number of inputs to the layer

In particular, scale as follows:

$$W^{(l)} := W^{(l)} \cdot \sqrt{\frac{1}{m^{(l-1)}}}$$

where $m^{(l-1)}$ is the number of input units to the layer.

e.g., $W_{i,j}^{(l)} \sim \mathcal{N}(\mu = 0, \sigma = 0.01)$
(or uniform distr. in a fixed interval, as in the original paper)
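Why divide by the fan-in? For zero-mean, independent weights and inputs, $\mathrm{Var}(z) = m \cdot \mathrm{Var}(w) \cdot \mathrm{Var}(x)$, so scaling the weight standard deviation by $\sqrt{1/m}$ keeps the net input's variance on the order of the input variance. A quick plain-Python Monte Carlo check (illustrative values, fixed seed):

```python
import random

random.seed(0)
m = 256                       # fan-in: number of inputs to the layer
scale = (1.0 / m) ** 0.5      # Xavier-style standard deviation sqrt(1/m)

# Estimate Var(z) for the net input z = sum_i w_i * x_i over many draws.
zs = []
for _ in range(2000):
    w = [random.gauss(0.0, scale) for _ in range(m)]
    x = [random.gauss(0.0, 1.0) for _ in range(m)]
    zs.append(sum(wi * xi for wi, xi in zip(w, x)))

z_mean = sum(zs) / len(zs)
z_var = sum((z - z_mean) ** 2 for z in zs) / len(zs)  # stays near Var(x) = 1
```

Without the $\sqrt{1/m}$ factor, the net input variance would grow linearly with the fan-in, pushing saturating activations such as tanH into their flat regions.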

Xavier Initialization in PyTorch

Semi-Automatic:
...
self.linear = torch.nn.Linear(...)
torch.nn.init.xavier_uniform_(self.linear.weight)
...

More conveniently for all weights in e.g., fully-connected layers:

...
def weights_init(m):
    if isinstance(m, nn.Linear):
        torch.nn.init.xavier_uniform_(m.weight)
        torch.nn.init.zeros_(m.bias)  # xavier_uniform_ requires >= 2 dims; biases are 1-D

model.apply(weights_init)
...

Weight Initialization -- Xavier Initialization

$$W^{(l)} := W^{(l)} \cdot \sqrt{\frac{1}{m^{[l-1]}}}$$

Again, some DL jargon: this is sometimes called "fan in" (= number of inputs to a layer)

Weight Initialization -- Xavier Initialization
Glorot, Xavier, and Yoshua Bengio. "Understanding the difficulty of training deep feedforward neural networks."
Proceedings of the thirteenth international conference on artificial intelligence and statistics. 2010.

[Figure 7 from the paper: back-propagated gradients, normalized histograms with hyperbolic tangent activation, without (top) vs. with (bottom) Xavier weight initialization. Top: the 0-peak decreases for higher layers.]

What was initially really surprising is that even when the back-propagated gradients become smaller (standard initialization), the variance of the weight gradients is roughly constant across layers, as shown on Figure 8.
Weight Initialization -- He Initialization
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Delving deep into rectifiers: Surpassing human-level
performance on imagenet classification." In Proceedings of the IEEE international conference on computer vision, pp.
1026-1034. 2015.

• Assuming activations with mean 0, which is reasonable, Xavier initialization assumes a derivative of 1 for the activation function (which is reasonable for tanH)
• For ReLU, this is different, as the activations are not centered at zero anymore
• He initialization takes this into account (to see that worked out in math, see the paper)
• The result is that we add a scaling factor of $2^{0.5}$:

$$W^{(l)} := W^{(l)} \cdot \sqrt{\frac{2}{m^{[l-1]}}}$$

Weight Initialization -- He Initialization

For Leaky ReLU with negative slope $\alpha$:

$$W^{(l)} := W^{(l)} \cdot \sqrt{\frac{2}{(1+\alpha^2) \cdot m^{[l-1]}}}$$
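The two scaling factors can be written as small helper functions (the names here are illustrative); note that setting $\alpha = 0$ recovers the plain ReLU (He) factor:

```python
import math

def he_scale(fan_in):
    # sqrt(2 / m): He scaling factor for ReLU activations
    return math.sqrt(2.0 / fan_in)

def leaky_he_scale(fan_in, alpha):
    # sqrt(2 / ((1 + alpha^2) * m)) for Leaky ReLU with negative slope alpha
    return math.sqrt(2.0 / ((1.0 + alpha ** 2) * fan_in))
```

A larger negative slope lets more variance through the activation, so the corresponding factor is slightly smaller than the plain ReLU one.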

PyTorch Default Weights
PyTorch uses the following scheme by default, which is
somewhat similar to Xavier initialization, and works ok
in practice most of the time

https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/linear.py#L148

PyTorch Default Weights

However, note that different layers have different defaults

https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/conv.py#L55

Note that if BatchNorm is used, the initial feature weight choice is less important anyway

Hands-on Fashion-MNIST

Exercise in pairs:
● Output: A report of ~3 pages describing a solution to the Self sorting wardrobe problem using the Fashion-MNIST dataset, to be delivered in 3 weeks.
○ Describe the problem, analysis, and results obtained following the Data Science process cycle (see file hands-on Excercise.pptx, slides 4-5)
● Read/run the suggested tutorial using Google Colab
○ https://www.tensorflow.org/tutorials/keras/classification?hl=pt-br
○ Try other neural network architectures
● Read/run the tutorial Self sorting wardrobe using the Peltarion platform
○ https://peltarion.com/knowledge-center/documentation/tutorials/self-sorting-wardrobe
○ Optional
