Iliya Valchanov
Neural Networks
Overview
Table of Contents:
Abstract
1. The Layer
2. What is a Deep Net?
3. Why Do We Need Non-Linearities to Stack Layers?
4. Activation Functions
4.1 Common Activation Functions
4.2 Softmax Activation
5. Backpropagation
5.1 Backpropagation Formula
Abstract
In these course notes, we cover the advanced machine learning algorithm known as deep learning, which is capable of creating highly accurate predictive models without being given any explicit instructions. It accomplishes this by working with large amounts of unstructured data and mimics the human learning process by building a hierarchy of complex abstractions.
Keywords: deep net, backpropagation formula, Softmax activation, layers
1. The Layer
The layer is the building block of neural networks. An initial linear combination and an added non-linearity form a simple neural network.

[Figure: the minimal example. The inputs x1 and x2 enter the input layer, pass through a linear combination xw + b and a non-linearity ∫, and produce the output y at the output layer.]

In the minimal example, we trained a neural network which had no depth: there were solely an input layer and an output layer, and the output was simply a linear combination of the input. Neural networks step on linear combinations but add a non-linearity to each one of them. Mixing linear combinations and non-linearities allows us to model arbitrary functions.
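A minimal sketch of this computation in NumPy (the input values, weights, bias, and the choice of sigmoid as the non-linearity are illustrative assumptions):

```python
import numpy as np

def sigmoid(a):
    # A common non-linearity (activation function); see section 4.
    return 1 / (1 + np.exp(-a))

x = np.array([0.5, -1.0])   # input layer: x1, x2
w = np.array([0.8, 0.3])    # weights (illustrative values)
b = 0.1                     # bias (illustrative value)

a = x @ w + b               # linear combination: xw + b
y = sigmoid(a)              # non-linearity applied on top
print(y)                    # the output of the simple neural network
```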
2. What is a Deep Net?
This is a deep neural network (deep net) with 5 layers.

How to read this diagram: a circle is a unit (a neuron), a column of units is a layer, and the arrows represent mathematical transformations.

[Figure: a deep net with an input layer, three hidden layers (hidden layer 1, hidden layer 2, hidden layer 3), and an output layer.]
The width of a layer is the number of units in that layer.
The width of the net is the number of units in its biggest layer.
The depth of the net is equal to the number of layers or to the number of hidden layers. The term has different definitions; more often than not, we are interested in the number of hidden layers (as there are always an input layer and an output layer).
The width and the depth of the net are called hyperparameters. They are values we manually choose when creating the net.
[Figure: the same deep net, annotated with its width (the number of units in the biggest layer) and its depth (the number of hidden layers: hidden layer 1, hidden layer 2, hidden layer 3).]
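A minimal sketch of a deep net defined by these hyperparameters, in plain NumPy (the layer sizes, the random initialization, and the ReLU activation are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hyperparameters we choose manually: 8 input units, three hidden
# layers of 9 units each (so the width of the net is 9 and the depth
# is 3 hidden layers), and 4 output units.
layer_sizes = [8, 9, 9, 9, 4]

# One weight matrix and one bias vector per pair of consecutive layers.
weights = [rng.normal(size=(m, n)) for m, n in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    # Push the input through every layer:
    # a linear combination followed by a non-linearity.
    for w, b in zip(weights, biases):
        x = np.maximum(0, x @ w + b)   # ReLU non-linearity
    return x

print(forward(rng.normal(size=8)))
```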
3. Why Do We Need Non-Linearities to Stack Layers?
Here you can see a net with no non-linearities: just linear combinations.

[Figure: an input layer, one hidden layer, and an output layer, connected by two consecutive linear transformations.]

h = x w1
y = h w2 = x w1 w2 = x w, where w = w1 w2

If x is a 1×8 input, w1 is 8×9, and w2 is 9×4, then w = w1 w2 is a single 8×4 matrix and y is 1×4. Two consecutive linear transformations are equivalent to a single one, so stacking layers without non-linearities adds nothing: the whole net collapses into one linear transformation.
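A quick numerical check of this equivalence in NumPy (the shapes match the diagram; the random values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=(1, 8))    # input: 1x8
w1 = rng.normal(size=(8, 9))   # first linear transformation: 8x9
w2 = rng.normal(size=(9, 4))   # second linear transformation: 9x4

# Two consecutive linear transformations...
y_stacked = (x @ w1) @ w2

# ...are equivalent to a single one, with w = w1 w2 (an 8x4 matrix).
w = w1 @ w2
y_single = x @ w

print(np.allclose(y_stacked, y_single))   # True
```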
4. Activation Functions
[Figure: the inputs x1 and x2 pass through a linear combination xw + b and then through an activation function ∫.]
In the respective lesson, we gave an example with a temperature change. The temperature starts decreasing (a numerical change). Our brain acts as a kind of 'activation function': it tells us whether it is cold enough for us to put on a jacket. Putting on a jacket is a binary action: 0 (no jacket) or 1 (jacket). This is a very intuitive and visual (yet not so practical) example of how activation functions work.

Activation functions (non-linearities) are needed so we can break the linearity and represent more complicated relationships. Moreover, activation functions are required in order to stack layers. They transform inputs into outputs of a different kind.
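In code, the jacket decision is a step-like function (a toy sketch; the threshold of a 5-degree drop is an illustrative assumption):

```python
def put_on_jacket(temperature_change):
    # Binary action: 1 (jacket) if the temperature dropped enough,
    # 0 (no jacket) otherwise. The brain acts as a step-like
    # 'activation function' over the numerical input.
    return 1 if temperature_change <= -5 else 0

print(put_on_jacket(-8))   # 1: cold enough, put on a jacket
print(put_on_jacket(2))    # 0: no jacket needed
```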
4.1 Common Activation Functions
Name: formula; derivative; range
sigmoid (logistic function): σ(a) = 1 / (1 + e^(−a)); derivative σ(a)(1 − σ(a)); range (0, 1)
tanh (hyperbolic tangent): tanh(a) = (e^a − e^(−a)) / (e^a + e^(−a)); derivative 1 − tanh²(a); range (−1, 1)
ReLU (rectified linear unit): relu(a) = max(0, a); derivative 0 for a < 0 and 1 for a > 0; range [0, ∞)
softmax: σ(a)_i = e^(a_i) / Σ_j e^(a_j); range (0, 1)
All common activation functions are monotonic, continuous, and differentiable: these are important properties needed for optimization.
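A minimal sketch of these functions and their derivatives in NumPy, matching the formulas above (NumPy's built-in np.tanh covers the hyperbolic tangent itself):

```python
import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))        # range (0, 1)

def sigmoid_prime(a):
    s = sigmoid(a)
    return s * (1 - s)                 # derivative of the sigmoid

def tanh_prime(a):
    return 1 - np.tanh(a) ** 2         # derivative of tanh

def relu(a):
    return np.maximum(0, a)            # range [0, inf)

def relu_prime(a):
    return (a > 0).astype(float)       # 0 for a < 0, 1 for a > 0

a = np.linspace(-3, 3, 7)
print(sigmoid(a))
print(np.tanh(a))
print(relu(a))
```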
4.2 Softmax Activation
[Figure: the hidden layer output h is combined linearly into a = hw + b, and the softmax activation is applied to a at the output layer.]
The softmax activation transforms a bunch of arbitrarily large or small numbers into a
valid probability distribution.
While other activation functions get an input value and transform it, regardless of
the other elements, the softmax considers the information about the whole set of
numbers we have.
The values that softmax outputs are in the range from 0 to 1 and their sum is exactly 1
(like probabilities).
Example: softmax([2, 1, 0]) ≈ [0.665, 0.245, 0.090], a valid probability distribution: each value lies between 0 and 1, and the values sum to 1.
The property of the softmax to output probabilities is so useful and intuitive that it is
often used as the activation function for the final (output) layer.
However, when the softmax is used prior to that (as the activation of a hidden layer),
the results are not as satisfactory. That’s because a lot of the information about the
variability of the data is lost.
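A minimal sketch of the softmax in NumPy (the input values are illustrative; subtracting the maximum before exponentiating is a standard numerical-stability trick that does not change the result):

```python
import numpy as np

def softmax(a):
    # Shift by the max so the exponentials cannot overflow.
    e = np.exp(a - np.max(a))
    return e / e.sum()

a = np.array([2.0, 1.0, 0.0])   # arbitrarily large or small numbers
p = softmax(a)
print(p)          # ~ [0.665, 0.245, 0.090]
print(p.sum())    # sums to 1 (up to floating point)
```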
5. Backpropagation
Forward propagation is the process of pushing inputs through the net. At the end of each epoch, the obtained outputs are compared to the targets to form the errors.

Backpropagation of errors is an algorithm for training neural networks with gradient descent. It consists of calculating the contribution of each parameter to the errors. We backpropagate the errors through the net and update the parameters (weights and biases) accordingly.
5.1 Backpropagation Formula
∂L/∂w_ij = x_i δ_j, where δ_j = σ'(a_j) Σ_k δ_k w_jk

Here, L is the loss, x_i is the i-th input to the layer, a_j is the linear combination at unit j, σ' is the derivative of the activation function, and the sum runs over the units k of the following layer, whose errors δ_k have already been backpropagated. For the output layer, δ_j is simply the derivative of the loss with respect to the linear combination at unit j.
If you want to examine the full derivation, please make use of the PDF we made available in the section "Backpropagation: A Peek into the Mathematics of Optimization".
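A minimal sketch of one backpropagation step for a single sigmoid layer in NumPy (the data, the squared-error loss, and the learning rate are illustrative assumptions, not the course's exact setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

# Illustrative data: 5 samples with 3 inputs and 2 targets each.
x = rng.normal(size=(5, 3))
t = rng.uniform(size=(5, 2))

w = rng.normal(size=(3, 2))   # weights
b = np.zeros(2)               # biases
eta = 0.1                     # learning rate

# Forward propagation: push the inputs through the layer.
a = x @ w + b
y = sigmoid(a)

# Output errors for the squared-error loss L = 1/2 * sum((y - t)^2):
# delta_j = sigma'(a_j) * (y_j - t_j), with sigma'(a) = y * (1 - y).
delta = (y - t) * y * (1 - y)

# Contribution of each parameter to the errors: dL/dw_ij = x_i * delta_j.
grad_w = x.T @ delta
grad_b = delta.sum(axis=0)

# Update the parameters accordingly (gradient descent step).
w -= eta * grad_w
b -= eta * grad_b
```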
Copyright 2022 365 Data Science Ltd. Reproduction is forbidden unless authorized. All rights reserved.
Learn DATA SCIENCE anytime, anywhere, at your own pace.
If you found this resource useful, check out our e-learning program. We have
everything you need to succeed in data science.
Learn the most sought-after data science skills from the best experts in the field!
Earn a verifiable certificate of achievement trusted by employers worldwide and future-proof your career.
Comprehensive training, exams, certificates.
162 hours of video · 599+ Exercises · Downloadables · Exams & Certification · Personalized support · Resume Builder & Feedback · Portfolio advice · New content · Career tracks
Join a global community of 1.8M successful students with an annual subscription
at 60% OFF with coupon code 365RESOURCES.
$172.80/year (regular price $432)
Iliya Valchanov
Email: [email protected]