CS-671: Deep Learning and its Applications
Lecture: 06
Computational Graphs
Aditya Nigam, Assistant Professor
School of Computing and Electrical Engineering (SCEE)
Indian Institute of Technology, Mandi
http://faculty.iitmandi.ac.in/~aditya/
[email protected] Presentation for CS-671@IIT Mandi (6 March, 2019)
(Slides credit: Calculus on Computational Graphs: Backpropagation, colah's blog)
http://colah.github.io/posts/2015-08-Backprop/
February - May, 2019
Calculus on Computational Graphs: Backpropagation
Introduction
Backpropagation is the key algorithm that makes training deep
models computationally tractable.
Beyond its use in deep learning, backpropagation is a powerful
computational tool in many other areas, ranging from weather
forecasting to analyzing numerical stability; it just goes by different
names.
The general, application-independent name is reverse-mode
differentiation.
Fundamentally, it's a technique for calculating derivatives quickly.
Computational Graphs
Computational graphs are a nice way to think about mathematical
expressions.
For example, consider the expression e = (a + b)(b + 1). There are three
operations: two additions and one multiplication.
To create a computational graph, we make each of these operations,
along with the input variables, into nodes.
When one node's value is the input to another node, an arrow goes
from one to another.
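As a minimal sketch, the same graph can be evaluated node by node in code.
The values a = 2, b = 1 below are the ones used in the source blog post's
example; the variable names simply mirror the node names.

    # Evaluate the computational graph for e = (a + b) * (b + 1), node by node.
    a, b = 2.0, 1.0   # input nodes (example values from the source blog post)
    c = a + b         # addition node
    d = b + 1         # addition node
    e = c * d         # multiplication node
    print(c, d, e)    # 3.0 2.0 6.0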
Derivatives on Computational Graphs
The key is to understand derivatives on the edges.
If a directly affects c, then we want to know how it affects c.
If a changes a little bit, how does c change? We call this the partial
derivative of c with respect to a.
To evaluate the partial derivatives in this graph, we need the sum rule and
the product rule:
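Following the source blog post, and writing c = a + b and d = b + 1 for the
two addition nodes, the two rules are

    ∂(a + b)/∂a = ∂a/∂a + ∂b/∂a = 1
    ∂(uv)/∂u = u ∂v/∂u + v ∂u/∂u = v

Applied edge by edge, they label the graph with ∂c/∂a = 1, ∂c/∂b = 1,
∂d/∂b = 1, ∂e/∂c = d and ∂e/∂d = c.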
What if we want to understand how nodes that aren't directly
connected affect each other?
Let's consider how e is affected by a. If we change a at a speed of 1, c
also changes at a speed of 1.
In turn, c changing at a speed of 1 causes e to change at a speed of
2. So e changes at a rate of 1 × 2 with respect to a.
The general rule is to sum over all possible paths from one node to the
other, multiplying the derivatives on each edge of the path together.
For example, to get the derivative of e with respect to b we get:
∂e/∂b = 1 × 2 + 1 × 3 (1)
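A short sketch of this sum-over-paths rule on the example graph, with edge
derivatives taken at the example values a = 2, b = 1 (so ∂e/∂c = 2 and
∂e/∂d = 3):

    # Edge derivatives of the example graph at a = 2, b = 1.
    edge = {('a', 'c'): 1.0, ('b', 'c'): 1.0, ('b', 'd'): 1.0,
            ('c', 'e'): 2.0, ('d', 'e'): 3.0}
    # The two paths from b to e.
    paths = [('b', 'c', 'e'), ('b', 'd', 'e')]

    de_db = 0.0
    for path in paths:
        prod = 1.0
        for u, v in zip(path, path[1:]):  # multiply derivatives along the path
            prod *= edge[(u, v)]
        de_db += prod                     # sum over all paths
    print(de_db)                          # 1*2 + 1*3 = 5.0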
Factoring Paths
The problem with just summing over the paths is that it's very easy to
get a combinatorial explosion in the number of possible paths.
If we want to get the derivative of Z with respect to X by summing over
all paths, we need to sum over 3 × 3 = 9 paths:
Instead of just naively summing over the paths, it would be much
better to factor them:
∂Z/∂X = (α + β + γ)(δ + ε + ζ) (2)
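A small numerical illustration of the saving (the edge-derivative values
below are made up purely for illustration): the naive sum forms 9 products,
while the factored form needs only two sums and one product.

    # alpha, beta, gamma are the three X->Y edge derivatives;
    # delta, epsilon, zeta the three Y->Z ones.
    alpha, beta, gamma = 0.1, 0.2, 0.3
    delta, epsilon, zeta = 1.0, 2.0, 3.0

    naive = sum(p * q for p in (alpha, beta, gamma)
                      for q in (delta, epsilon, zeta))            # 9 products
    factored = (alpha + beta + gamma) * (delta + epsilon + zeta)  # 1 product
    print(naive, factored)  # both = 3.6 (up to floating-point rounding)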
This is where forward-mode differentiation and reverse-mode
differentiation come in.
Forward-mode differentiation
Instead of summing over all of the paths explicitly, forward-mode and
reverse-mode differentiation compute the same sum more efficiently by
merging paths back together at every node.
In fact, both algorithms touch each edge exactly once!
Forward-mode differentiation starts at an input to the graph and moves
towards the end. At every node, it sums all the paths feeding in.
Each of those paths represents one way in which the input affects
that node.
By adding them up, we get the total way in which the node is
affected by the input: its derivative.
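A minimal sketch of forward-mode differentiation on the running example,
propagating ∂(node)/∂b from the input b towards the output (values a = 2,
b = 1 assumed as before):

    a, b = 2.0, 1.0
    c, d = a + b, b + 1
    e = c * d

    # Seed: the derivative of the chosen input with respect to itself.
    da_db, db_db = 0.0, 1.0
    # Each node sums the contributions flowing in along its incoming edges.
    dc_db = 1.0 * da_db + 1.0 * db_db   # c = a + b
    dd_db = 1.0 * db_db                 # d = b + 1
    de_db = d * dc_db + c * dd_db       # e = c * d (product rule)
    print(dc_db, dd_db, de_db)          # 1.0 1.0 5.0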
Reverse-mode differentiation, on the other hand, starts at an output of
the graph and moves towards the beginning. At each node, it merges all
paths which originated at that node.
Forward-mode differentiation tracks how one input affects every node.
Reverse-mode differentiation tracks how every node affects one
output.
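A matching sketch of reverse-mode differentiation on the same graph,
propagating ∂e/∂(node) from the output e back towards the inputs (same
assumed values):

    a, b = 2.0, 1.0
    c, d = a + b, b + 1
    e = c * d

    # Seed: the derivative of the output with respect to itself.
    de_de = 1.0
    # Walk from the output towards the inputs, merging paths at each node.
    de_dc = d * de_de                   # e = c * d
    de_dd = c * de_de
    de_db = 1.0 * de_dc + 1.0 * de_dd   # b feeds both c and d
    de_da = 1.0 * de_dc                 # a feeds only c
    print(de_da, de_db)                 # 2.0 5.0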
Forward-mode differentiation from the input b gives us the derivative of
every node with respect to b; in particular, it gives us ∂e/∂b, the
derivative of our output with respect to one of our inputs.
Reverse-mode differentiation from the output e gives us the derivative of
e with respect to every node.
Forward-mode differentiation gave us the derivative of our output
with respect to a single input, but reverse-mode differentiation gives
us all of them.
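Concretely, on the running example (taking a = 2, b = 1 as before), a
single reverse pass from e yields ∂e/∂e = 1, ∂e/∂d = 3, ∂e/∂c = 2,
∂e/∂b = 5 and ∂e/∂a = 2, whereas a single forward pass from b yields the
derivative of every node with respect to b but only one output-input
derivative, ∂e/∂b = 5.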
Computational Victories
Reverse-mode differentiation looks like a strange way of doing the
same thing as forward-mode differentiation.
Is there some advantage?
Forward-mode differentiation tracks how one input affects every node.
Reverse-mode differentiation tracks how every node affects one
output.