Lecture 1: About Deep Learning
State of the Art of AI with Deep Learning
Geoffrey Hinton is known by many to be the godfather of deep learning. Aside from his seminal 1986 paper on backpropagation, Hinton has invented several foundational deep learning techniques throughout his decades-long career. Hinton currently splits his time between the University of Toronto and Google Brain.
AI: Emulates Human Intelligence
ML: Emulates Human Learning
DL: Emulates Networks of Neurons in the Human Brain
Popular deep learning languages and frameworks
1943, Warren McCulloch and Walter Pitts: Neuron
[Figure: a single artificial neuron. Inputs $x_1, x_2, x_3$ with weights $w_1, w_2, w_3$ and a bias $b$ feed a sum $\Sigma$ followed by an activation function $f$, producing the output $y$.]
A perceptron (a specific type of artificial neuron) is a single-layer linear model plus an activation function $f$. We can find weights ($w$) and a bias ($b$) that minimize a loss function using gradient descent.
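To make this concrete, here is a minimal sketch (not from the slides) of a single sigmoid neuron trained with gradient descent on a toy AND-gate dataset; the data, learning rate, and number of epochs are illustrative assumptions.

import numpy as np

# Toy dataset (assumed for illustration): the logical AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 0, 0, 1], dtype=float)

w = np.zeros(2)    # weights
b = 0.0            # bias
lr = 0.5           # learning rate (assumed)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(10000):
    z = X @ w + b                      # weighted sum plus bias
    y = sigmoid(z)                     # activation function f
    grad_z = (y - t) * y * (1 - y)     # d(squared loss)/dz for each sample
    w -= lr * X.T @ grad_z             # gradient descent step on the weights
    b -= lr * grad_z.sum()             # gradient descent step on the bias

print(np.round(sigmoid(X @ w + b), 2))   # outputs move toward [0, 0, 0, 1]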
1986, David Everett Rumelhart: Backpropagation
[Figure: a feedforward network with an input layer ($x_1, x_2, x_3, \dots, x_n$), hidden layers ($h_1^{(1)}, h_2^{(1)}, \dots, h_m^{(1)}, \dots$), and an output layer ($y_1, \dots, y_k$).]
A feedforward network trained with backpropagation uses a layered architecture in which information flows in one direction, from input to output; errors are then backpropagated to adjust the network's weights.
Single Neuron
inputs = [1, 2, 3]
weights = [0.2, 0.8, -0.5]
bias = 2

When modelling a single neuron, we have one bias (one bias value per neuron). The bias is an additional tunable value, but in contrast to the weights it is not associated with any input.
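Putting the numbers above together, the neuron's output is the sum of each input times its weight, plus the bias (no activation is applied in this minimal sketch):

inputs = [1, 2, 3]
weights = [0.2, 0.8, -0.5]
bias = 2

output = (inputs[0] * weights[0] +
          inputs[1] * weights[1] +
          inputs[2] * weights[2] + bias)
print(output)   # 0.2 + 1.6 - 1.5 + 2 = 2.3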
A Layer of Neurons
Neural networks typically have layers that consist of more than one neuron. Layers are nothing more than groups of neurons. Each neuron in a layer takes exactly the same input (the input given to the layer, which can be either the training data or the output from the previous layer), but has its own set of weights and its own bias, producing its own unique output. The layer's output is a set of these outputs, one per neuron. Let's say we have a scenario with 3 neurons in a layer and 4 inputs:
inputs = [1, 2, 3, 2.5]

weights1 = [0.2, 0.8, -0.5, 1]
weights2 = [0.5, -0.91, 0.26, -0.5]
weights3 = [-0.26, -0.27, 0.17, 0.87]

bias1 = 2
bias2 = 3
bias3 = 0.5

outputs = [
    # Neuron 1:
    inputs[0] * weights1[0] +
    inputs[1] * weights1[1] +
    inputs[2] * weights1[2] +
    inputs[3] * weights1[3] + bias1,
    # Neuron 2:
    inputs[0] * weights2[0] +
    inputs[1] * weights2[1] +
    inputs[2] * weights2[2] +
    inputs[3] * weights2[3] + bias2,
    # Neuron 3:
    inputs[0] * weights3[0] +
    inputs[1] * weights3[1] +
    inputs[2] * weights3[2] +
    inputs[3] * weights3[3] + bias3,
]
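The same layer can be computed more compactly with a dot product; a short sketch using NumPy (NumPy is an addition here, since the slide code is plain Python):

import numpy as np

inputs = [1, 2, 3, 2.5]
weights = [[0.2, 0.8, -0.5, 1],
           [0.5, -0.91, 0.26, -0.5],
           [-0.26, -0.27, 0.17, 0.87]]
biases = [2, 3, 0.5]

# One row of `weights` per neuron; np.dot produces all three weighted sums at once.
outputs = np.dot(weights, inputs) + biases
print(outputs)   # [4.8   1.21  2.385]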
https://nnfs.io/mxo
Neural Network Activation

$$a_1^{(1)} = \sigma\!\left(w_{1,1}\, a_1^{(0)} + w_{1,2}\, a_2^{(0)} + \dots + w_{1,n}\, a_n^{(0)} + b_1^{(0)}\right) = \sigma\!\left(\sum_{i=1}^{n} w_{1,i}\, a_i^{(0)} + b_1^{(0)}\right)$$

In matrix form:

$$\begin{bmatrix} a_1^{(1)} \\ a_2^{(1)} \\ \vdots \\ a_m^{(1)} \end{bmatrix} = \sigma\!\left(\begin{bmatrix} w_{1,1} & w_{1,2} & \cdots & w_{1,n} \\ w_{2,1} & w_{2,2} & \cdots & w_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ w_{m,1} & w_{m,2} & \cdots & w_{m,n} \end{bmatrix} \begin{bmatrix} a_1^{(0)} \\ a_2^{(0)} \\ \vdots \\ a_n^{(0)} \end{bmatrix} + \begin{bmatrix} b_1^{(0)} \\ b_2^{(0)} \\ \vdots \\ b_m^{(0)} \end{bmatrix}\right)$$

or, compactly,

$$a^{(1)} = \sigma\!\left(W^{(0)} a^{(0)} + b^{(0)}\right)$$

Number of inputs = $n$, number of neurons = $m$; number of weights = $n \times m$.
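A small sketch of this matrix form with $n = 4$ inputs and $m = 3$ neurons (NumPy and the choice of a sigmoid activation are assumptions for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n, m = 4, 3                     # inputs, neurons
a0 = np.random.randn(n)         # previous-layer activations a^(0)
W0 = np.random.randn(m, n)      # weight matrix W^(0): one row per neuron, n x m weights in total
b0 = np.random.randn(m)         # one bias per neuron, b^(0)

a1 = sigmoid(W0 @ a0 + b0)      # a^(1) = sigma(W^(0) a^(0) + b^(0))
print(a1.shape)                 # (3,)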
Multilayer Perceptron
Activation Functions
Sigmoid:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Tanh:
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

ReLU:
$$f(x) = \max(0, x)$$

Softmax (3-class example):
$$\sigma(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}, \qquad j = 1, 2, 3$$

[Plots of $\sigma(x)$, $\tanh(x)$, $f(x) = \max(0, x)$, and $\mathrm{softmax}(x_i)$.]
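The four activation functions can be written directly; a short NumPy sketch (NumPy is an assumption, not shown on the slide):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    e = np.exp(x - np.max(x))    # subtract the max for numerical stability
    return e / e.sum()

x = np.array([1.0, 2.0, 3.0])    # 3-class example
print(softmax(x))                # approximately [0.09 0.24 0.67], sums to 1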
Shape Recognition: Concept
Shape Recognition: Example
Digit Recognition: Concept
Layers break the problem into pieces
Backpropagation: chain rule
Conceptually:
Backpropagation: chain rule
$a^{(L-1)}$ is influenced by its own weight and bias, which means our tree actually extends up higher...
Computing the first derivative
How sensitive the cost $C_0$ is to small changes in the weight $w^{(L)}$, i.e., $\frac{\partial C_0}{\partial w^{(L)}}$.
The Constituent Derivatives
1. To compute each derivative, we use the relevant formula from the way we've defined our neural network:

$$z^{(L)} = w^{(L)} a^{(L-1)} + b^{(L)} \;\Longrightarrow\; \frac{\partial z^{(L)}}{\partial w^{(L)}} = a^{(L-1)}$$

$$a^{(L)} = \sigma(z^{(L)}) \;\Longrightarrow\; \frac{\partial a^{(L)}}{\partial z^{(L)}} = \sigma'(z^{(L)})$$

$$C_0 = (a^{(L)} - y)^2 \;\Longrightarrow\; \frac{\partial C_0}{\partial a^{(L)}} = 2(a^{(L)} - y)$$

2. By the chain rule,

$$\frac{\partial C_0}{\partial w^{(L)}} = \frac{\partial z^{(L)}}{\partial w^{(L)}} \, \frac{\partial a^{(L)}}{\partial z^{(L)}} \, \frac{\partial C_0}{\partial a^{(L)}}$$

3. Putting this together with our constituent derivatives:

$$\frac{\partial C_0}{\partial w^{(L)}} = a^{(L-1)} \, \sigma'(z^{(L)}) \, 2(a^{(L)} - y)$$
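A quick numerical check of this result (the values of $a^{(L-1)}$, $w^{(L)}$, $b^{(L)}$, and $y$ below are arbitrary assumptions), comparing the analytic gradient with a finite-difference estimate:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

a_prev, b, y = 0.6, -0.3, 1.0        # a^(L-1), b^(L), target y (assumed)
w = 1.5                              # w^(L) (assumed)

def cost(w):
    z = w * a_prev + b               # z^(L) = w^(L) a^(L-1) + b^(L)
    a = sigmoid(z)                   # a^(L) = sigma(z^(L))
    return (a - y) ** 2              # C0 = (a^(L) - y)^2

z = w * a_prev + b
a = sigmoid(z)
analytic = a_prev * sigmoid(z) * (1 - sigmoid(z)) * 2 * (a - y)   # a^(L-1) sigma'(z) 2(a - y)

eps = 1e-6
numeric = (cost(w + eps) - cost(w - eps)) / (2 * eps)             # central finite difference

print(analytic, numeric)             # the two values closely agree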
Ex1: Chain rule of differentiation
Ex1: Compute the gradient of the cost with respect to the initial weight, $\frac{\partial c}{\partial w_4}$.
Ex2: Weight Initialization
1. Zero initialization
2. Constant initialization
3. Random initialization with very small values
4. Random initialization with very large values
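As a concrete sketch (not slide code; the scale factors 0.01 and 10 are illustrative assumptions), here is what the four strategies look like for a layer with n inputs and m neurons:

import numpy as np

n, m = 4, 3   # inputs, neurons in the layer

# 1. Zero initialization: every neuron computes the same output and receives the
#    same gradient, so the neurons never differentiate (symmetry problem).
W_zero = np.zeros((m, n))

# 2. Constant initialization: same symmetry problem as zeros.
W_const = np.full((m, n), 0.5)

# 3. Random initialization with very small values: breaks symmetry, but
#    activations and gradients can shrink toward zero in deep networks.
W_small = 0.01 * np.random.randn(m, n)

# 4. Random initialization with very large values: tends to saturate
#    sigmoid/tanh units, which makes gradients vanish.
W_large = 10 * np.random.randn(m, n)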
Functional Gradients
Taking the Gradient – Review
$$f(x) = (-x + 3)^2$$

Decompose: $f = q^2$, $\; q = r + 3$, $\; r = -x$, so that

$$\frac{\partial f}{\partial q} = 2q, \qquad \frac{\partial q}{\partial r} = 1, \qquad \frac{\partial r}{\partial x} = -1$$

Chain rule:

$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial q}\,\frac{\partial q}{\partial r}\,\frac{\partial r}{\partial x} = 2q \cdot 1 \cdot (-1) = -2(-x + 3) = 2x - 6$$
Let’s Do This Another Way
[Diagram: a function node $f$ receives an input $n$ from an upstream node $g$ and outputs $f(n)$; during the backward pass the gradient $\frac{\partial f}{\partial n}$ flows back from $f$ to $g$.]
Let’s Do This Another Way: Functional Diagrams
$$f(x) = (-x + 3)^2$$

[Computational graph: $x \rightarrow (-n) \rightarrow (n + 3) \rightarrow (n^2)$, with forward values $-x$, $-x + 3$, and $(-x + 3)^2$.]

Backward pass, starting at the output with gradient 1: the squaring node gives $\frac{d}{dn}\, n^2 = 2n = 2(-x + 3) = -2x + 6$, so the gradient passed back is $(-2x + 6) \cdot 1$.
Let’s Do This Another Way
Next, the $n + 3$ node: $\frac{d}{dn}(n + 3) = 1$, so the gradient passed back is $1 \cdot (-2x + 6) = -2x + 6$.
Let’s Do This Another Way
Then the $-n$ node: $\frac{d}{dn}(-n) = -1$, so the gradient passed back to $x$ is $-1 \cdot (-2x + 6) = 2x - 6$.
Let’s Do This Another Way
[Completed graph for $f(x) = (-x + 3)^2$: forward values $x$, $-x$, $-x + 3$, $(-x + 3)^2$; backward gradients $2x - 6$ (at $x$), $-2x + 6$, $-2x + 6$, and $1$ (at the output).]
Functional Gradients: Gates
Once more, with numbers!
$$f(x, y, z) = (x + y)\,z$$

[Graph: the inputs $1$ and $4$ feed an add node ($n + m$), giving $5$; that result and $z = 10$ feed a multiply node ($n * m$), giving $50$.]
Backward through the multiply node: $\frac{\partial}{\partial n}(nm) = m \rightarrow 10 \cdot 1 = 10$ is the gradient passed back to the add node, and $\frac{\partial}{\partial m}(nm) = n \rightarrow 5 \cdot 1 = 5$ is the gradient passed back to $z$.
Backward through the add node: $\frac{\partial}{\partial n}(n + m) = 1 \rightarrow 1 \cdot 10 \cdot 1 = 10$ and $\frac{\partial}{\partial m}(n + m) = 1 \rightarrow 1 \cdot 10 \cdot 1 = 10$; the add node passes the gradient $10$ unchanged to both of its inputs.
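A small sketch of the same example in plain Python, running the forward pass and then backpropagating through the two gates by hand (variable names are illustrative):

x, y, z = 1.0, 4.0, 10.0    # the input values used above

# Forward pass
q = x + y                   # add gate:      q = 5
f = q * z                   # multiply gate: f = 50

# Backward pass, starting with df/df = 1 at the output
df = 1.0
dq = z * df                 # multiply gate passes back the *other* input: 10
dz = q * df                 #                                               5
dx = 1.0 * dq               # add gate passes the gradient through unchanged: 10
dy = 1.0 * dq               #                                                  10
print(f, dx, dy, dz)        # 50.0 10.0 10.0 5.0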
Something More Complex
$$f(w, x) = \frac{1}{1 + e^{-(w_0 x_0 + w_1 x_1 + w_2)}}$$

[Graph: multiply nodes ($n * m$) compute $w_0 x_0$ and $w_1 x_1$; two add nodes ($n + m$) sum them with $w_2$; the result then passes through the gates $n \cdot (-1)$, $e^{n}$, $n + 1$, and $1/n$.]
Forward pass with $w_0 = 2$, $x_0 = -1$, $w_1 = -3$, $x_1 = -2$, $w_2 = -3$:

$$w_0 x_0 = -2, \quad w_1 x_1 = 6, \quad -2 + 6 = 4, \quad 4 + w_2 = 1, \quad 1 \cdot (-1) = -1, \quad e^{-1} = 0.37, \quad 0.37 + 1 = 1.37, \quad 1/1.37 = 0.73$$
The local derivative rules used in the backward pass:

(a) $\frac{\partial}{\partial n}(m + n) = 1$
(b) $\frac{\partial}{\partial n}(mn) = m$
(c) $\frac{\partial}{\partial n}(e^{n}) = e^{n}$
(d) $\frac{\partial}{\partial n}(n^{-1}) = -n^{-2}$
(e) $\frac{\partial}{\partial n}(an) = a$
(f) $\frac{\partial}{\partial n}(c + n) = 1$

Backward pass, gate by gate (the upstream gradient at the output is 1):

$1/n$ gate (rule d): $-(1.37)^{-2} \cdot 1 = -0.53$. Where does the 1.37 come from? It is the value that entered this gate during the forward pass.
$n + 1$ gate (rule f): $1 \cdot (-0.53) = -0.53$.
$e^{n}$ gate (rule c): $e^{-1} \cdot (-0.53) = -0.2$.
$n \cdot (-1)$ gate (rule e, with $a = -1$): $-1 \cdot (-0.2) = 0.2$.
Add gate with $w_2$ (rule a): $1 \cdot 0.2 = 0.2$, passed back both to $w_2$ and to the output of the first add gate.
First add gate (rule a): $1 \cdot 0.2 = 0.2$, passed back to both $w_0 x_0$ and $w_1 x_1$; the gradient at $w_2$ is $0.2$.
Multiply gate for $w_0 x_0$ (rule b): $\frac{\partial f}{\partial w_0} = x_0 \cdot 0.2 = -1 \cdot 0.2 = -0.2$ and $\frac{\partial f}{\partial x_0} = w_0 \cdot 0.2 = 2 \cdot 0.2 = 0.4$.
Multiply gate for $w_1 x_1$ (rule b): $\frac{\partial f}{\partial w_1} = x_1 \cdot 0.2 = -2 \cdot 0.2 = -0.4$ and $\frac{\partial f}{\partial x_1} = w_1 \cdot 0.2 = -3 \cdot 0.2 = -0.6$.
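The whole gate-by-gate computation can be checked with a short script; this is a sketch rather than slide code, using the same inputs ($w_0 = 2$, $x_0 = -1$, $w_1 = -3$, $x_1 = -2$, $w_2 = -3$):

import math

# Same inputs as the worked example above
w0, x0, w1, x1, w2 = 2.0, -1.0, -3.0, -2.0, -3.0

# Forward pass, one gate at a time
s   = w0 * x0 + w1 * x1 + w2   # two multiply gates and two add gates: 1.0
neg = -1.0 * s                 # *(-1) gate: -1.0
ex  = math.exp(neg)            # e^n gate:   ~0.37
den = ex + 1.0                 # +1 gate:    ~1.37
f   = 1.0 / den                # 1/n gate:   ~0.73

# Backward pass (upstream gradient at the output is 1)
dden = -1.0 / den ** 2         # 1/n gate:   ~ -0.53
dex  = 1.0 * dden              # +1 gate:    ~ -0.53
dneg = math.exp(neg) * dex     # e^n gate:   ~ -0.2
ds   = -1.0 * dneg             # *(-1) gate: ~ +0.2
dw2  = 1.0 * ds                # add gate passes the gradient through
dw0, dx0 = x0 * ds, w0 * ds    # multiply gate swaps inputs: ~ -0.2, ~ 0.4
dw1, dx1 = x1 * ds, w1 * ds    # ~ -0.4, ~ -0.6

print(f, dw0, dx0, dw1, dx1, dw2)   # ~0.73, -0.2, 0.4, -0.4, -0.6, 0.2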
Does It Have To Be So Painful?
$$f(w, x) = \frac{1}{1 + e^{-(w_0 x_0 + w_1 x_1 + w_2)}}$$

[Graph: as before, multiply and add nodes compute $w_0 x_0 + w_1 x_1 + w_2$; the tail of gates $n \cdot (-1) \rightarrow e^{n} \rightarrow n + 1 \rightarrow 1/n$ can be collapsed into a single sigmoid gate, $\sigma(n) = \frac{1}{1 + e^{-n}}$.]
Does It Have To Be So Painful?
$$\sigma(n) = \frac{1}{1 + e^{-n}}$$

$$\frac{\partial}{\partial n}\,\sigma(n) = \frac{e^{-n}}{(1 + e^{-n})^2} = \frac{1 + e^{-n} - 1}{1 + e^{-n}} \cdot \frac{1}{1 + e^{-n}} = \left(\frac{1 + e^{-n}}{1 + e^{-n}} - \frac{1}{1 + e^{-n}}\right)\sigma(n) = \big(1 - \sigma(n)\big)\,\sigma(n)$$
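A quick numerical sanity check of this shortcut (the test point n = 0.5 is arbitrary), comparing $(1 - \sigma(n))\,\sigma(n)$ against a finite-difference estimate:

import math

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

n = 0.5                                              # arbitrary test point
shortcut = (1.0 - sigmoid(n)) * sigmoid(n)           # (1 - sigma(n)) * sigma(n)

eps = 1e-6
numeric = (sigmoid(n + eps) - sigmoid(n - eps)) / (2 * eps)   # finite difference

print(shortcut, numeric)   # the two values agree to several decimal places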
Ex3: Compute the upstream and downstream gradients of the following functional graph.

Given $w_0 = 2$, $x_0 = 1$, $w_1 = -3$, $x_1 = 4$, $w_2 = -5$:

$$f(w, x) = \frac{1}{1 + e^{-(w_0 x_0 + w_1 x_1 + w_2)}}$$

[Graph: multiply nodes ($n * m$) compute $w_0 x_0$ and $w_1 x_1$; add nodes ($n + m$) sum them with $w_2$; the result passes through a single sigmoid node $\sigma(n)$.]

$$\sigma(n) = \frac{1}{1 + e^{-n}}, \qquad \frac{\partial \sigma(n)}{\partial n} = (1 - \sigma(n))\,\sigma(n)$$
Any Questions