
L15 Intro-RNN Slides

This document is a lecture overview for STAT 453: Introduction to Deep Learning and Generative Models, focusing on Recurrent Neural Networks (RNNs). It covers various topics including different ways to model text, sequence modeling tasks, backpropagation through time, Long Short-Term Memory (LSTM), and RNN classifiers in PyTorch. The lecture also discusses classic approaches for text classification and modern techniques like Transformers and self-supervised learning.


STAT 453: Introduction to Deep Learning and Generative Models

Sebastian Raschka
[Link]

Lecture 15
Introduction to Recurrent Neural Networks,
with Applications in Python
Lecture Overview

1. Different Ways to Model Text

2. Sequence Modeling with RNNs

3. Different Types of Sequence Modeling Tasks

4. Backpropagation Through Time

5. Long Short-Term Memory (LSTM)

6. Many-to-one Word RNNs

7. RNN Classifiers in PyTorch

There's more than one
way to bake a cake

1. Different Ways to Model Text

2. Sequence Modeling with RNNs

3. Different Types of Sequence Modeling Tasks

4. Backpropagation Through Time

5. Long Short-Term Memory (LSTM)

6. Many-to-one Word RNNs

7. RNN Classifiers in PyTorch

A Classic Approach for Text Classification:
Bag-of-Words Model

"Raw" training dataset:

x[1] = "The sun is shining"
x[2] = "The weather is sweet"
x[3] = "The sun is shining, the weather is sweet, and one and one is two"

vocabulary = {
    'and': 0,
    'is': 1,
    'one': 2,
    'shining': 3,
    'sun': 4,
    'sweet': 5,
    'the': 6,
    'two': 7,
    'weather': 8
}

Training set as design matrix (one row of word counts per example):

X = [[0 1 0 1 1 0 1 0 0]
     [0 1 0 0 0 1 1 0 1]
     [2 3 2 1 1 1 2 1 1]]

Training class labels:

y = [0, 1, 0]

The design matrix and class labels are then fed to a classifier (e.g., logistic regression, MLP, ...).

Raschka & Mirjalili. Python Machine Learning 3rd Ed.
[Link]book-3rd-edition/blob/master/ch08/[Link]

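The design matrix above can be reproduced in a few lines; this is a minimal sketch using scikit-learn's CountVectorizer (an illustration added here, not code from the slides):

from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "The sun is shining",
    "The weather is sweet",
    "The sun is shining, the weather is sweet, and one and one is two",
]

vectorizer = CountVectorizer()      # builds the vocabulary from the corpus
X = vectorizer.fit_transform(docs)  # sparse matrix of per-document word counts

print(vectorizer.vocabulary_)  # {'and': 0, 'is': 1, 'one': 2, 'shining': 3, ...}
print(X.toarray())             # the 3x9 design matrix shown above
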
1D CNNs for text (and other sequence data)

[Figure: a 1D convolutional filter sliding along the character sequence "The sun is shining ...", producing one feature value per window position.]
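
As a rough illustration of the idea, here is a minimal PyTorch sketch of a 1D convolution over character embeddings; the vocabulary size, embedding width, and filter settings are illustrative assumptions:

import torch
import torch.nn as nn

embed = nn.Embedding(num_embeddings=128, embedding_dim=16)  # 128 possible chars
conv = nn.Conv1d(in_channels=16, out_channels=32, kernel_size=5)

chars = torch.tensor([[ord(c) for c in "The sun is shining"]])  # (1, 18)
x = embed(chars).transpose(1, 2)  # (1, 16, 18): Conv1d expects channels first
features = conv(x)                # (1, 32, 14): each filter slides over the text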


Transformers & Self-Supervised Learning

Input sentence: A quick brown fox jumps over the lazy dog

15% randomly masked: A quick brown [MASK] jumps over the lazy dog

[Figure: BERT receives the masked sentence and outputs, at the masked position, a probability for every word in the vocabulary (the possible classes), e.g., 0.2% "ant", ..., 11% "fox", ..., 0.01% "zoo".]
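
The masked-word prediction above can be tried directly; a minimal sketch, assuming the Hugging Face transformers package and a pretrained bert-base-uncased checkpoint (neither is part of the lecture code):

from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
for pred in unmasker("A quick brown [MASK] jumps over the lazy dog"):
    # each prediction carries a probability over the whole vocabulary
    print(f"{pred['score']:.4f}  {pred['token_str']}")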


How Can We Modify MLPs to
Capture Sequence Information?

1. Different Ways to Model Text

2. Sequence Modeling with RNNs

3. Different Types of Sequence Modeling Tasks

4. Backpropagation Through Time

5. Long Short-Term Memory (LSTM)

6. Many-to-one Word RNNs

7. RNN Classifiers in PyTorch

Sequence data: order matters
The movie my friend has not seen is good
The movie my friend has seen is not good

[Figure: inputs x<1>, ..., x<6> and outputs y<1>, ..., y<6> laid out along the time axis.]

Image source: Sebastian Raschka, Vahid Mirjalili. Python Machine Learning. 3rd Edition. Packt, 2019


Applications:
Working with Sequential Data

• Text classification
• Speech recognition (acoustic modeling)
• Language translation
• DNA or (amino acid/protein) sequence modeling
• Stock market predictions
• ...

[Figure: the KEGRU model, which splits a DNA sequence into k-mers, embeds them with a pre-trained word2vec model, and feeds the embeddings to a GRU network to learn long-range dependencies and predict transcription factor binding sites. Shen, Zhen, Wenzheng Bao, and De-Shuang Huang. "Recurrent Neural Network for Predicting Transcription Factor Binding Sites." Scientific Reports 8, no. 1 (2018): 15270.]

[Figure: actual vs. predicted data from four models for each stock index in Year 1, 2010.10.01 to 2011.09.30. Bao, Wei, Jun Yue, and Yulei Rao. "A deep learning framework for financial time series using stacked autoencoders and long-short term memory." PLoS ONE 12, no. 7 (2017): e0180944.]

Overview

Networks we used previously (feedforward neural networks): the input x is mapped through a hidden layer h to the output y, with no notion of time.

Adding a recurrent edge to the hidden layer yields a Recurrent Neural Network (RNN): at time step t, the hidden state h<t> receives the input x<t> and its own previous value, and produces the output y<t>.

Image source: Sebastian Raschka, Vahid Mirjalili. Python Machine Learning. 3rd Edition. Packt, 2019


[Figure: a single-layer RNN (x<t>, h<t>, y<t>) and a multilayer RNN with two hidden layers, each drawn compactly with a recurrent edge and unfolded over time steps t-1, t, t+1.]
Overview

[Figure: the single-layer RNN and the multilayer RNN from the previous slide, unfolded over time steps t-1, t, t+1.]

Image source: Sebastian Raschka, Vahid Mirjalili. Python Machine Learning. 3rd Edition. Packt, 2019


Each hidden unit receives 2 inputs:
the data input x<t> and the activation of the same hidden layer from the previous time step, h<t-1>.

[Figure: the unfolded single-layer and multilayer RNNs, as on the previous slide, highlighting the two inputs to each hidden unit.]

Image source: Sebastian Raschka, Vahid Mirjalili. Python Machine Learning. 3rd Edition. Packt, 2019


The simple things are never simple
are they?
1. Different Ways to Model Text

2. Sequence Modeling with RNNs

3. Different Types of Sequence Modeling Tasks

4. Backpropagation Through Time

5. Long Short-Term Memory (LSTM)

6. Many-to-one Word RNNs

7. RNN Classifiers in PyTorch

Different Types of Sequence Modeling Tasks

[Figure: four input/output layouts: many-to-one, one-to-many, many-to-many (direct), and many-to-many (delayed).]

Image source: Sebastian Raschka, Vahid Mirjalili. Python Machine Learning. 3rd Edition. Packt, 2019
Figure based on:
The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy ([Link])

Different Types of Sequence Modeling Tasks

Many-to-one: the input data is a sequence, but the output is a fixed-size vector, not a sequence.

Ex.: sentiment analysis, where the input is some text and the output is a class label.

Different Types of Sequence Modeling Tasks

One-to-many: the input data is in a standard format (not a sequence), the output is a sequence.

Ex.: image captioning, where the input is an image and the output is a text description of that image.


Different Types of Sequence Modeling Tasks

Many-to-many: both inputs and outputs are sequences. Can be direct or delayed.

Ex.: video captioning, i.e., describing a sequence of images via text (direct);
translating one language into another (delayed).
What RNNs Look Like Under the Hood

1. Different Ways to Model Text

2. Sequence Modeling with RNNs

3. Different Types of Sequence Modeling Tasks

4. Backpropagation Through Time

5. Long Short-Term Memory (LSTM)

6. Many-to-one Word RNNs

7. RNN Classifiers in PyTorch

Weight matrices in a single-hidden layer RNN

[Figure: the unfolded RNN annotated with three weight matrices: W_hx (input-to-hidden), W_hh (hidden-to-hidden, the recurrent edge), and W_yh (hidden-to-output). The two hidden-layer matrices are often concatenated as W_h = [W_hh ; W_hx].]

Image source: Sebastian Raschka, Vahid Mirjalili. Python Machine Learning. 3rd Edition. Packt, 2019


Weight matrices in a single-hidden layer RNN

[Figure: the unfolded RNN with W_hx, W_hh, and W_yh, as on the previous slide.]

Net input:

$\mathbf{z}_h^{\langle t \rangle} = \mathbf{W}_{hx} \mathbf{x}^{\langle t \rangle} + \mathbf{W}_{hh} \mathbf{h}^{\langle t-1 \rangle} + \mathbf{b}_h$

Activation:

$\mathbf{h}^{\langle t \rangle} = \sigma_h\left(\mathbf{z}_h^{\langle t \rangle}\right)$

Image source: Sebastian Raschka, Vahid Mirjalili. Python Machine Learning. 3rd Edition. Packt, 2019


Weight matrices in a single-hidden layer RNN

[Figure: the unfolded RNN with W_hx, W_hh, and W_yh, as on the previous slides.]

Net input (hidden layer):

$\mathbf{z}_h^{\langle t \rangle} = \mathbf{W}_{hx} \mathbf{x}^{\langle t \rangle} + \mathbf{W}_{hh} \mathbf{h}^{\langle t-1 \rangle} + \mathbf{b}_h$

Activation:

$\mathbf{h}^{\langle t \rangle} = \sigma_h\left(\mathbf{z}_h^{\langle t \rangle}\right)$

Net input (output layer):

$\mathbf{z}_y^{\langle t \rangle} = \mathbf{W}_{yh} \mathbf{h}^{\langle t \rangle} + \mathbf{b}_y$

Output:

$\mathbf{y}^{\langle t \rangle} = \sigma_y\left(\mathbf{z}_y^{\langle t \rangle}\right)$

Image source: Sebastian Raschka, Vahid Mirjalili. Python Machine Learning. 3rd Edition. Packt, 2019
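
These four equations translate directly into code; a minimal PyTorch sketch of one time step, with layer sizes chosen arbitrarily for illustration:

import torch

n_in, n_hidden, n_out = 5, 3, 2          # illustrative sizes
W_hx = torch.randn(n_hidden, n_in)
W_hh = torch.randn(n_hidden, n_hidden)
W_yh = torch.randn(n_out, n_hidden)
b_h, b_y = torch.zeros(n_hidden), torch.zeros(n_out)

x_t = torch.randn(n_in)                  # input at time step t
h_prev = torch.zeros(n_hidden)           # hidden state from step t-1

z_h = W_hx @ x_t + W_hh @ h_prev + b_h   # net input of the hidden layer
h_t = torch.tanh(z_h)                    # h<t> = sigma_h(z_h<t>)
z_y = W_yh @ h_t + b_y                   # net input of the output layer
y_t = torch.sigmoid(z_y)                 # y<t> = sigma_y(z_y<t>)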


Backpropagation through time

The overall loss can be computed as the sum over all time steps:

$\mathcal{L} = \sum_{t=1}^{T} \mathcal{L}^{\langle t \rangle}$

[Figure: the unfolded RNN with one loss term L<t-1>, L<t>, L<t+1> attached to each output.]

Werbos, Paul J. "Backpropagation through time: what it does and how to do it." Proceedings of the IEEE 78, no. 10 (1990): 1550-1560.

Image source: Sebastian Raschka, Vahid Mirjalili. Python Machine Learning. 3rd Edition. Packt, 2019


Backpropagation through time

[Figure: the unfolded RNN with W_hx, W_hh, and W_yh, as before.]

Werbos, Paul J. "Backpropagation through time: what it does and how to do it." Proceedings of the IEEE 78, no. 10 (1990): 1550-1560.

The gradient of the loss at time step t with respect to the recurrent weights is

$\frac{\partial \mathcal{L}^{(t)}}{\partial \mathbf{W}_{hh}} = \frac{\partial \mathcal{L}^{(t)}}{\partial \mathbf{y}^{(t)}} \cdot \frac{\partial \mathbf{y}^{(t)}}{\partial \mathbf{h}^{(t)}} \cdot \left( \sum_{k=1}^{t} \frac{\partial \mathbf{h}^{(t)}}{\partial \mathbf{h}^{(k)}} \cdot \frac{\partial \mathbf{h}^{(k)}}{\partial \mathbf{W}_{hh}} \right)$


Backpropagation through time

[Figure and citation as on the previous slide.]

The factor $\frac{\partial \mathbf{h}^{(t)}}{\partial \mathbf{h}^{(k)}}$ is computed as a multiplication of adjacent time steps:

$\frac{\partial \mathbf{h}^{(t)}}{\partial \mathbf{h}^{(k)}} = \prod_{i=k+1}^{t} \frac{\partial \mathbf{h}^{(i)}}{\partial \mathbf{h}^{(i-1)}}$


Backpropagation through time

[Figure and citation as on the previous slides.]

$\frac{\partial \mathcal{L}^{(t)}}{\partial \mathbf{W}_{hh}} = \frac{\partial \mathcal{L}^{(t)}}{\partial \mathbf{y}^{(t)}} \cdot \frac{\partial \mathbf{y}^{(t)}}{\partial \mathbf{h}^{(t)}} \cdot \left( \sum_{k=1}^{t} \frac{\partial \mathbf{h}^{(t)}}{\partial \mathbf{h}^{(k)}} \cdot \frac{\partial \mathbf{h}^{(k)}}{\partial \mathbf{W}_{hh}} \right)$

with $\frac{\partial \mathbf{h}^{(t)}}{\partial \mathbf{h}^{(k)}} = \prod_{i=k+1}^{t} \frac{\partial \mathbf{h}^{(i)}}{\partial \mathbf{h}^{(i-1)}}$.

This is very problematic: the product over many adjacent-step Jacobians can shrink toward zero or blow up, the vanishing/exploding gradient problem!
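
A toy numerical sketch of why this product is problematic: repeatedly multiplying a gradient by the same recurrent Jacobian shrinks it toward zero or blows it up, depending on the weight scale (the matrices here are illustrative, not from the slides):

import torch

for scale in (0.5, 1.5):
    W_hh = scale * torch.eye(4)       # stand-in for the recurrent Jacobian
    grad = torch.ones(4)
    for _ in range(50):               # backpropagate through 50 time steps
        grad = W_hh.T @ grad
    print(scale, grad.norm().item())  # ~1e-15 (vanishing) vs ~1e+9 (exploding)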


Modeling Long Range Dependencies

1. Different Ways to Model Text

2. Sequence Modeling with RNNs

3. Different Types of Sequence Modeling Tasks

4. Backpropagation Through Time

5. Long Short-Term Memory (LSTM)

6. Many-to-one Word RNNs

7. RNN Classifiers in PyTorch

Solutions to the vanishing/exploding
gradient problems

1) Gradient clipping: set a maximum value for gradients if they grow too large (solves only the exploding gradient problem); see the sketch after this list.

2) Truncated backpropagation through time (TBPTT): simply limits the number of time steps the signal can backpropagate after each forward pass. E.g., even if the sequence has 100 elements/steps, we may only backpropagate through 20 or so.

3) Long short-term memory (LSTM): uses a memory cell for modeling long-range dependencies and avoiding vanishing gradient problems.

Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural Computation 9, no. 8 (1997): 1735-1780.
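
For solution 1), PyTorch ships a utility that rescales all gradients so their global norm stays below a threshold; a minimal sketch with a placeholder model and loss (the max norm of 5.0 is an arbitrary choice):

import torch

model = torch.nn.Linear(10, 1)                  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(8, 10)).pow(2).mean()  # placeholder loss
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
optimizer.step()
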
Long short-term memory (LSTM)

LSTM cell:

[Figure: the LSTM cell. The cell state flows from C<t-1> to C<t>; a forget gate f, input gate i, input node g, and output gate o (with sigma/tanh activations and weights W_fx, W_fh, b_f; W_ix, W_ih, b_i; W_gx, W_gh, b_g; W_ox, W_oh, b_o) combine the input x<t> and the previous hidden state h<t-1> into the new hidden state h<t>, which is passed to the next layer and to the next time step.]

Figure: Sebastian Raschka, Vahid Mirjalili. Python Machine Learning. 3rd Edition. Birmingham, UK: Packt Publishing, 2019


[Figure: the LSTM cell shown as a drop-in replacement for the simple hidden unit in the unfolded single-layer and multilayer RNNs from earlier slides.]


Long short-term memory (LSTM)

Cell state at previous time step: C<t-1>. Cell state at current time step: C<t>.

[Figure: the LSTM cell with the cell-state path highlighted.]
Long short-term memory (LSTM)

Activation from previous time step: h<t-1>. Activation for next time step: h<t>.

[Figure: the LSTM cell with the hidden-state path highlighted.]
Long short-term memory (LSTM)

The cell combines an element-wise multiplication operator, an element-wise addition operator, and logistic sigmoid activation functions.

[Figure: the LSTM cell with these operators highlighted.]
Long short-term memory (LSTM)

Gers, Felix A., Jürgen Schmidhuber, and Fred Cummins. "Learning to forget: Continual prediction with LSTM." (1999): 850-855.

"Forget gate": controls which information is remembered and which is forgotten; can reset the cell state.

$\mathbf{f}_t = \sigma\left(\mathbf{W}_{fx} \mathbf{x}^{\langle t \rangle} + \mathbf{W}_{fh} \mathbf{h}^{\langle t-1 \rangle} + \mathbf{b}_f\right)$

[Figure: the LSTM cell with the forget gate highlighted.]
Long short-term memory (LSTM)

"Input gate":

$\mathbf{i}_t = \sigma\left(\mathbf{W}_{ix} \mathbf{x}^{\langle t \rangle} + \mathbf{W}_{ih} \mathbf{h}^{\langle t-1 \rangle} + \mathbf{b}_i\right)$

"Input node":

$\mathbf{g}_t = \tanh\left(\mathbf{W}_{gx} \mathbf{x}^{\langle t \rangle} + \mathbf{W}_{gh} \mathbf{h}^{\langle t-1 \rangle} + \mathbf{b}_g\right)$

[Figure: the LSTM cell with the input gate and input node highlighted.]
Long short-term memory (LSTM)

Brief summary of the gates so far: the forget gate, input node, and input gate together update the cell state:

$\mathbf{C}^{\langle t \rangle} = \left(\mathbf{C}^{\langle t-1 \rangle} \odot \mathbf{f}_t\right) \oplus \left(\mathbf{i}_t \odot \mathbf{g}_t\right)$

[Figure: the LSTM cell with the cell-state update highlighted.]
Long short-term memory (LSTM)

Output gate, for updating the values of the hidden units:

$\mathbf{o}_t = \sigma\left(\mathbf{W}_{ox} \mathbf{x}^{\langle t \rangle} + \mathbf{W}_{oh} \mathbf{h}^{\langle t-1 \rangle} + \mathbf{b}_o\right)$

[Figure: the LSTM cell with the output gate highlighted.]
Long short-term memory (LSTM)

The new hidden state is computed from the output gate and the updated cell state:

$\mathbf{h}^{\langle t \rangle} = \mathbf{o}_t \odot \tanh\left(\mathbf{C}^{\langle t \rangle}\right)$

[Figure: the complete LSTM cell, shown in the context of the unfolded single-layer and multilayer RNNs.]
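
Putting the gate equations together, one LSTM time step can be written out directly; a minimal PyTorch sketch (the parameter shapes are illustrative assumptions):

import torch

def lstm_step(x_t, h_prev, C_prev, p):
    """One LSTM time step following the equations above."""
    f_t = torch.sigmoid(p["W_fx"] @ x_t + p["W_fh"] @ h_prev + p["b_f"])  # forget gate
    i_t = torch.sigmoid(p["W_ix"] @ x_t + p["W_ih"] @ h_prev + p["b_i"])  # input gate
    g_t = torch.tanh(p["W_gx"] @ x_t + p["W_gh"] @ h_prev + p["b_g"])     # input node
    o_t = torch.sigmoid(p["W_ox"] @ x_t + p["W_oh"] @ h_prev + p["b_o"])  # output gate
    C_t = f_t * C_prev + i_t * g_t    # element-wise cell-state update
    h_t = o_t * torch.tanh(C_t)       # new hidden state
    return h_t, C_t

n_in, n_hidden = 4, 3
p = {f"W_{g}x": torch.randn(n_hidden, n_in) for g in "figo"}
p.update({f"W_{g}h": torch.randn(n_hidden, n_hidden) for g in "figo"})
p.update({f"b_{g}": torch.zeros(n_hidden) for g in "figo"})

h_t, C_t = lstm_step(torch.randn(n_in), torch.zeros(n_hidden),
                     torch.zeros(n_hidden), p)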


Long short-term memory (LSTM)

• Still popular and widely used today.

• A recent, related approach is the Gated Recurrent Unit (GRU); a minimal usage sketch follows below.
Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." arXiv preprint arXiv:1406.1078 (2014).

• Nice article exploring LSTMs and comparing them to GRUs:
Jozefowicz, Rafal, Wojciech Zaremba, and Ilya Sutskever. "An empirical exploration of recurrent network architectures." In International Conference on Machine Learning, pp. 2342-2350. 2015.

GRU image source: [Link]
Gated_recurrent_unit#/media/File:Gated_Recurrent_Unit,_base_type.svg
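
In PyTorch, both are drop-in recurrent layers with nearly identical interfaces; a minimal sketch (the sizes are arbitrary):

import torch
import torch.nn as nn

x = torch.randn(8, 20, 32)   # batch of 8 sequences, 20 steps, 32 features

lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
gru = nn.GRU(input_size=32, hidden_size=64, batch_first=True)

out, (h_n, c_n) = lstm(x)    # the LSTM carries a separate cell state c_n
out, h_n = gru(x)            # the GRU merges its gates; no cell state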


Implementing an RNN / LSTM Classifier

1. Different Ways to Model Text

2. Sequence Modeling with RNNs

3. Different Types of Sequence Modeling Tasks

4. Backpropagation Through Time

5. Long Short-Term Memory (LSTM)

6. Many-to-one Word RNNs

7. RNN Classifiers in PyTorch

Different Types of Sequence Modeling Tasks

[Figure: the four layouts again: many-to-one, one-to-many, many-to-many (direct), and many-to-many (delayed).]

Figure based on:
The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy ([Link])

A Classic Approach for Text Classification:
Bag-of-Words Model

"Raw" training dataset:

x[1] = "The sun is shining"
x[2] = "The weather is sweet"
x[3] = "The sun is shining, the weather is sweet, and one and one is two"

vocabulary = {
    'and': 0,
    'is': 1,
    'one': 2,
    'shining': 3,
    'sun': 4,
    'sweet': 5,
    'the': 6,
    'two': 7,
    'weather': 8
}

Training set as design matrix:

X = [[0 1 0 1 1 0 1 0 0]
     [0 1 0 0 0 1 1 0 1]
     [2 3 2 1 1 1 2 1 1]]

Training class labels:

y = [0, 1, 0]

The design matrix and class labels are then fed to a classifier (e.g., logistic regression, MLP, ...).

Raschka & Mirjalili. Python Machine Learning 3rd Ed.
[Link]blob/master/ch08/[Link]

RNN Step 1): Building the Vocabulary

"Raw" training dataset:

x[1] = "The sun is shining"
x[2] = "The weather is sweet"
x[3] = "The sun is shining, the weather is sweet, and one and one is two"

Class labels: y = [0, 1, 0]

vocabulary = {
    '<unk>': 0,
    'and': 1,
    'is': 2,
    'one': 3,
    'shining': 4,
    'sun': 5,
    'sweet': 6,
    'the': 7,
    'two': 8,
    'weather': 9,
    '<pad>': 10
}
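
A minimal sketch of Step 1 in plain Python; the tokenization (lowercasing, stripping commas) is a simplifying assumption, but it reproduces the mapping above:

from collections import Counter

docs = [
    "The sun is shining",
    "The weather is sweet",
    "The sun is shining, the weather is sweet, and one and one is two",
]

counter = Counter(tok for d in docs for tok in d.lower().replace(",", "").split())
itos = ["<unk>"] + sorted(counter) + ["<pad>"]       # reserve slots 0 and 10
vocabulary = {tok: idx for idx, tok in enumerate(itos)}
print(vocabulary)  # {'<unk>': 0, 'and': 1, 'is': 2, ..., '<pad>': 10}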


RNN Step 2): Training Example Texts to Indices

Using the vocabulary from Step 1, each training text is converted into a sequence of token indices, padded with '<pad>' (index 10) to a common length:

x[1] = "The sun is shining"
    -> [7 5 2 4 ... 10 10 10]

x[2] = "The weather is sweet"
    -> [7 9 2 6 ... 10 10 10]

x[3] = "The sun is shining, the weather is sweet, and one and one is two"
    -> [7 5 2 4 ... 3 2 8]
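
A minimal sketch of Step 2, reusing the `vocabulary` dict from the previous sketch; the fixed length of 14 is an assumption chosen so the longest example needs no padding:

def encode(text, vocabulary, max_len=14):
    toks = text.lower().replace(",", "").split()
    ids = [vocabulary.get(t, vocabulary["<unk>"]) for t in toks]  # <unk> for OOV
    ids += [vocabulary["<pad>"]] * (max_len - len(ids))           # right-pad
    return ids[:max_len]

print(encode("The sun is shining", vocabulary))
# [7, 5, 2, 4, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10]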


RNN Step 3): Indices to One-Hot Representation

Each index is turned into a one-hot vector whose length equals the vocabulary size (11 here), with a 1 at the position given by the index:

 7 -> [0 0 0 0 0 0 0 1 0 0 0]
 5 -> [0 0 0 0 0 1 0 0 0 0 0]
 2 -> [0 0 1 0 0 0 0 0 0 0 0]
 4 -> [0 0 0 0 1 0 0 0 0 0 0]
 ...
10 -> [0 0 0 0 0 0 0 0 0 0 1]
10 -> [0 0 0 0 0 0 0 0 0 0 1]
10 -> [0 0 0 0 0 0 0 0 0 0 1]
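
Step 3 is one call in PyTorch; a minimal sketch using torch.nn.functional.one_hot with the 11-word vocabulary above:

import torch
import torch.nn.functional as F

ids = torch.tensor([7, 5, 2, 4, 10, 10, 10])  # indices from Step 2
one_hot = F.one_hot(ids, num_classes=11)      # shape (7, 11), one 1 per row
print(one_hot[0])  # tensor([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0])  <- 'the'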


RNN Step 4): One-Hot to Real via Embedding Matrix

The embedding is a linear layer: multiplying a one-hot vector by the 11 x 4 embedding matrix E selects exactly one row of E. For index 7 ('the'):

[0 0 0 0 0 0 0 1 0 0 0] x E = [7.1 2.5 1.5 1.5]   (row 7 of E)

E = [[0.1 4.2 2.1 1.9]
     [1.1 1.2 1.3 1.4]
     [2.1 2.2 2.3 2.4]
     [3.1 2.6 1.5 9.1]
     [4.1 2.6 2.2 8.8]
     [5.1 3.6 1.5 9.1]
     [6.1 9.1 7.4 9.0]
     [7.1 2.5 1.5 1.5]
     [8.1 6.1 1.5 6.2]
     [9.1 5.5 1.1 9.1]
     [1.1 5.3 4.8 9.1]]

The selected row is the hidden-layer output for that token.

In Practice, Skip Steps 3 & 4 And ...
use a lookup function ([Link])

With the vocabulary from Step 1:

x[1] = "The sun is shining" -> [7 5 2 4 ... 10 10 10]

The index sequence directly selects rows of the embedding matrix, yielding the embedded sentence of one training example:

[[7.1 2.5 1.5 1.5]    <- row 7, 'the'
 [5.1 3.6 1.5 9.1]    <- row 5, 'sun'
 [2.1 2.2 2.3 2.4]    <- row 2, 'is'
 [4.1 2.6 2.2 8.8]    <- row 4, 'shining'
 ...
 [1.1 5.3 4.8 9.1]    <- row 10, '<pad>'
 [1.1 5.3 4.8 9.1]
 [1.1 5.3 4.8 9.1]]
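
A minimal sketch of the lookup with torch.nn.Embedding; the layer stores the embedding matrix and indexes rows directly, which is mathematically the same as the one-hot matrix product but much cheaper (sizes match the toy example; the initialized values will of course differ):

import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=11, embedding_dim=4, padding_idx=10)

ids = torch.tensor([[7, 5, 2, 4, 10, 10, 10]])  # one padded training example
vectors = embedding(ids)                        # (1, 7, 4): row lookups

# Equivalent, but wasteful: one-hot vectors times the embedding matrix
one_hot = torch.nn.functional.one_hot(ids, num_classes=11).float()
assert torch.allclose(one_hot @ embedding.weight, vectors)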


IMDB Dataset

• Dataset for binary sentiment classification
• 25,000 highly polar movie reviews for training, and 25,000 for testing

[Link]

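Tying the pieces together, a minimal sketch of a many-to-one LSTM sentiment classifier of the kind used for IMDB; the layer sizes and vocabulary size are illustrative assumptions, not the lecture's exact architecture:

import torch
import torch.nn as nn

class RNNClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):              # (batch, seq_len)
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.rnn(embedded)       # h_n: (1, batch, hidden_dim)
        return self.fc(h_n[-1])                # logits from the final hidden state

model = RNNClassifier(vocab_size=20_000)
logits = model(torch.randint(0, 20_000, (8, 100)))  # 8 reviews, 100 tokens each
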
Good Resource

[Link]

[Link]migration_tutorial.ipynb#scrollTo=EC054Wlr0-xB


Lecture Overview

1. Different Ways to Model Text

2. Sequence Modeling with RNNs

3. Different Types of Sequence Modeling Tasks

4. Backpropagation Through Time

5. Long Short-Term Memory (LSTM)

6. Many-to-one Word RNNs

7. RNN Classifiers in PyTorch