L15 Intro-Rnn Slides
Sebastian Raschka
Lecture 15
Introduction to
with Applications in Python
vocabulary = {
 'and': 0,
 'is': 1,
 'one': 2,
 'shining': 3,
 'sun': 4,
 'sweet': 5,
 'the': 6,
 'two': 7,
 'weather': 8,
}

"Raw" training dataset

x[1] = "The sun is shining"
x[2] = "The weather is sweet"
x[3] = "The sun is shining, the weather is sweet, and one and one is two"

Training set as design matrix

X = [[0 1 0 1 1 0 1 0 0]
     [0 1 0 0 0 1 1 0 1]
     [2 3 2 1 1 1 2 1 1]]

y = [0, 1, 0]   (training class labels)
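The bag-of-words construction above can be reproduced in a few lines of plain Python. This is a minimal sketch, not the lecture's own code: the helper names `tokenize` and `bag_of_words` are hypothetical, and it assumes lowercasing, dropping punctuation, and an alphabetically sorted vocabulary, which is what the indices on the slide suggest.

```python
import re
from collections import Counter

docs = [
    "The sun is shining",
    "The weather is sweet",
    "The sun is shining, the weather is sweet, and one and one is two",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

# Vocabulary: every unique word, sorted alphabetically, mapped to a column index.
unique_words = sorted({word for doc in docs for word in tokenize(doc)})
vocabulary = {word: idx for idx, word in enumerate(unique_words)}

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    # dicts preserve insertion order, so columns follow the vocabulary indices
    return [counts.get(word, 0) for word in vocabulary]

# Design matrix: one row per document, one column per vocabulary word.
X = [bag_of_words(doc, vocabulary) for doc in docs]
for row in X:
    print(row)
```

Note that the word order is lost in this representation: any permutation of the words in a sentence yields the same row.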
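[Figure: the sentence "The sun is shining ..." split into a sequence of single characters: T, h, e, s, u, n, i, s, ..., s, h, i, n, i, n, g, ...]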
Input sentence: A quick brown fox jumps over the lazy dog
15% randomly masked: A quick brown [MASK] jumps over the lazy dog
BERT → possible classes (all words):
 0.2% ant
 ...
 11% fox
 ...
 0.01% zoo
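The masking step of this example can be sketched as follows. This is a simplified illustration only: the function name `mask_tokens`, the fixed seed, and the rule of masking at least one token are assumptions made here, and the sketch does not reproduce the full masking procedure used to train BERT.

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Replace roughly `mask_rate` of the tokens (at least one) with `mask_token`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_mask = max(1, round(mask_rate * len(tokens)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    return [mask_token if i in positions else tok for i, tok in enumerate(tokens)]

tokens = "A quick brown fox jumps over the lazy dog".split()
masked = mask_tokens(tokens)
print(" ".join(masked))
```

The model is then trained to predict the original word at each masked position, that is, a classification over all words in the vocabulary.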
Applications:
Working with Sequential Data
Fig 8. Displays the actual data and the predicted data from the four models for each stock index in
Year 1 from 2010.10.01 to 2011.09.30.
sequence modeling
Bao, Wei, Jun Yue, and Yulei Rao. "A deep learning framework for financial time series using
stacked autoencoders and long-short term memory." PloS one 12, no. 7 (2017): e0180944.
Networks we used previously (also called feedforward neural networks): x → h → y
Recurrent Neural Network (RNN): x<t> → h<t> → y<t>, where the hidden layer h<t> also receives its own activation from the previous time step
Sebastian Raschka, STAT 453: Intro to Deep Learning
Overview
Single layer RNN: x<t> → h<t> → y<t>; unfolded over time steps <t-1>, <t>, <t+1>
Multilayer RNN: x<t> → h1<t> → h2<t> → y<t>; unfolded over time steps <t-1>, <t>, <t+1>, with each hidden layer (h1, h2) carried forward in time
Different Types of Sequence Modeling Tasks
[Figure panels: many-to-one, one-to-many, many-to-many, many-to-many]
Figure based on: The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy
What RNNs Look Like Under the Hood
1. Different Ways to Model Text
Activation: $h^{\langle t \rangle} = \sigma_h\left(z_h^{\langle t \rangle}\right)$
Output: $y^{\langle t \rangle} = \sigma_y\left(z_y^{\langle t \rangle}\right)$
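A single forward step of such a recurrent layer might look like the sketch below, written in plain Python to stay dependency-free. Several things are assumptions rather than content of the slide: the net inputs are taken to be z_h = W_hx x<t> + W_hh h<t-1> + b_h and z_y = W_yh h<t> + b_y, the hidden activation is tanh, the output activation is a logistic sigmoid, and all weight values are made up for illustration.

```python
import math

def matvec(W, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(*vectors):
    """Element-wise sum of equally long vectors."""
    return [sum(values) for values in zip(*vectors)]

def rnn_step(x_t, h_prev, W_hx, W_hh, b_h, W_yh, b_y):
    """One time step: returns the new hidden activation h<t> and the output y<t>."""
    z_h = add(matvec(W_hx, x_t), matvec(W_hh, h_prev), b_h)  # hidden net input
    h_t = [math.tanh(z) for z in z_h]                        # activation
    z_y = add(matvec(W_yh, h_t), b_y)                        # output net input
    y_t = [1.0 / (1.0 + math.exp(-z)) for z in z_y]          # output
    return h_t, y_t

# Hypothetical weights: 2 input features, 3 hidden units, 1 output unit.
W_hx = [[0.1, -0.2], [0.0, 0.3], [0.5, 0.1]]
W_hh = [[0.2, 0.0, -0.1], [0.1, 0.3, 0.0], [0.0, -0.2, 0.4]]
b_h = [0.0, 0.0, 0.0]
W_yh = [[0.3, -0.5, 0.2]]
b_y = [0.0]

sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h = [0.0, 0.0, 0.0]  # initial hidden state
outputs = []
for x_t in sequence:
    # the same weights are reused at every time step
    h, y = rnn_step(x_t, h, W_hx, W_hh, b_h, W_yh, b_y)
    outputs.append(y)
print(outputs)
```

Keeping only the last entry of `outputs` gives a many-to-one model; keeping all of them gives a many-to-many model.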
Werbos, Paul J. "Backpropagation through time: what it does and how to do it." Proceedings of the IEEE 78, no. 10 (1990): 1550-1560.
$L = \sum_{t=1}^{T} L^{(t)}$

$\frac{\partial L^{(t)}}{\partial W_{hh}} = \frac{\partial L^{(t)}}{\partial y^{(t)}} \cdot \frac{\partial y^{(t)}}{\partial h^{(t)}} \cdot \left( \sum_{k=1}^{t} \frac{\partial h^{(t)}}{\partial h^{(k)}} \cdot \frac{\partial h^{(k)}}{\partial W_{hh}} \right)$
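The term ∂h(t)/∂h(k) in the sum is itself a product of one-step derivatives, ∂h(i)/∂h(i-1) for i = k+1, ..., t. The sketch below evaluates that product for a scalar RNN. The recurrence h(t) = tanh(w_hh · h(t-1) + w_hx · x(t)), the weight values, and the input sequence are assumptions chosen only for illustration.

```python
import math

def run(h0, xs, w_hh, w_hx):
    """Return the hidden states h(1), ..., h(T) of a scalar tanh RNN started from h0."""
    hs, h = [], h0
    for x in xs:
        h = math.tanh(w_hh * h + w_hx * x)
        hs.append(h)
    return hs

def dh_t_dh_k(hs, t, k, w_hh):
    """Chain-rule product for d h(t) / d h(k), with 1-based time indices and k <= t."""
    grad = 1.0
    for i in range(k + 1, t + 1):
        # d h(i) / d h(i-1) = w_hh * (1 - h(i)^2) for a tanh unit; hs[i - 1] holds h(i)
        grad *= w_hh * (1.0 - hs[i - 1] ** 2)
    return grad

xs = [0.5, -0.3, 0.8, 0.1, -0.6, 0.4]
w_hh, w_hx = 0.9, 0.7
hs = run(0.0, xs, w_hh, w_hx)
T = len(xs)
grads = [dh_t_dh_k(hs, T, k, w_hh) for k in range(1, T + 1)]
for k, g in enumerate(grads, start=1):
    print(f"d h({T}) / d h({k}) = {g:.6f}")
```

With |w_hh| < 1, every factor has magnitude below 1, so in this sketch the contribution of early time steps (small k) shrinks as the distance t - k grows.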
LSTM cell:
[Figure: LSTM cell. Inputs: x<t>, h<t-1>, and the cell state C<t-1>. Four units act on x<t> and h<t-1>: forget gate f (σ; Wfx, Wfh, bf), input gate i (σ; Wix, Wih, bi), input node g (tanh; Wgx, Wgh, bg), and output gate o (σ; Wox, Woh, bo). Outputs: the cell state C<t> and the activation h<t>, which is passed to the next layer and to the next time step.]
Figure: Sebastian Raschka, Vahid Mirjalili. Python Machine Learning. 3rd Edition. Birmingham, UK: Packt Publishing, 2019
C<t-1>: cell state at previous time step; C<t>: cell state at current time step
Long short-term memory (LSTM)
h<t-1>: activation from previous time step; h<t>: activation for next time step
⊙: element-wise multiplication operator; ⊕: element-wise addition operator
"Input Gate": $i_t = \sigma\left(W_{ix} x^{\langle t \rangle} + W_{ih} h^{\langle t-1 \rangle} + b_i\right)$
"Input Node": $g_t = \tanh\left(W_{gx} x^{\langle t \rangle} + W_{gh} h^{\langle t-1 \rangle} + b_g\right)$
Brief summary of the gates so far ...
$C^{\langle t \rangle} = \left(C^{\langle t-1 \rangle} \odot f_t\right) \oplus \left(i_t \odot g_t\right)$
where $f_t$ is the forget gate, $g_t$ the input node, and $i_t$ the input gate.
Output gate for updating the values of hidden units:
$o_t = \sigma\left(W_{ox} x^{\langle t \rangle} + W_{oh} h^{\langle t-1 \rangle} + b_o\right)$
$h^{\langle t \rangle} = o_t \odot \tanh\left(C^{\langle t \rangle}\right)$
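Taken together, the gate equations describe one LSTM time step, which can be sketched in plain Python as below. The forget gate is assumed to follow the same pattern as the input and output gates, using the parameters Wfx, Wfh, bf from the cell figure. The helper names (`affine`, `constant_params`, `lstm_step`) and all parameter values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W_x, x, W_h, h, b):
    """Compute W_x x + W_h h + b for list-of-rows matrices and list vectors."""
    return [
        sum(wx * xj for wx, xj in zip(row_x, x))
        + sum(wh * hj for wh, hj in zip(row_h, h))
        + bias
        for row_x, row_h, bias in zip(W_x, W_h, b)
    ]

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; returns the new activation h<t> and cell state C<t>."""
    f = [sigmoid(z) for z in affine(p["W_fx"], x_t, p["W_fh"], h_prev, p["b_f"])]    # forget gate (assumed form)
    i = [sigmoid(z) for z in affine(p["W_ix"], x_t, p["W_ih"], h_prev, p["b_i"])]    # input gate
    g = [math.tanh(z) for z in affine(p["W_gx"], x_t, p["W_gh"], h_prev, p["b_g"])]  # input node
    o = [sigmoid(z) for z in affine(p["W_ox"], x_t, p["W_oh"], h_prev, p["b_o"])]    # output gate
    c_t = [c * fk + ik * gk for c, fk, ik, gk in zip(c_prev, f, i, g)]  # C<t> = (C<t-1> * f) + (i * g)
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]                # h<t> = o * tanh(C<t>)
    return h_t, c_t

def constant_params(n_in, n_hidden, value):
    """Hypothetical helper: every weight set to `value`, every bias set to zero."""
    params = {}
    for gate in "figo":
        params[f"W_{gate}x"] = [[value] * n_in for _ in range(n_hidden)]
        params[f"W_{gate}h"] = [[value] * n_hidden for _ in range(n_hidden)]
        params[f"b_{gate}"] = [0.0] * n_hidden
    return params

params = constant_params(n_in=2, n_hidden=2, value=0.1)
h, c = [0.0, 0.0], [0.0, 0.0]
for x_t in [[1.0, 0.5], [-0.5, 0.2]]:
    h, c = lstm_step(x_t, h, c, params)
print(h, c)
```

Both h<t> and C<t> are fed back into the next call, which mirrors the two horizontal paths through the cell in the figure.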
vocabulary = {
 '<unk>': 0,
 'and': 1,
 'is': 2,
 'one': 3,
 'shining': 4,
 'sun': 5,
 'sweet': 6,
 'the': 7,
 'two': 8,
 'weather': 9,
 '<pad>': 10
}

"Raw" training dataset

x[1] = "The sun is shining"
x[2] = "The weather is sweet"
x[3] = "The sun is shining, the weather is sweet, and one and one is two"

y = [0, 1, 0]   (class labels)

Integer-encoded sequences (shorter sentences padded with '<pad>': 10)

x[1] → [7 5 2 4 ... 10 10 10]
x[2] → [7 9 2 6 ... 10 10 10]
x[3] → [7 5 2 4 ... 3 2 8]
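The integer encoding with padding can be sketched as follows. The function name `encode` and the choice of padding every sentence to the length of the longest one (14 tokens) are assumptions; the slide elides the middle of each sequence, so the exact padded length is not stated there.

```python
import re

vocabulary = {
    '<unk>': 0, 'and': 1, 'is': 2, 'one': 3, 'shining': 4, 'sun': 5,
    'sweet': 6, 'the': 7, 'two': 8, 'weather': 9, '<pad>': 10,
}

def encode(text, vocabulary, max_len):
    """Map words to integer indices, truncate to max_len, and pad on the right."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # words outside the vocabulary fall back to the '<unk>' index
    ids = [vocabulary.get(tok, vocabulary['<unk>']) for tok in tokens][:max_len]
    return ids + [vocabulary['<pad>']] * (max_len - len(ids))

docs = [
    "The sun is shining",
    "The weather is sweet",
    "The sun is shining, the weather is sweet, and one and one is two",
]
max_len = max(len(re.findall(r"[a-z]+", d.lower())) for d in docs)
encoded = [encode(d, vocabulary, max_len) for d in docs]
for row in encoded:
    print(row)
```

Unlike the bag-of-words rows, these sequences keep the word order, which is what a recurrent model consumes one position at a time.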
One-hot encoding of the integer sequence for x[1] = "The sun is shining", [7 5 2 4 ... 10 10 10], using the 11-entry vocabulary ('<unk>': 0 through '<pad>': 10). Each vector has length 11 and a single 1 at the position given by the index:

  7 → [0 0 0 0 0 0 0 1 0 0 0]
  5 → [0 0 0 0 0 1 0 0 0 0 0]
  2 → [0 0 1 0 0 0 0 0 0 0 0]
  4 → [0 0 0 0 1 0 0 0 0 0 0]
  ...
 10 → [0 0 0 0 0 0 0 0 0 0 1]
 10 → [0 0 0 0 0 0 0 0 0 0 1]
 10 → [0 0 0 0 0 0 0 0 0 0 1]
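A one-hot encoder for such an integer sequence can be written in a few lines. This sketch assumes 0-based indices and a vector length equal to the vocabulary size (11 entries, indices 0 to 10); the function name `one_hot` and the shortened example sequence are illustrative.

```python
def one_hot(index, num_classes):
    """Return a list of zeros with a single 1 at position `index` (0-based)."""
    if not 0 <= index < num_classes:
        raise ValueError("index outside the vocabulary")
    vec = [0] * num_classes
    vec[index] = 1
    return vec

encoded = [7, 5, 2, 4, 10, 10, 10]  # "the sun is shining" followed by padding
one_hot_rows = [one_hot(i, 11) for i in encoded]
for idx, row in zip(encoded, one_hot_rows):
    print(idx, row)
```

Each sentence thus becomes a matrix with one row per position and one column per vocabulary entry.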
[Figure: single-layer RNN, folded and unfolded over time steps <t-1>, <t>, <t+1>, annotated with its weight matrices: Whx (input to hidden), Whh (hidden to hidden), Wyh (hidden to output), with Wh = [Whh ; Whx].]

Embedding is a linear layer

One-hot vector × embedding weight matrix = the matrix row selected by the one-hot vector.
Equivalently, each index in the integer sequence [7 5 2 4 ... 10 10 10] looks up the weight-matrix row with that index.
First six rows of the weight matrix as they appear on the slide:

 0.1 4.2 2.1 1.9
 1.1 1.2 1.3 1.4
 2.1 2.2 2.3 2.4
 3.1 2.6 1.5 9.1
 4.1 2.6 2.2 8.8
 5.1 3.6 1.5 9.1
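The equivalence between an embedding lookup and a bias-free linear layer applied to a one-hot vector can be checked numerically. The sketch below uses only the six weight-matrix rows that are recoverable from the slide, so it assumes a six-entry vocabulary for illustration; the helper names are hypothetical.

```python
# First six rows of the embedding matrix from the slide (4-dimensional embeddings).
W = [
    [0.1, 4.2, 2.1, 1.9],
    [1.1, 1.2, 1.3, 1.4],
    [2.1, 2.2, 2.3, 2.4],
    [3.1, 2.6, 1.5, 9.1],
    [4.1, 2.6, 2.2, 8.8],
    [5.1, 3.6, 1.5, 9.1],
]

def one_hot(index, num_classes):
    """Return a list of zeros with a single 1 at position `index` (0-based)."""
    vec = [0] * num_classes
    vec[index] = 1
    return vec

def vecmat(v, W):
    """Row vector times matrix: the forward pass of a linear layer without bias."""
    return [sum(v[i] * W[i][j] for i in range(len(W))) for j in range(len(W[0]))]

index = 2
via_linear_layer = vecmat(one_hot(index, len(W)), W)
via_lookup = W[index]
print(via_linear_layer, via_lookup)
```

The lookup gives the same result without the multiplications by zero, which is why embedding layers are typically implemented as an index operation rather than a matrix product.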