PART A: Intuition & Encodings

Setup: a roommate cooks one food per day, depending on the weather.
One-hot encodings: the weather (Sunny, Rainy) is a 2-dimensional one-hot vector, and the food (Apple pie, Burger, Chicken) is a 3-dimensional one-hot vector.
Problem 01 (fixed rule):
weather = Sunny -> Apple pie
weather = Rainy -> Burger
Today's food depends only on today's weather, so a simple feedforward model suffices: a single matrix mapping the weather one-hot vector to the food one-hot vector (no memory needed). See the sketch below.
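A minimal sketch of Problem 01, assuming the encodings above (the ordering [Sunny, Rainy] and [Apple pie, Burger, Chicken] and the matrix F are illustrative choices, not taken verbatim from the notes):

```python
import numpy as np

# Assumed one-hot encodings: weather = [Sunny, Rainy], food = [Apple pie, Burger, Chicken]
SUNNY, RAINY = np.array([1, 0]), np.array([0, 1])
FOODS = ["Apple pie", "Burger", "Chicken"]

# Fixed rule: Sunny -> Apple pie, Rainy -> Burger.
# Column 0 (Sunny) maps to Apple pie, column 1 (Rainy) maps to Burger.
F = np.array([[1, 0],
              [0, 1],
              [0, 0]])

def food_for(weather):
    return FOODS[int(np.argmax(F @ weather))]

print(food_for(SUNNY))  # Apple pie
print(food_for(RAINY))  # Burger
```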
Problem 02 (Round Robin schedule):
Apple pie -> Burger -> Chicken -> Apple pie -> ...
Here the output has to be fed back as an input: the network needs to remember the last food it cooked.

Permutation matrix: with the food ordered as (Apple pie, Burger, Chicken), the cyclic schedule is realized by
P = [0 0 1]
    [1 0 0]
    [0 1 0]
The above permutation matrix can be used as a model to realize the RR scheduling: it takes the previous day's food as input and gives the next day's food. The previous day's food has to be remembered.
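A minimal sketch of Problem 02 (P written for the assumed ordering [Apple pie, Burger, Chicken]): feeding yesterday's food vector through P advances the round-robin schedule.

```python
import numpy as np

FOODS = ["Apple pie", "Burger", "Chicken"]

# Cyclic permutation: Apple pie -> Burger -> Chicken -> Apple pie
P = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])

food = np.array([1, 0, 0])           # day 0: Apple pie
for day in range(1, 5):
    food = P @ food                  # next day's food = P @ previous day's food
    print(day, FOODS[int(np.argmax(food))])
# 1 Burger, 2 Chicken, 3 Apple pie, 4 Burger
```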
Problem 03:
If Sunny -> do not cook; just repeat the last day's leftover food.
If Rainy -> cook the next day's food according to the RR scheduling, using the previous day's food.

Now, along with the previous day's food, one also has to check the weather: depending upon the weather, choose yesterday's food or the next food in the schedule. So today's weather is an input along with yesterday's food.
Identity matrix I    -> same-day food  (selected when Sunny)
Permutation matrix P -> next-day food  (selected when Rainy)
The two branches are then merged (an add operation): the weather gates which branch contributes, and the result is today's food.

[Figure: identity branch and permutation branch applied to yesterday's food, gated by the weather one-hot vector, merged by addition.]

This is already a recurrent network (RNN): the food output is fed back as an input for the next day,
food(t) = f(food(t-1), weather(t)).
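A minimal sketch of Problem 03 (the gating-by-multiplication construction below is one illustrative way to realize the merge described above): the next food is I @ food when Sunny and P @ food when Rainy, which is exactly a tiny recurrent cell whose state is yesterday's food.

```python
import numpy as np

FOODS = ["Apple pie", "Burger", "Chicken"]
SUNNY, RAINY = np.array([1, 0]), np.array([0, 1])

I = np.eye(3, dtype=int)             # Sunny: keep yesterday's food
P = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])            # Rainy: advance the round-robin schedule

def step(food_prev, weather):
    # weather[0] selects the identity branch, weather[1] the permutation branch;
    # the two branches are merged by addition (only one is active at a time).
    return weather[0] * (I @ food_prev) + weather[1] * (P @ food_prev)

food = np.array([1, 0, 0])           # start with Apple pie
for w in [SUNNY, RAINY, RAINY, SUNNY]:
    food = step(food, w)
    print(FOODS[int(np.argmax(food))])
# Apple pie, Burger, Chicken, Chicken
```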
PART B: Mathematical formulation of an RNN, useful for identifying names in text (named entity recognition, e.g. people in news articles).
Given any sentence, find out which words are names.
Say x = x^{<1>}, x^{<2>}, ..., x^{<T_x>}.
Example:
x:  Sachin   played   Shoaib   very   well
    (1st word, 2nd word, ...)
y:  1        0        1        0      0
y = y^{<1>}, ..., y^{<T_y>}; T_x and T_y may or may not be the same from sequence to sequence (here T_x = T_y = 5).
x^{(i)}: i-th training example; T_x^{(i)} and T_y^{(i)}: its corresponding lengths.
Word representation: 1-hot encoding. Each word is a |V|-dimensional binary (0/1) vector with a single 1 at the word's vocabulary index.
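A minimal sketch of the 1-hot word representation (the tiny vocabulary below is only illustrative; in practice |V| is on the order of 10,000):

```python
import numpy as np

vocab = ["Sachin", "played", "Shoaib", "very", "well"]   # toy vocabulary, |V| = 5
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    v = np.zeros(len(vocab))         # |V|-dimensional 0/1 vector
    v[word_to_idx[word]] = 1
    return v

print(one_hot("Shoaib"))             # [0. 0. 1. 0. 0.]
```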
Our goal: get a label \hat{y}^{<t>} for each word of a given x, i.e. learn a mapping from x to y.

Why not a standard fully connected network with all the word vectors concatenated into one input?
- Huge size: every word is a |V|-dimensional vector, so the concatenated input and its weight matrix become enormous.
- Variable length: different sentences have different T_x, but a fully connected network needs a fixed input size.
- No parameter sharing: what is learned about a word at one position is not reused at other positions.

Instead, we require our network to read the sentence word by word, from left to right, seeing the data as a time sequence.
[Figure: RNN unit unrolled in time. The parameters W_{aa}, W_{ax}, W_{ya} are shared across all time steps; the activation a^{<t>} is passed from step t to step t+1, so x^{<1>}, ..., x^{<t>} are all utilized when predicting \hat{y}^{<t>}.]
a^{<t>} = g(W_{aa} a^{<t-1>} + W_{ax} x^{<t>} + b_a),   g = \tanh (usually)
\hat{y}^{<t>} = g'(W_{ya} a^{<t>} + b_y),               g' = sigmoid / softmax
Weight matrix augmentation. Start from
a^{<t>} = g(W_{aa} a^{<t-1>} + W_{ax} x^{<t>} + b_a).
Take a 100-dimensional a^{<t>} and a 10,000-dimensional (1-hot) x^{<t>}. Then
W_{aa}: 100 x 100,   W_{ax}: 100 x 10,000
W_a = [W_{aa} | W_{ax}]: 100 x 10,100        (the two matrices placed side by side)
[a^{<t-1>}; x^{<t>}]: 10,100 x 1             (the two vectors concatenated/stacked)
so that W_a [a^{<t-1>}; x^{<t>}] = W_{aa} a^{<t-1>} + W_{ax} x^{<t>}.

Therefore our equations become
a^{<t>} = g(W_a [a^{<t-1>}, x^{<t>}] + b_a)
\hat{y}^{<t>} = g'(W_y a^{<t>} + b_y)
Parameters to be learned: W_a, b_a, W_y, b_y (shared across all time steps).
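A minimal sketch of one forward step with the augmented matrix, using the dimensions from the notes (100-dimensional a, 10,000-dimensional 1-hot x); the random parameters are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_a, n_x, n_y = 100, 10_000, 1                  # hidden size, vocab size, binary label

Wa = rng.normal(0, 0.01, (n_a, n_a + n_x))      # [Waa | Wax], shape 100 x 10,100
ba = np.zeros(n_a)
Wy = rng.normal(0, 0.01, (n_y, n_a))
by = np.zeros(n_y)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_step(a_prev, x_t):
    """a^<t> = tanh(Wa [a^<t-1>; x^<t>] + ba),  y_hat^<t> = sigmoid(Wy a^<t> + by)."""
    concat = np.concatenate([a_prev, x_t])      # 10,100-dimensional stacked vector
    a_t = np.tanh(Wa @ concat + ba)
    y_hat_t = sigmoid(Wy @ a_t + by)
    return a_t, y_hat_t

a = np.zeros(n_a)                               # a^<0> = 0
x = np.zeros(n_x); x[42] = 1                    # a 1-hot word (index 42 is arbitrary)
a, y_hat = rnn_step(a, x)
print(a.shape, y_hat.shape)                     # (100,) (1,)
```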
Word-level loss (binary cross-entropy for each word):
L^{<t>}(\hat{y}^{<t>}, y^{<t>}) = - y^{<t>} \log \hat{y}^{<t>} - (1 - y^{<t>}) \log(1 - \hat{y}^{<t>})

Sentence-level loss:
L(\hat{y}, y) = \sum_{t=1}^{T_y} L^{<t>}(\hat{y}^{<t>}, y^{<t>})
[Figure: the network unrolled over the sentence; a loss L^{<t>} is computed at every time step, and all steps share the same weights.]
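A sketch of the word-level and sentence-level losses defined above, assuming lists of per-step predictions and labels (the toy numbers are illustrative):

```python
import numpy as np

def word_loss(y_hat_t, y_t, eps=1e-12):
    """L^<t> = -y log(y_hat) - (1 - y) log(1 - y_hat): per-word binary cross-entropy."""
    return -(y_t * np.log(y_hat_t + eps) + (1 - y_t) * np.log(1 - y_hat_t + eps))

def sentence_loss(y_hats, ys):
    """Sentence-level loss: sum of the word-level losses over t = 1..Ty."""
    return sum(word_loss(yh, y) for yh, y in zip(y_hats, ys))

# Toy example: predictions for "Sachin played Shoaib very well", labels 1 0 1 0 0
y_hats = [0.9, 0.2, 0.8, 0.1, 0.3]
ys     = [1,   0,   1,   0,   0  ]
print(float(sentence_loss(y_hats, ys)))
```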
PART C: Vanishing Gradients (backpropagation through time)

The total loss is L = L^{<1>} + L^{<2>} + ... + L^{<T_y>}, where each L^{<t>} depends on \hat{y}^{<t>} and therefore, through the recurrence, on all earlier hidden states.
To compute \partial L / \partial W (see the computation graph), note that every hidden state depends on the previous one, so the chain rule must be applied backward through the entire history:

\partial L^{<t>} / \partial W = \sum_{k=1}^{t} (\partial L^{<t>} / \partial a^{<t>}) (\partial a^{<t>} / \partial a^{<k>}) (\partial a^{<k>} / \partial W)

and the middle factor is itself a product over time,

\partial a^{<t>} / \partial a^{<k>} = \prod_{j=k+1}^{t} \partial a^{<j>} / \partial a^{<j-1>},   e.g.  \partial a^{<3>} / \partial a^{<1>} = (\partial a^{<3>} / \partial a^{<2>}) (\partial a^{<2>} / \partial a^{<1>}).

Each factor \partial a^{<j>} / \partial a^{<j-1>} = diag(g'(\cdot)) W_{aa} is a bounded activation derivative times the shared weight matrix, so the gradient involves repeated multiplication by terms related to W.
This algorithm, backpropagation through time, therefore requires computing a chained differentiation by recursive, repeated multiplication. Since the hidden states are outputs of activations, they are bounded, and so are their derivatives:
sigmoid:  \sigma'(z) <= 1/4
tanh:     \tanh'(z) <= 1
A long product of such small factors (together with repeated factors of W_{aa}) may vanish, or, if W_{aa} is large, explode.
The major problem an RNN faces is vanishing gradients: because backpropagation runs through time, the network fails to update well with respect to inputs at old time steps. This causes the RNN to forget about long-term events and their effect.

Vanilla RNN block diagram:
a^{<t>} = \tanh(W_a [a^{<t-1>}, x^{<t>}] + b_a)
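A small numerical sketch of the vanishing effect (toy dimensions and random weights, purely illustrative): unroll the vanilla RNN update and track how strongly a^{<t>} still depends on a^{<0>} by multiplying the per-step Jacobians diag(1 - a^2) W_{aa}.

```python
import numpy as np

rng = np.random.default_rng(0)
n_a, n_x, T = 20, 10, 50
Waa = rng.normal(0, 0.1, (n_a, n_a))         # small recurrent weights
Wax = rng.normal(0, 0.1, (n_a, n_x))
ba = np.zeros(n_a)

a = np.zeros(n_a)
J = np.eye(n_a)                              # running product of Jacobians d a^<t> / d a^<0>
for t in range(T):
    x = rng.normal(size=n_x)
    a = np.tanh(Waa @ a + Wax @ x + ba)
    step_jac = np.diag(1 - a**2) @ Waa       # d a^<t> / d a^<t-1> for the tanh update
    J = step_jac @ J
    if (t + 1) % 10 == 0:
        print(f"t={t+1:2d}  ||d a^<t>/d a^<0>|| = {np.linalg.norm(J):.2e}")
# The norm shrinks toward 0: gradients with respect to old time steps vanish.
```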
PART D: Gated Recurrent Unit (GRU)

Motivating examples of long-range dependencies:
"Virat is a good player. He is the captain."                  (singular, male -> "He", "is")
"Three persons are going with us, and all of them are ..."    (plural -> "are" / "were")

The problem is how to remember properties of the subject across many words:
singular / plural -> was / were
male / female     -> He / She
We let our hidden state act as memory cells that capture such dependencies and keep memorizing them until the point where they are required, or until something more important needs to be remembered (at which point the memory is toggled).
"Virat is a good player. He is the captain."
At "Virat", a memory bit gets set, c = 1 (MALE); it is kept until it is used, so that "He" and "is" come out correctly later.

We need to remember whether the subject is singular or plural, male or female.

Here c^{<t>} can be seen as the memory-cell value at time t, and \Gamma_u (the update gate) can be seen as a gate that allows or blocks some information, ideally taking values close to 1 or 0.
Our inputs at time t are c^{<t-1>} and x^{<t>}; a new cell state has to be computed.

Candidate update value:
\tilde{c}^{<t>} = \tanh(W_c [c^{<t-1>}, x^{<t>}] + b_c)
(a neural-network layer parameterized by W_c, b_c, with input c^{<t-1>} and x^{<t>})

Update gate:
\Gamma_u = \sigma(W_u [c^{<t-1>}, x^{<t>}] + b_u)
(a neural-network layer parameterized by W_u, b_u, with input c^{<t-1>} and x^{<t>})
This gate tries to learn whether we need to update the current cell state or not.
Simplified GRU, cell update:
c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + (1 - \Gamma_u) * c^{<t-1>}
          (what to update)              (how much to retain)

c^{<t>}, \tilde{c}^{<t>} and \Gamma_u are all of the same dimension (say 100 x 1), and the update is element-wise multiplication. \Gamma_u learns the behaviour of the individual cell-state bits: only a few key bits need to be gated open at a time, and the gate values are learned so as to reduce the loss.
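A minimal sketch of one simplified-GRU step (random parameters and the 100 x 10,000 dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_c, n_x = 100, 10_000                       # cell size, input (1-hot) size

Wc = rng.normal(0, 0.01, (n_c, n_c + n_x)); bc = np.zeros(n_c)
Wu = rng.normal(0, 0.01, (n_c, n_c + n_x)); bu = np.zeros(n_c)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def simplified_gru_step(c_prev, x_t):
    concat = np.concatenate([c_prev, x_t])
    c_tilde = np.tanh(Wc @ concat + bc)      # candidate update value
    gamma_u = sigmoid(Wu @ concat + bu)      # update gate, element-wise in [0, 1]
    return gamma_u * c_tilde + (1 - gamma_u) * c_prev   # element-wise cell update

c = np.zeros(n_c)
x = np.zeros(n_x); x[7] = 1                  # an arbitrary 1-hot word
c = simplified_gru_step(c, x)
print(c.shape)                               # (100,)
```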
Full GRU: add a relevance gate \Gamma_r that gates the previous cell state inside the candidate:
\tilde{c}^{<t>} = \tanh(W_c [\Gamma_r * c^{<t-1>}, x^{<t>}] + b_c)
\Gamma_u = \sigma(W_u [c^{<t-1>}, x^{<t>}] + b_u)
\Gamma_r = \sigma(W_r [c^{<t-1>}, x^{<t>}] + b_r)
(the relevance gate is another neural-network layer, parameterized by W_r, b_r, with input c^{<t-1>} and x^{<t>})
c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + (1 - \Gamma_u) * c^{<t-1>}

Because c^{<t>} can pass through many steps almost unchanged when \Gamma_u is near 0, the GRU handles longer-range dependencies, mitigates vanishing gradients, and converges better.
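And a sketch of the full GRU step with the relevance gate (again random, illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
n_c, n_x = 100, 10_000

Wc = rng.normal(0, 0.01, (n_c, n_c + n_x)); bc = np.zeros(n_c)
Wu = rng.normal(0, 0.01, (n_c, n_c + n_x)); bu = np.zeros(n_c)
Wr = rng.normal(0, 0.01, (n_c, n_c + n_x)); br = np.zeros(n_c)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(c_prev, x_t):
    concat = np.concatenate([c_prev, x_t])
    gamma_r = sigmoid(Wr @ concat + br)                  # relevance gate
    gamma_u = sigmoid(Wu @ concat + bu)                  # update gate
    c_tilde = np.tanh(Wc @ np.concatenate([gamma_r * c_prev, x_t]) + bc)  # gated candidate
    return gamma_u * c_tilde + (1 - gamma_u) * c_prev    # keep vs. update, element-wise
```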
PART E: GRU to LSTM

Candidate value
GRU:  \tilde{c}^{<t>} = \tanh(W_c [\Gamma_r * c^{<t-1>}, x^{<t>}] + b_c)
LSTM: \tilde{c}^{<t>} = \tanh(W_c [a^{<t-1>}, x^{<t>}] + b_c)        (no relevance gate in the LSTM)

Update gate (GRU)
\Gamma_u = \sigma(W_u [c^{<t-1>}, x^{<t>}] + b_u)
signifies which bits require updating; \tilde{c}^{<t>} and c^{<t-1>} are weighted by \Gamma_u, and the remaining bits are not changed/modified.

Update gate (LSTM)
\Gamma_u = \sigma(W_u [a^{<t-1>}, x^{<t>}] + b_u)
signifies which bits need to be updated.

Relevance gate (GRU only, absent in the LSTM)
\Gamma_r = \sigma(W_r [c^{<t-1>}, x^{<t>}] + b_r)

Forget gate (LSTM only, absent in the GRU)
\Gamma_f = \sigma(W_f [a^{<t-1>}, x^{<t>}] + b_f)
The LSTM learns a forget gate separately: basically, how much we need to forget from c^{<t-1>}, the previous state.

Output gate (LSTM only, absent in the GRU)
\Gamma_o = \sigma(W_o [a^{<t-1>}, x^{<t>}] + b_o)
c^{<t>} gated by \Gamma_o is the final output.

Cell update
LSTM: c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + \Gamma_f * c^{<t-1>}
GRU:  c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + (1 - \Gamma_u) * c^{<t-1>}

Final output
LSTM: a^{<t>} = \Gamma_o * \tanh(c^{<t>})
GRU:  a^{<t>} = c^{<t>}
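A minimal sketch of one LSTM step following the equations above (random, illustrative parameters; n_a is the hidden/cell size):

```python
import numpy as np

rng = np.random.default_rng(0)
n_a, n_x = 100, 10_000

def layer():
    return rng.normal(0, 0.01, (n_a, n_a + n_x)), np.zeros(n_a)

Wc, bc = layer(); Wu, bu = layer(); Wf, bf = layer(); Wo, bo = layer()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(a_prev, c_prev, x_t):
    concat = np.concatenate([a_prev, x_t])
    c_tilde = np.tanh(Wc @ concat + bc)          # candidate value (no relevance gate)
    gamma_u = sigmoid(Wu @ concat + bu)          # update gate
    gamma_f = sigmoid(Wf @ concat + bf)          # forget gate (absent in the GRU)
    gamma_o = sigmoid(Wo @ concat + bo)          # output gate (absent in the GRU)
    c_t = gamma_u * c_tilde + gamma_f * c_prev   # cell update
    a_t = gamma_o * np.tanh(c_t)                 # final output
    return a_t, c_t
```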
PART F: Review