RNN
A Recurrent Neural Network (RNN) is one type of ANN.
Applications:
• Speech recognition
• Language translation
• Stock market prediction
• Named entity recognition
Different types of RNN
• One-to-one:
• This is also called a plain neural network. It deals with a fixed
size of input and a fixed size of output, where they are
independent of previous information/output.
• Example: Image classification.
• One-to-Many:
• It takes a fixed size of information as input and gives a
sequence of data as output.
• Example: Image captioning takes an image as input and
outputs a sentence of words.
• Many-to-One:
• It takes a sequence of information as input and outputs a fixed
size of output.
• Example: Sentiment analysis, where a sentence is classified as
expressing positive or negative sentiment.
• Many-to-Many:
• It takes a sequence of information as input and recurrently
processes it to output a sequence of data.
• Example: Machine translation, where the RNN reads a sentence
in English and then outputs the sentence in French.
• Bidirectional Many-to-Many:
• Synced sequence input and output. Notice
that in every case there are no pre-specified
constraints on the sequence lengths, because
the recurrent transformation is fixed
and can be applied as many times as we like.
• Example: Video classification, where we wish
to label every frame of the video.
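• As a quick illustration of these input/output patterns, here is a minimal sketch with
made-up toy shapes: a Keras recurrent layer can emit either one output per sequence
(many-to-one) or one output per time step (many-to-many), depending on return_sequences.
import numpy as np
from tensorflow.keras.layers import SimpleRNN
# A batch of 2 toy sequences, each with 10 time steps of 8 features (illustrative shapes).
x = np.random.rand(2, 10, 8).astype("float32")
many_to_one = SimpleRNN(16)(x)                           # shape (2, 16): only the final state
many_to_many = SimpleRNN(16, return_sequences=True)(x)   # shape (2, 10, 16): one output per step
print(many_to_one.shape, many_to_many.shape)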
How RNN works
Consider an unfolded RNN
• The formula for the current state can be
written as
ht = f(ht-1, xt)
Here, ht is the new state, ht-1 is the previous state and xt is the current input.
We now have a state of the previous input instead of the input itself, because
the input neuron would have applied the transformations on our previous
input. So each successive input is called a time step.
• In this case, we have four inputs to be given to the
network. During the recurrence, the same
function and the same weights are applied to the
network at each time step.
• Taking the simplest form of a recurrent neural
network, let’s say that the activation function is
tanh, the weight at the recurrent neuron is Whh,
and the weight at the input neuron is Wxh. We
can then write the equation for the state at time t
as
ht = tanh(Whh·ht-1 + Wxh·xt)
• The Recurrent neuron, in this case, is just
considering the immediately previous state.
For longer sequences, the equation can
involve multiple such states. Once the final
state is calculated we can go on to produce
the output.
Now, once the current state is calculated, we can
calculate the output state as
yt = Why·ht
where Why is the weight at the output neuron.
• Let me summarize the steps in a recurrent neuron
• A single time step of the input is supplied to the network i.e. xt is supplied to the
network
• We then calculate its current state using a combination of the current input and the
previous state i.e. we calculate ht
• The current ht becomes ht-1 for the next time step
• We can go through as many time steps as the problem demands and combine the
information from all the previous states
• Once all the time steps are completed the final current state is used to calculate the
output yt
• The output is then compared to the actual output and the error is generated
• The error is then backpropagated through the network to update the weights (we shall go
into the details of backpropagation in further sections), and the network is trained.
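• The steps above can be written down in a few lines of NumPy. This is only a sketch with
made-up sizes and the weight names Wxh, Whh, and Why from the equations above; it shows
the forward pass and the error, and omits backpropagation.
import numpy as np
rng = np.random.default_rng(7)
input_size, hidden_size, output_size = 3, 5, 2
Wxh = rng.standard_normal((hidden_size, input_size)) * 0.1   # weight at the input neuron
Whh = rng.standard_normal((hidden_size, hidden_size)) * 0.1  # weight at the recurrent neuron (shared across steps)
Why = rng.standard_normal((output_size, hidden_size)) * 0.1  # weight at the output neuron
xs = rng.standard_normal((4, input_size))  # four inputs, one per time step
ht = np.zeros(hidden_size)                 # initial state
for xt in xs:
    ht = np.tanh(Whh @ ht + Wxh @ xt)      # ht = tanh(Whh·ht-1 + Wxh·xt); current ht becomes ht-1 next step
yt = Why @ ht                              # output from the final state: yt = Why·ht
target = np.array([1.0, 0.0])              # made-up actual output
error = 0.5 * np.sum((yt - target) ** 2)   # error that would be backpropagated through time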
• Long Short-Term Memory(LSTM):
• LSTM is an improved version of the regular RNN which was
designed to make it easy to capture long-term dependencies
in sequence data. A regular RNN functions in such a way that
the hidden state activation is influenced by the local
activations nearest to it, which corresponds to a “short-
term memory”, while the network weights are influenced by
the computations that take place over entire long sequences,
which corresponds to “long-term memory”. Hence the RNN
was redesigned so that it has an activation state that can also
act as weights and preserve information over long distances,
hence the name “Long Short-Term Memory”.
• LSTMs are explicitly designed to avoid the
long-term dependency problem.
Remembering information for long periods is
practically their default behavior
• Architecture
[Figure: LSTM cell, with the previous cell state Ct-1 and previous hidden state ht-1
flowing through the forget, input, and output gates.]
• The core idea is the cell state Ct: it is changed slowly, with only minor linear
interactions, so it is very easy for information to flow along it unchanged.
• Forget gate: a sigmoid gate that determines how much of the previous cell state
is let through.
• Input gate: decides what components are to be updated and what information is
to be added to the cell state; the candidate C't provides the change contents.
• Updating the cell state: combine the retained part of Ct-1 with the new candidate.
• Output gate: controls what information goes into the output, i.e. decides what
part of the cell state to output.
• Why sigmoid or tanh? Sigmoid outputs values between 0 and 1, so it acts as a
gating switch. The vanishing gradient problem is already handled in the LSTM;
could ReLU replace tanh?
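• A minimal NumPy sketch of one LSTM step following the gates above. The weight names
(Wf, Wi, Wc, Wo) and the sizes are illustrative assumptions; biases are omitted for brevity,
and real implementations fuse the matrices and handle batching.
import numpy as np
rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))
# each gate sees the concatenation [ht-1, xt]
Wf = rng.standard_normal((hidden_size, hidden_size + input_size)) * 0.1  # forget gate weights
Wi = rng.standard_normal((hidden_size, hidden_size + input_size)) * 0.1  # input gate weights
Wc = rng.standard_normal((hidden_size, hidden_size + input_size)) * 0.1  # candidate weights
Wo = rng.standard_normal((hidden_size, hidden_size + input_size)) * 0.1  # output gate weights
def lstm_step(xt, h_prev, c_prev):
    z = np.concatenate([h_prev, xt])
    ft = sigmoid(Wf @ z)            # forget gate: how much of Ct-1 to keep
    it = sigmoid(Wi @ z)            # input gate: which components to update
    c_hat = np.tanh(Wc @ z)         # candidate C't: the change contents
    ct = ft * c_prev + it * c_hat   # updating the cell state
    ot = sigmoid(Wo @ z)            # output gate: what part of the state to output
    ht = ot * np.tanh(ct)
    return ht, ct
h, c = np.zeros(hidden_size), np.zeros(hidden_size)
h, c = lstm_step(rng.standard_normal(input_size), h, c)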
RNN vs LSTM
Implementation
• Let’s start by importing the classes and functions required for this model and initializing the
random number generator to a constant value to ensure you can easily reproduce the results.
import tensorflow as tf
from tensorflow.keras.datasets import imdb
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Embedding
from tensorflow.keras.preprocessing import sequence
# fix random seed for reproducibility
tf.random.set_seed(7)
• You need to load the IMDB dataset. You are constraining the dataset to the top 5,000 words.
You will also split the dataset into train (50%) and test (50%) sets
# load the dataset but only keep the top n words, zero the rest
top_words = 5000
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)
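• As a quick check on what was loaded (a sketch; the printed values are indicative), each
review comes back as a variable-length list of word indices below top_words, and the
labels are 0 (negative) or 1 (positive), with 25,000 reviews in each split.
import numpy as np
print(len(X_train), len(X_test))   # 25000 25000
print(X_train[0][:10])             # first ten word indices of the first review
print(np.unique(y_train))          # [0 1]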
Next, you need to truncate and pad the input sequences, so they are all the same length for
modeling. The model will learn that the zero values carry no information. The sequences are
not the same length in terms of content, but same-length vectors are required to perform the
computation in Keras
# truncate and pad input sequences
max_review_length = 500
X_train = sequence.pad_sequences(X_train, maxlen=max_review_length)
X_test = sequence.pad_sequences(X_test, maxlen=max_review_length)
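• To see what pad_sequences does, here is a tiny illustrative example with made-up values:
by default Keras pads with zeros and truncates at the front of each sequence, so every row
ends up with the same length.
demo = sequence.pad_sequences([[1, 2, 3], [4, 5, 6, 7, 8, 9]], maxlen=5)
print(demo)
# [[0 0 1 2 3]
#  [5 6 7 8 9]]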
• You can now define, compile and fit your LSTM model.
• The first layer is the Embedding layer that uses 32-length vectors to represent each word.
The next layer is the LSTM layer with 100 memory units (LSTM cells). Finally, because
this is a classification problem, you will use a Dense output layer with a single neuron
and a sigmoid activation function to make 0 or 1 predictions for the two classes (good
and bad) in the problem.
• Because it is a binary classification problem, log loss is used as the loss function
(binary_crossentropy in Keras). The efficient ADAM optimization algorithm is used. The
model is fit for only three epochs because it quickly overfits the problem. A large batch
size of 64 reviews is used to space out weight updates.
# create the model
embedding_vector_length = 32
model = Sequential()
model.add(Embedding(top_words, embedding_vector_length, input_length=max_review_length))
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=3, batch_size=64)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))