Handling sequences with PyTorch
We've learned to handle tabular and image data. Let's now discuss sequential data.
Sequential data
Sequential data is ordered in time or space, where the order of the data points is
important and can contain temporal or spatial dependencies between them. Time
series, that is, data recorded over time such as stock prices, weather, or daily sales, is sequential.
So is text, in which the order of words in a sentence determines its meaning. Another
example is audio waves, where the order of data points is crucial to the sound
reproduced when the audio file is played.
Electricity consumption prediction
In this chapter, we will tackle the problem of predicting electricity consumption based on
past patterns. We will use a subset of the electricity consumption dataset from the UC
Irvine Machine Learning Repository. It contains electricity consumption in kilowatts, or
kW, for a certain user recorded every 15 minutes for four years.
Trindade, Artur. (2015). ElectricityLoadDiagrams20112014. UCI Machine Learning Repository. [Link]
Train-test split
In many machine learning applications, one randomly splits the data into training and
testing sets. However, with sequential data, there are better approaches. If we split the
data randomly, we risk creating a look-ahead bias, where the model has information
about the future when making forecasts. In practice, we won't have information about
the future when making predictions, so our test set should reflect this reality. To avoid
the look-ahead bias, we should split the data by time. We will train on the first three
years of data, and test on the fourth year.
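As a minimal sketch, assuming the data has been loaded into a pandas DataFrame with a datetime column (the file name, column name, and split date below are placeholders for illustration), a time-based split could look like this:

```python
import pandas as pd

# Assumed file and column names for illustration
df = pd.read_csv("electricity_consumption.csv", parse_dates=["timestamp"])

# Split by time rather than randomly: first three years for training,
# the fourth year for testing (the exact split date is an assumption)
split_date = pd.Timestamp("2014-01-01")
train_df = df[df["timestamp"] < split_date]
test_df = df[df["timestamp"] >= split_date]
```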
Creating sequences
To feed the training data to the model, we need to chunk it first to create sequences that
the model can use as training examples. First, we need to select the sequence length,
which is the number of data points in one training example. Let's make each forecast
based on the previous 24 hours. Because the data is recorded at 15-minute intervals, we need 24 times 4, which is 96 data points. In each example, the data point right after the input
sequence will be the target to predict.
Creating sequences in Python
Let's implement a Python function to create sequences. It takes the DataFrame and the
sequence length as inputs. We start with initializing two empty lists, xs for inputs and ys
for targets. Next, we iterate over the DataFrame. The loop only goes up to "len(df) -
seq_length", ensuring that for every iteration, there are always seq_length data points
available in the DataFrame for creating the sequence and a subsequent data point to
serve as the target. For each considered data point, we define inputs x as the electricity
consumption values starting from this point plus the next sequence length points, and
the target y as the subsequent electricity consumption value. The 1 passed to the iloc
method stands for the second DataFrame column, which stores the electricity
consumption data. Finally, we append the inputs and the target to pre-initialized lists,
and after the loop, return them as NumPy arrays.
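A sketch of such a function, following the steps just described (it assumes the consumption values sit in the second column of the DataFrame):

```python
import numpy as np

def create_sequences(df, seq_length):
    xs, ys = [], []
    # Stop early enough that a full sequence plus one target always fits
    for i in range(len(df) - seq_length):
        # Inputs: seq_length consecutive consumption values (second column)
        x = df.iloc[i:i + seq_length, 1]
        # Target: the consumption value right after the input sequence
        y = df.iloc[i + seq_length, 1]
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)
```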
TensorDataset
Let's use our function to create sequences from the training data. This gives us almost
35 thousand training examples. To convert them to a torch Dataset, we will use the
TensorDataset class. We pass it two arguments, the inputs and the targets. Each argument is the corresponding NumPy array converted to a tensor with torch.from_numpy and cast to float. The TensorDataset behaves just like any other torch Dataset and can be
passed to a DataLoader in the same way.
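Putting this together, assuming train_df is the training portion from the time-based split (the batch size of 32 matches the shapes we will see later; shuffling the training examples is an optional choice, since each example is self-contained):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

seq_length = 24 * 4  # 96 data points = 24 hours at 15-minute intervals
X_train, y_train = create_sequences(train_df, seq_length)

dataset_train = TensorDataset(
    torch.from_numpy(X_train).float(),
    torch.from_numpy(y_train).float(),
)
dataloader_train = DataLoader(dataset_train, batch_size=32, shuffle=True)
```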
Applicability to other sequential data
Everything we have learned here can also be applied to other sequential data. For
example, Large Language Models are trained to predict the next word in a sentence, a
problem similar to predicting the next amount of electricity used. For speech recognition,
which means transcribing an audio recording of someone speaking to text, one would
typically use the same sequence-processing model architectures we will learn about
soon.
Recurrent Neural Networks
Recurrent neuron
So far, we built feed-forward neural networks where data is passed in one direction:
from inputs, through all the layers, to the outputs. Recurrent neural networks, or RNNs,
are similar, but also have connections pointing back. At each time step, a recurrent
neuron receives some input x, multiplied by the weights and passed through an
activation. Out come two values: the main output y, and the hidden state, h, that is fed
back to the same neuron. In PyTorch, recurrent neurons are available through the nn.RNN layer.
Unrolling recurrent neuron through time
We can represent the same neuron once per time step, a visualization known as
unrolling a neuron through time. At a given time step, the neuron represented as a gray
circle receives input data x0 and the previous hidden state h0, and produces output y0 and a new hidden state h1.
At the next time step, it takes the next value x1 as input and its last hidden state, h1.
And so it continues until the end of the input sequence. Since at the first time step there
is no previous hidden state, h0 is typically set to zero. Notice that the output at each
time step depends on all the previous inputs. This allows recurrent networks to maintain
memory through time, which allows them to handle sequential data well.
Deep RNNs
We can also stack multiple layers of recurrent cells on top of each other to get a deep
recurrent neural network. In this case, each input will pass through multiple neurons one
after another, just like in the dense and convolutional networks we have discussed before.
Sequence-to-sequence architecture
Depending on the lengths of input and output sequences, we distinguish four different
architecture types. Let's look at them one by one. In a sequence-to-sequence
architecture, we pass the sequence as input and make use of the output produced at
every time step. For example, a real-time speech recognition model could receive audio
at each time step and output the corresponding text.
Sequence-to-vector architecture
In a sequence-to-vector architecture, we pass a sequence as input, but ignore all the
outputs but the last one. In other words, we let the model process the entire input
sequence before it produces the output. We can use this architecture to classify text as
one of multiple topics. It's a good idea to let the model "read" the whole text before it
decides what it's about. We will also use the sequence-to-vector architecture for
electricity consumption prediction.
Vector-to-sequence architecture
One can also build a vector-to-sequence architecture, where we pass a single input, replace all other inputs with zeros, and make use of the outputs from every time step.
This architecture can be used for text generation: given a single vector representing a
specific topic, style, or sentiment, a model can generate a sequence of words or
sentences.
Encoder-decoder architecture
Finally, in an encoder-decoder architecture, we pass the input sequences, and only then
start using the output sequence. This is different from sequence-to-sequence in which
outputs are generated while the inputs are still being received. A canonical use case is
machine translation. One cannot translate word by word; rather the entire input must be
processed before output generation can start.
RNN in PyTorch
Let's build a sequence-to-vector RNN in PyTorch. We define a model class with the init method as usual. Inside it, we assign an nn.RNN layer to self.rnn, passing it an input size of 1 since we only have one feature, the electricity consumption, an arbitrarily chosen hidden size of 32, and 2 layers, and we set batch_first to True since our data will have the batch size as its first dimension. We also define a linear layer mapping from the hidden size of 32 to the output size of 1. In the forward method, we initialize the first hidden state to zeros using torch.zeros and assign it to h0. Its shape is the number of layers (2), by the batch size, which we extract from x as x.size(0), by the hidden state size (32). Next, we pass the input x and the first hidden state through the RNN layer. Then, we select only the last output by indexing the middle dimension with -1, pass the result through the linear layer, and return it.
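A sketch of this model class (the class name Net is an arbitrary choice):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.RNN(
            input_size=1,      # one feature: electricity consumption
            hidden_size=32,    # arbitrarily chosen hidden size
            num_layers=2,
            batch_first=True,  # input shape: (batch, seq_length, features)
        )
        self.fc = nn.Linear(32, 1)

    def forward(self, x):
        # First hidden state: (num_layers, batch_size, hidden_size)
        h0 = torch.zeros(2, x.size(0), 32)
        out, _ = self.rnn(x, h0)
        # Keep only the output of the last time step
        return self.fc(out[:, -1, :])
```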
LSTM and GRU cells
Short-term memory problem
Because RNN neurons pass the hidden state from one time step to the next, they can
be said to maintain some sort of memory. That's why they are often called RNN memory
cells, or just cells for short. However, this memory is very short-term: by the time a long
sentence is processed, the hidden state doesn't have much information about its
beginning. Imagine trying to translate a long sentence between languages; by the time we have finished reading it, we no longer remember how it started. To solve this short-term memory
problem, two more powerful types of cells have been proposed: the Long Short-Term
Memory or LSTM cell and the Gated Recurrent Unit or GRU cell.
RNN cell
Before we look at LSTM and GRU cells, let's visualize the plain RNN cell. At each time
step t, it takes two inputs, the current input data x and the previous hidden state h. It
multiplies these inputs with the weights, applies activation, and outputs two things: the
current outputs y and the next hidden state.
LSTM cell
The LSTM cell has three inputs and three outputs. Next to the input data x, there are two
hidden states: h represents the short-term memory and c the long-term memory. At
each time step, h and x are passed through some linear layers called gate controllers
which determine what is important enough to keep in the long-term memory. The gate
controllers first erase some parts of the long-term memory in the forget gate. Then, they
analyze x and h and store their most important parts in the long-term memory in the
input gate. This long-term memory, c, is one of the outputs of the cell. At the same time,
another gate called the output gate determines what the current output y should be. The
short-term memory output h is the same as y.
LSTM in PyTorch
Building an LSTM network in PyTorch is very similar to the plain RNN we have already
seen. In the init method, we only need to use the nn.LSTM layer instead of nn.RNN.
The arguments that the layer takes as inputs are the same. In the forward method, we
add the long-term hidden state c and initialize both h and c with zeros. Then, we pass h
and c as a tuple to the LSTM layer. Finally, we take the last output, pass it through the
linear layer and return just like before.
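A corresponding sketch using nn.LSTM, with the same assumed class structure as before:

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=1, hidden_size=32, num_layers=2, batch_first=True
        )
        self.fc = nn.Linear(32, 1)

    def forward(self, x):
        # Both short-term (h0) and long-term (c0) memories start at zero
        h0 = torch.zeros(2, x.size(0), 32)
        c0 = torch.zeros(2, x.size(0), 32)
        out, _ = self.lstm(x, (h0, c0))
        return self.fc(out[:, -1, :])
```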
GRU cell
The GRU cell is a simplified version of the LSTM cell. It merges the long-term and short-
term memories into a single hidden state. It also doesn't use an output gate: the entire
hidden state is returned at each time step.
GRU in PyTorch
Building a GRU network in PyTorch is almost identical to the plain RNN. All we need to
do is replace nn.RNN with nn.GRU when defining the layer in the init method, and then
call the new gru layer in the forward method.
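Again a sketch, mirroring the previous models:

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(
            input_size=1, hidden_size=32, num_layers=2, batch_first=True
        )
        self.fc = nn.Linear(32, 1)

    def forward(self, x):
        # Single hidden state; no separate long-term memory
        h0 = torch.zeros(2, x.size(0), 32)
        out, _ = self.gru(x, h0)
        return self.fc(out[:, -1, :])
```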
Should I use RNN, LSTM, or GRU?
So, which type of recurrent network should we use: the plain RNN, LSTM, or GRU?
There is no single answer, but consider the following. Although plain RNNs have
revolutionized modeling of sequential data and are important to understand, they are
not used much these days because of the short-term memory problem. Our choice will
likely be between LSTM and GRU. GRU's advantage is that it's less complex than
LSTM, which means less computation. Other than that, the relative performance of
GRU and LSTM varies per use case, so it's often a good idea to try both and compare
the results. We will learn how to evaluate these models soon.
Training and evaluating RNNs
Mean Squared Error Loss
Up to now, we have been solving classification tasks using cross-entropy losses.
Forecasting of electricity consumption is a regression task, for which we will use a
different loss function: Mean Squared Error. Here is how it's calculated. The difference
between the predicted value and the target is the error. We then square it, and finally
average over the batch of examples. Squaring the errors plays two roles. First, it
ensures positive and negative errors don't cancel out, and second, it penalizes large
errors more than small ones. Mean Squared Error loss is available in PyTorch as
nn.MSELoss.
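As a small illustration of the loss with made-up numbers:

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()
preds = torch.tensor([2.0, 3.0])
targets = torch.tensor([1.0, 5.0])
# ((2 - 1)^2 + (3 - 5)^2) / 2 = (1 + 4) / 2 = 2.5
print(criterion(preds, targets))  # tensor(2.5000)
```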
Expanding tensors
Before we take a look at the model training and evaluation, we need to discuss two
useful concepts: expanding and squeezing tensors. Let's tackle expanding first. All
recurrent layers, RNNs, LSTMs, and GRUs, expect input in the shape: batch size,
sequence length, number of features. But as we loop over the DataLoader, we can see
that we got the shape batch size of 32 by the sequence length of 96. Since we are
dealing with only one feature, the electricity consumption, the last dimension is dropped.
We can add it, or expand the tensor, by calling view on the sequence and passing the
desired shape.
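For example, inside the loop over dataloader_train (shapes as described above; the hard-coded 32 and 96 assume full batches):

```python
for seqs, labels in dataloader_train:
    print(seqs.shape)  # torch.Size([32, 96])
    # Add the feature dimension expected by recurrent layers
    seqs = seqs.view(32, 96, 1)
    print(seqs.shape)  # torch.Size([32, 96, 1])
    break
```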
Squeezing tensors
Conversely, as we evaluate the model, we will need to revert the expansion we have
applied to the model inputs which can be achieved through squeezing. Let's see why
that's the case and how to do it. As we iterate through test data batches, we get labels
in shape batch size. Model outputs, however, are of shape batch size by 1, our number
of features. We will be passing the labels and the model outputs to the loss function,
and each PyTorch loss requires its inputs to be of the same shape. To achieve that, we
can apply the squeeze method to the model outputs. This will reshape them to match
the labels' shape.
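Continuing the sketch, with net standing for one of the networks defined earlier:

```python
outputs = net(seqs)
print(outputs.shape)            # torch.Size([32, 1])
print(labels.shape)             # torch.Size([32])
# Drop the trailing dimension so outputs match the labels' shape
print(outputs.squeeze().shape)  # torch.Size([32])
```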
Training loop
The training loop is similar to what we have already seen. We instantiate the model and
define the loss and the optimizer. Then, we iterate over epochs and training data
batches. For each batch, we reshape the input sequence as we have just discussed.
The rest of the training loop is the same as before.
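A sketch of the training loop under the same assumptions as above (the optimizer, learning rate, and number of epochs are illustrative choices):

```python
import torch.nn as nn
import torch.optim as optim

net = Net()
criterion = nn.MSELoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

for epoch in range(3):
    for seqs, labels in dataloader_train:
        # Reshape to (batch_size, seq_length, num_features)
        seqs = seqs.view(32, 96, 1)
        optimizer.zero_grad()
        # Squeeze the outputs so they match the labels' shape
        outputs = net(seqs).squeeze()
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
```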
Evaluation loop
Let's look at the evaluation loop. We start by setting up the Mean Squared Error metric
from torchmetrics. Then, we iterate through test data batches without computing the
gradients. Next, we reshape the model inputs just like during training, pass them to the
model, and squeeze the outputs. Finally, we update the metric. After the loop, we can
print the final metric value by calling compute on it, just like we did before.
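A sketch of the evaluation loop, assuming a dataloader_test built from the test-year sequences in the same way as dataloader_train:

```python
import torch
import torchmetrics

mse = torchmetrics.MeanSquaredError()

net.eval()
with torch.no_grad():
    for seqs, labels in dataloader_test:
        seqs = seqs.view(32, 96, 1)
        outputs = net(seqs).squeeze()
        mse.update(outputs, labels)

print(f"Test MSE: {mse.compute()}")
```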
LSTM vs. GRU
Here is our LSTM's test Mean Squared Error again. Let's see how it compares to a
GRU network. It seems that for our electricity consumption dataset, with the task
defined as predicting the next value based on the previous 24 hours of data, both
models perform similarly, with the GRU even achieving a slightly lower error. In this case,
GRU might be preferred as it achieves the same or better results while requiring less
processing power.