LSTM output dimensions #607

@emanjavacas

According to the docs, nn.LSTM outputs:

output : A (seq_len x batch x hidden_size) tensor containing the output features (h_t) from the last layer of the RNN, for each t
h_n : A (num_layers x batch x hidden_size) tensor containing the hidden state for t=seq_len
c_n : A (num_layers x batch x hidden_size) tensor containing the cell state for t=seq_len

However, if the LSTM is initialized with bidirectional=True, what you actually get is:

output : A (seq_len x batch x hidden_size * num_directions) tensor containing the output features (h_t) from the last layer of the RNN, for each t
h_n : A (num_layers * num_directions x batch x hidden_size) tensor containing the hidden state for t=seq_len
c_n : A (num_layers * num_directions x batch x hidden_size) tensor containing the cell state for t=seq_len
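
A quick way to check the shapes listed above. This is only a minimal sketch: the sizes (seq_len=5, batch=3, input_size=10, hidden_size=20, num_layers=2) are arbitrary values picked for illustration, and it assumes the default (seq_len, batch, feature) input layout.

```python
import torch
import torch.nn as nn

# Arbitrary illustrative sizes (assumptions, not taken from the docs).
seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2
num_directions = 2  # bidirectional=True gives two directions

lstm = nn.LSTM(input_size, hidden_size, num_layers, bidirectional=True)
x = torch.randn(seq_len, batch, input_size)  # default layout: (seq_len, batch, input_size)

output, (h_n, c_n) = lstm(x)

print(tuple(output.shape))  # (5, 3, 40): seq_len x batch x hidden_size * num_directions
print(tuple(h_n.shape))     # (4, 3, 20): num_layers * num_directions x batch x hidden_size
print(tuple(c_n.shape))     # (4, 3, 20): same shape as h_n
```

With bidirectional=False the same script should print the shapes currently given in the docs, i.e. (5, 3, 20) and (2, 3, 20).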

It'd be nice to make this explicit in the docs.
Also, I was wondering whether the default concatenation behaviour can be changed (to merging the two directions, for instance)?
These would all be nice features to have.
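
For reference, this is roughly what I mean by merging, done by hand on the concatenated output. It is a sketch only: it takes "merging" to mean an element-wise sum of the two directions (averaging or taking the max would work the same way), it assumes the forward features occupy the first hidden_size entries of the last dimension and the backward features the rest, and the sizes are again arbitrary. As far as I can tell this is not a built-in option of nn.LSTM.

```python
import torch
import torch.nn as nn

# Arbitrary illustrative sizes (assumptions).
seq_len, batch, input_size, hidden_size = 5, 3, 10, 20

lstm = nn.LSTM(input_size, hidden_size, bidirectional=True)
output, _ = lstm(torch.randn(seq_len, batch, input_size))

# Split the last dimension (hidden_size * 2) into its two halves:
# forward-direction features first, backward-direction features second.
forward_out, backward_out = output.chunk(2, dim=-1)

# "Merge" by element-wise sum instead of keeping the concatenation.
merged = forward_out + backward_out

print(tuple(merged.shape))  # (5, 3, 20): seq_len x batch x hidden_size
```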
