Description
As I understand it, PackedSequence enables variable-length mini-batches without having to worry about gradients flowing through the zero-padded frames.
I am curious whether this also holds for bidirectional RNNs (bi-RNNs).
If the backward direction reverses the sequences, I think the reversal should exclude the zero-padding.
If the zero-padding is included in the reversal, then in the output of the bi-RNN the padding will be concatenated with the real frames of the shorter sequences in a mini-batch.
This behaviour differs from what happens when a bi-RNN is run with a mini-batch of size 1.
Example of the wrong case (where ● is a non-padding frame and ○ is zero-padding):
(before reversal)
●●●●○○○○
●●●●●●○○
●●●●●●●○
(after reversal)
○○○○●●●●
○○●●●●●●
○●●●●●●●
Example of the correct case (only the ● frames are reversed; the padding stays at the end):
(before reversal)
●●●●○○○○
●●●●●●○○
●●●●●●●○
(after reversal)
●●●●○○○○
●●●●●●○○
●●●●●●●○
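To make the two diagrams concrete, here is a minimal sketch in plain Python of the reversal I would expect. It does not use PyTorch and says nothing about how PyTorch does it internally; the helper name `reverse_valid_frames` and the use of `0` as the padding value are my own assumptions for illustration.

```python
from typing import List, Sequence

PAD = 0  # assumed padding value, for illustration only


def reverse_valid_frames(
    batch: Sequence[Sequence[int]], lengths: Sequence[int], pad: int = PAD
) -> List[List[int]]:
    """Reverse only the first `n` frames of each row; keep padding at the end."""
    reversed_batch = []
    for seq, n in zip(batch, lengths):
        valid = list(seq[:n])               # the real (non-padding) frames
        padding = [pad] * (len(seq) - n)    # padding is re-appended untouched
        reversed_batch.append(valid[::-1] + padding)
    return reversed_batch


# Same shape as the diagrams: lengths 4, 6, 7 padded to 8 frames.
batch = [
    [1, 2, 3, 4, 0, 0, 0, 0],
    [1, 2, 3, 4, 5, 6, 0, 0],
    [1, 2, 3, 4, 5, 6, 7, 0],
]
lengths = [4, 6, 7]

# "Wrong case": reversing whole rows moves the padding to the front.
naive = [list(reversed(seq)) for seq in batch]

# "Correct case": length-aware reversal keeps the padding at the end.
result = reverse_valid_frames(batch, lengths)
```

With the naive reversal the first row starts with four padding values, so the backward direction would consume padding before any real frame. With the length-aware version the first row becomes `[4, 3, 2, 1, 0, 0, 0, 0]`.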
I don't really know how the bi-RNNs are implemented, but I could not find the part that uses the length information to handle this problem.
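One way to check this empirically, without reading the implementation, is to compare the packed mini-batch output against running each sequence alone with a batch size of 1. The sketch below assumes an `nn.LSTM` with arbitrary sizes (3 input features, 5 hidden units) and random inputs; these values are placeholders, not anything from a real model.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Arbitrary sizes chosen for illustration.
rnn = nn.LSTM(input_size=3, hidden_size=5, batch_first=True, bidirectional=True)

lengths = [7, 6, 4]  # sorted longest first, as pack_padded_sequence expects by default
max_len = 8

# Zero-padded batch: only the first `n` frames of each row hold real data.
padded = torch.zeros(len(lengths), max_len, 3)
for i, n in enumerate(lengths):
    padded[i, :n] = torch.randn(n, 3)

packed = pack_padded_sequence(padded, lengths, batch_first=True)

with torch.no_grad():
    packed_out, _ = rnn(packed)
    batch_out, _ = pad_packed_sequence(
        packed_out, batch_first=True, total_length=max_len
    )

    # Run each sequence alone (mini-batch of size 1, no padding) and compare.
    matches = []
    for i, n in enumerate(lengths):
        single_out, _ = rnn(padded[i : i + 1, :n])
        matches.append(torch.allclose(batch_out[i, :n], single_out[0], atol=1e-5))

print(matches)
```

If the backward direction reverses only the valid frames of each sequence, every entry of `matches` should be `True`. A `False` for the shorter sequences would point to the wrong case in the diagrams above.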
Thank you.