
Add key_padding_mask argument to Transformer module #22374

@sebamenabar

Description


🚀 Feature

Add key_padding_mask as an argument to the Transformer/TransformerEncoder/TransformerDecoder forward methods.

Motivation

The current implementation of the Transformer only exposes the attn_mask parameter of the underlying MultiheadAttention module. As far as I can tell, attn_mask applies to the batch as a whole and cannot vary per sample, so it cannot be used to ignore padding positions that differ from sequence to sequence. It would be useful to also expose the key_padding_mask parameter, which marks padded positions per sample, for batches of sequences padded to a common length.
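To illustrate the difference, here is a minimal sketch using nn.MultiheadAttention directly (assuming a recent PyTorch where both masks accept boolean tensors): attn_mask is a single (L, S) mask shared by every sample in the batch, while key_padding_mask has shape (N, S) and can differ per sample.

import torch
import torch.nn as nn

S, N, E = 5, 2, 16                        # sequence length, batch size, embed dim
mha = nn.MultiheadAttention(E, num_heads=4)
x = torch.randn(S, N, E)                  # sequence-first layout (S, N, E)

# One mask for the whole batch, e.g. a causal mask of shape (S, S).
causal = torch.triu(torch.ones(S, S), diagonal=1).bool()

# Per-sample padding mask of shape (N, S): True marks padded key positions.
# Sample 0 has 5 real tokens, sample 1 has only 3.
pad = torch.tensor([[False, False, False, False, False],
                    [False, False, False, True,  True ]])

out, _ = mha(x, x, x, attn_mask=causal, key_padding_mask=pad)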

Pitch

I want the Transformer not to pay attention to padding elements in a sequence.
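For reference, the (N, S) boolean mask that key_padding_mask expects is easy to build from per-sample lengths; the helper name below (lengths_to_padding_mask) is just illustrative.

import torch

def lengths_to_padding_mask(lengths, max_len=None):
    # True marks padded positions that attention should ignore.
    max_len = max_len if max_len is not None else int(lengths.max())
    positions = torch.arange(max_len, device=lengths.device)   # (max_len,)
    return positions.unsqueeze(0) >= lengths.unsqueeze(1)      # (N, max_len) bool

lengths = torch.tensor([5, 3, 4])
mask = lengths_to_padding_mask(lengths)   # shape (3, 5)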

Alternatives

Modify TransformerEncoderLayer

class TransformerEncoderLayer(nn.Module):
    ...
    def forward(self, src, src_attn_mask=None, src_padding_mask=None):
        # Forward the per-sample padding mask to self-attention
        # alongside the shared attention mask.
        src2 = self.self_attn(src, src, src,
                              attn_mask=src_attn_mask,
                              key_padding_mask=src_padding_mask)[0]
    ...

Repeat for all relevant modules (Transformer*)
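For example, TransformerEncoder would only need to thread the new argument down to its layers. The sketch below uses the parameter names from the snippet above; it is not the existing nn.TransformerEncoder API.

import copy
import torch.nn as nn

class TransformerEncoder(nn.Module):
    def __init__(self, encoder_layer, num_layers, norm=None):
        super().__init__()
        # Stack of independent copies of the given encoder layer.
        self.layers = nn.ModuleList(
            [copy.deepcopy(encoder_layer) for _ in range(num_layers)])
        self.norm = norm

    def forward(self, src, src_attn_mask=None, src_padding_mask=None):
        output = src
        for layer in self.layers:
            # Pass both the shared attention mask and the per-sample
            # padding mask to every layer.
            output = layer(output,
                           src_attn_mask=src_attn_mask,
                           src_padding_mask=src_padding_mask)
        if self.norm is not None:
            output = self.norm(output)
        return output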

I know this can be achieved with custom Transformer/Encoder/Decoder modules, but this seems like a common enough need that it could easily be supported out of the box.
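For completeness, this is roughly what the custom-module workaround looks like today: a self-contained encoder layer (the name PaddingAwareEncoderLayer is made up for this sketch) that simply forwards key_padding_mask to MultiheadAttention.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PaddingAwareEncoderLayer(nn.Module):
    def __init__(self, d_model, nhead, dim_feedforward=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
        self.linear1 = nn.Linear(d_model, dim_feedforward)
        self.linear2 = nn.Linear(dim_feedforward, d_model)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, src, src_attn_mask=None, src_padding_mask=None):
        # Self-attention with both masks, then the usual residual,
        # layer norm, and position-wise feed-forward sublayers.
        src2 = self.self_attn(src, src, src,
                              attn_mask=src_attn_mask,
                              key_padding_mask=src_padding_mask)[0]
        src = self.norm1(src + self.dropout(src2))
        src2 = self.linear2(self.dropout(F.relu(self.linear1(src))))
        src = self.norm2(src + self.dropout(src2))
        return src

Such a layer drops into the TransformerEncoder-style stack sketched above, but supporting key_padding_mask natively would avoid this boilerplate.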

Labels

    feature (A request for a proper, new feature)
    module: nn (Related to torch.nn)
    triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
