nn.Transformer #20170
Conversation
Nice progress. But I have several comments, and I also don't see the test and doc entries.
We haven't decided on a unit test for the transformer model yet. I will add the doc entries.
Add transformer docstring.
Looking much better! Just as @ssnl mentioned, please add unit tests.
Remove the numpy dependency in Transformer.py. Add custom_encoder and custom_decoder to the Transformer class.
Add a unit test for Transformer -- test_transformer_args_check.
Add a unit test for Transformer -- TestNN.test_transformerencoderlayer.
Add fixed numerical results.
Update TestNN.test_transformer_args_check to include src_mask, tgt_mask, memory_mask arguments
… test_transformerencoderlayer.
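For context, those mask arguments have fixed expected shapes, which is what the args check exercises. A minimal sketch using the merged API (the sizes here are arbitrary, chosen purely for illustration):

```python
import torch
import torch.nn as nn

model = nn.Transformer(d_model=16, nhead=2)
S, T = 5, 4  # source and target sequence lengths (arbitrary)

# Causal masks: float tensors with 0.0 on allowed positions and -inf on masked ones.
src_mask = model.generate_square_subsequent_mask(S)  # shape (S, S)
tgt_mask = model.generate_square_subsequent_mask(T)  # shape (T, T)
memory_mask = torch.zeros(T, S)                      # shape (T, S); all positions attendable
```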
Pinging @srush @kyunghyuncho @myleott @glample for additional feature requests / review.
Pinging @stephenroller @douwekiela for additional feature requests / review.
We also have a PR applying torch.nn.Transformer to the word language model problem (pytorch/examples#555). @myleott
torch/nn/modules/transformer.py
Outdated
```python
class Transformer(Module):
    r"""A transformer model. User is able to modify the attributes as needed. The architecture
    is based on the paper "Attention Is All You Need".
```
Add full citation
added.
facebook-github-bot left a comment
@zhangguanheng66 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Fixes #10459
```python
def __init__(self, encoder_layer, num_layers, norm=None):
    super(TransformerEncoder, self).__init__()
    self.layers = _get_clones(encoder_layer, num_layers)
```
Is it intended behavior to have weight sharing? In a standard transformer (like the one from Attention Is All You Need), I don't believe the weights are shared. I understand there has been follow-up work showing weight sharing can be beneficial in some cases, but if this is intended behavior, it might be useful to note this difference from the paper's implementation in the docs, after the Vaswani et al. citation, and cite the follow-up work.
As a broader point:
This seems to get at the fact that a Transformer is a rather high-level component, without real consensus yet on its architecture, and with many decisions likely domain-specific (weight sharing? beam search decoder? convolutions between layers? outputting the intermediate attentions? hierarchy / grouping components of the task? sparsity? ... and many more ideas that are being rapidly published).
I don't mean to start needless debates, or to imply at all that this Transformer code is not useful, but I will add that, from my limited perspective, Transformers might better belong in the contrib module, or in the docs as an example that people can modify to meet their needs. The torch.nn module seems (at least currently) to be for more foundational components that can be composed into larger models, not full model architectures themselves.
There has already been fairly extensive discussion on this in #10459, and it seemed like the consensus there was to focus on things like MultiHeadedAttention or PositionalEncoding for core, and to keep full architectures separate in the codebase.
As an addendum to that second part: thinking about this more, I could see how a TransformerEncoder (or something like a StackedSelfAttention or RecurrentSelfAttention) could be considered a primitive component that lends itself to eventually being low-level optimized and composed into novel things (though not really much more primitive than something like a ResNet block). However, where it starts seeming particularly high-level and full-architecture-y is the inclusion of a prepackaged encoder-decoder Transformer for seq2seq, which, without a lot of additional components, seems unlikely to meet all needs or to be easily adapted/composed.
I don't know, and I don't really have any reason to hold strong opinions here. Feel free to dismiss this second part without real justification...
@DNGros Thanks for the comments. To your first question: no, it's not supposed to share the weights between layers. Actually, if you take a quick look at the _get_clones function, it calls copy.deepcopy.
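For reference, _get_clones in torch/nn/modules/transformer.py is essentially the following (paraphrased; the deepcopy is what prevents weight sharing):

```python
import copy
from torch.nn import ModuleList

def _get_clones(module, N):
    # copy.deepcopy gives every clone its own parameters,
    # so the N stacked layers do not share weights.
    return ModuleList([copy.deepcopy(module) for _ in range(N)])
```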
For your second comment, we do realize there are many ongoing discussions about transformer models and their variants in different domains. We want to provide a baseline for the research community and for startups, so people don't have to code from scratch. This module could help people try preliminary ideas quickly. We expect advanced users to develop their own specific transformer models, and they could use this module as a reference or benchmark.
As you suggest, we try to make the model highly "modularized". People can use nn.Transformer, nn.TransformerEncoder, or even nn.TransformerEncoderLayer, as needed. There is actually a word language model example (pytorch/examples#555) where we use nn.TransformerEncoder as the seq2seq model.
In the future, if we see a variant of the transformer model widely requested by the community, we will implement it in our framework as well. A baseline model should benefit more users as we optimize the module's performance over time.
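As an illustration of that modularity, here is a minimal sketch composing an encoder-only stack from the individual pieces (dimensions are arbitrary, matching the word language model example only in spirit):

```python
import torch
import torch.nn as nn

# Compose an encoder-only stack from the individual modules.
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)

src = torch.randn(10, 32, 512)  # (seq_len, batch, d_model)
memory = encoder(src)           # output keeps the input shape
```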
@zhangguanheng66 Ok, sorry. I missed that.
Thanks for clarifying!
cpuhrsch left a comment
Merge at will
facebook-github-bot left a comment
@zhangguanheng66 is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@zhangguanheng66 merged this pull request in 83cec5f.
Creating a PR for comments. The model is still WIP, but I want to get some feedback before moving too far. The transformer model depends on several modules, like MultiheadAttention (landed).
Transformer is implemented based on the paper (https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf). Users have the flexibility to build a transformer with self-defined and/or built-in components (i.e., encoder, decoder, encoder_layer, decoder_layer). Users could use the Transformer class to build a standard transformer model and modify sub-layers as needed.
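A minimal sketch of plugging in a self-defined component via custom_encoder (the identity encoder below is purely illustrative; any module with a compatible forward signature should work):

```python
import torch
import torch.nn as nn

class MyEncoder(nn.Module):
    # Illustrative stand-in: a real encoder would transform src.
    # It must accept the keyword arguments Transformer.forward passes through.
    def forward(self, src, mask=None, src_key_padding_mask=None):
        return src

model = nn.Transformer(d_model=512, nhead=8, custom_encoder=MyEncoder())
out = model(torch.randn(10, 2, 512),   # src: (S, N, E)
            torch.randn(7, 2, 512))    # tgt: (T, N, E)
print(out.shape)  # torch.Size([7, 2, 512])
```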
Add a few unit tests for the transformer module, as follows (a shape-check sketch appears after the list):
TestNN.test_Transformer_cell
TestNN.test_transformerencoderlayer
TestNN.test_transformerdecoderlayer
TestNN.test_transformer_args_check
TestScript.test_scriptmodule_transformer_cuda
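A minimal shape check in the spirit of those tests (a hypothetical sketch, not the actual test code; sizes are arbitrary):

```python
import torch
import torch.nn as nn

def test_transformer_forward_shape():
    d_model, nhead = 32, 4
    model = nn.Transformer(d_model=d_model, nhead=nhead,
                           num_encoder_layers=2, num_decoder_layers=2)
    S, T, N = 10, 7, 3  # src length, tgt length, batch size
    src = torch.randn(S, N, d_model)  # inputs are (seq_len, batch, d_model)
    tgt = torch.randn(T, N, d_model)
    out = model(src, tgt)
    assert out.shape == (T, N, d_model)  # output follows the target shape
```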
There is also a demonstration example applying the transformer module to the word language model problem: pytorch/examples#555.