
Conversation

@glaringlee (Contributor) commented Aug 18, 2020

Stack from ghstack:

Differential Revision: D23182770

[ghstack-poisoned]
glaringlee pushed a commit that referenced this pull request Aug 18, 2020
ghstack-source-id: 23b7358
Pull Request resolved: #43187
@glaringlee glaringlee requested review from zhangguanheng66 and removed request for ebetica and goldsborough August 18, 2020 03:21
@glaringlee glaringlee changed the title C++ APIs TransformerEncoder [WIP] C++ APIs TransformerEncoder Aug 18, 2020
dr-ci bot commented Aug 18, 2020

💊 CI failures summary and remediations

As of commit e70c50b:


💚 💚 Looks good so far! There are no failures yet. 💚 💚



glaringlee pushed a commit that referenced this pull request Aug 18, 2020
ghstack-source-id: 519e439
Pull Request resolved: #43187
@glaringlee glaringlee changed the title [WIP] C++ APIs TransformerEncoder C++ APIs TransformerEncoder Aug 19, 2020
glaringlee pushed a commit that referenced this pull request Aug 19, 2020
ghstack-source-id: dc1c507
Pull Request resolved: #43187
transformer_decoder_layer_test_helper(true);
}

void transformer_decoder_layer_test_helper_gelu(bool is_cuda) {
@zhangguanheng66 (Contributor) Aug 21, 2020

Just wondering why we need this helper specifically for GELU? Any idea for ReLU?

@glaringlee (Contributor, Author)

This is what we have in test_nn.py.
I merged them into one test function for the encoder layer, but the decoder layer was contributed by an external user who strictly followed test_nn.py. I will refactor this a bit later.
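
A minimal sketch of the merged-helper approach mentioned above, parameterized by the activation instead of split into _gelu/_relu variants; the helper name, shapes, and option values are illustrative, not the actual test code:

#include <torch/torch.h>

// Hypothetical merged helper: one test body, with the activation
// (e.g. torch::kReLU or torch::kGELU) passed in via the layer options.
template <typename Activation>
void transformer_encoder_layer_test_helper(bool is_cuda, Activation activation) {
  torch::Device device(is_cuda ? torch::kCUDA : torch::kCPU);
  torch::nn::TransformerEncoderLayer layer(
      torch::nn::TransformerEncoderLayerOptions(/*d_model=*/4, /*nhead=*/2)
          .activation(activation));
  layer->to(device);
  auto src = torch::rand({2, 3, 4}, device);  // (seq_len, batch, d_model)
  auto out = layer(src);
  // ... assertions against reference outputs would follow here.
}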

: d_model_(d_model), nhead_(nhead){}


TransformerEncoderOptions::TransformerEncoderOptions(
@zhangguanheng66 (Contributor)

Should we publish the transformer cpp later? The transformer model in general comes with a decoder module, which I guess will be published later.

@glaringlee (Contributor, Author)

@zhangguanheng66 The current situation is that we have two people working on the transformer. The decoder will be in another PR. I'm pushing this first so it's easy for the decoder developer to sync, and we will have the Transformer top layer next week.

@glaringlee glaringlee requested a review from zou3519 August 24, 2020 15:34
@zou3519 (Contributor) left a comment

(not a full review) asking some questions


Tensor forward(
const Tensor& src,
const Tensor& src_mask = {},
@zou3519 (Contributor)

The second argument is called "mask" in Python:

def forward(self, src: Tensor, mask: Optional[Tensor] = None, src_key_padding_mask: Optional[Tensor] = None) -> Tensor:
Does it matter that this one is different?

I do prefer src_mask because it makes it clear what is getting masked...

@glaringlee (Contributor, Author)

@zhangguanheng66 I made the name src_mask for the same reason as @zou3519. Agree?

@zhangguanheng66 (Contributor)

I think it makes sense to call them src_mask and tgt_mask, along with src_key_padding_mask and tgt_key_padding_mask (consistent with the Python ones).
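
A brief usage sketch of the agreed naming, assuming the options and forward signature from this PR; the mask values are placeholders:

#include <torch/torch.h>

int main() {
  torch::nn::TransformerEncoderLayer layer(
      torch::nn::TransformerEncoderLayerOptions(/*d_model=*/8, /*nhead=*/2));

  auto src = torch::rand({5, 2, 8});     // (seq_len, batch, d_model)
  auto src_mask = torch::zeros({5, 5});  // attention mask over src positions
  auto src_key_padding_mask = torch::zeros({2, 5}, torch::kBool);

  // With src_mask rather than a bare `mask`, the call site makes it
  // clear that it is src being masked.
  auto out = layer(src, src_mask, src_key_padding_mask);
}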

Comment on lines +255 to +256
// a. No way to know whether module in AnyModule has api to reset_parameters, so replace instead
// b. Allow user to add/delete normalization module when reset parameters
@zou3519 (Contributor)

When does reset_parameters() get called? Is it a user API or something that is used in the framework?

@glaringlee (Contributor, Author) Aug 25, 2020

@zou3519 As I mentioned before, this is a convenience function to reset parameters within the module. It is not necessary to call it, and I believe it won't be called in most cases.

@zou3519 (Contributor)

Hmm, so the thing I am confused about here is:

  • There's no reset_parameters() or _reset_parameters on TransformerEncoderLayer in Python. Some of our Python nn.Modules do have a reset_parameters() that re-initializes parameters.

One potential concern: let's say you have an optimizer that takes in a list of parameters (disclaimer: I have no clue how optimizers in C++ work). Is the following possible?

  • I create a TransformerEncoderLayer in C++.
  • I pass the parameters of the TransformerEncoderLayer to an optimizer (in C++).
  • Then I call reset_parameters() on the TransformerEncoderLayer.
  • The optimizer holds onto references of the original parameters (and not the new ones!).

@glaringlee (Contributor, Author)

This is not possible: the underlying C++ module is a shared pointer, and the optimizer doesn't clone the module but just shallow-copies it, so this case should be okay.

But reset_parameters() itself is tricky, I admit that. Even in the Python implementation, reset_parameters() has different usage across modules; it's not always there and doesn't always get called. There was a PR trying to standardize it by making reset_parameters() virtual and moving it to the base class, but that change doesn't make sense since half of the modules don't have it and virtual dispatch adds extra cost. So currently, reset_parameters() is just an optional function.
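
A minimal sketch of the optimizer scenario from the thread above, using torch::nn::Linear (whose C++ reset_parameters() re-initializes its tensors in place) as a stand-in for the encoder layer; the key point is that nothing breaks as long as parameters are reset in place rather than replaced:

#include <torch/torch.h>

int main() {
  torch::nn::Linear linear(4, 4);

  // The optimizer captures the parameter tensors up front.
  torch::optim::SGD optim(linear->parameters(), /*lr=*/0.1);

  // reset_parameters() re-initializes the existing tensors in place,
  // so the optimizer still points at the live parameters afterwards.
  linear->reset_parameters();
}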

glaringlee pushed a commit that referenced this pull request Aug 25, 2020
ghstack-source-id: d8ce09d
Pull Request resolved: #43187
@glaringlee glaringlee requested a review from zou3519 August 25, 2020 20:04
Comment on lines +24 to +25
// This constructor will create a new TransformerEncoderLayer obj based on passed in encoder_layer_options.
TransformerEncoderOptions(const TransformerEncoderLayerOptions& encoder_layer_options, int64_t num_layers);
@zou3519 (Contributor)

This also keeps a shallow copy of the data in encoder_layer, right?

@glaringlee (Contributor, Author) Aug 26, 2020

@zou3519
No, TransformerEncoder will use encoder_layer_options as input and allocate a new TransformerEncoderLayer.
This is the major difference between these two constructors.

@zou3519 (Contributor)

Oh I see now, thanks for the clarification
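
A brief usage sketch contrasting the two constructors discussed above; the option values are illustrative:

#include <torch/torch.h>

int main() {
  using namespace torch::nn;

  // From layer options: a new TransformerEncoderLayer is allocated
  // internally from these options.
  TransformerEncoder enc_a(TransformerEncoderOptions(
      TransformerEncoderLayerOptions(/*d_model=*/512, /*nhead=*/8),
      /*num_layers=*/6));

  // From an existing layer module (a shared-pointer holder), passed in
  // directly rather than rebuilt from options.
  TransformerEncoderLayer layer(TransformerEncoderLayerOptions(512, 8));
  TransformerEncoder enc_b(TransformerEncoderOptions(layer, 6));
}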

glaringlee pushed a commit that referenced this pull request Aug 26, 2020
ghstack-source-id: 23d2e54
Pull Request resolved: #43187
glaringlee pushed a commit that referenced this pull request Aug 26, 2020
ghstack-source-id: a2597a6
Pull Request resolved: #43187
codecov bot commented Aug 27, 2020

Codecov Report

Merging #43187 into gh/glaringlee/25/base will not change coverage.
The diff coverage is n/a.


@@                  Coverage Diff                   @@
##           gh/glaringlee/25/base   #43187   +/-   ##
======================================================
  Coverage                  69.40%   69.40%           
======================================================
  Files                        378      378           
  Lines                      46606    46606           
======================================================
  Hits                       32346    32346           
  Misses                     14260    14260           


@facebook-github-bot (Contributor)

@glaringlee merged this pull request in 48e08f8.
