Conversation

@VinodSKumar VinodSKumar commented Aug 7, 2020

Fixes #37756

@glaringlee glaringlee self-requested a review August 7, 2020 03:36
@ezyang ezyang added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Aug 7, 2020

@glaringlee glaringlee left a comment

@VinodSKumar Thank you so much for making this change!

Can you tell me how you sent the pull request to the pytorch master branch? Your PR did not trigger our CI (continuous integration) tests, so neither you nor I can tell whether your code runs under different environments.

Here is an example PR that triggered our CI tests: check PR 42742 (add https://github.com/pytorch/pytorch/pull/ before it). You can see at the bottom that it triggered many CI tests that exercise the code under different environments, but your PR triggered only lint checks and some Jenkins checks, so I am wondering how you created your PR against the pytorch master branch.

Can you please rebase your code? I checked in the encoder layer, so you don't need the CMakeLists.txt and enum.h/cpp changes, I guess.

@VinodSKumar VinodSKumar force-pushed the CPP_API_TransformerDecoderLayer branch from a95acb5 to 7587be2 on August 9, 2020 04:08

dr-ci bot commented Aug 9, 2020

💊 CI failures summary and remediations

As of commit 54a0872 (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

@VinodSKumar VinodSKumar force-pushed the CPP_API_TransformerDecoderLayer branch from 7587be2 to c3e004b on August 9, 2020 17:52
@VinodSKumar

> @VinodSKumar Thank you so much for making this change!
>
> Can you tell me how you sent the pull request to the pytorch master branch? Your PR did not trigger our CI (continuous integration) tests, so neither you nor I can tell whether your code runs under different environments.
>
> Here is an example PR that triggered our CI tests: check PR 42742 (add https://github.com/pytorch/pytorch/pull/ before it). You can see at the bottom that it triggered many CI tests that exercise the code under different environments, but your PR triggered only lint checks and some Jenkins checks, so I am wondering how you created your PR against the pytorch master branch.
>
> Can you please rebase your code? I checked in the encoder layer, so you don't need the CMakeLists.txt and enum.h/cpp changes, I guess.

@glaringlee Thank you for your feedback and guidance. I have updated the CUDA test cases as per the comments and fixed the reset parameters implementation.
The rebase was done with the following steps:
Step 1: git fetch upstream
Step 2: git rebase upstream/master
Step 3: resolved conflicts (in the transformer files) and addressed the review comments
Step 4: git rebase --continue (due to conflicts)
Step 5: git push origin CPP_API_TransformerDecoderLayer -f

I am not sure if I am missing something in rebasing and pushing the changes; the git log output is below.
(base) v70786@master:/opt/ai/forks/pytorch$ git log --oneline
c3e004b837 (HEAD -> CPP_API_TransformerDecoderLayer, origin/CPP_API_TransformerDecoderLayer) As per the review feedback, rebased and addressed comments
283c148237 Python/C++ API Parity: TransformerDecoderLayer
87970b7 (upstream/master, upstream/fbcode/warm) Adds 'clip' alias for clamp (#42770)
b6810c1 Include/ExcludeDispatchKeySetGuard API (#42658)

The CI is still not getting triggered. Please help me check the reason why it is not being triggered.

@VinodSKumar VinodSKumar requested a review from glaringlee August 9, 2020 18:17

glaringlee commented Aug 10, 2020

@VinodSKumar Yep, this is strange to me as well. I will figure it out, will update here for you later. Thank you!


malfet commented Aug 10, 2020

> The rebase was done with the following steps:
> Step 1: git fetch upstream
> Step 2: git rebase upstream/master
> Step 3: resolved conflicts (in the transformer files) and addressed the review comments
> Step 4: git rebase --continue (due to conflicts)
> Step 5: git push origin CPP_API_TransformerDecoderLayer -f

This looks exactly like my workflow for rebasing/updating PRs.


malfet commented Aug 10, 2020

@VinodSKumar are you by any chance watching your fork of the repo on CircleCI?

@VinodSKumar

> @VinodSKumar are you by any chance watching your fork of the repo on CircleCI?

@malfet Could you please point me to a link on how to check this? Maybe there are some logs which can help me resolve it.

@glaringlee

>> @VinodSKumar are you by any chance watching your fork of the repo on CircleCI?
>
> @malfet Could you please point me to a link on how to check this? Maybe there are some logs which can help me resolve it.

@VinodSKumar You basically just need to go to circleci.com, log into your account, and see whether you are following your forked pytorch repo. The CircleCI folks already checked for you: you are not following anything. We are still looking into why. cc @malfet


glaringlee commented Aug 10, 2020

@VinodSKumar
I did a test, and it seems I can trigger the CI tests for your PR.
Can you re-push your code and tell me exactly when you pushed it, so we can take a look at the webhook log and find out why you cannot trigger our CI tests? Thanks a lot.

@VinodSKumar VinodSKumar force-pushed the CPP_API_TransformerDecoderLayer branch from 0608d1d to 91b0206 on August 11, 2020 03:24
@VinodSKumar

@glaringlee There was a bug in the CUDA test cases: in one of the test cases the tensor options were missing. I have fixed it and pushed.
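
[Editor's illustration] A minimal sketch of the kind of bug described here (a hypothetical reduction, not the actual test code): a tensor created without TensorOptions defaults to CPU/float32, so a CUDA test silently exercises the wrong device.

```cpp
#include <torch/torch.h>

void cuda_test_sketch() {
  torch::Device device(torch::kCUDA);
  auto options = torch::TensorOptions().device(device).dtype(torch::kFloat32);

  // Correct: the tensor is created on the CUDA device under test.
  auto on_gpu = torch::rand({2, 3, 4}, options);

  // Buggy: the options were missed, so this tensor lands on the CPU
  // and the test no longer exercises the CUDA path.
  auto on_cpu = torch::rand({2, 3, 4});
}
```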

Now the CI seems to be running. In case we need the time to look at the webhook log, the push was done at the following times:
Bangalore, India Tue, 11 Aug 2020 at 08:55 IST
San Francisco, USA Mon, 10 Aug 2020 at 20:25 PDT
New York, USA Mon, 10 Aug 2020 at 23:25 EDT
Corresponding UTC Tue, 11 Aug 2020 at 03:25

One more point on CI: the third_party/tensorpipe checkout is not successful, due to which some of the checks fail.

@glaringlee

> @glaringlee There was a bug in the CUDA test cases: in one of the test cases the tensor options were missing. I have fixed it and pushed.
>
> Now the CI seems to be running. In case we need the time to look at the webhook log, the push was done at the following times:
> Bangalore, India Tue, 11 Aug 2020 at 08:55 IST
> San Francisco, USA Mon, 10 Aug 2020 at 20:25 PDT
> New York, USA Mon, 10 Aug 2020 at 23:25 EDT
> Corresponding UTC Tue, 11 Aug 2020 at 03:25
>
> One more point on CI: the third_party/tensorpipe checkout is not successful, due to which some of the checks fail.

@VinodSKumar
There was a bug with tensorpipe; if you rebase your code, it will be fixed automatically.

@VinodSKumar VinodSKumar force-pushed the CPP_API_TransformerDecoderLayer branch from 91b0206 to 2386c89 on August 11, 2020 08:43
@VinodSKumar

@glaringlee @malfet Thank you for your support and guidance; the CI is now successful.


glaringlee commented Aug 11, 2020

@VinodSKumar Can you rebase your code again, due to some fixes on our end?

Then I am going to approve this change.

For the TransformerDecoder work, I am currently thinking about what to support regarding the input:

In the Python version, the input for both TransformerEncoder and TransformerDecoder contains a TransformerEncoderLayer and a TransformerDecoderLayer. If we let TransformerEncoderOptions and TransformerDecoderOptions accept an EncoderLayer and a DecoderLayer as input, then we need to separate the header files between TransformerEncoder/Decoder and TransformerEncoderLayer/DecoderLayer, since TransformerEncoderOptions/DecoderOptions would need to include the TransformerEncoderLayer/DecoderLayer module's header, and this would cause a cyclic inclusion if we put everything together.

The other option is that we don't support passing the TransformerEncoderLayer/DecoderLayer module as an input to TransformerEncoder/Decoder; instead, we pass TransformerEncoderLayerOptions/DecoderLayerOptions in. Since we just clone the passed-in EncoderLayer and DecoderLayer anyway, in this case we don't need to split the header file. The sketch below contrasts the two shapes.
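
[Editor's illustration] For illustration only, a minimal sketch of the two candidate option shapes (the struct names and fields are assumptions made for this discussion, not the final API):

```cpp
#include <torch/torch.h>

// Shape 1 (assumed): the options carry the layer module itself. The options
// header must then include the layer module's header, which is what creates
// the cyclic-inclusion problem if everything lives in one file.
struct DecoderOptionsWithModule {
  torch::nn::TransformerDecoderLayer layer{nullptr}; // cloned by the decoder
  int64_t num_layers{1};
};

// Shape 2 (assumed): the options carry only the layer options; the decoder
// constructs its own layers, so no header split is needed.
struct DecoderOptionsWithLayerOptions {
  torch::nn::TransformerDecoderLayerOptions layer_options{/*d_model=*/512,
                                                          /*nhead=*/8};
  int64_t num_layers{1};
};
```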

I am still thinking about that; if you have opinions, we can discuss here.

I will have a draft PR within the next 2 days; I will let you know in #37756.

Thanks again for your help!

@VinodSKumar VinodSKumar force-pushed the CPP_API_TransformerDecoderLayer branch from 2386c89 to 91eefcc on August 11, 2020 17:49
@glaringlee

@VinodSKumar Thanks for the rebase.
There is currently some buggy code on master which breaks our CI system. See here:
https://ezyang.github.io/pytorch-ci-hud/build2/pytorch-master

We might need a rebase later; I will post here.

@VinodSKumar

> @VinodSKumar Thanks for the rebase.
> There is currently some buggy code on master which breaks our CI system. See here:
> https://ezyang.github.io/pytorch-ci-hud/build2/pytorch-master
>
> We might need a rebase later; I will post here.

Sure, I will rebase once the issue in master is resolved.

@VinodSKumar

> For the TransformerDecoder work, I am currently thinking about what to support regarding the input:
>
> In the Python version, the input for both TransformerEncoder and TransformerDecoder contains a TransformerEncoderLayer and a TransformerDecoderLayer. If we let TransformerEncoderOptions and TransformerDecoderOptions accept an EncoderLayer and a DecoderLayer as input, then we need to separate the header files between TransformerEncoder/Decoder and TransformerEncoderLayer/DecoderLayer, since TransformerEncoderOptions/DecoderOptions would need to include the TransformerEncoderLayer/DecoderLayer module's header, and this would cause a cyclic inclusion if we put everything together.
>
> The other option is that we don't support passing the TransformerEncoderLayer/DecoderLayer module as an input to TransformerEncoder/Decoder; instead, we pass TransformerEncoderLayerOptions/DecoderLayerOptions in. Since we just clone the passed-in EncoderLayer and DecoderLayer anyway, in this case we don't need to split the header file.
>
> I am still thinking about that; if you have opinions, we can discuss here.

@glaringlee Yes, regarding the decoder implementation, we would need to split the transformer files to avoid the cyclic dependency.
Today I was implementing the 2nd option of using TransformerDecoderLayerOptions as the parameter, and wanted to get feedback in the review. Using the decoder layer options as the parameter avoids splitting the files, and the same pattern could be used in the Transformer module to configure a custom encoder/decoder. The draft PR 42886 is based on using layer options as the parameter.

The downside of using the layer options as the parameter is that the encoder/decoder must be created from scratch. In scenarios where users want to reuse an existing encoder/decoder or encoder layer/decoder layer, they would not be able to do so. If this is applicable in scenarios like transfer learning or continued training, then we need to honor support for encoder/decoder layers as parameters. Would it be alright to split the files into transformer_layer, transformer_coder and transformer?
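
[Editor's illustration] To make the trade-off concrete, a hypothetical usage sketch (the decoder constructor shape is an assumption for illustration, not the merged API):

```cpp
#include <torch/torch.h>

int main() {
  // Options-as-parameter: the decoder always builds its layers from scratch.
  auto layer_opts =
      torch::nn::TransformerDecoderLayerOptions(/*d_model=*/512, /*nhead=*/8);
  // Hypothetical: auto decoder = TransformerDecoder(layer_opts, /*num_layers=*/6);

  // The downside described above: a layer that already holds trained weights
  // cannot be handed to the decoder, because only its options are passed in.
  torch::nn::TransformerDecoderLayer trained_layer(layer_opts);
  // ... load trained weights into trained_layer ...
  // There is no constructor that accepts trained_layer itself in this design.
  return 0;
}
```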

@glaringlee

@VinodSKumar
The master branch is good now; can you please rebase? Thanks.

@glaringlee

glaringlee commented Aug 12, 2020

>> For the TransformerDecoder work, I am currently thinking about what to support regarding the input:
>> In the Python version, the input for both TransformerEncoder and TransformerDecoder contains a TransformerEncoderLayer and a TransformerDecoderLayer. If we let TransformerEncoderOptions and TransformerDecoderOptions accept an EncoderLayer and a DecoderLayer as input, then we need to separate the header files between TransformerEncoder/Decoder and TransformerEncoderLayer/DecoderLayer, since TransformerEncoderOptions/DecoderOptions would need to include the TransformerEncoderLayer/DecoderLayer module's header, and this would cause a cyclic inclusion if we put everything together.
>> The other option is that we don't support passing the TransformerEncoderLayer/DecoderLayer module as an input to TransformerEncoder/Decoder; instead, we pass TransformerEncoderLayerOptions/DecoderLayerOptions in. Since we just clone the passed-in EncoderLayer and DecoderLayer anyway, in this case we don't need to split the header file.
>> I am still thinking about that; if you have opinions, we can discuss here.
>
> @glaringlee Yes, regarding the decoder implementation, we would need to split the transformer files to avoid the cyclic dependency.
> Today I was implementing the 2nd option of using TransformerDecoderLayerOptions as the parameter, and wanted to get feedback in the review. Using the decoder layer options as the parameter avoids splitting the files, and the same pattern could be used in the Transformer module to configure a custom encoder/decoder. The draft PR 42886 is based on using layer options as the parameter.
>
> The downside of using the layer options as the parameter is that the encoder/decoder must be created from scratch. In scenarios where users want to reuse an existing encoder/decoder or encoder layer/decoder layer, they would not be able to do so. If this is applicable in scenarios like transfer learning or continued training, then we need to honor support for encoder/decoder layers as parameters. Would it be alright to split the files into transformer_layer, transformer_coder and transformer?

Let's split the files. You don't need to split this one; I will split it for you after your code lands. But for TransformerDecoder, please write both the module and the options into transformerencoder.h; the cpp file can be one.

Currently, even if we pass in an encoder/decoder layer, people still cannot use it. The reason is that in the C++ implementation, when you call clone(), what we currently do is create an empty module, e.g. create an empty encoder/decoder layer and re-initialize all its member variables. I am not sure why, since this predates my taking over the C++ frontend codebase. I guess it is due to the nature of C++: it is not easy to duplicate a module in a generic way. But we have not had any complaints about this implementation so far.

I talked to the Python version owner; he never saw any example where an encoder layer's or a decoder layer's forward() is called before passing it into a TransformerEncoder/TransformerDecoder. So I think we can just split the file for now and provide two constructors for the options: one takes an encoder layer/decoder layer option (and creates a new module), the other takes a ref of an encoder layer/decoder layer. There is not much difference right now; a sketch of the two constructors follows below.

Your draft is similar to my draft, but for norm we should use AnyModule. I will have my draft ready on Thursday, since I have an urgent PR review tomorrow.

Will approve this one after the rebase. And if you have Slack or email, we can communicate there; otherwise, let's use PR [42886].
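
[Editor's illustration] For concreteness, a minimal sketch of the two-constructor options idea (the class name and members here are assumptions made for this discussion, not the actual implementation):

```cpp
#include <torch/torch.h>

#include <utility>

// Hypothetical options type with the two constructors discussed above.
struct SketchTransformerDecoderOptions {
  // 1) From layer options: the decoder builds fresh layers itself.
  SketchTransformerDecoderOptions(
      torch::nn::TransformerDecoderLayerOptions layer_options,
      int64_t num_layers)
      : layer_options(std::move(layer_options)), num_layers(num_layers) {}

  // 2) From an existing layer: the decoder clones it num_layers times.
  // Note that clone() currently re-initializes member variables, so the
  // passed-in layer's trained parameters are not carried over.
  SketchTransformerDecoderOptions(
      const torch::nn::TransformerDecoderLayer& layer, int64_t num_layers)
      : layer_options(layer->options), num_layers(num_layers) {}

  torch::nn::TransformerDecoderLayerOptions layer_options;
  int64_t num_layers;
  torch::nn::AnyModule norm; // per the discussion, norm is held as an AnyModule
};
```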

@glaringlee glaringlee left a comment

@VinodSKumar see my previous comments, thanks

@VinodSKumar VinodSKumar force-pushed the CPP_API_TransformerDecoderLayer branch from 91eefcc to 54a0872 on August 12, 2020 03:30
@VinodSKumar

@glaringlee After the rebase, all the checks in CI have passed.
I am not on the PyTorch Slack; please let me know if I can apply. My email id is [email protected]

@VinodSKumar VinodSKumar requested a review from glaringlee August 12, 2020 11:12
@glaringlee

@VinodSKumar
Thanks a lot. I will approve this. I'll shoot you an email to discuss Slack, thanks.

@glaringlee glaringlee left a comment

LGTM now

@facebook-github-bot facebook-github-bot left a comment

@glaringlee has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot

@glaringlee merged this pull request in 830423b.


Labels

Merged, open source, triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Development

Successfully merging this pull request may close these issues: C++ API for Transformer model in libtorch 1.5.0

7 participants