Python/C++ API Parity: TransformerDecoderLayer #42717
Conversation
glaringlee left a comment
@VinodSKumar Thank you so much for making this change!
Can you tell me how you sent the pull request to the pytorch master branch? Your PR did not trigger our CI (continuous integration) tests, so neither you nor I can tell whether your code runs under different environments.
Here is an example PR that triggered our CI tests: check PR 42742 (prepend https://github.com/pytorch/pytorch/pull/). You can see at the bottom that it triggered many CI tests covering different environments, whereas your PR triggered only lint checks and some Jenkins checks, so I am wondering how you created your PR against the pytorch master branch.
Can you please rebase your code? I checked in the encoder layer, so you don't need CMakeLists.txt and enum.h/cpp, I guess.
Force-pushed from a95acb5 to 7587be2
💊 CI failures summary and remediations: as of commit 54a0872 (more details on the Dr. CI page), 💚 looks good so far, there are no failures yet. (This comment was automatically generated by Dr. CI and has been revised 26 times.)
Force-pushed from 7587be2 to c3e004b
@glaringlee Thank you for your feedback and guidance. I updated the CUDA test cases as per the comment and fixed the reset-parameters implementation. I am not sure whether I am missing something when rebasing and pushing the changes to git; the git log output is as below. The CI is still not getting triggered; please help me find out why.
@VinodSKumar Yep, this is strange to me as well. I will figure it out and update here for you later. Thank you!
This looks exactly like my workflow for rebasing/updating PRs.
@VinodSKumar are you by any chance watching your fork of the repo on CircleCI? |
@malfet Could you please point me to a link on how to check it? Maybe there are some logs which can help me resolve it.
@VinodSKumar You basically just need to go to CircleCI.com, log into your account, and see whether you are following your forked pytorch repo. CircleCI's people already checked for you: you are not following anything. We are still looking into why. cc @malfet
@VinodSKumar |
Force-pushed from 0608d1d to 91b0206
@glaringlee There was a bug in a CUDA test case: in one of the test cases the tensor options were missing. I have fixed it and pushed, and the CI now seems to be running. In case we need to take a look at the webhook log, the push was done at the following time. One more point on CI: the third_party/tensorpipe checkout is not successful, due to which some of the checks are failing.
@VinodSKumar |
91b0206 to
2386c89
Compare
@glaringlee @malfet Thank you for your support and guidance; the CI is now successful.
@VinodSKumar Can you rebase your code again, due to some fixes on our end? Then I am going to approve this change.
For the TransformerDecoder work, I am currently thinking about what to support regarding the input. In the Python version, the input for both TransformerEncoder and TransformerDecoder contains a TransformerEncoderLayer and a TransformerDecoderLayer respectively. If we let TransformerEncoderOptions and TransformerDecoderOptions accept an EncoderLayer and DecoderLayer as input, then we need to separate the header files for TransformerEncoder/Decoder and TransformerEncoderLayer/DecoderLayer, since TransformerEncoderOptions/DecoderOptions would need to include the TransformerEncoderLayer/DecoderLayer module's header, and this would cause cyclic inclusion if we put everything together. The other option is that we don't support passing the TransformerEncoderLayer/DecoderLayer module as an input to TransformerEncoder/Decoder; instead, we pass TransformerEncoderLayerOptions/DecoderLayerOptions in, since we just clone the passed-in EncoderLayer/DecoderLayer anyway, and in that case we don't need to split the header files. I am still thinking about this; if you have opinions, we can discuss them here. I will have a draft PR within the next 2 days and will let you know in #37756. Thanks again for your help!
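For illustration only, here is a minimal, self-contained C++ sketch of the two alternatives weighed above. The struct names (DecoderOptionsWithModule, DecoderOptionsWithLayerOptions) and the stand-in layer types are invented for this example, not the actual torch::nn declarations; they just show why alternative A forces a header split while alternative B does not.

```cpp
#include <cstdint>
#include <utility>

// Stand-in types so the sketch compiles without LibTorch; the real classes live
// in torch::nn and carry many more options.
struct TransformerDecoderLayerOptions {
  std::int64_t d_model;
  std::int64_t nhead;
};

struct TransformerDecoderLayer {
  TransformerDecoderLayerOptions options;
};

// Alternative A: the decoder options hold the layer *module* itself. In the real
// codebase the options header would then have to include the layer module's
// header, which forces the encoder/decoder and the layer classes into separate
// files to avoid cyclic inclusion.
struct DecoderOptionsWithModule {
  DecoderOptionsWithModule(TransformerDecoderLayer layer, std::int64_t num_layers)
      : layer(std::move(layer)), num_layers(num_layers) {}
  TransformerDecoderLayer layer;  // would be cloned num_layers times by the decoder
  std::int64_t num_layers;
};

// Alternative B: the decoder options hold only the layer *options*; the decoder
// builds the layers itself, so no header split is needed, but an existing layer
// instance cannot be passed in.
struct DecoderOptionsWithLayerOptions {
  DecoderOptionsWithLayerOptions(TransformerDecoderLayerOptions layer_options,
                                 std::int64_t num_layers)
      : layer_options(layer_options), num_layers(num_layers) {}
  TransformerDecoderLayerOptions layer_options;
  std::int64_t num_layers;
};

int main() {
  DecoderOptionsWithModule a(TransformerDecoderLayer{{512, 8}}, /*num_layers=*/6);
  DecoderOptionsWithLayerOptions b(TransformerDecoderLayerOptions{512, 8}, /*num_layers=*/6);
  return (a.num_layers == b.num_layers) ? 0 : 1;
}
```

The downside of alternative B, raised in the reply below, is that an already-constructed layer cannot be reused; it can only be rebuilt from its options.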
Force-pushed from 2386c89 to 91eefcc
@VinodSKumar Thanks for the rebase. We might need another rebase later; I will post here.
Sure, will rebase once the issue in master is resolved.
@glaringlee Yes, regarding the decoder implementation, we would need to split the transformer files to avoid a cyclic dependency. The downside of using the layer options as parameters is that the encoder/decoder must be created from scratch; in scenarios where users want to reuse an existing encoder/decoder or encoder layer/decoder layer, they would not be able to do so. If this matters in scenarios like transfer learning or continued training, then we need to keep supporting the encoder/decoder layers as parameters. Would it be alright to split the files into transformer_layer, transformer_coder and transformer?
@VinodSKumar Let's split the files. You don't need to split this one; I will split it for you after your code has landed. But for TransformerDecoder, please write both the module and the options into transformerencoder.h; the cpp file can be a single one.
Currently, even if we pass in an encoder/decoder layer, people still cannot use it. The reason is that in the C++ implementation, when you call clone(), what we currently do is create an empty module, e.g. an empty encoder/decoder layer, and re-initialize all its member variables. I am not sure why, since this predates my taking over the C++ frontend codebase; I guess it is due to the nature of C++, where it is not easy to duplicate a module in a generic way, but we have had no complaints about this implementation so far. I talked to the owner of the Python version, and he has never seen an example where an encoder layer's or decoder layer's forward() is called before passing it into a TransformerEncoder/TransformerDecoder. So I think we can just split the file for now and provide two constructors for the options: one taking the encoder-layer/decoder-layer options (and creating a new module), the other taking a reference to an encoder layer/decoder layer. There is not much difference between the two right now.
Your draft is similar to my draft, but for norm we should use AnyModule. I will have my draft ready on Thursday since I have an urgent PR review tomorrow, and will approve this one after the rebase. If you have Slack or email, we can communicate there; otherwise, let's use PR #42886.
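As a rough sketch of the two-constructor idea above, using simplified stand-in types rather than the real torch::nn classes (the member layout and names here are assumptions for illustration only):

```cpp
#include <cstdint>

// Simplified stand-ins, not the actual torch::nn classes.
struct TransformerDecoderLayerOptions {
  std::int64_t d_model;
  std::int64_t nhead;
};

struct TransformerDecoderLayer {
  explicit TransformerDecoderLayer(TransformerDecoderLayerOptions opts) : options(opts) {}
  TransformerDecoderLayerOptions options;
};

struct TransformerDecoderOptions {
  // Constructor 1: build the decoder purely from layer options; the decoder
  // creates a fresh layer (and its clones) from these.
  TransformerDecoderOptions(TransformerDecoderLayerOptions layer_options,
                            std::int64_t num_layers)
      : layer_options(layer_options), num_layers(num_layers) {}

  // Constructor 2: accept an existing layer by reference; since clone() in the
  // C++ frontend re-initializes a module from its options anyway, only the
  // layer's options need to be kept here.
  TransformerDecoderOptions(const TransformerDecoderLayer& layer, std::int64_t num_layers)
      : layer_options(layer.options), num_layers(num_layers) {}

  TransformerDecoderLayerOptions layer_options;
  std::int64_t num_layers;
};

int main() {
  TransformerDecoderLayerOptions layer_opts{/*d_model=*/512, /*nhead=*/8};
  TransformerDecoderLayer layer(layer_opts);

  TransformerDecoderOptions from_options(layer_opts, /*num_layers=*/6);
  TransformerDecoderOptions from_layer(layer, /*num_layers=*/6);
  return (from_options.num_layers == from_layer.num_layers) ? 0 : 1;
}
```

Because clone() rebuilds a module from its options anyway, the constructor taking an existing layer ends up nearly equivalent to the one taking its options, which is why there is not much difference between the two for now.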
glaringlee left a comment
@VinodSKumar see my previous comments, thanks
Force-pushed from 91eefcc to 54a0872
@glaringlee After the rebase, all the checks in CI have passed.
@VinodSKumar |
glaringlee left a comment
LGTM now
facebook-github-bot left a comment
@glaringlee has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@glaringlee merged this pull request in 830423b.
Fixes #37756