
Conversation

@zhangguanheng66
Contributor

Summary:
Import MultiheadAttention into the core PyTorch framework.
Users can now import MultiheadAttention directly from torch.nn.
See "Attention Is All You Need" for more details on the MultiheadAttention function.

Differential Revision: D14577966
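
For context, a minimal usage sketch of the module this PR adds (the dimensions below are arbitrary example values; inputs follow the seq-len-first layout used by the module):

```python
import torch
import torch.nn as nn

# Arbitrary example sizes; embed_dim must be divisible by num_heads.
embed_dim, num_heads = 256, 8
seq_len, batch_size = 10, 4

mha = nn.MultiheadAttention(embed_dim, num_heads, dropout=0.1)

# Inputs are (seq_len, batch, embed_dim); self-attention reuses the same
# tensor for query, key, and value.
x = torch.rand(seq_len, batch_size, embed_dim)
attn_output, attn_weights = mha(x, x, x)

print(attn_output.shape)   # torch.Size([10, 4, 256])
print(attn_weights.shape)  # torch.Size([4, 10, 10]), averaged over heads
```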

Contributor

@soumith soumith left a comment


Looks like a WIP PR. Next time, prefix the title with [WIP] so that reviewers don't end up reviewing prematurely :)

@zhangguanheng66 zhangguanheng66 changed the title Import MultiheadAttention to PyTorch [WIP] Import MultiheadAttention to PyTorch Mar 22, 2019
@zhangguanheng66
Contributor Author

Thanks @soumith . Will accommodate your review soon.

@cpuhrsch
Contributor

@zhangguanheng66 - just because it deserves explicit mention: since this is a straight-up port from fairseq, before making any major changes to the code, make sure you write a ton of tests so we don't get lost.

@cpuhrsch cpuhrsch self-requested a review March 25, 2019 21:26
@zhangguanheng66
Contributor Author

zhangguanheng66 commented Mar 27, 2019

@cpuhrsch @soumith Thanks for the feedback. I updated the code accordingly. Functions that are not tested have been removed, and a unit test has been added in test_nn. As an additional check (see D14577966 for more details), torch.nn.MultiheadAttention was swapped in for fairseq.MultiheadAttention and the corresponding unit test in pytorch_translate still passes.
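
For illustration, a hypothetical shape check in the style of such a unit test (a sketch only, not the actual test added in test_nn):

```python
import unittest
import torch
import torch.nn as nn

class MultiheadAttentionShapeTest(unittest.TestCase):
    # Hypothetical shape check only; the real test exercises masks, bias_kv, etc.
    def test_output_shapes(self):
        embed_dim, num_heads = 64, 4
        tgt_len, src_len, batch = 5, 7, 3
        mha = nn.MultiheadAttention(embed_dim, num_heads)
        q = torch.rand(tgt_len, batch, embed_dim)
        k = torch.rand(src_len, batch, embed_dim)
        v = torch.rand(src_len, batch, embed_dim)
        out, weights = mha(q, k, v)
        self.assertEqual(out.shape, (tgt_len, batch, embed_dim))
        self.assertEqual(weights.shape, (batch, tgt_len, src_len))

if __name__ == "__main__":
    unittest.main()
```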

Contributor

@cpuhrsch cpuhrsch left a comment


I'm approving this under the assumption that the most recent comments will also be resolved.

Pinging @soumith in case something is still missing.

@soumith
Contributor

soumith commented Mar 28, 2019

If you don't mind, before this lands I'd like to page some folks in the NLP community to be assured that they don't need any more features from this.

cc: @srush @kyunghyuncho @myleott @glample

Contributor

@soumith soumith left a comment


The documentation for forward is still missing, and forward takes a lot of options. Please flesh it out.
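
For reference, a rough sketch of those forward options in use (a sketch only; mask dtypes and conventions have shifted a bit across PyTorch versions — here attn_mask is an additive float mask and key_padding_mask marks padded keys with True):

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 64, 4
tgt_len, src_len, batch = 5, 7, 3
mha = nn.MultiheadAttention(embed_dim, num_heads)

q = torch.rand(tgt_len, batch, embed_dim)
k = torch.rand(src_len, batch, embed_dim)
v = torch.rand(src_len, batch, embed_dim)

# attn_mask: (tgt_len, src_len), added to the attention scores;
# -inf blocks a query position from attending to a key position.
attn_mask = torch.zeros(tgt_len, src_len)
attn_mask[:, -1] = float("-inf")

# key_padding_mask: (batch, src_len); True marks padded key positions to ignore.
key_padding_mask = torch.zeros(batch, src_len, dtype=torch.bool)
key_padding_mask[:, -2:] = True

out, weights = mha(q, k, v,
                   key_padding_mask=key_padding_mask,
                   need_weights=True,
                   attn_mask=attn_mask)
print(out.shape)      # torch.Size([5, 3, 64])
print(weights.shape)  # torch.Size([3, 5, 7])
```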

@zhangguanheng66
Contributor Author

@soumith Sure. Let me know if any new features are necessary. I will work on the documentation for the forward function at the same time.

@srush @kyunghyuncho @myleott @glample More unit tests on your end are welcome (I have two unit tests on my side).

@kyunghyuncho

@mansimov @jasonleeinf care to take a quick look at this PR?


@myleott myleott left a comment


I didn't review the logic carefully, but let's test this in fairseq. Ideally we'll replace the multihead attention implementation in fairseq with this one.

@zhangguanheng66
Contributor Author

@soumith @myleott Please let me know if you have additional suggestions before landing this PR.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@zhangguanheng66 is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Summary:
Import MultiheadAttention into the core PyTorch framework.
Users can now import MultiheadAttention directly from torch.nn.
See "Attention Is All You Need" for more details on the MultiheadAttention function.
Pull Request resolved: pytorch#18334

Differential Revision: D14577966

fbshipit-source-id: b18d945ea461c07948d2f33f5b497ca51591d0ce
@ezyang ezyang changed the title [WIP] Import MultiheadAttention to PyTorch Import MultiheadAttention to PyTorch Apr 11, 2019
@vadimkantorov
Contributor

vadimkantorov commented Apr 11, 2019

The docstring is there, but it seems to be missing the rst autoclass entry @soumith

@facebook-github-bot
Contributor

@zhangguanheng66 merged this pull request in 4b20fc8.

@zhangguanheng66 zhangguanheng66 deleted the export-D14577966 branch April 16, 2019 02:00
@tshrjn

tshrjn commented May 1, 2019

Missing documentation for attn_mask, add_zero_attn and add_bias_kv.

@zhangguanheng66
Contributor Author

zhangguanheng66 commented May 1, 2019

> Missing documentation for attn_mask, add_zero_attn and add_bias_kv.

We had a PR to update attn_mask; see #20071.
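
For readers landing here, a rough sketch of where those arguments enter: add_bias_kv and add_zero_attn are constructor options, while attn_mask is passed to forward (the dimensions below are arbitrary example values):

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 64, 4
# add_bias_kv appends a learned bias vector to the key and value sequences;
# add_zero_attn appends an all-zero attention "sink" position as well.
mha = nn.MultiheadAttention(embed_dim, num_heads,
                            add_bias_kv=True, add_zero_attn=True)

q = torch.rand(5, 2, embed_dim)
kv = torch.rand(7, 2, embed_dim)

# attn_mask restricts which key positions each query position may attend to;
# an all-zero float mask leaves attention unrestricted.
attn_mask = torch.zeros(5, 7)
out, _ = mha(q, kv, kv, attn_mask=attn_mask)
print(out.shape)  # torch.Size([5, 2, 64])
```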

zhangguanheng66 added a commit to zhangguanheng66/pytorch that referenced this pull request May 6, 2019
Summary:
Import MultiheadAttention into the core PyTorch framework.
Users can now import MultiheadAttention directly from torch.nn.
See "Attention Is All You Need" for more details on the MultiheadAttention function.
Pull Request resolved: pytorch#18334

Differential Revision: D14577966

Pulled By: zhangguanheng66

fbshipit-source-id: 756c0deff623f3780651d9f9a70ce84516c806d3
@deepwilson

What is the intuition behind adding the "MultiheadAttention" block under activation.py? @zhangguanheng66
