Conversation

myleott commented Aug 30, 2018

Summary: Normalizing by the world size before the reduction is less likely to cause overflow in FP16 training.

Differential Revision: D9594708
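
A minimal sketch of the idea (a hypothetical helper, not the PR's actual diff; assumes a torch.distributed process group is already initialized): each rank divides its gradients by the world size before the all-reduce, so the collective produces the mean directly and the intermediate values stay small.

```python
import torch.distributed as dist

def allreduce_mean_grads(params, world_size):
    # Dividing BEFORE the reduction keeps every term small; summing raw
    # FP16 gradients across many workers first can overflow to inf.
    for p in params:
        if p.grad is not None:
            p.grad.div_(world_size)   # pre-normalize on each rank
            dist.all_reduce(p.grad)   # default op is SUM, so this yields the mean
```

As apaszke notes in the review below, for efficiency the division should be applied to the coalesced buffers rather than parameter by parameter.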

apaszke (Contributor) left a comment

Normalization should happen on the coalesced buffers instead of individual parameters
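
A hedged sketch of that suggestion using PyTorch's internal flatten helpers (an illustration of the reviewer's point, not the PR's actual code):

```python
import torch.distributed as dist
from torch._utils import _flatten_dense_tensors, _unflatten_dense_tensors

def allreduce_coalesced_grads(grads, world_size):
    # Coalesce many small gradients into one contiguous buffer, then do a
    # single div_ and a single all_reduce instead of one pair per parameter.
    flat = _flatten_dense_tensors(grads)
    flat.div_(world_size)             # normalize the coalesced buffer once
    dist.all_reduce(flat)
    for grad, synced in zip(grads, _unflatten_dense_tensors(flat, grads)):
        grad.copy_(synced)            # copy the reduced values back
```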

facebook-github-bot (Contributor) left a comment

myleott has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

soumith (Contributor) commented Sep 2, 2018

Would be super dope if you added a test for this, so that we don't regress on it in the future.

myleott (Author) commented Sep 8, 2018

@pytorchbot retest this please

teng-li (Contributor) commented Sep 10, 2018

@myleott Agreeing with the comment above; it's super risky to make any DDP change right before our release.

myleott (Author) commented Sep 10, 2018

I added the test yesterday :) But this is also a pretty trivial change, and without it FP16 distributed training is much worse, so I definitely think we should get it in before the release.
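
A small FP16 arithmetic illustration of why the ordering matters (plain tensor math standing in for the all-reduce, no distributed setup needed):

```python
import torch

world_size = 64
# Per-worker gradient values; FP16 maxes out around 65504.
g = torch.full((3,), 2048.0, dtype=torch.float16)

# Sum first, divide after: the intermediate sum (64 * 2048 = 131072)
# overflows FP16 to inf, and inf / 64 is still inf.
print((g * world_size) / world_size)   # tensor([inf, inf, inf], dtype=torch.float16)

# Divide first, sum after: every term stays in range and the mean is exact.
print((g / world_size) * world_size)   # tensor([2048., 2048., 2048.], dtype=torch.float16)
```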

apaszke (Contributor) left a comment

This might be important for stability and has a test now, so I'd vote to merge it before the release.

facebook-github-bot (Contributor) left a comment

myleott is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

…11109)

Summary:
Normalizing by the world size before the reduction is less likely to cause overflow in FP16 training.
Pull Request resolved: #11109

Differential Revision: D9594708

fbshipit-source-id: 95fe299ce2776d664e6652a05f45d9471a80a326
teng-li added a commit to teng-li/pytorch that referenced this pull request Sep 11, 2018
PenghuiCheng pushed a commit to PenghuiCheng/pytorch that referenced this pull request Sep 11, 2018
…ytorch#11109)

Summary:
Normalizing by the world size before the reduction is less likely to cause overflow in FP16 training.
Pull Request resolved: pytorch#11109

Differential Revision: D9594708

Pulled By: myleott

fbshipit-source-id: 93ab53cb782ee1cbe1264e529b333490a0940338
ezyang added the merged label Jun 26, 2019