Skip to content

Conversation

@pietern
Copy link
Contributor

@pietern pietern commented Nov 4, 2019

Stack from ghstack:

It's never a good idea to throw from a destructor and per #28288 we
can't use std::make_shared on a class with a noexcept(false)
destructor.

To fix this, we abort instead of throw from the NCCLComm destructor.

Closes #28288.

Differential Revision: D18298271

It's never a good idea to throw from a destructor and per #28288 we
can't use `std::make_shared` on a class with a `noexcept(false)`
destructor.

To fix this, we `abort` instead of throw from the `NCCLComm` destructor.

Closes #28288.

Differential Revision: [D18298271](https://our.internmc.facebook.com/intern/diff/D18298271/)

[ghstack-poisoned]
@pietern pietern requested a review from mrshenli as a code owner November 4, 2019 14:39
pietern added a commit that referenced this pull request Nov 4, 2019
It's never a good idea to throw from a destructor and per #28288 we
can't use `std::make_shared` on a class with a `noexcept(false)`
destructor.

To fix this, we `abort` instead of throw from the `NCCLComm` destructor.

Closes #28288.

Differential Revision: [D18298271](https://our.internmc.facebook.com/intern/diff/D18298271/)

ghstack-source-id: 93182910
Pull Request resolved: #29118
@pietern pietern added the oncall: distributed Add this issue/PR to distributed oncall triage queue label Nov 4, 2019
@pietern
Copy link
Contributor Author

pietern commented Nov 5, 2019

@pytorchbot retest this please

if (result != ncclSuccess) { \
std::string err = "NCCL error in: " + std::string(__FILE__) + ":" + \
std::to_string(__LINE__) + ", " + ncclGetErrorWithVersion(result); \
throw std::runtime_error(err); \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still used in ~AutoNcclGroup

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine; this change only fixes the std::make_shared compile error reported in #28288.

It would be nice to make all destructors noexcept, but that defies the purpose of a RAII wrapper like AutoNcclGroup. If you can't throw from its destructor, you would need to manually check the result of ncclGroupEnd, and the wrapper is no longer useful.

@pietern
Copy link
Contributor Author

pietern commented Nov 5, 2019

Test failures are unrelated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

oncall: distributed Add this issue/PR to distributed oncall triage queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants