Avoid unnecessary tensor clone in Cloneable #20995
Conversation
When the clone destination is different from the module's device, Cloneable currently calls clone() and then to() on every parameter and buffer; the first clone is unnecessary. This commit removes it and calls to() directly.
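The cost difference can be illustrated with a toy model. The `FakeTensor` type below is purely illustrative (not a libtorch type): it counts how many deep copies were made, so the two strategies can be compared directly.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a tensor that counts deep copies. Illustrative only,
// not libtorch code.
struct FakeTensor {
  std::string device;
  int copies;  // number of deep copies performed to produce this tensor

  FakeTensor clone() const {  // always deep-copies
    return FakeTensor{device, copies + 1};
  }
  FakeTensor to(const std::string& target) const {
    // Like Tensor::to(device): copies only when the device changes,
    // otherwise returns the same tensor (no extra copy).
    if (target == device) return *this;
    return FakeTensor{target, copies + 1};
  }
};

// Old Cloneable behavior: clone() then to() -> two copies cross-device.
FakeTensor old_clone(const FakeTensor& t, const std::string& dev) {
  return t.clone().to(dev);
}

// Behavior after this PR: to() alone when the device differs, clone()
// when it is the same, so exactly one copy either way.
FakeTensor new_clone(const FakeTensor& t, const std::string& dev) {
  return dev == t.device ? t.clone() : t.to(dev);
}
```

Cross-device, the old path pays for two copies where one suffices; same-device, both paths make exactly one.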
```cpp
copy->parameters_[parameter.key()].set_data(
    device ? data.to(*device) : data);

auto data = device ?
    (*parameter).to(*device) :
```
Thanks for the fix. However, I'm not so sure about changing .clone — the name suggests that it should perform a deep copy. Do C++ API modules have a .to? IMO they should have one, with an optional bool copy=false argument, just like the one in Python. If we have a choice, I would prefer adding a .to and deprecating .clone in its favor.
@ssnl The C++ module does have a .to API, but it does not support a bool copy argument, and it modifies the module's own parameters instead of creating a new module. It seems Python's module.to does not support bool copy either. Am I looking at the wrong place?

IIUC, Tensor.to will copy instead of move, and hence Module.clone() still performs a deep copy after this PR, no?
Ahh yes, you are totally right! .to is in-place.
So... hmm, how about adding a bool flag to .clone? Maybe named deep or force_copy, etc.? Idk...
Sure, I can make that change. Let me make sure I understand the requirement. Are you suggesting that for Module.clone() we are going to support two modes:

- deep=True will be the same as the current Module.clone() behavior, which always makes a deep copy, i.e., (*parameter).clone() if on the same device and (*parameter).to(device) otherwise.
- deep=False will only make a copy when cloning to a different device.
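The two proposed modes can be sketched per parameter. The `deep` flag and `clone_param` helper below are hypothetical (the flag name is only a suggestion from this thread, not a real API), and `FakeTensor` is a copy-counting toy type, not libtorch:

```cpp
#include <cassert>
#include <string>

// Copy-counting toy tensor; illustrative only, not libtorch code.
struct FakeTensor {
  std::string device;
  int copies;
  FakeTensor clone() const { return FakeTensor{device, copies + 1}; }
  FakeTensor to(const std::string& d) const {
    return d == device ? *this : FakeTensor{d, copies + 1};
  }
};

// Hypothetical per-parameter logic for a Module::clone(device, deep) flag.
FakeTensor clone_param(const FakeTensor& p, const std::string& dev, bool deep) {
  if (deep) {
    // deep=true: always a fresh copy, like today's Module::clone().
    return dev == p.device ? p.clone() : p.to(dev);
  }
  // deep=false: copy only when moving to a different device;
  // same-device "clones" would alias the original parameter.
  return p.to(dev);
}
```

With deep=false and the same device, no copy is made at all, which is exactly the aliasing behavior the thread goes on to flag as a concern.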
> I'm not so sure about changing .clone. The name suggests that it should perform a deepcopy.
I think I misunderstood your comments yesterday. There was indeed a problem in that version when a device is provided but is the same as the current device, where Tensor.to(device) becomes a no-op. Fixed in the new commit. It seems some ModuleTest cases in module.cpp are disabled; let me check.

They are enabled, just didn't catch the problem. Let me add a test.
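The same-device no-op bug can be made concrete with a toy type that shares storage the way a no-op `to` would. `Tensorish`, `buggy`, and `fixed` below are illustrative names, not libtorch code:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy tensor whose storage is a shared_ptr, so aliasing is observable.
// Illustrative only, not libtorch code.
struct Tensorish {
  std::shared_ptr<int> storage;
  std::string device;
  Tensorish to(const std::string& d) const {
    if (d == device) return *this;  // no-op: storage stays shared!
    return Tensorish{std::make_shared<int>(*storage), d};
  }
  Tensorish clone() const {
    return Tensorish{std::make_shared<int>(*storage), device};
  }
};

// Buggy version: always uses to() when a device is requested, so a
// same-device "clone" aliases the original parameter.
Tensorish buggy(const Tensorish& t, const std::string& dev) {
  return t.to(dev);
}

// Fixed version: falls back to clone() when to() would be a no-op.
Tensorish fixed(const Tensorish& t, const std::string& dev) {
  return dev == t.device ? t.clone() : t.to(dev);
}
```

The bug only bites when the caller passes a device equal to the module's current one, which is why it slipped past the existing tests.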
facebook-github-bot
left a comment
@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@pytorchbot rebase this please
yf225
left a comment
It looks great! Just one minor comment and we are ready to go.
```cpp
torch::NoGradGuard no_grad;
torch::Device device(torch::kCUDA, 0);
module->to(device);
auto module2 = module->clone(device);
```
I think we should also have another test for the case where the module's original device and the clone-to device are different.
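What such a cross-device test needs to check can be sketched with a self-contained toy module (real libtorch types are not used here; `TinyModule`, `Param`, and `distinct_parameters` are illustrative names): after cloning to a different device, every parameter should land on the target device and share no storage with the original.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy parameter whose storage is observable via shared_ptr identity.
// Illustrative only, not libtorch code.
struct Param {
  std::shared_ptr<std::vector<float>> storage;
  std::string device;
};

struct TinyModule {
  std::vector<Param> params;
  TinyModule clone(const std::string& dev) const {
    TinyModule copy;
    for (const auto& p : params) {
      // Always allocate fresh storage, whatever the target device.
      copy.params.push_back(
          Param{std::make_shared<std::vector<float>>(*p.storage), dev});
    }
    return copy;
  }
};

// The property the cross-device test should assert: no shared storage.
bool distinct_parameters(const TinyModule& a, const TinyModule& b) {
  for (size_t i = 0; i < a.params.size(); ++i) {
    if (a.params[i].storage == b.params[i].storage) return false;
  }
  return true;
}
```

Running the same distinctness check for both the same-device and the cross-device case is what catches regressions like the no-op `to(device)` discussed above.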
Hi @yf225, thanks for reviewing and thanks for the catch! Let me add a test.
@pytorchbot rebase this please
facebook-github-bot
left a comment
@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
```cpp
  testDistinctParameters(module, module2);
}

TEST_F(ModuleTest, CloneCreatesDistinctParametersExplicitDevice_MultiCUDA) {
```
This test doesn't seem to run in pytorch_linux_xenial_cuda9_cudnn7_py3_multigpu_test (https://circleci.com/gh/pytorch/pytorch/2244426?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link). It might be worthwhile to fix that before landing this PR, to make sure we are running multi-GPU tests for libtorch.
Thanks, I agree. I just haven't gotten a chance to work on that yet; I will fix it before landing this PR.
@pytorchbot rebase this please
facebook-github-bot
left a comment
@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
yf225
left a comment
Looks awesome!
As pointed out by @ssnl in #20910, when the clone destination is different from the module's device, Cloneable currently calls clone() and then to() on every parameter and buffer, where the first clone is unnecessary.