Skip to content

Conversation

@mrshenli
Copy link
Contributor

As pointed out by @ssnl in #20910, when clone destination is different from the module's device,
Cloneable currently calls clone() and then to() on every parameter and buffer, where the first clone is unnecessary.

When clone destination is different from the module's device,
Cloneable currently calls clone() and then to() on every parameter
and buffer, where the first clone is unnecessary. This commit
removes it and directly calls to().
@pytorchbot pytorchbot added the module: cpp Related to C++ API label May 27, 2019
@mrshenli mrshenli mentioned this pull request May 27, 2019
copy->parameters_[parameter.key()].set_data(
device ? data.to(*device) : data);
auto data = device ?
(*parameter).to(*device) :
Copy link
Collaborator

@ssnl ssnl May 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix. However, I'm not so sure about changing .clone. The name suggests that it should perform a deepcopy. Do c++ api modules have a .to? IMO, they should have one and with optional argument bool copy=false, just like the one in python. If we have a choice, I would prefer adding a .to and deprecating .clone for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ssnl C++ module does have a .to API, but it does not support bool copy argument, and will modify the module's own parameters instead of creating a new module. It seems Python's module.to does not support bool copy either. Am I looking at the wrong place?

IIUC, Tensor.towill copy instead of move, and hence Module.clone() still performs a deepcopy after this PR, no?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh yes you are totally right! .to is inplace.

So.. hmm how about add a bool flag to .clone? Maybe named deep or force_copy etc? Idk...

Copy link
Contributor Author

@mrshenli mrshenli May 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can make that change. Let me make sure I understand the requirement. Are you suggesting that for Module.clone(), we are going to support two modes:

  1. deep=True will be same as current Module.clone()'s behavior, which always make a deep copy, i.e., (*parameter).clone() if on the same device and (*parameter).to(device) otherwise.
  2. deep=False, will only make a copy when cloning to a different device.

I'm not so sure about changing .clone. The name suggests that it should perform a deepcopy.

I think I misunderstood your comments yesterday. There was indeed a problem in that version when device is provided but is the same as the current device, where Tensor.to(device) becomes no-op. Fixed in the new commit. It seems some ModuleTest in module.cpp are disabled, let me check. They are enabled, just didn't catch the problem. Let me add a test.

@mrshenli mrshenli changed the title [WIP] Avoid unnecessary tensor clone in Cloneable Avoid unnecessary tensor clone in Cloneable May 28, 2019
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mrshenli
Copy link
Contributor Author

mrshenli commented Jun 6, 2019

@pytorchbot rebase this please

Copy link
Contributor

@yf225 yf225 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks great! Just one minor comment and we are ready to go.

torch::NoGradGuard no_grad;
torch::Device device(torch::kCUDA, 0);
module->to(device);
auto module2 = module->clone(device);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also have another test for the case where the module's original device and the clone-to device are different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @yf225, thanks for reviewing and thanks for the catch! Let me add a test.

@mrshenli
Copy link
Contributor Author

@pytorchbot rebase this please

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

testDistinctParameters(module, module2);
}

TEST_F(ModuleTest, CloneCreatesDistinctParametersExplicitDevice_MultiCUDA) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn't seem to run in pytorch_linux_xenial_cuda9_cudnn7_py3_multigpu_test (https://circleci.com/gh/pytorch/pytorch/2244426?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link). It might be worthwhile to fix it before landing this PR, to make sure that we are running multi-gpu tests for libtorch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I agree. Just haven't got a chance to work on that yet, and will fix that before landing this PR.

facebook-github-bot pushed a commit that referenced this pull request Jul 26, 2019
Summary:
blocking #20995
Pull Request resolved: #23380

Differential Revision: D16517013

Pulled By: mrshenli

fbshipit-source-id: 3f44ecf0e8d1e235165f2ce4396795ca38e2d837
@mrshenli
Copy link
Contributor Author

@pytorchbot rebase this please

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@yf225 yf225 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome!

@facebook-github-bot
Copy link
Contributor

@mrshenli merged this pull request in aae4874.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merged module: cpp Related to C++ API

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants