Avoid unnecessary tensor clone in Cloneable #20995
Conversation
When the clone destination is different from the module's device, Cloneable currently calls clone() and then to() on every parameter and buffer; the first clone is unnecessary. This commit removes it and calls to() directly.
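The cost difference can be illustrated with a toy model. The `FakeTensor` type below is purely illustrative (not a libtorch type): it counts how many deep copies were made, so the two strategies can be compared directly.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a tensor that counts deep copies. Illustrative only,
// not libtorch code.
struct FakeTensor {
  std::string device;
  int copies;  // number of deep copies performed to produce this tensor

  FakeTensor clone() const {  // always deep-copies
    return FakeTensor{device, copies + 1};
  }
  FakeTensor to(const std::string& target) const {
    // Like Tensor::to(device): copies only when the device changes,
    // otherwise returns the same tensor (no extra copy).
    if (target == device) return *this;
    return FakeTensor{target, copies + 1};
  }
};

// Old Cloneable behavior: clone() then to() -> two copies cross-device.
FakeTensor old_clone(const FakeTensor& t, const std::string& dev) {
  return t.clone().to(dev);
}

// Behavior after this PR: to() alone when the device differs, clone()
// when it is the same, so exactly one copy either way.
FakeTensor new_clone(const FakeTensor& t, const std::string& dev) {
  return dev == t.device ? t.clone() : t.to(dev);
}
```

Cross-device, the old path pays for two copies where one suffices; same-device, both paths make exactly one.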
```cpp
copy->parameters_[parameter.key()].set_data(
    device ? data.to(*device) : data);

auto data = device ?
    (*parameter).to(*device) :
```
Thanks for the fix. However, I'm not so sure about changing .clone — the name suggests that it should perform a deep copy. Do C++ API modules have a .to? IMO they should have one, with an optional bool copy=false argument, just like the one in Python. If we have a choice, I would prefer adding a .to and deprecating .clone in its favor.
@ssnl The C++ module does have a .to API, but it does not support a bool copy argument, and it modifies the module's own parameters instead of creating a new module. It seems Python's module.to does not support bool copy either. Am I looking at the wrong place?

IIUC, Tensor.to will copy instead of move, and hence Module.clone() still performs a deep copy after this PR, no?
Ahh yes, you are totally right! .to is in-place.
So... hmm, how about adding a bool flag to .clone? Maybe named deep or force_copy, etc.? Idk...
Sure, I can make that change. Let me make sure I understand the requirement. Are you suggesting that for Module.clone() we are going to support two modes:

- deep=True will be the same as the current Module.clone() behavior, which always makes a deep copy, i.e., (*parameter).clone() if on the same device and (*parameter).to(device) otherwise.
- deep=False will only make a copy when cloning to a different device.
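The two proposed modes can be sketched per parameter. The `deep` flag and `clone_param` helper below are hypothetical (the flag name is only a suggestion from this thread, not a real API), and `FakeTensor` is a copy-counting toy type, not libtorch:

```cpp
#include <cassert>
#include <string>

// Copy-counting toy tensor; illustrative only, not libtorch code.
struct FakeTensor {
  std::string device;
  int copies;
  FakeTensor clone() const { return FakeTensor{device, copies + 1}; }
  FakeTensor to(const std::string& d) const {
    return d == device ? *this : FakeTensor{d, copies + 1};
  }
};

// Hypothetical per-parameter logic for a Module::clone(device, deep) flag.
FakeTensor clone_param(const FakeTensor& p, const std::string& dev, bool deep) {
  if (deep) {
    // deep=true: always a fresh copy, like today's Module::clone().
    return dev == p.device ? p.clone() : p.to(dev);
  }
  // deep=false: copy only when moving to a different device;
  // same-device "clones" would alias the original parameter.
  return p.to(dev);
}
```

With deep=false and the same device, no copy is made at all, which is exactly the aliasing behavior the thread goes on to flag as a concern.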
> I'm not so sure about changing .clone. The name suggests that it should perform a deepcopy.
I think I misunderstood your comments yesterday. There was indeed a problem in that version when a device is provided but is the same as the current device, where Tensor.to(device) becomes a no-op. Fixed in the new commit. It seems some ModuleTest cases in module.cpp are disabled; let me check.

They are enabled, just didn't catch the problem. Let me add a test.
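The same-device no-op bug can be made concrete with a toy type that shares storage the way a no-op `to` would. `Tensorish`, `buggy`, and `fixed` below are illustrative names, not libtorch code:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy tensor whose storage is a shared_ptr, so aliasing is observable.
// Illustrative only, not libtorch code.
struct Tensorish {
  std::shared_ptr<int> storage;
  std::string device;
  Tensorish to(const std::string& d) const {
    if (d == device) return *this;  // no-op: storage stays shared!
    return Tensorish{std::make_shared<int>(*storage), d};
  }
  Tensorish clone() const {
    return Tensorish{std::make_shared<int>(*storage), device};
  }
};

// Buggy version: always uses to() when a device is requested, so a
// same-device "clone" aliases the original parameter.
Tensorish buggy(const Tensorish& t, const std::string& dev) {
  return t.to(dev);
}

// Fixed version: falls back to clone() when to() would be a no-op.
Tensorish fixed(const Tensorish& t, const std::string& dev) {
  return dev == t.device ? t.clone() : t.to(dev);
}
```

The bug only bites when the caller passes a device equal to the module's current one, which is why it slipped past the existing tests.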
facebook-github-bot
left a comment
@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@pytorchbot rebase this please
yf225
left a comment
It looks great! Just one minor comment and we are ready to go.
```cpp
torch::NoGradGuard no_grad;
torch::Device device(torch::kCUDA, 0);
module->to(device);
auto module2 = module->clone(device);
```
I think we should also have another test for the case where the module's original device and the clone-to device are different.
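What such a cross-device test needs to check can be sketched with a self-contained toy module (real libtorch types are not used here; `TinyModule`, `Param`, and `distinct_parameters` are illustrative names): after cloning to a different device, every parameter should land on the target device and share no storage with the original.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy parameter whose storage is observable via shared_ptr identity.
// Illustrative only, not libtorch code.
struct Param {
  std::shared_ptr<std::vector<float>> storage;
  std::string device;
};

struct TinyModule {
  std::vector<Param> params;
  TinyModule clone(const std::string& dev) const {
    TinyModule copy;
    for (const auto& p : params) {
      // Always allocate fresh storage, whatever the target device.
      copy.params.push_back(
          Param{std::make_shared<std::vector<float>>(*p.storage), dev});
    }
    return copy;
  }
};

// The property the cross-device test should assert: no shared storage.
bool distinct_parameters(const TinyModule& a, const TinyModule& b) {
  for (size_t i = 0; i < a.params.size(); ++i) {
    if (a.params[i].storage == b.params[i].storage) return false;
  }
  return true;
}
```

Running the same distinctness check for both the same-device and the cross-device case is what catches regressions like the no-op `to(device)` discussed above.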
Hi @yf225, thanks for reviewing and thanks for the catch! Let me add a test.
@pytorchbot rebase this please
facebook-github-bot
left a comment
@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
```cpp
  testDistinctParameters(module, module2);
}

TEST_F(ModuleTest, CloneCreatesDistinctParametersExplicitDevice_MultiCUDA) {
```
This test doesn't seem to run in pytorch_linux_xenial_cuda9_cudnn7_py3_multigpu_test (https://circleci.com/gh/pytorch/pytorch/2244426?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link). It might be worthwhile to fix that before landing this PR, to make sure we are running multi-GPU tests for libtorch.
Thanks, I agree. I just haven't gotten a chance to work on that yet; I will fix it before landing this PR.
@pytorchbot rebase this please
facebook-github-bot
left a comment
@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
yf225
left a comment
Looks awesome!
As pointed out by @ssnl in #20910, when the clone destination is different from the module's device, Cloneable currently calls clone() and then to() on every parameter and buffer, where the first clone is unnecessary.