Conversation

@VitalyFedyunin
Contributor

VitalyFedyunin commented Oct 31, 2019

Stack from ghstack:

Differential Revision: D18430810

VitalyFedyunin added a commit that referenced this pull request Nov 4, 2019
ghstack-source-id: 896744c
Pull Request resolved: #28991
Contributor

soumith left a comment

lgtm from my request. cc: @gchanan as the other reviewer who requested changes,

@kostmo
Member

kostmo commented Dec 12, 2019

CircleCI build failures summary

As of commit ea3ed3d:

  • 3/3 broken upstream at merge base 56de885 (see grid view)
    • You may want to rebase on the viable/strict branch (see its recency history):
      • If your commit is newer than viable/strict, you can try basing on an older, stable commit:
        git fetch viable/strict
        git rebase --onto viable/strict $(git merge-base origin/master HEAD)
        
      • If your commit is older than viable/strict:
        git fetch viable/strict
        git rebase viable/strict
        
  • 0/3 failures introduced in this PR

Detailed failure analysis

One may explore the probable reasons each build failed interactively on the Dr. CI website.

3 upstream failures recognized by patterns: these builds matched known failure patterns but were probably caused by upstream breakages.

This comment was automatically generated by Dr. CI.


def convert(t):
    if convert_to_format is not None and t.dim() == 4:
        return t.to(device, dtype if t.is_floating_point() else None, non_blocking, memory_format=convert_to_format)
Contributor

this doesn't match the documentation (which says the only case for to with memory-format is the 1-arg case).

Contributor Author

Pardon, but I read 'This can be called as' as: here are examples of calls, but they are not limited to these options.

Contributor

that's not my reading of it, although I can see why you read it that way.

In particular, before memory_format was introduced, it corresponded exactly to the function signatures:

"to(Device device=None, ScalarType dtype=None, bool non_blocking=False, bool copy=False, *, MemoryFormat? memory_format=None)",
"to(ScalarType dtype, bool non_blocking=False, bool copy=False, *, MemoryFormat? memory_format=None)",
"to(Tensor tensor, bool non_blocking=False, bool copy=False, *, MemoryFormat? memory_format=None)",

(remove copy because it's not supported and remove memory_format because we are considering the case before memory_format was introduced).

So, the introduction of memory_format changed this from "these are the supported signatures" to "these are some examples of supported signatures". I think the former is more useful and we should change it back.

Contributor

And actually, the example you added (.. function:: to(memory_format=torch.channels_last)) doesn't work because the parsing code hasn't been updated, right?

Contributor

IMO we should do the following:

  1. Add a memory_format-only overload to the Python parsing.
  2. List the valid calls, i.e.:
        .. function:: to(device=None, dtype=None, non_blocking=False, memory_format=None)
        .. function:: to(dtype, non_blocking=False, memory_format=None)
        .. function:: to(tensor, non_blocking=False, memory_format=None)
        .. function:: to(memory_format)

(or similar).
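
For illustration, a rough sketch of what those four call forms would look like in user code on an nn.Module, once the memory_format-only overload is parsed (the concrete module and argument values here are arbitrary examples):

    import torch
    import torch.nn as nn

    model = nn.Conv2d(3, 8, kernel_size=3)

    model.to(device="cpu", dtype=torch.float64, non_blocking=False)  # device/dtype form
    model.to(torch.float32)                                          # dtype-only form
    model.to(torch.empty(0, dtype=torch.float32))                    # tensor form: adopts that tensor's dtype and device
    model.to(memory_format=torch.channels_last)                      # memory_format-only form proposed above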

Contributor Author

I had the parsing updated: https://github.com/pytorch/pytorch/pull/28991/files#diff-7a05cfd8eb442889dffd6c3d2e4d0ddcR24

Will update the inline and HTML docs in a follow-up PR.

    the floating point parameters and buffers in this module
tensor (torch.Tensor): Tensor whose dtype and device are the desired
    dtype and device for all parameters and buffers in this module
memory_format (:class:`torch.memory_format`): the desired memory
Contributor

are you planning to have a reference section for memory_format that you can point to? This description isn't full enough for the long term. (e.g. why only 4D parameters/buffers? -- it's not clear)

Contributor Author

Yes, as soon as we land the new defaults for the .clone, .to, and *_like ops, I will work on updating the docs.
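
For context on the 4D question above: torch.channels_last describes an NHWC stride ordering of a 4D NCHW tensor, which is why only rank-4 parameters and buffers are converted. A minimal illustration (the tensor shape is arbitrary):

    import torch

    x = torch.randn(2, 3, 4, 5)                  # contiguous NCHW tensor, strides (60, 20, 5, 1)
    y = x.to(memory_format=torch.channels_last)  # same shape, NHWC strides (60, 1, 15, 3)
    print(x.stride(), y.stride())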

@facebook-github-bot
Contributor

@VitalyFedyunin merged this pull request in 66f2bba.

facebook-github-bot deleted the gh/VitalyFedyunin/24/head branch December 16, 2019 15:17
xxtEchjovs44 pushed a commit to xxtEchjovs44/pytorch that referenced this pull request Jan 29, 2020
wuhuikx pushed a commit to wuhuikx/pytorch that referenced this pull request Jan 30, 2020
Summary: Pull Request resolved: pytorch#28991

Test Plan: Imported from OSS

Differential Revision: D18430810

Pulled By: VitalyFedyunin

fbshipit-source-id: 0693d4e31fc6f9831722c29fc83517f16ddfc028
@ppwwyyxx
Collaborator

This PR adds an API, model.to(memory_format=torch.channels_last), which recursively applies the NHWC layout to all 4D parameters in the model.
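
For reference, a minimal sketch of that behavior on a toy model (the module choice is arbitrary):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.BatchNorm2d(8))
    model.to(memory_format=torch.channels_last)

    # The 4D conv weight now carries the channels_last (NHWC) layout ...
    print(model[0].weight.is_contiguous(memory_format=torch.channels_last))  # True
    # ... while non-4D parameters, such as the 1D BatchNorm weight, are left untouched.
    print(model[1].weight.dim())  # 1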

I think this is making the assumption that "any 4D parameter in the module needs a conversion to the NHWC layout if the user wants to use Nvidia's NHWC kernels". From what I can see, this assumption is limiting in many ways:

  • Not all 4D parameters in a user's custom model mean "NCHW". The dimensions could have arbitrary meanings, and changing their memory layout may hurt performance.
  • Not all parameters that are in "NCHW" format need to be changed to a different format. They need to be changed only in certain layers where Nvidia provides an implementation, e.g. convolution. Imagine a user's custom variant of a convolution kernel that may or may not benefit from this layout change.
  • Converting the layout of filters may not be the best strategy. Converting filters from NCHW to NHWC happens to be what Nvidia currently asks for when using NHWC inputs. However, it seems possible that the next generation of GPUs, or some other runtime library, will have different requirements. They may, for example, decide that NHWC inputs work best together with NCHW filters. Aligning the layout of filters with the layout of inputs, under the same API, doesn't seem like a good idea.

It may be better if such an API:

  • only changes the layout of weights for a fixed set of layers that can benefit from the change, e.g. convolution
  • has a more backend-specific name, such as model = optimize_for_cudnn_channels_last(model) (see the sketch after this list):
    • It seems to me that having to use the NHWC filter layout is merely an optimization made specifically for cudnn, rather than a generally applicable strategy.
    • What filter layout is best for the NHWC input layout is a detail of cudnn, and should not be exposed to users. The API model.to(memory_format=) implies that it is changing the layout of weights to NHWC, which is a cudnn requirement that might change on future hardware.
    • Calling this function should be optional (not sure if it is now). Using the NCHW filter layout together with NHWC inputs should be allowed, with the perf penalty falling on the user. model.to(memory_format=) gives the impression that the call is required (just as to(dtype) or to(device) has to match).
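
A minimal sketch of what such a helper could look like; the name optimize_for_cudnn_channels_last comes from the suggestion above, and the particular set of layers it touches is only an illustrative assumption:

    import torch
    import torch.nn as nn

    def optimize_for_cudnn_channels_last(model: nn.Module) -> nn.Module:
        # Convert only the layers known to benefit from channels_last
        # (here, just the standard convolution modules), instead of every 4D parameter.
        for module in model.modules():
            if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d)):
                module.to(memory_format=torch.channels_last)
        return model

    model = optimize_for_cudnn_channels_last(nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()))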

@dreiss
Contributor

dreiss commented Sep 1, 2020

@ppwwyyxx, I totally agree.
