Skip to content

report an error if num_channels is not divisible by num_groups for nn.GroupNorm#74293

Closed
XiaobingSuper wants to merge 3 commits intopytorch:masterfrom
XiaobingSuper:xiaobing/group_norm_doc
Closed

report an error if num_channels is not divisible by num_groups for nn.GroupNorm#74293
XiaobingSuper wants to merge 3 commits intopytorch:masterfrom
XiaobingSuper:xiaobing/group_norm_doc

Conversation

@XiaobingSuper
Copy link
Copy Markdown
Collaborator

For a GroupNorm module, if num_channels is not divisible by num_groups, we need to report an error when defining a module other than at the running step.

example:

import torch
m = torch.nn.GroupNorm(5, 6)
x = torch.randn(1, 6, 4, 4)
y = m(x)

before:

Traceback (most recent call last):
  File "group_norm_test.py", line 8, in <module>
    y = m(x)
  File "/home/xiaobinz/miniconda3/envs/pytorch_mater/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xiaobinz/miniconda3/envs/pytorch_mater/lib/python3.7/site-packages/torch/nn/modules/normalization.py", line 271, in forward
    input, self.num_groups, self.weight, self.bias, self.eps)
  File "/home/xiaobinz/miniconda3/envs/pytorch_mater/lib/python3.7/site-packages/torch/nn/functional.py", line 2500, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Expected number of channels in input to be divisible by num_groups, but got input of shape [1, 6, 4, 4] and num_groups=5

after:

Traceback (most recent call last):
  File "group_norm_test.py", line 6, in <module>
    m = torch.nn.GroupNorm(5, 6)
  File "/home/xiaobinz/miniconda3/envs/pytorch_test/lib/python3.7/site-packages/torch/nn/modules/normalization.py", line 251, in __init__
    raise ValueError('num_channels must be divisible by num_groups')

This PR also update the doc of num_groups.

@pytorch-bot
Copy link
Copy Markdown

pytorch-bot bot commented Mar 16, 2022

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/XiaobingSuper/pytorch/blob/8851a0c27072d3c91c5befc39bb04d86816747ad/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default
Add ciflow labels to this PR to trigger more builds:

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
linux-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-manywheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
linux-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk ✅ triggered
linux-bionic-rocm4.5-py3.7 ciflow/all, ciflow/default, ciflow/linux, ciflow/rocm, ciflow/trunk ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk ✅ triggered
linux-vulkan-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7-no-ops ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
macos-arm64-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
macos-arm64-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
macos-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
macos-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
macos-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
macos-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
windows-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
docker-builds ciflow/all, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.5-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
pytorch-xla-linux-bionic-py3.7-clang8 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk, ciflow/xla 🚫 skipped

@facebook-github-bot
Copy link
Copy Markdown
Contributor

facebook-github-bot commented Mar 16, 2022

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 6b4b676 (more details on the Dr. CI page):


None of the CI failures appear to be your fault 💚



🚧 1 fixed upstream failure:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch (expand for instructions)

If your commit is older than viable/strict, run these commands:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

Copy link
Copy Markdown
Contributor

@jbschlosser jbschlosser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! I'm good with this.

Do you mind rebasing on master - that should hopefully fix the failing checks.

@XiaobingSuper XiaobingSuper force-pushed the xiaobing/group_norm_doc branch from 8851a0c to edbcb36 Compare March 16, 2022 13:48
@jbschlosser
Copy link
Copy Markdown
Contributor

@XiaobingSuper Looks like the entry for torch.nn.quantized.GroupNorm in test/test_module_init.py is wrong since it passes (2, 3) for (num_groups, num_channels). I think it should be an easy fix to pass (2, 4) instead.

@XiaobingSuper XiaobingSuper force-pushed the xiaobing/group_norm_doc branch from daeda5b to 6b4b676 Compare March 17, 2022 01:01
@jbschlosser
Copy link
Copy Markdown
Contributor

@pytorchbot merge this please

@github-actions
Copy link
Copy Markdown
Contributor

Hey @XiaobingSuper.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

facebook-github-bot pushed a commit that referenced this pull request Mar 17, 2022
….GroupNorm (#74293)

Summary:
For a GroupNorm module, if num_channels is not divisible by num_groups, we need to report an error when defining a module other than at the running step.

example:
```
import torch
m = torch.nn.GroupNorm(5, 6)
x = torch.randn(1, 6, 4, 4)
y = m(x)
```
before:

```
Traceback (most recent call last):
  File "group_norm_test.py", line 8, in <module>
    y = m(x)
  File "/home/xiaobinz/miniconda3/envs/pytorch_mater/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xiaobinz/miniconda3/envs/pytorch_mater/lib/python3.7/site-packages/torch/nn/modules/normalization.py", line 271, in forward
    input, self.num_groups, self.weight, self.bias, self.eps)
  File "/home/xiaobinz/miniconda3/envs/pytorch_mater/lib/python3.7/site-packages/torch/nn/functional.py", line 2500, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Expected number of channels in input to be divisible by num_groups, but got input of shape [1, 6, 4, 4] and num_groups=5
```

after:

```
Traceback (most recent call last):
  File "group_norm_test.py", line 6, in <module>
    m = torch.nn.GroupNorm(5, 6)
  File "/home/xiaobinz/miniconda3/envs/pytorch_test/lib/python3.7/site-packages/torch/nn/modules/normalization.py", line 251, in __init__
    raise ValueError('num_channels must be divisible by num_groups')
```

This PR also update the doc of num_groups.

Pull Request resolved: #74293
Approved by: https://github.com/jbschlosser

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/4a0f6e6c530b624bb5a4fbcaebe3fd43b1ff66c3

Reviewed By: malfet, osalpekar

Differential Revision: D34968521

fbshipit-source-id: dde19a3f642924cbd851ee59c6f52aa4ee37cf0c
@osalpekar
Copy link
Copy Markdown
Contributor

Hey @XiaobingSuper, thanks for making this change! It seems like this PR may have broken some tests in the PyTorch Opacus project, however (see tests here: https://github.com/pytorch/opacus/tree/main/opacus/tests). We'd love it if you could help address some of these test failures. Looping in @JohnlNguyen from the Opacus team who could provide some context.

Going forward, we'd love to find a way to provide this signal to you in advance.

@JohnlNguyen
Copy link
Copy Markdown

@XiaobingSuper
Copy link
Copy Markdown
Collaborator Author

@gchanan gchanan added the module: bc-breaking Related to a BC-breaking change label Mar 24, 2022
thiagocrepaldi pushed a commit to thiagocrepaldi/detectron2 that referenced this pull request May 27, 2022
The main root cause was that torch.nn.GroupNorm changed its behavior on
newer pyTorch versions (pytorch/pytorch#74293).
this PR fixes the RetinaNetHead according to the new behavior

Also, add_export_config was not exposed when caffe2
was not compiled along with pytorch, preventing users with legacy
scripts to call it. This PR exposes it and adds a warning message
informing users about its deprecation on future versions

This PR also adds torch_export_onnx.py with several unit tests for
detectron2 models.

ONNX is added as dependency to the CI to allow the
aforementioned tests to run

Lastly, because detectron2 CI still uses old pytorch versions (1.8, 1.9
and 1.10), this PR adds custom symbolic functions to fix bugs on these
versions
thiagocrepaldi pushed a commit to thiagocrepaldi/detectron2 that referenced this pull request May 31, 2022
The main root cause was that torch.nn.GroupNorm changed its behavior on
newer pyTorch versions (pytorch/pytorch#74293).
this PR fixes the RetinaNetHead according to the new behavior

Also, add_export_config was not exposed when caffe2
was not compiled along with pytorch, preventing users with legacy
scripts to call it. This PR exposes it and adds a warning message
informing users about its deprecation on future versions

This PR also adds torch_export_onnx.py with several unit tests for
detectron2 models.

ONNX is added as dependency to the CI to allow the
aforementioned tests to run

Lastly, because detectron2 CI still uses old pytorch versions (1.8, 1.9
and 1.10), this PR adds custom symbolic functions to fix bugs on these
versions
thiagocrepaldi pushed a commit to thiagocrepaldi/detectron2 that referenced this pull request Jul 15, 2022
The main root cause was that torch.nn.GroupNorm changed its behavior on
newer pyTorch versions (pytorch/pytorch#74293).
this PR fixes the RetinaNetHead according to the new behavior

Also, add_export_config was not exposed when caffe2
was not compiled along with pytorch, preventing users with legacy
scripts to call it. This PR exposes it and adds a warning message
informing users about its deprecation on future versions

This PR also adds torch_export_onnx.py with several unit tests for
detectron2 models.

ONNX is added as dependency to the CI to allow the
aforementioned tests to run

Lastly, because detectron2 CI still uses old pytorch versions (1.8, 1.9
and 1.10), this PR adds custom symbolic functions to fix bugs on these
versions
thiagocrepaldi pushed a commit to thiagocrepaldi/detectron2 that referenced this pull request Jul 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants