report an error if num_channels is not divisible by num_groups for nn.GroupNorm by XiaobingSuper · Pull Request #74293 · pytorch/pytorch

XiaobingSuper · 2022-03-16T12:44:04Z

For a GroupNorm module, if num_channels is not divisible by num_groups, we should report an error when the module is constructed rather than later, when it is run.

example:

import torch
m = torch.nn.GroupNorm(5, 6)
x = torch.randn(1, 6, 4, 4)
y = m(x)

before:

Traceback (most recent call last):
  File "group_norm_test.py", line 8, in <module>
    y = m(x)
  File "/home/xiaobinz/miniconda3/envs/pytorch_mater/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xiaobinz/miniconda3/envs/pytorch_mater/lib/python3.7/site-packages/torch/nn/modules/normalization.py", line 271, in forward
    input, self.num_groups, self.weight, self.bias, self.eps)
  File "/home/xiaobinz/miniconda3/envs/pytorch_mater/lib/python3.7/site-packages/torch/nn/functional.py", line 2500, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Expected number of channels in input to be divisible by num_groups, but got input of shape [1, 6, 4, 4] and num_groups=5

after:

Traceback (most recent call last):
  File "group_norm_test.py", line 6, in <module>
    m = torch.nn.GroupNorm(5, 6)
  File "/home/xiaobinz/miniconda3/envs/pytorch_test/lib/python3.7/site-packages/torch/nn/modules/normalization.py", line 251, in __init__
    raise ValueError('num_channels must be divisible by num_groups')

This PR also updates the documentation of num_groups.
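The check can be sketched without PyTorch installed. The helper below is a hypothetical stand-in for the validation that this PR adds to the GroupNorm constructor. The function names are illustrative and are not part of the torch API; only the error message is taken from the traceback above.

```python
def check_group_norm_args(num_groups: int, num_channels: int) -> None:
    """Hypothetical stand-in for the divisibility check done at construction."""
    if num_channels % num_groups != 0:
        # Same message as in the "after" traceback of this PR.
        raise ValueError('num_channels must be divisible by num_groups')


def fails_at_construction(num_groups: int, num_channels: int) -> bool:
    """Return True if the arguments would be rejected before any forward call."""
    try:
        check_group_norm_args(num_groups, num_channels)
    except ValueError:
        return True
    return False


print(fails_at_construction(5, 6))  # arguments from the example in this PR
print(fails_at_construction(3, 6))  # 6 channels split evenly into 3 groups
```

With the arguments from the example, (5, 6), the helper reports a failure without needing any input tensor, which is the point of moving the check to construction time.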

pytorch-bot · 2022-03-16T12:44:08Z

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/XiaobingSuper/pytorch/blob/8851a0c27072d3c91c5befc39bb04d86816747ad/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default
Add ciflow labels to this PR to trigger more builds:

Workflows	Labels (bold enabled)	Status
Triggered Workflows
linux-binary-conda	`ciflow/binaries`, `ciflow/binaries_conda`, `ciflow/default`	✅ triggered
linux-binary-libtorch-cxx11-abi	`ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`	✅ triggered
linux-binary-libtorch-pre-cxx11	`ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`	✅ triggered
linux-binary-manywheel	`ciflow/binaries`, `ciflow/binaries_wheel`, `ciflow/default`	✅ triggered
linux-bionic-py3.7-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/noarch`, `ciflow/trunk`	✅ triggered
linux-bionic-rocm4.5-py3.7	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/rocm`, `ciflow/trunk`	✅ triggered
linux-docs	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/docs`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-vulkan-bionic-py3.7-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`, `ciflow/vulkan`	✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test	`ciflow/all`, `ciflow/bazel`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3-clang5-mobile-build	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`, `ciflow/trunk`	✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-clang7-asan	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/sanitizers`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-clang7-onnx	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/onnx`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc7	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc7-no-ops	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
macos-arm64-binary-conda	`ciflow/binaries`, `ciflow/binaries_conda`, `ciflow/default`	✅ triggered
macos-arm64-binary-wheel	`ciflow/binaries`, `ciflow/binaries_wheel`, `ciflow/default`	✅ triggered
macos-binary-conda	`ciflow/binaries`, `ciflow/binaries_conda`, `ciflow/default`	✅ triggered
macos-binary-libtorch-cxx11-abi	`ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`	✅ triggered
macos-binary-libtorch-pre-cxx11	`ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`	✅ triggered
macos-binary-wheel	`ciflow/binaries`, `ciflow/binaries_wheel`, `ciflow/default`	✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
win-vs2019-cpu-py3	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/trunk`, `ciflow/win`	✅ triggered
win-vs2019-cuda11.3-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/trunk`, `ciflow/win`	✅ triggered
windows-binary-libtorch-cxx11-abi	`ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`	✅ triggered
windows-binary-libtorch-pre-cxx11	`ciflow/binaries`, `ciflow/binaries_libtorch`, `ciflow/default`	✅ triggered
windows-binary-wheel	`ciflow/binaries`, `ciflow/binaries_wheel`, `ciflow/default`	✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
docker-builds	`ciflow/all`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/scheduled`	🚫 skipped
ios-12-5-1-arm64-coreml	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/scheduled`	🚫 skipped
ios-12-5-1-arm64-custom-ops	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/scheduled`	🚫 skipped
ios-12-5-1-arm64-metal	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/scheduled`	🚫 skipped
ios-12-5-1-x86-64	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-x86-64-coreml	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/slow`, `ciflow/trunk`	🚫 skipped
linux-docs-push	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
macos-10-15-py3-arm64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
macos-11-py3-x86-64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`, `ciflow/slow`, `ciflow/slow-gradcheck`	🚫 skipped
periodic-linux-xenial-cuda11.3-py3.7-gcc7-debug	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-win-vs2019-cuda11.5-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/scheduled`, `ciflow/win`	🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
pytorch-xla-linux-bionic-py3.7-clang8	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`, `ciflow/xla`	🚫 skipped

facebook-github-bot · 2022-03-16T12:44:10Z

🔗 Helpful links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/74293
📄 Preview docs built from this PR
📄 Preview C++ docs built from this PR
🔧 Opt-in to CIFlow to control what jobs run on your PRs

💊 CI failures summary and remediations

As of commit 6b4b676 (more details on the Dr. CI page):

✅ None of the CI failures appear to be your fault 💚

1/1 broken upstream at merge base 2f24a85 on Mar 16 from 5:50pm to 6:32pm

🚧 1 fixed upstream failure:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch (expand for instructions)

If your commit is older than viable/strict, run these commands:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

Lint / quick-checks on Mar 16 from 5:50pm to 6:32pm (2f24a85 - 495e69e)
- 🔁 rerun

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

jbschlosser

Thanks for the fix! I'm good with this.

Do you mind rebasing on master - that should hopefully fix the failing checks.

jbschlosser · 2022-03-16T19:28:45Z

@XiaobingSuper Looks like the entry for torch.nn.quantized.GroupNorm in test/test_module_init.py is wrong since it passes (2, 3) for (num_groups, num_channels). I think it should be an easy fix to pass (2, 4) instead.
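A minimal sketch of why (2, 3) is rejected and (2, 4) is accepted, using plain arithmetic. The helper name channels_per_group is an assumption made for illustration and does not appear in test/test_module_init.py.

```python
def channels_per_group(num_groups: int, num_channels: int) -> int:
    """Hypothetical helper: size of each group, or ValueError if uneven."""
    if num_channels % num_groups != 0:
        raise ValueError('num_channels must be divisible by num_groups')
    return num_channels // num_groups


# (num_groups, num_channels) = (2, 4): each group holds 2 channels.
print(channels_per_group(2, 4))

# (num_groups, num_channels) = (2, 3): 3 channels cannot be split into 2 equal groups.
try:
    channels_per_group(2, 3)
except ValueError as err:
    print(f"(2, 3) rejected: {err}")
```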

jbschlosser · 2022-03-17T13:36:35Z

@pytorchbot merge this please

github-actions · 2022-03-17T13:41:38Z

Hey @XiaobingSuper.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

….GroupNorm (#74293)

Summary:
For a GroupNorm module, if num_channels is not divisible by num_groups, we need to report an error when defining a module other than at the running step.

example:
```
import torch
m = torch.nn.GroupNorm(5, 6)
x = torch.randn(1, 6, 4, 4)
y = m(x)
```

before:
```
Traceback (most recent call last):
  File "group_norm_test.py", line 8, in <module>
    y = m(x)
  File "/home/xiaobinz/miniconda3/envs/pytorch_mater/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xiaobinz/miniconda3/envs/pytorch_mater/lib/python3.7/site-packages/torch/nn/modules/normalization.py", line 271, in forward
    input, self.num_groups, self.weight, self.bias, self.eps)
  File "/home/xiaobinz/miniconda3/envs/pytorch_mater/lib/python3.7/site-packages/torch/nn/functional.py", line 2500, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Expected number of channels in input to be divisible by num_groups, but got input of shape [1, 6, 4, 4] and num_groups=5
```

after:
```
Traceback (most recent call last):
  File "group_norm_test.py", line 6, in <module>
    m = torch.nn.GroupNorm(5, 6)
  File "/home/xiaobinz/miniconda3/envs/pytorch_test/lib/python3.7/site-packages/torch/nn/modules/normalization.py", line 251, in __init__
    raise ValueError('num_channels must be divisible by num_groups')
```

This PR also update the doc of num_groups.

Pull Request resolved: #74293
Approved by: https://github.com/jbschlosser

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/4a0f6e6c530b624bb5a4fbcaebe3fd43b1ff66c3

Reviewed By: malfet, osalpekar

Differential Revision: D34968521

fbshipit-source-id: dde19a3f642924cbd851ee59c6f52aa4ee37cf0c

osalpekar · 2022-03-18T22:31:24Z

Hey @XiaobingSuper, thanks for making this change! It seems like this PR may have broken some tests in the PyTorch Opacus project, however (see tests here: https://github.com/pytorch/opacus/tree/main/opacus/tests). We'd love it if you could help address some of these test failures. Looping in @JohnlNguyen from the Opacus team who could provide some context.

Going forward, we'd love to find a way to provide this signal to you in advance.

JohnlNguyen · 2022-03-18T23:06:47Z

Hey @XiaobingSuper, Do you mind sending in the fix for Opacus?

The issue is here https://github.com/pytorch/opacus/blob/6a3e9bd99dca314596bc0313bb4241eac7c9a5d0/opacus/validators/batch_norm.py#L88

I think it should be a quick fix.

XiaobingSuper · 2022-03-21T01:10:14Z

@JohnlNguyen

> Hey @XiaobingSuper, Do you mind sending in the fix for Opacus?
>
> The issue is here https://github.com/pytorch/opacus/blob/6a3e9bd99dca314596bc0313bb4241eac7c9a5d0/opacus/validators/batch_norm.py#L88
>
> I think it should be a quick fix.

Yes, there is a fix for Opacus: meta-pytorch/opacus#390

The main root cause was that torch.nn.GroupNorm changed its behavior in newer PyTorch versions (pytorch/pytorch#74293). This PR fixes the RetinaNetHead according to the new behavior. Also, add_export_config was not exposed when caffe2 was not compiled along with PyTorch, preventing users with legacy scripts from calling it. This PR exposes it and adds a warning message informing users about its deprecation in future versions. This PR also adds torch_export_onnx.py with several unit tests for detectron2 models. ONNX is added as a dependency to the CI to allow the aforementioned tests to run. Lastly, because detectron2 CI still uses old PyTorch versions (1.8, 1.9 and 1.10), this PR adds custom symbolic functions to fix bugs on these versions.
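Downstream code that computes num_groups from a configuration value can hit the new constructor error. One possible way to adapt is sketched below, assuming it is acceptable to fall back to a smaller group count. This is not necessarily what the Opacus or detectron2 fixes do, and the helper name largest_valid_num_groups is hypothetical.

```python
def largest_valid_num_groups(num_channels: int, preferred: int) -> int:
    """Return the largest g <= preferred such that num_channels % g == 0.

    Hypothetical helper for illustration; not part of torch, Opacus or detectron2.
    """
    if num_channels < 1 or preferred < 1:
        raise ValueError('num_channels and preferred must be positive')
    # Search downwards from the preferred value; 1 always divides num_channels.
    for g in range(min(preferred, num_channels), 0, -1):
        if num_channels % g == 0:
            return g
    return 1  # not reached for positive inputs, kept for completeness


print(largest_valid_num_groups(6, 5))    # 5 and 4 do not divide 6, so 3 is used
print(largest_valid_num_groups(64, 32))  # 32 already divides 64
```

Changing the group count changes how the layer normalizes, so a fallback like this is a design decision for the downstream project rather than a drop-in replacement.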

Refer to pytorch/pytorch#74293

XiaobingSuper requested review from albanD and jbschlosser as code owners March 16, 2022 12:44

pytorch-bot bot added the ciflow/default label Mar 16, 2022

facebook-github-bot added the cla signed label Mar 16, 2022

pytorchbot added the open source label Mar 16, 2022

jbschlosser approved these changes Mar 16, 2022

report an error if num_channels is not divisible by num_groups when d…

edbcb36

…efine nn.GroupNorm

XiaobingSuper force-pushed the xiaobing/group_norm_doc branch from 8851a0c to edbcb36 Compare March 16, 2022 13:48

fix test_nn.py error

7f60ddc

fix test_module_init.py error

6b4b676

XiaobingSuper force-pushed the xiaobing/group_norm_doc branch from daeda5b to 6b4b676 Compare March 17, 2022 01:01

pytorchmergebot closed this in 4a0f6e6 Mar 17, 2022

pmeier mentioned this pull request Mar 18, 2022

test/test_models::test_mobile_net_norm_layer is broken pytorch/vision#5642

Closed

gchanan added the module: bc-breaking Related to a BC-breaking change label Mar 24, 2022

thiagocrepaldi pushed a commit to thiagocrepaldi/detectron2 that referenced this pull request Jul 22, 2022

Fix BC issue with PyTorch's master torch.nn.GroupNorm

33f1a67

Refer to pytorch/pytorch#74293

report an error if num_channels is not divisible by num_groups for nn.GroupNorm #74293
XiaobingSuper wants to merge 3 commits into pytorch:master from XiaobingSuper:xiaobing/group_norm_doc

7 participants
