
Conversation

@apaszke (Contributor) commented Dec 12, 2018

We don't support reductions yet, but simply decomposing batch_norm
into a kernel that computes the stats, and then fusing everything else
with ReLU and the following pointwise ops, provides nice speedups.

Note that this is limited to inference mode for now, because we
don't support convolutions and batch norm in AD, so the fuser isn't
applied to those parts.

This commit gives us a 7% end-to-end speedup for ResNet50 with batch size 32. Note that this only applies to inference mode at the moment, due to the lack of AD support for CNN operations (I'll be adding that soon), and not to the standard `torchvision` models, because they use in-place ops which aren't supported by the fuser (we need a way of proving that de-inplacing them is safe).
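As an illustration of the decomposition described above, here is a minimal NumPy sketch (not the actual fuser code) of splitting inference-mode batch norm into a stats reduction plus a purely pointwise tail that can fuse with ReLU:

```python
import numpy as np

def batch_norm_stats(x):
    # the reduction "kernel": per-channel mean and variance over
    # the batch and all feature dimensions of an N x C x F... tensor
    axes = (0,) + tuple(range(2, x.ndim))
    return x.mean(axis=axes), x.var(axis=axes)

def bn_relu_pointwise(x, mean, var, gamma, beta, eps=1e-5):
    # everything left after the reduction is pointwise, so a fuser can
    # combine it with ReLU and any following pointwise ops
    shape = (1, -1) + (1,) * (x.ndim - 2)  # broadcast per-channel vectors
    invstd = 1.0 / np.sqrt(var + eps)
    y = (x - mean.reshape(shape)) * (gamma * invstd).reshape(shape) + beta.reshape(shape)
    return np.maximum(y, 0.0)  # fused ReLU
```

The function names here are illustrative; the point is only that the final normalization, scale/shift, and ReLU are all elementwise once the per-channel statistics exist.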

cc @zou3519 @zdevito @mruberry @ngimel

@facebook-github-bot added the `oncall: jit` label (add this issue/PR to the JIT oncall triage queue) Dec 12, 2018
@zou3519 (Contributor) commented Dec 14, 2018

@apaszke there are some AutodiffSubgraphSlicing tests failing. I'm not sure whether they're related to your PR.

@fmassa fmassa mentioned this pull request Dec 17, 2018
@apaszke apaszke closed this Dec 19, 2018
@apaszke apaszke reopened this Dec 19, 2018
@zou3519 (Contributor) left a comment:

(still reading through, not a full review yet)

Contributor:

If we chunk the output of batchnorm, then the chunk wouldn't get moved past the pointwise ops of the batchnorm because there isn't an opportunity to decompose the batchnorm, right? I'm not sure if people do this in practice though.

Contributor (Author):

It wouldn't, but I really don't expect this to be the common case, and I don't want to make the code for chunk more complicated for no good reason.

Contributor:

nit: For some definitions of Fusable, cat and chunk nodes are also fusible, so the naming of this function (isFusable) bothers me a little. Maybe call it something like "isDecomposibleIntoFusibleMap"? (although not all of batchnorm is decomposible into pointwise ops, only the last piece of it is...)

Contributor (Author):

I agree our definition of "fusable" is completely messed up, but clearing that up is material for another PR. I simply followed whatever we were already using.

Contributor:

Does this work? Doesn't `Tensor?` mean that the tensor is either defined or an undefined tensor? It's strange that we can write `Optional[Tensor]` in TorchScript for that.

Contributor (Author):

Yep, `Optional[Tensor]` means that it can be undefined on the C++ side 😕
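For context, a small TorchScript sketch of the `Optional[Tensor]` annotation being discussed (illustrative, not code from this PR): passing `None` from Python corresponds to an undefined `at::Tensor` on the C++ side.

```python
import torch
from typing import Optional

@torch.jit.script
def scale(x: torch.Tensor, weight: Optional[torch.Tensor]) -> torch.Tensor:
    # a None weight here maps to an undefined at::Tensor in C++
    if weight is not None:
        return x * weight
    return x
```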

Contributor:

Yikes, thanks for the clarification

Contributor:

What does `ncf` stand for?

Contributor:

nit: `_ncf_reshape` might be a better name, because we are reshaping instead of expanding

Contributor (Author):

It's as in NCHW, but it works with any number of dimensions, hence F for features instead of HW. I can rename it to reshape if you really want, but I wouldn't like to block this PR on it if that's the only problem.

test/test_jit.py Outdated
Contributor:

Add to TestFuser maybe?

Contributor (Author):

Yep, will move.

@zou3519 (Contributor) commented Dec 19, 2018:

How does threshold play a role in the batchnorm fusion?

Should this be ${0} <= ${1}? (not sure if this makes a difference)
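For reference, the pointwise semantics of threshold can be sketched as follows (NumPy, illustrative); whether the comparison is strict makes no difference for ReLU, since both branches yield 0 at the boundary:

```python
import numpy as np

def threshold(x, threshold_value, value):
    # pointwise: keep x where it exceeds the threshold, else substitute value;
    # relu(x) is threshold(x, 0, 0), which is what lets it fuse with the
    # pointwise tail of the decomposed batch norm
    return np.where(x > threshold_value, x, value)
```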

Contributor:

We should also probably add a correctness test for this to TestFuser

Contributor (Author):

It's an orthogonal improvement to the fuser; it lets us fuse whole blocks between convs in ResNets.

test/test_jit.py Outdated
Contributor:

Can we also add a correctness test that runs with the JIT and checks the output?

Contributor (Author):

Yup, will do.

@zou3519 (Contributor) left a comment:

BN changes look fine. I had some minor questions and comments; please read.

@facebook-github-bot left a comment:

@zou3519 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@zou3519 (Contributor) left a comment:

lgtm!

@zou3519 (Contributor) commented Dec 26, 2018

@apaszke looks like you might have to rebase this

@facebook-github-bot left a comment:

@zou3519 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@zou3519 (Contributor) commented Dec 27, 2018

@apaszke the test tolerance or the magnitude of the inputs might need to be updated:

Dec 27 18:20:34 ======================================================================
Dec 27 18:20:34 FAIL: test_fuse_batch_norm (__main__.TestFuser)
Dec 27 18:20:34 ----------------------------------------------------------------------
Dec 27 18:20:34 Traceback (most recent call last):
Dec 27 18:20:34   File "test_jit.py", line 10221, in test_fuse_batch_norm
Dec 27 18:20:34     self.assertEqual(out, out_noopt)
Dec 27 18:20:34   File "/var/lib/jenkins/workspace/test/common_utils.py", line 418, in assertEqual
Dec 27 18:20:34     assertTensorsEqual(x, y)
Dec 27 18:20:34   File "/var/lib/jenkins/workspace/test/common_utils.py", line 410, in assertTensorsEqual
Dec 27 18:20:34     self.assertLessEqual(max_err, prec, message)
Dec 27 18:20:34 AssertionError: tensor(1.2243e-05, device='cuda:0') not less than or equal to 1e-05 : 
Dec 27 18:20:34 
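A max error of 1.2e-5 is the kind of difference you get when a float32 reduction is reassociated, as happens when batch norm is decomposed. A sketch of the max-abs-error criterion with a loosened tolerance (mirroring, not copying, the `assertTensorsEqual` check from the traceback):

```python
import numpy as np

def assert_tensors_close(x, y, prec=2e-5):
    # same max-absolute-error criterion the test harness uses, with a
    # tolerance loose enough for reordered float32 accumulation
    max_err = np.abs(np.asarray(x) - np.asarray(y)).max()
    assert max_err <= prec, f"{max_err} not less than or equal to {prec}"
```

The `prec=2e-5` value here is only an assumption; shrinking the input magnitudes is the other fix the comment above suggests.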

@apaszke (Contributor, Author) commented Dec 30, 2018

ROCm failures look unrelated

@apaszke (Contributor, Author) commented Dec 30, 2018

I forgot to add a CPU implementation for `batch_norm_update_stats`, which is fixed in the latest commit. Please review the ATen changes again!

@apaszke (Contributor, Author) commented Jan 2, 2019

Red jobs are CI failures.

@facebook-github-bot left a comment:

@zou3519 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor:

`momentum` is unused in this function; it's only necessary for updating the running stats
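For context, the role `momentum` plays in the running-stat update can be sketched as follows (the usual batch-norm semantics, not the actual ATen code):

```python
def update_running_stats(mean, var, running_mean, running_var, momentum, n):
    # batch statistics use the biased variance, but the running estimate
    # is updated with the unbiased one (Bessel's correction over n samples)
    unbiased_var = var * n / (n - 1)
    new_mean = (1 - momentum) * running_mean + momentum * mean
    new_var = (1 - momentum) * running_var + momentum * unbiased_var
    return new_mean, new_var
```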

Contributor:

This is unused now

Contributor:

Nit: `epsilon` is not used; maybe remove the variable name?

@zou3519 (Contributor) left a comment:

LGTM. There are some unused variables; let me know if you want to land this as-is or clean them up first.

@zou3519 (Contributor) commented Jan 3, 2019

@apaszke this needs a rebase now

@facebook-github-bot left a comment:

@zou3519 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot left a comment:

@zou3519 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Jan 8, 2019
Summary:
We don't support reductions yet, but simply decomposing batch_norm
into a kernel that computes the stats, and then fusing everything else
with ReLU and the following pointwise ops, provides nice speedups.

Note that this is limited to inference mode for now, because we
don't support convolutions and batch norm in AD, so the fuser isn't
applied to those parts.

This commit gives us a 7% end-to-end speedup for ResNet50 with batch size 32. Note that this only applies to inference mode at the moment due to lack of AD support for CNN operations (I'll be adding that soon), and not to the standard `torchvision` models, because they use in-place ops which aren't supported by the fuser (we need a way of proving that de-inplacing them is safe).

cc zou3519 zdevito mruberry ngimel
Pull Request resolved: pytorch/pytorch#15146

Differential Revision: D13548303

Pulled By: zou3519

fbshipit-source-id: a2e2e5abc383f637fae19bd1b423f20c2cbc056a
@apaszke apaszke mentioned this pull request Jan 9, 2019
facebook-github-bot pushed a commit that referenced this pull request Jan 10, 2019
Summary:
Resubmit of #15146, which has been accidentally reverted.
Pull Request resolved: #15897

Differential Revision: D13616093

Pulled By: zou3519

fbshipit-source-id: 0c3a3bec8f9fed57274da9f6c7cf40cbc05cf91a
@chanil1218 commented:
@apaszke
Thank you for releasing this nice code!
My understanding is that JIT-fused batch_norm should only be used at inference time for speedup purposes, because batch norm is not supported in AD.

Is there any update on AD support for batch norm, so that fused batch_norm can be used at training time? Or is there a related issue where I could track the progress of that support?

Labels: oncall: jit, open source

5 participants