[jit] lower batchmm to non-diff optimization #19987
Conversation
lower batchmm to non-diff optimization gh-metadata: pytorch pytorch 19987 gh/wanchaol/1/head
torch/csrc/jit/graph_executor.cpp
Outdated
for (const auto& pass : getCustomPasses()) {
  pass(graph);
}
// decomposition pass, decompose certain ops that will be used in the following
can you move this comment to the diff above?
… optimization" lower batchmm to non-diff optimization gh-metadata: pytorch pytorch 19987 gh/wanchaol/1/head
lower batchmm to non-diff optimization gh-metadata: pytorch pytorch 19987 gh/wanchaol/1/head
|
This is not the meaning of non-differentiable optimization passes! The point was that after the differentiable optimizations the graph can still be run with autograd enabled, not necessarily be symbolically differentiated. Why did we change this?
@apaszke hmmm ok, but I believe the graph after the non-differentiable optimization passes can still be run with autograd enabled; the backward will still go through autograd. The reason we need this is that, in the stacked diffs above, we want to lower the addmm/linear decomposition in a later pass, after the custom fusion passes and before fusion, rather than before them.
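As a minimal sketch of that point (the function name and shapes below are made up for illustration): a scripted function whose optimized graph may contain non-differentiable rewrites such as the BatchMM ops can still be executed with autograd enabled and differentiated with an ordinary `backward()`:

```python
import torch

# Hypothetical example: even if the optimized graph of this scripted function
# contains non-differentiable rewrites such as the BatchMM ops, the rewritten
# ops are still ordinary JIT ops at runtime, so running with autograd enabled
# and calling backward() works as usual.
@torch.jit.script
def summed_mms(x, a, b):
    return (x.mm(a) + x.mm(b)).sum()

x = torch.randn(8, 8, requires_grad=True)
a = torch.randn(8, 8)
b = torch.randn(8, 8)
summed_mms(x, a, b).backward()  # gradient flows through the optimized graph
print(x.grad.shape)             # torch.Size([8, 8])
```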
Stack from ghstack:
Summary:
`batchmm` is actually a non-differentiable optimization pass: it transforms the graph by replacing `mm`s with `prim::BatchMM` Side/Reduce ops, and the registered prim ops then execute the `mm`s. This goes through autograd, not autodiff, so we can postpone the batchmm pass until right before the fusion pass. This makes the separation between differentiable and non-differentiable optimizations clearer, and also serves as the first step toward decomposing `addmm` after the custom fusion pass.
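For reference, a rough sketch of the kind of pattern the BatchMM pass targets and how one might inspect the optimized graph for it. The function and shapes are illustrative, and `graph_for` is a debug helper whose output depends on the PyTorch build and executor:

```python
import torch

# Hypothetical example: several mms whose results are summed is the tree-reduce
# pattern BatchMM coalesces; many mms sharing one operand is the "side" pattern.
@torch.jit.script
def mm_tree(x, a, b, c):
    return x.mm(a) + x.mm(b) + x.mm(c)

inputs = [torch.randn(16, 16) for _ in range(4)]
mm_tree(*inputs)                   # run once so an optimized plan is recorded
print(mm_tree.graph_for(*inputs))  # look for the batched / tree-reduced mm ops
```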
Test Plan:
Test that the LSTM graph has not changed as a result of this change.
Differential Revision: D15190356