
Conversation

@wanchaol (Collaborator) commented May 1, 2019

Stack from ghstack:

Summary:
batchmm is actually a non-differentiable optimization pass: it does the graph transformation, replacing mms with prim::BatchMMSide/Reduce, and the registered prim ops then execute the mms. This goes through autograd rather than autodiff, so we can postpone the batchmm pass until right before the fusion pass. That makes the separation between differentiable and non-differentiable optimizations clearer, and also serves as a first step toward decomposing addmm after the custom fusion pass (see the sketch after this description).

Test Plan:
Test that the LSTM graph has not changed as a result of this change.

Differential Revision: D15190356

lower batchmm to non-diff optimization

gh-metadata: pytorch pytorch 19987 gh/wanchaol/1/head
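
To make the intended ordering concrete, here is a minimal, self-contained sketch of the idea, with graph passes modeled as plain functions that just print their names. Everything in it (the Graph stand-in, runNondiffOptimization, BatchMM, FuseGraph, and the getCustomPasses registry mirroring the loop in the review snippet further down) is an illustrative assumption, not the actual PyTorch JIT code or API.

// Illustrative sketch only: passes are stand-ins that print their names;
// this is not the real torch::jit graph executor code.
#include <functional>
#include <iostream>
#include <vector>

struct Graph {};  // stand-in for a JIT graph

using GraphPass = std::function<void(Graph&)>;

// Stand-in for the custom-pass registry walked in the review snippet below.
std::vector<GraphPass>& getCustomPasses() {
  static std::vector<GraphPass> passes;
  return passes;
}

// batchmm rewrites groups of mms into prim::BatchMMSide/Reduce nodes whose
// registered prim ops execute the mms under autograd, so the pass does not
// need to run before autodiff and can sit right before the fuser.
void BatchMM(Graph&)   { std::cout << "batchmm (non-diff, runs via autograd)\n"; }
void FuseGraph(Graph&) { std::cout << "fusion\n"; }

// After this change, batchmm lives with the non-differentiable optimizations,
// right before fusion, instead of with the differentiable ones.
void runNondiffOptimization(Graph& graph) {
  for (const auto& pass : getCustomPasses()) {
    pass(graph);       // custom fusion passes first
  }
  BatchMM(graph);      // then batchmm
  FuseGraph(graph);    // then the fuser
}

int main() {
  getCustomPasses().push_back([](Graph&) { std::cout << "custom fusion pass\n"; });
  Graph g;
  runNondiffOptimization(g);
}

Running it just prints the pass order (custom fusion pass, batchmm, fusion), which is the ordering the summary describes.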
@wanchaol wanchaol requested review from apaszke, bddppq, bwasti and zdevito May 2, 2019 01:08
lower batchmm to non-diff optimization

gh-metadata: pytorch pytorch 19987 gh/wanchaol/1/head
@wanchaol wanchaol changed the title from "lower batchmm to non-diff optimization" to "[jit] lower batchmm to non-diff optimization" on May 2, 2019
for (const auto& pass : getCustomPasses()) {
  pass(graph);
}
// decomposition pass, decompose certain ops that will be used in the following
Contributor commented:

can you move this comment to the diff above?

wanchaol added 2 commits May 5, 2019 21:15
… optimization"

lower batchmm to non-diff optimization

gh-metadata: pytorch pytorch 19987 gh/wanchaol/1/head
lower batchmm to non-diff optimization

gh-metadata: pytorch pytorch 19987 gh/wanchaol/1/head
@zou3519 zou3519 deleted the gh/wanchaol/1/head branch May 6, 2019 23:00
@facebook-github-bot (Contributor) commented:

@wanchaol merged this pull request in 8fbde94.

@apaszke (Contributor) commented May 7, 2019

This is not the meaning of non-differentiable optimization passes! The point was that after the differentiable optimizations the graph can still be run with autograd enabled, not necessarily symbolically differentiated. Why did we change this?

@wanchaol (Collaborator, Author) commented May 7, 2019

This is not the meaning of non-differentiable optimization passes! The point was that after the differentiable optimizations the graph can still be run with autograd enabled, not necessarily symbolically differentiated. Why did we change this?

@apaszke hmmm ok, but I believe the graph after the non-differentiable optimization passes can still be run with autograd enabled: the backward graph goes through compileSpec as well and runs fusion, and DifferentiableOp still runs autograd as needed.

The reason we need this is that, in the stacked diffs above, we want to lower the addmm/linear decomposition into a later pass that runs after custom fusion and before fusion, rather than doing it in canonicalizeOps (which happens well before custom fusion). This way we still preserve any addmm decomposition, and batchmm happens before fusion, minimizing the performance impact.
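
To make the ordering argument in this reply concrete, here is another small, self-contained sketch in the same toy style; CanonicalizeOps, DecomposeAddmm, RunCustomFusion, BatchMM and FuseGraph are stand-in names for illustration, not the real JIT passes or their signatures.

// Illustrative stand-ins only; not the real PyTorch JIT passes or pipeline.
#include <iostream>

struct Graph {};

void CanonicalizeOps(Graph&)  { std::cout << "canonicalize ops\n"; }
void DecomposeAddmm(Graph&)   { std::cout << "decompose addmm/linear\n"; }
void RunCustomFusion(Graph&)  { std::cout << "custom fusion passes\n"; }
void BatchMM(Graph&)          { std::cout << "batchmm\n"; }
void FuseGraph(Graph&)        { std::cout << "fusion\n"; }

// Previous ordering: addmm/linear are decomposed around canonicalizeOps,
// well before custom fusion, so custom passes only ever see the decomposed form.
void oldOrdering(Graph& g) {
  CanonicalizeOps(g);
  DecomposeAddmm(g);
  RunCustomFusion(g);
  BatchMM(g);
  FuseGraph(g);
}

// Intended ordering: decompose only after custom fusion, so a backend that
// prefers whole addmm/linear nodes still sees them, while batchmm and the
// fuser still run on the decomposed graph just before fusion.
void newOrdering(Graph& g) {
  CanonicalizeOps(g);
  RunCustomFusion(g);
  DecomposeAddmm(g);
  BatchMM(g);
  FuseGraph(g);
}

int main() {
  Graph g;
  std::cout << "old ordering:\n";
  oldOrdering(g);
  std::cout << "new ordering:\n";
  newOrdering(g);
}

The only difference is where DecomposeAddmm sits relative to RunCustomFusion; batchmm and fusion stay at the end in both orderings.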


Labels: oncall: jit