[jit] split canonicalize_ops, make a decompose pass #19988
Conversation
torch/csrc/jit/graph_executor.cpp
Outdated
DecomposeOps(graph);
// Autodiff and decomposition pass will replace some parts of graph with new graph
// these new graphs usually miss shape information on nodes, so we propagate shapes
// PropagateInputShapes(graph);
why is this commented out?
There are some issues with nn.Dropout when this is enabled. I will need to look into that and will bring the pass back.
zdevito left a comment:
I think the pass changes are good. Can you address the comments about using compilation unit to load code?
namespace torch {
namespace jit {

Value* decomposeOp(
What is this supposed to do? If it is supposed to execute this once, it doesn't work: it is allocating a std::once_flag each time through the function.
refactor:
static std::shared_ptr<Graph> addmm_graph = createAddMMGraph();
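For context on the refactor above: a std::once_flag declared as a local is a fresh flag on every call, so std::call_once never gates anything across calls, whereas a function-local static is initialized exactly once (and thread-safely since C++11). A minimal standalone sketch of the two patterns, using a stand-in Graph type rather than the PR's actual code:

#include <memory>
#include <mutex>

struct Graph {};  // stand-in for torch::jit::Graph in this sketch

std::shared_ptr<Graph> createAddMMGraph() { return std::make_shared<Graph>(); }

// Broken: the flag is a fresh object on every call, so the lambda runs every
// time and the graph is rebuilt on each invocation.
std::shared_ptr<Graph> getAddMMGraphBroken() {
  std::once_flag flag;
  std::shared_ptr<Graph> graph;
  std::call_once(flag, [&] { graph = createAddMMGraph(); });
  return graph;
}

// Fixed: a function-local static is constructed exactly once, on first call.
std::shared_ptr<Graph> getAddMMGraphCached() {
  static std::shared_ptr<Graph> addmm_graph = createAddMMGraph();
  return addmm_graph;
}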
I would also prefer if we added a constructor to CompilationUnit so this can just be written as the line below. I don't want it to end up being the case that everyone copy/pastes this pattern.
static script::CompilationUnit addmm("....");
torch/csrc/jit/graph_executor.cpp
Outdated
// these new graphs usually consist of control flow and miss shape information on nodes, so we run shape
// prop and constant prop, as well as loop unrolling in some cases
PropagateInputShapes(graph);
ConstantPropagation(graph);
This should call peephole optimization as well, I think. And is there a reason not to just call into runOptimization?
runOptimization has a bunch of other passes that we don't need to run, and I want to minimize the passes that need to be run. Plus, it does not run PropagateInputShapes, so I just made a separate group of passes.
Could you call PropagateInputShapes and then call into runOptimization? The downside of the current approach is that we're now maintaining two separate sets of optimizations, and you might miss optimizations that you thought you didn't need (as with prim::is_cuda, because PeepholeOptimize wasn't called).
Sounds legit. The only concern I had is that this will increase our compilation time quite a bit; that's why I wanted to separate the passes in the first place. But given that we are potentially missing a bunch of optimizations, I will change it to just run the optimization :)
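For reference, the call order agreed on above would look roughly like this sketch. The pass names are the ones used in this thread; treating runOptimization as callable on just the graph here is an assumption, not the exact final diff in graph_executor.cpp:

// After autodiff / DecomposeOps splice in freshly generated subgraphs:
PropagateInputShapes(graph);  // restore shape information on the new nodes
runOptimization(graph);       // then reuse the shared pipeline (constant propagation,
                              // peephole, etc.) instead of a hand-picked subset of passes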
…ass" split canonicalize_ops, make a decompose pass gh-metadata: pytorch pytorch 19988 gh/wanchaol/2/head
split canonicalize_ops, make a decompose pass gh-metadata: pytorch pytorch 19988 gh/wanchaol/2/head
…it] split canonicalize_ops, make a decompose pass" split canonicalize_ops, make a decompose pass gh-metadata: pytorch pytorch 19988 gh/wanchaol/2/head
split canonicalize_ops, make a decompose pass gh-metadata: pytorch pytorch 19988 gh/wanchaol/2/head
split canonicalize_ops, make a decompose pass gh-metadata: pytorch pytorch 19988 gh/wanchaol/2/head
Stack from ghstack:
Summary:
This PR makes the addmm decomposition happen later (after custom fusion, before batchmm), so custom fusion can see high-level operator information (such as addmm and linear) instead of mm + add.

Test plan:
Test if the decomposition pass works.
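For context, the decomposition being deferred is the standard addmm identity below; this restates the operator's documented semantics rather than the exact replacement graph in this PR. With the default beta = alpha = 1 it reduces to an mm followed by an add:

\mathrm{addmm}(C, A, B, \beta, \alpha) = \beta\,C + \alpha\,(A B)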
Differential Revision: D15190355