Prepare interpreter for function calling #21185
Conversation
Summary: In order to support calling functions without inlining, the interpreter will need to push another function 'frame' somewhere when we invoke the call operator. This behavior is not easily expressible in the current interpreter since ops can only push/pop to the stack. Furthermore, we have accumulated several other warts because of the design:

* Every Operation needs to return an int because a few ops branch.
* aten::wait has to communicate with the core interpreter loop via exceptions.
* The way we encode Loop is a complicated desugaring because implementing it as an operator that can only push/pop the stack is hard.

To enable an easy implementation of the call instruction, and to fix the above warts, this patch changes the format of the interpreter instructions to make it easier to encode arbitrary functionality.

Notes:

* Instructions are now 64-bit words with an opcode and two arguments.
* The var-arg register push/pop/move parts of the instructions are gone. Explicit LOAD/STORE/MOVE instructions now encode this logic.
* LOOP/WAIT are directly implemented as operators. We no longer desugar them.
* We no longer need to lower Param and Return nodes, avoiding one preprocessing pass.
* Debug info is stored off to the side, since multiple Instructions may now correspond to one node in the graph.

Future:

* This patch should not be merged on its own. A follow-up will add a pass that optimizes the register push/pop logic to cut down the number of raw instructions. The pretty printer already does something similar to inline instructions.
* The follow-up patch will require performance testing before merging to make sure this does not regress interpreter overhead.

Test Plan: test_jit.py
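To make the "opcode plus two arguments" shape concrete, here is a minimal sketch of what a fixed-width, 64-bit instruction word along these lines could look like. The field names, widths, and opcode list below are illustrative assumptions, not the exact layout introduced by this patch:

```cpp
#include <cstdint>

// Illustrative opcode list, taken from the names discussed in this PR.
enum OpCode : uint8_t { OP, LOAD, MOVE, STORE, STOREN, DROP, DROPR, LOADC, JF };

// Hypothetical fixed-width instruction: one opcode plus two integer
// arguments, replacing the old variable-length register lists.
struct Instruction {
  OpCode op;       // which instruction to execute
  uint8_t unused;  // padding so the word stays 64 bits
  uint16_t N;      // second argument, e.g. value count for STOREN
  int32_t X;       // first argument: register, constant index, or jump target
};

static_assert(sizeof(Instruction) == 8, "one instruction per 64-bit word");
```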
suo
left a comment
Looks great! I really like how much easier the code is to follow now.
_(OP, "O") /* invoke operator X */ \
_(LOAD, "R") /* push a value from a register X */ \
_(MOVE, "R") /* push a value from register X, clearing the register */ \
_(STOREN, "RI") /* store N values to registers [X, X+N) */ \
Is it useful to have both STOREN and STORE?
Presumably STORE is slightly faster for storing a single element, and almost everything returns one value.
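To illustrate the distinction, here is a toy sketch (hypothetical helper functions and a stand-in value type, not the patch's interpreter loop): STORE moves the single value on top of the stack straight into register X, while STOREN loops to fill registers [X, X+N):

```cpp
#include <cstddef>
#include <vector>

using Value = int;  // stand-in for the interpreter's real value type

Value pop(std::vector<Value>& stack) {
  Value v = stack.back();
  stack.pop_back();
  return v;
}

// STORE X: pop one value into register X (the common, single-output case).
void store(std::vector<Value>& regs, std::vector<Value>& stack, size_t X) {
  regs[X] = pop(stack);
}

// STOREN X N: pop N values into registers [X, X+N); the value pushed last
// sits on top of the stack, so it lands in the highest register first.
void storen(std::vector<Value>& regs, std::vector<Value>& stack, size_t X, size_t N) {
  for (size_t i = N; i > 0; --i) {
    regs[X + i - 1] = pop(stack);
  }
}
```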
_(DROP, "") /* drop 1 value from the top of the stack */ \
_(DROPR, "R") /* clear register X */ \
_(LOADC, "C") /* push the constant X */ \
_(JF, "P") /* pop the top of the stack, if false, branch to P */ \
"jump to X"
also, can we call this JMP_IF?
I can change it to JMP_IF_FALSE.
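For readers following the thread, the semantics under discussion, written as a toy helper (names made up for illustration): the instruction pops the top of the stack and, if it is false, moves the program counter to target X instead of falling through:

```cpp
#include <cstddef>
#include <vector>

// Illustrative only: a jump-if-false step. `pc` is the current program
// counter and `X` the jump target; returns the next program counter.
size_t jumpIfFalse(std::vector<bool>& stack, size_t pc, size_t X) {
  const bool cond = stack.back();
  stack.pop_back();
  return cond ? pc + 1 : X;
}
```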
};

// we are suspending, so we need to reset the stack to where we
// started if it started empty, except for the inputs we can avoid a
I think two separate comments got joined here
std::vector<IValue> constant_table_;
std::vector<Operation> operator_table_;
int register_size_ = 0;
size_t n_outputs;
These seem unnecessary; can we not just get these values from graph_?
I think the general idea is to not rely on having the graph_ except for debugging, making it clearer what the interpreter needs from the graph.
}
return false;
case WAIT:
auto future = stack.back().toFuture();
This is confusing because we have a member named future which is shadowed here. Can we rename the member to future_?
I think the followup PR already does this.
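For illustration, the kind of shadowing being flagged, with hypothetical types and names (not the real interpreter state): renaming the member with a trailing underscore keeps it distinct from the local pulled off the stack in the WAIT case:

```cpp
#include <memory>

struct Future {};  // stand-in for the real future type

// Hypothetical sketch, not the actual InterpreterState.
struct StateSketch {
  std::shared_ptr<Future> future_;  // member renamed with a trailing underscore

  void onWait(std::shared_ptr<Future> topOfStack) {
    auto future = topOfStack;  // local no longer shadows the member
    future_ = future;          // unambiguous assignment to the member
  }
};
```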
Stack from ghstack:
Differential Revision: D15572818