Skip to content

Conversation

@zdevito
Copy link
Contributor

@zdevito zdevito commented May 31, 2019

Stack from ghstack:

Summary: In order to support calling functions without inlining,
the interpreter will need to push another function 'frame' somewhere
when we invoke the call operator. This behavior is not easily expressible
in the current interpreter since ops can only push/pop to the stack.

Furthermore, we have accumulated several other warts because of the design:

  • every Operation needs to return an int because a few ops branch
  • aten::wait has to communicate with the core interpreter loop via exceptions.
  • The way we encode Loop is a complicated desugaring because implementing
    it as an operator that can only push/pop the stack is hard.

To enable an easy implementation of the call instruction, and to fix the
above warts this patch changes the format of the interpreter instructions
to make it easier to encode arbitrary functionality.

Notes:

  • Instructions are now 64-bit words with an opcode, and two arguments
  • The var-arg register push/pop/move parts of the instructions are gone.
    Explicit instructions LOAD/STORE/MOVE now encode this logic.
  • LOOP/WAIT are directly implemented as operators. We no longer desugar.
  • We no longer need to lower Param and Return nodes, avoiding one
    preprocessing pass.
  • Debug info is stored off to the side since now multiple Instructions
    may correspond to one node in the graph.

Future:

  • This patch should not be merged on its own. A followup will add a
    pass that optimizes the register push/pop logic to cut down the
    number of raw instructions. The pretty printer already does something
    similar to inline instructions.
  • The follow up patch will require performance testing before merging
    to make sure this does not regress interpreter overhead.

Test Plan: test_jit.py

Differential Revision: D15572818

Summary: In order to support calling functions without inlining,
the interpreter will need to push another function 'frame' somewhere
when we invoke the call operator. This behavior is not easily expressible
in the current interpreter since ops can only push/pop to the stack.

Furthermore, we have accumulated several other warts because of the design:

* every Operation needs to return an int because a few ops branch
* aten::wait has to communicate with the core interpreter loop via exceptions.
* The way we encode Loop is a complicated desugaring because implementing
  it as an operator that can only push/pop the stack is hard.

To enable an easy implementation of the call instruction, and to fix the
above warts this patch changes the format of the interpreter instructions
to make it easier to encode arbitrary functionality.

Notes:
* Instructions are now 64-bit words with an opcode, and two arguments
* The var-arg register push/pop/move parts of the instructions are gone.
  Explicit instructions LOAD/STORE/MOVE now encode this logic.
* LOOP/WAIT are directly implemented as operators. We no longer desugar.
* We no longer need to lower Param and Return nodes, avoiding one
  preprocessing pass.
* Debug info is stored off to the side since now multiple Instructions
  may correspond to one node in the graph.

Future:
* This patch should not be merged on its own. A followup will add a
  pass that optimizes the register push/pop logic to cut down the
  number of raw instructions. The pretty printer already does something
  similar to inline instructions.
* The follow up patch will require performance testing before merging
  to make sure this does not regress interpreter overhead.

Test Plan: test_jit.py
@pytorchbot pytorchbot added the oncall: jit Add this issue/PR to JIT oncall triage queue label May 31, 2019
zdevito added 2 commits May 31, 2019 10:00
Prepare interpreter for function calling
Summary: In order to support calling functions without inlining,
the interpreter will need to push another function 'frame' somewhere
when we invoke the call operator. This behavior is not easily expressible
in the current interpreter since ops can only push/pop to the stack.

Furthermore, we have accumulated several other warts because of the design:

* every Operation needs to return an int because a few ops branch
* aten::wait has to communicate with the core interpreter loop via exceptions.
* The way we encode Loop is a complicated desugaring because implementing
  it as an operator that can only push/pop the stack is hard.

To enable an easy implementation of the call instruction, and to fix the
above warts this patch changes the format of the interpreter instructions
to make it easier to encode arbitrary functionality.

Notes:
* Instructions are now 64-bit words with an opcode, and two arguments
* The var-arg register push/pop/move parts of the instructions are gone.
  Explicit instructions LOAD/STORE/MOVE now encode this logic.
* LOOP/WAIT are directly implemented as operators. We no longer desugar.
* We no longer need to lower Param and Return nodes, avoiding one
  preprocessing pass.
* Debug info is stored off to the side since now multiple Instructions
  may correspond to one node in the graph.

Future:
* This patch should not be merged on its own. A followup will add a
  pass that optimizes the register push/pop logic to cut down the
  number of raw instructions. The pretty printer already does something
  similar to inline instructions.
* The follow up patch will require performance testing before merging
  to make sure this does not regress interpreter overhead.

Test Plan: test_jit.py

gh-metadata: pytorch pytorch 21185 gh/zdevito/43/head
Prepare interpreter for function calling
Summary: In order to support calling functions without inlining,
the interpreter will need to push another function 'frame' somewhere
when we invoke the call operator. This behavior is not easily expressible
in the current interpreter since ops can only push/pop to the stack.

Furthermore, we have accumulated several other warts because of the design:

* every Operation needs to return an int because a few ops branch
* aten::wait has to communicate with the core interpreter loop via exceptions.
* The way we encode Loop is a complicated desugaring because implementing
  it as an operator that can only push/pop the stack is hard.

To enable an easy implementation of the call instruction, and to fix the
above warts this patch changes the format of the interpreter instructions
to make it easier to encode arbitrary functionality.

Notes:
* Instructions are now 64-bit words with an opcode, and two arguments
* The var-arg register push/pop/move parts of the instructions are gone.
  Explicit instructions LOAD/STORE/MOVE now encode this logic.
* LOOP/WAIT are directly implemented as operators. We no longer desugar.
* We no longer need to lower Param and Return nodes, avoiding one
  preprocessing pass.
* Debug info is stored off to the side since now multiple Instructions
  may correspond to one node in the graph.

Future:
* This patch should not be merged on its own. A followup will add a
  pass that optimizes the register push/pop logic to cut down the
  number of raw instructions. The pretty printer already does something
  similar to inline instructions.
* The follow up patch will require performance testing before merging
  to make sure this does not regress interpreter overhead.

Test Plan: test_jit.py

gh-metadata: pytorch pytorch 21185 gh/zdevito/43/head
Prepare interpreter for function calling
Summary: In order to support calling functions without inlining,
the interpreter will need to push another function 'frame' somewhere
when we invoke the call operator. This behavior is not easily expressible
in the current interpreter since ops can only push/pop to the stack.

Furthermore, we have accumulated several other warts because of the design:

* every Operation needs to return an int because a few ops branch
* aten::wait has to communicate with the core interpreter loop via exceptions.
* The way we encode Loop is a complicated desugaring because implementing
  it as an operator that can only push/pop the stack is hard.

To enable an easy implementation of the call instruction, and to fix the
above warts this patch changes the format of the interpreter instructions
to make it easier to encode arbitrary functionality.

Notes:
* Instructions are now 64-bit words with an opcode, and two arguments
* The var-arg register push/pop/move parts of the instructions are gone.
  Explicit instructions LOAD/STORE/MOVE now encode this logic.
* LOOP/WAIT are directly implemented as operators. We no longer desugar.
* We no longer need to lower Param and Return nodes, avoiding one
  preprocessing pass.
* Debug info is stored off to the side since now multiple Instructions
  may correspond to one node in the graph.

Future:
* This patch should not be merged on its own. A followup will add a
  pass that optimizes the register push/pop logic to cut down the
  number of raw instructions. The pretty printer already does something
  similar to inline instructions.
* The follow up patch will require performance testing before merging
  to make sure this does not regress interpreter overhead.

Test Plan: test_jit.py

gh-metadata: pytorch pytorch 21185 gh/zdevito/43/head
Copy link
Member

@suo suo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I really like how much easier the code is to follow now

_(OP, "O") /* invoke operator X */ \
_(LOAD, "R") /* push a value from a register X */ \
_(MOVE, "R") /* push a value from register X, clearing the register */ \
_(STOREN, "RI") /* store N values to registers [X, X+N) */ \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it useful to have both STOREN and STORE?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably STORE is slightly faster for storing a single element and almost everything thing returns one argument.

_(DROP, "") /* drop 1 value from the top of the stack */ \
_(DROPR, "R") /* clear register X */ \
_(LOADC, "C") /* push the constant X */ \
_(JF, "P") /* pop the top of the stack, if false, branch to P */ \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"jump to X"
also, can we call this JMP_IF?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change it to JMP_IF_FALSE.

};

// we are suspending, so we need to reset the stack to where we
// started if it started empty, except for the inputs we can avoid a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think two separate comments got joined here

std::vector<IValue> constant_table_;
std::vector<Operation> operator_table_;
int register_size_ = 0;
size_t n_outputs;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these seem unnecessary, can we not just get these values from graph_?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the general idea is to not rely on having the graph_ except for debugging, making it clearer what the interpreter needs from the graph.

}
return false;
case WAIT:
auto future = stack.back().toFuture();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is confusing because we have a member named future which is shadowed here. Can we rename the member to future_?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the followup PR already does this.

@ezyang ezyang added facebook and removed facebook labels Jun 5, 2019
Prepare interpreter for function calling
Summary: In order to support calling functions without inlining,
the interpreter will need to push another function 'frame' somewhere
when we invoke the call operator. This behavior is not easily expressible
in the current interpreter since ops can only push/pop to the stack.

Furthermore, we have accumulated several other warts because of the design:

* every Operation needs to return an int because a few ops branch
* aten::wait has to communicate with the core interpreter loop via exceptions.
* The way we encode Loop is a complicated desugaring because implementing
  it as an operator that can only push/pop the stack is hard.

To enable an easy implementation of the call instruction, and to fix the
above warts this patch changes the format of the interpreter instructions
to make it easier to encode arbitrary functionality.

Notes:
* Instructions are now 64-bit words with an opcode, and two arguments
* The var-arg register push/pop/move parts of the instructions are gone.
  Explicit instructions LOAD/STORE/MOVE now encode this logic.
* LOOP/WAIT are directly implemented as operators. We no longer desugar.
* We no longer need to lower Param and Return nodes, avoiding one
  preprocessing pass.
* Debug info is stored off to the side since now multiple Instructions
  may correspond to one node in the graph.

Future:
* This patch should not be merged on its own. A followup will add a
  pass that optimizes the register push/pop logic to cut down the
  number of raw instructions. The pretty printer already does something
  similar to inline instructions.
* The follow up patch will require performance testing before merging
  to make sure this does not regress interpreter overhead.

Test Plan: test_jit.py

gh-metadata: pytorch pytorch 21185 gh/zdevito/43/head
Prepare interpreter for function calling
Summary: In order to support calling functions without inlining,
the interpreter will need to push another function 'frame' somewhere
when we invoke the call operator. This behavior is not easily expressible
in the current interpreter since ops can only push/pop to the stack.

Furthermore, we have accumulated several other warts because of the design:

* every Operation needs to return an int because a few ops branch
* aten::wait has to communicate with the core interpreter loop via exceptions.
* The way we encode Loop is a complicated desugaring because implementing
  it as an operator that can only push/pop the stack is hard.

To enable an easy implementation of the call instruction, and to fix the
above warts this patch changes the format of the interpreter instructions
to make it easier to encode arbitrary functionality.

Notes:
* Instructions are now 64-bit words with an opcode, and two arguments
* The var-arg register push/pop/move parts of the instructions are gone.
  Explicit instructions LOAD/STORE/MOVE now encode this logic.
* LOOP/WAIT are directly implemented as operators. We no longer desugar.
* We no longer need to lower Param and Return nodes, avoiding one
  preprocessing pass.
* Debug info is stored off to the side since now multiple Instructions
  may correspond to one node in the graph.

Future:
* This patch should not be merged on its own. A followup will add a
  pass that optimizes the register push/pop logic to cut down the
  number of raw instructions. The pretty printer already does something
  similar to inline instructions.
* The follow up patch will require performance testing before merging
  to make sure this does not regress interpreter overhead.

Test Plan: test_jit.py

gh-metadata: pytorch pytorch 21185 gh/zdevito/43/head
@zou3519 zou3519 deleted the gh/zdevito/43/head branch June 8, 2019 03:59
@facebook-github-bot
Copy link
Contributor

@zdevito merged this pull request in c53e4d0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merged oncall: jit Add this issue/PR to JIT oncall triage queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants