Reduce number of stack manipulation instructions in interpreter. #21240
Conversation
…reter." unfinished push/pop reduction gh-metadata: pytorch pytorch 21240 gh/zdevito/45/head
suo
left a comment
requesting changes to clear my queue, pls re-request when it's ready
```cpp
// and we can short circuit doing many instructions here
// by either clearing the register (DROPR) or just popping the stack
// (DROP)
if (preprocess_.can_emit_inline[input->node()]) {
```
can you add a comment about what the significance of the check on can_emit_inline is here?
Reduce the number of stack manipulation instructions by finding places where we can operate directly on the stack.
For TestScript, this reduces the total number of instructions emitted across the test suite from 50k to 37k.
I also used a script to time the interpreter overhead of the new approach compared to the old. Both performed with no measurable performance difference, suggesting the overheads lie not in the interpretation itself but in other aspects (e.g. refcounting, std::function invocation).