Skip to content

Conversation

@markshannon
Copy link
Member

@markshannon markshannon commented Jul 13, 2019

Refactors _PyObject_Vectorcall which (on my machine) reduces the performance regression observed in bpo-37562

https://bugs.python.org/issue37562

@markshannon markshannon changed the title bpo-37562 Remove redundant exception check in _PyObject_Vectorcall bpo-37562 Refactor _PyObject_Vectorcall to improve performance a bit. Jul 13, 2019
@jdemeyer
Copy link
Contributor

Am I right that this is really just inlining _PyVectorcall_Function?

Copy link
Contributor

@jdemeyer jdemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR rewrites some code as completely equivalent code. It happens that this new variant is very slightly faster (about 2 clock cycles per call). Since the old and new code are equivalent, this is just because of certain choices made by the compiler (like register allocation) and not because of a fundamental reason.

In my opinion, the benefit of this PR is too uncertain (maybe other compilers optimize this differently) and the complexity added by this PR is not justified.

@markshannon
Copy link
Member Author

markshannon commented Jul 21, 2019

It is not a completely equivalent. If it were, there would be no change in performance.

The control flow graph of the two formulations differs. The CFG for _PyObject_Vectorcall in this PR is equivalent to splitting the CFG in master at the first test, PyType_HasFeature(tp, _Py_TPFLAGS_HAVE_VECTORCALL).
This split removes the phi-node in the SSA form, making the task of the register allocator easier.

While it is entirely possible that a future compiler will produce as good code for master as for this PR, I know of know of no mechanism that will allow a compiler will produce better code from master than this PR.

As for additional complexity, I really don't see any. This PR adds 6 lines, of which 2 are asserts. IMO, this PR rewrites _PyObject_Vectorcall to be closer to the form specified in PEP 590.

@jdemeyer
Copy link
Contributor

jdemeyer commented Aug 6, 2019

I propose #15144 as simpler alternative.

@markshannon
Copy link
Member Author

I don't see how #15144 is a simpler alternative. It is larger and it doesn't work on Windows.

@markshannon
Copy link
Member Author

markshannon commented Sep 29, 2020

Since bpo-37562 is closed, and this code conflicts, I'm closing this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants