bpo-31336: Speed up type creation, which is highly dominated by slow dict lookups.#3279
Conversation
Objects/typeobject.c
Outdated
|
|
||
| res = NULL; | ||
| /* keep a strong reference to mro because type->tp_mro can be replaced | ||
| during PyDict_GetItem(dict, name) */ |
| dict = ((PyTypeObject *)base)->tp_dict; | ||
| assert(dict && PyDict_Check(dict)); | ||
| res = PyDict_GetItem(dict, name); | ||
| res = _PyDict_GetItem_KnownHash(dict, name, hash); |
There was a problem hiding this comment.
PyDict_GetItem() always clears errors. Call PyErr_Clear() if _PyDict_GetItem_KnownHash() returns NULL.
There was a problem hiding this comment.
I don't see well the difference between PyDict_GetItem() and _PyDict_GetItem_KnownHash() in term of performance. Maybe the difference is that PyDict_GetItem() calls PyErr_Fetch/PyErr_Restore. In that case, PyDict_GetItemWithError() would have the same speed, no?
There was a problem hiding this comment.
Using PyDict_GetItemWithError() adds only a half of the speed up.
The difference is:
- PyDict_GetItem() calls PyThreadState_GET/PyErr_Fetch/PyErr_Restore.
- PyDict_GetItem() checks for str and reads a hash every time. I don't understand well why this affects performance.
There was a problem hiding this comment.
KnownHash is extremely short in comparison and probably gets inlined and streamlined with LTO. Substantially less branching.
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
…hained name lookups in _PyType_Lookup().
|
Nice catches, but certainly, I didn't expect the Spanish Inquisition! |
|
Nobody expects the Spanish Inquisition! @serhiy-storchaka: please review the changes made to this pull request. |
vstinner
left a comment
There was a problem hiding this comment.
I would prefer to use the _Py_IDENTIFIER API rather than using _PyDict_GetItem_KnownHash().
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
…m update_one_slot().
…r ceases to be a valid base type, there'll probably be larger code sections to change than this one.
|
I agree that using the Instead, I followed Naokis idea of spliting |
Objects/typeobject.c
Outdated
| } | ||
| do { | ||
| descr = _PyType_Lookup(type, p->name_strobj); | ||
| descr = _PyType_LookupUncached(type, p->name_strobj, &error); |
There was a problem hiding this comment.
Would be nice to add a comment explaining why the uncached lookup.
…common in Python 3 now. Also make the step from "return NULL" error handling to "goto error" reference cleanup explicit.
| } | ||
| else | ||
| Py_INCREF(bases); | ||
| else { |
There was a problem hiding this comment.
Is there any effect of this change? I tested class C: pass and class C(object): pass and didn't see any difference.
There was a problem hiding this comment.
I consider this part a cleanup that makes it clearer what operations are needed when, and how error cases are dealt with.
I didn't measure any speed difference either.
Objects/typeobject.c
Outdated
| if (res != NULL) | ||
| break; | ||
| if (PyErr_Occurred()) | ||
| return NULL; |
| if (!PyUnicode_CheckExact(name) || | ||
| (hash = ((PyASCIIObject *) name)->hash) == -1) | ||
| { | ||
| hash = PyObject_Hash(name); |
There was a problem hiding this comment.
PyObject_Hash() can call user code. mro is a borrowed reference here, it should be increfed before calling PyObject_Hash().
Or calculate the hash before getting mro.
There was a problem hiding this comment.
Good catch. I'll move the hash() call up.
Objects/typeobject.c
Outdated
| /* Internal API to look for a name through the MRO, bypassing the method cache. | ||
| This returns a borrowed reference, and might set an exception! */ | ||
| static PyObject * | ||
| _PyType_LookupUncached(PyTypeObject *type, PyObject *name, int *error) |
There was a problem hiding this comment.
Local functions usually use different name convention. All lower characters and no Py prefix.
There was a problem hiding this comment.
I was trying to hint at the name of the existing function. A different name would be less clear, I think.
There was a problem hiding this comment.
IMHO this rather adds a confusion.
There was a problem hiding this comment.
I renamed the function to find_name_in_mro().
Objects/typeobject.c
Outdated
| } | ||
|
|
||
| Py_hash_t hash; | ||
| *error = 1; |
There was a problem hiding this comment.
Alternatively you could return a special value for signalling error. E.g. (PyObject *)(-1). Test what is faster.
There was a problem hiding this comment.
I can set error to -1 on exceptions. That will distinguish all three cases: ok, error with exception, error without exception. Only -1 will need a call to PyErr_Clear() then.
There was a problem hiding this comment.
I dislike things like (PyObject *)(-1) because I wouldn't want to assume that they are not valid pointer values on any system whatsoever.
Distinguish between "error" and "error with exception" cases in _PyType_LookupUncached(). Fix a reference leak of "mro" on lookup errors. Resolves issues found by Serhiy Storchaka.
|
@serhiy-storchaka: thank you for your excellent feedback. |
Objects/typeobject.c
Outdated
| /* Internal API to look for a name through the MRO, bypassing the method cache. | ||
| This returns a borrowed reference, and might set an exception! */ | ||
| static PyObject * | ||
| _PyType_LookupUncached(PyTypeObject *type, PyObject *name, int *error) |
There was a problem hiding this comment.
IMHO this rather adds a confusion.
Objects/typeobject.c
Outdated
| in a context that propagates the exception out. | ||
| */ | ||
| PyErr_Clear(); | ||
| PyType_Ready(type) < 0) { |
There was a problem hiding this comment.
Either move PyType_Ready() to the previous line (if the resulting line will be not too long), or indent it as in the current code and move { to a new line for readability (new PEP 7 rule).
There was a problem hiding this comment.
I cleaned it up a little.
| in a context that propagates the exception out. | ||
| */ | ||
| if (error == -1) | ||
| PyErr_Clear(); |
There was a problem hiding this comment.
PyErr_Clear() always is called after _PyType_LookupUncached() if error == -1. Why not call it inside _PyType_LookupUncached()?
There was a problem hiding this comment.
The question is which interface is better: should the function itself always swallow any exceptions, or should the callers decide how to deal with them? The comments in the original _PyType_Lookup() implementation suggest that others have already been as unhappy as me about the interface of that function, so why keep making people unhappy?
Objects/typeobject.c
Outdated
| the same type will call it again -- hopefully | ||
| in a context that propagates the exception out. | ||
| */ | ||
| if (error == -1) |
…isting) API names. Suggested by Serhiy Storchaka.
|
The In the failing test, it should find The correct place to fix this is obviously |
|
That being said, this can obviously be made to work again by ignoring all exceptions also in the new helper function, just as |
|
I pushed a fix that eats lookup exceptions, exactly like |
…g lookup exceptions, just like "_PyType_Lookup()" did previously.
c52e47a to
4efde8e
Compare
| @@ -0,0 +1,2 @@ | |||
| Speed up class creation by reducing overhead in the necessary special method | |||
There was a problem hiding this comment.
Add speed up estimation: 10-20%
|
What was the cause of this failure? Could this failure be fixed by fixing ElementTree? |
|
See my comment above. Yes, ET should definitely be fixed. I pushed #3545. |
|
Feel free to revert 4efde8e if you think that fixing cElementTree is enough here. |
…ce eating lookup exceptions, just like "_PyType_Lookup()" did previously." This reverts commit 4efde8e.
…) from functions that swallow live exceptions.
|
I've reverted that change and added |
|
Note that both AppVeyor and Travis will be happy again once #3545 is merged. |
I don't recall why I was opposed to the change, but it changed a lot in the meanwhile :-)
| dict = ((PyTypeObject *)base)->tp_dict; | ||
| assert(dict && PyDict_Check(dict)); | ||
| res = PyDict_GetItem(dict, name); | ||
| res = _PyDict_GetItem_KnownHash(dict, name, hash); |
There was a problem hiding this comment.
I don't see well the difference between PyDict_GetItem() and _PyDict_GetItem_KnownHash() in term of performance. Maybe the difference is that PyDict_GetItem() calls PyErr_Fetch/PyErr_Restore. In that case, PyDict_GetItemWithError() would have the same speed, no?
| if (error == -1) { | ||
| /* It is unlikely by not impossible that there has been an exception | ||
| during lookup. Since this function originally expected no errors, | ||
| we ignore them here in order to keep up the interface. */ |
There was a problem hiding this comment.
I don't think that "Since this function originally expected no errors" matters here. Why not reporting the exception to the caller and handle it?
There was a problem hiding this comment.
I'm working on a patch that cleans up much of the mess around _PyType_Lookup(). See https://bugs.python.org/issue31465
|
"KnownHash is extremely short in comparison and probably gets inlined and streamlined with LTO. Substantially less branching." oh nice, good to know! |
| mro = type->tp_mro; | ||
| if (mro == NULL) { | ||
| *error = 1; | ||
| return NULL; |
There was a problem hiding this comment.
Since this is the only non-exception raising error case left, I actually wonder if this is really an error case. If there is no MRO in a ready-ied type, isn't that just fine from the point of view of a lookup?
vstinner
left a comment
There was a problem hiding this comment.
LGTM. I just proposed a minor coding style change.
The change enhances error handling, nice.
I don't understand everything, but I trust Python test suite to make sure that the change doesn't break anything :-)
| bases = PyTuple_Pack(1, base); | ||
| if (bases == NULL) | ||
| goto error; | ||
| return NULL; |
There was a problem hiding this comment.
PEP 7 nitpick: when you modify code, it's better to add { ... } to if blocks.
This gives me around 20% better performance for
class Test: pass. Admittedly, that's a micro-benchmark, but the change is obvious enough to not merit staying slow.https://bugs.python.org/issue31336