gh-118702: Implement vectorcall for BaseException #118703
vstinner merged 4 commits into python:main
Benchmark:

    import pyperf

    EMPTY_DICT = {}
    INNER_LOOPS = 1024
    KEYS = tuple(range(INNER_LOOPS))

    def bench_keyerror():
        d = EMPTY_DICT
        for key in KEYS:
            try:
                d[key]
            except KeyError:
                pass

    runner = pyperf.Runner()
    runner.bench_func('keyerror', bench_keyerror, inner_loops=INNER_LOOPS)
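For a quick local check without pyperf, roughly the same measurement can be sketched with the standard library's timeit module. This is only an approximation of the benchmark above: the helper name `ns_per_lookup` and its default parameters are illustrative assumptions, and timeit does none of pyperf's calibration or process isolation, so the numbers will be noisier.

```python
import timeit

EMPTY_DICT = {}
KEYS = tuple(range(1024))


def bench_keyerror():
    # Every lookup misses, so each iteration raises and catches a KeyError.
    d = EMPTY_DICT
    for key in KEYS:
        try:
            d[key]
        except KeyError:
            pass


def ns_per_lookup(repeat=5, number=200):
    # Hypothetical helper: best-of-N timing, normalized to one failed lookup.
    timings = timeit.repeat(bench_keyerror, repeat=repeat, number=number)
    return min(timings) / (number * len(KEYS)) * 1e9


if __name__ == "__main__":
    print(f"{ns_per_lookup():.0f} ns per failed lookup")
```

Taking the minimum over several repeats reduces the influence of scheduling noise, but results from this sketch should not be compared directly with the pyperf figures quoted in this PR.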
Nice change, but you broke CI :)
Right, it should now be fixed.
serhiy-storchaka left a comment
I am surprised that switching to vectorcall makes a difference here. How does it compare with the following BaseException_vectorcall implementation?

    argstuple = _PyTuple_FromArray(args, PyVectorcall_NARGS(nargsf));
    self = type_obj->tp_new(type_obj, argstuple, NULL);
    Py_TYPE(self)->tp_init(self, argstuple, NULL);
    Py_DECREF(argstuple);

If it is still faster than the current code, then there is something wrong in the non-vectorcall path.
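The two-step construction in the snippet above (tp_new followed by tp_init on the same argument tuple) can be modeled at the Python level. This is a minimal sketch of the idea only, not the C implementation; the function name `construct_two_step` is a hypothetical helper introduced for illustration.

```python
def construct_two_step(exc_type, *args):
    # Step 1: allocate the instance, mirroring the tp_new call.
    self = exc_type.__new__(exc_type, *args)
    # Step 2: initialize it with the same arguments, mirroring the tp_init call.
    exc_type.__init__(self, *args)
    return self


two_step = construct_two_step(KeyError, "missing")
direct = KeyError("missing")

# Both paths yield an instance of the same type with the same 'args'.
same_result = (type(two_step) is type(direct)) and (two_step.args == direct.args)
```

For exception types that do not override `__new__` or `__init__`, the two paths produce equivalent objects, which is why the question in this comment is about speed rather than behavior.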
In current main there is a call to
This change avoids calling type_call(), which contains more code.
Benchmark result with CPU isolation (this PR):
I also measured only the Python/errors.c changes; replace
It's faster, but not as fast as this PR.
But what is wrong with
AFAICS, in
Python/errors.c (outdated)

    PyObject *exc = PyObject_CallOneArg(PyExc_KeyError, arg);
    if (!exc) {
        /* caller will expect error to be set anyway */
        return;
    }
    _PyErr_SetObject(tstate, PyExc_KeyError, tup);
    Py_DECREF(tup);

    _PyErr_SetRaisedException(tstate, exc);
It is not the same. It does not set __context__, and there may be other differences in case when it fails to create an exception object.
Oh, right. I fixed this regression and added a unit test.
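The `__context__` behavior discussed in this thread is visible from Python: a KeyError raised by a failed dict lookup while another exception is being handled should carry that exception as its context. The sketch below illustrates the expected behavior; it is not the unit test added in the PR, and the function name is a hypothetical helper.

```python
def keyerror_raised_while_handling():
    # Trigger a failed dict lookup inside an 'except' block and return the
    # resulting KeyError so its implicit chaining can be inspected.
    try:
        raise ValueError("first")
    except ValueError:
        try:
            {}["missing"]
        except KeyError as exc:
            return exc


exc = keyerror_raised_while_handling()

# The interpreter attaches the exception being handled as the implicit context.
has_context = isinstance(exc.__context__, ValueError)
```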
Half of the gain is due to the change in
I do not actively oppose this change; I only suggest that there may be room for a more general optimization. If you still want to add
Force-pushed from fcc7a2f to e689491
* BaseException_vectorcall() now creates a tuple from 'args' array.
* Creating an exception using BaseException_vectorcall() is now a
single function call, rather than having to call
BaseException_new() and then BaseException_init().
Calling BaseException_init() is inefficient since it overrides
the 'args' attribute.
* _PyErr_SetKeyError() now uses PyObject_CallOneArg() to create the
KeyError instance to use BaseException_vectorcall().
Micro-benchmark on creating a KeyError on accessing a non-existent
dictionary key:
Mean +- std dev: 447 ns +- 31 ns -> 373 ns +- 15 ns: 1.20x faster
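The remark that BaseException_init() overrides the 'args' attribute can be observed from Python by calling `__new__` and `__init__` separately. This is a small illustration under the assumption that KeyError uses the base `__new__` and `__init__` (it defines neither itself in Python terms); the variable names are illustrative only.

```python
# Allocate a KeyError through __new__ alone: 'args' is already populated.
exc = KeyError.__new__(KeyError, "from new")
args_after_new = exc.args

# Running __init__ afterwards stores 'args' a second time, replacing the first value.
exc.__init__("from init")
args_after_init = exc.args
```

Since both steps store the same tuple during a normal call such as `KeyError(key)`, the second assignment is redundant work, which is what a single vectorcall entry point avoids.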
Force-pushed from 4424f98 to eb0b861
The PR is now ready for a review. The main branch is now Python 3.14. I rebased the PR on the main branch and squashed the commits. I fixed the
Updated benchmark, Python built with
I added a benchmark on

Benchmark:

    import pyperf

    EMPTY_DICT = {}
    INNER_LOOPS = 1024
    KEYS = tuple(range(INNER_LOOPS))

    def bench_key_error(loops):
        range_it = range(loops)
        t0 = pyperf.perf_counter()
        value = "value"
        for _ in range_it:
            d = EMPTY_DICT
            for key in KEYS:
                try:
                    d[key]
                except KeyError:
                    pass
        return pyperf.perf_counter() - t0

    def bench_value_error(loops):
        range_it = range(loops)
        t0 = pyperf.perf_counter()
        value = "value"
        for _ in range_it:
            try: raise ValueError(value)
            except: pass
            try: raise ValueError(value)
            except: pass
            try: raise ValueError(value)
            except: pass
            try: raise ValueError(value)
            except: pass
            try: raise ValueError(value)
            except: pass
            try: raise ValueError(value)
            except: pass
            try: raise ValueError(value)
            except: pass
            try: raise ValueError(value)
            except: pass
            try: raise ValueError(value)
            except: pass
            try: raise ValueError(value)
            except: pass
        return pyperf.perf_counter() - t0

    runner = pyperf.Runner()
    runner.bench_time_func('key_error', bench_key_error, inner_loops=INNER_LOOPS)
    runner.bench_time_func('value_error', bench_value_error, inner_loops=10)
I fixed the code.
Creating exceptions 10% to 12% faster sounds appealing since raising KeyError is a common operation. I ran a benchmark on functions using METH_VARARGS. KeyError is the clear winner in number of calls when running
I don't see any simple optimization opportunity, do you?
The optimization applies to 32 built-in exceptions:
Merged, thanks for reviews.