Currently iterating over a generator or awaiting a coroutine goes through several layers of C code, performing lots of wasteful transformations to do little more than make a jump in the bytecode.
By specializing FOR_ITER for generators and SEND for coroutines, we can remove this overhead.
However, we will need either trampolines to fix up returns, or a change to the behavior of RETURN_VALUE in generators and coroutines.
The following assumes that python/cpython#96319 has been merged.
Iterating over a generator
The FOR_ITER bytecode pushes the yielded value when __next__ returns a value, so that's simple enough. YIELD_VALUE already does that. The complication is that RETURN_VALUE pushes a value, but we actually need to POP the generator. So we need an additional two POPs after the return.
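To see where FOR_ITER sits today, the standard `dis` module can list the instructions of a loop over a generator. This is only an illustration: the names `gen` and `consume` are made up here, and the exact instruction sequence differs between CPython versions.

```python
import dis

def gen():
    yield 1
    yield 2
    return "done"   # surfaces as StopIteration.value; the for loop discards it

def consume():
    total = 0
    for x in gen():
        total += x
    return total

# Collect opcode names; specialized (adaptive) forms are not shown by default.
opnames = [ins.opname for ins in dis.get_instructions(consume)]
has_for_iter = "FOR_ITER" in opnames
print(has_for_iter, consume())
```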
We can change the way return works for generators by adding a new instruction, GEN_RETURN; change the way FOR_ITER works; use some combination of the two; or insert a trampoline.
Inserting a trampoline is relatively expensive, so I'd like to do this without one.
First, we can implement GEN_RETURN, which would clean up the generator and replace the caller's TOS with the returned value.
Then we change FOR_ITER to not pop the iterator on completion.
A for loop will now compile to:
        FOR_ITER end
        body
        ...
    end:
        POP_TOP
This costs one more POP_TOP per loop, but simplifies FOR_ITER a bit.
We can then specialize FOR_ITER for generators in a straightforward fashion, as no cleanup shim will be needed.
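The stack effect of the proposed loop layout can be sketched with a toy model in plain Python. This is not how the interpreter is written; `run_loop` and its list-based stack are hypothetical, and the "body" here simply stores each value.

```python
def run_loop(iterable):
    """Toy model of the proposed layout: FOR_ITER end; body; end: POP_TOP."""
    stack = [iter(iterable)]      # assume GET_ITER already pushed the iterator
    seen = []
    while True:
        # FOR_ITER (proposed): push the next value, or jump to `end`
        # while leaving the exhausted iterator on the stack.
        try:
            value = next(stack[-1])
        except StopIteration:
            break
        stack.append(value)
        # body: pop the value and record it.
        seen.append(stack.pop())
    depth_at_end = len(stack)     # the iterator is still on the stack here
    stack.pop()                   # end: POP_TOP removes the iterator
    return seen, depth_at_end, len(stack)

print(run_loop([1, 2, 3]))
```

The extra POP_TOP is the only cost; FOR_ITER itself no longer has to treat exhaustion as a special case on the stack.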
Awaiting a coroutine
SEND operates much like FOR_ITER, but the transformation is simpler, as we don't need to POP the result.
await compiles exactly as before, as GEN_RETURN leaves the result on the caller's stack.
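For reference, this is the Python-level behaviour that any specialization of SEND has to preserve: the value returned by the awaited coroutine becomes the value of the await expression. The helper `drive` is hypothetical and only handles coroutines that never suspend.

```python
async def inner():
    return 42

async def outer():
    result = await inner()   # the returned value becomes the await result
    return result + 1

def drive(coro):
    """Run a coroutine that never suspends and return its result."""
    try:
        coro.send(None)
    except StopIteration as stop:
        return stop.value
    raise RuntimeError("coroutine suspended unexpectedly")

print(drive(outer()))  # 43
```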
The new bytecodes
GEN_RETURN
Does the following:
- Pops the TOS from the caller (this will be the generator or coroutine)
- Pushes the result to the caller's stack
- Pops and destroys the current frame
- Resumes the caller at next_instr + gen_return_offset
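The four steps above can be sketched as a toy model. The `Frame` class and the list of frames (innermost last) are assumptions made for illustration and do not reflect CPython's internal frame layout.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """Hypothetical stand-in for an interpreter frame."""
    name: str
    stack: list = field(default_factory=list)
    next_instr: int = 0
    gen_return_offset: int = 0

def gen_return(frames, result):
    """Apply the GEN_RETURN steps to a list of frames (innermost last)."""
    caller = frames[-2]
    caller.stack.pop()            # pop the generator from the caller's stack
    caller.stack.append(result)   # push the result to the caller's stack
    frames.pop()                  # pop and destroy the current frame
    caller.next_instr += caller.gen_return_offset  # resume the caller
    return caller

caller = Frame("caller", stack=["<generator>"], next_instr=10, gen_return_offset=4)
frames = [caller, Frame("generator")]
resumed = gen_return(frames, "result")
print(resumed.name, caller.stack, caller.next_instr)
```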
FOR_ITER_GENERATOR
Does the following:
- Deopts if iterator is not a generator
- Deopts if the generator is not suspended
- Sets the current frame's gen_return_offset to oparg
- Pushes the generator's frame
- Pushes None to the generator's stack
- Resumes execution of the generator
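As a rough sketch of these steps, again using a toy model: `Frame`, `Gen`, and `Deopt` are hypothetical names, and marking the generator as no longer suspended is an assumption about what "resumes execution" entails.

```python
from dataclasses import dataclass, field

class Deopt(Exception):
    """Hypothetical signal to fall back to the generic FOR_ITER."""

@dataclass
class Frame:
    """Hypothetical frame model, not CPython's internal layout."""
    name: str
    stack: list = field(default_factory=list)
    gen_return_offset: int = 0

@dataclass
class Gen:
    """Hypothetical generator object: a frame plus a suspended flag."""
    frame: Frame
    suspended: bool = True

def for_iter_generator(frames, oparg):
    current = frames[-1]
    it = current.stack[-1]              # the iterator stays on the stack
    if not isinstance(it, Gen):
        raise Deopt("iterator is not a generator")
    if not it.suspended:
        raise Deopt("generator is not suspended")
    current.gen_return_offset = oparg   # where GEN_RETURN should resume
    frames.append(it.frame)             # push the generator's frame
    it.frame.stack.append(None)         # the value a plain next() sends
    it.suspended = False                # assumption: it is now running
    return it.frame                     # execution continues in this frame

g = Gen(Frame("gen"))
caller = Frame("caller", stack=[g])
frames = [caller]
running = for_iter_generator(frames, oparg=7)
print(running.name, caller.gen_return_offset, running.stack)
```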
SEND_COROUTINE
Does the following:
- Deopts if awaitable is not a coroutine
- Deopts if the coroutine is not suspended
- Sets the current frame's gen_return_offset to oparg
- Pops the value from the caller's stack
- Pushes the coroutine's frame
- Pushes the value to the coroutine's stack
- Resumes execution of the coroutine
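A round trip through SEND_COROUTINE followed by GEN_RETURN can be sketched with a toy model. All names here (`Frame`, `Coro`, `Deopt`, and the two functions) are hypothetical, the stack layout (awaitable below the value being sent) is an assumption, and the coroutine body itself is not modelled.

```python
from dataclasses import dataclass, field

class Deopt(Exception):
    """Hypothetical signal to fall back to the generic SEND."""

@dataclass
class Frame:
    """Hypothetical frame model, not CPython's internal layout."""
    stack: list = field(default_factory=list)
    next_instr: int = 0
    gen_return_offset: int = 0

@dataclass
class Coro:
    """Hypothetical coroutine object: a frame plus a suspended flag."""
    frame: Frame
    suspended: bool = True

def send_coroutine(frames, oparg):
    current = frames[-1]
    receiver = current.stack[-2]          # assumed: awaitable below the value
    if not isinstance(receiver, Coro):
        raise Deopt("awaitable is not a coroutine")
    if not receiver.suspended:
        raise Deopt("coroutine is not suspended")
    current.gen_return_offset = oparg
    value = current.stack.pop()           # pop the value from the caller
    frames.append(receiver.frame)         # push the coroutine's frame
    receiver.frame.stack.append(value)    # push the value for the coroutine
    receiver.suspended = False            # assumption: mark it as running
    return receiver.frame

def gen_return(frames, result):
    caller = frames[-2]
    caller.stack.pop()                    # pop the coroutine
    caller.stack.append(result)           # leave the result for `await`
    frames.pop()                          # destroy the coroutine's frame
    caller.next_instr += caller.gen_return_offset
    return caller

coro = Coro(Frame())
caller = Frame(stack=[coro, None], next_instr=20)
frames = [caller]
send_coroutine(frames, oparg=3)
gen_return(frames, "awaited result")
print(caller.stack, caller.next_instr)
```

In this model no extra pop is needed after the return: the result simply replaces the coroutine on the caller's stack.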