jit.md

The JIT

The adaptive interpreter consists of a main loop that executes the bytecode instructions generated by the bytecode compiler and their specializations. Runtime optimization in this interpreter can only be done for one instruction at a time. The JIT is based on a mechanism to replace an entire sequence of bytecode instructions, and this enables optimizations that span multiple instructions.

Historically, the adaptive interpreter was referred to as tier 1 and the JIT as tier 2. You will see remnants of this in the code.

The Trace Recorder and Executors

There are two interpreters in this section:

Adaptive interpreter (the default behavior)
Trace recording interpreter (enabled on JIT builds)

The program begins running on the adaptive interpreter, until a JUMP_BACKWARD or RESUME instruction determines that it is "hot" because the counter in its inline cache indicates that it has executed more than some threshold number of times (see backoff_counter_triggers). It then calls the function _PyJit_TryInitializeTracing in Python/optimizer.c, passing it the current frame, instruction pointer and state. The interpreter then switches into "tracing mode" via the macro ENTER_TRACING(). On platforms that support computed-goto or tail-calling interpreters, the dispatch table is swapped out; platforms that support neither use a single flag in the opcode instead. Execution of the normal interpreter and the tracing interpreter is interleaved via this dispatch mechanism. This means that while logically there are two interpreters, the implementation appears to be a single interpreter.
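The "hot" check can be pictured with a small Python model. This is only a sketch: the class, the threshold of 4 and the doubling policy are assumptions made for illustration, and the real counter layout behind backoff_counter_triggers is not described here.

```python
class BackoffCounter:
    """Toy stand-in for the counter kept in an instruction's inline cache."""

    def __init__(self, threshold):
        self.threshold = threshold   # hypothetical starting threshold
        self.remaining = threshold   # executions left before the next trigger

    def triggers(self):
        """Count one execution; return True when the instruction is 'hot'."""
        self.remaining -= 1
        if self.remaining > 0:
            return False
        # Assumption of this sketch: back off exponentially, so that a
        # failed tracing attempt is retried less and less often.
        self.threshold *= 2
        self.remaining = self.threshold
        return True


def run_loop(iterations, counter):
    """Return the iteration numbers at which tracing would be attempted."""
    attempts = []
    for i in range(1, iterations + 1):
        if counter.triggers():
            attempts.append(i)   # here the real code would try to start tracing
    return attempts


print(run_loop(100, BackoffCounter(threshold=4)))  # [4, 12, 28, 60]
```

In this model the gaps between attempts double each time, which keeps the cost of repeatedly failing to trace a loop bounded.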

During tracing mode, after each interpreter instruction's DISPATCH(), the interpreter jumps to the TRACE_RECORD instruction. This instruction records the previous instruction executed, along with any live values that the next operation may require. It then translates the previous instruction to a sequence of micro-ops using _PyJit_translate_single_bytecode_to_trace. To ensure that the adaptive interpreter instructions and cache entries are up-to-date, the trace recording interpreter always resets the adaptive counters of adaptive instructions it sees. Should an instruction deoptimize, this forces a re-specialization of the new instruction, which feeds the trace recorder up-to-date information. Finally, the TRACE_RECORD instruction decides when to stop tracing using various heuristics.
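As a rough illustration of the recording step, the following Python sketch translates each executed instruction into micro-ops and stops on a few simple conditions. The expansion table, the uop names and the stopping rules are all hypothetical; the real translation is done by _PyJit_translate_single_bytecode_to_trace and its heuristics are not spelled out here.

```python
# Hypothetical bytecode -> micro-op expansions (illustrative names only,
# not taken from the real generated tables).
UOP_EXPANSIONS = {
    "LOAD_FAST": ["_LOAD_FAST"],
    "BINARY_OP_ADD_INT": ["_GUARD_BOTH_INT", "_BINARY_OP_ADD_INT"],
    "STORE_FAST": ["_STORE_FAST"],
    "JUMP_BACKWARD": ["_JUMP_TO_TOP"],
}


def record_trace(executed_instructions, max_uops=8):
    """Translate executed instructions into one flat micro-op trace."""
    trace = []
    for opcode in executed_instructions:
        uops = UOP_EXPANSIONS.get(opcode)
        if uops is None:
            break                     # assumed rule: untranslatable instruction
        if len(trace) + len(uops) > max_uops:
            break                     # assumed rule: trace buffer is full
        trace.extend(uops)
        if opcode == "JUMP_BACKWARD":
            break                     # assumed rule: the loop has been closed
    return trace


loop_body = ["LOAD_FAST", "LOAD_FAST", "BINARY_OP_ADD_INT",
             "STORE_FAST", "JUMP_BACKWARD"]
print(record_trace(loop_body))
```

Because the recorder only sees instructions that actually executed, the resulting trace follows one concrete path through the code rather than its full control-flow graph.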

Once trace recording concludes, LEAVE_TRACING() swaps the dispatch table back, or unsets the opcode flag, whichever ENTER_TRACING() set up earlier. stop_tracing_and_jit() then calls _PyOptimizer_Optimize(), which optimizes the trace and constructs a _PyExecutorObject.

JIT execution is set up to either return to the adaptive interpreter and resume execution, or transfer control to another executor (see _PyExitData in Include/internal/pycore_optimizer.h). When resuming in the adaptive interpreter, a "side exit", generated by an EXIT_IF, may trigger recording of another trace, while a "deopt", generated by a DEOPT_IF, does not trigger recording.
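The difference between the two kinds of exit can be sketched as follows. The Exit class, its temperature field and the threshold are assumptions made for this example; they are not the layout of _PyExitData.

```python
from dataclasses import dataclass


@dataclass
class Exit:
    kind: str             # "side_exit" (from EXIT_IF) or "deopt" (from DEOPT_IF)
    target: int           # bytecode offset where the adaptive interpreter resumes
    temperature: int = 0  # hypothetical hotness counter for this exit


HOT_EXIT_THRESHOLD = 2    # hypothetical value


def leave_executor(exit_):
    """Return (resume_offset, start_new_trace) for a taken exit."""
    if exit_.kind == "deopt":
        # A deopt only hands control back; it never starts a recording.
        return exit_.target, False
    # A side exit that is taken often enough may start a new recording.
    exit_.temperature += 1
    return exit_.target, exit_.temperature >= HOT_EXIT_THRESHOLD


side = Exit("side_exit", target=10)
print(leave_executor(side))                      # (10, False)
print(leave_executor(side))                      # (10, True)
print(leave_executor(Exit("deopt", target=4)))   # (4, False)
```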

The executor is stored on the code object of the frame, in the co_executors field which is an array of executors. The start instruction of the trace (the JUMP_BACKWARD) is replaced by an ENTER_EXECUTOR instruction whose oparg is equal to the index of the executor in co_executors.
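A minimal Python model of this bookkeeping is shown below. The CodeObject class and the dictionary-based executor are stand-ins invented for the sketch, and saving the replaced instruction on the executor is an assumption of the sketch; only co_executors, ENTER_EXECUTOR and the use of oparg as an index come from the description above.

```python
class CodeObject:
    """Toy code object: a list of (opcode, oparg) pairs plus executors."""

    def __init__(self, instructions):
        self.instructions = list(instructions)
        self.co_executors = []


def insert_executor(code, offset, executor):
    """Store the executor and patch the instruction at `offset`."""
    # Assumption of this sketch: remember the replaced instruction so
    # that the patch could later be undone.
    executor["original"] = code.instructions[offset]
    code.co_executors.append(executor)
    index = len(code.co_executors) - 1
    code.instructions[offset] = ("ENTER_EXECUTOR", index)
    return index


def lookup_executor(code, offset):
    """Find the executor for an ENTER_EXECUTOR instruction via its oparg."""
    opcode, oparg = code.instructions[offset]
    if opcode != "ENTER_EXECUTOR":
        raise ValueError("no executor installed at this offset")
    return code.co_executors[oparg]


code = CodeObject([("LOAD_FAST", 0), ("JUMP_BACKWARD", 1)])
executor = {"name": "loop trace"}
insert_executor(code, 1, executor)
print(code.instructions[1])   # ('ENTER_EXECUTOR', 0)
```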

The micro-op optimizer

The micro-op (abbreviated uop to approximate μop) optimizer is defined in Python/optimizer.c as _PyOptimizer_Optimize. It takes a micro-op sequence from the trace recorder and optimizes it with _Py_uop_analyze_and_optimize in Python/optimizer_analysis.c; an instance of _PyUOpExecutor_Type is then created to contain the result.

The JIT interpreter

After a JUMP_BACKWARD instruction invokes the uop optimizer to create a uop executor, it transfers control to this executor via the TIER1_TO_TIER2 macro.

CPython implements two executors. Here we describe the JIT interpreter, which is the simpler of them and is therefore useful for debugging and analyzing the uops generation and optimization stages. To run it, we configure the JIT to run on its interpreter (i.e., python is configured with --enable-experimental-jit=interpreter).

When invoked, the executor jumps to the tier2_dispatch: label in Python/ceval.c, where there is a loop that executes the micro-ops. The body of this loop is a switch statement over the uop IDs, resembling the one used in the adaptive interpreter.

The switch implementing the uops is in Python/executor_cases.c.h, which is generated by the build script Tools/cases_generator/tier2_generator.py from the bytecode definitions in Python/bytecodes.c.

When an _EXIT_TRACE or _DEOPT uop is reached, the uop interpreter exits and execution returns to the adaptive interpreter.
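The shape of this loop can be sketched in Python, with an if/elif chain standing in for the C switch. Apart from _EXIT_TRACE and _DEOPT, the uop names and their semantics are made up for the example; the real cases live in the generated Python/executor_cases.c.h.

```python
def run_uops(trace, stack):
    """Execute (uop, operand) pairs until an exit uop; return the reason."""
    pc = 0
    while True:
        uop, operand = trace[pc]
        pc += 1
        if uop == "_LOAD_CONST":          # hypothetical uop: push a constant
            stack.append(operand)
        elif uop == "_GUARD_INT":         # hypothetical guard on top of stack
            if not isinstance(stack[-1], int):
                return "_DEOPT"           # guard failed: leave the trace
        elif uop == "_ADD":               # hypothetical uop: add top two values
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif uop in ("_EXIT_TRACE", "_DEOPT"):
            return uop                    # back to the adaptive interpreter
        else:
            raise ValueError(f"unknown uop {uop!r}")


stack = []
reason = run_uops(
    [("_LOAD_CONST", 2), ("_LOAD_CONST", 3), ("_GUARD_INT", None),
     ("_ADD", None), ("_EXIT_TRACE", None)],
    stack,
)
print(reason, stack)   # _EXIT_TRACE [5]
```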

Invalidating Executors

In addition to being stored on the code object, each executor is also inserted into contiguous arrays (executor_blooms and executor_ptrs) stored in the interpreter state. These arrays are used when it is necessary to invalidate executors because values they used in their construction may have changed.
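One way to picture these parallel arrays is the Python sketch below, which assumes (from the name executor_blooms) that each executor has a Bloom filter summarizing the values it depends on. The filter width, the hashing scheme and the string keys are invented for the example.

```python
import hashlib

BLOOM_BITS = 256   # hypothetical filter width
NUM_HASHES = 3     # hypothetical number of bit positions per key


def _bit_positions(key):
    """Derive NUM_HASHES bit positions from a key (illustrative hashing)."""
    digest = hashlib.sha256(repr(key).encode()).digest()
    return [int.from_bytes(digest[2 * i:2 * i + 2], "big") % BLOOM_BITS
            for i in range(NUM_HASHES)]


def bloom_add(bloom, key):
    for pos in _bit_positions(key):
        bloom |= 1 << pos
    return bloom


def bloom_may_contain(bloom, key):
    """False means definitely absent; True means possibly present."""
    return all((bloom >> pos) & 1 for pos in _bit_positions(key))


class ExecutorTable:
    """Two parallel lists, mirroring executor_blooms and executor_ptrs."""

    def __init__(self):
        self.executor_blooms = []
        self.executor_ptrs = []

    def register(self, executor, dependencies):
        bloom = 0
        for dep in dependencies:
            bloom = bloom_add(bloom, dep)
        self.executor_blooms.append(bloom)
        self.executor_ptrs.append(executor)

    def invalidate(self, changed):
        """Drop every executor that may depend on `changed`."""
        invalidated, blooms, ptrs = [], [], []
        for bloom, executor in zip(self.executor_blooms, self.executor_ptrs):
            if bloom_may_contain(bloom, changed):
                invalidated.append(executor)
            else:
                blooms.append(bloom)
                ptrs.append(executor)
        self.executor_blooms, self.executor_ptrs = blooms, ptrs
        return invalidated


table = ExecutorTable()
table.register("exec_a", ["globals:mod", "type:int"])
table.register("exec_b", ["type:str"])
print(table.invalidate("type:int"))
```

A Bloom filter can report false positives, so an executor may occasionally be invalidated unnecessarily, but an executor that really depends on the changed value is never missed.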

The JIT

When the full JIT is enabled (python was configured with --enable-experimental-jit), the uop executor's jit_code field is populated with a pointer to a compiled C function that implements the executor logic. This function's signature is defined by jit_func in pycore_jit.h. When the executor is invoked by ENTER_EXECUTOR, instead of jumping to the uop interpreter at tier2_dispatch, the executor runs the function that jit_code points to. This function returns the instruction pointer of the next Tier 1 instruction that needs to execute.

The generation of the jitted functions uses the copy-and-patch technique which is described in Haoran Xu's article. At its core are statically generated stencils for the implementation of the micro ops, which are completed with runtime information while the jitted code is constructed for an executor by _PyJIT_Compile.
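The copy-and-patch idea itself fits in a few lines of Python. This is a toy model under stated assumptions: the "machine code" bytes, the Stencil type and the 4-byte little-endian operand holes are invented for illustration and do not reflect the contents of the real stencils or the behavior of _PyJIT_Compile.

```python
import struct
from typing import NamedTuple, Optional


class Stencil(NamedTuple):
    body: bytes                  # pre-built code with zeroed holes
    hole_offset: Optional[int]   # where a 4-byte operand is patched in, if any


# Hypothetical stencils, one per micro-op; the byte values are made up.
STENCILS = {
    "_LOAD_CONST": Stencil(b"\x01" + b"\x00" * 4, hole_offset=1),
    "_ADD": Stencil(b"\x02", hole_offset=None),
    "_EXIT_TRACE": Stencil(b"\x03" + b"\x00" * 4, hole_offset=1),
}


def compile_trace(trace):
    """Copy each uop's stencil into a buffer, then patch its hole."""
    code = bytearray()
    for uop, operand in trace:
        stencil = STENCILS[uop]
        start = len(code)
        code += stencil.body                       # copy
        if stencil.hole_offset is not None:        # patch
            at = start + stencil.hole_offset
            code[at:at + 4] = struct.pack("<I", operand)
    return bytes(code)


jitted = compile_trace([("_LOAD_CONST", 7), ("_ADD", None), ("_EXIT_TRACE", 258)])
print(jitted.hex())   # 0107000000020302010000
```

The point of the technique is that the expensive work of producing each stencil happens ahead of time, so building the code for an executor reduces to copying bytes and filling in holes.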

The stencils are generated at build time under the Makefile target regen-jit by the scripts in Tools/jit. These scripts read Python/executor_cases.c.h (which is generated from Python/bytecodes.c). For each opcode, they construct a .c file that contains a function for implementing this opcode, with some runtime information injected. This is done by replacing CASE by the bytecode definition in the template file Tools/jit/template.c.

Each of the .c files is compiled by LLVM, to produce an object file that contains a function that executes the opcode. These compiled functions are used to generate the file jit_stencils.h, which contains the functions that the JIT can use to emit code for each of the bytecodes.

For Python maintainers this means that changes to the bytecodes and their implementations do not require changes related to the stencils, because everything is automatically generated from Python/bytecodes.c at build time.
