
Micro-op (tier 2) interpreter #580

@markshannon


The output of the tier 2 optimization pipeline is a superblock of micro-ops.
We will need an interpreter for these micro-ops for a few reasons:

  • To execute the micro-ops if no JIT compiler is present.
  • To serve as a specification for the JIT.
  • To help debug or verify the JIT.

Instructions

To execute the micro-ops, we will need to represent them in memory in an efficient format.
We can reject rare or complex bytecodes when optimizing, so we can ignore many corner cases.

Each instruction should consist of an opcode and an oparg, as in normal bytecode. However, the oparg can be either the original oparg or a cache entry, so it will need more than 8 bits.
A 32-bit instruction with an 8-bit opcode, a 22-bit oparg and 2 spare bits should be a good starting point. We might want a 9- or 10-bit opcode if we are to have a large number of superinstructions, or we might need 24-bit operands. We'll see.

The interpreter

The interpreter should be generated from bytecodes.c, much as we do for _PyEval_EvalFrameDefault().

Changes to bytecodes.c

To generate the interpreter this way, each micro-op must take at most one operand: either the original oparg or one cache entry.
To achieve this, we will need to break some instructions down into quite small parts.

E.g. LOAD_ATTR_METHOD_WITH_VALUES has the signature:
inst(LOAD_ATTR_METHOD_WITH_VALUES, (unused/1, type_version/2, keys_version/2, descr/4, self -- res2 if (oparg & 1), res))

We can ignore the oparg, as the optimizer will eliminate it. That leaves the type_version, keys_version and descr operands, so this instruction will need to be split into at least three parts, something like:

macro(LOAD_ATTR_METHOD_WITH_VALUES) = 
    SKIP_COUNTER + CHECK_TYPE_VERSION + 
    CHECK_SHARED_DICT_KEYS_VERSION + LOAD_INLINED_ATTRIBUTE;

Where the parts would be defined as something like:

op(SKIP_COUNTER, (unused/1 -- )) {}

op(CHECK_TYPE_VERSION, (type_version/2, self -- self))
{
    PyTypeObject *self_cls = Py_TYPE(self);
    assert(type_version != 0);
    DEOPT_IF(self_cls->tp_version_tag != type_version, LOAD_ATTR);
    assert(self_cls->tp_flags & Py_TPFLAGS_MANAGED_DICT);
}

op(CHECK_SHARED_DICT_KEYS_VERSION, (keys_version/2, self -- self))
{
    PyTypeObject *self_cls = Py_TYPE(self);
    PyDictOrValues dorv = *_PyObject_DictOrValuesPointer(self);
    DEOPT_IF(!_PyDictOrValues_IsValues(dorv), LOAD_ATTR);
    PyHeapTypeObject *self_heap_type = (PyHeapTypeObject *)self_cls;
    DEOPT_IF(self_heap_type->ht_cached_keys->dk_version != keys_version, LOAD_ATTR);
    STAT_INC(opcode, hit);
}

op(LOAD_INLINED_ATTRIBUTE, (descr/4, self -- res, self))
{
    assert(descr != NULL);
    res = Py_NewRef(descr);
    assert(_PyType_HasFeature(Py_TYPE(res), Py_TPFLAGS_METHOD_DESCRIPTOR));
}

Fitting the 64-bit descr field into 22 bits is difficult, if not impossible.
However, we want to shrink this cache entry anyway, to reduce the size of code objects.
So we'll need to do that first, or else make each instruction 64 bits wide.

Metadata

Labels: epic-tier2-optimizer (Linear code region optimizer for 3.13 and beyond)
