Micro-optimize the LOAD_FAST opcode #92763

Closed
sweeneyde wants to merge 16 commits into python:main from sweeneyde:microopt

@sweeneyde
Member

The most common opcode before:

TARGET_LOAD_FAST:
    frame->prev_instr = next_instr++;
    PyObject *value = frame->localsplus[oparg];
    if (value == NULL) { goto unbound_local_error; }
    value->ob_refcnt++;
    *stack_pointer++ = value;
    _Py_CODEUNIT word = *next_instr;
    opcode = word & 255;
    oparg = word >> 8;
    opcode |= cframe.use_tracing;
    goto *opcode_targets[opcode];

The most common opcode after:

TARGET_LOAD_FAST_KNOWN_QUICK:
    next_instr++;
    PyObject *value = frame->localsplus[oparg];
    value->ob_refcnt++;
    *stack_pointer++ = value;
    _Py_CODEUNIT word = *next_instr;
    opcode = word & 255;
    oparg = word >> 8;
    goto *opcode_targets[opcode];

In particular:

  • The write to frame->prev_instr is removed.
  • The NULL-check and branch are removed.
  • The memory read of cframe.use_tracing and the |= into opcode are removed.

None of these was particularly significant in isolation, but together they accounted for a speedup of approximately 1% on pyperformance.
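
To make the difference concrete, here is a minimal, self-contained sketch in C of a checked load next to an unchecked one. It is a toy, not CPython's code: the `Obj` type, the `OP_*` opcode names, `MAKE_WORD`, and `toy_run` are all hypothetical, it uses a portable `switch` rather than the computed-`goto` dispatch shown above, and it simply assumes that something earlier has established that the local read by the unchecked opcode is bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcodes; the names are illustrative, not CPython's. */
enum { OP_LOAD_CHECKED, OP_LOAD_KNOWN, OP_RETURN_TOP };

/* Stand-in for PyObject: a reference count plus a payload. */
typedef struct { int refcnt; long value; } Obj;

/* Pack an opcode (low 8 bits) and an oparg (high 8 bits) into one code unit. */
#define MAKE_WORD(op, arg) ((uint16_t)((op) | ((arg) << 8)))

/* Caller must keep the program's stack depth within this limit. */
#define TOY_STACK_MAX 8

/* Runs `code` against `locals`. On success stores the value of the top
   stack entry in *out and returns 0. Returns -1 if a checked load sees an
   unbound (NULL) local, if the stack is empty at return, or if the opcode
   is unknown. */
int toy_run(const uint16_t *code, Obj **locals, long *out)
{
    Obj *stack[TOY_STACK_MAX];
    Obj **sp = stack;
    const uint16_t *next_instr = code;

    for (;;) {
        uint16_t word = *next_instr++;
        int opcode = word & 255;
        int oparg = word >> 8;

        switch (opcode) {
        case OP_LOAD_CHECKED: {
            Obj *value = locals[oparg];
            if (value == NULL) {   /* the branch the fast variant drops */
                return -1;
            }
            value->refcnt++;
            *sp++ = value;
            break;
        }
        case OP_LOAD_KNOWN: {
            Obj *value = locals[oparg];
            /* Assumption: the local is already known to be bound, so no
               runtime branch is needed; the assert only documents that. */
            assert(value != NULL);
            value->refcnt++;
            *sp++ = value;
            break;
        }
        case OP_RETURN_TOP:
            if (sp == stack) {
                return -1;
            }
            *out = sp[-1]->value;
            return 0;
        default:
            return -1;
        }
    }
}
```

In this sketch, a two-instruction program that loads local 0 with `OP_LOAD_KNOWN` and then executes `OP_RETURN_TOP` stores that local's value in `*out` and returns 0, while `OP_LOAD_CHECKED` on a NULL slot returns -1. The only difference between the two load handlers is the NULL test, which mirrors the second bullet above; the `prev_instr` write and the tracing read are not modeled.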

@sweeneyde sweeneyde closed this Jun 3, 2022