Description
Typically each PyModuleDef for a builtin/extension module is a static global variable. Currently it's shared between all interpreters, whereas we are working toward interpreter isolation (for a variety of reasons). Isolating each PyModuleDef is worth doing, especially if you consider we've already run into problems [1] because of m_copy.
The main focus here is on PyModuleDef.m_base.m_copy [2] specifically. It's the state that facilitates importing legacy (single-phase init) extension/builtin modules that do not support repeated initialization [3] (likely the vast majority).
PyModuleDef for an extension/builtin module is usually stored in a static variable and (with immortal objects, see gh-101755) is mostly immutable. The exception is m_copy, which is problematic in some cases for modules imported in multiple interpreters.
Note that m_copy is only relevant for legacy (single-phase init) modules, whether builtin or an extension, and only if the module does not support repeated initialization [3]. It is never relevant for multi-phase init (PEP 489) modules.
- initialization
  - m_copy is only set by _PyImport_FixupExtensionObject() (and thus indirectly _PyImport_FixupBuiltin() and _imp.create_builtin())
  - _PyImport_FixupExtensionObject() is called by _PyImport_LoadDynamicModuleWithSpec() when a legacy (single-phase init) extension module is loaded
- usage
  - m_copy is only used in import_find_extension(), which is only called by _imp.create_builtin() and _imp.create_dynamic() (via the respective importers)
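The condition for m_copy being relevant can be written as a small predicate. This is only a toy Python model: ToyModuleDefInfo, its multi_phase flag, and uses_m_copy() are hypothetical names for illustration and do not exist in CPython. The m_size rule comes from footnote [3].

```python
from dataclasses import dataclass


@dataclass
class ToyModuleDefInfo:
    """Hypothetical stand-in for the PyModuleDef facts that matter here."""
    name: str
    m_size: int        # mirrors def->m_size
    multi_phase: bool  # True for PEP 489 (multi-phase init) modules


def uses_m_copy(info: ToyModuleDefInfo) -> bool:
    """Return True if m_copy would be relevant for this module.

    Assumption taken from the text: only single-phase init modules
    with m_size == -1 (no repeated initialization) rely on m_copy.
    """
    if info.multi_phase:
        # Multi-phase init modules never use m_copy.
        return False
    return info.m_size == -1
```

A single-phase module with a non-negative m_size supports repeated initialization, so the predicate returns False for it as well.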
When such a legacy module is imported for the first time, m_copy is set to a new copy of the just-imported module's __dict__, which is "owned" by the current interpreter (the one importing the module). Whenever the module is loaded again (e.g. reloaded or deleted from sys.modules and then imported), a new empty module is created and m_copy is [shallow] copied into that object's __dict__.
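The first-import and reload behavior described above can be approximated in pure Python. This is a rough sketch, not the C implementation: ToyLegacyDef, toy_import(), and init_spam() are made-up names, and the real logic lives in _PyImport_FixupExtensionObject() and import_find_extension().

```python
import types


class ToyLegacyDef:
    """Hypothetical model of a PyModuleDef for a legacy module."""

    def __init__(self, name, init_func):
        self.name = name
        self.init_func = init_func  # stands in for the single-phase init function
        self.m_copy = None          # unset until the first import


def toy_import(moddef):
    """Import the module, using m_copy on every import after the first."""
    if moddef.m_copy is None:
        # First import: run the init function, then snapshot __dict__.
        module = moddef.init_func()
        moddef.m_copy = dict(module.__dict__)
        return module
    # Later import: new empty module, then a shallow copy of m_copy.
    module = types.ModuleType(moddef.name)
    module.__dict__.update(moddef.m_copy)
    return module


def init_spam():
    """Hypothetical init function that creates some module-level state."""
    module = types.ModuleType("spam")
    module.registry = []
    return module
```

In this model a second toy_import() call yields a distinct module object whose registry attribute is the very same list as in the first module, because only the dict itself is copied and not the objects it holds.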
When m_copy is originally initialized, normally that will be the first time the module is imported. However, that code can be triggered multiple times for that module if it is imported under a different name (an unlikely case but apparently a real one). In that case the m_copy from the previous import is replaced with the new one right after it is released (decref'ed). This isn't the ideal approach but it's also been the behavior for quite a while.
The tricky problem here is that the same code is triggered for each interpreter that imports the legacy module. Things are fine when a module is imported for the first time in any interpreter. However, currently, any subsequent import of that module in another interpreter will trigger that replacing code. The second interpreter decrefs the old m_copy, but that object is "owned" by the first interpreter. This is a problem [1].
Furthermore, even if the decref-in-the-wrong-interpreter problem were gone, a second issue would remain: when m_copy is copied into the new module's __dict__ on subsequent imports, it's only a shallow copy. Thus such a legacy module, imported in any interpreter other than the first one, would end up with its __dict__ filled with objects not owned by the correct interpreter.
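Both cross-interpreter problems can be illustrated with a toy model in which every object carries an explicit owner tag. Real objects have no such tag, and OwnedObject, SharedDef, toy_fixup(), and toy_load_from_copy() are hypothetical names, so treat this as an illustration of the reasoning rather than of the actual code paths.

```python
class OwnedObject:
    """Hypothetical object tagged with the interpreter that created it."""

    def __init__(self, owner):
        self.owner = owner


class SharedDef:
    """Hypothetical PyModuleDef shared by all interpreters."""

    def __init__(self):
        self.m_copy = None
        self.m_copy_owner = None


def toy_fixup(shared_def, interp, module_dict):
    """Replace m_copy; return the owner of the dict that got released.

    Returns None if there was no previous m_copy. A returned owner that
    differs from `interp` models a decref in the wrong interpreter.
    """
    released_owner = None
    if shared_def.m_copy is not None:
        released_owner = shared_def.m_copy_owner
    shared_def.m_copy = dict(module_dict)
    shared_def.m_copy_owner = interp
    return released_owner


def toy_load_from_copy(shared_def, interp):
    """Shallow-copy m_copy and list the names of objects owned elsewhere."""
    new_dict = dict(shared_def.m_copy)
    return [name for name, obj in new_dict.items() if obj.owner != interp]
```

Calling toy_fixup() from a second interpreter returns the first interpreter as the owner of the released dict, which models the first problem. Calling toy_load_from_copy() from a second interpreter reports every entry as foreign, which models the second.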
Here are some possible approaches to isolating each module's PyModuleDef to the interpreter that imports it:
- keep a copy of PyModuleDef for each interpreter (would _PyRuntimeState.imports.extensions need to move to the interpreter?)
- keep just m_copy for/on each interpreter
- fix _PyImport_FixupExtensionObject() some other way...
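As one possible reading of the second option (keeping just m_copy per interpreter), here is a toy Python sketch that keys the cached dict by interpreter and module name. PerInterpreterCopies and its method are hypothetical, and the sketch assumes it is acceptable to run the init function once per interpreter, which may not hold for real legacy modules.

```python
import types


class PerInterpreterCopies:
    """Hypothetical cache holding one m_copy per (interpreter, module)."""

    def __init__(self):
        self._copies = {}  # (interp_id, module name) -> dict snapshot

    def import_legacy(self, interp_id, name, init_func):
        """Import `name` in `interp_id`, reusing only that interpreter's copy."""
        key = (interp_id, name)
        module = types.ModuleType(name)
        cached = self._copies.get(key)
        if cached is None:
            # First import in this interpreter: run init, then snapshot.
            init_func(module, interp_id)
            self._copies[key] = dict(module.__dict__)
        else:
            # Later import in the same interpreter: shallow copy of its own dict.
            module.__dict__.update(cached)
        return module
```

With this layout an interpreter never touches, releases, or copies from a dict created by another interpreter, so both problems described earlier are avoided in the model. Whether the real import machinery can be restructured this way is left open here, as it is in the list above.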
Linked PRs
- gh-101758: Add a Test For Single-Phase Init Module Variants #101891
- gh-101758: Clean Up Uses of Import State #101919
- gh-101758: Add a Test For Single-Phase Init Modules in Multiple Interpreters #101920
- gh-101758: Fix the wasm Builtbots #101943
- gh-101758: Add _PyState_AddModule() Back for the Stable ABI #101956
- gh-101758: Fix Refleak Testing With test_singlephase_variants #101969
Footnotes
[1] see https://github.com/python/cpython/pull/101660#issuecomment-1424507393
[2] We should probably consider isolating PyModuleDef.m_base.m_index, but for now we simply sync the modules_by_index list of each interpreter. (Also, modules_by_index and m_index are only used for single-phase init modules.)
[3] specifically def->m_size == -1; multi-phase init modules always have def->m_size >= 0; single-phase init modules can also have a non-negative m_size