A fun Python puzzle with circular imports
Baptiste Mispelon asked an interesting Python quiz (via, via @glyph):
Can someone explain this #Python import behavior?
I'm in a directory with 3 files:
- a.py contains `A = 1; from b import *`
- b.py contains `from a import *; A += 1`
- c.py contains `from a import A; print(A)`
Can you guess and explain what happens when you run `python c.py`?
I encourage you to guess which of the options in the original post is the actual behavior before you read the rest of this entry.
There are two things going on here. The first thing is what
actually happens when you do 'from module import ...'. The short version is that this copies
the current bindings of names from one module
to another. So when module b does 'from a import *', it copies
the binding of a.A to b.A and then the += changes that binding.
The behavior would be the same if we used 'from a import A' and
'from b import A' in the code, and if we did we could describe
what each did in isolation as starting with 'A = 1' (in a), then
'A = a.A; A += 1' (in b), and then 'A = b.A' (back in a)
successively (and then in c, 'A = a.A').
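As a minimal sketch of the binding copy described above, here is a version that needs no files on disk. The module names 'demo_a' and 'demo_b' are made up for illustration, and building module objects by hand with types.ModuleType is only a stand-in for really loading a.py and b.py.

```python
import sys
import types

# Hand-built stand-in for a.py, registered so that 'from demo_a import *' finds it.
mod_a = types.ModuleType("demo_a")
mod_a.A = 1
sys.modules["demo_a"] = mod_a

# Hand-built stand-in for b.py; run its two statements in its own namespace.
mod_b = types.ModuleType("demo_b")
try:
    exec("from demo_a import *\nA += 1", mod_b.__dict__)
finally:
    # Don't leave the made-up module registered.
    del sys.modules["demo_a"]

# The += rebound demo_b.A only; demo_a.A still has its original binding.
print(mod_a.A, mod_b.A)
```

If 'from' shared the name instead of copying the binding, both values would come out as 2.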
The second thing going on is that you can import incomplete modules
(this is true in both Python 2 and Python 3, which return the same
results here). To see how this works we need to combine the
description of 'import' and 'from'
and the approximation of what happens while a module is being loaded, although
neither is completely precise. To summarize, when a module is being
loaded, the first thing that happens is that a module namespace is
created and is added to sys.modules; then the
code of the module is executed in that namespace. When Python
encounters a 'from', if there is an entry for the module in
sys.modules, Python immediately imports things from it; it
implicitly assumes that the module is already fully loaded.
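One way to see the 'registered first, executed second' ordering for yourself is to have a module look itself up in sys.modules while its own code is still running. This is only a sketch; the module name 'halfloaded_demo' and the two variable names inside it are invented for the example.

```python
import importlib
import os
import sys
import tempfile

# Source for a made-up module that inspects sys.modules during its own load.
SOURCE = (
    "import sys\n"
    "REGISTERED_EARLY = __name__ in sys.modules\n"
    "SAME_NAMESPACE = sys.modules[__name__].__dict__ is globals()\n"
)

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "halfloaded_demo.py"), "w") as f:
        f.write(SOURCE)
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        mod = importlib.import_module("halfloaded_demo")
    finally:
        # Undo the changes to the import state.
        sys.path.remove(tmp)
        sys.modules.pop("halfloaded_demo", None)

# Both are True: the sys.modules entry existed, and was the namespace
# being executed in, before the module's code had finished.
print(mod.REGISTERED_EARLY, mod.SAME_NAMESPACE)
```

Anything that does a 'from' on such a module part way through its load sees only the names that have been bound so far.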
At first I was surprised by this behavior, but the more I think
about it the more it seems a reasonable choice. It avoids having
to explicitly detect circular imports and it makes circular imports
work in the simple case (where you do 'import b' and then don't
use anything from b until all imports are finished and the program
is running). It has the cost that if you have circular name uses
you get an unhelpful error message about 'cannot import name' (or
'NameError: name ... is not defined' if you use 'from module
import *'):
    $ cat a.py
    from b import B; A = 10 + B
    $ cat b.py
    from a import A; B = 20 + A
    $ cat c.py
    from a import A; print(A)
    $ python c.py
    [...]
    ImportError: cannot import name 'A' from 'a' [...]
(Python 3.13 does print a nice stack trace that points to the whole set of 'from ...' statements.)
Given all of this, here is what I believe is the sequence of execution in Baptiste Mispelon's example:
- c.py does 'from a import A', which initiates a load of the 'a' module.
- an 'a' module is created and added to sys.modules.
- that module begins executing the code from a.py, which creates an 'a.A' name (bound to 1) and then does 'from b import *'.
- a 'b' module is created and added to sys.modules.
- that module begins executing the code from b.py. This code starts by doing 'from a import *', which finds that 'sys.modules["a"]' exists and copies the a.A name binding, creating b.A (bound to 1).
- b.py does 'A += 1', which rebinds b.A (but not the separate a.A binding) to be '2'.
- b.py finishes its code, returning control to the code from a.py, which is still part way through 'from b import *'. This import copies all public names (and their bindings) from sys.modules["b"] into the 'a' module, which means the b.A binding (to 2) overwrites the old a.A binding (to 1).
- a.py finishes and returns control to c.py, where 'from a import A' can now complete by copying the a.A name and its binding into 'c', making it the equivalent of 'import a; A = a.A; del a'.
- c.py prints the value of this, which is 2.
At the end of things, c.A, a.A, and b.A all exist and are all bindings to the same object. The order of binding was 'b.A = 2; a.A = b.A; c.A = a.A'.
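If you'd rather check the result than trust my walkthrough, here is a small sketch that writes the three files from the quiz into a temporary directory and runs c.py in a fresh interpreter. The file contents are the ones from the quiz (with the semicolons turned into newlines); the harness around them is my own scaffolding.

```python
import os
import subprocess
import sys
import tempfile

# The three files from the quiz.
FILES = {
    "a.py": "A = 1\nfrom b import *\n",
    "b.py": "from a import *\nA += 1\n",
    "c.py": "from a import A\nprint(A)\n",
}

with tempfile.TemporaryDirectory() as tmp:
    for name, source in FILES.items():
        with open(os.path.join(tmp, name), "w") as f:
            f.write(source)
    # Run 'python c.py' in a separate process, from inside that directory.
    result = subprocess.run(
        [sys.executable, "c.py"],
        cwd=tmp,
        capture_output=True,
        text=True,
    )

output = result.stdout.strip()
print(output)
```

Using a separate process keeps the experiment's 'a' and 'b' modules out of the sys.modules of whatever is running the harness.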
(There's also a bonus question, where I have untested answers.)
Sidebar: A related circular import puzzle and the answer
Let's take a slightly different version of my error message example above, that simplifies things by leaving out c.py:
    $ cat a.py
    from b import B; A = 10 + B
    $ cat b.py
    from a import A; B = 20 + A
    $ python a.py
    [...]
    ImportError: cannot import name 'B' from 'b' [...]
When I first did this I was quite puzzled until the penny dropped.
What's happening is that running 'python a.py' isn't creating
an 'a' module but instead a __main__ module, so b.py doesn't
find a sys.modules["a"] when it starts and instead creates one
and starts loading it. That second version of a.py, now in an "a"
module, is what tries to refer to b.B and finds it not there (yet).
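The double load is easy to make visible. In this sketch the two files are deliberately simplified versions of the ones above (plain 'import' instead of 'from', so nothing fails, plus a print() that I added), which means it shows a.py being executed twice rather than reproducing the error.

```python
import os
import subprocess
import sys
import tempfile

# Simplified stand-ins for a.py and b.py; the print() is added for the demo.
FILES = {
    "a.py": 'print("running a.py as", __name__)\nimport b\n',
    "b.py": "import a\n",
}

with tempfile.TemporaryDirectory() as tmp:
    for name, source in FILES.items():
        with open(os.path.join(tmp, name), "w") as f:
            f.write(source)
    # Run 'python a.py' in a separate process, from inside that directory.
    result = subprocess.run(
        [sys.executable, "a.py"],
        cwd=tmp,
        capture_output=True,
        text=True,
    )

lines = result.stdout.splitlines()
for line in lines:
    print(line)
```

The first line of output comes from the __main__ module and the second from the separate 'a' module that b.py's import creates.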