AttributeError when pickled function uses submodules #78

@benjimin


This problem occurs when a function refers (by attribute access) to a sub-module of a package. Cloudpickle appears to pickle functions not by name, but by code plus (a subset of) globals. As a result, the parent package is recorded in the pickle, but the sub-module is not.

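def func():
    # import unittest.mock  # uncommenting this import avoids the error
    x = unittest.TestCase
    x = unittest.mock.Mock  # attribute access to the sub-module
import unittest.mock

import cloudpickle as pickle
s = pickle.dumps(func)

# Simulate a fresh process in which unittest has not been imported
del unittest
import sys
del sys.modules['unittest']
del sys.modules['unittest.mock']

f = pickle.loads(s)
# import unittest.mock as anything  # uncommenting this import also avoids the error
f()

Result:

AttributeError: module 'unittest' has no attribute 'mock'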

This leads to non-intuitive bugs in applications such as cluster computing (e.g. with dask.distributed).
Workarounds:

  • perform imports inside functions (contrary to PEP8).
  • import sub-modules (or their contents) as globals.
  • ensure (somehow) that all relevant sub-modules are automatically imported by respective parent packages (e.g. __init__.py).
  • arrange for the unpickling process to have already done an import of the sub-module (e.g. by first uncloudpickling something else that did refer to the sub-module as a global).
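
For illustration, the first two workarounds might look like the following minimal sketch. The function names `func_local_import` and `func_global_submodule` are hypothetical, and the claim that the second form survives pickling rests on the assumption (consistent with the behaviour described above) that cloudpickle stores a module global by its full name, here `unittest.mock`.

```python
# Workaround 2: bind the sub-module itself (not just its parent package)
# as a global, so the global that gets recorded is 'unittest.mock'.
import unittest.mock as mock


def func_local_import():
    # Workaround 1: import inside the function body, so the sub-module
    # is imported in whichever process eventually calls the function.
    import unittest.mock
    return unittest.mock.Mock


def func_global_submodule():
    # Refers to the sub-module directly through the global name `mock`,
    # rather than through attribute access on the parent package.
    return mock.Mock
```

Both functions avoid the `unittest.mock` attribute lookup on a freshly imported parent package, which is the step that fails in the example above.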

I assume cloudpickle checks whether a global is an imported module and, if so, stores its name (rather than pickling its attributes). Would it be practical to also check (via sys.modules.keys()) which sub-modules had already been imported, and ensure that every such sub-module is also imported (initialised) when the function is unpickled?
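
A rough sketch of the check proposed above, using only the standard library. This is not cloudpickle's implementation; the helpers `imported_submodules` and `reimport` are hypothetical names, and the sketch only inspects the top-level code object of the function (nested functions and closures are ignored).

```python
import importlib
import sys
import types
import unittest.mock


def imported_submodules(func):
    """Names of already-imported sub-modules of packages that func uses as globals."""
    found = set()
    for name in func.__code__.co_names:
        value = func.__globals__.get(name)
        # Only packages (modules with __path__) can have sub-modules.
        if isinstance(value, types.ModuleType) and hasattr(value, "__path__"):
            prefix = value.__name__ + "."
            found.update(m for m in list(sys.modules) if m.startswith(prefix))
    return sorted(found)


def reimport(names):
    """On the unpickling side: import each recorded sub-module by name."""
    for name in names:
        importlib.import_module(name)


def func():
    return unittest.mock.Mock


# Pickling side: record which sub-modules are currently imported.
names = imported_submodules(func)
# Unpickling side: make sure they are imported before func is called.
reimport(names)
```

Note that this records every imported sub-module of the package, not only the ones the function actually touches, so it may import more than is strictly needed.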
