ENH added original version of pickled estimator in state dict #22094

Edern76 · 2021-12-29T17:20:31Z

Reference Issues/PRs

Fixes #22055

What does this implement/fix? Explain your changes.

As explained in the issue description, pickling an estimator saves the sklearn version with which it was last pickled but not the one with which it was first pickled.

This PR adds this information to the state dictionary of BaseEstimator under the key _sklearn_pickle_version, as suggested in the issue description. The value stored under this key is the current version of sklearn if the key is not already present in the state dictionary (i.e., if the estimator has not been pickled before), or the existing value if the key is already present, which propagates the version number of the first pickle to every subsequent pickle.
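The propagation described above can be sketched without scikit-learn. This is only an illustration of the idea, not the code in this PR: `ToyEstimator` and `CURRENT_VERSION` are hypothetical stand-ins for a real estimator and for `sklearn.__version__`.

```python
import pickle

# Hypothetical stand-in for sklearn.__version__.
CURRENT_VERSION = "0.24.2"


class ToyEstimator:
    """Hypothetical estimator; not the real BaseEstimator."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def __getstate__(self):
        state = self.__dict__.copy()
        # Stamp the version only if no earlier pickle already did so,
        # so the version of the first pickle is carried forward.
        state.setdefault("_sklearn_pickle_version", CURRENT_VERSION)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)


first = pickle.loads(pickle.dumps(ToyEstimator()))

# Simulate an upgrade, then pickle the already-loaded estimator again.
CURRENT_VERSION = "1.0.2"
second = pickle.loads(pickle.dumps(first))

print(first._sklearn_pickle_version)   # version of the first pickle
print(second._sklearn_pickle_version)  # still the version of the first pickle
```

With this approach the key, once written, is never refreshed by later pickles.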

Any other comments?

This PR was carried out as part of the APPC course at INSA Rouen.

thomasjpfan

Thank you for the PR @Edern76 !

sklearn/base.py

ogrisel

I agree with @thomasjpfan's suggestion above.

sklearn/base.py

thomasjpfan

Thanks for the update!

sklearn/tests/test_base.py

thomasjpfan · 2022-01-02T19:42:18Z

sklearn/base.py

+        if (
+            type(self).__module__.startswith("sklearn.")
+            and "_sklearn_pickle_version" not in state.keys()
+        ):


I think _sklearn_pickle_version should be overridden every time. This way _sklearn_pickle_version means: "the runtime version of the model when it was pickled".

Suggested change

-        if (
-            type(self).__module__.startswith("sklearn.")
-            and "_sklearn_pickle_version" not in state.keys()
-        ):
+        if type(self).__module__.startswith("sklearn."):

Let's say a model was trained on 0.24.2 and pickled. Then it is loaded into 1.0.2 and we correctly have _sklearn_pickle_version=0.24.2 and raise a warning. If the model was fitted again on 1.0.2, the state would be reset and would require 1.0.2 to run. This means _sklearn_pickle_version should be updated to 1.0.2 when pickled.

There is an edge case where a model was trained on 0.24.2, pickled, loaded on 1.0.2, repickled on 1.0.2, and a user expects _sklearn_pickle_version to be 0.24.2. I consider this a usage error.
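To make the scenario concrete, here is a small sketch of the "override every time" behaviour using a toy class. `ToyEstimator`, its `fit` method and `CURRENT_VERSION` are hypothetical; the real change lives in `BaseEstimator.__getstate__`.

```python
import pickle

# Hypothetical stand-in for sklearn.__version__.
CURRENT_VERSION = "0.24.2"


class ToyEstimator:
    """Hypothetical estimator; not the real BaseEstimator."""

    def fit(self, data):
        self.mean_ = sum(data) / len(data)
        return self

    def __getstate__(self):
        # Always overwrite: the key records the version at dump time.
        return dict(self.__dict__, _sklearn_pickle_version=CURRENT_VERSION)

    def __setstate__(self, state):
        self.__dict__.update(state)


old_pickle = pickle.dumps(ToyEstimator().fit([1.0, 2.0, 3.0]))

# Simulate an upgrade, then load the old pickle.
CURRENT_VERSION = "1.0.2"
loaded = pickle.loads(old_pickle)
version_after_load = loaded._sklearn_pickle_version

# Refit on the new version and pickle again: the key is refreshed.
loaded.fit([4.0, 6.0])
repickled = pickle.loads(pickle.dumps(loaded))

print(version_after_load)
print(repickled._sklearn_pickle_version)
```

Here the key always answers "which version produced this pickle?", at the cost of forgetting the version of any earlier pickle.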

ogrisel · 2022-01-03T10:00:05Z

sklearn/tests/test_base.py

+    "Use at your own risk. "
+    "For more info please refer to:\\n"
+    "https://scikit-learn.org/stable/modules/model_persistence"
+    ".html#security-maintainability-limitations"


ogrisel

I think @thomasjpfan's suggestion https://github.com/scikit-learn/scikit-learn/pull/22094/files#r777246976 is valid.

The test can be updated accordingly as follows:

ogrisel · 2022-01-03T10:14:43Z

sklearn/tests/test_base.py

+def test_base_estimator_pickle_version(monkeypatch):
+    """Check that the original sklearn version with which a base estimator
+    has been pickled with is present"""
+    old_pickle_version = "0.21.3"
+    monkeypatch.setattr(sklearn.base, "__version__", old_pickle_version)
+    original_estimator = MyEstimator()
+
+    first_pickle_estimator = pickle.loads(pickle.dumps(original_estimator))
+    assert hasattr(first_pickle_estimator, "_sklearn_pickle_version")
+    assert first_pickle_estimator._sklearn_pickle_version == old_pickle_version
+
+    new_pickle_version = "1.1.0"
+    monkeypatch.setattr(sklearn.base, "__version__", new_pickle_version)
+    message = pickle_error_message.format(
+        estimator="MyEstimator",
+        old_version=old_pickle_version,
+        current_version=new_pickle_version,
+    )
+    with pytest.warns(UserWarning, match=message):
+        second_pickle_estimator = pickle.loads(pickle.dumps(first_pickle_estimator))
+        assert hasattr(second_pickle_estimator, "_sklearn_pickle_version")
+        assert second_pickle_estimator._sklearn_pickle_version == old_pickle_version


I think the main usage scenario we want to handle is rather:

Suggested change

-def test_base_estimator_pickle_version(monkeypatch):
-    """Check that the original sklearn version with which a base estimator
-    has been pickled with is present"""
-    old_pickle_version = "0.21.3"
-    monkeypatch.setattr(sklearn.base, "__version__", old_pickle_version)
-    original_estimator = MyEstimator()
-
-    first_pickle_estimator = pickle.loads(pickle.dumps(original_estimator))
-    assert hasattr(first_pickle_estimator, "_sklearn_pickle_version")
-    assert first_pickle_estimator._sklearn_pickle_version == old_pickle_version
-
-    new_pickle_version = "1.1.0"
-    monkeypatch.setattr(sklearn.base, "__version__", new_pickle_version)
-    message = pickle_error_message.format(
-        estimator="MyEstimator",
-        old_version=old_pickle_version,
-        current_version=new_pickle_version,
-    )
-    with pytest.warns(UserWarning, match=message):
-        second_pickle_estimator = pickle.loads(pickle.dumps(first_pickle_estimator))
-        assert hasattr(second_pickle_estimator, "_sklearn_pickle_version")
-        assert second_pickle_estimator._sklearn_pickle_version == old_pickle_version
+def test_base_estimator_pickle_version(monkeypatch):
+    """The version should be embedded at dump time and checked at load time"""
+    old_version = "0.21.3"
+    monkeypatch.setattr(sklearn.base, "__version__", old_version)
+    original_estimator = MyEstimator()
+    old_pickle = pickle.dumps(original_estimator)
+    loaded_estimator = pickle.loads(old_pickle)
+    assert loaded_estimator._sklearn_pickle_version == old_version
+    assert not hasattr(original_estimator, "_sklearn_pickle_version")
+
+    new_version = "1.1.0"
+    monkeypatch.setattr(sklearn.base, "__version__", new_version)
+    message = pickle_error_message.format(
+        estimator="MyEstimator",
+        old_version=old_version,
+        current_version=new_version,
+    )
+    with pytest.warns(UserWarning, match=message):
+        reloaded_estimator = pickle.loads(old_pickle)
+        assert reloaded_estimator._sklearn_pickle_version == old_version

Disclaimer: I have not run the code, there might be typos.
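For readers who want to try the dump-time/load-time scenario outside the scikit-learn test suite, here is a self-contained sketch using only the standard library. `ToyEstimator`, `CURRENT_VERSION` and the warning text are all hypothetical; the real message is the one formatted from `pickle_error_message` in the test above.

```python
import pickle
import warnings

# Hypothetical stand-in for sklearn.__version__.
CURRENT_VERSION = "0.21.3"


class ToyEstimator:
    """Hypothetical estimator; not the real BaseEstimator."""

    def __getstate__(self):
        # Embed the running version at dump time.
        return dict(self.__dict__, _sklearn_pickle_version=CURRENT_VERSION)

    def __setstate__(self, state):
        # Check the embedded version at load time.
        pickle_version = state.get("_sklearn_pickle_version")
        if pickle_version != CURRENT_VERSION:
            warnings.warn(
                f"Unpickling ToyEstimator from version {pickle_version} "
                f"when using version {CURRENT_VERSION}.",
                UserWarning,
            )
        self.__dict__.update(state)


original = ToyEstimator()
old_pickle = pickle.dumps(original)

# Same version at load time: no warning expected.
with warnings.catch_warnings(record=True) as caught_same:
    warnings.simplefilter("always")
    loaded = pickle.loads(old_pickle)

# Simulate an upgrade: loading the old pickle now warns.
CURRENT_VERSION = "1.1.0"
with warnings.catch_warnings(record=True) as caught_new:
    warnings.simplefilter("always")
    reloaded = pickle.loads(old_pickle)

print(len(caught_same), len(caught_new))
```

Note that the in-memory `original` object never gains the attribute; only objects restored from a pickle carry it.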

ogrisel · 2022-01-03T10:19:45Z

doc/whats_new/v1.1.rst

+- |Enhancement| All scikit-learn estimators now include the sklearn version
+  with which they have first been pickled when saving them with the pickle.
+  library.


Here is an updated changelog entry that would reflect the behavior with @thomasjpfan's suggested change.

Suggested change

-- |Enhancement| All scikit-learn estimators now include the sklearn version
-  with which they have first been pickled when saving them with the pickle.
-  library.
+- |Enhancement| All scikit-learn estimators now save the sklearn version
+  with which they have been pickled on a private attribute to avoid having
+  to parse the warning message to programmatically access this information
+  to introspect this.

It's a bit weird to document a pure-private API change but I am not sure what to do otherwise. Not documenting this change would even be worse I think.

thomasjpfan · Jan 8, 2022

I think it would be nice to make the pickled version public. There is a need for it if users are parsing the warning message to get this information.

ogrisel · 2022-01-03T10:25:25Z

sklearn/base.py

    def __setstate__(self, state):
        if type(self).__module__.startswith("sklearn."):
-            pickle_version = state.pop("_sklearn_version", "pre-0.18")
+            pickle_version = state.get("_sklearn_pickle_version", "pre-0.18")


Let's preserve backward compat with old but not too old pickles:

Suggested change

-            pickle_version = state.get("_sklearn_pickle_version", "pre-0.18")
+            pickle_version = state.pop("_sklearn_version", "pre-0.18")  # compat
+            pickle_version = state.setdefault("_sklearn_pickle_version", pickle_version)
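The effect of the two suggested lines is easiest to see on plain dictionaries. The helper name `migrate_state` and the sample states below are made up for illustration; only the `pop`/`setdefault` pair comes from the suggestion.

```python
def migrate_state(state):
    """Return the pickled version and normalise the legacy key in place."""
    # Older pickles stored the version under "_sklearn_version".
    pickle_version = state.pop("_sklearn_version", "pre-0.18")
    # Newer pickles already carry "_sklearn_pickle_version": keep it if so,
    # otherwise store the value recovered from the legacy key.
    return state.setdefault("_sklearn_pickle_version", pickle_version)


legacy = {"alpha": 1.0, "_sklearn_version": "0.24.2"}
recent = {"alpha": 1.0, "_sklearn_pickle_version": "1.1.0"}
ancient = {"alpha": 1.0}

print(migrate_state(legacy), legacy)
print(migrate_state(recent), recent)
print(migrate_state(ancient), ancient)
```

In all three cases the state ends up with a single `_sklearn_pickle_version` key and no leftover `_sklearn_version` key.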

Edern76 · 2022-01-08T16:52:09Z

Thanks for the suggestions, I made a new commit with all these suggested changes

thomasjpfan

Minor comment otherwise LGTM

sklearn/base.py

Co-authored-by: Thomas J. Fan

thomasjpfan

Thank you for your patience, I still think this looks good.

thomasjpfan · 2022-08-03T20:34:02Z

doc/whats_new/v1.1.rst

  error message when setting invalid hyper-parameters with `set_params`.
  :pr:`21542` by :user:`Olivier Grisel <ogrisel>`.

+- |Enhancement| All scikit-learn estimators now save the sklearn version


This did not make it into v1.1, which means this change log entry needs to move to v1.2.

thomasjpfan · 2022-08-03T20:35:59Z

sklearn/base.py

-            return dict(state.items(), _sklearn_version=__version__)
+            return dict(
+                state.items(),
+                _sklearn_pickle_version=__version__,


Looking at this again, I am starting to prefer making this more official by naming the attribute:

Suggested change

-                _sklearn_pickle_version=__version__,
+                __sklearn_pickle_version__=__version__,

and then documenting this in https://scikit-learn.org/stable/model_persistence.html.

ogrisel · Jan 3, 2023

Using custom dunder attribute names is explicitly discouraged by PEP 8. I prefer the original _sklearn_pickle_version attribute name.

glemaitre

I applied the changes proposed by @thomasjpfan and put my +1.
I will merge if the CIs turn green.

This PR is urgent because CI for our PRs will fail until there is a fix. After scikit-learn/scikit-learn#22094, sklearn estimators will contain an additional key in their __dict__ after loading, namely "__sklearn_pickle_version__". This causes our tests to fail, since they compare objects before and after loading. The quick solution is to pop the item in our tests if it exists and only compare the remaining items. Should the sklearn change be amended, we should remove the fix from this PR. Progress is tracked here: scikit-learn/scikit-learn#25273
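The workaround described above (ignore the extra key when comparing an object before and after loading) might look roughly like the following. This is a sketch, not the actual skops test code: `comparable_state`, `Toy` and the version string are hypothetical.

```python
# Keys that a persistence round trip may add and that tests should ignore.
IGNORED_KEYS = {"__sklearn_pickle_version__"}


def comparable_state(obj):
    """Return the instance __dict__ without the ignored keys."""
    return {k: v for k, v in vars(obj).items() if k not in IGNORED_KEYS}


class Toy:
    """Hypothetical object standing in for an estimator."""


before = Toy()
before.alpha = 1.0

# Simulate what a loaded estimator would look like after the change.
after = Toy()
after.alpha = 1.0
after.__dict__["__sklearn_pickle_version__"] = "1.3.dev0"  # made-up version

print(vars(before).keys() == vars(after).keys())           # raw comparison
print(comparable_state(before) == comparable_state(after))  # filtered comparison
```

Filtering on a fixed set of keys keeps the rest of the comparison strict, so other unexpected attributes would still be caught.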


Edern76 added 3 commits December 29, 2021 18:14

ENH added original version of pickled estimator in state dict

faf6fe2

DOC added changelog entry

139cff2

FIX fixed comments length

8d2c64e

thomasjpfan reviewed Dec 30, 2021

sklearn/base.py

ogrisel reviewed Dec 31, 2021

sklearn/base.py (outdated)

ENH changed sklearn_version to sklearn_pickle_version

b7e21c4

thomasjpfan reviewed Jan 2, 2022

sklearn/tests/test_base.py (outdated)

ENH use monkeypatch in tests

7b4aa37

thomasjpfan reviewed Jan 2, 2022

ogrisel reviewed Jan 3, 2022

ENH incorporated PR suggestions

6a44e76

thomasjpfan approved these changes Jan 8, 2022

sklearn/base.py (outdated)

ENH added comment to state pop

935e378

Co-authored-by: Thomas J. Fan

cmarmo added the Waiting for Reviewer label Feb 1, 2022

thomasjpfan added 3 commits April 24, 2022 09:19

DOC Adds code comments

34d0169

Merge remote-tracking branch 'upstream/main' into pr/22094

6998ef3

DOC Reword whats new

1f9804b

thomasjpfan reviewed Aug 3, 2022

cmarmo added the Waiting for Second Reviewer label and removed the Waiting for Reviewer label Oct 20, 2022

glemaitre self-requested a review December 28, 2022 17:19

glemaitre added 2 commits December 28, 2022 18:36

Merge remote-tracking branch 'origin/main' into pr/Edern76/22094

eff6a4b

update version

4474e2e

glemaitre approved these changes Dec 28, 2022

DOC additional information regarding sklearn version

3d8a06a

glemaitre enabled auto-merge (squash) December 28, 2022 18:12

glemaitre merged commit affaa62 into scikit-learn:main Dec 28, 2022

adrinjalali mentioned this pull request Jan 2, 2023

__sklearn_pickle_version__ makes estimator.__dict__.keys() == loaded.__dict__.keys() to fail #25273

Closed

BenjaminBossan mentioned this pull request Jan 2, 2023

FIX: Persistence tests failing when using sklearn nightly skops-dev/skops#260

Merged

glemaitre added a commit that referenced this pull request Jan 3, 2023

Revert "ENH added original version of pickled estimator in state dict (…

df218f0

…#22094)" This reverts commit affaa62.

glemaitre mentioned this pull request Jan 3, 2023

Revert "ENH added original version of pickled estimator in state dict" #25279

Merged
