Skip to content

Conversation

@thomasjpfan
Copy link
Member

@thomasjpfan thomasjpfan commented Jan 5, 2023

Reference Issues/PRs

Closes #25273
Alternative to #25280

What does this implement/fix? Explain your changes.

This PR adds more information to the warning, so a user can record the warning and get information about the version used to pickle the estimator:

with warnings.catch_warnings(record=True) as w:
    est = pickle.loads(...)
    
warning = w[0].message
print(warning.original_sklearn_version)

One can also turn the warning into an error and get the original_sklearn_version from there:

warnings.simplefilter("error", InconsistentVersionWarning)

try:
    est = pickle.loads(...)
except InconsistentVersionWarning as w:
    print(w.original_sklearn_version)

Any other comments?

This PR does not add any more state of the estimator itself.

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise I like the PR.

@glemaitre
Copy link
Member

I am split here. The change is simple. However, the solution is a bit in between parsing the warning message and just having a nice attribute.

Since the other solution involves adding a __new__ that is scary, the current solution might be enough. We should only document it in the pickling section to provide a code snippet that just works to get the pickling version.

@betatim
Copy link
Member

betatim commented Jan 5, 2023

I like it. Same question as Adrin about the parsing vs separate arguments.

A curiosity: when I load an estimator pickled in an older version of sklearn in ipython I see the same warning twice. Does someone know if this is a "weird thing" related to ipython or normal behaviour or something totally different?

A separate thing (so we can punt it to a new PR if we want to) that I find confusing about this is that a estimator pickled with v1 and then unpickled with this PR (or any future version) will return {..., "_sklearn_version": '1.3.dev0'}. This was also mentioned in the original issue (#22055). It seems weird and easy to fix. It happens because __getstate__ unconditionally adds _sklearn_version to the state dict (return dict(state.items(), _sklearn_version=__version__)), instead I think it should check if the key is present and only set it if it isn't present.

@adrinjalali
Copy link
Member

instead I think it should check if the key is present and only set it if it isn't present.

@betatim the key is never present (and I think it's a good thing) since __setstate__ pops it before setting the attributes. If the user saves a model which was loaded from an older sklearn version, we probably don't care about the usecase, do we? I'm happy with this solution.

@glemaitre
Copy link
Member

Shall we document the way to programmatically recover the pickle version in the persistence documentation?

@betatim
Copy link
Member

betatim commented Jan 6, 2023

Shall we document the way to programmatically recover the pickle version in the persistence documentation?

Sounds like a good idea. If there is a documented way to get the version used to fit an estimator that reduces the number of people calling __getstate__ to find out.

I hadn't realised that __setstate__ pops the variable, and I did quickly read the code ... . Combined with the fact that it doesn't contain what people are looking for when they poke around, having a documented way of getting the right answer is good.

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, LGTM.

@glemaitre glemaitre merged commit 70c690e into scikit-learn:main Jan 12, 2023
glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Jan 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

__sklearn_pickle_version__ makes estimator.__dict__.keys() == loaded.__dict__.keys() to fail

4 participants