-
-
Notifications
You must be signed in to change notification settings - Fork 211
raise a warning, not an error, when not matching version exactly #744
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
[WIP] New release
Release version 0.9
I'd be in favor of that solution as it makes the user think about dependency versions. |
|
it looks like |
|
Oops, yes, I should have realized that. However, there is more: 'flow_name': 'sklearn.pipeline.Pipeline(standardscaler=sklearn.preprocessing.data.StandardScaler,svc=sklearn.svm.classes.SVC)(1)', I am not sure know where the words 'data' and 'classes' are coming from. As far as I know they are not part of the class hierarchy. Was this the case in older sklearn versions? |
|
@mfeurer where would you want to have that flag set? It seems like it would be a property of the extension but the extension is created on-demand whenever needed. I could add it to Passing it to where the extension is created in The other option would be to add a config option i.e. a global variable that stores this. Which is also not pretty. |
|
@hp2500 the data and classes are part of the module hierarchy and should be there. It's an issue with the deserialization of the pipeline, possibly because the format for storing flows has changed in openml? I can try to look into it but maybe @janvanrijn or @mfeurer know. |
I'm not sure about that. If you have a mismatching local libraries, and want to perform the flow on a task, does it not make sense to create a new flow for your version-loose replication and run that on the task? |
|
@PGijsbers how would you do that? Or do you mean just doing |
|
Actually, I'd like to take it back and agree with you instead, if we allow it in straight forward fashion then a parameter should be added there. Basically, the possible behavior I would expect is this (in
but this really shouldn't be done without explicit conset of the user, I don't think. However, we should also think about how it would interact with |
|
I really don't have a strong opinion on this. I don't even care that much about running the flow again, just instantiating a model at all is what we need, and right now that's quite tricky :-/ |
Could you please create a new issue for this? It seems to be a different problem.
This makes sense to me and I think I would prefer this solution to the other ones you mentioned. |
We could just forbid enabling both to avoid this issue. |
# Conflicts: # tests/test_flows/test_flow_functions.py
|
It looks like CI is not running with the lastest commit in the branch? I'm a bit confused... |
|
hm so this is a bit fragile. I'm not passing it around everywhere and it's hard to ensure that I'd be passing it around everywhere. The tests will pass now, but I haven't added it to |
|
I feel like this should be an attribute of the parser but the parser is wrapped up in the extension and we don't have control over where it's instantiated. I think the clean version would be to have the extension be refactored into several classes with more clear responsibilities, like a |
|
should work now ... maybe? |
|
Loose instantiation fails if the flow records hyperparameters present in newer scikit-learn versions, but not older ones. E.g. |
This just makes me think of a potential issue I didn't see yet:
The flow downloaded in step 4 now has an external version attribute scikit-learn 0.X, but we used scikit-learn 0.X+1 in step 5. When uploading, we will reference the flow with the wrong scikit-learn version and therefore create faulty entries on OpenML. I'm wondering if such re-instantation is a) a good idea and b) if it is, how we can get around this issue? Maybe we need to remove the flow's own ID if we deserialize it to a different version? |
From our design perspective, I think that is the easiest solution. For a user it might be a little confusing if it's running flow X on task Y and no new results for flow X are added (through |
|
I think if there's a real incompatibility, it's fine to just error (at least for now). For us, instantiation was mostly a way to get to the parameters, because that is so hard otherwise. |
|
|
How does the interface decide which libraries get a |
Codecov Report
@@ Coverage Diff @@
## develop #744 +/- ##
===========================================
- Coverage 88.06% 88.06% -0.01%
===========================================
Files 36 36
Lines 4090 4121 +31
===========================================
+ Hits 3602 3629 +27
- Misses 488 492 +4
Continue to review full report at Codecov.
|
|
So do I need to do something here? Do I need to handle resetting the flow id? Or change something so that the flow realized it's actually a different flow? |
That should be sufficient. Plus a check which actually makes sure that when downloading a flow, and running it again with a different scikit-learn version, a new flow is generated.
I would prefer to avoid this having unfinished/suboptimal code in the dev branch. |
|
Maybe a naive question, but why do flows have versions? And is there something that makes sure the versions of a run and a flow are consistent? |
|
anyway, I proposed a fix. |
|
There's an XML error in the CI, I'm not sure if it's related: https://travis-ci.org/openml/openml-python/jobs/564145864 |
|
I think I addressed your comments? |


Closes #718.
alternatively we could be more conservative and add a
strict=Truekeyword somewhere that the user needs to set toFalseto allow circumventing the version check.