-
-
Notifications
You must be signed in to change notification settings - Fork 211
Adds flow.get_structure and flow.get_subflow (which are complements of each other). Also fixes #564 #567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Btw, the following testcases fail because they are ran against the live server. These will be fine once this PR is merged. |
Codecov Report
@@ Coverage Diff @@
## develop #567 +/- ##
===========================================
- Coverage 90.09% 90.08% -0.01%
===========================================
Files 32 32
Lines 2999 3118 +119
===========================================
+ Hits 2702 2809 +107
- Misses 297 309 +12
Continue to review full report at Codecov.
|
…class. updated unit tests accordingly
|
This PR now adds the functionality that is required for #564. In order to do this off we need more information about the flow structure. For this, I added the functions With these functionalities, solving #564 is rather easy. For this I added a function to the sklearn converter (in setups, but maybe we want to pull these in the flow sklearn converter later). This function now provides a convenience function that maps the name of the OpenMLParameter to the name of the sklearn parameter (e.g., if the openml name is Ready for review @mfeurer @ArlindKadra |
|
Could you please add an example on how to use this to the docs so it is more clear to potential users? |
|
I know that you asked for an example, but I found it hard to construct a general usable example. In the process, I did come up with a general (run setup) tutorial, and I used it to simplify and remove a complex part of the setup reinstantiation code. Hope this clarifies this PR a bit. Let me know if you have further questions |
|
To add to this, here is the function where I used this: It is to transform an OpenML setup object into a sklearn parameter dict, something that is quite hard without these convenience functions. This is what I wrote in the tutorial, before I realized that the tutorial is obsolute due to easier functions: |
mfeurer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, I think I got the functions and they make sense. I only have a few comments.
examples/run_setup_tutorial.py
Outdated
| use them) | ||
| A key requirement for reinstantiating a flow is to have the same scikit-learn | ||
| version as the flow that was uploaded. This tutorial will upload the flow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't these two sentences contradict each other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I also don't get your updated version. You're saying it is important to have the same scikit-learn version, but then say this doesn't matter because the tutorial uploads the flow. I simply don't get how this makes the version not matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to reformulate. Please let me know if this is any better.
examples/run_setup_tutorial.py
Outdated
| and solve the same task again. | ||
| 3) We will verify that the obtained results are exactly the same. | ||
| Readers interested in reinstantiating a setup can skip part 1 and 2 and start | ||
| with part 3 immediately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't part 3 only compare the two arrays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but is is quite an important point and it combines pt 1 and pt 2. I would prefer to keep it in a separate part, albeit only one line of code (it could be extended with more checks later, if someone is motivated)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree on that. What I wanted to point out is that the sentence says:
Readers interested in reinstantiating a setup can skip part 1 and 2 and start with part 3 immediately.
which to me reads like "if you're interested in 1 and 2 you can skip them and commence to 3" because 2 is about reinstantiating the flow, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. I removed the sentence.
|
ping @mfeurer |
|
Awesome! |
Maybe, let's solve #564 in two separate PR's, too keep the PR small and manageable? For #564 to happen, we need to have some knowledge about the structure of the scikit-learn flow. This is all in the OpenML flow object, but in a hard to extract format. I added a function
flow_format, that extracts all needed information from the scikit-learn flow. Nice byproduct is that the result of this function integrates nicely with unit test functions. I integrated several, and can add more.