-
-
Notifications
You must be signed in to change notification settings - Fork 211
Refactoring run_flow_on_task and doc add for run_model #516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi, I am interested in having this functionality and if wonder it's ok working on this. I am a bit of a loss of the changes made so far, so I have some questions to better understand your work:
In general, I thought to update the process to the following:
then upload the flow when:
|
|
Hey @PGijsbers
It has been quite a bit of time. Considering the commit name, I guess I was implementing the generation of a task dictionary and I just copy pasted the code from the _to_dict function in the flow class. I started tweaking it and I never got back to the pr, that is why there are still flow parts there, so basically it is not useful. You should remove that function completely. This is addressed in #607 .
Yes, it is not handled yet. Preventing flow uploads is not easily done, since a flow id is needed to create the run, with the idea of storing the run locally, if I remember correctly. As to how you want to proceed, it seems really good to me. We had a discussion on how to handle this in the last OpenML workshop and this was what we agreed upon. @mfeurer @janvanrijn can you confirm this was and still is the case ? The functions to get the cached flows should be useful and the documentation change for the run_model_on_task is nice too. |
|
@mfeurer should the flow cache functions be moved in another PR ? |
|
So basically undo the last commit 22b1e62 before proceeding to work? |
|
Yes, the other commits should be useful. |
Codecov Report
@@ Coverage Diff @@
## develop #516 +/- ##
===========================================
+ Coverage 90.12% 90.56% +0.43%
===========================================
Files 32 32
Lines 3232 3520 +288
===========================================
+ Hits 2913 3188 +275
- Misses 319 332 +13
Continue to review full report at Codecov.
|
… to do with changes.
|
I think this is what we discussed in Paris. And flow caching should be fine within this PR as it would already test the new flow caching functionality. |
…d as the name did not reflect the functionality.
mfeurer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments regarding the unit tests.
|
I played a bit with the code and it appears that it cannot handle a simple random forest at the moment: as it results in: |
|
Weird. I can recreate the problem locally and am looking into this. |
|
It seems that with absence of an explicit |
…flow to ensure it does not exist on the server.
…f associated flow did not exist but was also not uploaded. This gave errors at publish-time.
mfeurer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my side we're good to merge. Only nitpick would be that some fields are not filled prior to uploading and the user can obtain a None when accessing them. However, I think this is more related to #634.
@ArlindKadra do you have any comments as you started this PR?
|
But that is essentially the same as a user just creating the flow through |
|
@mfeurer @PGijsbers except for a small suggestion, everything looks really good to me :) |
Reference Issue
#515, #498, #457
What does this PR implement/fix? Explain your changes.
The PR fixes the documentation for run_flow_on_task, it also tackles run_flow_or_task always uploading the flow.
How should this PR be tested?
Calling the function, getting a run object in the end without uploading the flow to OpenML.