-
-
Notifications
You must be signed in to change notification settings - Fork 211
Download updates #1256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Download updates #1256
Conversation
|
@mfeurer I have not found a good way to put these functionalities into a general function among all OpenML entity types. Hence, I would like your opinion on the current proposed way of doing it. Does it suffice? |
PGijsbers
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't have a close look, had a very small amount of time, still needs a review (not necessarily by me).
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## develop #1256 +/- ##
===========================================
- Coverage 85.03% 82.06% -2.97%
===========================================
Files 38 38
Lines 5012 5075 +63
===========================================
- Hits 4262 4165 -97
- Misses 750 910 +160
☔ View full report in Codecov by Sentry. |
mfeurer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, I left some comments. All in all this looks great and I'm optimistic that we can merge this soonish.
Just a side note: we'd also need to update the task loading to disable dataset loading by default.
… on how to use it
|
I have also adjusted task loading. This got a bit hacky so that it is backward compatible and supports all arguments by passing kwargs. I am not sure that this is a good solution. Hence, I would be happy for feedback on how to handle the arguments best. Moreover, I added actually lazy loading based on properties such that the user does not have to call |
mfeurer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good now, I only have a few minor comments, but besides that I'm pretty happy about this.
I only have two somewhat larger requests:
- Could you please change the deprecations to 0.15? We will make the deprecations appear in 0.14 with the next large release.
- Could you please add a
TODO(0.15)in the code so that we can easily find the changes we need to do in the next version?
…ersion 0.15.0, update documentation.
mfeurer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 more change, then we can merge this one!
Co-authored-by: Matthias Feurer <[email protected]>
PGijsbers
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't look too much at the correctness of the code (since Matthias already had a look), but I think there are several things that could be improved from a code quality point of view. Please have a look at the comments.
This reverts commit 91b4bf0.
Related to #1034 #1081 #1132
Closes #1081 #1132
In detail, in relation to #1034, I have added a deprecation warning for
get_dataset, making clear that we will default to not downloading the data in the future (starting from version 1.4, but this is just my guess for now).In relation to #1081, I added an option to disable downloading features. Moreover, I adjusted the
reprmethod and added semi-lazy loading for features by the use of the functionOpenMLDataset.load_metadata. As part of this, I also added the option to load qualities after callingget_dataset, which seems to have been missing so far.In relation to #1132, I added the option
force_refresh_cacheto force refresh the cache by deleting the existing cache initially.Note, I have also changed the docstring of
get_datasetand adjusted the parts about being threading/multiprocessing safe, which were incorrect from my previous experiences and observations w.r.t. the cache. This might open a discussion to make this threading/multiprocessing safe based on an option that only downloads the data and ignores the cache entirely. I advise against this, for now, see here for more details on this.