-
-
Notifications
You must be signed in to change notification settings - Fork 211
Feature/give possibility to not download the dataset qualities #1017
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/give possibility to not download the dataset qualities #1017
Conversation
PGijsbers
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mfeurer Do you think the qualities should be loaded if they are present in cache? That would mean different results of the function depending on the cache state. I would think we can skip it altogether?
@a-moadel Please also add unit tests that test the feature works as intended (after we agree on the desired behavior).
I noticed the pre-commit workflow is not triggered. Does anyone have an idea why this would not trigger?
Closes #1009.
Co-authored-by: PGijsbers <[email protected]>
Co-authored-by: PGijsbers <[email protected]>
… cache status , adding unit test for get_dataset_qualities
…alities' of https://github.com/a-moadel/openml-python into feature/give_possibility_to_not_download_the_dataset_qualities
|
This looks good to me now. @PGijsbers you had a comment about persistent behavior, that's not yet addressed, right? |
PGijsbers
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First off, sorry for the delay. I think it works as intended now 🥳 however I do have a few minor comments that I would prefer to see addressed before merging. I'd like to see references to the old behavior updated, and I suggested a minor code change which puts the responsibility of not loading qualities in the get_dataset function instead. I don't think it makes sense to pass down a bool that signifies you just want "" back - then we don't need to call down in the first place.
| self.assertTrue(os.path.exists(qualities_xml_path)) | ||
|
|
||
| def test__get_dataset_qualities_skip_download(self): | ||
| qualities = _get_dataset_qualities_file(self.workdir, 2, False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use keywords here (e.g. download_qualities=False)?
openml/datasets/functions.py
Outdated
|
|
||
| try: | ||
| qualities_file = _get_dataset_qualities_file(did_cache_dir, dataset_id) | ||
| qualities_file = _get_dataset_qualities_file( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes more sense to use download_qualities here and not call the _get_dataset_qualities_file if we don't want to process the qualities.
openml/datasets/functions.py
Outdated
| If `False`, if multiple datasets match, return the least recent active dataset. | ||
| If `True`, if multiple datasets match, raise an error. | ||
| download_qualities : bool, optional (default=True) | ||
| If `True`, also download qualities.xml file. If false use the file if it was cached. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update this and the other references where it says the cached version is used if false is passed, as this is not the behavior anymore.
…parameter, remove unnecessarily call for download qualities
Codecov Report
@@ Coverage Diff @@
## develop #1017 +/- ##
===========================================
+ Coverage 87.62% 88.08% +0.45%
===========================================
Files 36 36
Lines 4574 4565 -9
===========================================
+ Hits 4008 4021 +13
+ Misses 566 544 -22
Continue to review full report at Codecov.
|
Reference Issue
What does this PR implement/fix? Explain your changes.
We can Ignore downloading the dataset qualities if its not exist while getting datasets.
How should this PR be tested?
Clear the cache , and calling get_dataset with download_qualities set to false : this result in qulaities.xml to not be downloaded.
Clear the cache , and calling get_dataset with download_qualities set to default : this result in qulaities.xml to be downloaded.
Any other comments?