Skip to content

Conversation

@a-moadel
Copy link
Contributor

Reference Issue

What does this PR implement/fix? Explain your changes.

We can Ignore downloading the dataset qualities if its not exist while getting datasets.

How should this PR be tested?

Clear the cache , and calling get_dataset with download_qualities set to false : this result in qulaities.xml to not be downloaded.
Clear the cache , and calling get_dataset with download_qualities set to default : this result in qulaities.xml to be downloaded.

Any other comments?

Copy link
Collaborator

@PGijsbers PGijsbers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mfeurer Do you think the qualities should be loaded if they are present in cache? That would mean different results of the function depending on the cache state. I would think we can skip it altogether?
@a-moadel Please also add unit tests that test the feature works as intended (after we agree on the desired behavior).

I noticed the pre-commit workflow is not triggered. Does anyone have an idea why this would not trigger?

Closes #1009.

@a-moadel a-moadel requested a review from PGijsbers January 26, 2021 12:03
@mfeurer
Copy link
Collaborator

mfeurer commented Jan 27, 2021

This looks good to me now. @PGijsbers you had a comment about persistent behavior, that's not yet addressed, right?

Copy link
Collaborator

@PGijsbers PGijsbers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First off, sorry for the delay. I think it works as intended now 🥳 however I do have a few minor comments that I would prefer to see addressed before merging. I'd like to see references to the old behavior updated, and I suggested a minor code change which puts the responsibility of not loading qualities in the get_dataset function instead. I don't think it makes sense to pass down a bool that signifies you just want "" back - then we don't need to call down in the first place.

self.assertTrue(os.path.exists(qualities_xml_path))

def test__get_dataset_qualities_skip_download(self):
qualities = _get_dataset_qualities_file(self.workdir, 2, False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use keywords here (e.g. download_qualities=False)?


try:
qualities_file = _get_dataset_qualities_file(did_cache_dir, dataset_id)
qualities_file = _get_dataset_qualities_file(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes more sense to use download_qualities here and not call the _get_dataset_qualities_file if we don't want to process the qualities.

If `False`, if multiple datasets match, return the least recent active dataset.
If `True`, if multiple datasets match, raise an error.
download_qualities : bool, optional (default=True)
If `True`, also download qualities.xml file. If false use the file if it was cached.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update this and the other references where it says the cached version is used if false is passed, as this is not the behavior anymore.

…parameter, remove unnecessarily call for download qualities
@codecov-io
Copy link

Codecov Report

Merging #1017 (55c7196) into develop (fba6aab) will increase coverage by 0.45%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1017      +/-   ##
===========================================
+ Coverage    87.62%   88.08%   +0.45%     
===========================================
  Files           36       36              
  Lines         4574     4565       -9     
===========================================
+ Hits          4008     4021      +13     
+ Misses         566      544      -22     
Impacted Files Coverage Δ
openml/datasets/functions.py 94.46% <100.00%> (+1.01%) ⬆️
openml/_api_calls.py 89.23% <0.00%> (-3.08%) ⬇️
openml/runs/functions.py 83.16% <0.00%> (+0.25%) ⬆️
openml/testing.py 84.52% <0.00%> (+0.59%) ⬆️
openml/utils.py 91.33% <0.00%> (+0.66%) ⬆️
openml/datasets/dataset.py 87.92% <0.00%> (+3.48%) ⬆️
openml/exceptions.py 96.77% <0.00%> (+9.67%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fba6aab...1b6467d. Read the comment docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants