
Conversation

@Neeratyoy
Contributor

Reference Issue

Addresses #261.

What does this PR implement/fix? Explain your changes.

Keeps track of entities uploaded to the test server and iteratively deletes them after all unit tests have completed.
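
A minimal sketch of the pattern, assuming a class-level tracker on TestBase and the private deletion helper openml.utils._delete_entity (the exact names here are assumptions, not necessarily what the diff uses):

```python
from collections import defaultdict

import openml


class TestBase:
    # entity_type -> list of IDs uploaded to the test server during the run
    publish_tracker = defaultdict(list)

    @classmethod
    def _mark_entity_for_removal(cls, entity_type, entity_id):
        cls.publish_tracker[entity_type].append(entity_id)


def delete_tracked_entities():
    # Called once after all unit tests have completed.
    for entity_type, entity_ids in TestBase.publish_tracker.items():
        for entity_id in entity_ids:
            try:
                openml.utils._delete_entity(entity_type, entity_id)
                print("Deleted ({}, {})".format(entity_type, entity_id))
            except openml.exceptions.OpenMLServerException as e:
                # Some entities refuse deletion, e.g. subflows still in use.
                print("Could not delete ({}, {}): {}".format(
                    entity_type, entity_id, e))
```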

How should this PR be tested?

By running pytest -s and observing the logs.

@Neeratyoy
Contributor Author

@mfeurer
A few things I need advice on:

  • The following IDs refuse to be deleted:
    • flows that are subflows of other flows: "flow is in use by other content (it is a subflow). Can not be deleted"
    • study 307 (the unit test benchmark suite): "Can only delete studies based on runs or studies in preparation"
  • Currently the only way to manually check whether the deletion works is to capture the logs, in which I have a bunch of print statements. Although these are structured and hence visually discernible, they will evidently make the logs considerably more verbose.

@Neeratyoy Neeratyoy requested a review from mfeurer July 11, 2019 21:24
@codecov-io

codecov-io commented Jul 12, 2019

Codecov Report

Merging #735 into develop will increase coverage by 0.15%.
The diff coverage is 97.36%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #735      +/-   ##
===========================================
+ Coverage    87.82%   87.97%   +0.15%     
===========================================
  Files           36       36              
  Lines         3999     4033      +34     
===========================================
+ Hits          3512     3548      +36     
+ Misses         487      485       -2
Impacted Files        Coverage Δ
openml/testing.py     96.38% <97.36%> (+0.17%) ⬆️
openml/_api_calls.py  88.31% <0%> (+3.89%) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 347c4a6...9208c4f. Read the comment docs.

Collaborator

@mfeurer mfeurer left a comment


Thanks, this looks good, although I think it needs a bit of simplification.

Also, could you please add to the contribution guide that the test server needs to be cleaned up after the unit tests?

@mfeurer
Collaborator

mfeurer commented Jul 12, 2019

The following IDs refuse to be deleted: flow is in use by other content (it is a subflow). Can not be deleted

I'm afraid that you need to recursively add all subflows to the tracker to make sure there are no leftovers.
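
A sketch of what that recursive tracking could look like; the helper name is hypothetical, but OpenMLFlow does expose its subflows via the components dict:

```python
def _track_flow_and_subflows(flow, tracker):
    # Hypothetical helper: register a flow and, recursively, all of its
    # subflows, so no flow is left behind on the test server.
    tracker.append(flow.flow_id)
    for subflow in flow.components.values():
        _track_flow_and_subflows(subflow, tracker)
```

Deletion would then have to run top-down (parent flows before their subflows), since the server refuses to delete a flow that is still referenced as a subflow.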

Can only delete studies based on runs or studies in preparation: study 307 (unit test benchmark suite)

@janvanrijn is this intended behavior?

Currently the only way to manually check whether the deletion works is to capture the logs, in which I have a bunch of print statements. Although these are structured and hence visually discernible, they will evidently make the logs considerably more verbose.

That's fine by me.

@Neeratyoy
Contributor Author

@mfeurer I tried to debug the clustering task failure. Here are my observations.

The task being uploaded has the following attributes, as initialized by the setUp of class OpenMLClusteringTaskTest:

  • task_type_id = 5
  • estimation_procedure = 17
  • dataset_id as generated by _get_compatible_rand_dataset()

This is attempted up to 100 times, until the publish() of the task succeeds. If all attempts fail, the current ValueError of Could not create a valid task for task type ID 5 is raised.

On the test server, the following commands:
l = openml.tasks.list_tasks(task_type_id=5, output_format='dataframe')
print(set(l.did))
result in {2, 10, 11, 14, 22, 24, 25, 35, 36, 37, 42, 91, 92}

If we record all 100 dataset_ids generated by _get_compatible_rand_dataset() during the test case's publish attempts, we get exactly {2, 10, 11, 14, 22, 24, 25, 35, 36, 37, 42, 91, 92} again: every dataset the generator can produce already has a clustering task on the server.
This explains why we cannot upload a task even in 100 trials.
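
For reference, this diagnosis can be reproduced with a short snippet (the test-server URL is assumed to match what TestBase configures):

```python
import openml

# Point the client at the OpenML test server, as the unit tests do.
openml.config.server = "https://test.openml.org/api/v1/xml"

# Dataset IDs that already have a clustering task (task_type_id=5).
tasks = openml.tasks.list_tasks(task_type_id=5, output_format="dataframe")
print(set(tasks.did))  # {2, 10, 11, 14, 22, 24, 25, 35, 36, 37, 42, 91, 92}
```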

Using the apikey we define in class TestBase, I explicitly deleted one of the tasks linked to dataset_id = 22 and then ran the unit test. It passed as expected, since a task with those attributes no longer existed.

Is all of this expected behaviour?

@mfeurer
Collaborator

mfeurer commented Jul 18, 2019

Is all of this expected behaviour?

No. It turns out that _get_compatible_rand_dataset only takes classification and regression into account. Clustering tasks can be created for any dataset, so the tests should not fail. Could you please extend that function?
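
A sketch of how the extension could look, assuming the function keeps its current signature (the compatibility checks are simplified stand-ins for whatever the real implementation does):

```python
import numpy as np
import openml

TASK_TYPE_CLUSTERING = 5  # matches the task_type_id used above


def _get_compatible_rand_dataset(task_type_id):
    # List the active datasets on the (test) server as a dataframe.
    datasets = openml.datasets.list_datasets(
        status="active", output_format="dataframe")
    if task_type_id == TASK_TYPE_CLUSTERING:
        # Clustering needs no target attribute, so any dataset qualifies.
        compatible = datasets
    else:
        # Simplified stand-in: classification/regression need datasets
        # with a known number of classes.
        compatible = datasets.dropna(subset=["NumberOfClasses"])
    return int(np.random.choice(compatible.did.values))
```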

@Neeratyoy
Contributor Author

@mfeurer

Most test case issues seem to be fixed, barring one test which fails consistently.
Could you please explain what this does: https://github.com/openml/openml-python/blob/develop/tests/test_flows/test_flow_functions.py#L277

My local sklearn version is 0.20.2, while flow_id=20 returns code 181: unknown flow from the test server. Please advise on how I can solve this.
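
A quick way to see that response directly (a sketch; OpenMLServerException carries the server's error code):

```python
import openml
from openml.exceptions import OpenMLServerException

openml.config.server = "https://test.openml.org/api/v1/xml"

try:
    flow = openml.flows.get_flow(20)
    print(flow.name)
except OpenMLServerException as e:
    # On the test server this raises code 181: unknown flow,
    # i.e. flow_id=20 does not exist there.
    print(e.code, e.message)
```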

@Neeratyoy
Contributor Author

@mfeurer just another tiny thing.
I should add instructions on tracking the files being uploaded when adding a unit test.
Should I edit contributing.rst or contributing.md for that? Where would you like that information to be included?

@mfeurer
Collaborator

mfeurer commented Jul 19, 2019

Where would you like that information to be included?

Actually, I think I meant this file: https://raw.githubusercontent.com/openml/openml-python/develop/PULL_REQUEST_TEMPLATE.md

@Neeratyoy
Contributor Author

I didn't edit contributing.rst since that page links to contributing.md. I also edited PULL_REQUEST_TEMPLATE.md with brief instructions on this.

@mfeurer
Collaborator

mfeurer commented Jul 19, 2019

Okay, let me know when I should have a look at this again.

@Neeratyoy Neeratyoy requested a review from mfeurer July 19, 2019 14:57
@mfeurer mfeurer merged commit 56fcc00 into develop Jul 19, 2019
@mfeurer mfeurer deleted the fix_261 branch July 19, 2019 15:03