fork api #944
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fork api #944
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #944 +/- ##
===========================================
- Coverage 87.67% 87.60% -0.07%
===========================================
Files 37 37
Lines 4502 4510 +8
===========================================
+ Hits 3947 3951 +4
- Misses 555 559 +4
Continue to review full report at Codecov.
mfeurer
left a comment
The PR itself looks good to me, but I'm worried that the workflows of edit and fork are only documented in a PR in a different repository. I think there should be some more documentation on openml.org about such concepts.
############################################################################
# Fork dataset
# Used to create a copy of the dataset with a different owner
I think it would be good to give more details about what this means - one can read up in the PR implementing this for the server - but a glossary somewhere under doc.openml.org would be great.
Yes, I will add this to the documentation. In particular, the API itself is not documented, except in the examples in the Python API.
I think I would be more explicit, e.g. prefer "Used to create a copy of the dataset with you as the owner".
I agree that the general description of what forking a dataset entails and why you would do it should be hosted in the cross-platform documentation. However, I would still dedicate one or two sentences to recap it here as well (best to assume users are lazy).
Thanks for updating the description. What I'm wondering, though, is how the dataset is then finalized (i.e. critical fields should not be editable at all, so there must be a way of finalizing the dataset such that even the fork cannot be changed). Could you please extend the example with that?
The dataset is "finalized" when a task is created for it. After a task has been created, critical fields can no longer be edited, not even by the dataset owner. (But yes, it is a good idea to write this down in the documentation)
Thanks @PGijsbers for the explanation. @sahithyaravi1493 could you please add that to the docs?
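The finalization rule discussed above can be sketched as a toy model: once any task references a dataset, its critical fields become immutable, even for the owner. The class and field names below are illustrative only, not the OpenML server implementation.

```python
class Dataset:
    # Toy model of the finalization rule; not the actual OpenML server code.
    CRITICAL_FIELDS = {"default_target_attribute", "ignore_attribute", "row_id_attribute"}

    def __init__(self):
        self.fields = {"description": "", "default_target_attribute": "class"}
        self.has_task = False  # flips to True once a task is created for the dataset

    def edit(self, field, value):
        # Once any task references the dataset, critical fields are locked
        # for everyone, including the dataset owner.
        if self.has_task and field in self.CRITICAL_FIELDS:
            raise PermissionError(f"{field} cannot be edited after a task exists")
        self.fields[field] = value

ds = Dataset()
ds.edit("default_target_attribute", "label")  # allowed: no task exists yet
ds.has_task = True
ds.edit("description", "updated text")        # non-critical fields stay editable
try:
    ds.edit("default_target_attribute", "y")  # critical edit is now rejected
except PermissionError:
    pass
```

Non-critical fields such as the description remain editable after a task exists; only the critical fields are frozen.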
Squashed commits: [ec5c0d10] import changes
Squashed commits: [1822c99] improve docs (+1 squashed commits) Squashed commits: [ec5c0d10] import changes
PGijsbers
left a comment
Looks good, I left some remarks about the documentation.
Thanks! It looks good! I was perhaps a bit unclear with one part of the feedback, so I left a comment there.
@mfeurer I think it looks good to me. Can you check that your review comments are addressed to your satisfaction?
openml/datasets/functions.py
Outdated
default_target_attribute, ignore_attribute, row_id_attribute.
In addition to providing the dataset id of the dataset to edit (through data_id),
you must specify a value for at least one of the optional function arguments,
i.e. one value for a field to edit.
The i.e. appears to be indented too far
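The "at least one optional argument" rule from the quoted docstring could be enforced with a check along these lines. This is a sketch: the real `edit_dataset` in openml-python accepts more fields and sends the edits to the server rather than returning them.

```python
def edit_dataset(data_id, description=None, default_target_attribute=None,
                 ignore_attribute=None, row_id_attribute=None):
    # Collect only the optional editable fields the caller actually supplied.
    edits = {name: value for name, value in {
        "description": description,
        "default_target_attribute": default_target_attribute,
        "ignore_attribute": ignore_attribute,
        "row_id_attribute": row_id_attribute,
    }.items() if value is not None}
    if not edits:
        raise TypeError("specify a value for at least one field to edit")
    return edits  # the real function would submit these to the OpenML server

assert edit_dataset(42, description="new text") == {"description": "new text"}
```

Calling the function with only a `data_id` and no field to edit raises a `TypeError`, matching the docstring's requirement.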
Thanks for the reminder @PGijsbers. I just had a look and would like a further clarification in the example.
Reference Issue
What does this PR implement/fix? Explain your changes.
Refer to openml/OpenML#1058.
Fork API - clones the dataset row as-is, changing only the dataset ID and the uploader ID.
Uses the same ARFF file.
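The clone-the-row behavior described above can be modeled with a small in-memory sketch; `DatasetRow` and its fields are illustrative stand-ins, not the actual OpenML database schema.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class DatasetRow:
    # Illustrative stand-in for a row in the dataset table, not the real schema.
    dataset_id: int
    uploader_id: int
    arff_path: str  # a fork keeps pointing at the same ARFF file
    name: str

def fork(row: DatasetRow, new_dataset_id: int, new_uploader_id: int) -> DatasetRow:
    """Clone the row, changing only the dataset ID and the uploader ID."""
    return replace(row, dataset_id=new_dataset_id, uploader_id=new_uploader_id)

original = DatasetRow(dataset_id=1, uploader_id=42,
                      arff_path="/data/iris.arff", name="iris")
forked = fork(original, new_dataset_id=2, new_uploader_id=7)
assert forked.arff_path == original.arff_path  # same underlying ARFF file
assert forked.name == original.name
assert (forked.dataset_id, forked.uploader_id) == (2, 7)
```

Everything except the two IDs is copied verbatim, which matches "uses the same ARFF file" above.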
How should this PR be tested?
data_id = fork_dataset(ID)
Any other comments?