Skip to content

Conversation

@sahithyaravi
Copy link
Member

@sahithyaravi sahithyaravi commented Sep 3, 2020

Reference Issue

What does this PR implement/fix? Explain your changes.

Refer openml/OpenML#1058.
Fork API - Clone the row as such with change in dataset ID and uploader ID.
Uses the same ARFF file.

How should this PR be tested?

data_id = fork_dataset(ID)

Any other comments?

@sahithyaravi sahithyaravi marked this pull request as draft September 3, 2020 13:14
@codecov-commenter
Copy link

codecov-commenter commented Sep 8, 2020

Codecov Report

Merging #944 into develop will decrease coverage by 0.06%.
The diff coverage is 87.50%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #944      +/-   ##
===========================================
- Coverage    87.67%   87.60%   -0.07%     
===========================================
  Files           37       37              
  Lines         4502     4510       +8     
===========================================
+ Hits          3947     3951       +4     
- Misses         555      559       +4     
Impacted Files Coverage Δ
openml/datasets/__init__.py 100.00% <ø> (ø)
openml/datasets/functions.py 93.87% <87.50%> (-0.15%) ⬇️
openml/_api_calls.py 87.93% <0.00%> (-2.59%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3d85fa7...6767a0b. Read the comment docs.

Copy link
Collaborator

@mfeurer mfeurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR itself looks good to me, but I'm worried that the workflows of edit and fork are only documented in a PR in a different repository. I think there should be some more documentation on openml.org about such concepts.


############################################################################
# Fork dataset
# Used to create a copy of the dataset with a different owner
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to give more details what this means - one can read up in the PR implementing this for the server - but maybe a glossary somewhere under doc.openml.org would be great.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will add this to the documentation. Especially the API itself is not documented, except in the examples in python API.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would be more explicit, e.g. prefer "Used to create a copy of the dataset with a different you as the owner".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the general description of what forking a dataset entails and why you would do it should be hosted in the cross-platform documentation. However, I would still dedicate one or two sentences to recap it here as well (best to assume users are lazy).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating the description. What I'm wondering though is, how is the dataset then finalized (i.e. critical fields should not be editable at all, so there must be a way of finalizing the dataset such that even the fork cannot be changed)? Could you please extend the example by that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dataset is "finalized" when a task is created for it. After a task has been created, critical fields can no longer be edited, not even by the dataset owner. (But yes, it is a good idea to write this down in the documentation)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @PGijsbers for the explanation. @sahithyaravi1493 could you please add that to the docs?

@sahithyaravi sahithyaravi marked this pull request as ready for review September 21, 2020 11:18
Copy link
Collaborator

@PGijsbers PGijsbers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I left some remarks about the documentation.


############################################################################
# Fork dataset
# Used to create a copy of the dataset with a different owner
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would be more explicit, e.g. prefer "Used to create a copy of the dataset with a different you as the owner".


############################################################################
# Fork dataset
# Used to create a copy of the dataset with a different owner
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the general description of what forking a dataset entails and why you would do it should be hosted in the cross-platform documentation. However, I would still dedicate one or two sentences to recap it here as well (best to assume users are lazy).

@PGijsbers
Copy link
Collaborator

Thanks! It looks good! I was perhaps a bit unclear with one part of the feedback, so I left a comment there.

@PGijsbers
Copy link
Collaborator

@mfeurer I think it looks good to me. Can you check that your review comments are addressed to your satisfaction?

default_target_attribute, ignore_attribute, row_id_attribute.
In addition to providing the dataset id of the dataset to edit (through data_id),
you must specify a value for at least one of the optional function arguments,
i.e. one value for a field to edit.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The i.e. appears to be indented too far

@mfeurer
Copy link
Collaborator

mfeurer commented Oct 7, 2020

Thanks for the reminder @PGijsbers. I just had a look and would like to have a further clarification in the example.

@codecov-io
Copy link

Codecov Report

Merging #944 into develop will decrease coverage by 0.06%.
The diff coverage is 87.50%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #944      +/-   ##
===========================================
- Coverage    87.67%   87.60%   -0.07%     
===========================================
  Files           37       37              
  Lines         4502     4510       +8     
===========================================
+ Hits          3947     3951       +4     
- Misses         555      559       +4     
Impacted Files Coverage Δ
openml/datasets/__init__.py 100.00% <ø> (ø)
openml/datasets/functions.py 93.87% <87.50%> (-0.15%) ⬇️
openml/_api_calls.py 87.93% <0.00%> (-2.59%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3d85fa7...b624e07. Read the comment docs.

@PGijsbers PGijsbers mentioned this pull request Oct 23, 2020
@PGijsbers PGijsbers merged commit 9bc84a9 into develop Oct 23, 2020
@PGijsbers PGijsbers deleted the new_fork_api branch October 23, 2020 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants