fork api #944

sahithyaravi · 2020-09-03T13:08:34Z

Reference Issue

What does this PR implement/fix? Explain your changes.

Refers to openml/OpenML#1058.
Fork API: clones the dataset's row as-is, changing only the dataset ID and the uploader ID.
The fork uses the same ARFF file as the original dataset.

How should this PR be tested?

data_id = fork_dataset(ID)
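
A slightly fuller way to exercise the new function might look like the sketch below. It assumes `openml.datasets.fork_dataset` takes a dataset ID and returns the ID of the new copy, as the one-liner above suggests. The helper names `fork_and_check` and `fork_on_server` are hypothetical, and the real call is assumed to need a configured API key.

```python
from typing import Callable


def fork_and_check(data_id: int, fork_fn: Callable[[int], int]) -> int:
    """Fork a dataset via `fork_fn` and check that a new ID came back."""
    new_id = fork_fn(data_id)
    if new_id == data_id:
        # A fork is expected to be a copy with its own dataset ID.
        raise ValueError("fork returned the original dataset ID")
    return new_id


def fork_on_server(data_id: int) -> int:
    """Run the check against a real server (assumes an API key is configured)."""
    # Imported lazily so fork_and_check can be tested without openml installed.
    import openml

    return fork_and_check(data_id, openml.datasets.fork_dataset)
```

Passing the fork function as an argument keeps the check testable offline with a stand-in callable, while `fork_on_server` shows where the actual API call would go.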

Any other comments?

codecov-commenter · 2020-09-08T11:29:50Z

Codecov Report

Merging #944 into develop will decrease coverage by 0.06%.
The diff coverage is 87.50%.

@@             Coverage Diff             @@
##           develop     #944      +/-   ##
===========================================
- Coverage    87.67%   87.60%   -0.07%     
===========================================
  Files           37       37              
  Lines         4502     4510       +8     
===========================================
+ Hits          3947     3951       +4     
- Misses         555      559       +4

| Impacted Files | Coverage Δ | |
|---|---|---|
| openml/datasets/__init__.py | `100.00% <ø> (ø)` | |
| openml/datasets/functions.py | `93.87% <87.50%> (-0.15%)` | ⬇️ |
| openml/_api_calls.py | `87.93% <0.00%> (-2.59%)` | ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3d85fa7...6767a0b.

mfeurer

The PR itself looks good to me, but I'm worried that the workflows of edit and fork are only documented in a PR in a different repository. I think there should be some more documentation on openml.org about such concepts.

mfeurer · 2020-09-17T07:53:42Z

examples/30_extended/datasets_tutorial.py

+
+############################################################################
+# Fork dataset
+# Used to create a copy of the dataset with a different owner


I think it would be good to give more details on what this means. One can read up on it in the PR implementing this for the server, but a glossary somewhere under doc.openml.org would be great.

Yes, I will add this to the documentation. In particular, the API itself is not documented, except in the examples of the Python API.

I think I would be more explicit, e.g. prefer "Used to create a copy of the dataset with ~~a different~~ you as the owner".

I agree that the general description of what forking a dataset entails and why you would do it should be hosted in the cross-platform documentation. However, I would still dedicate one or two sentences to recap it here as well (best to assume users are lazy).

Thanks for updating the description. What I'm wondering, though, is how the dataset is then finalized. Critical fields should not be editable at all, so there must be a way of finalizing the dataset such that even the fork cannot be changed. Could you please extend the example to cover that?

The dataset is "finalized" when a task is created for it. After a task has been created, critical fields can no longer be edited, not even by the dataset owner. (But yes, it is a good idea to write this down in the documentation)
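
The finalization rule described above can be sketched as a small predicate. This is an illustrative model only, not server or openml-python code. The name `can_edit_field` is hypothetical, the set of critical fields is assumed from the `edit_dataset` docstring touched in this PR, and the sketch further assumes that only the owner may edit critical fields before a task exists.

```python
# Illustrative model only; all names here are hypothetical.
CRITICAL_FIELDS = frozenset(
    {"default_target_attribute", "ignore_attribute", "row_id_attribute"}
)


def can_edit_field(field: str, is_owner: bool, has_task: bool) -> bool:
    """Return True if `field` may be edited under the rule sketched above."""
    if field not in CRITICAL_FIELDS:
        # Assumption: non-critical fields are not restricted by finalization.
        return True
    # Critical fields: owner only, and only until a task finalizes the dataset.
    return is_owner and not has_task
```

Under this model, forking makes you the owner of a copy that has no tasks yet, so the fork's critical fields stay editable until a task is created on it.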

Thanks @PGijsbers for the explanation. @sahithyaravi1493 could you please add that to the docs?

PGijsbers

Looks good, I left some remarks about the documentation.

PGijsbers · 2020-09-21T14:23:44Z

examples/30_extended/datasets_tutorial.py

I think I would be more explicit, e.g. prefer "Used to create a copy of the dataset with ~~a different~~ you as the owner".

PGijsbers · 2020-09-21T14:26:11Z

examples/30_extended/datasets_tutorial.py

I agree that the general description of what forking a dataset entails and why you would do it should be hosted in the cross-platform documentation. However, I would still dedicate one or two sentences to recap it here as well (best to assume users are lazy).

PGijsbers · 2020-09-25T09:20:27Z

Thanks! It looks good! I was perhaps a bit unclear with one part of the feedback, so I left a comment there.

PGijsbers · 2020-10-05T09:23:55Z

@mfeurer I think it looks good to me. Can you check that your review comments are addressed to your satisfaction?

mfeurer · 2020-10-07T15:25:42Z

openml/datasets/functions.py

-       default_target_attribute, ignore_attribute, row_id_attribute.
+      In addition to providing the dataset id of the dataset to edit (through data_id),
+      you must specify a value for at least one of the optional function arguments,
+       i.e. one value for a field to edit.


The i.e. appears to be indented too far

mfeurer · 2020-10-07T15:26:54Z

Thanks for the reminder @PGijsbers. I just had a look and would like to have a further clarification in the example.

codecov-io · 2020-10-22T08:14:35Z

Codecov Report

Merging #944 into develop will decrease coverage by 0.06%.
The diff coverage is 87.50%.

Last update 3d85fa7...b624e07.

fork api

a9cbedf

sahithyaravi marked this pull request as draft September 3, 2020 13:14

sahithyaravi requested review from PGijsbers and mfeurer September 15, 2020 15:23

mfeurer reviewed Sep 17, 2020

sahithyaravi added 2 commits September 21, 2020 09:29

Merge branch 'develop' into new_fork_api

33822b3

improve docs (+1 squashed commits)

1822c99

Squashed commits: [ec5c0d10] import changes

sahithyaravi marked this pull request as ready for review September 21, 2020 11:18

sahithyaravi added 2 commits September 21, 2020 14:06

minor change (+1 squashed commits)

ce94f93

Squashed commits: [1822c99] improve docs (+1 squashed commits) Squashed commits: [ec5c0d10] import changes

Merge branch 'new_fork_api' of https://github.com/openml/openml-python …

f0fcfbf

…into new_fork_api

PGijsbers requested changes Sep 21, 2020

docs update

6767a0b

sahithyaravi requested a review from PGijsbers September 25, 2020 08:02

PGijsbers reviewed Sep 25, 2020

openml/datasets/functions.py (outdated)

PGijsbers approved these changes Oct 5, 2020

mfeurer reviewed Oct 7, 2020

clarify example

b624e07

mfeurer approved these changes Oct 22, 2020

PGijsbers approved these changes Oct 23, 2020

PGijsbers mentioned this pull request Oct 23, 2020

Dataframe run on task #777

Merged

Merge branch 'develop' into new_fork_api

4962697

PGijsbers reviewed Oct 23, 2020

doc/progress.rst (outdated)

Update doc/progress.rst

e8e3205

PGijsbers mentioned this pull request Oct 23, 2020

Prepare release of 0.11.0 #966

Merged

Fix whitespaces for docstring

15864f4

fix error

1aa2660

PGijsbers reviewed Oct 23, 2020

tests/test_datasets/test_dataset_functions.py (outdated)

Use id 999999 for unknown dataset

eda3fd8

PGijsbers merged commit 9bc84a9 into develop Oct 23, 2020

PGijsbers deleted the new_fork_api branch October 23, 2020 14:57
