Skip to content

Conversation

@mfeurer
Copy link
Collaborator

@mfeurer mfeurer commented Apr 10, 2018

Fixes/Implements #337, #331, #292, #117 and #113.

@mfeurer mfeurer changed the base branch from master to develop April 10, 2018 06:42
Copy link
Collaborator Author

@mfeurer mfeurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot!

Could you please remove the function the publish function, too? I think that this function currently does not provide enough information to the user on how to use it and is therefore not really helpful. Also, besides for re-uploading of a fixed dataset, I don't see a reason to keep this. Maybe we should add a helper function to re-upload a fixed dataset?



def upload_dataset(dataset_description, file):
"""Upload a dataset to OpenMl.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use an uppercase L in OpenML.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type, fixed on the next commit.

dataset = OpenMLDataset(
name="anneal", version=1, description="test",
format="ARFF", licence="public", default_target_attribute="class", data_file=file_path)
"anneal", "test", "ARFF",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to do these changes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I changed name, description and format to positional arguments as they were mandatory when uploading a dataset to the server. To reflect the changes on the dataset class, I made the changes above to the unit tests.

dataset = OpenMLDataset(
name="UploadTestWithURL", version=1, description="test",
format="ARFF",
"UploadTestWithURL", "test", "ARFF",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to do these changes?

xml_dataset += "<oml:{0}>{1}</oml:{0}>\n".format(prop, content)
xml_dataset += "</oml:data_set_description>"
return xml_dataset
#if isinstance(content, (list,set)):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please remove those comments?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

----------
dataset_description : OpenMLDataset
OpenMLDataset which contains the description of the dataset.
file : str
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if this was an arff dictionary as expected by liac-arff.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the function as we discussed regarding the arff file.

Parameters
----------
dataset_description : OpenMLDataset
OpenMLDataset which contains the description of the dataset.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is a great idea, and I think besides for re-uploading a dataset this will be handy very often. Could you please change it so that the user needs to pass all relevant information separately?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not agree on this one, I think we should pass an OpenMLDataset as it encapsulates all the necessary information and the user can decide what he wants to fill, aside from the mandatory fields. I also did not increase the number of function arguments, as some can be retrieved from the OpenMLDataset object.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Probably we should then move the logic of this function to publish and actually add a function called create_dataset. The user would then pass all arguments necessary to create the dataset object locally and could then upload it.

@ArlindKadra
Copy link
Member

@mfeurer

Copy link
Collaborator Author

@mfeurer mfeurer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Could you please make the requested changes? As I cannot approve this as I created the PR in the first place, please also approve and squash and merge it.

License of the data.
attributes: list
A list of tuples. Each tuple consists of the attribute name and type.
data : numpy.matrix
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct, it should be array-like or sparse matrix, shape=(n_samples, n_features)

attributes: list
A list of tuples. Each tuple consists of the attribute name and type.
data : numpy.matrix
A matrix that contains both the attributes and targets.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the term array instead of matrix because arrays are more common in python.

Returns
-------
class:`openml.OpenMLDataset
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a formatting error.

],
"source": [
"%load_ext autoreload\n",
"%autoreload 2\n",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please remove the autoreload extension and run the notebook again to remove the text?

@ArlindKadra ArlindKadra merged commit 5b1eb29 into develop Jun 3, 2018
@ArlindKadra ArlindKadra deleted the fix/dataset_upload branch June 3, 2018 23:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants