Description
I often hear users complain that they don't know what to do when create_dataset complains about string constraints. Typically this is because people used a space (' ') in the name (I'm not actually sure why we don't allow that) or a special character in the description.
Could we maybe return a more informative general error message, like 'Character ' ' is not allowed in field x'?
Alternatively, the Python API could automatically replace spaces in the dataset name with underscores, and replace special characters in the description with '?' or ' '.
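To make the two options concrete, here is a rough sketch of what either could look like. It reuses the character class from the name check that appears in the traceback in this issue; the function names (`find_invalid_characters`, `check_name`, `sanitize_name`) are hypothetical and not part of openml-python, and replacing every disallowed character in the name with an underscore is just one possible policy.

```python
import re

# Character class copied from the name check shown in the traceback
# (^[a-zA-Z0-9_\-\.\(\),]+$). Whether other fields use the same rule
# is an assumption.
ALLOWED_NAME_CHARS = re.compile(r"[a-zA-Z0-9_\-\.\(\),]")


def find_invalid_characters(value):
    """Return the distinct disallowed characters, in order of first appearance."""
    invalid = []
    for char in value:
        if not ALLOWED_NAME_CHARS.fullmatch(char) and char not in invalid:
            invalid.append(char)
    return invalid


def check_name(name, field="name"):
    """Hypothetical validator: raise a ValueError naming each offending character."""
    if not name:
        raise ValueError("Field '{}' must not be empty".format(field))
    invalid = find_invalid_characters(name)
    if invalid:
        listed = ", ".join(repr(char) for char in invalid)
        raise ValueError(
            "Character(s) {} not allowed in field '{}': {!r}".format(listed, field, name)
        )


def sanitize_name(name):
    """Hypothetical sanitizer: replace each disallowed character with '_'."""
    return "".join(
        char if ALLOWED_NAME_CHARS.fullmatch(char) else "_" for char in name
    )
```

With this sketch, `check_name("My cool dataset")` would raise an error that points at the space and the `name` field, while `sanitize_name("My cool dataset")` would give `My_cool_dataset`. Silent replacement changes what the user asked for, so a warning on sanitizing might be preferable to doing it quietly.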
Steps/Code to Reproduce
Example:
import openml

# df is assumed to be an existing pandas DataFrame with a 'label' column.
my_dataset = openml.datasets.create_dataset(
    name="My cool dataset",  # the spaces in the name trigger the error
    description="foo",
    creator="bar",
    contributor=None,
    collection_date='01-01-2011',
    language='English',
    licence=None,
    default_target_attribute='label',
    row_id_attribute=None,
    ignore_attribute=None,
    citation="foo",
    attributes='auto',
    data=df,
    version_label='1.0',
)
Expected Results
A more informative general error message, like 'Character ' ' is not allowed in field x'.
Or: replace the 'bad' characters automatically.
Actual Results
A hard-to-read stack trace:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-45-6289268889ab> in <module>
13 attributes='auto',
14 data=df,
---> 15 version_label='1.0',
16 )
~/anaconda3/lib/python3.7/site-packages/openml/datasets/functions.py in create_dataset(name, description, creator, contributor, collection_date, language, licence, attributes, data, default_target_attribute, ignore_attribute, citation, row_id_attribute, original_data_url, paper_url, update_comment, version_label)
774 paper_url=paper_url,
775 update_comment=update_comment,
--> 776 dataset=arff_dataset,
777 )
778
~/anaconda3/lib/python3.7/site-packages/openml/datasets/dataset.py in __init__(self, name, description, format, data_format, dataset_id, version, creator, contributor, collection_date, upload_date, language, licence, url, default_target_attribute, row_id_attribute, ignore_attribute, version_label, citation, tag, visibility, original_data_url, paper_url, update_comment, md5_checksum, data_file, features, qualities, dataset)
121 if not re.match("^[a-zA-Z0-9_\\-\\.\\(\\),]+$", name):
122 # regex given by server in error message
--> 123 raise ValueError("Invalid symbols in name: {}".format(name))
124 # TODO add function to check if the name is casual_string128
125 # Attributes received by querying the RESTful API
ValueError: Invalid symbols in name: My cool dataset
Versions
Darwin-19.4.0-x86_64-i386-64bit
Python 3.7.3 (default, Mar 27 2019, 16:54:48)
[Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.18.4
SciPy 1.4.1
Scikit-Learn 0.23.1
OpenML 0.11.0dev