Better error messages for string constraints #920

@joaquinvanschoren

Description

I often hear users complain that they don't know what to do when create_dataset complains about string constraints. Typically this is because people used a space (' ') in the name (I'm not actually sure why we don't allow that) or a special character in the description.

Could we maybe return a more informative general error message, like 'Character ' ' is not allowed in field x'?

Alternatively, let the python API replace spaces in the dataset name with underscores automatically, and replace special characters with '?' or ' '.
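The automatic replacement suggested above could look roughly like the sketch below. `sanitize_dataset_name` is a hypothetical helper, not an existing openml-python function, and it only covers the whitespace case for the name; which characters are valid in the description is not stated in this issue, so that part is left out.

```python
import re


def sanitize_dataset_name(name):
    """Hypothetical helper: turn each run of whitespace into one underscore."""
    # Strip leading/trailing whitespace first so the result does not start
    # or end with an underscore that the user never typed.
    return re.sub(r"\s+", "_", name.strip())


print(sanitize_dataset_name("My cool dataset"))
```

One trade-off with this approach is that the uploaded name silently differs from what the user typed, so it may be worth emitting a warning whenever the name is changed.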

Steps/Code to Reproduce

Example:

import openml

# `df` is assumed to be a pandas DataFrame that contains a 'label' column.
my_dataset = openml.datasets.create_dataset(
    name="My cool dataset",
    description="foo",
    creator="bar",
    contributor=None,
    collection_date='01-01-2011',
    language='English',
    licence=None,
    default_target_attribute='label',
    row_id_attribute=None,
    ignore_attribute=None,
    citation="foo",
    attributes='auto',
    data=df,
    version_label='1.0',
)

Expected Results

A more informative general error message, like 'Character ' ' is not allowed in field x'.
Or: replace the 'bad' characters automatically

Actual Results

A hard-to-read stack trace:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-45-6289268889ab> in <module>
     13     attributes='auto',
     14     data=df,
---> 15     version_label='1.0',
     16 )

~/anaconda3/lib/python3.7/site-packages/openml/datasets/functions.py in create_dataset(name, description, creator, contributor, collection_date, language, licence, attributes, data, default_target_attribute, ignore_attribute, citation, row_id_attribute, original_data_url, paper_url, update_comment, version_label)
    774         paper_url=paper_url,
    775         update_comment=update_comment,
--> 776         dataset=arff_dataset,
    777     )
    778 

~/anaconda3/lib/python3.7/site-packages/openml/datasets/dataset.py in __init__(self, name, description, format, data_format, dataset_id, version, creator, contributor, collection_date, upload_date, language, licence, url, default_target_attribute, row_id_attribute, ignore_attribute, version_label, citation, tag, visibility, original_data_url, paper_url, update_comment, md5_checksum, data_file, features, qualities, dataset)
    121             if not re.match("^[a-zA-Z0-9_\\-\\.\\(\\),]+$", name):
    122                 # regex given by server in error message
--> 123                 raise ValueError("Invalid symbols in name: {}".format(name))
    124         # TODO add function to check if the name is casual_string128
    125         # Attributes received by querying the RESTful API

ValueError: Invalid symbols in name: My cool dataset
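As an illustration of the kind of message requested here, the sketch below reuses the character class from the check at dataset.py line 121 in the traceback and names each offending character. `check_string_field` is a hypothetical helper, not part of openml-python, and applying the same character class to fields other than the name is an assumption.

```python
import re

# Same character class as the name check shown in the traceback above.
_ALLOWED_CHAR = re.compile(r"[a-zA-Z0-9_\-\.\(\),]")


def check_string_field(value, field):
    """Hypothetical helper: raise a ValueError that names each bad character."""
    # The original pattern uses '+', so an empty string is rejected as well.
    if not value:
        raise ValueError("Field '{}' must not be empty".format(field))
    # Collect each disallowed character once, in order of first appearance.
    bad = []
    for ch in value:
        if not _ALLOWED_CHAR.fullmatch(ch) and ch not in bad:
            bad.append(ch)
    if bad:
        listed = ", ".join(repr(ch) for ch in bad)
        raise ValueError(
            "Character(s) {} not allowed in field '{}': {!r}. "
            "Allowed: letters, digits and _ - . ( ) ,".format(listed, field, value)
        )


try:
    check_string_field("My cool dataset", "name")
except ValueError as err:
    print(err)
```

Listing the allowed characters in the message means users do not have to read the regular expression to work out what to change.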

Versions

Darwin-19.4.0-x86_64-i386-64bit
Python 3.7.3 (default, Mar 27 2019, 16:54:48)
[Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.18.4
SciPy 1.4.1
Scikit-Learn 0.23.1
OpenML 0.11.0dev
