Verify ignore_attributes and row_id_attribute on create_dataset #964

@PGijsbers

Description

From #960.

When create_dataset is called, the user can specify ignore_attribute and row_id_attribute. These name attributes in the dataset which should be ignored when building models, because they represent e.g. an id field or a feature with label leakage. However, there is currently no check that the attributes listed are actually present in the data.

Using the dataset upload tutorial, if we replace row_id_attribute=None or ignore_attribute=None in the create_dataset calls with row_id_attribute="not exist" or ignore_attribute="not exist", no error is raised, even though the dataset has no attribute named "not exist".

We expect either scenario to raise a ValueError clearly stating which id or ignored attribute is not present in the dataset.
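
A minimal sketch of what such a check could look like. The helper name check_attributes_exist and its signature are hypothetical and not part of openml-python; it assumes the column names are available as an iterable of strings and that ignore_attribute may be a single name or a list of names.

```python
from typing import Iterable, List, Optional, Union


def check_attributes_exist(
    available: Iterable[str],
    row_id_attribute: Optional[str] = None,
    ignore_attribute: Union[str, List[str], None] = None,
) -> None:
    """Raise ValueError if a requested attribute is not a column of the data.

    Hypothetical helper for illustration; not an existing openml-python function.
    """
    available = set(available)

    # The row id, if given, must name an existing column.
    if row_id_attribute is not None and row_id_attribute not in available:
        raise ValueError(
            f"row_id_attribute {row_id_attribute!r} is not present in the dataset."
        )

    if ignore_attribute is None:
        return

    # Accept either a single name or a list of names (an assumption).
    if isinstance(ignore_attribute, str):
        names = [ignore_attribute]
    else:
        names = list(ignore_attribute)

    # Report every missing name at once so the user can fix them together.
    missing = [name for name in names if name not in available]
    if missing:
        raise ValueError(
            f"ignore_attribute contains names not present in the dataset: {missing}"
        )
```

Something like this could be called inside create_dataset once the column names are known, so that row_id_attribute="not exist" fails immediately with a message naming the offending attribute.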

Metadata

Labels

Good First Issue: Issues suitable for people new to contributing to openml-python!
