Verify `ignore_attributes` and `row_id_attribute` on `create_dataset`

From #960.

When [`create_dataset`](https://github.com/openml/openml-python/blob/88b7cc0292bb5a7b86a9f45cf29d1733ee3cc300/openml/datasets/functions.py#L580) is called, the user can specify `ignore_attribute` and `row_id_attribute`. These name attributes in the dataset that should be ignored when building models, e.g. because they are an id field or a feature with label leakage. However, there is currently no check that the listed attributes are actually present in the data.

To reproduce with the [dataset upload tutorial](https://openml.github.io/openml-python/master/examples/30_extended/create_upload_tutorial.html#sphx-glr-examples-30-extended-create-upload-tutorial-py): change `row_id_attribute=None` or `ignore_attribute=None` in the `create_dataset` calls to `row_id_attribute="not exist"` or `ignore_attribute="not exist"`. No error is raised, even though the dataset has no attribute named "not exist".

We expect either scenario to raise a `ValueError` clearly stating which id or ignored attribute is not present in the dataset.
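
A minimal sketch of what such a check could look like, to make the expected behaviour concrete. This is not the openml-python implementation: `check_attributes_exist` and its `column_names` argument are hypothetical names, and it assumes `ignore_attribute` may be a single string or a list of strings.

```python
from typing import Iterable, List, Optional, Union


def check_attributes_exist(
    column_names: Iterable[str],
    row_id_attribute: Optional[str] = None,
    ignore_attribute: Union[str, List[str], None] = None,
) -> None:
    """Hypothetical helper: raise ValueError for names missing from the data."""
    known = set(column_names)

    # The row id is a single attribute name (or None when not set).
    if row_id_attribute is not None and row_id_attribute not in known:
        raise ValueError(
            f"row_id_attribute {row_id_attribute!r} is not an attribute of the dataset."
        )

    # Normalise ignore_attribute to a list (assumption: str or list of str).
    if ignore_attribute is None:
        ignored = []
    elif isinstance(ignore_attribute, str):
        ignored = [ignore_attribute]
    else:
        ignored = list(ignore_attribute)

    # Report every missing name at once so the user can fix them in one go.
    missing = [name for name in ignored if name not in known]
    if missing:
        raise ValueError(
            f"ignore_attribute contains names that are not attributes of the dataset: {missing}"
        )
```

In `create_dataset`, something like this would run once the attribute names are known, before the dataset object is built.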

Verify `ignore_attributes` and `row_id_attribute` on `create_dataset` #964