Description
For a supervised classification task, the task.class_labels is determined automatically here:
openml-python/openml/datasets/dataset.py
Lines 911 to 913 in 326bf0b
    for feature in self.features.values():
        if (feature.name == target_name) and (feature.data_type == "nominal"):
            return feature.nominal_values
Sometimes people are not very meticulous when creating datasets, and the feature type may be listed as string instead of nominal, which means that task.class_labels will be None.
A simple workaround would be to add a case for feature.data_type == 'string' that fetches the unique values from the column. It might be worth encouraging users to fix the feature type of the dataset, but unfortunately that is only possible by 1) being the dataset owner or 2) creating an entirely new version of the dataset, which in turn requires a new task.
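A rough sketch of that fallback, to make the idea concrete. This is not the actual openml-python code: `Feature` is a hypothetical stand-in for the library's feature object, and the `target_column` parameter (the raw values of the target column) is an assumption about how the data would be passed in.

```python
import warnings
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class Feature:
    """Hypothetical stand-in for an OpenML feature description."""

    name: str
    data_type: str
    nominal_values: Optional[List[str]] = None


def retrieve_class_labels(
    features: Dict[int, Feature],
    target_name: str,
    target_column: Optional[Sequence[Optional[str]]] = None,
) -> Optional[List[str]]:
    """Return the class labels of the target feature, or None if unavailable."""
    for feature in features.values():
        if feature.name != target_name:
            continue
        if feature.data_type == "nominal":
            # Existing behaviour: labels come from the feature metadata.
            return feature.nominal_values
        if feature.data_type == "string" and target_column is not None:
            # Proposed fallback (assumption): derive labels from the data itself.
            warnings.warn(
                f"Target '{target_name}' has type 'string' instead of 'nominal'; "
                "class labels were inferred from the unique column values."
            )
            return sorted({v for v in target_column if v is not None})
    return None
```

Note that labels inferred this way are sorted alphabetically, so their order may differ from what a properly declared nominal feature would list.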
We should consider giving a warning, maybe, but honestly this probably should be fixed on task creation (i.e., say that the target is invalid for a classification task if the feature type is string and not nominal).
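For the task-creation side, a check along these lines could reject such targets up front. This is only an illustration: `validate_classification_target` is a hypothetical helper, not an existing openml-python or server function, and where exactly it would run is left open.

```python
def validate_classification_target(target_name: str, data_type: str) -> None:
    """Raise ValueError if the target cannot be used for classification.

    Hypothetical helper: only nominal targets are accepted.
    """
    if data_type != "nominal":
        raise ValueError(
            f"Target '{target_name}' has feature type '{data_type}'; "
            "a classification task requires a nominal target."
        )
```

Calling `validate_classification_target("class", "string")` would raise, while a nominal target passes silently.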