check_array does not raise error when input contains something other than numbers or strings #11401
Description
Now that imputers allow inputs with object dtype, e.g. strings or pandas categoricals, it seems that either check_array should be enhanced or that some common tests should be updated.
There is a common test, check_dtype_object, that checks the estimators on an input X that contains numbers, with X[0,0] = {'foo':'bar'}. When numeric inputs are expected, check_array is called with dtype='numeric' and an error is raised as expected.
However, when it is called with dtype=None or dtype=object, no error is raised. See the code below:
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([{'foo': 'bar'}, "a", "b", "c"], dtype=object).reshape(-1, 1)
X
>>> array([[{'foo': 'bar'}],
           ['a'],
           ['b'],
           ['c']], dtype=object)
imputer = SimpleImputer(strategy='constant', missing_values='a')
imputer.fit_transform(X)
>>> array([[{'foo': 'bar'}],
           ['missing_value'],
           ['b'],
           ['c']], dtype=object)
No error is raised and the estimator works fine. Shouldn't we raise an error in that case?
This currently passes the test because when imputing on inputs with object dtypes, we can't set dtype='numeric' in check_array. I think the error should be raised even with dtype=object or dtype=None.
We could check that in the fit function of the estimators that accept non-numeric inputs, but I think it is the role of check_array to do that. What's your opinion on this?
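To make the proposal concrete, here is a rough sketch of the kind of element-wise check I have in mind. The helper name check_object_array_elements is hypothetical and is not part of scikit-learn. Raising TypeError, and treating bytes as a string type, are assumptions made for illustration only.

```python
import numbers

import numpy as np


def check_object_array_elements(X):
    """Hypothetical helper (not a scikit-learn API).

    Reject object-dtype arrays that hold anything other than
    numbers or strings. Non-object arrays are returned untouched.
    """
    X = np.asarray(X)
    if X.dtype != object:
        # Numeric and fixed-width string dtypes cannot hold arbitrary objects.
        return X

    # Assumption: bytes count as strings; NaN passes because it is a float.
    allowed = (numbers.Number, str, bytes)
    for index, value in np.ndenumerate(X):
        if not isinstance(value, allowed):
            raise TypeError(
                f"Unsupported element of type {type(value).__name__!r} "
                f"at index {index}; expected a number or a string."
            )
    return X


if __name__ == "__main__":
    X = np.array([{'foo': 'bar'}, "a", "b", "c"], dtype=object).reshape(-1, 1)
    try:
        check_object_array_elements(X)
    except TypeError as exc:
        print(exc)
```

A check like this visits every element in a Python loop, so it costs O(n) on object arrays. That is probably acceptable, since object arrays are already slow, but it is worth weighing before putting it in check_array itself.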