As numpy does not have a dtype for variable-length strings, it is very common to use dtype=object for arrays of strings so as to not waste memory: the default fixed-width string dtype of numpy otherwise allocates zero-padded memory for every element, sized to the longest string.
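For illustration, a small sketch of the difference (the exact dtype string and itemsizes depend on the numpy build and Python version):

```python
import numpy as np

# Fixed-width dtype: every element is padded to the longest string,
# so one long label inflates the storage of all the others.
y_fixed = np.asarray(['cat', 'dog', 'elephant'])
print(y_fixed.dtype)      # e.g. <U8: 8 characters reserved per element
print(y_fixed.itemsize)   # e.g. 32 bytes per element

# Object dtype: the array only stores pointers to the Python strings.
y_obj = np.asarray(['cat', 'dog', 'elephant'], dtype=object)
print(y_obj.dtype)        # object
print(y_obj.itemsize)     # pointer size, e.g. 8 bytes on a 64-bit build
```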
However, in sklearn 0.14 the sklearn.utils.multiclass.type_of_target function explicitly rejects such arrays:
```python
if y.ndim > 2 or y.dtype == object:
    return 'unknown'
```
As a consequence it is possible to have y = ['cat', 'dog', 'fish'], but no longer y = np.asarray(['cat', 'dog', 'fish'], dtype=object) (it used to work in 0.13).
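A minimal reproduction against 0.14, with expected outputs in comments:

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

y_list = ['cat', 'dog', 'fish']
print(type_of_target(y_list))    # 'multiclass'

y_obj = np.asarray(['cat', 'dog', 'fish'], dtype=object)
print(type_of_target(y_obj))     # 'unknown' in 0.14, worked in 0.13
```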
Note that np.array(list_of_strings, dtype=object) is a necessary idiom (instead of just using list_of_strings directly) to do cross-validation or other fancy indexing operations, as shown below.
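For example (the index array here stands in for what a CV splitter such as KFold would produce):

```python
import numpy as np

y_list = ['cat', 'dog', 'fish', 'cat']
train_idx = np.array([0, 2, 3])   # e.g. train indices from a CV splitter

# A plain list does not support fancy indexing:
# y_list[train_idx]  raises TypeError

# An object array keeps the variable-length strings and supports it:
y = np.asarray(y_list, dtype=object)
print(y[train_idx])               # ['cat' 'fish' 'cat']
```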
I think we should accept y with dtype=object if and only if all(isinstance(y_i, (six.text_type, six.binary_type)) for y_i in y.ravel()).
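A sketch of what that check could look like (the helper name and placement are hypothetical, not an existing sklearn API):

```python
import numpy as np
import six  # vendored as sklearn.externals.six inside sklearn at the time

def is_string_object_array(y):
    # Hypothetical helper: an object array is acceptable only when
    # every element is a text or byte string.
    y = np.asarray(y)
    return (y.dtype == object and
            all(isinstance(y_i, (six.text_type, six.binary_type))
                for y_i in y.ravel()))
```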
This regression was found in the sklearn_pandas project: scikit-learn-contrib/sklearn-pandas#2
WDYT @arjoly?