fix check_array dtype check for pandas series#12625
fix check_array dtype check for pandas series#12625jnothman merged 7 commits intoscikit-learn:masterfrom
Conversation
sklearn/utils/validation.py
Outdated
| dtypes_orig = None | ||
| if hasattr(array, "dtypes") and hasattr(array, "__array__"): | ||
| dtypes_orig = np.array(array.dtypes) | ||
| dtypes_orig = np.array(array.dtypes, ndmin=1) |
There was a problem hiding this comment.
I am not sure this is the "correct" fix. It fixes the problem, but in essence, a Series should not take this path as it can never have multiple dtypes in it. So I would rather ensure that a Series does not pass this if check
There was a problem hiding this comment.
@qinhanmin2014 had one in #12622 (comment)
I am trying to think of a robust "duck type check" for Series .. Personally, I would actually start doing actual isinstance checks (we could eg have a util function that combines that with trying to import pandas), but that's maybe a broader issue.
There was a problem hiding this comment.
we could also do and array.dtypes.ndim so it stays None?
There was a problem hiding this comment.
Yes, that's a good idea (for sure nicer than the hasattr(array.dtypes, "__array__"))
rth
left a comment
There was a problem hiding this comment.
LGTM, a "what's new" is probably needed?
| def test_check_array_series(): | ||
| # regression test that check_array works on pandas Series | ||
| pd = importorskip("pandas") | ||
| check_array(pd.Series([1, 2, 3]), ensure_2d=False, warn_on_dtype=True) |
There was a problem hiding this comment.
maybe assert that the output is equal to array([1, 2, 3])?
ogrisel
left a comment
There was a problem hiding this comment.
LGTM as well besides the CI failure on style :) Too many reviews.
|
hopefully good now ;) |
doc/whats_new/v0.20.rst
Outdated
| :class:`decomposition.IncrementalPCA` when using float32 datasets. | ||
| :issue:`12338` by :user:`bauks <bauks>`. | ||
|
|
||
| - |Fix| Calling :func:`utils.check_array` on pandas `Series`, which |
There was a problem hiding this comment.
(I think you can use `pandas.Series` for an intersphinx link)
There was a problem hiding this comment.
I was wondering after I saw the edit. Would be cool ;)
|
thanks! |
Reference Issues/PRs
Example: Fixes #12622
What does this implement/fix? Explain your changes.
Can't call set on zero-ndim array.
does this need a whatsnew? probably?