Support nullable pandas dtypes in confusion_matrix #25635

@tamargrey

Describe the workflow you want to enable

I would like to be able to pass the nullable pandas dtypes ("Int64", "Float64", "boolean") to sklearn's confusion_matrix function. Because these dtypes become object dtype when converted to NumPy arrays, the call raises ValueError: Classification metrics can't handle a mix of unknown and binary targets.

Repro with sklearn 1.2.1:

    import pandas as pd
    import pytest
    from sklearn.metrics import confusion_matrix

    for dtype in ["Int64", "Float64", "boolean"]:
        y_true = pd.Series([1, 0, 0, 1, 0, 1, 1, 0, 1], dtype=dtype)
        y_predicted = pd.Series([0, 0, 1, 1, 0, 1, 1, 1, 1], dtype="int64")

        with pytest.raises(ValueError, match="Classification metrics can't handle a mix of unknown and binary targets"):
            confusion_matrix(y_true, y_predicted)

Describe your proposed solution

The nullable dtypes should behave the same as their non-nullable counterparts (int64, float64, and bool), which run without error:

    import pandas as pd
    from sklearn.metrics import confusion_matrix

    for dtype in ["int64", "float64", "bool"]:
        y_true = pd.Series([1, 0, 0, 1, 0, 1, 1, 0, 1], dtype=dtype)
        y_predicted = pd.Series([0, 0, 1, 1, 0, 1, 1, 1, 1], dtype="int64")

        confusion_matrix(y_true, y_predicted)

Describe alternatives you've considered, if relevant

Our current workaround is to convert the data to NumPy arrays with the corresponding non-nullable dtype (int64, float64, or bool) before passing it to confusion_matrix.
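
A minimal sketch of that workaround, for anyone hitting the same error. The helper name to_plain_numpy and the NULLABLE_TO_NUMPY mapping are hypothetical (not part of pandas or sklearn), and the sketch assumes the data contains no missing values, since pd.NA has no int64 or bool representation:

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical mapping from nullable pandas dtype names to plain NumPy dtypes.
NULLABLE_TO_NUMPY = {"Int64": "int64", "Float64": "float64", "boolean": "bool"}


def to_plain_numpy(series):
    """Hypothetical helper: return a NumPy array with a non-nullable dtype."""
    target = NULLABLE_TO_NUMPY.get(str(series.dtype))
    if target is None:
        # Not one of the nullable dtypes handled here; convert as usual.
        return series.to_numpy()
    if series.isna().any():
        # Assumption of this sketch: missing values are rejected, not imputed.
        raise ValueError("cannot convert missing values to a plain NumPy dtype")
    return series.to_numpy(dtype=target)


results = {}
for dtype in ["Int64", "Float64", "boolean"]:
    y_true = pd.Series([1, 0, 0, 1, 0, 1, 1, 0, 1], dtype=dtype)
    y_predicted = pd.Series([0, 0, 1, 1, 0, 1, 1, 1, 1], dtype="int64")

    # Convert before calling sklearn so it never sees an object-dtype array.
    cm = confusion_matrix(to_plain_numpy(y_true), to_plain_numpy(y_predicted))
    results[dtype] = cm.tolist()
```

This keeps the conversion in one place, but every caller still has to remember to apply it, which is why handling these dtypes inside sklearn would be preferable.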

Additional context

No response
