fetch_openml difference between pandas and liac-arff parser #23381

@lesteve

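Seen in a scipy-dev build.

With parser="pandas", the target values keep their enclosing single quotes (e.g. "'c-CS-m'"), whereas with parser="liac-arff" the quotes are stripped (e.g. 'c-CS-m').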

cc @glemaitre

import numpy as np
from sklearn.datasets import fetch_openml

mice_pandas = fetch_openml(name='miceprotein', version=4, parser="pandas")
mice_liac_arff = fetch_openml(name='miceprotein', version=4, parser="liac-arff")

# assert_array_equal raises on mismatch and returns None, so no outer assert is needed
np.testing.assert_array_equal(mice_pandas.target, mice_liac_arff.target)

Output:

AssertionError: 
Arrays are not equal

Mismatched elements: 1080 / 1080 (100%)
 x: array(["'c-CS-m'", "'c-CS-m'", "'c-CS-m'", ..., "'t-SC-s'", "'t-SC-s'",
       "'t-SC-s'"], dtype=object)
 y: array(['c-CS-m', 'c-CS-m', 'c-CS-m', ..., 't-SC-s', 't-SC-s', 't-SC-s'],
      dtype=object)
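
Until the two parsers agree, one possible stopgap is to strip the enclosing quotes on the caller's side. The snippet below is only a sketch under that assumption: `strip_enclosing_quotes` is a hypothetical helper, not part of scikit-learn, and the sample lists stand in for the two targets shown above.

```python
def strip_enclosing_quotes(values):
    """Return a list in which each string wrapped in single quotes loses that one pair."""
    cleaned = []
    for value in values:
        is_quoted = (
            isinstance(value, str)
            and len(value) >= 2
            and value[0] == "'"
            and value[-1] == "'"
        )
        # Only remove a matching pair of enclosing quotes; leave everything else untouched.
        cleaned.append(value[1:-1] if is_quoted else value)
    return cleaned


# Stand-ins for mice_pandas.target and mice_liac_arff.target (hypothetical sample values).
pandas_like = ["'c-CS-m'", "'c-CS-m'", "'t-SC-s'"]
liac_arff_like = ["c-CS-m", "c-CS-m", "t-SC-s"]

assert strip_enclosing_quotes(pandas_like) == liac_arff_like
```

This only hides the discrepancy on the user side; the parsers themselves would still disagree on how quoted nominal values are read.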
