Whitespace stripped from XML files can lead to incongruence #1125

@PGijsbers

Description

We use xmltodict.parse with the default strip_whitespace=True, which strips leading and trailing whitespace from element text. This can leave the features with categories that don't match the categories in the ARFF file (e.g., '50000.+' in the features and ' 50000.+' in the data).
In principle it's an easy fix, but we should test carefully that we don't break anything, and I propose we also check whether this "bug" can cause problems when reading other XML files.

For more info, see openml/automlbenchmark#350 (comment)
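
A minimal sketch of the mismatch described above, assuming a simplified feature description. The element names (`feature`, `name`, `nominal_value`) and the `arff_category` variable are illustrative and not necessarily the exact OpenML schema or code. It only shows how the `strip_whitespace` argument of `xmltodict.parse` changes the parsed category; whether passing `strip_whitespace=False` everywhere is the right fix would still need the testing mentioned above.

```python
import xmltodict

# Hypothetical, simplified feature description (not the exact OpenML schema).
# Note the leading space in the first category.
XML = (
    "<feature>"
    "<name>income</name>"
    "<nominal_value> 50000.+</nominal_value>"
    "<nominal_value>-50000</nominal_value>"
    "</feature>"
)

# Default behaviour: strip_whitespace=True trims element text.
stripped = xmltodict.parse(XML)
# Opt out of trimming so the category keeps its leading space.
preserved = xmltodict.parse(XML, strip_whitespace=False)

stripped_values = stripped["feature"]["nominal_value"]
preserved_values = preserved["feature"]["nominal_value"]

# Category as it would appear in the data (assumed for illustration).
arff_category = " 50000.+"

print(arff_category in stripped_values)
print(arff_category in preserved_values)
```

With the default settings the category from the data is not found among the parsed feature categories, while it is found once whitespace is preserved.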
