We use xmltodict.parse with the default strip_whitespace=True, which strips leading and trailing whitespace from element text. This can lead to a scenario where the categories in the features don't match the categories in the ARFF file (e.g., '50000.+' in features and ' 50000.+' in the data).
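A minimal sketch of the mismatch, assuming a simplified, hypothetical XML snippet (the element names `feature`, `name` and `nominal_value` are illustrative only, not the actual features schema):

```python
import xmltodict

# Hypothetical, simplified stand-in for a features description.
# The category text starts with a space, as in the ARFF data.
xml = (
    "<feature>"
    "<name>income</name>"
    "<nominal_value> 50000.+</nominal_value>"
    "</feature>"
)

# strip_whitespace=True is the default, so the leading space is dropped.
stripped = xmltodict.parse(xml)
# With strip_whitespace=False the element text is kept verbatim.
preserved = xmltodict.parse(xml, strip_whitespace=False)

category_stripped = stripped["feature"]["nominal_value"]
category_preserved = preserved["feature"]["nominal_value"]

# The value as it would appear in the data section of the ARFF file.
arff_value = " 50000.+"

print(repr(category_stripped), category_stripped == arff_value)
print(repr(category_preserved), category_preserved == arff_value)
```

One caveat worth checking before switching the flag: with strip_whitespace=False, xmltodict may also keep the whitespace between elements of pretty-printed XML (as `#text` entries on the parent), so code that consumes the parsed result might need adjusting.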
In principle it's an easy fix, but we should take care to test that we don't break anything. I'd also propose checking whether this "bug" can lead to issues when reading other XML files.
For more info, see openml/automlbenchmark#350 (comment)