@PGijsbers PGijsbers commented Oct 21, 2022

Debugging the errors of some dataset unit tests was particularly difficult, because the real cause of the errors (malformed XML) got swallowed and reported as if the dataset were still in preprocessing (hence the change in test_dataset_functions). By our own definition, we should only catch OpenMLServerException there ("exception for when the result of the server was not 200").
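
To illustrate why the narrower except clause matters, here is a minimal sketch. It does not use the openml-python code itself: `ServerError` is a hypothetical stand-in for OpenMLServerException, and `dataset_status` and its `fetch_status` argument are invented for this example.

```python
class ServerError(Exception):
    """Hypothetical stand-in for OpenMLServerException (server result was not 200)."""


def dataset_status(fetch_status):
    """Return the dataset status reported by the hypothetical `fetch_status` callable.

    Only a server error is interpreted as "still being processed". Any other
    exception, such as an XML parse error, propagates to the caller unchanged.
    """
    try:
        return fetch_status()
    except ServerError:
        return "in_preparation"
```

With a bare `except Exception` in the same place, a parse error would also be reported as "in_preparation", which is the kind of masking described above.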

The malformed XML error was still hard to debug, since only the XML parser's ExpatError was raised, which carries no information about which XML file was at fault. Restructuring the _get_dataset_description function has two purposes:

  • a new XML file is now downloaded even if the locally cached description XML has somehow become malformed, and
  • an XML file from the server is checked for well-formedness before it is stored, and a clearer error is reported otherwise (naming the exact endpoint that served the malformed XML).
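
The two behaviours listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this pull request: `is_well_formed`, `get_description_xml`, the `fetch` callable, and the choice of ValueError are all hypothetical, and only the standard-library expat parser is used.

```python
from pathlib import Path
from xml.parsers.expat import ExpatError, ParserCreate


def is_well_formed(xml_text: str) -> bool:
    """Return True if `xml_text` parses as a complete XML document."""
    try:
        ParserCreate().Parse(xml_text, True)
        return True
    except ExpatError:
        return False


def get_description_xml(cache_file: Path, endpoint: str, fetch) -> str:
    """Return description XML, preferring a well-formed cached copy.

    `fetch` is a hypothetical callable that takes the endpoint and returns
    the response body as a string.
    """
    # A malformed cache file is ignored, so a fresh copy gets downloaded.
    if cache_file.exists():
        cached = cache_file.read_text(encoding="utf-8")
        if is_well_formed(cached):
            return cached

    xml_text = fetch(endpoint)
    # Validate before writing, and name the endpoint in the error message.
    if not is_well_formed(xml_text):
        raise ValueError(f"Malformed XML served by endpoint {endpoint!r}")
    cache_file.write_text(xml_text, encoding="utf-8")
    return xml_text
```

Because validation happens before the write, a bad server response never replaces the cache, and the error message points at the endpoint rather than at a parser position alone.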

I ran pytest tests/test_datasets/test_dataset_functions.py::TestOpenMLDataset locally, and all tests still pass (except those with known server or parquet issues).

@PGijsbers PGijsbers requested a review from mfeurer October 21, 2022 13:22
@PGijsbers PGijsbers merged commit e6250fa into develop Oct 24, 2022
@PGijsbers PGijsbers deleted the improve_error_message_bad_dataset_description branch October 24, 2022 17:58
PGijsbers added a commit to Mirkazemi/openml-python that referenced this pull request Feb 23, 2023