Description
In datasets.get_dataset(data_id) the default is currently to always download the dataset:
https://openml.github.io/openml-python/master/generated/openml.datasets.get_dataset.html#openml.datasets.get_dataset
This is problematic for large datasets: the download takes a long time and may cause out-of-memory errors. Sometimes we need to inspect the full metadata of many datasets without downloading the data. That is already possible with the option download_data=False, but it feels like this should be the default. Some users may also be unaware of this option, or of the fact that get_dataset actually downloads the data and consumes resources.
A simple solution would be to make download_data=False the default.
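Until the default changes, the same behaviour can be approximated on the caller's side. The following is only a minimal sketch: `metadata_first` is a hypothetical helper name, not part of openml-python, and it assumes nothing beyond the `download_data` keyword mentioned above.

```python
import functools


def metadata_first(get_dataset):
    """Wrap a get_dataset-style callable so download_data defaults to False.

    Hypothetical helper for illustration; not part of openml-python.
    """

    @functools.wraps(get_dataset)
    def wrapper(dataset_id, download_data=False, **kwargs):
        # Callers can still pass download_data=True to fetch the data eagerly.
        return get_dataset(dataset_id, download_data=download_data, **kwargs)

    return wrapper
```

With openml-python installed, this could be applied as `get_dataset = metadata_first(openml.datasets.get_dataset)`, after which `get_dataset(41081)` would request metadata only, while `get_dataset(41081, download_data=True)` keeps the current behaviour.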
Steps/Code to Reproduce
import openml
openml.datasets.get_dataset(41081)
Expected Results
The dataset metadata within seconds
Actual Results
A long wait while the dataset is downloaded and parsed.
Versions
macOS-10.16-x86_64-i386-64bit
Python 3.8.5 (default, Sep 4 2020, 02:22:02)
[Clang 10.0.0 ]
NumPy 1.19.5
SciPy 1.5.2
Scikit-Learn 0.23.2
OpenML 0.11.1dev