Description
I often see users struggle to interpret what an openml object (e.g. a dataset or a pipeline) really is and what it contains. For beginning users, it is also unclear what they can do to get more insight. Nobody likes to run print(vars(flow)), and going to the website to get more information is a bit troublesome (and not obvious).
What if we provided a summary() function for datasets, tasks, flows, and runs that gives a quick but rich overview of the object?
Expected Results
For instance, running something like
import openml
data = openml.datasets.get_dataset(1)
data.summary()
could output a snippet of the description, a small table with statistics, maybe even a small table with the first couple of rows and columns.
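To make the idea concrete, here is a rough sketch of what a text-only dataset summary could look like. It does not touch the real openml classes: `DatasetStub`, its fields, and `summarize_dataset` are hypothetical names invented for illustration, and the actual attributes of an openml dataset may differ.

```python
import textwrap
from dataclasses import dataclass, field


@dataclass
class DatasetStub:
    """Hypothetical stand-in for a dataset object; not the real openml class."""
    name: str
    description: str
    qualities: dict = field(default_factory=dict)  # statistic name -> value
    rows: list = field(default_factory=list)       # one dict per data row


def summarize_dataset(ds, max_chars=80, max_rows=3):
    """Build a short plain-text overview: description snippet, stats, first rows."""
    lines = [f"Dataset: {ds.name}"]
    # Truncate the description to a snippet instead of printing all of it.
    lines.append(textwrap.shorten(ds.description, width=max_chars, placeholder=" ..."))
    lines.append("Statistics:")
    for key, value in ds.qualities.items():
        lines.append(f"  {key:<20} {value}")
    if ds.rows:
        shown = ds.rows[:max_rows]
        lines.append(f"First {len(shown)} rows:")
        for row in shown:
            lines.append("  " + ", ".join(f"{k}={v}" for k, v in row.items()))
    return "\n".join(lines)


toy = DatasetStub(
    name="toy",
    description="A tiny made-up dataset used only to demonstrate the summary layout.",
    qualities={"NumberOfInstances": 2, "NumberOfFeatures": 2},
    rows=[{"x": 1, "y": "a"}, {"x": 2, "y": "b"}],
)
print(summarize_dataset(toy))
```

Returning a string rather than printing directly would let the same function back both a summary() method and a nicer __repr__.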
Likewise, running
flow = openml.flows.get_flow(1)
flow.summary()
could print out a list (or graph) of the pipeline components.
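As a sketch of the component list, a nested flow could be walked recursively and printed as an indented tree. `FlowStub` and `component_lines` are hypothetical names, and the assumption that a flow holds its sub-flows in a dict keyed by role is made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FlowStub:
    """Hypothetical stand-in for a flow; assumes sub-flows are stored by role."""
    name: str
    components: dict = field(default_factory=dict)  # role -> FlowStub


def component_lines(flow, indent=0):
    """Return the flow and all nested components as indented text lines."""
    lines = ["  " * indent + flow.name]
    for role, sub in flow.components.items():
        lines.append("  " * (indent + 1) + f"{role}:")
        # Recurse so that pipelines inside pipelines are listed as well.
        lines.extend(component_lines(sub, indent + 2))
    return lines


pipeline = FlowStub(
    "Pipeline",
    {"scaler": FlowStub("StandardScaler"), "clf": FlowStub("SVC")},
)
print("\n".join(component_lines(pipeline)))
```

A graph rendering could be layered on top of the same traversal later; the plain tree has the advantage of working in any terminal.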
Likewise, task.summary() could return the task details, and run.summary() could return some run details and evaluation results.
In a notebook environment, you could even include some nice pandas tables and plots, but that will always work.
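One possible way to support both environments is to return an object that has a plain-text form and also defines `_repr_html_`, the hook Jupyter/IPython looks for when rendering rich output. The sketch below assumes a hypothetical `SummaryView` class; none of these names exist in openml today.

```python
from html import escape


class SummaryView:
    """Hypothetical summary container: plain text everywhere, HTML in notebooks."""

    def __init__(self, title, fields):
        self.title = title
        self.fields = dict(fields)

    def __str__(self):
        # Text fallback, used by print() and plain terminals.
        width = max((len(str(k)) for k in self.fields), default=0)
        rows = [f"{str(k):<{width}}  {v}" for k, v in self.fields.items()]
        return "\n".join([self.title, *rows])

    __repr__ = __str__

    def _repr_html_(self):
        # Picked up by Jupyter/IPython to render a table instead of text.
        rows = "".join(
            f"<tr><th>{escape(str(k))}</th><td>{escape(str(v))}</td></tr>"
            for k, v in self.fields.items()
        )
        return f"<h4>{escape(self.title)}</h4><table>{rows}</table>"


view = SummaryView("Run 1", {"task": 31, "accuracy": 0.9})
print(view)
```

With this shape, run.summary() could return the view object, so the last expression in a notebook cell renders as a table while scripts still get readable text.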