Quick summaries of openml objects #653

@joaquinvanschoren

Description

I often see users struggle to understand what an OpenML object (e.g. a dataset or a pipeline) really is and what it contains. For beginning users, it is also unclear how to get more insight. Nobody likes to run print(vars(flow)), and going to the website for more information is cumbersome (and not obvious).

What if we provided a summary() function for datasets, tasks, flows, and runs that gives a quick but rich overview of the object?

Expected Results

For instance, running something like

import openml
data = openml.datasets.get_dataset(1)
data.summary()

could output a snippet of the description, a small table with statistics, maybe even a small table with the first couple of rows and columns.
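
As a rough illustration of the kind of output meant here, the following is a minimal sketch in plain Python. `ToyDataset` and `summarize_dataset` are hypothetical names invented for this example; they are not part of the openml package, and a real implementation would read from the actual dataset object instead.

```python
import textwrap
from dataclasses import dataclass


@dataclass
class ToyDataset:
    """Hypothetical stand-in for an OpenML dataset (not the real class)."""
    name: str
    description: str
    columns: list
    rows: list


def summarize_dataset(ds, max_rows=3, max_cols=4, desc_width=60):
    """Build a plain-text summary: description snippet, counts, first rows."""
    lines = [f"Dataset: {ds.name}"]
    # Shorten the description to a single-line snippet.
    lines.append(textwrap.shorten(ds.description, width=desc_width,
                                  placeholder=" ..."))
    lines.append(f"Rows: {len(ds.rows)}, Columns: {len(ds.columns)}")
    # Show only the first few columns and rows as a small table.
    lines.append(" | ".join(ds.columns[:max_cols]))
    for row in ds.rows[:max_rows]:
        lines.append(" | ".join(str(v) for v in row[:max_cols]))
    return "\n".join(lines)


if __name__ == "__main__":
    toy = ToyDataset(
        name="toy",
        description="A tiny made-up dataset used only to show the layout.",
        columns=["a", "b", "c", "d", "e"],
        rows=[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]],
    )
    print(summarize_dataset(toy))
```

Returning a string rather than printing directly keeps the function easy to test and lets summary() decide how to display it.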

Likewise, running

flow = openml.flows.get_flow(1)
flow.summary()

could print out a list (or graph) of the pipeline components.
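
A component listing could be as simple as an indented tree. The sketch below assumes a flow exposes a name and a mapping of named sub-flows; `ToyFlow` and `flow_tree` are hypothetical names used only for illustration and do not reflect the library's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFlow:
    """Hypothetical stand-in for a flow with named sub-components."""
    name: str
    components: dict = field(default_factory=dict)


def flow_tree(flow, indent=0):
    """Return the flow and its nested components as indented text lines."""
    lines = ["  " * indent + flow.name]
    for key, sub in flow.components.items():
        # Label each component, then recurse into it one level deeper.
        lines.append("  " * (indent + 1) + f"{key}:")
        lines.extend(flow_tree(sub, indent + 2))
    return lines


if __name__ == "__main__":
    pipeline = ToyFlow("Pipeline", {
        "scaler": ToyFlow("StandardScaler"),
        "clf": ToyFlow("SVC"),
    })
    print("\n".join(flow_tree(pipeline)))
```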

Likewise, task.summary() could return the task details, and run.summary() could return some run details and evaluation results.

In a notebook environment, the output could even include some nice pandas tables and plots, while plain-text output should always work.
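
One possible way to serve both environments is to have summary() return a small object that defines `__str__` for terminals and `_repr_html_`, the hook Jupyter/IPython looks for when rendering rich output. The class below is a sketch under that assumption; `RunSummary` and its fields are hypothetical and not part of openml.

```python
import html


class RunSummary:
    """Hypothetical summary object with plain-text and HTML renderings."""

    def __init__(self, title, metrics):
        self.title = title
        self.metrics = dict(metrics)

    def __str__(self):
        # Plain-text rendering: works in any terminal or log.
        rows = [f"{k}: {v}" for k, v in self.metrics.items()]
        return "\n".join([self.title, *rows])

    def _repr_html_(self):
        # Rich rendering picked up automatically by Jupyter notebooks.
        rows = "".join(
            f"<tr><th>{html.escape(str(k))}</th>"
            f"<td>{html.escape(str(v))}</td></tr>"
            for k, v in self.metrics.items()
        )
        return f"<h4>{html.escape(self.title)}</h4><table>{rows}</table>"


if __name__ == "__main__":
    print(RunSummary("Run (example values)", {"accuracy": 0.9}))
```

With this design, the same call gives a table in a notebook and readable text everywhere else, without the user having to choose.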
