Create OpenMLBase, have most OpenML objects derive from it #828

PGijsbers · 2019-10-15T12:18:50Z

Several OpenML entities are sharing a lot of code in theory, but have copy paste duplicate code in practice. This PR introduces a base class from which OpenML entities can derive and share common logic. In this draft some bare basics are implemented and OpenMLFlow derives from it.

This will be further extended to make the other main entities (Dataset, Task, Run and Study) derive from the base class.
I will take a look at a few other functions shared between some or all of those types such as _to_xml and publish.

edit: looks like unit tests fail due to illegal type annotation for Py3.6, please ignore. The Flow unit tests worked locally, will fix it to be Py3.6 compatible.

edit: Also fixes #433

codecov-io · 2019-10-16T07:32:02Z

Codecov Report

❗ No coverage uploaded for pull request base (develop@23d4e6f). Click here to learn what that means.
The diff coverage is 83.13%.

@@            Coverage Diff             @@
##             develop     #828   +/-   ##
==========================================
  Coverage           ?   88.55%           
==========================================
  Files              ?       37           
  Lines              ?     4248           
  Branches           ?        0           
==========================================
  Hits               ?     3762           
  Misses             ?      486           
  Partials           ?        0

Impacted Files	Coverage Δ
openml/evaluations/evaluation.py	`66.66% <ø> (ø)`
openml/flows/functions.py	`87.42% <ø> (ø)`
openml/setups/setup.py	`44% <ø> (ø)`
openml/study/functions.py	`88.23% <0%> (ø)`
openml/runs/trace.py	`90.75% <100%> (ø)`
openml/flows/flow.py	`93.96% <100%> (ø)`
openml/config.py	`90% <100%> (ø)`
openml/tasks/task.py	`83.89% <55.55%> (ø)`
openml/study/study.py	`72.5% <68.18%> (ø)`
openml/utils.py	`90.9% <85.71%> (ø)`
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 23d4e6f...88e9cf0. Read the comment docs.

PGijsbers · 2019-10-16T08:47:29Z

For the __repr__ header I now parse the class names, which gives nearly identical output with what was before, but is generic and shared between all child classes. For tasks there is now also more information in the header, instead of "OpenML Task" it now says the full task type, e.g.:

OpenML Classification Task
==========================
Task Type...........: Supervised Classification
Task ID.............: 59
Task URL............: https://www.openml.org/t/59
Estimation Procedure: crossvalidation
Evaluation Measure..: predictive_accuracy
Target Feature......: class
# of Classes........: 3
Cost Matrix.........: Available

I propose to remove the 'Task Type' row from the body of the __repr__.

…ook anyway.

…ect.

mfeurer

This looks really good! I have one more suggestion, but we can really merge this.

mfeurer · 2019-10-16T12:23:56Z

openml/config.py

 # Default values are actually added here in the _setup() function which is
 # called at the end of this module
-server = _defaults['server']
+server = str(_defaults['server'])  # so mypy knows it is a string


you could also use mypy.cast to not have to use a comment here

I generally prefer this for simple types, as it is clear what is happening (whereas you have to be familiar with mypy.cast to immediately know what's being done and why).

mfeurer

It seems that there are right now three type issues which I unfortunately cannot handle from the web interface, could you please check these?

PGijsbers · 2019-10-17T06:40:47Z

I fixed the mypy errors by using getattr as mypy can't know the attributes exist in child classes. I think this is probably the cleanest way to do this.

PGijsbers · 2019-10-17T06:41:52Z

Looks like my local mypy reported different errors than CI. Great.

PGijsbers · 2019-10-17T08:26:41Z

@mfeurer I think this is good to go. The only failure on Appveyor seems like a server issue to me:

================================== FAILURES ===================================
__________________ TestRun.test_run_and_upload_randomsearch ___________________
[gw1] win32 -- Python 3.5.6 c:\miniconda35-x64\python.exe
self = <tests.test_runs.test_run_functions.TestRun testMethod=test_run_and_upload_randomsearch>
    def test_run_and_upload_randomsearch(self):
        randomsearch = RandomizedSearchCV(
            RandomForestClassifier(n_estimators=5),
            {"max_depth": [3, None],
             "max_features": [1, 2, 3, 4],
             "min_samples_split": [2, 3, 4, 5, 6, 7, 8, 9, 10],
             "min_samples_leaf": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
             "bootstrap": [True, False],
             "criterion": ["gini", "entropy"]},
            cv=StratifiedKFold(n_splits=2, shuffle=True),
            n_iter=5)
        # The random states for the RandomizedSearchCV is set after the
        # random state of the RandomForestClassifier is set, therefore,
        # it has a different value than the other examples before
        task_id = self.TEST_SERVER_TASK_SIMPLE[0]
        n_missing_vals = self.TEST_SERVER_TASK_SIMPLE[1]
        n_test_obs = self.TEST_SERVER_TASK_SIMPLE[2]
        run = self._run_and_upload_classification(
            clf=randomsearch,
            task_id=task_id,
            n_missing_vals=n_missing_vals,
            n_test_obs=n_test_obs,
>           flow_expected_rsv='12172',
        )
tests\test_runs\test_run_functions.py:607: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests\test_runs\test_run_functions.py:460: in _run_and_upload_classification
    sentinel=sentinel,
tests\test_runs\test_run_functions.py:424: in _run_and_upload
    raise e
tests\test_runs\test_run_functions.py:420: in _run_and_upload
    fold=0,
openml\runs\functions.py:332: in initialize_model_from_trace
    run_trace = get_run_trace(run_id)
openml\runs\functions.py:274: in get_run_trace
    'get')
openml\_api_calls.py:52: in _perform_api_call
    return _read_url(url, request_method, data)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
url = 'https://test.openml.org/api/v1/xml/run/trace/101829'
request_method = 'get', data = {'api_key': '610344db6388d9ba34f6db45a3cf71de'}
    def _read_url(url, request_method, data=None):
        data = {} if data is None else data
        if config.apikey is not None:
            data['api_key'] = config.apikey
    
        response = send_request(request_method=request_method, url=url, data=data)
        if response.status_code != 200:
>           raise _parse_server_exception(response, url, file_elements=None)
E           openml.exceptions.OpenMLServerException: No successful trace associated with this run. - None
openml\_api_calls.py:99: OpenMLServerException

Create OpenMLBase, have OpenMLFlow derive from it.

5343169

PGijsbers requested a review from mfeurer October 15, 2019 12:19

PGijsbers added 5 commits October 15, 2019 14:44

Derive ID and entity_letter based on class type.

cdfcf65

Add #433 open_in_browser.

b799c39

Use OpenMLBase in Dataset, Run, Task.

0430464

Use OpenMLBase for Study

9aed492

Update tag functions to take into account entity type.

393cce5

Quote OpenMLBase typing as it is only imported for type checking.

8d36941

PGijsbers added 4 commits October 16, 2019 10:48

Remove _repr_pretty_ as the default __repr__ prints pretty in a noteb…

b1cd25c

…ook anyway.

Move _to_xml to base

11a26b0

Fix bug, actually check for instance type to determine entity.

d623cb9

Provide list to task type description in task __repr__

cf34909

PGijsbers changed the title ~~Create OpenMLBase, have OpenMLFlow derive from it.~~ Create OpenMLBase, have most OpenML objects derive from it Oct 16, 2019

PGijsbers added 3 commits October 16, 2019 13:53

Move fetching id to derived classes.

bdbb7e2

Share base_url logic. Fix mypy warnings.

e058e41

Make child classes responsible for making sure _entity_letter is corr…

5fbf8b2

…ect.

mfeurer approved these changes Oct 16, 2019

View reviewed changes

PGijsbers and others added 3 commits October 16, 2019 15:13

Docstring and type hint changes.

324d1bb

PEP8

23ada6a

PEP8

321252c

mfeurer requested changes Oct 16, 2019

View reviewed changes

Fix mypy issues

6f78aa7

PGijsbers added 3 commits October 17, 2019 08:45

Fix CI mypy issues.

fecce88

Dont use Py3.6 syntax

88a2630

Fix CI mypy issue

88e9cf0

mfeurer approved these changes Oct 17, 2019

View reviewed changes

PGijsbers marked this pull request as ready for review October 17, 2019 07:36

Merge branch 'develop' into OpenMLBase

15e5de6

mfeurer merged commit 43596e0 into develop Oct 17, 2019

mfeurer deleted the OpenMLBase branch October 17, 2019 08:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Create OpenMLBase, have most OpenML objects derive from it #828

Create OpenMLBase, have most OpenML objects derive from it #828

Uh oh!

PGijsbers commented Oct 15, 2019 •

edited

Loading

Uh oh!

codecov-io commented Oct 16, 2019 •

edited

Loading

Uh oh!

PGijsbers commented Oct 16, 2019

Uh oh!

mfeurer left a comment

Uh oh!

mfeurer Oct 16, 2019

Uh oh!

PGijsbers Oct 16, 2019

Uh oh!

mfeurer left a comment

Uh oh!

PGijsbers commented Oct 17, 2019

Uh oh!

PGijsbers commented Oct 17, 2019

Uh oh!

PGijsbers commented Oct 17, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Create OpenMLBase, have most OpenML objects derive from it #828

Create OpenMLBase, have most OpenML objects derive from it #828

Uh oh!

Conversation

PGijsbers commented Oct 15, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov-io commented Oct 16, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

PGijsbers commented Oct 16, 2019

Uh oh!

mfeurer left a comment

Choose a reason for hiding this comment

Uh oh!

mfeurer Oct 16, 2019

Choose a reason for hiding this comment

Uh oh!

PGijsbers Oct 16, 2019

Choose a reason for hiding this comment

Uh oh!

mfeurer left a comment

Choose a reason for hiding this comment

Uh oh!

PGijsbers commented Oct 17, 2019

Uh oh!

PGijsbers commented Oct 17, 2019

Uh oh!

PGijsbers commented Oct 17, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

PGijsbers commented Oct 15, 2019 •

edited

Loading

codecov-io commented Oct 16, 2019 •

edited

Loading